CN114969221A - Method for updating map and related equipment - Google Patents

Method for updating map and related equipment

Info

Publication number
CN114969221A
Authority
CN
China
Prior art keywords
image
positioning
similarity
map
roi
Prior art date
Legal status
Pending
Application number
CN202110192752.2A
Other languages
Chinese (zh)
Inventor
邓乃铭 (Deng Naiming)
何凯文 (He Kaiwen)
李江伟 (Li Jiangwei)
罗巍 (Luo Wei)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110192752.2A priority Critical patent/CN114969221A/en
Publication of CN114969221A publication Critical patent/CN114969221A/en
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/05: Geographic models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29: Geographical information databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761: Proximity, similarity or dissimilarity measures

Abstract

The embodiments of this application provide a method for updating a map and related equipment, relating to the technical fields of positioning and image processing. The method can update the map used by a visual positioning system (VPS) in real time, ensuring that the environment reflected by the map is consistent with the real-world environment and thereby improving the accuracy of VPS positioning. The method comprises the following steps: acquiring a positioning image and position information, where the position information indicates the location at which the positioning image was captured and is obtained by visually positioning the positioning image against the map; determining a corresponding comparison image according to the position information; and calculating the similarity between the positioning image and the comparison image. If the similarity indicates that the positioning image is similar to the comparison image, the map is not updated; if the similarity indicates that the positioning image is not similar to the comparison image, the map is updated according to the positioning image.

Description

Method for updating map and related equipment
Technical Field
The embodiment of the application relates to the technical field of positioning and image processing, in particular to a method for updating a map and related equipment.
Background
A visual positioning system (VPS) is a technology for positioning an electronic device based on an image acquired by the device. For example, the electronic device acquires a first image, matches the first image against the images in an image library to obtain the second image with the highest matching degree, and determines the current position of the electronic device from the shooting position of that second image.
The shooting position of each image in the image library is known; it is the position of the device that acquired the image. Therefore, by matching the first image against the images in the library to obtain the second image with the highest similarity, the shooting position of the first image can be determined from the shooting position of the second image, thereby positioning the electronic device.
With the wide application of augmented reality (AR), technologies such as autonomous robots and unmanned vehicles continue to develop. In particular, applying VPS to autonomous robots and unmanned vehicles places higher requirements on the positioning accuracy, positioning speed, and stability of the VPS. A high-precision map is therefore used during VPS positioning to improve accuracy. However, the real-world environment may change: a billboard may be added to a building, a road sign may be added to a road, and so on. If the real-world environment changes while the environment reflected by the high-precision map does not, VPS positioning based on that map may become inaccurate or fail. Updating the high-precision map used by the VPS in real time, so that the environment it reflects stays consistent with the real world, is therefore a key factor in ensuring the stability and sustainability of the VPS.
Disclosure of Invention
This application provides a method and related equipment for updating a map, so that the map used by a VPS can be updated in real time, ensuring that the environment reflected by the map is consistent with the real-world environment and improving the accuracy of VPS positioning.
To achieve this, the following technical solutions are adopted in this application:
in a first aspect, the present application provides a method of updating a map, which may include: and acquiring a positioning image and position information corresponding to the positioning image. It will be appreciated that the location information may indicate where the device taking the scout image is located. The position information is obtained by performing visual positioning on the positioning image according to the map. And determining a corresponding contrast image according to the position information, and calculating the similarity between the positioning image and the contrast image, wherein the similarity can represent the similarity between the positioning image and the contrast image.
That is to say, in the method provided in the embodiment of the present application, the positioning image is an image obtained by VPS positioning shooting, and a corresponding comparison image is generated according to the position information of the positioning image. That is, if the position information of the contrast image and the positioning image is the same, the contrast image and the positioning image can be regarded as the same image. By comparing the comparison image with the positioning image, whether the real environment reflected by the positioning image changes or not can be judged according to the similarity. If the similarity representation positioning image is similar to the comparison image, which indicates that the real environment reflected by the positioning image is not changed, the map does not need to be updated. If the similarity represents that the positioning image is not similar to the comparison image, which indicates that the real environment reflected by the positioning image is changed, the map can be updated according to the positioning image.
In a possible design of the first aspect, the similarity may include a local similarity and a global similarity. The local similarity can represent how alike local regions of the positioning image and the comparison image are, and the global similarity can represent how alike the two images are as a whole.
Calculating the similarity between the positioning image and the comparison image may include: first calculating the global similarity of the two images. If the global similarity is smaller than a preset threshold, the positioning image and the comparison image are entirely different images and the probability of a positioning failure is high; in that case, to avoid modifying the map on a failed positioning and thereby degrading its accuracy, the map is not updated. If the global similarity is larger than the preset threshold, the positioning image and the comparison image are similar on the whole; in that case, the local similarity of the two images may be further calculated to determine whether to update the high-precision map from the positioning image.
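A minimal sketch of this two-stage decision is shown below. The threshold values, function names, and the three-way return value are illustrative assumptions, not taken from the patent:

```python
# Hedged sketch of the two-stage similarity check described above.
# GLOBAL_THRESHOLD and local_threshold are assumed preset values.

GLOBAL_THRESHOLD = 0.6

def decide_map_update(global_sim, local_sims, local_threshold=0.5):
    """Return one of 'positioning_failed', 'no_update', 'update'.

    global_sim: global similarity between positioning and comparison image.
    local_sims: per-ROI local similarities, only consulted once the
    global check passes.
    """
    if global_sim < GLOBAL_THRESHOLD:
        # Globally dissimilar images: likely a positioning failure,
        # so the map is left untouched.
        return "positioning_failed"
    if all(s >= local_threshold for s in local_sims):
        # Every ROI still matches: the environment has not changed.
        return "no_update"
    # Some ROI changed: the map is updated from the positioning image.
    return "update"
```

Computing the cheap global check first and only then the per-ROI comparison keeps the common case (unchanged scene or failed positioning) inexpensive.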
In another possible design of the first aspect, the similarity includes a local similarity and a global similarity. The local similarity can represent how alike local regions of the positioning image and the comparison image are, and the global similarity can represent how alike the two images are as a whole.
Calculating the similarity between the positioning image and the comparison image may specifically include: constructing a multitask network architecture that processes the comparison image and the positioning image to obtain an image-processing result. It should be noted that when the multitask network architecture processes the positioning image and the comparison image, it can perform both tasks on the images simultaneously and output the results together. For example, it can compute the global similarity and the local similarity of the comparison image and the positioning image at the same time, with both included in its output.
Specifically, the positioning image and the comparison image are input into the multitask network architecture, and a processing result containing the global similarity and the local similarity is obtained. That is, the image-processing tasks of the multitask network architecture include calculating the global similarity between the comparison image and the positioning image, and calculating the local similarity between them.
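The interface of such a joint computation, one call returning both similarities, can be sketched as follows. A real implementation would use a shared-backbone neural network; here plain grayscale-histogram correlation stands in for both heads purely to illustrate the single-pass, two-output shape. All names are assumptions:

```python
# Illustrative stand-in for the multitask architecture: one call that
# returns both the global and the per-region similarity of two images.
import math

def _cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def _histogram(pixels, bins=8):
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    return hist

def multitask_similarity(img_a, img_b, regions=2):
    """img_a, img_b: equal-length lists of 0-255 grayscale pixels.
    Returns (global_similarity, [local_similarity per region])."""
    global_sim = _cosine(_histogram(img_a), _histogram(img_b))
    step = len(img_a) // regions
    local_sims = [
        _cosine(_histogram(img_a[i * step:(i + 1) * step]),
                _histogram(img_b[i * step:(i + 1) * step]))
        for i in range(regions)
    ]
    return global_sim, local_sims
```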
In another possible design of the first aspect, the position information includes a shooting location and a shooting pose, the shooting pose indicates the shooting field of view of the electronic device that generated the positioning image, and the map is a point cloud map.
Determining the corresponding comparison image according to the position information may specifically include: determining the position coordinates in the point cloud map indicated by the shooting location, and, based on that position, generating a three-dimensional view of the field of view corresponding to the shooting pose. That is, the comparison image and the positioning image have the same field of view. The three-dimensional view is then rendered to generate a two-dimensional image, which is taken as the comparison image.
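The core of rendering such a comparison view is projecting map points into the camera given by the shooting pose. A heavily simplified sketch is below; the pinhole intrinsics, the fixed +Z viewing direction (a full pose would include rotation), and all names are assumptions for illustration only:

```python
# Minimal sketch of generating a comparison view from a point cloud map:
# each 3D map point is translated into the camera frame given by the
# shooting location and projected with a pinhole model.

def project_points(points, camera_pos, focal=500.0, cx=320.0, cy=240.0):
    """points: list of (x, y, z) in world coordinates.
    camera_pos: (x, y, z) of the shooting location; the camera is
    assumed to look along +Z with no rotation, to keep the sketch short.
    Returns the 2D pixel coordinates of points in front of the camera."""
    pixels = []
    for x, y, z in points:
        # Translate into the camera frame.
        xc, yc, zc = x - camera_pos[0], y - camera_pos[1], z - camera_pos[2]
        if zc <= 0:
            continue  # behind the camera: outside the field of view
        u = focal * xc / zc + cx
        v = focal * yc / zc + cy
        pixels.append((u, v))
    return pixels
```

Rasterizing the projected points (with depth ordering) would then yield the two-dimensional comparison image.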
In another possible design of the first aspect, an image library is preset, where the image library includes at least one preset image and the position information corresponding to each preset image. The position information includes a shooting location and a shooting pose, the shooting pose indicating the shooting field of view of the electronic device that generated the positioning image.
Determining the corresponding comparison image according to the position information may specifically include: acquiring at least one preset image according to the position information of the positioning image, and generating a comparison image from the preset image(s) by image synthesis, so that the position information of the comparison image is the same as that of the positioning image.
In another possible design of the first aspect, the method may further include: acquiring an original image from the electronic device, where the original image is an image captured by the electronic device for visually positioning the device. An image-processing algorithm is applied to remove the dynamic objects in the original image and generate the positioning image, where the dynamic objects include vehicles and/or animals.
It can be understood that the original image is captured from the real environment and may contain dynamic objects such as vehicles, people, and pets. These objects interfere with image comparison, so to reduce their influence on the positioning image, they are removed with an image-processing algorithm after the original image is obtained.
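One hedged way to realize this step: a detector (not shown here) supplies bounding boxes of vehicles, people, or pets, and the covered pixels are masked out so they cannot affect the similarity computation. The sentinel value, box format, and flat-image layout are assumptions:

```python
# Sketch of removing dynamic objects before comparison.
MASKED = -1  # sentinel for pixels excluded from comparison

def mask_dynamic_objects(image, width, boxes):
    """image: flat list of grayscale pixels, row-major, `width` per row.
    boxes: iterable of (x0, y0, x1, y1) detections, end-exclusive.
    Returns a copy with dynamic-object pixels set to MASKED."""
    out = list(image)
    height = len(image) // width
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                out[y * width + x] = MASKED
    return out
```

Masking rather than inpainting keeps the sketch simple; the similarity computation would simply skip MASKED pixels.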
In another possible design of the first aspect, calculating the local similarity between the positioning image and the comparison image may specifically include: dividing the positioning image into at least one region of interest (ROI) and dividing the comparison image into at least one ROI based on a preset ROI division, where each ROI contains a target object, and the target object may be at least one of a building, a road, or a road sign. For ROIs at the same location in the positioning image and the comparison image, the following operation is performed: comparing the ROI in the positioning image with the ROI in the comparison image, and calculating the similarity of the ROI pair to obtain the local similarity.
It will be appreciated that a local similarity can be calculated for each ROI; that is, there is at least one local-similarity result for the positioning image and the comparison image. Each local-similarity result therefore needs to be judged separately to determine whether the corresponding ROI has changed, so that the map can be updated accordingly.
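The per-ROI judgment can be sketched as below: the same preset division is applied to both images, each ROI pair is compared, and the indices of ROIs whose local similarity falls below a threshold are collected as candidates for a map update. The similarity function and threshold are placeholders:

```python
# Sketch of the per-ROI comparison described above.

def changed_rois(positioning_rois, comparison_rois, similarity, threshold=0.5):
    """positioning_rois / comparison_rois: equal-length lists of ROI
    contents cut out by the same preset division.
    similarity: callable scoring a pair of ROIs in [0, 1].
    Returns the indices of ROIs judged to have changed."""
    assert len(positioning_rois) == len(comparison_rois)
    return [
        i for i, (a, b) in enumerate(zip(positioning_rois, comparison_rois))
        if similarity(a, b) < threshold
    ]
```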
In another possible design of the first aspect, if the similarity indicates that the positioning image is not similar to the comparison image, updating the map according to the positioning image may specifically include: if the local similarity indicates that the comparison image is not similar to the positioning image, generating a 3D bounding box from the ROI in the positioning image, where the 3D bounding box contains the ROI point cloud and the access count of the ROI point cloud. It should be noted that the local similarity is calculated per ROI, for ROIs at the same position in the positioning image and the comparison image; a dissimilar local-similarity result therefore means that the ROI of the positioning image and the corresponding ROI of the comparison image do not match.
Visual positioning is then performed according to the map, and the number of times the ROI point cloud is accessed during visual positioning is counted. It can be understood that the map used for visual positioning includes the 3D bounding box. If, from the access count of the ROI point cloud in the 3D bounding box, it is determined that the count exceeds a preset number, the original ROI in the map is updated to the ROI in the positioning image.
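This deferred-update mechanism can be sketched as a bounding box that carries the candidate ROI point cloud plus an access counter, with the map ROI replaced only once the counter exceeds a preset number. Class and method names, and the threshold, are illustrative:

```python
# Hedged sketch of the access-counted 3D bounding box described above.

class RoiBoundingBox:
    def __init__(self, roi_point_cloud, preset_count=3):
        self.roi_point_cloud = roi_point_cloud  # candidate ROI point cloud
        self.access_count = 0
        self.preset_count = preset_count

    def record_access(self):
        """Called whenever a visual-positioning query visits this region."""
        self.access_count += 1

    def should_update_map(self):
        """True once the changed region has been confirmed often enough."""
        return self.access_count > self.preset_count
```

Gating the update on repeated accesses avoids rewriting the map on the strength of a single, possibly noisy, observation.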
In another possible design of the first aspect, the method may further include: generating the corresponding ROI point cloud from the ROI in the positioning image.
In another possible design of the first aspect, the method may further include: if the global similarity indicates that the comparison image is not similar to the positioning image, determining that positioning of the positioning image has failed.
In a second aspect, the present application provides an apparatus for updating a map, comprising one or more processors, a memory, and one or more computer programs. The one or more computer programs are stored in the memory and comprise instructions which, when executed by the apparatus for updating a map, cause the processor to perform the following steps:
Acquire a positioning image and the position information corresponding to the positioning image. The position information may indicate where the device that captured the positioning image is located, and is obtained by visually positioning the positioning image against the map. Determine a corresponding comparison image according to the position information, and calculate the similarity between the positioning image and the comparison image, where the similarity can represent the degree to which the two images are alike. If the similarity indicates that the positioning image is similar to the comparison image, the real environment reflected by the positioning image has not changed and the map does not need to be updated. If the similarity indicates that the positioning image is not similar to the comparison image, the real environment reflected by the positioning image has changed and the map can be updated according to the positioning image.
In a possible design of the second aspect, the similarity may include a local similarity and a global similarity. The local similarity can represent how alike local regions of the positioning image and the comparison image are, and the global similarity can represent how alike the two images are as a whole.
When calculating the similarity between the positioning image and the comparison image, the processor performs the following steps: first calculate the global similarity of the positioning image and the comparison image. If the global similarity is smaller than a preset threshold, the positioning image and the comparison image are entirely different images and the probability of a positioning failure is high. If the global similarity is larger than the preset threshold, the positioning image and the comparison image are similar on the whole, and the local similarity of the two images is then calculated to determine whether to update the high-precision map from the positioning image.
In another possible design of the second aspect, the similarity includes a local similarity and a global similarity. The local similarity can represent how alike local regions of the positioning image and the comparison image are, and the global similarity can represent how alike the two images are as a whole.
When calculating the similarity between the positioning image and the comparison image, the processor performs the following steps: construct a multitask network architecture that processes the comparison image and the positioning image to obtain an image-processing result, input the positioning image and the comparison image into the multitask network architecture, and acquire a processing result containing the global similarity and the local similarity. That is, the image-processing tasks of the multitask network architecture include calculating the global similarity between the comparison image and the positioning image, and calculating the local similarity between them.
In another possible design of the second aspect, the position information includes a shooting location and a shooting pose, the shooting pose indicates the shooting field of view of the electronic device that generated the positioning image, and the map is a point cloud map.
When determining the corresponding comparison image according to the position information, the processor specifically performs the following steps: determine the position coordinates in the point cloud map indicated by the shooting location, and, based on that position, generate a three-dimensional view of the field of view corresponding to the shooting pose. The three-dimensional view is rendered to generate a two-dimensional image, which is taken as the comparison image.
In another possible design of the second aspect, the apparatus for updating a map may preset an image library, where the image library includes at least one preset image and the position information corresponding to each preset image. The position information includes a shooting location and a shooting pose, the shooting pose indicating the shooting field of view of the electronic device that generated the positioning image.
When determining the corresponding comparison image according to the position information, the processor specifically performs the following steps: acquire at least one preset image according to the position information of the positioning image, and generate a comparison image from the preset image(s) by image synthesis, so that the position information of the comparison image is the same as that of the positioning image.
In another possible design of the second aspect, the processor is further configured to: acquire an original image from the electronic device, where the original image is an image captured by the electronic device for visually positioning the device, and remove the dynamic objects in the original image with an image-processing algorithm to generate the positioning image, where the dynamic objects include vehicles and/or animals.
In another possible design of the second aspect, when calculating the local similarity between the positioning image and the comparison image, the processor specifically performs the following steps: divide the positioning image into at least one ROI and divide the comparison image into at least one ROI based on the preset ROI division, where each ROI contains a target object, and the target object may be at least one of a building, a road, or a road sign. For ROIs at the same position in the positioning image and the comparison image, the following operation is performed: compare the ROI in the positioning image with the ROI in the comparison image, and calculate the similarity of the ROI pair to obtain the local similarity.
In another possible design of the second aspect, if the similarity indicates that the positioning image is not similar to the comparison image, the processor may specifically perform: if the local similarity indicates that the comparison image is not similar to the positioning image, generate a 3D bounding box from the ROI in the positioning image, where the 3D bounding box contains the ROI point cloud and the access count of the ROI point cloud. Visual positioning is performed according to the map, and the number of times the ROI point cloud is accessed during visual positioning is counted. It can be understood that the map used for visual positioning includes the 3D bounding box. If, from the access count of the ROI point cloud in the 3D bounding box, it is determined that the count exceeds a preset number, the original ROI in the map is updated to the ROI in the positioning image.
In another possible design of the second aspect, the processor may be further configured to: generate the corresponding ROI point cloud from the ROI in the positioning image.
In another possible design of the second aspect, the processor may be further configured to: if the global similarity indicates that the comparison image is not similar to the positioning image, determine that positioning of the positioning image has failed.
In a third aspect, the present application further provides an electronic device, including: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the electronic device, cause the electronic device to perform the method of the first aspect and any one of its possible designs.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of the first aspect and any one of its possible designs.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to perform the method performed by the electronic device in the first aspect and any possible design thereof.
In a sixth aspect, an embodiment of the present application provides a chip system applied to an electronic device. The chip system includes one or more interface circuits and one or more processors, interconnected by lines. The interface circuit is configured to receive signals from a memory of the electronic device and send them to the processor, the signals comprising computer instructions stored in the memory. The computer instructions, when executed by the processor, cause the electronic device to perform the method of the first aspect and any one of its possible designs.
It is to be understood that, for the beneficial effects achievable by the apparatus for updating a map of the second aspect, the electronic device of the third aspect, the computer-readable storage medium of the fourth aspect, the computer program product of the fifth aspect, and the chip system of the sixth aspect, reference may be made to the beneficial effects of the first aspect and any one of its possible designs; details are not repeated here.
Drawings
Fig. 1 is a schematic diagram of a process for updating a map according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a positioning image according to an embodiment of the present application;
Fig. 3 is a flowchart of updating a high-precision map according to an embodiment of the present application;
Fig. 4 is a schematic diagram of an original image according to an embodiment of the present application;
Fig. 5A is a schematic diagram of an image in a preset image library according to an embodiment of the present application;
Fig. 5B is a schematic diagram of an image in another preset image library according to an embodiment of the present application;
Fig. 5C is a schematic diagram of an image in another preset image library according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a comparison image according to an embodiment of the present application;
Fig. 7 is a schematic diagram of another comparison image according to an embodiment of the present application;
Fig. 8 is a flowchart of a method for updating a map according to an embodiment of the present application;
Fig. 9 is a flowchart of another method for updating a high-precision map according to an embodiment of the present application;
Fig. 10A is a schematic diagram of another comparison image according to an embodiment of the present application;
Fig. 10B is a schematic diagram of another positioning image according to an embodiment of the present application;
Fig. 11 is a schematic diagram of a heat map according to an embodiment of the present application.
Detailed Description
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present embodiment, "a plurality" means two or more unless otherwise specified.
In popular terms, the high-precision map is an electronic map with higher positioning accuracy and more data dimensions. The positioning accuracy is higher, and the positioning accurate to the centimeter level can be realized by adopting a high-precision map. Data dimensions are more embodied, and the high-precision map also comprises static information related to traffic, wherein the static information is driving auxiliary information related to traffic, such as lane width, lane line type, lane height limit, guard railing, road edge type, roadside landmarks and the like. That is, the high-precision map includes a visual feature description file, which may include static information related to traffic, and may further include environmental information related to positioning, such as static information of an indoor scene.
VPS is based on high-precision map positioning, and can provide positioning accuracy on the centimeter level. For example, when the driving assistance information related to traffic changes in the real world, the static information in the high-precision map also needs to be adaptively changed. Therefore, the accuracy of VPS positioning is ensured in the process of adopting high-precision map navigation. That is, the accuracy of VPS positioning can be ensured by updating the high-precision map from time to time.
Generally, high-precision maps can be maintained by means of manual complementary mining. Specifically, when the driving assistance information related to traffic is found to be changed, related workers acquire images in the real environment, and the high-precision map is updated according to the acquired images. The method has certain information lag, low efficiency and high cost of manual supplementary mining.
With the development of network technology, transmission bandwidth has increased and network latency has decreased, so the high-precision map can be updated with crowdsourced data transmitted over the network. For example, a map information collection device is installed in a taxi to collect driving data along the road sections the taxi travels. After the taxi has driven a stretch of road, the collection device has acquired the driving data for that stretch, and the vehicle-mounted device uploads it so that the high-precision map can be updated accordingly.
In a first implementation, a computing system receives a set of area description files from a plurality of first mobile devices and may update the high-precision map from this set. The set comprises multiple area description files, each of which is the spatial feature point cloud of one area.
Specifically, the computing system includes a merging module, a positioning module, and a query module. The merging module receives area description files from the first mobile devices and stores them in a first data storage device. The positioning module generates a positioning-area description file for an area from the area description files and stores it in a second storage device. When the computing system receives a positioning image from a second mobile device, the query module sends the second mobile device the positioning-area description file, i.e., the spatial feature point cloud of the area shown in the second mobile device's positioning image.
The computing system updates the positioning-area description file based on positioning feedback data from the second mobile device (e.g., whether the positioning result was accurate). The positioning-area description file contains multiple spatial features of the positioning area, and the computing system maintains these features and assigns each a score. When the computing system detects a spatial feature of the area in the feedback data from the second mobile device, it increases that feature's score. When the score of a spatial feature falls below a preset threshold, the computing system removes the feature from the positioning-area description file.
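The score-and-evict maintenance just described can be sketched as follows. This is a minimal illustration, not the computing system's actual implementation: the class name, the initial score, the penalty for unseen features, and the eviction threshold are all assumptions.

```python
EVICT_THRESHOLD = 1  # assumed preset threshold for eviction

class AreaDescriptionFile:
    """Toy positioning-area description file: feature id -> score."""

    def __init__(self, features):
        # Start every spatial feature at an assumed neutral score.
        self.scores = {f: 3 for f in features}

    def apply_feedback(self, observed_features):
        # Features seen in the positioning feedback gain score; features
        # that were expected but not seen lose score (assumed behavior).
        for f in list(self.scores):
            if f in observed_features:
                self.scores[f] += 1
            else:
                self.scores[f] -= 1
        # Evict features whose score dropped below the preset threshold.
        self.scores = {f: s for f, s in self.scores.items()
                       if s >= EVICT_THRESHOLD}

adf = AreaDescriptionFile(["door", "sign", "pillar"])
adf.apply_feedback({"door", "sign"})  # pillar: 3 -> 2
adf.apply_feedback({"door"})          # pillar: 2 -> 1
adf.apply_feedback({"door"})          # pillar: 1 -> 0, evicted
print(sorted(adf.scores))             # → ['door', 'sign']
```

After three rounds of feedback in which the "pillar" feature is never observed, its score drops below the threshold and it is removed, mirroring the eviction rule above.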
In this implementation, the first mobile device generates the area description file from detected spatial feature data and statistics; it has no capability to detect whether the real-world environment has changed. Moreover, the computing system receives the set of area description files from the first mobile devices, and a large set imposes heavy computational load on the merging module and heavy storage load on the first data storage device.
Secondly, the collection device installed on the first mobile device does not capture images; it generates the area description file from sensor data. The sensors in the collection device include a positioning sensor, an inertial measurement sensor, and the like, which limits the device's applicable range. For example, in an indoor environment where the collection device cannot receive positioning signals, it cannot acquire indoor positioning information.
In a second implementation, a crowdsourced map is constructed from crowdsourced data, and the high-precision map is updated using the crowdsourced map. The crowdsourced map contains information related to driving roads, such as lane markings.
Specifically, after the crowdsourced map is constructed from the crowdsourced data, its accuracy can be improved using the relationship between roads and ground objects in the crowdsourced map and the corresponding relationship in the high-precision map. The crowdsourced map is then compared with the high-precision map to obtain their difference, and when the difference exceeds a preset threshold, the high-precision map is updated using the crowdsourced map.
This implementation targets the field of unmanned driving, i.e., a high-precision map designed specifically for autonomous vehicles, in which road information such as lane lines is particularly important. In addition, the update process mainly considers elements with topological relationships, such as lane lines; the kinds of map elements updated are too limited, so this map updating method is difficult to apply to updating a VPS high-precision map.
The embodiment of the present application provides a method for updating a map: after VPS positioning is performed with the high-precision map, the positioning image used for VPS positioning is obtained, a contrast image with the same viewing angle as the positioning image is generated, and image comparison is used to judge whether the real environment reflected by the positioning image has changed. If it has changed, the high-precision map is updated according to the positioning image; if not, no update is needed. The positioning image used in VPS positioning is uploaded by a user, i.e., it is crowdsourced data. That is, the method of this embodiment updates the high-precision map based on crowdsourced data.
It should be noted that with VPS positioning, the positioning information acquired by the electronic device includes a six-degrees-of-freedom (6DOF) pose. 6DOF comprises 3 degrees of freedom of displacement, along the front-back, up-down, and left-right directions, and 3 degrees of freedom of spatial rotation. The 6DOF pose in this embodiment represents the displacement and rotation of the electronic device, each in 3 degrees of freedom, within a world coordinate system of the real world. The world coordinate system may be defined at any position in the real world; for example, the coordinate system of the electronic device capturing the positioning image may serve as the world coordinate system.
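A 6DOF pose as described above can be represented by a simple container holding the three translational and three rotational degrees of freedom. The field names below are assumptions for illustration, not the embodiment's data format.

```python
from dataclasses import dataclass

@dataclass
class Pose6DOF:
    """Illustrative 6DOF pose: 3 translational + 3 rotational DOF."""
    x: float      # left-right displacement in the world coordinate system
    y: float      # up-down displacement
    z: float      # front-back displacement
    roll: float   # rotation about the front-back axis
    pitch: float  # rotation about the left-right axis
    yaw: float    # rotation about the up-down axis

# A device at the world-coordinate origin with no rotation:
origin = Pose6DOF(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
print(origin.yaw)  # → 0.0
```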
It is worth mentioning that VPS performs positioning using the high-precision map, so its positioning accuracy can reach the centimeter level.
The method provided by the embodiment of the application will be described below with reference to the accompanying drawings.
The application scenario of the map updating method provided by this embodiment is as follows: an electronic device interacts with a server (or cloud device) that contains and maintains the high-precision map. After the electronic device performs VPS positioning, it can send the positioning image and the corresponding 6DOF pose to the server.
It is understood that the electronic device may be a mobile phone, a tablet computer, a desktop computer, a laptop, a handheld computer, a notebook computer, a vehicle-mounted device, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, etc.; the embodiment of the present application does not limit the specific form of the electronic device.
Referring to fig. 1, a process in which a server acquires an original image from an electronic device and updates a high-precision map according to the original image is shown. Wherein, the original image is a shot image collected by the electronic equipment.
In some implementations, the electronic device captures at least one original image and sends the at least one original image to the server. The server performs VPS positioning on an original image and determines a 6DOF pose corresponding to the original image.
In other implementations, the electronic device captures a positioning video and sends it to the server. Because the positioning video is composed of consecutive frames, the server takes one frame of the video as the original image, performs VPS positioning on it, and determines the corresponding 6DOF pose.
Dynamic objects such as vehicles and animals may exist in the original image and affect the accuracy of image matching, so an image processing algorithm needs to be used to remove them. Specifically, the server removes the dynamic objects in the original image with an image processing algorithm to generate the positioning image.
In addition, if the server includes a preset image library, the server can select multiple images from it according to the 6DOF pose and synthesize them into a contrast image by view synthesis. If the server does not include a preset image library, it can generate a rendered image from the 6DOF pose by image rendering and use the rendered image as the contrast image.
Furthermore, the positioning image and the contrast image are compared to judge whether the real world reflected by the positioning image has changed.
Illustratively, the global similarity of the positioning image and the contrast image is calculated. If the global similarity is higher than a preset threshold, the positioning image and the contrast image have the same viewing angle, and VPS positioning can be deemed successful. If the global similarity is lower than the preset threshold, their viewing angles differ, and VPS positioning can be deemed to have failed. In the failure case, because the viewing angle of the contrast image differs from that of the positioning image, it cannot be determined whether the real world reflected by the positioning image has changed, so the high-precision map is not updated according to the positioning image.
Also illustratively, the local similarity of the positioning image and the contrast image is calculated. Local similarity characterizes how similar the contrast image and the positioning image are within a local region of the images. It should be noted that only when the global similarity indicates that VPS positioning succeeded does the server calculate the local similarity, in order to decide whether the high-precision map needs to be updated according to the positioning image.
Specifically, the positioning image and the contrast image are each divided into a plurality of regions of interest (ROIs), and corresponding ROIs of the two images are compared to calculate each ROI's local similarity. If the local similarity indicates that the contrast image and the positioning image are the same within an ROI, that local region has not changed, i.e., the real-world environment matches the environment reflected by the high-precision map, and no update is needed. If the local similarity indicates that they differ within an ROI, that local region has changed, and the high-precision map is updated according to the positioning image.
In the embodiment of the present application, during image processing an ROI is a region selected from the image as the region of interest for image analysis. For example, the image shown in fig. 2 includes a first building 01, a second building 02, and a street lamp 03. If any of these changes in the image of fig. 2, the real world has changed; that is, the regions of interest for image processing are the first building 01, the second building 02, and the street lamp 03. The image of fig. 2 is therefore divided into 3 ROIs: ROI 11 containing the first building 01, ROI 12 containing the second building 02, and ROI 13 containing the street lamp 03. An ROI may be a region to be processed delineated by a box (as shown in fig. 2), a circle, an ellipse, an irregular polygon, or the like.
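Delineating an ROI by a box, as in fig. 2, amounts to cropping a rectangular region out of the pixel grid. The following is a toy sketch with an assumed box format (x0, y0, x1, y1); the image is a hard-coded grid standing in for real pixel data.

```python
def crop_roi(image, box):
    """Crop a rectangular ROI delineated by box = (x0, y0, x1, y1),
    where x selects columns and y selects rows (end-exclusive)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

# A 3x4 toy image; values stand in for pixel intensities.
image = [[1,  2,  3,  4],
         [5,  6,  7,  8],
         [9, 10, 11, 12]]

# An ROI covering columns 1..2 of rows 0..1 (e.g. "ROI 11" of fig. 2):
print(crop_roi(image, (1, 0, 3, 2)))  # → [[2, 3], [6, 7]]
```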
When the ROI area of the positioning image is determined to be changed, the high-precision map can be updated according to the positioning image.
Illustratively, three-dimensional reconstruction is performed with a structure-from-motion (SFM) algorithm to obtain the SFM model (also called a sparse point cloud) corresponding to the positioning image. When an ROI of the positioning image is determined to have changed, the part of the SFM model related to that ROI (i.e., its point cloud) has changed, and a three-dimensional (3D) bounding box is used to record the number of visits to the ROI's point cloud. The 3D bounding box is a rectangular box enclosing the ROI point cloud of the positioning image.
The ROI point cloud is inserted into the original RGB point cloud to obtain the RGB point cloud of the SFM model. At this time the point cloud contains both the original ROI point cloud and the ROI point cloud from the positioning image. The 3D bounding box list records the changed regions in the point cloud and the number of visits to each region's point cloud.
For example, the 3D bounding box list contains the original ROI point cloud (called the first point cloud) and the ROI point cloud from the positioning image (the second point cloud), and records the number of visits to each during VPS positioning. Within a preset time, whenever the server performs VPS positioning, it counts the visits to the first and second point clouds. If the first point cloud is visited more often than the point cloud corresponding to the positioning image, the real world has not changed and the point cloud is not updated. If the first point cloud is visited less often, the real world has changed; the positioning image's point cloud is updated to the position of the first point cloud, thereby updating the point cloud map and hence the high-precision map.
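The visit-count comparison above can be sketched as a small counter per bounding box entry. This is an illustrative reduction with assumed names; it only models the decision rule, not the point clouds themselves.

```python
class BoundingBoxEntry:
    """Toy 3D-bounding-box record: tracks visits to the original ROI
    point cloud (first) and the positioning image's candidate (second)."""

    def __init__(self):
        self.first_visits = 0
        self.second_visits = 0

    def record_visit(self, matched_second):
        # During each VPS positioning, count which cloud was matched.
        if matched_second:
            self.second_visits += 1
        else:
            self.first_visits += 1

    def should_update(self):
        # Replace the first point cloud only if the new cloud was matched
        # more often, i.e. the real world has changed.
        return self.second_visits > self.first_visits

entry = BoundingBoxEntry()
for matched in (True, True, False, True):   # 3 matches of the new cloud
    entry.record_visit(matched)
print(entry.should_update())  # → True
```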
Example 1
Please refer to fig. 3, which is a flowchart illustrating a method for updating a map according to an embodiment of the present disclosure. As shown in fig. 3, the method includes steps 301-308.
It should be noted that in the embodiment of the present application, the mobile phone interacts with the cloud device, the mobile phone sends an image to the cloud device, and the cloud device determines the location information of the mobile phone according to the image. The cloud equipment comprises a high-precision map, the cloud equipment provides VPS positioning service based on the high-precision map, and the cloud equipment can implement the method for updating the map.
Step 301: and acquiring a positioning image and a 6DOF pose corresponding to the positioning image.
The 6DOF pose represents the position information of the mobile phone when the positioning image was captured.
In some implementations, the mobile phone captures an original image, and the mobile phone sends the original image to the cloud device. Due to the presence of dynamic objects (e.g., animals, vehicles, etc.) in the original image, these dynamic objects may interfere with the real environment reflected by the original image. Therefore, the cloud device acquires an original image, and an image processing algorithm is adopted to remove a dynamic object in the original image so as to acquire a positioning image (or referred to as a first positioning image).
For example, the cloud device may process the original image by using a deep learning related algorithm to remove a dynamic object in the original image. The deep learning related algorithm includes, but is not limited to, semantic segmentation, instance segmentation, object detection, and the like.
For example, the original image is processed using semantic segmentation, which classifies the pixels in the original image and groups pixels of the same class together. If the image includes 2 people, semantic segmentation labels both with the same color (e.g., red). If the original image includes buildings, streets, vehicles, trees, and pedestrians, after semantic segmentation each class is labeled with its own color, e.g., vehicles blue and pedestrians red. Removing the dynamic objects then amounts to deleting the pixels labeled blue and red, yielding the positioning image.
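Conceptually, deleting the dynamic-class pixels is just masking the image with the per-pixel labels. The sketch below hard-codes a tiny label grid for illustration; a real system would obtain the labels from a segmentation network, and the class names here are assumptions.

```python
DYNAMIC_CLASSES = {"vehicle", "pedestrian"}  # classes to strip (assumed)

def remove_dynamic(pixels, labels):
    """Replace pixels whose semantic label is a dynamic class with None,
    i.e. a hole the matching stage will ignore."""
    return [[None if lab in DYNAMIC_CLASSES else px
             for px, lab in zip(prow, lrow)]
            for prow, lrow in zip(pixels, labels)]

pixels = [[10, 20, 30],
          [40, 50, 60]]
labels = [["building", "vehicle",    "building"],
          ["street",   "pedestrian", "tree"]]
print(remove_dynamic(pixels, labels))  # → [[10, None, 30], [40, None, 60]]
```

The same masking applies unchanged whether the labels come from semantic segmentation, instance segmentation, or object detection; only the source of the label grid differs.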
As another example, the original image is processed using instance segmentation. In contrast to semantic segmentation, instance segmentation distinguishes different individuals within the same class. If instance segmentation is used to identify people and the image contains two people, each person is segmented at the pixel level, i.e., each is labeled with a different color. If the original image includes buildings, streets, vehicles, trees, and pedestrians, instance segmentation can segment each vehicle and each pedestrian in the original image, label them with different colors, and the labeled vehicles and pedestrians are then removed to obtain the positioning image.
As another example, the original image is processed using object detection, which can identify target objects (e.g., vehicles, pedestrians) and their positions in the original image. If the target objects are vehicles and pedestrians, object detection locates them in the original image, and the detected vehicles and pedestrians are removed to obtain the positioning image.
In other implementations, the mobile phone captures a positioning video and sends it to the cloud device, which performs VPS positioning from the video. The cloud device takes any frame of the positioning video as the original image and removes its dynamic objects with an image processing algorithm to obtain the positioning image. The specific implementation of obtaining the positioning image is the same as above and is not repeated here.
It should be mentioned that when the mobile phone sends the original image or the positioning video to the cloud device, it also sends camera parameter information such as the camera intrinsic parameters and the viewing angle.
The camera intrinsic parameters are parameters related to the characteristics of the camera itself, such as the focal length of the camera, the pixel size, and the like.
Illustratively, the camera includes an optical lens and an optical sensor; light reflected by the object travels through the optical lens, is reflected or refracted, and reaches the optical sensor, which senses it so that the object is imaged. Ideally the optical axis of the optical lens passes through the center of the imaging region. In practice it does not, and a camera intrinsic parameter describes this error.
As another example, during imaging the optical sensor should reduce the image of an object by the same scale in the x and y directions. In practice, the lens is not perfectly circular and the pixels on the optical sensor are not arranged in a perfectly square grid, so the image obtained by the sensor is reduced by different proportions in the x and y directions. Camera intrinsic parameters describe the difference between these two scales, so that the number of pixels in the image can describe the size of the object (i.e., the correspondence between pixel count and object size). Thus, the size in three-dimensional space of an object in the image can be determined from the camera intrinsic parameters.
The angle of view of a camera is the angle whose vertex is the optical lens and whose edges bound the maximum range of light reflected from the photographed object that can pass through the lens. The size of the angle of view determines the field of view of the optical lens: the larger the angle of view, the larger the field of view.
It should be noted that connecting the center of the camera lens to the two endpoints of the diagonal of the imaging plane forms an included angle, which is the lens's angle of view. The viewing angle is the visual range the electronic device can capture, which can also be understood as the width of the shot; it is related to the shooting posture of the electronic device.
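For a simple pinhole model, the diagonal angle of view described above follows from the imaging-plane diagonal d and the focal length f as 2·atan(d / 2f). The sketch below uses illustrative values (a 43.3 mm full-frame diagonal and a 28 mm lens), not figures from the embodiment.

```python
import math

def angle_of_view_deg(sensor_diag_mm, focal_mm):
    """Diagonal angle of view of a pinhole camera, in degrees:
    2 * atan(d / (2 f)), with d the imaging-plane diagonal."""
    return math.degrees(2 * math.atan(sensor_diag_mm / (2 * focal_mm)))

# Example: full-frame sensor (43.3 mm diagonal) behind a 28 mm lens.
fov = angle_of_view_deg(43.3, 28.0)
print(round(fov, 1))  # roughly 75 degrees
```

As the formula shows, a shorter focal length yields a larger angle of view, matching the statement that a larger angle of view gives a larger field of view.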
Step 302: contrast images were generated from the 6DOF pose.
In a first implementation, the cloud device includes a preset image library in which multiple images (or preset images) are stored in advance, each with its own 6DOF pose. The mobile phone captures the positioning image at the shooting position indicated by the 6DOF pose, and the cloud device generates the contrast image from the pre-stored images according to the position indicated by the 6DOF pose and the angle of view of the positioning image. That is, the shooting position of the contrast image is the position indicated by the 6DOF pose, and in theory the angle of view of the contrast image is the same as that of the positioning image.
Illustratively, assume the positioning image is the image shown in fig. 4, from which a dynamic object (a person) has been removed. When generating the contrast image, the cloud device selects images related to the positioning image from the preset image library, as shown in figs. 5A, 5B, and 5C; their shooting positions are close to that of the positioning image, so they are determined to be related to it. Using view synthesis, figs. 5A, 5B, and 5C are combined into an image with the same viewing angle as the positioning image; fig. 6 shows the contrast image synthesized from them.
In a second implementation, the cloud device has no preset image library and generates the contrast image from the point cloud. The cloud device determines the shooting position of the positioning image from the position indicated by the 6DOF pose, generates a corresponding 3D image from the point cloud map based on that position and the angle of view of the positioning image, and renders the 3D image into a 2D contrast image. Likewise, the angle of view of the contrast image can be determined to be the same as that of the positioning image.
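At the core of rendering the point cloud into a 2D contrast image is the pinhole projection of each 3D point into pixel coordinates. The sketch below shows that single step under assumed intrinsics (fx, fy, cx, cy); it presumes the point has already been transformed into the camera frame implied by the 6DOF pose, and all values are illustrative.

```python
def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a point already expressed in the camera
    coordinate frame (after applying the 6DOF rotation/translation).
    fx, fy: focal lengths in pixels; cx, cy: principal point."""
    x, y, z = point_cam
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v

# A point-cloud point 2 m in front of the camera, 0.5 m to the right:
u, v = project((0.5, 0.0, 2.0), fx=1000, fy=1000, cx=640, cy=360)
print(u, v)  # → 890.0 360.0
```

Rendering the full 2D contrast image amounts to repeating this projection for every point visible within the positioning image's angle of view and rasterizing the results.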
Illustratively, continuing with the positioning image shown in fig. 4: the cloud device acquires the point cloud image shown in fig. 7 based on the 6DOF pose and the angle of view of the positioning image, converts it into a 3D image, and renders the 3D image to generate the 2D contrast image.
Step 303: and comparing the positioning image with the comparison image, and calculating the global similarity of the positioning image and the comparison image.
The global similarity represents the degree of similarity of the positioning image and the contrast image taken as whole images (or, understood loosely, as image outlines). For example, if the positioning image includes a road, a house, and a street lamp, and the contrast image also includes a road, a house, and a street lamp, their global similarity can be determined from the number of objects each contains and how similar each object is.
It should be noted that the angle of view is the angle between the lens and the imaging plane of the electronic device; when images of the same scene are acquired with different angles of view, the target objects they contain differ. If the viewing angles of the positioning image and the contrast image are the same, their global similarity is high. Although the contrast image is generated so that, in theory, its angle of view matches that of the positioning image, this cannot be guaranteed in practice, so a high global similarity cannot simply be assumed. An image processing algorithm is therefore still needed to calculate the global similarity between the positioning image and the contrast image.
The angle of view of an image affects the number of objects it contains: the larger the angle of view, the more objects may appear. The angles of view of the positioning image and the contrast image can therefore be compared via the number of objects each contains and the degree of similarity of those objects. If the contrast image contains the same number of objects as the positioning image and each object is highly similar, the two images share the same viewing angle and their global similarity is high. It will be appreciated that the global similarity can thus be used to determine whether the viewing angles of the positioning image and the contrast image are the same, i.e., whether the two images are the same when considered as a whole.
It can be understood that common image processing methods can be used to calculate the global similarity of the positioning image and the contrast image, for example, a hash algorithm, the Euclidean distance between the two images, or a convolutional neural network.
Illustratively, taking the Euclidean distance as an example, its numerical value is used as the global similarity. The Euclidean distance measures the spatial difference between objects (or individuals) in the images; that is, it can measure the similarity between two corresponding regions of the two images.
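A minimal sketch of using the Euclidean distance as the global similarity value follows; the tiny hard-coded "images" are illustrative, and note that with this metric a smaller value means the images are more alike.

```python
import math

def euclidean_distance(img_a, img_b):
    """Euclidean distance between two equally sized grayscale images,
    flattened pixel by pixel: sqrt(sum of squared differences)."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(img_a, img_b)
                         for a, b in zip(row_a, row_b)))

positioning = [[10, 20], [30, 40]]
contrast    = [[10, 22], [30, 40]]   # one pixel differs by 2
print(euclidean_distance(positioning, contrast))  # → 2.0
```

Comparing this value against the preset first threshold then yields the same/different decision described in the following steps.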
In another example, the ROIs of the positioning image and of the contrast image are determined, the similarity of each pair of ROIs is obtained, and the combined ROI similarity information is used as the global similarity. The ROI division of the positioning image is the same as that of the contrast image.
Specifically, ROIs at the same position are compared and the similarity of each ROI pair across the two images is calculated, yielding similarity information for every ROI. The sum of all ROI similarities is then taken as the global similarity, from which the degree of similarity between the positioning image and the contrast image is judged.
The global similarity of the positioning image and the contrast image is calculated with image processing techniques, and a preset first threshold can be set for judging the degree of similarity between the two images.
If the calculated global similarity is less than or equal to (or less than) the first threshold, the angles of view of the positioning image and the contrast image are the same, and the two images are the same as whole images. (Recall that when the Euclidean distance serves as the global similarity, a smaller value indicates greater similarity.) Steps 305-308 are performed.
If the calculated global similarity is greater than (or greater than or equal to) the first threshold, the viewing angles of the positioning image and the contrast image differ. Step 304 is performed.
Step 304: if the global similarity indicates that the positioning image differs from the contrast image, determine that VPS positioning has failed, and do not update the high-precision map.
It will be appreciated that since the contrast image is derived from the 6DOF pose of the positioning image, the two images theoretically share the same angle of view. If their angles of view differ, the 6DOF pose is not the true position where the positioning image was captured; that is, the pose is inaccurate, and the positioning image and the contrast image reflect the real environments of different areas. Therefore, whether the real environment corresponding to the contrast image has changed cannot be judged from the positioning image, and the high-precision map cannot be updated according to it.
In a possible implementation manner, a manner of calculating the global similarity between the comparison image and the positioning image may be adopted to determine whether VPS positioning is successful.
Step 305: and if the global similarity representation positioning image is the same as the comparison image, determining that the VPS positioning is successful.
The global similarity represents that the positioning image and the contrast image are the same, that is, the field angles of the positioning image and the contrast image are the same. In this way, whether to update the high-precision map can be judged according to the positioning image.
Step 306: calculate the local similarity between the positioning image and the comparison image.
It is understood that both the positioning image and the comparison image include at least one object (e.g., a building, a landmark, etc.); the local similarity is the similarity between corresponding objects in the two images.
For example, suppose the same location in the positioning image and the comparison image shows a first building, and the similarity of the first building between the two images is compared. If that similarity indicates that the first building in the comparison image is the same as the first building in the positioning image, the first building in the real environment has not changed; since the comparison image represents the first building as marked on the high-precision map, the first building on the high-precision map does not need to be updated.
Illustratively, the comparison image and the positioning image are divided into a plurality of ROI regions using the same division scheme. The ROI regions at the same position in the two images are taken as a pair, and the similarity of each pair (i.e., the local similarity) is calculated to judge whether that ROI region has changed.
The similarity of the ROI regions in the positioning image and the comparison image may be calculated using an image processing method; for a specific implementation, reference may be made to the description above, which is not repeated here.
In some implementations, a preset second threshold may be set for judging whether an ROI region has changed. For example, if the calculated local similarity is less than or equal to (or less than) the second threshold, the ROI region has not changed and step 307 is performed; if the calculated local similarity is greater than (or greater than or equal to) the second threshold, the ROI region has changed and step 308 is performed.
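The per-ROI decision and the iteration over ROI pairs in steps 306 to 308 could be sketched as follows; the mean-absolute-difference metric and all function names are hypothetical stand-ins for whatever image processing method is actually used.

```python
import numpy as np

def roi_changed(roi_pos: np.ndarray, roi_cmp: np.ndarray,
                second_threshold: float) -> bool:
    # Placeholder per-ROI metric (mean absolute difference); a score
    # above the second threshold marks the ROI region as changed
    # (step 308), otherwise it is unchanged (step 307).
    score = float(np.mean(np.abs(roi_pos.astype(np.float32) -
                                 roi_cmp.astype(np.float32))))
    return score > second_threshold

def changed_rois(pos_rois, cmp_rois, second_threshold: float):
    # Steps 306-308 are repeated for every ROI pair; return the
    # indices of the regions that would trigger a map update.
    return [i for i, (a, b) in enumerate(zip(pos_rois, cmp_rois))
            if roi_changed(a, b, second_threshold)]
```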
Step 307: if the local similarity indicates that the ROI regions are the same, do not update the high-precision map.
It can be understood that a high degree of similarity between the ROI regions of the positioning image and the comparison image indicates that the real environment reflected by the ROI region in the positioning image has not changed, so the corresponding part of the high-precision map does not need to be updated.
Step 308: if the local similarity indicates that the ROI regions are different, update the high-precision map according to the positioning image.
It should be noted that a low similarity between the ROI regions of the positioning image and the comparison image means that the real environment corresponding to the ROI region reflected by the positioning image differs from the real environment of the ROI region reflected by the high-precision map; that is, it can be determined from the positioning image that the ROI region has changed in the real environment. The high-precision map can therefore be updated according to the positioning image, which in turn improves the accuracy of VPS positioning.
It should be noted that a plurality of ROI regions may be segmented in the positioning image and the comparison image; the cloud device may calculate the local similarity of each ROI region in turn and execute steps 306 to 308 until the local similarity of every ROI region in the positioning image has been calculated.
The following describes a specific implementation of updating the high-precision map according to the positioning image.
When updating the high-precision map according to the positioning image, the method flow shown in fig. 8 may be used. As shown in fig. 8, the method includes steps 308-1 to 308-4.
Step 308-1: create a corresponding three-dimensional model according to the ROI region of the positioning image, where the three-dimensional model is a point cloud model of the object in the ROI region.
The high-precision map is a point cloud map, so updating it according to the positioning image requires creating a point cloud model of the ROI region from the positioning image. A point cloud corresponding to the ROI region already exists in the current high-precision map; once it is determined that the real environment reflected by the ROI region has changed, the point cloud of that ROI region in the map can be updated, thereby updating the high-precision map.
In some implementations, the point cloud may be generated by an existing point cloud generation algorithm. For example, a Structure From Motion (SFM) algorithm may be employed to generate the corresponding point cloud model from the positioning image.
Step 308-2: generate an updated point cloud map according to the three-dimensional model, where the updated point cloud map includes the point cloud model of the object in the ROI region, while the original local point cloud, which includes the object in the ROI region of the comparison image, is retained in the high-precision map.
It will be appreciated that, in this case, the high-precision map includes both the point cloud model of the object in the ROI region of the comparison image and the point cloud model of the object in the ROI region of the positioning image.
In the process of updating the high-precision map, changes in the real environment can be verified against a large number of positioning images. If the map were updated every time a positioning image disagreed with its comparison image, the high-precision map would change frequently, which would affect the accuracy of VPS positioning. In addition, because a positioning image reflects the change in the real environment from only one angle, the precision of a point cloud model created from it is lower than that of the original high-precision map. Therefore, when the ROI region of the positioning image is determined to differ from that of the comparison image, a point cloud model of the ROI region in the positioning image is generated, and the change in the real environment is then verified against images transmitted by other devices (e.g., a second positioning image).
Step 308-3: during VPS positioning, record the access counts of the updated point cloud map and the original local point cloud.
The access count is the number of times that other positioning images hit the updated point cloud map or the original local point cloud.
Illustratively, the cloud device receives a VPS positioning service request that includes a second positioning image, and determines the 6DOF pose corresponding to the second positioning image according to the high-precision map.
If the morphological structure embodied by the ROI region in the second positioning image is the same as that of the updated point cloud map, the recorded access count of the updated point cloud map is incremented by one.
It can be understood that, during VPS positioning, the number of times that images containing the ROI region access the updated point cloud map in the high-precision map is counted in order to verify whether the real environment has changed.
In some implementations, a 3D bounding box is set up in the cloud device; the bounding box includes the updated point cloud map and a counter, and the counter counts the accesses to the updated point cloud. When there are multiple updated point cloud maps, the cloud device maintains a 3D bounding box list containing multiple 3D bounding boxes.
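The 3D bounding box with its counter, and the 3D bounding box list, can be modelled as a small data structure; the class and field names below are illustrative assumptions, not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class BoundingBox3D:
    # Candidate (updated) point cloud for one ROI region, stored here
    # as an N x 3 array of points, together with its access counter.
    updated_point_cloud: np.ndarray
    access_count: int = 0

    def record_access(self) -> None:
        # Called each time a positioning image's ROI matches this
        # updated point cloud during VPS positioning (step 308-3).
        self.access_count += 1

@dataclass
class BoundingBoxList:
    # The cloud device keeps one bounding box per updated point cloud map.
    boxes: List[BoundingBox3D] = field(default_factory=list)
```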
It should be noted that, as the access count of the updated point cloud map grows, the ROI region in a second positioning image may contain richer details of the object. The updated point cloud map can therefore be optimized according to the second positioning image, improving its accuracy.
Step 308-4: if the access count of the updated point cloud map exceeds a preset threshold, replace the original local point cloud in the high-precision map with the updated point cloud map, generating an updated high-precision map.
For example, a third threshold may be set; if the access count of the updated point cloud map exceeds (is greater than) the third threshold, it is determined that the ROI region in the real environment has indeed changed, and the updated point cloud map replaces the original local point cloud, thereby updating the high-precision map.
In some implementations, the access count of the updated point cloud map within a preset time may be counted; if that count exceeds the third threshold, it is determined that the ROI region in the real environment has indeed changed, and the updated point cloud map replaces the original local point cloud.
In other implementations, the access counts of both the original local point cloud and the updated point cloud map may be counted; if the updated point cloud map is accessed more often than the original local point cloud, it is determined that the ROI region in the real environment has changed, and the updated point cloud map replaces the original local point cloud.
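The replacement policies above (absolute third threshold, count within a preset time, and comparison against the original cloud's count) could share one decision function, sketched here under assumed parameter names; the time-windowed variant is reduced to passing a count already restricted to the window.

```python
from typing import Optional

def should_replace(updated_hits: int, third_threshold: int,
                   original_hits: Optional[int] = None) -> bool:
    # Variant of step 308-4. If the original cloud's access count is
    # supplied, replace when the updated cloud is accessed more often
    # than the original; otherwise replace when the updated cloud's
    # count exceeds the third threshold.
    if original_hits is not None:
        return updated_hits > original_hits
    return updated_hits > third_threshold
```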
Firstly, a function for detecting environment changes is added at the front end: the difference between crowdsourced data and the high-precision map is compared, and the map update is abandoned as soon as no environment change is detected, avoiding redundant computation at the back end. Secondly, the invention applies a purely visual solution and can update the high-precision map in real time, in all weather and in all scenes. Thirdly, by maintaining 3D bounding boxes, the method ensures the regional continuity of old and new spatial features (either the old spatial features are retained in full, or the new ones are), thereby respecting the real-world object component to which each spatial feature belongs.
Example 2
In the method above, when judging whether the positioning image and the comparison image are the same, the global similarity of the two images is calculated first and the local similarity is calculated afterwards. Alternatively, the embodiment of the application can create a multitask network architecture that processes the positioning image and the comparison image to obtain a processing result, so that whether to update the high-precision map can be judged according to that result.
Please refer to fig. 9, which is a flowchart illustrating a method for updating a map according to an embodiment of the present application. As shown in fig. 9, the method includes steps 901-908.
Step 901: acquire a positioning image and the 6DOF pose corresponding to the positioning image.
Step 902: generate a comparison image according to the 6DOF pose.
Steps 901 to 902 are the same as steps 301 to 302 in the embodiments above; for implementation details, reference may be made to those embodiments, which are not repeated here.
Step 903: compare the positioning image with the comparison image and calculate their similarity, where the similarity includes a global similarity and a local similarity.
A multitask network architecture is deployed on the cloud device; it calculates both the global similarity and the local similarity of the positioning image and the comparison image. That is, the output of the multitask network architecture is the global similarity result and the local similarity result for the two images. It should be noted that, after calculating the local similarity, the multitask network architecture presents the local similarity result as a heat map.
It will be appreciated that the heat map is an image in which the differences between the positioning image and the comparison image are marked by colour. For example, a region with low local similarity is marked red, indicating a region where the positioning image and the comparison image are completely different, while a region with high local similarity is marked green, indicating a region where the two images are the same. The closer a colour on the heat map is to red, the lower the similarity of that region; the closer to green, the higher the similarity.
For example, assume that fig. 10A shows the comparison image and fig. 10B shows the positioning image. When calculating their local similarity, both images are divided into a plurality of ROI regions: the comparison image in fig. 10A is divided into ROI region 11, ROI region 12, and ROI region 13, and the positioning image in fig. 10B is divided into ROI region 21, ROI region 22, ROI region 23, and ROI region 24. Image processing is applied to the ROI regions of both images, the local similarity is calculated, and the multitask network outputs the heat map shown in fig. 11. In that heat map, ROI region 24 is a first colour, indicating that the similarity between the positioning image and the comparison image in that region is low, and the other regions are a second colour, indicating that the similarity there is high.
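A heat map of the kind described could be produced from per-ROI scores as sketched below; the rectangle representation of ROI regions, the RGB colour values, and the function name are assumptions of this sketch.

```python
import numpy as np

def similarity_heat_map(image_shape, rois, scores, second_threshold):
    # rois: list of (y0, y1, x0, x1) rectangles; scores: per-ROI
    # dissimilarity. Regions judged changed (score above the second
    # threshold) are painted red, unchanged regions green, mirroring
    # the colour convention described in the text.
    heat = np.zeros((image_shape[0], image_shape[1], 3), dtype=np.uint8)
    for (y0, y1, x0, x1), score in zip(rois, scores):
        colour = (255, 0, 0) if score > second_threshold else (0, 255, 0)
        heat[y0:y1, x0:x1] = colour
    return heat
```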
It should be noted that the multitask network above may also be referred to as an end-to-end network, i.e., an input-to-output model: the end-to-end model gives the global similarity result and the local similarity result of the positioning image and the comparison image in a single pass.
Step 904: if the global similarity indicates that the positioning image is different from the comparison image, determine that VPS positioning has failed and do not update the high-precision map.
It can be understood that the multitask network outputs both the global similarity result and the local similarity result. When judging whether to update the high-precision map according to the positioning image, whether VPS positioning has succeeded is first judged from the global similarity result; if VPS positioning has failed, the local similarity result need not be checked.
The global similarity is judged in the same manner as in step 304, which is not repeated here.
Step 905: if the global similarity indicates that the positioning image is the same as the comparison image, determine that VPS positioning has succeeded, and judge whether to update the high-precision map according to the local similarity result.
Step 906: judge whether the local similarity indicates that the ROI regions are the same. If so, execute step 907; if not, execute step 908.
It should be noted that, once the global similarity indicates that VPS positioning has succeeded, whether the high-precision map needs to be updated according to the positioning image can be determined from the local similarity result, i.e., the heat map.
Step 907: the high-precision map is not updated.
Step 908: and updating the high-precision map according to the positioning image.
If it is determined that the high-precision map is updated according to the positioning image, the specific updating method is the same as the method shown in fig. 8, and is not described herein again.
The above description takes a cloud device as an example of the electronic device; when the electronic device is another device, the map may be updated using the same method, which is not detailed here.
It is understood that the electronic device includes hardware structures and/or software modules for performing the functions in order to realize the functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
In the embodiment of the present application, the electronic device may be divided into the functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.
An embodiment of the present application further provides an electronic device, including: one or more processors and one or more memories. One or more memories are coupled to the one or more processors for storing computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the associated method steps described above to implement the method of updating a map in the embodiments described above.
Embodiments of the present application further provide a chip system, where the chip system includes at least one processor and at least one interface circuit. The processor and the interface circuit may be interconnected by wires. For example, the interface circuit may be used to receive signals from other devices (e.g., a memory of an electronic device). As another example, the interface circuit may be used to send signals to other devices (e.g., a processor). Illustratively, the interface circuit may read instructions stored in the memory and send the instructions to the processor. The instructions, when executed by the processor, may cause the electronic device to perform the various steps in the embodiments described above. Of course, the chip system may further include other discrete devices, which is not specifically limited in this embodiment of the present application.
The embodiment of the present application further provides a computer storage medium, where the computer storage medium includes computer instructions, and when the computer instructions are run on the electronic device, the electronic device is enabled to execute each function or step executed by the mobile phone in the foregoing method embodiment.
The embodiment of the present application further provides a computer program product, which when running on a computer, causes the computer to execute each function or step executed by the mobile phone in the above method embodiments.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical functional division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another device, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, or portions of the technical solutions that substantially contribute to the prior art, or all or portions of the technical solutions may be embodied in the form of a software product, where the software product is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method of updating a map, the method comprising:
acquiring a positioning image and position information corresponding to the positioning image, wherein the position information is used for indicating the position of shooting the positioning image, and the position information is obtained by carrying out visual positioning on the positioning image according to a map;
determining a corresponding contrast image according to the position information;
calculating the similarity of the positioning image and the contrast image, wherein the similarity is used for representing the similarity of the positioning image and the contrast image;
if the similarity represents that the positioning image is similar to the contrast image, the map is not updated;
and if the similarity represents that the positioning image is not similar to the comparison image, updating the map according to the positioning image.
2. The method according to claim 1, wherein the similarity includes a local similarity and a global similarity, the local similarity is used for representing the similarity of local areas in the positioning image and the contrast image, and the global similarity is used for representing the global similarity of the positioning image and the contrast image;
the calculating the similarity between the positioning image and the contrast image comprises:
calculating the global similarity of the positioning image and the contrast image;
if the global similarity represents that the global similarity degree of the positioning image and the comparison image is smaller than a preset threshold value, the map is not updated;
if the global similarity represents that the global similarity of the positioning image and the contrast image is larger than or equal to the preset threshold value, calculating the local similarity of the positioning image and the contrast image.
3. The method according to claim 1, wherein the similarity includes a local similarity for characterizing a local similarity between the positioning image and the contrast image, and a global similarity for characterizing a global similarity between the positioning image and the contrast image;
the calculating the similarity between the positioning image and the contrast image comprises:
constructing a multitask network architecture, wherein the multitask network architecture is used for processing the comparison image and the positioning image to obtain an image processing result;
and inputting the positioning image and the comparison image into the multitask network architecture to obtain a processing result, wherein the processing result comprises the global similarity and the local similarity.
4. The method according to any one of claims 1 to 3, wherein the position information includes a shooting location and a shooting pose indicating a shooting field angle of an electronic device that generates the positioning image, and the map is a point cloud map;
the determining a corresponding contrast image according to the position information includes:
determining position coordinates of the point cloud map indicated by the shooting location;
generating a three-dimensional image corresponding to the shooting visual angle according to the shooting visual angle corresponding to the shooting pose based on the shooting place;
rendering the three-dimensional image to generate a two-dimensional image, the two-dimensional image being a contrast image.
5. The method according to any one of claims 1-3, wherein an image library is preset, wherein the image library comprises at least one preset image and position information corresponding to the preset image; the position information includes a shooting location and a shooting pose indicating a shooting field angle of an electronic device that generated the positioning image;
the determining a corresponding contrast image according to the position information includes:
acquiring at least one preset image according to the position information of the positioning image;
and generating a comparison image from the at least one preset image by adopting an image synthesis mode, so that the position information of the comparison image is the same as that of the positioning image.
6. The method according to any one of claims 1-5, further comprising:
acquiring an original image from electronic equipment, wherein the original image is an image which is obtained by shooting by the electronic equipment and is used for carrying out visual positioning on the electronic equipment;
removing dynamic objects in the original image by adopting an image processing algorithm to generate a positioning image;
wherein the dynamic object comprises a vehicle and/or an animal.
7. The method of claim 2, wherein said calculating local similarity of said scout image and said contrast image comprises:
dividing the positioning image into at least one ROI and dividing the contrast image into at least one ROI based on a preset ROI dividing mode, wherein the ROI comprises a target object which comprises at least one of a building, a road or a road sign;
for the ROI at the same position on the positioning image and the comparison image, the following operations are performed:
and comparing the ROI on the positioning image with the ROI on the comparison image, and calculating the similarity of the ROI to obtain the local similarity.
8. The method of claim 7, wherein if the similarity indicates that the positioning image and the comparison image are not similar, updating the map according to the positioning image comprises:
if the local similarity represents that the ROI areas of the contrast image and the positioning image are not similar, generating a three-dimensional (3D) bounding box according to the ROI in the positioning image, wherein the 3D bounding box comprises ROI area point cloud and the access times of the ROI area point cloud;
carrying out visual positioning according to the map, and counting the access times of the point cloud of the ROI area during the visual positioning, wherein the map comprises the 3D bounding box;
and according to the access times of the point clouds of the ROI region in the 3D bounding box, if the access times of the point clouds of the ROI region are determined to exceed the preset times, the original ROI in the map is updated to be the ROI in the positioning image.
9. The method of claim 8, further comprising:
and generating corresponding ROI area point cloud according to the ROI area of the positioning image.
10. The method of any of claims 2-3, 7-9, further comprising:
and if the global similarity represents that the comparison image is not similar to the positioning image, determining that the positioning of the positioning image fails.
11. An apparatus for updating a map, comprising one or more processors; a memory;
and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the apparatus for updating a map, cause the apparatus for updating a map to perform the method of any one of claims 1-10.
12. An electronic device, comprising: one or more processors; a memory;
and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the electronic device, cause the electronic device to perform the method of any of claims 1-10.
13. A computer readable storage medium comprising computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-10.
CN202110192752.2A 2021-02-20 2021-02-20 Method for updating map and related equipment Pending CN114969221A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110192752.2A CN114969221A (en) 2021-02-20 2021-02-20 Method for updating map and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110192752.2A CN114969221A (en) 2021-02-20 2021-02-20 Method for updating map and related equipment

Publications (1)

Publication Number Publication Date
CN114969221A 2022-08-30

Family

ID=82954383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110192752.2A Pending CN114969221A (en) 2021-02-20 2021-02-20 Method for updating map and related equipment

Country Status (1)

Country Link
CN (1) CN114969221A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115143941A (en) * 2022-09-01 2022-10-04 山东省青东智能科技有限公司 Intelligent surveying and mapping 3D industrial camera based on cloud platform
CN116416586A (en) * 2022-12-19 2023-07-11 香港中文大学(深圳) Map element sensing method, terminal and storage medium based on RGB point cloud
CN116416586B (en) * 2022-12-19 2024-04-02 香港中文大学(深圳) Map element sensing method, terminal and storage medium based on RGB point cloud
CN117128985A (en) * 2023-04-27 2023-11-28 荣耀终端有限公司 Point cloud map updating method and equipment

Similar Documents

Publication Publication Date Title
Toft et al. Long-term visual localization revisited
KR102200299B1 (en) A system implementing management solution of road facility based on 3D-VR multi-sensor system and a method thereof
CN114969221A (en) Method for updating map and related equipment
KR101965878B1 (en) Automatic connection of images using visual features
JPWO2020090428A1 (en) Feature detection device, feature detection method and feature detection program
KR102167835B1 (en) Apparatus and method of processing image
WO2021017211A1 (en) Vehicle positioning method and device employing visual sensing, and vehicle-mounted terminal
CN112562005A (en) Space calibration method and system
Xiao et al. Geo-spatial aerial video processing for scene understanding and object tracking
Tao et al. Automated processing of mobile mapping image sequences
KR102249381B1 (en) System for generating spatial information of mobile device using 3D image information and method therefor
KR102316818B1 (en) Method and apparatus of updating road network
Pauls et al. Automatic mapping of tailored landmark representations for automated driving and map learning
CN111194015A (en) Outdoor positioning method and device based on building and mobile equipment
CN109785421B (en) Texture mapping method and system based on air-ground image combination
CN116295412A (en) Depth camera-based indoor mobile robot dense map building and autonomous navigation integrated method
CN111754388A (en) Picture construction method and vehicle-mounted terminal
KR102516450B1 (en) Method of generating map and visual localization system using the map
Yang et al. Road detection by RANSAC on randomly sampled patches with slanted plane prior
Gao et al. 3D reconstruction for road scene with obstacle detection feedback
Kang et al. 3D urban reconstruction from wide area aerial surveillance video
KR20220062709A (en) System for detecting disaster situation by clustering of spatial information based an image of a mobile device and method therefor
CN112818866A (en) Vehicle positioning method and device and electronic equipment
Ozcanli et al. Geo-localization using volumetric representations of overhead imagery
CN112257485A (en) Object detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination