CN115439536B - Visual map updating method and device and electronic equipment


Info

Publication number
CN115439536B
Authority
CN
China
Prior art keywords
images
image
visual map
visual
coordinate system
Prior art date
Legal status
Active
Application number
CN202210992488.5A
Other languages
Chinese (zh)
Other versions
CN115439536A (en)
Inventor
王星博 (Wang Xingbo)
张晋川 (Zhang Jinchuan)
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210992488.5A
Publication of CN115439536A
Application granted
Publication of CN115439536B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757: Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure provides a visual map updating method, a visual map updating apparatus, and an electronic device, and relates to the technical field of image processing, in particular to the technical fields of artificial intelligence, computer vision, and augmented reality. The specific implementation scheme is as follows: acquiring M first images in a target scene, wherein the M first images comprise image contents of a first position and image contents of a second position, the first position is a position that can be located through a first visual map, the second position is a position that cannot be located through the first visual map, the first visual map is a visual map pre-constructed with a first coordinate system as reference, and the first visual map is associated with first data; constructing a second visual map under the target scene based on the M first images with a second coordinate system as reference, wherein the second visual map is associated with second data; determining a transformation relationship from the second coordinate system to the first coordinate system based on the M first images; and fusing the second data into the first data based on the transformation relationship to update the first visual map.

Description

Visual map updating method and device and electronic equipment
Technical Field
The disclosure relates to the technical field of image processing, in particular to the technical fields of artificial intelligence, computer vision, augmented reality, and the like, and specifically to a visual map updating method and apparatus and an electronic device.
Background
The Visual Positioning and Augmenting Service (VPAS) takes images captured by a user's camera as input and calculates the pose of the camera in the local map coordinate system through steps such as image retrieval, feature matching, and pose solving, thereby visually locating the user's position.
When local scenes covered by the map change, for example because of construction or store turnover, images shot at the affected positions either cannot be visually positioned at all or yield incorrect positioning results.
At present, such a change is usually discovered only after a user finds that positioning fails at some location and the failure is traced back to the scene change. The region is then re-surveyed manually with a designated camera device, a new visual map is constructed from the collected images, and the new visual map replaces the original one.
Disclosure of Invention
The disclosure provides a visual map updating method, a visual map updating device and electronic equipment.
According to a first aspect of the present disclosure, there is provided a visual map updating method including:
acquiring M first images in a target scene, wherein the M first images comprise image contents of a first position and image contents of a second position, the first position is a position that can be located through a first visual map, the second position is a position that cannot be located through the first visual map, the first visual map is a visual map pre-constructed with a first coordinate system as reference, the first visual map is associated with first data, and M is an integer greater than 1;
constructing a second visual map under the target scene based on the M first images and with a second coordinate system as reference, wherein the second visual map is associated with second data;
determining a transformation relationship from the second coordinate system to the first coordinate system based on the M first images;
and fusing the second data into the first data based on the transformation relation to update the first visual map.
According to a second aspect of the present disclosure, there is provided a visual map updating apparatus comprising:
the first acquisition module is used for acquiring M first images in a target scene, wherein the M first images comprise image contents of a first position and image contents of a second position, the first position is a position that can be located through a first visual map, the second position is a position that cannot be located through the first visual map, the first visual map is a visual map pre-constructed with a first coordinate system as reference, the first visual map is associated with first data, and M is an integer greater than 1;
the building module is used for building a second visual map under the target scene based on the M first images and with a second coordinate system as reference, and the second visual map is associated with second data;
a determining module, configured to determine a transformation relationship from the second coordinate system to the first coordinate system based on the M first images;
and the fusion module is used for fusing the second data into the first data based on the transformation relation so as to update the first visual map.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the methods of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any of the methods of the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements any of the methods of the first aspect.
According to the technology of the present disclosure, the problem of low visual map updating efficiency is solved, and the updating efficiency of the visual map is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flow diagram of a visual map updating method according to a first embodiment of the present disclosure;
fig. 2 is a schematic structural view of a visual map updating apparatus according to a second embodiment of the present disclosure;
fig. 3 is a schematic block diagram of an example electronic device used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
First embodiment
As shown in fig. 1, the present disclosure provides a visual map updating method, including the steps of:
step S101: obtaining M first images in a target scene, wherein the M first images comprise image contents of a first position and image contents of a second position, the first position is a position which can be positioned through a first visual map, the second position is a position which can not be positioned through the first visual map, the first visual map is a visual map which is built in advance by taking a first coordinate system as a reference, and the first visual map is associated with first data.
Wherein M is an integer greater than 1.
In this embodiment, the visual map updating method relates to the technical field of image processing, in particular to the technical fields of artificial intelligence, computer vision, and augmented reality, and can be widely applied in visual map updating scenarios. The visual map updating method of the embodiment of the present disclosure may be performed by the visual map updating apparatus of the embodiment of the present disclosure, and the visual map updating apparatus may be configured in any electronic device to perform the method.
In one application scenario, the electronic device may be a server used for visual map positioning. In a VPAS visual positioning task, a user may photograph the surrounding environment with a terminal such as a mobile phone and upload the image to the server, which performs visual positioning in the pre-constructed first visual map; after the positioning result is resolved, the server sends the user's position back to the terminal so that the user can experience navigation. If the local scene has changed at some positions, images shot there either cannot be visually positioned or yield erroneous positioning results, and therefore the first visual map needs to be updated periodically.
The target scene may be a scene that includes positions at which visual localization in the first visual map is not possible; it may include one, two, or more such positions. These positions may be positions whose scene has changed: for example, when the first visual map was constructed the store at position A may have been a clothing store, and owing to store re-planning it has since become a food store.
The positions where visual positioning cannot be achieved may also be positions newly added relative to the previous scene. For example, a first visual map of a shopping square is pre-constructed; owing to scale expansion, some stores around the shopping square are newly incorporated into it, and these newly added store positions cannot be visually located in the first visual map.
Where the target scene includes two or more positions at which visual localization is not possible, the positions may all be adjacent to one another, none of them adjacent, or only some of them adjacent.
The target scene may also include at least one position at which visual localization in the first visual map is possible; these positions may or may not be adjacent, which is not specifically limited herein.
The positions in the target scene generally need to be relatively continuous and have definite relative positional relationships so that a visual map can be constructed; for example, the target scene may include a position A, a position B, and a position C that are pairwise adjacent and not far apart.
Accordingly, the M first images in the target scene may include image content of a first location that is locatable by the first visual map and image content of a second location that is not locatable by the first visual map.
The first visual map may be a visual map pre-constructed with reference to a first coordinate system. The first coordinate system may be a three-dimensional (3D) coordinate system, and the first visual map may be formed from 3D points representing positions. The first visual map is associated with first data, and the first data may include the 3D point data, image data, and the like used to construct the first visual map.
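For illustration only, the association between a visual map and its data can be pictured as a small container of 3D points plus the images used to build it. The following Python sketch is an assumption about representation, not the patent's actual schema; all field names are made up:

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class VisualMap:
    """Hypothetical container for a visual map and its associated data."""
    points_3d: np.ndarray                        # (P, 3) 3D points in this map's coordinate system
    images: list = field(default_factory=list)   # images used to construct the map
    kp_to_point: dict = field(default_factory=dict)  # per-image 2D keypoint -> 3D point index

# A first visual map referenced to the first coordinate system, initially empty:
first_map = VisualMap(points_3d=np.zeros((0, 3)))
```

The `kp_to_point` association is what later allows 2D-2D matches against map images to be lifted into 2D-3D matches for pose solving.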
The M first images stored in advance may be acquired, or M first images sent by other electronic devices may be received. In an application scenario, a video of a target scene photographed by a user terminal through a camera may be received, and the video may include M first images.
For example, in an application scenario, a VPAS service is already deployed in a shopping mall. When a user cannot experience the VPAS service in a certain area of the mall, a common reason is that the surrounding scene of that area has changed, so the first visual map no longer recognizes it. In this case, if the user cannot successfully initiate positioning between points A and B of the mall but a nearby location C can still initiate visual positioning, the user may capture a video starting at position C and proceeding through positions A and B.
After the video acquisition is completed, the user can upload the video to the server, and correspondingly, the electronic equipment can receive the video in the target scene and acquire M first images in the target scene.
In a complex scene, for example a shopping square comprising a plurality of floors, the user may also specify to the electronic device the shopping square and the floor to be updated, so that the electronic device can select the first visual map corresponding to that shopping square from a map server for updating; in that case the first data associated with the first visual map may be the portion of its associated data belonging to that floor.
Step S102: and constructing a second visual map under the target scene based on the M first images by taking a second coordinate system as a reference, wherein the second visual map is associated with second data.
In this step, the second coordinate system may be a three-dimensional coordinate system different from the first coordinate system, where "different" means that at least one of the coordinate origin and the coordinate axis directions differs.
The second visual map may be a visual map constructed based on the second coordinate system, specifically, a 3D point at a position corresponding to any one of the M first images may be used as a coordinate origin or a fixed point of the second coordinate system, and based on the positional relationship of the M first images, the second visual map under the target scene may be constructed by adopting an existing or new visual map construction manner. The positional relationship of the M first images may refer to a relationship between positions related to image contents of the M first images, and the relationship may include a distance relationship, a direction relationship, and the like.
The second visual map may be a local visual map, which may be composed of 3D points characterizing positions in the M first images, which are associated with second data, which may include 3D point data characterizing positions in the target scene, and the M first images.
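The disclosure leaves the concrete construction method open ("an existing or new visual map construction manner"). As one minimal illustration, assuming relative camera motions between consecutive first images are available (for example from feature-based visual odometry, an assumption not stated in the text), fixing the first pose as the origin and chaining the relative motions anchors every frame in the second coordinate system:

```python
import numpy as np

def chain_poses(relative_rotations, relative_translations):
    """Accumulate relative motions between consecutive frames into absolute
    world-to-camera poses, with frame 0 defining the second coordinate system.

    Assumes x_{i+1} = R_i @ x_i + t_i maps frame-i coordinates to frame-(i+1)
    coordinates for each consecutive pair."""
    R_abs, t_abs = np.eye(3), np.zeros(3)   # frame 0 sits at the coordinate origin
    poses = [(R_abs, t_abs)]
    for R_rel, t_rel in zip(relative_rotations, relative_translations):
        R_abs = R_rel @ R_abs               # compose rotations
        t_abs = R_rel @ t_abs + t_rel       # compose translations
        poses.append((R_abs.copy(), t_abs.copy()))
    return poses

# Two frames, the second translated 1 unit along x relative to the first:
poses = chain_poses([np.eye(3)], [np.array([1.0, 0.0, 0.0])])
```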
Step S103: based on the M first images, a transformation relationship of the second coordinate system to the first coordinate system is determined.
In this step, the transformation relationship refers to transforming the second coordinate system by transformation parameters so that the transformed second coordinate system is aligned with the first coordinate system. The transformation parameters may include rotation parameters and translation parameters, i.e., alignment of the two coordinate systems can be achieved by a corresponding rotation and translation.
For each first image, the pose T_CW of the user camera in the second coordinate system can be calculated by visual positioning, where T_CW = [R_CW, t_CW]. This pose can be used to locate the position corresponding to the first image; that is, if the pose of the user camera in a map coordinate system can be calculated, visual positioning of the position in the first image can be achieved. Here R_CW is the rotation matrix from the second coordinate system to the camera coordinate system, and t_CW is the position of the camera coordinate system center in the second coordinate system.
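A minimal sketch of this pose convention, following the text's reading in which t_CW is the camera center expressed in the map (second) coordinate system; pose conventions differ between systems, so this is an assumption about notation rather than a definitive formulation:

```python
import numpy as np

def map_point_to_camera(x_w, R_cw, cam_center_w):
    """Express a map-frame point in the camera frame under T_CW = [R_CW, t_CW],
    reading R_CW as the map-to-camera rotation and t_CW as the camera center
    in the map frame: x_c = R_CW @ (x_w - t_CW)."""
    return np.asarray(R_cw) @ (np.asarray(x_w) - np.asarray(cam_center_w))

# A camera at (1, 0, 0) in the map with no rotation sees the map point (2, 0, 0)
# one unit in front of it along x:
print(map_point_to_camera([2.0, 0.0, 0.0], np.eye(3), [1.0, 0.0, 0.0]))  # [1. 0. 0.]
```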
Meanwhile, for each first image, the pose of the user camera in the first coordinate system can also be calculated by visual positioning. Because the target scene contains positions that cannot be located through the first visual map, this pose cannot necessarily be calculated for every first image. If the pose of the user camera in the first coordinate system can be calculated based on a first image, it is determined that the first image is visually located successfully in the first visual map.
Accordingly, a first image that is successfully located both in the first visual map and in the second visual map corresponds to two poses: a pose relative to the first visual map and a pose relative to the second visual map.
For a plurality of first images that are successfully located in the first visual map, for example 4 of them, equations for the transformation relationship from the second coordinate system to the first coordinate system can be constructed from their poses relative to the first visual map and their poses relative to the second visual map, and the transformation can then be solved with the Iterative Closest Point (ICP) algorithm.
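The text names ICP for this solve. One concrete closed-form realization, a sketch rather than necessarily the patent's exact formulation, is to align the camera centers of the successfully located images across the two maps with the Kabsch method; three non-collinear correspondences suffice mathematically, so the four or more pose pairs used here determine the rotation and translation well:

```python
import numpy as np

def align_rigid(src, dst):
    """Closed-form rigid alignment (Kabsch): find R, t with dst ~= src @ R.T + t.

    src, dst: (N, 3) corresponding camera centers of the same images expressed
    in the second and the first coordinate system, respectively."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Sanity check on synthetic correspondences:
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 3))
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = align_rigid(src, dst)
assert np.allclose(dst, src @ R.T + t)
```

If the two reconstructions also differ in scale, which is common when the second map is built from monocular video, the Umeyama extension of this solve additionally estimates a scale factor; the text does not state whether scale is estimated, so that remains an assumption either way.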
Step S104: and fusing the second data into the first data based on the transformation relation to update the first visual map.
In this step, the 3D point data in the second visual map may be transformed into the first coordinate system based on the transformation relationship of the second coordinate system to the first coordinate system, and the 3D point data transformed into the first coordinate system may be fused into the first data.
Correspondingly, the first visual map constructed from the fused first data, i.e., the updated first visual map, includes not only the previously constructed 3D point data but also the newly added 3D point data representing each position in the target scene, so that visual positioning at each position in the target scene can be achieved with the updated first visual map.
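Continuing the illustrative sketch from above, the fusion step then amounts to mapping every 3D point of the second map through (R, t) and appending the result, together with the M first images, to the first map's data; the `VisualMap` container is the hypothetical one introduced earlier:

```python
import numpy as np

def fuse_maps(first_map, second_map, R, t):
    """Transform the second map's 3D points into the first coordinate system
    and merge them, along with the new images, into the first map's data."""
    transformed = second_map.points_3d @ np.asarray(R).T + np.asarray(t)  # p' = R p + t, row-wise
    first_map.points_3d = np.vstack([first_map.points_3d, transformed])
    first_map.images.extend(second_map.images)   # keep the M first images associated
    return first_map
```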
In an application scenario, after the electronic device completes updating the first visual map, the updated first visual map may replace the visual map that is originally constructed, and upload the updated first visual map to the map server. Meanwhile, the electronic device can send a message of completion of updating the visual map to the user, and accordingly, the user can re-experience the VPAS service in an area where visual positioning cannot be realized before.
In this embodiment, by constructing the transformation relationship from the second coordinate system to the first coordinate system, the 3D point data in the second visual map are transformed into the first coordinate system as a whole, which simplifies the visual map updating flow and improves the updating efficiency. Moreover, compared with re-recognizing the first images whose visual positioning failed by triangulating from the first images whose visual positioning succeeded, the approach of constructing a new visual map, determining the transformation relationship from the second coordinate system to the first coordinate system, and transforming the 3D point data of the second visual map into the first coordinate system as a whole based on that relationship relaxes the scene constraints on the successfully located first images (triangulation imposes such constraints, for example the positions corresponding to two first images must not be too close together), thereby improving the success rate and accuracy of the visual map update.
Optionally, the step S103 specifically includes:
acquiring N target images from the M first images, wherein the target images are images which are successfully positioned in the first visual map, and N is an integer greater than 1;
acquiring N first poses of the N target images relative to the first visual map and N second poses of the N target images relative to the second visual map;
based on the N first poses and the N second poses, a transformation relationship from the second coordinate system to the first coordinate system is calculated.
In this embodiment, N target images may be acquired from the M first images, where a target image is an image that is successfully located in the first visual map. The N target images may be all of the images successfully located in the first visual map, or only some of them; for example, if 8 of the M first images are successfully located in the first visual map, 5 of them may be taken as the N target images for determining the transformation relationship from the second coordinate system to the first coordinate system.
An existing or newly designed visual positioning method may be adopted: visual positioning is attempted in the first visual map based on each first image, and if the positioning succeeds, that first image is taken as a target image. Accordingly, N target images can be obtained from the M first images, where N is smaller than M; the value of N should not be too small, for example, N may be greater than or equal to 4.
Under the condition that visual positioning is successful, N first poses of N target images relative to the first visual map and N second poses of N target images relative to the second visual map can be respectively obtained through pose solving in the visual positioning process.
Based on the N first poses and the N second poses, an equation of a transformation relation is constructed, and an ICP algorithm is adopted to calculate the transformation relation from the second coordinate system to the first coordinate system. In this way, determination of the transformation relationship of the second coordinate system to the first coordinate system can be achieved.
Optionally, the first data includes an image for constructing the first visual map, and before acquiring N target images from the M first images, the method further includes:
for each first image of the M first images, acquiring global features of the first image, and acquiring local features of the first image, wherein the local features comprise first features for characterizing key points in the first image;
Acquiring a second image matched with the global feature from the first data;
based on the first feature and a pre-acquired second feature, matching the key points in the first image with the key points in the second image to obtain a first matching pair, wherein the second feature is a feature for representing the key points in the second image;
and carrying out pose solving based on the first matching pair to obtain a visual positioning result, wherein the visual positioning result is used for representing whether the first image is successfully positioned in the first visual map or not.
In this embodiment, the visual positioning of the first image in the first visual map may be performed through steps such as image searching, feature matching, pose solving, and the like, to obtain a visual positioning result.
Specifically, the first data may include the images used to construct the first visual map. Global features and local features of the first image may be extracted with a deep learning model, and by matching the global features of the first image against the global features of the images in the first data, one or more second images identical or similar to the first image can be retrieved from the first data; the number of second images is at least one.
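A minimal sketch of this retrieval step, under the assumption that global descriptors have already been extracted as fixed-length vectors (the model producing them is not specified in the text); cosine similarity then ranks the map images:

```python
import numpy as np

def retrieve_top_k(query_desc, map_descs, k=5):
    """Rank map images by cosine similarity of global descriptors.

    query_desc: (D,) global descriptor of the first image.
    map_descs:  (N, D) global descriptors of the images in the first data.
    Returns the indices of the k most similar map images."""
    q = query_desc / np.linalg.norm(query_desc)
    m = map_descs / np.linalg.norm(map_descs, axis=1, keepdims=True)
    scores = m @ q                    # cosine similarities in [-1, 1]
    return np.argsort(-scores)[:k]
```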
Then, the key points in the first image are matched with the key points in the second image by local feature matching to obtain first matching pairs, where a first matching pair may be a pair consisting of a 2D key point in the first image and a 2D key point in the second image.
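The 2D-2D matching step can be sketched as nearest-neighbour search over the local descriptors with a ratio test; the disclosure does not fix a specific matcher, so both the mutual check and the 0.8 threshold below are assumptions:

```python
import numpy as np

def match_keypoints(desc_a, desc_b, ratio=0.8):
    """Mutual nearest-neighbour matching of local descriptors with a ratio test.

    desc_a: (Na, D) descriptors of keypoints in the first image.
    desc_b: (Nb, D) descriptors of keypoints in the second image (Nb >= 2).
    Returns index pairs (i, j) forming the first matching pairs."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    pairs = []
    for i, row in enumerate(dists):
        j, j2 = np.argsort(row)[:2]
        if row[j] < ratio * row[j2]:              # best clearly beats second best
            if np.argmin(dists[:, j]) == i:       # mutual consistency check
                pairs.append((i, j))
    return pairs
```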
Correspondingly, triangulation can be performed based on the first matching pairs to obtain second matching pairs between the key points in the first image and the 3D points in the first visual map, where a second matching pair may be a pair consisting of a 2D key point and a 3D point; pose solving is then performed based on the second matching pairs to obtain the visual positioning result. In this way, visual localization of the first image in the first visual map can be achieved.
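Pose solving from the lifted 2D-3D matches is standard PnP; the sketch below uses OpenCV's RANSAC variant (assuming OpenCV as the implementation, which the text does not state), and a failed solve doubles as the "positioning failed" signal discussed later:

```python
import numpy as np
import cv2

def solve_pose_2d3d(points_2d, points_3d, K):
    """Solve the camera pose from 2D keypoints matched to map 3D points.

    points_2d: (N, 2) pixel coordinates; points_3d: (N, 3) map points;
    K: (3, 3) camera intrinsic matrix. Returns (R, t) or None on failure."""
    if len(points_2d) < 4:                       # PnP needs at least 4 correspondences
        return None
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, np.float32),
        np.asarray(points_2d, np.float32),
        np.asarray(K, np.float32),
        distCoeffs=None)
    if not ok or inliers is None:
        return None                              # visual positioning failed
    R, _ = cv2.Rodrigues(rvec)                   # rotation vector -> rotation matrix
    return R, tvec.reshape(3)
```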
Optionally, the performing pose solving based on the first matching pair to obtain a visual positioning result includes:
acquiring a second matching pair of the first image key point in the first matching pair and the three-dimensional point in the first visual map based on the matching relation between the key point of the second image and the three-dimensional point in the first visual map;
and carrying out pose solving based on the second matching pair to obtain the visual positioning result.
In this embodiment, during generation of the first visual map, the 3D points in the first visual map may be constructed by triangulating the 2D key points in the images, so that the 2D key points in each image and the corresponding 3D points in the first visual map can be stored in association. Correspondingly, based on the matching relationship between the key points of the second image and the three-dimensional points in the first visual map, second matching pairs between the 2D key points in the first image and the 3D points in the first visual map can be determined, so that pose solving can be performed based on the second matching pairs; this realizes visual positioning of the first image in the first visual map and simplifies the pose solving steps.
Optionally, the performing pose solving based on the second matching pair to obtain the visual positioning result includes:
under the condition that the pose of the first image relative to the first visual map can be obtained based on the second matching pair, determining that the visual positioning result is successful in visual positioning;
and under the condition that the pose of the first image relative to the first visual map can not be obtained based on the second matching pair, determining that the visual positioning result is visual positioning failure.
In this embodiment, if the pose of the first image relative to the first visual map can be solved based on the second matching pairs, this indicates that the first visual map contains enough 3D points matching the 2D key points in the first image; accordingly, visual positioning of the position corresponding to the first image can be achieved, and the visual positioning result is determined to be a success.
If the pose of the first image relative to the first visual map cannot be solved based on the second matching pairs, the first visual map does not contain enough 3D points matching the 2D key points in the first image, the position appearing in the first image cannot be determined in the first visual map, and the visual positioning result is accordingly determined to be a failure.
In this way, the visual localization process of an image in the first visual map can be simplified.
Optionally, the step S104 specifically includes:
transforming three-dimensional points used for representing positions in the second visual map into the first coordinate system based on the transformation relation, wherein the second data comprise the three-dimensional points used for representing positions in the second visual map and the M first images;
three-dimensional points transformed into the first coordinate system are added to the first data in association with the M first images.
In this embodiment, the transformation parameters representing the transformation relationship may be multiplied with the 3D point data representing positions in the second visual map, thereby transforming the 3D points of the second visual map into the first coordinate system to obtain new 3D point data. The new 3D point data and the M first images are then added to the first data in association with each other, fusing the two visual maps at the data level and updating the first visual map.
Optionally, the M first images are images in the video under the target scene, and the step S102 specifically includes:
taking the three-dimensional point corresponding to the first image that is the first frame of the video among the M first images as the coordinate origin of the second coordinate system, and constructing a second visual map under the target scene based on the positional relationship of the M first images.
In this embodiment, the M first images may be image frames in the video, and the video uploaded by the user may be split into image frames to obtain the M first images.
The 3D point corresponding to the first image that is the first frame of the video among the M first images can be used as the coordinate origin of the second coordinate system, and based on the positional relationship of the M first images, the second visual map under the target scene can be constructed with an existing or newly designed visual map construction method, which simplifies the construction of the second visual map.
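Splitting the uploaded video into the M first images is straightforward with OpenCV; a sketch, where the file path and the sampling stride are illustrative assumptions:

```python
import cv2

def split_video(path="scene.mp4", stride=10):
    """Extract every stride-th frame of the video as one of the M first images."""
    cap = cv2.VideoCapture(path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:                  # end of video (or unreadable file)
            break
        if idx % stride == 0:       # subsample so M stays manageable
            frames.append(frame)
        idx += 1
    cap.release()
    return frames                   # frames[0] anchors the second coordinate system
```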
Second embodiment
As shown in fig. 2, the present disclosure provides a visual map updating apparatus 200, including:
a first obtaining module 201, configured to obtain M first images in a target scene, where the M first images include image contents of a first location and image contents of a second location, where the first location is a location that is locatable by a first visual map, the second location is a location that is not locatable by the first visual map, the first visual map is a visual map that is pre-constructed based on a first coordinate system, the first visual map is associated with first data, and M is an integer greater than 1;
a building module 202, configured to build a second visual map under the target scene based on the M first images and taking a second coordinate system as a reference, where the second visual map is associated with second data;
a determining module 203, configured to determine a transformation relationship from the second coordinate system to the first coordinate system based on the M first images;
and a fusion module 204, configured to fuse the second data into the first data based on the transformation relationship, so as to update the first visual map.
Optionally, the determining module 203 is specifically configured to:
acquiring N target images from the M first images, wherein the target images are images which are successfully positioned in the first visual map, and N is an integer greater than 1;
acquiring N first poses of the N target images relative to the first visual map and N second poses of the N target images relative to the second visual map;
based on the N first poses and the N second poses, a transformation relationship from the second coordinate system to the first coordinate system is calculated.
Optionally, the first data includes an image for constructing the first visual map, and the apparatus further includes:
a second obtaining module, configured to obtain, for each of the M first images, a global feature of the first image, and obtain a local feature of the first image, where the local feature includes a first feature for characterizing a key point in the first image;
a third obtaining module, configured to obtain a second image matched with the global feature from the first data;
the matching module is used for matching the key points in the first image with the key points in the second image based on the first features and the pre-acquired second features to obtain a first matching pair, wherein the second features are features used for representing the key points in the second image;
and the pose solving module is used for carrying out pose solving based on the first matching pair to obtain a visual positioning result, and the visual positioning result is used for representing whether the first image is successfully positioned in the first visual map or not.
Optionally, the pose solving module includes:
the acquisition unit is used for acquiring a second matching pair of the first image key points in the first matching pair and the three-dimensional points in the first visual map based on the matching relation between the key points of the second image and the three-dimensional points in the first visual map;
and the pose solving unit is used for carrying out pose solving based on the second matching pair to obtain the visual positioning result.
Optionally, the pose solving unit is specifically configured to:
under the condition that the pose of the first image relative to the first visual map can be obtained based on the second matching pair, determining that the visual positioning result is successful in visual positioning;
And under the condition that the pose of the first image relative to the first visual map can not be obtained based on the second matching pair, determining that the visual positioning result is visual positioning failure.
Optionally, the fusion module 204 is specifically configured to:
transforming three-dimensional points used for representing positions in the second visual map into the first coordinate system based on the transformation relation, wherein the second data comprise the three-dimensional points used for representing positions in the second visual map and the M first images;
three-dimensional points transformed into the first coordinate system are added to the first data in association with the M first images.
Optionally, the M first images are images in a video of the target scene, and the building module 202 is specifically configured to:
and taking three-dimensional points which are corresponding to the first images of the first frames of the video in the M first images as coordinate origins of the second coordinate system, and constructing a second visual map under the target scene based on the position relation of the M first images.
The visual map updating apparatus 200 provided in the present disclosure can implement each process implemented by the embodiments of the visual map updating method and achieve the same beneficial effects; to avoid repetition, details are not described here again.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the user's personal information comply with relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 3 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 3, the device 300 includes a computing unit 301 that can perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 302 or a computer program loaded from a storage unit 308 into a random access memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the device 300 can also be stored. The computing unit 301, the ROM 302, and the RAM 303 are connected to each other by a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Various components in device 300 are connected to I/O interface 305, including: an input unit 306 such as a keyboard, a mouse, etc.; an output unit 307 such as various types of displays, speakers, and the like; a storage unit 308 such as a magnetic disk, an optical disk, or the like; and a communication unit 309 such as a network card, modem, wireless communication transceiver, etc. The communication unit 309 allows the device 300 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 301 performs the respective methods and processes described above, such as a visual map updating method. For example, in some embodiments, the visual map updating method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 300 via the ROM 302 and/or the communication unit 309. When the computer program is loaded into the RAM 303 and executed by the computing unit 301, one or more steps of the visual map updating method described above may be performed. Alternatively, in other embodiments, the computing unit 301 may be configured to perform the visual map updating method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above can be implemented in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, where the one or more computer programs can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. A visual map updating method, comprising:
acquiring M first images in a target scene, wherein the M first images comprise image contents of a first position and image contents of a second position, the first position is a position that can be located through a first visual map, the second position is a position that cannot be located through the first visual map, the first visual map is a visual map pre-constructed with a first coordinate system as reference, the first visual map is associated with first data, and M is an integer greater than 1;
constructing a second visual map under the target scene based on the M first images and with a second coordinate system as reference, wherein the second visual map is associated with second data;
determining a transformation relationship from the second coordinate system to the first coordinate system based on the M first images;
fusing the second data into the first data based on the transformation relationship to update the first visual map;
the determining, based on the M first images, a transformation relationship of the second coordinate system to the first coordinate system, includes:
acquiring N target images from the M first images, wherein the target images are images which are successfully positioned in the first visual map, and N is an integer greater than 1;
acquiring N first poses of the N target images relative to the first visual map and N second poses of the N target images relative to the second visual map;
based on the N first poses and the N second poses, calculating a transformation relation from the second coordinate system to the first coordinate system;
the first data includes images for constructing the first visual map, and the method further includes, prior to acquiring N target images from the M first images:
matching key points in the first image with key points in a second image in the first data based on first features of the first image and second features acquired in advance to obtain a first matching pair, wherein the second image is matched with global features of the first image, the first features are local features used for representing the key points in the first image, and the second features are local features used for representing the key points in the second image;
acquiring a second matching pair of the first image key point in the first matching pair and the three-dimensional point in the first visual map based on the matching relation between the key point of the second image and the three-dimensional point in the first visual map;
and carrying out pose solving based on the second matching pair to obtain a visual positioning result.
2. The method of claim 1, wherein prior to the acquiring N target images from the M first images, the method further comprises:
for each first image of the M first images, acquiring global features of the first image, and acquiring local features of the first image, wherein the local features comprise first features for characterizing key points in the first image;
acquiring a second image matched with the global feature from the first data;
based on the first feature and a pre-acquired second feature, matching the key points in the first image with the key points in the second image to obtain a first matching pair, wherein the second feature is a feature for representing the key points in the second image;
and carrying out pose solving based on the first matching pair to obtain a visual positioning result, wherein the visual positioning result is used for representing whether the first image is successfully positioned in the first visual map or not.
3. The method of claim 2, wherein the pose solving based on the second matching pair to obtain the visual positioning result comprises:
under the condition that the pose of the first image relative to the first visual map can be obtained based on the second matching pair, determining that the visual positioning result is successful in visual positioning;
and under the condition that the pose of the first image relative to the first visual map can not be obtained based on the second matching pair, determining that the visual positioning result is visual positioning failure.
4. The method of claim 1, wherein the fusing the second data into the first data based on the transformation relationship comprises:
transforming three-dimensional points used for representing positions in the second visual map into the first coordinate system based on the transformation relation, wherein the second data comprise the three-dimensional points used for representing positions in the second visual map and the M first images;
three-dimensional points transformed into the first coordinate system are added to the first data in association with the M first images.
5. The method of claim 1, wherein the M first images are images in a video of the target scene, and the constructing a second visual map under the target scene based on the M first images with a second coordinate system as reference comprises:
taking the three-dimensional point corresponding to the first image that is the first frame of the video among the M first images as the coordinate origin of the second coordinate system, and constructing a second visual map under the target scene based on the positional relationship of the M first images.
6. A visual map updating apparatus comprising:
the first acquisition module is used for acquiring M first images in a target scene, wherein the M first images comprise image contents of a first position and image contents of a second position, the first position is a position that can be located through a first visual map, the second position is a position that cannot be located through the first visual map, the first visual map is a visual map pre-constructed with a first coordinate system as reference, the first visual map is associated with first data, and M is an integer greater than 1;
the building module is used for building a second visual map under the target scene based on the M first images and with a second coordinate system as reference, and the second visual map is associated with second data;
a determining module, configured to determine a transformation relationship from the second coordinate system to the first coordinate system based on the M first images;
the fusion module is used for fusing the second data into the first data based on the transformation relation so as to update the first visual map;
the determining module is specifically configured to:
acquiring N target images from the M first images, wherein the target images are images which are successfully positioned in the first visual map, and N is an integer greater than 1;
acquiring N first poses of the N target images relative to the first visual map and N second poses of the N target images relative to the second visual map;
based on the N first poses and the N second poses, calculating a transformation relation from the second coordinate system to the first coordinate system;
the first data includes an image for constructing the first visual map, the apparatus further comprising: the matching module and the pose solving module are used for matching the pose of the object; wherein,
the matching module is used for matching the key points in the first image with the key points in the second image in the first data based on the first features of the first image and the second features acquired in advance to obtain a first matching pair, the second image is matched with the global features of the first image, the first features are local features used for representing the key points in the first image, and the second features are local features used for representing the key points in the second image;
the pose solving module comprises:
the acquisition unit is used for acquiring a second matching pair of the first image key points in the first matching pair and the three-dimensional points in the first visual map based on the matching relation between the key points of the second image and the three-dimensional points in the first visual map;
and the pose solving unit is used for carrying out pose solving based on the second matching pair to obtain a visual positioning result.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a second obtaining module, configured to obtain, for each of the M first images, a global feature of the first image, and obtain a local feature of the first image, where the local feature includes a first feature for characterizing a key point in the first image;
a third obtaining module, configured to obtain a second image matched with the global feature from the first data;
the matching module is used for matching the key points in the first image with the key points in the second image based on the first features and the pre-acquired second features to obtain a first matching pair, wherein the second features are features used for representing the key points in the second image;
and the pose solving module is used for carrying out pose solving based on the first matching pair to obtain a visual positioning result, and the visual positioning result is used for representing whether the first image is successfully positioned in the first visual map or not.
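Claim 7's retrieve-then-match pipeline follows a common coarse-to-fine visual localization pattern: a global descriptor selects a candidate database image, after which local descriptors establish key-point correspondences. The sketch below is a simplified, hypothetical rendering of that pattern (numpy only; descriptor extraction is assumed to happen elsewhere), not the patented implementation:

```python
import numpy as np

def retrieve_second_image(global_feat: np.ndarray, db_global: np.ndarray) -> int:
    """Return the index of the database image whose (L2-normalized) global
    feature is most similar to the query image's global feature."""
    return int(np.argmax(db_global @ global_feat))

def match_keypoints(desc_query: np.ndarray, desc_db: np.ndarray,
                    ratio: float = 0.8):
    """Mutual-nearest-neighbour matching of local descriptors with Lowe's
    ratio test; returns (query_idx, db_idx) key-point index pairs."""
    if desc_db.shape[0] < 2:
        return []
    dists = np.linalg.norm(desc_query[:, None, :] - desc_db[None, :, :], axis=2)
    nn_q = dists.argmin(axis=1)      # best db match for each query keypoint
    nn_db = dists.argmin(axis=0)     # best query match for each db keypoint
    pairs = []
    for qi, di in enumerate(nn_q):
        d_sorted = np.sort(dists[qi])
        if nn_db[di] == qi and d_sorted[0] < ratio * d_sorted[1]:
            pairs.append((qi, int(di)))
    return pairs
```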
8. The apparatus of claim 7, wherein the pose solving unit is specifically configured to:
in a case where a pose of the first image relative to the first visual map can be obtained based on the second matching pair, determine that the visual positioning result is a successful visual positioning; and
in a case where the pose of the first image relative to the first visual map cannot be obtained based on the second matching pair, determine that the visual positioning result is a visual positioning failure.
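One plausible concrete realization of claim 8's success/failure decision is a RANSAC-based PnP solve over the second matching pair (2D key points of the first image paired with 3D points of the first visual map). Assuming OpenCV is available, a minimal sketch might look as follows; the inlier threshold is an illustrative choice, not taken from the patent:

```python
import numpy as np
import cv2

def solve_pose(pts_2d: np.ndarray, pts_3d: np.ndarray, K: np.ndarray,
               min_inliers: int = 12):
    """Attempt to recover the first image's pose with respect to the first
    visual map from 2D-3D matches, reporting success or failure.

    pts_2d : (n, 2) key-point pixel coordinates in the first image
    pts_3d : (n, 3) matched three-dimensional points of the first visual map
    K      : (3, 3) camera intrinsic matrix
    """
    if len(pts_2d) < 4:                      # PnP needs at least 4 matches
        return None, False                   # visual positioning failure
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64), K, None)
    if not ok or inliers is None or len(inliers) < min_inliers:
        return None, False                   # pose could not be obtained
    return (rvec, tvec), True                # visual positioning success
```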
9. The apparatus of claim 6, wherein the fusion module is specifically configured to:
transform, based on the transformation relationship, three-dimensional points used to represent positions in the second visual map into the first coordinate system, wherein the second data includes the three-dimensional points used to represent positions in the second visual map and the M first images; and
add the three-dimensional points transformed into the first coordinate system to the first data in association with the M first images.
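For claim 9's fusion step, once a similarity transform (s, R, t) from the second coordinate system to the first has been estimated, applying it to the second map's three-dimensional points and appending the result to the first map's point set is straightforward. A minimal sketch under an assumed data layout of (P, 3) numpy arrays:

```python
import numpy as np

def fuse_maps(points_second: np.ndarray, s: float, R: np.ndarray,
              t: np.ndarray, points_first: np.ndarray) -> np.ndarray:
    """Transform the second map's 3D points into the first coordinate
    system and append them to the first map's point set."""
    transformed = s * (points_second @ R.T) + t   # row-wise p' = s * R @ p + t
    return np.vstack([points_first, transformed])
```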
10. The apparatus of claim 6, wherein the M first images are images from a video of the target scene, and the building module is specifically configured to:
take the three-dimensional point corresponding to the first-frame image of the video among the M first images as the origin of the second coordinate system, and construct the second visual map of the target scene based on the positional relationships among the M first images.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202210992488.5A 2022-08-18 2022-08-18 Visual map updating method and device and electronic equipment Active CN115439536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210992488.5A CN115439536B (en) 2022-08-18 2022-08-18 Visual map updating method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN115439536A (en) 2022-12-06
CN115439536B (en) 2023-09-26

Family

ID=84242255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210992488.5A Active CN115439536B (en) 2022-08-18 2022-08-18 Visual map updating method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115439536B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115262B (en) * 2023-10-24 2024-03-26 锐驰激光(深圳)有限公司 Positioning method, device, equipment and storage medium based on vision and TOF

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309330A (en) * 2019-07-01 2019-10-08 北京百度网讯科技有限公司 The treating method and apparatus of vision map
WO2020155615A1 (en) * 2019-01-28 2020-08-06 速感科技(北京)有限公司 Vslam method, controller, and mobile device
WO2020223975A1 (en) * 2019-05-09 2020-11-12 珊口(深圳)智能科技有限公司 Method of locating device on map, server, and mobile robot
CN112269851A (en) * 2020-11-16 2021-01-26 Oppo广东移动通信有限公司 Map data updating method and device, storage medium and electronic equipment
CN112435338A (en) * 2020-11-19 2021-03-02 腾讯科技(深圳)有限公司 Method and device for acquiring position of interest point of electronic map and electronic equipment
CN112785700A (en) * 2019-11-08 2021-05-11 华为技术有限公司 Virtual object display method, global map updating method and device
CN113029128A (en) * 2021-03-25 2021-06-25 浙江商汤科技开发有限公司 Visual navigation method and related device, mobile terminal and storage medium
CN113884006A (en) * 2021-09-27 2022-01-04 视辰信息科技(上海)有限公司 Space positioning method, system, equipment and computer readable storage medium
WO2022002039A1 (en) * 2020-06-30 2022-01-06 杭州海康机器人技术有限公司 Visual positioning method and device based on visual map
CN113989450A (en) * 2021-10-27 2022-01-28 北京百度网讯科技有限公司 Image processing method, image processing apparatus, electronic device, and medium
WO2022036980A1 (en) * 2020-08-17 2022-02-24 浙江商汤科技开发有限公司 Pose determination method and apparatus, electronic device, storage medium, and program
CN114241039A (en) * 2021-12-13 2022-03-25 Oppo广东移动通信有限公司 Map data processing method and device, storage medium and electronic equipment
CN114627268A (en) * 2022-03-11 2022-06-14 北京百度网讯科技有限公司 Visual map updating method and device, electronic equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108692720B (en) * 2018-04-09 2021-01-22 京东方科技集团股份有限公司 Positioning method, positioning server and positioning system
CN108717710B (en) * 2018-05-18 2022-04-22 京东方科技集团股份有限公司 Positioning method, device and system in indoor environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a laser and vision mapping method for mobile robots based on an improved ICP algorithm; Zhang Jie; Zhou Jun; Journal of Mechanical & Electrical Engineering (12); full text *

Similar Documents

Publication Publication Date Title
CN115409933B (en) Multi-style texture mapping generation method and device
CN112652036B (en) Road data processing method, device, equipment and storage medium
CN112529097B (en) Sample image generation method and device and electronic equipment
CN115439536B (en) Visual map updating method and device and electronic equipment
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN114998433A (en) Pose calculation method and device, storage medium and electronic equipment
US20220198743A1 (en) Method for generating location information, related apparatus and computer program product
CN113587928B (en) Navigation method, navigation device, electronic equipment, storage medium and computer program product
CN113838217B (en) Information display method and device, electronic equipment and readable storage medium
CN113932796A (en) High-precision map lane line generation method and device and electronic equipment
CN113435462B (en) Positioning method, positioning device, electronic equipment and medium
CN113870439A (en) Method, apparatus, device and storage medium for processing image
CN114723894B (en) Three-dimensional coordinate acquisition method and device and electronic equipment
CN114549303B (en) Image display method, image processing method, image display device, image processing apparatus, image display device, image processing program, and storage medium
CN114119990B (en) Method, apparatus and computer program product for image feature point matching
CN115790621A (en) High-precision map updating method and device and electronic equipment
CN112990046B (en) Differential information acquisition method, related device and computer program product
CN113112398A (en) Image processing method and device
CN113838201B (en) Model adaptation method and device, electronic equipment and readable storage medium
CN113838200B (en) Model adaptation method, device, electronic equipment and readable storage medium
CN113658277B (en) Stereo matching method, model training method, related device and electronic equipment
CN115439331B (en) Corner correction method and generation method and device of three-dimensional model in meta universe
CN114820908B (en) Virtual image generation method and device, electronic equipment and storage medium
CN114463409B (en) Image depth information determining method and device, electronic equipment and medium
CN113099231B (en) Method and device for determining sub-pixel interpolation position, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant