CN114627268A

CN114627268A - Visual map updating method and device, electronic equipment and medium

Info

Publication number: CN114627268A
Application number: CN202210239607.XA
Authority: CN
Inventors: 王星博
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-03-11
Filing date: 2022-03-11
Publication date: 2022-06-14

Abstract

The present disclosure provides a method, an apparatus, an electronic device and a medium for updating a visual map in an augmented reality scene, and relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision, image processing, augmented reality, and the like. The implementation scheme is as follows: determining a plurality of reference images shot by the mobile terminal in a first scene, wherein the first scene has a corresponding visual map; in response to determining that at least one of the plurality of reference images does not match the visual map, determining pose information of the mobile terminal when capturing the at least one reference image; and updating the visual map at least based on the at least one reference image and the pose information corresponding to the at least one reference image.

Description

Visual map updating method and device, electronic equipment and medium

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, in particular to computer vision, image processing, augmented reality, and the like, and more specifically to a method and an apparatus for updating a visual map based on augmented reality, an electronic device, a computer-readable storage medium, and a computer program product.

Background

With the development of computer technology, Augmented Reality (AR) technology has been widely applied in many fields such as movie and television, games, maps, and the like. Through the augmented reality technology, real world information and virtual world information are mutually blended, so that the sensory experience beyond reality is realized.

The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.

Disclosure of Invention

The present disclosure provides a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for updating a visual map in an augmented reality scene.

According to an aspect of the present disclosure, there is provided an augmented reality-based visual map updating method, including: determining a plurality of reference images shot by the mobile terminal in a first scene, wherein the first scene has a corresponding visual map; in response to determining that at least one of the plurality of reference images does not match the visual map, determining pose information of the mobile terminal when capturing the at least one reference image; and updating the visual map at least based on the at least one reference image and the pose information corresponding to the at least one reference image.

According to another aspect of the present disclosure, there is provided an augmented reality-based visual map updating apparatus including: a first determining unit configured to determine a plurality of reference images captured by the mobile terminal in a first scene, wherein the first scene has a corresponding visual map; a second determination unit configured to determine pose information of the mobile terminal when capturing the at least one reference image in response to determining that the at least one reference image of the plurality of reference images does not match the visual map; and the updating unit is configured to update the visual map at least based on the at least one reference image and the pose information corresponding to the at least one reference image.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above-described method.

According to another aspect of the disclosure, a computer program product is provided, comprising a computer program, wherein the computer program realizes the above-described method when executed by a processor.

According to one or more embodiments of the present disclosure, the update efficiency of the visual map can be improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.

FIG. 1 shows a schematic diagram of AR navigation in an indoor scene;

FIG. 2 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;

FIG. 3 shows a flow diagram of an augmented reality based visual map update method according to an embodiment of the present disclosure;

FIG. 4 illustrates a schematic diagram of determining pose information of a mobile terminal by an inertial measurement system according to an exemplary embodiment of the disclosure;

FIG. 5 illustrates a block diagram of a visual map updating apparatus based on augmented reality according to an exemplary embodiment of the present disclosure; and

FIG. 6 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the present disclosure, unless otherwise specified, the use of the terms "first", "second", and the like to describe various elements is not intended to limit the positional relationship, the temporal relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.

The terminology used in the description of the various examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.

AR navigation by means of Augmented Reality (AR) technology is currently widely used. Particularly in indoor scenes, satellite navigation and positioning are difficult to realize due to attenuation of satellite signals, and the advantages of AR navigation based on visual positioning are particularly highlighted.

In the AR navigation, a plurality of scene pictures in a scene are acquired in advance through manual acquisition, and a visual map corresponding to the scene is constructed by utilizing the plurality of scene pictures. The visual map includes a plurality of object points in the scene that can be used for positioning and three-dimensional coordinates of each of the plurality of object points in a world coordinate system.

After the user inputs a destination, a reference image is captured in the scene with the mobile terminal. By utilizing the correspondence between the feature points in the reference image and the object points in the visual map, the pose information of the mobile terminal when capturing the reference image, such as a 6DoF pose, can be determined, and the visual positioning of the user is realized. The pose information comprises not only the three-dimensional position coordinates of the mobile terminal in the world coordinate system, but also the three-dimensional spatial orientation of the mobile terminal in the world coordinate system. In this way, a route can be planned according to the destination and the user location, and an identifier for navigation is added to the reference image, indicating the user's route to the destination.

For example, fig. 1 is a schematic diagram of a user performing AR navigation in an indoor scene. After the user inputs a destination, a reference image is captured in the indoor scene with a mobile phone. Visual positioning of the user is achieved based on the correspondence between the feature points in the reference image and the object points of the visual map, that is, the pose information of the mobile phone when capturing the reference image is determined. A route from the current position of the user to the destination is then planned, and the user is guided to go upstairs by elevator to reach the destination by displaying an upward arrow on the reference image.

However, due to construction or other reasons, a local area in a scene may change. If a user captures a reference image in that area, positioning may fail or an incorrect positioning result may be calculated. Only if the visual map corresponding to the scene is updated in time, so that the visual map remains consistent with the current scene, can the user successfully use AR navigation.

In the related art, when the user cannot normally perform AR navigation, a scene picture is collected again in the scene in a manual collection manner to construct a new visual map. This not only consumes additional labor cost, but also results in the map updating speed lagging behind the scene changing speed. In other words, every time a local area in a scene changes, it is necessary to wait for manual re-acquisition of a scene picture and then regenerate a visual map before normal AR navigation can be resumed. Before that, the local area cannot realize normal AR navigation, which undoubtedly affects the stability of the navigation function and reduces the user experience.

Based on this, the present disclosure proposes a map updating method, in response to determining that at least one reference image of a plurality of reference images captured by a mobile terminal does not match a visual map, determining pose information of the mobile terminal at the time of capturing each of the at least one reference image, and further updating the visual map based on at least the pose information corresponding to each of the at least one reference image and the at least one reference image. Therefore, when the reference image is determined not to be matched with the visual map, the visual map can be updated in a local area corresponding to the reference image in a targeted manner based on the reference image and the pose information thereof, extra manual image acquisition is not needed, the labor cost is saved, and the map updating efficiency is improved.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

Fig. 2 illustrates a schematic diagram of an exemplary system 200 in which various methods and apparatus described herein may be implemented, according to an embodiment of the present disclosure. Referring to fig. 2, the system 200 includes one or more client devices 201, 202, 203, 204, 205, and 206, a server 220, and one or more communication networks 210 coupling the one or more client devices to the server 220. The client devices 201, 202, 203, 204, 205, and 206 may be configured to execute one or more applications.

In an embodiment of the present disclosure, the server 220 may run one or more services or software applications that enable the method of map updating to be performed.

In some embodiments, server 220 may also provide other services or software applications that may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, such as provided to users of client devices 201, 202, 203, 204, 205, and/or 206 under a software as a service (SaaS) model.

In the configuration shown in fig. 2, server 220 may include one or more components that implement the functions performed by server 220. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 201, 202, 203, 204, 205, and/or 206 may, in turn, utilize one or more client applications to interact with server 220 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 200. Accordingly, fig. 2 is one example of a system for implementing the various methods described herein and is not intended to be limiting.

The user may use the client device 201, 202, 203, 204, 205, and/or 206 as the mobile terminal of the present disclosure to obtain the reference image. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 2 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.

Client devices 201, 202, 203, 204, 205, and/or 206 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems; or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.

Network 210 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 210 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.

Server 220 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. Server 220 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, server 220 may run one or more services or software applications that provide the functionality described below.

The computing units in server 220 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. Server 220 may also run any of a variety of additional server applications and/or mid-tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.

In some implementations, the server 220 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 201, 202, 203, 204, 205, and/or 206. Server 220 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 201, 202, 203, 204, 205, and/or 206.

In some embodiments, server 220 may be a server of a distributed system, or a server that incorporates a blockchain. The server 220 may also be a cloud server, or an intelligent cloud computing server or an intelligent cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the drawbacks of high management difficulty and weak service scalability found in traditional physical host and Virtual Private Server (VPS) services.

The system 200 may also include one or more databases 230. In some embodiments, these databases may be used to store data and other information. For example, one or more of databases 230 may be used to store information such as audio files and video files. Database 230 may reside in various locations. For example, the database used by server 220 may be local to server 220, or may be remote from server 220 and may communicate with server 220 via a network-based or dedicated connection. The database 230 may be of different types. In certain embodiments, the database used by server 220 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.

In some embodiments, one or more of databases 230 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or conventional stores supported by a file system.

The system 200 of fig. 2 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information of the users involved all comply with relevant laws and regulations and do not violate public order and good customs.

Fig. 3 shows a flowchart of an augmented reality based visual map updating method according to an exemplary embodiment of the present disclosure, the method 300 comprising: step S301, determining a plurality of reference images shot by the mobile terminal in a first scene, wherein the first scene has a corresponding visual map; step S302, in response to the fact that at least one reference image in the plurality of reference images is not matched with the visual map, determining the pose information of the mobile terminal when the at least one reference image is shot; and step S303, updating the visual map at least based on the at least one reference image and the pose information corresponding to the at least one reference image.

Therefore, when the reference image is determined not to be matched with the visual map, the visual map can be updated in a local area corresponding to the reference image in a targeted manner based on the reference image and the pose information thereof, extra manual acquisition is not needed, the labor cost is saved, and the map updating efficiency is improved.
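The control flow of steps S301 to S303 can be sketched as follows. This is only an illustrative outline, not the disclosed implementation: the function name and the three callables (`matches_map`, `pose_of`, `apply_update`) are hypothetical placeholders for the matching, pose determination, and map updating operations described in this disclosure.

```python
def update_visual_map_once(reference_images, matches_map, pose_of, apply_update):
    """Hypothetical outline of steps S301-S303.

    reference_images: images already determined in step S301.
    matches_map(img) -> bool: placeholder for the map-matching check.
    pose_of(img): placeholder returning the pose at capture time.
    apply_update(images, poses): placeholder that updates the visual map.
    Returns True if an update was triggered, False otherwise.
    """
    # Step S302: find reference images that do not match the visual map.
    unmatched = [img for img in reference_images if not matches_map(img)]
    if not unmatched:
        return False  # map is consistent with the scene; nothing to do

    # Step S302 (continued): determine the pose for each unmatched image.
    poses = [pose_of(img) for img in unmatched]

    # Step S303: update the map from the unmatched images and their poses.
    apply_update(unmatched, poses)
    return True
```

Passing the three operations in as callables keeps the outline independent of any particular matching or positioning technique.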

According to some embodiments, determining pose information of the mobile terminal when capturing the at least one reference image may comprise: determining pose information of the mobile terminal when capturing each of the at least one reference image, and updating the visual map based on at least the at least one reference image and the pose information corresponding to the at least one reference image may include: updating the visual map based at least on the respective pose information for the at least one reference image and each of the at least one reference image.

With respect to step S301, according to some embodiments, determining a plurality of reference images taken by the mobile terminal in the first scene may include: receiving a plurality of candidate images shot by a mobile terminal in a first scene and motion information of the mobile terminal in the process of shooting the candidate images, wherein the motion information is obtained through an inertial measurement system; and determining the plurality of candidate images as the plurality of reference images in response to determining that the motion information indicates that the mobile terminal is in a motion state.

The candidate images may include reference images used for positioning as well as other pictures not used for positioning. Pictures not used for positioning, such as selfies, do not contain feature points usable for positioning. The fact that such a picture does not match the visual map does not indicate that the first scene has changed, and it should not be used as a basis for updating the visual map. Therefore, these pictures should be excluded before the map update is performed.

When visual positioning is intended, a user tends to be in a motion state, for example, to find a destination in a walking state, or to select a reference image for easy positioning by moving a mobile terminal. Therefore, whether the received candidate images are reference images can be verified based on the motion information of the mobile terminal in the process of shooting the candidate images, so that the interference of non-reference images on map updating can be eliminated.

According to some embodiments, the mobile terminal is provided with an inertial measurement system, which uses inertial sensing elements such as a gyroscope and an accelerometer to measure displacement information and rotation information of the mobile terminal in real time.

According to some embodiments, the motion information may include at least one of displacement information and rotation information.

In one embodiment, in the course of capturing N+1 candidate images C_k ~ C_{k+N} between time k and time k+N (where N denotes the shooting time interval), the motion information of the mobile terminal can be expressed as:

[R_k^{k+N}, t_k^{k+N}]

where R_k^{k+N} represents the relative rotation matrix of the mobile terminal from time k to time k+N, and t_k^{k+N} represents the relative displacement of the mobile terminal from time k to time k+N.

In particular, if the rotation angle of the mobile terminal from time k to time k+N, resolved from the relative rotation matrix R_k^{k+N}, is greater than the threshold α_TH, and the modulus of the relative displacement t_k^{k+N} is greater than the threshold t_TH, it is determined that the motion information indicates that the mobile terminal is in a motion state.
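As a minimal sketch of the motion-state check described above, the rotation angle can be recovered from the trace of a 3x3 rotation matrix and compared with the rotation threshold (alpha_TH), while the norm of the displacement is compared with the displacement threshold (t_TH). The function names, the use of radians, and the example threshold values are assumptions for illustration; the disclosure does not specify how the angle is resolved.

```python
import math

def rotation_angle(R):
    """Rotation angle in radians of a 3x3 rotation matrix, from its trace."""
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to [-1, 1] to guard against small numerical drift.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)

def is_in_motion(R, t, alpha_th, t_th):
    """True if both the rotation angle and the displacement norm exceed
    their thresholds (alpha_th in radians, t_th in the unit of t)."""
    displacement = math.sqrt(sum(c * c for c in t))
    return rotation_angle(R) > alpha_th and displacement > t_th

# Example: 30-degree rotation about the z axis and 0.5 units of translation.
a = math.radians(30.0)
R_example = [[math.cos(a), -math.sin(a), 0.0],
             [math.sin(a),  math.cos(a), 0.0],
             [0.0,          0.0,         1.0]]
moving = is_in_motion(R_example, (0.5, 0.0, 0.0), math.radians(10.0), 0.2)
```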

According to some embodiments, determining the plurality of reference images taken by the mobile terminal in the first scene may further comprise: receiving a plurality of candidate images shot by the mobile terminal in a first scene and information of included angles between the mobile terminal and the ground in the process of shooting the candidate images; and determining the candidate images as the reference images in response to the fact that the included angle information indicates that the included angles between the mobile terminal and the ground in the process of shooting the candidate images are all larger than a first preset threshold value.

When the user intends to perform visual positioning, the mobile terminal is held at a certain angle to the ground, so that the reference image it captures contains object points in the first scene that can be used for positioning. If the mobile terminal were instead approximately parallel to the ground, the captured image would contain only what is overhead or the ground itself. Therefore, whether the received candidate images are reference images can be verified based on the information of the included angles between the mobile terminal and the ground in the process of capturing the candidate images, and the interference of non-reference images with the map update can be eliminated.

In one embodiment, in the process of capturing N+1 candidate images C_k ~ C_{k+N} between time k and time k+N, the included angle information of the mobile terminal may be expressed as:

[θ_k, …, θ_{k+N}]

where θ_{k+i} represents the included angle between the plane of the mobile terminal and the ground when the candidate image C_{k+i} is captured. If all of the included angles θ_k, …, θ_{k+N} are greater than the threshold θ_TH, the plurality of candidate images are determined as the plurality of reference images.
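The included-angle check above amounts to requiring that every angle in the sequence exceeds the angle threshold. A minimal sketch follows; the function name, the use of degrees, and the example values are illustrative assumptions.

```python
def all_angles_above(angles, theta_th):
    """True if the sequence is non-empty and every included angle between
    the terminal plane and the ground exceeds the threshold theta_th."""
    return len(angles) > 0 and all(theta > theta_th for theta in angles)

# Example with hypothetical angles in degrees and a 30-degree threshold.
accepted = all_angles_above([62.0, 58.5, 71.0], 30.0)
rejected = all_angles_above([62.0, 12.0, 71.0], 30.0)
```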

With respect to step S302, according to some embodiments, the at least one reference image is two or more reference images continuously captured by the mobile terminal, and the method further includes: before determining the pose information of the mobile terminal when capturing the at least one reference image, determining whether the at least one reference image matches the visual map. Determining that at least one of the plurality of reference images does not match the visual map may include: determining a degree of match of each of the at least one reference image with the visual map; and determining that the at least one reference image does not match the visual map in response to the proportion of reference images, among the at least one reference image, whose matching degree satisfies a preset condition being less than a second preset threshold.

In the above steps, by determining whether the plurality of reference images match the visual map, it can be determined whether the first scene has changed from the currently stored visual map, and then it is determined whether the update of the visual map needs to be performed. However, for each individual reference image, if the reference image does not match the visual map due to poor shooting angle, poor image quality, etc., a misjudgment of the first scene change may be caused, and thus the update of the visual map may be performed erroneously. Therefore, the occurrence of the above-described erroneous judgment can be avoided by taking two or more reference images obtained by continuous shooting as a whole and taking the ratio of the reference image whose matching degree with the visual map satisfies the preset condition as the basis of the judgment.

In particular, the second preset threshold may be 50%.

According to some embodiments, the degree of matching satisfying the preset condition may include the degree of matching being greater than a third preset threshold. In this way, whether each reference image matches the visual map can be conveniently determined.
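The group-level decision described above can be sketched as follows: count the reference images whose matching degree exceeds the third preset threshold, and declare a mismatch when their proportion falls below the second preset threshold (50% in the example given in the text). The function and parameter names are hypothetical, and the example matching degrees are made up for illustration.

```python
def reference_images_mismatch(match_degrees, degree_th, ratio_th=0.5):
    """True if the proportion of images whose matching degree exceeds
    degree_th (third preset threshold) is below ratio_th (second preset
    threshold), i.e. the group is judged not to match the visual map."""
    if not match_degrees:
        raise ValueError("at least one matching degree is required")
    matched = sum(1 for d in match_degrees if d > degree_th)
    return matched / len(match_degrees) < ratio_th

# Example: only 1 of 4 consecutive images matches, so the group mismatches.
mismatch = reference_images_mismatch([0.9, 0.2, 0.1, 0.3], degree_th=0.6)
```

Judging the group as a whole means that a single poorly captured image does not trigger a map update on its own.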

According to some embodiments, the visual map is constructed based on a plurality of scene pictures previously captured in the first scene, and wherein determining a degree of match of each of the at least one reference images with the visual map may comprise: for each of at least one reference image, determining a similarity of the reference image to each of a first number of scene pictures that are most similar among a plurality of scene pictures; and determining an average of the similarity of the reference image and each of the first number of scene pictures as a matching degree of the reference image and the visual map.

The scene pictures are two-dimensional pictures used for constructing a visual map. Therefore, the embodiment can simplify the judgment of the matching degree between the two-dimensional reference image and the three-dimensional visual map into the judgment of the similarity between the two-dimensional reference image and a plurality of two-dimensional scene pictures, and reduces the processing difficulty and the requirement on computing resources in the processor.

According to some embodiments, determining the similarity of the reference image to each of the most similar first number of scene pictures in the plurality of scene pictures may comprise: extracting feature information of a reference image and feature information of each of a first number of scene pictures; and determining a similarity of the reference image to each of the first number of scene pictures based on the feature information of the reference image and the feature information of each of the first number of scene pictures.

Therefore, determining the similarity between the reference image and a scene picture can be further simplified into determining the similarity between the feature information of the reference image and the feature information of the scene picture, which further reduces the processing difficulty and the requirement on computing resources in the processor.

According to some embodiments, the feature information of the reference image and the feature information of each of the first number of scene pictures may be extracted through a neural network, which is not described herein again.
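The matching degree described above, namely the average similarity between the reference image and its most similar first number of scene pictures, can be sketched as below. Cosine similarity over feature vectors is an assumption made for illustration; the disclosure does not name a specific similarity measure, and the feature vectors here are toy values rather than neural network outputs.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def matching_degree(ref_feature, scene_features, first_number):
    """Mean similarity between the reference image and its first_number
    most similar scene pictures."""
    sims = sorted((cosine_similarity(ref_feature, f) for f in scene_features),
                  reverse=True)
    top = sims[:first_number]
    if not top:
        raise ValueError("need at least one scene picture and first_number >= 1")
    return sum(top) / len(top)

# Example with toy 2-D features and first_number = 2.
degree = matching_degree([1.0, 0.0],
                         [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                         first_number=2)
```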

According to some embodiments, the mobile terminal may include an inertial measurement system, and the pose information of the mobile terminal when capturing each of the at least one reference image is obtained by the inertial measurement system.

Specifically, the reference pose information is the pose information determined the last time the mobile terminal successfully achieved visual positioning through a reference image. The pose information of the mobile terminal when capturing each of the at least one reference image may then be determined based on this reference pose information and the relative pose information measured by the inertial measurement system since that last successful visual positioning. The relative pose information measured by the inertial measurement system may include relative rotation information and relative displacement information.

Fig. 4 shows a schematic diagram of determining pose information of a mobile terminal by an inertial measurement system according to an exemplary embodiment of the present disclosure.

As shown in Fig. 4, the mobile terminal successfully performs visual localization using reference images at times t_k, t_{k+1} and t_{k+2}, and the corresponding pose information T^k, T^{k+1} and T^{k+2} is determined respectively. However, during a period between any two of these times, for example at times t_l and t_{l+1} between time t_k and time t_{k+1}, the reference images captured by the mobile terminal do not match the visual map, so the mobile terminal cannot acquire pose information through visual localization.

In this case, the pose information of the mobile terminal at times t_l and t_{l+1} can be determined from the relative pose information provided by the inertial measurement system of the mobile terminal. For example, by taking the pose information T^k at time t_k as the reference pose information and superimposing on it the relative pose information of the mobile terminal between time t_k and time t_l, as provided by the inertial measurement system, the pose information of the mobile terminal at time t_l can be determined.
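The superposition of relative pose information on the reference pose can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: poses are assumed to be 4x4 homogeneous matrices mapping terminal coordinates to world coordinates, the relative rotation and displacement are assumed to be expressed in the terminal frame at the reference time, and the names `make_pose` and `propagate_pose` are hypothetical.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous pose from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = np.asarray(rotation, dtype=float)
    pose[:3, 3] = np.asarray(translation, dtype=float)
    return pose


def propagate_pose(reference_pose, relative_rotation, relative_displacement):
    """Superimpose an IMU-measured relative pose on the reference pose.

    reference_pose: pose from the last successful visual localization (T^k).
    relative_rotation, relative_displacement: motion measured by the inertial
    measurement system between the reference time and the capture time,
    assumed to be expressed in the terminal frame at the reference time.
    """
    return reference_pose @ make_pose(relative_rotation, relative_displacement)
```

For example, with a reference pose at world position (1, 2, 3) and identity orientation, a measured displacement of one unit along the terminal's x-axis places the terminal at (2, 2, 3). Because inertial measurements drift over time, the estimate is most reliable shortly after the last successful visual localization.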

According to some embodiments, updating the visual map based on at least the at least one reference image and the pose information corresponding to the at least one reference image may include: for a first reference image of the at least one reference image, in response to determining that a first feature point in the first reference image and a second feature point in a second reference image of the plurality of reference images other than the first reference image are both from a same object point in the first scene, updating the visual map based on coordinates of the first feature point in the first reference image, coordinates of the second feature point in the second reference image, pose information corresponding to the first reference image, and pose information corresponding to the second reference image.

It can be understood that the first feature point in the first reference image and the second feature point in the second reference image are image points corresponding to object points for positioning in the first scene, which are extracted from the first reference image and the second reference image, respectively. The first feature point and the second feature point respectively have corresponding description information, and the description information may be gray scale or color gradient change around a specific point in the image, and may be represented in a vector form. By comparing the description information corresponding to the first feature point and the description information corresponding to the second feature point, it can be determined whether the first feature point and the second feature point are from the same object point in the first scene.
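The comparison of description information can be sketched as follows. The sketch assumes that each feature point carries a fixed-length descriptor vector, that descriptors are compared by Euclidean distance, and that two feature points are treated as coming from the same object point when they are mutual nearest neighbours within a distance threshold. These criteria, the function name `match_descriptors` and the parameter `max_distance` are assumptions for illustration; the disclosure only states that the description information is compared.

```python
import numpy as np


def match_descriptors(desc_a, desc_b, max_distance):
    """Return index pairs (i, j) of feature points judged to share an object point.

    desc_a, desc_b: arrays of shape (n, d) and (m, d) holding the descriptor
    vectors of the feature points in the first and second reference image.
    max_distance: largest descriptor distance still accepted as a match.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    if desc_a.size == 0 or desc_b.size == 0:
        return []
    # Pairwise Euclidean distances between all descriptors, shape (n, m).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i in range(dists.shape[0]):
        j = int(np.argmin(dists[i]))
        # Keep the pair only if it is close enough and the choice is mutual.
        if dists[i, j] <= max_distance and int(np.argmin(dists[:, j])) == i:
            matches.append((i, j))
    return matches
```

The mutual check discards ambiguous pairs, which matters here because a wrong correspondence would lead to a wrong object-point coordinate being written into the visual map.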

According to some embodiments, the second reference image may be preferentially determined from the reference images that are adjacent to the first reference image in shooting time among the plurality of reference images. Because reference images that are close in shooting time overlap more in picture content, a second reference image that satisfies the condition is easier to find, so that the first feature point in the first reference image can be matched with a second feature point in the second reference image.
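One simple way to realize this preference is to order the other reference images by how close their shooting times are to that of the first reference image and to try them in that order. The sketch below assumes that each reference image has a numeric timestamp; the function name `candidates_by_time_adjacency` is hypothetical.

```python
def candidates_by_time_adjacency(first_index, timestamps):
    """Order the other reference images by closeness in shooting time.

    first_index: index of the first reference image.
    timestamps: shooting time of every reference image, in seconds.
    Returns the indices of all other images, nearest in time first.
    """
    others = [i for i in range(len(timestamps)) if i != first_index]
    return sorted(others, key=lambda i: abs(timestamps[i] - timestamps[first_index]))
```

A caller would walk through the returned indices and stop at the first image in which a matching second feature point is found.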

According to some embodiments, updating the visual map based on the coordinates of the first feature point in the first reference image, the coordinates of the second feature point in the second reference image, the pose information corresponding to the first reference image, and the pose information corresponding to the second reference image may include: determining the coordinates, in a world coordinate system, of the object point from which both the first feature point and the second feature point originate, based on the coordinates of the first feature point in the first reference image, the coordinates of the second feature point in the second reference image, the pose information corresponding to the first reference image and the pose information corresponding to the second reference image; and updating the visual map based on the coordinates of the object point in the world coordinate system. Therefore, the visual map can be updated in a timely and targeted manner, so that AR positioning can be carried out effectively.

According to some embodiments, the coordinates in the world coordinate system of the object point corresponding to the first feature point and the second feature point may be determined using the parallax principle. The mobile terminal captures image points of the same object point from different positions, namely the first feature point in the first reference image and the second feature point in the second reference image, and the coordinates of the object point in the world coordinate system can be determined from the positional deviation between the first feature point and the second feature point.

In one embodiment, the coordinates of the object point in the world coordinate system may be determined by:

determining an intrinsic parameter matrix K of the camera in the mobile terminal, wherein the intrinsic parameter matrix K can be expressed as:

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

wherein f_x is the focal length in the x-axis direction, f_y is the focal length in the y-axis direction, and c_x, c_y are the offsets of the optical axis from the coordinate center of the projection plane.

Representing the pose information of the first reference image as T_A, the pose information of the second reference image as T_B, the coordinates of the first feature point in the first reference image as [u_A, v_A], and the coordinates of the second feature point in the second reference image as [u_B, v_B], a first projection matrix P_A for the first reference image and a second projection matrix P_B for the second reference image can be expressed respectively as:

P_A = K·T_A

P_B = K·T_B

Constructing a linear homogeneous system of equations:

$$\begin{bmatrix} u_A P_A^{(3)} - P_A^{(1)} \\ v_A P_A^{(3)} - P_A^{(2)} \\ u_B P_B^{(3)} - P_B^{(1)} \\ v_B P_B^{(3)} - P_B^{(2)} \end{bmatrix} X = 0$$

wherein P_A^{(1)}, P_A^{(2)} and P_A^{(3)} respectively represent rows 1, 2 and 3 of the matrix P_A, and P_B^{(1)}, P_B^{(2)} and P_B^{(3)} respectively represent rows 1, 2 and 3 of the matrix P_B.

Solving the linear homogeneous system of equations for X = [x, y, z, w]^T, the coordinates O in the world coordinate system of the object point from which both the first feature point and the second feature point originate can be obtained:

$$O = \begin{bmatrix} x/w & y/w & z/w \end{bmatrix}^T$$
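The triangulation steps above can be sketched in Python with NumPy. This is a minimal sketch assuming the standard direct linear transform formulation, in which each image point [u, v] contributes the two rows u·P3 − P1 and v·P3 − P2 of the homogeneous system (P1, P2, P3 being the rows of the projection matrix), and the system is solved by singular value decomposition. It further assumes that T_A and T_B are 3x4 world-to-camera matrices [R | t], so that K·T is a 3x4 projection matrix. The function names `triangulate` and `project` are hypothetical.

```python
import numpy as np


def project(K, T, point):
    """Project a 3D world point to pixel coordinates with P = K @ T."""
    p = K @ T @ np.append(np.asarray(point, dtype=float), 1.0)
    return p[:2] / p[2]


def triangulate(K, T_A, T_B, uv_A, uv_B):
    """Recover the world coordinates of an object point from two views.

    K: 3x3 intrinsic matrix; T_A, T_B: assumed 3x4 world-to-camera poses;
    uv_A, uv_B: pixel coordinates of the matched feature points.
    """
    P_A = K @ T_A
    P_B = K @ T_B
    u_A, v_A = uv_A
    u_B, v_B = uv_B
    # Each view contributes two rows of the linear homogeneous system A X = 0.
    A = np.stack([
        u_A * P_A[2] - P_A[0],
        v_A * P_A[2] - P_A[1],
        u_B * P_B[2] - P_B[0],
        v_B * P_B[2] - P_B[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    # Dehomogenize; X[3] approaches zero only for points near infinity.
    return X[:3] / X[3]
```

With noise-free observations the recovered point agrees with the true object point up to numerical precision. With real measurements the two viewing rays do not intersect exactly, and the singular value decomposition returns a least-squares solution, so a larger baseline between the two shooting positions generally gives a more stable result.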

Fig. 5 shows a block diagram of an augmented reality-based visual map updating apparatus according to an exemplary embodiment of the present disclosure, the apparatus 500 including: a first determining unit 501 configured to determine a plurality of reference images captured by the mobile terminal in a first scene, wherein the first scene has a corresponding visual map; a second determining unit 502 configured to determine pose information of the mobile terminal when capturing the at least one reference image in response to determining that the at least one reference image of the plurality of reference images does not match the visual map; and an updating unit 503 configured to update the visual map based on at least the at least one reference image and pose information corresponding to the at least one reference image.

According to some embodiments, the mobile terminal comprises an inertial measurement system, and the pose information of the mobile terminal when capturing the at least one reference image is obtained by the inertial measurement system.

According to some embodiments, the first determining unit comprises: a first receiving subunit configured to receive a plurality of candidate images captured by the mobile terminal in the first scene and motion information of the mobile terminal during the capture of the plurality of candidate images, wherein the motion information is obtained by the inertial measurement system; and a first determining subunit configured to determine the plurality of candidate images as the plurality of reference images in response to determining that the motion information indicates that the mobile terminal is in a motion state.

According to some embodiments, the first determining unit further comprises: a second receiving subunit configured to receive a plurality of candidate images captured by the mobile terminal in the first scene and included angle information between the mobile terminal and the ground during the capture of the plurality of candidate images; and a second determining subunit configured to determine the plurality of candidate images as the plurality of reference images in response to determining that the included angle information indicates that the included angles between the mobile terminal and the ground during the capture of the plurality of candidate images are all larger than a first preset threshold.

According to some embodiments, the at least one reference image is two or more reference images captured by the mobile terminal continuously, and wherein the apparatus further comprises a third determining unit comprising: a third determining subunit configured to determine a degree of matching of each of the at least one reference image with the visual map; and a fourth determining subunit configured to determine that the at least one reference image does not match the visual map in response to a proportion of reference images, of the at least one reference image, for which a matching degree satisfies a preset condition being less than a second preset threshold.

According to some embodiments, the visual map is constructed based on a plurality of scene pictures previously captured in the first scene, and wherein the third determining subunit comprises: a sub-unit for determining, for each of the at least one reference image, a similarity of the reference image to each of a first number of scene pictures that are most similar among the plurality of scene pictures; and a subunit for determining an average of the similarity of the reference image to each of the first number of scene pictures as a degree of matching of the reference image to the visual map.

According to some embodiments, the updating unit comprises: an updating subunit configured to update, for a first reference image of the at least one reference image, the visual map based on the coordinates of the first feature point in the first reference image, the coordinates of the second feature point in the second reference image, the pose information corresponding to the first reference image, and the pose information corresponding to the second reference image, in response to determining that both the first feature point in the first reference image and the second feature point in a second reference image of the plurality of reference images other than the first reference image are from the same object point in the first scene.

According to some embodiments, the update subunit comprises: a subunit, configured to determine coordinates of an object point in a world coordinate system from which both the first feature point and the second feature point are derived, based on the coordinates of the first feature point in the first reference image, the coordinates of the second feature point in the second reference image, pose information corresponding to the first reference image, and pose information corresponding to the second reference image; and a subunit for updating the visual map based on the coordinates of the object point in the world coordinate system.

The present disclosure also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform any one of the methods described above.

The present disclosure also provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform any one of the methods described above.

The present disclosure also provides a computer program product comprising a computer program, wherein the computer program realizes any of the methods described above when executed by a processor.

A block diagram of the structure of an electronic device 600 will now be described with reference to Fig. 6. The electronic device 600 may be a server or a client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure may be applied. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in Fig. 6, the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 can also store various programs and data necessary for the operation of the electronic device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Various components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, an output unit 607, a storage unit 608, and a communication unit 609. The input unit 606 may be any type of device capable of inputting information to the electronic device 600. The input unit 606 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. The output unit 607 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 608 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and/or chipsets, such as Bluetooth(TM) devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

The computing unit 601 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the map updating method. For example, in some embodiments, the map updating method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the map updating method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the map updating method in any other suitable way (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combining a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely exemplary embodiments or examples and that the scope of the present invention is not limited by these embodiments or examples, but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.

Claims

1. An augmented reality based visual map updating method comprising:

determining a plurality of reference images shot by a mobile terminal in a first scene, wherein the first scene has a corresponding visual map;

in response to determining that at least one of the plurality of reference images does not match the visual map, determining pose information of the mobile terminal when capturing the at least one reference image; and

updating the visual map based at least on the at least one reference image and the pose information corresponding to the at least one reference image.

2. The method according to claim 1, wherein the mobile terminal comprises an inertial measurement system, and the pose information of the mobile terminal when capturing the at least one reference image is derived by the inertial measurement system.

3. The method of claim 2, wherein the determining the plurality of reference images taken by the mobile terminal in the first scene comprises:

receiving a plurality of candidate images shot by the mobile terminal in a first scene and motion information of the mobile terminal in the process of shooting the candidate images, wherein the motion information is obtained through the inertial measurement system; and

determining the plurality of candidate images as the plurality of reference images in response to determining that the motion information indicates that the mobile terminal is in a motion state.

4. The method of claim 3, wherein the motion information comprises at least one of displacement information and rotation information.

5. The method of any of claims 1-4, wherein the determining the plurality of reference images taken by the mobile terminal in the first scene further comprises:

receiving a plurality of candidate images shot by the mobile terminal in a first scene and information of included angles between the mobile terminal and the ground in the process of shooting the candidate images; and

determining the plurality of candidate images as the plurality of reference images in response to determining that the included angle information indicates that the included angles between the mobile terminal and the ground in the process of shooting the plurality of candidate images are all larger than a first preset threshold.

6. The method according to any one of claims 1 to 5, wherein the at least one reference image is two or more reference images continuously captured by the mobile terminal, the method further comprising:

before determining pose information of the mobile terminal when the at least one reference image is captured, determining whether the at least one reference image matches the visual map by:

determining a degree of match of each of the at least one reference image with the visual map; and

in response to the proportion of reference images, of the at least one reference image, whose matching degree satisfies a preset condition being smaller than a second preset threshold, determining that the at least one reference image does not match the visual map.

7. The method of claim 6, wherein the matching degree satisfying a preset condition comprises the matching degree being greater than a third preset threshold.

8. The method of claim 6 or 7, wherein the visual map is constructed based on a plurality of scene pictures previously captured in the first scene, and wherein the determining a degree of match of each of the at least one reference image with the visual map comprises:

for each of the at least one reference image, determining a similarity of the reference image to each of a first number of scene pictures that are most similar among the plurality of scene pictures; and

determining an average of the similarity of the reference image and each of the first number of scene pictures as a degree of matching of the reference image with the visual map.

9. The method of claim 8, wherein the determining the similarity of the reference image to each of the most similar first number of the plurality of scene pictures comprises:

extracting feature information of the reference image and feature information of each of the first number of scene pictures; and

determining a similarity of the reference image to each of the first number of scene pictures based on the feature information of the reference image and the feature information of each of the first number of scene pictures.

10. The method of any of claims 1 to 9, wherein the updating the visual map based at least on the at least one reference image and pose information to which the at least one reference image corresponds comprises:

for a first reference image of the at least one reference image, in response to determining that both a first feature point in the first reference image and a second feature point in a second reference image of the plurality of reference images other than the first reference image are from the same object point in the first scene, updating the visual map based on coordinates of the first feature point in the first reference image, coordinates of the second feature point in the second reference image, pose information corresponding to the first reference image, and pose information corresponding to the second reference image.

11. The method of claim 10, wherein the updating the visual map based on the coordinates of the first feature point in the first reference image, the coordinates of the second feature point in the second reference image, pose information corresponding to the first reference image, and pose information corresponding to the second reference image comprises:

determining coordinates of object points corresponding to the first feature point and the second feature point in a world coordinate system based on the coordinates of the first feature point in the first reference image, the coordinates of the second feature point in the second reference image, the pose information corresponding to the first reference image and the pose information corresponding to the second reference image; and

updating the visual map based on coordinates of the object point in a world coordinate system.

12. An augmented reality based visual map updating apparatus comprising:

a first determining unit configured to determine a plurality of reference images captured by a mobile terminal in a first scene, wherein the first scene has a corresponding visual map;

a second determining unit configured to determine pose information of the mobile terminal when capturing at least one reference image of the plurality of reference images in response to determining that the at least one reference image does not match the visual map; and

an updating unit configured to update the visual map based on at least the at least one reference image and pose information corresponding to the at least one reference image.

13. The apparatus of claim 12, wherein the mobile terminal comprises an inertial measurement system, and pose information of the mobile terminal when capturing the at least one reference image is derived by the inertial measurement system.

14. The apparatus of claim 13, wherein the first determining unit comprises:

a first receiving subunit configured to receive a plurality of candidate images captured by the mobile terminal in the first scene and motion information of the mobile terminal during the capture of the plurality of candidate images, wherein the motion information is obtained by the inertial measurement system; and

a first determining subunit configured to determine the plurality of candidate images as the plurality of reference images in response to determining that the motion information indicates that the mobile terminal is in a motion state.

15. The apparatus of any of claims 12 to 14, wherein the first determining unit further comprises:

a second receiving subunit configured to receive a plurality of candidate images captured by the mobile terminal in the first scene and included angle information between the mobile terminal and the ground during the capture of the plurality of candidate images; and

a second determining subunit configured to determine the plurality of candidate images as the plurality of reference images in response to determining that the included angle information indicates that the included angles between the mobile terminal and the ground during the capture of the plurality of candidate images are all larger than a first preset threshold.

16. The apparatus according to any one of claims 12 to 15, wherein the at least one reference image is two or more reference images continuously captured by the mobile terminal, and wherein the apparatus further comprises a third determining unit comprising:

a third determining subunit configured to determine a degree of matching of each of the at least one reference image with the visual map; and

a fourth determining subunit configured to determine that the at least one reference image does not match the visual map in response to a proportion of reference images, of the at least one reference image, for which a matching degree satisfies a preset condition being less than a second preset threshold.

17. The apparatus of claim 16, wherein the visual map is constructed based on a plurality of scene pictures pre-captured in the first scene, and wherein the third determining subunit comprises:

a subunit configured to determine, for each of the at least one reference image, a similarity between the reference image and each of a first number of scene pictures that are, among the plurality of scene pictures, most similar to the reference image; and

a subunit configured to determine an average of the similarities between the reference image and each of the first number of scene pictures as the degree of matching of the reference image with the visual map.
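
The matching degree of claim 17 might be computed along the following lines. The disclosure does not fix a similarity measure here, so this sketch assumes that each image is summarized by a global descriptor vector and that similarity is cosine similarity; the names `cosine_similarity`, `matching_degree`, and `k` are hypothetical.

```python
import heapq
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two descriptor vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_degree(
    ref_descriptor: Sequence[float],
    scene_descriptors: Sequence[Sequence[float]],
    k: int,
) -> float:
    """Average similarity between a reference image and its k most similar scene pictures.

    k stands in for the "first number" of scene pictures.
    """
    if k <= 0 or len(scene_descriptors) == 0:
        raise ValueError("need k > 0 and at least one scene picture")
    similarities = [cosine_similarity(ref_descriptor, s) for s in scene_descriptors]
    # Keep only the k highest similarities (fewer if there are fewer pictures).
    top_k = heapq.nlargest(k, similarities)
    return sum(top_k) / len(top_k)
```

Averaging over the most similar pictures only, instead of over every scene picture, means that pictures of unrelated parts of the first scene do not pull the matching degree down.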

18. The apparatus according to any one of claims 12 to 17, wherein the updating unit comprises:

an updating subunit configured to, for a first reference image of the at least one reference image, in response to determining that a first feature point in the first reference image and a second feature point in a second reference image, of the plurality of reference images, other than the first reference image both correspond to the same object point in the first scene, update the visual map based on the coordinates of the first feature point in the first reference image, the coordinates of the second feature point in the second reference image, the pose information corresponding to the first reference image, and the pose information corresponding to the second reference image.

19. The apparatus of claim 18, wherein the updating subunit comprises:

a subunit configured to determine coordinates, in a world coordinate system, of the object point corresponding to the first feature point and the second feature point based on the coordinates of the first feature point in the first reference image, the coordinates of the second feature point in the second reference image, the pose information corresponding to the first reference image, and the pose information corresponding to the second reference image; and

a subunit configured to update the visual map based on the coordinates of the object point in the world coordinate system.
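
One common way to obtain world coordinates of an object point from two matched feature points and two poses is linear (DLT) triangulation. The sketch below uses that technique as an illustration. It is not stated by the disclosure to be the method used. It assumes a pinhole camera with known intrinsic matrix `K`, and that each pose is given as a world-to-camera rotation `R` and translation `t`. All names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def projection_matrix(K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build the 3x4 projection matrix P = K [R | t] from intrinsics and a pose.

    Assumption: (R, t) maps world coordinates to camera coordinates.
    """
    return K @ np.hstack([R, t.reshape(3, 1)])


def triangulate(p1, p2, P1: np.ndarray, P2: np.ndarray) -> np.ndarray:
    """Linear triangulation of one object point from two pixel observations.

    p1, p2: (u, v) pixel coordinates of the matched feature points.
    P1, P2: projection matrices of the two reference images.
    Returns the object point as a 3-vector in world coordinates.
    """
    u1, v1 = p1
    u2, v2 = p2
    # Each observation contributes two linear constraints on the homogeneous point.
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

In such a sketch, the returned point would then be added to the point set of the visual map, or used to replace an outdated point. How the map stores points is not assumed here.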

20. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.

21. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-11.

22. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-11.