CN116679824A - Man-machine interaction method and device in augmented reality AR scene and electronic equipment - Google Patents

Man-machine interaction method and device in augmented reality AR scene and electronic equipment

Info

Publication number
CN116679824A
CN116679824A
Authority
CN
China
Prior art keywords
pose
virtual object
terminal
real scene
coordinate system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210168435.1A
Other languages
Chinese (zh)
Inventor
王润之
冯思淇
李江伟
时天欣
徐其超
林涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210168435.1A priority Critical patent/CN116679824A/en
Priority to PCT/CN2022/134830 priority patent/WO2023160072A1/en
Publication of CN116679824A publication Critical patent/CN116679824A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the application provides a man-machine interaction method, a device and electronic equipment in an Augmented Reality (AR) scene. In the method, a real scene can be shot, the shot real scene is displayed on an interface of a terminal, and an identification of a virtual object to be selected is also displayed on the interface; in response to the operation of the user on the identification of the first virtual object, the first virtual object is displayed in the real scene shot by the terminal; and in response to the operation of the user on the identification of the second virtual object, the second virtual object is displayed in the real scene in which the first virtual object is displayed. The embodiment of the application can enrich man-machine interaction modes in the augmented reality AR scene and enrich the styles for displaying virtual objects in the AR scene.

Description

Man-machine interaction method and device in augmented reality AR scene and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of Augmented Reality (AR), in particular to a man-machine interaction method, a man-machine interaction device and electronic equipment in an Augmented Reality (AR) scene.
Background
With the continuous enhancement of the processing and rendering capabilities of terminals, applications based on augmented reality (augmented reality, AR) technology are increasing, such as AR measurement applications or AR props.
Currently, the styles in which virtual objects can be displayed in an augmented reality AR scene are limited.
Disclosure of Invention
The embodiment of the application provides a man-machine interaction method, a man-machine interaction device and electronic equipment in an Augmented Reality (AR) scene, which can enrich the styles of displaying virtual objects in the AR scene.
In a first aspect, an embodiment of the present application provides a human-computer interaction method in an AR scene, where an execution body for executing the method may be a terminal or a chip in the terminal, and the following embodiment is described by taking the terminal as an example.
In this method, in one embodiment, the terminal may photograph a real scene, and the photographed real scene is displayed on an interface of the terminal, on which an identification of a virtual object to be selected is also displayed. The terminal may display the first virtual object in a real scene photographed by the terminal in response to an operation of a user for identification of the first virtual object, and display the second virtual object in a real scene in which the first virtual object has been displayed in response to an operation of the user for identification of the second virtual object.
In the embodiment of the application, the user can repeatedly operate the identifications of virtual objects, so that a plurality of virtual objects are displayed in the real scene shot by the terminal, and the plurality of virtual objects can be the same or different. In the embodiment of the application, the user is not restricted by the one-time-use limitation of an AR prop, and virtual objects can be displayed at corresponding positions of the real scene, which can enrich the styles of displaying virtual objects in the AR scene, enrich the man-machine interaction modes in the AR scene, and improve user experience.
In the method, in one embodiment, a terminal may photograph a real scene and display the photographed real scene on an interface of the terminal. And a preset position exists on the interface of the terminal, and the corresponding position of the real scene corresponding to the preset position is used for displaying the virtual object. The preset position may be at least one.
In this embodiment, if the virtual object is preset, when the terminal displays the photographed real scene, the virtual object may be displayed at a corresponding position of the real scene corresponding to the preset position. The first virtual object is displayed at a third position of a real scene photographed by the terminal, and the second virtual object is displayed at a fourth position of the real scene photographed by the terminal. The second virtual object may be the same as or different from the first virtual object. In one embodiment, the third location of the real scene may be a location in the real scene corresponding to the first location, and the fourth location of the real scene may be a location in the real scene corresponding to the second location.
In this embodiment, if the virtual object is set by the user, when the terminal displays the photographed real scene, the identifier of the virtual object to be selected may be displayed on the interface of the terminal. The terminal can respond to the operation of the user on the identification of the first virtual object, display the first virtual object at the corresponding position of the real scene corresponding to the preset position, and respond to the operation of the user on the identification of the second virtual object, and display the second virtual object at the corresponding position of the real scene corresponding to the preset position.
Or, the terminal may display the first virtual object at a corresponding position of the real scene corresponding to one preset position in response to the operation of the user on the identification of the first virtual object, and display the second virtual object at a corresponding position of the real scene corresponding to another preset position in response to the operation of the user on the identification of the second virtual object.
In the method, in one embodiment, a terminal may photograph a real scene and display the photographed real scene on an interface of the terminal. The user can also control the terminal to display the first virtual object at one position of the photographed real scene and display the second virtual object at another position of the photographed real scene in a voice interaction mode.
In the following embodiments, a procedure in which a terminal displays a first virtual object in a photographed real scene and displays a second virtual object in a real scene in which the first virtual object has been displayed is described:
Taking the display of the first virtual object in the photographed real scene as an example: in response to the operation of the user on the identification of the first virtual object, the terminal obtains a first pose of the terminal, obtains a first mapping point of a first position on a first virtual plane, and obtains a three-dimensional coordinate of the first mapping point in a camera coordinate system. In this way, the terminal may display the first virtual object in the real scene shot by the terminal according to the first pose and the three-dimensional coordinates of the first mapping point in the camera coordinate system. In one embodiment, the first position may be preset or determined by the user's operation.
Specifically, the terminal may send the first pose and the three-dimensional coordinates of the first mapping point in the camera coordinate system for display, so that the terminal displays the first virtual object in the real scene shot by the terminal.
In a possible implementation manner, a preset virtual plane set is preset in the embodiment of the present application, where the set may include, for example, a first virtual plane, a second virtual plane, and a third virtual plane. The first virtual plane may be perpendicular to the second virtual plane and the third virtual plane, respectively, and the second virtual plane and the third virtual plane may be perpendicular to each other. Any one of the first virtual plane, the second virtual plane, and the third virtual plane may be parallel to the ground (or horizontal plane).
In one embodiment, the terminal may acquire the two-dimensional coordinate of the first position in the image coordinate system, and if an intersection exists between the ray from the first pose to the two-dimensional coordinate and the first virtual plane, take that intersection as the first mapping point.
In one embodiment, if the ray from the first pose to the two-dimensional coordinate has no intersection point with the first virtual plane, the terminal acquires the intersection point of the ray from the first pose to the two-dimensional coordinate with another virtual plane in the preset virtual plane set, and takes the intersection point of the ray with that other virtual plane as the first mapping point.
In the embodiment of the application, the virtual planes can be preset, so that the operation of scanning a plane in advance when a user uses the AR function can be avoided, the operation of the user is simplified, and the efficiency is improved. In addition, by setting a plurality of different virtual planes, it can be ensured that the ray from the first pose to the two-dimensional coordinate intersects at least one virtual plane, so that the virtual object is generated smoothly and the generation accuracy of the virtual object is improved.
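As a concrete illustration of this step, a minimal sketch is given below (an assumption for illustration, not taken from the patent; the plane parameters, function names, and use of numpy are illustrative). The tapped pixel is back-projected into a ray from the optical center, and the ray is intersected with the preset virtual planes in priority order:

```python
# Minimal sketch (assumption, not the patent's implementation): intersect the ray
# from the camera optical center through the tapped pixel with preset virtual planes.
import numpy as np

def pixel_to_ray(K, u, v):
    """Back-project a pixel into a viewing direction in the camera frame (assumed intrinsics K)."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)

def ray_plane_intersection(origin, direction, plane_point, plane_normal, eps=1e-6):
    """Return the 3D intersection of the ray with an infinite plane, or None if there is none."""
    denom = float(np.dot(plane_normal, direction))
    if abs(denom) < eps:                      # ray parallel to the plane
        return None
    t = float(np.dot(plane_normal, plane_point - origin)) / denom
    if t <= 0:                                # intersection behind the camera
        return None
    return origin + t * direction

def first_mapping_point(origin, direction, virtual_planes):
    """Try the preset virtual planes in priority order and return the first valid intersection."""
    for plane_point, plane_normal in virtual_planes:
        hit = ray_plane_intersection(origin, direction, plane_point, plane_normal)
        if hit is not None:
            return hit                        # three-dimensional coordinates in the camera frame
    return None

# Example plane set (values are illustrative only): one plane roughly parallel to the
# ground and two planes perpendicular to it and to each other.
planes = [
    (np.array([0.0, -1.0, 0.0]), np.array([0.0, 1.0, 0.0])),   # "first" virtual plane
    (np.array([0.0,  0.0, 3.0]), np.array([0.0, 0.0, 1.0])),   # "second" virtual plane
    (np.array([1.5,  0.0, 0.0]), np.array([1.0, 0.0, 0.0])),   # "third" virtual plane
]
```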
In one possible implementation manner, in response to the operation of the user on the identification of the first virtual object, the terminal may quickly generate the first virtual object according to the first pose and the three-dimensional coordinates of the first mapping point in the camera coordinate system. However, in the embodiment of the present application, in order to enable a user to truly feel that the first virtual object exists in the real scene, and to prevent the size of the first virtual object from changing abruptly as the pose of the terminal changes, the terminal may track an image block corresponding to the first position, acquire a two-dimensional coordinate of the first position in the image coordinate system when the terminal is in a second pose, and display the first virtual object in the real scene shot by the terminal according to the first pose, the two-dimensional coordinate of the first position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinate of the first position in the image coordinate system at the second pose.
Wherein, in one possible implementation, the distance between the second pose and the first pose is greater than or equal to a distance threshold; and/or the frame number of the image shot by the terminal in the process of moving from the first pose to the second pose is greater than or equal to a preset frame number; and/or the duration of the terminal moving from the first pose to the second pose is longer than a preset duration; and/or the second pose is a pose when the terminal successfully triangulates the first mapping point.
In the embodiment of the present application, the displaying, by the terminal, of the first virtual object in the real scene photographed by the terminal according to the first pose, the two-dimensional coordinate of the first position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinate of the first position in the image coordinate system at the second pose may specifically include: the terminal obtains a three-dimensional coordinate of the first position in a world coordinate system according to the first pose, the two-dimensional coordinate of the first position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinate of the first position in the image coordinate system at the second pose, and obtains a scaling ratio corresponding to the first position according to a first distance and a second distance, where the first distance is the distance from the first pose to the three-dimensional coordinate of the first mapping point in the camera coordinate system, and the second distance is the distance from the first pose to the three-dimensional coordinate of the first position in the world coordinate system.
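A minimal sketch of this two-view recovery of the world coordinate is given below (an assumption for illustration; the patent does not prescribe this API). It assumes the camera intrinsics K and the two poses (R, t) are known:

```python
# Minimal sketch (assumption): two-view triangulation of the tracked first position.
import numpy as np
import cv2

def triangulate_first_position(K, pose1, pose2, uv1, uv2):
    """pose = (R, t) mapping world coordinates into the camera frame; uv = pixel coordinates."""
    R1, t1 = pose1
    R2, t2 = pose2
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])        # 3x4 projection matrix at the first pose
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])        # 3x4 projection matrix at the second pose
    pts1 = np.asarray(uv1, dtype=float).reshape(2, 1)
    pts2 = np.asarray(uv2, dtype=float).reshape(2, 1)
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4x1 result
    return (X_h[:3] / X_h[3]).ravel()                 # 3D coordinate in the world coordinate system
```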
After obtaining the scaling corresponding to the first position, the terminal may scale the three-dimensional coordinates of the second pose and the first position in the world coordinate system according to the scaling corresponding to the first position, to obtain a third pose and a scaled three-dimensional coordinate corresponding to the first position, and further display the first virtual object in the real scene shot by the terminal according to the third pose and the scaled three-dimensional coordinate corresponding to the first position.
In one possible implementation manner, the terminal may scale the three-dimensional coordinate of the first location in the world coordinate system to the three-dimensional coordinate of the first mapping point in the camera coordinate system, where the scaled three-dimensional coordinate of the first location is the same as the three-dimensional coordinate of the first mapping point in the camera coordinate system. Furthermore, the terminal can correspondingly scale the second pose according to the scaling ratio of the three-dimensional coordinate of the first position in the world coordinate system, and a third pose is obtained.
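The scaling described in the preceding paragraphs can be sketched as follows (a simplified sketch under the assumption that poses are reduced to optical-center positions; variable names and the choice of anchor are illustrative, not from the patent):

```python
# Minimal sketch (assumption): compute the scaling ratio from the two distances and
# apply it to the triangulated world coordinate and to the second pose.
import numpy as np

def scaling_ratio(first_pose_center, mapping_point_cam, first_position_world):
    d1 = np.linalg.norm(mapping_point_cam - first_pose_center)     # first distance
    d2 = np.linalg.norm(first_position_world - first_pose_center)  # second distance
    return d1 / d2                                                  # scaling ratio for the first position

def apply_scaling(ratio, second_pose_center, first_position_world, anchor):
    """Scale the second pose and the world coordinate about the anchor (here the first pose
    center, an assumption) so that the scaled coordinate coincides with the mapping point."""
    third_pose_center = anchor + ratio * (second_pose_center - anchor)
    scaled_first_position = anchor + ratio * (first_position_world - anchor)
    return third_pose_center, scaled_first_position
```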
The above embodiments briefly describe a process of displaying the first virtual object in a real scene photographed by the terminal, where displaying the second virtual object in the real scene photographed by the terminal in response to the user's operation on the identification of the second virtual object may refer to a description related to displaying the first virtual object by the terminal.
In one possible implementation manner, if the terminal displays virtual objects at a plurality of positions of the real scene shot by the terminal, based on the description in the above embodiment, the terminal needs to send, for display, each (scaled) pose of the terminal and the scaled three-dimensional coordinates corresponding to each position, so that the calculation amount of the terminal in rendering and displaying the first virtual object and the second virtual object is large and the speed is slow. In the embodiment of the application, in order to reduce the calculation amount of the terminal and improve the speed of displaying the virtual objects by the terminal, the scaled three-dimensional coordinates corresponding to each pose can be unified under the same pose (the same coordinate system).
Assume the terminal is in the first pose when the user performs the operation on the identification of the second virtual object. In response to the user's operation on the identification of the second virtual object, the terminal may obtain a second mapping point of a second position on a second virtual plane, obtain the three-dimensional coordinates of the second mapping point in the camera coordinate system, and display the second virtual object in the real scene captured by the terminal (that is, the real scene displayed on the terminal interface in which the first virtual object is displayed) according to the first pose and the three-dimensional coordinates of the second mapping point in the camera coordinate system. The second mapping point is the mapping point of the second position on the second virtual plane, and the process of obtaining it may refer to the related description of the first mapping point.
In one possible implementation manner, the second pose is a pose of the terminal after the user performs the operation on the identification of the second virtual object. When the terminal is in the second pose, the terminal may scale, according to a scaling ratio corresponding to the second position, the second pose and the three-dimensional coordinates of the second position in the world coordinate system, so as to obtain a fifth pose and a scaled three-dimensional coordinate corresponding to the second position, where the three-dimensional coordinates of the second position in the world coordinate system are obtained based on the first pose, the two-dimensional coordinates of the second position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinates of the second position in the image coordinate system at the second pose.
The process by which the terminal obtains the three-dimensional coordinate of the second position in the world coordinate system and the scaling ratio corresponding to the second position, and performs the scaling, may refer to the related description for the first position.
In an embodiment, the terminal may display the second virtual object in the real scene shot by the terminal according to the fifth pose and the scaled three-dimensional coordinate corresponding to the second position.
Specifically, the terminal may translate the fifth pose onto the third pose, and translate the scaled three-dimensional coordinate corresponding to the second position according to the distance and direction of the translation from the fifth pose to the third pose, so as to obtain the scaled three-dimensional coordinate corresponding to the second position under the third pose. Furthermore, the terminal may display the second virtual object in the real scene shot by the terminal according to the third pose and the scaled three-dimensional coordinate corresponding to the second position under the third pose.
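A minimal sketch of this unification step is shown below (an assumption for illustration; poses are again reduced to positions and rotation components are left unchanged):

```python
# Minimal sketch (assumption): translate the fifth pose onto the third pose and apply
# the same offset to the second position's scaled coordinate.
import numpy as np

def unify_under_third_pose(third_pose_center, fifth_pose_center, scaled_second_position):
    offset = third_pose_center - fifth_pose_center        # distance and direction of the translation
    return scaled_second_position + offset                # scaled coordinate expressed under the third pose
```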
In the embodiment of the application, the scaled three-dimensional coordinates corresponding to each position can be unified under the same terminal pose for display, instead of sending, for each position, both its scaled three-dimensional coordinates and the corresponding terminal pose for display. In this way, the calculation amount of the terminal when rendering the virtual objects can be reduced, the speed of displaying the virtual objects can be improved, and the user experience can be further improved.
In one possible implementation, the user may also interact with virtual objects displayed on the terminal. For example, the user can select, delete, move, zoom, etc. the virtual object, and the user can interact with the virtual object in a voice manner, etc.
In a second aspect, an embodiment of the present application provides a human-computer interaction device in an AR scene, where the device may be a terminal in the first aspect or a chip in the terminal. The apparatus may include:
in one embodiment, a shooting module is used for shooting a real scene.
The display module is used for displaying a photographed real scene on an interface of the terminal, the interface is also provided with an identification of a virtual object to be selected, the first virtual object is displayed in the real scene photographed by the terminal in response to the operation of the user on the identification of the first virtual object, and the second virtual object is displayed in the real scene displayed with the first virtual object in response to the operation of the user on the identification of the second virtual object.
In a possible implementation manner, a processing module is used for responding to the operation of the user on the identification of the first virtual object and acquiring a first pose of the terminal; acquiring a first mapping point of the first position on a first virtual plane, wherein the first position is a preset position on the interface or a position determined by the user on the interface; and acquiring the three-dimensional coordinates of the first mapping point in a camera coordinate system.
The display module is specifically configured to display the first virtual object in a real scene shot by the terminal according to the first pose and the three-dimensional coordinates of the first mapping point in the camera coordinate system.
In a possible implementation manner, the processing module is specifically configured to acquire a two-dimensional coordinate of the first position in an image coordinate system; and taking an intersection point of the ray from the first pose to the two-dimensional coordinate and the first virtual plane as the first mapping point.
In a possible implementation manner, the first virtual plane is included in a preset virtual plane set, and the processing module is further configured to obtain, if there is no intersection point between the ray from the first pose to the two-dimensional coordinate and the first virtual plane, the intersection point of the ray from the first pose to the two-dimensional coordinate with another virtual plane in the preset virtual plane set, and take the intersection point of the ray with that other virtual plane as the first mapping point.
In a possible implementation manner, the processing module is further configured to track an image block corresponding to the first position; and when the terminal is in the second pose, acquiring the two-dimensional coordinates of the first position in the image coordinate system.
The display module is further configured to display the first virtual object in a real scene shot by the terminal according to the first pose, a two-dimensional coordinate of the first position in the image coordinate system when the first pose is, the second pose, and a two-dimensional coordinate of the first position in the image coordinate system when the second pose is.
In one possible implementation, the distance between the second pose and the first pose is greater than or equal to a distance threshold; and/or the frame number of the image shot by the terminal in the process of moving from the first pose to the second pose is greater than or equal to a preset frame number; and/or the duration of the terminal moving from the first pose to the second pose is longer than a preset duration; and/or the second pose is a pose when the terminal successfully triangulates the first mapping point.
In a possible implementation manner, the processing module is specifically configured to: obtain a three-dimensional coordinate of the first position in a world coordinate system according to the first pose, the two-dimensional coordinate of the first position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinate of the first position in the image coordinate system at the second pose; obtain a scaling ratio corresponding to the first position according to a first distance and a second distance, where the first distance is the distance from the first pose to the three-dimensional coordinate of the first mapping point in the camera coordinate system, and the second distance is the distance from the first pose to the three-dimensional coordinate of the first position in the world coordinate system; and respectively scale the second pose and the three-dimensional coordinate of the first position in the world coordinate system according to the scaling ratio corresponding to the first position, to obtain a third pose and a scaled three-dimensional coordinate corresponding to the first position.
The display module is further configured to display the first virtual object in a real scene shot by the terminal according to the third pose and the scaled three-dimensional coordinate corresponding to the first position.
In a possible implementation manner, the processing module is specifically configured to scale the three-dimensional coordinate of the first location in the world coordinate system to the three-dimensional coordinate of the first mapping point in the camera coordinate system, where the scaled three-dimensional coordinate corresponding to the first location is the same as the three-dimensional coordinate of the first mapping point in the camera coordinate system.
In a possible implementation manner, the terminal is in the first pose when the user performs the operation on the identification of the second virtual object, and the processing module is specifically configured to obtain, in response to the operation of the user on the identification of the second virtual object, a second mapping point of the second position on a second virtual plane, where the second position is a preset position on the interface or a position determined by the user on the interface; and acquire the three-dimensional coordinates of the second mapping point in the camera coordinate system.
The display module is further configured to display the second virtual object in a real scene in which the first virtual object is displayed according to the first pose and the three-dimensional coordinates of the second mapping point in the camera coordinate system.
In a possible implementation manner, when the terminal is in the second pose, the processing module is specifically configured to scale, according to the scaling ratio corresponding to the second position, the second pose and the three-dimensional coordinates of the second position in the world coordinate system, so as to obtain a fifth pose and scaled three-dimensional coordinates corresponding to the second position, where the three-dimensional coordinates of the second position in the world coordinate system are obtained based on the first pose, the two-dimensional coordinates of the second position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinates of the second position in the image coordinate system at the second pose.
The display module is specifically configured to display the second virtual object in the real scene in which the first virtual object is displayed according to the fifth pose and the scaled three-dimensional coordinate corresponding to the second position.
In one possible implementation, a processing module is specifically configured to translate the fifth pose to the third pose; and translating the scaled three-dimensional coordinate corresponding to the second position according to the distance and the direction from the fifth pose to the third pose, so as to obtain the scaled three-dimensional coordinate corresponding to the second position under the third pose.
The display module is specifically configured to display the second virtual object in a real scene in which the first virtual object is displayed according to the third pose and the scaled three-dimensional coordinate corresponding to the second position in the third pose.
In a third aspect, an embodiment of the present application provides an electronic device, which may include: a processor and a memory. The memory is for storing computer executable program code, the program code comprising instructions; the instructions, when executed by a processor, cause the electronic device to perform the method as in the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, which may be a human-computer interaction device in an AR scene. The electronic device may comprise means, modules or circuits for performing the method provided in the first aspect above.
In a fifth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the first aspect described above.
For the advantages of each possible implementation manner of the second aspect to the sixth aspect, reference may be made to the advantages brought by the first aspect, and they are not described herein again.
The embodiment of the application provides a man-machine interaction method, a device and electronic equipment in an Augmented Reality (AR) scene. In the method, a terminal can shoot a real scene, the shot real scene is displayed on an interface of the terminal, and an identification of a virtual object to be selected is also displayed on the interface; in response to the operation of the user on the identification of the first virtual object, the first virtual object is displayed in the real scene shot by the terminal; and in response to the operation of the user on the identification of the second virtual object, the second virtual object is displayed in the real scene in which the first virtual object is displayed. The man-machine interaction method provided by the embodiment of the application not only can enrich man-machine interaction modes in the augmented reality AR scene, but also enriches the styles for displaying virtual objects in the AR scene.
Drawings
FIG. 1 is a schematic diagram of an AR application interface in the prior art;
FIG. 2 is a schematic diagram of an interface of another AR application in the prior art;
fig. 3 is a flowchart illustrating an embodiment of a man-machine interaction method in an AR scene according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an interface according to an embodiment of the present application;
FIG. 5 is a schematic illustration of another interface provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of another interface provided by an embodiment of the present application;
fig. 7 is a flowchart of another embodiment of a man-machine interaction method in an AR scene according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a virtual plane according to an embodiment of the present application;
fig. 9 is a schematic diagram of an image block corresponding to a first position according to an embodiment of the present application;
FIG. 10 is a schematic illustration of triangularization provided by an embodiment of the present application;
FIG. 11A is a schematic illustration of scaling provided by an embodiment of the present application;
FIG. 11B is another schematic view of scaling provided by an embodiment of the present application;
fig. 12 is a flowchart of another embodiment of a man-machine interaction method in an AR scene according to an embodiment of the present application;
fig. 13 is a schematic diagram of unifying scaled three-dimensional coordinates to a pose of a terminal according to an embodiment of the present application;
FIG. 14 is a schematic view of another interface provided by an embodiment of the present application;
FIG. 15 is a schematic view of another interface provided by an embodiment of the present application;
fig. 16 is a schematic structural diagram of a terminal according to an embodiment of the present application;
Fig. 17 is a flowchart of another embodiment of a man-machine interaction method in an AR scene according to an embodiment of the present application;
fig. 18 is a flowchart of another embodiment of a man-machine interaction method in an AR scene according to an embodiment of the present application;
fig. 19 is a flowchart of another embodiment of a man-machine interaction method in an AR scene according to an embodiment of the present application;
fig. 20 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 21 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
FIG. 1 is a schematic diagram of an AR application interface in the prior art. An augmented reality (augmented reality, AR) application is illustrated in fig. 1 as an AR measurement application. Referring to fig. 1, when a user opens the AR measurement application, a real scene photographed by the terminal is displayed on the interface of the terminal, and a prompt message is displayed on the interface. For example, a prompt message such as "move the device slowly to find the plane where the object is located" prompts the user to scan the real scene using the terminal to determine a plane in the real scene, and further measure a distance, an area, a volume, a height, and the like. In such an AR application, the real scene needs to be scanned in advance to obtain a plane in the real scene before the functions in the AR application can be used, which is complex in operation and low in efficiency. For example, AR applications developed based on ARKit and ARCore require scanning the real scene for plane detection before the functions in the AR application can be used.
Alternatively, in the prior art, the real scene does not need to be scanned in advance before the function in the AR application is used, but prior map information of the real scene needs to be acquired and stored in advance, and then the function in the AR application is used based on the prior map information of the real scene. Illustratively, the a priori map information may include, but is not limited to: a three-dimensional point cloud map of a real scene, a map including a plane, or the like. In such an example, because a priori map information of the real scene is stored in the terminal, the terminal may use functions in the AR application when using the AR application based on the pose of the terminal, and information such as a plane in the a priori map information. In the method, although the real scene does not need to be scanned when the AR application program is used, prior map information of the real scene needs to be acquired in advance, and professional staff is required to acquire the prior map information of the real scene, so that the difficulty is high and the efficiency is low.
FIG. 2 is a schematic diagram of an interface of another AR application in the prior art. The AR application is illustrated in fig. 2 as a short video class application that provides a "virtual object" prop, and the virtual object is illustrated in fig. 2 as a "cat". Referring to a and b in fig. 2, in the short video-based application program, a user may click on a "cat" prop when shooting a real scene, and the user clicks on an arbitrary position of the real scene, and the terminal may display a virtual cat at a corresponding position of the real scene displayed on the interface.
In the short video application program, the terminal does not need to scan the real scene in advance or acquire prior map information of the real scene, and a virtual cat can be displayed at any position of the real scene. However, the virtual object displayed by the terminal is limited by the prop, and virtual cats can be generated only in the styles provided by the prop. For example, if the prop is in a style of one virtual cat, the user may use the prop to trigger the terminal to display one virtual cat, and if the prop is in a style of two virtual cats, the user may use the prop to trigger the terminal to display two virtual cats, so the interaction mode with the user is limited. Moreover, the terminal does not support the user clicking multiple times to generate virtual cats multiple times, and during the use of one prop, the terminal supports the use of only that one prop. Accordingly, the man-machine interaction mode in the AR scene shown in fig. 2 is limited.
Based on the problems in the prior art, the embodiment of the application provides a man-machine interaction method in an AR scene, a terminal does not need to scan a plane in a real scene in advance or acquire prior map information of the real scene, and a plurality of virtual objects can be continuously generated and displayed in response to multiple triggers of a user, so that man-machine interaction modes can be enriched, and user experience is improved.
It should be understood that the embodiments of the present application do not limit the virtual objects, and as an example, the virtual objects may be: animals, figures, objects, etc.
The man-machine interaction method in the AR scene provided in the embodiment of the present application is applied to a terminal, which may be referred to as a User Equipment (UE) or the like, for example, the terminal may be a mobile phone, a tablet (portable android device, PAD), a personal digital assistant (personal digital assistant, PDA), a handheld device, a computing device or a wearable device with a wireless communication function, a Virtual Reality (VR) terminal device, an augmented reality (augmented reality, AR) terminal device, a terminal in an industrial control (industrial control), a terminal in a smart home (smart home), or the like.
The following describes a man-machine interaction method in an AR scene according to an embodiment of the present application with reference to a specific embodiment. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 3 is a flowchart illustrating an embodiment of a man-machine interaction method in an AR scene according to an embodiment of the present application. Referring to fig. 3, the man-machine interaction method in an AR scene provided by the embodiment of the present application may include:
S301, shooting a real scene, and displaying the shot real scene.
In one embodiment, the terminal may capture a real scene in response to an instruction of a user. In one embodiment, a real scene is opposed to a virtual scene; the real scene is a scene in reality. For example, the user opening the AR application or performing an operation constitutes an instruction of the user, which may trigger the terminal to control the photographing device (such as a camera) of the terminal to start photographing the real scene. In one embodiment, the terminal may shoot the real scene before recording a video, or shoot the real scene during recording a video. When the terminal shoots the real scene, the terminal may display the shot real scene on the screen of the terminal; in other words, the terminal may display a picture of the shot real scene on the interface of the terminal.
In fig. 4, taking a real scene captured before a terminal records a video as an example, referring to a in fig. 4, a user opens an AR application, and a screen preview box 41 and a capture control 42 are displayed on an interface of the terminal. In a scene where the user does not click on the photographing control 42, the terminal may photograph a real scene, and a picture of the photographed real scene is displayed in the picture preview box 41. Exemplary, as in a real scene, include a table. In one embodiment, it is conceivable that the user may click on the photographing control 42, the terminal may photograph the real scene, and the photographed real scene is displayed in the screen preview box 41.
S302, in response to a first operation of a user, a first virtual object is displayed in a real scene shot by the terminal.
The first operation may include, but is not limited to: the user operates the interface of the terminal, or the user speaks a section of voice, etc., and the embodiment of the application does not limit the first operation. The interface of the user operation terminal may include, but is not limited to: clicking, sliding, or the user does not contact the screen of the terminal to perform a gesture, etc.
In one embodiment, the first virtual object is preset. For example, if the first virtual object is a matchman, taking the first operation as the user clicking on the interface of the terminal as an example, when the user clicks the position A on the interface, the terminal responds to the clicking operation of the user and the matchman can be displayed at the corresponding position in the real scene corresponding to the position A, as shown in b in fig. 4. For another example, taking the first operation as the user speaking a piece of voice as an example, if the user speaks "display at the center position of the screen", the terminal may display a matchman at the corresponding position in the real scene corresponding to the "center position of the screen" in response to the voice spoken by the user.
In one embodiment, the first virtual object is user-set.
For example, referring to a in fig. 5, when the user opens the AR application, at least one icon 43 of a virtual object to be selected is displayed on the interface of the terminal in addition to the screen preview frame 41 and the photographing control 42, and the user may operate any one of the icons 43 of the virtual objects to be selected displayed on the interface to determine the first virtual object. It should be appreciated that a in fig. 5 represents the icons of the virtual objects in text; for example, the icons 43 of the at least one virtual object to be selected may include: matchman, princess, cuboid, etc. In such an embodiment, the first operation may be: the user clicks the icon 43 of a virtual object to be selected and then clicks the interface of the terminal, or the user presses and holds the icon 43 of a virtual object to be selected and slides it to a position, or the user speaks a piece of voice, or the like.
For example, referring to b in fig. 5, the user may click on the "matchman" icon first and then click on the A position on the interface of the terminal, and the terminal displays the matchman at the corresponding position in the real scene corresponding to the A position in response to the first operation of the user. It should be appreciated that the user's hand making the first click in fig. 5 is represented by a dashed line and the user's hand making the later click is represented by a solid line.
For example, if the user presses and holds the "matchman" icon and slides it to the A position on the interface of the terminal, the terminal may display the matchman at the corresponding position in the real scene corresponding to the A position in response to the first operation of the user. For another example, if the user speaks "display a matchman at the center of the screen", the terminal may display the matchman at the corresponding position in the real scene corresponding to the "screen center" in response to the first operation of the user speaking the voice. In a scenario where the user adopts the voice manner, the terminal may recognize the voice of the user to determine whether the voice includes any of the at least one virtual object to be selected displayed on the terminal interface, thereby determining the first virtual object. The terminal can use the virtual object to be selected mentioned in the voice as the first virtual object.
In one embodiment, the terminal may further perform edge detection on a shot image of the real scene in response to the first operation of the user, so as to display the first virtual object at a target position of the real scene shot by the terminal. For example, if a table is included in the real scene, the terminal may perform edge detection on the table to display the first virtual object on the table (at the target position) instead of at other positions, so that the user can truly feel that the first virtual object is in the real scene and the first virtual object does not appear out of place in the real scene. In one embodiment, the target position may be the ground, an object plane, a person's head, shoulders, etc.
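A minimal sketch of such edge detection is shown below (the patent only states that edge detection is performed; the use of OpenCV's Canny detector and the threshold values are assumptions for illustration):

```python
# Minimal sketch (assumption): detect edges in the captured frame, e.g. to locate the
# table edge near the tapped position before anchoring the virtual object on it.
import cv2

def detect_scene_edges(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # work on the grayscale image
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)          # suppress sensor noise
    return cv2.Canny(blurred, 50, 150)                   # binary edge map of the real scene image
```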
S303, in response to a second operation of the user, displaying a second virtual object in a real scene displayed by the terminal.
The second operation may be the same as or different from the first operation, and the second operation may be described with reference to the first operation.
The second virtual object may be the same as or different from the first virtual object. In one embodiment, the second virtual object may be preset or set by the user; for details, reference may be made to the description related to the first virtual object in S302.
For example, taking fig. 5 as an example, referring to c and d in fig. 5, after the terminal displays the matchman at the corresponding position of the real scene corresponding to the A position, the user may click on the "cuboid" icon and then click on the B position, and the terminal may display the cuboid at the corresponding position in the real scene corresponding to the B position in response to the second operation of the user. In one embodiment, the real scene in which the second virtual object is displayed and the real scene in which the first virtual object is displayed are the same real scene, but due to screen size or shooting angle, the interface of the terminal may not display the second virtual object and the first virtual object on one screen at the same time; it should be understood that in d of fig. 5, the second virtual object and the first virtual object are displayed simultaneously on the same screen of the terminal for illustration.
In one embodiment, the corresponding position in the real scene corresponding to the B position may be referred to as a second position of the real scene photographed by the terminal. Accordingly, S303 may be replaced with: and responding to a second operation of the user, and displaying the second virtual object at a second position of the real scene shot by the terminal.
As in S302 and S303 above, the positions at which the virtual objects (the first virtual object and the second virtual object) are displayed are each set by the user, and in one embodiment, the positions at which the virtual objects are displayed may be preset.
For example, the preset positions for displaying the virtual object on the interface of the terminal are an A position and a B position. In a scenario where the virtual object is also preset, once the user opens the AR application, matchmen can be displayed at the corresponding positions in the real scene corresponding to the A position and the B position.
In one embodiment, when the user performs the first operation and the user performs the second operation, the pictures of the real scene shot by the terminal may be the same or different, and may be referred to as the real scene shot by the terminal or the real scene displayed on the interface of the terminal.
For example, in a scenario in which the virtual object is set by the user, referring to a-c in fig. 6, the user opens the AR application, and the screen preview frame 41, the shooting control 42, the icons 43 of at least one virtual object to be selected, and the preset positions 44 may be displayed on the interface of the terminal, where a in fig. 6 represents the preset positions (A position, B position) with boxes. The user can click on the "matchman" icon, and the terminal, in response to the operation of the user, can display matchmen at the corresponding positions in the real scene corresponding to the A position and the B position. In an embodiment, the user may further sequentially select a first virtual object to be displayed at the corresponding position in the real scene corresponding to the A position and a second virtual object to be displayed at the corresponding position in the real scene corresponding to the B position, where the virtual objects displayed at different preset positions may be the same or different.
In a scenario where the position of displaying the virtual object is preset, S302 and S303 may be replaced with: and the terminal displays the first virtual object and the second virtual object at corresponding positions in the real scene corresponding to the preset positions of the interface.
In the embodiment of the application, the terminal does not need to acquire the scanning reality scene or the priori map information of the reality scene in advance, and can generate and display a plurality of virtual objects based on multiple operations of the user, so that the efficiency can be improved, the man-machine interaction mode can be enriched, and the user experience can be improved.
The embodiment shown in fig. 3 above teaches that the terminal can generate and display a plurality of virtual objects based on a plurality of operations of the user, and the following describes a specific process of generating and displaying a plurality of virtual objects by the terminal. In one embodiment, referring to fig. 7, S302 in the above embodiment may include:
s3021, in response to a first operation by a user, acquiring a first pose of the terminal and a two-dimensional coordinate of a first position in an image coordinate system.
The terminal responds to a first operation of a user, and the first pose of the terminal can be acquired. It should be understood that a sensor for detecting the pose of the terminal may be provided in the terminal, and the sensor may include, but is not limited to: acceleration sensor, angular velocity sensor, gravity detection sensor, etc. The terminal may acquire inertial measurement (inertial measurement unit, IMU) data of the terminal based on the acceleration sensor and the angular velocity sensor, and the terminal may acquire gravitational axis data of the terminal based on the gravitational detection sensor, and further, the terminal may acquire the first pose of the terminal based on the IMU data and the gravitational axis data.
The first position is the position used for determining where the first virtual object is displayed at the corresponding position of the real scene; the first position can be preset or set by the user, and it should be understood that the first position is a position on the interface of the terminal. The picture shot by the terminal can be regarded as a plane, and the two-dimensional coordinates of the first position (a preset position or a position set by the user) in an image coordinate system are then obtained. For example, the lower left corner of the interface of the terminal may be taken as the origin of the image coordinate system, and the terminal can acquire the two-dimensional coordinate of the first position in this image coordinate system; the setting of the image coordinate system is not limited in the embodiment of the application. In one embodiment, the two-dimensional coordinates of the first position in the image coordinate system may also be referred to as: the two-dimensional coordinates of the first position in the picture of the real scene photographed by the terminal.
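As a trivial illustration of this coordinate convention (an assumption: touch coordinates are taken to have their origin at the top-left corner of the screen, as is common, while the example image coordinate system above has its origin at the lower-left corner):

```python
# Minimal sketch (assumption): convert a tapped screen point (top-left origin) into the
# lower-left-origin image coordinate system used in the example above.
def screen_to_image_coords(x, y, screen_height):
    return (x, screen_height - y)   # flip the vertical axis; horizontal axis unchanged
```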
S3022, tracking the first location.
In one embodiment, the manner in which the terminal tracks the first location may include, but is not limited to: feature point method, optical flow method, image block tracking method based on deep learning, etc. In one embodiment, the terminal tracking the first location may be understood as: the terminal tracks the image block corresponding to the first position.
The feature point method refers to that the terminal tracks the image block corresponding to the first position according to the features of the image block corresponding to the first position. The terminal may acquire the characteristics of the image block in a manner not limited to: scale-invariant feature transform (SIFT) algorithm, accelerated robust feature (speeded up robust features, SURF) algorithm, or fast orientation and rotation (oriented fast and rotated brief, ORB) algorithm.
Optical flow (optical flow) describes the apparent motion of image points caused by the movement of an object, the scene, or the camera between two successive frames of images. It is a two-dimensional vector field of the image during motion, and is a velocity field that represents the three-dimensional motion of object points through the two-dimensional image; the image changes formed by motion over a tiny time interval are used to determine the motion direction and motion speed of image points. Optical flow methods may include, but are not limited to: the Lucas-Kanade algorithm, the KLT (Kanade-Lucas-Tomasi) tracking algorithm, and the like.
The image block tracking method based on deep learning can be understood as follows: and the terminal inputs the image block corresponding to the first position into a model which is trained on the basis of deep learning in advance, so as to track the image block corresponding to the first position. The embodiment of the application does not describe how to train the image block tracking model and the image block tracking model in detail.
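A minimal sketch of tracking the image block with pyramidal Lucas-Kanade optical flow, one of the options listed above, is given for illustration (the use of OpenCV and the parameter values are assumptions, not the patent's implementation):

```python
# Minimal sketch (assumption): track the point at the first position from one grayscale
# frame to the next with pyramidal Lucas-Kanade optical flow.
import numpy as np
import cv2

def track_first_position(prev_gray, next_gray, prev_pt):
    prev_pts = np.array([[prev_pt]], dtype=np.float32)          # shape (1, 1, 2), as required
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None,
        winSize=(21, 21), maxLevel=3)                           # patch size and pyramid depth
    if status[0][0] == 1:
        return tuple(next_pts[0][0])                            # tracked two-dimensional position
    return None                                                 # tracking lost
```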
In one embodiment, the first location may be a user-set location. Wherein the terminal may track the first location in response to a first operation by the user.
In one embodiment, the first position is preset. When the first virtual object is preset, S3021 may be replaced correspondingly with: the first location is tracked.
As an example, as shown in fig. 9, the terminal may track the image block showing the position of the virtual object, regardless of whether the position showing the virtual object is preset or set by the user. The image blocks corresponding to the first position are characterized in fig. 9 by the image blocks (pixel blocks) contained within the box.
S3023, obtaining the three-dimensional coordinates of the mapping point of the first position on the virtual plane according to the two-dimensional coordinates of the first position on the image coordinate system.
The virtual plane is preset. In the embodiment of the application, the real scene does not need to be scanned in advance to obtain a plane in the real scene, and prior map information of the real scene does not need to be acquired, because the virtual plane is preset and the virtual object can be displayed on the virtual plane. The virtual plane may be characterized by the three-dimensional coordinates of a point and the normal vector of the virtual plane. Illustratively, the three-dimensional coordinates of the point corresponding to the virtual plane are (X0, Y0, Z1), and the normal vector of the virtual plane is the vector n. It should be understood that the Z value of the virtual plane is a Z value in a camera coordinate system, where the camera coordinate system refers to a coordinate system with the optical center of the terminal as the origin; in the embodiment of the present application, the position of the terminal is described taking the position of the optical center of the terminal as an example, and the position of the terminal may be understood as the pose of the terminal. In one embodiment, the mapping point of the first position on the virtual plane may be referred to as a first mapping point.
In one embodiment, the three-dimensional coordinates of the mapping point of the first position on the virtual plane can be understood as: the three-dimensional coordinates of that mapping point in the camera coordinate system.
Fig. 10 is a schematic diagram of triangulation provided in the embodiment of the present application; fig. 10 illustrates the virtual plane and the plane of the picture displayed by the terminal. Referring to fig. 10, the first pose of the terminal is C1; it should be understood that C1 may be understood as the position of the optical center of the terminal when the terminal is in the first pose. The position of the first position in the image coordinate system is a1, and the mapping point of the first position on the virtual plane can be understood as the intersection point A' of the ray from C1 toward a1 and the virtual plane. The abscissa and ordinate of the mapping point are the same as those of the two-dimensional coordinates of the first position, and the Z value of the mapping point is the Z value of the virtual plane (a preset value, such as the distance between the virtual plane and the plane of the picture displayed by the terminal). Illustratively, if the two-dimensional coordinates of the first position in the image coordinate system are (X1, Y1) and the Z value of the virtual plane is Z1, the three-dimensional coordinates of the mapping point of the first position on the virtual plane are (X1, Y1, Z1).
In one embodiment, there may be at least one virtual plane, and the at least one preset virtual plane may be referred to as a preset virtual plane set. For example, the embodiment of the present application may be illustrated with 3 preset virtual planes: a first virtual plane, a second virtual plane, and a third virtual plane. Illustratively, the first virtual plane is parallel to the ground (or horizontal plane), the second virtual plane and the third virtual plane are both perpendicular to the first virtual plane, and the second virtual plane is perpendicular to the third virtual plane, as shown in fig. 8.
In this embodiment, after obtaining the two-dimensional coordinates of the first position in the image coordinate system, the terminal may obtain the ray from the first pose C1 toward a1 as shown in fig. 10, and detect whether the ray has an intersection point A' with the first virtual plane. If the intersection point A' exists, the terminal may take A' as the mapping point of the first position on the first virtual plane and obtain the three-dimensional coordinates of A'. If the ray has no intersection point with the first virtual plane, the terminal may detect whether the ray has an intersection point with the second virtual plane, and then acquire the three-dimensional coordinates of that intersection point. In one embodiment, the terminal may check, in order of the priority of the at least one virtual plane, whether the ray from the first pose C1 toward a1 intersects each virtual plane, so as to obtain the three-dimensional coordinates of the intersection point. That is, in order to ensure that the ray from the first pose C1 toward a1 has an intersection point with a preset virtual plane, thereby laying a foundation for subsequently generating the virtual object, a plurality of virtual planes may be preset in the embodiment of the present application, and the virtual object is generated on the virtual plane with which the ray has an intersection point.
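As an illustration only, the following sketch shows one way the mapping point could be computed: the ray from the camera optical center through the first position is intersected with each preset virtual plane (each described by a point and a normal vector) in priority order. All variable names and the example plane parameters are assumptions, not values from the embodiment.

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Return the 3D intersection of a ray with a plane (point + normal), or None."""
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:            # ray parallel to the plane: no intersection
        return None
    t = np.dot(plane_normal, plane_point - origin) / denom
    if t <= 0:                        # intersection lies behind the optical center
        return None
    return origin + t * direction     # three-dimensional coordinates of the mapping point

def first_mapping_point(origin, direction, planes):
    """Try the preset virtual planes in priority order and return the first hit."""
    for plane_point, plane_normal in planes:
        hit = ray_plane_intersection(origin, direction, plane_point, plane_normal)
        if hit is not None:
            return hit
    return None

# Example (assumed values): a plane parallel to the ground and two vertical planes.
planes = [
    (np.array([0.0, 0.0, 1.5]), np.array([0.0, 1.0, 0.0])),  # first virtual plane
    (np.array([0.0, 0.0, 1.5]), np.array([1.0, 0.0, 0.0])),  # second virtual plane
    (np.array([0.0, 0.0, 1.5]), np.array([0.0, 0.0, 1.0])),  # third virtual plane
]
```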
S3024, displaying the first virtual object at the three-dimensional coordinates of the mapping point on the virtual plane.
As shown in b of fig. 5, the terminal may display the first virtual object at the three-dimensional coordinates of the mapping point on the virtual plane (for example, the three-dimensional coordinates of the mapping point of the A position on the virtual plane). In one embodiment, the center of the first virtual object coincides with the mapping point; the embodiment of the application does not limit the relative position of the first virtual object and the mapping point.
In the embodiment of the application, the terminal can send, for display, the first pose and the three-dimensional coordinates of the mapping point of the first position on the virtual plane, so as to display the first virtual object at the three-dimensional coordinates of the mapping point on the virtual plane. It should be understood that sending for display can be understood as sending the pose and coordinates to the component that renders and displays the first virtual object; the detailed process of rendering and display is omitted in the embodiment of the present application. In this embodiment, the mapping point of the first position on the virtual plane is the position where the first virtual object is displayed, i.e. the first position.
S3025, acquiring the three-dimensional coordinates of the first position in the real scene.
The user may move the terminal while shooting, so the pose of the terminal changes in real time, and correspondingly, the two-dimensional coordinates of the first position in the image coordinate system also change in real time. If every virtual object were always displayed at its mapping point on the virtual plane, the user could not feel the association between the virtual objects and the real scene, and could not feel that the virtual objects are in the real scene. In the embodiment of the application, in order to make the user feel that the virtual object really exists in the real scene, the terminal can acquire the three-dimensional coordinates of the first position in the real scene, and then display the virtual object at the three-dimensional coordinates, in the real scene, of the first position of the picture shot by the terminal. In one embodiment, the three-dimensional coordinates of the first position in the real scene may be referred to as: the three-dimensional coordinates of the first position in the world coordinate system.
In one embodiment, when the photographing device set in the terminal is a monocular photographing device, the terminal may triangulate the mapping point to obtain three-dimensional coordinates of the first location in the real scene. The following describes a process of triangulating the mapping points by the terminal:
The terminal can acquire the three-dimensional coordinates of the first position in the real scene according to the first pose, the two-dimensional coordinates of the first position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinates of the first position in the image coordinate system at the second pose. It will be appreciated that the second pose is a pose subsequent to the first pose; because the terminal can track the first position, it can obtain the two-dimensional coordinates of the first position in the image coordinate system at the second pose.
Referring to fig. 10, the second pose of the terminal is C2, and the two-dimensional coordinates of the first position in the image coordinate system at the second pose are a2. In the embodiment of the present application, the intersection point of the ray from C1 toward a1 and the ray from C2 toward a2 may be taken as the position of the first position in the real scene, and the three-dimensional coordinates of the intersection point A may be taken as the three-dimensional coordinates of the first position in the real scene. It can be understood that, on the premise that the first pose C1, the second pose C2, and the two-dimensional coordinates a1 and a2 in the two pictures of the terminal are known, the three-dimensional coordinates of the intersection point A can be obtained by triangulation, in combination with the intrinsic parameter matrix of the terminal.
The specific way of triangulation is as follows. Assume that the coordinates of the intersection point A in the camera coordinate system at the first pose C1 are (x1, y1, z1), its coordinates in the camera coordinate system at the second pose C2 are (x2, y2, z2), the two-dimensional coordinates a1 of the first position in the image coordinate system at the first pose are (u1, v1), the two-dimensional coordinates a2 of the first position in the image coordinate system at the second pose are (u2, v2), and the intrinsic parameter matrix of the terminal is K. Then, according to the imaging process of the camera, Equation 1 can be obtained:
In the above Equation 1, d1 and d2 are the depths of the intersection point A in the camera coordinate system at the first pose C1 and at the second pose C2, respectively. Then, according to the first pose C1 and the second pose C2 of the terminal, the rotation matrix R and the translation vector t that transform the camera coordinate system of the first pose C1 into the camera coordinate system of the second pose C2 can be calculated, and Equation 2 can be obtained:
Substituting Equation 1 into Equation 2 yields Equation 3:
Equation 3 can be written as Equation 4:
CD = t    (Equation 4)
where the specific forms of the coefficient matrix C and the unknown vector D are shown in Equation 5:
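(The bodies of Equations 1, 2, 3 and 5 are not reproduced in the text above. A plausible reconstruction, assuming the standard two-view triangulation derivation and the notation defined above, is:)

$$
\begin{aligned}
&\text{Equation 1:}\quad d_1\begin{bmatrix}u_1\\ v_1\\ 1\end{bmatrix}=K\begin{bmatrix}x_1\\ y_1\\ z_1\end{bmatrix},\qquad d_2\begin{bmatrix}u_2\\ v_2\\ 1\end{bmatrix}=K\begin{bmatrix}x_2\\ y_2\\ z_2\end{bmatrix}\\
&\text{Equation 2:}\quad \begin{bmatrix}x_2\\ y_2\\ z_2\end{bmatrix}=R\begin{bmatrix}x_1\\ y_1\\ z_1\end{bmatrix}+t\\
&\text{Equation 3:}\quad d_2\,K^{-1}\begin{bmatrix}u_2\\ v_2\\ 1\end{bmatrix}-d_1\,R\,K^{-1}\begin{bmatrix}u_1\\ v_1\\ 1\end{bmatrix}=t\\
&\text{Equation 5:}\quad C=\begin{bmatrix}-R\,K^{-1}\begin{bmatrix}u_1\\ v_1\\ 1\end{bmatrix} & \;K^{-1}\begin{bmatrix}u_2\\ v_2\\ 1\end{bmatrix}\end{bmatrix},\qquad D=\begin{bmatrix}d_1\\ d_2\end{bmatrix}
\end{aligned}
$$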
Using Equations 4 and 5, the unknown quantity D, that is, the depths of the intersection point A in the camera coordinate system at the first pose C1 and at the second pose C2, can be solved. After the depths are found, the three-dimensional coordinates of the intersection point A in the camera coordinate systems at C1 and at C2 can be calculated according to Equation 1, and then, using the first pose C1, the coordinates of the intersection point A in the camera coordinate system can be converted into coordinates in the real scene coordinate system, so that the three-dimensional coordinates of the intersection point A are obtained. In one embodiment, the real scene coordinate system may be understood as the world coordinate system.
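For illustration, a minimal numerical sketch of this solve (assuming numpy and the reconstructed form of Equations 1-5 above; the convention that R and t map camera-1 coordinates to camera-2 coordinates is also an assumption) could look as follows:

```python
import numpy as np

def triangulate_point(K, R, t, a1, a2):
    """Solve C·D = t in the least-squares sense to get the depths d1, d2,
    then return the 3D coordinates of point A in the camera frame of the first pose."""
    K_inv = np.linalg.inv(K)
    p1 = K_inv @ np.array([a1[0], a1[1], 1.0])   # back-projected ray at the first pose
    p2 = K_inv @ np.array([a2[0], a2[1], 1.0])   # back-projected ray at the second pose
    C = np.stack([-(R @ p1), p2], axis=1)        # 3x2 coefficient matrix (Equation 5)
    D, *_ = np.linalg.lstsq(C, t, rcond=None)    # D = [d1, d2], the depths (Equation 4)
    d1 = D[0]
    return d1 * p1                               # Equation 1: A in the first camera frame
```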
The above describes the specific process of triangulating the mapping point; the conditions under which the terminal triangulates the mapping point are described below:
In one embodiment, because the pose of the terminal changes in real time, the terminal may obtain the distance between the current pose of the terminal and the first pose, and triangulate the mapping point in response to that distance being greater than or equal to a preset distance. With reference to fig. 10, the terminal triangulates the mapping point in response to the distance between the second pose and the first pose being greater than or equal to the preset distance (the distance between C2 and C1 being greater than or equal to the preset distance).
In one embodiment, the terminal obtains consecutive frames of images while shooting the real scene, and the terminal may triangulate the mapping point in response to the number of frames of the shot images reaching a preset number of frames. Alternatively, on the premise that the number of frames shot by the terminal per unit duration is known, the terminal may triangulate the mapping point in response to the shooting duration reaching a preset duration.
In one embodiment, after the terminal acquires the first pose, it may triangulate the mapping point in combination with the pose of the terminal, and the terminal may obtain the three-dimensional coordinates of the first position in the real scene in response to the triangulation being successful. Conditions under which the triangulation is successful may include, but are not limited to:
1. The image block corresponding to the first position can be tracked in at least two images.
2. None of the X, Y and Z components of the three-dimensional coordinates, in the real scene, of the first position obtained by triangulation is null.
3. The three-dimensional coordinates, in the real scene, of the first position obtained by triangulation are in front of the terminal (or of the photographing device in the terminal), that is, the Z value of the three-dimensional coordinates of the first position in the camera coordinate system is greater than 0.
4. For each image in which the image block corresponding to the first position can be tracked, the three-dimensional coordinates of the first position in the real scene obtained by triangulation are back-projected into the two-dimensional coordinate system of the picture to obtain the two-dimensional coordinates of the back-projected point, and the distance between the back-projected point and the actually observed position in the picture displayed by the terminal is calculated; this distance is less than or equal to a distance threshold (a reprojection-error check, sketched below).
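A minimal sketch of such a reprojection-error check, assuming the same notation as the triangulation sketch above (everything here is illustrative, not the embodiment's exact procedure):

```python
import numpy as np

def reprojection_error(K, R, t, point_cam1, observed_2d):
    """Back-project the triangulated point A (given in the first camera frame) into the
    second camera's image and measure its distance to the actually observed point."""
    point_cam2 = R @ point_cam1 + t               # transform into the second camera frame
    proj = K @ point_cam2
    proj_2d = proj[:2] / proj[2]                  # back-projected 2D coordinates
    return np.linalg.norm(proj_2d - np.asarray(observed_2d, dtype=float))

# The triangulation is accepted only if the error stays within a threshold, e.g.:
# ok = reprojection_error(K, R, t, A_cam1, a2) <= DISTANCE_THRESHOLD
```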
In one embodiment, when the photographing device provided in the terminal is a binocular photographing device, the terminal may, at the first pose, obtain the three-dimensional coordinates of the first position in the real scene according to the poses of the two cameras and the two-dimensional coordinates of the first position in the pictures photographed by the two cameras, respectively, without needing the second pose to obtain the three-dimensional coordinates of the first position in the real scene.
S3026, obtaining a scaling corresponding to the first position according to the first pose, the three-dimensional coordinates of the mapping point of the first position on the virtual plane, and the three-dimensional coordinates of the first position in the real scene.
After the terminal obtains the three-dimensional coordinates of the first position in the real scene, if the virtual object were directly displayed at those coordinates, that is, if the terminal sent, for display, the second pose and the three-dimensional coordinates of the first position in the real scene, the size of the virtual object would change abruptly according to the principle that nearer objects appear larger and farther objects appear smaller, and the user experience would be poor. For example, referring to fig. 10, because the three-dimensional coordinates of the first position in the real scene (point A) are farther from the first pose than the mapping point A' of the first position on the virtual plane, if the virtual object were displayed at A, the user would see the virtual object on the screen shrink instantaneously, and the user experience would be poor.
In the embodiment of the application, in order to make the user feel that the virtual object really exists in the real scene (i.e. to restore the position of the virtual object in the real scene), and to ensure that the size of the virtual object does not trouble the user by changing abruptly, the terminal can acquire the scaling ratio corresponding to the first position according to the first pose, the three-dimensional coordinates of the mapping point of the first position on the virtual plane, and the three-dimensional coordinates of the first position in the real scene, and scale the second pose and the three-dimensional coordinates of the first position in the real scene according to that scaling ratio.
Referring to fig. 10, the terminal may determine the scaling ratio according to a first distance between the first pose and A' and a second distance between the first pose and A, i.e. the scaling ratio is the quotient of the first distance and the second distance.
S3027, scaling the second pose and the three-dimensional coordinates of the first position in the real scene according to the scaling ratio, to obtain the third pose and the scaled three-dimensional coordinates corresponding to the first position.
Referring to fig. 10, the terminal may scale A to A' based on the scaling ratio, where A' is the three-dimensional coordinates of the scaled first position in the real scene, i.e. the scaled three-dimensional coordinates corresponding to the first position. Correspondingly, the second pose C2 may be scaled to C2'A, where C2'A is the scaled second pose, i.e. the third pose. In this embodiment, the terminal may send C2'A and A' for display, so as to display the first virtual object. Because the second pose and the three-dimensional coordinates of the first position in the real scene are scaled by the same ratio, the relative position between the virtual object and the terminal is the same as their relative position in the real scene, so the user can feel that the virtual object really exists in the real scene; that is, the relative positional relationship between A' and C2'A can represent the relative positional relationship between A and C2, and therefore sending C2'A and A' for display can restore the position of the virtual object in the real scene. In addition, because the three-dimensional coordinates sent for display for the first position are A', the size of the virtual object does not change abruptly, which can improve the user experience.
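For illustration, a minimal sketch of this scaling step (the variable names are assumptions; all quantities are assumed to be expressed in one common coordinate system, and the scaling is assumed to be performed about the first pose C1, consistent with the distances used to define the scaling ratio):

```python
import numpy as np

def scale_toward_first_pose(C1, C2, A, A_prime):
    """Scale the triangulated point A and the second pose C2 toward the first pose C1
    so that the displayed depth matches the mapping point A' on the virtual plane."""
    s = np.linalg.norm(A_prime - C1) / np.linalg.norm(A - C1)  # scaling ratio = d(C1,A') / d(C1,A)
    A_scaled = C1 + s * (A - C1)    # scaled three-dimensional coordinates for the first position
    C2_scaled = C1 + s * (C2 - C1)  # third pose C2'A
    return C2_scaled, A_scaled
```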
S3028, displaying the first virtual object according to the third pose and the scaled three-dimensional coordinates corresponding to the first position.
It should be noted that, because the relative positional relationship between A' and C2'A can represent the relative positional relationship between A and C2, the size of the displayed first virtual object does not change abruptly when it is displayed according to the third pose (C2'A) and the scaled three-dimensional coordinates (A') corresponding to the first position; the user does not perceive the processing by the terminal and, as shown in b in fig. 5, still sees the first virtual object at the position in the real scene corresponding to the A position.
Similarly, in the foregoing embodiment, the process in which the terminal displays the second virtual object in S303 may refer to the description of displaying the first virtual object; accordingly, S303 may include:
S3031, in response to a second operation of the user, acquiring a fourth pose of the terminal and the two-dimensional coordinates of the second position in the image coordinate system.
It is to be understood that S3031-S3038 may refer to the relevant descriptions in S3021-S3028. The second location may be set in advance or by a user with reference to the associated description of the first location.
In one embodiment, if the position at which the virtual object is displayed is preset, the fourth pose is the same as the first pose. In one embodiment, if the position at which the virtual object is displayed is set by the user, the fourth pose is the pose of the terminal when the user sets the second position, and the fourth pose may be the same as or different from the first pose. The following embodiments are described for the case where the pose of the terminal is the same when the user sets the first position and the second position, that is, the case where the fourth pose is the same as the first pose.
S3032, tracking the second position.
S3033, according to the two-dimensional coordinates of the second position, the three-dimensional coordinates of the mapping point of the second position on the virtual plane are obtained.
Referring to fig. 11A, when the terminal is in the first pose C1, the position of the second position in the picture is b1; accordingly, the mapping point of the second position on the virtual plane is the intersection point B' of the ray from C1 toward b1 and the virtual plane, and the three-dimensional coordinates of the mapping point of the second position on the virtual plane are the three-dimensional coordinates of B'. In one embodiment, the mapping point of the second position on the virtual plane may be referred to as the second mapping point, and its three-dimensional coordinates may be referred to as: the three-dimensional coordinates of the second mapping point in the camera coordinate system.
S3034, displaying the second virtual object at the three-dimensional coordinates of the mapping point on the virtual plane.
It should be understood that the embodiment of the present application may display the second virtual object in a picture in which the first virtual object has already been displayed. In one embodiment, the picture in which the second virtual object is displayed may or may not also show the first virtual object. In one embodiment, the picture displaying the first virtual object and the picture displaying the second virtual object may be different, but the sum of the two pictures (the panoramic picture) is the same real scene photographed by the terminal.
S3035, acquiring the three-dimensional coordinates of the second position in the real scene.
Referring to fig. 11A, when the terminal is in the second pose C2, the position of the second position in the picture is b2. In the embodiment of the present application, the intersection point of the ray from C1 toward b1 and the ray from C2 toward b2 may be taken as the position of the second position in the real scene, and the three-dimensional coordinates of the intersection point B may be taken as the three-dimensional coordinates of the second position in the real scene. In one embodiment, the three-dimensional coordinates of the second position in the real scene may be referred to as: the three-dimensional coordinates of the second position in the world coordinate system.
S3036, according to the fourth pose, the three-dimensional coordinates of the mapping point of the second position on the virtual plane and the three-dimensional coordinates of the second position in the real scene, the scaling corresponding to the second position is obtained.
Referring to fig. 11A, the terminal may determine the scaling ratio corresponding to the second position according to a third distance between the first pose C1 and B' and a fourth distance between the first pose C1 and B, i.e. the quotient of the third distance and the fourth distance.
S3037, scaling the second pose and the three-dimensional coordinates of the second position in the real scene according to the scaling ratio, to obtain the fifth pose and the scaled three-dimensional coordinates corresponding to the second position.
Referring to fig. 11A, based on the scaling ratio corresponding to the second position, the terminal may scale B to B', where B' is the three-dimensional coordinates of the scaled second position in the real scene, that is, the scaled three-dimensional coordinates corresponding to the second position, and may correspondingly scale the second pose C2 to C2'B, where C2'B is the scaled second pose, that is, the fifth pose. In this embodiment, the terminal may send C2'B and B' for display, so as to display the second virtual object. The relative positional relationship between B' and C2'B can represent the relative positional relationship between B and C2, so sending C2'B and B' for display can restore the position of the virtual object in the real scene.
Fig. 11A illustrates the case where the virtual plane corresponding to the first position and the virtual plane corresponding to the second position are the same. In one embodiment, because a plurality of virtual planes may be preset in the embodiment of the present application, the virtual plane corresponding to the first position and the virtual plane corresponding to the second position may be different. In this case, the manner in which the terminal obtains the three-dimensional coordinates of the mapping point of the second position on the virtual plane, obtains the three-dimensional coordinates of the second position in the real scene, obtains the scaling ratio corresponding to the second position, and scales the second pose and the three-dimensional coordinates of the second position in the real scene may refer to the manner in fig. 11A; the difference from fig. 11A is that the virtual plane used by the terminal is not the virtual plane corresponding to the first position but the virtual plane corresponding to the second position, as shown in fig. 11B. In one embodiment, if the virtual plane corresponding to the first position is the first virtual plane, the virtual plane corresponding to the second position is the second virtual plane, and fig. 11B illustrates the positional relationship between the first virtual plane and the second virtual plane.
S3038, displaying the second virtual object according to the fifth pose and the scaled three-dimensional coordinate corresponding to the second position.
The terminal may send, for display, the fifth pose (C2'B) and the scaled three-dimensional coordinates (B') corresponding to the second position, so as to display the second virtual object.
In the embodiment of the application, the virtual plane is preset, and the virtual object can be displayed on the preset virtual plane, so that the plane in the real scene is not required to be obtained by scanning the real scene in advance, the prior map information of the real scene is also not required to be obtained, the response speed is high, and the efficiency is high. In addition, based on the three-dimensional coordinates of the first position and the second position in the real scene, the virtual object is displayed at the corresponding position in the real scene, so that the user feels that the virtual object really exists in the real scene, and based on the scaling corresponding to different positions, the phenomenon of abrupt change of the size of the virtual object is avoided, and the user experience is improved.
In the above embodiment, when the terminal displays the first virtual object, it may send, for display, the third pose (C2'A) and the scaled three-dimensional coordinates (A') corresponding to the first position, and when it displays the second virtual object, it may send, for display, the fifth pose (C2'B) and the scaled three-dimensional coordinates (B') corresponding to the second position. In this way, the scaled three-dimensional coordinates corresponding to each position correspond to one terminal pose, so that when the terminal displays the first virtual object and the second virtual object at the second pose, the amount of computation is large and the speed is slow.
In order to reduce the amount of computation of the terminal and improve the efficiency of displaying the first virtual object and the second virtual object, in one embodiment, when the terminal is at the second pose, the scaled three-dimensional coordinates (A') corresponding to the first position and the scaled three-dimensional coordinates (B') corresponding to the second position can be unified under one terminal pose. Then, when the terminal displays the first virtual object and the second virtual object, only one terminal pose and the several scaled three-dimensional coordinates used to display the virtual objects need to be sent for display, which can increase the speed at which the terminal renders and displays the virtual objects.
In one embodiment, referring to FIG. 12, S3028 and S3038 described above may be replaced with S3028A-S3029A:
S3028A, unifying, according to the third pose and the fifth pose, the scaled three-dimensional coordinates corresponding to the first position and the scaled three-dimensional coordinates corresponding to the second position under the third pose, to obtain the scaled three-dimensional coordinates corresponding to the first position and the scaled three-dimensional coordinates corresponding to the second position under the third pose.
In one embodiment, the terminal may use the three-dimensional coordinate system in which the third pose is located as the unified coordinate system, that is, unify the scaled three-dimensional coordinates corresponding to the first position and the scaled three-dimensional coordinates corresponding to the second position under the third pose. Referring to fig. 11A, the terminal may translate the fifth pose to the third pose; for example, the terminal may translate the fifth pose C2'B to the third pose C2'A along the line connecting the first pose C1 and the second pose C2. Because the scaled three-dimensional coordinates corresponding to the first position were obtained with the same scaling ratio as the third pose, the scaled three-dimensional coordinates corresponding to the first position are unchanged under the third pose. Since the fifth pose C2'B is translated to the third pose C2'A, the terminal may translate B' to B'' in a direction parallel to the direction from C2'B to C2'A, with the translation distance equal to the distance between C2'B and C2'A. That is, C2'B, C2'A, B' and B'' may form a parallelogram. Correspondingly, the scaled three-dimensional coordinates corresponding to the second position under the third pose are the three-dimensional coordinates of B''.
In the embodiment of the application, the terminal can acquire the three-dimensional coordinates of B'' according to the relative position between the fifth pose C2'B and the third pose C2'A and the three-dimensional coordinates of B'.
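A minimal sketch of this unification step, assuming the parallelogram construction described above (each anchor's scaled coordinates are simply shifted by the offset between its own scaled pose and the chosen unified pose; the variable names are assumptions):

```python
import numpy as np

def unify_under_pose(unified_pose, anchors):
    """Express each anchor's scaled 3D coordinates under one unified terminal pose.

    anchors: list of (scaled_pose, scaled_point) pairs, e.g. (C2'A, A'), (C2'B, B').
    Returns the scaled points re-expressed under unified_pose, e.g. A' and B''."""
    unified_points = []
    for scaled_pose, scaled_point in anchors:
        offset = unified_pose - scaled_pose           # e.g. C2'A - C2'B
        unified_points.append(scaled_point + offset)  # e.g. B' + offset = B''
    return unified_points

# Usage (assumed values): unify (C2'A, A') and (C2'B, B') under the third pose C2'A,
# then send C2'A together with [A', B''] for display.
```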
In one embodiment, the terminal may instead use the fifth pose as the unified pose and translate the third pose to the fifth pose, so as to obtain the scaled three-dimensional coordinates corresponding to the first position under the fifth pose, and send them, together with the scaled three-dimensional coordinates corresponding to the second position, for display.
In one embodiment, there may be a plurality of preset positions (not only two), or the user may set a plurality of positions; in this embodiment, the terminal may obtain, for each position, the corresponding scaled three-dimensional coordinates and the corresponding pose of the terminal. Referring to fig. 13, in order to reduce the amount of computation when the terminal renders the virtual objects and to increase the speed of displaying the virtual objects, the terminal may convert each anchor point pose and the corresponding terminal pose into a plurality of anchor point poses under one terminal pose. It should be understood that in fig. 13 each position (e.g., the A position, the B position) is characterized by an anchor point, and the scaled three-dimensional coordinates corresponding to each position are characterized by an anchor point pose.
S3029A, displaying the first virtual object and the second virtual object according to the third pose, the scaled three-dimensional coordinates corresponding to the first position under the third pose, and the scaled three-dimensional coordinates corresponding to the second position under the third pose.
In this embodiment, when the terminal is at the second pose, it may send, for display, the third pose (C2'A), the scaled three-dimensional coordinates (A') corresponding to the first position, and the scaled three-dimensional coordinates (B'') corresponding to the second position under the third pose, so as to display the first virtual object and the second virtual object.
In the embodiment of the application, the scaled three-dimensional coordinates corresponding to each position can be sent for display uniformly under the same terminal pose, instead of sending, for each position, its scaled three-dimensional coordinates together with a separate terminal pose; this can reduce the amount of computation of the terminal when rendering the virtual objects, increase the speed of displaying the virtual objects, and further improve the user experience.
In the embodiment of the application, because the terminal displays the virtual objects at the scaled three-dimensional coordinates corresponding to the first position and the second position, when the terminal switches the photographed real scene picture and then returns to the original real scene picture, the user can still see the first virtual object and the second virtual object on the interface of the terminal.
As illustrated in a of fig. 14, the real scene picture photographed by the terminal includes a table, and a notebook computer is located to the right of the table (the notebook computer is not shown in a of fig. 14 because the terminal is photographing the table). The terminal displays a first virtual object, a matchman, at the scaled three-dimensional coordinates corresponding to the A position, and displays a second virtual object, a cuboid, at the scaled three-dimensional coordinates corresponding to the B position. When the user holds the terminal and moves it to the right while shooting, the real scene picture photographed by the terminal includes the notebook computer and no longer includes the table, as shown in b in fig. 14. Because the first virtual object and the second virtual object are anchored on the table in the real scene, and the table is not included in the picture shown in b in fig. 14, the user cannot see the first virtual object and the second virtual object in that picture. In the picture shown in b in fig. 14, the user may also perform the first operation and/or the second operation as above to trigger the terminal to display other virtual objects; as shown in c in fig. 14, the terminal may display a third virtual object, a matchman, in the picture in response to the user's operation.
In this example, when the user holds the terminal and moves it to the left to photograph the picture including the table again, the user can see the first virtual object at the scaled three-dimensional coordinates corresponding to the A position and the second virtual object at the scaled three-dimensional coordinates corresponding to the B position, as shown by d in fig. 14.
It should be understood that, in the embodiment of the present application, because the terminal displays the virtual object at the scaled three-dimensional coordinates corresponding to the A position (or another position), when the terminal switches the photographed real scene picture, the virtual object remains at the scaled three-dimensional coordinates corresponding to the A position (or the other position) and does not change with the change of the photographed picture; when the terminal switches back to the original real scene picture, the user can still see the virtual object displayed at the corresponding scaled three-dimensional coordinates.
In one embodiment, the user may also interact with virtual objects displayed on the terminal. For example, the user can select, delete, move, zoom, etc. the virtual object, and the user can interact with the virtual object in a voice manner, etc.
The user may select a virtual object. For example, if the user clicks a virtual object displayed on the terminal, the virtual object may be selected. As shown in a of fig. 15, the terminal displays a matchman and a cuboid in the displayed picture. The user clicks the matchman, and in response to the user's click operation, the terminal may display a box 151 around the matchman to indicate that the matchman is selected, as shown by b in fig. 15.
The user may delete a virtual object. For example, a virtual object may be deleted if the user presses and holds the virtual object displayed on the terminal. When the user presses and holds the matchman displayed on the terminal, the terminal may display a delete control 152 in the upper right corner of the matchman, as shown at c in fig. 15. The user clicks the delete control 152, and the terminal deletes the matchman; as shown by d in fig. 15, only the cuboid remains displayed on the terminal.
The user may move a virtual object. For example, a virtual object displayed on the terminal may be moved by pressing and holding the virtual object and dragging it. If the user presses and holds the matchman displayed on the terminal and drags it to another position C, the terminal can display the matchman at the position C.
The user may scale a virtual object. For example, if the user performs a two-finger gesture on a virtual object, the virtual object may be scaled. If the user places two fingers on the matchman displayed on the interface of the terminal, moving the two fingers closer together can shrink the matchman, and moving the two fingers apart can enlarge the matchman.
It should be understood that the operations when the user interacts with the virtual object are all illustrated as examples, and the embodiments of the present application do not limit the specific operations of the user.
In one embodiment, the user may also interact with the virtual object in a voice manner.
For example, the user may say "select all matchmen", and the terminal may, in response to the voice, select all matchmen in the current picture; for example, the terminal may display a box 151 around each matchman in the current picture to indicate that it is selected, as shown by b in fig. 15. Alternatively, the terminal may, in response to the voice, select the matchmen in all pictures; as shown in a and b in fig. 14, both pictures include a matchman, and in response to the voice the terminal may display a panoramic picture including the table and the notebook computer and display a box 151 around each matchman in that picture, as shown in e in fig. 15.
In one embodiment, the user may also change the properties of a virtual object by voice. Properties of the virtual object may include, but are not limited to: position, size, shape, color, motion, expression, and the like. For example, the user may say: "turn the cuboid red", and the terminal adjusts the cuboid displayed in the picture to red in response to the voice.
The above describes a scenario in which a user can interact with a virtual object in a voice manner, and the embodiment of the application does not limit how the user interacts with the virtual object in a voice manner.
In the embodiment of the application, the user can interact with the virtual objects displayed on the terminal, which enriches the interaction modes between the user and the virtual objects and can improve the user experience.
In one embodiment, referring to fig. 16, the terminal may include: the system comprises an input data module, a multi-anchor point real-time tracking and managing module, an output result module, a rendering generation module and a user interaction module.
The input data module is used for collecting data of the interaction between the user and the terminal, photographed picture data, IMU data, gravity axis data, and the like. The multi-anchor real-time tracking and management module is used for executing S3021-S3023, S3025-S3027, S3031-S3033 and S3035-S3037 in the above embodiments. The output result module is used for executing S3028A-S3029A in the above embodiment. The rendering generation module is used for rendering and displaying the virtual objects. The user interaction module is used for implementing the interaction between the user and the virtual objects displayed on the terminal. The steps performed by these modules may be simplified as shown in fig. 17.
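Purely as an illustrative sketch of how such a module split could be organized in software (the class and method names are assumptions, not the embodiment's implementation):

```python
class InputDataModule:
    def collect(self):
        """Return the latest camera frame, IMU data, gravity axis and user input."""
        ...

class MultiAnchorTracker:
    def update(self, frame, pose):
        """Track each anchor's image block, triangulate and scale its coordinates
        (S3021-S3023, S3025-S3027, S3031-S3033, S3035-S3037)."""
        ...

class OutputResultModule:
    def unify(self, anchors, pose):
        """Unify all scaled anchor coordinates under one terminal pose (S3028A-S3029A)."""
        ...

class RenderingModule:
    def draw(self, pose, anchor_points, virtual_objects):
        """Render the virtual objects at the unified coordinates over the camera frame."""
        ...
```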
In summary, the flow chart of the human-computer interaction method in the AR scene provided by the embodiment of the present application may be simplified to be shown in fig. 18.
In an embodiment, the description is given taking as an example that identifiers of virtual objects to be selected are displayed on the interface of the terminal; the identifier of a virtual object to be selected may refer to the description of the icon 43 of at least one virtual object to be selected. In this embodiment, referring to fig. 19, the man-machine interaction method in an AR scene provided by the embodiment of the present application may include:
S1901, shooting a real scene, and displaying the shot real scene on the interface of the terminal, wherein the interface of the terminal also displays the identifiers of the virtual objects to be selected.
S1901 may refer to the related description of S301. In the embodiment of the application, the identifier of the virtual object to be selected is also displayed on the interface of the terminal.
S1902, in response to an operation of the user for identification of the first virtual object, displaying the first virtual object in a real scene photographed by the terminal.
The user operates the identifiers of different virtual objects, which can trigger the terminal to display the virtual objects at the corresponding positions of the real scene shot by the terminal. It should be noted that, in the embodiment of the present application, the terminal displays the virtual object at the corresponding position of the real scene shot by the terminal; when the picture shot by the terminal is switched, the virtual object may no longer be displayed in the real scene shot by the terminal, but when the picture shot by the terminal is switched back to the real scene in which the virtual object was displayed, the terminal may display the virtual object in the shot real scene again, which may be described with reference to fig. 14.
Wherein S1902 may refer to the related description in S302.
S1903, in response to the user' S operation of the identification of the second virtual object, displaying the second virtual object in the real scene in which the first virtual object has been displayed.
Wherein S1903 may refer to the related description in S303. S1902-S1903 may also refer to the drawing shown in fig. 5.
The implementation principle and technical effects of the embodiment of the present application are similar to those of the above embodiment, and are not repeated here.
Fig. 20 is a schematic structural diagram of a terminal according to an embodiment of the present application. Referring to fig. 20, the electronic device 2000 may include: a photographing module 2001, a display module 2002, and a processing module 2003. In one embodiment, the capturing module 2001 may be included in the input data module, the display module 2002 is included in the rendering generation module, and the processing module 2003 may include a multi-anchor real-time tracking and management module, an output result module, and a user interaction module.
In one embodiment, the capturing module 2001 is configured to capture a real scene.
The display module 2002 is configured to display a photographed real scene on an interface of a terminal, where an identifier of a virtual object to be selected is also displayed on the interface, and in response to an operation of a user on the identifier of a first virtual object, display the first virtual object in the real scene photographed by the terminal, and in response to an operation of the user on the identifier of a second virtual object, display the second virtual object in the real scene on which the first virtual object is displayed.
In a possible implementation manner, the processing module 2003 is configured to obtain a first pose of the terminal in response to an operation of the user on the identification of the first virtual object; acquiring a first mapping point of the first position on a first virtual plane, wherein the first position is a preset position on the interface or a position determined by the user on the interface; and acquiring the three-dimensional coordinates of the first mapping point in a camera coordinate system.
The display module 2002 is specifically configured to display the first virtual object in a real scene shot by the terminal according to the first pose and the three-dimensional coordinates of the first mapping point in the camera coordinate system.
In one possible implementation, the processing module 2003 is specifically configured to acquire two-dimensional coordinates of the first location in an image coordinate system; and taking an intersection point of the ray from the first pose to the two-dimensional coordinate and the first virtual plane as the first mapping point.
In a possible implementation manner, the first virtual plane is included in a preset virtual plane set, and the processing module 2003 is further configured to obtain an intersection point of the ray from the first pose to the two-dimensional coordinates and another virtual plane in the preset virtual plane set if there is no intersection point between the ray from the first pose to the two-dimensional coordinates and the first virtual plane; and take the intersection point of the ray and the other virtual plane in the preset virtual plane set as the first mapping point.
In a possible implementation manner, the processing module 2003 is further configured to track an image block corresponding to the first position; and when the terminal is in the second pose, acquiring the two-dimensional coordinates of the first position in the image coordinate system.
The display module 2002 is further configured to display the first virtual object in a real scene photographed by the terminal according to the first pose, a two-dimensional coordinate of the first position in the image coordinate system when the first pose is performed, the second pose, and a two-dimensional coordinate of the first position in the image coordinate system when the second pose is performed.
In one possible implementation, the distance between the second pose and the first pose is greater than or equal to a distance threshold; and/or the frame number of the image shot by the terminal in the process of moving from the first pose to the second pose is greater than or equal to a preset frame number; and/or the duration of the terminal moving from the first pose to the second pose is longer than a preset duration; and/or the second pose is a pose when the terminal successfully triangulates the first mapping point.
In a possible implementation manner, the processing module 2003 is specifically configured to obtain the three-dimensional coordinate of the first position in the world coordinate system according to the first pose, the two-dimensional coordinate of the first position in the image coordinate system in the first pose, the second pose, and the two-dimensional coordinate of the first position in the image coordinate system in the second pose; according to a first distance and a second distance, obtaining a scaling corresponding to the first position, wherein the first distance is as follows: a distance from the first pose to a three-dimensional coordinate of the first mapping point in the camera coordinate system, wherein the second distance is: a distance from the first pose to a three-dimensional coordinate of the first position in the world coordinate system; and respectively scaling the three-dimensional coordinates of the second pose and the first position in the world coordinate system according to the scaling ratio corresponding to the first position to obtain scaled three-dimensional coordinates corresponding to the third pose and the first position.
The display module 2002 is further configured to display the first virtual object in the real scene shot by the terminal according to the third pose and the scaled three-dimensional coordinate corresponding to the first position.
In a possible implementation manner, the processing module 2003 is specifically configured to scale the three-dimensional coordinate of the first location in the world coordinate system to the three-dimensional coordinate of the first mapping point in the camera coordinate system, where the scaled three-dimensional coordinate of the first location is the same as the three-dimensional coordinate of the first mapping point in the camera coordinate system.
In a possible implementation manner, the terminal is in the first pose when the user performs the operation of identifying the second virtual object, and the processing module 2003 is specifically configured to obtain, in response to the operation of the user on the identification of the second virtual object, a second mapping point of the second position on the second virtual plane, where the second position is a preset position on the interface or a position determined by the user on the interface; and acquiring the three-dimensional coordinates of the second mapping point in the camera coordinate system.
The display module 2002 is further configured to display the second virtual object in a real scene in which the first virtual object is displayed according to the first pose and the three-dimensional coordinates of the second mapping point in the camera coordinate system.
In a possible implementation manner, when the terminal is at the second pose, the processing module 2003 is specifically configured to scale, according to the scaling ratio corresponding to the second position, the second pose and the three-dimensional coordinates of the second position in the world coordinate system, so as to obtain the fifth pose and the scaled three-dimensional coordinates corresponding to the second position, where the three-dimensional coordinates of the second position in the world coordinate system are obtained based on the first pose, the two-dimensional coordinates of the second position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinates of the second position in the image coordinate system at the second pose.
The display module 2002 is specifically configured to display the second virtual object in the real scene in which the first virtual object is displayed according to the fifth pose and the scaled three-dimensional coordinate corresponding to the second position.
In one possible implementation, the processing module 2003 is specifically configured to translate the fifth pose to the third pose; and translating the scaled three-dimensional coordinate corresponding to the second position according to the distance and the direction from the fifth pose to the third pose, so as to obtain the scaled three-dimensional coordinate corresponding to the second position under the third pose.
The display module 2002 is specifically configured to display the second virtual object in the real scene in which the first virtual object is displayed according to the third pose and the scaled three-dimensional coordinate corresponding to the second position in the third pose.
The terminal provided by the embodiment of the application can execute the steps in the embodiment, can realize the technical effects in the embodiment, and can refer to the related description in the embodiment.
In an embodiment, referring to fig. 21, an embodiment of the present application further provides an electronic device, where the electronic device may be the terminal described in the foregoing embodiments, and the electronic device may include: a processor 2101 (e.g., a CPU) and a memory 2102. The memory 2102 may include a high-speed random-access memory (RAM) and may further include a non-volatile memory (NVM), such as at least one disk memory; various instructions may be stored in the memory 2102 for performing various processing functions and implementing the method steps of the present application.
Optionally, the electronic device according to the present application may further include: a power supply 2103, a communication bus 2104, a communication port 2105, and a display 2106. The communication port 2105 is used to enable connection communication between the electronic device and other peripheral devices. In an embodiment of the application, memory 2102 is used to store computer executable program code, which includes instructions; when the processor 2101 executes the instructions, the instructions cause the processor 2101 of the electronic device to perform the actions in the above-described method embodiments, and the implementation principle and technical effects are similar, and are not described herein again. The display 2106 is used for displaying a real scene photographed by the terminal and displaying a virtual object in the real scene.
It should be noted that the modules or components described in the above embodiments may be one or more integrated circuits configured to implement the above methods, for example: one or more application specific integrated circuits (application specific integrated circuit, ASIC), or one or more microprocessors (digital signal processor, DSP), or one or more field programmable gate arrays (field programmable gate array, FPGA), or the like. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general purpose processor, such as a central processing unit (central processing unit, CPU) or another processor that can invoke the program code, such as a controller. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.) means from one website, computer, server, or data center. Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc., that contain an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.
The term "plurality" herein refers to two or more. The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship; in the formula, the character "/" indicates that the front and rear associated objects are a "division" relationship. In addition, it should be understood that in the description of the present application, the words "first," "second," and the like are used merely for distinguishing between the descriptions and not for indicating or implying any relative importance or order.
It will be appreciated that the various numerical numbers referred to in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application.
It should be understood that, in the embodiment of the present application, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Claims (15)

1. A human-machine interaction method in an augmented reality AR scene, comprising:
shooting a real scene, and displaying the shot real scene on an interface of the terminal, wherein the interface is also displayed with an identification of a virtual object to be selected;
responding to the operation of the user on the identification of a first virtual object, and displaying the first virtual object in a real scene shot by the terminal;
and responding to the operation of the user on the identification of the second virtual object, and displaying the second virtual object in the real scene on which the first virtual object is displayed.
2. The method according to claim 1, wherein the displaying the first virtual object in the real scene photographed by the terminal in response to the user's operation of the identification of the first virtual object includes:
responding to the operation of the user on the identification of the first virtual object, and acquiring a first pose of the terminal;
acquiring a first mapping point of a first position on a first virtual plane, wherein the first position is a preset position on the interface or a position determined by the user on the interface;
acquiring a three-dimensional coordinate of the first mapping point in a camera coordinate system;
And displaying the first virtual object in the real scene shot by the terminal according to the first pose and the three-dimensional coordinates of the first mapping point in the camera coordinate system.
3. The method of claim 2, wherein the acquiring a first mapping point of the first position on the first virtual plane comprises:
acquiring a two-dimensional coordinate of the first position in an image coordinate system;
and taking an intersection point of a ray from the first pose to the two-dimensional coordinate with the first virtual plane as the first mapping point.
4. The method of claim 3, wherein the first virtual plane is included in a set of preset virtual planes, the method further comprising:
if the ray from the first pose to the two-dimensional coordinate has no intersection point with the first virtual plane, acquiring an intersection point of the ray from the first pose to the two-dimensional coordinate with another virtual plane in the preset virtual plane set;
and taking the intersection point of the ray with the other virtual plane in the preset virtual plane set as the first mapping point.
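For readers less familiar with the geometry behind claims 3 and 4, the following sketch illustrates one way such a mapping point could be computed: a ray is cast from the camera pose through the 2D interface position and intersected with the planes of a preset set, starting with the first virtual plane. It is an illustrative sketch only, not the claimed implementation; the intrinsic matrix K, the plane representation (n·x + d = 0), and all function and variable names are assumptions made for illustration.

```python
# Illustrative sketch only, not the patented implementation.
import numpy as np

def first_mapping_point(pixel_uv, K, cam_center, R_cam_to_world, planes):
    """Return the mapping point of a 2D image position on the first plane the ray hits."""
    u, v = pixel_uv
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-project the pixel
    ray_world = R_cam_to_world @ ray_cam
    ray_world /= np.linalg.norm(ray_world)

    for normal, d in planes:                              # first virtual plane comes first
        denom = float(normal @ ray_world)
        if abs(denom) < 1e-9:                             # ray parallel to this plane: try the next one
            continue
        t = -(float(normal @ cam_center) + d) / denom
        if t > 0.0:                                       # intersection in front of the camera
            return cam_center + t * ray_world             # the mapping point
    return None                                           # no plane in the preset set is hit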
5. The method according to claim 3 or 4, further comprising, after the displaying the first virtual object in the real scene shot by the terminal according to the first pose and the three-dimensional coordinates of the first mapping point in the camera coordinate system:
tracking an image block corresponding to the first position;
when the terminal is in a second pose, acquiring a two-dimensional coordinate of the first position in the image coordinate system;
and displaying the first virtual object in the real scene shot by the terminal according to the first pose, the two-dimensional coordinate of the first position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinate of the first position in the image coordinate system at the second pose.
6. The method of claim 5, wherein:
the distance between the second pose and the first pose is greater than or equal to a distance threshold; and/or,
the number of image frames shot by the terminal in the process of moving from the first pose to the second pose is greater than or equal to a preset number of frames; and/or,
the duration of the terminal moving from the first pose to the second pose is longer than a preset duration; and/or,
the second pose is the pose of the terminal when the first mapping point is successfully triangulated.
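As a purely illustrative reading of claim 6, the check below treats the terminal's current pose as the second pose once any of the listed conditions holds. The concrete thresholds and the simple OR-combination are assumptions, since the claim allows the conditions individually or in any combination.

```python
# Illustrative sketch; threshold values and parameter names are assumptions, not from the patent.
def is_second_pose(moved_dist_m, frames_since_first, elapsed_s, triangulated_ok,
                   dist_threshold=0.05, min_frames=10, min_duration_s=0.5):
    return (moved_dist_m >= dist_threshold
            or frames_since_first >= min_frames
            or elapsed_s >= min_duration_s
            or triangulated_ok)
```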
7. The method according to claim 5 or 6, wherein the displaying the first virtual object in the real scene shot by the terminal according to the first pose, the two-dimensional coordinate of the first position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinate of the first position in the image coordinate system at the second pose comprises:
acquiring a three-dimensional coordinate of the first position in a world coordinate system according to the first pose, the two-dimensional coordinate of the first position in the image coordinate system at the first pose, the second pose, and the two-dimensional coordinate of the first position in the image coordinate system at the second pose;
obtaining, according to a first distance and a second distance, a scaling corresponding to the first position, wherein the first distance is the distance from the first pose to the first mapping point, and the second distance is the distance from the first pose to the three-dimensional coordinate of the first position in the world coordinate system;
according to the scaling corresponding to the first position, scaling the second pose and the three-dimensional coordinate of the first position in the world coordinate system respectively, to obtain a third pose and a scaled three-dimensional coordinate corresponding to the first position;
and displaying the first virtual object in the real scene shot by the terminal according to the third pose and the scaled three-dimensional coordinate corresponding to the first position.
8. The method of claim 7, wherein the scaling the three-dimensional coordinate of the first position in the world coordinate system according to the scaling corresponding to the first position comprises:
scaling the three-dimensional coordinate of the first position in the world coordinate system to the three-dimensional coordinate of the first mapping point in the camera coordinate system, wherein the scaled three-dimensional coordinate corresponding to the first position is the same as the three-dimensional coordinate of the first mapping point in the camera coordinate system.
9. The method of any one of claims 2-8, wherein the terminal is in the first pose when the user performs the operation on the identification of the second virtual object, and the displaying the second virtual object in the real scene in which the first virtual object is displayed in response to the operation of the user on the identification of the second virtual object comprises:
in response to the operation of the user on the identification of the second virtual object, acquiring a second mapping point of a second position on a second virtual plane, wherein the second position is a preset position on the interface or a position determined by the user on the interface;
acquiring a three-dimensional coordinate of the second mapping point in the camera coordinate system;
and displaying the second virtual object in the real scene in which the first virtual object is displayed according to the first pose and the three-dimensional coordinate of the second mapping point in the camera coordinate system.
10. The method of claim 7, wherein the second pose is a pose of the terminal after the user performs the operation on the identification of the second virtual object, the method further comprising:
when the terminal is in the second pose, scaling, according to a scaling corresponding to the second position, the second pose and a three-dimensional coordinate of the second position in the world coordinate system respectively, to obtain a fifth pose and a scaled three-dimensional coordinate corresponding to the second position, wherein the three-dimensional coordinate of the second position in the world coordinate system is acquired based on the first pose, a two-dimensional coordinate of the second position in the image coordinate system at the first pose, the second pose, and a two-dimensional coordinate of the second position in the image coordinate system at the second pose;
and displaying the second virtual object in the real scene in which the first virtual object is displayed according to the fifth pose and the scaled three-dimensional coordinate corresponding to the second position.
11. The method of claim 10, wherein the displaying the second virtual object in the real scene in which the first virtual object has been displayed according to the fifth pose and the scaled three-dimensional coordinates corresponding to the second position comprises:
translating the fifth pose to the third pose;
translating the scaled three-dimensional coordinate corresponding to the second position according to the distance and the direction from the fifth pose to the third pose, so as to obtain the scaled three-dimensional coordinate corresponding to the second position under the third pose;
and displaying the second virtual object in the real scene in which the first virtual object is displayed according to the third pose and the scaled three-dimensional coordinate corresponding to the second position under the third pose.
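Claim 11's final step amounts to applying one rigid translation to both the fifth pose and the scaled coordinate of the second position, so that the second virtual object can be rendered from the same third pose as the first. The sketch below is illustrative only; representing poses by their translation vectors, and the function and parameter names, are assumptions made for illustration.

```python
# Minimal illustrative sketch of the claim 11 translation step.
import numpy as np

def align_second_object(fifth_pose_t, third_pose_t, scaled_second_point):
    offset = third_pose_t - fifth_pose_t                   # distance and direction from the fifth pose to the third pose
    point_under_third_pose = scaled_second_point + offset  # scaled coordinate of the second position under the third pose
    return third_pose_t, point_under_third_pose
```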
12. A human-machine interaction device in an augmented reality AR scene, comprising:
a shooting module, configured to shoot a real scene;
a display module, configured to:
display the shot real scene on an interface of a terminal, wherein an identification of a virtual object to be selected is also displayed on the interface;
in response to an operation of a user on the identification of a first virtual object, display the first virtual object in the real scene shot by the terminal;
and in response to an operation of the user on the identification of a second virtual object, display the second virtual object in the real scene in which the first virtual object is displayed.
13. An electronic device, comprising: a processor and a memory;
The memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the method of any one of claims 1-11.
14. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program or instructions, which when executed, implement the method of any of claims 1-11.
15. A computer program product comprising a computer program or instructions which, when executed by a processor, implements the method of any of claims 1-11.
CN202210168435.1A 2022-02-23 2022-02-23 Man-machine interaction method and device in augmented reality AR scene and electronic equipment Pending CN116679824A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210168435.1A CN116679824A (en) 2022-02-23 2022-02-23 Man-machine interaction method and device in augmented reality AR scene and electronic equipment
PCT/CN2022/134830 WO2023160072A1 (en) 2022-02-23 2022-11-28 Human-computer interaction method and apparatus in augmented reality (ar) scene, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210168435.1A CN116679824A (en) 2022-02-23 2022-02-23 Man-machine interaction method and device in augmented reality AR scene and electronic equipment

Publications (1)

Publication Number Publication Date
CN116679824A true CN116679824A (en) 2023-09-01

Family

ID=87764577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210168435.1A Pending CN116679824A (en) 2022-02-23 2022-02-23 Man-machine interaction method and device in augmented reality AR scene and electronic equipment

Country Status (2)

Country Link
CN (1) CN116679824A (en)
WO (1) WO2023160072A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015192436A (en) * 2014-03-28 2015-11-02 キヤノン株式会社 Transmission terminal, reception terminal, transmission/reception system and program therefor
CN108305325A (en) * 2017-01-25 2018-07-20 网易(杭州)网络有限公司 The display methods and device of virtual objects
CN108520552A (en) * 2018-03-26 2018-09-11 广东欧珀移动通信有限公司 Image processing method, device, storage medium and electronic equipment
JP6548241B1 (en) * 2018-07-14 2019-07-24 株式会社アンジー Augmented reality program and information processing apparatus
US11367250B2 (en) * 2019-03-18 2022-06-21 Geomagical Labs, Inc. Virtual interaction with three-dimensional indoor room imagery

Also Published As

Publication number Publication date
WO2023160072A1 (en) 2023-08-31

Similar Documents

Publication Publication Date Title
CN111815755B (en) Method and device for determining blocked area of virtual object and terminal equipment
KR101453815B1 (en) Device and method for providing user interface which recognizes a user's motion considering the user's viewpoint
WO2019242262A1 (en) Augmented reality-based remote guidance method and device, terminal, and storage medium
US11222471B2 (en) Implementing three-dimensional augmented reality in smart glasses based on two-dimensional data
EP3713220A1 (en) Video image processing method and apparatus, and terminal
US9256986B2 (en) Automated guidance when taking a photograph, using virtual objects overlaid on an image
JP2022537614A (en) Multi-virtual character control method, device, and computer program
WO2019196745A1 (en) Face modelling method and related product
WO2018112788A1 (en) Image processing method and device
US20140320404A1 (en) Image processing device, image processing method, and program
TW201346640A (en) Image processing device, and computer program product
WO2018233623A1 (en) Method and apparatus for displaying image
US11044398B2 (en) Panoramic light field capture, processing, and display
US20220375258A1 (en) Image processing method and apparatus, device and storage medium
JP2013164697A (en) Image processing device, image processing method, program and image processing system
CN113426117A (en) Virtual camera shooting parameter acquisition method and device, electronic equipment and storage medium
WO2023168957A1 (en) Pose determination method and apparatus, electronic device, storage medium, and program
JP2023172893A (en) Control method, control device, and recording medium for interactive three-dimensional representation of target object
CN114529606A (en) Pose detection method and device, electronic equipment and storage medium
WO2023273499A1 (en) Depth measurement method and apparatus, electronic device, and storage medium
CN115278084A (en) Image processing method, image processing device, electronic equipment and storage medium
JP5518677B2 (en) Virtual information giving apparatus and virtual information giving program
CN114140536A (en) Pose data processing method and device, electronic equipment and storage medium
CN112365530A (en) Augmented reality processing method and device, storage medium and electronic equipment
WO2023160072A1 (en) Human-computer interaction method and apparatus in augmented reality (ar) scene, and electronic device

Legal Events

Date Code Title Description
PB01 Publication