CN112947756A - Content navigation method, device, system, computer equipment and storage medium

Info

Publication number: CN112947756A
Authority: CN (China)
Prior art keywords: target, navigation, content, video frame, frame image
Legal status: Pending
Application number: CN202110235626.0A
Other languages: Chinese (zh)
Inventors: 王子彬, 孙红亮, 揭志伟, 潘思霁
Current assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Original assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202110235626.0A
Publication of CN112947756A

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
                    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
                        • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T19/00 Manipulating 3D models or images for computer graphics
                    • G06T19/006 Mixed reality
                • G06T7/00 Image analysis
                    • G06T7/70 Determining position or orientation of objects or cameras
                        • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
                • G06T2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T2207/30 Subject of image; Context of image processing
                        • G06T2207/30244 Camera pose

Abstract

The present disclosure provides a content navigation method, apparatus, system, computer device and storage medium, wherein the method comprises: acquiring a video frame image of a target scene where an AR device is located; sending the video frame image to a server; receiving target AR navigation content returned by the server based on the video frame image, the target AR navigation content being used to indicate navigation information in the target scene; and displaying, in an associated manner, a target navigation object in the video frame image and the target AR navigation content. In this way, the navigation content of each navigation object in the target scene can be updated conveniently and in a timely manner.

Description

Content navigation method, device, system, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of Augmented Reality (AR) technologies, and in particular, to a content navigation method, apparatus, system, computer device, and storage medium.
Background
In scenes such as scenic spots and resorts, visitors are usually presented with introductory information about each navigation object in the scene through physical signboards. The content of a signboard is fixed; to update or replace it, the old board must be manually discarded and a new one installed, which is unfavorable to adding and promptly updating navigation information.
Disclosure of Invention
The embodiment of the disclosure at least provides a content navigation method, a content navigation device, a content navigation system, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a content navigation method applied to an augmented reality (AR) device, where the content navigation method includes: acquiring a video frame image of a target scene where the AR device is located; sending the video frame image to a server; receiving target AR navigation content returned by the server based on the video frame image, where the target AR navigation content is used to indicate navigation information in the target scene; and displaying, in an associated manner, a target navigation object in the video frame image and the target AR navigation content.
In this way, AR navigation content is set for each target navigation object in the target scene, and, when the user is navigating, the target AR navigation content is determined for the AR device, from the AR navigation content corresponding to the target scene, directly based on the video frame image of the target scene where the AR device is located. Navigation content can therefore be added and updated in a timely manner without replacing physical signboards.
In an alternative embodiment, the target AR navigation content is determined based on a shooting angle change rate between the current video frame image and a historical video frame image of the current video frame image; and, for any navigation object in the target scene, different shooting angle change rate intervals correspond to different AR navigation content of the navigation object.
In this way, when the shooting angle change rate is small, the user is considered to have a high interest in the object currently displayed in the video frame image, so more and richer navigation content can be selectively displayed for the user. When the shooting angle change rate is large, the user is considered to have a low interest in the object currently displayed in the video frame image, so only brief navigation content needs to be displayed for the user; this reduces the transmission of target AR navigation content between the AR device and the server and saves the data traffic consumed by transmission.
In an optional embodiment, the method further comprises: in response to a trigger instruction input by a user for the target AR navigation content, executing an operation corresponding to the trigger instruction by using the target AR navigation content; the trigger instruction comprises at least one of the following: a voice playing instruction, a sharing instruction, a screenshot instruction, a translation instruction, a zoom-in display instruction, a screen recording instruction, a closing instruction and an evaluation instruction.
Therefore, the user can flexibly operate the target AR navigation content, and the use of the user is facilitated.
In an optional implementation manner, for the case that the trigger instruction includes the voice playing instruction, executing the operation corresponding to the trigger instruction by using the target AR navigation content includes: playing voice content in the target AR navigation content, and/or reading out text content in the target AR navigation content by voice. For the case that the trigger instruction includes the sharing instruction, the executing includes: generating sharing information based on the target AR navigation content, and publishing the sharing information to a target sharing platform corresponding to the sharing instruction. For the case that the trigger instruction includes the translation instruction, the executing includes: translating the target AR navigation content into a target language. For the case that the trigger instruction includes the zoom-in display instruction, the executing includes: displaying the target AR navigation content in an enlarged manner in an interactive interface of the AR device. For the case that the trigger instruction includes the closing instruction, the executing includes: closing the target AR navigation content. For the case that the trigger instruction includes the evaluation instruction, the executing includes: generating an evaluation request for the target AR navigation content based on evaluation content input by the user, and sending the evaluation request to the server; and receiving and displaying an evaluation result fed back by the server based on the evaluation request, where the evaluation result includes target AR navigation content updated based on the evaluation content.
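A minimal sketch of how the AR device might dispatch these trigger instructions is given below. All names here (Trigger, ARNavigationContent, handle_trigger, and the device/server helper methods) are illustrative assumptions rather than interfaces from the disclosure:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Trigger(Enum):
    VOICE_PLAY = auto()
    SHARE = auto()
    TRANSLATE = auto()
    ZOOM_IN = auto()
    CLOSE = auto()
    EVALUATE = auto()

@dataclass
class ARNavigationContent:
    text: str = ""
    voice: Optional[bytes] = None  # pre-recorded voice content, if any

def handle_trigger(trigger, content, device, server, **kw):
    """Execute the operation corresponding to a user trigger instruction."""
    if trigger is Trigger.VOICE_PLAY:
        # Play the embedded voice content, or read out the text content by voice.
        device.play_audio(content.voice if content.voice else device.tts(content.text))
    elif trigger is Trigger.SHARE:
        # Generate sharing information and publish it to the target sharing platform.
        server.publish(device.build_share_info(content), platform=kw["platform"])
    elif trigger is Trigger.TRANSLATE:
        device.show_text(server.translate(content.text, target_lang=kw["lang"]))
    elif trigger is Trigger.ZOOM_IN:
        device.ui.enlarge(content)   # enlarged display in the interactive interface
    elif trigger is Trigger.CLOSE:
        device.ui.dismiss(content)
    elif trigger is Trigger.EVALUATE:
        # Send the evaluation request; show the updated content the server feeds back.
        result = server.evaluate(content, kw["evaluation"])
        device.ui.show(result.updated_content)
```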
In an alternative embodiment, displaying, in an associated manner, the target navigation object in the video frame image and the target AR navigation content comprises: determining a display position of the target AR navigation content based on a position, in the video frame image, of the target navigation object corresponding to the target AR navigation content, and displaying the target AR navigation content based on the display position; or displaying the target AR navigation content at a preset position in an interactive interface of the AR device; or receiving first pose information sent by the server, where the first pose information is used to characterize a pose of the target AR navigation content relative to the AR device, and displaying the target AR navigation content based on the first pose information; or determining a gaze position of the user in the video frame image, determining, based on the gaze position, a target navigation object the user is gazing at, and displaying the target AR navigation content corresponding to that target navigation object.
Therefore, the display position of the target AR navigation content can be flexibly determined, and the situations that the navigation content shields the target object and the like are avoided.
In an optional implementation manner, in a case that a plurality of target AR navigation contents correspond to the same target navigation object, displaying, in an associated manner, the target navigation object in the video frame image and the target AR navigation contents includes: sequentially displaying the plurality of target AR navigation contents according to a preset order of the target AR navigation contents; or displaying the plurality of target AR navigation contents according to a duration for which the user gazes at the target navigation object; or determining a display order of the plurality of target AR navigation contents according to user attributes of the user, and displaying the target AR navigation contents in that display order; where the user attributes include at least one of age, gender, expression, and interest preference.
In this way, the display order of the target AR navigation content is determined for different users in a targeted manner, making the content more relevant to each user and its display more flexible.
In a second aspect, an embodiment of the present disclosure provides a content navigation method applied to a server, where the content navigation method includes: receiving a video frame image sent by an augmented reality (AR) device; determining, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to a target scene; determining target AR navigation content based on the second pose information; and sending the target AR navigation content to the AR device.
In an optional embodiment, after determining, based on the video frame image, the second pose information of the AR device in the scene coordinate system corresponding to the target scene, the method further includes: determining, based on the video frame image, a target navigation object included in the video frame image; and determining third pose information, in the scene coordinate system, of AR navigation content corresponding to the target navigation object. The determining target AR navigation content based on the second pose information then includes: determining the target AR navigation content based on the second pose information and the third pose information.
In an optional embodiment, the determining target AR navigation content based on the second pose information includes: determining a plurality of alternative AR navigation contents based on the second pose information; determining a shooting angle change rate between the current video frame image and a historical video frame image of the current video frame image; determining, from a plurality of shooting angle change rate intervals, a target shooting angle change rate interval to which the shooting angle change rate belongs; and determining, from the plurality of alternative AR navigation contents, the alternative AR navigation content corresponding to the target shooting angle change rate interval as the target AR navigation content.
In an optional embodiment, the method further comprises: determining first pose information based on the second pose information and the third pose information, where the first pose information characterizes a pose of the target AR navigation content relative to the AR device; and sending the first pose information to the AR device.
In an optional embodiment, the determining, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to a target scene includes: performing key point identification on the video frame image to obtain a first key point in the video frame image; determining, based on the first key point, a second key point matching the first key point from a high-precision three-dimensional map of the target scene; and determining the second pose information of the AR device in the scene coordinate system based on a three-dimensional coordinate value of the second key point in the scene coordinate system.
In an optional implementation manner, after performing key point identification on the video frame image to obtain the first key point in the video frame image, the method further includes: determining a target pixel point corresponding to the first key point in the video frame image; and determining a two-dimensional coordinate value of the target pixel point in an image coordinate system corresponding to the video frame image. The determining the second pose information of the AR device in the scene coordinate system based on the three-dimensional coordinate value of the second key point in the scene coordinate system then includes: determining the second pose information of the AR device in the scene coordinate system based on the two-dimensional coordinate value and the three-dimensional coordinate value of the second key point in the scene coordinate system.
In an optional embodiment, the determining, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to a target scene includes: performing scene key point identification on the video frame image, and determining a target pixel point corresponding to at least one scene key point in the video frame image; performing depth value prediction on the video frame image, and determining the depth value of the target pixel point in the video frame image; and determining the second pose information of the AR device based on the depth value of the target pixel point.
In an alternative embodiment, the representation of the target AR navigation content comprises at least one of: text, animation, images, and speech.
In a third aspect, an embodiment of the present disclosure further provides a content navigation apparatus, applied to an augmented reality (AR) device, including: an acquisition module, configured to acquire a video frame image of a target scene where the AR device is located; a first sending module, configured to send the video frame image to a server; a first receiving module, configured to receive target AR navigation content returned by the server based on the video frame image, where the target AR navigation content is used to indicate navigation information in the target scene; and an association module, configured to display, in an associated manner, the target navigation object in the video frame image and the target AR navigation content.
In an alternative embodiment, the target AR navigation content is determined based on a shooting angle change rate between the current video frame image and a historical video frame image of the current video frame image; and, for any navigation object in the target scene, different shooting angle change rate intervals correspond to different AR navigation content of the navigation object.
In an optional embodiment, the apparatus further includes a response module, configured to: in response to a trigger instruction input by a user for the target AR navigation content, execute an operation corresponding to the trigger instruction by using the target AR navigation content; the trigger instruction includes at least one of the following: a voice playing instruction, a sharing instruction, a screenshot instruction, a translation instruction, a zoom-in display instruction, a screen recording instruction, a closing instruction and an evaluation instruction.
In an optional implementation manner, for the case that the trigger instruction includes the voice playing instruction, the response module, when executing the operation corresponding to the trigger instruction by using the target AR navigation content, is configured to: play voice content in the target AR navigation content, and/or read out text content in the target AR navigation content by voice; for the case that the trigger instruction includes the sharing instruction, to: generate sharing information based on the target AR navigation content, and publish the sharing information to a target sharing platform corresponding to the sharing instruction; for the case that the trigger instruction includes the translation instruction, to: translate the target AR navigation content into a target language; for the case that the trigger instruction includes the zoom-in display instruction, to: display the target AR navigation content in an enlarged manner in an interactive interface of the AR device; for the case that the trigger instruction includes the closing instruction, to: close the target AR navigation content; and for the case that the trigger instruction includes the evaluation instruction, to: generate an evaluation request for the target AR navigation content based on evaluation content input by the user, send the evaluation request to the server, and receive and display an evaluation result fed back by the server based on the evaluation request, where the evaluation result includes target AR navigation content updated based on the evaluation content.
In an alternative embodiment, the association module, when displaying, in an associated manner, the target navigation object in the video frame image and the target AR navigation content, is configured to: determine a display position of the target AR navigation content based on a position, in the video frame image, of the target navigation object corresponding to the target AR navigation content, and display the target AR navigation content based on the display position; or display the target AR navigation content at a preset position in an interactive interface of the AR device; or receive first pose information sent by the server, where the first pose information is used to characterize a pose of the target AR navigation content relative to the AR device, and display the target AR navigation content based on the first pose information; or determine a gaze position of the user in the video frame image, determine, based on the gaze position, a target navigation object the user is gazing at, and display the target AR navigation content corresponding to that target navigation object.
In an optional implementation manner, in a case that a plurality of target AR navigation contents correspond to the same target navigation object, the association module, when displaying, in an associated manner, the target navigation object in the video frame image and the target AR navigation contents, is configured to: sequentially display the plurality of target AR navigation contents according to a preset order of the target AR navigation contents; or display the plurality of target AR navigation contents according to a duration for which the user gazes at the target navigation object; or determine a display order of the plurality of target AR navigation contents according to user attributes of the user, and display the target AR navigation contents in that display order; where the user attributes include at least one of age, gender, expression, and interest preference.
In a fourth aspect, an embodiment of the present disclosure further provides a content navigation apparatus, applied to a server, including: a second receiving module, configured to receive a video frame image sent by an AR device; a first determining module, configured to determine, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to a target scene; a second determining module, configured to determine target AR navigation content based on the second pose information; and a second sending module, configured to send the target AR navigation content to the AR device.
In an optional implementation manner, the apparatus further includes a third determining module, configured to: determine, based on the video frame image, a target navigation object included in the video frame image; and determine third pose information, in the scene coordinate system, of AR navigation content corresponding to the target navigation object; and the determining target AR navigation content based on the second pose information includes: determining the target AR navigation content based on the second pose information and the third pose information.
In an alternative embodiment, the third determining module, when determining the target AR navigation content based on the second pose information, is configured to: determine a plurality of alternative AR navigation contents based on the second pose information; determine a shooting angle change rate between the current video frame image and a historical video frame image of the current video frame image; determine, from a plurality of shooting angle change rate intervals, a target shooting angle change rate interval to which the shooting angle change rate belongs; and determine, from the plurality of alternative AR navigation contents, the alternative AR navigation content corresponding to the target shooting angle change rate interval as the target AR navigation content.
In an optional implementation manner, the apparatus further includes a third sending module, configured to: determine first pose information based on the second pose information and the third pose information, where the first pose information characterizes a pose of the target AR navigation content relative to the AR device; and send the first pose information to the AR device.
In an optional embodiment, the first determining module, when determining, based on the video frame image, the second pose information of the AR device in the scene coordinate system corresponding to the target scene, is configured to: perform key point identification on the video frame image to obtain a first key point in the video frame image; determine, based on the first key point, a second key point matching the first key point from a high-precision three-dimensional map of the target scene; and determine the second pose information of the AR device in the scene coordinate system based on a three-dimensional coordinate value of the second key point in the scene coordinate system.
In an optional implementation manner, the apparatus further includes a fourth determining module, configured to: determine a target pixel point corresponding to the first key point in the video frame image; and determine a two-dimensional coordinate value of the target pixel point in an image coordinate system corresponding to the video frame image; and the determining the second pose information of the AR device in the scene coordinate system based on the three-dimensional coordinate value of the second key point in the scene coordinate system includes: determining the second pose information of the AR device in the scene coordinate system based on the two-dimensional coordinate value and the three-dimensional coordinate value of the second key point in the scene coordinate system.
In an optional embodiment, the fourth determining module, when determining, based on the video frame image, the second pose information of the AR device in the scene coordinate system corresponding to the target scene, is configured to: perform scene key point identification on the video frame image, and determine a target pixel point corresponding to at least one scene key point in the video frame image; perform depth value prediction on the video frame image, and determine the depth value of the target pixel point in the video frame image; and determine the second pose information of the AR device based on the depth value of the target pixel point.
In an alternative embodiment, the representation of the target AR navigation content comprises at least one of: text, animation, images, and speech.
In a fifth aspect, an optional implementation manner of the present disclosure further provides a content navigation system, including an AR device and a server, wherein:
the AR device is configured to acquire a video frame image of a target scene where the AR device is located; send the video frame image to the server; receive target AR navigation content returned by the server based on the video frame image, the target AR navigation content being used to indicate navigation information in the target scene; and display, in an associated manner, a target navigation object in the video frame image and the target AR navigation content;
the server is configured to receive the video frame image sent by the AR device; determine, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to the target scene; determine target AR navigation content based on the second pose information; and send the target AR navigation content to the AR device.
In a sixth aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the steps in any one of the possible implementations of the first aspect or the second aspect are performed.
In a seventh aspect, an optional implementation of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where, when the computer program is run, the steps in any one of the possible implementations of the first aspect or the second aspect are performed.
For the description of the effects of the content navigation apparatus, the content navigation system, the computer device, and the computer-readable storage medium, reference is made to the description of the content navigation method, which is not repeated herein.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings here are incorporated in and form part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; for those of ordinary skill in the art, other related drawings can be derived from these drawings without creative effort.
Fig. 1 illustrates a flow chart of a content navigation method provided by an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of another content navigation method provided by an embodiment of the present disclosure;
fig. 3 is a diagram illustrating a specific example of display content of an AR device when target AR navigation content and a target AR navigation object are displayed in association in a content navigation method provided by an embodiment of the present disclosure;
fig. 4 is a diagram illustrating a specific example of display content of an AR device when target AR navigation content is displayed in an interactive interface according to first pose information in a content navigation method provided by an embodiment of the present disclosure;
fig. 5 is a diagram illustrating a specific example of content displayed by an AR device when text navigation content is displayed in a text box form in a content navigation method provided by an embodiment of the present disclosure;
FIG. 6 illustrates a schematic diagram of a content navigation apparatus provided by an embodiment of the present disclosure;
FIG. 7 illustrates a schematic diagram of another content navigation apparatus provided by an embodiment of the present disclosure;
FIG. 8 illustrates a schematic diagram of a content navigation system provided by an embodiment of the present disclosure;
fig. 9 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research has shown that current content navigation approaches generally set a physical sign in a scene to be toured or visited, with information such as text and images marked on the sign, from which the user obtains navigation information. The information in this content navigation mode is relatively fixed and cannot be changed flexibly. When new content is to be added, or existing content is to be changed, the original sign must be manually removed and replaced with a new one; this approach is unfavorable to adding and promptly updating navigation information.
Based on this research, the present disclosure provides a content navigation method, apparatus, system, computer device and storage medium: a video frame image of the scene where the AR device is located is acquired and sent to a server, and target AR navigation content returned by the server based on the video frame image is received and displayed. By dispensing with the traditional repeated-replacement approach, the navigation content of each navigation object in the target scene can be updated in time.
The above drawbacks were identified by the inventors through practical and careful study; therefore, the discovery of the above problems and the solutions the present disclosure proposes for them should be regarded as the inventors' contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, a content navigation method disclosed in the embodiments of the present disclosure is first described in detail, and an execution subject of the content navigation method provided in the embodiments of the present disclosure is generally a computer device with certain computing power, where the computer device includes, for example: an AR device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the content navigation method may be implemented by a processor invoking computer readable instructions stored in a memory.
The following describes a content navigation method provided by the embodiment of the present disclosure, with a server as an execution subject.
Referring to fig. 1, a flowchart of a content navigation method provided by an embodiment of the present disclosure is shown, where the method includes steps S101 to S104, where:
S101: receiving a video frame image sent by an augmented reality (AR) device;
S102: determining, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to a target scene;
S103: determining target AR navigation content based on the second pose information;
S104: sending the target AR navigation content to the AR device.
In the embodiments of the present disclosure, the received video frame image shot by the AR device is processed to determine the second pose information of the AR device in the target scene; the target AR navigation content is then determined according to the second pose information and sent to the AR device. Because AR navigation content is set for each target navigation object in the target scene, when the user is navigating, the target AR navigation content is determined for the AR device, from the AR navigation content corresponding to the target scene, directly based on the video frame image of the target scene where the AR device is located, so navigation content can be added and updated in time without replacing physical signboards.
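The server-side flow can be summarized in a short sketch; the helper functions here are placeholders for the steps detailed below, and all names are assumptions rather than disclosed interfaces:

```python
def estimate_device_pose(frame_image, scene_map):
    """S102: recover the AR device's second pose information in the scene
    coordinate system, e.g. by matching keypoints of the frame against a
    high-precision 3D map (detailed later in this section)."""
    raise NotImplementedError

def select_navigation_content(second_pose, content_index):
    """S103: choose target AR navigation content for the localized device,
    e.g. by the visible navigation objects and the shooting-angle change rate."""
    raise NotImplementedError

def handle_video_frame(frame_image, reply_to_device, scene_map, content_index):
    """S101-S104: receive a frame, localize the device, pick content, send it back."""
    second_pose = estimate_device_pose(frame_image, scene_map)              # S102
    target_content = select_navigation_content(second_pose, content_index)  # S103
    reply_to_device(target_content)                                         # S104
```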
The following describes the details of S101 to S104.
For the above S101: the AR device may, for example, acquire images of the target scene using an image acquisition apparatus and display AR information. The AR device may be provided with an image acquisition apparatus and an interactive interface; after the image acquisition apparatus is started, images within its shooting field of view are acquired in real time to form a video stream, and each frame of the video stream is displayed in real time in the interactive interface.
The video frame images sent by the AR device may include, for example, video frame images acquired by a user in a target scene; wherein, the target scene includes, but is not limited to, at least one of the following: outdoor scenes such as scenic spots, amusement parks and sports grounds, and closed places such as exhibition halls, offices, restaurants and houses.
When the user, carrying the AR device, is located in the target scene, the image acquisition apparatus in the AR device can shoot the target scene to obtain a video stream; the AR device can sample video frame images from the video stream and send them to the server.
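A minimal sketch of this sampling-and-upload loop on the AR device, assuming an OpenCV-style camera interface and an assumed server.send_frame() endpoint:

```python
import time

def stream_frames_to_server(camera, server, sample_hz=2.0):
    """Sample video frame images from the capture stream and upload them.

    camera: an object with an OpenCV-style read() -> (ok, frame) method.
    server: an object with an assumed send_frame(frame) upload method.
    """
    period = 1.0 / sample_hz
    while True:
        ok, frame = camera.read()   # grab the current video frame image
        if not ok:
            break                   # capture stopped
        server.send_frame(frame)    # send the sampled frame to the server
        time.sleep(period)          # sample at a fixed rate rather than every frame
```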
For the above S102: the server may determine, using the video frame image, the specific position and orientation of the AR device in the target scene, that is, the second pose information of the AR device in the target scene.
Specifically, the second pose information includes a first three-dimensional coordinate value of an optical center of an image acquisition device disposed on the AR device in a scene coordinate system, and optical axis orientation information of the image acquisition device; the optical axis orientation information may include, for example: the method comprises the following steps that the deflection angle and the pitch angle of an optical axis of image acquisition equipment in a scene coordinate system established based on a target scene are obtained; or the optical axis orientation information is, for example, a vector in the scene coordinate system.
In one possible implementation, when determining the second pose information of the AR device in the scene coordinate system corresponding to the target scene based on the video frame image acquired by the AR device, for example, the following manner may be adopted: performing key point identification on the video frame image to obtain a first key point in the video frame image; determining, based on the first key point, a second key point matching the first key point from a high-precision three-dimensional map of the target scene; and determining the second pose information of the AR device in the scene coordinate system based on a three-dimensional coordinate value of the second key point in the scene coordinate system.
In a specific implementation, the first key point includes, for example, at least one of: the key points of the contour information representing the contour of the object, the key points of the color block information representing the surface of the object and the key points of the texture change representing the surface of the object.
After key point identification is performed on the video frame image to obtain the first key point, a matching second key point is determined from the pre-constructed high-precision three-dimensional map of the target scene. The object characterized by the determined second key point is then the same object as that characterized by the first key point, and the three-dimensional coordinate value of the second key point in the high-precision three-dimensional map serves as the three-dimensional coordinate value of the first key point in that map.
Here, the high-precision three-dimensional map of the target scene may be obtained by any one of the following methods, for example: simultaneous Localization and Mapping (SLAM) modeling, and Structure-From-Motion (SFM) modeling.
Illustratively, when a high-precision three-dimensional map of a target scene is constructed, a three-dimensional coordinate system is established by taking a preset coordinate point as an origin; the preset coordinate point can be a building coordinate point in a target scene or a coordinate point where camera equipment is located when a camera collects the target scene; the method comprises the steps that a camera collects video frame images, and a high-precision three-dimensional map of a target scene is constructed by tracking a sufficient number of key points in the video frame of the camera; the key points in the constructed high-precision three-dimensional map of the target scene also include the key point information of the object, namely the second key points.
The first key point is matched against the high-precision three-dimensional map of the target scene to determine a second key point, and the three-dimensional coordinate value (x1, y1, z1) of the second key point in the high-precision three-dimensional map of the target scene is read. Then, the second pose information of the AR device in the scene coordinate system is determined based on the three-dimensional coordinate value of the second key point.
Specifically, when the second pose information of the AR device in the scene coordinate system is determined based on the three-dimensional coordinate value of the target second key point, the second pose information of the AR device in the high-precision three-dimensional map is recovered from the three-dimensional coordinate value of the target second key point in that map, for example by using the camera imaging principle.
When the second pose information of the AR device in the high-precision three-dimensional map is recovered using the camera imaging principle, a camera coordinate system may be constructed for the AR device. The origin of the camera coordinate system is the point where the optical center of the image acquisition device in the AR device is located; the z-axis is the straight line along the optical axis of the image acquisition device; and the plane through the optical center perpendicular to the optical axis is the plane of the x-axis and the y-axis. A depth detection algorithm can be used to determine the depth value corresponding to each pixel point in the video frame image; after a target pixel point is determined in the video frame image, the depth value h of the target pixel point in the camera coordinate system can be obtained, that is, the three-dimensional coordinate value, in the camera coordinate system, of the first key point corresponding to the target pixel point. Then, using the three-dimensional coordinate value of the first key point in the camera coordinate system and the three-dimensional coordinate value of the first key point in the scene coordinate system, the coordinate value of the origin of the camera coordinate system in the scene coordinate system is recovered, which is the position component of the second pose information of the AR device; and the angles of the camera z-axis relative to the coordinate axes of the scene coordinate system give the orientation component of the second pose information of the AR device.
For example, the three-dimensional coordinate value of the target pixel point in the camera coordinate system is denoted (x2, y2, h). Based on the acquired three-dimensional coordinate value (x1, y1, z1) of the target second key point in the scene coordinate system and the determined three-dimensional coordinate value (x2, y2, h) of the target pixel point in the camera coordinate system, the second pose information of the AR device in the scene coordinate system is determined according to the mapping relation (x1, y1, z1) → (x2, y2, h).
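This 2D-3D recovery step is, in effect, the classical perspective-n-point (PnP) problem: given pixel coordinates of the target pixel points and the scene-coordinate values of the matched second key points, solve for the camera pose. A minimal sketch follows, using OpenCV's generic PnP solver; the solver choice and the camera intrinsic matrix are assumptions, since the disclosure only requires that the pose be recovered from the correspondences:

```python
import cv2
import numpy as np

def recover_second_pose(points_3d, points_2d, camera_matrix, dist_coeffs=None):
    """Recover the AR device pose (second pose information) from matched keypoints.

    points_3d: (N, 3) coordinates of the matched second key points in the scene
               coordinate system, read from the high-precision 3D map (N >= 4).
    points_2d: (N, 2) pixel coordinates of the corresponding target pixel points
               in the video frame image.
    """
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        camera_matrix,
        dist_coeffs,
    )
    if not ok:
        raise RuntimeError("pose could not be recovered from the matches")
    R, _ = cv2.Rodrigues(rvec)                       # scene-to-camera rotation
    optical_center = (-R.T @ tvec).ravel()           # optical center in scene coords
    optical_axis = R.T @ np.array([0.0, 0.0, 1.0])   # camera z-axis in scene coords
    return optical_center, optical_axis
```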
In another possible implementation, when determining the second pose information of the AR device in the scene coordinate system corresponding to the target scene based on the video frame image acquired by the AR device, scene key point identification may first be performed on the video frame image to determine a target pixel point corresponding to at least one scene key point in the video frame image; depth value prediction may then be performed on the video frame image to determine the depth value corresponding to each pixel point; and the second pose information of the AR device may then be determined based on the depth values of the target pixel points.
The scene key point may be a preset key point in the scene where the AR device is located, for example, a table corner, a table lamp, a pot plant, and the like, and the depth value of the target pixel point may be used to represent a distance between the scene key point corresponding to the target pixel point and the AR device.
The position coordinates of the scene key points in the scene coordinate system are preset and fixed; by determining the target pixel point corresponding to at least one scene key point in the video frame image, the orientation information of the AR device in the scene coordinate system can be determined, and, based on the depth values of the target pixel points corresponding to the scene key points, the pose information of the AR device in the scene coordinate system, that is, the second pose information of the AR device, can be determined.
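One way this step could be realized is to back-project each target pixel point into camera coordinates using its predicted depth, then rigidly align those points with the known scene coordinates of the scene key points. A sketch under those assumptions (the intrinsic matrix K, the per-point depths from the prediction step, and the Kabsch alignment are choices not prescribed by the disclosure):

```python
import numpy as np

def pose_from_depth(pixels, depths, scene_points, K):
    """Alternative second-pose recovery from predicted depth values (a sketch).

    pixels:       (N, 2) target pixel points of recognized scene key points.
    depths:       (N,)   predicted depth value of each target pixel point.
    scene_points: (N, 3) preset, fixed coordinates of those scene key points
                  in the scene coordinate system (N >= 3, non-collinear).
    K:            3x3 camera intrinsic matrix (an assumed input).
    """
    # Back-project each target pixel into camera coordinates using its depth.
    ones = np.ones((len(pixels), 1))
    rays = np.linalg.inv(K) @ np.hstack([pixels, ones]).T  # (3, N), z = 1 rays
    cam_pts = (rays * depths).T                            # (N, 3) camera-frame points

    # Rigid alignment (Kabsch): find R, t with scene_points ~ R @ cam_pts + t.
    mu_c, mu_s = cam_pts.mean(0), scene_points.mean(0)
    H = (cam_pts - mu_c).T @ (scene_points - mu_s)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # fix reflections
    R = Vt.T @ D @ U.T
    t = mu_s - R @ mu_c
    return R, t   # camera-to-scene transform: encodes the second pose information
```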
For the above S103: the target AR navigation content is, for example, determined from preset AR content, based on the second pose information, as the AR content that can be displayed in the interactive interface of the AR device.
In one possible implementation, when determining the target AR navigation content based on the second pose information, for example, the following manner may be adopted: determining, based on the video frame image, a target navigation object included in the video frame image; determining third pose information, in the scene coordinate system, of AR navigation content corresponding to the target navigation object; and determining the target AR navigation content based on the second pose information and the third pose information.
Illustratively, the navigation objects may include objects within the target scene, including, for example, sights, museum exhibits, functional buildings, and the like. For at least part of the navigation objects within the target scene, AR navigation content corresponding to the objects within the target scene may be preset.
In a specific implementation, the target navigation object refers to at least a part of the navigation object included in the video frame image acquired by the AR device. The target navigation object may be obtained, for example, in the following manner: the method comprises the steps of carrying out key point identification on a video frame image to obtain a first key point, determining a target second key point matched with the first key point from second key points of a high-precision three-dimensional map of a target scene based on the first key point, and determining a navigation object to which the second key point belongs as a target navigation object.
Then, based on the target navigation object and the association between the target navigation object and the AR navigation content, the AR navigation content associated with the target navigation object is determined.
In one possible implementation, all AR navigation content associated with the target navigation object may be taken as target AR navigation content.
In another possible implementation, the target AR navigation content may be determined from all AR navigation content associated with the target navigation object based on certain filtering criteria.
For example, when AR navigation content is set for a navigation object, third pose information within the target scene may be determined for that AR navigation content.
The third pose information of the AR navigation content in the target scene is, for example, the third pose information of the AR navigation content in the scene coordinate system, and includes, for example, the three-dimensional coordinate value of the AR navigation content in the scene coordinate system and the orientation of the AR navigation content in the scene coordinate system.
Then, after the target navigation object is determined, the target AR navigation content is determined based on the second pose information and the third pose information, in the scene coordinate system, of the AR navigation content corresponding to the target navigation object.
In another embodiment of the present disclosure, after determining the target AR navigation content, the server may further determine first pose information based on the second pose information and the third pose information, in the scene coordinate system, of the AR navigation content corresponding to the target navigation object. The first pose information is used to determine the display position and display posture with which the AR device displays the target AR navigation content. The server then sends the first pose information to the AR device, so that the AR device can display the target AR content based on the first pose information, giving the AR navigation content a richer display form.
Alternatively, the target AR navigation content may be determined according to the shooting angle change rate; specifically, the following manner may be adopted: determining a plurality of alternative AR navigation contents based on the second pose information; determining a shooting angle change rate between the current video frame image and a historical video frame image of the current video frame image; determining, from a plurality of shooting angle change rate intervals, a target shooting angle change rate interval to which the shooting angle change rate belongs; and determining the alternative AR navigation content corresponding to the target shooting angle change rate interval as the target AR navigation content.
When determining a plurality of alternative AR navigation contents based on the second pose information, since the second pose information includes pose information of the AR device in the target scene, it is possible to determine a target navigation object that the AR device can shoot in the target scene using the second pose information, and to take the AR navigation contents corresponding to the target navigation object as the alternative AR navigation contents.
For example, in the case that the target navigation object is a landscape tower, the corresponding AR navigation content may include the name "landscape tower". Compared with the other AR navigation content of that object, "landscape tower" is more concise and summarized; so when a user holds the AR device to shoot the landscape tower, only the AR navigation content "landscape tower" may be displayed in cases where the user shoots the tower for only a short time, or where the landscape tower is not the subject object the user is focusing on.
In addition, when the target navigation object is a landscape tower, the corresponding AR navigation content may further include detailed introduction information such as "the landscape tower was built in XXX, and the designer who designed and built it is XXXX……". When the user shoots the landscape tower with the AR device for a long time, or takes the landscape tower as the subject object of interest, this detailed introduction information may be displayed in addition to the brief AR navigation content "landscape tower".
Specifically, when determining the shooting angle change rate between the current video frame image and its historical video frame image, the change in the user's shooting angle relative to the target navigation object may be measured over a certain time interval, for example 1 second or 2 seconds. A plurality of shooting angle change rate intervals may also be set; taking a time interval of 1 second as an example, the change rate of the shooting angle of the target navigation object may fall within intervals such as [0 degrees/second, 10 degrees/second], [10 degrees/second, 20 degrees/second], ……, [170 degrees/second, 180 degrees/second]. Since a larger shooting angle change rate indicates lower user attention to the target navigation object, for a given target navigation object, the AR navigation content set for intervals with larger values, such as [160 degrees/second, 170 degrees/second] and [170 degrees/second, 180 degrees/second], may include the concise, summarized AR navigation content, for example "landscape tower"; while the AR navigation content set for intervals with smaller values, such as [0 degrees/second, 10 degrees/second] and [10 degrees/second, 20 degrees/second], may include the more detailed introduction information corresponding to the target navigation object, for example "the landscape tower was built in XXX, and the designer who designed and built it is XXXX……".
In this way, when the shooting angle change rate is small, the user is considered to have a high interest in the object currently displayed in the video frame image, so more and richer navigation content can be selectively displayed for the user; when the shooting angle change rate is large, the user is considered to have a low interest in the object currently displayed in the video frame image, so only brief navigation content needs to be displayed, which reduces the transmission of target AR navigation content between the AR device and the server and saves the data traffic consumed by transmission.
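A minimal sketch of this interval lookup; the interval table, the 1-second window, and all names are illustrative assumptions:

```python
def select_by_angle_rate(candidates, angle_history, dt=1.0):
    """Pick target AR navigation content by shooting-angle change rate.

    candidates:    dict mapping (low, high) change-rate intervals, in
                   degrees/second, to the AR navigation content set for them.
    angle_history: shooting angles (degrees) of a historical and the current frame.
    dt:            time interval between the two frames, in seconds.
    """
    previous, current = angle_history[-2], angle_history[-1]
    rate = abs(current - previous) / dt          # shooting angle change rate
    for (low, high), content in candidates.items():
        if low <= rate < high:                   # target change-rate interval
            return content
    return None

# Example: a slow pan gets the detailed introduction, a fast pan the brief name.
tower_contents = {
    (0.0, 10.0): "The landscape tower was built in ... (detailed introduction)",
    (10.0, 180.0): "Landscape tower",
}
print(select_by_angle_rate(tower_contents, [5.0, 8.0]))  # 3 deg/s -> detailed
```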
For the above S104, the server sends the determined target AR navigation content to the AR device, so that the AR device presents the target AR navigation content to the user.
The following describes a content navigation method provided by the embodiment of the present disclosure, taking an execution subject as an AR device as an example.
Referring to fig. 2, a flow chart of another content navigation method provided in the embodiment of the present disclosure is shown, the method includes:
S201: acquiring a video frame image of a target scene where the AR device is located;
S202: sending the video frame image to a server;
S203: receiving target AR navigation content returned by the server based on the video frame image, the target AR navigation content being used to indicate navigation information in the target scene;
S204: displaying, in an associated manner, a target navigation object in the video frame image and the target AR navigation content.
The detailed processes of S201 to S203 may refer to the embodiment corresponding to fig. 1, and are not described herein again.
In the above S204, when the target navigation object and the target AR navigation content in the video frame image are displayed in association with each other, any one of the following manners (a1) to (a4) may be adopted:
(a1): determining the display position of the target AR navigation content based on the position of the target navigation object corresponding to the target AR navigation content in the video frame image, and displaying the target AR navigation content based on the display position.
In this way, the AR device may directly display the target AR navigation content and the target navigation object in association after determining the target AR navigation content.
Specifically, the AR device can recognize the target navigation object from the acquired video frame image, determine the position of the target navigation object in the video frame image, and then, based on the position, associate and display the corresponding target AR navigation content and the target navigation object.
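As a concrete illustration of manner (a1), the following minimal sketch places a label relative to a recognized object's bounding box; the box format, label sizes, and margin are illustrative assumptions rather than a specified layout.

```python
def label_anchor(obj_box, frame_w, frame_h, label_w, label_h, margin=8):
    """obj_box = (x, y, w, h) of the target navigation object in the frame;
    returns the top-left pixel at which to draw the AR label."""
    x, y, w, h = obj_box
    lx = x + w // 2 - label_w // 2           # center the label horizontally
    ly = y - label_h - margin                # prefer placing it above the object
    if ly < 0:                               # fall back to below if no room
        ly = y + h + margin
    lx = max(0, min(lx, frame_w - label_w))  # keep the label inside the frame
    ly = max(0, min(ly, frame_h - label_h))
    return lx, ly

print(label_anchor((400, 60, 200, 300), 1280, 720, 240, 48))
```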
When the target AR navigation content and the target navigation object are presented in association, for example as shown in fig. 3, the determined target navigation objects include a restaurant, a bar, a coffee shop, and an architectural attraction; the restaurant, the bar, and the coffee shop may be indicated by the pointing target AR navigation contents 31, 32, and 33 in fig. 3, respectively; for the architectural attraction, highlighting AR content may be added to it and a textual AR navigation content 34 may be associated with it, so that the target AR navigation content 34 corresponding to the architectural attraction is displayed in association with the attraction.
In addition, referring to the example of fig. 3 above, when the AR device presents the target AR navigation content, the content may be rendered on the interactive interface in a plurality of presentation forms, for example, a form in which the target navigation object is highlighted (a highlight, or a colored halo with a highlighting effect surrounding the contour of the target navigation object), a form in which the target navigation object is pointed to by a certain pattern, and the like.
(a2): displaying the target AR navigation content at a preset position in the interactive interface of the AR device.
The preset position may include any area where the target AR navigation content can be displayed without blocking the captured video frame image; for example, the video frame image may be displayed in the upper three quarters of the display space of the interactive interface, and the target AR navigation content in the lower quarter.
(a3): receiving first pose information sent by the server, the first pose information being used to characterize the pose of the target AR navigation content relative to the AR device, and displaying the target AR navigation content based on the first pose information.
At this time, the display position and display pose of the target AR navigation content in the interactive interface of the AR device can be determined using the first pose information received from the server together with the camera imaging principle. The display pose refers to the pose in which the target AR navigation content is displayed in the interactive interface of the AR device. Specifically, when the user shoots the target navigation object using the AR device, the display pose of the corresponding target AR navigation content changes as the shooting angle of the AR device toward the target navigation object changes. The display position refers to the position at which the target AR navigation content is rendered in the interactive interface.
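The "camera imaging principle" mentioned above can be made concrete with the standard pinhole projection: given the content's position in the device's camera coordinate system (derivable from the first pose information), the camera intrinsics map it to a display position on the interface. The intrinsics and point below are illustrative assumptions.

```python
import numpy as np

def project_to_screen(p_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = p_cam
    if z <= 0:
        return None                 # behind the camera: nothing to render
    u = fx * x / z + cx             # standard pinhole projection
    v = fy * y / z + cy
    return u, v

# Content 2 m ahead of and slightly left of the camera; assumed intrinsics.
print(project_to_screen(np.array([-0.3, 0.1, 2.0]), 1000.0, 1000.0, 960.0, 540.0))
```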
As shown in fig. 4, a specific example of presenting the target AR navigation content in the interactive interface according to the first pose information is provided; in this example, 41 represents the display form of the target AR navigation content when the target navigation object in the figure is a "building". The target AR navigation content may be displayed in different orientations according to the first pose information.
(a4): determining the sight line position of the user in the video frame image, determining the target navigation object gazed at by the user based on the sight line position, and displaying the target AR navigation content corresponding to that target navigation object.
Specifically, eye tracking technology may be used to track the user's gaze: when a person's eyes look in different directions, subtle changes in the eyes produce extractable features; the computer extracts these features through image capture or scanning, tracks the changes of the eyes in real time, and determines the user's sight line position accordingly. After the sight line position is determined, the AR device may determine the target navigation object gazed at by the user based on the sight line position, and then display the target AR navigation content corresponding to that target navigation object.
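A minimal sketch of the gaze-based selection in manner (a4), under assumed inputs: the eye tracker yields a gaze point in image coordinates, and each recognized navigation object carries a bounding box; the data shapes below are illustrative assumptions.

```python
def gazed_object(gaze_xy, objects):
    """objects: list of dicts like {"name": ..., "box": (x, y, w, h)};
    returns the object whose box contains the gaze point, with the nearest
    box center breaking ties between overlapping boxes."""
    gx, gy = gaze_xy
    hits = [o for o in objects
            if o["box"][0] <= gx <= o["box"][0] + o["box"][2]
            and o["box"][1] <= gy <= o["box"][1] + o["box"][3]]
    if not hits:
        return None
    def center_dist(o):
        x, y, w, h = o["box"]
        return (gx - (x + w / 2)) ** 2 + (gy - (y + h / 2)) ** 2
    return min(hits, key=center_dist)

objs = [{"name": "restaurant", "box": (100, 200, 150, 120)},
        {"name": "bar", "box": (300, 180, 140, 130)}]
print(gazed_object((350, 220), objs)["name"])  # -> "bar"
```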
In another embodiment of the present disclosure, the determined target AR navigation contents may all be displayed simultaneously in the interactive interface; in addition, when the number of target AR navigation contents is greater than 1, the AR device may present them in any one of the following manners (b1) to (b3), for example:
(b1): sequentially displaying the target AR navigation contents according to a preset sequence of the target AR navigation contents.
For example, the preset sequence may be determined in one of the following manners:
Manner one: according to the first pose information of the target AR navigation contents, the distance between each of the more-than-one target AR navigation contents and the AR device is determined, and the preset display sequence of the target AR navigation contents is determined by sorting these distances from near to far; a sketch is shown after manner two below.
Manner two: key target AR navigation contents among the plurality of target AR navigation contents are predetermined. For example, several key target navigation objects in the target scene may be determined in advance; these may include target navigation objects such as the "treasure of the exhibition hall", or an "escape sign" requiring particular attention. The key target AR navigation content corresponding to each key target navigation object may then be determined. In this case, the preset sequence displays the key target AR navigation contents first, and then the other target AR navigation contents.
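The following minimal sketch illustrates the distance-based ordering of manner one, assuming each content's position relative to the AR device has already been recovered from its first pose information; the field names are illustrative assumptions.

```python
import math

def presentation_order(contents):
    """Sort AR navigation contents nearest-first by their distance to the
    AR device, each content's position relative to the device being taken
    from its first pose information."""
    return sorted(contents, key=lambda c: math.dist((0.0, 0.0, 0.0),
                                                    c["pos_device"]))

items = [{"text": "far exhibit intro",  "pos_device": (5.0, 0.0, 12.0)},
         {"text": "near exhibit intro", "pos_device": (0.5, 0.0, 2.0)}]
print([c["text"] for c in presentation_order(items)])  # nearest first
```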
(b2): displaying the plurality of target AR navigation contents according to the duration for which the user gazes at the target navigation object.
For example, eye tracking technology may be used to determine the target navigation object that the user gazes at continuously in the target scene, take that continuously attended object as the key target navigation object, and preferentially display its corresponding target AR navigation content. In this case, the display sequence presents the key target AR navigation content first, and then the other target AR navigation contents.
(b3): determining the display sequence of the target AR navigation contents according to user attributes of the user, and displaying the target AR navigation contents according to the display sequence; wherein the user attributes include at least one of age, gender, expression, and interest bias.
For example, the user attributes may be obtained first: when the user uses the content navigation method, the user may fill in a questionnaire containing information such as age, gender, and interest bias. After collecting the attribute information, the user's preference can be determined; for instance, user A may be determined to prefer modern art decoration exhibits, while user B may be determined to prefer weapon exhibits with historical background. When user A enters the target scene, the display sequence may be determined so that the target AR navigation contents corresponding to modern art decoration exhibits among the target navigation objects are presented first, followed by the target AR navigation contents corresponding to the other target navigation objects. Likewise, when user B enters the target scene, the display sequence may present the target AR navigation contents corresponding to weapon exhibits with historical background first, followed by those corresponding to the other target navigation objects.
In addition, the user's expression when viewing target navigation objects in the target scene may be recognized, and whether to preferentially push the corresponding target AR navigation content may then be decided according to the recognized expression toward different target navigation objects. For example, when the user views a displayed sculpture and the user's expression is detected to be one of interest, the display sequence preferentially displays the target AR navigation content corresponding to that sculpture, followed by the target AR navigation contents corresponding to the other target navigation objects.
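A minimal sketch of the attribute-based ordering of manner (b3), under assumed data shapes: each content carries category tags and the user profile records an interest bias; the tag and field names are illustrative assumptions.

```python
def order_by_interest(contents, interest_bias):
    """contents: list of dicts like {"text": ..., "tags": {...}};
    contents matching the interest bias are promoted to the front
    (sorted is stable, so relative order is otherwise preserved)."""
    return sorted(contents, key=lambda c: interest_bias not in c["tags"])

items = [{"text": "bronze sword intro", "tags": {"weapons", "history"}},
         {"text": "abstract mural intro", "tags": {"modern-art"}}]
print([c["text"] for c in order_by_interest(items, "modern-art")])
```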
The AR navigation content includes, for example, at least one of: text AR navigation content, animated AR navigation content, image AR navigation content, and voice AR navigation content. In a specific implementation, taking a venue as the navigation object as an example, each kind of content corresponds to navigation information preset for the venue, for example, venue introduction information, venue tour route indication information, venue historical background introduction information, venue historical figure introduction information, and the like.
Taking the display of text AR navigation content as an example: when the target AR navigation content is text AR navigation content, a pop-up window containing the text navigation information may be displayed directly in the interactive interface of the AR device, or the content may be displayed on the video frame image in the form of a text box according to its relative positional relationship with the target navigation object.
Two specific examples of directly displaying text AR navigation content are shown in fig. 3 and fig. 5: 34 in fig. 3 represents text AR navigation content presented in the form of a text box, and 51 in fig. 5 represents a directly displayed pop-up window containing text navigation information.
The content navigation method provided in another embodiment of the present disclosure may further include: in response to a trigger by the user on the target AR navigation content, performing the operation corresponding to the trigger using the target AR navigation content. The trigger includes at least one of: voice playing, sharing, screen capture, translation, enlarged display, screen recording, closing, and positioning.
Illustratively, when the AR device displays the target navigation object and the corresponding target AR navigation content in association, a corresponding trigger control for the target AR navigation content may be provided to the user, and when the user triggers the corresponding trigger control as needed, the AR device may directly perform a corresponding operation based on the triggering of the user.
In addition, the AR device may also respond to a trigger instruction input by the user for the target AR navigation content, and perform the operation corresponding to the trigger instruction using the target AR navigation content. The trigger instruction includes at least one of the following: a voice playing instruction, a sharing instruction, a screenshot instruction, a translation instruction, an enlarged display instruction, a screen recording instruction, a closing instruction, and an evaluation instruction.
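How a trigger instruction maps to an operation can be sketched as a simple dispatch table; the instruction names and handler bodies below are illustrative assumptions, not an actual device API.

```python
def handle_trigger(instruction, content, handlers):
    """Look up and run the operation registered for a trigger instruction."""
    handler = handlers.get(instruction)
    if handler is None:
        raise ValueError(f"unsupported trigger instruction: {instruction}")
    return handler(content)

handlers = {
    "voice_play": lambda c: print("playing voice for:", c["title"]),
    "translate":  lambda c: print("translating:", c["title"]),
    "enlarge":    lambda c: print("enlarging:", c["title"]),
    "close":      lambda c: print("closing:", c["title"]),
}
handle_trigger("voice_play", {"title": "landscape tower intro"}, handlers)
```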
Specifically, under the condition that the trigger instruction includes a voice playing instruction, when the AR device executes an operation corresponding to the voice playing instruction by using the target AR navigation content, the AR device may play the voice content in the target AR navigation content, and/or broadcast the text content in the target AR navigation content by voice.
For example, the target AR navigation content may include text navigation explanation information for a scenic spot, such as a passage of text about the spot's historical development. The AR device may determine the corresponding voice information based on the text information in the target AR navigation content and play it; alternatively, the voice information may be preset, and after the user triggers the voice playing instruction, the AR device may directly play the preset voice information.
Under the condition that the trigger instruction comprises the sharing instruction, when the AR device executes the operation corresponding to the sharing instruction by using the target AR navigation content, the sharing information can be generated based on the target AR navigation content, and the sharing information is issued to the target sharing platform corresponding to the sharing instruction.
For example, when the user inputs an operation corresponding to the sharing instruction for the target AR navigation content, the AR device may generate sharing information containing content from the target AR navigation content. For example, when the target AR navigation content includes explanation information for a certain sculpture in the target scene, the explanation information and a captured image may be used to generate the sharing information, which is then published to the target sharing platform corresponding to the sharing instruction. The target sharing platform may be, without limitation, a web page or application software. In addition, after a user on the target sharing platform triggers the sharing information, that user may also view the detailed information in the sharing information and/or jump to another platform where the sharing information can be viewed. The specifics may be determined according to actual situations and are not detailed here.
In the case where the trigger instruction includes a translation instruction, the AR device may translate the target AR navigation content into the target language while performing an operation corresponding to the translation instruction using the target AR navigation content.
For example, since a scenic area may receive visitors from around the world, some visitors may need navigation content in different languages. To display the target AR navigation content intuitively, the interactive interface may show the content in one language together with a translation trigger control for language translation. After a user with a translation need triggers this control and selects a suitable target language, the AR device displays the translated target AR navigation content in the interactive interface, so that users with different language needs can read the target AR navigation content on the interactive interface.
In a case that the trigger instruction includes an enlarged display instruction, the AR device, when performing the operation corresponding to the instruction using the target AR navigation content, may enlarge and display the target AR navigation content in the interactive interface.
Illustratively, for numerous small-volume exhibits such as jewelry, when the AR device photographs several pieces of jewelry, the target AR navigation contents determined for each piece can be displayed in the interactive interface. With many target AR navigation contents displayable at once, the user can enlarge any content of interest, and the AR device then displays the enlarged target AR navigation content on the interactive interface for convenient viewing.
In addition, enlarging the target AR navigation content also makes it easier for users with weak eyesight to view, further improving the user experience.
In a case that the trigger instruction includes a close instruction, the AR device may close the target AR navigation content in the interactive interface when performing an operation corresponding to the close instruction using the target AR navigation content.
For example, a user who does not need to view the target AR navigation content, such as a tour guide who has visited the target scene many times or a staff member working there, may trigger the close trigger control on the interactive interface to close part or all of the target AR navigation content displayed on the interactive interface.
In a case where the trigger instruction includes an evaluation instruction, the AR device, when performing the operation corresponding to the evaluation instruction using the target AR navigation content, may generate an evaluation request for the target AR navigation content based on the evaluation content input by the user and send the evaluation request to the server. The evaluation content may include comments on the target AR navigation content itself, such as "this navigation content is valuable and worth learning", or "there is a historical time error in the navigation content".
After receiving the evaluation request, the server may determine a corresponding evaluation result and send it to the AR device. For example, after receiving the comment "this navigation content is valuable and worth learning", the server can have the relevant reviewers confirm that the target AR navigation content the comment refers to has no problem, so the content can be kept unchanged; after receiving the comment "there is a historical time error in the navigation content", the relevant reviewers confirm that the target AR navigation content has a problem that needs correction, and the content can be withdrawn, corrected, and then re-released. Instead of manual processing, a model such as a convolutional neural network may also be used to process the corresponding target AR navigation content according to the comment content.
The above listed trigger instructions are merely examples, and the trigger instructions and the actual processing procedures corresponding to the trigger instructions are not limited herein.
It will be understood by those skilled in the art that, in the above method, the order in which the steps are written does not imply a strict order of execution or constitute any limitation on the implementation; the specific order of execution should be determined by the function of each step and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure also provide a content navigation apparatus corresponding to the content navigation method. Since the principle by which the apparatus solves the problem is similar to that of the content navigation method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 6, a schematic diagram of a content navigation apparatus provided in an embodiment of the present disclosure is shown. The apparatus is applied to an augmented reality (AR) device and includes an acquisition module 61, a first sending module 62, a first receiving module 63, and an association module 64, wherein:
the acquisition module 61 is configured to acquire a video frame image of a target scene where the AR device is located; a first sending module 62, configured to send the video frame image to a server; a first receiving module 63, configured to receive target AR navigation content returned by the server based on the video frame image; the target AR navigation content is used to indicate navigation information in the target scene; and an association module 64, configured to associate and display the target navigation object and the target AR navigation content in the video frame image.
In an alternative embodiment, the target AR navigation content is determined based on the shooting angle change rate between the current video frame image and a historical video frame image of the current video frame image; and for any navigation object in the target scene, different shooting angle change rate intervals correspond to different AR navigation contents of the navigation object.
In an optional embodiment, the apparatus further includes a response module 65 configured to: responding to a trigger instruction input by the user for the target AR navigation content, and performing the operation corresponding to the trigger instruction using the target AR navigation content; the trigger instruction includes at least one of the following: a voice playing instruction, a sharing instruction, a screenshot instruction, a translation instruction, an enlarged display instruction, a screen recording instruction, a closing instruction, and an evaluation instruction.
In an optional implementation manner, for the case where the trigger instruction includes the voice playing instruction, the response module 65, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: playing voice content in the target AR navigation content, and/or broadcasting the text content in the target AR navigation content by voice; for the case where the trigger instruction includes the sharing instruction, the response module 65, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: generating sharing information based on the target AR navigation content, and publishing the sharing information to the target sharing platform corresponding to the sharing instruction; for the case where the trigger instruction includes the translation instruction, the response module 65, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: translating the target AR navigation content into a target language; for the case where the trigger instruction includes the enlarged display instruction, the response module 65, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: enlarging and displaying the target AR navigation content in the interactive interface of the AR device; for the case where the trigger instruction includes the closing instruction, the response module 65, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: closing the target AR navigation content; for the case where the trigger instruction includes the evaluation instruction, the response module 65, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: generating an evaluation request for the target AR navigation content based on the evaluation content input by the user, and sending the evaluation request to the server; and receiving and displaying the evaluation result fed back by the server based on the evaluation request, the evaluation result including target AR navigation content updated based on the evaluation content.
In an alternative embodiment, the association module 64, when displaying the target navigation object and the target AR navigation content in the video frame image in association, is configured to: determining the display position of the target AR navigation content based on the position, in the video frame image, of the target navigation object corresponding to the target AR navigation content, and displaying the target AR navigation content based on the display position; or displaying the target AR navigation content at a preset position in the interactive interface of the AR device; or receiving first pose information sent by the server, the first pose information being used to characterize the pose of the target AR navigation content relative to the AR device, and displaying the target AR navigation content based on the first pose information; or determining the sight line position of the user in the video frame image, determining the target navigation object gazed at by the user based on the sight line position, and displaying the target AR navigation content corresponding to that target navigation object.
In an optional implementation manner, in a case that the number of the target AR navigation contents corresponding to the same target navigation object is multiple, when the target navigation object in the video frame image and the target AR navigation contents are displayed in an associated manner, the associating module 64 is configured to: sequentially displaying a plurality of target AR navigation contents according to a preset sequence of the target AR navigation contents; or, displaying a plurality of target AR navigation contents according to a duration that a user gazes at the target navigation object; or determining a display sequence of the target AR navigation contents according to the user attributes of the user, and displaying the target AR navigation contents according to the display sequence; wherein the user attributes include: at least one of age, gender, expression, interest bias.
Referring to fig. 7, a schematic diagram of another content navigation apparatus provided in an embodiment of the present disclosure is shown. The apparatus is applied to a server and includes a second receiving module 71, a first determining module 72, a second determining module 73, and a second sending module 74, wherein:
a second receiving module 71, configured to receive a video frame image sent by an augmented reality AR device; a first determining module 72, configured to determine, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to a target scene; a second determining module 73, configured to determine target AR navigation content based on the second pose information; a second sending module 74, configured to send the target AR navigation content to the AR device.
In an optional embodiment, the apparatus further includes a third determining module 75 configured to: determining a target navigation object included in the video frame image based on the video frame image; and determining third pose information, in the scene coordinate system, of the AR navigation content corresponding to the target navigation object. In this case, the determining target AR navigation content based on the second pose information includes: determining the target AR navigation content based on the second pose information and the third pose information.
In an alternative embodiment, the third determining module 75, when determining the target AR navigation content based on the second pose information, is configured to: determining a plurality of alternative AR navigation contents based on the second pose information; determining a shooting angle change rate between the current video frame image and a historical video frame image of the current video frame image; determining a target shooting angle change rate section to which the shooting angle change rate belongs from a plurality of shooting angle change rate sections; and determining the alternative AR navigation content corresponding to the target shooting angle change rate interval as the target AR navigation content from the multiple alternative AR navigation contents.
In an optional implementation, the apparatus further includes a third sending module 76 configured to: determining first pose information based on the second pose information and the third pose information, the first pose information characterizing the pose of the target AR navigation content relative to the AR device; and sending the first pose information to the AR device.
In an optional embodiment, the first determining module 72, when determining, based on the video frame image, the second pose information of the AR device in the scene coordinate system corresponding to the target scene, is configured to: performing key point identification on the video frame image to obtain a first key point in the video frame image; and determining, based on the first key point, a second key point matching the first key point from a high-precision three-dimensional map of the target scene, and determining the second pose information of the AR device in the scene coordinate system based on the three-dimensional coordinate value of the second key point in the scene coordinate system.
In an optional embodiment, the apparatus further includes a fourth determining module 77 configured to: determining a target pixel point corresponding to the first key point in the video frame image; and determining the two-dimensional coordinate value of the target pixel point in the image coordinate system corresponding to the video frame image. In this case, the determining the second pose information of the AR device in the scene coordinate system based on the three-dimensional coordinate value of the second key point in the scene coordinate system includes: determining the second pose information of the AR device in the scene coordinate system based on the two-dimensional coordinate value and the three-dimensional coordinate value of the second key point in the scene coordinate system.
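Pairing the 2D pixel coordinates of first key points with the 3D scene coordinates of their matched second key points is the classic Perspective-n-Point (PnP) problem. The sketch below uses OpenCV's solvePnP; the intrinsics and correspondences are illustrative assumptions, with the 2D points synthesized from a known pose so the recovered device position can be checked.

```python
import numpy as np
import cv2

# Assumed camera intrinsics of the AR device.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# 3D scene coordinates of matched second key points (from the 3D map);
# the corresponding 2D pixel points are generated from a known pose here.
pts_3d = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.2], [1.0, 1.0, 0.0],
                   [0.0, 1.0, 0.3], [0.5, 0.5, 1.0], [0.2, 0.8, 0.5]])
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.3, -0.1, 4.0])
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_true, tvec_true, K, None)

ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, None)
if ok:
    R, _ = cv2.Rodrigues(rvec)                    # rotation: scene -> camera
    device_pos = (-R.T @ tvec.reshape(3, 1)).ravel()
    print(device_pos)                             # device position in scene coords
```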
In an optional embodiment, the fourth determining module 77, when determining, based on the video frame image, the second pose information of the AR device in the scene coordinate system corresponding to the target scene, is configured to: performing scene key point identification on the video frame image, and determining a target pixel point corresponding to at least one scene key point in the video frame image; predicting depth values for the video frame image, and determining the depth value of the target pixel point in the video frame image; and determining the second pose information of the AR device based on the depth value of the target pixel point.
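For the depth-based alternative, a minimal sketch is back-projecting a target pixel with its predicted depth into the camera frame via the inverse pinhole model; paired with the scene key point's known scene coordinates, such points constrain the device pose. The intrinsics and values below are illustrative assumptions.

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with predicted depth into a 3D point in
    the camera coordinate system (inverse of the pinhole projection)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Target pixel of a scene key point with a predicted depth of 2.4 m.
print(backproject(350.0, 260.0, 2.4, 800.0, 800.0, 320.0, 240.0))
```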
In an alternative embodiment, the representation of the target AR navigation content comprises at least one of: text, animation, images, and speech.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
The embodiment of the disclosure also provides a content navigation system including the above AR device and server. Referring to fig. 8, a schematic diagram of the content navigation system provided by the embodiment of the present disclosure is shown, including an AR device 82 held by a user 81 and a server 83, wherein:
the AR device 82 is configured to acquire a video frame image of the target scene where the AR device is located; send the video frame image to the server; receive target AR navigation content returned by the server based on the video frame image, the target AR navigation content being used to indicate navigation information in the target scene; and display the target navigation object in the video frame image in association with the target AR navigation content;
the server 83 is configured to receive the video frame image sent by the augmented reality AR device; determine, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to the target scene; determine target AR navigation content based on the second pose information; and send the target AR navigation content to the AR device.
In an alternative embodiment, the target AR navigation content is determined based on the shooting angle change rate between the current video frame image and a historical video frame image of the current video frame image; and for any navigation object in the target scene, different shooting angle change rate intervals correspond to different AR navigation contents of the navigation object.
In an optional embodiment, the AR device 82 is further configured to: respond to a trigger instruction input by the user for the target AR navigation content, and perform the operation corresponding to the trigger instruction using the target AR navigation content; the trigger instruction includes at least one of the following: a voice playing instruction, a sharing instruction, a screenshot instruction, a translation instruction, an enlarged display instruction, a screen recording instruction, a closing instruction, and an evaluation instruction.
In an optional implementation manner, for the case where the trigger instruction includes the voice playing instruction, the AR device 82, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: play voice content in the target AR navigation content, and/or broadcast the text content in the target AR navigation content by voice; for the case where the trigger instruction includes the sharing instruction, the AR device 82, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: generate sharing information based on the target AR navigation content, and publish the sharing information to the target sharing platform corresponding to the sharing instruction; for the case where the trigger instruction includes the translation instruction, the AR device 82, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: translate the target AR navigation content into a target language; for the case where the trigger instruction includes the enlarged display instruction, the AR device 82, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: enlarge and display the target AR navigation content in the interactive interface of the AR device 82; for the case where the trigger instruction includes the closing instruction, the AR device 82, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: close the target AR navigation content; for the case where the trigger instruction includes the evaluation instruction, the AR device 82, when performing the operation corresponding to the trigger instruction using the target AR navigation content, is configured to: generate an evaluation request for the target AR navigation content based on the evaluation content input by the user, and send the evaluation request to the server; and receive and display the evaluation result fed back by the server based on the evaluation request, the evaluation result including target AR navigation content updated based on the evaluation content.
In an alternative embodiment, the AR device 82, when presenting the target navigation object and the target AR navigation content in the video frame image in association, is configured to: determine the display position of the target AR navigation content based on the position, in the video frame image, of the target navigation object corresponding to the target AR navigation content, and display the target AR navigation content based on the display position; or display the target AR navigation content at a preset position in the interactive interface of the AR device 82; or receive first pose information sent by the server, the first pose information being used to characterize the pose of the target AR navigation content relative to the AR device 82, and display the target AR navigation content based on the first pose information; or determine the sight line position of the user in the video frame image, determine the target navigation object gazed at by the user based on the sight line position, and display the target AR navigation content corresponding to that target navigation object.
In an optional implementation manner, in a case that the number of target AR navigation contents corresponding to the same target navigation object is multiple, when the target navigation object in the video frame image and the target AR navigation contents are presented in association, the AR device 82 is configured to: sequentially displaying a plurality of target AR navigation contents according to a preset sequence of the target AR navigation contents; or, displaying a plurality of target AR navigation contents according to a duration that a user gazes at the target navigation object; or determining a display sequence of the target AR navigation contents according to the user attributes of the user, and displaying the target AR navigation contents according to the display sequence; wherein the user attributes include: at least one of age, gender, expression, interest bias.
In an optional embodiment, after determining, based on the video frame image, the second pose information of the AR device in the scene coordinate system corresponding to the target scene, the server 83 is further configured to: determine a target navigation object included in the video frame image based on the video frame image; and determine third pose information, in the scene coordinate system, of the AR navigation content corresponding to the target navigation object. The server 83, when determining the target AR navigation content based on the second pose information, is configured to: determine the target AR navigation content based on the second pose information and the third pose information.
In an alternative embodiment, the server 83, when determining the target AR navigation content based on the second pose information, is configured to: determining a plurality of alternative AR navigation contents based on the second pose information; determining a shooting angle change rate between the current video frame image and a historical video frame image of the current video frame image; determining a target shooting angle change rate section to which the shooting angle change rate belongs from a plurality of shooting angle change rate sections; and determining the alternative AR navigation content corresponding to the target shooting angle change rate interval as the target AR navigation content from the multiple alternative AR navigation contents.
In an optional embodiment, the server 83 is further configured to: determine first pose information based on the second pose information and the third pose information, the first pose information characterizing the pose of the target AR navigation content relative to the AR device; and send the first pose information to the AR device.
In an optional embodiment, the server 83, when determining, based on the video frame image, the second pose information of the AR device in the scene coordinate system corresponding to the target scene, is configured to: perform key point identification on the video frame image to obtain a first key point in the video frame image; and determine, based on the first key point, a second key point matching the first key point from a high-precision three-dimensional map of the target scene, and determine the second pose information of the AR device in the scene coordinate system based on the three-dimensional coordinate value of the second key point in the scene coordinate system.
In an optional implementation manner, after performing key point identification on the video frame image to obtain the first key point in the video frame image, the server 83 is further configured to: determine a target pixel point corresponding to the first key point in the video frame image; and determine the two-dimensional coordinate value of the target pixel point in the image coordinate system corresponding to the video frame image. The server 83, when determining the second pose information of the AR device in the scene coordinate system based on the three-dimensional coordinate value of the second key point in the scene coordinate system, is configured to: determine the second pose information of the AR device in the scene coordinate system based on the two-dimensional coordinate value and the three-dimensional coordinate value of the second key point in the scene coordinate system.
In an optional embodiment, the server 83, when determining, based on the video frame image, the second pose information of the AR device in the scene coordinate system corresponding to the target scene, is configured to: perform scene key point identification on the video frame image, and determine a target pixel point corresponding to at least one scene key point in the video frame image; predict depth values for the video frame image, and determine the depth value of the target pixel point in the video frame image; and determine the second pose information of the AR device based on the depth value of the target pixel point.
In an alternative embodiment, the representation of the target AR navigation content comprises at least one of: text, animation, images, and speech.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 9, which is a schematic structural diagram of the computer device provided in the embodiment of the present disclosure, and the computer device includes:
a processor 91 and a memory 92, the memory 92 storing machine-readable instructions executable by the processor 91; the processor 91 is configured to execute the machine-readable instructions stored in the memory 92, and when the instructions are executed, the processor 91 performs the following steps:
acquiring a video frame image of a target scene where the AR device is located; sending the video frame image to a server; receiving target AR navigation content returned by the server based on the video frame image, the target AR navigation content being used to indicate navigation information in the target scene; and displaying the target navigation object in the video frame image and the target AR navigation content in an associated manner.
Alternatively, the processor 91 performs the following steps:
receiving a video frame image sent by an AR device; determining second pose information of the AR device in a scene coordinate system corresponding to a target scene based on the video frame image; determining target AR navigation content based on the second pose information; and sending the target AR navigation content to the AR device.
The memory 92 includes an internal memory 921 and an external memory 922. The internal memory 921 temporarily stores operation data of the processor 91 and data exchanged with the external memory 922, such as a hard disk; the processor 91 exchanges data with the external memory 922 through the internal memory 921.
The specific execution process of the above instruction may refer to the steps of the content navigation method described in the embodiments of the present disclosure, and details are not described here.
Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the content navigation method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the content navigation method provided in the embodiments of the present disclosure carries a program code, and instructions included in the program code may be used to execute the steps of the content navigation method in the above method embodiments, which may be specifically referred to in the above method embodiments and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a logical division, and other divisions are possible in actual implementation; likewise, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific embodiments of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some technical features; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present disclosure and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (19)

1. A content navigation method applied to an Augmented Reality (AR) device, the content navigation method comprising:
acquiring a video frame image of a target scene where the AR equipment is located;
sending the video frame image to a server;
receiving target AR navigation content returned by the server based on the video frame image; the target AR navigation content is used to indicate navigation information in the target scene;
and displaying the target navigation object in the video frame image and the target AR navigation content in an associated manner.
2. The content navigation method according to claim 1, wherein the target AR navigation content is determined based on a shooting angle change rate between a current video frame image and a historical video frame image of the current video frame image;
and for any navigation object in the target scene, different shooting angle change rate intervals correspond to different AR navigation contents of the navigation object.
3. The content navigation method according to claim 1 or 2, further comprising:
responding to a trigger instruction input by a user to the target AR navigation content, and executing operation corresponding to the trigger instruction by using the target AR navigation content;
the trigger instruction comprises at least one of the following: a voice playing instruction, a sharing instruction, a screenshot instruction, a translation instruction, an enlarged display instruction, a screen recording instruction, a closing instruction, and an evaluation instruction.
4. The content navigation method according to claim 3, wherein for a case where the trigger instruction includes the voice playing instruction, the performing an operation corresponding to the trigger instruction with the target AR navigation content includes: playing voice content in the target AR navigation content, and/or broadcasting the text content in the target AR navigation content by voice;
for the case that the trigger instruction includes the sharing instruction, the executing, by using the target AR navigation content, an operation corresponding to the trigger instruction includes: generating sharing information based on the target AR navigation content, and issuing the sharing information to a target sharing platform corresponding to the sharing instruction;
for a case that the trigger instruction includes the translation instruction, the performing, by using the target AR navigation content, an operation corresponding to the trigger instruction includes: translating the target AR navigation content into a target language;
for a case that the trigger instruction includes the enlarged display instruction, the performing, with the target AR navigation content, an operation corresponding to the trigger instruction includes: enlarging and displaying the target AR navigation content in an interactive interface of the AR device;
for a case that the trigger instruction includes the close instruction, the performing, by using the target AR navigation content, an operation corresponding to the trigger instruction includes: closing the target AR navigation content;
for a case that the trigger instruction includes the evaluation instruction, the performing, by using the target AR navigation content, an operation corresponding to the trigger instruction includes: generating an evaluation request for the target AR navigation content based on the evaluation content input by the user, and sending the evaluation request to the server; receiving and displaying an evaluation result fed back by the server based on the evaluation request; the evaluation result includes target AR navigation content updated based on the evaluation content.
5. The content navigation method according to any one of claims 1 to 4, wherein the associating shows a target navigation object and the target AR navigation content in the video frame image, comprising:
determining a display position of the target AR navigation content based on a position of a target navigation object corresponding to the target AR navigation content in the video frame image, and displaying the target AR navigation content based on the display position; or
Displaying the target AR navigation content at a preset position in an interactive interface of the AR device; or
Receiving first pose information sent by the server; the first pose information is used to characterize a pose of the target AR navigation content relative to the AR device; displaying the target AR navigation content based on the first pose information; or
Determining a sight line position of a user in the video frame image, and determining a target navigation object watched by the user based on the sight line position; and displaying the target AR navigation content corresponding to the target navigation object watched by the user.
6. The content navigation method according to any one of claims 1 to 5, wherein in a case where the number of target AR navigation contents corresponding to the same target navigation object is plural, the associating shows the target navigation object and the target AR navigation contents in the video frame image, including:
sequentially displaying a plurality of target AR navigation contents according to a preset sequence of the target AR navigation contents;
or, displaying a plurality of target AR navigation contents according to a duration that a user gazes at the target navigation object;
or determining a display sequence of the target AR navigation contents according to the user attributes of the user, and displaying the target AR navigation contents according to the display sequence; wherein the user attributes include: at least one of age, gender, expression, interest bias.
7. A content navigation method applied to a server, the content navigation method comprising:
receiving a video frame image sent by Augmented Reality (AR) equipment;
determining second pose information of the AR device in a scene coordinate system corresponding to a target scene based on the video frame image;
determining target AR navigation content based on the second pose information;
and sending the target AR navigation content to the AR device.
8. The content navigation method according to claim 7, wherein after determining second pose information of the AR device in a scene coordinate system corresponding to a target scene based on the video frame image, the content navigation method further comprises:
determining a target navigation object included in the video frame image based on the video frame image;
determining third pose information of AR navigation content corresponding to the target navigation object in the scene coordinate system;
the determining of the target AR navigation content based on the second pose information comprises:
and determining the target AR navigation content based on the second pose information and the third pose information.
9. The content navigation method according to claim 7 or 8, wherein the determining target AR navigation content based on the second pose information comprises:
determining a plurality of alternative AR navigation contents based on the second pose information;
determining a shooting angle change rate between a current video frame image and a historical video frame image preceding the current video frame image;
determining, from a plurality of shooting angle change rate intervals, a target shooting angle change rate interval to which the shooting angle change rate belongs;
and determining the alternative AR navigation content corresponding to the target shooting angle change rate interval as the target AR navigation content.
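One way to realize claim 9's rate bucketing, assuming camera orientations are available as 3x3 rotation matrices for the current and historical frames. The interval edges and the one-candidate-per-interval pairing are illustrative.

```python
import numpy as np

def rotation_angle_deg(r_prev: np.ndarray, r_curr: np.ndarray) -> float:
    """Geodesic angle in degrees between two 3x3 camera rotation matrices."""
    cos_theta = np.clip((np.trace(r_curr @ r_prev.T) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

def pick_candidate(r_prev, r_curr, dt_seconds, interval_edges, candidates):
    """interval_edges such as [5.0, 30.0] splits the rate axis into three
    intervals; candidates holds one alternative AR navigation content per
    interval (len(candidates) == len(interval_edges) + 1)."""
    rate = rotation_angle_deg(r_prev, r_curr) / dt_seconds  # degrees per second
    return candidates[int(np.searchsorted(interval_edges, rate))]
```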
10. The content navigation method according to any one of claims 7 to 9, wherein the method further comprises:
determining first pose information based on the second pose information and the third pose information, wherein the first pose information characterizes a pose of the target AR navigation content relative to the AR device;
sending the first pose information to the AR device.
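The relative pose of claim 10 follows by composing the two scene-frame poses; a sketch assuming both are expressed as 4x4 homogeneous matrices:

```python
import numpy as np

def first_pose(second_pose: np.ndarray, third_pose: np.ndarray) -> np.ndarray:
    """Pose of the content relative to the AR device:
    T_content_in_device = inv(T_device_in_scene) @ T_content_in_scene."""
    return np.linalg.inv(second_pose) @ third_pose
```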
11. The content navigation method according to any one of claims 7 to 10, wherein the determining, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to a target scene comprises:
performing key point identification on the video frame image to obtain a first key point in the video frame image;
and determining, based on the first key point, a second key point matched with the first key point from a high-precision three-dimensional map of the target scene, and determining second pose information of the AR device in the scene coordinate system based on a three-dimensional coordinate value of the second key point in the scene coordinate system.
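A sketch of claim 11's matching step, assuming the high-precision three-dimensional map stores a binary descriptor and a 3D coordinate per mapped scene point. ORB features and brute-force Hamming matching stand in for whatever detector and matcher an implementation actually uses.

```python
import cv2
import numpy as np

def match_frame_to_map(frame_gray, map_descriptors, map_points_3d):
    """Detect first key points in the frame and match them against the map's
    stored descriptors; returns paired 2D pixel and 3D scene coordinates."""
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(frame_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors, map_descriptors)
    pts_2d = np.float32([keypoints[m.queryIdx].pt for m in matches])
    pts_3d = np.float32([map_points_3d[m.trainIdx] for m in matches])
    return pts_2d, pts_3d
```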
12. The content navigation method according to claim 11, wherein after the key point identification of the video frame image to obtain the first key point in the video frame image, the content navigation method further comprises:
determining a target pixel point corresponding to the first key point in the video frame image;
determining a two-dimensional coordinate value of the target pixel point in an image coordinate system corresponding to the video frame image;
the determining, based on the three-dimensional coordinate value of the second key point in the scene coordinate system, of the second pose information of the AR device in the scene coordinate system comprises:
and determining second pose information of the AR device in the scene coordinate system based on the two-dimensional coordinate value and the three-dimensional coordinate value of the second key point in the scene coordinate system.
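With 2D pixel coordinates paired to 3D map coordinates as in claim 12, solving for the device pose is a standard perspective-n-point (PnP) problem. A sketch using OpenCV; the camera intrinsics are passed in as placeholders.

```python
import cv2
import numpy as np

def solve_second_pose(pts_2d, pts_3d, fx, fy, cx, cy):
    """pts_2d: (N, 2) target pixel coordinates; pts_3d: (N, 3) matched map
    coordinates; fx, fy, cx, cy: camera intrinsics (placeholders here)."""
    camera_matrix = np.array([[fx, 0, cx],
                              [0, fy, cy],
                              [0,  0,  1]], dtype=np.float64)
    ok, rvec, tvec, _ = cv2.solvePnPRansac(pts_3d, pts_2d, camera_matrix, None)
    if not ok:
        raise RuntimeError("PnP failed: too few or inconsistent matches")
    rotation, _ = cv2.Rodrigues(rvec)  # axis-angle -> 3x3 rotation matrix
    pose = np.eye(4)
    pose[:3, :3], pose[:3, 3] = rotation, tvec.ravel()
    return pose  # scene-to-camera transform, i.e. the second pose information
```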
13. The content navigation method according to any one of claims 7 to 12, wherein the determining, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to a target scene comprises:
performing scene key point identification on the video frame image, and determining a target pixel point corresponding to at least one scene key point in the video frame image;
predicting the depth value of the video frame image, and determining the depth value of the target pixel point in the video frame image;
and determining second pose information of the AR device based on the depth value of the target pixel point.
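A sketch of the back-projection claim 13 relies on: a predicted depth value turns each scene-key-point pixel into a 3D point in the camera frame, which can then be registered against the scene (the registration itself is omitted). The intrinsics are placeholders.

```python
import numpy as np

def backproject(pixels_uv: np.ndarray, depths: np.ndarray,
                fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """pixels_uv: (N, 2) target pixel coordinates; depths: (N,) predicted
    depth values for those pixels. Returns (N, 3) camera-frame points."""
    u, v = pixels_uv[:, 0], pixels_uv[:, 1]
    x = (u - cx) * depths / fx
    y = (v - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)
```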
14. The content navigation method according to any one of claims 7 to 13, wherein the presentation form of the target AR navigation content comprises at least one of:
text, animation, images, and speech.
15. A content navigation apparatus applied to an Augmented Reality (AR) device, the content navigation apparatus comprising:
an acquisition module configured to acquire a video frame image of a target scene where the AR device is located;
a first sending module configured to send the video frame image to a server;
a first receiving module configured to receive target AR navigation content returned by the server based on the video frame image, the target AR navigation content being used to indicate navigation information in the target scene;
and an association module configured to display, in an associated manner, a target navigation object in the video frame image and the target AR navigation content.
16. A content navigation apparatus applied to a server, the content navigation apparatus comprising:
a second receiving module configured to receive a video frame image sent by an AR device;
a first determining module configured to determine, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to a target scene;
a second determining module configured to determine target AR navigation content based on the second pose information;
and a second sending module configured to send the target AR navigation content to the AR device.
17. A content navigation system, comprising an augmented reality (AR) device and a server;
the AR device is configured to collect a video frame image of a target scene where the AR device is located; send the video frame image to the server; receive target AR navigation content returned by the server based on the video frame image, the target AR navigation content being used to indicate navigation information in the target scene; and display, in an associated manner, a target navigation object in the video frame image and the target AR navigation content;
the server is configured to receive the video frame image sent by the AR device; determine, based on the video frame image, second pose information of the AR device in a scene coordinate system corresponding to the target scene; determine the target AR navigation content based on the second pose information; and send the target AR navigation content to the AR device.
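A toy end-to-end wiring of claim 17's system, stubs only: a dictionary lookup stands in for localization and content selection on the server, and a print stands in for the associated display on the device. Every name below is an illustrative stand-in.

```python
class StubServer:
    def __init__(self, placed_content):
        self.placed_content = placed_content  # content keyed by scene area

    def handle(self, frame):
        area = frame["area"]  # stand-in for localization + content selection
        return self.placed_content.get(area, "no navigation content here")

class StubARDevice:
    def __init__(self, server):
        self.server = server

    def step(self, frame):
        content = self.server.handle(frame)  # send frame / receive content
        print(f"frame {frame['id']}: showing '{content}'")  # associated display

device = StubARDevice(StubServer({"hall": "Exhibit A: ancient bronzes, 50 m ahead"}))
device.step({"id": 1, "area": "hall"})
```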
18. A computer device, comprising a processor and a memory storing machine-readable instructions executable by the processor, wherein the processor is configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, cause the processor to perform the content navigation method according to any one of claims 1 to 6 or the content navigation method according to any one of claims 7 to 14.
19. A computer-readable storage medium having stored thereon a computer program which, when executed by a computer device, performs the content navigation method according to any one of claims 1 to 6 or the content navigation method according to any one of claims 7 to 14.
CN202110235626.0A 2021-03-03 2021-03-03 Content navigation method, device, system, computer equipment and storage medium Pending CN112947756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110235626.0A CN112947756A (en) 2021-03-03 2021-03-03 Content navigation method, device, system, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112947756A true CN112947756A (en) 2021-06-11

Family

ID=76247396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110235626.0A Pending CN112947756A (en) 2021-03-03 2021-03-03 Content navigation method, device, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112947756A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105103198A (en) * 2013-04-04 2015-11-25 索尼公司 Display control device, display control method and program
CN105468142A (en) * 2015-11-16 2016-04-06 上海璟世数字科技有限公司 Interaction method and system based on augmented reality technique, and terminal
CN107247510A (en) * 2017-04-27 2017-10-13 成都理想境界科技有限公司 A kind of social contact method based on augmented reality, terminal, server and system
CN110873963A (en) * 2018-09-03 2020-03-10 广东虚拟现实科技有限公司 Content display method and device, terminal equipment and content display system
CN110442813A (en) * 2019-05-24 2019-11-12 广东药科大学 A kind of tourist souvenir information processing system and method based on AR
CN110335351A (en) * 2019-07-02 2019-10-15 北京百度网讯科技有限公司 Multi-modal AR processing method, device, system, equipment and readable storage medium storing program for executing
CN110531847A (en) * 2019-07-26 2019-12-03 中国人民解放军军事科学院国防科技创新研究院 A kind of novel social contact method and system based on augmented reality
CN111538920A (en) * 2020-03-24 2020-08-14 天津完美引力科技有限公司 Content presentation method, device, system, storage medium and electronic device
CN111638793A (en) * 2020-06-04 2020-09-08 浙江商汤科技开发有限公司 Aircraft display method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282179A (en) * 2021-06-18 2021-08-20 北京市商汤科技开发有限公司 Interaction method, interaction device, computer equipment and storage medium
WO2022262389A1 (en) * 2021-06-18 2022-12-22 上海商汤智能科技有限公司 Interaction method and apparatus, computer device and program product, storage medium
WO2023142432A1 (en) * 2022-01-27 2023-08-03 腾讯科技(深圳)有限公司 Data processing method and apparatus based on augmented reality, and device, storage medium and computer program product

Similar Documents

Publication Publication Date Title
KR102417645B1 (en) AR scene image processing method, device, electronic device and storage medium
US10636185B2 (en) Information processing apparatus and information processing method for guiding a user to a vicinity of a viewpoint
CN110716645A (en) Augmented reality data presentation method and device, electronic equipment and storage medium
JP6329343B2 (en) Image processing system, image processing apparatus, image processing program, and image processing method
CN103620600B (en) Method and apparatus for enabling virtual tags
CN111638796A (en) Virtual object display method and device, computer equipment and storage medium
CN107689082B (en) Data projection method and device
CN112684894A (en) Interaction method and device for augmented reality scene, electronic equipment and storage medium
KR20150126938A (en) System and method for augmented and virtual reality
US11657085B1 (en) Optical devices and apparatuses for capturing, structuring, and using interlinked multi-directional still pictures and/or multi-directional motion pictures
CN112947756A (en) Content navigation method, device, system, computer equipment and storage medium
CN111640192A (en) Scene image processing method and device, AR device and storage medium
CN111667588A (en) Person image processing method, person image processing device, AR device and storage medium
CN112927349A (en) Three-dimensional virtual special effect generation method and device, computer equipment and storage medium
JP2022512525A (en) AR scene image processing methods and devices, electronic devices and storage media
JP2022507502A (en) Augmented Reality (AR) Imprint Method and System
CN113470190A (en) Scene display method and device, equipment, vehicle and computer readable storage medium
KR101864717B1 (en) The apparatus and method for forming a augmented reality contents with object shape
CN113359983A (en) Augmented reality data presentation method and device, electronic equipment and storage medium
CN113282687A (en) Data display method and device, computer equipment and storage medium
CN111918114A (en) Image display method, image display device, display equipment and computer readable storage medium
CN111639975A (en) Information pushing method and device
CN108092950B (en) AR or MR social method based on position
CN111652986B (en) Stage effect presentation method and device, electronic equipment and storage medium
Bousbahi et al. Mobile augmented reality adaptation through smartphone device based hybrid tracking to support cultural heritage experience

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination