WO2018014601A1 - Method and relevant apparatus for orientational tracking, method and device for realizing augmented reality - Google Patents

Method and relevant apparatus for orientational tracking, method and device for realizing augmented reality

Info

Publication number
WO2018014601A1
WO2018014601A1 (PCT/CN2017/080276)
Authority
WO
WIPO (PCT)
Prior art keywords
target entity
video frame
visible surface
globe
direction vector
Prior art date
Application number
PCT/CN2017/080276
Other languages
French (fr)
Chinese (zh)
Inventor
刘钢
熊剑明
陈健
方堃
韦晓宁
Original Assignee
央数文化(上海)股份有限公司
Priority date
Filing date
Publication date
Application filed by 央数文化(上海)股份有限公司 filed Critical 央数文化(上海)股份有限公司
Publication of WO2018014601A1 publication Critical patent/WO2018014601A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B27/00Planetaria; Globes
    • G09B27/08Globes

Definitions

  • the present application relates to the field of computers, and in particular, to an azimuth tracking method, a method for implementing augmented reality, and related apparatuses and devices.
  • in the process of people coming to know the world, the globe has always been an indispensable geography teaching aid. It allows people to understand the concepts of space and location in a more vivid and interesting way, and to appreciate the breadth of the land, the vastness of the ocean, and the changing of the four seasons. It is especially significant in helping children form a correct scientific outlook and worldview.
  • traditional globes, however, can express only very limited geographic information, such as the boundary between land and ocean or the borders between countries. For richer geographic information, such as the internal structure of the Earth, the Earth's position in the solar system, or the distribution of animals and plants across the Earth's surface, the traditional globe is powerless.
  • the purpose of the application is to provide an orientation tracking method, a method for implementing augmented reality, and related devices and devices.
  • the present application provides a method for tracking an orientation of a target entity, the method comprising:
  • the feature set includes a plurality of surface regions of the target entity and image feature information of each surface region;
  • a center point coordinate and a direction vector of the visible surface of the target entity are respectively determined according to a center point coordinate and a direction vector of the surface area matching the visible surface.
  • the method further includes:
  • the target entity is preprocessed to construct a prefabricated feature set of the target entity, including:
  • determining a center point coordinate of the visible surface of the target entity according to a center point coordinate of the surface area matching the visible surface comprises:
  • the center point coordinates of the plurality of surface regions are subjected to weighted averaging processing, and the center point coordinates of the target entity are determined.
  • determining the target entity visible surface direction vector according to a direction vector of the surface area matching the visible surface comprises:
  • the direction vectors of the plurality of surface regions are weighted and added and subjected to vector normalization processing to obtain a direction vector of the visible surface of the target entity.
  • the application also provides a method for implementing augmented reality, the method comprising:
  • the display portion is rendered into the video frame based on the center point coordinates such that a visible surface of the target entity is covered by the display portion.
  • the method further includes:
  • the virtual model is provided with a trigger point
  • the method further includes:
  • the trigger effect of the trigger point is rendered into the video frame.
  • the target entity is a physical globe
  • the virtual model is a virtual globe matching the physical globe.
  • an orientation tracking device for a target entity comprising:
  • An image acquisition module configured to acquire a video frame that includes a target entity
  • a feature extraction module configured to perform image feature extraction on a target entity in the video frame, and acquire image feature information of a visible surface of the target entity
  • a feature matching module configured to perform feature recognition on the acquired image feature information of the visible surface of the target entity based on the set of prefabricated features of the target entity, and acquire a surface region matching the visible surface in the prefabricated feature set
  • the set of prefabricated features includes a plurality of surface regions of the target entity and image feature information of each surface region;
  • an integrated processing module configured to respectively determine a center point coordinate and a direction vector of the visible surface of the target entity according to a center point coordinate and a direction vector of the surface area matching the visible surface.
  • the device further includes:
  • a pre-processing module configured to pre-process the target entity before acquiring the video frame that includes the target entity, and construct a pre-made feature set of the target entity.
  • the preprocessing module is configured to control the image acquisition module to acquire images of a plurality of surface regions of the target entity, wherein the plurality of surface regions cover at least all surface regions of the target entity; to control the feature extraction module to perform image feature extraction on the images of the plurality of surface regions to obtain the image feature information of each surface region; and to construct the prefabricated feature set of the target entity according to the image feature information of each surface region.
  • the integrated processing module is configured to, when there are multiple surface regions matching the visible surface, perform weighted averaging on the center point coordinates of the multiple surface regions to determine the center point coordinate of the target entity.
  • the integrated processing module is configured to, when there are multiple surface regions matching the visible surface, weight and add the direction vectors of the multiple surface regions and perform vector normalization to obtain the direction vector of the visible surface of the target entity.
  • the application also provides a device for realizing augmented reality, the device comprising:
  • An orientation tracking device configured to acquire a center point coordinate and a direction vector of the target entity
  • a rendering device configured to acquire a display portion corresponding to the visible surface from the virtual model according to the direction vector, and to render the display portion into the video frame according to the center point coordinate, so that the visible surface of the target entity is covered by the display portion;
  • an output device configured to output the rendered video frame.
  • the rendering device is further configured to acquire an information superposition point in the display portion, and to render virtual information corresponding to the information superposition point into the video frame, so that the virtual information is displayed at the corresponding position of the display portion.
  • the virtual model is provided with a trigger point
  • the rendering device is further configured to determine whether a trigger point of the display portion is located in a trigger region of the video frame, and, when the trigger point is in the trigger region of the video frame, to render the trigger effect of the trigger point into the video frame.
  • the target entity is a physical globe
  • the virtual model is a virtual globe matching the physical globe.
  • the solution of the present application can superimpose virtual graphics, images, information, and the like onto a real scene based on augmented reality technology, thereby enhancing the real scene. By using the physical globe as the target entity, it combines the advantages of both the physical globe and the virtual globe of the prior art.
  • azimuth tracking is realized by acquiring the center coordinate point and direction vector of the target entity in real time, and the output video frames are rendered in real time based on the center coordinate point and direction vector, so that the virtual globe rotates at the same angle and speed while the physical globe is manipulated, which can greatly improve the user experience.
  • image feature extraction is performed on video frames containing the target entity acquired in real time, image feature matching is performed against the pre-built prefabricated feature set, and calculation based on the matching result yields the center point coordinate and direction vector of the visible surface of the target entity in the current video frame. This enables fast and accurate azimuth tracking of the target entity, ensuring that the virtual globe displayed on the screen moves in synchronization with the physical globe, giving the user a real sense of spatial presence and an actual operational experience.
  • FIG. 1 is a flowchart of a method for tracking the azimuth of a target entity according to an embodiment of the present application;
  • FIG. 2(a) is a schematic diagram of a feasible division of the surface regions of a globe, projected onto the equatorial plane, as provided by the present application;
  • FIG. 2(b) to (e) are schematic views showing the projections onto the equatorial plane of the four hemispheres obtained by the division of FIG. 2(a);
  • FIG. 3 is a schematic diagram showing the spatial orientation, in a reference coordinate system, of two surface regions matching the visible surface in an embodiment of the present application;
  • FIG. 4 is a flowchart of preprocessing a target entity to construct a prefabricated feature set according to an embodiment of the present application;
  • FIG. 5 is a flowchart of a method for implementing augmented reality according to an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of an azimuth tracking device of a target entity according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an apparatus for implementing augmented reality according to an embodiment of the present disclosure.
  • in a typical configuration, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • as defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • FIG. 1 is a schematic diagram of an azimuth tracking method for a target entity according to an embodiment of the present application. The method is used to implement azimuth tracking of the visible surface of a physical target in the process of implementing AR (Augmented Reality), and specifically includes the following steps:
  • Step S101 Acquire a video frame including a target entity.
  • the target entity may be a physical globe, or may be other regular or irregularly shaped entities such as spheres, ellipsoids, polyhedra, and the like.
  • a target entity can be photographed by a device with an image capture device such as a camera to obtain a video frame containing the target entity.
  • a video of a physical globe is taken by a camera of a smartphone, and any frame in the video can be used for subsequent steps to obtain a center coordinate point and a direction vector of the visible surface of the physical globe in the video frame.
  • Step S102: perform image feature extraction on the target entity in the video frame, and acquire image feature information of the visible surface of the target entity.
  • for a given target entity, only part of its surface is visible in the video frame. Taking the physical globe as an example, its visible surface in the video frame is at most a hemisphere, so image feature extraction likewise only needs to extract the image feature information of the visible hemisphere.
  • for a polyhedron, the visible surface may include one or more of its faces, depending on the angle of the camera lens.
  • for image feature extraction, various mature image feature extraction techniques in the prior art may be used; for example, one or more kinds of image feature information of the surface pattern of the globe may be extracted based on any one or combination of image feature information such as color features, texture features, shape features, and spatial relationship features, as in the sketch below.
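  • Illustrative aside (not part of the original disclosure): a minimal sketch of such an extraction step, assuming OpenCV's ORB detector as one of the interchangeable mature techniques; the function name is hypothetical.

```python
# Sketch of step S102: extract local texture features from the frame.
# Assumes OpenCV (cv2) is available; ORB is one interchangeable choice.
import cv2

def extract_visible_surface_features(frame_bgr):
    """Return keypoints and descriptors for the visible surface in a frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=1000)       # detector for texture/shape cues
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors
```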
  • Step S103: perform feature recognition on the acquired image feature information of the visible surface of the target entity based on the prefabricated feature set of the target entity, and acquire the surface regions in the prefabricated feature set that match the visible surface.
  • the prefabricated feature set includes a plurality of surface regions of the target entity and image feature information of each surface region
  • taking the physical globe as an example again, one feasible division of the plurality of surface regions included in the prefabricated feature set is as follows: divide the globe along its meridians into four complete hemispheres, such that the center points of the four hemispheres (i.e., the center of the sphere) coincide completely and every two adjacent hemispheres overlap by 90°.
  • the projection of this division onto the equatorial plane is shown in FIG. 2(a), and the projection of each hemisphere onto the equatorial plane is shown in FIG. 2(b) to (e).
  • the prefabricated feature set may directly adopt existing data, or may be obtained by offline preprocessing beforehand.
  • since the surface patterns of the four hemispheres differ, the image feature information corresponding to the four hemispheres will also be completely different.
  • the image feature information corresponding to each hemispherical surface may also adopt a combination of any one or more of image feature information such as a color feature, a texture feature, a shape feature, and a spatial relationship feature.
  • by comparing the image feature information extracted from the current video frame with the prefabricated feature set, the two can be matched, thereby obtaining all surface regions in the prefabricated feature set that match the visible surface, as in the sketch below.
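  • Illustrative aside: a sketch of this matching step, assuming each prefabricated surface region stores ORB descriptors produced as in the previous sketch; the ratio-test threshold and minimum match count are illustrative assumptions, not values from the disclosure.

```python
# Sketch of step S103: match frame descriptors against each prefab region.
import cv2

def match_surface_regions(frame_descriptors, prefab_feature_set,
                          min_good_matches=30):
    """Return prefab surface regions whose features match the visible surface."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)   # Hamming distance suits ORB
    matched = []
    for region in prefab_feature_set:           # region holds descriptors + pose
        pairs = matcher.knnMatch(frame_descriptors, region["descriptors"], k=2)
        good = [m for m, n in (p for p in pairs if len(p) == 2)
                if m.distance < 0.75 * n.distance]   # Lowe's ratio test
        if len(good) >= min_good_matches:
            matched.append(region)
    return matched
```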
  • depending on the shooting angle, the visible surface of the captured physical globe may contain one complete hemisphere from the prefabricated feature set, or may be composed of parts of several hemispheres. For example, if the lens direction coincides with the line between some point on the equator and the center of the earth, the visible surface of the physical globe may contain one complete hemisphere of the prefabricated feature set, or parts of two hemispheres; if the lens direction coincides with the earth's axis, the visible surface of the physical globe may be composed of parts of all four hemispheres in the prefabricated feature set.
  • Step S104: determine the center point coordinate and direction vector of the visible surface of the target entity according to the center point coordinates and direction vectors of the surface regions matched with the visible surface.
  • the position of the current camera is set as the origin of the reference coordinate system. In this reference coordinate system, whether the camera moves or the target entity moves, the change can be regarded as a change in the position of the target entity while the position of the camera remains the same. Since the spatial position of the visible surface of the target entity can be determined from the video frame, the center point coordinates and direction vectors of the surface regions matching the visible surface can thereby be determined.
  • taking the physical globe as an example, the center point coordinate of a surface region is the coordinate of the center of the sphere in the reference coordinate system whose origin is the camera position, and the direction vector can be specified as the unit vector, in the same reference frame, pointing from the center of the sphere to the point where both longitude and latitude are 0°.
  • if only one surface region matches the visible surface, the center point coordinate and direction vector of that surface region are taken directly as the center point coordinate and direction vector of the visible surface of the target entity.
  • owing to unavoidable factors such as processing precision and camera shake, the center point coordinates and direction vectors of the surface regions acquired from the video frame may contain errors. In theory, the center point coordinates corresponding to surface regions of the same target entity should coincide; in the presence of errors, however, the center point coordinates of the multiple surface regions may not be identical, and the corresponding direction vectors may also differ.
  • for example, the spatial orientations in the reference coordinate system of two surface regions matching the visible surface are shown in FIG. 3, with center point coordinates P1 and P2 and direction vectors v1 and v2, respectively.
  • when determining the center point coordinate of the visible surface of the target entity from the center point coordinates of the matching surface regions: if there are multiple surface regions matching the visible surface, the center point coordinates of the multiple surface regions are subjected to weighted averaging to determine the center point coordinate of the target entity.
  • the weights may be set according to the specific scenario to achieve optimal calculation precision; as one feasible implementation, the weights of all surface regions may be set equal, in which case the calculation follows the formula:
  • P = (P1 + P2 + … + Pn) / n, where P is the center point coordinate of the visible surface of the target entity, Pi (i = 1, …, n) is the center point coordinate of any surface region matching the visible surface, and n is the number of matched surface regions.
  • when determining the direction vector of the visible surface of the target entity from the direction vectors of the matching surface regions: if there are multiple surface regions matching the visible surface, the direction vectors of the multiple surface regions are weighted, added, and vector-normalized to obtain the direction vector of the visible surface of the target entity.
  • here too the weights may be set according to the specific scenario to achieve optimal calculation precision; as one feasible implementation, the weights of all surface regions may be set equal, in which case the calculation follows the formula: v = (v1 + v2 + … + vn) / |v1 + v2 + … + vn|, where v is the direction vector of the visible surface of the target entity and vi (i = 1, …, n) is the direction vector of any surface region matching the visible surface.
  • by the above processing, the spatial orientation of the physical globe at each moment can be obtained dynamically. Whether the relative position of the camera and the physical globe changes, or the physical globe is rotated, the center point coordinates and direction vector of the physical globe relative to the camera can be acquired accurately, and the virtual globe is rendered on this basis to ensure that the virtual globe and the physical globe rotate or move in synchronization.
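  • Illustrative aside: a sketch of this fusion under the equal-weight assumption stated above, assuming each matched region carries numpy-array pose data; it implements the two formulas just given.

```python
# Sketch of step S104: fuse matched regions into one center and direction.
import numpy as np

def fuse_orientation(matched_regions):
    """Average center points and normalize the summed direction vectors."""
    centers = np.array([r["center"] for r in matched_regions])      # (n, 3)
    directions = np.array([r["direction"] for r in matched_regions])
    center = centers.mean(axis=0)               # P = (P1 + ... + Pn) / n
    v_sum = directions.sum(axis=0)
    direction = v_sum / np.linalg.norm(v_sum)   # vector normalization
    return center, direction
```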
  • the target entity may be pre-processed before the video frame containing the target entity is acquired to construct a pre-made feature set of the target entity.
  • a corresponding set of prefabricated features can be constructed for a target entity of any shape, thereby realizing real-time azimuth tracking and AR implementation of the target entity of any shape.
  • the main process of the preprocessing is to scan the image features on the surface of the sphere offline to obtain a prefabricated feature set on the surface of the sphere.
  • the prefabricated feature set is pre-stored, and can be compared and matched with the image features of the visible surface of the physical globe collected in real time.
  • the specific process includes the steps shown in Figure 4:
  • Step S401: acquire images of multiple surface regions of the target entity, wherein the multiple surface regions cover at least all surface regions of the target entity;
  • Step S402: perform image feature extraction on the images of the multiple surface regions, and acquire the image feature information of each surface region;
  • Step S403: construct the prefabricated feature set of the target entity according to the image feature information of each surface region.
  • the prefabricated feature set is composed of a plurality of surface regions of the surface of the sphere (each surface region being a partial curved surface relative to the entire sphere surface), the coverage of each surface region is no more than one complete hemisphere, and every two adjacent surface regions may have a certain overlap area; a preprocessing sketch follows below.
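  • Illustrative aside: a sketch of the offline preprocessing of FIG. 4 (steps S401 to S403), assuming the scan yields one image per surface region together with that region's known center point and direction vector, and reusing the hypothetical extractor sketched earlier.

```python
# Sketch of FIG. 4: build the prefabricated feature set offline.
def build_prefab_feature_set(region_scans):
    """region_scans: iterable of (image, center, direction) per surface region."""
    prefab_feature_set = []
    for image, center, direction in region_scans:
        _, descriptors = extract_visible_surface_features(image)  # step S402
        prefab_feature_set.append({                               # step S403
            "descriptors": descriptors,
            "center": center,
            "direction": direction,
        })
    return prefab_feature_set
```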
  • the embodiment of the present application further provides a method for implementing augmented reality.
  • the processing flow of the method is as shown in FIG. 5, and includes the following steps:
  • Step S501 Acquire a center point coordinate and a direction vector of the target entity by using the foregoing azimuth tracking method.
  • Step S502: obtain the display portion corresponding to the visible surface from the virtual model according to the direction vector.
  • the virtual model may be a virtual globe that matches the physical globe.
  • for example, the display portion corresponding to the visible surface may be determined from the direction vector through the center of the virtual globe: the display portion is the hemispherical surface pointed to by the direction vector.
  • Step S503: render the display portion into the video frame according to the center point coordinate, so that the visible surface of the target entity is covered by the display portion.
  • based on the center point coordinate, the position in the video frame at which the display portion needs to be rendered can be determined. Still taking the virtual globe as an example, the spatial position that the hemispherical surface of the display portion should occupy in the video frame can be determined from the center point coordinate and the radius of the virtual globe, thereby completing the synthesis of the final picture, such that the visible surface of the original physical globe in the video frame is covered by the corresponding hemisphere of the virtual globe; a sketch follows below.
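  • Illustrative aside: a sketch of the geometric core of steps S502 and S503, assuming the virtual globe is a unit-sphere mesh with per-vertex outward normals and the camera at the origin of the reference frame; rotating the mesh to align with the tracked direction vector is omitted.

```python
# Sketch of steps S502/S503: pick the camera-facing hemisphere and place it.
import numpy as np

def select_display_portion(vertex_normals, center):
    """Boolean mask of virtual-globe vertices facing the camera at the origin."""
    view_dir = -center / np.linalg.norm(center)  # from globe center to camera
    return vertex_normals @ view_dir > 0.0       # front-facing hemisphere

def place_globe(unit_vertices, center, radius):
    """Scale and translate unit-sphere vertices to the tracked position."""
    return center + radius * unit_vertices
```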
  • Step S504: output the rendered video frame.
  • in the picture presented to the user, the physical globe can no longer be seen; the virtual globe takes its original position. Because the virtual globe can display a large amount of information, and rotates at the same angle and speed while the physical globe is manipulated, the user experience is greatly improved.
  • virtual globes can be used to carry large amounts of information.
  • virtual information on different topics can be added; for example, three-dimensional models of landmark buildings, rare animals, and geological wonders can be presented at the corresponding geographical locations.
  • the specific processing method is as follows:
  • an information superposition point in the display portion is acquired.
  • the display portion may contain preset information superposition points. For example, if an information superposition point about the giant panda is preset at the position of Sichuan when the virtual globe is built, then whenever the hemisphere of the display portion contains the geographical area of Sichuan, this information superposition point can be detected.
  • the virtual information corresponding to the information superposition point is rendered into the video frame, so that the corresponding position of the display portion displays the virtual information.
  • the virtual information can be a three-dimensional model of a giant panda, so that the user can directly see a giant panda in Sichuan on the screen and gain a more intuitive and clear understanding of various geographical knowledge; a sketch of the visibility check follows below.
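  • Illustrative aside: one way such a superposition point could be tested for visibility, assuming each point is stored as a latitude/longitude on the globe model; the helper names are hypothetical.

```python
# Sketch: an info point is shown when it lies on the displayed hemisphere.
import math

def lat_lon_to_unit(lat_deg, lon_deg):
    """Unit vector for a latitude/longitude on the globe model."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def info_point_visible(point_dir, display_dir):
    """True when the point's direction falls on the display hemisphere."""
    return sum(p * d for p, d in zip(point_dir, display_dir)) > 0.0
```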
  • some trigger points can be set on the virtual model.
  • the trigger area can be set according to actual needs. For example, on a touch screen the trigger area may be set to the area currently touched by the user, so that the user triggers it by tapping; or it may be statically set to a specific area of the screen, such as the center of the screen, and further marked by a cursor, an aperture, or the like to facilitate user identification.
  • the trigger effect may be audio, video, a three-dimensional animation effect, or the like. For example, if the trigger point is set at the display position of the giant panda, then when the three-dimensional model of the giant panda is in the trigger area, the model is zoomed in and simultaneously performs corresponding actions such as walking or standing.
  • the trigger effect can also be a highlighting effect; for example, if the trigger point is set to the extent of an administrative region, then when that administrative region enters the trigger area, the region will be highlighted, as in the sketch below.
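  • Illustrative aside: a sketch of the trigger test for a statically configured screen-center trigger region, assuming a pinhole projection of a trigger point given in camera coordinates; the intrinsics and region bounds are illustrative assumptions.

```python
# Sketch: project a 3D trigger point and test a static screen trigger region.
def in_trigger_region(point, intrinsics, region):
    """point: (x, y, z) in camera frame; intrinsics: (fx, fy, cx, cy)."""
    x, y, z = point
    if z <= 0.0:                      # behind the camera: cannot trigger
        return False
    fx, fy, cx, cy = intrinsics
    u = fx * x / z + cx               # pinhole projection to pixels
    v = fy * y / z + cy
    left, top, right, bottom = region # e.g. a box around the screen center
    return left <= u <= right and top <= v <= bottom
```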
  • the above trigger areas, trigger effects, and ways of setting trigger points are only examples; other existing or future possible manners, if applicable to the present application, are also intended to fall within the scope of protection of the present application and are hereby incorporated by reference.
  • an orientation tracking device 60 for a target entity is also provided.
  • the device 60 is configured to implement azimuth tracking of a visible surface of a physical target during processing of implementing the AR.
  • as shown in FIG. 6, the device 60 includes an image acquisition module 610, a feature extraction module 620, a feature matching module 630, and an integrated processing module 640.
  • the image collection module 610 is configured to acquire a video frame that includes a target entity.
  • the target entity may be a physical globe, or may be other regular or irregularly shaped entities such as spheres, ellipsoids, polyhedra, and the like.
  • a target entity can be photographed by a device with an image capture device such as a camera to obtain a video frame containing the target entity.
  • a video of a physical globe is taken by a camera of a smartphone, and any frame in the video can be used for subsequent processing to obtain a central coordinate point and a direction vector of the visible surface of the physical globe in the video frame.
  • the feature extraction module 620 is configured to perform image feature extraction on a target entity in the video frame to acquire image feature information of a visible surface of the target entity.
  • for a given target entity, only part of its surface is visible in the video frame. Taking the physical globe as an example, its visible surface in the video frame is at most a hemisphere, so image feature extraction likewise only needs to extract the image feature information of the visible hemisphere.
  • for a polyhedron, the visible surface may include one or more of its faces, depending on the angle of the camera lens.
  • for image feature extraction, various mature image feature extraction techniques in the prior art can be used; for example, one or more kinds of image feature information of the surface pattern of the globe may be extracted based on any one or combination of image feature information such as color features, texture features, shape features, and spatial relationship features.
  • the feature matching module 630 is configured to perform feature recognition on the acquired image feature information of the visible surface of the target entity based on the prefabricated feature set of the target entity, and to acquire the surface regions in the prefabricated feature set that match the visible surface, wherein the prefabricated feature set includes a plurality of surface regions of the target entity and the image feature information of each surface region. Taking the physical globe as an example again, one feasible division of the plurality of surface regions included in the prefabricated feature set is as follows: divide the globe along its meridians into four complete hemispheres, such that the center points of the four hemispheres (i.e., the center of the sphere) coincide completely and every two adjacent hemispheres overlap by 90°.
  • the projection of this division onto the equatorial plane is shown in FIG. 2(a), and the projection of each hemisphere onto the equatorial plane is shown in FIG. 2(b) to (e).
  • the prefabricated feature set may directly adopt existing data, or may be obtained by offline preprocessing beforehand.
  • since the surface patterns of the four hemispheres differ, the image feature information corresponding to the four hemispheres will also be completely different.
  • the image feature information corresponding to each hemispherical surface may also adopt a combination of any one or more of image feature information such as a color feature, a texture feature, a shape feature, and a spatial relationship feature.
  • by comparing the image feature information extracted from the current video frame with the prefabricated feature set, the two can be matched, thereby obtaining all surface regions in the prefabricated feature set that match the visible surface.
  • depending on the shooting angle, the visible surface of the captured physical globe may contain one complete hemisphere from the prefabricated feature set, or may be composed of parts of several hemispheres. For example, if the lens direction coincides with the line between some point on the equator and the center of the earth, the visible surface of the physical globe may contain one complete hemisphere of the prefabricated feature set, or parts of two hemispheres; if the lens direction coincides with the earth's axis, the visible surface of the physical globe may be composed of parts of all four hemispheres in the prefabricated feature set.
  • the integrated processing module 640 is configured to determine the center point coordinate and direction vector of the visible surface of the target entity according to the center point coordinates and direction vectors of the surface regions matched with the visible surface.
  • the position of the current camera is set as the origin of the reference coordinate system. In this reference coordinate system, whether the camera moves or the target entity moves, the change can be regarded as a change in the position of the target entity while the position of the camera remains the same. Since the spatial position of the visible surface of the target entity can be determined from the video frame, the center point coordinates and direction vectors of the surface regions matching the visible surface can thereby be determined.
  • taking the physical globe as an example, the center point coordinate of a surface region is the coordinate of the center of the sphere in the reference coordinate system whose origin is the camera position, and the direction vector can be specified as the unit vector, in the same reference frame, pointing from the center of the sphere to the point where both longitude and latitude are 0°.
  • if only one surface region matches the visible surface, the center point coordinate and direction vector of that surface region are taken directly as the center point coordinate and direction vector of the visible surface of the target entity.
  • owing to unavoidable factors such as processing precision and camera shake, the center point coordinates and direction vectors of the surface regions acquired from the video frame may contain errors. In theory, the center point coordinates corresponding to surface regions of the same target entity should coincide; in the presence of errors, however, the center point coordinates of the multiple surface regions may not be identical, and the corresponding direction vectors may also differ.
  • for example, the spatial orientations in the reference coordinate system of two surface regions matching the visible surface are shown in FIG. 3, with center point coordinates P1 and P2 and direction vectors v1 and v2, respectively.
  • when determining the center point coordinate of the visible surface of the target entity from the center point coordinates of the matching surface regions, if there are multiple surface regions matching the visible surface, the integrated processing module 640 performs weighted averaging on the center point coordinates of the multiple surface regions to determine the center point coordinate of the target entity.
  • the weights may be set according to the specific scenario to achieve optimal calculation precision; as one feasible implementation, the weights of all surface regions may be set equal, in which case the calculation follows the formula: P = (P1 + P2 + … + Pn) / n, where P is the center point coordinate of the visible surface of the target entity, Pi (i = 1, …, n) is the center point coordinate of any surface region matching the visible surface, and n is the number of matched surface regions.
  • when determining the direction vector of the visible surface of the target entity from the direction vectors of the matching surface regions, if there are multiple surface regions matching the visible surface, the integrated processing module 640 weights, adds, and vector-normalizes the direction vectors of the multiple surface regions to obtain the direction vector of the visible surface of the target entity.
  • here too the weights may be set according to the specific scenario to achieve optimal calculation precision; as one feasible implementation, the weights of all surface regions may be set equal, in which case the calculation follows the formula: v = (v1 + v2 + … + vn) / |v1 + v2 + … + vn|, where v is the direction vector of the visible surface of the target entity and vi (i = 1, …, n) is the direction vector of any surface region matching the visible surface.
  • by the above processing, the spatial orientation of the physical globe at each moment can be obtained dynamically. Whether the relative position of the camera and the physical globe changes, or the physical globe is rotated, the center point coordinates and direction vector of the physical globe relative to the camera can be acquired accurately, and the virtual globe is rendered on this basis to ensure that the virtual globe and the physical globe rotate or move in synchronization.
  • the target entity may be pre-processed by the pre-processing module to obtain a pre-made feature set of the target entity before acquiring the video frame including the target entity.
  • a corresponding set of prefabricated features can be constructed for a target entity of any shape, thereby realizing real-time azimuth tracking and AR implementation of the target entity of any shape.
  • the main process of the preprocessing is to scan the image features on the surface of the sphere offline to obtain a prefabricated feature set on the surface of the sphere.
  • the prefabricated feature set is pre-stored, and can be compared and matched with the image features of the visible surface of the physical globe collected in real time.
  • the preprocessing module is specifically configured to control the image acquisition module to acquire images of a plurality of surface regions of the target entity, wherein the plurality of surface regions cover at least all surface regions of the target entity; to control the feature extraction module to perform image feature extraction on the images of the plurality of surface regions to acquire the image feature information of each surface region; and to construct the prefabricated feature set of the target entity according to the image feature information of each surface region.
  • the prefabricated feature set is composed of a plurality of surface regions of the surface of the sphere (each surface region being a partial curved surface relative to the entire sphere surface), the coverage of each surface region is no more than one complete hemisphere, and every two adjacent surface regions may have a certain overlap area.
  • the embodiment of the present application further provides an apparatus for implementing augmented reality.
  • the structure of the apparatus is as shown in FIG. 7, and includes an azimuth tracking device 60, a rendering device 710, and an output device 720.
  • the azimuth tracking device 60 is configured to acquire a center point coordinate and a direction vector of the target entity.
  • the rendering device 710 is configured to acquire a display portion corresponding to the visible surface from a virtual model according to the direction vector, and to render the display portion into the video frame according to the center point coordinate, so that the visible surface of the target entity is covered by the display portion.
  • the virtual model may be a virtual globe that matches the physical globe.
  • for example, the display portion corresponding to the visible surface may be determined from the direction vector through the center of the virtual globe: the display portion is the hemispherical surface pointed to by the direction vector. Based on the center point coordinate, the position in the video frame at which the display portion needs to be rendered can be determined. Still taking the virtual globe as an example, the spatial position that the hemispherical surface of the display portion should occupy in the video frame can be determined from the center point coordinate and the radius of the virtual globe, thereby completing the synthesis of the final picture, such that the visible surface of the original physical globe in the video frame is covered by the corresponding hemisphere of the virtual globe.
  • the output device 720 is configured to output the rendered video frame.
  • the device may include, but is not limited to, a user terminal or a device formed by integrating a user terminal and a network device through a network.
  • the user terminal includes, but is not limited to, a personal computer, a touch terminal, and the like, and may specifically be a smartphone, a tablet computer, a PDA, AR glasses, or another electronic device having image acquisition, processing, and output functions; the network device includes, but is not limited to, a network host, a single network server, a set of multiple network servers, or a cloud-based collection of computers.
  • here, the cloud is composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a virtual computer composed of a group of loosely coupled computer sets.
  • the device may also be an electronic device running an application (APP) containing associated algorithms and graphical user interfaces, or the application itself.
  • taking a smartphone as an example, its camera can be used to implement the related functions of the image acquisition module 610, its processor can be used to implement the related functions of the feature extraction module 620, the feature matching module 630, the integrated processing module 640, and the rendering device 710, and its screen enables the related functions of the output device 720.
  • in addition, the processor can also upload relevant data to a network device through a communication module, and the network device completes the calculation and processing of the related data.
  • virtual globes can be used to carry large amounts of information.
  • virtual information on different topics can be added; for example, three-dimensional models of landmark buildings, rare animals, and geological wonders can be presented at the corresponding geographical locations.
  • the rendering device 710 is further configured to perform the following processing:
  • an information superposition point in the display portion is acquired.
  • the display portion may contain preset information superposition points. For example, if an information superposition point about the giant panda is preset at the position of Sichuan when the virtual globe is built, then whenever the hemisphere of the display portion contains the geographical area where Sichuan is located, this information superposition point can be detected.
  • the virtual information corresponding to the information superposition point is rendered into the video frame, so that the corresponding position of the display portion displays the virtual information.
  • the virtual information can be a three-dimensional model of a giant panda, so that the user can directly see a giant panda in Sichuan on the screen and gain a more intuitive and clear understanding of various geographical knowledge.
  • some trigger points can be set on the virtual model.
  • the rendering device 710 determines whether the trigger point of the display portion is located in the trigger area of the video frame; if the trigger point is in the trigger area of the video frame, the trigger effect of the trigger point is rendered into the video frame.
  • the trigger area can be set according to actual needs. For example, on a touch screen the trigger area may be set to the area currently touched by the user, so that the user triggers it by tapping; or it may be statically set to a specific area of the screen, such as the center of the screen, and further marked by a cursor, an aperture, or the like to facilitate user identification.
  • the trigger effect may be audio, video, a three-dimensional animation effect, or the like. For example, if the trigger point is set at the display position of the giant panda, then when the three-dimensional model of the giant panda is in the trigger area, the model is zoomed in and simultaneously performs corresponding actions such as walking or standing.
  • the trigger effect can also be a highlighting effect; for example, if the trigger point is set to the extent of an administrative region, then when that administrative region enters the trigger area, the region will be highlighted.
  • the above trigger areas, trigger effects, and ways of setting trigger points are only examples; other existing or future possible manners, if applicable to the present application, are also intended to fall within the scope of protection of the present application and are hereby incorporated by reference.
  • in summary, the solution of the present application can superimpose virtual graphics, images, information, and the like onto a real scene based on augmented reality technology, thereby enhancing the real scene. By using the physical globe as the target entity, it combines the advantages of both the physical globe and the virtual globe of the prior art.
  • Azimuth tracking is realized by acquiring the center coordinate point and direction vector of the target entity in real time, and the output video frames are rendered in real time based on the center coordinate point and direction vector, so that the virtual globe rotates at the same angle and speed while the physical globe is manipulated, which can greatly improve the user experience.
  • image feature extraction is performed on video frames containing the target entity acquired in real time, image feature matching is performed against the pre-built prefabricated feature set, and calculation based on the matching result yields the center point coordinate and direction vector of the visible surface of the target entity in the current video frame, thereby enabling fast and accurate azimuth tracking of the target entity.
  • the present application can be implemented in software and/or a combination of software and hardware, for example, using an application specific integrated circuit (ASIC), a general purpose computer, or any other similar hardware device.
  • the software program of the present application can be executed by a processor to implement the steps or functions described above.
  • the software programs (including related data structures) of the present application can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like.
  • some of the steps or functions of the present application may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.
  • a portion of the present application can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide a method and/or technical solution in accordance with the present application.
  • the program instructions for invoking the method of the present application may be stored in a fixed or removable recording medium, and/or transmitted by a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device running according to the program instructions.
  • an embodiment according to the present application includes a device that includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to operate based on the aforementioned methods and/or technical solutions according to the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present application aims to provide a method and a relevant apparatus for orientational tracking, and a method and a device for realizing augmented reality, which, on the basis of augmented reality technology, can superpose some virtual graphics, images, information, etc., into a real scene, augmenting the real scene. The invention takes a physical globe as a target entity, and enables the physical globe to possess the advantages of both a physical globe in the prior art and a virtual globe. The present invention realizes orientational tracking by acquiring, in real time, the central coordinate point and a direction vector of a target entity, and renders, in real time, on the basis of the central coordinate point and direction vector, outputted video frames, so that when a user manipulates a physical globe, a virtual globe will rotate by the same angle and speed, greatly improving user experience.

Description

Azimuth Tracking Method, Method for Realizing Augmented Reality, and Related Apparatus and Device
Technical Field
The present application relates to the field of computers, and in particular, to an azimuth tracking method, a method for implementing augmented reality, and related apparatuses and devices.
Background Art
In the process of people coming to know the world, the globe has always been an indispensable geography teaching aid. It allows people to understand the concepts of space and location in a more vivid and interesting way, and to appreciate the breadth of the land, the vastness of the ocean, and the changing of the four seasons. It is especially significant in helping children form a correct scientific outlook and worldview. However, for a long time, traditional globes have been able to express only very limited geographic information, such as the boundary between land and ocean or the borders between countries; for richer geographic information, such as the internal structure of the Earth, the Earth's position in the solar system, or the distribution of animals and plants across the Earth's surface, the traditional globe is powerless.
There are currently some virtual globe applications, which display a virtual globe on a screen: users can control the rotation of the virtual globe by sliding a single finger across the screen, and can zoom the virtual globe in or out by pinching with two fingers. Compared with a physical globe, a virtual globe is simple to operate, easy to carry, and carries a large amount of information. However, virtual globes also have some obvious disadvantages; for example, the user experience is relatively poor, as they cannot give users a real sense of spatial presence or an actual operational experience.
Summary of the Application
The purpose of the present application is to provide an azimuth tracking method, a method for implementing augmented reality, and related apparatuses and devices.
To achieve the above purpose, the present application provides a method for tracking the azimuth of a target entity, the method comprising:
acquiring a video frame containing the target entity;
performing image feature extraction on the target entity in the video frame to acquire image feature information of the visible surface of the target entity;
performing feature recognition on the acquired image feature information of the visible surface of the target entity based on a prefabricated feature set of the target entity, and acquiring the surface regions in the prefabricated feature set that match the visible surface, wherein the prefabricated feature set includes a plurality of surface regions of the target entity and the image feature information of each surface region;
determining the center point coordinate and direction vector of the visible surface of the target entity according to the center point coordinates and direction vectors of the surface regions matching the visible surface.
Further, before acquiring the video frame containing the target entity, the method further includes:
preprocessing the target entity to construct the prefabricated feature set of the target entity.
Further, preprocessing the target entity to construct the prefabricated feature set of the target entity includes:
acquiring images of a plurality of surface regions of the target entity, wherein the plurality of surface regions cover at least all surface regions of the target entity;
performing image feature extraction on the images of the plurality of surface regions to acquire the image feature information of each surface region;
constructing the prefabricated feature set of the target entity according to the image feature information of each surface region.
Further, determining the center point coordinate of the visible surface of the target entity according to the center point coordinates of the surface regions matching the visible surface includes:
if there are multiple surface regions matching the visible surface, performing weighted averaging on the center point coordinates of the multiple surface regions to determine the center point coordinate of the target entity.
Further, determining the direction vector of the visible surface of the target entity according to the direction vectors of the surface regions matching the visible surface includes:
if there are multiple surface regions matching the visible surface, weighting and adding the direction vectors of the multiple surface regions and performing vector normalization to obtain the direction vector of the visible surface of the target entity.
本申请还提供了一种实现增强现实的方法,该方法包括:The application also provides a method for implementing augmented reality, the method comprising:
采用所述方位跟踪方法获取所述目标实体的中心点坐标和方向矢量;Obtaining a center point coordinate and a direction vector of the target entity by using the azimuth tracking method;
根据所述方向矢量由虚拟模型中获取与所述可视表面对应的显示部分; Obtaining a display portion corresponding to the visible surface from the virtual model according to the direction vector;
根据所述中心点坐标将所述显示部分渲染至所述视频帧中,以使所述目标实体的可视表面由所述显示部分覆盖。The display portion is rendered into the video frame based on the center point coordinates such that a visible surface of the target entity is covered by the display portion.
输出完成渲染的视频帧。Output the finished rendered video frame.
Further, the method also includes:
acquiring an information overlay point in the display portion; and
rendering the virtual information corresponding to the information overlay point into the video frame, so that the virtual information is displayed at the corresponding position of the display portion.
Further, the virtual model is provided with trigger points, and the method also includes:
determining whether a trigger point of the display portion is located in the trigger region of the video frame; and
if the trigger point is in the trigger region of the video frame, rendering the trigger effect of that trigger point into the video frame.
Further, the target entity is a physical globe, and the virtual model is a virtual globe matching the physical globe.
According to another aspect of the present application, an orientation tracking apparatus for a target entity is also provided, the apparatus including:
an image acquisition module, configured to acquire a video frame containing the target entity;
a feature extraction module, configured to perform image feature extraction on the target entity in the video frame to obtain image feature information of the visible surface of the target entity;
a feature matching module, configured to perform feature recognition on the acquired image feature information of the visible surface of the target entity based on a pre-built feature set of the target entity, and to obtain, from the pre-built feature set, the surface regions matching the visible surface, wherein the pre-built feature set includes a plurality of surface regions of the target entity and image feature information of each surface region; and
an integrated processing module, configured to determine the center point coordinates and the direction vector of the visible surface of the target entity according to the center point coordinates and the direction vectors of the surface regions matching the visible surface.
Further, the apparatus also includes:
a preprocessing module, configured to preprocess the target entity before the video frame containing the target entity is acquired, so as to construct the pre-built feature set of the target entity.
Further, the preprocessing module is configured to control the image acquisition module to acquire images of a plurality of surface regions of the target entity, wherein the plurality of surface regions together cover at least the entire surface of the target entity; to control the feature extraction module to perform image feature extraction on the images of the plurality of surface regions to obtain the image feature information of each surface region; and to construct the pre-built feature set of the target entity according to the image feature information of each surface region.
Further, the integrated processing module is configured to, when there are multiple surface regions matching the visible surface, perform a weighted average of the center point coordinates of the multiple surface regions to determine the center point coordinates of the target entity.
Further, the integrated processing module is configured to, when there are multiple surface regions matching the visible surface, compute a weighted sum of the direction vectors of the multiple surface regions and normalize the resulting vector to obtain the direction vector of the visible surface of the target entity.
The present application also provides a device for implementing augmented reality, the device including:
an orientation tracking apparatus, configured to acquire the center point coordinates and the direction vector of the target entity;
a rendering apparatus, configured to obtain, from a virtual model, the display portion corresponding to the visible surface according to the direction vector, and to render the display portion into the video frame according to the center point coordinates, so that the visible surface of the target entity is covered by the display portion; and
an output apparatus, configured to output the rendered video frame.
Further, the rendering apparatus is also configured to acquire an information overlay point in the display portion, and to render the virtual information corresponding to the information overlay point into the video frame, so that the virtual information is displayed at the corresponding position of the display portion.
Further, the virtual model is provided with trigger points, and the rendering apparatus is also configured to determine whether a trigger point of the display portion is located in the trigger region of the video frame, and, when the trigger point is in the trigger region of the video frame, to render the trigger effect of that trigger point into the video frame.
Further, the target entity is a physical globe, and the virtual model is a virtual globe matching the physical globe.
Compared with the prior art, the solution of the present application superimposes virtual graphics, images, and information onto a real scene based on augmented reality technology, thereby enhancing the real scene. By taking a physical globe as the target entity, the solution combines the advantages of both the physical globe and the virtual globe of the prior art. Orientation tracking is achieved by acquiring the center point coordinates and the direction vector of the target entity in real time, and the output video frames are rendered in real time on that basis, so that while the user manipulates the physical globe, the virtual globe turns at the same angle and speed, which greatly improves the user experience.
In addition, image features are extracted from the video frames containing the target entity as they are acquired, matched against the pre-built feature set, and the center point coordinates and the direction vector of the visible surface of the target entity in the current video frame are computed from the matching result. Orientation tracking of the target entity can thus be performed quickly and accurately, which ensures that the virtual globe displayed on the screen moves in step with the physical globe and gives the user a genuine sense of spatial presence and hands-on operation.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
FIG. 1 is a flowchart of an orientation tracking method for a target entity according to an embodiment of the present application;
FIG. 2(a) is a schematic projection onto the equatorial plane of one feasible way of dividing the surface of a globe into surface regions according to the present application;
FIGS. 2(b) to 2(e) are schematic projections onto the equatorial plane of the four hemispherical surfaces obtained by the division of FIG. 2(a);
FIG. 3 is a schematic diagram of the spatial orientations, in the reference coordinate system, of two surface regions matching the visible surface in an embodiment of the present application;
FIG. 4 is a flowchart of the preprocessing in an embodiment of the present application;
FIG. 5 is a flowchart of a method for implementing augmented reality according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an orientation tracking apparatus for a target entity according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a device for implementing augmented reality according to an embodiment of the present application.
The same or similar reference numerals in the drawings denote the same or similar components.
DETAILED DESCRIPTION
The present application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal, the devices of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent storage in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
FIG. 1 shows an orientation tracking method for a target entity according to an embodiment of the present application. The method tracks the orientation of the visible surface of a physical target in the course of implementing AR (Augmented Reality), and specifically includes the following steps.
Step S101: acquire a video frame containing the target entity. The target entity may be a physical globe, or another regular or irregular solid such as a sphere, an ellipsoid, or a polyhedron. In a real scenario, the target entity can be filmed by a device equipped with an image acquisition unit such as a camera, so as to obtain video frames containing the target entity. For example, a video of a physical globe is shot with a smartphone camera; any frame of that video can be processed by the subsequent steps to obtain the center point coordinates and the direction vector of the visible surface of the physical globe in that frame.
Step S102: perform image feature extraction on the target entity in the video frame to obtain image feature information of the visible surface of the target entity. For a given target entity, only part of its surface is visible in the video frame. For a spherical physical globe, the visible surface in the frame is at most half the sphere, so only the image feature information of the visible hemisphere needs to be extracted. For a polyhedron or another irregularly shaped solid, the visible surface may comprise one or more of its faces, depending on the camera angle. Any of the mature image feature extraction techniques of the prior art can be used here, extracting one or more kinds of image feature information of the globe's surface pattern based on any one or a combination of color features, texture features, shape features, and spatial relationship features.
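To make step S102 concrete, the following is a minimal sketch of per-frame feature extraction using OpenCV's ORB detector. The application does not prescribe a particular feature type, so ORB is only one of the mature techniques alluded to above, and the function name `extract_frame_features` is our own:

```python
import cv2

def extract_frame_features(frame_bgr):
    """Extract keypoints and descriptors from one video frame (step S102).

    A sketch assuming ORB features; any mature color/texture/shape
    feature extraction technique could be substituted.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors
```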
Step S103: perform feature recognition on the acquired image feature information of the visible surface of the target entity based on the pre-built feature set of the target entity, and obtain, from the pre-built feature set, the surface regions matching the visible surface. The pre-built feature set includes a plurality of surface regions of the target entity and the image feature information of each surface region. Taking the physical globe as an example again, one feasible way of dividing its surface into the regions of the pre-built feature set is to divide along the meridians, splitting the whole globe into four complete hemispherical surfaces whose center points (the center of the sphere) coincide exactly, with every two adjacent hemispheres overlapping by 90°. The projection of this division onto the equatorial plane is shown in FIG. 2(a), and the projections of the individual hemispheres onto the equatorial plane are shown in FIGS. 2(b) to 2(e). The pre-built feature set may reuse existing set data, or it may be obtained by offline preprocessing before tracking begins.
Because the geography on each hemisphere is different, the corresponding surface patterns differ as well: water and land are drawn in different colors, and the various terrain types on land likewise use different colors or patterns, so the image feature information of the four hemispheres is entirely distinct. In the pre-built feature set, the image feature information of each hemisphere may likewise be any one or a combination of color features, texture features, shape features, and spatial relationship features. Through feature matching, if the image feature information of some part of the visible surface is identical to the corresponding part of some surface region in the pre-built feature set, or their similarity exceeds a threshold, the two are deemed to match; in this way all surface regions in the pre-built feature set that match the visible surface are obtained.
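One way the matching of step S103 could be realized is to compare the frame's descriptors against the stored image feature information of every surface region and keep the regions with enough unambiguous correspondences. The sketch below assumes the ORB descriptors from the previous sketch; the region container, the ratio test, and the `min_good_matches` threshold are our own illustrative choices, not mandated by the application:

```python
import cv2

def match_surface_regions(frame_descriptors, prebuilt_regions, min_good_matches=30):
    """Return the surface regions of the pre-built feature set that match
    the visible surface (step S103).

    prebuilt_regions: iterable of objects carrying .descriptors, .center,
    and .direction for one surface region each.
    """
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matched = []
    for region in prebuilt_regions:
        pairs = matcher.knnMatch(frame_descriptors, region.descriptors, k=2)
        # Lowe's ratio test keeps only unambiguous correspondences.
        good = [p[0] for p in pairs
                if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
        if len(good) >= min_good_matches:
            matched.append(region)
    return matched
```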
In a real scenario, depending on the relative position of the camera and the physical globe during video capture, the acquired visible surface of the physical globe may contain one complete hemisphere of the pre-built feature set, or it may be composed of parts of several of them. For example, if the lens axis coincides with the line from a point on the equator to the center of the earth, the visible surface may contain one complete hemisphere of the pre-built feature set, or consist of parts of two of them; if the lens axis coincides with the earth's axis, the visible surface may be composed of parts of all four hemispheres of the pre-built feature set.
Step S104: determine the center point coordinates and the direction vector of the visible surface of the target entity according to the center point coordinates and the direction vectors of the surface regions matching the visible surface. In the mathematical treatment of this embodiment, the current position of the camera is taken as the origin of the reference coordinate system; in this frame, whether the camera moves or the target entity moves, it is the position of the target entity that is regarded as changing while the camera stays fixed. Since the spatial position of the visible surface of the target entity can be determined in the video frame, the center point coordinates and the direction vector of each matching surface region can be determined from it. For a physical globe, the center point coordinates of a surface region are the coordinates of the center of the sphere in this camera-origin reference frame, and the direction vector can be specified, in the same frame, as the unit vector from the center of the sphere to the point at 0° longitude and 0° latitude.
In certain special cases there is only one surface region matching the visible surface; its center point coordinates and direction vector are then directly the center point coordinates and the direction vector of the visible surface of the target entity. When there are multiple matching surface regions, however, unavoidable factors such as limited processing precision and camera shake mean that the center point coordinates and direction vectors obtained from the video frame may carry errors. Ideally, the center point coordinates obtained from the surface regions of the same target entity should coincide; with errors present, the center point coordinates of the multiple surface regions may differ, and the corresponding direction vectors may differ as well. Suppose the spatial orientations in the reference coordinate system of two surface regions matching the visible surface are as shown in FIG. 3, with center point coordinates $C_1$ and $C_2$ and direction vectors $\vec{v}_1$ and $\vec{v}_2$, respectively.
When determining the center point coordinates of the visible surface of the target entity from the center point coordinates of the matching surface regions, if there are multiple surface regions matching the visible surface, a weighted average of their center point coordinates is taken to determine the center point coordinates of the target entity. The weights can be chosen per scenario to achieve the best accuracy; as one feasible implementation, all surface regions are given the same weight, in which case the computation is:

$\bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i$

where $\bar{C}$ is the center point coordinates of the visible surface of the target entity, $C_i$ is the center point coordinates of any one surface region matching the visible surface, and $n$ is the number of matched surface regions.
When determining the direction vector of the visible surface of the target entity from the direction vectors of the matching surface regions, if there are multiple surface regions matching the visible surface, their direction vectors are summed with weights and the result is normalized to obtain the direction vector of the visible surface of the target entity. As before, the weights can be chosen per scenario to achieve the best accuracy; as one feasible implementation, all surface regions are given the same weight, in which case the computation is:

$\bar{v} = \dfrac{\sum_{i=1}^{n} \vec{v}_i}{\left\| \sum_{i=1}^{n} \vec{v}_i \right\|}$

where $\bar{v}$ is the direction vector of the visible surface of the target entity, $\vec{v}_i$ is the direction vector of any one surface region matching the visible surface, and $n$ is the number of matched surface regions.
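The two equal-weight formulas translate directly into code. The following NumPy sketch fuses the per-region estimates produced for one frame; the function name and the assumption that each matched region has already yielded a center point and a direction vector in the camera-origin frame are ours:

```python
import numpy as np

def fuse_estimates(centers, directions):
    """Fuse per-region estimates into the center point coordinates and
    unit direction vector of the visible surface (step S104).

    centers:    list of (x, y, z) center point estimates, one per matched region
    directions: list of (x, y, z) direction vector estimates, one per matched region
    """
    c_bar = np.mean(np.asarray(centers, dtype=float), axis=0)   # (1/n) * sum of C_i
    v_sum = np.sum(np.asarray(directions, dtype=float), axis=0)
    v_bar = v_sum / np.linalg.norm(v_sum)                       # normalize the summed vectors
    return c_bar, v_bar
```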
By processing frame after frame in this way, the spatial orientation of the physical globe is obtained dynamically at every moment. Whether the relative position of the camera and the physical globe changes or the physical globe is rotated, the center point coordinates and the direction vector of the physical globe relative to the camera can be obtained accurately, and rendering the virtual globe on that basis guarantees that the virtual globe turns or moves in step with the physical one.
In a real scenario, the target entity can be preprocessed before any video frame containing it is acquired, in order to construct its pre-built feature set. In this way a pre-built feature set can be constructed for a target entity of any shape, enabling real-time orientation tracking and AR for arbitrarily shaped targets. Taking the globe scenario as an example, the preprocessing essentially scans the image features of the sphere's surface offline to obtain a pre-built feature set for it; this set is stored in advance so that it can later be compared and matched against the image features of the visible surface of the physical globe captured in real time. The specific procedure includes the steps shown in FIG. 4:
Step S401: acquire images of a plurality of surface regions of the target entity, wherein the plurality of surface regions together cover at least the entire surface of the target entity;
Step S402: perform image feature extraction on the images of the plurality of surface regions to obtain the image feature information of each surface region;
Step S403: construct the pre-built feature set of the target entity according to the image feature information of each surface region.
When preprocessing the physical globe, that is, when collecting the image feature information of the surface regions and constructing the pre-built feature set, the following principles should be observed so that orientation tracking and recognition remain stable and efficient (a sketch of the resulting procedure follows the list):
1) the pre-built feature set consists of a number of surface regions of the sphere (each a partial curved surface relative to the whole sphere), and the coverage of each surface region is at most one complete hemisphere;
2) the union of all surface regions should cover the complete sphere;
3) the image features on each surface region should be as plentiful and as evenly distributed as possible;
4) every two adjacent surface regions may overlap by a certain area;
5) the larger the overlap between adjacent surface regions, the higher the achievable tracking accuracy;
6) the more surface regions there are, the heavier the computational load, so their number should be chosen according to the actual processing power of the device.
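Under these principles, the offline preprocessing of steps S401 to S403 might look roughly as follows. The `SurfaceRegion` record, the ORB features, and the externally supplied calibrated center/direction values (which in practice come from how each region was captured) are illustrative assumptions:

```python
import cv2
from dataclasses import dataclass

@dataclass
class SurfaceRegion:
    region_id: int
    descriptors: object   # image feature information of this region
    center: tuple         # calibrated center point for this region
    direction: tuple      # calibrated unit direction vector for this region

def build_feature_set(region_images, centers, directions):
    """Offline preprocessing (steps S401-S403): one entry per surface region.

    region_images: images that together cover the whole surface, e.g. the
    four overlapping hemispheres of FIG. 2(a).
    """
    orb = cv2.ORB_create(nfeatures=2000)
    feature_set = []
    for i, img in enumerate(region_images):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, descriptors = orb.detectAndCompute(gray, None)
        feature_set.append(SurfaceRegion(i, descriptors, centers[i], directions[i]))
    return feature_set
```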
Further, an embodiment of the present application also provides a method for implementing augmented reality, whose processing flow, shown in FIG. 5, includes the following steps.
Step S501: acquire the center point coordinates and the direction vector of the target entity using the orientation tracking method described above.
Step S502: obtain, from a virtual model, the display portion corresponding to the visible surface according to the direction vector. If the target entity is a physical globe, the virtual model may be a virtual globe matching it. Given the direction vector, the display portion corresponding to the visible surface can be determined about the center of the virtual globe; it is the hemisphere that the direction vector points through.
Step S503: render the display portion into the video frame according to the center point coordinates, so that the visible surface of the target entity is covered by the display portion. The center point coordinates determine where in the video frame the display portion must be rendered. Staying with the virtual globe, the spatial position that the displayed hemisphere should occupy in the frame can be determined from the center point coordinates and the radius of the virtual globe, completing the composition of the final picture so that the visible surface of the original physical globe in the frame is covered by the corresponding hemisphere of the virtual globe.
Step S504: output the rendered video frame. After the above processing, the physical globe is no longer visible in the picture presented to the user; the virtual globe has taken its place. Since the virtual globe can display a far greater amount of information, and since it turns at the same angle and speed while the user manipulates the physical globe, the user experience is greatly improved.
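A rough sketch of how steps S502 and S503 might be driven in code is given below. It assumes a triangulated virtual globe whose vertex normals are known; the hemisphere test (a surface point belongs to the display portion when its outward normal has a positive component along the direction vector) and the translation-only model transform are our simplifications of what a full renderer would do:

```python
import numpy as np

def display_portion_mask(vertex_normals, direction):
    """Step S502: select the hemisphere of the virtual globe pointed to
    by the tracked direction vector."""
    return np.asarray(vertex_normals) @ np.asarray(direction) > 0.0

def model_transform(center):
    """Step S503: place the virtual globe at the tracked center point so
    that it covers the physical globe's visible surface.

    A real renderer would also apply the rotation implied by the
    direction vector and the camera projection before compositing.
    """
    T = np.eye(4)
    T[:3, 3] = np.asarray(center, dtype=float)
    return T
```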
Moreover, beyond the geographic information found on any ordinary globe, the virtual globe's large information capacity can be exploited by adding information overlay points to the virtual model in advance, introducing virtual information on different themes, for example three-dimensional models of landmark buildings, rare animals, or geological wonders presented at the corresponding geographic locations. The specific processing is as follows.
First, acquire the information overlay points in the display portion. In practice, once the display portion has been obtained, it can be checked for preconfigured information overlay points. For example, an overlay point about the giant panda is preset at the location of Sichuan; whenever the hemisphere forming the display portion of the virtual globe contains the geographic region of Sichuan, that overlay point is detected.
Then, render the virtual information corresponding to each information overlay point into the video frame, so that it is displayed at the corresponding position of the display portion. The virtual information may be a three-dimensional model of a giant panda, so the user sees, directly on screen, a giant panda standing on Sichuan, making all kinds of geographic knowledge more intuitive and clear.
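The visibility test for an overlay point can reuse the same hemisphere criterion. The sketch below converts a latitude/longitude anchor (such as the panda marker on Sichuan) to a unit vector in the globe's own frame and checks it against the current direction vector, assuming the direction vector is expressed in that same frame:

```python
import numpy as np

def overlay_point_visible(lat_deg, lon_deg, direction):
    """True when an information overlay point lies on the displayed hemisphere."""
    lat, lon = np.radians([lat_deg, lon_deg])
    p = np.array([np.cos(lat) * np.cos(lon),   # unit vector from the globe
                  np.cos(lat) * np.sin(lon),   # center to the overlay point
                  np.sin(lat)])
    return float(p @ np.asarray(direction)) > 0.0
```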
Further, to convey more vivid and dynamic information to the user, trigger points can be set on the virtual model. During rendering, it is determined whether a trigger point of the display portion is located in the trigger region of the video frame; if it is, the trigger effect of that trigger point is rendered into the video frame.
The trigger region can be set according to actual needs. For a touch screen, it can be placed at the region the user is currently touching, so that a tap fires the trigger; it can also be fixed statically at a particular area of the screen, for example the center, and it can additionally be marked with a cursor, a reticle, or the like to help the user locate it. The trigger effect may be voice, video, a three-dimensional animation, and so on. For example, with a trigger point set at the display position of the giant panda, when the panda's three-dimensional model enters the trigger region it is enlarged and performs actions such as walking or standing up. The trigger effect may also be a highlight: with a trigger point defined over the extent of an administrative region, that region is highlighted whenever it enters the trigger region. Those skilled in the art should understand that these trigger regions, trigger effects, and ways of setting trigger points are only examples; other existing or future approaches, if applicable to the present application, are also intended to fall within its scope of protection and are hereby incorporated by reference.
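One plausible reading of the trigger test, for the statically placed variant, is to project the trigger point into screen coordinates and check containment in a central rectangle; the region size and the function name are our own illustrative choices:

```python
def trigger_fired(trigger_xy, frame_w, frame_h, region_frac=0.2):
    """Check whether a projected trigger point falls inside a trigger
    region fixed at the center of the screen."""
    x, y = trigger_xy
    half_w = frame_w * region_frac / 2.0
    half_h = frame_h * region_frac / 2.0
    return abs(x - frame_w / 2.0) <= half_w and abs(y - frame_h / 2.0) <= half_h
```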
According to another aspect of the present application, an orientation tracking apparatus 60 for a target entity is also provided. The apparatus 60 tracks the orientation of the visible surface of a physical target in the course of implementing AR; its structure, shown in FIG. 6, includes an image acquisition module 610, a feature extraction module 620, a feature matching module 630, and an integrated processing module 640. Specifically, the image acquisition module 610 is configured to acquire a video frame containing the target entity. The target entity may be a physical globe, or another regular or irregular solid such as a sphere, an ellipsoid, or a polyhedron. In a real scenario, the target entity can be filmed by a device equipped with an image acquisition unit such as a camera, so as to obtain video frames containing the target entity. For example, a video of a physical globe is shot with a smartphone camera; any frame of that video can be used in the subsequent processing to obtain the center point coordinates and the direction vector of the visible surface of the physical globe in that frame.
The feature extraction module 620 is configured to perform image feature extraction on the target entity in the video frame to obtain image feature information of the visible surface of the target entity. For a given target entity, only part of its surface is visible in the video frame: for a spherical physical globe the visible surface is at most half the sphere, so only the image feature information of the visible hemisphere needs to be extracted, while for a polyhedron or another irregularly shaped solid the visible surface may comprise one or more faces, depending on the camera angle. Any of the mature image feature extraction techniques of the prior art can be used, extracting one or more kinds of image feature information of the globe's surface pattern based on any one or a combination of color features, texture features, shape features, and spatial relationship features.
The feature matching module 630 is configured to perform feature recognition on the acquired image feature information of the visible surface of the target entity based on the pre-built feature set of the target entity, and to obtain, from the pre-built feature set, the surface regions matching the visible surface. The pre-built feature set includes a plurality of surface regions of the target entity and the image feature information of each surface region. For the physical globe, one feasible division into regions is again along the meridians, splitting the globe into four complete hemispheres whose center points (the center of the sphere) coincide exactly, with every two adjacent hemispheres overlapping by 90°; the projection of this division onto the equatorial plane is shown in FIG. 2(a), and the projections of the individual hemispheres in FIGS. 2(b) to 2(e). The pre-built feature set may reuse existing set data or be obtained by offline preprocessing beforehand.
Because the geography on each hemisphere differs, so do the corresponding surface patterns: water and land use different colors, and the various terrain types on land likewise use different colors or patterns, so the image feature information of the four hemispheres is entirely distinct. In the pre-built feature set, the image feature information of each hemisphere may be any one or a combination of color features, texture features, shape features, and spatial relationship features. Through feature matching, if the image feature information of some part of the visible surface is identical to the corresponding part of some surface region in the pre-built feature set, or their similarity exceeds a threshold, the two are deemed to match, and in this way all matching surface regions are obtained.
In a real scenario, depending on the relative position of the camera and the physical globe during capture, the acquired visible surface may contain one complete hemisphere of the pre-built feature set or be composed of parts of several of them: with the lens axis through a point on the equator and the center of the earth, it may contain one complete hemisphere or parts of two; with the lens axis along the earth's axis, it may be composed of parts of all four.
The integrated processing module 640 is configured to determine the center point coordinates and the direction vector of the visible surface of the target entity according to the center point coordinates and the direction vectors of the matching surface regions. As before, the current camera position is taken as the origin of the reference coordinate system, in which any motion of the camera or the target is treated as motion of the target while the camera stays fixed. Since the spatial position of the visible surface can be determined in the video frame, the center point coordinates and the direction vector of each matching surface region can be determined from it; for a physical globe, the center point coordinates of a surface region are the coordinates of the sphere's center in the camera-origin frame, and the direction vector is the unit vector from the sphere's center to the point at 0° longitude and 0° latitude.
In certain special cases there is only one surface region matching the visible surface; its center point coordinates and direction vector are then directly the center point coordinates and the direction vector of the visible surface of the target entity. When there are multiple matching surface regions, unavoidable factors such as limited processing precision and camera shake mean that the center point coordinates and direction vectors obtained from the video frame may carry errors: ideally the center point coordinates obtained from the surface regions of the same target entity should coincide, but with errors present they may differ, as may the corresponding direction vectors. Suppose again that the spatial orientations in the reference coordinate system of two surface regions matching the visible surface are as shown in FIG. 3, with center point coordinates $C_1$ and $C_2$ and direction vectors $\vec{v}_1$ and $\vec{v}_2$, respectively.
When determining the center point coordinates of the visible surface of the target entity from the center point coordinates of the matching surface regions, the integrated processing module 640, if there are multiple surface regions matching the visible surface, takes a weighted average of their center point coordinates to determine the center point coordinates of the target entity. The weights can be chosen per scenario to achieve the best accuracy; as one feasible implementation, all surface regions are given the same weight, in which case the computation is:

$\bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i$

where $\bar{C}$ is the center point coordinates of the visible surface of the target entity, $C_i$ is the center point coordinates of any one surface region matching the visible surface, and $n$ is the number of matched surface regions.
Likewise, when determining the direction vector of the visible surface of the target entity from the direction vectors of the matching surface regions, the integrated processing module 640, if there are multiple surface regions matching the visible surface, sums their direction vectors with weights and normalizes the result to obtain the direction vector of the visible surface of the target entity. Again the weights can be chosen per scenario to achieve the best accuracy; as one feasible implementation, all surface regions are given the same weight, in which case the computation is:

$\bar{v} = \dfrac{\sum_{i=1}^{n} \vec{v}_i}{\left\| \sum_{i=1}^{n} \vec{v}_i \right\|}$

where $\bar{v}$ is the direction vector of the visible surface of the target entity, $\vec{v}_i$ is the direction vector of any one surface region matching the visible surface, and $n$ is the number of matched surface regions.
By processing frame after frame in this way, the spatial orientation of the physical globe is obtained dynamically at every moment: whether the relative position of the camera and the physical globe changes or the physical globe is rotated, the center point coordinates and the direction vector of the physical globe relative to the camera can be obtained accurately, and rendering the virtual globe on that basis guarantees that it turns or moves in step with the physical globe.
In a real scenario, a preprocessing module can preprocess the target entity before any video frame containing it is acquired, in order to construct its pre-built feature set; in this way a pre-built feature set can be constructed for a target entity of any shape, enabling real-time orientation tracking and AR for arbitrarily shaped targets. For the globe scenario, the preprocessing scans the image features of the sphere's surface offline to obtain a pre-built feature set, stored in advance for later comparison and matching against the image features of the visible surface of the physical globe captured in real time. Specifically, the preprocessing module is configured to control the image acquisition module to acquire images of a plurality of surface regions of the target entity, the plurality of surface regions together covering at least the entire surface of the target entity; to control the feature extraction module to perform image feature extraction on those images to obtain the image feature information of each surface region; and to construct the pre-built feature set of the target entity according to the image feature information of each surface region.
When preprocessing the physical globe, that is, when collecting the image feature information of the surface regions and constructing the pre-built feature set, the same principles apply so that orientation tracking and recognition remain stable and efficient:
1) the pre-built feature set consists of a number of surface regions of the sphere (each a partial curved surface relative to the whole sphere), the coverage of each being at most one complete hemisphere;
2) the union of all surface regions should cover the complete sphere;
3) the image features on each surface region should be as plentiful and as evenly distributed as possible;
4) every two adjacent surface regions may overlap by a certain area;
5) the larger the overlap between adjacent surface regions, the higher the achievable tracking accuracy;
6) the more surface regions there are, the heavier the computational load, so their number should be chosen according to the actual processing power of the device.
Further, an embodiment of the present application also provides a device for implementing augmented reality, whose structure, shown in FIG. 7, includes an orientation tracking apparatus 60, a rendering apparatus 710, and an output apparatus 720. Specifically, the orientation tracking apparatus 60 is configured to acquire the center point coordinates and the direction vector of the target entity. The rendering apparatus 710 is configured to obtain, from a virtual model, the display portion corresponding to the visible surface according to the direction vector, and to render the display portion into the video frame according to the center point coordinates, so that the visible surface of the target entity is covered by the display portion. If the target entity is a physical globe, the virtual model may be a virtual globe matching it; given the direction vector, the display portion is the hemisphere of the virtual globe that the direction vector points through, and the center point coordinates, together with the virtual globe's radius, determine the spatial position that this hemisphere should occupy in the frame, completing the composition of the final picture so that the visible surface of the original physical globe is covered by the corresponding hemisphere of the virtual globe.
The output apparatus 720 is configured to output the rendered video frame. After this processing, the physical globe is no longer visible in the picture presented to the user; the virtual globe has taken its place. Since the virtual globe can display a far greater amount of information, and turns at the same angle and speed while the user manipulates the physical globe, the user experience is greatly improved.
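Putting the pieces of FIG. 7 together, the device amounts to a per-frame pipeline of tracking, rendering, and output. The sketch below shows one way to wire it; all class and method names are our own stand-ins for the apparatus 60, the rendering apparatus 710, and the output apparatus 720:

```python
class AugmentedRealityDevice:
    """Minimal per-frame pipeline mirroring FIG. 7."""

    def __init__(self, tracker, renderer, output):
        self.tracker = tracker      # orientation tracking apparatus 60
        self.renderer = renderer    # rendering apparatus 710
        self.output = output        # output apparatus 720

    def process_frame(self, frame):
        center, direction = self.tracker.track(frame)
        if center is not None:      # render only when the globe was found
            frame = self.renderer.render(frame, center, direction)
        self.output.show(frame)     # display the (possibly augmented) frame
```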
Those skilled in the art should understand that the device may include, but is not limited to, a user terminal, or a combination of a user terminal and network equipment integrated over a network. The user terminal includes, but is not limited to, a personal computer, a touch-screen terminal, and the like, and may specifically be a smartphone, a tablet, a PDA, AR glasses, or another electronic device with image acquisition, processing, and output capabilities; the network equipment includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based collection of computers. Here, the cloud consists of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a virtual computer composed of a group of loosely coupled computers. The device may also be an electronic device running an application (APP) that contains the relevant algorithms and a graphical user interface, or that application itself.
Taking a smartphone as an example of a typical implementation, the smartphone's camera can implement the functions of the image acquisition module 610, its processor can implement the functions of the feature extraction module 620, the feature matching module 630, the integrated processing module 640, and the rendering apparatus 710, and its screen can implement the functions of the output apparatus 720. In a real scenario the processor may also upload the relevant data to network equipment through a communication module, with the network equipment carrying out the computation and processing of that data.
此外,除了一些普通地球仪上均具备的地理信息之外,利用虚拟地球仪可以承载较大信息量的特点,通过在虚拟模型上预先增加信息叠加点的方式,增加不同主题的虚拟信息,例如在对应的地理位置上呈现一些地标建筑、珍稀动物、地质奇观等的三维模型。由此,所述渲染装置710还用于进行如下处理:In addition, in addition to the geographic information available on some common globes, virtual globes can be used to carry large amounts of information. By adding information superposition points on the virtual model in advance, virtual information of different topics can be added, for example, in correspondence. The geographical location presents three-dimensional models of landmark buildings, rare animals, and geological wonders. Thus, the rendering device 710 is further configured to perform the following processing:
First, the information overlay points in the display portion are acquired. In actual processing, after the corresponding display portion is obtained, it can be detected whether the display portion contains any preset information overlay point. For example, an overlay point about the giant panda may be preset at the location of Sichuan; when the hemisphere of the display portion of the virtual globe contains the geographic region where Sichuan is located, this overlay point is detected.
Then, the virtual information corresponding to the information overlay point is rendered into the video frame, so that the virtual information is displayed at the corresponding position of the display portion. The virtual information may be a three-dimensional model of a giant panda, so that the user can directly see on the screen a giant panda located at the position of Sichuan, and thus grasp all kinds of geographic knowledge more intuitively and clearly.
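As a hedged illustration of the overlay-point detection just described, the sketch below anchors each information overlay point to a vector from the globe center and tests it against the direction vector of the displayed hemisphere. The OverlayPoint type, the threshold and the sample coordinates are assumptions of this sketch; the panda anchor is illustrative rather than real geographic data.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class OverlayPoint:
    name: str
    anchor: np.ndarray  # vector from the globe center to the anchor location

def visible_overlay_points(points, direction, min_cos=0.0):
    """Return the overlay points whose anchors lie on the displayed hemisphere."""
    direction = direction / np.linalg.norm(direction)
    visible = []
    for p in points:
        anchor = p.anchor / np.linalg.norm(p.anchor)
        if float(anchor @ direction) > min_cos:  # same half of the sphere
            visible.append(p)
    return visible

# Illustrative anchor for a "giant panda" overlay point (not real coordinates).
panda = OverlayPoint("giant_panda", np.array([0.56, 0.63, 0.54]))
```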
Further, in order to convey more vivid and dynamic information to the user, trigger points can be set on the virtual model. During rendering, the rendering apparatus 710 determines whether a trigger point of the display portion is located in a trigger region of the video frame; if the trigger point is in the trigger region of the video frame, the trigger effect of the trigger point is rendered into the video frame.
The trigger region can be set according to actual requirements. For example, on a touch screen the trigger region may be set to the area the user is currently touching, so that the user triggers the effect by tapping; it may also be set statically at a specific area of the screen, such as the center of the screen, and may further be marked by a cursor, a reticle or the like to help the user recognize it. The trigger effect may be audio, video, a three-dimensional animation effect and so on. For example, if the trigger point is set at the display position of the giant panda, then when the three-dimensional model of the giant panda enters the trigger region, the model is enlarged and performs corresponding actions, such as walking or standing up. The trigger effect may also be a highlighting effect: for example, if trigger points are set for the extents of administrative regions, then when an administrative region enters the trigger region, that region is highlighted. Here, those skilled in the art should understand that the above trigger regions, trigger effects and ways of setting trigger points are merely examples; other existing or future ways, if applicable to the present application, should also be included within the protection scope of the present application and are hereby incorporated by reference.
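A minimal sketch of this trigger test, assuming a static rectangular trigger region at the screen center and a trigger point already projected into frame coordinates; the region size, frame dimensions and names are illustrative assumptions.

```python
def in_trigger_region(point_px, region):
    """point_px: (u, v) projected trigger point; region: (u0, v0, u1, v1)."""
    u, v = point_px
    u0, v0, u1, v1 = region
    return u0 <= u <= u1 and v0 <= v <= v1

# A 200 x 200 pixel trigger region at the center of a 1280 x 720 frame.
frame_w, frame_h = 1280, 720
center_region = (frame_w / 2 - 100, frame_h / 2 - 100,
                 frame_w / 2 + 100, frame_h / 2 + 100)

if in_trigger_region((640, 360), center_region):
    # Render the trigger effect (e.g. enlarge the panda model) into the frame.
    pass
```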
In summary, the solution of the present application can superimpose virtual graphics, images, information and the like onto a real scene based on augmented reality technology, thereby enhancing the real scene. By taking a physical globe as the target entity, the solution combines the advantages of both the physical globes and the virtual globes of the prior art. Orientation tracking is achieved by acquiring the center point coordinates and the direction vector of the target entity in real time, and the output video frames are rendered in real time based on these coordinates and this vector, so that while the user manipulates the physical globe, the virtual globe rotates at the same angle and speed, which can greatly improve the user experience.
Furthermore, image features are extracted from the video frames containing the target entity acquired in real time, the extracted features are matched against the pre-built prefabricated feature set, and computation is performed based on the matching results to obtain the center point coordinates and the direction vector of the visible surface of the target entity in the current video frame. In this way, orientation tracking of the target entity can be achieved quickly and accurately, ensuring that the virtual globe displayed on the screen moves synchronously with the physical globe and giving the user a real sense of spatial presence and a genuine operating experience.
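To make the computation based on the matching results concrete, the following sketch fuses several matched surface regions in the manner of claims 4 and 5: center point coordinates are combined by weighted average, and direction vectors by weighted addition followed by vector normalization. The use of feature-match scores as weights is an assumption of this sketch.

```python
import numpy as np

def fuse_pose(centers, directions, weights):
    """Fuse the poses of several matched surface regions.

    centers:    (K, 2) center point coordinates of the matched regions
    directions: (K, 3) direction vectors of the matched regions
    weights:    (K,) non-negative weights, e.g. feature-match scores (assumed)
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Weighted average of the center points (claim 4).
    center = (w[:, None] * np.asarray(centers, dtype=float)).sum(axis=0)
    # Weighted addition of the direction vectors, then normalization (claim 5).
    d = (w[:, None] * np.asarray(directions, dtype=float)).sum(axis=0)
    direction = d / np.linalg.norm(d)
    return center, direction
```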
It should be noted that the present application may be implemented in software and/or in a combination of software and hardware, for example, using an application-specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware device. In one embodiment, the software program of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs of the present application (including related data structures) may be stored in a computer-readable recording medium, for example, a RAM, a magnetic or optical drive, a floppy disk or a similar device. In addition, some steps or functions of the present application may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform the individual steps or functions.
In addition, a part of the present application may be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present application through the operation of that computer. The program instructions that invoke the method of the present application may be stored in a fixed or removable recording medium, and/or transmitted as a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, an embodiment according to the present application comprises an apparatus that includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions according to the foregoing embodiments of the present application.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be implemented in other specific forms without departing from the spirit or basic features of the present application. Therefore, the embodiments are to be regarded in all respects as illustrative and not restrictive, and the scope of the present application is defined by the appended claims rather than by the above description; all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced in the present application. No reference sign in the claims shall be construed as limiting the claim concerned. Moreover, it is obvious that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or apparatuses recited in the device claims may also be implemented by a single unit or apparatus through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (18)

  1. A method for orientation tracking of a target entity, wherein the method comprises:
    acquiring a video frame containing the target entity;
    performing image feature extraction on the target entity in the video frame to acquire image feature information of a visible surface of the target entity;
    performing feature recognition on the acquired image feature information of the visible surface of the target entity based on a prefabricated feature set of the target entity, and acquiring, from the prefabricated feature set, surface regions matching the visible surface, wherein the prefabricated feature set contains a plurality of surface regions of the target entity and image feature information of each surface region;
    determining center point coordinates and a direction vector of the visible surface of the target entity according to center point coordinates and direction vectors of the surface regions matching the visible surface.
  2. The method according to claim 1, wherein, before acquiring the video frame containing the target entity, the method further comprises:
    preprocessing the target entity to construct the prefabricated feature set of the target entity.
  3. The method according to claim 2, wherein preprocessing the target entity to construct the prefabricated feature set of the target entity comprises:
    acquiring images of a plurality of surface regions of the target entity, wherein the plurality of surface regions cover at least all surface regions of the target entity;
    performing image feature extraction on the images of the plurality of surface regions to acquire the image feature information of each surface region;
    constructing the prefabricated feature set of the target entity according to the image feature information of each surface region.
  4. The method according to any one of claims 1 to 3, wherein determining the center point coordinates of the visible surface of the target entity according to the center point coordinates of the surface regions matching the visible surface comprises:
    if a plurality of surface regions match the visible surface, performing weighted average processing on the center point coordinates of the plurality of surface regions to determine the center point coordinates of the target entity.
  5. The method according to any one of claims 1 to 3, wherein determining the direction vector of the visible surface of the target entity according to the direction vectors of the surface regions matching the visible surface comprises:
    if a plurality of surface regions match the visible surface, performing weighted addition of the direction vectors of the plurality of surface regions and vector normalization to acquire the direction vector of the visible surface of the target entity.
  6. A method for implementing augmented reality, wherein the method comprises:
    acquiring center point coordinates and a direction vector of a target entity by using the orientation tracking method described above;
    acquiring, from a virtual model, a display portion corresponding to the visible surface according to the direction vector;
    rendering the display portion into the video frame according to the center point coordinates, so that the visible surface of the target entity is covered by the display portion; and
    outputting the rendered video frame.
  7. The method according to claim 6, wherein the method further comprises:
    acquiring an information overlay point in the display portion;
    rendering virtual information corresponding to the information overlay point into the video frame, so that the virtual information is displayed at a corresponding position of the display portion.
  8. The method according to claim 6, wherein the virtual model is provided with a trigger point;
    the method further comprises:
    determining whether the trigger point of the display portion is located in a trigger region of the video frame;
    if the trigger point is in the trigger region of the video frame, rendering a trigger effect of the trigger point into the video frame.
  9. The method according to any one of claims 6 to 8, wherein the target entity is a physical globe, and the virtual model is a virtual globe matching the physical globe.
  10. An orientation tracking apparatus for a target entity, wherein the apparatus comprises:
    an image acquisition module, configured to acquire a video frame containing the target entity;
    a feature extraction module, configured to perform image feature extraction on the target entity in the video frame to acquire image feature information of a visible surface of the target entity;
    a feature matching module, configured to perform feature recognition on the acquired image feature information of the visible surface of the target entity based on a prefabricated feature set of the target entity, and to acquire, from the prefabricated feature set, surface regions matching the visible surface, wherein the prefabricated feature set contains a plurality of surface regions of the target entity and image feature information of each surface region;
    an integrated processing module, configured to determine center point coordinates and a direction vector of the visible surface of the target entity according to center point coordinates and direction vectors of the surface regions matching the visible surface.
  11. The apparatus according to claim 10, wherein the apparatus further comprises:
    a preprocessing module, configured to preprocess the target entity before the video frame containing the target entity is acquired, to construct the prefabricated feature set of the target entity.
  12. The apparatus according to claim 11, wherein the preprocessing module is configured to: control the image acquisition module to acquire images of a plurality of surface regions of the target entity, wherein the plurality of surface regions cover at least all surface regions of the target entity; control the feature extraction module to perform image feature extraction on the images of the plurality of surface regions to acquire the image feature information of each surface region; and construct the prefabricated feature set of the target entity according to the image feature information of each surface region.
  13. The apparatus according to any one of claims 10 to 12, wherein the integrated processing module is configured to, when a plurality of surface regions match the visible surface, perform weighted average processing on the center point coordinates of the plurality of surface regions to determine the center point coordinates of the target entity.
  14. The apparatus according to any one of claims 10 to 12, wherein the integrated processing module is configured to, when a plurality of surface regions match the visible surface, perform weighted addition of the direction vectors of the plurality of surface regions and vector normalization to acquire the direction vector of the visible surface of the target entity.
  15. A device for implementing augmented reality, wherein the device comprises:
    an orientation tracking apparatus, configured to acquire center point coordinates and a direction vector of a target entity;
    a rendering apparatus, configured to acquire, from a virtual model, a display portion corresponding to the visible surface according to the direction vector, and to render the display portion into the video frame according to the center point coordinates, so that the visible surface of the target entity is covered by the display portion; and
    an output apparatus, configured to output the rendered video frame.
  16. The device according to claim 15, wherein the rendering apparatus is further configured to acquire an information overlay point in the display portion, and to render virtual information corresponding to the information overlay point into the video frame, so that the virtual information is displayed at a corresponding position of the display portion.
  17. The device according to claim 16, wherein the virtual model is provided with a trigger point;
    the rendering apparatus is further configured to determine whether the trigger point of the display portion is located in a trigger region of the video frame, and, when the trigger point is in the trigger region of the video frame, to render a trigger effect of the trigger point into the video frame.
  18. The device according to any one of claims 15 to 17, wherein the target entity is a physical globe, and the virtual model is a virtual globe matching the physical globe.
PCT/CN2017/080276 2016-07-19 2017-04-12 Method and relevant apparatus for orientational tracking, method and device for realizing augmented reality WO2018014601A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610570465.XA CN106251404B (en) 2016-07-19 2016-07-19 Orientation tracking, the method and relevant apparatus, equipment for realizing augmented reality
CN201610570465.X 2016-07-19

Publications (1)

Publication Number Publication Date
WO2018014601A1

Family

ID=57613380

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/080276 WO2018014601A1 (en) 2016-07-19 2017-04-12 Method and relevant apparatus for orientational tracking, method and device for realizing augmented reality

Country Status (2)

Country Link
CN (1) CN106251404B (en)
WO (1) WO2018014601A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111093301A (en) * 2019-12-14 2020-05-01 安琦道尔(上海)环境规划建筑设计咨询有限公司 Light control method and system
CN113269810A (en) * 2018-04-11 2021-08-17 深圳市瑞立视多媒体科技有限公司 Motion gesture recognition method and device for catching ball
CN115057242A (en) * 2022-03-07 2022-09-16 上海箱云物流科技有限公司 Container management method and system based on combination of virtual reality and augmented reality

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106251404B (en) * 2016-07-19 2019-02-01 央数文化(上海)股份有限公司 Orientation tracking, the method and relevant apparatus, equipment for realizing augmented reality
CN108540824B (en) * 2018-05-15 2021-01-19 北京奇虎科技有限公司 Video rendering method and device
CN109582147B (en) * 2018-08-08 2022-04-26 亮风台(上海)信息科技有限公司 Method for presenting enhanced interactive content and user equipment
CN109656363B (en) * 2018-09-04 2022-04-15 亮风台(上海)信息科技有限公司 Method and equipment for setting enhanced interactive content
CN110084204B (en) * 2019-04-29 2020-11-24 北京字节跳动网络技术有限公司 Image processing method and device based on target object posture and electronic equipment
CN110335292B (en) * 2019-07-09 2021-04-30 北京猫眼视觉科技有限公司 Method, system and terminal for realizing simulation scene tracking based on picture tracking
CN110727350A (en) * 2019-10-09 2020-01-24 武汉幻石佳德数码科技有限公司 Augmented reality-based object identification method, terminal device and storage medium
CN112051746B (en) * 2020-08-05 2023-02-07 华为技术有限公司 Method and device for acquiring service
CN112565605B (en) * 2020-12-02 2022-11-25 维沃移动通信有限公司 Image display method and device and electronic equipment
CN117425917A (en) * 2021-03-08 2024-01-19 杭州他若定位科技有限公司 Scene triggering and interaction based on target positioning identification
CN118112606B (en) * 2024-04-30 2024-07-23 中孚安全技术有限公司 Accurate investigation system, method and medium for vehicle GPS tracker

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120243739A1 (en) * 2011-03-25 2012-09-27 Masaki Fukuchi Information processing device, object recognition method, program, and terminal device
CN103049729A (en) * 2012-12-30 2013-04-17 成都理想境界科技有限公司 Method, system and terminal for augmenting reality based on two-dimension code
CN104050475A (en) * 2014-06-19 2014-09-17 樊晓东 Reality augmenting system and method based on image feature matching
CN104966307A (en) * 2015-07-10 2015-10-07 成都品果科技有限公司 AR (augmented reality) algorithm based on real-time tracking
CN105069754A (en) * 2015-08-05 2015-11-18 意科赛特数码科技(江苏)有限公司 System and method for carrying out unmarked augmented reality on image
CN105338117A (en) * 2015-11-27 2016-02-17 亮风台(上海)信息科技有限公司 Method, device and system for generating AR applications and presenting AR instances
CN105491365A (en) * 2015-11-25 2016-04-13 罗军 Image processing method, device and system based on mobile terminal
CN106251404A (en) * 2016-07-19 2016-12-21 央数文化(上海)股份有限公司 Orientation tracking, the method realizing augmented reality and relevant apparatus, equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102142055A (en) * 2011-04-07 2011-08-03 上海大学 True three-dimensional design method based on augmented reality interactive technology
US8855366B2 (en) * 2011-11-29 2014-10-07 Qualcomm Incorporated Tracking three-dimensional objects
CN103366610B (en) * 2013-07-03 2015-07-22 央数文化(上海)股份有限公司 Augmented-reality-based three-dimensional interactive learning system and method
CN104134229A (en) * 2014-08-08 2014-11-05 李成 Real-time interaction reality augmenting system and method
CN104463108B (en) * 2014-11-21 2018-07-31 山东大学 A kind of monocular real time target recognitio and pose measuring method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120243739A1 (en) * 2011-03-25 2012-09-27 Masaki Fukuchi Information processing device, object recognition method, program, and terminal device
CN103049729A (en) * 2012-12-30 2013-04-17 成都理想境界科技有限公司 Method, system and terminal for augmenting reality based on two-dimension code
CN104050475A (en) * 2014-06-19 2014-09-17 樊晓东 Reality augmenting system and method based on image feature matching
CN104966307A (en) * 2015-07-10 2015-10-07 成都品果科技有限公司 AR (augmented reality) algorithm based on real-time tracking
CN105069754A (en) * 2015-08-05 2015-11-18 意科赛特数码科技(江苏)有限公司 System and method for carrying out unmarked augmented reality on image
CN105491365A (en) * 2015-11-25 2016-04-13 罗军 Image processing method, device and system based on mobile terminal
CN105338117A (en) * 2015-11-27 2016-02-17 亮风台(上海)信息科技有限公司 Method, device and system for generating AR applications and presenting AR instances
CN106251404A (en) * 2016-07-19 2016-12-21 央数文化(上海)股份有限公司 Orientation tracking, the method realizing augmented reality and relevant apparatus, equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269810A (en) * 2018-04-11 2021-08-17 深圳市瑞立视多媒体科技有限公司 Motion gesture recognition method and device for catching ball
CN111093301A (en) * 2019-12-14 2020-05-01 安琦道尔(上海)环境规划建筑设计咨询有限公司 Light control method and system
CN115057242A (en) * 2022-03-07 2022-09-16 上海箱云物流科技有限公司 Container management method and system based on combination of virtual reality and augmented reality
CN115057242B (en) * 2022-03-07 2023-08-01 上海箱云物流科技有限公司 Container management method based on combination of virtual reality and augmented reality

Also Published As

Publication number Publication date
CN106251404B (en) 2019-02-01
CN106251404A (en) 2016-12-21

Similar Documents

Publication Publication Date Title
WO2018014601A1 (en) Method and relevant apparatus for orientational tracking, method and device for realizing augmented reality
US11481923B2 (en) Relocalization method and apparatus in camera pose tracking process, device, and storage medium
CN109887003B (en) Method and equipment for carrying out three-dimensional tracking initialization
CN110910486B (en) Indoor scene illumination estimation model, method and device, storage medium and rendering method
US11748906B2 (en) Gaze point calculation method, apparatus and device
Ventura et al. Wide-area scene mapping for mobile visual tracking
JP2021526680A (en) Self-supervised training for depth estimation system
CN103839277B (en) A kind of mobile augmented reality register method of outdoor largescale natural scene
WO2019238114A1 (en) Three-dimensional dynamic model reconstruction method, apparatus and device, and storage medium
WO2023280038A1 (en) Method for constructing three-dimensional real-scene model, and related apparatus
CN115690382B (en) Training method of deep learning model, and method and device for generating panorama
WO2021093679A1 (en) Visual positioning method and device
CN112927362A (en) Map reconstruction method and device, computer readable medium and electronic device
CN112288878B (en) Augmented reality preview method and preview device, electronic equipment and storage medium
WO2022088819A1 (en) Video processing method, video processing apparatus and storage medium
CN112766215A (en) Face fusion method and device, electronic equipment and storage medium
CN112270709A (en) Map construction method and device, computer readable storage medium and electronic device
CN113724391A (en) Three-dimensional model construction method and device, electronic equipment and computer readable medium
US11100617B2 (en) Deep learning method and apparatus for automatic upright rectification of virtual reality content
CN114882106A (en) Pose determination method and device, equipment and medium
WO2022126921A1 (en) Panoramic picture detection method and device, terminal, and storage medium
CN110956571A (en) SLAM-based virtual-real fusion method and electronic equipment
CN108205820B (en) Plane reconstruction method, fusion method, device, equipment and storage medium
Zhou et al. Improved YOLOv7 models based on modulated deformable convolution and swin transformer for object detection in fisheye images
KR102176805B1 (en) System and method for providing virtual reality contents indicated view direction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17830239

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17830239

Country of ref document: EP

Kind code of ref document: A1