CN108364302B - Unmarked augmented reality multi-target registration tracking method - Google Patents

Unmarked augmented reality multi-target registration tracking method

Info

Publication number
CN108364302B
CN108364302B
Authority
CN
China
Prior art keywords
image
target
tracking
thread
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810096334.1A
Other languages
Chinese (zh)
Other versions
CN108364302A (en)
Inventor
张宇
卢明林
李雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201810096334.1A priority Critical patent/CN108364302B/en
Publication of CN108364302A publication Critical patent/CN108364302A/en
Application granted granted Critical
Publication of CN108364302B publication Critical patent/CN108364302B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Abstract

The invention discloses an unmarked augmented reality multi-target registration tracking method, which comprises the following steps: 1) an offline stage: a vocabulary tree model is built by hierarchical k-means clustering, the corresponding image ids are registered in the inverted indexes of all leaf nodes of the vocabulary tree, and the vocabulary tree is finally updated into a vocabulary tree with tf-idf weights according to the frequency of the leaf nodes of the whole tree and the total number of images to be registered; 2) an online stage: the online stage is a real-time response system that retrieves the most similar image for the image input by the camera in real time according to the trained vocabulary tree, then calculates the initial pose of the 3D object to be loaded through camera pose estimation, tracks the motion of each target with the KLT tracking algorithm, and finally constructs the augmented reality scene of each target through a rendering thread. The invention provides an efficient and reliable solution for unmarked multi-target augmented reality.

Description

Unmarked augmented reality multi-target registration tracking method
Technical Field
The invention relates to the field of computer graphics, and in particular to an unmarked augmented reality multi-target registration tracking method.
Background
Augmented reality is a technology for seamlessly fusing information from a virtual world with the real world: experience that could not originally be obtained in the real world is simulated by computers and other technologies and then superimposed onto the real world, producing a sensory experience that goes beyond reality; it can act on the visual, auditory and other sensory systems. The origin of augmented reality can be traced back to the birth of computer technology. The head-mounted display device invented in 1968 by Ivan Sutherland, then an associate professor of electrical engineering at Harvard University, is the prototype of augmented reality: the system mounted the display device on the ceiling above the user's head, connected it to the head-mounted device through a linkage, and could convert simple wireframe graphics into images with a 3D effect. The augmented reality application scenario addressed by the invention is mainly visual. Traditional marker-based augmented reality technology can no longer keep pace with the times or satisfy people's increasingly rich and diverse requirements, whereas unmarked augmented reality technology is more flexible and universally applicable. The invention discloses a technology for unmarked multi-target augmented reality registration and tracking, which mainly involves visual feature clustering, a vocabulary tree model, inverted indexes, camera pose estimation, target tracking and other algorithms.
Since the beginning of the 21st century, a large number of companies have begun to develop augmented reality applications; the main development platforms they use are: (1) Vuforia; (2) EasyAR; (3) Wikitude; (4) ARToolKit; (5) Maxst; (6) Xzimg. In addition, Metaio was acquired early on by Apple Inc., which released the ARKit development kit for the iOS system in June 2017. Apart from ARToolKit, the other platforms only provide open interfaces for users to develop applications and charge fees once certain usage limits are exceeded. Although these commercial open platforms provide relatively complete solutions, they cannot provide researchers with a detailed and definite algorithmic theoretical basis. Only ARToolKit, the only platform with open-source algorithm code, provides a marker-based multi-target registration tracking method; it does not implement an unmarked multi-target registration tracking method, and it traverses the whole image database in the recognition stage, so recognition is very inefficient when a large number of images are registered. Meanwhile, the image-based unmarked augmented reality system and method invented by Zhang et al. still traverse the whole image database for comparison in the retrieval stage.
The core technology of augmented reality mainly comprises three aspects: (1) image retrieval; (2) camera pose estimation; (3) online tracking. At the beginning of the 21st century, a widely used augmented reality image recognition method employed templates based on square markers. A marker-based square template consists of an embedded pattern, a white background and a black frame: the embedded pattern determines the uniqueness of the marker, and the black frame is identified, tracked and detected first and used to estimate the camera pose. However, marker-based templates can store few distinct patterns and the number of supported patterns is limited, so they are not suitable for scenarios with a massive number of patterns to recognize.
With the subsequent development of the technology, unmarked natural-feature tracking has broken through the limitations of the traditional marker-based method, making augmented reality application scenarios more flexible and the storage capacity larger. Unmarked augmented reality relies on natural feature points; commonly used feature point extraction algorithms include SIFT, SURF, FAST, ORB and the like. Because the extracted feature points are numerous and complex in unmarked augmented reality, the well-known BOW (bag-of-words) model is used in the image retrieval stage; however, experiments show that the bag-of-words model often needs a very large number of cluster centers in the clustering stage, such as 10^6 or more, to obtain good results, and the time spent in the retrieval stage grows linearly as the number of registered images increases. The scalable recognition using vocabulary trees proposed by Nister D and Stewenius H, together with the KLT target tracking algorithm based on optical flow, provides the basis for the present invention.
Disclosure of Invention
The invention aims to overcome the defects of marker-based templates in augmented reality, namely small storage capacity and inflexible use, and provides an unmarked augmented reality multi-target registration tracking method with fast recognition, large storage capacity, and guaranteed efficiency and quality under multi-target tracking.
In order to achieve the above purpose, the technical solution provided by the invention is as follows: an unmarked augmented reality multi-target registration tracking method, comprising the following steps:
1) An offline stage: a vocabulary tree model is built by hierarchical k-means clustering, the corresponding image ids are registered in the inverted indexes of all leaf nodes of the vocabulary tree, and the vocabulary tree is finally updated into a vocabulary tree with tf-idf weights according to the frequency of the leaf nodes of the whole tree and the total number of images to be registered.
2) An online stage: the online stage is a real-time response system; for the image input by the camera in real time, the most similar image is retrieved according to the vocabulary tree trained in step 1), then the initial pose of the 3D object to be loaded is calculated through camera pose estimation, the motion of the target is tracked with the KLT tracking algorithm, and finally the augmented reality scene of each target is constructed through a rendering thread.
In step 1), the off-line phase comprises the following steps:
a) extracting all SIFT feature point descriptors from the image set to be registered;
b) performing k-means hierarchical clustering according to the branch number of the vocabulary tree;
c) repeating operation b) on all SIFT feature point descriptors under each branch in turn until the leaf nodes are reached;
d) finally, linking one inverted index file to each leaf node at the bottommost layer;
e) extracting SIFT feature point descriptors from each image to be registered again, counting the score weight of each feature point descriptor on each leaf node according to the trained vocabulary tree, and registering the corresponding image in the inverted index;
f) recalculating the vocabulary tree with tf-idf weights according to the length of the inverted index file on each leaf node of the whole tree; if the total number of registered images is N and any leaf node j has N_j images registered in its inverted index, the tf-idf weight of leaf node j is

w_j = ln(N / N_j)

For a registered image d', if m_j feature points fall on leaf node j, the score of that leaf node is d'_j = m_j · w_j; for an image q' to be retrieved, if n_j feature points fall on leaf node j, the score of that leaf node is q'_j = n_j · w_j.
In step 2), the whole system has one recognition thread, a plurality of tracking threads and one rendering thread; the details are as follows:
The representation of multi-target tracking is as follows: given an image sequence I_1, I_2, ..., I_t, the number of targets in each image is M_t, and the state of each target consists of position and attitude information represented by a 3 × 4 pose matrix

T_t^i = [R | t]

A 3D point in the world coordinate system is mapped to the position of a 2D point in the camera coordinate system through the following equation (equality up to a scale factor):

[u, v, 1]^T ∝ A · [R | t] · [x_w, y_w, z_w, 1]^T

where u, v are the image coordinates of the camera projection, x_w, y_w, z_w are the world coordinates of a feature point, and A is the intrinsic parameter matrix of the camera, as follows:

A = [ α_x  s  c_x ; 0  α_y  c_y ; 0  0  1 ]

where α_x and α_y represent the focal lengths, c_x and c_y the coordinates of the principal point, and s the skew coefficient; these intrinsic parameters of the camera are used to correct the effects of optical distortion and can be obtained in the offline stage. Pose estimation calculates the accurate extrinsic parameters of the camera, i.e. the rotation matrix R and the translation vector t, from the views of n ≥ 3 reference points using an ICP (Iterative Closest Point) iterative method.

All target states in each image are then represented as S_t = {T_t^1, T_t^2, ..., T_t^(M_t)}, the motion trajectory of the i-th target is represented as T_(1:t)^i = {T_1^i, T_2^i, ..., T_t^i}, and the state sequence composed of all image targets is S_(1:t) = {S_1, S_2, ..., S_t};
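To make the projection model above concrete, the following sketch recovers the 3 × 4 pose matrix [R | t] of one planar target from point correspondences with OpenCV's default iterative solvePnP solver; this is only an illustrative stand-in for the ICP-style estimation described above, and the intrinsic matrix, target size and pixel coordinates are placeholder values:

    #include <opencv2/core/core.hpp>
    #include <opencv2/calib3d/calib3d.hpp>
    #include <vector>

    int main() {
        // Camera intrinsic matrix A from offline calibration (placeholder values).
        cv::Mat A = (cv::Mat_<double>(3, 3) << 800, 0, 320,
                                               0, 800, 240,
                                               0,   0,   1);
        cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64F);       // assume no lens distortion

        // World coordinates of the target's four corners on the z = 0 plane.
        float w = 0.20f, h = 0.15f;                              // placeholder size in meters
        std::vector<cv::Point3f> worldPts = {
            {0, 0, 0}, {0, h, 0}, {w, h, 0}, {w, 0, 0}};
        // Matched image coordinates of the same corners (placeholder pixels).
        std::vector<cv::Point2f> imagePts = {
            {210, 120}, {205, 330}, {450, 335}, {455, 125}};

        cv::Mat rvec, tvec;
        cv::solvePnP(worldPts, imagePts, A, distCoeffs, rvec, tvec);  // default iterative solver

        cv::Mat R, T;
        cv::Rodrigues(rvec, R);                                  // rotation vector -> 3x3 matrix
        cv::hconcat(R, tvec, T);                                 // 3x4 pose matrix [R | t]
        // T is the per-target pose that is handed to a tracking thread and the renderer.
        return 0;
    }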
Work content of the recognition thread:
In contrast to conventional single-target recognition, multi-target recognition aims to recognize as many targets as possible in an image; suppose I_t contains h targets. The procedure is as follows:
a) according to the input image I_t of the camera, first detect the set P of all feature points using the SIFT algorithm, and extract SIFT feature point descriptors for all points in P;
b) according to the vocabulary tree trained in step 1), take the inner product of each descriptor with the nodes of each layer of the vocabulary tree from top to bottom, determine the final leaf node into which each feature point descriptor falls, and add the tf-idf weight of that node to the score of the corresponding leaf node;
c) take out all the images in the inverted indexes of the leaf nodes whose scores for the input image are nonzero, and obtain the most similar reference image d̂ in the candidate image set using the following similarity calculation. Considering that the image may contain several identical targets, the Euclidean distance calculation would make an image containing more instances of a target appear farther from its reference image even though it is more similar, so the similarity calculation for multi-target detection and recognition adopts the cosine similarity formula:

sim(q, d) = (q · d) / (‖q‖ · ‖d‖)

where q is the vector of all nonzero leaf-node scores of the query image, and d is the vector of all nonzero leaf-node scores of an image to be compared in the image database.
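A minimal sketch of this cosine-similarity score, assuming the nonzero leaf-node scores of the query image and of each candidate image are kept as sparse maps from leaf id to score (a layout choice made here for illustration):

    #include <cmath>
    #include <map>

    double cosineSimilarity(const std::map<int, double>& q, const std::map<int, double>& d) {
        double dot = 0.0, nq = 0.0, nd = 0.0;
        for (const auto& kv : q) {
            nq += kv.second * kv.second;
            auto it = d.find(kv.first);                 // only leaves shared by q and d contribute
            if (it != d.end()) dot += kv.second * it->second;
        }
        for (const auto& kv : d) nd += kv.second * kv.second;
        if (nq == 0.0 || nd == 0.0) return 0.0;
        return dot / (std::sqrt(nq) * std::sqrt(nd));   // sim(q, d) = q·d / (|q||d|)
    }

The recognition thread only needs to evaluate this score for images that appear in the inverted indexes of the nonzero leaves, so the whole image database is never traversed.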
d) search for the homography matrix H, a 3 × 3 matrix; if H exists, recalculate the inlier point set Q using the RANSAC algorithm, and if it does not exist, terminate. Point pairs on two images taken from different perspectives can be related by a projective transformation, as follows:
x'=H·x
where x', x are the coordinates of point pairs on two images from different viewing angles, respectively.
e) transform the 4 boundary points of the reference image d̂ through the homography matrix H using the above formula, and judge whether the quadrilateral a'b'c'e' obtained after the matrix transformation is convex; if not, terminate. Let the height of the original image be h and its width be w; the four points a, b, c, e are transformed by the following formula to obtain the coordinates of a', b', c', e', and whether they form a convex quadrilateral is judged by elementary geometry:

[x_i', y_i', 1]^T ∝ H · [x_i, y_i, 1]^T,  i ∈ {a, b, c, e}

where point a has coordinates (0, 0), point b has coordinates (0, h), point c has coordinates (w, h), point e has coordinates (w, 0), and the transformed points a', b', c', e' have coordinates (x_a, y_a), (x_b, y_b), (x_c, y_c), (x_e, y_e), respectively.
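A minimal sketch of this corner transformation and convexity test, assuming the reference image corners are ordered a, b, c, e as above; the projected quadrilateral a'b'c'e' is accepted only if the cross products of consecutive edges all have the same sign:

    #include <opencv2/core/core.hpp>
    #include <vector>

    bool isConvexAfterH(const cv::Mat& H, double w, double h) {
        std::vector<cv::Point2f> corners = {
            {0, 0}, {0, (float)h}, {(float)w, (float)h}, {(float)w, 0}};   // a, b, c, e
        std::vector<cv::Point2f> warped;
        cv::perspectiveTransform(corners, warped, H);                      // a', b', c', e'
        int positive = 0, negative = 0;
        for (int i = 0; i < 4; ++i) {
            cv::Point2f e1 = warped[(i + 1) % 4] - warped[i];
            cv::Point2f e2 = warped[(i + 2) % 4] - warped[(i + 1) % 4];
            double cross = e1.x * e2.y - e1.y * e2.x;                      // z-component of e1 x e2
            (cross >= 0 ? positive : negative)++;
        }
        return positive == 4 || negative == 4;         // all turns in one consistent direction
    }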
f) calculate the initial pose matrix T_t^i of the target through the ICP iterative algorithm, hand it over to a new tracking thread, and mask the target region;
g) let P = P - (P ∩ Q), and hand the target over to the tracking thread, which updates the pose matrix T_t^i in real time; the recognition thread continues to repeat steps a) to f) on the unmasked region.
Work content of each tracking thread:
Multi-target tracking means that, given an image sequence I_1, I_2, ..., I_t, the set of moving targets in the image sequence is found through multi-target recognition, the moving targets in subsequent frames are associated with one of them, and the motion trajectories of the different targets are given. The tracking algorithm adopted here is the KLT algorithm based on optical flow, which is implemented as follows:
The KLT principle assumes that the same target appears in two frames of images I and J and looks the same over a local window W; then, within the local window W: I(x, y, l) = J(x', y', l + τ). This means that on image I every point (x, y) has moved in one direction by a displacement g = (d_x, d_y), and at time l + τ it corresponds to the point (x', y') = (x + d_x, y + d_y) on image J; the matching problem can therefore be posed as minimizing the following equation:

ε(g) = Σ_{(x, y) ∈ W} [ I(x, y) - J(x + d_x, y + d_y) ]²

The above equation, also called the difference equation, is equivalent to the integral form:

ε(g) = ∫∫_W [ I(x) - J(x + g) ]² w(x) dx

where w(x) is a weighting function, usually set to the constant 1. That is, find the difference between the local window of radius (w_x, w_y) centered at (u_x, u_y) in image I and the window centered at (u_x + g_x, u_y + g_y) in image J; to make ε(g) minimal, set the derivative of the above equation to zero, from which the displacement g can be solved.
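A minimal sketch of the per-frame KLT update used by a tracking thread, based on OpenCV's pyramidal Lucas-Kanade implementation; the window size and pyramid depth are placeholder choices:

    #include <opencv2/core/core.hpp>
    #include <opencv2/video/tracking.hpp>
    #include <vector>

    void trackKLT(const cv::Mat& prevGray, const cv::Mat& currGray,
                  std::vector<cv::Point2f>& points) {
        if (points.empty()) return;
        std::vector<cv::Point2f> nextPoints;
        std::vector<unsigned char> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prevGray, currGray, points, nextPoints, status, err,
                                 cv::Size(21, 21), 3);          // 21x21 window, 3 pyramid levels
        std::vector<cv::Point2f> kept;
        for (size_t i = 0; i < points.size(); ++i)
            if (status[i]) kept.push_back(nextPoints[i]);       // keep successfully tracked points
        points.swap(kept);
        // If too few points survive, the tracking thread reports a loss to the recognition
        // thread, which restores this region to the set of regions to be detected.
    }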
Therefore, the specific implementation steps of each tracking thread are as follows:
a) tracking the identified target in real time by using a KLT tracking algorithm;
b) if target tracking is lost, feed back to the recognition thread and require it to restore the region to be detected, i.e. P = P ∪ Q;
c) if target tracking succeeds, feed back to the recognition thread and update the recognition thread's masked region and the pose matrix T_t^i.
The work content of the rendering thread is as follows:
According to the pose matrix T_t^i of each tracking thread, the corresponding 3D models are placed in turn, and the augmented reality scene is rendered through OpenGL.
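As an illustration of how a tracking thread's pose could reach the renderer, the following sketch (an assumption about the rendering setup, not the patent's code) converts a 3 × 4 pose matrix [R | t] expressed in OpenCV's camera convention (y down, z forward) into the column-major modelview matrix expected by a fixed-function OpenGL pipeline:

    #include <opencv2/core/core.hpp>

    void poseToModelview(const cv::Mat& T, double modelview[16]) {   // T: 3x4, CV_64F
        // Flip the y and z axes to move from the OpenCV to the OpenGL camera frame.
        cv::Mat flip = (cv::Mat_<double>(3, 3) << 1,  0,  0,
                                                  0, -1,  0,
                                                  0,  0, -1);
        cv::Mat Tgl = flip * T;                                       // still 3x4
        for (int col = 0; col < 4; ++col)
            for (int row = 0; row < 3; ++row)
                modelview[col * 4 + row] = Tgl.at<double>(row, col);  // column-major layout
        modelview[3] = modelview[7] = modelview[11] = 0.0;
        modelview[15] = 1.0;
        // glMatrixMode(GL_MODELVIEW); glLoadMatrixd(modelview); then draw the 3D model.
    }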
Compared with the prior art, the invention has the following advantages and benefits:
1. The method removes the limitation that marker-based augmented reality depends on auxiliary markers in the image retrieval stage, and can be applied to unmarked augmented reality application scenarios.
2. Compared with the open-source ARToolKit platform, the method abandons the inefficient approach of traversing the whole image library when retrieving unmarked natural feature points, and shortens retrieval time by utilizing the index structure of the vocabulary tree.
3. The method provides a solution for unmarked multi-target tracking; compared with the ARToolKit platform, it can simultaneously track a plurality of identical or different target objects.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram illustrating the hierarchical clustering using vocabulary trees according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the results of a search using a trained lexical tree in accordance with an embodiment of the present invention.
FIG. 4 is a system framework diagram of the present invention.
FIG. 5 is a convex quadrilateral after homography transformation of an original image according to an embodiment of the present invention.
Fig. 6 shows an embodiment of the present invention in which an input image containing two objects is captured by a USB camera.
FIG. 7 is a graph showing the results of the search of FIG. 6 according to the embodiment of the present invention.
Fig. 8 is an image of the left object being homography matched according to an embodiment of the present invention.
Fig. 9 is an effect image of performing homography matching on a right object in the embodiment of the present invention.
Fig. 10 is an effect diagram of performing augmented reality tracking on 2 different objects in the embodiment of the present invention.
Fig. 11 is an effect diagram of performing augmented reality tracking on 9 identical objects in the embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1, the method for multi-target registration and tracking in unmarked augmented reality provided by this embodiment includes the following two stages:
1) An offline stage: a vocabulary tree model is built by hierarchical k-means clustering, the corresponding image ids are registered in the inverted indexes of all leaf nodes of the vocabulary tree, and the vocabulary tree is finally updated into a vocabulary tree with tf-idf weights according to the frequency of the leaf nodes of the whole tree and the total number of images to be registered; the implementation process is as follows:
a) extracting all SIFT feature point descriptors from the image set to be registered;
b) performing k-means hierarchical clustering according to the branch number of the vocabulary tree;
c) repeating operation b) on all SIFT feature point descriptors under each branch in turn until the leaf nodes are reached;
d) finally, linking one inverted index file to each leaf node at the bottommost layer;
e) extracting SIFT feature point descriptors from each image to be registered again, counting the score weight of each feature point descriptor on each leaf node according to the trained vocabulary tree, and registering the corresponding image in the inverted index;
f) recalculating the vocabulary tree with tf-idf weights according to the length of the inverted index file on each leaf node of the whole tree; if the total number of registered images is N and any leaf node j has N_j images registered in its inverted index, the tf-idf weight of leaf node j is

w_j = ln(N / N_j)

For a registered image d', if m_j feature points fall on leaf node j, the score of that leaf node is d'_j = m_j · w_j; for an image q' to be retrieved, if n_j feature points fall on leaf node j, the score of that leaf node is q'_j = n_j · w_j.
2) An online stage: the online stage is a real-time response system; for the image input by the camera in real time, the most similar image is retrieved according to the vocabulary tree trained in step 1), then the initial pose of the 3D object to be loaded is calculated through camera pose estimation, the motion of the target is tracked with the KLT tracking algorithm, and finally the augmented reality scene of each target is constructed through a rendering thread. The whole system has one recognition thread, a plurality of tracking threads and one rendering thread.
The expression form of multi-target tracking is as follows: given a set of image sequences I1,I2,...,ItThe number of targets in each image is MtThe state of each target has position, attitude and other information, and is composed of a 3 × 4 matrix
Figure GDA0002528701010000094
Represented by the formula:
Figure GDA0002528701010000092
the 3D point of a world coordinate system can be mapped to obtain the position of a 2D point of a camera coordinate system through the following equation:
Figure GDA0002528701010000093
where u, v are the image coordinate system of the camera projection, xw,yw,zwIs the world coordinate system coordinate of a feature point, and A is the internal parameter of the camera, as follows:
Figure GDA0002528701010000101
in the formula, is axAnd ∈yRepresents the focal length, cxAnd cyCoordinates representing principal points, s representing tilt coefficients, which are intrinsic parameters of the camera used to correct the effects of optical deformations, which can be done in an off-line phase; the pose estimation is to calculate an accurate external parameter s of the camera through the visual angles of n more than or equal to 3 reference pointstCalculating a rotation matrix R and a translation matrix f by using an ICP (inductively coupled plasma) iterative method;
then all the object states in each image are represented as
Figure GDA0002528701010000102
The motion track corresponding to the ith target is expressed as
Figure GDA0002528701010000103
Sequence of states S consisting of all image objects1:t={S1,S2,...,St};
Work content of the recognition thread:
In contrast to conventional single-target recognition, multi-target recognition aims to recognize as many targets as possible in an image; suppose I_t contains h targets. The procedure is as follows:
a) according to the input image I_t of the camera, first detect the set P of all feature points using the SIFT algorithm, and extract SIFT feature point descriptors for all points in P;
b) according to the vocabulary tree trained in step 1), take the inner product of each descriptor with the nodes of each layer of the vocabulary tree from top to bottom, determine the final leaf node into which each feature point descriptor falls, and add the tf-idf weight of that node to the score of the corresponding leaf node;
c) take out all the images in the inverted indexes of the leaf nodes whose scores for the input image are nonzero, and obtain the most similar reference image d̂ in the candidate image set using the following similarity calculation. Considering that the image may contain several identical targets, the Euclidean distance calculation would make an image containing more instances of a target appear farther from its reference image even though it is more similar, so the similarity calculation for multi-target detection and recognition adopts the cosine similarity formula:

sim(q, d) = (q · d) / (‖q‖ · ‖d‖)

where q is the vector of all nonzero leaf-node scores of the query image, and d is the vector of all nonzero leaf-node scores of an image to be compared in the image database.
d) search for the homography matrix H, a 3 × 3 matrix; if H exists, recalculate the inlier point set Q using the RANSAC algorithm, and if it does not exist, terminate. Point pairs on two images taken from different perspectives can be related by a projective transformation, as follows:
x'=H·x
where x', x are the coordinates of point pairs on two images from different viewing angles, respectively.
e) transform the 4 boundary points of the reference image d̂ through the homography matrix H using the above formula, and judge whether the quadrilateral a'b'c'e' obtained after the matrix transformation is convex; if not, terminate. Let the height of the original image be h and its width be w; the four points a, b, c, e are transformed by the following formula to obtain the coordinates of a', b', c', e', and whether they form a convex quadrilateral is judged by elementary geometry:

[x_i', y_i', 1]^T ∝ H · [x_i, y_i, 1]^T,  i ∈ {a, b, c, e}

where point a has coordinates (0, 0), point b has coordinates (0, h), point c has coordinates (w, h), point e has coordinates (w, 0), and the transformed points a', b', c', e' have coordinates (x_a, y_a), (x_b, y_b), (x_c, y_c), (x_e, y_e), respectively.
f) calculate the initial pose matrix T_t^i of the target through the ICP iterative algorithm, hand it over to a new tracking thread, and mask the target region;
g) let P = P - (P ∩ Q), and hand the target over to the tracking thread, which updates the pose matrix T_t^i in real time; the recognition thread continues to repeat steps a) to f) on the unmasked region.
Work content of each tracking thread:
Multi-target tracking means that, given an image sequence I_1, I_2, ..., I_t, the set of moving targets in the image sequence is found through multi-target recognition, the moving targets in subsequent frames are associated with one of them, and the motion trajectories of the different targets are given. The tracking algorithm adopted here is the KLT algorithm based on optical flow, which is implemented as follows:
The KLT principle assumes that the same target appears in two frames of images I and J and looks the same over a local window W; then, within the local window W: I(x, y, l) = J(x', y', l + τ). This means that on image I every point (x, y) has moved in one direction by a displacement g = (d_x, d_y), and at time l + τ it corresponds to the point (x', y') = (x + d_x, y + d_y) on image J; the matching problem can therefore be posed as minimizing the following equation:

ε(g) = Σ_{(x, y) ∈ W} [ I(x, y) - J(x + d_x, y + d_y) ]²

The above equation, also called the difference equation, is equivalent to the integral form:

ε(g) = ∫∫_W [ I(x) - J(x + g) ]² w(x) dx

where w(x) is a weighting function, usually set to the constant 1. That is, find the difference between the local window of radius (w_x, w_y) centered at (u_x, u_y) in image I and the window centered at (u_x + g_x, u_y + g_y) in image J; to make ε(g) minimal, set the derivative of the above equation to zero, from which the displacement g can be solved.
Therefore, the specific implementation steps of each tracking thread are as follows:
a) tracking the identified target in real time by using a KLT tracking algorithm;
b) if target tracking is lost, feed back to the recognition thread and require it to restore the region to be detected, i.e. P = P ∪ Q;
c) if target tracking succeeds, feed back to the recognition thread and update the recognition thread's masked region and the pose matrix T_t^i.
The work content of the rendering thread is as follows:
According to the pose matrix T_t^i of each tracking thread, the corresponding 3D models are placed in turn, and the augmented reality scene is rendered through OpenGL.
The unmarked augmented reality multi-target registration tracking method of the embodiment is further described below with reference to specific data and accompanying drawings, which are as follows:
1) off-line phase
The offline stage mainly consists of training the vocabulary tree. The method runs on the Windows 7 operating system, relies on the OpenCV 2.4.10 graphics library, and the code was written and debugged under VS 2012. The image set to be registered is the test image database provided by Wang J Z et al., which comprises 1000 test images of size 384 × 256 or 256 × 384 stored in JPEG format. In the training process, the SIFT algorithm is used to extract feature points from the image set to be registered, and SIFT descriptors (128-dimensional integer vectors) are generated for the feature points; the branch factor of the k-means clustering, i.e. the number of clusters per level, is chosen to be 10 and the depth is chosen to be 6; the structure of the vocabulary tree is shown in FIG. 2. During the experiments, images captured by an ordinary USB camera (resolution 640 × 480) are used as input, and the algorithm runs on a 4-core AMD A6-3400M processor with a main frequency of 1.4 GHz.
First, according to step a), all SIFT descriptors are extracted from the 1000 input images, taking 125.4 seconds in total; according to steps b), c) and d), hierarchical k-means clustering is performed and the inverted index files are linked, and constructing the vocabulary tree takes 70.2 seconds, its size being 19 MB; according to steps e) and f), registering all the images takes 17.7 seconds, and the vocabulary tree with the registered image index has a size of 27 MB.
The more training feature data is extracted, and the larger the branch factor and depth of the vocabulary tree, the more discriminative the vocabulary tree will be during recognition. For a new image registered later, its extracted feature points are traversed through the tree once and the image is registered in the inverted indexes of the leaf nodes they fall into, which takes only milliseconds. A well-trained vocabulary tree can generally be used as a common dictionary, so that a user only needs to be concerned with registering new images and retrieving the images to be recognized. As shown in FIG. 3, retrieving a single image takes 0.226 seconds.
2) On-line phase
The online stage is a real-time response system whose framework is shown in FIG. 4. Recognition and tracking are relatively independent: a new thread is allocated to each recognized target for tracking, each tracking thread feeds its result back to the recognition thread so that recognized regions are not detected again, and finally the tracking threads hand their results to the rendering thread to complete the construction of the augmented reality scene.
The input image captured by the USB camera contains at least two targets; FIG. 6 shows an input image captured by the USB camera that contains two reference targets. The specific steps are as follows:
First, according to step a) of the recognition thread, all key points in the image are extracted with the SIFT detector and SIFT descriptor feature vectors are generated;
according to step b) of the recognition thread, the descriptors of all detected points are assigned to leaf nodes using the trained vocabulary tree, finally yielding the nonzero score vector over all leaf nodes;
according to step c) of the recognition thread, similarity comparison is performed against the weights of the inverted-index image library, and the most similar image is found using the cosine similarity calculation; the retrieval result is shown in FIG. 7;
according to steps d) and e) of the recognition thread, homography detection is performed between the image and the matched image; experiments show that solving the homography on the grayscale image is better than on the color image, as shown in FIG. 8 and FIG. 9, and RANSAC outlier screening is performed at the same time. Whether the original image remains convex after the transformation is then checked, as shown in FIG. 5;
according to step f) of the recognition thread, the initial pose matrix T_t^i of each target is calculated by the ICP iterative method;
finally, according to step g) of the recognition thread, the retrieved image is taken out of the image database and its region is marked as a recognized region, so that the region is not detected again next time.
In the tracking stage, in order to speed up feature point extraction, the feature extraction algorithm is replaced by an ORB detector with a FREAK descriptor, and the KLT optical flow tracking method is adopted. FIG. 10 shows the recognition stage capturing two different registered images that were retrieved, with a cube rendered at the lower left corner of each target; FIG. 11 shows the recognition stage capturing 9 identical registered images, where a tracking thread is added for each recognized target and each target is tracked, with a cube rendered at the lower left corner of each target. The real-time tracking rate is about 30 fps, and the picture is smooth without stuttering.
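A minimal sketch of this feature-extraction swap, written against the OpenCV 2.4.x API used in this embodiment (in OpenCV 3 and 4, FREAK moved to the xfeatures2d contrib module); the file name and keypoint budget are placeholder assumptions:

    #include <opencv2/core/core.hpp>
    #include <opencv2/features2d/features2d.hpp>
    #include <opencv2/highgui/highgui.hpp>
    #include <vector>

    int main() {
        cv::Mat frame = cv::imread("frame.jpg", 0);   // placeholder frame, loaded as grayscale
        cv::OrbFeatureDetector detector(500);         // detect up to 500 ORB keypoints
        cv::FREAK extractor;                          // binary FREAK descriptors
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        detector.detect(frame, keypoints);
        extractor.compute(frame, keypoints, descriptors);
        // The keypoint locations seed the KLT optical-flow tracker; ORB detection and
        // FREAK description are much cheaper per frame than SIFT, which is why the
        // tracking stage swaps them in.
        return 0;
    }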
The above embodiments are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention; any modification and optimization made according to the technical solution of the present invention shall fall within the protection scope of the present invention.

Claims (2)

1. A multi-target registration tracking method for unmarked augmented reality is characterized by comprising the following steps:
1) an offline stage: a vocabulary tree model is built by hierarchical k-means clustering, the corresponding image ids are registered in the inverted indexes of all leaf nodes of the vocabulary tree, and the vocabulary tree is finally updated into a vocabulary tree with tf-idf weights according to the frequency of the leaf nodes of the whole tree and the total number of images to be registered;
2) an online stage: the online stage is a real-time response system; for the image input by the camera in real time, the most similar image is retrieved according to the vocabulary tree trained in step 1), then the initial pose of the 3D object to be loaded is calculated through camera pose estimation, the motion of the target is tracked with the KLT tracking algorithm, and finally the augmented reality scene of each target is constructed through a rendering thread; the whole system has one recognition thread, a plurality of tracking threads and one rendering thread, and the details are as follows:
The representation of multi-target tracking is as follows: given an image sequence I_1, I_2, ..., I_t, the number of targets in each image is M_t, and the state of each target consists of position and attitude information represented by a 3 × 4 pose matrix

T_t^i = [R | t]

A 3D point in the world coordinate system is mapped to the position of a 2D point in the camera coordinate system through the following equation (equality up to a scale factor):

[u, v, 1]^T ∝ A · [R | t] · [x_w, y_w, z_w, 1]^T

where u, v are the image coordinates of the camera projection, x_w, y_w, z_w are the world coordinates of a feature point, and A is the intrinsic parameter matrix of the camera, as follows:

A = [ α_x  s  c_x ; 0  α_y  c_y ; 0  0  1 ]

where α_x and α_y represent the focal lengths, c_x and c_y the coordinates of the principal point, and s the skew coefficient; these intrinsic parameters of the camera are used to correct the effects of optical distortion and can be obtained in the offline stage; pose estimation calculates the accurate extrinsic parameters of the camera, i.e. the rotation matrix R and the translation vector t, from the views of n ≥ 3 reference points using an ICP (Iterative Closest Point) iterative method;

all target states in each image are then represented as S_t = {T_t^1, T_t^2, ..., T_t^(M_t)}, the motion trajectory of the i-th target is represented as T_(1:t)^i = {T_1^i, T_2^i, ..., T_t^i}, and the state sequence composed of all image targets is S_(1:t) = {S_1, S_2, ..., S_t};
Work content of the recognition thread:
In contrast to conventional single-target recognition, multi-target recognition aims to recognize as many targets as possible in an image; suppose I_t contains h targets. The procedure is as follows:
a) according to the input image I_t of the camera, first detect the set P of all feature points using the SIFT algorithm, and extract SIFT feature point descriptors for all points in P;
b) according to the vocabulary tree trained in the offline stage, take the inner product of each descriptor with the nodes of each layer of the vocabulary tree from top to bottom, determine the final leaf node into which each feature point descriptor falls, and add the tf-idf weight of that node to the score of the corresponding leaf node;
c) take out all the images in the inverted indexes of the leaf nodes whose scores for the input image are nonzero, and obtain the most similar reference image d̂ in the candidate image set using the following similarity calculation; considering that the image may contain several identical targets, the Euclidean distance calculation would make an image containing more instances of a target appear farther from its reference image even though it is more similar, so the similarity calculation for multi-target detection and recognition adopts the cosine similarity formula:

sim(q, d) = (q · d) / (‖q‖ · ‖d‖)

where q is the vector of all nonzero leaf-node scores of the query image, and d is the vector of all nonzero leaf-node scores of an image to be compared in the image database;
d) search for the homography matrix H, a 3 × 3 matrix; if H exists, recalculate the inlier point set Q using the RANSAC algorithm, and if it does not exist, terminate; point pairs on two images taken from different perspectives can be related by a projective transformation, as follows:
x'=H·x
in the formula, x' and x are point pair coordinates on two images with different visual angles respectively;
e) transform the 4 boundary points of the reference image d̂ through the homography matrix H using the above formula, and judge whether the quadrilateral a'b'c'e' obtained after the matrix transformation is convex; if not, terminate; let the height of the original image be h and its width be w; the four points a, b, c, e are transformed by the following formula to obtain the coordinates of a', b', c', e', and whether they form a convex quadrilateral is judged by elementary geometry:

[x_i', y_i', 1]^T ∝ H · [x_i, y_i, 1]^T,  i ∈ {a, b, c, e}

where point a has coordinates (0, 0), point b has coordinates (0, h), point c has coordinates (w, h), point e has coordinates (w, 0), and the transformed points a', b', c', e' have coordinates (x_a, y_a), (x_b, y_b), (x_c, y_c), (x_e, y_e), respectively;
f) calculate the initial pose matrix T_t^i of the target through the ICP iterative algorithm, hand it over to a new tracking thread, and mask the target region;
g) let P = P - (P ∩ Q), and hand the target over to the tracking thread, which updates the pose matrix T_t^i in real time;
the recognition thread continues to repeat steps a) to f) on the unmasked region;
work content of each tracking thread:
multi-target tracking means that, given an image sequence I_1, I_2, ..., I_t, the set of moving targets in the image sequence is found through multi-target recognition, the moving targets in subsequent frames are associated with one of them, and the motion trajectories of the different targets are given; the tracking algorithm adopted here is the KLT algorithm based on optical flow, which is implemented as follows:
the KLT principle assumes that the same target appears in two frames of images I and J and looks the same over a local window W; then, within the local window W: I(x, y, l) = J(x', y', l + τ). This means that on image I every point (x, y) has moved in one direction by a displacement g = (d_x, d_y), and at time l + τ it corresponds to the point (x', y') = (x + d_x, y + d_y) on image J; the matching problem can therefore be posed as minimizing the following equation:

ε(g) = Σ_{(x, y) ∈ W} [ I(x, y) - J(x + d_x, y + d_y) ]²

the above equation, also called the difference equation, is equivalent to the integral form:

ε(g) = ∫∫_W [ I(x) - J(x + g) ]² w(x) dx

where w(x) is a weighting function, usually set to the constant 1; that is, find the difference between the local window of radius (w_x, w_y) centered at (u_x, u_y) in image I and the window centered at (u_x + g_x, u_y + g_y) in image J; to make ε(g) minimal, set the derivative of the above equation to zero, from which the displacement g can be solved;
therefore, the specific implementation steps of each tracking thread are as follows:
a) tracking the identified target in real time by using a KLT tracking algorithm;
b) if target tracking is lost, feed back to the recognition thread and require it to restore the region to be detected, i.e. P = P ∪ Q;
c) if target tracking succeeds, feed back to the recognition thread and update the recognition thread's masked region and the pose matrix T_t^i;
The work content of the rendering thread is as follows:
according to the pose matrix T_t^i of each tracking thread, the corresponding 3D models are placed in turn, and the augmented reality scene is rendered through OpenGL.
2. The multi-target registration tracking method for unmarked augmented reality as claimed in claim 1, wherein in step 1), the off-line stage comprises the following steps:
a) extracting all SIFT feature point descriptors from the image set to be registered;
b) performing k-means hierarchical clustering according to the branch number of the vocabulary tree;
c) repeating operation b) on all SIFT feature point descriptors under each branch in turn until the leaf nodes are reached;
d) finally, linking one inverted index file to each leaf node at the bottommost layer;
e) extracting SIFT feature point descriptors from each image to be registered again, counting the score weight of each feature point descriptor on each leaf node according to the trained vocabulary tree, and registering the corresponding image in the inverted index;
f) recalculating the vocabulary tree with tf-idf weights according to the length of the inverted index file on each leaf node of the whole tree; if the total number of registered images is N and any leaf node j has N_j images registered in its inverted index, the tf-idf weight of leaf node j is

w_j = ln(N / N_j)

for a registered image d', if m_j feature points fall on leaf node j, the score of that leaf node is d'_j = m_j · w_j; for an image q' to be retrieved, if n_j feature points fall on leaf node j, the score of that leaf node is q'_j = n_j · w_j.
CN201810096334.1A 2018-01-31 2018-01-31 Unmarked augmented reality multi-target registration tracking method Expired - Fee Related CN108364302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810096334.1A CN108364302B (en) 2018-01-31 2018-01-31 Unmarked augmented reality multi-target registration tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810096334.1A CN108364302B (en) 2018-01-31 2018-01-31 Unmarked augmented reality multi-target registration tracking method

Publications (2)

Publication Number Publication Date
CN108364302A CN108364302A (en) 2018-08-03
CN108364302B true CN108364302B (en) 2020-09-22

Family

ID=63007579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810096334.1A Expired - Fee Related CN108364302B (en) 2018-01-31 2018-01-31 Unmarked augmented reality multi-target registration tracking method

Country Status (1)

Country Link
CN (1) CN108364302B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978829B (en) * 2019-02-26 2021-09-28 深圳市华汉伟业科技有限公司 Detection method and system for object to be detected
CN110473259A (en) * 2019-07-31 2019-11-19 深圳市商汤科技有限公司 Pose determines method and device, electronic equipment and storage medium
CN112734797A (en) * 2019-10-29 2021-04-30 浙江商汤科技开发有限公司 Image feature tracking method and device and electronic equipment
CN111402579A (en) * 2020-02-29 2020-07-10 深圳壹账通智能科技有限公司 Road congestion degree prediction method, electronic device and readable storage medium
CN112000219B (en) * 2020-03-30 2022-06-14 华南理工大学 Movable gesture interaction method for augmented reality game
CN112884048A (en) * 2021-02-24 2021-06-01 浙江商汤科技开发有限公司 Method for determining registration image in input image, and related device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177468A (en) * 2013-03-29 2013-06-26 渤海大学 Three-dimensional motion object augmented reality registration method based on no marks
CN104966307A (en) * 2015-07-10 2015-10-07 成都品果科技有限公司 AR (augmented reality) algorithm based on real-time tracking
WO2016048366A1 (en) * 2014-09-26 2016-03-31 Hewlett Packard Enterprise Development Lp Behavior tracking and modification using mobile augmented reality
CN106843493A (en) * 2017-02-10 2017-06-13 深圳前海大造科技有限公司 A kind of augmented reality implementation method of picture charge pattern method and application the method
KR20180005430A (en) * 2016-07-06 2018-01-16 윤상현 Augmented reality realization system for image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177468A (en) * 2013-03-29 2013-06-26 渤海大学 Three-dimensional motion object augmented reality registration method based on no marks
WO2016048366A1 (en) * 2014-09-26 2016-03-31 Hewlett Packard Enterprise Development Lp Behavior tracking and modification using mobile augmented reality
CN104966307A (en) * 2015-07-10 2015-10-07 成都品果科技有限公司 AR (augmented reality) algorithm based on real-time tracking
KR20180005430A (en) * 2016-07-06 2018-01-16 윤상현 Augmented reality realization system for image
CN106843493A (en) * 2017-02-10 2017-06-13 深圳前海大造科技有限公司 A kind of augmented reality implementation method of picture charge pattern method and application the method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Lin Yi. "Research on the Construction and Optimization of Context-Aware Mobile Augmented Reality Browsers". China Master's Theses Full-text Database, Information Science and Technology, 2016-03-15 (Vol. 2016, No. 3), Section 2.3 of the main text. *
Nistér D et al. "Scalable Recognition with a Vocabulary Tree". CVPR 2006, 2006, Vol. 2, pp. 2161-2168. *

Also Published As

Publication number Publication date
CN108364302A (en) 2018-08-03

Similar Documents

Publication Publication Date Title
CN108364302B (en) Unmarked augmented reality multi-target registration tracking method
Rogez et al. Lcr-net++: Multi-person 2d and 3d pose detection in natural images
Ramesh et al. Dart: distribution aware retinal transform for event-based cameras
US10755128B2 (en) Scene and user-input context aided visual search
Zimmermann et al. Learning to estimate 3d hand pose from single rgb images
CN109753940B (en) Image processing method and device
Lim et al. Real-time image-based 6-dof localization in large-scale environments
Kendall et al. Posenet: A convolutional network for real-time 6-dof camera relocalization
Hagbi et al. Shape recognition and pose estimation for mobile augmented reality
Xu et al. Pano2cad: Room layout from a single panorama image
CN110472585B (en) VI-S L AM closed-loop detection method based on inertial navigation attitude track information assistance
Rahman et al. 3D object detection: Learning 3D bounding boxes from scaled down 2D bounding boxes in RGB-D images
CN106203423B (en) Weak structure perception visual target tracking method fusing context detection
CN113537208A (en) Visual positioning method and system based on semantic ORB-SLAM technology
Rafi et al. Self-supervised keypoint correspondences for multi-person pose estimation and tracking in videos
Pilet et al. Virtually augmenting hundreds of real pictures: An approach based on learning, retrieval, and tracking
CN109272577B (en) Kinect-based visual SLAM method
Ratan et al. Object detection and localization by dynamic template warping
Chen et al. TriViews: A general framework to use 3D depth data effectively for action recognition
Madadi et al. Occlusion aware hand pose recovery from sequences of depth images
Szeliski et al. Feature detection and matching
Ren et al. An investigation of skeleton-based optical flow-guided features for 3D action recognition using a multi-stream CNN model
CN111402331A (en) Robot repositioning method based on visual word bag and laser matching
Guan et al. Registration based on scene recognition and natural features tracking techniques for wide-area augmented reality systems
CN112861808B (en) Dynamic gesture recognition method, device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200922

CF01 Termination of patent right due to non-payment of annual fee