CN108364302A - Markerless augmented reality multi-target registration method - Google Patents

Markerless augmented reality multi-target registration method

Info

Publication number
CN108364302A
CN108364302A (application CN201810096334.1A; granted publication CN108364302B)
Authority
CN
China
Prior art keywords
image
target
thread
point
leaf node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810096334.1A
Other languages
Chinese (zh)
Other versions
CN108364302B (en)
Inventor
张宇
卢明林
李雯
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority claimed from CN201810096334.1A
Publication of CN108364302A
Application granted; publication of CN108364302B
Legal status: Expired - Fee Related

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/207 - Analysis of motion for motion estimation over a hierarchy of resolutions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/232 - Non-hierarchical techniques
    • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 - Non-hierarchical techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence

Abstract

The invention discloses a markerless augmented reality multi-target registration method, including: 1) an off-line phase: a vocabulary-tree model is clustered out by hierarchical k-means, the inverted index of every leaf node of the vocabulary tree registers the corresponding image ids, and the tree is finally updated with tf-idf weights according to the frequency with which each leaf node occurs and the total number of registered images; 2) an on-line phase: the on-line phase is a real-time response system which, for each image the camera inputs, retrieves the most similar image with the trained vocabulary tree, then computes the initial pose of the 3D object to be loaded from the camera pose, tracks the motion of each target with the KLT tracking algorithm, and finally builds the augmented reality scene of every target through a rendering thread. The present invention provides an efficient and reliable solution for markerless multi-target augmented reality.

Description

Markerless augmented reality multi-target registration method
Technical field
The present invention relates to the field of computer graphics, and in particular to a markerless augmented reality multi-target registration method.
Background technology
Augmented reality is the technology of seamlessly fusing information from the virtual world with the real world: experience information that is absent from the real world is simulated by computers and superimposed on the real world, producing a sensory experience that surpasses reality. Augmented reality can act on vision, hearing, taste and other sensory systems. Its origin can be traced back to the birth of computer technology: the head-mounted display invented in 1968 by Sutherland, then an associate professor of electrical engineering at Harvard University, is the prototype of augmented reality. The whole system hung the display device from the ceiling above the user's head, connected it to the headset through linkage rods, and could convert simple wireframe drawings into images with a 3D effect. The augmented reality application scenarios involved in the present invention are based on vision systems. Traditional marker-based augmented reality has fallen behind the times and cannot satisfy people's demand for an increasingly rich and personalized world; markerless augmented reality is more flexible and has general applicability. The techniques applied in the markerless multi-target augmented reality registration described by the present invention mainly include visual feature clustering, the vocabulary-tree model, inverted indexes, camera pose estimation, and target tracking algorithms.
In the 21st century a large number of companies have devoted themselves to the application and development of augmented reality, mostly on the following platforms: (1) Vuforia; (2) EasyAR; (3) Wikitude; (4) ARToolKit; (5) Maxst; (6) Xzimg; and Metaio, acquired early on by Apple, which released the ARKit toolkit for the iOS system for users in June 2017. Among these, apart from ARToolKit, the other platforms only provide open interfaces for users to develop applications against, and charge once certain limits are exceeded. Although the commercial open platforms provide fairly complete solutions, they cannot give researchers a detailed, concrete algorithmic foundation. ARToolKit, the only one with open-source algorithm code, provides a marker-based multi-target registration method but does not implement a markerless one, and it traverses the whole image database in the recognition phase, so recognition efficiency is very low when many images are registered. Likewise, the markerless augmented reality system and method invented by Cui Jianzhu et al. still chooses to traverse the whole image database for comparison in the image retrieval phase.
The core techniques used by augmented reality mainly cover three aspects: (1) image retrieval; (2) camera pose estimation; (3) online tracking. At the beginning of the 21st century, the widely applied augmented reality image recognition method was the template based on square markers. A marker-based square template consists of an embedded pattern, a white background and a dark border; the embedded pattern determines the uniqueness of the marker, while the dark border is identified first, used for tracking detection and for estimating the pose of the camera. However, a marker-based template stores few features and supports a limited number of patterns, so it is not applicable in recognition scenarios with massive numbers of patterns.
Subsequently, with the development of technology, markerless natural-feature tracking broke through the limitations of traditional marker-based methods, making the application scenarios of augmented reality more flexible and the storage capacity larger. Markerless augmented reality is built on natural feature points; common feature point extraction algorithms include SIFT, SURF, FAST and ORB. The feature points extracted in markerless augmented reality are numerous and complex. In the image retrieval stage there is the well-known method based on the BOW (bag-of-words) model; however, experiments show that the bag-of-words model generally needs a large number of cluster centers in the clustering phase, e.g. 10^6 or more, to obtain good retrieval results, and the time spent in the retrieval phase grows linearly with the number of registered images. The scalable recognition with a vocabulary tree proposed by Nister D and Stewenius H, together with the KLT target tracking algorithm based on optical flow, provides the basis for the present invention.
Invention content
The object of the present invention is to overcome the small storage capacity and inflexibility of marker-based templates in augmented reality by proposing a markerless augmented reality multi-target registration method. The method resolves the limitations of marker-based augmented reality methods, is suitable for natural-feature scenes, recognizes quickly, has a large storage capacity, and guarantees efficiency and quality under multi-target tracking.
To achieve the above object, the technical solution provided by the present invention is a markerless augmented reality multi-target registration and tracking method, which includes the following steps:
1) Off-line phase: a vocabulary-tree model is clustered out by hierarchical k-means, the inverted index of every leaf node of the vocabulary tree registers the corresponding image ids, and the tree is finally updated with tf-idf weights according to the frequency with which each leaf node occurs and the total number of registered images.
2) On-line phase: the on-line phase is a real-time response system. For each image the camera inputs, it retrieves the most similar image with the vocabulary tree trained in 1), then computes the initial pose of the 3D object to be loaded from the camera pose, tracks the motion of each target with the KLT tracking algorithm, and finally builds the augmented reality scene of every target through a rendering thread.
In step 1), the off-line phase includes the following steps:
a) Extract all SIFT feature descriptors for the image set to be registered;
b) Perform k-means hierarchical clustering according to the branching factor of the vocabulary tree;
c) Repeat operation b) in turn on all the SIFT descriptors under each branch, until the leaf nodes are reached;
d) Finally, link one inverted index file to every leaf node of the bottom level;
e) For each image to be registered, extract its SIFT descriptors again, pass each descriptor down the trained vocabulary tree, accumulate its score weight on each leaf node, and register the corresponding image.
f) According to the sizes of the inverted index files of all leaf nodes, recompute the vocabulary tree with tf-idf weights. If N images are registered in total and, for any leaf node i, N_i images are registered on its inverted index, then the tf-idf weight of leaf node i is w_i = ln(N / N_i). For a registered image d, if m_i feature points fall into leaf node i, that leaf node is scored d_i = m_i * w_i; for an image q to be retrieved, if n_i feature points fall into leaf node i, that leaf node is scored q_i = n_i * w_i.
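The weighting in step f) can be sketched in a few lines of Python. The leaf postings and descriptor hit counts below are invented toy data, not values from the patent:

```python
import math

def leaf_weights(n_registered, leaf_postings):
    # tf-idf (entropy) weight of leaf i: w_i = ln(N / N_i),
    # where N_i is the number of images on the leaf's inverted index.
    return {leaf: math.log(n_registered / len(imgs))
            for leaf, imgs in leaf_postings.items()}

def score_vector(leaf_hits, weights):
    # Score of leaf i for one image: (number of descriptors that
    # fell into the leaf) * (the leaf's tf-idf weight).
    return {leaf: hits * weights[leaf] for leaf, hits in leaf_hits.items()}

# Hypothetical postings: 4 registered images, two leaves in use.
postings = {"leaf_a": {"img1", "img2"}, "leaf_b": {"img3"}}
w = leaf_weights(4, postings)                    # w_a = ln 2, w_b = ln 4
d = score_vector({"leaf_a": 3}, w)               # registered image, m_a = 3
q = score_vector({"leaf_a": 1, "leaf_b": 2}, w)  # query image, n_a = 1, n_b = 2
```

Leaves shared by many images get weights near zero, so they contribute little to retrieval, which is the point of the idf term.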
In step 2), the whole system owns one identification thread, multiple tracking threads and one rendering thread. The details are as follows:
The representation of multi-target tracking: given an image sequence {I_1, I_2, ..., I_t}, the number of targets in each image is M_t, and the state of each target carries information such as position and posture, represented by a 3x4 matrix S_t^i:

S_t^i = [R | t]

which contains a 3x3 rotation matrix R and a 3x1 translation vector t. A 3D point of the world coordinate system can be mapped to a 2D point of the camera coordinate system by the following equation:

s [u, v, 1]^T = A [R | t] [x_w, y_w, z_w, 1]^T

where (u, v) are the coordinates in the image coordinate system of the camera projection, (x_w, y_w, z_w) are the world coordinates of a feature point, and A holds the internal parameters of the camera:

A = [ α_x   s   c_x ]
    [  0   α_y  c_y ]
    [  0    0    1  ]

where α_x and α_y represent the focal lengths, c_x and c_y the coordinates of the principal point, and s the skew factor; these intrinsic parameters of the camera, used to correct the influence of optical distortion, can be obtained in the off-line phase. Pose estimation computes accurate camera extrinsic parameters S_t from the views of n >= 3 reference points; the pose matrices R and t are computed with the ICP iterative method. All target states in one image are thus expressed as S_t = {S_t^1, S_t^2, ..., S_t^n}, the motion trajectory of the i-th target as T^i = {S_1^i, S_2^i, ..., S_t^i}, and the state sequence composed of the targets of all images as S_1:t = {S_1, S_2, ..., S_t}.
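The projection chain above (world point through [R | t], then through A) can be checked numerically in plain Python. The intrinsic values used here, focal lengths of 800 and a principal point at (320, 240) with zero skew, are illustrative assumptions rather than the patent's calibration:

```python
def project(A, R, t, Xw):
    # Camera coordinates: Xc = R * Xw + t
    Xc = [sum(R[i][j] * Xw[j] for j in range(3)) + t[i] for i in range(3)]
    # Homogeneous image point: p = A * Xc, then divide by the scale s = p[2]
    p = [sum(A[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return p[0] / p[2], p[1] / p[2]

A = [[800, 0, 320],                    # alpha_x, skew, c_x
     [0, 800, 240],                    # alpha_y, c_y
     [0,   0,   1]]
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity rotation
t = [0.0, 0.0, 4.0]                    # point 4 units in front of the camera
u, v = project(A, R, t, [0.0, 0.0, 0.0])  # a point on the optical axis
# lands on the principal point: (320.0, 240.0)
```

A point on the optical axis projects to the principal point regardless of its depth, which is a quick sanity check on any calibration code.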
The actions of the identification thread:
Compared with traditional single-target recognition, multi-target recognition aims to identify as many targets in the image as possible. Suppose I_t contains n targets; the workflow is as follows:
a) From the camera input image I_t, first detect the set P of all feature points with the SIFT algorithm, and extract SIFT descriptors for the whole set P;
b) With the vocabulary tree trained in stage 1), take the inner product of each descriptor with the nodes of each layer from top to bottom, determine the final leaf node into which each descriptor falls, and add the tf-idf weight of that node to the score of the corresponding leaf node;
c) Take out the images on the inverted indexes of the leaf nodes whose scores for this input image are non-zero, and find the most similar reference image R in the image set with the following similarity formula. Considering that the image may contain several identical targets, computing with the Euclidean distance would, on the contrary, place more similar targets farther apart, so the similarity calculation for multi-target detection and recognition should use the cosine similarity formula:

sim(q, d) = (q · d) / (||q|| ||d||)

where q is the vector of the non-zero leaf-node scores of the query image, and d is the vector of the non-zero leaf-node scores of one image to be compared in the image database.
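A minimal sketch of this cosine scoring over the sparse non-zero leaf scores, with each vector held as a dict from leaf id to score; the example vectors are invented for illustration:

```python
import math

def cosine_similarity(q, d):
    # sim(q, d) = (q . d) / (|q| * |d|) over sparse leaf-score vectors.
    dot = sum(score * d[leaf] for leaf, score in q.items() if leaf in d)
    nq = math.sqrt(sum(s * s for s in q.values()))
    nd = math.sqrt(sum(s * s for s in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

query = {7: 2.0, 12: 1.0}   # hypothetical query scores
ref1 = {7: 4.0, 3: 1.0}     # shares leaf 7 with the query
ref2 = {5: 9.0}             # no leaf in common
best = max([ref1, ref2], key=lambda d: cosine_similarity(query, d))
```

Because only shared leaves contribute to the dot product, iterating over the smaller dict keeps scoring cheap even with a large vocabulary.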
d) Find the homography matrix H, a 3x3 matrix; if it exists, recompute the inlier set Q with the RANSAC algorithm, otherwise terminate. A point pair on two images from different viewpoints can be expressed by one projective transformation:

x' = Hx

where x' and x are the coordinates of the point pair on the two images of different viewpoints.
e) Apply the above formula with the homography matrix H to the 4 corner points of the reference image R, and judge whether the quadrilateral a'b'c'd' after the transformation is convex; if not, terminate. If the original image has height h and width w, the four points a, b, c, d are mapped by the formula to a', b', c', d', and convexity is judged from their coordinates through geometric theorems; here a = (0, 0), b = (0, h), c = (w, h), d = (w, 0), and a' = (x_a, y_a), b' = (x_b, y_b), c' = (x_c, y_c), d' = (x_d, y_d).
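One standard way to carry out the convexity judgement "through geometric theorems" is to check that the cross products of consecutive edge pairs of a'b'c'd' all share one sign; the corner values below are invented, not from the patent:

```python
def is_convex(quad):
    # quad: 4 corner points in order, e.g. [a', b', c', d'].
    # Convex iff the z-components of the cross products of consecutive
    # edges all have the same sign (no turn-direction change on the walk).
    signs = []
    for i in range(4):
        ax, ay = quad[i]
        bx, by = quad[(i + 1) % 4]
        cx, cy = quad[(i + 2) % 4]
        signs.append((bx - ax) * (cy - by) - (by - ay) * (cx - bx) > 0)
    return all(signs) or not any(signs)

assert is_convex([(0, 0), (0, 10), (12, 11), (11, -1)])     # plausible warp
assert not is_convex([(0, 0), (10, 10), (0, 10), (10, 0)])  # self-intersecting
```

A homography of a planar target always maps the rectangle to a convex quadrilateral, so a non-convex result signals a bad match and lets the thread terminate early.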
f) Compute the initial pose matrix S_t^i of this target through the ICP iterative algorithm, hand it to a newly created tracking thread, and mask this target region;
g) Let P = {P - (P ∩ Q)}, hand this target to its tracking thread, which updates the pose matrix S_t^i in real time, while the identification thread continues to repeat the work of a) to f) on the unmasked region.
The actions of each tracking thread:
Multi-target tracking means: given an image sequence {I_1, I_2, ..., I_t}, first find the set of moving targets in the image sequence through multi-target recognition, put the moving targets in subsequent frames into correspondence with it, and then produce the motion trajectories of the different targets. The tracking algorithm used here is the KLT algorithm based on optical flow, implemented as follows:
The KLT principle assumes that the same target appears in two frames I and J and looks the same over a local window W, so that within W: I(x, y, t) = J(x', y', t + τ). This is read as: every point (x, y) on image I moves a distance d = (d_x, d_y) in one direction and corresponds at time t + τ to (x', y') = (x + d_x, y + d_y) on image J, so the matching problem becomes minimizing:

ε(d) = Σ_{(x, y) ∈ W} [I(x, y) - J(x + d_x, y + d_y)]²

This is also called the difference equation; written as an integral it is equivalent to:

ε(d) = ∫∫_W [I(x - d/2) - J(x + d/2)]² w(x) dx

where w(x) is a weighting function, usually set to the constant 1. That is, we seek the difference between the rectangular window W of I centered at x - d/2 and that of J centered at x + d/2; ε(d) reaches its minimum where its derivative is zero, from which the displacement d is solved.
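The displacement solve can be illustrated with a 1-D toy version: linearize J(x + d) around d = 0, set the derivative of ε(d) to zero, and solve the resulting normal equation. The single-iteration, 1-D simplification and the ramp signal are my assumptions, not the patent's implementation:

```python
def lk_displacement_1d(I, J):
    # One Lucas-Kanade step: with J(x + d) ~ J(x) + d * J'(x),
    # minimizing sum (I(x) - J(x + d))^2 gives
    # d = sum(g * (I - J)) / sum(g * g),  g = central-difference gradient of J.
    num = den = 0.0
    for x in range(1, len(I) - 1):
        g = (J[x + 1] - J[x - 1]) / 2.0
        num += g * (I[x] - J[x])
        den += g * g
    return num / den if den else 0.0

I = [2 * x for x in range(10)]      # linear intensity ramp
J = [2 * x - 2 for x in range(10)]  # the same ramp shifted by one pixel
d = lk_displacement_1d(I, J)        # recovers d = 1.0 exactly on a ramp
```

On real images the signal is not linear, so this step is iterated (and pyramided) until d converges; the 2-D case solves a 2x2 normal-equation system instead of a scalar one.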
Therefore, the specific implementation steps of a tracking thread are as follows:
a) Track each identified target in real time with the KLT tracking algorithm;
b) If tracking is lost, feed back to the identification thread and ask it to restore the region to be detected, i.e. P = {P ∪ Q};
c) If tracking succeeds, feed back to the identification thread to update its masked region and the pose matrix S_t^i.
The actions of the rendering thread:
According to the pose matrix S_t^i of each tracking thread, place the corresponding 3D models in turn and render the augmented reality scene through OpenGL.
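Handing a 3x4 pose [R | t] to OpenGL typically means packing it into a column-major 4x4 modelview matrix; the Y/Z axis flip below bridges the computer-vision camera convention (looking down +Z) to OpenGL's (looking down -Z) and is a common assumption of mine, not a detail stated in the patent:

```python
def pose_to_modelview(R, t):
    # Build a column-major 4x4 (OpenGL layout) from rotation R (3x3)
    # and translation t (3 values), flipping the Y and Z rows for OpenGL.
    flip = [1.0, -1.0, -1.0]
    m = [0.0] * 16
    for col in range(3):
        for row in range(3):
            m[col * 4 + row] = flip[row] * R[row][col]
    for row in range(3):
        m[12 + row] = flip[row] * t[row]
    m[15] = 1.0
    return m

mv = pose_to_modelview([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 5.0])
# a flat 16-float list, ready for something like glLoadMatrixf(mv)
```

Each tracking thread's latest S_t^i would be converted this way before the rendering thread draws the corresponding 3D model.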
Compared with the prior art, the present invention has the following advantages and benefits:
1. The present invention removes the reliance of marker-based augmented reality on auxiliary markers in the image retrieval stage and can be applied in markerless augmented reality application scenarios.
2. Compared with the currently open-sourced ARToolKit platform, the present invention abandons the inefficient practice of traversing the whole image library when retrieving over markerless natural feature points, using the index structure of the vocabulary tree to shorten retrieval time.
3. The present invention proposes a solution for markerless multi-target tracking: unlike the ARToolKit platform, it can track multiple identical or different target objects simultaneously.
Description of the drawings
Fig. 1 is the method for the present invention work flow diagram.
Fig. 2 is the configuration diagram for carrying out hierarchical cluster in the embodiment of the present invention using words tree.
Fig. 3 is the result figure retrieved using trained words tree in the embodiment of the present invention.
Fig. 4 is the system framework figure of the present invention.
Fig. 5 is convex quadrangle of the original image after homography conversion in the embodiment of the present invention.
Fig. 6 is the input picture for including two objects by USB camera capture in the embodiment of the present invention.
Fig. 7 be the embodiment of the present invention in Fig. 6 retrieved obtained by result figure.
Fig. 8 is to carry out the matched effect picture of homography to left side object in the embodiment of the present invention.
Fig. 9 is to carry out the matched effect picture of homography to the right object in the embodiment of the present invention.
Figure 10 is the design sketch that in the embodiment of the present invention 2 different objects are carried out with augmented reality tracking.
Figure 11 is the design sketch that in the embodiment of the present invention 9 same objects are carried out with augmented reality tracking.
Specific implementation mode
The present invention is further explained in the light of specific embodiments.
As shown in Fig. 1, the markerless augmented reality multi-target registration method provided by this embodiment includes the following two stages:
1) an off-line phase and 2) an on-line phase, carried out exactly as described in the invention content above, with the on-line system owning one identification thread, multiple tracking threads and one rendering thread.
The markerless augmented reality multi-target registration method above is further described below with specific data and the attached drawings, as follows:
1) off-line phase
The off-line phase is mainly the training of the vocabulary tree. The present invention is based on the Windows 7 operating system and relies on the OpenCV 2.4.10 graphics library; the code was written and debugged under VS 2012. The image set to be registered is the test image database provided by the SIMPLIcity paper of Wang J Z et al., including 1000 test images of size 384*256 or 256*384 stored in JPEG format. In the training process the feature points of the image set to be registered were extracted with the SIFT algorithm, and a SIFT descriptor, a 128-dimensional integer vector, was generated for each feature point. The branching factor of the k-means clustering, i.e. the number of clusters, was chosen as 10 and the depth as 6; the structure of the vocabulary tree is shown in Fig. 2. In the experiments the images captured by a generic USB camera (resolution 640*480) were used as input, and the algorithm ran on a 4-core AMD A6-3400M processor clocked at 1.4 GHz.
First, by step a) above, all SIFT descriptors were extracted from the 1000 input images, taking 125.4 seconds in total; by steps b), c) and d), the k-means hierarchical clustering and the linking of the inverted index files constructed the vocabulary tree in 70.2 seconds, with a size of 19 MB; by steps e) and f), registering all images took 17.7 seconds, and the vocabulary tree containing the image indexes was 27 MB.
The larger the extracted training feature data and the larger the branching factor and depth of the vocabulary tree, the more discriminative its recognition. For each new image registered later, the extracted feature points only need to traverse the tree once to register the image in the inverted indexes of the leaf nodes they fall into, which takes milliseconds. A good vocabulary tree can usually serve as a public dictionary, so users only need to care about registering new images and retrieving images to be recognized. As shown in Fig. 3, retrieving a single image took 0.226 seconds.
2) on-line stage
The on-line stage is a real-time response system whose framework is shown in Fig. 4. Identification and tracking are relatively independent: for each identified target, a new thread is allocated to track it, the tracking thread feeds its result back to the identification thread, identified regions are no longer detected, and finally every tracking thread reports its result to the rendering thread, which completes the construction of the augmented reality scene.
For the input, the image captured by the USB camera contains at least two targets; Fig. 6 is an image containing two reference targets captured with a USB camera. The steps are as follows:
First, step a) of the recognition thread extracts all key points in the image with the SIFT detector and generates the SIFT descriptor feature vectors;
According to step b) of the recognition thread, using the previously trained vocabulary tree, the descriptor of each detected point is assigned to its leaf node, finally yielding the feature vector of all leaf nodes with non-zero scores;
According to step c) of the recognition thread, similarity is compared against the weights of the inverted-index image library using the cosine similarity measure, and the most similar image is found; the retrieval result is shown in Figure 7;
According to steps d) and e) of the recognition thread, homography detection is performed between this image and the matched image. Experiments show that the homography solution on grayscale images is better than on color images, as shown in Figures 8 and 9; RANSAC outlier removal is carried out at the same time. It is then verified whether the outline of the original image after the transformation is a convex quadrilateral, as shown in Figure 5;
According to step f) of the recognition thread, the initial pose matrix of each target is calculated by the ICP iterative method;
Finally, according to step g) of the recognition thread, the retrieved image is taken out of the image database and its region is marked as a recognized region, so that this region is not detected again next time.
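Steps d) and e) above, mapping the four corners of the reference image through the homography and rejecting non-convex results, might look like the following in outline. The corner ordering and the sign-consistency convexity test are one common convention and are not necessarily the patent's exact geometric check.

```python
import numpy as np

def map_corners(H, w, h):
    """Apply x' = Hx to the four corners a(0,0), b(0,h), c(w,h), d(w,0)."""
    corners = np.array([[0, 0, 1], [0, h, 1], [w, h, 1], [w, 0, 1]], float)
    mapped = (H @ corners.T).T
    return mapped[:, :2] / mapped[:, 2:3]  # dehomogenize

def is_convex(quad):
    """Convex iff the cross products of consecutive edges all share one sign."""
    crosses = []
    for i in range(4):
        a, b, c = quad[i], quad[(i + 1) % 4], quad[(i + 2) % 4]
        crosses.append((b[0] - a[0]) * (c[1] - b[1])
                       - (b[1] - a[1]) * (c[0] - b[0]))
    return all(c > 0 for c in crosses) or all(c < 0 for c in crosses)
```

A self-intersecting ("bow-tie") quadrilateral produces edge cross products of mixed sign, which is exactly the degenerate homography this check screens out.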
The tracking stage serves to accelerate feature-point recognition: the feature extraction algorithm is replaced with the ORB detector and the FREAK descriptor, and KLT optical-flow tracking is used. Figure 10 shows two different registered images returned from the recognition stage, with a cube rendered at the lower-left corner of each target; Figure 11 shows nine identical registered images returned from the recognition stage, where a new tracking thread is created for each recognized object and each target is tracked separately, with a cube rendered at the lower-left corner of each target. The real-time tracking rate is about 30 fps, and the picture is smooth without stutter.
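The core of the KLT step, solving for the displacement of a single window by one Gauss-Newton step on the difference equation, can be sketched in a few lines of numpy. Production trackers such as OpenCV's `calcOpticalFlowPyrLK` iterate this per feature over image pyramids; this sketch handles only one window and one step.

```python
import numpy as np

def lk_displacement(I, J):
    """One Gauss-Newton step for the constant window displacement d = (dx, dy)
    minimizing eps(d) = sum over W of [I(x) - J(x + d)]^2: setting the
    derivative to zero gives the 2x2 normal equations G d = b, where G is the
    structure tensor built from the spatial gradients of I."""
    gy, gx = np.gradient(I.astype(float))    # gradients along rows (y), cols (x)
    it = I.astype(float) - J.astype(float)   # temporal difference I - J
    G = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    b = np.array([np.sum(gx * it), np.sum(gy * it)])
    return np.linalg.solve(G, b)             # (dx, dy)
```

Because the step relies on a first-order expansion, it is only accurate for sub-pixel to few-pixel motion, which is why real KLT warps, iterates, and works coarse-to-fine.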
The embodiment described above is only a preferred embodiment of the invention and is not intended to limit the scope of the invention; therefore, any adjustments and optimizations made to the technical solutions of this invention shall all be covered within the scope of the present invention.

Claims (3)

1. A markerless augmented reality multi-target registration method, characterized by comprising the following steps:
1) off-line phase: a vocabulary-tree model is clustered by hierarchical k-means, the corresponding image ids are registered in the inverted index of every leaf node of the vocabulary tree, and finally, according to the frequency with which each leaf node of the whole tree appears and the total number of registered images, the tree is updated into a vocabulary tree with tf-idf weights;
2) on-line stage: the on-line stage is a real-time response system; for each image input by the camera in real time, the most similar image is retrieved according to the vocabulary tree trained in 1), the initial pose of the 3D object to be loaded is then computed from the pose of the camera, the motion of this target is tracked with the KLT tracking algorithm, and finally a rendering thread builds the augmented reality scene of each target.
2. The markerless augmented reality multi-target registration method according to claim 1, characterized in that in step 1), the off-line phase comprises the following steps:
a) extracting all SIFT feature descriptors for the image set to be registered;
b) performing hierarchical k-means clustering according to the branching factor of the vocabulary tree;
c) repeating operation b) in turn for all the SIFT feature descriptors under each branch, down to the leaf nodes;
d) finally linking one inverted index file to each leaf node of the bottom layer;
e) for each image to be registered, extracting its SIFT feature descriptors and, for each feature-point descriptor, counting its score weight on each leaf node according to the trained vocabulary tree and registering it for the corresponding image;
f) recalculating the vocabulary tree with tf-idf weights according to the size of the inverted index file of every leaf node; let N be the total number of registered images, and suppose that for any leaf node i there are N_i images registered on its inverted index; then the tf-idf weight of this leaf node i is w_i = ln(N/N_i). For a registered image d, if m_i feature points fall on leaf node i, the score of this leaf node is d_i = m_i·w_i; for an image q to be retrieved, if n_i feature points fall on leaf node i, the score of this leaf node is q_i = n_i·w_i.
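The weighting and scoring of step f), together with the cosine comparison used at retrieval time, can be illustrated with a short sketch. The dict-based sparse vectors and the leaf identifiers are assumptions made for the illustration.

```python
import math

def tfidf_weights(inverted_index, N):
    """w_i = ln(N / N_i), where N_i images are registered on leaf i."""
    return {leaf: math.log(N / len(imgs)) for leaf, imgs in inverted_index.items()}

def score_vector(leaf_counts, weights):
    """Leaf score = (number of feature points on the leaf) * w_i, i.e.
    d_i = m_i * w_i for a registered image, q_i = n_i * w_i for a query.
    Leaves with no registered images carry no weight and are dropped."""
    return {leaf: n * weights[leaf]
            for leaf, n in leaf_counts.items() if leaf in weights}

def cos_sim(q, d):
    """Cosine similarity between two sparse (dict-valued) score vectors."""
    dot = sum(v * d.get(k, 0.0) for k, v in q.items())
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd)
```

Cosine similarity is scale-invariant, so an image containing several identical targets (and hence roughly doubled leaf counts) still scores close to its single-target reference, which is the motivation stated in claim 3 for preferring it over the Euclidean distance.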
3. The markerless augmented reality multi-target registration method according to claim 1, characterized in that in step 2), the whole system has one recognition thread, multiple tracking threads, and one rendering thread, specifically as follows:
Representation of multi-target tracking: given an image sequence {I_1, I_2, ..., I_t}, the number of targets in each image is M_t, and the state of each target, comprising information such as position and posture, is represented by a 3 × 4 matrix s_t^i, as in the following formula:

s_t^i = [R | t]
which contains a 3 × 3 rotation matrix R and a 3 × 1 translation matrix t; a 3D point in the world coordinate system can be mapped through the following equation to obtain the position of the 2D point in the camera coordinate system:

[u, v, 1]^T ∝ A · [R | t] · [x_w, y_w, z_w, 1]^T
where u, v are the image-coordinate-system coordinates of the camera projection; x_w, y_w, z_w are the world-coordinate-system coordinates of a feature point; and A is the camera intrinsic-parameter matrix, as in the following formula:

A = [[α_x, s,   c_x],
     [0,   α_y, c_y],
     [0,   0,   1  ]]
In the formula, α_x and α_y represent the focal lengths, c_x and c_y represent the coordinates of the principal point, and s represents the skew factor; these belong to the intrinsic parameters of the camera, serve to correct the influence of optical distortion, and can be determined in the off-line phase. Pose estimation consists of computing an accurate camera extrinsic parameter s_t from the views of n ≥ 3 reference points; the pose matrices R and t are computed with the ICP iterative method;
Thus all the target states in one image are expressed as S_t = {s_t^1, s_t^2, ..., s_t^{M_t}}, the motion trajectory corresponding to the i-th target is expressed as s_{1:t}^i = {s_1^i, s_2^i, ..., s_t^i}, and the state sequence composed of the targets of all images is S_{1:t} = {S_1, S_2, ..., S_t};
Action of the recognition thread:
Compared with traditional single-target recognition, multi-target recognition aims to recognize as many targets in the image as possible; suppose I_t contains n targets, then the procedure is as follows:
a) for the camera input image I_t, first detect the set P of all feature points with the SIFT algorithm, and extract all SIFT feature descriptors for the feature-point set P;
b) according to the vocabulary tree trained in the off-line phase, take the inner product of each descriptor with the nodes of each layer of the vocabulary tree from top to bottom to determine the final leaf node into which each feature-point descriptor falls, and add the tf-idf weight of this node to the score of the corresponding leaf node;
c) take out the images in the inverted indexes of all leaf nodes whose scores for this input image are non-zero, and obtain the reference image R most similar to the image to be detected using the following similarity formula; considering that the image may contain multiple identical targets, computing similarity with the Euclidean distance would, on the contrary, make more similar targets appear farther apart, so the similarity computation for multi-target detection and recognition should use the cosine similarity formula:

sim(q, d) = (q · d) / (‖q‖ ‖d‖)

where q is the vector of non-zero leaf-node scores of the query image, and d is the vector of non-zero leaf-node scores of one image to be compared in the image database;
d) find the homography matrix H, a 3 × 3 matrix; if it exists, recalculate the inlier set Q with the RANSAC algorithm, otherwise terminate; a point pair on images from two different viewpoints can be expressed by one projective transformation, as in the following formula:
x' = Hx
where x' and x are the coordinates of the point pair on the two images from different viewpoints;
e) apply the above formula with the homography matrix H to the 4 corner points of the reference image R, and judge whether the quadrilateral a'b'c'd' after the matrix transformation is convex; if it is not, terminate. Let the original image have height h and width w; the four points a, b, c, d are mapped through the formula above to obtain the coordinates of a', b', c', d', and whether the quadrilateral is convex is judged by geometric theorems; here the coordinates of point a are (0, 0), of point b are (0, h), of point c are (w, h), and of point d are (w, 0), while the coordinates of a' are (x_a, y_a), of b' are (x_b, y_b), of c' are (x_c, y_c), and of d' are (x_d, y_d);
f) compute the initial pose matrix s_t^i of this target by the ICP iterative algorithm, hand it over to a new tracking thread, and shield this target region;
G) P={ P- (P ∩ Q) } is enabled, and this target is given into track thread real-time update position auto―controlIdentification thread continues pair Unscreened region repeats a)~f) work;
Action of each tracking thread:
Multi-target tracking means that, given an image sequence {I_1, I_2, ..., I_t}, the set of objects moving in the image sequence is first found by multi-target recognition, the moving objects in subsequent frames are put in correspondence with it, and the motion trajectories of the different objects are then given; here, the tracking algorithm used is the KLT algorithm based on optical flow, implemented as follows:
Principle of the KLT implementation: suppose the same target appears in two frames of images I and J, and is identical over a local window W; then within the window W: I(x, y, t) = J(x', y', t + τ). This is interpreted as all points (x, y) on image I moving a distance d = (dx, dy) in one direction, so that at time t + τ they correspond to the points (x', y') on image J, namely (x + dx, y + dy); the matching problem can therefore be turned into minimizing the following expression:

ε(d) = Σ_{(x,y)∈W} [I(x, y) − J(x + dx, y + dy)]²
The above formula is also called the difference equation; expressed as an integral, it is equivalent to:

ε(d) = ∫∫_W [I(x − d/2) − J(x + d/2)]² w(x) dx

where w(x) is a weighting function, usually set to the constant 1; that is, find, over a rectangular window of radius W, the difference between the patch of I centered at x − d/2 and the patch of J centered at x + d/2 in the two images; where ε(d) attains its minimum, the derivative of the above formula is zero, from which the displacement d can be solved;
Accordingly, the specific implementation steps of each tracking thread are as follows:
a) track each recognized target in real time using the KLT tracking algorithm;
b) if tracking is lost, feed back to the recognition thread and ask it to restore the region to be detected, i.e. P = {P ∪ Q};
c) if the target is tracked successfully, feed back to the recognition thread and update its shielded region and the pose matrix s_t^i;
Action of the rendering thread:
according to the pose matrix s_t^i of each tracking thread, place the corresponding 3D models in turn, and perform the augmented reality scene rendering through OpenGL.
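The projection equations of claim 3 (intrinsics A applied after the [R | t] pose) can be checked numerically with a short sketch; the numeric values of A, R, and t below are hypothetical.

```python
import numpy as np

def project(A, R, t, Xw):
    """[u, v, 1]^T ∝ A [R | t] [x_w, y_w, z_w, 1]^T : world point to pixel."""
    p = A @ (R @ np.asarray(Xw, dtype=float) + t)
    return p[:2] / p[2]   # dehomogenize

# hypothetical intrinsics: focal lengths 800, principal point (320, 240), zero skew
A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                  # identity rotation
t = np.array([0.0, 0.0, 5.0])  # world origin 5 units in front of the camera
```

As a sanity check, the world origin projects exactly onto the principal point (c_x, c_y), since R·0 + t lies on the optical axis.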
CN201810096334.1A 2018-01-31 2018-01-31 Unmarked augmented reality multi-target registration tracking method Expired - Fee Related CN108364302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810096334.1A CN108364302B (en) 2018-01-31 2018-01-31 Unmarked augmented reality multi-target registration tracking method


Publications (2)

Publication Number Publication Date
CN108364302A true CN108364302A (en) 2018-08-03
CN108364302B CN108364302B (en) 2020-09-22

Family

ID=63007579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810096334.1A Expired - Fee Related CN108364302B (en) 2018-01-31 2018-01-31 Unmarked augmented reality multi-target registration tracking method

Country Status (1)

Country Link
CN (1) CN108364302B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978829A (en) * 2019-02-26 2019-07-05 深圳市华汉伟业科技有限公司 A kind of detection method and its system of object to be detected
CN110473259A (en) * 2019-07-31 2019-11-19 深圳市商汤科技有限公司 Pose determines method and device, electronic equipment and storage medium
CN111402579A (en) * 2020-02-29 2020-07-10 深圳壹账通智能科技有限公司 Road congestion degree prediction method, electronic device and readable storage medium
CN112000219A (en) * 2020-03-30 2020-11-27 华南理工大学 Movable gesture interaction device and method for augmented reality game
CN112734797A (en) * 2019-10-29 2021-04-30 浙江商汤科技开发有限公司 Image feature tracking method and device and electronic equipment
CN112884048A (en) * 2021-02-24 2021-06-01 浙江商汤科技开发有限公司 Method for determining registration image in input image, and related device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177468A (en) * 2013-03-29 2013-06-26 渤海大学 Three-dimensional motion object augmented reality registration method based on no marks
CN104966307A (en) * 2015-07-10 2015-10-07 成都品果科技有限公司 AR (augmented reality) algorithm based on real-time tracking
WO2016048366A1 (en) * 2014-09-26 2016-03-31 Hewlett Packard Enterprise Development Lp Behavior tracking and modification using mobile augmented reality
CN106843493A (en) * 2017-02-10 2017-06-13 深圳前海大造科技有限公司 A kind of augmented reality implementation method of picture charge pattern method and application the method
KR20180005430A (en) * 2016-07-06 2018-01-16 윤상현 Augmented reality realization system for image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Nistér D et al.: "Scalable Recognition with a Vocabulary Tree", CVPR 2006 *
Lin Yi: "Research on the Construction and Optimization of Context-Aware Mobile Augmented Reality Browsers", China Master's Theses Full-text Database, Information Science and Technology *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200922