CN108364302A - Marker-free augmented-reality multi-target registration and tracking method - Google Patents
- Publication number: CN108364302A (application CN201810096334.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- target
- thread
- point
- leaf node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/207—Analysis of motion for motion estimation over a hierarchy of resolutions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Abstract
The invention discloses a marker-free augmented-reality multi-target registration and tracking method, comprising: 1) an off-line stage, in which a vocabulary-tree model is produced by hierarchical k-means clustering, the inverted index of every leaf node of the vocabulary tree registers the ids of the corresponding images, and the tree is finally re-weighted with tf-idf according to the frequency with which each leaf node occurs over the whole tree and the total number of registered images; 2) an on-line stage, a real-time responsive system that, for each image input by the camera, retrieves the most similar image with the trained vocabulary tree, computes the initial pose of the 3D object to be loaded from the camera pose, tracks the motion of the target with the KLT tracking algorithm, and finally builds the augmented-reality scene of each target in a rendering thread. The invention provides an efficient and reliable solution for marker-free multi-target augmented reality.
Description
Technical field
The present invention relates to the field of computer graphics, and in particular to a marker-free augmented-reality multi-target registration and tracking method.
Background technology
Augmented reality (AR) is the technology of seamlessly fusing information from a virtual world with the real world: sensory information that could not otherwise be experienced in the real world is simulated by computers and related technologies and superimposed on the real world, producing an experience that goes beyond reality. AR can act on the visual, auditory, gustatory and other sensory systems. The origin of augmented reality can be traced to the birth of computer technology: the head-mounted display invented in 1968 by Sutherland, then an associate professor of electrical engineering at Harvard University, is the prototype of augmented reality. The whole system suspended the display device from the ceiling above the user's head, connected to the headset, and could convert simple wireframe graphics into images with a 3D effect. The application scenarios addressed by the present invention are based on vision. Traditional marker-based augmented reality has fallen behind the times and cannot meet the demand for a richer, more personalized world; marker-free augmented reality is more flexible and has general applicability. The techniques applied by the marker-free multi-target registration and tracking of the present invention mainly include visual-feature clustering, the vocabulary-tree model, inverted indexing, camera-pose estimation, and target-tracking algorithms.
Since the beginning of the 21st century, a large number of companies have devoted themselves to the application development of augmented reality, mostly on the following platforms: (1) Vuforia; (2) EasyAR; (3) Wikitude; (4) ARToolKit; (5) Maxst; (6) Xzimg; in addition, Apple acquired Metaio early on and in June 2017 released the ARKit toolkit for the iOS system. Except for ARToolKit, these platforms only provide open interfaces for users' application development, and charge fees beyond certain usage limits. Although the commercial open platforms offer fairly complete solutions, they do not give researchers a detailed, concrete algorithmic foundation. ARToolKit, the only one with open-source algorithm code, provides a marker-based multi-target registration method but does not implement a marker-free one; moreover, ARToolKit traverses the whole image database in its recognition stage, so recognition becomes very inefficient when many images are registered. Likewise, the marker-free augmented-reality system and method invented by Cui Jianzhu et al. still compares against the whole image database in the image-retrieval stage.
The core technology of augmented reality mainly covers three aspects: (1) image retrieval; (2) camera-pose estimation; (3) on-line tracking. At the beginning of the 21st century, the widely used image-recognition method in augmented reality was the template based on square markers. A square marker template consists of an embedded pattern, a white background and a dark border: the embedded pattern determines the uniqueness of the marker, while the dark border is recognized first, used for tracking detection and for estimating the camera pose. However, a marker-based template stores few features and supports a limited number of patterns, so it is unsuitable for pattern-recognition scenes of massive scale.
Subsequently, with the development of technology, marker-free natural-feature tracking broke through the limitations of traditional marker-based methods, making the application scenarios of augmented reality more flexible and the storage capacity much larger. Marker-free augmented reality is built on natural feature points; common feature-point extraction algorithms include SIFT, SURF, FAST and ORB. The feature points extracted in marker-free augmented reality are numerous and complex. In the image-retrieval stage the best-known approach is the BOW bag-of-words model; experiments show, however, that the bag-of-words model generally needs a very large number of cluster centres in its clustering stage (e.g. more than 10^6) to retrieve well, and the time taken by the retrieval stage grows linearly with the number of registered images. The scalable recognition with a vocabulary tree proposed by Nister D. and Stewenius H., together with the KLT target-tracking algorithm based on optical flow, provides the basis for the present invention.
Invention content
The object of the present invention is to overcome the small storage capacity and inflexibility of marker-based templates in augmented reality by proposing a marker-free augmented-reality multi-target registration and tracking method. The method removes the limitations of marker-based augmented-reality approaches, is suitable for natural-feature scenes, recognizes quickly, has a large storage capacity, and guarantees both efficiency and quality under multi-target tracking.
To achieve the above object, the technical solution provided by the present invention is a marker-free augmented-reality multi-target registration and tracking method comprising the following steps:
1) Off-line stage: a vocabulary-tree model is produced by hierarchical k-means clustering, the inverted index of every leaf node of the vocabulary tree registers the ids of the corresponding images, and the tree is finally re-weighted with tf-idf according to the frequency with which each leaf node occurs over the whole tree and the total number of registered images.
2) On-line stage: a real-time responsive system that, for each image input by the camera, retrieves the most similar image with the vocabulary tree trained in 1), computes the initial pose of the 3D object to be loaded from the camera pose, tracks the motion of the target with the KLT tracking algorithm, and finally builds the augmented-reality scene of each target in a rendering thread.
In step 1), the off-line stage comprises the following steps:
a) Extract all SIFT feature descriptors for the image set to be registered.
b) Run hierarchical k-means clustering according to the branch factor of the vocabulary tree.
c) For each branch in turn, repeat operation b) on all the SIFT descriptors under it, until the leaf nodes are reached.
d) Finally link one inverted-index file to each of the leaf nodes of the bottom layer.
e) For each image to be registered, extract its SIFT descriptors again, pass each feature descriptor down the trained vocabulary tree, accumulate its score weight on each leaf node, and register the image there.
f) According to the sizes of the inverted-index files over all leaf nodes, recompute the vocabulary tree with tf-idf weights. If N images are registered in total and, for any leaf node i, N_i images are registered on its inverted index, then the tf-idf weight of leaf node i is w_i = ln(N/N_i). For a registered image d, if m_i of its feature points fall into leaf node i, the score of this leaf node is d_i = m_i·w_i; for an image q to be retrieved, if n_i of its feature points fall into leaf node i, the score of this leaf node is q_i = n_i·w_i.
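The off-line steps a)–f) can be sketched in code. The following is a minimal Python illustration under stated assumptions: a toy Lloyd's k-means stands in for the production clustering, short random vectors stand in for 128-dimensional SIFT descriptors, and the class and function names (VocabTree, kmeans, register, weights) are our own, not the patent's:

```python
import numpy as np
from collections import defaultdict

def kmeans(X, k, iters=15, seed=0):
    """Plain Lloyd's k-means; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        lab = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return C, lab

class Node:
    def __init__(self):
        self.centres, self.children, self.leaf_id = None, [], None

class VocabTree:
    """Vocabulary tree: hierarchical k-means over descriptors, one
    inverted index per leaf, and tf-idf weights w_i = ln(N / N_i)."""
    def __init__(self, branch=2, depth=2):
        self.branch, self.depth, self.n_leaves = branch, depth, 0
        self.inv = defaultdict(set)   # leaf id -> registered image ids
        self.tf = defaultdict(dict)   # leaf id -> {image id: m_i}

    def build(self, X):
        self.root = self._split(np.asarray(X, float), self.depth)

    def _split(self, X, d):
        node = Node()
        if d == 0 or len(X) < self.branch:   # reached a leaf
            node.leaf_id, self.n_leaves = self.n_leaves, self.n_leaves + 1
            return node
        node.centres, lab = kmeans(X, self.branch)
        node.children = [self._split(X[lab == j], d - 1)
                         for j in range(self.branch)]
        return node

    def _leaf(self, v):
        n = self.root
        while n.leaf_id is None:             # descend by nearest centre
            n = n.children[((n.centres - v) ** 2).sum(-1).argmin()]
        return n.leaf_id

    def register(self, img_id, descriptors):
        for v in descriptors:
            i = self._leaf(v)
            self.inv[i].add(img_id)
            self.tf[i][img_id] = self.tf[i].get(img_id, 0) + 1

    def weights(self, n_images):
        # tf-idf weight of leaf i given N registered images in total
        return {i: np.log(n_images / len(ids)) for i, ids in self.inv.items()}
```

On real data each SIFT descriptor would be passed down the tree exactly as the toy vectors are here; the weight formula matches step f).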
In step 2), the whole system owns one recognition thread, several tracking threads and one rendering thread. The details are as follows:
Representation of multi-target tracking: given a group of image sequences {I_1, I_2, ..., I_t}, the number of targets in each image is M_t. The state of each target, covering its position, attitude and related information, is represented by a 3×4 matrix T_t^i = [R | t], which contains a 3×3 rotation matrix R and a 3×1 translation vector t. A 3D point of the world coordinate system is mapped to a 2D point of the camera coordinate system by the equation s·[u, v, 1]^T = A·[R | t]·[x_w, y_w, z_w, 1]^T, where (u, v) are the image coordinates of the camera projection, (x_w, y_w, z_w) are the world coordinates of a feature point, and A is the intrinsic-parameter matrix of the camera, A = [[α_x, s, c_x], [0, α_y, c_y], [0, 0, 1]]. Here α_x and α_y represent the focal length, c_x and c_y the coordinates of the principal point, and s the skew factor; these intrinsic parameters of the camera are used to correct the influence of optical distortion and can be determined in the off-line stage. Pose estimation computes accurate extrinsic camera parameters from the views of n ≥ 3 reference points; the pose matrices R and t are computed with the iterative ICP method. All target states in each image are thus expressed as S_t = {T_t^1, T_t^2, ..., T_t^{M_t}}; the motion trajectory corresponding to the i-th target is expressed as T^i = {T_1^i, T_2^i, ...}; and the state sequence composed of all image targets is S_{1:t} = {S_1, S_2, ..., S_t}.
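The projection chain s·[u, v, 1]^T = A·[R | t]·[x_w, y_w, z_w, 1]^T can be checked numerically. The sketch below uses invented intrinsic values (focal length 800, principal point (320, 240), zero skew) purely for illustration, not calibration data from the patent:

```python
import numpy as np

# Hypothetical intrinsics A -- example values only.
A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

R = np.eye(3)                        # 3x3 rotation
t = np.array([[0.0], [0.0], [5.0]])  # 3x1 translation: scene 5 units ahead
T = np.hstack([R, t])                # the 3x4 pose matrix [R | t]

def project(A, T, xw):
    """Map a world point (xw, yw, zw) to pixel (u, v) via s*[u,v,1]^T = A [R|t] X."""
    p = A @ T @ np.append(xw, 1.0)   # homogeneous projection
    return p[:2] / p[2]              # divide out the scale s

uv = project(A, T, np.array([0.0, 0.0, 0.0]))
```

A world point on the optical axis lands on the principal point (320, 240), a quick sanity check of the matrix chain.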
Action of the recognition thread:
Compared with traditional single-target recognition, multi-target recognition aims to recognize as many targets in the image as possible. Suppose I_t contains n targets; the flow of steps is as follows:
a) For the input image I_t of the camera, first detect the set P of all feature points with the SIFT algorithm, and extract all SIFT descriptors for the feature-point set P.
b) Using the vocabulary tree trained in stage 1), take the inner product of each descriptor with the nodes of each layer of the vocabulary tree from top to bottom, determine the final leaf node into which each feature descriptor falls, and add the tf-idf weight of this node to the score of the corresponding leaf node.
c) Take out the images in the inverted indexes of the leaf nodes whose scores for this input image are non-zero, and obtain the most similar reference image R in the image set to be detected with the following similarity formula. Considering that the image may contain several identical targets, a Euclidean-distance computation would, on the contrary, make more similar targets lie farther apart; the similarity computation for multi-target detection and recognition should therefore use the cosine-similarity formula sim(q, d) = q·d / (||q||·||d||), where q is the vector of the non-zero leaf-node scores of the query image and d is the vector of the non-zero leaf-node scores of an image to be compared in the image database.
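Why cosine similarity rather than Euclidean distance suits repeated targets can be seen directly on the score vectors: an image containing the same target twice roughly doubles its leaf scores, which leaves the cosine unchanged while the Euclidean distance grows. A minimal check, with score vectors invented for illustration:

```python
import numpy as np

def cosine_sim(q, d):
    """Similarity between the non-zero leaf-node score vectors of a
    query image q and a database image d."""
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))

d = np.array([1.0, 2.0, 0.5, 3.0])   # scores of a registered image
q = 2.0 * d                           # query containing the target twice

cos = cosine_sim(q, d)                # stays at 1.0: same direction
euc = np.linalg.norm(q - d)           # grows with the duplication
```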
d) Find the homography matrix H, a 3×3 matrix, with the RANSAC algorithm; if it exists, recompute the inlier set Q, otherwise terminate. A point pair on images from two different viewpoints can be expressed by a projective transformation, as in the following formula:
x' = Hx
where x' and x are the coordinates of the point pair on the two images from different viewpoints.
e) Apply the above formula with the homography matrix H to the 4 boundary points of the reference image R, and judge whether the quadrilateral a'b'c'd' after the matrix transformation is convex; if it is not, terminate. If the original image has height h and width w, the four points a = (0, 0), b = (0, h), c = (w, h) and d = (w, 0) are mapped by the formula to a' = (x_a, y_a), b' = (x_b, y_b), c' = (x_c, y_c) and d' = (x_d, y_d), and convexity is judged by geometric theorems.
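Step e) can be sketched as follows: warp the four corner points with H and test convexity by requiring all consecutive edge cross products to share one sign. The homography H below is an arbitrary made-up example, not one estimated by RANSAC:

```python
import numpy as np

def apply_h(H, pts):
    """Apply a 3x3 homography to an Nx2 array of points (x' = Hx)."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def is_convex(quad):
    """True if the 4 ordered points form a convex quadrilateral:
    every consecutive edge pair turns in the same direction."""
    signs = []
    for i in range(4):
        a, b, c = quad[i], quad[(i + 1) % 4], quad[(i + 2) % 4]
        u, v = b - a, c - b
        signs.append(u[0] * v[1] - u[1] * v[0])   # 2D cross product
    return all(s > 0 for s in signs) or all(s < 0 for s in signs)

w, h = 384, 256                          # reference image size
corners = np.array([[0, 0], [0, h], [w, h], [w, 0]], dtype=float)
H = np.array([[0.9 , 0.1, 5.0],          # made-up homography
              [0.05, 1.1, -3.0],
              [1e-4, 0.0, 1.0]])
warped = apply_h(H, corners)             # a'b'c'd' to be convexity-tested
```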
f) Compute the initial pose matrix T_t^i of this target by the iterative ICP algorithm, hand it to a new tracking thread, and mask this target region.
g) Let P = P − (P ∩ Q), hand this target to the tracking thread for real-time updating of its pose matrix T_t^i, and let the recognition thread continue the work of a)–f) on the unmasked region.
Action of each tracking thread:
Multi-target tracking means that, given an image sequence {I_1, I_2, ..., I_t}, the set of targets moving in the sequence is first found by multi-target recognition, the moving targets in subsequent frames are put into correspondence with it, and the motion trajectories of the different targets are then produced. The tracking algorithm used here is the KLT algorithm based on optical flow, implemented as follows:
The KLT principle assumes that the same target appearing in two frames I and J is identical over a local window W, so that inside W we have I(x, y, t) = J(x', y', t+τ). This is interpreted as: every point (x, y) on image I moves a distance d = (dx, dy) in one direction and corresponds at time t+τ to the point (x', y') = (x+dx, y+dy) on image J. The matching problem can therefore be turned into minimizing ε(d) = Σ_{(x,y)∈W} [J(x+d) − I(x)]². This formula is also called the difference equation; written in integral form it is equivalent to ε(d) = ∫∫_W [J(x+d) − I(x)]² w(x) dx, where w(x) is a weighting function, usually set to the constant 1. That is, one measures the difference between the window of I centred on x and the window of J centred on x+d over the rectangular window W; to minimize ε(d), set its derivative with respect to d to zero and solve for the displacement d.
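The minimization of ε(d) can be illustrated with a single Lucas-Kanade step in pure NumPy: linearizing J(x + d) ≈ J(x) + ∇J·d and setting the derivative of ε(d) to zero yields a 2×2 linear system. This is a one-level, one-iteration sketch on a synthetic image, not the pyramidal KLT tracker used in practice (OpenCV's calcOpticalFlowPyrLK is the usual choice):

```python
import numpy as np

def lk_step(I, J, x, y, r=10):
    """One Lucas-Kanade step: estimate d = (dx, dy) with J(x+d) ~ I(x)
    over the (2r+1)^2 window centred at (x, y)."""
    Jx = np.gradient(J, axis=1)          # image gradients of J
    Jy = np.gradient(J, axis=0)
    win = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    gx, gy = Jx[win].ravel(), Jy[win].ravel()
    e = (I[win] - J[win]).ravel()        # residual I - J inside the window
    G = np.array([[gx @ gx, gx @ gy],    # 2x2 normal-equation matrix
                  [gx @ gy, gy @ gy]])
    b = np.array([gx @ e, gy @ e])
    return np.linalg.solve(G, b)         # derivative of eps(d) set to zero

# Synthetic test pattern: a smooth blob shifted right by exactly 1 pixel.
yy, xx = np.mgrid[0:64, 0:64]
I = np.exp(-((xx - 32.0) ** 2 + (yy - 32.0) ** 2) / 50.0)
J = np.roll(I, 1, axis=1)                # every point moved by d = (1, 0)
d = lk_step(I, J, 32, 32)
```

On the shifted blob the recovered d comes out close to (1, 0); the real tracker repeats this step over an image pyramid to handle larger motions.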
Accordingly, the tracking thread is implemented in the following steps:
a) Track each recognized target in real time with the KLT tracking algorithm.
b) If tracking fails, feed back to the recognition thread and ask it to restore the region to be detected, i.e. P = P ∪ Q.
c) If the target is tracked successfully, feed back to the recognition thread and update its masked region and pose matrix T_t^i.
Action of the rendering thread:
According to the pose matrix T_t^i of each tracking thread, place the corresponding 3D models in turn and render the augmented-reality scene with OpenGL.
Compared with the prior art, the present invention has the following advantages and benefits:
1. The present invention removes the dependence of marker-based augmented reality on auxiliary markers in the image-retrieval stage and can be applied in marker-free augmented-reality application scenarios.
2. Compared with the currently open-source ARToolKit platform, the present invention abandons the inefficient technique of traversing the whole image library during image retrieval over marker-free natural feature points, and uses the index structure of the vocabulary tree to shorten the time consumed by retrieval.
3. The present invention proposes a solution for marker-free multi-target tracking: unlike the ARToolKit platform, it can track several identical or different target objects simultaneously.
Description of the drawings
Fig. 1 is the workflow diagram of the method of the present invention.
Fig. 2 is the architecture diagram of hierarchical clustering with the vocabulary tree in the embodiment of the present invention.
Fig. 3 shows the retrieval results obtained with the trained vocabulary tree in the embodiment of the present invention.
Fig. 4 is the system framework diagram of the present invention.
Fig. 5 shows the convex quadrilateral of the original image after the homography transformation in the embodiment of the present invention.
Fig. 6 is an input image containing two objects, captured by the USB camera, in the embodiment of the present invention.
Fig. 7 shows the result obtained by retrieving Fig. 6 in the embodiment of the present invention.
Fig. 8 shows the homography matching of the left object in the embodiment of the present invention.
Fig. 9 shows the homography matching of the right object in the embodiment of the present invention.
Fig. 10 shows the augmented-reality tracking of 2 different objects in the embodiment of the present invention.
Fig. 11 shows the augmented-reality tracking of 9 identical objects in the embodiment of the present invention.
Specific implementation mode
The present invention is further explained below in conjunction with a specific embodiment.
As shown in Fig. 1, the marker-free augmented-reality multi-target registration and tracking method provided by this embodiment comprises the following two stages:
1) the off-line stage and 2) the on-line stage, each carried out exactly as described in the technical solution above; the description is not repeated here.
With reference to specific data and attached drawing unmarked augmented reality multiple target registration side above-mentioned to the present embodiment
Method is further described, specific as follows:
1) Offline phase
The offline phase mainly consists of training the vocabulary tree. The present invention is based on the Windows 7 operating system and the OpenCV 2.4.10 library; the code was written and debugged in VS 2012. The image set to be registered uses the test image database provided with the SIMPLIcity paper of Wang J Z et al., containing 1000 test images of size 384*256 or 256*384, stored in JPEG format. During training, the SIFT algorithm extracts the feature points of the image set to be registered, and a SIFT descriptor, a 128-dimensional integer vector, is generated for each feature point. The k-means branching factor (number of clusters) is set to 10 and the depth to 6; the structure of the vocabulary tree is shown in Figure 2. In the experiments, images captured by a generic USB camera (resolution 640*480) serve as input, and the algorithm runs on a 4-core AMD A6-3400M processor clocked at 1.4 GHz.
First, step a) above extracts all SIFT descriptors from the 1000 input images, taking 125.4 seconds in total. Next, steps b), c) and d) perform hierarchical k-means clustering and link the inverted index files; constructing the vocabulary tree takes 70.2 seconds and its size is 19 MB. Steps e) and f) register all images in 17.7 seconds; the vocabulary tree including the image index is 27 MB.
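The hierarchical k-means training described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function names are ours, plain NumPy stands in for OpenCV's SIFT extraction and clustering, and each leaf simply reserves an empty inverted index to be filled at registration time.

```python
import numpy as np

def build_vocab_tree(descriptors, k=10, depth=6, seed=0):
    """Hierarchical k-means in the Nister-Stewenius style: split the
    descriptor set k ways, recurse on each cluster until `depth` is
    reached; every leaf gets an (initially empty) inverted index."""
    rng = np.random.default_rng(seed)

    def kmeans(X, k, iters=10):
        # simple Lloyd iterations with random initial centers
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        labels = np.zeros(len(X), dtype=int)
        for _ in range(iters):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = dist.argmin(axis=1)
            for j in range(k):
                if (labels == j).any():
                    centers[j] = X[labels == j].mean(axis=0)
        return labels

    def split(X, level):
        node = {"center": X.mean(axis=0), "children": [], "inverted_index": {}}
        if level == depth or len(X) < k:   # stop: this node is a leaf
            return node
        labels = kmeans(X, k)
        for j in range(k):
            if (labels == j).any():
                node["children"].append(split(X[labels == j], level + 1))
        return node

    return split(np.asarray(descriptors, dtype=float), 0)
```

In the embodiment the branching factor is 10 and the depth 6; real SIFT descriptors are 128-dimensional integer vectors rather than the random data a quick test would use.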
The larger the extracted training feature set and the larger the branching factor and depth of the vocabulary tree, the more discriminative its recognition becomes. Registering a new image later requires only one traversal of its extracted feature points into the inverted indexes of the leaf nodes they fall into, which takes milliseconds. A good vocabulary tree can thus serve as a public dictionary, and users only need to care about registering new images and retrieving the images to be recognized. As shown in Figure 3, retrieving a single image takes 0.226 seconds.
2) Online phase
The online phase is a real-time response system whose framework is shown in Figure 4. Recognition and tracking are relatively independent: for each recognized target, a new thread is allocated to track it; each tracking thread feeds its result back to the recognition thread so that recognized regions are no longer detected; finally, all tracking threads report their results to the rendering thread, which completes the construction of the augmented reality scene.
For the input, the image captured by the USB camera contains at least two targets; Fig. 6 is an image containing two reference targets captured with a USB camera. The specific steps are as follows:
First, according to step a) of the recognition thread described above, all key points in the image are extracted with the SIFT detector and SIFT descriptor feature vectors are generated;
According to step b) of the recognition thread, using the previously trained vocabulary tree, the descriptor of each detected point is assigned to the leaf node it belongs to, finally yielding the feature vector of all leaf nodes with non-zero scores;
According to step c) of the recognition thread, similarity is compared against the weights of the inverted-index image library; using the cosine similarity measure, the most similar image is found. The retrieval result is shown in Figure 7;
According to steps d) and e) of the recognition thread, homography detection is performed between this image and the matched image; experiments show that solving the homography on grayscale images outperforms color images, as shown in Figures 8 and 9, and the outliers are screened out with RANSAC at the same time. It is then verified whether the original image is a convex quadrilateral after the transformation, as shown in Figure 5;
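The RANSAC outlier screening mentioned above boils down to keeping the matches whose reprojection error under a candidate homography is small. A minimal sketch of that inlier test follows; the function name and the 3-pixel threshold are our illustrative choices, not the patent's.

```python
import numpy as np

def homography_inliers(src, dst, H, thresh=3.0):
    """Screen outliers as RANSAC does for one hypothesis: keep the point
    pairs whose reprojection error under H is below `thresh` pixels.
    src, dst: (N, 2) arrays of matched points; H: 3x3 homography."""
    pts = np.hstack([src, np.ones((len(src), 1))])  # to homogeneous coords
    proj = pts @ H.T                                # apply x' = Hx
    proj = proj[:, :2] / proj[:, 2:3]               # back to Cartesian
    err = np.linalg.norm(proj - dst, axis=1)
    return err < thresh                             # boolean inlier mask
```

Full RANSAC repeats this test over many hypotheses fitted from random minimal samples and keeps the hypothesis with the largest inlier set.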
According to step f) of the recognition thread, the initial pose matrix of each target is calculated by the ICP iterative method.
Finally, according to step g) of the recognition thread, the retrieved image is removed from the image database and this region is marked as a recognized region, so that it is not detected again.
In the tracking phase, to accelerate feature-point recognition, the feature extraction algorithm is replaced by ORB detection with FREAK descriptors, using KLT optical flow tracking. Figure 10 shows two different registered images captured and returned by the recognition phase; a cube is rendered in the lower-left corner of each target. Figure 11 shows 9 identical registered images returned by the recognition phase; a new tracking thread is created for each recognized object, each target is tracked separately, and a cube is rendered at the lower-left corner of each target. The real-time tracking rate is about 30 fps, and the display is smooth without stutter.
The embodiment described above is only a preferred embodiment of the invention and is not intended to limit the scope of the present invention; therefore, any adjustment or optimization of the technical solutions under this invention shall be covered within the scope of the present invention.
Claims (3)
1. A markerless augmented reality multi-target registration method, characterized by comprising the following steps:
1) Offline phase: a vocabulary tree model is built by hierarchical k-means clustering, and the corresponding image ids are registered in the inverted indexes of all leaf nodes of the vocabulary tree; finally, according to the frequency with which each leaf node occurs over the whole tree and the total number of registered images, the tree is updated to a vocabulary tree with tf-idf weights;
2) Online phase: the online phase is a real-time response system. For each image input by the camera in real time, the most similar image is retrieved using the vocabulary tree trained in 1); the initial pose of the 3D object to be loaded is then calculated from the camera pose, and the KLT tracking algorithm tracks the movement of this target; finally, the rendering thread builds the augmented reality scene of each target.
2. The markerless augmented reality multi-target registration method according to claim 1, characterized in that in step 1), the offline phase comprises the following steps:
a) Extract all SIFT feature descriptors for the image set to be registered;
b) Perform hierarchical k-means clustering according to the branching factor of the vocabulary tree;
c) Repeat the operation of b) in turn for the SIFT feature descriptors under each branch, down to the leaf nodes;
d) Finally, link an inverted index file to each leaf node of the bottom layer;
e) For each image to be registered, extract its SIFT feature descriptors; each feature descriptor is passed down the trained vocabulary tree, its score weight on each leaf node is accumulated, and it is registered for the corresponding image;
f) According to the inverted index file sizes over all leaf nodes, recompute the vocabulary tree with tf-idf weights. If the total number of registered images is N and, for any leaf node i, Ni images are registered in its inverted index, then the tf-idf weight of leaf node i is wi = ln(N/Ni). For a registered image d, if mi feature points fall on leaf node i, the score of this leaf node is di = mi·wi; for an image q to be retrieved, if ni feature points fall on leaf node i, the score of this leaf node is qi = ni·wi.
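The tf-idf weighting in step f), wi = ln(N/Ni) as in the vocabulary-tree scheme of Nistér and Stewénius, can be sketched directly; the helper names below are ours, for illustration only.

```python
import math

def tfidf_weight(N, Ni):
    """Leaf-node weight w_i = ln(N / N_i): N images registered in total,
    N_i of them listed in leaf node i's inverted index."""
    return math.log(N / Ni)

def leaf_score(points_on_leaf, weight):
    """d_i = m_i * w_i for a database image (or q_i = n_i * w_i for a
    query): feature points falling on the leaf times its tf-idf weight."""
    return points_on_leaf * weight
```

A leaf that every registered image touches gets weight ln(N/N) = 0 and contributes nothing to similarity, which is the intended effect of the idf term.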
3. The markerless augmented reality multi-target registration method according to claim 1, characterized in that in step 2), the whole system has one recognition thread, multiple tracking threads and one rendering thread; the details are as follows:
The representation of multi-target tracking: given an image sequence {I1, I2, ..., It}, the number of targets in each image is Mt. The state of each target comprises information such as position and pose and is represented by a 3×4 matrix st^i = [R | t], which contains a 3×3 rotation matrix R and a 3×1 translation vector t. A 3D point in the world coordinate system can be mapped to a 2D point in the camera coordinate system through the following equation:
[u, v, 1]^T ~ A [R | t] [xw, yw, zw, 1]^T
where u, v are the image coordinates of the camera projection; xw, yw, zw are the world coordinates of a feature point; and A is the camera intrinsic parameter matrix, as in the following formula:
A = [[αx, s, cx], [0, αy, cy], [0, 0, 1]]
In the formula, αx and αy represent the focal lengths, cx and cy represent the coordinates of the principal point, and s represents the skew factor; these belong to the intrinsic parameters of the camera, are used to correct the influence of optical distortion, and can be determined in the offline phase. Pose estimation computes an accurate camera extrinsic parameter st from the views of n ≥ 3 reference points; the pose matrices R and t are computed with the ICP iterative method.
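The two formulas above, a pixel obtained from a world point through A[R|t], can be sketched as follows; the helper names are our illustrative choices, not the patent's code.

```python
import numpy as np

def intrinsics(ax, ay, cx, cy, s=0.0):
    """Camera intrinsic matrix A with focal lengths ax, ay,
    principal point (cx, cy) and skew factor s."""
    return np.array([[ax, s,  cx],
                     [0., ay, cy],
                     [0., 0., 1.]])

def project(Xw, R, t, A):
    """[u v 1]^T ~ A (R Xw + t): map a world point to pixel coordinates."""
    Xc = R @ np.asarray(Xw, dtype=float) + t   # world -> camera frame
    uvw = A @ Xc                               # apply intrinsics
    return uvw[:2] / uvw[2]                    # perspective divide
```

With an identity pose, a point on the optical axis lands exactly on the principal point (cx, cy), which is a quick sanity check for calibrated values.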
Thus all target states in each image are expressed as St = {st^1, st^2, ..., st^Mt}; the corresponding trajectory of the i-th target is expressed as s1:t^i = {s1^i, s2^i, ..., st^i}; and the state sequence composed of the targets of all images is S1:t = {S1, S2, ..., St}.
Actions of the recognition thread:
Compared with traditional single-target recognition, multi-target recognition aims to recognize as many targets in the image as possible. If It contains n targets, the procedure is as follows:
a) For the input image It of the camera, first detect the set P of all feature points with the SIFT algorithm, and extract all SIFT feature descriptors for the point set P;
b) According to the vocabulary tree trained in the offline phase, take the inner product of each descriptor with the nodes of each layer of the vocabulary tree from top to bottom to decide which final leaf node each feature descriptor falls into, and add the tf-idf weight of this node to the score of the corresponding leaf node;
c) Take out the images in the inverted indexes of the leaf nodes whose scores for this input image are non-zero, and use the following similarity formula to obtain the most similar reference image R in the set of images to be detected. Considering that the image may contain multiple identical targets, computing similarity with Euclidean distance would instead make more similar targets appear farther apart; therefore the similarity measure for multi-target detection and recognition should be the cosine similarity formula:
sim(q, d) = (q · d) / (|q| |d|)
where q is the vector of non-zero leaf node scores of the query image, and d is the vector of non-zero leaf node scores of an image in the image database to be compared;
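The cosine similarity the claim argues for can be sketched in one function (the name is ours). Note that repeated identical targets scale a score vector without changing its direction, so the cosine score is unchanged while the Euclidean distance grows, which is exactly the effect described above.

```python
import numpy as np

def cosine_similarity(q, d):
    """Cosine similarity of the non-zero leaf score vectors of a query
    image q and a database image d (1.0 = same direction)."""
    q = np.asarray(q, dtype=float)
    d = np.asarray(d, dtype=float)
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
```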
d) Find the homography matrix H, a 3×3 matrix; if it exists, recompute the inlier set Q using the RANSAC algorithm; if it does not exist, terminate. A point pair on two images of different viewpoints can be expressed by a projective transformation, as in the following formula:
X'=Hx
where x' and x are the coordinates of the point pair on the two images of different viewpoints;
e) Apply the above formula with the homography matrix H to the 4 corner points of the reference image R, and judge whether the quadrilateral a'b'c'd' after the transformation is convex; if not, terminate. If the original image has height h and width w, the coordinates of a', b', c', d' are obtained from the four points a, b, c, d by the formula above, and convexity is judged by geometric theorems;
where point a has coordinates (0, 0), b has coordinates (0, h), c has coordinates (w, h) and d has coordinates (w, 0); a' has coordinates (xa, ya), b' has coordinates (xb, yb), c' has coordinates (xc, yc) and d' has coordinates (xd, yd);
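The corner mapping and convexity test of step e) can be sketched as follows (illustrative names, not the patent's code): convexity is checked by requiring all consecutive edge cross products to share a sign.

```python
import numpy as np

def warp_corners(H, w, h):
    """Map the corners a=(0,0), b=(0,h), c=(w,h), d=(w,0) through x' = Hx."""
    corners = np.array([[0, 0, 1], [0, h, 1], [w, h, 1], [w, 0, 1]], float)
    mapped = corners @ H.T
    return mapped[:, :2] / mapped[:, 2:3]   # back to Cartesian coordinates

def is_convex_quad(pts):
    """True if the quadrilateral (4x2 array, vertices in order) is convex:
    the 2D cross products of consecutive edges must all share one sign."""
    pts = np.asarray(pts, float)
    signs = []
    for i in range(4):
        e1 = pts[(i + 1) % 4] - pts[i]
        e2 = pts[(i + 2) % 4] - pts[(i + 1) % 4]
        signs.append(e1[0] * e2[1] - e1[1] * e2[0])  # 2D cross product
    signs = np.asarray(signs)
    return bool((signs > 0).all() or (signs < 0).all())
```

A homography fitted through mismatched points frequently folds the rectangle into a self-intersecting or concave quadrilateral, so this cheap test rejects many false retrievals before pose estimation.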
f) Calculate the initial pose matrix st^i of this target by the ICP iterative algorithm, hand it to a new tracking thread, and mask this target region;
G) P={ P- (P ∩ Q) } is enabled, and this target is given into track thread real-time update position auto―controlIdentification thread continues pair
Unscreened region repeats a)~f) work;
Actions of each tracking thread:
Multi-target tracking means that, given an image sequence {I1, I2, ..., It}, the set of moving objects in the sequence is first found by multi-target recognition, the moving objects in subsequent frames are put in correspondence with it, and the trajectories of the different objects are then provided. Here, the tracking algorithm used is the KLT algorithm based on optical flow, implemented as follows:
The KLT principle: assume the same target appears in two image frames I and J and is identical over a local window W; then within the window W: I(x, y, t) = J(x', y', t+τ). Interpreted on image I, all points (x, y) move a distance d = (dx, dy) in one direction and correspond at time t+τ to (x', y') on image J, that is, to (x+dx, y+dy); the matching problem can therefore be turned into minimizing the following expression:
ε(d) = Σ_{x∈W} (J(x+d) − I(x))²
The above formula is also called the difference equation; expressed as an integral, it is equivalent to:
ε(d) = ∫∫_W (J(x+d) − I(x))² w(x) dx
where w(x) is a weighting function, typically set to the constant 1. That is, in the two images we take the difference, over a rectangular window W of radius w, between the patch centered at x in I and the patch centered at x+d in J; minimizing ε(d) by setting the derivative of the above formula to zero solves for the displacement d;
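Setting the derivative of ε(d) to zero yields, for w(x) = 1, the familiar 2×2 Lucas-Kanade normal equations G·d = e on each window. A single-step sketch over one window follows; it is illustrative only, assumes a small displacement, and uses plain NumPy gradients in place of OpenCV's pyramidal tracker.

```python
import numpy as np

def klt_displacement(I, J):
    """One Lucas-Kanade step on a single window: solve G d = e, where
    G = sum of [Ix,Iy]^T [Ix,Iy] and e = -sum of (J - I) [Ix,Iy]^T,
    so that J(x + d) is approximately I(x)."""
    I = np.asarray(I, dtype=float)
    J = np.asarray(J, dtype=float)
    Ix = np.gradient(I, axis=1)      # spatial gradient along x (columns)
    Iy = np.gradient(I, axis=0)      # spatial gradient along y (rows)
    It = J - I                       # temporal difference
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    e = -np.array([np.sum(It * Ix), np.sum(It * Iy)])
    return np.linalg.solve(G, e)     # displacement d = (dx, dy)
```

Real trackers iterate this step and use image pyramids so that displacements larger than a pixel or two still converge; G must also be well conditioned, which is why corners are chosen as the points to track.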
Therefore, the specific implementation steps of the tracking thread are as follows:
a) For each recognized target, track it in real time with the KLT tracking algorithm;
b) If tracking is lost, feed back to the recognition thread, asking it to restore the region to be detected, i.e. P = {P ∪ Q};
c) If the target is tracked successfully, feed back to the recognition thread to update its masked region and pose matrix st^i;
Actions of the rendering thread:
According to the pose matrix st^i of each tracking thread, place the corresponding 3D models in turn, and render the augmented reality scene with OpenGL.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810096334.1A CN108364302B (en) | 2018-01-31 | 2018-01-31 | Unmarked augmented reality multi-target registration tracking method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108364302A true CN108364302A (en) | 2018-08-03 |
CN108364302B CN108364302B (en) | 2020-09-22 |
Family
ID=63007579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810096334.1A Expired - Fee Related CN108364302B (en) | 2018-01-31 | 2018-01-31 | Unmarked augmented reality multi-target registration tracking method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108364302B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177468A (en) * | 2013-03-29 | 2013-06-26 | 渤海大学 | Three-dimensional motion object augmented reality registration method based on no marks |
CN104966307A (en) * | 2015-07-10 | 2015-10-07 | 成都品果科技有限公司 | AR (augmented reality) algorithm based on real-time tracking |
WO2016048366A1 (en) * | 2014-09-26 | 2016-03-31 | Hewlett Packard Enterprise Development Lp | Behavior tracking and modification using mobile augmented reality |
CN106843493A (en) * | 2017-02-10 | 2017-06-13 | 深圳前海大造科技有限公司 | A kind of augmented reality implementation method of picture charge pattern method and application the method |
KR20180005430A (en) * | 2016-07-06 | 2018-01-16 | 윤상현 | Augmented reality realization system for image |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177468A (en) * | 2013-03-29 | 2013-06-26 | 渤海大学 | Three-dimensional motion object augmented reality registration method based on no marks |
WO2016048366A1 (en) * | 2014-09-26 | 2016-03-31 | Hewlett Packard Enterprise Development Lp | Behavior tracking and modification using mobile augmented reality |
CN104966307A (en) * | 2015-07-10 | 2015-10-07 | 成都品果科技有限公司 | AR (augmented reality) algorithm based on real-time tracking |
KR20180005430A (en) * | 2016-07-06 | 2018-01-16 | 윤상현 | Augmented reality realization system for image |
CN106843493A (en) * | 2017-02-10 | 2017-06-13 | 深圳前海大造科技有限公司 | A kind of augmented reality implementation method of picture charge pattern method and application the method |
Non-Patent Citations (2)
Title |
---|
NISTÉR D ET AL: "Scalable Recognition with a Vocabulary Tree", CVPR 2006 * |
LIN Yi: "Research on Construction and Optimization Methods for Context-Aware Mobile Augmented Reality Browsers", China Masters' Theses Full-text Database, Information Science and Technology * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978829A (en) * | 2019-02-26 | 2019-07-05 | 深圳市华汉伟业科技有限公司 | A kind of detection method and its system of object to be detected |
CN109978829B (en) * | 2019-02-26 | 2021-09-28 | 深圳市华汉伟业科技有限公司 | Detection method and system for object to be detected |
CN110473259A (en) * | 2019-07-31 | 2019-11-19 | 深圳市商汤科技有限公司 | Pose determines method and device, electronic equipment and storage medium |
CN112734797A (en) * | 2019-10-29 | 2021-04-30 | 浙江商汤科技开发有限公司 | Image feature tracking method and device and electronic equipment |
CN111402579A (en) * | 2020-02-29 | 2020-07-10 | 深圳壹账通智能科技有限公司 | Road congestion degree prediction method, electronic device and readable storage medium |
CN112000219A (en) * | 2020-03-30 | 2020-11-27 | 华南理工大学 | Movable gesture interaction device and method for augmented reality game |
CN112000219B (en) * | 2020-03-30 | 2022-06-14 | 华南理工大学 | Movable gesture interaction method for augmented reality game |
CN112884048A (en) * | 2021-02-24 | 2021-06-01 | 浙江商汤科技开发有限公司 | Method for determining registration image in input image, and related device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108364302B (en) | 2020-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ramesh et al. | Dart: distribution aware retinal transform for event-based cameras | |
Rogez et al. | Lcr-net++: Multi-person 2d and 3d pose detection in natural images | |
CN108364302A (en) | A kind of unmarked augmented reality multiple target registration method | |
Chen et al. | Monocular human pose estimation: A survey of deep learning-based methods | |
Wang et al. | Action recognition based on joint trajectory maps with convolutional neural networks | |
Kamel et al. | Deep convolutional neural networks for human action recognition using depth maps and postures | |
Zimmermann et al. | Learning to estimate 3d hand pose from single rgb images | |
Kale et al. | A study of vision based human motion recognition and analysis | |
Felsberg et al. | The thermal infrared visual object tracking VOT-TIR2015 challenge results | |
Gavrila | The visual analysis of human movement: A survey | |
CN103530619B (en) | Gesture identification method based on a small amount of training sample that RGB-D data are constituted | |
CN108241849A (en) | Human body interactive action recognition methods based on video | |
Burić et al. | Adapting YOLO network for ball and player detection | |
Ji et al. | Arbitrary-view human action recognition: A varying-view RGB-D action dataset | |
WO2011159258A1 (en) | Method and system for classifying a user's action | |
CN107203745A (en) | A kind of across visual angle action identification method based on cross-domain study | |
CN110956158A (en) | Pedestrian shielding re-identification method based on teacher and student learning frame | |
Dwibedi et al. | Deep cuboid detection: Beyond 2d bounding boxes | |
Li et al. | Pose anchor: A single-stage hand keypoint detection network | |
CN112906520A (en) | Gesture coding-based action recognition method and device | |
Liao et al. | A two-stage method for hand-raising gesture recognition in classroom | |
CN112861808A (en) | Dynamic gesture recognition method and device, computer equipment and readable storage medium | |
Xu et al. | Semantic Part RCNN for Real-World Pedestrian Detection. | |
Gadhiya et al. | Analysis of deep learning based pose estimation techniques for locating landmarks on human body parts | |
Yang et al. | Footballer action tracking and intervention using deep learning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20200922 |