CN103903013A - Optimization algorithm of unmarked flat object recognition - Google Patents

Optimization algorithm of unmarked flat object recognition

Info

Publication number
CN103903013A
CN103903013A
Authority
CN
China
Prior art keywords
key point
point
matrix
model
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410151036.XA
Other languages
Chinese (zh)
Inventor
Jin Cheng (金城)
Jia Qiong (贾琼)
Feng Rui (冯瑞)
Xue Xiangyang (薛向阳)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Expo development (Group) Co., Ltd.
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201410151036.XA priority Critical patent/CN103903013A/en
Publication of CN103903013A publication Critical patent/CN103903013A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical fields of statistical pattern recognition and image processing, and specifically relates to an optimization algorithm for markerless planar object recognition. The algorithm takes local feature keypoints as the features of the markerless object and extracts them with a two-class decision algorithm at the start of both the offline training stage and the real-time recognition stage; the keypoints are trained offline with a random Ferns classifier; in the recognition stage, a random sample consensus algorithm recovers the position and attitude of the object in the live frame; the computed position and attitude information of the target object is applied to a virtual object, which is overlaid on the real scene to complete an augmented reality system. Two optimizations are made: in the feature detection stage, keypoints are screened by weighted scoring; in the recognition stage, an improved fitting algorithm, ARANSAC, raises the proportion of inliers in the initial random set and thereby improves fitting performance. Compared with the baseline algorithm, performance improves greatly in every respect and meets the real-time and reliability requirements of an augmented reality system.

Description

An optimization algorithm for markerless planar object recognition
Technical field
The invention belongs to the technical fields of statistical pattern recognition and image processing, and specifically relates to an optimization algorithm for markerless planar object recognition; the technique provides a technical foundation for augmented reality.
Background art
Augmented reality is a technology that enhances the user's perception of the real world with information supplied by a computer system: computer-generated virtual objects, scenes, or system prompts are superimposed on the real scene, thereby genuinely augmenting it. The present invention uses a target recognition technique based on computer vision to provide the technical foundation for augmented reality. The novel interaction mode introduced by augmented reality has broad application prospects.
The key problem in augmented reality is recognizing and localizing the target: a technique for checking, in a complex image sequence, whether the target object is present, and computing the target's position in the image. The main difficulties to be solved are recognition and localization under complex illumination, complex backgrounds, multiple scales, varied viewing angles, and occlusion.
Target recognition methods generally fall into two classes, global and local. Global methods typically use statistical classification to compare the similarity between the input image and a training atlas of the target object. These methods generally offer no targeted solution to the major problems of target recognition, such as occlusion, illumination, and complex backgrounds.
Local methods describe the target object with simple local features, such as keypoint sets or edge sets. The mapping from the feature set of the image under test to the feature set of the target object's model image is called matching; correct matches are called inliers and incorrect matches outliers. Even if some features are lost, the target object can still be recognized and localized as long as enough inliers are found, and suspect matches can be screened out by simple geometric constraints. This requires the feature descriptor to be insensitive to changes in viewpoint and lighting. SIFT, the best-known descriptor, effectively handles scale, rotation, occlusion, and similar problems, and is therefore widely used and extended. However, SIFT descriptors are expensive to compute, the matching stage can only use KNN, and the computational cost cannot be reduced enough to meet real-time requirements.
Summary of the invention
The object of the present invention is to provide an optimization algorithm for markerless planar object recognition, applied to a real-time augmented reality system, that meets the system's real-time and accuracy requirements.
The markerless planar object recognition algorithm proposed by the present invention for real-time augmented reality systems proceeds mainly as follows: in the offline training stage, a large number of training samples are first synthesized automatically, and keypoints are extracted from them; a random classifier is then trained offline on all keypoints; a tracking algorithm then tracks and recognizes subsequent frames; finally, a graphics tool fuses the virtual object into the live scene.
The algorithm is divided into 3 successive stages: the offline training stage, the online recognition stage, and the augmented reality stage.
In the offline training stage, candidate model keypoints are screened by a weighting method during model keypoint extraction, so that the model keypoint set used for matching is robust. In the online recognition stage, the keypoints extracted from the live scene are scored, and the keypoint set of highest similarity becomes the initial candidate set for the matching stage; experiments show that the improved ARANSAC algorithm greatly outperforms comparable methods in precision, recall, and run time. This computer-vision-based algorithm improves performance substantially in every respect, giving the augmented reality system real-time, robust behavior and a smooth visual experience for the user. The overall framework is shown in Fig. 1.
Stage One: the offline training stage. The concrete steps are:
(1) Automatic synthesis of training samples. A single unoccluded image of the target planar object is used directly as the source material, and the training set is generated synthetically: affine transformations generate new random views from the initial view of the target object, and white noise is added. The training set size is S (e.g., S = 10000). The randomly synthesized views serve as the classifier's training samples.
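As an illustration, the following Python sketch (using OpenCV and NumPy) shows how such a training set could be synthesized; the rotation, scale, and translation ranges and the noise level are assumed values for illustration, not parameters fixed by this description:

    import numpy as np
    import cv2

    def synthesize_views(model_img, n_views=10000, noise_sigma=5.0, seed=0):
        # Generate n_views random affine views of the model image, each with
        # additive white noise; the affine transform M is kept so that model
        # keypoints can be mapped into every synthesized view.
        rng = np.random.default_rng(seed)
        h, w = model_img.shape[:2]
        views = []
        for _ in range(n_views):
            angle = rng.uniform(-180.0, 180.0)   # assumed rotation range
            scale = rng.uniform(0.5, 1.5)        # assumed scale range
            M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, scale)
            M[:, 2] += rng.uniform(-0.1, 0.1, size=2) * (w, h)  # small shift
            view = cv2.warpAffine(model_img, M, (w, h),
                                  borderMode=cv2.BORDER_REPLICATE)
            noisy = view.astype(np.float64) + rng.normal(0.0, noise_sigma,
                                                         view.shape)
            views.append((np.clip(noisy, 0, 255).astype(np.uint8), M))
        return views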
(2) Screening a stable keypoint set
The concrete steps are as follows:
First step: convert keypoint extraction into a two-class classification problem, keypoint versus non-keypoint, and classify all pixels of the image quickly.
The specific procedure is shown in Fig. 2: choose a pixel m under test and consider the pixels on the circle of radius R around it; randomly select the two pixels where a diameter crosses the circle, and compare gray values as follows:
$$|\tilde{I}(m) - \tilde{I}(m + R\,d_\alpha)| \le \tau \quad \text{and} \quad |\tilde{I}(m) - \tilde{I}(m - R\,d_\alpha)| \le \tau \qquad (1)$$
where $\tilde{I}$ is the preprocessed image, m is the pixel under test, $d_\alpha$ is the unit vector along the diameter at angle $\alpha$ to the horizontal diameter, R is the radius, and $\tau = 4$. As soon as the pixel gray values on a randomly selected diameter satisfy the formula above, m is immediately classified as a non-keypoint, and the two pixels on that diameter are classified as non-keypoints at the same time. If m has not been classified as a non-keypoint after 4 random diameter tests, it is likely a keypoint and is added to the keypoint candidate set.
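A minimal Python sketch of this two-class test; the value of R and the image-border handling are illustrative assumptions, and the bookkeeping that also marks the two diameter endpoints as non-keypoints is omitted:

    import numpy as np

    def is_candidate_keypoint(img, m, R=8, tau=4, n_tests=4, rng=None):
        # Test of formula (1): reject m as a non-keypoint as soon as both
        # endpoints of a random diameter of the circle of radius R around it
        # differ from m by at most tau gray levels.
        rng = rng or np.random.default_rng()
        y, x = m
        center = float(img[y, x])
        for _ in range(n_tests):
            alpha = rng.uniform(0.0, np.pi)      # random diameter angle
            dy = int(round(R * np.sin(alpha)))
            dx = int(round(R * np.cos(alpha)))
            p1 = float(img[y + dy, x + dx])
            p2 = float(img[y - dy, x - dx])
            if abs(center - p1) <= tau and abs(center - p2) <= tau:
                return False                     # uniform diameter: non-keypoint
        return True                              # survived all tests: candidate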
Second step: after all pixels have been traversed by the previous step, the keypoint candidate set is obtained. Candidate keypoints are scored with a Laplacian-of-Gaussian (LoG) score, approximated as:

$$\mathrm{LoG}(m) \approx \sum_{\alpha \in [0,\pi]} \big( \tilde{I}(m - R\,d_\alpha) - 2\,\tilde{I}(m) + \tilde{I}(m + R\,d_\alpha) \big) \qquad (2)$$
Here the original image I is convolved with Gaussians of different scales and downsampled, yielding images $\tilde{I}_i$ at different scales that together form a Gaussian pyramid, where n is the number of pyramid levels (n = 4), as shown in Fig. 3.
Adding the scale information of the Gaussian pyramid to the candidates' scoring standard, the score is adjusted as follows:

$$\mathrm{score}(m) \approx \sum_{i=1}^{n} \sum_{\alpha \in [0,\pi]} i \cdot \big( \tilde{I}_i(m - R\,d_\alpha) - 2\,\tilde{I}_i(m) + \tilde{I}_i(m + R\,d_\alpha) \big) \qquad (3)$$

This score is the final criterion for assessing candidate keypoints.
By counting how many times each keypoint is detected across the training samples, the points with the highest detection counts form the final selected keypoint set. The scoring standard for the model's keypoint candidates is:

$$\mathrm{score}(m) \approx \sum_{j=1}^{S} r_j \cdot \sum_{i=2}^{n} \sum_{\alpha \in [0,\pi]} i \cdot \big( \tilde{I}_i(m - R\,d_\alpha) - 2\,\tilde{I}_i(m) + \tilde{I}_i(m + R\,d_\alpha) \big) \qquad (4)$$

where $r_j$ indicates whether point m is in the keypoint candidate set of the j-th synthesized sample: $r_j = 1$ when it is, and 0 otherwise.
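The scale-weighted scoring of formulas (2)-(3) could be sketched as follows; the pyramid is built with OpenCV's pyrDown, and the number of sampled angles is an assumed discretization of $\alpha \in [0, \pi]$:

    import numpy as np
    import cv2

    def gaussian_pyramid(img, n_levels=4):
        # Gaussian pyramid by repeated blur + 2x downsampling.
        levels = [img]
        for _ in range(n_levels - 1):
            levels.append(cv2.pyrDown(levels[-1]))
        return levels

    def multiscale_score(pyramid, m, R=8, n_angles=8):
        # Formula (3): sum the circle-based Laplacian response over all
        # pyramid levels, weighting level i by i, so that larger-scale
        # responses dominate and robust large-scale keypoints are favored.
        y0, x0 = m
        score = 0.0
        for i, img in enumerate(pyramid, start=1):
            y, x = y0 >> (i - 1), x0 >> (i - 1)  # m's coordinates at level i
            h, w = img.shape[:2]
            if not (R <= y < h - R and R <= x < w - R):
                continue
            for alpha in np.linspace(0.0, np.pi, n_angles, endpoint=False):
                dy = int(round(R * np.sin(alpha)))
                dx = int(round(R * np.cos(alpha)))
                lap = (float(img[y + dy, x + dx]) + float(img[y - dy, x - dx])
                       - 2.0 * float(img[y, x]))
                score += i * lap
        return score

Formula (4) then simply accumulates this score over the synthesized samples in which m appears as a candidate.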
(3) Classification with a semi-naive Bayes classifier
(a). The classifier solves the matching problem. The present invention uses the neighborhood texture centered on a keypoint as the description of that keypoint's class; the neighborhood is a square patch with a side of 32 pixels. Keypoints and patches are extracted from the S automatically synthesized samples as the training set of each keypoint class.
(b). A semi-naive Bayes classifier based on random Ferns is selected: the classifier is built on a naive Bayes model derived from a generative model, which classifies a class variable from multiple feature vectors.
The features of a keypoint are the results of gray-value brightness comparisons between pairs of pixels on the keypoint's neighborhood patch (the square region of side 32 pixels centered on the keypoint). Pixel pairs are drawn at random; the binary values obtained by comparing these random pixel positions are assembled into feature vectors; finally, the classifier is trained to produce, for each feature value, the posterior probability distribution over keypoint classes.
(c). The core algorithm of the classifier
Let $c_k$, $k = 1, \ldots, H$ be the set of classes. The patch to be classified is represented by a set of features $\{f_j\}$, $j = 1, \ldots, N$, where $f_j$ is the binary value of the test on the pixel pair $m_{j,1}$ and $m_{j,2}$ of the patch, as in formula (5), with I the gray-level image:

$$f_j = \begin{cases} 1 & \text{if } I(m_{j,1}) \le I(m_{j,2}) \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$
The goal of the classifier is to find the class of maximum probability given the patch, as in formula (6):

$$\arg\max_k P(C = c_k \mid \text{patch}) \qquad (6)$$
The patch is fed into the classifier and represented by its feature values, so formula (6) is equivalent to formula (7):

$$\arg\max_k P(C = c_k \mid f_1, f_2, \ldots, f_N) \qquad (7)$$
By Bayes' formula:

$$P(C = c_k \mid f_1, f_2, \ldots, f_N) = \frac{P(f_1, f_2, \ldots, f_N \mid C = c_k) \cdot P(C = c_k)}{P(f_1, f_2, \ldots, f_N)} \qquad (8)$$
The denominator on the right is independent of the class, and the factor $P(C = c_k)$ in the numerator can be regarded as constant under the premises of this algorithm, so formula (7) can be approximated as:

$$\arg\max_k P(f_1, f_2, \ldots, f_N \mid C = c_k) \qquad (9)$$
But when N is large, computing the probability distribution of each class over the $2^N$ feature values is impractical. The N features are therefore divided into M groups, and the features of different groups are assumed independent of one another; this gives rise to the random Ferns classifier. The M group classifiers together are called a random forest, and each group is called a random Fern. The matching problem finally reduces to formula (10):

$$\arg\max_k \prod_{l=0}^{M-1} P\big(f_{l \cdot N/M + 1},\, f_{l \cdot N/M + 2},\, \ldots,\, f_{(l+1) \cdot N/M} \mid C = c_k\big) \qquad (10)$$
In the training stage, the patches extracted from the training samples are fed into the classifier; the class counts on each leaf node of the random forest are accumulated and normalized, finally yielding the posterior probability distribution of each class for each feature value.
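A compact sketch of such a random Ferns classifier (formulas (5)-(10)); the fern count M, fern depth D, and the Dirichlet-style count initialization are assumed parameters, and log-probabilities replace the product of formula (10) for numerical stability:

    import numpy as np

    class RandomFerns:
        # Semi-naive Bayes classifier with M random ferns of depth D.
        # Patches are 32x32 grayscale arrays, one class per model keypoint.
        def __init__(self, n_classes, M=30, D=10, patch_size=32, seed=0):
            rng = np.random.default_rng(seed)
            self.M, self.D, self.n_classes = M, D, n_classes
            # Random pixel pairs (m_j1, m_j2) for each fern's D binary tests.
            self.pairs = rng.integers(0, patch_size, size=(M, D, 2, 2))
            # Leaf counts, initialized to 1 (uniform prior on each leaf).
            self.counts = np.ones((M, 2 ** D, n_classes))

        def _leaf(self, patch):
            # Feature values f_j of formula (5), packed into one leaf index
            # per fern.
            idx = np.zeros(self.M, dtype=np.int64)
            for l in range(self.M):
                for j in range(self.D):
                    (y1, x1), (y2, x2) = self.pairs[l, j]
                    bit = 1 if patch[y1, x1] <= patch[y2, x2] else 0
                    idx[l] = (idx[l] << 1) | bit
            return idx

        def train(self, patch, label):
            for l, leaf in enumerate(self._leaf(patch)):
                self.counts[l, leaf, label] += 1

        def classify(self, patch):
            # arg max_k sum_l log P(F_l | C=c_k): log form of formula (10).
            log_post = np.zeros(self.n_classes)
            for l, leaf in enumerate(self._leaf(patch)):
                p = self.counts[l, leaf] / self.counts[l, leaf].sum()
                log_post += np.log(p)
            k = int(np.argmax(log_post))
            return k, log_post[k]   # matched class and its matching score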
Stage Two: the online recognition stage. The concrete steps are:
(1) Keypoint extraction
The extraction method is the same as the keypoint extraction in step (2) of the offline training stage, with formula (3) used as the score of candidate keypoints.
(2) Classifier classification. The target frame is converted to a gray-scale image and keypoints are extracted.
For each keypoint, its patch is fed into the trained classifier; the class of maximum posterior probability under formula (10) is selected as the match of this keypoint, and this probability serves as the base score of the match.
(3) Homography estimation. The classical fitting algorithm RANSAC (random sample consensus) screens the inlier set and estimates the pose matrix of the model in the target frame.
(a) RANSAC is defined as follows: it is known that the minimum number of matches required to compute all parameters of the transformation matrix from target object to model is 3. Each iteration randomly selects 3 matches from the full match set and solves for an initial homography matrix; then all matches in the full set that fit this homography within an error tolerance are collected, forming the support set. When the support set reaches a threshold size, it is deemed large enough, all its matches are taken as inliers, and the initial homography is refit from the support set; otherwise the procedure iterates until the iteration limit is reached. After RANSAC, the present invention refines the result with a gold-standard algorithm, e.g., Levenberg-Marquardt least-squares estimation, to obtain the final homography matrix, i.e., the pose matrix of the target object.
(b) ARANSAC uses match scores as the basis for determining a new initial random sampling range. When a target keypoint is classified by the classifier, it is matched to the model keypoint of the highest score; conversely, among all matches to model keypoints, target keypoints of more similar texture score higher. Hence, among the top-scoring matches, the proportion of inliers is higher than in the full match set.
One could take the midpoint between the highest and lowest scores as the threshold; here the arithmetic mean of all scores is used, and only matches above this threshold become initial candidate matches, as shown in Fig. 4. The full match set is sorted by score; matches above the threshold enter the candidate set, while the remaining matches are used only for the support set. A quicksort partition around the threshold completes the ordering in a single traversal of the match set. When this stage completes, if the target has been detected, the pose matrix of the target planar object in the image under test is obtained.
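A sketch of this stage, using OpenCV's RANSAC-based findHomography as the inner solver; the wiring of the mean-score candidate set and the final support-set refit around it is an illustrative reading of steps (a) and (b), with a plain least-squares refit standing in for the Levenberg-Marquardt gold-standard step:

    import numpy as np
    import cv2

    def aransac_homography(matches, reproj_thresh=3.0):
        # matches: list of (target_pt, model_pt, score), score from the
        # classifier. Random sampling is restricted to matches scoring above
        # the arithmetic mean, raising the inlier ratio in the initial set;
        # all matches still vote for the support set.
        scores = np.array([s for _, _, s in matches])
        candidates = [(t, m) for t, m, s in matches if s > scores.mean()]
        if len(candidates) < 4:
            return None, None
        src = np.float32([m for _, m in candidates]).reshape(-1, 1, 2)
        dst = np.float32([t for t, _ in candidates]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
        if H is None:
            return None, None
        # Support set: every match consistent with H, not just candidates.
        src_all = np.float32([m for _, m, _ in matches]).reshape(-1, 1, 2)
        dst_all = np.float32([t for t, _, _ in matches])
        proj = cv2.perspectiveTransform(src_all, H).reshape(-1, 2)
        support = np.linalg.norm(proj - dst_all, axis=1) < reproj_thresh
        if support.sum() >= 4:
            # Least-squares refit on the full support set (stand-in for the
            # Levenberg-Marquardt refinement named in the text).
            H, _ = cv2.findHomography(src_all[support],
                                      dst_all[support].reshape(-1, 1, 2), 0)
        return H, support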
Stage Three: the online augmented reality stage. The concrete steps are:
Using OpenGL to augment a 3D object in real time onto the recognized markerless planar object requires operating on the following matrices: (1) the projection matrix transformation; (2) the model matrix transformation. The model matrix applies matrix operations to the model's position in the model view: the model is placed at a fixed position with a fixed attitude in the scene, so that even if the scene later changes (e.g., rotates), the model keeps a fixed position relative to the scene. The projection matrix applies a projective transformation to the current scene: as the viewpoint moves, the visible scene view changes, and the scene view must be re-projected from the eye's perspective.
The steps for augmenting a landscape painting onto the pose of the target object in real time are as follows: (1) in the recognition stage, obtain the pose matrix of the target object and set it as the projection matrix; (2) set the model matrix: the model is placed at the center of the scene at depth -90, with the normal vector of the painting's plane aligned with the z axis; (3) invert the projection matrix, because the transformation matrix obtained earlier and the rotation-translation matrix seen by the eye are mutually inverse; (4) the depth range of the scene (z) is 100 to 1000000; (5) set the scene background to the live input frame, and composite the painting into the scene.
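A sketch of this matrix setup with PyOpenGL's legacy matrix stack; the embedding of the 3x3 pose homography into a 4x4 GL matrix is an assumption about how steps (1)-(3) fit together, and the depth range of step (4) would be folded into the projection setup (omitted here):

    import numpy as np
    from OpenGL.GL import (GL_MODELVIEW, GL_PROJECTION, glLoadIdentity,
                           glLoadMatrixf, glMatrixMode, glTranslatef)

    def apply_pose_matrices(H):
        # H: 3x3 pose (homography) matrix from the recognition stage.
        # Embed H into a 4x4 matrix that leaves the z coordinate alone.
        M = np.array([[H[0, 0], H[0, 1], 0.0, H[0, 2]],
                      [H[1, 0], H[1, 1], 0.0, H[1, 2]],
                      [0.0,     0.0,     1.0, 0.0    ],
                      [H[2, 0], H[2, 1], 0.0, H[2, 2]]])
        M = np.linalg.inv(M)          # step (3): recovered transform and the
                                      # eye's rotation-translation are inverse
        glMatrixMode(GL_PROJECTION)   # step (1): pose drives the projection
        glLoadIdentity()
        glLoadMatrixf(M.T.astype(np.float32))  # OpenGL is column-major
        glMatrixMode(GL_MODELVIEW)    # step (2): pin the model in the scene
        glLoadIdentity()
        glTranslatef(0.0, 0.0, -90.0)          # painting at depth -90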
Fig. 5 shows the software interface of this augmented reality system: the upper-left box shows the augmented reality result, in which the target object at the lower right is recognized in the scene and the third landscape image at the lower left is augmented into the live scene.
The present invention takes local feature keypoints as the features of the markerless object and applies the two-class decision algorithm to keypoint extraction at the start of both offline training and real-time recognition; trains the keypoints offline with a random Ferns classifier; obtains the position and attitude of the object in the live frame with the ARANSAC algorithm in the recognition stage; and applies the computed pose information of the target object to a virtual object that is overlaid on the real scene, completing the augmented reality system. Because existing recognition techniques cannot meet the demands of interaction, the invention makes two main optimizations to the recognition-based augmented reality system: in the keypoint detection step of both the offline training and online recognition stages, keypoints are screened by weighting, based on the principle that large-scale keypoints are more robust; in the recognition stage, the improved ARANSAC fitting algorithm raises the proportion of inliers in the initial random set, improving fitting performance. The improved algorithm outperforms the baseline algorithm in every respect and meets the real-time and reliability requirements of an augmented reality system.
Brief description of the drawings
Fig. 1: flow chart of the algorithm.
Fig. 2: schematic of keypoint extraction.
Fig. 3: left: the Gaussian pyramid obtained by multi-scale downsampling; right: the correspondence of keypoints of different scales in the original scale.
Fig. 4: top: the initial random range of RANSAC is the full match set; bottom: the initial random range of ARANSAC is a proper subset of the full match set.
Fig. 5: interface of the FUDAN demo.
Fig. 6: left: the baseline method; right: the baseline method + ① + ② + ③.
Detailed embodiment
Several controlled experiments were run against the baseline algorithm on the test video of the baseline algorithm's authors. The test video has 499 frames in total at a resolution of 640*480 and contains all the wide-baseline matching problems, including motion blur.
To study how much the optimizations proposed at the different stages affect the results, 6 controlled experiments were run in total, where ① is the model keypoint screening of Section 3.2, ② is the target-frame keypoint screening of Section 4.1, and ③ is the ARANSAC of Section 4.2. The 6 experiments are: (1) the baseline method; (2) baseline + ①; (3) baseline + ③; (4) baseline + ① + ③; (5) baseline + ② + ③; (6) baseline + ① + ② + ③.
Table 1: performance comparison of the controlled experiments

Experiment            (1)     (2)     (3)     (4)     (5)     (6)
Recall (%)           79.36   79.56   92.79   92.99   93.19   92.18
Precision (%)        89.89   93.84   96.18   96.34   96.58   97.01
Support set size     78.2    82.3    82.5    81.6    82.1    82.2
Iterations           292.1   277.6   84.0    83.9    121.7   117.0
Frame rate (fps)     16.5    19.1    20.7    20.6    19.3    18.5
Comparing experiment (6) with the baseline method: recall improves by 16.15%; precision improves by 7.92%; the support set grows by 5.12%; the iteration count drops by 59.95%; the frame rate improves by 12.12%. Because the support-set threshold is high, the recognition rates of all algorithms exceed 99%. Analysis of all the experimental data shows that the ARANSAC algorithm has the larger impact on performance, while keypoint screening clearly improves every metric except recall.
As shown in Fig. 6, the left image is the recognition result of control group (1) and the right image is that of control group (6); it is evident that the support set found by (6) is much larger than that of (1), and the pose of the target object (a mouse pad) is more accurate.

Claims (1)

1. An optimization algorithm for markerless planar object recognition, characterized in that it is divided into 3 successive stages: an offline training stage, an online recognition stage, and an augmented reality stage;
Stage One: the concrete steps of the offline training stage are:
(1) automatic synthesis of training samples:
a single unoccluded image of the target planar object is used directly as the source material, and the training set is generated synthetically; specifically, affine transformations generate new random views from the initial view of the target planar object, and white noise is added; the training set size is S, and the randomly synthesized views serve as the classifier's training samples;
(2) screening a stable keypoint set:
first step: keypoint extraction is converted into a two-class classification problem, keypoint versus non-keypoint, and all pixels of the image are classified quickly:
choose a pixel m under test and the pixels on the circle of radius R around it; randomly select the two pixels where a diameter crosses the circle, and compare gray values as follows:

$$|\tilde{I}(m) - \tilde{I}(m + R\,d_\alpha)| \le \tau \quad \text{and} \quad |\tilde{I}(m) - \tilde{I}(m - R\,d_\alpha)| \le \tau \qquad (1)$$

where $\tilde{I}$ is the preprocessed image, m is the pixel under test, $d_\alpha$ is the unit vector along the diameter at angle $\alpha$ to the horizontal diameter, R is the radius, and $\tau = 4$; as soon as the pixel gray values on a randomly selected diameter satisfy the formula above, m is immediately classified as a non-keypoint, and the two pixels on that diameter are classified as non-keypoints at the same time; if m has not been classified as a non-keypoint after 4 random diameter tests, it is likely a keypoint and is added to the keypoint candidate set;
second step: after all pixels have been traversed by the previous step, the keypoint candidate set is obtained; candidate keypoints are scored with a Laplacian-of-Gaussian score, approximated as:

$$\mathrm{LoG}(m) \approx \sum_{\alpha \in [0,\pi]} \big( \tilde{I}(m - R\,d_\alpha) - 2\,\tilde{I}(m) + \tilde{I}(m + R\,d_\alpha) \big) \qquad (2)$$

this score is the basic standard for assessing the feature stability and saliency of the model image's candidate keypoints;
here the original image I is convolved with Gaussians of different scales and downsampled, yielding images $\tilde{I}_i$ at different scales that together form a Gaussian pyramid, where n is the number of pyramid levels, here n = 4;
adding the scale information of the Gaussian pyramid to the candidates' scoring standard, the score is adjusted as follows:

$$\mathrm{score}(m) \approx \sum_{i=1}^{n} \sum_{\alpha \in [0,\pi]} i \cdot \big( \tilde{I}_i(m - R\,d_\alpha) - 2\,\tilde{I}_i(m) + \tilde{I}_i(m + R\,d_\alpha) \big) \qquad (3)$$

this score is the final criterion for assessing candidate keypoints;
by counting how many times each keypoint is detected across the training samples, the points with the highest detection counts form the final selected keypoint set; the scoring standard for the model's keypoint candidates is:

$$\mathrm{score}(m) \approx \sum_{j=1}^{S} r_j \cdot \sum_{i=2}^{n} \sum_{\alpha \in [0,\pi]} i \cdot \big( \tilde{I}_i(m - R\,d_\alpha) - 2\,\tilde{I}_i(m) + \tilde{I}_i(m + R\,d_\alpha) \big) \qquad (4)$$

where $r_j$ indicates whether point m is in the keypoint candidate set of the j-th synthesized sample: $r_j = 1$ when it is, and 0 otherwise;
(3) classification with a semi-naive Bayes classifier:
(a). the neighborhood texture centered on a keypoint is used as the description of that keypoint's class; the neighborhood is a square patch with a side of 32 pixels; keypoints and patches are extracted from the S automatically synthesized samples as the training set of each keypoint class;
(b). a semi-naive Bayes classifier based on random Ferns is selected:
the classifier is built on a naive Bayes model derived from a generative model, which classifies a class variable from multiple feature vectors;
the features of a keypoint are the results of gray-value brightness comparisons between pairs of pixels on the keypoint's neighborhood patch; pixel pairs are drawn at random, the binary values obtained by comparing these random pixel positions are assembled into feature vectors, and finally the classifier is trained to produce, for each feature value, the posterior probability distribution over keypoint classes;
(c). the core algorithm of the classifier is as follows:
let $c_k$, $k = 1, \ldots, H$ be the set of classes; the patch to be classified is represented by a set of features $\{f_j\}$, $j = 1, \ldots, N$, where $f_j$ is the binary value of the test on the pixel pair $m_{j,1}$ and $m_{j,2}$ of the patch, as in formula (5), with I the gray-level image:

$$f_j = \begin{cases} 1 & \text{if } I(m_{j,1}) \le I(m_{j,2}) \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

the goal of the classifier is to find the class of maximum probability given the patch, as in formula (6):

$$\arg\max_k P(C = c_k \mid \text{patch}) \qquad (6)$$

the patch is fed into the classifier and represented by its feature values, so formula (6) is equivalent to formula (7):

$$\arg\max_k P(C = c_k \mid f_1, f_2, \ldots, f_N) \qquad (7)$$

by Bayes' formula:

$$P(C = c_k \mid f_1, f_2, \ldots, f_N) = \frac{P(f_1, f_2, \ldots, f_N \mid C = c_k) \cdot P(C = c_k)}{P(f_1, f_2, \ldots, f_N)} \qquad (8)$$

where the denominator on the right is independent of the class and the factor $P(C = c_k)$ in the numerator is regarded as constant, so formula (7) is approximated as:

$$\arg\max_k P(f_1, f_2, \ldots, f_N \mid C = c_k) \qquad (9)$$

the N features are divided into M groups, the features of different groups are assumed independent of one another, and the matching problem finally reduces to formula (10):

$$\arg\max_k \prod_{l=0}^{M-1} P\big(f_{l \cdot N/M + 1},\, f_{l \cdot N/M + 2},\, \ldots,\, f_{(l+1) \cdot N/M} \mid C = c_k\big) \qquad (10)$$

in the training stage, the patches extracted from the training samples are fed into the classifier; the class counts on each leaf node of the random forest are accumulated and normalized, finally yielding the posterior probability distribution of each class for each feature value;
Stage Two: the concrete steps of the online recognition stage are:
(1) keypoint extraction:
the extraction method is the same as the keypoint extraction of the offline training stage's screening of a stable keypoint set, with formula (3) as the score of candidate keypoints;
(2) classifier classification: the target frame is converted to a gray-scale image and keypoints are extracted;
for each keypoint, its patch is fed into the trained classifier; the class of maximum posterior probability under formula (10) is selected as the match of this keypoint, and this probability serves as the base score of the match;
(3) homography estimation: the classical fitting algorithm RANSAC (random sample consensus) screens the inlier set and estimates the pose matrix of the model in the target frame;
(a) the fitting algorithm RANSAC is defined as follows: it is known that the minimum number of matches required to compute all parameters of the transformation matrix from target object to model is 3; each iteration randomly selects 3 matches from the full match set and solves for an initial homography matrix, then collects from the full match set all matches that fit this homography within an error tolerance, called the support set; when the support set reaches a threshold size, it is deemed large enough, all its matches are taken as inliers, and the initial homography is refit from the support set; otherwise the procedure iterates; after RANSAC, Levenberg-Marquardt least-squares estimation refines the result into the final homography matrix;
(b) the improved fitting algorithm ARANSAC uses match scores as the basis for determining a new initial random sampling range; when a target keypoint is classified by the classifier, it is matched to the model keypoint of the highest score; conversely, among all matches to model keypoints, target keypoints of more similar texture score higher; hence, among the top-scoring matches, the proportion of inliers is higher than in the full match set;
rather than the midpoint between the highest and lowest scores, the arithmetic mean of all scores is used as the threshold, and only matches above this threshold become initial candidate matches; the full match set is sorted by score; matches above the threshold enter the candidate set, while the remaining matches are used only for the support set; a quicksort partition around the threshold completes the ordering in a single traversal of the match set; when this stage completes, if the target has been detected, the pose matrix of the target planar object in the image under test is obtained;
Stage Three: the online augmented reality stage:
using OpenGL to augment a 3D object in real time onto the recognized markerless planar object requires operating on the following matrices: (1) the projection matrix transformation; (2) the model matrix transformation; wherein: the model matrix applies matrix operations to the model's position in the model view: the model is placed at a fixed position with a fixed attitude in the scene, so that even if the scene later changes, the model keeps a fixed position relative to the scene; the projection matrix applies a projective transformation to the current scene: as the viewpoint moves, the visible scene view changes, and the scene view is re-projected from the eye's perspective;
the steps for augmenting a landscape painting onto the pose of the target object in real time are as follows: (1) in the recognition stage, obtain the pose matrix of the target object and set it as the projection matrix; (2) set the model matrix: the model is placed at the center of the scene at depth -90, with the normal vector of the painting's plane aligned with the z axis; (3) invert the projection matrix, because the transformation matrix obtained earlier and the rotation-translation matrix seen by the eye are mutually inverse; (4) the depth range of the scene (z) is 100 to 1000000; (5) set the scene background to the live input frame, and composite the painting into the scene.
CN201410151036.XA 2014-04-15 2014-04-15 Optimization algorithm of unmarked flat object recognition Pending CN103903013A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410151036.XA CN103903013A (en) 2014-04-15 2014-04-15 Optimization algorithm of unmarked flat object recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410151036.XA CN103903013A (en) 2014-04-15 2014-04-15 Optimization algorithm of unmarked flat object recognition

Publications (1)

Publication Number Publication Date
CN103903013A true CN103903013A (en) 2014-07-02

Family

ID=50994325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410151036.XA Pending CN103903013A (en) 2014-04-15 2014-04-15 Optimization algorithm of unmarked flat object recognition

Country Status (1)

Country Link
CN (1) CN103903013A (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104571527A (en) * 2015-01-26 2015-04-29 华东理工大学 AR (augmented reality) technique-based 3D (three-dimensional) molecule interactive docking system and implementing method
CN104966318A (en) * 2015-06-18 2015-10-07 清华大学 A reality augmenting method having image superposition and image special effect functions
CN105069809A (en) * 2015-08-31 2015-11-18 中国科学院自动化研究所 Camera positioning method and system based on planar mixed marker
CN106778514A (en) * 2016-11-24 2017-05-31 努比亚技术有限公司 A kind of method and device for identifying object
CN106815595A (en) * 2015-11-27 2017-06-09 展讯通信(天津)有限公司 Mobile terminal and its object detection method and device
CN108564661A (en) * 2018-01-08 2018-09-21 佛山市超体软件科技有限公司 A kind of recording method based on augmented reality scene
CN108805903A (en) * 2018-05-24 2018-11-13 讯飞幻境(北京)科技有限公司 A kind of multiple labeling point recognition methods and device based on AR engines
CN108876852A (en) * 2017-05-09 2018-11-23 中国科学院沈阳自动化研究所 A kind of online real-time object identification localization method based on 3D vision
CN108986125A (en) * 2017-11-30 2018-12-11 成都通甲优博科技有限责任公司 Object edge extracting method, device and electronic equipment
CN109509222A (en) * 2018-10-26 2019-03-22 北京陌上花科技有限公司 The detection method and device of straight line type objects
CN109920000A (en) * 2019-03-04 2019-06-21 杭州师范大学 A kind of augmented reality method without dead angle based on polyphaser collaboration
CN110135481A (en) * 2019-04-30 2019-08-16 佛山科学技术学院 A kind of crops lesion detection method and detection device
CN110232738A (en) * 2019-06-18 2019-09-13 西安电子科技大学 Multiple view remote sensing images stereo reconstruction method based on disparity map and key point
CN110781754A (en) * 2019-09-27 2020-02-11 精英数智科技股份有限公司 Method, device and system for intelligent monitoring of manual inspection and storage medium
CN111738300A (en) * 2020-05-27 2020-10-02 复旦大学 Optimization algorithm for detecting and identifying traffic signs and signal lamps
CN113804118A (en) * 2021-08-16 2021-12-17 长江水利委员会长江科学院 Building deformation monitoring method based on three-dimensional laser point cloud geometric features
CN117082188A (en) * 2023-10-12 2023-11-17 广东工业大学 Consistency video generation method and related device based on Pruk analysis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976461A (en) * 2010-10-25 2011-02-16 北京理工大学 Novel outdoor augmented reality label-free tracking registration algorithm
CN102054166A (en) * 2010-10-25 2011-05-11 北京理工大学 Scene recognition technology used in outdoor augmented reality system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976461A (en) * 2010-10-25 2011-02-16 北京理工大学 Novel outdoor augmented reality label-free tracking registration algorithm
CN102054166A (en) * 2010-10-25 2011-05-11 北京理工大学 Scene recognition technology used in outdoor augmented reality system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VINCENT LEPETIT et al.: "Towards Recognizing Feature Points using Classification Trees", EPFL Technical Report IC/2004/74 *
JIA Qiong et al.: "Augmented Reality System Based on Object Recognition" (基于物体识别的增强现实系统), Microcomputer Applications (微型电脑应用) *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104571527B (en) * 2015-01-26 2017-11-07 华东理工大学 A kind of 3D molecules interaction docking system and implementation method based on augmented reality
CN104571527A (en) * 2015-01-26 2015-04-29 华东理工大学 AR (augmented reality) technique-based 3D (three-dimensional) molecule interactive docking system and implementing method
CN104966318A (en) * 2015-06-18 2015-10-07 清华大学 A reality augmenting method having image superposition and image special effect functions
CN104966318B (en) * 2015-06-18 2017-09-22 清华大学 Augmented reality method with imaging importing and image special effect function
CN105069809A (en) * 2015-08-31 2015-11-18 中国科学院自动化研究所 Camera positioning method and system based on planar mixed marker
CN105069809B (en) * 2015-08-31 2017-10-03 中国科学院自动化研究所 A kind of camera localization method and system based on planar hybrid marker
CN106815595A (en) * 2015-11-27 2017-06-09 展讯通信(天津)有限公司 Mobile terminal and its object detection method and device
CN106778514A (en) * 2016-11-24 2017-05-31 努比亚技术有限公司 A kind of method and device for identifying object
CN108876852A (en) * 2017-05-09 2018-11-23 中国科学院沈阳自动化研究所 A kind of online real-time object identification localization method based on 3D vision
CN108876852B (en) * 2017-05-09 2021-06-22 中国科学院沈阳自动化研究所 Online real-time object identification and positioning method based on 3D vision
CN108986125A (en) * 2017-11-30 2018-12-11 成都通甲优博科技有限责任公司 Object edge extracting method, device and electronic equipment
CN108986125B (en) * 2017-11-30 2022-02-01 成都通甲优博科技有限责任公司 Object edge extraction method and device and electronic equipment
CN108564661A (en) * 2018-01-08 2018-09-21 佛山市超体软件科技有限公司 A kind of recording method based on augmented reality scene
CN108805903A (en) * 2018-05-24 2018-11-13 讯飞幻境(北京)科技有限公司 A kind of multiple labeling point recognition methods and device based on AR engines
CN109509222A (en) * 2018-10-26 2019-03-22 北京陌上花科技有限公司 The detection method and device of straight line type objects
CN109920000B (en) * 2019-03-04 2020-11-03 杭州师范大学 Multi-camera cooperation-based dead-corner-free augmented reality method
CN109920000A (en) * 2019-03-04 2019-06-21 杭州师范大学 A kind of augmented reality method without dead angle based on polyphaser collaboration
CN110135481A (en) * 2019-04-30 2019-08-16 佛山科学技术学院 A kind of crops lesion detection method and detection device
CN110232738A (en) * 2019-06-18 2019-09-13 西安电子科技大学 Multiple view remote sensing images stereo reconstruction method based on disparity map and key point
CN110781754A (en) * 2019-09-27 2020-02-11 精英数智科技股份有限公司 Method, device and system for intelligent monitoring of manual inspection and storage medium
CN111738300A (en) * 2020-05-27 2020-10-02 复旦大学 Optimization algorithm for detecting and identifying traffic signs and signal lamps
CN113804118A (en) * 2021-08-16 2021-12-17 长江水利委员会长江科学院 Building deformation monitoring method based on three-dimensional laser point cloud geometric features
CN117082188A (en) * 2023-10-12 2023-11-17 广东工业大学 Consistency video generation method and related device based on Pruk analysis
CN117082188B (en) * 2023-10-12 2024-01-30 广东工业大学 Consistency video generation method and related device based on Pruk analysis

Similar Documents

Publication Publication Date Title
CN103903013A (en) Optimization algorithm of unmarked flat object recognition
CN106682598B (en) Multi-pose face feature point detection method based on cascade regression
Li et al. Robust visual tracking based on convolutional features with illumination and occlusion handing
Packer et al. A combined pose, object, and feature model for action understanding
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
Liu et al. Depth context: a new descriptor for human activity recognition by using sole depth sequences
Yin et al. FD-SSD: An improved SSD object detection algorithm based on feature fusion and dilated convolution
Zhang et al. Deep hierarchical guidance and regularization learning for end-to-end depth estimation
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN106529499A (en) Fourier descriptor and gait energy image fusion feature-based gait identification method
CN111275082A (en) Indoor object target detection method based on improved end-to-end neural network
CN104361313B (en) A kind of gesture identification method merged based on Multiple Kernel Learning heterogeneous characteristic
CN107424161B (en) Coarse-to-fine indoor scene image layout estimation method
CN107292246A (en) Infrared human body target identification method based on HOG PCA and transfer learning
CN107527054B (en) Automatic foreground extraction method based on multi-view fusion
CN106408030A (en) SAR image classification method based on middle lamella semantic attribute and convolution neural network
CN103996018A (en) Human-face identification method based on 4DLBP
CN106682641A (en) Pedestrian identification method based on image with FHOG- LBPH feature
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN105138983B (en) The pedestrian detection method divided based on weighting block model and selective search
CN104063713A (en) Semi-autonomous on-line studying method based on random fern classifier
CN104102904A (en) Static gesture identification method
CN105488541A (en) Natural feature point identification method based on machine learning in augmented reality system
Li et al. A novel joint tracker based on occlusion detection
Yang et al. HCNN-PSI: A hybrid CNN with partial semantic information for space target recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20170516

Address after: 200433 Handan Road, Shanghai, No. 220, No.

Applicant after: Fudan University

Applicant after: Shanghai Expo development (Group) Co., Ltd.

Address before: 200433 Handan Road, Shanghai, No. 220, No.

Applicant before: Fudan University

TA01 Transfer of patent application right
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140702

WD01 Invention patent application deemed withdrawn after publication