CN104715254A - General object recognition method based on 2D and 3D SIFT feature fusion

Info

Publication number: CN104715254A (application CN201510117991.6A; granted as CN104715254B)
Authority: CN (China)
Inventors: 李新德, 刘苗苗, 徐叶帆
Applicant and current assignee: Southeast University
Legal status: Granted; Active
Classification: Image Analysis

Abstract

The invention discloses a general object recognition method based on 2D and 3D SIFT feature fusion, which aims to improve the accuracy of general object recognition. Starting from the Scale Invariant Feature Transform (2D SIFT), a 3D SIFT feature descriptor based on the point cloud model is proposed, and on this basis a general object recognition method fusing 2D and 3D SIFT features is presented. The method comprises the following steps: 1, the 2D and 3D feature descriptors of the two-dimensional image and the three-dimensional point cloud of an object are extracted; 2, the feature vectors of the object are obtained by means of a BoW (Bag of Words) model; 3, the two feature vectors are fused by feature-level fusion to obtain the description of the object; 4, classification and recognition are carried out with a supervised classifier, a support vector machine (SVM), which gives the final recognition result.

Description

General object recognition method based on 2D and 3D SIFT feature fusion
Technical field
The present invention relates to a general object recognition method based on the fusion of 2D and 3D SIFT features, and belongs to the technical field of recognition methods.
Background art
General object recognition is a hot topic of research at home and abroad in recent years. It differs from specific object recognition (e.g., face recognition), which can be trained on massive samples and only deals with a particular object or a particular class of objects. General object recognition is much harder: it must use generic features shared across object classes rather than features defined for one particular class, these features need to express both the inter-class differences and the intra-class commonality as far as possible, and it must handle multi-class classification and incremental learning, since massive samples of a given class are not available for training in advance.
The mainstream approach to general object recognition at present is to extract object features to describe the object, use some machine learning algorithm to learn the object classes, and finally classify objects to achieve recognition. General object recognition based on local image features has long been the research focus and is a relatively mature field, but recognition based on two-dimensional images mainly targets digitized gray-scale images, loses the three-dimensional information of the real object, and is easily affected by external conditions such as illumination. A point cloud model is an object model obtained by processing depth images of the object. Since depth information depends only on the geometric shape of the object and is independent of properties such as brightness and reflectance, and neither shadows nor surface-projection problems arise as with gray-scale images, recognizing an object from its point cloud model is easier than from a gray-scale image.
When the intra-class difference is large and the inter-class similarity is high among the classes to be recognized, a single feature cannot reflect the inter-class differences and intra-class commonality well. To address this problem, many researchers have proposed target recognition methods based on multi-feature fusion, which have been widely applied to aircraft target recognition, face recognition, and object recognition.
General object recognition in real environments is an important part of artificial intelligence and plays an important role in intelligent surveillance, telemetry and remote sensing, robotics, medical image processing, and so on. Unlike specific object recognition, general objects in real environments are of many kinds, with high inter-class similarity and small intra-class difference, which makes general object recognition especially difficult. The prior art usually adopts two-dimensional features, which, however, lack the ability to describe the local spatial characteristics of an object. How to select suitable features to represent the inter-class differences and intra-class commonality of general objects is crucial: only stable and effective features can achieve the best recognition result under limited training samples and improve the recognition rate.
Summary of the invention
Object of the invention: in order to overcome the deficiencies of the prior art, the present invention provides a general object recognition method based on the fusion of 2D and 3D SIFT features. By combining two-dimensional and three-dimensional features, the method fuses multiple kinds of object information, effectively alleviates the low recognition rate of algorithms based on a single feature, and still achieves a high recognition accuracy when the inter-class similarity is high and the intra-class difference is small.
Technical scheme: to achieve the above object, the technical solution adopted by the present invention is as follows.
A general object recognition method based on 2D and 3D SIFT feature fusion comprises the following steps (a minimal end-to-end sketch is given after this list):
1) Feature extraction and representation:
For a sample object, extract its feature description data, which comprise the object image and the object point cloud; first extract the 2D SIFT features of the object image to complete the image feature representation, then extract the 3D SIFT features of the object point cloud to complete the point cloud feature representation; the 2D and 3D SIFT feature descriptors of the sample object are thereby obtained;
2) Object representation:
Use KMeans++ clustering to obtain the cluster centers of the samples, i.e., the corresponding visual word vocabulary; then use the BoW model to represent the object as a multi-dimensional vector, obtaining the corresponding 2D and 3D SIFT feature vectors of the sample object;
3) Feature fusion:
Fuse the corresponding 2D and 3D SIFT feature vectors of the sample object by feature-level fusion to obtain the serial fused feature vector of the sample object;
4) Classifier design and training:
Use a support vector machine (SVM) to learn the object classes of the sample objects and perform classification; train classifiers to build a multi-class classifier;
5) Recognition of the object to be recognized:
Input the serial fused feature vector of the object to be recognized into the multi-class classifier trained in step 4) to obtain the probability that the object belongs to each class; the sample class corresponding to the maximum probability is the recognition result for the object.
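To make the overall flow concrete, the following is a minimal end-to-end sketch of steps 2) to 5) in Python; all names are hypothetical placeholders under the assumption that the SIFT descriptors have already been extracted and that class labels are the integers 0 … n−1, and nothing here is prescribed by the patent itself.

```python
import numpy as np

def bow_histogram(desc, vocab):
    """Object representation: count how often each visual word (cluster center)
    is the nearest center to one of the object's descriptors."""
    dists = np.linalg.norm(desc[:, None, :] - vocab[None, :, :], axis=2)
    return np.bincount(dists.argmin(axis=1), minlength=len(vocab)).astype(float)

def recognize(desc2d, desc3d, vocab2d, vocab3d, classifiers, n_classes):
    """desc2d: (m, 128) 2D SIFT descriptors of the image; desc3d: (mc, 128)
    3D SIFT descriptors of the point cloud; classifiers: dict mapping a class
    pair (i, j) to a trained binary SVM with a scikit-learn-style predict()."""
    vec2d = bow_histogram(desc2d, vocab2d)   # step 2: BoW feature vectors
    vec3d = bow_histogram(desc3d, vocab3d)
    fused = np.concatenate([vec3d, vec2d])   # step 3: Vec_3D2D = (Vec_3D, Vec_2D)^T
    votes = np.zeros(n_classes)              # steps 4-5: one-vs-one voting
    for svm in classifiers.values():
        votes[int(svm.predict(fused[None, :])[0])] += 1
    p = votes / votes.sum()                  # P(i) for each class i
    return int(np.argmax(p))                 # class corresponding to max probability
```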
Further, in the present invention, the method for extracting the 3D SIFT features of the object point cloud comprises the following steps:
1-1) Keypoint detection:
A point of the object point cloud model is expressed as P(x, y, z). To achieve scale invariance, the scale space of the 3D point cloud is defined as L(x, y, z, σ):

$$L(x,y,z,\sigma)=G(x,y,z,\sigma)*P(x,y,z) \qquad (1)$$

where σ is the scale-space factor, and the variable-scale three-dimensional Gaussian kernel is

$$G(x,y,z,\sigma)=\frac{1}{(\sqrt{2\pi}\,\sigma)^{3}}\,e^{-(x^{2}+y^{2}+z^{2})/2\sigma^{2}} \qquad (2)$$

Different scales are obtained with the multiplication factor k_i; if the number of layers per pyramid octave is s, set k^s = 2. Build the point cloud Gaussian pyramid and perform extremum detection with the difference-of-Gaussian (DoG) function; the extrema of the DoG function are the keypoints. The DoG operator is computed as

$$D(x,y,z,k_{i}\sigma)=L(x,y,z,k_{i+1}\sigma)-L(x,y,z,k_{i}\sigma) \qquad (3)$$

where i ∈ [0, s+2];
1-2) Keypoint orientation assignment:
For each detected keypoint, a vector describing the local features of the keypoint must be computed; this vector is called the descriptor of the keypoint. To give the descriptor rotation invariance, the local features of the point cloud are used to assign a reference orientation to the keypoint. The orientation assignment method is as follows:
1-2-1) Compute the k-neighborhood of the keypoint P; the neighborhood points are denoted P_ki, where i ∈ {1, 2, …, n} is the neighborhood point index and n is the number of neighborhood points;
1-2-2) Compute the centroid P_c of the k-neighborhood of the keypoint P;
1-2-3) Compute the vectors from P to P_c and from P to P_ki, and obtain the vector magnitude d and the two angles θ and φ:

$$d=\sqrt{x^{2}+y^{2}+z^{2}},\quad \theta=\sin^{-1}(z/d),\quad \varphi=\tan^{-1}(y/x) \qquad (4)$$

where (x, y, z) are the coordinates of the vector;
1-2-4) Use a histogram to collect, over the k-neighborhood, the vector magnitudes d and the angles (i.e., directions) computed in step 1-2-3): θ is divided into 18 sub-intervals (bins) and φ into 36 sub-intervals, each bin spanning 10°; the magnitude d serves as the weight, and Gaussian weighting is applied when accumulating the angles, where R_max denotes the maximum radius of the keypoint neighborhood and points beyond this distance are ignored;
1-2-5) The peak of the histogram represents the dominant direction of the keypoint neighborhood and is taken as the principal direction of the keypoint; to strengthen the robustness of matching, only directions whose peak exceeds 80% of the principal peak are retained as auxiliary directions of the keypoint; the principal direction is denoted (α, β);
1-3) Keypoint feature description:
The feature descriptor of a keypoint is generated as follows:
1-3-1) Compute the k-neighborhood of the keypoint P; the neighborhood points are denoted P_ki, i ∈ {1, 2, …, n}, where n is the number of neighborhood points; this neighborhood has the same selection range as in the keypoint orientation assignment;
1-3-2) Rotate the X-axis of the histogram to the principal direction of the keypoint to ensure rotation invariance; the coordinates of the neighborhood points are transformed as

$$(x',y',z')^{T}=\begin{pmatrix}\cos\alpha_{p}\cos\beta_{p} & -\sin\alpha_{p} & -\cos\alpha_{p}\sin\beta_{p}\\ \sin\alpha_{p}\cos\beta_{p} & \cos\alpha_{p} & -\sin\alpha_{p}\sin\beta_{p}\\ \sin\beta_{p} & 0 & \cos\beta_{p}\end{pmatrix}\cdot(x,y,z)^{T} \qquad (5)$$

where (x, y, z) and (x', y', z') are the coordinates of a neighborhood point before and after rotation, respectively;
1-3-3) Compute the normal vector n of the k-neighborhood of the keypoint P at the point P;
1-3-4) Compute the vector PP_ki and use formula (4) to compute its magnitude and two angles; at the same time compute the angle δ between the normal vector n and the vector PP_ki:

$$\delta=\cos^{-1}\!\left(\overrightarrow{PP_{ki}}\cdot\vec{n}\,\Big/\,|\overrightarrow{PP_{ki}}||\vec{n}|\right) \qquad (6)$$

1-3-5) The features obtained from the keypoint and its neighborhood are represented by the four-tuple (d, φ, θ, δ); at 45° intervals, φ, θ and δ are divided into 8, 4 and 4 sub-intervals respectively, and the number of points falling into each sub-interval is counted; the magnitude d serves as the weight, and Gaussian weighting is applied when counting the points in each sub-interval; a 128-dimensional feature vector F = {f₁, f₂, …, f₁₂₈} is thereby obtained;
1-3-6) Normalize the feature vector: F = {f₁, f₂, …, f₁₂₈} becomes L = {l₁, l₂, …, l₁₂₈} after normalization, where $l_{i}=f_{i}\big/\sqrt{\textstyle\sum_{j=1}^{128}f_{j}^{2}}$. The 3D SIFT feature descriptor of the keypoint is thus generated.
Further, in the present invention, the concrete method of object representation in step 2) is:
Use the KMeans++ clustering method to obtain the cluster centers of the samples, i.e., the corresponding visual word vocabulary, denoted center = {center_l, l = 1, 2, …, k}, where k is the number of cluster centers and center_l is the l-th visual word in the vocabulary; then use the BoW model to represent the object as a multi-dimensional vector.
Further, in the present invention, the method of object classification in step 4) is: build the multi-class classifier by training several binary classifiers. The concrete training procedure is as follows: for each class i, train pairwise SVMs between the i-th class of training samples and each of the remaining n−1 classes, obtaining multiple 1v1 SVM classifiers; n classes of training samples thus yield n(n−1)/2 1v1 SVM classifiers.
Further, in the present invention, in step 1), the method of obtaining the extrema of the DoG function, i.e., the keypoints, is:
Each point P(x, y, z) in the point cloud model of the object is compared with all of its adjacent points to determine whether it is the maximum or minimum in its neighborhood; the point under examination is compared not only with the 26 points at the same scale but also with the 27 × 2 corresponding points at the adjacent scales; the extrema thus detected are the keypoints. A threshold τ = 1.0 is set, and keypoints below this threshold are low-contrast keypoints and are rejected.
Further, in the present invention, in step 3), for a sample O_ξ ∈ O, where O is the sample space, the 2D and 3D SIFT feature vectors of the sample O_ξ are Vec_2D and Vec_3D respectively, and the serial fused feature vector of the sample O_ξ is obtained as Vec_3D2D = (Vec_3D, Vec_2D)^T; the serial fused feature vector is used to represent the object.
Further, in the present invention, in step 5), the concrete method of obtaining the recognition result of the object to be recognized is:
5-1) Extract the 2D and 3D SIFT features of the object to be recognized to obtain its 2D and 3D SIFT feature descriptors; use the BoW model to collect the feature vector distribution of the object, expressed as Vec_2D and Vec_3D;
5-2) Apply feature-level fusion to the two feature vectors of the object to form the new serial fused feature vector Vec_3D2D = (Vec_3D, Vec_2D)^T, realizing the object representation;
5-3) Input the serial fused feature vector into the trained 1v1 SVM multi-class classifier; the discriminant functions give the corresponding decisions, and voting yields the probability that the object belongs to the i-th class, denoted P(i), i ∈ [1, n], where n is the total number of object classes;
5-4) The class of the object to be recognized is determined by the maximum probability:

$$class=\arg\max_{1\le i\le n}\{P(i)\} \qquad (7)$$
Beneficial effects: the general object recognition method based on 2D and 3D SIFT feature fusion provided by the invention extracts the local 2D and 3D SIFT descriptors from the two-dimensional image and the three-dimensional point cloud of an arbitrary object as its feature representation, obtains the object feature vectors with the Bag of Words (BoW) model, then uses feature-level fusion to merge the BoW feature vectors corresponding to the 2D and 3D SIFT features to realize the object representation, and finally uses a support vector machine (SVM) to realize object recognition. The 3D SIFT feature descriptor proposed by the present invention describes the local spatial characteristics of the object well, effectively remedying what two-dimensional features lack in this respect. The fusion of 2D and 3D SIFT features compensates for the deficiency of single-feature recognition algorithms, characterizes object attributes more richly, and significantly improves the recognition accuracy of general object recognition.
The method addresses the difficulty of feature extraction and representation in general object recognition from two aspects. First, considering the problems of object recognition based on two-dimensional images, the rapid development of three-dimensional point cloud models, and the good performance of 3D SIFT features in voxel-based object recognition, the method extends 2D SIFT to the three-dimensional point cloud model of the object and proposes a general object recognition method based on the 3D SIFT descriptor. Second, to solve the problem that a single feature cannot represent object attributes well, and drawing on the excellent performance of 2D SIFT in image recognition, the method builds on the proposed 3D SIFT algorithm and presents a general object recognition method based on the fusion of 2D and 3D SIFT features. In summary, the novelty of the method lies in:
(1) the 3D SIFT feature descriptor is improved and applied to point cloud model feature representation; local feature histograms of the point cloud are collected, and in addition the normal vector, which is vital for describing local characteristics, is incorporated, realizing feature extraction and representation for the object point cloud model;
(2) the improved 3D SIFT is applied to general object recognition, realizing the general object recognition function;
(3) feature-level fusion of the 2D and 3D SIFT features realizes a general object recognition algorithm based on multi-feature fusion, solving the problem of the low recognition rate of a single feature.
Brief description of the drawings
Fig. 1 is a schematic diagram of the framework of the general object recognition method based on 2D and 3D SIFT feature fusion according to the present invention;
Fig. 2 is a schematic flow chart of the general object recognition method based on 2D and 3D SIFT feature fusion according to the present invention;
Fig. 3 is a diagram of the recognition accuracy of different feature fusion methods on multi-class objects;
Fig. 4 is a diagram of the recognition accuracy for each object class;
Fig. 5 is a diagram of the recognition accuracy under multiple viewing angles;
Fig. 6 is a diagram of the recognition accuracy under size scaling.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings.
The framework of the proposed general object recognition method based on 2D and 3D SIFT feature fusion is shown in Fig. 1: first extract object features and establish the description of the general object, then use a machine learning method to learn the object classes, and finally recognize unknown objects by the known object classes. Through sample training and learning in an early stage, machine vision techniques can, in relatively simple environments, detect and segment the observed scene and, when observing a new object belonging to a known class, give the corresponding recognition result. The algorithm framework in Fig. 1 mainly comprises the following four aspects:
1) feature extraction and representation: extract the 3D and 2D SIFT features corresponding to the object point cloud and image, realizing the object feature representation;
2) object BoW model: obtain the BoW feature vectors corresponding to the two kinds of object features, 3D and 2D SIFT, with the classical statistical Bag of Words (BoW) model;
3) feature fusion: apply feature-level fusion to the BoW feature vectors corresponding to 3D and 2D SIFT, realizing the object representation;
4) object class learning and classification: for multi-class objects, train pairwise 1v1 SVMs; during recognition, voting gives the probability that the object to be recognized belongs to the i-th class, and the final recognition result is given according to the probability distribution.
Embodiment 1: Recognition algorithm framework
The general object recognition method based on 2D and 3D SIFT feature fusion mainly comprises the following steps:
1) Feature extraction and representation:
Feature extraction and representation are the basis of object recognition, and how to extract stable and effective features is the focus and difficulty of feature extraction research; only well-chosen features can yield the best recognition result under limited training samples. General objects are too numerous to build a model library for every object, and at the same time the shapes and colors of the objects within each class also vary greatly, so the extracted object features must satisfy the following conditions: 1) maximize the inter-class difference, i.e., characterize what distinguishes each class of objects from the others; 2) minimize the intra-class difference, i.e., characterize the features common to each class of objects. This requires abstracting and reasonably expressing each object class at a certain semantic level, so that the class can be characterized with a limited number of training objects. The present invention proposes 3D SIFT features based on the point cloud model, which together with the 2D SIFT features of the image serve as the object features to realize object recognition, as follows:
A) 2D SIFT feature extraction
The scale space is generated by convolving the image with Gaussian kernels of different scales, and local extrema detected in the difference-of-Gaussian (DoG) scale space serve as keypoints. The DoG operator is computed as follows:

$$D(x,y,\sigma)=L(x,y,k\sigma)-L(x,y,\sigma) \qquad (1\text{-}1)$$

$$L(x,y,\sigma)=G(x,y,\sigma)*I(x,y) \qquad (1\text{-}2)$$

where L denotes the scale space, I(x, y) is the pixel value of the image at (x, y), and σ is the scale-space factor: the smaller its value, the less the image is smoothed and the smaller the corresponding scale. The two-dimensional Gaussian kernel is

$$G(x,y,\sigma)=\frac{1}{2\pi\sigma^{2}}\,e^{-(x^{2}+y^{2})/2\sigma^{2}} \qquad (1\text{-}3)$$

Because the DoG operator produces strong edge responses, low-contrast keypoints and unstable edge response points must be rejected to strengthen the stability of matching and the resistance to noise. A threshold τ = 0.02 is set, and every keypoint below this threshold is rejected. A 2 × 2 Hessian matrix is then used to reject edge points, since even very small noise can make them produce unstable descriptors.
The gradient orientation distribution of the pixels in the keypoint neighborhood is used to determine a principal direction and auxiliary directions for each keypoint; the gradient magnitude and orientation are obtained by formula (1-4). The neighborhood of each keypoint is divided into 4 × 4 subregions, and the gradients and orientations of the sampling points affecting each subregion are computed and assigned to 8 orientations, so that each keypoint forms a 128-dimensional feature vector.

$$m(x,y)=\sqrt{(L(x+1,y)-L(x-1,y))^{2}+(L(x,y+1)-L(x,y-1))^{2}}$$
$$\theta(x,y)=\tan^{-1}\frac{L(x,y+1)-L(x,y-1)}{L(x+1,y)-L(x-1,y)} \qquad (1\text{-}4)$$
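For reference, a minimal sketch of the 2D SIFT extraction; OpenCV is used purely for illustration (the patent does not name an implementation), and its contrastThreshold parameter is assumed to play the role of the low-contrast threshold τ above.

```python
import cv2  # pip install opencv-python (OpenCV >= 4.4 ships SIFT in the main module)

def extract_2d_sift(gray_image):
    """Detect DoG keypoints and compute their 128-dimensional 2D SIFT descriptors.
    gray_image: 8-bit single-channel numpy array."""
    sift = cv2.SIFT_create(contrastThreshold=0.02)  # reject low-contrast keypoints
    keypoints, descriptors = sift.detectAndCompute(gray_image, None)
    return keypoints, descriptors  # descriptors: (m, 128) float32 array

# usage: kp, desc = extract_2d_sift(cv2.imread("object.png", cv2.IMREAD_GRAYSCALE))
```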
B) 3D SIFT feature extraction
Since two-dimensional images lose important three-dimensional information and are easily affected by external conditions such as illumination, the present invention extends SIFT to 3D SIFT, which inherits the above properties of 2D SIFT while, thanks to the added depth information, the 3D SIFT descriptor can describe the local spatial relations of an object more accurately. The main steps of the proposed 3D SIFT feature extraction algorithm are keypoint detection, keypoint orientation assignment, and keypoint feature description, as follows:
1-1) Keypoint detection:
A point of the object point cloud model is expressed as P(x, y, z). To achieve scale invariance, the scale space of the 3D point cloud is defined as L(x, y, z, σ), obtained by convolving a variable-scale Gaussian kernel G(x, y, z, σ) with the input point cloud P(x, y, z):

$$L(x,y,z,\sigma)=G(x,y,z,\sigma)*P(x,y,z) \qquad (1\text{-}5)$$

where σ is the scale-space factor and the three-dimensional Gaussian kernel is

$$G(x,y,z,\sigma)=\frac{1}{(\sqrt{2\pi}\,\sigma)^{3}}\,e^{-(x^{2}+y^{2}+z^{2})/2\sigma^{2}} \qquad (1\text{-}6)$$

Different scales are obtained with the multiplication factor k_i; if the number of layers per pyramid octave is s, set k^s = 2. Build the point cloud Gaussian pyramid, and replace the scale-normalized Laplacian of Gaussian with the more efficient difference-of-Gaussian (DoG) function for extremum detection; the extrema of the DoG function are the keypoints. The DoG operator is computed as

$$D(x,y,z,k_{i}\sigma)=L(x,y,z,k_{i+1}\sigma)-L(x,y,z,k_{i}\sigma) \qquad (1\text{-}7)$$

where i ∈ [0, s+2].
The keypoints consist of the local extrema of the DoG space. The method of obtaining them is: each point P(x, y, z) in the point cloud model of the object is compared with all of its adjacent points to determine whether it is the maximum or minimum in its neighborhood; the point under examination is compared not only with the 26 points at the same scale but also with the 27 × 2 corresponding points at the adjacent scales; the extrema thus detected are the keypoints. A threshold τ = 1.0 is set, and keypoints below this threshold are low-contrast keypoints and are rejected. The method proposed by Rusu et al. in "Towards 3D object maps for autonomous household robots" (Rusu R B, Blodow N, Marton Z, Soos A, Beetz M. Towards 3D object maps for autonomous household robots. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. San Diego, CA: IEEE, 2007. 3191-3198) is used to judge whether a keypoint is a boundary point, and if so, it is rejected.
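Because a point cloud has no pixel grid, how the convolution (1-5) is realized is an implementation choice. The following is a simplified sketch under the assumption that the smoothed response of each point is approximated by a Gaussian-weighted average of its neighbor distances; the neighborhood size, the response choice, and all names are illustrative, not the patent's prescription.

```python
import numpy as np
from scipy.spatial import cKDTree

def dog_keypoints(points, s=3, sigma0=0.01, k_neigh=27, tau=1.0):
    """Simplified 3D DoG keypoint detector for a point cloud of shape (N, 3).
    tau is the low-contrast threshold (the patent uses tau = 1.0 in its units;
    it must be tuned to the scale of the data)."""
    tree = cKDTree(points)
    k = 2.0 ** (1.0 / s)                          # multiplication factor, k^s = 2
    sigmas = [sigma0 * k**i for i in range(s + 3)]
    dist, idx = tree.query(points, k=k_neigh)     # fixed-size neighborhoods
    levels = []                                   # L(., sigma) per scale
    for sig in sigmas:
        w = np.exp(-dist**2 / (2 * sig**2))       # Gaussian weights per neighbor
        levels.append((w * dist).sum(axis=1) / w.sum(axis=1))
    dog = np.diff(np.stack(levels), axis=0)       # D_i = L_{i+1} - L_i, eq. (1-7)
    keypoints = []
    for i in range(1, dog.shape[0] - 1):          # interior scales only
        for p in range(len(points)):
            nb = idx[p]                           # same-scale and adjacent-scale neighbors
            vals = np.concatenate([dog[i-1, nb], dog[i, nb], dog[i+1, nb]])
            v = dog[i, p]
            if abs(v) >= tau and (v >= vals.max() or v <= vals.min()):
                keypoints.append((p, sigmas[i]))  # (point index, scale)
    return keypoints
```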
1-2) Keypoint orientation assignment:
For each detected keypoint, a vector describing its local features must be computed; this vector is called the descriptor of the keypoint. To give the descriptor rotation invariance, the local features of the point cloud are used to assign a reference orientation to the keypoint, as follows:
1-2-1) Compute the k-neighborhood of the keypoint P; the neighborhood points are denoted P_ki, where i ∈ {1, 2, …, n} is the neighborhood point index and n is the number of neighborhood points;
1-2-2) Compute the centroid P_c of the k-neighborhood of the keypoint P;
1-2-3) Compute the vectors from P to P_c and from P to P_ki, and obtain the vector magnitude d and the two angles θ and φ:

$$d=\sqrt{x^{2}+y^{2}+z^{2}},\quad \theta=\sin^{-1}(z/d),\quad \varphi=\tan^{-1}(y/x) \qquad (1\text{-}8)$$

where (x, y, z) are the coordinates of the vector;
1-2-4) Use a histogram to collect, over the k-neighborhood, the vector magnitudes d and the angles (i.e., directions) computed in step 1-2-3): θ is divided into 18 sub-intervals (bins) and φ into 36 sub-intervals, each bin spanning 10°; the magnitude d serves as the weight, and Gaussian weighting is applied when accumulating the angles, where R_max denotes the maximum radius of the keypoint neighborhood and points beyond this distance are ignored;
1-2-5) The peak of the histogram represents the dominant direction of the keypoint neighborhood and is taken as the principal direction of the keypoint; to strengthen the robustness of matching, only directions whose peak exceeds 80% of the principal peak are retained as auxiliary directions of the keypoint; the principal direction is denoted (α, β).
At this point the detected keypoints, each carrying a position, scale and orientation, are the 3D SIFT feature points of the point cloud.
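A minimal sketch of the orientation assignment; since the exact Gaussian weighting window is not reproduced in the text, the weight is assumed here to be d·exp(−d²/2(R_max/2)²), and all names are hypothetical.

```python
import numpy as np

def assign_orientation(P, neighbors):
    """Assign a principal direction (alpha, beta) to keypoint P (shape (3,))
    from its k-neighborhood (shape (n, 3)) via an 18 x 36 (theta, phi) histogram."""
    v = neighbors - P                              # vectors P -> P_ki
    d = np.linalg.norm(v, axis=1)
    d = np.where(d == 0, 1e-12, d)
    theta = np.arcsin(np.clip(v[:, 2] / d, -1, 1)) # elevation: 18 bins of 10 degrees
    phi = np.arctan2(v[:, 1], v[:, 0])             # azimuth: 36 bins of 10 degrees
    r_max = d.max()                                # maximum neighborhood radius R_max
    w = d * np.exp(-d**2 / (2 * (r_max / 2)**2))   # magnitude weight x assumed Gaussian window
    ti = np.clip(((theta + np.pi/2) / np.pi * 18).astype(int), 0, 17)
    pj = np.clip(((phi + np.pi) / (2*np.pi) * 36).astype(int), 0, 35)
    hist = np.zeros((18, 36))
    np.add.at(hist, (ti, pj), w)
    t0, p0 = np.unravel_index(hist.argmax(), hist.shape)
    aux_bins = np.argwhere(hist > 0.8 * hist.max())  # auxiliary directions (> 80% of peak)
    alpha = np.deg2rad((p0 + 0.5) * 10 - 180)      # bin centers, in radians
    beta = np.deg2rad((t0 + 0.5) * 10 - 90)
    return (alpha, beta), aux_bins
```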
1-3) Keypoint feature description:
Through the above steps, each keypoint carries three pieces of information: position, scale and orientation. The next task is to establish a descriptor for each keypoint, describing it with a group of vectors so that it is invariant to various changes such as illumination changes and viewpoint changes. The descriptor covers not only the keypoint itself but also the surrounding points that contribute to it, and it should be highly distinctive, so as to raise the probability of correctly matching the keypoint.
The 2D SIFT descriptor is a statistical representation of the gradients of the Gaussian image over the keypoint neighborhood. For a three-dimensional point cloud model, the local spatial relations of the keypoint neighborhood are collected instead: the histograms of the angles within the neighborhood are computed to generate the 3D SIFT feature vector that uniquely expresses the point cloud. The surface normal is an important attribute of a solid surface, and the distribution of normal vectors can represent the 3D geometric characteristics of the surface; therefore, when computing the 3D SIFT feature vector, the present invention adds the normal vector on top of the vectors computed in step 1-2), expressing the local spatial features of the object more comprehensively.
The feature descriptor of a keypoint is generated as follows:
1-3-1) Compute the k-neighborhood of the keypoint P; the neighborhood points are denoted P_ki, i ∈ {1, 2, …, n}, where n is the number of neighborhood points; this neighborhood has the same selection range as in the keypoint orientation assignment;
1-3-2) Rotate the X-axis of the histogram to the principal direction of the keypoint to ensure rotation invariance; the coordinates of the neighborhood points are transformed as

$$(x',y',z')^{T}=\begin{pmatrix}\cos\alpha_{p}\cos\beta_{p} & -\sin\alpha_{p} & -\cos\alpha_{p}\sin\beta_{p}\\ \sin\alpha_{p}\cos\beta_{p} & \cos\alpha_{p} & -\sin\alpha_{p}\sin\beta_{p}\\ \sin\beta_{p} & 0 & \cos\beta_{p}\end{pmatrix}\cdot(x,y,z)^{T} \qquad (1\text{-}9)$$

where (x, y, z) and (x', y', z') are the coordinates of a neighborhood point before and after rotation, respectively;
1-3-3) Compute the normal vector n of the k-neighborhood of the keypoint P at the point P;
1-3-4) Compute the vector PP_ki and use formula (1-8) to compute its magnitude and two angles; at the same time compute the angle δ between the normal vector n and the vector PP_ki:

$$\delta=\cos^{-1}\!\left(\overrightarrow{PP_{ki}}\cdot\vec{n}\,\Big/\,|\overrightarrow{PP_{ki}}||\vec{n}|\right) \qquad (1\text{-}10)$$

1-3-5) The features obtained from the keypoint and its neighborhood are represented by the four-tuple (d, φ, θ, δ); at 45° intervals, φ, θ and δ are divided into 8, 4 and 4 sub-intervals respectively, and the number of points falling into each sub-interval is counted; the magnitude d serves as the weight, and Gaussian weighting is applied when counting the points in each sub-interval; a 128-dimensional feature vector F = {f₁, f₂, …, f₁₂₈} is thereby obtained;
1-3-6) Normalize the feature vector: F = {f₁, f₂, …, f₁₂₈} becomes L = {l₁, l₂, …, l₁₂₈} after normalization, where $l_{i}=f_{i}\big/\sqrt{\textstyle\sum_{j=1}^{128}f_{j}^{2}}$. The 3D SIFT feature descriptor of the keypoint is thus generated.
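The following sketch assembles the 128-dimensional descriptor under the same assumptions as the orientation sketch above; additionally, the normal vector is estimated by neighborhood PCA, a common estimator that the patent does not spell out, and (alpha_p, beta_p) is the principal direction in radians.

```python
import numpy as np

def keypoint_descriptor(P, neighbors, alpha_p, beta_p):
    """128-D 3D SIFT descriptor for keypoint P: bin the four-tuple
    (d, phi, theta, delta) into 8 x 4 x 4 = 128 cells of 45 degrees each."""
    ca, sa = np.cos(alpha_p), np.sin(alpha_p)
    cb, sb = np.cos(beta_p), np.sin(beta_p)
    R = np.array([[ca*cb, -sa, -ca*sb],            # rotation matrix of eq. (1-9)
                  [sa*cb,  ca, -sa*sb],
                  [sb,    0.0,  cb]])
    v0 = neighbors - P                             # raw vectors P -> P_ki
    v = v0 @ R.T                                   # rotated to the principal direction
    d = np.linalg.norm(v, axis=1)
    d = np.where(d == 0, 1e-12, d)
    theta = np.arcsin(np.clip(v[:, 2] / d, -1, 1))
    phi = np.arctan2(v[:, 1], v[:, 0])
    c = neighbors - neighbors.mean(axis=0)         # normal via PCA (assumed estimator)
    n = np.linalg.svd(c, full_matrices=False)[2][-1]
    delta = np.arccos(np.clip(v0 @ n / d, -1.0, 1.0))  # angle to the normal, eq. (1-10)
    r_max = d.max()
    w = d * np.exp(-d**2 / (2 * (r_max / 2)**2))   # magnitude weight x assumed Gaussian window
    bp = np.clip(((phi + np.pi) / (np.pi/4)).astype(int), 0, 7)      # 8 bins
    bt = np.clip(((theta + np.pi/2) / (np.pi/4)).astype(int), 0, 3)  # 4 bins
    bd = np.clip((delta / (np.pi/4)).astype(int), 0, 3)              # 4 bins
    F = np.zeros((8, 4, 4))
    np.add.at(F, (bp, bt, bd), w)
    F = F.ravel()
    return F / (np.linalg.norm(F) + 1e-12)         # normalized 128-D vector L
```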
2) Object representation:
The present invention adopts the classical BoW (Bag of Words) model to collect the distribution of the object feature vectors and represents the object with a multi-dimensional vector. Unlike the classical BoW model, which uses KMeans for clustering, the present invention uses the KMeans++ clustering algorithm to obtain the visual word vocabulary. Compared with KMeans, KMeans++ improves the choice of the initial cluster centers, improving both the accuracy of the clustering result and the running time. First the KMeans++ clustering method is used to obtain the cluster centers of the samples, i.e., the corresponding visual word vocabulary; then the BoW model is used to represent the object as a multi-dimensional vector, obtaining the corresponding 2D and 3D SIFT feature vectors of the sample object.
The concrete method of object representation is:
Use the KMeans++ clustering method to obtain the cluster centers of the samples, i.e., the corresponding visual word vocabulary, denoted center = {center_l, l = 1, 2, …, k}, where k is the number of cluster centers and center_l is the l-th visual word in the vocabulary; then use the BoW model to represent the object as a multi-dimensional vector.
The multi-dimensional vector is computed as follows: count, over the visual word vocabulary, the number of times each visual word occurs in the 2D and 3D SIFT features of the sample object, denoted (y₀ y₁ … y_{k−2} y_{k−1}), where y_l is the occurrence count of the visual word center_l and constitutes one dimension of the multi-dimensional vector describing the object. The occurrence counts are collected as follows: compute the distance from each 2D or 3D SIFT feature vector of the sample object to the centers, and for the nearest center center_l, increment the corresponding count y_l by 1.
The basic idea of the KMeans++ initial-center selection is that the initial cluster centers should be as far from each other as possible. The initial centers are selected as follows:
2-1) Denote the set of cluster centers center; from the input vector set X = {x₁, x₂, x₃, …, x_{N−1}, x_N}, randomly select a vector x_i ∈ X as the first cluster center;
2-2) For every vector satisfying {x_j | x_j ∈ X − center}, compute the squared distance D(x_j)² to its nearest already-selected cluster center;
2-3) Select a vector as the new cluster center: the probability of each vector being chosen, P(x_j), is computed by formula (1-11), and the vector with the maximum P(x_j) becomes the new cluster center;

$$P(x_{j})=D(x_{j})^{2}\Big/\sum_{x_{j}\in X-center}D(x_{j})^{2} \qquad (1\text{-}11)$$

2-4) Repeat steps 2-2) and 2-3) until K initial cluster centers have been selected.
After the K initial cluster centers are obtained, the standard KMeans algorithm is run. Experiments comparing different values of K were carried out, and the embodiment of the present invention chooses K = 300.
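A compact sketch of this seeding procedure; following the text above, the vector with the maximum P(x_j) is chosen deterministically (standard KMeans++ instead samples a center proportionally to P), and scikit-learn's KMeans is assumed for the subsequent standard iterations.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeanspp_seed(X, K, seed=0):
    """Select K initial centers: one at random, then repeatedly the vector
    with maximal P(x_j) = D(x_j)^2 / sum D^2, eq. (1-11)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        d2 = np.min([((X - c)**2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d2 / d2.sum()))])
    return np.array(centers)

def build_vocabulary(descriptors, K=300):
    """Cluster training descriptors into K visual words (the embodiment uses K = 300)."""
    init = kmeanspp_seed(descriptors, K)
    km = KMeans(n_clusters=K, init=init, n_init=1).fit(descriptors)
    return km.cluster_centers_
```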
3) Feature fusion:
Fusion can be performed at the pixel level, the feature level, or the decision level. Feature-level fusion merges the extracted feature vectors and enriches the features of the target object; compared with pixel-level fusion, which must process a huge amount of data, its recognition performance drops slightly but the data volume is greatly reduced, making real-time processing possible. On the other hand, feature-level fusion retains the effective information that characterizes the essence of the object, which is richer than what survives decision-level fusion. However, fusing different feature descriptors of an object directly is hard to handle because the numbers of descriptors differ; therefore, in the present method the feature descriptors are first summarized with the BoW model into a multi-dimensional feature vector, and feature-level fusion is then performed, which effectively solves this problem.
The method of feature-level fusion is used to realize general object recognition based on multi-feature fusion. For a sample O_ξ ∈ O, where O is the sample space, the 2D and 3D SIFT feature vectors of the sample O_ξ are Vec_2D and Vec_3D respectively; feature-level fusion yields the serial fused feature vector of the sample O_ξ, Vec_3D2D = (Vec_3D, Vec_2D)^T, which is used to represent the object.
4) Classifier design and training:
After the object description is complete, a support vector machine (SVM) is used to learn the object classes of the samples and perform classification; classifiers are trained to build a multi-class classifier. The SVM is a well-performing supervised, discriminative machine learning method: through offline training on limited samples, it seeks a compromise between model complexity and learning ability and finally obtains a discriminant function.
The SVM is a typical binary classifier, whereas what usually needs to be solved is a multi-class classification problem; the present method solves this by training multiple binary classifiers to build a multi-class classifier. The classification method is: train pairwise SVMs between the i-th class of training samples and each of the remaining n−1 classes, obtaining multiple 1v1 SVM classifiers; n classes of training samples thus yield n(n−1)/2 1v1 SVM classifiers. A training sketch is given below.
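A minimal sketch of the pairwise training with scikit-learn; the linear kernel follows the embodiment described later in this document, while the data layout and names are illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def train_1v1_svms(X, y):
    """Train one linear SVM per class pair: n classes -> n(n-1)/2 classifiers.
    X: (N, D) serial fused feature vectors; y: (N,) integer class labels."""
    classifiers = {}
    for i, j in combinations(np.unique(y), 2):
        mask = (y == i) | (y == j)
        classifiers[(i, j)] = SVC(kernel="linear").fit(X[mask], y[mask])
    return classifiers

def predict_by_vote(classifiers, x, n_classes):
    """Each 1v1 SVM casts a vote; P(i) is the vote share of class i,
    and the recognized class is argmax_i P(i)."""
    votes = np.zeros(n_classes)
    for svm in classifiers.values():
        votes[int(svm.predict(x[None, :])[0])] += 1
    return votes / votes.sum()
```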
5) The multi-feature-fusion-based recognition procedure for a general object is as follows:
5-1) Extract the 2D and 3D SIFT features of the object to be recognized to obtain its 2D and 3D SIFT feature descriptors; use the BoW model to collect the feature vector distribution of the object, expressed as Vec_2D and Vec_3D;
5-2) Apply feature-level fusion to the two feature vectors of the object to form the new serial fused feature vector Vec_3D2D = (Vec_3D, Vec_2D)^T, realizing the object representation;
5-3) Input the serial fused feature vector into the trained 1v1 SVM multi-class classifier; the discriminant functions give the corresponding decisions, and voting yields the probability that the object belongs to the i-th class, denoted P(i), i ∈ [1, n], where n is the total number of object classes;
5-4) The class of the object to be recognized is determined by the maximum probability:

$$class=\arg\max_{1\le i\le n}\{P(i)\} \qquad (1\text{-}12)$$
Embodiment 2: Algorithm flow
Fig. 2 shows the flow of the general object recognition algorithm based on 2D and 3D SIFT feature fusion. The proposed general object recognition process mainly comprises two stages: offline training and online recognition. The training stage and the recognition stage of the flow chart are described in detail below.
1. Training algorithm flow:
1.1 Offline training stage:
1.1.1 After offline training starts, for the image p_i corresponding to the i-th object class in the object image library and the point cloud pc_i corresponding to the i-th object class in the object point cloud library, i = 1, 2, …, n, where n is the number of training sample classes, first extract the 2D and 3D SIFT features corresponding to the n classes of training samples, denoted F_R = {f_i_R, i = 1, 2, …, n}, R ∈ (2D, 3D), where f_i_2D is an m_i × 128 feature vector set and f_i_3D is an mc_i × 128 feature vector set, m_i and mc_i being the numbers of 2D and 3D SIFT keypoints of the corresponding object; this completes the 2D and 3D SIFT feature extraction and representation.
1.1.2 Use KMeans++ clustering to obtain the cluster centers of the samples, i.e., the corresponding visual word vocabularies, namely the image visual word vocabulary and the point cloud visual word vocabulary, denoted center = {center_l, l = 1, 2, …, k}, where k is the number of cluster centers and center_l is the l-th visual word of the vocabulary; the cluster centers corresponding to the 2D and 3D SIFT feature descriptors are center_2D and center_3D.
1.1.3 Use the BoW model to obtain the BoW model of the i-th object class, describing the object with a multi-dimensional vector. Count the number of times each visual word occurs in the feature vectors of each training sample, denoted (y₀ y₁ … y_{k−2} y_{k−1}), where y_l is the occurrence count of the visual word center_l. The counting method is: compute the distance from each training sample feature vector to the centers; if the distance to center_l is minimal, increment the corresponding y_l by 1. The BoW model feature vectors corresponding to the 2D and 3D SIFT feature descriptors are Vec_2D and Vec_3D.
1.1.4 Use feature-level fusion to realize the object representation; the fused object feature vector is
Vec_3D2D = (Vec_3D, Vec_2D)^T
1.1.5 Finally perform 1v1 SVM training on the training samples to obtain the corresponding discriminant functions. The present invention selects the linear-kernel SVM to realize the multi-class classifier; the concrete training process is: for the i-th object class, train pairwise SVMs against each of the remaining (n−1) classes, obtaining multiple 1v1 SVM classifiers; n classes of training samples thus yield n(n−1)/2 1v1 SVM classifiers.
2. Recognition algorithm flow
In the online recognition stage, for the image and point cloud of the object to be recognized, first complete the 2D and 3D SIFT feature extraction and representation, obtaining the corresponding object image BoW model and point cloud BoW model respectively; then use feature-level fusion to realize the object representation; finally use the n(n−1)/2 trained classifiers to predict the recognition result one by one, obtain the probability P(i) that the object belongs to the i-th class by voting, and compute the final recognition class by formula (1-12).
Embodiment 3: Experimental results
The point cloud models and RGB images used in the experiments come from the large-scale point cloud database established by K. Lai et al. (RGB-D dataset, http://rgbd-dataset.cs.washington.edu/dataset.html, 2011-03-05; the corresponding publication is K. Lai, L. Bo, X. Ren, D. Fox. A Large-Scale Hierarchical Multi-View RGB-D Object Dataset. Proc. of IEEE Int. Conf. on Robotics and Automation, pp. 1817-1824, Shanghai, China, 2011). The database comprises point cloud models and RGB images of 51 classes with a total of 300 objects, and the point clouds and images of each object cover 3 viewing angles. Experimental method: one object is randomly selected from each class as the test object and the remaining objects serve as the training objects; 100 training samples and 60 test samples are used per class, all drawn randomly from the database. To assess the performance of the proposed algorithm, multiple experiments were carried out and the recognition accuracy was collected in several situations. The recognition accuracy is computed as

$$P=\frac{n_{r}}{N} \qquad (1\text{-}13)$$

where P is the recognition accuracy, n_r is the number of correctly recognized test samples, and N is the total number of test samples.
3.1 Experiment 1: 3D SIFT recognition accuracy
This experiment tests 6 object classes with obvious intra-class differences and high inter-class similarity: apple, tomato, banana, pitcher, cereal_box, and kleenex. The 6 classes of training samples are first trained, and the test samples are then used for testing. Among the many existing point cloud features, PFHRGB and PFH are features with good discrimination [Alexandre L A. 3D Descriptors for Object and Category Recognition: a Comparative Evaluation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Workshop on Color-Depth Camera Fusion in Robotics. Vilamoura, Portugal: IEEE, 2012. 1-6]. To verify the advantage of the proposed 3D SIFT feature in object recognition, a comparative test of the 3 feature descriptors was carried out under identical conditions: each feature descriptor uses the SIFTKeypoint module to detect keypoints, the feature vectors of the different descriptors are then computed at the keypoints, and the recognition accuracy is collected. The experimental results are given in Table 1.
Table 1: Recognition accuracy of each feature descriptor
PFHRGB incorporates color information into the PFH feature descriptor. As Table 1 shows, the introduction of color information enriches the feature information and improves the recognition accuracy. The 3D SIFT feature descriptor proposed by the present method improves the recognition rate by 9.72% and 6.94% over PFH and PFHRGB respectively, verifying the effectiveness of the 3D SIFT feature descriptor in general object recognition based on the point cloud model.
3.2 Experiment 2: Recognition accuracy of the fusion of 2D and 3D SIFT features
To overcome the point cloud model's limited ability to express the differences between similar classes, the general object recognition method based on the feature-level fusion of 2D and 3D SIFT is proposed. Experiment 2 compares in detail the recognition accuracy of 2D SIFT, 3D SIFT, and their feature-level fusion under identical conditions; the training and test samples are the same as in Experiment 1, and the results are given in Table 2.
Table 2: Recognition accuracy of the feature fusion algorithm
For convenience, 2D+3D SIFT denotes the feature-level fusion of 2D and 3D SIFT. As Table 2 shows, 3D SIFT improves the recognition rate by 3.05% over 2D SIFT, so the introduction of depth information is beneficial to object recognition. Because of the variability of objects, the information provided by a single feature is imprecise, uncertain and incomplete, which keeps the recognition rate of single-feature algorithms low; the recognition rate of 2D and 3D SIFT after weighted-average fusion is 93.06%, a clear improvement over the single feature descriptors, showing that the proposed general object recognition algorithm has an obvious advantage in recognition rate.
3.3 Experiment 3: Recognition accuracy of various fusion algorithms
This experiment reports the recognition results of several fusion algorithms. 10 object classes with obvious intra-class differences and high inter-class similarity were chosen, and recognition experiments were carried out on 2 to 10 classes: apple, tomato, banana, pitcher, cereal_box, kleenex, camera, coffee_mug, calculator, and cell_phone. Four fusion algorithms were compared: weighted-average fusion at the feature level and at the decision level, DSmT theory, and the Murphy rule. The results are shown in Fig. 3.
In Fig. 3, ave denotes weighted-average fusion, and the abscissa denotes the number of classes; for example, "6" means the experiment comprises 6 object classes in total and the recognition accuracy over those 6 classes is collected. As Fig. 3 shows, as the number of object classes increases: (1) the feature fusion algorithms have higher recognition accuracy and stronger robustness than the single-feature algorithms; among the 4 fusion algorithms, weighted-average fusion and DSmT-theory fusion give comparatively lower results than the other two fusion methods, and the result of fusing the 3 evidence sources with the Murphy rule is not improved overall compared with the feature-level fusion of 2D and 3D SIFT, so the feature-level fusion method is adopted here to accomplish the task of general object recognition; (2) the 3D SIFT feature descriptor proposed by the present invention achieves a better recognition effect than the PFHRGB and 2D SIFT feature descriptors; (3) the recognition rate of every algorithm declines somewhat, partly because of the classifier design: the multi-class classifier adopted by the present invention is constructed from multiple 1v1 SVM classifiers, and the errors of the individual classifiers accumulate in the final voting result. As the number of object classes increases, the number of 1v1 SVM classifiers grows rapidly; for example, 10 object classes require 45 classifiers, and the judgment errors of 45 classifiers added into the final vote cause recognition errors to a great extent.
3.4 Experiment 4: Algorithm robustness test
To verify that the proposed general object recognition algorithm still achieves a high recognition accuracy and good robustness when the intra-class difference is large and the inter-class similarity is high, this experiment compares the recognition accuracy, under different feature representations, of objects that are of different classes but highly similar (e.g., apple and tomato) and of the same class but highly different (e.g., pitcher). The results are shown in Fig. 4.
Three different objects were chosen from the pitcher class: a 345 mm tall round ceramic pitcher, a 230 mm tall round stainless-steel pitcher, and a 130 mm tall round ceramic pitcher, so the intra-class difference is huge. Using PFHRGB to recognize the pitcher class gives a recognition rate of only 70%, whereas 3D SIFT achieves 96.67%. Conversely, samples from the apple and tomato classes were taken, two classes with high inter-class similarity; when other single features are used to recognize the apple class the recognition rate is poor, but 3D SIFT achieves 71.67%. Comparing the recognition rate curves of the various features verifies that under conditions of high inter-class similarity and large intra-class difference, the proposed 3D SIFT feature descriptor has a higher recognition rate than the other feature descriptors, and that the method based on the feature-level fusion of 2D and 3D SIFT is more robust than any single feature.
3.5 Experiment 5: Multiple viewing angles
To verify the robustness of the method to viewpoint changes, comparative experiments were carried out on the 3 viewing angles of each object class: 30°, 45° and 60°. The training samples are the same as in Experiment 1; from each viewing angle of each class of test samples, 60 are randomly selected as new test samples, i.e., each viewing angle comprises 6 classes with 360 test samples in total. The results are shown in Fig. 5.
As Fig. 5 shows, compared with the PFHRGB feature descriptor, 3D SIFT recognition is comparatively accurate and stable; compared with the single features, the proposed feature fusion algorithm maintains a recognition rate above 90% under viewpoint changes, verifying the effectiveness and robustness of the method with respect to viewpoint changes.
3.6 Experiment 6: Size scaling
The purpose of this experiment is to examine the effectiveness of the method under size scaling. The training sample library is the same as in Experiment 1, while the test sample library is scaled on the basis of Experiment 1 to 1/2, 1/3 and 1/4 respectively, and the recognition rate is collected in each case. The results are shown in Fig. 6.
As Fig. 6 shows, when the objects are scaled, the fusion algorithm proposed by the present invention is superior to the single-feature recognition algorithms. However, the recognition accuracy of every feature descriptor declines somewhat; in particular, when scaling to 1/4, the 2D SIFT feature descriptor achieves only 49.54% accuracy, mainly because some images are small to begin with (the original apple image, for example, is only 84 × 82) and essentially no effective keypoints can be detected after scaling. Even then, the proposed feature-level fusion algorithm still achieves a recognition accuracy of 63.05%.
3.7 Experiment 7: Time complexity
On an experimental platform with an i7-3770 @ 3.4 GHz CPU and 64-bit Windows 7, this experiment measures the time taken to complete recognition with the different feature descriptors, using the same test samples as Experiment 1, and computes the average recognition time per object. The results are given in Table 3.
Table 3: Time comparison of the different feature descriptors
Compared with an image, a point cloud model is richer in information and contains much more data, so the processing time is longer. Analyzing the time complexity of the proposed recognition algorithm, the most time-consuming part of the whole recognition process is feature extraction and representation. The 3D SIFT feature descriptor comprises two parts, keypoint detection and keypoint feature description. Let n be the number of points of the object point cloud to be recognized. The time complexity of keypoint detection is O(octaves · scale · k · n); since the number of pyramid octaves (octaves), the number of scales per octave (scale), and the keypoint neighborhood size k are constants, the keypoint detection part is O(n). Computing the feature description vectors of the m (m < n) detected keypoints takes O(mn), so the 3D SIFT feature descriptor algorithm takes O(mn + n); ignoring the lower-order term, the time complexity of 3D SIFT is O(mn). As Table 3 shows, compared with PFHRGB, the proposed 3D SIFT recognition algorithm and the fused 2D and 3D SIFT recognition algorithm reduce the average time per test sample by 34.75% and 22.01% respectively, improving the performance of recognition based on the point cloud model.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (7)

1. A general object recognition method based on 2D and 3D SIFT feature fusion, characterized by comprising the following steps:
1) Feature extraction and representation:
For a sample object, extract its feature description data, which comprise the object image and the object point cloud; first extract the 2D SIFT features of the object image to complete the image feature representation, then extract the 3D SIFT features of the object point cloud to complete the point cloud feature representation; the 2D and 3D SIFT feature descriptors of the sample object are thereby obtained;
2) Object representation:
Use KMeans++ clustering to obtain the cluster centers of the samples, i.e., the corresponding visual word vocabulary; then use the BoW model to represent the object as a multi-dimensional vector, obtaining the corresponding 2D and 3D SIFT feature vectors of the sample object;
3) Feature fusion:
Fuse the corresponding 2D and 3D SIFT feature vectors of the sample object by feature-level fusion to obtain the serial fused feature vector of the sample object;
4) Classifier design and training:
Use a support vector machine (SVM) to learn the object classes of the sample objects and perform classification; train classifiers to build a multi-class classifier;
5) Recognition of the object to be recognized:
Input the serial fused feature vector of the object to be recognized into the multi-class classifier trained in step 4) to obtain the probability that the object belongs to each class; the sample class corresponding to the maximum probability is the recognition result for the object.
2. the general object identification method merged based on 2D and 3D SIFT feature according to claim 1, is characterized in that: step 1) in state object point cloud 3D SIFT feature extracting method comprise the following steps:
1-1) Keypoint detection:
A point of the object's point cloud model is written P(x, y, z). To achieve scale invariance, the scale space of the 3D point cloud is defined as L(x, y, z, σ):

$$L(x,y,z,\sigma)=G(x,y,z,\sigma)*P(x,y,z) \qquad (1)$$

where σ is the scale-space factor and the variable-scale three-dimensional Gaussian kernel function is

$$G(x,y,z,\sigma)=\frac{1}{(\sqrt{2\pi}\,\sigma)^{3}}\,e^{-(x^{2}+y^{2}+z^{2})/2\sigma^{2}} \qquad (2)$$

Different scales are obtained with the multiplication factor k to build the point cloud Gaussian pyramid; if the number of layers in each pyramid octave is s, then k is set so that $k^{s}=2$. Extrema are then detected with the difference-of-Gaussian (DoG) function, and the extremum points of the DoG function are the keypoints. The DoG operator is computed as

$$D(x,y,z,k_{i}\sigma)=L(x,y,z,k_{i+1}\sigma)-L(x,y,z,k_{i}\sigma) \qquad (3)$$

where i ∈ [0, s+2];
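A minimal sketch of formulas (1)–(3), assuming the point cloud has first been resampled onto a regular voxel grid so the three-dimensional Gaussian can be applied by filtering (the voxelization itself is outside the sketch), and following the standard SIFT octave convention of s+3 smoothed layers:

```python
# Sketch of one octave of the 3D scale space of formulas (1)-(3) on a voxel
# grid: the Gaussian blur realizes G(x,y,z,sigma)*P and the layer differences
# realize the DoG operator D.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(volume, sigma0=1.6, s=3):
    k = 2.0 ** (1.0 / s)                      # multiplication factor, k**s = 2
    blurred = [gaussian_filter(volume, sigma0 * k**i) for i in range(s + 3)]
    # D(x,y,z,k_i*sigma) = L(x,y,z,k_{i+1}*sigma) - L(x,y,z,k_i*sigma)
    return [blurred[i + 1] - blurred[i] for i in range(s + 2)]
```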
1-2) Keypoint orientation assignment:
For each detected keypoint, a vector describing the local features at that keypoint must be computed; this vector is called the descriptor at the keypoint. To make the descriptor rotation-invariant, a reference orientation is assigned to the keypoint from the local features of the point cloud. The orientation assignment method is as follows:
1-2-1) compute the k-neighborhood of the keypoint P; the neighborhood points are written $P_{ki}$, where i = {1, 2, …, n} indexes the neighborhood points and n is the number of neighborhood points;
1-2-2) compute the centroid $P_c$ of the k-neighborhood of the keypoint P;
1-2-3) compute the vector $\overrightarrow{PP_c}$ and obtain its magnitude d and two angles $(\varphi,\theta)$, where (x, y, z) are the vector's coordinates:

$$d=\sqrt{x^{2}+y^{2}+z^{2}},\qquad \varphi=\tan^{-1}(y/x),\qquad \theta=\sin^{-1}(z/d) \qquad (4)$$

1-2-4) use histograms to accumulate, over the k-neighborhood, the vector magnitudes d and the angles $(\varphi,\theta)$, i.e. the directions, computed as in step 1-2-3); θ and φ are divided into 18 and 36 sub-intervals respectively, each sub-interval covering 10°; the magnitude d is used as the weight, and Gaussian weighting is applied when accumulating the angle statistics, where $R_{max}$ denotes the maximum radius of the keypoint neighborhood and points beyond this distance are ignored;
1-2-5) the histogram peak represents the dominant direction of the keypoint neighborhood and is taken as the principal direction of the keypoint; to strengthen the robustness of matching, only directions whose peak exceeds 80% of the principal-direction peak are retained as auxiliary directions of the keypoint; the corresponding principal direction is defined as (α, β);
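The geometry of steps 1-2-1) to 1-2-5) can be sketched as follows; the exact Gaussian weighting window is not given above, so the falloff used here is an assumed form:

```python
# Sketch of the orientation assignment: each neighbor votes its (phi, theta)
# direction into 36 x 18 bins of 10 degrees, weighted by magnitude d and an
# assumed Gaussian falloff with distance; bins above 80% of the peak are kept.
import numpy as np

def magnitude_and_angles(v):
    """Formula (4): magnitude d, azimuth phi, elevation theta of a 3D vector."""
    d = np.linalg.norm(v)
    phi = np.arctan2(v[1], v[0])
    theta = np.arcsin(v[2] / d) if d > 0 else 0.0
    return d, phi, theta

def principal_directions(keypoint, neighbors, r_max):
    hist = np.zeros((36, 18))
    for p in neighbors:
        d, phi, theta = magnitude_and_angles(p - keypoint)
        if d > r_max:                         # ignore points beyond R_max
            continue
        w = d * np.exp(-(d / r_max) ** 2)     # assumed Gaussian weighting form
        i = int((np.degrees(phi) % 360.0) // 10) % 36
        j = min(int((np.degrees(theta) + 90.0) // 10), 17)
        hist[i, j] += w
    # principal direction plus auxiliary directions above 80% of the peak
    return np.argwhere(hist >= 0.8 * hist.max())
```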
1-3) Keypoint feature description:
The feature descriptor of a keypoint is generated as follows:
1-3-1) compute the k-neighborhood of the keypoint P; the neighborhood points are written $P_{ki}$, i = {1, 2, …, n}, where n is the number of neighborhood points; the neighborhood selection range is the same as in the keypoint orientation assignment;
1-3-2) rotate the histogram X-axis to the principal direction of the keypoint to guarantee rotation invariance; the coordinate transformation formula for the neighborhood points is

$$\begin{pmatrix}x'\\y'\\z'\end{pmatrix}=\begin{pmatrix}\cos\alpha_{p}\cos\beta_{p}&-\sin\alpha_{p}&-\cos\alpha_{p}\sin\beta_{p}\\ \sin\alpha_{p}\cos\beta_{p}&\cos\alpha_{p}&-\sin\alpha_{p}\sin\beta_{p}\\ \sin\beta_{p}&0&\cos\beta_{p}\end{pmatrix}\cdot\begin{pmatrix}x\\y\\z\end{pmatrix} \qquad (5)$$

where (x, y, z) and (x', y', z') are the coordinates of a neighborhood point before and after the rotation, respectively;
1-3-3) compute the normal vector $\vec{n}$ of the k-neighborhood of the keypoint P at the point P;
1-3-4) compute the vector $\overrightarrow{PP_{ki}}$, use formula (4) to compute its magnitude and two angles, and at the same time compute the angle δ between the normal vector $\vec{n}$ and the vector $\overrightarrow{PP_{ki}}$:

$$\delta=\cos^{-1}\!\left(\overrightarrow{PP_{ki}}\cdot\vec{n}\,/\,|\overrightarrow{PP_{ki}}||\vec{n}|\right) \qquad (6)$$

1-3-5) the features obtained from the keypoint and its neighborhood are represented by the four-tuple $(d,\varphi,\theta,\delta)$; at 45° per bin, $\varphi$, $\theta$ and $\delta$ are divided into 8, 4 and 4 sub-intervals respectively, and the number of points falling into each sub-interval is counted; the magnitude d is used as the weight and Gaussian weighting is applied during the counting; this yields a 128-dimensional feature vector $F=\{f_{1},f_{2},\ldots,f_{128}\}$;
1-3-6) normalize the feature vector: the feature vector $F=\{f_{1},f_{2},\ldots,f_{128}\}$ becomes $L=\{l_{1},l_{2},\ldots,l_{128}\}$ after normalization, where $l_{i}=f_{i}/\sqrt{\textstyle\sum_{j=1}^{128}f_{j}^{2}}$; at this point the 3D SIFT feature descriptor of the keypoint has been generated.
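A sketch of steps 1-3-5) and 1-3-6), assuming the neighborhood points have already been rotated by formula (5) and converted to four-tuples (d, φ, θ, δ):

```python
# Sketch of the 128-dimensional descriptor: 8 azimuth bins x 4 elevation bins
# x 4 normal-angle bins at 45 degrees each, weighted by d, then L2-normalized.
import numpy as np

def descriptor(tuples):
    """tuples: iterable of (d, phi, theta, delta), angles in radians."""
    hist = np.zeros((8, 4, 4))
    for d, phi, theta, delta in tuples:
        i = int((np.degrees(phi) % 360.0) // 45) % 8       # 8 x 45 deg
        j = min(int((np.degrees(theta) + 90.0) // 45), 3)  # 4 x 45 deg
        k = min(int(np.degrees(delta) // 45), 3)           # 4 x 45 deg
        hist[i, j, k] += d                                 # magnitude weight
    f = hist.ravel()                                       # F = {f_1..f_128}
    norm = np.linalg.norm(f)
    return f / norm if norm > 0 else f                     # L = F / ||F||
```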
3. The general object recognition method based on 2D and 3D SIFT feature fusion according to claim 1, characterized in that the concrete method of object representation in step 2) is:
use the KMeans++ clustering method to obtain the sample cluster centers, i.e. the visual word vocabulary, written $center=\{center_{l}\}$, l = 1, 2, …, k, where k is the number of cluster centers and $center_{l}$ is the l-th visual word in the vocabulary; then use the BoW model method to represent the object with a multi-dimensional vector.
4. The general object recognition method based on 2D and 3D SIFT feature fusion according to claim 1, characterized in that in step 4) the method of target classification is to build the multi-class classifier by training several binary classifiers. The concrete training process is as follows: the training samples of the i-th class are paired in turn with the training samples of each of the remaining n−1 classes and an SVM is trained for each pair, giving multiple 1V1 SVM classifiers; for n classes of training samples there are n(n−1)/2 such 1V1 SVM classifiers.
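A minimal sketch of this one-versus-one construction (scikit-learn assumed); the assertion makes the n(n−1)/2 count explicit:

```python
# Sketch of the one-versus-one multi-class construction: one binary SVM per
# unordered class pair, i.e. n*(n-1)/2 classifiers for n classes.
import itertools
import numpy as np
from sklearn.svm import SVC

def train_1v1(X, y):
    classes = np.unique(y)
    classifiers = {}
    for a, b in itertools.combinations(classes, 2):
        mask = (y == a) | (y == b)            # keep only the two classes
        classifiers[(a, b)] = SVC(kernel="rbf").fit(X[mask], y[mask])
    assert len(classifiers) == len(classes) * (len(classes) - 1) // 2
    return classifiers
```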
5. The general object recognition method based on 2D and 3D SIFT feature fusion according to claim 2, characterized in that in step 1) the method for obtaining the DoG extremum points, i.e. the keypoints, is:
compare each point P(x, y, z) in the point cloud model of the object with all of its neighboring points to determine whether it is the maximum or minimum within that neighborhood; the point under test is compared not only with the 26 adjacent points at its own scale but also with the corresponding 27 × 2 points at the adjacent scales; the extremum points detected in this way are the keypoints; a threshold τ = 1.0 is set, and keypoints below this threshold are low-contrast keypoints and are rejected.
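A sketch of this extremum test on a voxelized DoG stack, as produced by the octave sketch after formula (3); dog is a list of 3D arrays indexed by scale, and the caller is assumed to pass interior indices (at least one voxel from every border, 1 ≤ s ≤ len(dog)−2):

```python
# Sketch of the claim-5 extremum test: compare against 26 same-scale neighbors
# and 2 x 27 adjacent-scale points, then reject low-contrast responses.
import numpy as np

def is_keypoint(dog, s, x, y, z, tau=1.0):
    val = dog[s][x, y, z]
    if abs(val) < tau:
        return False                          # low-contrast rejection
    cube = lambda v: v[x-1:x+2, y-1:y+2, z-1:z+2]
    neighborhood = np.concatenate([cube(dog[s]).ravel(),      # 27 incl. self
                                   cube(dog[s + 1]).ravel(),  # 27 above
                                   cube(dog[s - 1]).ravel()]) # 27 below
    # val itself is in the same-scale cube, so >= / <= test the extremum
    return val >= neighborhood.max() or val <= neighborhood.min()
```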
6. The general object recognition method based on 2D and 3D SIFT feature fusion according to claim 1, characterized in that in step 3), for a sample $O_{\xi}\in O$, where O is the sample space, the corresponding 2D and 3D SIFT feature vectors of the sample $O_{\xi}$ are Vec_2D and Vec_3D respectively; the serial fused feature vector of the sample $O_{\xi}$ is obtained as Vec_3D2D = (Vec_3D, Vec_2D)^T, and this serial fused feature vector is used to realize the object representation.
7. The general object recognition method based on 2D and 3D SIFT feature fusion according to claim 1, characterized in that in step 5) the concrete method of obtaining the recognition result of the object to be identified is:
5-1) extract the 2D and 3D SIFT features of the object to be identified to obtain its 2D and 3D SIFT feature descriptors; use the BoW model to compute the feature vector distribution of the object, expressed as Vec_2D and Vec_3D;
5-2) perform feature-level fusion on the two feature vectors of the object to be identified to form the new serial fused feature vector Vec_3D2D = (Vec_3D, Vec_2D)^T, realizing the object representation;
5-3) input the serial fused feature vector into the trained 1V1 SVM multi-class classifier; the discriminant functions give the corresponding decisions, and by voting the probability that the object belongs to the i-th class is obtained and written P(i), i ∈ [1, n], where n is the total number of object classes;
5-4) determine the class of the object to be identified from the maximum probability value; the mathematical formula is

$$class=\arg\max_{1\le i\le n}\{P(i)\} \qquad (7)$$
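Steps 5-3) and 5-4) can be sketched as follows, reusing the pairwise classifiers from the one-versus-one sketch above; vote shares stand in for P(i), an assumption, since the claim does not fix how the votes are converted to probabilities:

```python
# Sketch of claim-7 voting: each 1V1 classifier votes for one class; vote
# shares approximate P(i) and the argmax of formula (7) gives the result.
def recognize(classifiers, fused_vector):
    votes = {}
    for (a, b), clf in classifiers.items():
        winner = clf.predict([fused_vector])[0]   # pairwise decision
        votes[winner] = votes.get(winner, 0) + 1
    total = sum(votes.values())
    probs = {c: v / total for c, v in votes.items()}  # P(i) by vote share
    return max(probs, key=probs.get)                  # class = argmax P(i)
```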
CN201510117991.6A 2015-03-17 2015-03-17 A kind of general object identification method merged based on 2D and 3D SIFT features Active CN104715254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510117991.6A CN104715254B (en) 2015-03-17 2015-03-17 A kind of general object identification method merged based on 2D and 3D SIFT features

Publications (2)

Publication Number Publication Date
CN104715254A true CN104715254A (en) 2015-06-17
CN104715254B CN104715254B (en) 2017-10-10

Family

ID=53414564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510117991.6A Active CN104715254B (en) 2015-03-17 2015-03-17 A kind of general object identification method merged based on 2D and 3D SIFT features

Country Status (1)

Country Link
CN (1) CN104715254B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708380A (en) * 2012-05-08 2012-10-03 东南大学 Indoor common object identification method based on machine vision
CN102930302A (en) * 2012-10-18 2013-02-13 山东大学 On-line sequential extreme learning machine-based incremental human behavior recognition method
CN104298971A (en) * 2014-09-28 2015-01-21 北京理工大学 Method for identifying objects in 3D point cloud data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XINGHUA SUN et al.: "Action Recognition via Local Descriptors and Holistic Features", IEEE *

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778449A (en) * 2015-11-23 2017-05-31 创意点子数位股份有限公司 The interactive film method for building up of the object discrimination method of dynamic image and automatic interception target image
CN106778449B (en) * 2015-11-23 2020-09-22 创意点子数位股份有限公司 Object identification method of dynamic image and interactive film establishment method for automatically capturing target image
CN105551068A (en) * 2015-12-07 2016-05-04 中国人民解放军空军装备研究院雷达与电子对抗研究所 Three-dimensional laser scanning and optical photograph synthetic method
CN105551068B (en) * 2015-12-07 2018-07-24 中国人民解放军空军装备研究院雷达与电子对抗研究所 A kind of synthetic method of 3 D laser scanning and optical photograph
CN105654035A (en) * 2015-12-21 2016-06-08 湖南拓视觉信息技术有限公司 Three-dimensional face recognition method and data processing device applying three-dimensional face recognition method
CN105654035B (en) * 2015-12-21 2019-08-09 湖南拓视觉信息技术有限公司 Three-dimensional face identification method and the data processing equipment for applying it
CN105654122A (en) * 2015-12-28 2016-06-08 江南大学 Spatial pyramid object identification method based on kernel function matching
CN106909873A (en) * 2016-06-21 2017-06-30 湖南拓视觉信息技术有限公司 The method and apparatus of recognition of face
CN106326395A (en) * 2016-08-18 2017-01-11 北京大学 Local visual feature selection method and device
CN106326395B (en) * 2016-08-18 2019-05-28 北京大学 A kind of local visual feature selection approach and device
CN106529394A (en) * 2016-09-19 2017-03-22 广东工业大学 Indoor scene and object simultaneous recognition and modeling method
CN106529394B (en) * 2016-09-19 2019-07-19 广东工业大学 A kind of indoor scene object identifies simultaneously and modeling method
CN106682672A (en) * 2016-10-24 2017-05-17 深圳大学 Method and device for acquiring feature descriptor of hyper-spectral image
CN106682672B (en) * 2016-10-24 2020-04-24 深圳大学 Method and device for acquiring hyperspectral image feature descriptor
CN107247917A (en) * 2017-04-21 2017-10-13 东南大学 A kind of airplane landing control method based on ELM and DSmT
CN107423697B (en) * 2017-07-13 2020-09-08 西安电子科技大学 Behavior identification method based on nonlinear fusion depth 3D convolution descriptor
CN107423697A (en) * 2017-07-13 2017-12-01 西安电子科技大学 Activity recognition method based on non-linear fusion depth 3D convolution description
CN107450577A (en) * 2017-07-25 2017-12-08 天津大学 UAV Intelligent sensory perceptual system and method based on multisensor
CN107895386A (en) * 2017-11-14 2018-04-10 中国航空工业集团公司西安飞机设计研究所 A kind of multi-platform joint objective autonomous classification method
CN107886528B (en) * 2017-11-30 2021-09-03 南京理工大学 Distribution line operation scene three-dimensional reconstruction method based on point cloud
CN107886528A (en) * 2017-11-30 2018-04-06 南京理工大学 Distribution line working scene three-dimensional rebuilding method based on a cloud
CN107886101A (en) * 2017-12-08 2018-04-06 北京信息科技大学 A kind of scene three-dimensional feature point highly effective extraction method based on RGB D
CN108197532A (en) * 2017-12-18 2018-06-22 深圳云天励飞技术有限公司 The method, apparatus and computer installation of recognition of face
WO2019120115A1 (en) * 2017-12-18 2019-06-27 深圳励飞科技有限公司 Facial recognition method, apparatus, and computer apparatus
CN108171432A (en) * 2018-01-04 2018-06-15 南京大学 Ecological risk evaluating method based on Multidimensional Cloud Model-fuzzy support vector machine
CN110069968A (en) * 2018-01-22 2019-07-30 耐能有限公司 Face recognition and face recognition method
US10885314B2 (en) 2018-01-22 2021-01-05 Kneron Inc. Face identification system and face identification method with high security level and low power consumption
TWI693556B (en) * 2018-01-22 2020-05-11 美商耐能有限公司 Face identification system and face identification method
CN108470373B (en) * 2018-02-14 2019-06-04 天目爱视(北京)科技有限公司 It is a kind of based on infrared 3D 4 D data acquisition method and device
CN108470373A (en) * 2018-02-14 2018-08-31 天目爱视(北京)科技有限公司 It is a kind of based on infrared 3D 4 D datas acquisition method and device
CN108491773B (en) * 2018-03-12 2022-11-08 中国工商银行股份有限公司 Identification method and system
CN108491773A (en) * 2018-03-12 2018-09-04 中国工商银行股份有限公司 A kind of recognition methods and system
CN108734087A (en) * 2018-03-29 2018-11-02 京东方科技集团股份有限公司 Object automatic identifying method and system, shopping apparatus and storage medium
US10872227B2 (en) 2018-03-29 2020-12-22 Boe Technology Group Co., Ltd. Automatic object recognition method and system thereof, shopping device and storage medium
CN109902702B (en) * 2018-07-26 2021-08-03 华为技术有限公司 Method and device for detecting target
CN109902702A (en) * 2018-07-26 2019-06-18 华为技术有限公司 The method and apparatus of target detection
CN109543557A (en) * 2018-10-31 2019-03-29 百度在线网络技术(北京)有限公司 Processing method, device, equipment and the storage medium of video frame
CN109270079A (en) * 2018-11-15 2019-01-25 燕山大学 A kind of Surface Flaw accurate detecting method based on point cloud model
CN109978885A (en) * 2019-03-15 2019-07-05 广西师范大学 A kind of tree three-dimensional point cloud segmentation method and system
CN110390671B (en) * 2019-07-10 2021-11-30 杭州依图医疗技术有限公司 Method and device for detecting mammary gland calcification
CN110390671A (en) * 2019-07-10 2019-10-29 杭州依图医疗技术有限公司 A kind of method and device of Breast Calcifications detection
CN110503148B (en) * 2019-08-26 2022-10-11 清华大学 Point cloud object identification method with scale invariance
CN110503148A (en) * 2019-08-26 2019-11-26 清华大学 A kind of point cloud object identifying method with scale invariability
CN110913226A (en) * 2019-09-25 2020-03-24 西安空间无线电技术研究所 Image data processing system and method based on cloud detection
CN110913226B (en) * 2019-09-25 2022-01-04 西安空间无线电技术研究所 Image data processing system and method based on cloud detection
CN111007565A (en) * 2019-12-24 2020-04-14 清华大学 Three-dimensional frequency domain full-acoustic wave imaging method and device
CN111582014A (en) * 2020-02-29 2020-08-25 佛山市云米电器科技有限公司 Container identification method, device and computer readable storage medium
CN111339974B (en) * 2020-03-03 2023-04-07 景德镇陶瓷大学 Method for identifying modern ceramics and ancient ceramics
CN111339974A (en) * 2020-03-03 2020-06-26 景德镇陶瓷大学 Method for identifying modern ceramics and ancient ceramics
CN111366084A (en) * 2020-04-28 2020-07-03 上海工程技术大学 Part size detection platform based on information fusion, detection method and fusion method
CN112179353A (en) * 2020-09-30 2021-01-05 深圳市银星智能科技股份有限公司 Positioning method and device of self-moving robot, robot and readable storage medium
CN112163557A (en) * 2020-10-19 2021-01-01 南宁职业技术学院 Face recognition method and device based on 3D structured light
CN114627112A (en) * 2022-05-12 2022-06-14 宁波博登智能科技有限公司 Semi-supervised three-dimensional target labeling system and method
CN115496931A (en) * 2022-11-14 2022-12-20 济南奥普瑞思智能装备有限公司 Industrial robot health monitoring method and system
CN115496931B (en) * 2022-11-14 2023-02-10 济南奥普瑞思智能装备有限公司 Industrial robot health monitoring method and system
CN116844142A (en) * 2023-08-28 2023-10-03 四川华腾公路试验检测有限责任公司 Bridge foundation scouring identification and assessment method
CN116844142B (en) * 2023-08-28 2023-11-21 四川华腾公路试验检测有限责任公司 Bridge foundation scouring identification and assessment method

Also Published As

Publication number Publication date
CN104715254B (en) 2017-10-10

Similar Documents

Publication Publication Date Title
CN104715254A (en) Ordinary object recognizing method based on 2D and 3D SIFT feature fusion
Bariya et al. Scale-hierarchical 3d object recognition in cluttered scenes
Redondo-Cabrera et al. Surfing the point clouds: Selective 3d spatial pyramids for category-level object recognition
CN104318219A (en) Face recognition method based on combination of local features and global features
CN104834941A (en) Offline handwriting recognition method of sparse autoencoder based on computer input
CN103530633A (en) Semantic mapping method of local invariant feature of image and semantic mapping system
CN104881671A (en) High resolution remote sensing image local feature extraction method based on 2D-Gabor
CN105930792A (en) Human action classification method based on video local feature dictionary
CN104966090A (en) Visual word generation and evaluation system and method for realizing image comprehension
Moetesum et al. Segmentation and recognition of electronic components in hand-drawn circuit diagrams
Sun et al. Brushstroke based sparse hybrid convolutional neural networks for author classification of Chinese ink-wash paintings
Nasser et al. Signature recognition by using SIFT and SURF with SVM basic on RBF for voting online
Massa et al. Convolutional neural networks for joint object detection and pose estimation: A comparative study
Xu et al. Robust joint representation of intrinsic mean and kernel function of lie group for remote sensing scene classification
Dong et al. Fusing multilevel deep features for fabric defect detection based NTV-RPCA
CN105608443A (en) Multi-feature description and local decision weighting face identification method
CN103942572A (en) Method and device for extracting facial expression features based on bidirectional compressed data space dimension reduction
Ahmad et al. A fusion of labeled-grid shape descriptors with weighted ranking algorithm for shapes recognition
Sun et al. Indoor scene recognition based on deep learning and sparse representation
Xu et al. Object detection using principal contour fragments
Chen et al. Wafer maps defect recognition based on transfer learning of handwritten pre-training network
Du et al. Shape matching and recognition base on genetic algorithm and application to plant species identification
Zhang et al. A hierarchical oil depot detector in high-resolution images with false detection control
Alwaely et al. Graph spectral domain feature representation for in-air drawn number recognition
Ramezani et al. 3D object categorization based on histogram of distance and normal vector angles on surface points

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant