Target identification method based on dimensionality-reduced local feature descriptors and hidden conditional random fields
Technical field
The invention belongs to target identification methods based on dimensionality-reduced local feature descriptors and hidden conditional random fields. Specifically, it combines local feature extraction, dimensionality reduction, and hidden conditional random field modelling from the current computer vision field to discriminate target images.
Background technology
Target identification is one of the most important directions in computer vision, and is the basis of subsequent higher-level processing such as target classification, video retrieval, and behaviour understanding. Many methods exist, including contour-based change detection, feature-modelling-based detection, colour-rarity detection based on the EM algorithm, region-based detection, and frame-difference detection. These classical methods are concise and easy to understand, but their results are not satisfactory. Simple feature information is insufficient to discriminate objects, and in the improved algorithms that followed, some features still cancel each other out, so the target identification methods that have been comparatively successful so far all work only under particular scenes.
Local features are a feature extraction approach that has recently risen in computer vision, and are widely used in target identification, image registration, image retrieval, and three-dimensional reconstruction. Local features are invariant to geometric transformations and illumination changes, are robust to noise, occlusion, and background interference, and have very high discriminability between features.
For a target identification task, extracting local features completes only the most basic step. The extracted information comprises the feature points and the descriptors corresponding to them. Afterwards, descriptor matching, match screening, and a probability model are still needed to complete target identification, and this does not yet include the process of building the object representation word bank. Moreover, the whole process of matching local features and then identifying must rely on the physical surface correspondence of the recognised object.
The present invention proposes a target identification method based on dimensionality-reduced local feature descriptors and hidden conditional random fields. It first extracts SIFT (Scale-Invariant Feature Transform) descriptors from the image; then, on the premise of preserving the structure of the high-dimensional space of the SIFT descriptors, it reduces the dimensionality of the high-dimensional descriptors with the Neighborhood Preserving Embedding (NPE) method; finally it builds a Hidden Conditional Random Fields (HCRF) model for target identification.
Summary of the invention
The target identification method based on dimensionality-reduced local feature descriptors and hidden conditional random fields of the present invention solves the following problem: extract the SIFT descriptors of an image, reduce their dimensionality with the NPE method, and use a hidden conditional random field model to complete the target identification task.
The target identification method based on dimensionality-reduced local feature descriptors and hidden conditional random fields proposed by the present invention aims to build a target recognition model for object identification, and comprises two processes, modelling and identification. The modelling steps are:
(1) for every image in the training sample set, which contains an object with a corresponding label value, extract its SIFT descriptors;
(2) reduce the dimensionality of the extracted high-dimensional SIFT descriptors with the NPE method to obtain the reduced vector set;
(3) the reduced vector set corresponding to each image, together with the object's label number, forms one training sample of the HCRF model; learning over all samples yields a hidden conditional random field model usable for object recognition.
The identification steps are:
(1) for every image in the test sample set to be identified, which contains a corresponding object, extract its SIFT descriptors;
(2) reduce the dimensionality of the extracted high-dimensional SIFT descriptors with the NPE method to obtain the reduced vector set;
(3) input the reduced vector set corresponding to each image into the trained hidden conditional random field model; the output object label number is the final recognition result.
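The two phases above can be followed end to end in a toy, runnable sketch. Real SIFT extraction, NPE reduction and HCRF learning are replaced by trivial stand-ins (per-pixel fake descriptors, truncation, nearest mean-vector lookup), so only the data flow — descriptors, reduced vectors, labelled model, predicted label — is illustrated; nothing here is the patented implementation.

```python
# Toy stand-ins for the pipeline steps; names are illustrative only.

def fake_descriptors(image):
    # stand-in for SIFT: one 8-D "descriptor" per pixel value
    return [[float(v + k) for k in range(8)] for v in image]

def reduce_dim(descriptors, d=3):
    # stand-in for NPE: keep the first d components
    return [v[:d] for v in descriptors]

def mean_vec(vecs):
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def train(samples):
    # stand-in for HCRF training: store one mean reduced vector per label
    by_label = {}
    for image, label in samples:
        by_label.setdefault(label, []).append(
            mean_vec(reduce_dim(fake_descriptors(image))))
    return {lab: mean_vec(ms) for lab, ms in by_label.items()}

def recognize(model, image):
    q = mean_vec(reduce_dim(fake_descriptors(image)))
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lab: dist(model[lab], q))

training = [([10, 12, 11], "cup"), ([50, 52, 51], "book")]
model = train(training)
print(recognize(model, [11, 13, 12]))  # -> cup
```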
For every image in the training sample set, which contains an object with a corresponding label value, or every image in the test sample set to be identified, which contains a corresponding object, the SIFT feature extraction comprises two processes, feature point detection and descriptor computation. The feature point detection steps are:
(1) scale-space extremum detection: to detect extrema in scale space, traverse the points of the image D(x, y, σ) obtained by the Difference-of-Gaussian (DoG) operation. D(x, y, σ) is expressed as
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y)
= L(x, y, kσ) − L(x, y, σ)
where k is the scale factor between two adjacent scales, G(x, y, σ) is a Gaussian function with the origin as mean and σ as standard deviation, L(x, y, σ) is the Gaussian smoothing of an image at variable scale σ, I(x, y) is the source image, and * denotes convolution. Compare the grey value of each point in D(x, y, σ) with its 8 neighbours in the same level and the 9 neighbours in each of the two levels above and below; if it is the maximum or minimum of this neighbourhood, take it as a candidate key point;
(2) accurate feature point localisation: let the detected local extremum be X₀ = (x₀, y₀, σ). Expand D(x, y, σ) in a Taylor series around X₀, differentiate the expansion, and set the derivative to 0, obtaining the exact position corresponding to the local extremum X₀.
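The 26-neighbour extremum test of step (1) can be sketched as follows: a point in DoG level D[s] is a candidate key point when it is strictly greater (or strictly smaller) than its 8 neighbours in the same level plus the 9 neighbours in each adjacent level. The tiny hand-built DoG stack below is purely illustrative.

```python
# Each DoG level is a 2-D list of grey values; dog is a list of levels.
def is_extremum(dog, s, x, y):
    v = dog[s][x][y]
    neigh = [dog[s + ds][x + dx][y + dy]
             for ds in (-1, 0, 1)
             for dx in (-1, 0, 1)
             for dy in (-1, 0, 1)
             if not (ds == 0 and dx == 0 and dy == 0)]   # 26 neighbours
    return all(v > n for n in neigh) or all(v < n for n in neigh)

# toy 3x3 DoG stack with a clear maximum at the centre of the middle level
low  = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
mid  = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
high = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
print(is_extremum([low, mid, high], 1, 1, 1))  # -> True
```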
The descriptor computation steps are:
(1) principal direction determination: for each Gaussian-smoothed image L(x, y, σ), the gradient magnitude m(x, y) and direction θ(x, y) of the points around a feature point are computed by the following two formulas:
m(x, y) = sqrt((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)
θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))
Divide 0°~360° into 36 bins of 10° each, build a histogram of the magnitudes m(x, y) over the directions θ(x, y), and take the direction corresponding to the histogram's maximum as the principal direction of the feature point;
(2) descriptor computation: centred on the feature point, rotate the coordinate axes so that the x direction coincides with the feature point's principal direction; take a 16 × 16 window and divide it evenly into 4 × 4 square grid regions; weight the points in the window with a Gaussian function whose standard deviation equals half the descriptor window width (i.e., 8); then, for each region, compute the gradient histogram over 8 directions (both sides of the horizontal, vertical, main diagonal, and anti-diagonal), the accumulated gradient value in each direction forming one component of the feature descriptor; this yields a 4 × 4 × 8 = 128-dimensional vector, which is normalised to give the final descriptor vector.
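The principal-direction step above reduces to a weighted 36-bin histogram over gradient angles, with the peak bin giving the direction. A minimal sketch on synthetic gradient values (the (magnitude, angle) pairs are made up for illustration):

```python
def principal_direction(grads):
    """grads: list of (magnitude, angle_in_degrees) pairs around a keypoint."""
    hist = [0.0] * 36                       # 36 bins of 10 degrees
    for m, theta in grads:
        hist[int(theta % 360) // 10] += m   # magnitude-weighted vote
    peak = max(range(36), key=lambda b: hist[b])
    return peak * 10 + 5                    # centre of the winning bin

grads = [(2.0, 43.0), (3.0, 47.0), (1.0, 200.0)]
print(principal_direction(grads))  # -> 45
```

In the full method, the same magnitude-weighted binning (over 8 directions instead of 36) is repeated per 4 × 4 grid region to build the 128-dimensional descriptor.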
For every image in the training sample set, which contains an object with a corresponding label value, or every image in the test sample set to be identified, which contains a corresponding object, the extracted SIFT descriptors are reduced with the NPE method. NPE reduces the dimensionality of vectors of identical dimension arranged as the vertices of an undirected graph whose edge weights are their mutual distances, while preserving the edge weights. For a given vector sequence x = [x₁, x₂, …, x_m], the reduced sequence is y = [y₁, y₂, …, y_m], and the mapping from x_t to y_t is expressed as
y_t = A_npe^T x_t
where d = r × c, d << D, and A_npe is the D × d transition matrix. The steps are as follows:
(1) construct the adjacency graph: let G denote a graph with m nodes, and let t and s be the sequence numbers of images in the feature point sequence; construct the adjacency graph as follows:
a) if x_t and x_s belong to the same source object, compute their Euclidean distance dist(t, s) = ||x_t − x_s||; otherwise, dist(t, s) = C, where C is a predefined constant;
b) if x_s lies within the K-neighbourhood of x_t, create a directed connection from x_t to x_s;
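The adjacency-graph step can be sketched directly: same-object distances are Euclidean, cross-object distances are the constant C, and each point gets directed edges to its K nearest neighbours. Points, labels, K and C below are made-up toy values.

```python
import math

def build_graph(points, labels, K=2, C=1e6):
    m = len(points)
    def dist(t, s):
        if labels[t] == labels[s]:            # same source object
            return math.dist(points[t], points[s])
        return C                              # predefined large constant
    edges = set()
    for t in range(m):
        order = sorted((s for s in range(m) if s != t),
                       key=lambda s: dist(t, s))
        for s in order[:K]:
            edges.add((t, s))                 # directed edge x_t -> x_s
    return edges

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
labs = ["a", "a", "a", "b"]
edges = build_graph(pts, labs, K=2)
print((0, 1) in edges, (0, 3) in edges)  # -> True False
```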
(2) compute the weight matrix: each data point can be reconstructed by a linear combination of its neighbouring vectors; subject to the constraint Σ_s W_ts = 1, minimise the objective function Σ_t ||x_t − Σ_s W_ts x_s|| to obtain the optimal weight matrix W representing the local neighbourhood linear dependence, where W_ts is the coefficient with which the neighbour x_s reconstructs x_t after normalisation by spatial distance;
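For one point x_t, the constrained minimisation of step (2) is the classic locally-linear reconstruction: solve the local Gram system C w = 1 (with C_jl = (x_t − x_j)·(x_t − x_l), plus a small regulariser for stability) and normalise so that the weights sum to 1. A small sketch under these assumptions, with a tiny Gauss-Jordan solver since the neighbourhood is small:

```python
def reconstruction_weights(x_t, neighbours, reg=1e-3):
    k = len(neighbours)
    diffs = [[x_t[i] - n[i] for i in range(len(x_t))] for n in neighbours]
    # local Gram matrix with a small diagonal regulariser
    C = [[sum(a * b for a, b in zip(diffs[j], diffs[l]))
          + (reg if j == l else 0.0) for l in range(k)] for j in range(k)]
    # solve C w = [1, ..., 1] by Gauss-Jordan elimination (k is small)
    aug = [row[:] + [1.0] for row in C]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    w = [aug[r][k] / aug[r][r] for r in range(k)]
    s = sum(w)
    return [wi / s for wi in w]               # enforce sum(w) = 1

# x_t midway between its two neighbours -> equal weights 0.5, 0.5
w = reconstruction_weights([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]])
print([round(wi, 3) for wi in w])  # -> [0.5, 0.5]
```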
(3) compute the projection matrix: minimise the cost function
Φ(Y) = Σ_t (y_t − Σ_s W_ts y_s)² = a^T X M X^T a, M = (I − W)^T (I − W)
where I is the identity matrix, subject to the constraint a^T X X^T a = 1. The transform vector a is obtained from the smallest eigenvalues of the generalised eigenvalue equation X M X^T a = λ X X^T a. Let the column vectors a₁, a₂, …, a_d be the corresponding solutions ordered by eigenvalue λ₁ ≤ λ₂ ≤ … ≤ λ_d; the final mapping is expressed as
y_t = A_npe^T x_t, A_npe = [a₁, a₂, …, a_d]
which is a D × d matrix.
The Hidden Conditional Random Fields model discriminates the label value z from an input sequence of same-dimension observation vectors y = {y₁, y₂, …, y_m}. The parametric HCRF model consists of a hidden state layer, the input observation vectors, and the label value, and HCRF models and discriminates the conditional probability of the label by
P(z | y; θ) = Σ_h exp(Ψ(z, h, y; θ, ω)) / Σ_{z′, h} exp(Ψ(z′, h, y; θ, ω))
where h = {h₁, h₂, …, h_m} corresponds to the observation sequence y, h_i ∈ H, and H denotes the set of hidden states that may occur; with parameters θ = [θ_h, θ_z, θ_e] and window size ω, the potential function Ψ(z, h, y; θ, ω) is
Ψ(z, h, y; θ, ω) = Σ_i φ(y, i, ω) · θ_h(h_i) + Σ_i θ_z(z, h_i) + Σ_{(j,k)∈E} θ_e(z, h_j, h_k)
The graph E is undirected, (j, k) denotes one of its edges, and each vertex of the graph corresponds to a hidden state; φ(y, i, ω) can represent any feature within the observation window. In the parameter group θ = [θ_h, θ_z, θ_e], θ_h is the parameter corresponding to the hidden state h_i ∈ H, θ_z measures the compatibility between the hidden state h_i and the label z, and θ_e measures the compatibility between the connected states j and k and the label z;
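The modelling equation above can be made concrete on a deliberately tiny example: a chain of m = 2 observations, 2 hidden states and 2 labels, taking φ(y, i, ω) = y_i, so that the hidden states can be marginalised by brute-force enumeration. All parameter values below are made up for illustration; a real model would have many more states and learned parameters.

```python
import math
from itertools import product

H, Z = [0, 1], [0, 1]
theta_h = [0.5, -0.5]                       # per hidden state
theta_z = [[1.0, 0.0], [0.0, 1.0]]          # label x hidden state
theta_e = [[[0.2, 0.0], [0.0, 0.2]],        # label x state x state
           [[0.0, 0.2], [0.2, 0.0]]]

def psi(z, h, y):
    s = sum(y[i] * theta_h[h[i]] + theta_z[z][h[i]] for i in range(len(y)))
    s += sum(theta_e[z][h[j]][h[j + 1]] for j in range(len(h) - 1))
    return s

def p_label(z, y):
    # marginalise the hidden states h by full enumeration (feasible here)
    num = sum(math.exp(psi(z, h, y)) for h in product(H, repeat=len(y)))
    den = sum(math.exp(psi(zp, h, y))
              for zp in Z for h in product(H, repeat=len(y)))
    return num / den

y = [1.0, 2.0]
probs = [p_label(z, y) for z in Z]
print(round(sum(probs), 6))  # -> 1.0
```

Real HCRF implementations replace the enumeration with belief propagation over the graph E; enumeration is used here only to keep the sketch self-contained.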
(1) in the training process of the HCRF model, the optimal value of the parameter group θ = [θ_h, θ_z, θ_e] is determined by
θ* = argmax_θ L(θ)
where the objective function L(θ) is
L(θ) = Σ_{i=1}^{n} log P(z_i | y_i, θ) − ||θ||² / (2σ_θ²)
with n the total number of training sequences and the parameters θ following a Gaussian prior with variance σ_θ²;
(2) in the discrimination process, for an input observation sequence y, the discriminated label value z* is
z* = argmax_z P(z | y, θ*)
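The regularised criterion and the argmax decision rule can be illustrated on a toy model with a single made-up scalar parameter theta (real HCRF training uses gradient-based optimisation over the full parameter group; the grid search below is only to keep the sketch self-contained):

```python
import math

def p_label(z, y, theta):
    scores = [theta * y, -theta * y]      # toy per-label scores
    e = [math.exp(s) for s in scores]
    return e[z] / sum(e)

def log_likelihood(data, theta, sigma_theta=1.0):
    # sum of log P(z_i | y_i, theta) minus the Gaussian-prior penalty
    ll = sum(math.log(p_label(z, y, theta)) for y, z in data)
    return ll - theta ** 2 / (2 * sigma_theta ** 2)

data = [(1.0, 0), (2.0, 0), (-1.5, 1)]
# crude 1-D grid search standing in for argmax_theta L(theta)
best = max((log_likelihood(data, t / 10), t / 10)
           for t in range(-30, 31))[1]
# decision rule: z* = argmax_z P(z | y, theta*)
z_star = max([0, 1], key=lambda z: p_label(z, 2.5, best))
print(z_star)  # -> 0
```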
Brief description of the drawings
Fig. 1 is the flow chart of building the whole model and of identification.
Fig. 2 shows the gradients and Gaussian weight distribution in the 16 × 16 region around a feature point.
Fig. 3 shows the final descriptor result.
Fig. 4 is a schematic diagram of the hidden conditional random field model of a single target with 4 hidden states.
Concrete technical scheme
The building of the model and the identification of targets are shown in Fig. 1. The image collection used to train the target recognition model contains L objects, where the l-th object corresponds to k_l training images. A source image img_i containing a particular object yields, after computation, a set of SIFT feature points, where the information corresponding to each feature point can be represented by the tuple:
Sift_j := <j, (x, y), σ, θ, descriptor_{128×1}>
where j is the index of the feature point in the set, (x, y) its position in the source image, σ the corresponding scale information, θ the principal direction information, and descriptor_{128×1} the 128-dimensional descriptor vector corresponding to the feature point.
The decisive part of the feature point information is the descriptor vector, which is the main basis of the matching process. From the SIFT features computed for the source image img_i, extract the descriptor parts to form the set of descriptor vectors:
SiftSet_i = {SiftDescriptor_j} = {<j, descriptor_{128×1}>}
The SIFT descriptors are then reduced with the NPE method, with original dimension D = 128 and reduced dimension chosen as d = 6. The reduction process is expressed as
y = A_npe^T x
where A_npe is the dimensionality reduction transformation matrix.
Each image containing a source object has a corresponding label value obj_i, where obj_i = l, 1 ≤ l ≤ L. The input set of the training process is
{<Y_j, obj_j>, j = 1, …, n}
where n is the total number of training images and Y_j denotes the reduced vector set of the j-th image. Training the model on these samples yields the corresponding model parameters; for an input test image, SIFT feature extraction and NPE reduction yield the reduced vector set, which is input to the model to obtain the output target recognition result.
SIFT feature extraction first computes the Gaussian smoothing of the image and the image's Difference of Gaussians (DoG). Computing the Gaussian smoothing uses the concept of scale space. Scale space is divided into different layers, each corresponding to a different sampling rate of the image: the sampling step of the first layer is 1, of the second layer 2, of the third layer 4, and of the k-th layer 2^(k−1). Each layer is in turn divided into S levels, where on level s the standard deviation of the smoothing Gaussian is σ_s = 2^(s/S) σ₀, with σ₀ = 1.6 and S usually taken as 5. Within a layer, adjacent levels are subtracted to obtain the Gaussian difference images, so the number of Gaussian difference images per layer is S − 1. To judge whether a point of a Gaussian difference image is a possible feature point, compare it with the 26 points in its own level and the two neighbouring levels; if it is an extremum, it is chosen as a candidate feature point.
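The sampling schedule above can be sketched in a few lines, assuming the conventional SIFT base scale σ₀ = 1.6 and S = 5 levels per layer:

```python
S, sigma0 = 5, 1.6

def level_sigma(s):
    # smoothing sigma on level s of a layer: sigma_s = 2^(s/S) * sigma_0
    return 2 ** (s / S) * sigma0

def layer_step(k):
    # sampling step of the k-th layer: 1, 2, 4, ..., 2^(k-1)
    return 2 ** (k - 1)

print(layer_step(3))             # -> 4
print(round(level_sigma(5), 2))  # -> 3.2  (sigma doubles after S levels)
```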
The gradients around a feature point used to compute the descriptor, and the corresponding Gaussian weight distribution range, are shown in Fig. 2. In the Gaussian-smoothed image, take the 16 × 16 points around the feature point and determine the magnitude and direction at each position according to
m(x, y) = sqrt((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)
θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))
Weight the points in the region with a Gaussian function whose standard deviation equals half the descriptor window width (i.e., 8), to strengthen invariance to illumination and geometric changes. Then, for each region, compute the gradient histogram over 8 directions (both sides of the horizontal, vertical, main diagonal, and anti-diagonal), the gradient value in each direction forming one component of the feature descriptor; this yields a 4 × 4 × 8 = 128-dimensional vector, which is normalised to give the final descriptor vector, as shown in Fig. 3.
Fig. 4 shows the abstract model of the hidden conditional random field corresponding to a single object. The model has three layers: the top layer is the label corresponding to the target, which is part of the input when training the model and the final output during identification; the middle layer is the undirected graph formed by the hidden states, in which every two hidden-state vertices are joined by an edge, and the values of the edges between the object label and the hidden states, the values of the hidden states, and the edge weights are all continually adjusted during training; each hidden state corresponds in turn to one observation vector, and for the present invention these observation vectors are the vector set after NPE dimensionality reduction. Fig. 4 includes only 4 hidden states with their corresponding 4 observation vectors. In practice, an image of a textured object typically yields hundreds or even thousands of feature point descriptors through the SIFT algorithm, and hence a similar number of reduced vectors, which form the input part for training and identification of the HCRF-based target recognition model.