Target identification method based on dimensionality-reduced local feature descriptors and hidden conditional random fields
Technical field
The invention belongs to target identification methods based on dimensionality-reduced local feature descriptors and hidden conditional random fields. Specifically, it combines local feature extraction, dimensionality reduction, and hidden conditional random field modelling from the current computer vision field to discriminate target images.
Background technology
Target identification is one of the most important directions in computer vision, and is the basis of subsequent higher-level processing such as target classification, video retrieval, and behaviour understanding. Many methods exist, including contour-based change detection, feature-modelling-based detection, colour-rarity detection based on the EM algorithm, region-based detection, and frame-difference detection. These classical methods are concise and easy to understand, but their results are not satisfactory. Simple feature information is insufficient to discriminate objects, and in the improved algorithms that followed, some features still cancel each other out, so the target identification methods that have been comparatively successful so far all work only under particular scenes.
Local features are a feature extraction approach that has recently risen in computer vision, and are widely used in target identification, image registration, image retrieval, and three-dimensional reconstruction. Local features are invariant to geometric transformations and illumination changes, are robust to noise, occlusion, and background interference, and have very high discriminability between features.
For a target identification task, extracting local features completes only the most basic step. The extracted information comprises the feature points and the descriptors corresponding to them. Afterwards, descriptor matching, match screening, and a probability model are still needed to complete target identification, and this does not yet include the process of building the object representation word bank. Moreover, the whole process of matching local features and then identifying must rely on the physical surface correspondence of the recognised object.
The present invention proposes a target identification method based on dimensionality-reduced local feature descriptors and hidden conditional random fields. It first extracts SIFT (Scale-Invariant Feature Transform) descriptors from the image; then, on the premise of preserving the structure of the high-dimensional space of the SIFT descriptors, it reduces the dimensionality of the high-dimensional descriptors with the Neighborhood Preserving Embedding (NPE) method; finally it builds a Hidden Conditional Random Fields (HCRF) model for target identification.
Summary of the invention
The target identification method based on dimensionality-reduced local feature descriptors and hidden conditional random fields of the present invention solves the following problem: extract the SIFT descriptors of an image, reduce their dimensionality with the NPE method, and use a hidden conditional random field model to complete the target identification task.
The target identification method based on dimensionality-reduced local feature descriptors and hidden conditional random fields proposed by the present invention aims to build a target recognition model for object identification, and comprises two processes, modelling and identification. The modelling steps are:
(1) for every image in the training sample set, which contains an object with a corresponding label value, extract its SIFT descriptors;
(2) reduce the dimensionality of the extracted high-dimensional SIFT descriptors with the NPE method to obtain the reduced vector set;
(3) the reduced vector set corresponding to each image, together with the object's label number, forms one training sample of the HCRF model; learning over all samples yields a hidden conditional random field model usable for object recognition.
The identification steps are:
(1) for every image in the test sample set to be identified, which contains a corresponding object, extract its SIFT descriptors;
(2) reduce the dimensionality of the extracted high-dimensional SIFT descriptors with the NPE method to obtain the reduced vector set;
(3) input the reduced vector set corresponding to each image into the trained hidden conditional random field model; the output object label number is the final recognition result.
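The two phases above can be followed end to end in a toy, runnable sketch. Real SIFT extraction, NPE reduction and HCRF learning are replaced by trivial stand-ins (per-pixel fake descriptors, truncation, nearest mean-vector lookup), so only the data flow — descriptors, reduced vectors, labelled model, predicted label — is illustrated; nothing here is the patented implementation.

```python
# Toy stand-ins for the pipeline steps; names are illustrative only.

def fake_descriptors(image):
    # stand-in for SIFT: one 8-D "descriptor" per pixel value
    return [[float(v + k) for k in range(8)] for v in image]

def reduce_dim(descriptors, d=3):
    # stand-in for NPE: keep the first d components
    return [v[:d] for v in descriptors]

def mean_vec(vecs):
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def train(samples):
    # stand-in for HCRF training: store one mean reduced vector per label
    by_label = {}
    for image, label in samples:
        by_label.setdefault(label, []).append(
            mean_vec(reduce_dim(fake_descriptors(image))))
    return {lab: mean_vec(ms) for lab, ms in by_label.items()}

def recognize(model, image):
    q = mean_vec(reduce_dim(fake_descriptors(image)))
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lab: dist(model[lab], q))

training = [([10, 12, 11], "cup"), ([50, 52, 51], "book")]
model = train(training)
print(recognize(model, [11, 13, 12]))  # -> cup
```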
For every image in the training sample set, which contains an object with a corresponding label value, or every image in the test sample set to be identified, which contains a corresponding object, the SIFT feature extraction comprises two processes, feature point detection and descriptor computation. The feature point detection steps are:
(1) scale-space extremum detection: to detect extrema in scale space, traverse the points of the image D(x, y, σ) obtained by the Difference-of-Gaussian (DoG) operation. D(x, y, σ) is expressed as
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y)
= L(x, y, kσ) − L(x, y, σ)
where k is the scale factor between two adjacent scales, G(x, y, σ) is a Gaussian function with the origin as mean and σ as standard deviation, L(x, y, σ) is the Gaussian smoothing of an image at variable scale σ, I(x, y) is the source image, and * denotes convolution. Compare the grey value of each point in D(x, y, σ) with its 8 neighbours in the same level and the 9 neighbours in each of the two levels above and below; if it is the maximum or minimum of this neighbourhood, take it as a candidate key point;
(2) accurate feature point localisation: let the detected local extremum be X₀ = (x₀, y₀, σ). Expand D(x, y, σ) in a Taylor series around X₀, differentiate the expansion, and set the derivative to 0, obtaining the exact position corresponding to the local extremum X₀.
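The 26-neighbour extremum test of step (1) can be sketched as follows: a point in DoG level D[s] is a candidate key point when it is strictly greater (or strictly smaller) than its 8 neighbours in the same level plus the 9 neighbours in each adjacent level. The tiny hand-built DoG stack below is purely illustrative.

```python
# Each DoG level is a 2-D list of grey values; dog is a list of levels.
def is_extremum(dog, s, x, y):
    v = dog[s][x][y]
    neigh = [dog[s + ds][x + dx][y + dy]
             for ds in (-1, 0, 1)
             for dx in (-1, 0, 1)
             for dy in (-1, 0, 1)
             if not (ds == 0 and dx == 0 and dy == 0)]   # 26 neighbours
    return all(v > n for n in neigh) or all(v < n for n in neigh)

# toy 3x3 DoG stack with a clear maximum at the centre of the middle level
low  = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
mid  = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
high = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
print(is_extremum([low, mid, high], 1, 1, 1))  # -> True
```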
The descriptor computation steps are:
(1) principal direction determination: for each Gaussian-smoothed image L(x, y, σ), the gradient magnitude m(x, y) and direction θ(x, y) of the points around a feature point are computed by the following two formulas:
m(x, y) = sqrt((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)
θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))
Divide 0°~360° into 36 bins of 10° each, build a histogram of the magnitudes m(x, y) over the directions θ(x, y), and take the direction corresponding to the histogram's maximum as the principal direction of the feature point;
(2) descriptor computation: centred on the feature point, rotate the coordinate axes so that the x direction coincides with the feature point's principal direction; take a 16 × 16 window and divide it evenly into 4 × 4 square grid regions; weight the points in the window with a Gaussian function whose standard deviation equals half the descriptor window width (i.e., 8); then, for each region, compute the gradient histogram over 8 directions (both sides of the horizontal, vertical, main diagonal, and anti-diagonal), the accumulated gradient value in each direction forming one component of the feature descriptor; this yields a 4 × 4 × 8 = 128-dimensional vector, which is normalised to give the final descriptor vector.
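The principal-direction step above reduces to a weighted 36-bin histogram over gradient angles, with the peak bin giving the direction. A minimal sketch on synthetic gradient values (the (magnitude, angle) pairs are made up for illustration):

```python
def principal_direction(grads):
    """grads: list of (magnitude, angle_in_degrees) pairs around a keypoint."""
    hist = [0.0] * 36                       # 36 bins of 10 degrees
    for m, theta in grads:
        hist[int(theta % 360) // 10] += m   # magnitude-weighted vote
    peak = max(range(36), key=lambda b: hist[b])
    return peak * 10 + 5                    # centre of the winning bin

grads = [(2.0, 43.0), (3.0, 47.0), (1.0, 200.0)]
print(principal_direction(grads))  # -> 45
```

In the full method, the same magnitude-weighted binning (over 8 directions instead of 36) is repeated per 4 × 4 grid region to build the 128-dimensional descriptor.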
For every image in the training sample set, which contains an object with a corresponding label value, or every image in the test sample set to be identified, which contains a corresponding object, the extracted SIFT descriptors are reduced with the NPE method. NPE reduces the dimensionality of vectors of identical dimension arranged as the vertices of an undirected graph whose edge weights are their mutual distances, while preserving the edge weights. For a given vector sequence x = [x₁, x₂, …, x_m], the reduced sequence is y = [y₁, y₂, …, y_m], and the mapping from x_t to y_t is expressed as
y_t = A_npe^T x_t
where d = r × c, d << D, and A_npe is the D × d transition matrix. The steps are as follows:
(1) construct the adjacency graph: let G denote a graph with m nodes, and let t and s be the sequence numbers of images in the feature point sequence; construct the adjacency graph as follows:
a) if x_t and x_s belong to the same source object, compute their Euclidean distance dist(t, s) = ||x_t − x_s||; otherwise, dist(t, s) = C, where C is a predefined constant;
b) if x_s lies within the K-neighbourhood of x_t, create a directed connection from x_t to x_s;
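The adjacency-graph step can be sketched directly: same-object distances are Euclidean, cross-object distances are the constant C, and each point gets directed edges to its K nearest neighbours. Points, labels, K and C below are made-up toy values.

```python
import math

def build_graph(points, labels, K=2, C=1e6):
    m = len(points)
    def dist(t, s):
        if labels[t] == labels[s]:            # same source object
            return math.dist(points[t], points[s])
        return C                              # predefined large constant
    edges = set()
    for t in range(m):
        order = sorted((s for s in range(m) if s != t),
                       key=lambda s: dist(t, s))
        for s in order[:K]:
            edges.add((t, s))                 # directed edge x_t -> x_s
    return edges

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
labs = ["a", "a", "a", "b"]
edges = build_graph(pts, labs, K=2)
print((0, 1) in edges, (0, 3) in edges)  # -> True False
```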
(2) compute the weight matrix: each data point can be reconstructed by a linear combination of its neighbouring vectors; subject to the constraint Σ_s W_ts = 1, minimise the objective function Σ_t ||x_t − Σ_s W_ts x_s|| to obtain the optimal weight matrix W representing the local neighbourhood linear dependence, where W_ts is the coefficient with which the neighbour x_s reconstructs x_t after normalisation by spatial distance;
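For one point x_t, the constrained minimisation of step (2) is the classic locally-linear reconstruction: solve the local Gram system C w = 1 (with C_jl = (x_t − x_j)·(x_t − x_l), plus a small regulariser for stability) and normalise so that the weights sum to 1. A small sketch under these assumptions, with a tiny Gauss-Jordan solver since the neighbourhood is small:

```python
def reconstruction_weights(x_t, neighbours, reg=1e-3):
    k = len(neighbours)
    diffs = [[x_t[i] - n[i] for i in range(len(x_t))] for n in neighbours]
    # local Gram matrix with a small diagonal regulariser
    C = [[sum(a * b for a, b in zip(diffs[j], diffs[l]))
          + (reg if j == l else 0.0) for l in range(k)] for j in range(k)]
    # solve C w = [1, ..., 1] by Gauss-Jordan elimination (k is small)
    aug = [row[:] + [1.0] for row in C]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    w = [aug[r][k] / aug[r][r] for r in range(k)]
    s = sum(w)
    return [wi / s for wi in w]               # enforce sum(w) = 1

# x_t midway between its two neighbours -> equal weights 0.5, 0.5
w = reconstruction_weights([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]])
print([round(wi, 3) for wi in w])  # -> [0.5, 0.5]
```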
(3) compute the projection matrix: minimise the cost function
Φ(Y) = Σ_t (y_t − Σ_s W_ts y_s)² = a^T X M X^T a, M = (I − W)^T (I − W)
where I is the identity matrix, subject to the constraint a^T X X^T a = 1. The transform vector a is obtained from the smallest eigenvalues of the generalised eigenvalue equation X M X^T a = λ X X^T a. Let the column vectors a₁, a₂, …, a_d be the corresponding solutions ordered by eigenvalue λ₁ ≤ λ₂ ≤ … ≤ λ_d; the final mapping is expressed as
y_t = A_npe^T x_t, A_npe = [a₁, a₂, …, a_d]
which is a D × d matrix.
The Hidden Conditional Random Fields model discriminates the label value z from an input sequence of same-dimension observation vectors y = {y₁, y₂, …, y_m}. The parametric HCRF model consists of a hidden state layer, the input observation vectors, and the label value, and HCRF models and discriminates the conditional probability of the label by
P(z | y; θ) = Σ_h exp(Ψ(z, h, y; θ, ω)) / Σ_{z′, h} exp(Ψ(z′, h, y; θ, ω))
where h = {h₁, h₂, …, h_m} corresponds to the observation sequence y, h_i ∈ H, and H denotes the set of hidden states that may occur; with parameters θ = [θ_h, θ_z, θ_e] and window size ω, the potential function Ψ(z, h, y; θ, ω) is
Ψ(z, h, y; θ, ω) = Σ_i φ(y, i, ω) · θ_h(h_i) + Σ_i θ_z(z, h_i) + Σ_{(j,k)∈E} θ_e(z, h_j, h_k)
The graph E is undirected, (j, k) denotes one of its edges, and each vertex of the graph corresponds to a hidden state; φ(y, i, ω) can represent any feature within the observation window. In the parameter group θ = [θ_h, θ_z, θ_e], θ_h is the parameter corresponding to the hidden state h_i ∈ H, θ_z measures the compatibility between the hidden state h_i and the label z, and θ_e measures the compatibility between the connected states j and k and the label z;
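The modelling equation above can be made concrete on a deliberately tiny example: a chain of m = 2 observations, 2 hidden states and 2 labels, taking φ(y, i, ω) = y_i, so that the hidden states can be marginalised by brute-force enumeration. All parameter values below are made up for illustration; a real model would have many more states and learned parameters.

```python
import math
from itertools import product

H, Z = [0, 1], [0, 1]
theta_h = [0.5, -0.5]                       # per hidden state
theta_z = [[1.0, 0.0], [0.0, 1.0]]          # label x hidden state
theta_e = [[[0.2, 0.0], [0.0, 0.2]],        # label x state x state
           [[0.0, 0.2], [0.2, 0.0]]]

def psi(z, h, y):
    s = sum(y[i] * theta_h[h[i]] + theta_z[z][h[i]] for i in range(len(y)))
    s += sum(theta_e[z][h[j]][h[j + 1]] for j in range(len(h) - 1))
    return s

def p_label(z, y):
    # marginalise the hidden states h by full enumeration (feasible here)
    num = sum(math.exp(psi(z, h, y)) for h in product(H, repeat=len(y)))
    den = sum(math.exp(psi(zp, h, y))
              for zp in Z for h in product(H, repeat=len(y)))
    return num / den

y = [1.0, 2.0]
probs = [p_label(z, y) for z in Z]
print(round(sum(probs), 6))  # -> 1.0
```

Real HCRF implementations replace the enumeration with belief propagation over the graph E; enumeration is used here only to keep the sketch self-contained.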
(1) in the training process of the HCRF model, the optimal value of the parameter group θ = [θ_h, θ_z, θ_e] is determined by
θ* = argmax_θ L(θ)
where the objective function L(θ) is
L(θ) = Σ_{i=1}^{n} log P(z_i | y_i, θ) − ||θ||² / (2σ_θ²)
with n the total number of training sequences and the parameters θ following a Gaussian prior with variance σ_θ²;
(2) in the discrimination process, for an input observation sequence y, the discriminated label value z* is
z* = argmax_z P(z | y, θ*)
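The regularised criterion and the argmax decision rule can be illustrated on a toy model with a single made-up scalar parameter theta (real HCRF training uses gradient-based optimisation over the full parameter group; the grid search below is only to keep the sketch self-contained):

```python
import math

def p_label(z, y, theta):
    scores = [theta * y, -theta * y]      # toy per-label scores
    e = [math.exp(s) for s in scores]
    return e[z] / sum(e)

def log_likelihood(data, theta, sigma_theta=1.0):
    # sum of log P(z_i | y_i, theta) minus the Gaussian-prior penalty
    ll = sum(math.log(p_label(z, y, theta)) for y, z in data)
    return ll - theta ** 2 / (2 * sigma_theta ** 2)

data = [(1.0, 0), (2.0, 0), (-1.5, 1)]
# crude 1-D grid search standing in for argmax_theta L(theta)
best = max((log_likelihood(data, t / 10), t / 10)
           for t in range(-30, 31))[1]
# decision rule: z* = argmax_z P(z | y, theta*)
z_star = max([0, 1], key=lambda z: p_label(z, 2.5, best))
print(z_star)  # -> 0
```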
Brief description of the drawings
Fig. 1 is the flow chart of building the whole model and of identification.
Fig. 2 shows the gradients and Gaussian weight distribution in the 16 × 16 region around a feature point.
Fig. 3 shows the final descriptor result.
Fig. 4 is a schematic diagram of the hidden conditional random field model of a single target with 4 hidden states.
Concrete technical scheme
The building of the model and the identification of targets are shown in Fig. 1. The image collection used to train the target recognition model contains L objects, where the l-th object corresponds to k_l training images. A source image img_i containing a particular object yields, after computation, a set of SIFT feature points, where the information corresponding to each feature point can be represented by the tuple:
Sift_j := <j, (x, y), σ, θ, descriptor_{128×1}>
where j is the index of the feature point in the set, (x, y) its position in the source image, σ the corresponding scale information, θ the principal direction information, and descriptor_{128×1} the 128-dimensional descriptor vector corresponding to the feature point.
The decisive part of the feature point information is the descriptor vector, which is the main basis of the matching process. From the SIFT features computed for the source image img_i, extract the descriptor parts to form the set of descriptor vectors:
SiftSet_i = {SiftDescriptor_j} = {<j, descriptor_{128×1}>}
The SIFT descriptors are then reduced with the NPE method, with original dimension D = 128 and reduced dimension chosen as d = 6. The reduction process is expressed as
y = A_npe^T x
where A_npe is the dimensionality reduction transformation matrix.
Each image containing a source object has a corresponding label value obj_i, where obj_i = l, 1 ≤ l ≤ L. The input set of the training process is
{<Y_j, obj_j>, j = 1, …, n}
where n is the total number of training images and Y_j denotes the reduced vector set of the j-th image. Training the model on these samples yields the corresponding model parameters; for an input test image, SIFT feature extraction and NPE reduction yield the reduced vector set, which is input to the model to obtain the output target recognition result.
SIFT feature extraction first computes the Gaussian smoothing of the image and the image's Difference of Gaussians (DoG). Computing the Gaussian smoothing uses the concept of scale space. Scale space is divided into different layers, each corresponding to a different sampling rate of the image: the sampling step of the first layer is 1, of the second layer 2, of the third layer 4, and of the k-th layer 2^(k−1). Each layer is in turn divided into S levels, where on level s the standard deviation of the smoothing Gaussian is σ_s = 2^(s/S) σ₀, with σ₀ = 1.6 and S usually taken as 5. Within a layer, adjacent levels are subtracted to obtain the Gaussian difference images, so the number of Gaussian difference images per layer is S − 1. To judge whether a point of a Gaussian difference image is a possible feature point, compare it with the 26 points in its own level and the two neighbouring levels; if it is an extremum, it is chosen as a candidate feature point.
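The sampling schedule above can be sketched in a few lines, assuming the conventional SIFT base scale σ₀ = 1.6 and S = 5 levels per layer:

```python
S, sigma0 = 5, 1.6

def level_sigma(s):
    # smoothing sigma on level s of a layer: sigma_s = 2^(s/S) * sigma_0
    return 2 ** (s / S) * sigma0

def layer_step(k):
    # sampling step of the k-th layer: 1, 2, 4, ..., 2^(k-1)
    return 2 ** (k - 1)

print(layer_step(3))             # -> 4
print(round(level_sigma(5), 2))  # -> 3.2  (sigma doubles after S levels)
```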
The gradients around a feature point used to compute the descriptor, and the corresponding Gaussian weight distribution range, are shown in Fig. 2. In the Gaussian-smoothed image, take the 16 × 16 points around the feature point and determine the magnitude and direction at each position according to
m(x, y) = sqrt((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²)
θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))
Weight the points in the region with a Gaussian function whose standard deviation equals half the descriptor window width (i.e., 8), to strengthen invariance to illumination and geometric changes. Then, for each region, compute the gradient histogram over 8 directions (both sides of the horizontal, vertical, main diagonal, and anti-diagonal), the gradient value in each direction forming one component of the feature descriptor; this yields a 4 × 4 × 8 = 128-dimensional vector, which is normalised to give the final descriptor vector, as shown in Fig. 3.
Fig. 4 shows the abstract model of the hidden conditional random field corresponding to a single object. The model has three layers: the top layer is the label corresponding to the target, which is part of the input when training the model and the final output during identification; the middle layer is the undirected graph formed by the hidden states, in which every two hidden-state vertices are joined by an edge, and the values of the edges between the object label and the hidden states, the values of the hidden states, and the edge weights are all continually adjusted during training; each hidden state corresponds in turn to one observation vector, and for the present invention these observation vectors are the vector set after NPE dimensionality reduction. Fig. 4 includes only 4 hidden states with their corresponding 4 observation vectors. In practice, an image of a textured object typically yields hundreds or even thousands of feature point descriptors through the SIFT algorithm, and hence a similar number of reduced vectors, which form the input part for training and identification of the HCRF-based target recognition model.