CN102629328B - Probabilistic latent semantic model object image recognition method with fusion of significant characteristic of color - Google Patents

Probabilistic latent semantic model object image recognition method with fusion of significant characteristic of color

Info

Publication number
CN102629328B
CN102629328B · CN 201210062379 · CN201210062379A
Authority
CN
China
Prior art keywords
training image
image
train
sift
hsv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201210062379
Other languages
Chinese (zh)
Other versions
CN102629328A (en)
Inventor
杨金福
王锴
李明爱
王阳丽
杨宛露
傅金融
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Maowao Technology (Tianjin) Co., Ltd.
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN 201210062379 priority Critical patent/CN102629328B/en
Publication of CN102629328A publication Critical patent/CN102629328A/en
Application granted granted Critical
Publication of CN102629328B publication Critical patent/CN102629328B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention provides a probabilistic latent semantic model object image recognition method that fuses color salient features, belonging to the field of image recognition technology. The method is characterized by: using the SIFT algorithm to extract the local salient features of an image while adding color features to generate HSV_SIFT features; introducing TF-IDF weight information for feature reconstruction so that the local salient features become more discriminative; using a latent semantic model to obtain the latent semantic features of the image; and finally classifying with a nearest-neighbor KNN classifier. The method considers not only the color information of the image but also the distribution of each visual word across the whole image set, so that the local salient features of an object become more discriminative and recognition ability is improved.

Description

A probabilistic latent semantic model object image recognition method fusing color salient features
Technical field
The invention belongs to the field of image recognition technology and introduces a probabilistic latent semantic model object image recognition method that fuses color information with salient features. When the salient features of an image are extracted, color information is added, and the TF-IDF (term frequency-inverse document frequency) word-frequency weighting method is introduced so that the local salient features become more discriminative. On this basis, the latent semantic features of the image are obtained from a latent semantic model, narrowing the semantic gap that exists in object recognition and making the image recognition problem easier to solve.
Background art
At present, mobile robots are widely applied in fields such as industry, aerospace, the military, and services. As applications expand, the demands on a mobile robot's intelligence keep rising, and intelligent autonomous mobile robots have become a research hotspot in the field of intelligent systems. Because a robot vision system is close to the way humans perceive their environment and can supply a mobile robot with rich perceptual information, vision-based environment perception for mobile robots has attracted a large number of researchers. Object recognition is the foundation and core of mobile robot technology, and a key technique for improving robot intelligence: in an unknown environment, a mobile robot must acquire images of its surroundings through vision sensors, recognize and understand the objects in those images, and then carry out the corresponding tasks.
Feature extraction is a crucial link in the object image recognition process; its purpose is to transform image information from the data space into a feature space. In a sense, the quality of the feature extraction result plays a decisive role in the recognition result of an object image recognition task. Local image features, with their superior performance, have attracted more and more researchers' attention.
In general, local salient features capture the important targets humans are interested in and can express the content of an image. Assigning different processing priorities to different image features not only reduces the complexity of the analysis process but also improves its computational efficiency. In 1988, Harris.C.J and Stephens.M (A combined corner and edge detector. Proc. 4th Alvey Vision Conference, 1988:147-151), building on Moravec's interest points, used the autocorrelation matrix of the luminance function to detect feature points (corners) and extracted local image features centered on the interest points. In 1998, Lindeberg.T (Feature detection with automatic scale selection. International Journal of Computer Vision, 1998, 30(2):79-116) used automatic scale selection to extract feature points, adding scale information so that the characteristic scale of a point is determined together with its position. In 2001, Mikolajczyk.C.S.K (Indexing based on scale invariant interest points. Proc. 8th International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc, 2001:525-531) used the Laplace operator to detect the scale of Harris corners, constructing a scale-invariant Harris-Laplace operator and extending it to the affine-invariant Harris-affine operator. In 2004, David G. Lowe (Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2):91-110) replaced the Laplace operator with the DOG (Difference of Gaussian) operator to speed up interest-point detection, proposing and refining the SIFT (Scale Invariant Feature Transform) algorithm. In 2005, K. Mikolajczyk and C. Schmid (A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(10):1615-1630) evaluated a range of feature extraction methods and found that the SIFT algorithm performed best under illumination changes, geometric distortion, resolution differences, rotation, blur, and image compression. Nevertheless, the method still has limitations: the traditional SIFT algorithm uses only the gray-scale and gradient information of an image and ignores its color information, so it inevitably produces false matches between images that are close in gray scale yet distinguishable by color.
The local salient features of each object image can be obtained preliminarily by the methods above. After vector quantization, these features can be regarded as visual words. Because every image is composed of a large number of visual words, object image recognition roughly amounts to finding the corresponding image according to the frequency with which each class of visual word appears, thus constructing a bag-of-words (BOW) model of the image. The traditional BOW model, however, only uses the information of the visual words within a single image and does not fully consider the distribution of visual words across the whole image set.
Traditional probabilistic latent semantic analysis (PLSA) was first applied in the field of document information retrieval, computing the latent topic distribution of each document. Because images and text have great similarity, the same principle applies: the PLSA method can also be used for image recognition problems, computing the latent topics of every image.
The present invention performs object image recognition based on the latent semantic features of color-fused local salient features. The SIFT algorithm is used to extract the local salient features of an image, and color features are added; these features are highly salient and relatively easy to obtain, making it easy to pick out the object to be detected in a huge feature database. After the local salient features are extracted, TF-IDF weight information is added for feature reconstruction, making the local salient features more discriminative. Finally, a latent semantic model is used to obtain the latent semantic features of the image and complete the object image recognition task.
Summary of the invention
The present invention adds color features to the SIFT salient features and introduces the weight of each visual word in the visual dictionary, designing and implementing a complete object image recognition method. The traditional SIFT algorithm uses only the gray-scale and gradient information of an image and ignores its color information, so it inevitably produces false matches between images that are close in gray scale yet distinguishable by color. The commonly used color space is RGB, but distances computed in RGB space do not characterize well the real difference between two colors as actually perceived by people; this patent therefore adopts the HSV color model, which matches the visual characteristics of the human eye, and the improved method overcomes the shortcomings of the traditional one. Meanwhile, the traditional BOW model only uses the information of the visual words within a single image and does not consider their distribution across the whole image set. The present invention introduces the TF-IDF statistic, common in information retrieval and text mining, which assesses how important a word is to a document within a document collection or corpus. After vector quantization, each salient feature extracted from a sampled image can be regarded as a visual word and each image as a document; introducing the TF-IDF weighting thus considers the distribution of visual words both in a single image and in the whole image set. If a visual word occurs frequently in one image but rarely in the others, it has good class discrimination ability and is suitable for classification. A latent semantic model is then used to compute the latent semantic features of all images, narrowing the semantic gap that exists in object image recognition and making the recognition of complex images easier.
The invention is characterized in that it is implemented in a computer according to the following steps in sequence:
In the robot training stage, training proceeds as follows:
Step (1): build the training database. The computer collects and inputs object images divided into N classes by object purpose, with class numbers 1~N and T training images per class, forming the training image set, denoted P_train, with a total of N × T = Q images;
Step (2): apply the scale-invariant feature transform (SIFT) algorithm as follows to compute the salient feature points of every training image in P_train, adding color information to generate salient features denoted HSV_SIFT, thereby forming the HSV_SIFT salient feature library O_HSV_SIFT of the training image set:
Step (2.1): each image in the training image set, denoted d_i(x, y) with i ∈ P_train and (x, y) the pixel coordinates, is convolved with the Gaussian kernels

G(x, y, σ_m) = 1/(2πσ_m²) · exp(−(x² + y²)/(2σ_m²)), m = 1...10

where σ_m is the scale factor, with initial value σ_0 = 1.6 and σ_m = ασ_{m−1}, α being the constant ratio between adjacent scales.
This yields a group of ten Gaussian pyramid spaces L_i(x, y, σ_m), each expressed as:

L_i(x, y, σ_m) = G(x, y, σ_m) * d_i(x, y), i ∈ P_train
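For illustration, a minimal Python sketch of step (2.1) follows, assuming a grayscale image already loaded as an array; the value of α is not legible in the original, so α = 2^(1/3), a common SIFT choice, is only a stand-in:

```python
import cv2
import numpy as np

def gaussian_pyramid(img, sigma0=1.6, alpha=2 ** (1 / 3), levels=10):
    """Build L_i(x, y, sigma_m) = G(x, y, sigma_m) * d_i(x, y) for m = 1..levels."""
    img = img.astype(np.float32)
    sigmas = [sigma0 * alpha ** m for m in range(1, levels + 1)]
    # ksize (0, 0) lets OpenCV derive the kernel size from each sigma
    return [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas], sigmas
```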
Step (2.2): adjacent Gaussian pyramid spaces are subtracted pairwise by the formula below, giving a group of nine Gaussian residual (difference-of-Gaussian) pyramid spaces, each denoted Dog_i(x, y, σ_{m−1}):

Dog_i(x, y, σ_{m−1}) = (G(x, y, ασ_{m−1}) − G(x, y, σ_{m−1})) * d_i(x, y) = L_i(x, y, ασ_{m−1}) − L_i(x, y, σ_{m−1})

Step (2.3): in the Gaussian residual pyramid space of each image i, every pixel is compared with its 8 adjacent pixels in the same layer and the 9 pixels at the corresponding positions in each of the adjacent layers above and below, 26 pixels in total; if the pixel is larger than, or smaller than, the values of all 26 of these pixels, it is taken as a feature point;
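Steps (2.2) and (2.3) can be sketched as follows (an illustrative implementation, assuming float pyramid layers); a candidate feature point must exceed, or fall below, all 26 of its neighbors:

```python
import numpy as np

def dog_spaces(pyramid):
    """Dog_i = L_i(sigma_m) - L_i(sigma_{m-1}) for adjacent pyramid levels."""
    return [pyramid[m + 1] - pyramid[m] for m in range(len(pyramid) - 1)]

def local_extrema(dogs):
    """Keep pixels larger or smaller than all 26 neighbours (8 + 9 + 9)."""
    keypoints = []
    for m in range(1, len(dogs) - 1):            # need a layer above and below
        below, cur, above = dogs[m - 1], dogs[m], dogs[m + 1]
        for y in range(1, cur.shape[0] - 1):
            for x in range(1, cur.shape[1] - 1):
                cube = np.stack([below[y - 1:y + 2, x - 1:x + 2],
                                 cur[y - 1:y + 2, x - 1:x + 2],
                                 above[y - 1:y + 2, x - 1:x + 2]])
                others = np.delete(cube.ravel(), 13)   # index 13 = centre pixel
                v = cur[y, x]
                if v > others.max() or v < others.min():
                    keypoints.append((x, y, m))
    return keypoints
```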
Step (2.4): select and retain salient feature points from the feature points obtained in step (2.3) as follows;
Step (2.4.1): the Gaussian residual pyramid Dog_i(x, y, σ_{m−1}) of each layer of each training image is represented by its Taylor expansion at each feature point obtained in step (2.3); keeping the first two terms gives Dog_i(X_max), where X = (x, y, σ_{m−1}), Dog_{i,0} is the first term of the Taylor expansion, and T denotes transposition:

Dog_i(X_max) = Dog_{i,0} + (1/2)(∂Dog_i(X)/∂X)^T X, where (∂Dog_i(X)/∂X)^T = (Dog_{i,x}, Dog_{i,y}, Dog_{i,σ_{m−1}})

If |Dog_i(X_max)| ≥ 0.03, the feature point is kept; otherwise it is filtered out;
Step (2.4.2): feature points lying at the edges in each layer of the residual pyramid are filtered by the following test: if

Tr(H_hess)² / Det(H_hess) ≥ (r + 1)² / r

(the standard SIFT edge-response test, with r a constant controlling the allowed ratio of principal curvatures, typically r = 10), the feature point is considered to lie on an image edge and is filtered out; otherwise it is kept. Tr(H_hess) is the trace of the Hessian matrix denoted H_hess, and Det(H_hess) is its determinant:

H_hess = | D_xx  D_xy |
         | D_xy  D_yy |

Tr(H_hess) = D_xx + D_yy
Det(H_hess) = D_xx·D_yy − (D_xy)²

where D_xx and D_yy are the second-order partial derivatives of the Taylor expansion in the x and y directions respectively, and D_xy is the mixed partial derivative in the x and y directions. The feature points retained by steps (2.3) and (2.4) are called salient feature points;
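A sketch of the two filters of step (2.4), taking the derivatives by central differences directly on the DoG layer; the contrast test here uses the layer value at the point rather than the interpolated Dog_i(X_max), and r = 10 is the conventional SIFT value, both assumptions:

```python
import numpy as np

def keep_keypoint(dog, x, y, contrast_thresh=0.03, r=10.0):
    """Return True if the point passes both the contrast and the edge test."""
    if abs(dog[y, x]) < contrast_thresh:                  # low-contrast filter
        return False
    # second-order partial derivatives by central differences
    d_xx = dog[y, x + 1] + dog[y, x - 1] - 2 * dog[y, x]
    d_yy = dog[y + 1, x] + dog[y - 1, x] - 2 * dog[y, x]
    d_xy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
            - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    tr = d_xx + d_yy                                      # Tr(H_hess)
    det = d_xx * d_yy - d_xy ** 2                         # Det(H_hess)
    if det <= 0 or tr ** 2 / det >= (r + 1) ** 2 / r:     # edge-response filter
        return False
    return True
```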
Step (2.5): determine the principal direction of each salient feature point from step (2.4) by the formulas below. The principal direction is the gradient direction corresponding to the highest gradient magnitude among the 8 pixels surrounding each salient feature point. The squared gradient magnitude e(x, y)² at each salient feature point is:

e(x, y)² = (L_i(x+1, y, σ_{m−1}) − L_i(x−1, y, σ_{m−1}))² + (L_i(x, y+1, σ_{m−1}) − L_i(x, y−1, σ_{m−1}))²

and the gradient direction θ(x, y) at each salient feature point is:

θ(x, y) = tan⁻¹((L_i(x, y+1, σ_{m−1}) − L_i(x, y−1, σ_{m−1})) / (L_i(x+1, y, σ_{m−1}) − L_i(x−1, y, σ_{m−1})))

With the gradient magnitude as ordinate and the gradient direction as abscissa of a gradient orientation histogram, the direction corresponding to the highest gradient magnitude represents the principal direction of each salient feature point;
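Step (2.5) can be sketched as below; the 36-bin orientation histogram and the 8-pixel radius are conventional SIFT choices assumed here, with e(x, y) and θ(x, y) computed as in the formulas above:

```python
import numpy as np

def principal_direction(L, x, y, radius=8):
    """Dominant gradient orientation around (x, y) in pyramid layer L."""
    mags, thetas = [], []
    for yy in range(y - radius, y + radius + 1):
        for xx in range(x - radius, x + radius + 1):
            dx = L[yy, xx + 1] - L[yy, xx - 1]
            dy = L[yy + 1, xx] - L[yy - 1, xx]
            mags.append(np.hypot(dx, dy))        # gradient magnitude e(x, y)
            thetas.append(np.arctan2(dy, dx))    # gradient direction theta(x, y)
    hist, edges = np.histogram(thetas, bins=36,
                               range=(-np.pi, np.pi), weights=mags)
    peak = int(hist.argmax())
    return (edges[peak] + edges[peak + 1]) / 2   # principal direction (radians)
```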
Step (2.6): generate the SIFT feature of each salient feature point. Each SIFT feature is formed from a 4 × 4 grid of 16 seed points, where each seed point in turn covers 4 × 4 image pixels and each pixel carries vector information for 8 directions, finally producing a 4 × 4 × 8 = 128-dimensional SIFT feature vector composed of gradient magnitudes and gradient directions;
Step (2.7): generate the color feature of each image d_i(x, y) as follows:
Step (2.7.1): convert every image d_i(x, y) from the RGB color space to the HSV color space by the formulas below, where:
H is the hue angle in degrees, H ∈ [0°, 360°),
S is the saturation, S ∈ [0, 1],
V is the brightness, V ∈ [0, 1],
R, G, B are in turn the red, green, and blue component values of a pixel.
Let max = max(R, G, B) and min = min(R, G, B); then:

H = 0°, if max = min
H = (60° × (G − B)/(max − min)) mod 360°, if max = R
H = 60° × (B − R)/(max − min) + 120°, if max = G
H = 60° × (R − G)/(max − min) + 240°, if max = B

S = 0 if max = 0, otherwise S = (max − min)/max,
V = max,
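A per-pixel sketch of the RGB → HSV conversion of step (2.7.1), with R, G, B normalized to [0, 1]:

```python
def rgb_to_hsv(r, g, b):
    """Convert one pixel; returns H in [0, 360), S and V in [0, 1]."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx
    s = 0.0 if mx == 0 else (mx - mn) / mx
    if mx == mn:                                 # gray pixel, hue undefined
        h = 0.0
    elif mx == r:
        h = (60 * (g - b) / (mx - mn)) % 360
    elif mx == g:
        h = 60 * (b - r) / (mx - mn) + 120
    else:                                        # mx == b
        h = 60 * (r - g) / (mx - mn) + 240
    return h, s, v
```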
Step (2.7.2): quantize the whole HSV color space into 72 colors as follows, generating a 72-dimensional color feature. The three components H, S, V of the HSV color space are quantized with different uniform intervals: the hue angle H is divided into 8 parts with value range 0-7, each part corresponding to one h value; the saturation S is divided into 3 parts with value range 0-2, each part corresponding to one s value; and the brightness V is divided into 3 parts with value range 0-2, each part corresponding to one v value. The quantized color index is then

ζ = 9h + 3s + v

so that ζ ∈ [0, 71]. The HSV color space of the image d_i(x, y) is thereby quantized into 72 main colors, generating a 72-dimensional color feature;
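Step (2.7.2) sketched in Python; the bin combination ζ = 9h + 3s + v used above maps the 8 × 3 × 3 grid onto the 72 color indices:

```python
import numpy as np

def hsv_histogram(hsv_pixels):
    """hsv_pixels: iterable of (H, S, V) with H in [0, 360), S, V in [0, 1]."""
    hist = np.zeros(72)
    for H, S, V in hsv_pixels:
        h = min(int(H / 45), 7)           # 8 hue bins
        s = min(int(S * 3), 2)            # 3 saturation bins
        v = min(int(V * 3), 2)            # 3 brightness bins
        hist[9 * h + 3 * s + v] += 1      # zeta = 9h + 3s + v
    return hist / max(hist.sum(), 1)      # normalised 72-d colour feature
```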
Step (2.8): fuse the SIFT features of the salient feature points with the color feature of the training image d_i(x, y). The color feature of d_i(x, y) is appended to the SIFT feature of each salient feature point in that training image, forming a 200-dimensional feature; each such feature is called an HSV_SIFT salient feature, also called a visual word. All the HSV_SIFT salient features in each image d_i(x, y), i ∈ P_train, of the training image set form the salient feature library U_i of that training image, and the salient feature libraries ΣU_i, i = 1, 2...Q, of all Q images in the training image set P_train together form an HSV_SIFT salient feature library O_HSV_SIFT;
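Step (2.8) amounts to concatenating each 128-dimensional SIFT descriptor of an image with that image's 72-dimensional color histogram; a sketch (function names are illustrative):

```python
import numpy as np

def hsv_sift_features(sift_descriptors, color_hist):
    """sift_descriptors: (n, 128) array; color_hist: (72,) array -> (n, 200)."""
    color = np.tile(color_hist, (len(sift_descriptors), 1))
    return np.hstack([sift_descriptors, color])   # 200-d HSV_SIFT visual words
```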
Step (3): construct a bag-of-words model BOW representing the HSV_SIFT salient feature library O_HSV_SIFT as follows. Every training image d_i(x, y) is expressed as a vector over all the HSV_SIFT salient features it contains, W_i = (w_{1,i}, w_{2,i}, w_{3,i}, ..., w_{j,i}, ..., w_{J,i}), where j ∈ [1, J] and J (commonly taken as 200) is the number of visual words, i.e. the number of HSV_SIFT salient features of this training image d_i(x, y), with

w_{j,i} = tf_{j,i} × log(Q / df_j)

where: Q is the number of training images in the training image set P_train,
tf_{j,i} is the number of times visual word j appears in the salient feature library U_i of training image d_i(x, y),
df_j is the number of occurrences of visual word j in the HSV_SIFT salient feature library O_HSV_SIFT;
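An illustrative sketch of the TF-IDF weighted BOW of step (3), assuming each image has already been quantized into a list of visual-word indices against a J-word dictionary; here df_j is taken as the number of images containing word j, the usual document-frequency reading of the formula:

```python
import numpy as np

def tfidf_bow(word_ids_per_image, vocab_size=200):
    """word_ids_per_image: list (one per image) of visual-word index lists."""
    Q = len(word_ids_per_image)
    tf = np.zeros((Q, vocab_size))
    for i, ids in enumerate(word_ids_per_image):
        for j in ids:
            tf[i, j] += 1                         # tf_{j,i}
    df = (tf > 0).sum(axis=0)                     # images containing word j
    idf = np.log(Q / np.maximum(df, 1))           # log(Q / df_j)
    return tf * idf                               # (Q, vocab_size) BOW matrix
```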
Step (4): compute successively, as follows, the latent semantic feature vector Z_train,i of every training image d_i(x, y) in the training image set P_train, where Z_train,i is the set of latent semantic topics of training image d_i(x, y), and K, the number of latent semantic topics, is related in value to the number of classes N of training images in P_train by K = N ± 5; a latent semantic topic is a conceptual statement of some concrete objects in each training image d_i(x, y):
Step (4.1): initialization. P(z_k | d_i) and P(w_{j,i} | z_k) are each assigned a random number between 0 and 1 as initial value, where P(z_k | d_i) is the distribution probability of latent semantic topic z_k, k ∈ [1, K], in training image d_i(x, y), and P(w_{j,i} | z_k) is the distribution probability of latent semantic topic z_k over the visual words;
Step (4.2): compute by the formula below the posterior probability P(z_k | d_i, w_{j,i}) that any visual word w_{j,i} in training image d_i generates latent semantic topic z_k:

P(z_k | d_i, w_{j,i}) = P(w_{j,i} | z_k) P(z_k | d_i) / Σ_{k=1..K} P(w_{j,i} | z_k) P(z_k | d_i)

Step (4.3): recompute the P(z_k | d_i) and P(w_{j,i} | z_k) of step (4.2) by the formulas below, where π(d_i, w_{j,i}) is the number of times visual word w_{j,i} occurs in training image d_i:

P(w_{j,i} | z_k) = Σ_{i=1..Q} π(d_i, w_{j,i}) P(z_k | d_i, w_{j,i}) / Σ_{j=1..J} Σ_{i=1..Q} π(d_i, w_{j,i}) P(z_k | d_i, w_{j,i})

P(z_k | d_i) = Σ_{j=1..J} π(d_i, w_{j,i}) P(z_k | d_i, w_{j,i}) / Σ_{j=1..J} π(d_i, w_{j,i})

Step (4.4): compute as follows the latent semantic feature vector Z_train,i of every training image d_i in P_train:

Z_train,i = {z_{1,i}, z_{2,i}, z_{3,i}, ..., z_{K,i}}, i ∈ P_train

Step (4.4.1): obtain from steps (4.2) and (4.3) the posterior probability P(z_k | d_i, w_{j,i}) of latent semantic topic z_k in each training image d_i;
Step (4.4.2): compute the likelihood function Likelihood_λ of the λ-th iteration:

Likelihood_λ = Σ_{i=1..Q} Σ_{j=1..J} π(d_i, w_{j,i}) log P(d_i, w_{j,i});

Step (4.4.3): iterate steps (4.2)-(4.4.2), judging the increase of the likelihood function between two adjacent iterations,

Likelihood_λ − Likelihood_{λ−1}

When this increase is less than the set threshold φ = 0.5, iteration stops, and the latent semantic feature vector Z_train,i of every image d_i(x, y) in the training image set P_train is obtained; otherwise iteration continues until the increase of the likelihood function is less than φ = 0.5;
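Steps (4.1)-(4.4.3) form a standard PLSA EM loop; the sketch below assumes a uniform P(d_i) = 1/Q in the likelihood and stops when the likelihood gain drops below φ:

```python
import numpy as np

def plsa(counts, K, phi=0.5, max_iter=500, seed=0):
    """counts[i, j] = pi(d_i, w_j); returns the (Q, K) latent feature matrix."""
    rng = np.random.default_rng(seed)
    Q, J = counts.shape
    p_z_d = rng.random((K, Q)); p_z_d /= p_z_d.sum(axis=0)   # P(z_k | d_i)
    p_w_z = rng.random((J, K)); p_w_z /= p_w_z.sum(axis=0)   # P(w_j | z_k)
    prev = -np.inf
    for _ in range(max_iter):
        # E-step: posterior P(z_k | d_i, w_j) proportional to P(w|z) P(z|d)
        post = p_w_z[None, :, :] * p_z_d.T[:, None, :]       # (Q, J, K)
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        weighted = counts[:, :, None] * post                 # pi * posterior
        # M-step re-estimates
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1).T
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + 1e-12
        # Likelihood = sum_i sum_j pi(d_i, w_j) log P(d_i, w_j), P(d_i) = 1/Q
        p_dw = (p_w_z[None, :, :] * p_z_d.T[:, None, :]).sum(axis=2) / Q
        ll = (counts * np.log(p_dw + 1e-12)).sum()
        if ll - prev < phi:                                  # stop on small gain
            break
        prev = ll
    return p_z_d.T                                           # rows are Z_train,i
```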
In the robot recognition stage, recognition proceeds as follows:
Step (5): compute by steps (1)~(4) the latent semantic feature vector Z_test,i' of every test image collected in real time in the test image set P_test:

Z_test,i' = {z_{1,i'}, z_{2,i'}, z_{3,i'}, ..., z_{K,i'}}, i' ∈ P_test

Step (6): use the following nearest-neighbor KNN classifier model to compute the distance Dis between the training image set P_train and the real-time test image set P_test over the latent semantic feature vectors; the class at minimum distance is the corresponding object class:

Dis = ‖Z_train,i − Z_test,i'‖.
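Steps (5) and (6) sketched as a 1-nearest-neighbor rule over the latent semantic feature vectors (reading the text's "nearest neighbor KNN classifier" as K = 1 is an assumption):

```python
import numpy as np

def classify(z_train, labels, z_test):
    """z_train: (Q, K) training vectors; labels: (Q,); z_test: (K,) query."""
    dis = np.linalg.norm(z_train - z_test, axis=1)   # Dis per training image
    return labels[dis.argmin()]                      # class at minimum distance
```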
In 100 object image recognition experiments, the average recognition rate was rec = 78.2%, with a highest recognition rate of 81.6% and a lowest of 75.3%; the average recognition time per image was 0.644 seconds. The recognition time and recognition rate satisfy the requirements of a mobile robot in the laboratory.
Description of the drawings
Fig. 1 is the flow chart for generating HSV_SIFT features;
Fig. 2 shows the construction of the Gaussian pyramid and the Gaussian residual pyramid;
Fig. 3 shows the SIFT seed points;
Fig. 4 shows the detection of maximum and minimum points;
Fig. 5 is the PLSA model framework diagram;
Embodiment
1. In the robot training stage, because robot recognition first requires building a training database, the pre-collected training images are divided into N classes by the purpose of the objects in the images, with class numbers 1~N and T images per class, so that the whole training image set P_train contains N × T = Q images;
2. For every image in the training set, the SIFT algorithm is used to compute the salient feature points of each training image and generate the HSV_SIFT salient features. The main steps are: detect the image feature points; retain the salient feature points; determine the principal direction of each salient feature point; generate the SIFT feature of each salient feature point; generate the image color feature; fuse the SIFT features of the salient feature points with the image color feature to generate the HSV_SIFT salient features; and finally build the HSV_SIFT salient feature library O_HSV_SIFT of the training image set P_train; see Fig. 1, Fig. 2, Fig. 3, and Fig. 4;
3. To remedy the deficiency of the histogram statistics in the traditional BOW model, the TF-IDF weighting statistic is introduced to make the HSV_SIFT features more discriminative, using the TF-IDF weight formula:

w_{j,i} = tf_{j,i} × log(Q / df_j)

where tf_{j,i} is the number of times visual word j appears in the image salient feature library U_i, i ∈ P_train, Q is the total number of training images, and df_j is the number of occurrences of visual word j in the HSV_SIFT salient feature library O_HSV_SIFT. Each image is finally expressed as W_i = (w_{1,i}, w_{2,i}, w_{3,i}, ..., w_{j,i}, ..., w_{J,i}), where j ∈ [1, J] and J is commonly 200; this vector is called the BOW description of the image;
4. After each image has been re-described by the BOW model in the step above, the PLSA method is used to compute the latent semantic topic model of the images. Latent semantic topics are concepts an image may contain; for example, "computer" may comprise topics such as mouse, keyboard, display, and case. The formula below expresses the "image-visual word" conditional probability:

P(d_i, w_{j,i}) = Σ_{k=1..K} P(d_i) P(w_{j,i} | z_k) P(z_k | d_i)

where w_{j,i} is the image BOW description from the step above, d_i is the i-th image, P(d_i) is the probability that image i appears in the whole training image set P_train, P(w_{j,i} | z_k) is the distribution probability of the latent semantic topics over the visual words, and P(z_k | d_i) is the latent semantic topic distribution probability of the image, with k ∈ [1, K]; K, the number of latent semantic topics, is related in value to the number of classes N of training images in P_train by K = N ± 5; the concrete PLSA model is shown in Fig. 5. The likelihood function of the λ-th iteration is

Likelihood_λ = Σ_{i=1..Q} Σ_{j=1..J} π(d_i, w_{j,i}) log P(d_i, w_{j,i})

and iteration stops when the increase Likelihood_λ − Likelihood_{λ−1} of the likelihood expectation is less than the set threshold φ = 0.5; otherwise iteration continues until the threshold φ is satisfied. Through the PLSA computation, the latent semantic feature vector Z_train,i = {z_{1,i}, z_{2,i}, z_{3,i}, ..., z_{K,i}}, i ∈ P_train, of every image in the training image set P_train is obtained;
5. In the robot recognition stage, the same method as above is applied to the test image set P_test collected in real time, computing the latent semantic feature vector Z_test,i' = {z_{1,i'}, z_{2,i'}, z_{3,i'}, ..., z_{K,i'}}, i' ∈ P_test, of every image;
6. The nearest-neighbor KNN (K-Nearest Neighbor) classifier model

Dis = ‖Z_train,i − Z_test,i'‖, i ∈ P_train, i' ∈ P_test

classifies objects according to the latent semantic feature vectors of the training image set and of the test images collected in real time by the robot.

Claims (1)

1. A probabilistic latent semantic model object image recognition method fusing color salient features, characterized in that it is implemented in a computer according to the following steps in sequence:
In the robot training stage, training proceeds as follows:
Step (1): build the training database. The computer collects and inputs object images divided into N classes by object purpose, with class numbers 1~N and T training images per class, forming the training image set, denoted P_train, with a total of N × T = Q images;
Step (2): apply the scale-invariant feature transform (SIFT) algorithm as follows to compute the salient feature points of every training image in P_train, adding color information to generate salient features denoted HSV_SIFT, thereby forming the HSV_SIFT salient feature library O_HSV_SIFT of the training image set:
Step (2.1): each image in the training image set, denoted d_i(x, y) with i ∈ P_train and (x, y) the pixel coordinates, is convolved with the Gaussian kernels

G(x, y, σ_m) = 1/(2πσ_m²) · exp(−(x² + y²)/(2σ_m²)), m = 1...10

where σ_m is the scale factor, with initial value σ_0 = 1.6 and σ_m = ασ_{m−1}, α being the constant ratio between adjacent scales.
This yields a group of ten Gaussian pyramid spaces L_i(x, y, σ_m), each expressed as:

L_i(x, y, σ_m) = G(x, y, σ_m) * d_i(x, y), i ∈ P_train
Step (2.2): adjacent Gaussian pyramid spaces are subtracted pairwise by the formula below, giving a group of nine Gaussian residual pyramid spaces, each denoted Dog_i(x, y, σ_{m−1}):

Dog_i(x, y, σ_{m−1}) = (G(x, y, ασ_{m−1}) − G(x, y, σ_{m−1})) * d_i(x, y) = L_i(x, y, ασ_{m−1}) − L_i(x, y, σ_{m−1})

Step (2.3): in the Gaussian residual pyramid space of each image i, every pixel is compared with its 8 adjacent pixels in the same layer and the 9 pixels at the corresponding positions in each of the adjacent layers above and below, 26 pixels in total; if the pixel is larger than, or smaller than, the values of all 26 of these pixels, it is taken as a feature point;
Step (2.4): select and retain salient feature points from the feature points obtained in step (2.3) as follows;
Step (2.4.1): the Gaussian residual pyramid Dog_i(x, y, σ_{m−1}) of each layer of each training image is represented by its Taylor expansion at each feature point obtained in step (2.3); keeping the first two terms gives Dog_i(X_max), where X = (x, y, σ_{m−1}), Dog_{i,0} is the first term of the Taylor expansion, and T denotes transposition:

Dog_i(X_max) = Dog_{i,0} + (1/2)(∂Dog_i(X)/∂X)^T X, where (∂Dog_i(X)/∂X)^T = (Dog_{i,x}, Dog_{i,y}, Dog_{i,σ_{m−1}})

If |Dog_i(X_max)| ≥ 0.03, the feature point is kept; otherwise it is filtered out;
Step (2.4.2): feature points lying at the edges in each layer of the residual pyramid are filtered by the following test: if

Tr(H_hess)² / Det(H_hess) ≥ (r + 1)² / r

(the standard SIFT edge-response test, with r a constant controlling the allowed ratio of principal curvatures, typically r = 10), the feature point is considered to lie on an image edge and is filtered out; otherwise it is kept. Tr(H_hess) is the trace of the Hessian matrix denoted H_hess, and Det(H_hess) is its determinant:

H_hess = | D_xx  D_xy |
         | D_xy  D_yy |

Tr(H_hess) = D_xx + D_yy
Det(H_hess) = D_xx·D_yy − (D_xy)²

where D_xx and D_yy are the second-order partial derivatives of the Taylor expansion in the x and y directions respectively, and D_xy is the mixed partial derivative in the x and y directions. The feature points retained by steps (2.3) and (2.4) are called salient feature points;
Step (2.5): determine the principal direction of each salient feature point from step (2.4) by the formulas below. The principal direction is the gradient direction corresponding to the highest gradient magnitude among the 8 pixels surrounding each salient feature point. The squared gradient magnitude e(x, y)² at each salient feature point is:

e(x, y)² = (L_i(x+1, y, σ_{m−1}) − L_i(x−1, y, σ_{m−1}))² + (L_i(x, y+1, σ_{m−1}) − L_i(x, y−1, σ_{m−1}))²

and the gradient direction θ(x, y) at each salient feature point is:

θ(x, y) = tan⁻¹((L_i(x, y+1, σ_{m−1}) − L_i(x, y−1, σ_{m−1})) / (L_i(x+1, y, σ_{m−1}) − L_i(x−1, y, σ_{m−1})))

With the gradient magnitude as ordinate and the gradient direction as abscissa of a gradient orientation histogram, the direction corresponding to the highest gradient magnitude represents the principal direction of each salient feature point;
Step (2.6): generate the SIFT feature of each salient feature point. Each SIFT feature is formed from a 4 × 4 grid of 16 seed points, where each seed point in turn covers 4 × 4 image pixels and each pixel carries vector information for 8 directions, finally producing a 4 × 4 × 8 = 128-dimensional SIFT feature vector composed of gradient magnitudes and gradient directions;
Step (2.7): generate the color feature of each image d_i(x, y) as follows:
Step (2.7.1): convert every image d_i(x, y) from the RGB color space to the HSV color space by the formulas below, where:
H is the hue angle in degrees, H ∈ [0°, 360°),
S is the saturation, S ∈ [0, 1],
V is the brightness, V ∈ [0, 1],
R, G, B are in turn the red, green, and blue component values of a pixel.
Let max = max(R, G, B) and min = min(R, G, B); then:

H = 0°, if max = min
H = (60° × (G − B)/(max − min)) mod 360°, if max = R
H = 60° × (B − R)/(max − min) + 120°, if max = G
H = 60° × (R − G)/(max − min) + 240°, if max = B

S = 0 if max = 0, otherwise S = (max − min)/max,
V = max,
Step (2.7.2): quantize the whole HSV color space into 72 colors as follows, generating a 72-dimensional color feature. The three components H, S, V of the HSV color space are quantized with different uniform intervals: the hue angle H is divided into 8 parts with value range 0-7, each part corresponding to one h value; the saturation S is divided into 3 parts with value range 0-2, each part corresponding to one s value; and the brightness V is divided into 3 parts with value range 0-2, each part corresponding to one v value. The quantized color index is then

ζ = 9h + 3s + v

so that ζ ∈ [0, 71]. The HSV color space of the image d_i(x, y) is thereby quantized into 72 main colors, generating a 72-dimensional color feature;
Step (2.8): fuse the SIFT features of the salient feature points with the color feature of the training image d_i(x, y). The color feature of d_i(x, y) is appended to the SIFT feature of each salient feature point in that training image, forming a 200-dimensional feature; each such feature is called an HSV_SIFT salient feature, also called a visual word. All the HSV_SIFT salient features in each image d_i(x, y), i ∈ P_train, of the training image set form the salient feature library U_i of that training image, and the salient feature libraries ΣU_i, i = 1, 2...Q, of all Q images in the training image set P_train together form an HSV_SIFT salient feature library O_HSV_SIFT;
Step (3): construct a bag-of-words model BOW representing the HSV_SIFT salient feature library O_HSV_SIFT as follows. Every training image d_i(x, y) is expressed as a vector over all the HSV_SIFT salient features it contains, W_i = (w_{1,i}, w_{2,i}, w_{3,i}, ..., w_{j,i}, ..., w_{J,i}), where j ∈ [1, J] and J (commonly taken as 200) is the number of visual words, i.e. the number of HSV_SIFT salient features of this training image d_i(x, y), with

w_{j,i} = tf_{j,i} × log(Q / df_j)

where: Q is the number of training images in the training image set P_train,
tf_{j,i} is the number of times visual word j appears in the salient feature library U_i of training image d_i(x, y),
df_j is the number of occurrences of visual word j in the HSV_SIFT salient feature library O_HSV_SIFT;
Step (4): compute successively, as follows, the latent semantic feature vector Z_train,i of every training image d_i(x, y) in the training image set P_train, where Z_train,i is the set of latent semantic topics of training image d_i(x, y), and K, the number of latent semantic topics, is related in value to the number of classes N of training images in P_train by K = N ± 5; a latent semantic topic is a conceptual statement of some concrete objects in each training image d_i(x, y):
Step (4.1): initialization. P(z_k | d_i) and P(w_{j,i} | z_k) are each assigned a random number between 0 and 1 as initial value, where P(z_k | d_i) is the distribution probability of latent semantic topic z_k, k ∈ [1, K], in training image d_i(x, y), and P(w_{j,i} | z_k) is the distribution probability of latent semantic topic z_k over the visual words;
Step (4.2): compute by the formula below the posterior probability P(z_k | d_i, w_{j,i}) that any visual word w_{j,i} in training image d_i generates latent semantic topic z_k:

P(z_k | d_i, w_{j,i}) = P(w_{j,i} | z_k) P(z_k | d_i) / Σ_{k=1..K} P(w_{j,i} | z_k) P(z_k | d_i)

Step (4.3): recompute the P(z_k | d_i) and P(w_{j,i} | z_k) of step (4.2) by the formulas below, where π(d_i, w_{j,i}) is the number of times visual word w_{j,i} occurs in training image d_i:

P(w_{j,i} | z_k) = Σ_{i=1..Q} π(d_i, w_{j,i}) P(z_k | d_i, w_{j,i}) / Σ_{j=1..J} Σ_{i=1..Q} π(d_i, w_{j,i}) P(z_k | d_i, w_{j,i})

P(z_k | d_i) = Σ_{j=1..J} π(d_i, w_{j,i}) P(z_k | d_i, w_{j,i}) / Σ_{j=1..J} π(d_i, w_{j,i})

Step (4.4): compute as follows the latent semantic feature vector Z_train,i of every training image d_i in P_train:

Z_train,i = {z_{1,i}, z_{2,i}, z_{3,i}, ..., z_{K,i}}, i ∈ P_train

Step (4.4.1): obtain from steps (4.2) and (4.3) the posterior probability P(z_k | d_i, w_{j,i}) of latent semantic topic z_k in each training image d_i;
Step (4.4.2): compute the likelihood function Likelihood_λ of the λ-th iteration:

Likelihood_λ = Σ_{i=1..Q} Σ_{j=1..J} π(d_i, w_{j,i}) log P(d_i, w_{j,i});

Step (4.4.3): iterate steps (4.2)-(4.4.2), judging the increase of the likelihood function between two adjacent iterations,

Likelihood_λ − Likelihood_{λ−1}

When this increase is less than the set threshold φ = 0.5, iteration stops, and the latent semantic feature vector Z_train,i of every image d_i(x, y) in the training image set P_train is obtained; otherwise iteration continues until the increase of the likelihood function is less than φ = 0.5;
In the robot recognition stage, recognition proceeds as follows:
Step (5): compute by steps (1)~(4) the latent semantic feature vector Z_test,i' of every test image collected in real time in the test image set P_test:

Z_test,i' = {z_{1,i'}, z_{2,i'}, z_{3,i'}, ..., z_{K,i'}}, i' ∈ P_test

Step (6): use the following nearest-neighbor KNN classifier model to compute the distance Dis between the training image set P_train and the real-time test image set P_test over the latent semantic feature vectors; the class at minimum distance is the corresponding object class:

Dis = ‖Z_train,i − Z_test,i'‖.
CN 201210062379 2012-03-12 2012-03-12 Probabilistic latent semantic model object image recognition method with fusion of significant characteristic of color Active CN102629328B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201210062379 CN102629328B (en) 2012-03-12 2012-03-12 Probabilistic latent semantic model object image recognition method with fusion of significant characteristic of color

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201210062379 CN102629328B (en) 2012-03-12 2012-03-12 Probabilistic latent semantic model object image recognition method with fusion of significant characteristic of color

Publications (2)

Publication Number Publication Date
CN102629328A CN102629328A (en) 2012-08-08
CN102629328B (en) 2013-10-16

Family

ID=46587586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201210062379 Active CN102629328B (en) 2012-03-12 2012-03-12 Probabilistic latent semantic model object image recognition method with fusion of significant characteristic of color

Country Status (1)

Country Link
CN (1) CN102629328B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819752B (en) * 2012-08-16 2015-04-22 北京理工大学 System and method for outdoor large-scale object recognition based on distributed inverted files
CN104008095A (en) * 2013-02-25 2014-08-27 武汉三际物联网络科技有限公司 Object recognition method based on semantic feature extraction and matching
CN103336835B (en) * 2013-07-12 2017-02-08 西安电子科技大学 Image retrieval method based on weight color-sift characteristic dictionary
CN103530633B (en) * 2013-10-09 2017-01-18 深圳大学 Semantic mapping method of local invariant feature of image and semantic mapping system
CN103712617B * 2013-12-18 2016-08-24 北京工业大学 A method for creating multi-layer semantic maps based on visual content
CN104008400A (en) * 2014-06-16 2014-08-27 河南科技大学 Object recognition method with combination of SIFT and BP network
CN104598885B * 2015-01-23 2017-09-22 西安理工大学 Text label detection and localization method in street view images
CN104680189B * 2015-03-15 2018-04-10 西安电子科技大学 Objectionable image detection method based on an improved bag-of-words model
US9880009B2 (en) * 2015-09-04 2018-01-30 Crown Equipment Corporation Industrial vehicle with feature-based localization and navigation
CN105550708B * 2015-12-14 2018-12-07 北京工业大学 Visual bag-of-words construction method based on improved SURF features
CN105427263A (en) * 2015-12-21 2016-03-23 努比亚技术有限公司 Method and terminal for realizing image registering
CN105718940B * 2016-01-15 2019-03-29 天津大学 Zero-shot image classification method based on inter-group factor analysis
CN105677898B (en) * 2016-02-02 2021-07-06 中国科学技术大学 Improved image searching method based on feature difference
CN107423739B (en) * 2016-05-23 2020-11-13 北京陌上花科技有限公司 Image feature extraction method and device
CN107301426B (en) * 2017-06-14 2020-06-30 大连海事大学 Multi-label clustering method for sole pattern images
CN108109162B (en) * 2018-01-08 2021-08-10 中国石油大学(华东) Multi-scale target tracking method using self-adaptive feature fusion
CN110245667A (en) * 2018-03-08 2019-09-17 中华映管股份有限公司 Object discrimination method and its device
CN108710608A * 2018-04-28 2018-10-26 四川大学 A method for generating a malicious domain name corpus based on contextual semantics
CN109978982B (en) * 2019-04-02 2023-04-07 广东电网有限责任公司 Point cloud rapid coloring method based on oblique image
CN111291839A (en) * 2020-05-09 2020-06-16 创新奇智(南京)科技有限公司 Sample data generation method, device and equipment
CN112686840A (en) * 2020-12-16 2021-04-20 广州大学 Method, system and device for detecting straw on surface of beverage packaging box and storage medium


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5828769A (en) * 1996-10-23 1998-10-27 Autodesk, Inc. Method and apparatus for recognition of objects via position and orientation consensus of local image encoding
CN101398846A (en) * 2008-10-23 2009-04-01 上海交通大学 Image, semantic and concept detection method based on partial color space characteristic
CN102184404A (en) * 2011-04-29 2011-09-14 汉王科技股份有限公司 Method and device for acquiring palm region in palm image

Also Published As

Publication number Publication date
CN102629328A (en) 2012-08-08

Similar Documents

Publication Publication Date Title
CN102629328B (en) Probabilistic latent semantic model object image recognition method with fusion of significant characteristic of color
Liang et al. Material based salient object detection from hyperspectral images
Narihira et al. Learning lightness from human judgement on relative reflectance
US9396412B2 (en) Machine-learnt person re-identification
CN103714181B (en) A hierarchical search method for specific persons
CN103679192B (en) Image scene type identification method based on covariance feature
Liu et al. Attribute-restricted latent topic model for person re-identification
CN102509104B (en) Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene
CN103927511B (en) Image recognition method based on difference feature description
CN106682108A (en) Video retrieval method based on multi-modal convolutional neural network
CN102622604B (en) Multi-angle human face detecting method based on weighting of deformable components
Shivakumara et al. A new multi-modal approach to bib number/text detection and recognition in Marathon images
CN106126585B (en) UAV image retrieval method combining quality grading with perceptual hash features
CN103824059A (en) Facial expression recognition method based on video image sequence
CN104240256A (en) Image saliency detection method based on hierarchical sparse modeling
Kobayashi et al. Three-way auto-correlation approach to motion recognition
CN104008375A (en) Integrated human face recognition method based on feature fusion
CN104268590A (en) Blind image quality evaluation method based on complementarity combination characteristics and multiphase regression
CN106909883A (en) A modular hand region detection method and device based on ROS
CN106909884A (en) A hand region detection method and device based on hierarchy and a deformable part model
Seidl et al. Automated petroglyph image segmentation with interactive classifier fusion
CN104715266A (en) Image characteristics extracting method based on combination of SRC-DP and LDA
CN103605993B (en) Image-to-video face recognition method based on scene-oriented discriminant analysis
CN105550642B (en) Gender identification method and system based on low-rank representation of multi-scale linear differential features
Wang et al. Fusion of multiple channel features for person re-identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190201

Address after: Room 501-1, Building 1, Yuntian Square, 30 Binhu Road, Wuqing Business District, Tianjin 301700

Patentee after: Maowao Technology (Tianjin) Co., Ltd.

Address before: No. 100 Pingleyuan, Chaoyang District, Beijing

Patentee before: Beijing University of Technology

TR01 Transfer of patent right