CN102368266B - Sorting method of unlabelled pictures for network search - Google Patents

Sorting method of unlabelled pictures for network search

Info

Publication number
CN102368266B
CN102368266B (grant) · CN102368266A (publication) · CN 201110322609 (application)
Authority
CN
China
Prior art keywords
picture
webpage
reference picture
query information
correlation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201110322609
Other languages
Chinese (zh)
Other versions
CN102368266A (en)
Inventor
徐颂华 (Xu Songhua)
江浩 (Jiang Hao)
刘智满 (Liu Zhiman)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN 201110322609 priority Critical patent/CN102368266B/en
Publication of CN102368266A publication Critical patent/CN102368266A/en
Application granted granted Critical
Publication of CN102368266B publication Critical patent/CN102368266B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for ranking unlabelled pictures in web search, comprising the following steps: (1) collecting a number of reference pictures according to the query information; (2) calculating the relevance of each reference picture to the query information; (3) calculating the similarity between the reference pictures; (4) revising the relevance values according to the similarities; and (5) ranking the unlabelled pictures according to the revised relevance values. The method applies techniques from artificial intelligence: it mines both web-page and image search results for the query information, and estimates the relevance of an unlabelled picture to the query from the similarities between reference pictures. Unlabelled pictures can therefore be ranked accurately, so that users can retrieve pictures that carry no text annotation, with improved search quality.

Description

A ranking method for unlabelled pictures in web search
Technical field
The invention belongs to the field of ranking techniques for web search, and specifically relates to a method for ranking unlabelled pictures in web search.
Background art
As early as the 1970s, researchers in various countries began to study how to manage image data effectively. The technique adopted at the time was mainly text-based image retrieval (TBIR): a series of keywords is entered manually for each image, and a link is established between the image's storage path and its keywords, so that image retrieval in fact becomes text retrieval. This approach is simple and can be implemented with a traditional relational database, but it also has drawbacks: the workload of entering keywords manually is excessive, labelling a massive image collection by hand is impractical, and manual labels inevitably carry personal subjectivity and uncertainty, since different people may understand the same image differently.
At the beginning of this century, the automatic collection and indexing of web information, as an important part of search engines, was studied in depth, and search engines such as Google and Yahoo successively released image search functions based on the TBIR technique. The image labels gathered by such automatic indexing are obviously very coarse and of low accuracy, sometimes even wrong, so many irrelevant pictures are retrieved. At the same time, for the many pictures without text annotation that do match the user's query, the search engine is unable to rank and display them accurately.
To overcome the limitations of text-based image retrieval, content-based image retrieval (CBIR) has developed greatly since the 1990s. CBIR retrieves images on the basis of image processing, using basic visual features such as the colour, shape, texture and contour of the image and the spatial relationships of its objects. Unlike TBIR, it exploits the objective visual features contained in the image itself; the extraction and storage of image features can be performed automatically by computer, which raises processing speed and facilitates the automation of image indexing and retrieval. Many mature systems based on CBIR are already in operation, such as MIT's Photobook and the MARS system of the University of Illinois (UIUC).
In practice, however, a user usually has only a subjective description of the desired image in advance; what the user needs is a query on the meaning of the image rather than on features such as colour, texture and shape. The meaning of an image is its high-level semantics, which involves human understanding of the image content. The CBIR technique is therefore suitable only for search in restricted environments such as scientific databases, and not for global search environments such as the internet.
Summary of the invention
In view of the above deficiencies of the prior art, the invention provides a method for ranking unlabelled pictures in web search, which ranks unlabelled pictures accurately according to the query information, so that users can retrieve unlabelled pictures with good search quality.
A method for ranking unlabelled pictures in web search comprises the following steps:
(1) Perform an image search with a web search engine according to the given query information, and collect the top M pictures of the result ranking as reference pictures;
(2) Calculate the relevance of each reference picture to the query information;
(3) Calculate the similarity between the reference pictures;
(4) Revise the relevance of each reference picture to the query information according to the similarities between reference pictures, obtaining a revised relevance for each reference picture;
(5) Rank all unlabelled pictures corresponding to the query information according to the revised relevance of each reference picture.
In step (2), the relevance of each reference picture to the query information is calculated as follows:
a. Perform a web-page search with a web search engine according to the query information, and collect the top N web pages of the result ranking as reference web pages, denoted D_1~D_N;
b. Denote by w any word that occurs in the N reference web pages D_1~D_N. Count the total occurrence frequency t(w) of w in D_1~D_N by formula (1), and then calculate the TF-IDF (term frequency-inverse document frequency) coefficient ot(w) of w by formula (2):
t(w) = y_1/m_1 + y_2/m_2 + ... + y_N/m_N    (1)
ot(w) = t(w) · ln(1 + N/n_w)    (2)
where n_w is the number of web pages among D_1~D_N that contain the word w, y_i is the number of occurrences of w in reference web page D_i, and m_i is the total number of words in D_i, i = 1, 2, ..., N;
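Formulas (1)-(2) can be sketched as follows, assuming each reference web page is already given as a list of word tokens (the tokenisation procedure itself is not specified in the text):

```python
import math

def tfidf_coefficients(pages):
    """pages: list of N token lists, one per reference web page D_1..D_N.
    Returns {word: ot(w)} per formulas (1)-(2)."""
    N = len(pages)
    t = {}    # t(w): sum over pages of y_i / m_i
    n = {}    # n_w: number of pages containing w
    for tokens in pages:
        m = len(tokens)              # m_i: total words in this page
        counts = {}
        for w in tokens:
            counts[w] = counts.get(w, 0) + 1
        for w, y in counts.items():
            t[w] = t.get(w, 0.0) + y / m
            n[w] = n.get(w, 0) + 1
    # ot(w) = t(w) * ln(1 + N / n_w)
    return {w: t[w] * math.log(1 + N / n[w]) for w in t}
```

The same routine applies unchanged to the picture web pages of step c, with M in place of N.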
c. For each of the M reference pictures G_1~G_M, denote by GD_j the picture web page corresponding to reference picture G_j. Denote by gw any word that occurs in the M picture web pages GD_1~GD_M. Count the total occurrence frequency t_g(gw) of gw in GD_1~GD_M by formula (3), and then calculate the TF-IDF coefficient ot_g(gw) of gw by formula (4):
t_g(gw) = y_{g,1}/m_{g,1} + y_{g,2}/m_{g,2} + ... + y_{g,M}/m_{g,M}    (3)
ot_g(gw) = t_g(gw) · ln(1 + M/n_{g,gw})    (4)
where n_{g,gw} is the number of web pages among GD_1~GD_M that contain the word gw, y_{g,j} is the number of occurrences of gw in picture web page GD_j, and m_{g,j} is the total number of words in GD_j, j = 1, 2, ..., M;
d. For every word w occurring in the N reference web pages D_1~D_N and every word gw occurring in the M picture web pages GD_1~GD_M, calculate their pairwise semantic relatedness with a semantic relatedness algorithm, obtaining the semantic relatedness matrix TH(Q). Each semantic relatedness value corresponds to one element of TH(Q), which is a U(Q) × V(Q) matrix, where U(Q) is the total number of distinct words in D_1~D_N, V(Q) is the total number of distinct words in GD_1~GD_M, and Q is the query information;
e. From the semantic relatedness matrix TH(Q), calculate the relevance r(G_j, Q) between each reference picture G_j and the query information Q by the formula r(G_j, Q) = OT(Q) × TH(Q) × OT_g(Q)^T, where OT(Q) = [ot(w_1), ..., ot(w_{U(Q)})] and OT_g(Q) = [ot_g(gw_1), ..., ot_g(gw_{V(Q)})].
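The product of step e can be sketched as follows. The semantic relatedness algorithm itself is not specified, so TH(Q) is treated here as a precomputed matrix; OT_g(Q) is read as a row vector whose transpose closes the product dimensionally (1×U times U×V times V×1):

```python
def relevance(OT, TH, OTg):
    """r = OT (1xU) * TH (UxV) * OTg^T (Vx1), as in step e; plain nested lists."""
    U, V = len(OT), len(OTg)
    # intermediate row vector OT * TH, of length V
    mid = [sum(OT[u] * TH[u][v] for u in range(U)) for v in range(V)]
    # dot with OTg to obtain the scalar relevance
    return sum(mid[v] * OTg[v] for v in range(V))
```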
In step (3), the similarity between reference pictures is calculated as follows:
a. For each reference picture G_j among G_1~G_M, extract its local visual features with a visual feature extraction algorithm. Each local visual feature v is a 2-tuple v = (C, Des), where C is the circular region of G_j covered by v and Des is the feature vector of that region, a vector of 128 dimensions;
b. Count the number of occurrences of each distinct local visual feature in G_1~G_M, and keep only those features whose number of occurrences exceeds a first threshold, typically 5~20. Two local visual features are regarded as identical if the Euclidean distance between their feature vectors is less than 0.01;
c. For the retained local visual features, judge connectivity between every pair: let two retained local visual features be vi = (Ci, Desi) and vj = (Cj, Desj). vi and vj are connected if they lie in the same picture and Ci intersects Cj, or if there exists another retained local visual feature vk = (Ck, Desk) that lies in the same picture as vi and vj and whose region Ck intersects both Ci and Cj; otherwise they are not connected;
d. Group the retained local visual features into connected components according to this connectivity; each connected component is recorded as a visual feature cluster. A single local visual feature not connected to any other is also recorded as a visual feature cluster;
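Steps c-d amount to computing connected components over the retained features; a sketch using union-find, assuming a pairwise predicate `connected(i, j)` built from the region-intersection rule above:

```python
def feature_clusters(n, connected):
    """Group features 0..n-1 into clusters (connected components).
    connected(i, j) -> True if features i and j satisfy the connectivity rule."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if connected(i, j):
                parent[find(i)] = find(j)  # union the two components
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())
```

A feature with no connections ends up as a singleton cluster, matching step d.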
e. Count the number of occurrences of each distinct visual feature cluster in G_1~G_M, and keep only those clusters whose number of occurrences exceeds a second threshold, typically 5~20. Two visual feature clusters are regarded as identical if they contain exactly the same local visual features;
f. For any two reference pictures G_i, G_j among G_1~G_M, calculate the similarity s(G_i, G_j) between them by formula (5):
s(G_i, G_j) = Σ_{VP ∈ G_i ∩ G_j} [ ||VP||^2 / (1 + ||VP||) ] / NVP(G_i, G_j)    (5)
where the sum runs over every retained visual feature cluster VP that appears in both G_i and G_j, ||VP|| denotes the number of local visual features contained in VP, and NVP(G_i, G_j) is the number of distinct visual feature clusters in G_i and G_j.
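A sketch of formula (5), assuming each picture is represented by the set of retained cluster ids it contains, with cluster sizes given separately; NVP is read here as the number of distinct clusters in the union of the two pictures (an assumption, since the text only says "in G_i and G_j"):

```python
def similarity(clusters_i, clusters_j, size):
    """s(G_i, G_j) per formula (5).
    clusters_i, clusters_j: sets of cluster ids appearing in each picture.
    size[c]: number of local visual features in cluster c, i.e. ||VP||."""
    nvp = len(clusters_i | clusters_j)   # NVP(G_i, G_j): distinct clusters overall
    if nvp == 0:
        return 0.0
    shared = clusters_i & clusters_j     # clusters appearing in both pictures
    return sum(size[c] ** 2 / (1 + size[c]) for c in shared) / nvp
```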
In step (4), the relevance of each reference picture to the query information is revised as follows:
a. Build the similarity matrix S(Q) from the similarities between reference pictures; each similarity value corresponds to one element of S(Q), which is an M × M matrix;
b. Revise the relevance of each reference picture to the query information by formula (6):
R'(Q) = (I + b·S(Q) + b^2·S^2(Q)/2! + b^3·S^3(Q)/3! + b^4·S^4(Q)/4!) · R(Q)    (6)
where I is the M × M identity matrix, b is a correction factor, typically 0.3, and R(Q) = [r(G_1, Q), ..., r(G_M, Q)]. R'(Q) is the revised form of R(Q); each element of R'(Q) is the revised relevance of one reference picture to the query information.
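Formula (6) is the fourth-order truncation of the matrix exponential exp(b·S(Q)) applied to R(Q); a sketch with NumPy, treating R as a vector of length M:

```python
import numpy as np

def revise_relevance(S, R, b=0.3, order=4):
    """R' = (I + bS + (bS)^2/2! + ... + (bS)^order/order!) R, per formula (6)."""
    M = S.shape[0]
    term = np.eye(M)                  # (bS)^0 / 0!
    acc = np.eye(M)
    for k in range(1, order + 1):
        term = term @ (b * S) / k     # (bS)^k / k!, built incrementally
        acc += term
    return acc @ R
```

Because the series is truncated at order 4 rather than summed fully, each revised relevance mixes in the relevance of similar reference pictures without requiring a full matrix exponential.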
In step (5), the unlabelled pictures corresponding to the query information are ranked as follows:
a. For each distinct visual feature cluster vp in G_1~G_M, calculate its TF-IDF coefficient ot_vp(G_j, vp) with respect to each reference picture G_j by formula (7):
ot_vp(G_j, vp) = (1 + ln(t_{j,vp})) · ln(1 + M/m_vp)    (7)
where t_{j,vp} is the number of times vp occurs in G_j, and m_vp is the number of pictures among G_1~G_M that contain vp;
b. For each distinct visual feature cluster vp in G_1~G_M, calculate the relevance rel(vp, Q) of vp to the query information Q by formula (8):
rel(vp, Q) = Σ_{j=1}^{M} r'(G_j, Q) · ot_vp(G_j, vp) / Σ_{vp' ∈ G_j} ot_vp(G_j, vp')    (8)
where r'(G_j, Q) is the revised relevance of reference picture G_j to the query information;
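Formulas (7)-(8) can be sketched together as follows, assuming each reference picture is given as a dict mapping cluster ids to occurrence counts:

```python
import math

def cluster_relevance(pics, r_rev, vp):
    """rel(vp, Q) per formulas (7)-(8).
    pics: list of M dicts, pics[j][c] = t_{j,c}, occurrences of cluster c in G_j.
    r_rev: list of M revised relevance values r'(G_j, Q)."""
    M = len(pics)
    m = {}                               # m_c: number of pictures containing cluster c
    for counts in pics:
        for c in counts:
            m[c] = m.get(c, 0) + 1
    def ot(j, c):                        # formula (7); 0 if c does not occur in G_j
        t = pics[j].get(c, 0)
        if t == 0:
            return 0.0
        return (1 + math.log(t)) * math.log(1 + M / m[c])
    total = 0.0
    for j in range(M):
        denom = sum(ot(j, c) for c in pics[j])   # sum over vp' in G_j
        if denom > 0:
            total += r_rev[j] * ot(j, vp) / denom
    return total
```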
c. For each unlabelled picture P_x in the set of unlabelled pictures corresponding to the query information Q, calculate the relevance Rel(P_x, Q) of P_x to Q by formula (9):
Rel(P_x, Q) = Σ_{vp ∈ P_x} rel(vp, Q)    (9)
d. Rank the unlabelled pictures in the set in descending order of their relevance to the query information Q; the result is the display order of unlabelled pictures retrieved for query Q.
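The final ranking of steps c-d can be sketched as follows, assuming each unlabelled picture is represented by the set of visual feature clusters detected in it and `rel` maps cluster ids to the values from formula (8):

```python
def rank_unlabelled(pictures, rel):
    """pictures: {name: set of cluster ids}; rel: {cluster id: rel(vp, Q)}.
    Returns picture names sorted by Rel(P_x, Q), descending, per formula (9)."""
    score = {name: sum(rel.get(c, 0.0) for c in clusters)
             for name, clusters in pictures.items()}
    return sorted(score, key=score.get, reverse=True)
```

A cluster never seen in the reference pictures contributes nothing, so pictures sharing many highly relevant clusters with the reference set rise to the top.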
The invention applies techniques from artificial intelligence: it mines both web-page and image search results for the query information, and estimates the relevance of an unlabelled picture to the query from the similarities between reference pictures, so that unlabelled pictures can be ranked accurately, users can retrieve unlabelled pictures, and the search quality is good.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the ranking method of the invention.
Fig. 2 shows test data of the invention compared with the LXTJ and FSC methods on the MIRFLICKR image database.
Fig. 3 shows test curves of the invention compared with the LXTJ and FSC methods on the CALTECH101 image database.
Detailed description
To describe the invention more specifically, the ranking method of the invention is explained in detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, a method for ranking unlabelled pictures in web search comprises the following steps:
(1) Collect a number of reference pictures according to the query information.
The query keyword Q given by the user is submitted to a third-party image search engine (such as Google image search), and the top 100 pictures of the result ranking are collected from the search results as reference pictures, denoted G_1~G_100.
(2) Calculate the relevance of each reference picture to the query keyword.
1. Submit the query keyword Q to a third-party web search engine (such as Google web search), and collect the top 200 web pages of the result ranking as reference web pages, denoted D_1~D_200.
2. Denote by w any word that occurs in the 200 reference web pages D_1~D_200. Count the total occurrence frequency t(w) of w in D_1~D_200 by formula (1), and then calculate the TF-IDF coefficient ot(w) of w by formula (2):
t(w) = y_1/m_1 + y_2/m_2 + ... + y_200/m_200    (1)
ot(w) = t(w) · ln(1 + 200/n_w)    (2)
where n_w is the number of web pages among D_1~D_200 that contain the word w, y_i is the number of occurrences of w in reference web page D_i, and m_i is the total number of words in D_i, i = 1, 2, ..., 200.
3. For each of the 100 reference pictures G_1~G_100, denote by GD_j the picture web page corresponding to reference picture G_j. Denote by gw any word that occurs in the 100 picture web pages GD_1~GD_100. Count the total occurrence frequency t_g(gw) of gw in GD_1~GD_100 by formula (3), and then calculate the TF-IDF coefficient ot_g(gw) of gw by formula (4):
t_g(gw) = y_{g,1}/m_{g,1} + y_{g,2}/m_{g,2} + ... + y_{g,100}/m_{g,100}    (3)
ot_g(gw) = t_g(gw) · ln(1 + 100/n_{g,gw})    (4)
where n_{g,gw} is the number of web pages among GD_1~GD_100 that contain the word gw, y_{g,j} is the number of occurrences of gw in picture web page GD_j, and m_{g,j} is the total number of words in GD_j, j = 1, 2, ..., 100.
4. For every word w occurring in the 200 reference web pages D_1~D_200 and every word gw occurring in the 100 picture web pages GD_1~GD_100, calculate their pairwise semantic relatedness with a semantic relatedness algorithm, obtaining the semantic relatedness matrix TH(Q). Each semantic relatedness value corresponds to one element of TH(Q), which is a U(Q) × V(Q) matrix, where U(Q) is the total number of distinct words in D_1~D_200, V(Q) is the total number of distinct words in GD_1~GD_100, and Q is the query keyword.
5. From the semantic relatedness matrix TH(Q), calculate the relevance r(G_j, Q) between each reference picture G_j and the query keyword Q by the formula r(G_j, Q) = OT(Q) × TH(Q) × OT_g(Q)^T, where OT(Q) = [ot(w_1), ..., ot(w_{U(Q)})] and OT_g(Q) = [ot_g(gw_1), ..., ot_g(gw_{V(Q)})].
(3) Calculate the similarity between the reference pictures.
1. For each reference picture G_j among the 100 reference pictures G_1~G_100, extract its local visual features with a visual feature extraction algorithm. Each local visual feature v is a 2-tuple v = (C, Des), where C is the circular region of G_j covered by v and Des is the feature vector of that region, a vector of 128 dimensions.
2. Count the number of occurrences of each distinct local visual feature in G_1~G_100, and keep only those features whose number of occurrences exceeds 10. Two local visual features are regarded as identical if the Euclidean distance between their feature vectors is less than 0.01.
3. For the retained local visual features, judge connectivity between every pair: let two retained local visual features be vi = (Ci, Desi) and vj = (Cj, Desj). vi and vj are connected if they lie in the same picture and Ci intersects Cj, or if there exists another retained local visual feature vk = (Ck, Desk) that lies in the same picture as vi and vj and whose region Ck intersects both Ci and Cj; otherwise they are not connected.
4. Group the retained local visual features into connected components according to this connectivity; each connected component is recorded as a visual feature cluster. A single local visual feature not connected to any other is also recorded as a visual feature cluster.
5. Count the number of occurrences of each distinct visual feature cluster in G_1~G_100, and keep only those clusters whose number of occurrences exceeds 10. Two visual feature clusters are regarded as identical if they contain exactly the same local visual features.
6. For any two reference pictures G_i, G_j among G_1~G_100, calculate the similarity s(G_i, G_j) between them by formula (5):
s(G_i, G_j) = Σ_{VP ∈ G_i ∩ G_j} [ ||VP||^2 / (1 + ||VP||) ] / NVP(G_i, G_j)    (5)
where the sum runs over every retained visual feature cluster VP that appears in both G_i and G_j, ||VP|| denotes the number of local visual features contained in VP, and NVP(G_i, G_j) is the number of distinct visual feature clusters in G_i and G_j.
(4) Revise the relevance values according to the similarities.
1. Build the similarity matrix S(Q) from the similarities between reference pictures; each similarity value corresponds to one element of S(Q), which is a 100 × 100 matrix.
2. Revise the relevance of each reference picture to the query keyword by formula (6):
R'(Q) = (I + b·S(Q) + b^2·S^2(Q)/2! + b^3·S^3(Q)/3! + b^4·S^4(Q)/4!) · R(Q)    (6)
where I is the 100 × 100 identity matrix, b is 0.3, and R(Q) = [r(G_1, Q), ..., r(G_100, Q)]. R'(Q) is the revised form of R(Q); each element of R'(Q) is the revised relevance of one reference picture to the query keyword.
(5) Rank the unlabelled pictures according to the revised relevance values.
1. For each distinct visual feature cluster vp in G_1~G_100, calculate its TF-IDF coefficient ot_vp(G_j, vp) with respect to each reference picture G_j by formula (7):
ot_vp(G_j, vp) = (1 + ln(t_{j,vp})) · ln(1 + 100/m_vp)    (7)
where t_{j,vp} is the number of times vp occurs in G_j, and m_vp is the number of pictures among G_1~G_100 that contain vp.
2. For each distinct visual feature cluster vp in G_1~G_100, calculate the relevance rel(vp, Q) of vp to the query keyword Q by formula (8):
rel(vp, Q) = Σ_{j=1}^{100} r'(G_j, Q) · ot_vp(G_j, vp) / Σ_{vp' ∈ G_j} ot_vp(G_j, vp')    (8)
where r'(G_j, Q) is the revised relevance of reference picture G_j to the query keyword.
3. For each unlabelled picture P_x in the set of unlabelled pictures corresponding to the query keyword Q, calculate the relevance Rel(P_x, Q) of P_x to Q by formula (9):
Rel(P_x, Q) = Σ_{vp ∈ P_x} rel(vp, Q)    (9)
4. Rank the unlabelled pictures in the set in descending order of their relevance to the query keyword Q; the result is the display order of unlabelled pictures retrieved for query Q.
The method of this embodiment (Ours) and two prior picture ranking methods for web search, LXTJ (the method described in the 2011 paper "Textual query of personal photos facilitated by large-scale web data") and FSC (the method described in the 2004 paper "A bootstrapping framework for annotating and retrieving WWW images"), were each applied to the MIRFLICKR and CALTECH101 image databases and tested with searches. The three methods were compared by the NDCG (Normalized Discounted Cumulative Gain) metric; the detailed test data are shown in Fig. 2 and Fig. 3. A larger NDCG value indicates better search quality, and it can be seen that the search quality of this embodiment is superior to that of LXTJ and FSC.
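The NDCG metric used in the comparison can be sketched as follows; this is a standard formulation, and the exact gain and discount variant used in the tests is not stated in the text:

```python
import math

def ndcg(relevances, k=None):
    """NDCG@k for a ranked list of graded relevance values.
    Uses DCG = sum of rel_i / log2(i + 1) over ranks i = 1..k (a common variant)."""
    if k is None:
        k = len(relevances)
    def dcg(rels):
        return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))
    ideal = dcg(sorted(relevances, reverse=True))  # DCG of the perfect ordering
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```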

Claims (1)

1. A method for ranking unlabelled pictures in web search, comprising the following steps:
(1) performing an image search with a web search engine according to the given query information, and collecting the top M pictures of the result ranking as reference pictures;
(2) calculating the relevance of each reference picture to the query information, specifically as follows:
a. performing a web-page search with a web search engine according to the query information, and collecting the top N web pages of the result ranking as reference web pages;
b. denoting by w any word that occurs in the N reference web pages, counting the total occurrence frequency of w in the N reference web pages by formula (1), and then calculating the TF-IDF coefficient of w by formula (2);
t(w) = y_1/m_1 + y_2/m_2 + ... + y_N/m_N    (1)
ot(w) = t(w) · ln(1 + N/n_w)    (2)
wherein n_w is the number of web pages among the N reference web pages D_1~D_N that contain the word w, y_i is the number of occurrences of w in reference web page D_i, m_i is the total number of words in D_i, t(w) is the total occurrence frequency of w in the N reference web pages, ot(w) is the TF-IDF coefficient of w, and i = 1, 2, ..., N;
c. denoting by gw any word that occurs in the M picture web pages, a picture web page being the web page corresponding to a reference picture, counting the total occurrence frequency of gw in the M picture web pages by formula (3), and then calculating the TF-IDF coefficient of gw by formula (4);
t_g(gw) = y_{g,1}/m_{g,1} + y_{g,2}/m_{g,2} + ... + y_{g,M}/m_{g,M}    (3)
ot_g(gw) = t_g(gw) · ln(1 + M/n_{g,gw})    (4)
wherein n_{g,gw} is the number of web pages among the M picture web pages GD_1~GD_M that contain the word gw, y_{g,j} is the number of occurrences of gw in picture web page GD_j, m_{g,j} is the total number of words in GD_j, t_g(gw) is the total occurrence frequency of gw in the M picture web pages, ot_g(gw) is the TF-IDF coefficient of gw, and j = 1, 2, ..., M;
d. for every word w occurring in the N reference web pages and every word gw occurring in the M picture web pages, calculating their pairwise semantic relatedness with a semantic relatedness algorithm, thereby obtaining the semantic relatedness matrix;
e. from the semantic relatedness matrix, calculating the relevance between each reference picture and the query information by the formula r(G_j, Q) = OT(Q) × TH(Q) × OT_g(Q)^T; wherein OT(Q) = [ot(w_1), ..., ot(w_{U(Q)})], OT_g(Q) = [ot_g(gw_1), ..., ot_g(gw_{V(Q)})], TH(Q) is the semantic relatedness matrix, r(G_j, Q) is the relevance between reference picture G_j and the query information, U(Q) is the total number of distinct words in the N reference web pages, and V(Q) is the total number of distinct words in the M picture web pages;
(3) calculating the similarity between the reference pictures, specifically as follows:
a. for each of the M reference pictures, extracting its local visual features with a visual feature extraction algorithm;
b. counting the number of occurrences of each distinct local visual feature in the M reference pictures, and keeping only those features whose number of occurrences exceeds a first threshold;
c. for the retained local visual features, judging connectivity between every pair;
d. grouping the retained local visual features into connected components according to this connectivity, each connected component being recorded as a visual feature cluster, a single local visual feature not connected to any other also being recorded as a visual feature cluster;
e. counting the number of occurrences of each distinct visual feature cluster in the M reference pictures, and keeping only those clusters whose number of occurrences exceeds a second threshold;
f. for any two reference pictures among the M reference pictures, calculating the similarity between them by formula (5);
s(G_i, G_j) = Σ_{VP ∈ G_i ∩ G_j} [ ||VP||^2 / (1 + ||VP||) ] / NVP(G_i, G_j)    (5)
wherein s(G_i, G_j) is the similarity between any two reference pictures G_i and G_j, the sum runs over every retained visual feature cluster VP that appears in both G_i and G_j, ||VP|| denotes the number of local visual features contained in VP, and NVP(G_i, G_j) is the number of distinct visual feature clusters in G_i and G_j;
(4) revising the relevance of each reference picture to the query information according to the similarities between reference pictures, obtaining a revised relevance for each reference picture, specifically as follows:
a. building the similarity matrix from the similarities between reference pictures;
b. revising the relevance of each reference picture to the query information by formula (6);
R'(Q) = (I + b·S(Q) + b^2·S^2(Q)/2! + b^3·S^3(Q)/3! + b^4·S^4(Q)/4!) · R(Q)    (6)
wherein I is the M × M identity matrix, b is a correction factor, S(Q) is the similarity matrix, and R(Q) = [r(G_1, Q), ..., r(G_M, Q)]; R'(Q) is the revised form of R(Q), and each element of R'(Q) is the revised relevance of one reference picture to the query information;
(5) ranking all unlabelled pictures corresponding to the query information according to the revised relevance of each reference picture, specifically as follows:
a. for each distinct visual feature cluster in the M reference pictures, calculating its TF-IDF coefficient with respect to each reference picture by formula (7);
ot_vp(G_j, vp) = (1 + ln(t_{j,vp})) · ln(1 + M/m_vp)    (7)
wherein vp is any distinct visual feature cluster, t_{j,vp} is the number of times vp occurs in reference picture G_j, m_vp is the number of pictures among the M reference pictures that contain vp, and ot_vp(G_j, vp) is the TF-IDF coefficient of vp with respect to G_j;
b. for each distinct visual feature cluster in the M reference pictures, calculating its relevance to the query information by formula (8);
rel(vp, Q) = Σ_{j=1}^{M} r'(G_j, Q) · ot_vp(G_j, vp) / Σ_{vp' ∈ G_j} ot_vp(G_j, vp')    (8)
wherein r'(G_j, Q) is the revised relevance of reference picture G_j to the query information, Q is the query information, and rel(vp, Q) is the relevance of vp to Q;
c. for each unlabelled picture in the set of unlabelled pictures corresponding to the query information, calculating its relevance to the query information by formula (9);
Rel(P_x, Q) = Σ_{vp ∈ P_x} rel(vp, Q)    (9)
wherein P_x is any unlabelled picture, and Rel(P_x, Q) is the relevance of P_x to Q;
d. ranking the unlabelled pictures in the set in descending order of their relevance to the query information.
CN 201110322609 2011-10-21 2011-10-21 Sorting method of unlabelled pictures for network search Expired - Fee Related CN102368266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110322609 CN102368266B (en) 2011-10-21 2011-10-21 Sorting method of unlabelled pictures for network search


Publications (2)

Publication Number Publication Date
CN102368266A (en) 2012-03-07
CN102368266B (en) 2013-03-20

Family

ID=45760830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110322609 Expired - Fee Related CN102368266B (en) 2011-10-21 2011-10-21 Sorting method of unlabelled pictures for network search

Country Status (1)

Country Link
CN (1) CN102368266B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682095B (en) * 2012-04-27 2015-06-10 百度在线网络技术(北京)有限公司 Method for searching paired pictures and searching system for providing the paired pictures
CN103425644B (en) * 2012-05-14 2016-04-06 腾讯科技(深圳)有限公司 The extracting method of picture and device in Web page text
CN103870597B (en) * 2014-04-01 2018-03-16 北京奇虎科技有限公司 A kind of searching method and device of no-watermark picture
US10891019B2 (en) * 2016-02-29 2021-01-12 Huawei Technologies Co., Ltd. Dynamic thumbnail selection for search results

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101256594A (en) * 2008-03-25 2008-09-03 北京百问百答网络技术有限公司 Method and system for measuring graph structure similarity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2652714A1 (en) * 2006-05-29 2007-12-06 Philip Ogunbona Content based image retrieval

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ru Liyun et al., "Automatic Semantic Image Annotation Based on Boosting Learning," Journal of Image and Graphics, Vol. 11, No. 4, April 2006, pp. 486-491. *

Legal Events

Code Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
     Granted publication date: 20130320
     Termination date: 20141021
EXPY Termination of patent right or utility model