Background Art
Simply put, image reranking uses the feature information contained in images to reorder the results returned by an image search engine, so as to obtain search results that better satisfy the user. In general, the feature information of an image comprises its text information and its visual information.
Existing web image search engines rank images using the text features associated with them, such as the text surrounding the image and the anchor text. Because text features contain considerable noise and ignore the visual features of the image, the returned results often fail to satisfy users. Image reranking therefore has both theoretical research value and practical application value.
Most image reranking algorithms rerank using visual features, and a large body of research exists in this area. These algorithms fall broadly into three categories: clustering-based methods, classification-based methods, and graph-based methods. Graph-based methods have attracted considerable attention and have achieved very good results in image and video retrieval. In a graph-based method, images are treated as the nodes of a graph, and the similarities between images serve as the edge weights. Graph-based methods usually rely on the consistency of ranking results over the graph, for example the requirement that adjacent nodes have similar ranking results. Random walk and semi-supervised learning are the two frameworks most commonly used in graph-based algorithms.
However, many studies have pointed out that reranking with visual information alone does not yield satisfactory results. Many researchers have therefore proposed graph-based algorithms that fuse multiple image features for reranking, of which "early fusion" and "late fusion" are the two most common approaches. These algorithms, however, seldom consider the semantic consistency between text features and visual features. In general, the text features and visual features of an image should carry consistent semantic information, so the consistency between the text information and the visual information of an image should be an important factor in reranking.
Semi-supervised learning algorithms are a class of algorithms lying between supervised learning algorithms and unsupervised learning algorithms. Many different semi-supervised learning algorithms have been proposed.
In summary, the technical problem that the prior art urgently needs to solve is how to use a semi-supervised learning algorithm to perform image reranking.
Summary of the Invention
To overcome the deficiencies of the prior art, the present invention proposes a semi-supervised image reranking method with a self-feedback property based on a heterogeneous graph. The method builds a heterogeneous graph from the text features and visual features of the images, and then reranks the images using a semi-supervised learning algorithm with a self-feedback property on that graph. The method improves the search results, requires no additional user input, and has a short running time, which makes it suitable for practical image retrieval systems and improves the performance of image reranking.
To achieve the above object, the present invention adopts the following technical solution:
A semi-supervised image reranking method with a self-feedback property based on a heterogeneous graph, characterized in that the method comprises the following steps:
Step (1): extract text features and visual features from the images to be reranked;
Step (2): build a heterogeneous graph from the text features and visual features of the images, and compute the intra-modal and inter-modal similarities as the weights between the corresponding nodes of the heterogeneous graph;
Step (3): apply the self-feedback semi-supervised learning algorithm on the heterogeneous graph to compute the text-feature ranking score and the visual-feature ranking score of each image;
Step (4): compute the ranking score of each image from the text-feature ranking scores and visual-feature ranking scores computed in step (3), and rerank the images in descending order of score.
In step (1):
The visual features are extracted as follows: SIFT features are extracted from each image, and the image is then represented as a bag of words;
The text features are extracted as follows: the associated text of each image is collected, the topic model LDA is used to group the associated text into a plurality of latent topics, and the text information of the image is then likewise represented as a bag of words.
In step (2), the heterogeneous graph is built as follows:
First, a multigraph is built in which each node is an image comprising its text feature and its visual feature. Any two nodes of the multigraph are joined by four edges: the edge between the visual features of the two nodes, the edge between the text features of the two nodes, and the two edges between the visual feature of one node and the text feature of the other;
Here, the edge between the text features and the edge between the visual features of two nodes characterize the intra-modal similarity, while the edges between the visual feature and the text feature of the two nodes characterize the inter-modal similarity;
Then, each node of the multigraph is split into two types of node, a text-feature node and a visual-feature node; the nodes are connected with weights equal to the corresponding intra-modal or inter-modal similarity, which yields the heterogeneous graph.
Step (3) comprises the following specific steps:
Step (31): use the ranking scores $f^{*}$ of the text features and visual features of the nodes to update the initial ranking score vector $y$;
Step (32): use the ranking scores $f^{*}$ of the text features and visual features of the nodes to update the inter-modal similarities in the similarity matrix $S$;
Step (33): use the similarity matrix $S$ obtained in step (32) to update the Laplacian matrix $L$;
Step (34): if the average precision is greater than the current best precision, assign the average precision to the current best precision, continue the feedback, and return to step (1) of the reranking algorithm; otherwise, stop the feedback and terminate the reranking algorithm.
The ranking score $f^{*}$ of the text features and visual features of the images is computed as follows:
where $f = [f_t, f_v]$ is the ranking score in the heterogeneous graph to be solved for; $f(i)$ and $f(j)$ are the ranking scores of the $i$-th and $j$-th images, respectively; $y = [y_t, y_v]$ is the initial ranking score in the heterogeneous graph; $S$ is the similarity matrix; $D$ is a diagonal matrix whose $i$-th diagonal element is the sum of the elements in the $i$-th row of $S$; $\mu$ is a balance parameter that trades off the two terms on the right-hand side of the formula, with $0 < \mu < 1$; $i$ and $j$ range over $1 \le i \le N$ and $1 \le j \le N$; and $N$ is the total number of images to be reranked;
Before formula (1) is evaluated, the ranking scores of the text features and of the visual features of the images must each be initialized;
Both the text-feature ranking scores and the visual-feature ranking scores are initialized with the normalized scores returned by the image search engine, that is:
where $N$ is the number of images to be ranked and $r_i$ is the rank of image $i$ in the results returned by the search engine.
The iterative formula for formula (1) is as follows:
where $f(t)$ is the ranking score at the $t$-th iteration; $\mu$ has the same meaning as in formula (1), namely a balance parameter with $0 < \mu < 1$; $t$ is the iteration number; $f(0) = y$; and $L$ is the Laplacian matrix computed from the similarity matrix $S$ and the diagonal matrix $D$.
In step (4), the final ranking score of an image is obtained by blending its text-feature ranking score with its visual-feature ranking score, computed as follows:
$$\mathrm{RankScore}(i) = \alpha f(t_i) + (1-\alpha) f(v_i) \qquad (4)$$
where $\mathrm{RankScore}(i)$ is the final ranking score of image $i$, $f(t_i)$ is the text-feature ranking score of the image, $f(v_i)$ is the visual-feature ranking score of the image, and $\alpha$ is the blending parameter, with a value between 0 and 1.
The intra-modal similarity comprises the similarity between text features and the similarity between visual features; the inter-modal similarity is the similarity between a text feature and a visual feature.
The intra-modal similarity is computed as the cosine similarity, that is:
where $p$ and $q$ denote text feature vectors or visual feature vectors.
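By way of illustration only, the cosine similarity between two bag-of-words vectors may be sketched as follows, using the standard definition (the dot product divided by the product of the Euclidean norms). The function name `cosine_similarity` and the convention of returning 0 for an all-zero vector are assumptions made for this sketch.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    # Assumed convention: an empty bag of words is similar to nothing.
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)
```

Because bag-of-words histograms are non-negative, the value lies between 0 and 1 for such inputs, which makes it directly usable as an edge weight.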
The inter-modal similarity depends on the following factors: the consistency between the modalities, the similarity between image text features, and the similarity between image visual features.
The consistency between the modalities is computed as:
where $t_i$ denotes the text feature of the $i$-th image; $v_i$ denotes the visual feature of the $i$-th image; $f(t_i)$ and $f(v_i)$ are the ranking scores obtained using the text feature and using the visual feature, respectively; $\sigma$ is a scaling factor ($\sigma > 0$); $i$ ranges over $1 \le i \le N$; and $N$ is the total number of images to be reranked;
The inter-modal similarity is computed as follows:
$$s(t_i, v_j) = c(t_i, v_j)\,\bigl[\alpha\, s(t_i, t_j) + (1-\alpha)\, s(v_i, v_j)\bigr] \qquad (7)$$
where $t_i$ and $v_i$ denote the text feature and the visual feature of the $i$-th image; $t_j$ and $v_j$ denote the text feature and the visual feature of the $j$-th image; $c(t_i, v_j)$ is the consistency between the modalities; $s(t_i, t_j)$ is the similarity between text features; $s(v_i, v_j)$ is the similarity between visual features; $\alpha$ is the blending parameter ($0 < \alpha < 1$); $i$ and $j$ range from 1 to $N$; and $N$ is the total number of images to be reranked.
The beneficial effects of the present invention are:
1. The proposed algorithm improves the search results in cases where the text features and the visual features are inconsistent;
2. The proposed algorithm requires no additional user input and is suitable for use in practical image retrieval systems;
3. The proposed algorithm has a short running time and is suitable for large-scale image retrieval systems.
Detailed Description of the Embodiments
The invention is further described below with reference to the drawings and embodiments. Fig. 1 is a flow chart of the algorithm of the present invention; the implementation and details of the algorithm are described further below with reference to this flow chart.
A semi-supervised image reranking method with a self-feedback property based on a heterogeneous graph comprises the following steps:
Step (1): extract text features and visual features from the images to be reranked;
Step (2): build a heterogeneous graph from the text features and visual features of the images, and compute the intra-modal and inter-modal similarities as the weights between the corresponding nodes of the heterogeneous graph;
Step (3): apply the self-feedback semi-supervised learning algorithm on the heterogeneous graph to compute the text-feature ranking score and the visual-feature ranking score of each image;
Step (4): compute the ranking score of each image from the text-feature ranking scores and visual-feature ranking scores computed in step (3), and rerank the images in descending order of score.
In step (1):
The visual features are extracted as follows: SIFT features are extracted from each image by dense sampling; the resulting features are clustered with the K-means clustering algorithm to obtain a dictionary; and the image is then represented as a visual-feature bag of words $v_i$. The text features are extracted as follows: the associated text of each image is collected; the topic model LDA is used to group the text into a plurality of topics; and the text information of the image is then likewise represented as a text-feature bag of words $t_i$.
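By way of illustration only, the final quantization step of the visual bag of words may be sketched as follows. The sketch assumes that the local descriptors of an image and the dictionary (the K-means cluster centres) have already been computed elsewhere; the function names, the toy two-dimensional vectors used in place of 128-dimensional SIFT descriptors, and the normalization of the histogram are assumptions of this sketch.

```python
import math


def nearest_codeword(descriptor, codebook):
    """Index of the dictionary entry closest to the descriptor."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_words(descriptors, codebook):
    """Normalized histogram of codeword assignments for one image."""
    hist = [0.0] * len(codebook)
    for descriptor in descriptors:
        hist[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(hist)
    # An image without descriptors keeps an all-zero histogram.
    return [h / total for h in hist] if total else hist


# Toy usage: two codewords, three descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
v_i = bag_of_words([[1.0, 1.0], [0.0, 2.0], [9.0, 9.0]], codebook)
```

Normalizing the histogram keeps images with different numbers of sampled descriptors comparable.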
In step (2), the heterogeneous graph is built as follows:
First, a multigraph is built in which each node is an image comprising its text feature and its visual feature. Every two nodes of the multigraph are joined by four edges. For example, if the two nodes are node 1 and node 2, the four edges are: the edge between the visual feature of node 1 and the visual feature of node 2; the edge between the visual feature of node 1 and the text feature of node 2; the edge between the text feature of node 1 and the visual feature of node 2; and the edge between the text feature of node 1 and the text feature of node 2. Two of these edges characterize the intra-modal similarity, and the other two characterize the inter-modal similarity;
Then, each node of the multigraph is split into two types of node, a text-feature node and a visual-feature node; the nodes are connected with weights equal to the corresponding intra-modal or inter-modal similarity, which yields the heterogeneous graph.
As shown in Fig. 2, the rectangle t(i) represents the text feature of image i, the rectangle t(j) represents the text feature of image j, the circle v(i) represents the visual feature of image i, and the circle v(j) represents the visual feature of image j; solid lines represent intra-modal similarity, and dashed lines represent inter-modal similarity.
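By way of illustration only, the graph of N images may be stored as a 2N x 2N similarity matrix, as sketched below. The node ordering (text nodes first, then visual nodes, mirroring f = [f_t, f_v]), the zero diagonal, the omission of an edge between the text node and the visual node of the same image, and the callback signatures `intra_sim(p, q)` and `inter_sim(i, j)` are all assumptions of this sketch.

```python
def build_heterogeneous_graph(text_feats, visual_feats, intra_sim, inter_sim):
    """Assemble the 2N x 2N similarity matrix S of the graph.

    Rows/columns 0..N-1 are text nodes t_i; N..2N-1 are visual nodes v_i.
    intra_sim(p, q) returns the similarity of two feature vectors;
    inter_sim(i, j) returns the weight between t_i and v_j.
    """
    n = len(text_feats)
    S = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumed: edges only join two different images
            # Intra-modal edges (solid lines in the figure).
            S[i][j] = intra_sim(text_feats[i], text_feats[j])
            S[n + i][n + j] = intra_sim(visual_feats[i], visual_feats[j])
            # Inter-modal edges (dashed lines), stored symmetrically.
            weight = inter_sim(i, j)
            S[i][n + j] = weight
            S[n + j][i] = weight
    return S
```

Each pair of distinct images contributes two intra-modal and two inter-modal entries, matching the four edges described above.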
The intra-modal similarity comprises the similarity between text features and the similarity between visual features; the inter-modal similarity is the similarity between a text feature and a visual feature.
Methods for computing the intra-modal similarity include the inverse of the Euclidean distance, the cosine similarity, and histogram intersection.
The inter-modal similarity depends on the following factors: the consistency between the modalities, the similarity between image text features, and the similarity between image visual features.
The consistency between the modalities is computed as:
where $t_i$ denotes the text feature of the $i$-th image; $v_i$ denotes the visual feature of the $i$-th image; $f(t_i)$ and $f(v_i)$ are the ranking scores obtained using the text feature and using the visual feature, respectively; $\sigma$ is a scaling factor ($\sigma > 0$); $i$ ranges over $1 \le i \le N$; and $N$ is the total number of images to be reranked.
The inter-modal similarity is computed as follows:
$$s(t_i, v_j) = c(t_i, v_j)\,\bigl[\alpha\, s(t_i, t_j) + (1-\alpha)\, s(v_i, v_j)\bigr] \qquad (7)$$
where $t_i$ and $v_i$ denote the text feature and the visual feature of the $i$-th image; $t_j$ and $v_j$ denote the text feature and the visual feature of the $j$-th image; $c(t_i, v_j)$ is the consistency between the modalities; $s(t_i, t_j)$ is the similarity between text features; $s(v_i, v_j)$ is the similarity between visual features; $\alpha$ is the blending parameter ($0 < \alpha < 1$); $i$ and $j$ range from 1 to $N$; and $N$ is the total number of images to be reranked.
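By way of illustration only, formula (7) may be sketched as follows. The blending in `inter_modal_similarity` follows formula (7) directly. The form of the consistency term, by contrast, is an assumption: the sketch uses a Gaussian-style kernel on the difference between the two ranking scores, scaled by σ, which is merely one function that agrees with the stated roles of f(t_i), f(v_j) and σ > 0.

```python
import math


def consistency(score_text, score_visual, sigma=1.0):
    """ASSUMED consistency: near 1 when the two ranking scores agree."""
    return math.exp(-((score_text - score_visual) ** 2) / sigma)


def inter_modal_similarity(c_ij, s_text_ij, s_visual_ij, alpha=0.5):
    """Formula (7): consistency times the blended intra-modal similarities."""
    return c_ij * (alpha * s_text_ij + (1.0 - alpha) * s_visual_ij)


# Toy usage with made-up scores and similarities.
c = consistency(0.8, 0.6, sigma=0.5)
weight = inter_modal_similarity(c, s_text_ij=0.7, s_visual_ij=0.4, alpha=0.5)
```

Under this assumed kernel, an image pair whose text-based and visual-based ranking scores disagree receives a smaller inter-modal weight, so less score is propagated across the two modalities for that pair.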
In step (3), the semi-supervised learning algorithm with a self-feedback property is applied on the heterogeneous graph to obtain the ranking scores of the images after reranking;
The objective function of the semi-supervised learning algorithm is as follows:
where $f = [f_t, f_v]$ is the ranking score in the heterogeneous graph to be solved for; $f(i)$ and $f(j)$ are the ranking scores of the $i$-th and $j$-th images, respectively; $y = [y_t, y_v]$ is the initial ranking score in the heterogeneous graph; $S$ is the similarity matrix; $D$ is a diagonal matrix whose $i$-th diagonal element is the sum of the elements in the $i$-th row of $S$; $\mu$ is a balance parameter that trades off the two terms on the right-hand side of the formula, with $0 < \mu < 1$; $i$ and $j$ range over $1 \le i \le N$ and $1 \le j \le N$; and $N$ is the total number of images to be reranked.
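By way of illustration only, one objective that is consistent with the symbol definitions above is the well-known local-and-global-consistency objective of graph-based semi-supervised learning. It is given here as an assumption, not as the formula of the invention; the symbol Q and the summation over all graph nodes are likewise assumed.

```latex
% Assumed form, borrowed from the standard local-and-global-consistency framework.
Q(f) = \frac{1}{2} \sum_{i,j} S_{ij}
       \left( \frac{f(i)}{\sqrt{D_{ii}}} - \frac{f(j)}{\sqrt{D_{jj}}} \right)^{2}
       + \mu \sum_{i} \bigl( f(i) - y(i) \bigr)^{2},
\qquad
f^{*} = \arg\min_{f} Q(f)
```

In this form the first term penalizes strongly connected nodes that receive different ranking scores, the second term penalizes departure from the initial scores y, and μ balances the two.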
The iterative formula of the semi-supervised learning algorithm is as follows:
where $f(t)$ is the ranking score at the $t$-th iteration; $\mu$ has the same meaning as in the objective function, namely a balance parameter with $0 < \mu < 1$; $t$ is the iteration number; $f(0) = y$; and $L$ is the Laplacian matrix computed from the similarity matrix $S$ and the diagonal matrix $D$.
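By way of illustration only, an iteration of this kind may be sketched as follows. Two assumptions are made, neither of which is taken from the source: the Laplacian is the symmetric normalized Laplacian L = I - D^(-1/2) S D^(-1/2), and the update is the fixed-point rule f(t+1) = ((I - L) f(t) + μ y) / (1 + μ), started from f(0) = y. The function names and the fixed iteration count are also assumed.

```python
import math


def normalized_laplacian(S):
    """ASSUMED: L = I - D^(-1/2) S D^(-1/2), with D the row sums of S."""
    n = len(S)
    degree = [sum(row) for row in S]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            norm = math.sqrt(degree[i] * degree[j])
            s_tilde = S[i][j] / norm if norm > 0.0 else 0.0
            L[i][j] = (1.0 if i == j else 0.0) - s_tilde
    return L


def propagate(L, y, mu=0.5, iterations=100):
    """ASSUMED update: f(t+1) = ((I - L) f(t) + mu * y) / (1 + mu)."""
    n = len(y)
    f = list(y)  # f(0) = y
    for _ in range(iterations):
        new_f = []
        for i in range(n):
            # Row i of (I - L) applied to the current scores.
            smoothed = sum(((1.0 if i == j else 0.0) - L[i][j]) * f[j]
                           for j in range(n))
            new_f.append((smoothed + mu * y[i]) / (1.0 + mu))
        f = new_f
    return f


# Toy usage: two connected nodes, only the first has a high initial score.
scores = propagate(normalized_laplacian([[0.0, 1.0], [1.0, 0.0]]), [1.0, 0.0])
```

In the toy example the second node inherits part of the score of the first through the connecting edge, while the first node remains ranked higher.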
Before the algorithm is run, the ranking scores of the text features and of the visual features of the images must each be initialized.
Both the text-feature ranking scores and the visual-feature ranking scores are initialized with the normalized scores returned by the image search engine, that is:
where $N$ is the number of images to be ranked and $r_i$ is the rank of image $i$ in the results returned by the search engine.
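By way of illustration only, the initialization may be sketched as follows. The normalization y_i = 1 - r_i / N, with ranks counted from 1, is an assumption: it is one common way of turning a search-engine rank into a score that decreases with rank. The function names are hypothetical.

```python
def initial_scores(ranks):
    """ASSUMED normalization: y_i = 1 - r_i / N, ranks counted from 1."""
    n = len(ranks)
    return [1.0 - r / n for r in ranks]


def initial_vector(ranks):
    """y = [y_t, y_v]: the same initial scores for both modalities."""
    y_single = initial_scores(ranks)
    return y_single + y_single


# Toy usage: four images in search-engine order.
y = initial_vector([1, 2, 3, 4])
```

The same scores are used for the text nodes and the visual nodes because both are initialized from the same search-engine result list.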
To exploit the inter-modal features and obtain better image ranking scores, a self-feedback algorithm is proposed that automatically uses the $f^{*}$ obtained above to update the similarity matrix $S$ and then carries out the next iteration.
The steps of the self-feedback algorithm are as follows:
Step (31): use $f^{*}$ to update the initial ranking score vector $y$;
Step (32): use $f^{*}$ to update the inter-modal similarities in the similarity matrix $S$;
Step (33): use the similarity matrix $S$ obtained in step (32) to update the Laplacian matrix $L$;
Step (34): if the average precision (ap) is greater than the current best precision (apbest), assign the average precision (ap) to the current best precision (apbest), continue the feedback, and return to step (1) of the reranking algorithm; otherwise, stop the feedback and terminate the reranking algorithm.
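By way of illustration only, the control flow of steps (31) to (34) may be sketched as follows, under one reading of those steps. The callbacks `solve`, `update_similarity` and `average_precision` are hypothetical placeholders for the propagation, the inter-modal similarity update and the precision measure; how the average precision is measured is not specified in this sketch, and the cap `max_rounds` is an added safeguard.

```python
def self_feedback(y, S, solve, update_similarity, average_precision,
                  max_rounds=20):
    """Repeat propagation while the average precision keeps improving.

    solve(S, y) -> f_star; it is assumed to rebuild L from S (step 33).
    update_similarity(S, f_star) -> new S (step 32).
    average_precision(f_star) -> float (used in step 34).
    """
    ap_best = -1.0
    best_f = None
    for _ in range(max_rounds):  # max_rounds is an assumed safety cap
        f_star = solve(S, y)
        ap = average_precision(f_star)
        if ap <= ap_best:
            break  # step (34): no improvement, feedback stops
        ap_best, best_f = ap, f_star
        y = list(f_star)                  # step (31)
        S = update_similarity(S, f_star)  # step (32)
    return best_f, ap_best
```

Keeping the scores of the best round means that a final, non-improving round does not degrade the returned ranking.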
In step (4), the final ranking score of an image is obtained by blending its text-feature ranking score with its visual-feature ranking score, computed as follows:
$$\mathrm{RankScore}(i) = \alpha f(t_i) + (1-\alpha) f(v_i) \qquad (4)$$
where $\mathrm{RankScore}(i)$ is the final ranking score of image $i$, $f(t_i)$ is the text-feature ranking score of the image, $f(v_i)$ is the visual-feature ranking score of the image, and $\alpha$ is the blending parameter, with a value between 0 and 1.
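By way of illustration only, formula (4) and the final descending sort may be sketched as follows. The function name `rerank` and the choice of returning image indices together with their scores are assumptions of this sketch.

```python
def rerank(text_scores, visual_scores, alpha=0.5):
    """Blend the two score lists per formula (4) and sort descending."""
    rank_scores = [alpha * ft + (1.0 - alpha) * fv
                   for ft, fv in zip(text_scores, visual_scores)]
    order = sorted(range(len(rank_scores)),
                   key=lambda i: rank_scores[i], reverse=True)
    return order, rank_scores


# Toy usage: three images with made-up text and visual scores.
order, scores = rerank([0.9, 0.1, 0.5], [0.2, 0.8, 0.9], alpha=0.5)
```

Setting α closer to 1 gives more weight to the text-feature scores, and setting it closer to 0 gives more weight to the visual-feature scores.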
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the invention. Those of ordinary skill in the art should understand that any modifications or variations that can be made on the basis of the technical solution of the present invention without creative effort remain within the scope of protection of the invention.