CN103473327A

CN103473327A - Image retrieval method and image retrieval system

Info

Publication number: CN103473327A
Application number: CN2013104202879A
Authority: CN
Inventors: 钟海兰
Original assignee: GUANGDONG TUTUSOU NETWORK TECHNOLOGY Co Ltd
Current assignee: GUANGDONG TUTUSOU NETWORK TECHNOLOGY Co Ltd
Priority date: 2013-09-13
Filing date: 2013-09-13
Publication date: 2013-12-25

Abstract

The invention discloses an image retrieval method and an image retrieval system. For a given query text and/or query picture, the method obtains multiple similarity-ranked lists of the pictures in the database according to text relevance and picture-content relevance respectively, then combines these lists, taking both text similarity and content similarity into account, and returns a single comprehensive ranked list. This multi-modal hybrid retrieval mechanism avoids the shortcomings of conventional single-modal retrieval, exploits the respective advantages of text retrieval and content-based retrieval, and greatly improves the accuracy of image retrieval. Because only the ranking results of the individual retrieval models are fused, individual models can be added, removed, or replaced conveniently, so text and content-feature retrieval models can be configured flexibly and the performance of the image retrieval system is improved.

Description

Image search method and system

Technical field

The present invention relates to the technical field of information retrieval, and in particular to an image retrieval method and system.

Background art

Over the past decade, image retrieval has been an active research topic in the multimedia field. An image retrieval system is a specialized search engine that provides users with retrieval services for image documents on the Internet, based on the descriptive text of an image or on its visual features (that is, the picture content). For example, search engines such as Google and Baidu provide picture search services.

Traditional image retrieval relies on the descriptive text of a picture and generally searches the pictures in a database by keyword. In many cases, however, text keywords cannot accurately describe the visual features of an image (for example, a specific decorative pattern), which led to retrieval techniques based on picture content. Many features are currently used to describe picture content, such as color features, texture features, and shape features. Although content features capture the visual similarity of pictures, visual similarity does not necessarily imply semantic similarity; this is the "semantic gap" problem. Text-based image retrieval and content-based image retrieval (CBIR) therefore each have strengths and weaknesses, and neither meets user needs well on its own.

Summary of the invention

In view of the above, the present invention proposes an image retrieval method and system to improve the accuracy of image retrieval.

An image retrieval method comprises the steps of:

receiving a query picture and/or query text submitted by a user;

extracting multiple content features of the query picture, and performing word segmentation on the query text;

comparing each content feature of the query picture with the corresponding content feature of every picture in a database and ranking the pictures in the database by similarity to obtain a ranked list for each content feature; comparing the segmented query text with the descriptive document of every picture in the database and ranking the pictures in the database by similarity to obtain a text similarity list;

assigning a score to each picture in the database according to its position in each list and the weight of that list, re-ranking the pictures by score to obtain a comprehensive similarity ranked list, and returning this list to the user.

An image retrieval system comprises:

a query information receiving end, for receiving a query picture and/or query text submitted by a user;

a query information processing module, for extracting multiple content features of the query picture and performing word segmentation on the query text;

a single-feature similarity ranking module, for comparing each content feature of the query picture with the corresponding content feature of every picture in a database and ranking the pictures in the database by similarity to obtain a ranked list for each content feature, and for comparing the segmented query text with the descriptive document of every picture in the database and ranking the pictures in the database by similarity to obtain a text similarity list;

a comprehensive similarity ranking module, for assigning a score to each picture in the database according to its position in each list and the weight of that list, re-ranking the pictures by score to obtain a comprehensive similarity ranked list, and returning this list to the user.

For a given query text and/or query picture, the image retrieval method and system of the present invention obtain multiple similarity-ranked lists of the pictures in the database according to text relevance and picture-content relevance respectively, then combine these lists, taking both text similarity and content similarity into account, and return a single comprehensive ranked list. This multi-modal hybrid retrieval mechanism avoids the shortcomings of earlier single-modal retrieval mechanisms, exploits the respective advantages of text retrieval and content-based retrieval, and greatly improves the accuracy of picture retrieval. Because only the ranking results of the individual retrieval models are fused, individual models can be added, removed, or replaced easily, which allows flexible configuration of text and content-feature retrieval models and improves the performance of the image retrieval system.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the image retrieval method of the present invention;

Fig. 2 is a flowchart of joint retrieval by picture content and text in the image retrieval method of the present invention;

Fig. 3 compares the retrieval results of the image retrieval method of the present invention with those of traditional retrieval methods;

Fig. 4 is a schematic structural diagram of the image retrieval system of the present invention.

Detailed description of the embodiments

To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described here only explain the present invention and do not limit its scope of protection.

As shown in Fig. 1, the image retrieval method of the present invention comprises the steps of:

Step S101: receiving a query picture and/or query text submitted by a user;

Step S102: extracting multiple content features of the query picture, and performing word segmentation on the query text;

Step S103: comparing each content feature of the query picture with the corresponding content feature of every picture in the database and ranking the pictures in the database by similarity to obtain a ranked list for each content feature; comparing the segmented query text with the descriptive document of every picture in the database and ranking the pictures in the database by similarity to obtain a text similarity list;

Step S104: assigning a score to each picture in the database according to its position in each list and the weight of that list, re-ranking the pictures by score to obtain a comprehensive similarity ranked list, and returning this list to the user.

Traditional retrieval methods search either by the text description submitted by the user or by a single feature extracted from the submitted picture; that is, they perform single-modal retrieval. With the present method, the user can search by picture alone or by text description alone, and can also search jointly by picture and text description. When the user submits only a picture, as described in step S102, the method extracts not one content feature but several, and ranks on all of them together. In short, compared with traditional retrieval methods, the present method is a multi-modal hybrid retrieval method. Experiments show that this hybrid mechanism returns considerably more accurate results than earlier single-modal mechanisms. The steps are described in detail below.

After the query information is submitted, step S102 extracts content features from the submitted query picture and performs word segmentation on the submitted text. In this embodiment, the content features preferably include color, texture, and shape features, three commonly used features that are fairly representative of picture content. Word segmentation uses a hidden Markov model (HMM). Let the state set be Q = (q_{1}, q_{2}, ..., q_{N}), the full set of tags (for example word-initial, word-internal, and word-final); let the observation set be V = (v_{1}, v_{2}, ..., v_{M}), the full set of characters that may appear in the user's input; let the observation sequence be O = (o_{1}, o_{2}, ..., o_{T}), the input character string to be segmented; and let the state sequence be I = (i_{1}, i_{2}, ..., i_{T}), a possible tag sequence for that string. A training corpus is chosen first, and the three parameters of the HMM are then obtained by counting: the state transition probability matrix A = [a_{ij}]_{N \times N}, the observation probability matrix B = [b_{j} (k)]_{N \times M}, and the initial state probability vector π = (π_{i}), where:

a_{ij} = P (q_{j} | q_{i}) = \frac{P (q_{i}, q_{j})}{P (q_{i})} = \frac{count (q_{i}, q_{j})}{count (q_{i})}

b_{j} (k) = P (v_{k} | q_{j}) = \frac{P (v_{k}, q_{j})}{P (q_{j})} = \frac{count (v_{k}, q_{j})}{count (q_{j})}

π_{i} = P (q_{i})

Here count denotes a frequency obtained from the training data.
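The counting estimate described above can be sketched in Python as follows. This is a minimal illustration and not the patent's implementation: the function name `estimate_hmm`, the B/E/S position tags, and the toy corpus are hypothetical. As labeled assumptions, the transition denominator counts only occurrences of a state that have a successor (so each row of A sums to 1), and π is estimated from the first tag of each training sequence.

```python
from collections import Counter

def estimate_hmm(tagged_sentences):
    """Estimate (A, B, pi) by relative frequency.

    tagged_sentences: list of sequences of (character, state) pairs.
    Returns dicts A[(q_i, q_j)], B[(q_j, v_k)], pi[q_i].
    """
    trans, emit = Counter(), Counter()
    state_count, from_count, start = Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        if not sent:
            continue
        states = [s for _, s in sent]
        start[states[0]] += 1                 # assumption: pi from sequence starts
        for ch, s in sent:
            emit[(s, ch)] += 1                # count(v_k, q_j)
            state_count[s] += 1               # count(q_j)
        for prev, nxt in zip(states, states[1:]):
            trans[(prev, nxt)] += 1           # count(q_i, q_j)
            from_count[prev] += 1             # count(q_i) as a transition origin
    A = {k: c / from_count[k[0]] for k, c in trans.items()}
    B = {k: c / state_count[k[0]] for k, c in emit.items()}
    total = sum(start.values())
    pi = {s: c / total for s, c in start.items()}
    return A, B, pi

# Toy corpus: B = word-initial, E = word-final, S = single-character word.
corpus = [[("a", "B"), ("b", "E")],
          [("c", "S"), ("a", "B"), ("b", "E")]]
A, B, pi = estimate_hmm(corpus)
```

In practice the counts would usually be smoothed so that unseen transitions and characters do not receive zero probability; that step is omitted here.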

Once the HMM λ = (A, B, π) is determined, the Viterbi algorithm is used for word segmentation. Among all single paths (i_{1}, i_{2}, ..., i_{t-1}, i_{t}) that are in state i at time t, let δ_{t} (i) denote the maximum probability, and let Ψ_{t} (i) denote the (t-1)-th node of the path with that maximum probability. Initialization: δ_{1} (i) = π_{i} b_{i} (o_{1}), Ψ_{1} (i) = 0, i = 1, 2, ..., N. Recursion: for t = 2, ..., T, compute:

δ_{t} (i) = \max_{1 \leq j \leq N} [δ_{t - 1} (j) a_{ji}] b_{i} (o_{t}),

Ψ_{t} (i) = \arg \max_{1 \leq j \leq N} [δ_{t - 1} (j) a_{ji}],

i = 1, 2, ..., N

Termination: let P^{*} = \max_{1 \leq i \leq N} δ_{T} (i) and i_{T}^{*} = \arg \max_{1 \leq i \leq N} [δ_{T} (i)], where P^{*} is the probability of the optimal path and i_{T}^{*} is the end point of the optimal path. After the end point is found, backtrack: for t = T-1, T-2, ..., 1, let

i_{t}^{*} = Ψ_{t + 1} (i_{t + 1}^{*})

which yields the optimal path I^{*} = (i_{1}^{*}, i_{2}^{*}, ..., i_{T}^{*}). The hidden state sequence given by the optimal path is the corresponding word segmentation result. The same method is used to segment the document information of the pictures in the database, and a classical inverted index is built over the documents for efficient retrieval.
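The initialization, recursion, and backtracking of the Viterbi algorithm can be sketched as below. The function name `viterbi`, the dictionary-based parameter layout, and the two-state toy model are illustrative assumptions, not taken from the patent; missing dictionary entries are treated as probability 0.

```python
def viterbi(obs, states, pi, A, B):
    """Return (best_path, best_prob) for the observation sequence obs.

    pi[s], A[(s_prev, s)] and B[(s, o)] hold probabilities.
    """
    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    delta = [{s: pi.get(s, 0.0) * B.get((s, obs[0]), 0.0) for s in states}]
    psi = [{s: None for s in states}]
    # Recursion over t = 2..T
    for t in range(1, len(obs)):
        delta.append({})
        psi.append({})
        for i in states:
            best_j = max(states, key=lambda j: delta[t - 1][j] * A.get((j, i), 0.0))
            psi[t][i] = best_j
            delta[t][i] = (delta[t - 1][best_j] * A.get((best_j, i), 0.0)
                           * B.get((i, obs[t]), 0.0))
    # Termination and backtracking
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]

states = ["B", "E"]
pi = {"B": 1.0}
A = {("B", "E"): 1.0, ("E", "B"): 1.0}
B = {(s, o): 0.5 for s in states for o in ("x", "y")}
path, prob = viterbi(["x", "y", "x"], states, pi, A, B)
```

For long inputs, a practical implementation would normally work with log probabilities to avoid numerical underflow.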

Next, the content features of the query picture are extracted, including color, texture, and shape features. A color histogram is built first. When a query picture Q is submitted, it is preprocessed and a histogram is then computed over the color vector space. The color histogram is a one-dimensional discrete function:

H (k) = \frac{n_{k}}{N}

where n_{k} is the number of pixels whose quantized color feature value is k, N is the total number of pixels in the image, and l is the number of quantized color feature values, that is, the dimension of the one-dimensional vector H. This gives the color histogram vector H_{q} of the query picture Q. Offline, the color histogram feature is extracted in the same way for the pictures in the database and indexed.
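A normalized color histogram of this kind can be sketched as follows. The patent does not specify the color space or the quantization, so the uniform per-channel RGB quantization, the `levels` parameter, and the function name `color_histogram` are assumptions made only for illustration.

```python
def color_histogram(pixels, levels=4):
    """Normalized histogram: bin k holds n_k / N over quantized RGB colors.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    levels: quantization levels per channel (assumed), giving levels**3 bins.
    """
    hist = [0] * (levels ** 3)
    n = 0
    for r, g, b in pixels:
        qr, qg, qb = (c * levels // 256 for c in (r, g, b))
        hist[(qr * levels + qg) * levels + qb] += 1   # n_k
        n += 1                                        # N
    return [count / n for count in hist] if n else hist

h = color_histogram([(0, 0, 0), (255, 255, 255), (255, 255, 255), (10, 10, 10)])
```

With `levels=4` the vector has 64 dimensions; the two dark pixels fall into bin 0 and the two white pixels into bin 63.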

Next, scale-invariant features (Scale-Invariant Feature Transform, SIFT) are extracted to describe the texture of the picture. For a submitted query picture I (x, y), the scale space of the picture is:

L (x, y, σ) = G (x, y, σ) * I (x, y)

where G (x, y, σ) is a Gaussian function and σ is the scale parameter. The difference of Gaussians (DoG) of images at adjacent scales is then computed:

D (x, y, σ) = L (x, y, kσ) - L (x, y, σ)

where k is generally taken as 2^{1/3}.

Computing the DoG of adjacent scales yields a series of images, and extreme points are sought in this image space. Each pixel in each DoG image is compared with all of its neighbors, in both the image domain and the scale domain, to see whether it is larger or smaller than all of them. After the extreme points are obtained, a curve is fitted to the scale-space DoG function to screen them, removing low-contrast points and points on edges:

D (\bar{X}) = D + \frac{1}{2} \frac{\partial D^{T}}{\partial X} \bar{X}

where X denotes the offset from the sample point and \bar{X} is the extremum of X. For each candidate extreme point, the value of D (\bar{X}) is evaluated; if it is less than a threshold (0.03 is generally sufficient), the candidate is judged to be an unstable, low-contrast extreme point and is removed.

To obtain stable extreme points, the influence of edges should also be removed. When

\frac{Tr {(H)}^{2}}{Det (H)} < \frac{{(r + 1)}^{2}}{r}

holds, the key pixel is retained; otherwise it is rejected. The retained key points are the feature points being sought. Here,

H = (\begin{matrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{matrix})

is the Hessian matrix, in which D_{xx} is the second derivative in the x direction of the image at a given scale of the DoG space. Tr (H) is the trace of H and Det (H) is its determinant. α is the larger eigenvalue of H, β is the smaller eigenvalue, and r = α / β.
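The edge test can be sketched as a small predicate on the Hessian entries. The patent does not give a value for r; the default `r=10.0` below follows common SIFT practice and is an assumption, as is the function name `passes_edge_test`. Rejecting non-positive determinants is also an added safeguard, since the ratio is only meaningful when both principal curvatures have the same sign.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a key point only if Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a degenerate Hessian): reject.
        return False
    return trace * trace / det < (r + 1) ** 2 / r

blob_like = passes_edge_test(1.0, 1.0, 0.0)     # similar curvature in x and y
edge_like = passes_edge_test(100.0, 1.0, 0.0)   # one curvature dominates
```

A blob-like point with similar curvature in both directions passes, while an edge-like point with one dominant curvature is rejected.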

Once the positions of the feature points are determined, each feature point is assigned an orientation from the gradients in its neighborhood. The gradient magnitude m (x, y) and gradient direction θ (x, y) are defined as:

m (x, y) = \sqrt{{(L (x + 1, y) - L (x - 1, y))}^{2} + {(L (x, y + 1) - L (x, y - 1))}^{2}}

θ (x, y) = \tan^{- 1} \frac{L (x, y + 1) - L (x, y - 1)}{L (x + 1, y) - L (x - 1, y)}

A region centered on the feature point is delimited, and the gradients of all points in the region are accumulated into an orientation histogram. The direction with the largest histogram value is selected as the principal direction of the feature point. If another direction has a histogram value greater than 80% of that of the principal direction, it is also taken as a direction of the feature point.
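The gradient definitions and the 80% rule can be sketched as follows. Several details are assumptions not stated in the patent: the image is a list of rows indexed as `L[y][x]`, the histogram has 36 bins, the region is a square of the given `radius`, votes are weighted by gradient magnitude, and `math.atan2` is used in place of a plain arctangent so that the quadrant is preserved.

```python
import math

def gradient(L, x, y):
    """Central-difference gradient magnitude and direction (radians) at L[y][x]."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def dominant_orientations(L, cx, cy, radius=1, bins=36, ratio=0.8):
    """Return histogram bins of the principal direction and any direction above ratio * peak."""
    hist = [0.0] * bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            m, theta = gradient(L, x, y)
            b = int((theta % (2 * math.pi)) / (2 * math.pi) * bins) % bins
            hist[b] += m                      # magnitude-weighted vote (assumed)
    peak = max(hist)
    return [b for b, v in enumerate(hist) if peak > 0 and v > ratio * peak]

# A horizontal intensity ramp: the gradient points along +x everywhere.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
magnitude, direction = gradient(ramp, 2, 2)
orientations = dominant_orientations(ramp, 2, 2)
```

The caller is responsible for keeping the region at least one pixel away from the image border, since the central differences read the neighboring pixels.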

After the feature points are detected, their descriptors are determined. First, the neighborhood of the feature point is rotated by θ° about the feature point (so that its direction is aligned to 0°), where θ is the direction of the feature point. In the rotated image, a 16 × 16 neighborhood window centered on the feature point is taken, each cell representing one pixel of the window. The 16 × 16 window is divided evenly into 16 sub-regions. Gaussian weighting is applied so that neighborhood pixels closer to the feature point receive larger weights and those farther away receive smaller weights. A gradient histogram over 8 directions is then computed in each sub-region, giving a descriptor feature vector of 4 × 4 × 8 = 128 dimensions. The descriptor is then normalized. Let D = (d_{1}, d_{2}, ..., d_{128}) be the feature point descriptor; after normalization:

\bar{D} = \frac{D}{\sqrt{\sum_{i = 1}^{128} d_{i}^{2}}} = (\bar{d_{1}}, \bar{d_{2}}, \cdots, \bar{d_{128}})

To reduce the influence of large gradient values, a threshold of 0.2 is applied: any component of the vector greater than 0.2 is set to 0.2, and the vector is normalized again.
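The normalize, clip, and renormalize sequence can be sketched in a few lines. The function name `normalize_descriptor` is hypothetical, and returning an all-zero vector unchanged is an added safeguard rather than something the patent describes.

```python
import math

def normalize_descriptor(d, clip=0.2):
    """L2-normalize, cap each component at `clip`, then L2-normalize again."""
    def l2(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)
    clipped = [min(x, clip) for x in l2(d)]
    return l2(clipped)

# Two dominant components (0.6 and 0.8 after the first normalization) are both capped at 0.2.
descriptor = normalize_descriptor([3.0, 4.0] + [0.0] * 126)
```

After clipping, the two large components become equal, so no single strong gradient dominates the distance between descriptors.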

Offline, the same steps are used to obtain all feature point descriptors of the pictures in the database, and these descriptors are clustered. The resulting clusters serve as "visual words", and a bag-of-words model is applied to build an inverted index over the pictures in the database. The same bag-of-words model is then applied to obtain the feature vector representation of the query picture.

Finally, the global shape feature of the picture is indexed. After the query picture is submitted, it is first sampled and filtered with a Gabor filter according to the following formula:

where

\{\begin{matrix} x_{r_{θ_{i}}} = x \cos (θ_{i}) + y \sin (θ_{i}) \\ y_{r_{θ_{i}}} = - x \sin (θ_{i}) + y \cos (θ_{i}) \end{matrix}

l is the scale of the filter; K is a positive constant; σ is the standard deviation of the Gaussian function; θ_{i} = π (i - 1) / θ_{l}, i = 1, 2, ..., θ_{l}; and θ_{l} is the total number of orientations at scale l. Convolving the image with the Gabor filter gives the filtered image:

F_{θ_{i}}^{l} = G_{θ_{i}}^{l} * I

The filtered picture is divided into a 4 × 4 grid and the mean is taken within each grid cell. The means obtained for every orientation and every scale are then placed in one vector, which serves as the shape feature of the query picture. In the offline indexing step, the same shape feature is computed and indexed for the pictures in the database (k-d tree and hash indexes are built for efficient retrieval) so that picture shape features can be matched.
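The grid-averaging step can be sketched as below for a single filtered response map; the Gabor filtering itself is not reproduced. The function name `grid_means` and the row-major cell order are assumptions, and the map is assumed to be at least 4 × 4 pixels.

```python
def grid_means(image, grid=4):
    """Average a 2-D response map over a grid x grid partition, in row-major cell order."""
    h, w = len(image), len(image[0])
    means = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * h // grid, (gy + 1) * h // grid)
            cols = range(gx * w // grid, (gx + 1) * w // grid)
            cell = [image[y][x] for y in rows for x in cols]
            means.append(sum(cell) / len(cell))
    return means

# An 8 x 8 map whose value is constant within each 2 x 2 cell.
response = [[(y // 2) * 4 + (x // 2) for x in range(8)] for y in range(8)]
cell_means = grid_means(response)
```

Concatenating the 16 values from every orientation and scale would give the full shape feature vector described above.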

Once the segmented text and the content features of the query picture are available, step S103 uses this information to search the joint text and image index for relevant pictures.

Fig. 2 is a more intuitive flowchart of the retrieval method and shows how step S103 is implemented. A separate information retrieval (IR) model is built for each single-modal (individual) feature (step S201). Each IR model runs independently and can be configured freely, so different IR models can be combined according to the actual situation, and each returns a result list according to its own ranking algorithm. In step S202, the segmented text and/or the content features extracted from the query picture are fed to the corresponding IR models to obtain multiple ranked lists. Step S203 fuses these ranked lists, and the resulting comprehensive ranked list is returned to the user. In this embodiment, the joint retrieval method combines the output of the text-based retrieval model with the output of the content-based retrieval models to obtain one comprehensive ranked list, which contains the returned pictures sorted in descending order of relevance to the query.

Specifically, in step S201 a text-based IR model is built. Preferably, statistical language modeling is used to build the text IR model. Let V denote the vocabulary of a language, V = {ω_{1}, ω_{2}, ..., ω_{|V|}}, where each ω_{i} is called a term. Let D be a document in the document collection C, D = d_{1} d_{2} ... d_{n}, d_{i} ∈ V. In the statistical translation model, when the user submits a text query Q = q_{1} q_{2} ... q_{m}, q_{i} ∈ V, the probability that document D is "translated" into query Q is:

P (Q | D) = \prod_{i = 1}^{m} \sum_{ω} t (q_{i} | ω) P (ω | D) \quad (1)

where P (ω | D) is the basic document language model and t (q_{i} | ω) is the translation probability. After P (Q | D) is computed, the documents in the collection must be ranked. This requires estimating the posterior probability P (D | Q); by Bayes' formula:

P (D | Q) = \frac{P (Q | D) P (D)}{P (Q)} \propto P (Q | D) P (D) \quad (2)

where P (D) can be taken as a query-independent quantity and is not considered in this model. Once the posterior probability P (D | Q) is computed, the documents in the collection are ranked by probability and a ranked list is returned, as in step S202.
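Formula (1) and the ranking by P (Q | D) can be sketched as follows. The patent does not say how P (ω | D) or the translation table is estimated; here P (ω | D) is the relative term frequency in the document and `t` is a hand-written table, both assumptions for illustration, as are the function names and the toy documents. P (D) is treated as uniform and therefore omitted.

```python
from collections import Counter

def translation_score(query, doc, t):
    """P(Q|D) = prod_i sum_w t(q_i | w) * P(w | D).

    query, doc: lists of terms after word segmentation.
    t: dict mapping (query_term, doc_term) -> translation probability (hypothetical).
    """
    counts = Counter(doc)
    n = len(doc)
    score = 1.0
    for q in query:
        score *= sum(t.get((q, w), 0.0) * c / n for w, c in counts.items())
    return score

def rank_documents(query, docs, t):
    """Return document ids in descending order of P(Q|D)."""
    scores = {doc_id: translation_score(query, terms, t) for doc_id, terms in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

t = {("cat", "cat"): 0.8, ("cat", "kitten"): 0.6, ("red", "red"): 1.0}
docs = {"d1": ["red", "kitten"], "d2": ["blue", "dog"], "d3": ["red", "cat"]}
ranking = rank_documents(["red", "cat"], docs, t)
```

The translation table lets a document that mentions "kitten" still match the query term "cat", which a plain term-matching model would miss.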

After the query picture is submitted, step S201 builds multiple IR models based on picture content features. Each IR model corresponds to one kind of image feature, such as the color feature, the texture feature, or the shape feature. As described above, these content features are all represented in a vector space, so the similarity between feature vectors must be measured. Preferably, the Euclidean distance is used. The Euclidean distance between two n-dimensional vectors (x_{11}, x_{12}, ..., x_{1n}) and (x_{21}, x_{22}, ..., x_{2n}) is:

d_{12} = \sqrt{Σ_{k = 1}^{n} {(x_{1 k} - x_{2 k})}^{2}}

The larger the distance, the less relevant the picture. After the similarity computation, a list sorted in descending order of relevance is returned, as in step S202.
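Ranking by Euclidean distance can be sketched with the standard library's `math.dist` (Python 3.8 or later). The function name `rank_by_distance` and the dictionary-based index are illustrative; a real system would use the k-d tree or hash index mentioned earlier instead of a linear scan.

```python
import math

def rank_by_distance(query_vec, index):
    """Return picture ids from most to least relevant (ascending Euclidean distance).

    index: dict mapping picture id -> feature vector of the same dimension as query_vec.
    """
    return sorted(index, key=lambda pid: math.dist(query_vec, index[pid]))

index = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 1.0]}
ranked = rank_by_distance([0.0, 0.0], index)
```

One such list is produced per content feature, so a query with color and texture features yields two independently ranked lists.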

Note that one list is returned for each feature extracted from the picture. For example, if the color and texture features of the image are extracted, two ranked lists are returned. Each list is sorted in descending order of relevance to the query picture on the corresponding visual feature.

After the relevance-ranked lists are obtained, each picture d_{j} in the lists is assigned a score S_{HLFIRM}, as follows:

S_{HLFIRM} (d_{j}) = (\sum_{i = 1}^{N} 1_{d_{j} \in L_{i}}) \times (α_{i} \times \frac{1}{ψ (d_{j}, L_{i})})

where ψ (x, H) denotes the position of picture x in list H; 1_{a} is an indicator function that takes the value 1 when a is true, that is, when d_{j} belongs to list L_{i}, and 0 otherwise; and α_{i} is the weight of the i-th IR model. Intuitively, pictures that appear near the top of several lists receive higher scores, and a higher score indicates greater relevance to the query. The score defined above is computed and the pictures are sorted by it, as in steps S203 and S204.
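The score fusion can be sketched as follows. The formula as printed leaves the index i of the weighted term unbound, so this sketch assumes one reading: the term α_{i} / ψ (d_{j}, L_{i}) is summed over the lists that contain the picture, and the sum is multiplied by the number of such lists. The function name `fuse_rankings`, the 1-based positions, and the tie-break by picture id are likewise assumptions.

```python
def fuse_rankings(ranked_lists, weights):
    """Fuse several ranked lists into one.

    ranked_lists: list of lists of picture ids, best first.
    weights: one weight (alpha_i) per list.
    Returns (ordered ids, score dict).
    """
    hits, weighted = {}, {}
    for lst, alpha in zip(ranked_lists, weights):
        for pos, pid in enumerate(lst, start=1):
            hits[pid] = hits.get(pid, 0) + 1                    # indicator sum
            weighted[pid] = weighted.get(pid, 0.0) + alpha / pos  # alpha_i / position
    scores = {pid: hits[pid] * weighted[pid] for pid in hits}
    order = sorted(scores, key=lambda p: (-scores[p], p))
    return order, scores

lists = [["a", "b", "c"], ["b", "a"], ["c", "b"]]
order, scores = fuse_rankings(lists, [1.0, 1.0, 1.0])
```

In this toy example picture "b" appears in all three lists and near the top of each, so it receives the highest fused score.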

In the above formula, if one IR model performs better than the others, it should be assigned a higher weight so as to improve the performance of the whole system. In this embodiment, the weights α_{i} are set by automatic optimization; available methods include genetic algorithms, annealing algorithms, and the like. In a genetic algorithm, for example, the weights of each model are first initialized, and repeated random initialization produces an initial population of weight vectors. The fitness of each individual in the population is then measured. In image retrieval, the fitness of a weight vector is measured by the performance of the search results it produces: test queries are given together with the sets of relevant pictures for those queries, and the performance of the search results output by the program, computed against the relevant picture sets, is used as the fitness criterion. Many metrics can measure search performance, such as the F1 score, Normalized Discounted Cumulative Gain, and Mean Average Precision. Individuals are then selected, crossed over, and mutated according to fitness to produce a new generation, which is better than the previous one. This cycle repeats and the fitness of the weight vectors keeps improving until a termination condition is met: either an individual reaching the preset fitness target is obtained within the maximum number of iterations, or the maximum number of iterations is reached, in which case the fittest individual across all generations is returned.
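A genetic search of this kind can be sketched as below. All hyperparameters (population size, number of generations, mutation rate, Gaussian mutation width, truncation selection, single-point crossover) are assumptions, as is the function name `optimize_weights`. The toy fitness function merely stands in for a retrieval metric such as Mean Average Precision computed on test queries.

```python
import random

def optimize_weights(fitness, n_weights, pop_size=20, generations=30,
                     mutation_rate=0.2, target=None, seed=0):
    """Search for a weight vector in [0, 1]^n that maximizes fitness(weights)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_weights)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        if target is not None and fitness(best) >= target:
            break                                          # preset fitness target reached
        # Selection: keep the fitter half as parents.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights) if n_weights > 1 else 0
            child = a[:cut] + b[cut:]                      # single-point crossover
            child = [min(1.0, max(0.0, w + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else w
                     for w in child]                       # Gaussian mutation, clamped
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)       # best across all generations
    return best

# Toy fitness: closeness to a hypothetical ideal weighting (0.9, 0.1).
toy_fitness = lambda w: -sum((x - t) ** 2 for x, t in zip(w, (0.9, 0.1)))
weights = optimize_weights(toy_fitness, 2)
```

Because the best individual is carried across generations, the returned vector is never worse than the best member of the initial random population.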

This embodiment may further include step S105: if the user is satisfied with the output, the retrieval process ends; if the user is not satisfied with the output, or it deviates from what the user had in mind, the user can supplement or modify the query text and/or query picture on the basis of the previous query. The method then performs word segmentation and feature extraction again on the modified text and/or picture, and repeats steps S102 to S105 until a result satisfactory to the user is output.

Fig. 3 compares the results of the retrieval mechanism of the present invention with those of single-modal image retrieval mechanisms. In the right half of the figure, the top shows the results of text-only retrieval, the middle shows the results of content-only retrieval, and the bottom shows the results returned by the hybrid retrieval mechanism proposed by the present invention. The returned pictures show that the present invention performs considerably better than the earlier single-modal retrieval methods and better meets user requirements.

The image retrieval system of the present invention corresponds to the above method and, as shown in Fig. 4, comprises the query information receiving end, the query information processing module, the single-feature similarity ranking module, and the comprehensive similarity ranking module described above.

In a preferred embodiment, the content features of a picture include a color feature, a texture feature, and a shape feature.

In a preferred embodiment, the query information processing module uses a hidden Markov model to perform word segmentation on the query text.

In a preferred embodiment, the single-feature similarity ranking module uses statistical language modeling to measure the similarity between the segmented query text and the descriptive document of every picture in the database, and uses the Euclidean distance to compute the similarity between each content feature of the query picture and the corresponding content feature of every picture in the database.

In a preferred embodiment, the comprehensive similarity ranking module uses a genetic algorithm or an annealing algorithm to set the weight of each list.

In summary, the beneficial effects of the present invention are as follows.

The present invention improves the practicality of picture search. First, earlier search methods were mostly based on a single modality, which limited how users could express their query intent. Second, content-based image retrieval faces the semantic gap problem. By combining the multi-modal information of text and picture content, the present invention changes this situation in practice.

The present invention improves the flexibility of picture search. Earlier image retrieval methods mostly searched with a fixed set of features, whereas the present invention allows feature combinations to be configured freely and flexibly.

The present invention improves the accuracy of picture search. By combining the results returned by the text-based and content-based image retrieval methods, the method derives a more accurate relevance-ranked list of pictures and greatly improves the accuracy of the returned results.

The embodiments above express only several implementations of the present invention. Their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention. The scope of protection of this patent is therefore defined by the appended claims.

Claims

1. An image retrieval method, characterized by comprising the steps of:

receiving a query picture and/or query text submitted by a user;

2. The image retrieval method according to claim 1, characterized in that

the content features of a picture include a color feature, a texture feature, and a shape feature.

3. The image retrieval method according to claim 1 or 2, characterized in that

a hidden Markov model is used to perform word segmentation on the query text.

4. The image retrieval method according to claim 1 or 2, characterized in that

statistical language modeling is used to measure the similarity between the segmented query text and the descriptive document of every picture in the database;

the Euclidean distance is used to compute the similarity between each content feature of the query picture and the corresponding content feature of every picture in the database.

5. The image retrieval method according to claim 1 or 2, characterized in that

a genetic algorithm or an annealing algorithm is used to set the weight of each list.

6. An image retrieval system, characterized by comprising:

7. The image retrieval system according to claim 6, characterized in that

8. The image retrieval system according to claim 6 or 7, characterized in that

the query information processing module uses a hidden Markov model to perform word segmentation on the query text.

9. The image retrieval system according to claim 6 or 7, characterized in that

the single-feature similarity ranking module uses statistical language modeling to measure the similarity between the segmented query text and the descriptive document of every picture in the database, and uses the Euclidean distance to compute the similarity between each content feature of the query picture and the corresponding content feature of every picture in the database.

10. The image retrieval system according to claim 6 or 7, characterized in that

the comprehensive similarity ranking module uses a genetic algorithm or an annealing algorithm to set the weight of each list.