KR20190008699A - Method, system and computer program for semantic image retrieval based on topic modeling - Google Patents

Method, system and computer program for semantic image retrieval based on topic modeling Download PDF

Info

Publication number
KR20190008699A
Authority
KR
South Korea
Prior art keywords
distribution
image
visual
subject
word
Prior art date
Application number
KR1020170090390A
Other languages
Korean (ko)
Other versions
KR101976081B1 (en)
Inventor
이영구
울아 칸 키파야트
알리 무하마드
안 투 구엔
Original Assignee
경희대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 경희대학교 산학협력단
Priority to KR1020170090390A
Publication of KR20190008699A publication Critical patent/KR20190008699A/en
Application granted granted Critical
Publication of KR101976081B1 publication Critical patent/KR101976081B1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/5846 Retrieval characterised by using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method, system, and computer program for semantic image retrieval. The method of retrieving an image corresponding to a query image or a search keyword from one or more database images stored in a database comprises: a preprocessing step of obtaining one or more visual words and one or more text words from a database image to which one or more tags are mapped; estimating the model parameters of a background distribution topic model; modeling the topic distribution of the database image; refining the tags; modeling the topic distribution of the query image; and evaluating the similarity of the database images. According to the present invention, millions of images can be retrieved efficiently using text or images.

Description

METHOD, SYSTEM AND COMPUTER PROGRAM FOR SEMANTIC IMAGE RETRIEVAL BASED ON TOPIC MODELING

The present invention relates to a semantic image retrieval method and system, and more particularly, to a semantic image retrieval method and system using background distribution topic modeling.

Recently, social media network services have been growing rapidly with the development of the Internet. As a result, the amount of multimedia is increasing explosively, an effective image search system is required, and image annotation is becoming more important for the efficient retrieval of the explosively growing number of web images.

Most image retrieval research has centered on Content-based Image Retrieval (CBIR) methods, which analyze the content of images using visual features such as color, texture, and shape. This approach works well when the number of defined tags is small, but its performance deteriorates as the data set grows and the variety of tags increases.

Text-based Image Retrieval (TBIR) retrieves images corresponding to text by using the text as a query. In this approach, the visual content of an image is represented by a manually tagged text descriptor, which is used to perform image search in a database management system.

Content-based image retrieval is efficient for handling large databases, but suffers from the semantic gap between low-level image features and high-level semantic concepts. Text-based image retrieval can support queries at a high conceptual level, but manually tagging individual images in a large database is time-consuming. In addition, in social network service images, tags with low relevance to the image are often attached.

Therefore, in order to effectively search for an explosively increasing web image, there is a need for a semantic search system capable of reducing the above-described semantic gap and improving image tagging performance.

SUMMARY OF THE INVENTION

The present invention has been made to solve the above-described problems, and it is an object of the present invention to provide an image retrieval method and system that formulate the correlation between visual words, text words, and the background and use it for image retrieval.

It is another object of the present invention to provide an image retrieval method and system capable of predicting omission of tags and eliminating noise.

Another object of the present invention is to provide an image retrieval method and system capable of efficiently and effectively calculating the similarity between a query and a database image.

According to an aspect of the present invention, there is provided a method of retrieving an image corresponding to a query image or a search keyword from one or more database images stored in a database, the method comprising: a preprocessing step of obtaining one or more visual words and one or more text words from the database image to which one or more tags are mapped; estimating, using the visual words and the text words, the model parameters of a background distribution topic model, which include the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω); modeling the topic distribution (θ_d) of the database image using the background distribution topic model including the estimated model parameters; calculating the degree of correspondence of the tags to the database image using the topic-specific text word distribution (ψ) and the image topic proportion (θ_d), and refining the tags accordingly; obtaining one or more visual words from the query image when a search request including the query image is input, and modeling the topic distribution (θ_q) of the query image using the background distribution topic model including the estimated model parameters; and evaluating the similarity of the database image using the topic distribution of the database image (θ_d) and the topic distribution of the query image (θ_q).

The present invention also provides an image retrieval system comprising: a database storing one or more database images; a first preprocessing unit for obtaining one or more visual words and one or more text words from the database image to which one or more tags are mapped; a first background distribution topic modeling unit that estimates, using the visual words and the text words, the model parameters of a background distribution topic model, which include the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω), models the topic distribution (θ_d) of the database image using the background distribution topic model including the estimated model parameters, calculates the degree of correspondence of the tags to the database image using the topic-specific text word distribution (ψ) and the image topic proportion (θ_d), and refines the tags accordingly; a second preprocessing unit for obtaining one or more visual words from a query image and one or more text words from a search keyword; a second background distribution topic modeling unit that models the topic distribution (θ_q) of the query image using the background distribution topic model including the estimated model parameters; and a similarity evaluation unit that evaluates the similarity of the database image using the topic distribution of the database image (θ_d) and the topic distribution of the query image (θ_q).

According to the present invention as described above, the correlation between a visual word, a text word, and a background can be formulated and used for image retrieval, thereby enhancing image retrieval accuracy.

In addition, according to the present invention, it is possible to predict the omission of a tag and remove noise.

Further, according to the present invention, it is possible to calculate the similarity between the query and the database image efficiently and effectively.

FIG. 1 is a diagram illustrating the configuration of an image search system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a background distribution topic model according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an image search method according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating the preprocessing step according to an embodiment of the present invention in more detail.
FIG. 5 is a flowchart illustrating the model parameter estimation step according to an embodiment of the present invention in more detail.

The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description of exemplary embodiments with reference to the attached drawings, which are not intended to limit the scope of the present invention. In the following description, well-known functions or constructions are not described in detail, since they would obscure the invention with unnecessary detail. Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals are used to designate the same or similar components, and all combinations described in the specification and claims can be combined in any manner. Unless the context requires otherwise, references to the singular may include the plural, and references to the plural may include the singular.

To address the problem of manual image tagging, automatic annotation systems have been proposed that can automatically infer and correct tags. The most notable recent studies treat image annotation as a classification task and train a classifier to map visual features to tags. The main disadvantage of this approach, however, is that annotation is limited to the small number of tag words covered by well-curated training data. Because tags on Internet images actually contain a great deal of noise, this limitation makes such systems difficult to use in real-world applications.

Moreover, although the classification-based approach works well, it is not easy to implement. The number of classification classes grows with the number of words, and a large number of training images is required to achieve high accuracy. In addition, the correlation between visual features and linguistic concepts is not simple, which is another obstacle to implementation.

Corresponding Latent Dirichlet Allocation (hereinafter referred to as "CorrLDA") is a generative probabilistic model for bridging the semantic gap and predicting keywords related to an image. A generative probabilistic model views data as generated by a random process from some probability distribution and its parameters. CorrLDA finds the relationship between the image domain and a latent-variable representation of a set of text words: to predict keywords related to an image, latent semantic topics are found from the co-occurrence pattern of image content and the corresponding text. The CorrLDA model provides a way to learn latent topics from image features and text words, and derives a direct relationship between visual and textual topics by using the correspondence between visual features and text words through document-topic proportions.

The CorrLDA model has the advantage that the correlation between visual and textual topics is explicitly exploited through latent topics; as a result, the semantic content of images can be extracted effectively, and its ability to solve multi-labeling problems directly makes it useful in real applications where the data set is updated dynamically.

The present invention is designed to address the fact that CorrLDA has not been sufficiently utilized in retrieval work, and is characterized by extending the CorrLDA model with a new concept related to background words.

Latent Dirichlet Allocation (LDA), the model from which CorrLDA is derived, is a probabilistic model of which topics are present in each document of a given corpus. By analyzing the distribution of word counts in a given document against the per-topic word distributions known in advance, it is possible to predict which topics the document deals with.

Similarly, the database images and the query image used in an embodiment of the present invention can be assumed to have several topics, and the topics follow a Dirichlet distribution. That is, an image can be viewed as a set of one or more visual words, just as latent Dirichlet allocation views a document as a set of words.

In other words, a topic is a probability distribution over the visual words contained in an image and consists of semantically similar or related words. In the present invention, each image is represented by a probabilistic mixture of topics, and each topic is represented by a distribution over visual words and text words (tags).

In an embodiment of the present invention, an image feature may be defined in the form of a visual word, which can be obtained by extracting feature information from a database image or a query image and then quantizing it. The image corpus generated in this way can be expressed as a bag of visual words and a bag of text words. The visual vocabulary can be understood as the result of extracting image feature information, such as SIFT, from the images, clustering the extracted features to find visual words, and then representing each image as a histogram of visual words. The text vocabulary can be understood as the set of one or more tags manually attached to each image.

In this specification, 'one or more visual words' means a bag of visual words, and 'one or more text words' means a bag of text words. In addition, 'text word' and 'tag' refer to the same object: 'tag' means a word manually mapped to an image by a user at a collection site such as the web or a social network service, while 'text word' can be understood as the more general concept.

Meanwhile, the image retrieval method according to an embodiment of the present invention uses the concept of removing 'background words', that is, visual words with little or no meaning that are generally included in the background of an image. Background words are included in almost all images but are not useful and act as noise in the similarity measurement of image search, so the accuracy of image search can be improved through topic modeling that accounts for background words.

Further, the present invention proposes a scoring system for evaluating the similarity between a query and the images stored in a database when extracting topics and refining tags.

According to an embodiment of the present invention, visual words can be classified into topic words and background words. Therefore, in the present invention, each image can be represented by a visual topic distribution, a background distribution, and a text topic distribution.

Hereinafter, a topic-modeling-based semantic image retrieval method according to an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a diagram illustrating an image retrieval system according to an embodiment of the present invention.

Referring to FIG. 1, an image search system according to an exemplary embodiment of the present invention may include a search data generation module 100 and a search module 200. The search data generation module 100 may include a database 110, a first preprocessing unit 130, and a first background distribution topic modeling unit 150. The search module 200 may include a second preprocessing unit 230, a second background distribution topic modeling unit 250, and a similarity evaluation unit 270.

According to an embodiment, the search data generation module 100 may be executed off-line, and the search module 200 may be executed on-line.

The database 110 of the search data generation module 100 stores one or more database images. A database image may be an image collected from the web or an image stored by a user, and one or more tags may be mapped to the image.

In recent social network services, hash tags (#tags) are attached to an image to describe its characteristics, and images related to a tag can easily be retrieved using the tag. Owing to this characteristic of web-collected images, the collected images may be stored in the database together with the tags mapped to them.

One or more database images stored in the database may be transferred to the first preprocessing unit 130. The first preprocessing unit 130 preprocesses a database image to obtain one or more visual words and one or more text words from it. For example, the first preprocessing unit 130 may extract image features from the database image using an affine-invariant detector, and a 128-dimensional SIFT (Scale-Invariant Feature Transform) descriptor may be used as the image feature. The first preprocessing unit 130 may cluster the extracted image features to extract representative values of the clusters, and quantize the representative values to obtain visual words. For example, the first preprocessing unit 130 can quantize the descriptors into visual words using a codebook learned by k-means clustering. Further, when no tag is mapped to the database image, the first preprocessing unit 130 may generate a tag (text word) for the database image using the visual words.
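
As an illustration only, the preprocessing described above can be sketched in Python with OpenCV's SIFT implementation and scikit-learn's k-means. The vocabulary size V and clustering settings below are illustrative assumptions, not values fixed by the present invention.

```python
# A minimal sketch of the preprocessing step: extract 128-dimensional SIFT
# descriptors, learn a visual-word codebook by k-means clustering, and
# quantize each image into a bag-of-visual-words histogram.
# The vocabulary size V is an illustrative assumption.
import cv2
import numpy as np
from sklearn.cluster import KMeans

V = 1000  # assumed size of the visual vocabulary

def extract_descriptors(image_path):
    sift = cv2.SIFT_create()
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)  # (n_keypoints, 128) or None
    return desc

def build_codebook(descriptor_list):
    # Cluster all descriptors of the corpus; the cluster centers
    # (representative values) are the visual words of the codebook.
    stacked = np.vstack([d for d in descriptor_list if d is not None])
    return KMeans(n_clusters=V, n_init=4, random_state=0).fit(stacked)

def to_visual_words(desc, codebook):
    # Quantize each descriptor to its nearest visual word and build the
    # bag-of-visual-words histogram for the image.
    words = codebook.predict(desc)
    return np.bincount(words, minlength=V)
```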

The first background distribution topic modeling unit 150 may use the one or more visual words obtained by the first preprocessing unit 130 and the one or more text words obtained from the tags to learn and estimate the model parameters of the background distribution topic model according to an embodiment of the present invention.

More specifically, the first background distribution topic modeling unit 150 may include a calculation unit 153 (not shown) and a model parameter estimation unit 155 (not shown).

The first background distribution topic modeling unit 150 estimates, using the visual words and the text words, the model parameters of the background distribution topic model, which include the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω). The calculation unit 153 derives the proportion of visual words assigned to the topic distribution by using the proportion of visual words assigned to the topic distribution, the probability that a specific visual word (v) appears under a specific topic (t), and the probability that the topic (t) appears in a specific image (d); derives the proportion of visual words assigned to the background distribution by using the proportion of visual words assigned to the background distribution and the probability of the background word; and derives the probability that a text word (w) is assigned to a specific topic (t) by using the probability that the text word (w) appears under the topic (t) and the probability that the topic (t) appears in the image (d).

The model parameter estimation unit 155 repeatedly performs the operation of the calculation unit 153 on the set of database images stored in the database until the model parameters converge, thereby obtaining the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω).

A background distribution topic model according to an embodiment of the present invention will be described with reference to FIG. 2. FIG. 2(a) shows the probability graph of the conventional corresponding latent Dirichlet allocation (CorrLDA), and FIG. 2(b) shows the probability graph of the background distribution topic model (CTMB) according to an embodiment of the present invention.

The parameters of the CTMB according to the embodiment of the present invention shown in FIG. 2(b) are listed in Table 1.

Parameter | Explanation
α, β, γ, δ, η | Preset Dirichlet (and Beta) hyperparameters
λ_d | Bernoulli distribution over word types (topic word / background word) in image d
Ω | Multinomial distribution over the visual words of the background
θ_d | Multinomial distribution over the topics of image d
φ_t | Multinomial distribution over the visual words of topic t
ψ_t | Multinomial distribution over the text words of topic t

In FIG. 2(b), each rectangle represents a plate (replication): the corpus contains D database images, and an individual database image d contains N_d visual words and M_d text words. The shaded nodes represent observed variables and the white nodes represent latent variables. In CTMB, each database image is modeled as a distribution over topics, and each topic is modeled as a distribution over visual words of vocabulary size V and text words of vocabulary size W. The latent variable z represents a visual topic, and y represents a text topic. In CTMB, image features, i.e., the visual features of images, are used for text word generation. The CTMB probabilistic model follows the generative process below.

1. For each topic t:
   a. Visual word distribution: φ_t ~ Dirichlet(β)
   b. Text word distribution: ψ_t ~ Dirichlet(γ)
2. Background distribution: Ω ~ Dirichlet(δ)
3. For each database image d (d = 1, ..., D):
   a. Word type distribution: λ_d ~ Beta(η)
   b. Topic distribution: θ_d ~ Dirichlet(α)
4. For each visual word v_{d,n} (n = 1, ..., N_d):
   a. Sample the switch: s_{d,n} ~ Bernoulli(λ_d)
   b. If s_{d,n} = 1:
      i. Visual topic variable: z_{d,n} ~ Multinomial(θ_d)
      ii. Visual topic word: v_{d,n} ~ Multinomial(φ_{z_{d,n}})
   c. If s_{d,n} = 0:
      i. Visual background word: v_{d,n} ~ Multinomial(Ω)
5. For each text word w_{d,m} (m = 1, ..., M_d):
   a. Text topic variable: y_{d,m} ~ Unif(z_{d,1}, ..., z_{d,N_d})
   b. Text word: w_{d,m} ~ Multinomial(ψ_{y_{d,m}})

In the above process, Dirichlet and Multinomial denote the Dirichlet distribution and the multinomial distribution, respectively. The multinomial distribution is chosen to establish conjugacy with the Dirichlet priors on the word distributions, which simplifies computation and ensures efficient inference. The switch variable s controls the generation of visual words: an image contains two types of visual words, one drawn from the topic distribution Multinomial(φ_z) and the other drawn from the background distribution Multinomial(Ω), which yields a direct correlation between visual words and the background. The text topic y corresponds to one of the visual topics z through a uniform distribution, and the text word is drawn from the topic distribution Multinomial(ψ_y). Therefore, the use of the CTMB according to an embodiment of the present invention greatly strengthens the correlation between visual words and text words.
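
For concreteness, the generative process above can be simulated directly. The following NumPy sketch uses assumed sizes and hyperparameters (K, V, W, α, β, γ, δ, η) and only generates synthetic data; it does not perform inference.

```python
# A minimal NumPy simulation of the CTMB generative process described above.
# All sizes and hyperparameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
K, V, W, D = 10, 200, 50, 5              # topics, visual vocab, text vocab, images
N_d, M_d = 100, 8                        # visual / text words per image (assumed)
alpha, beta, gamma, delta, eta = 0.1, 0.01, 0.01, 0.01, 1.0

phi = rng.dirichlet(np.full(V, beta), size=K)    # 1a: topic visual word dists
psi = rng.dirichlet(np.full(W, gamma), size=K)   # 1b: topic text word dists
omega = rng.dirichlet(np.full(V, delta))         # 2:  background distribution

for d in range(D):
    lam = rng.beta(eta, eta)                     # 3a: word-type (switch) prior
    theta = rng.dirichlet(np.full(K, alpha))     # 3b: topic distribution
    z, visual_words, text_words = [], [], []
    for _ in range(N_d):                         # 4: visual words
        if rng.random() < lam:                   # 4a/4b: switch chose topic word
            t = rng.choice(K, p=theta)           #    i.  visual topic variable
            z.append(t)
            visual_words.append(rng.choice(V, p=phi[t]))   # ii. topic word
        else:                                    # 4c: background word
            visual_words.append(rng.choice(V, p=omega))
    for _ in range(M_d):                         # 5: text words
        if not z:                                # no topic words drawn; skip
            break
        y = rng.choice(z)                        # 5a: uniform over visual topics
        text_words.append(rng.choice(W, p=psi[y]))         # 5b: text word
```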

The goal of the CTMB model is, for a given image corpus C, to find the model parameters Π that maximize the likelihood of Equation (1):

Π* = argmax_Π P(C | Π)    (1)

Here, Π denotes the parameter set {λ, Ω, θ, φ, ψ}. The parameters φ, Ω, and ψ represent the distributions of the visual topics, the background, and the text topics, respectively.

Since it is very difficult to estimate the above distributions exactly, an approximate estimation algorithm can be used. In one embodiment, the parameters of the CTMB are estimated with the following Monte Carlo EM algorithm, in which a collapsed Gibbs sampling algorithm is used to sample the latent variables z, s, and y according to Equations (2) to (4).

More specifically, the Monte Carlo EM algorithm takes as input an image corpus formed of one or more visual words and one or more text words, and outputs the estimated parameters {φ, ψ, Ω}. According to the algorithm, each parameter is first initialized. Then, given K topics, for k = 1, ..., K, the latent variables of each image are sampled by collapsed Gibbs sampling according to Equations (2) to (4) for N Gibbs steps. Next, the parameter estimation {φ, ψ, Ω} is completed using Equations (6) to (8).

P(z_i = t, s_i = 1 | v_i = v, z_{-i}, s_{-i}) ∝ (n_d^{tw} + η) / (n_d − 1 + 2η) · (n_{t,v} + β) / (Σ_{v'} n_{t,v'} + Vβ) · (n_{d,t} + α) / (n_d^{tw} + Kα)    (2)

P(s_i = 0 | v_i = v, s_{-i}) ∝ (n_d^{bw} + η) / (n_d − 1 + 2η) · (n_{B,v} + δ) / (Σ_{v'} n_{B,v'} + Vδ)    (3)

P(y_j = t | w_j = w, y_{-j}, z) ∝ (n_{t,w} + γ) / (Σ_{w'} n_{t,w'} + Wγ) · n_{d,t} / n_d^{tw}    (4)

Referring to Equations (2) to (4), n_d^{tw} and n_d^{bw} denote the numbers of visual words in image d assigned to topic words and to background words, respectively; n_{d,t} is the number of visual words in image d assigned to topic t; n_{t,v} and n_{t,w} are the numbers of times the visual word v and the text word w are assigned to topic t; and n_{B,v} is the number of times the word v is assigned to the background word distribution of the image corpus. All counts exclude the current assignment being resampled.

The above equations are obtained by marginalizing out θ, λ, φ, ψ, and Ω separately. As can be observed, the first term in Equation (2) represents the proportion of visual words in image d assigned to the topic distribution θ_d rather than to the background distribution Ω. The second term in Equation (2) is the probability of the specific visual word (v) under a particular topic (t), and the second term in Equation (3) is the corresponding probability of the background word. The last term in Equation (2) represents the probability that the particular topic (t) appears in image d. Equation (4) is the probability that the text word (w) is assigned to topic t; its last term expresses the correspondence with the visual content through the proportion of visual words assigned to topic t within one image.
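
A sketch of the collapsed Gibbs update for a single visual word, following the forms of Equations (2) and (3) as reconstructed above; the count-array layout and the assumption that the current token's counts have already been decremented are implementation choices of this sketch.

```python
# A minimal sketch of one collapsed Gibbs update for a visual word v in
# image d, following Equations (2) and (3). Count arrays: n_dt (D x K),
# n_tv (K x V), n_bv (V,), n_sw (D x 2, background/topic word counts).
# The counts of the token being resampled are assumed already decremented.
import numpy as np

def sample_visual_word(d, v, n_dt, n_tv, n_bv, n_sw,
                       alpha, beta, delta, eta, rng):
    K, V = n_tv.shape
    # Equation (2): weight of (s = 1, z = t) for every topic t. The shared
    # denominator of the first term cancels after normalization.
    p_topic = ((n_sw[d, 1] + eta)
               * (n_tv[:, v] + beta) / (n_tv.sum(axis=1) + V * beta)
               * (n_dt[d] + alpha) / (n_dt[d].sum() + K * alpha))
    # Equation (3): weight of the background assignment (s = 0).
    p_bg = (n_sw[d, 0] + eta) * (n_bv[v] + delta) / (n_bv.sum() + V * delta)
    p = np.append(p_topic, p_bg)
    choice = rng.choice(K + 1, p=p / p.sum())
    if choice == K:
        return 0, None        # background word
    return 1, int(choice)     # topic word assigned to topic `choice`
```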

Since all latent variables are computed from the sampling equations, Π is estimated by examining the posterior distribution. After several iterations, the parameters {φ, ψ, Ω} are obtained. The posterior of the topic-visual word multinomial is calculated as in Equation (5):

p(φ_t | z, s, v) = Dirichlet(φ_t ; n_{t,·} + β)    (5)

where n_{t,·} = (n_{t,1}, ..., n_{t,V}). Therefore, φ_t can be estimated as the posterior mean, which is simply the normalized Dirichlet parameter:

φ_{t,v} = (n_{t,v} + β) / (Σ_{v'} n_{t,v'} + Vβ)    (6)

Similar to the estimation in Equation (6), the background distribution Ω and the text distribution ψ can be estimated as follows:

Ω_v = (n_{B,v} + δ) / (Σ_{v'} n_{B,v'} + Vδ)    (7)

ψ_{t,w} = (n_{t,w} + γ) / (Σ_{w'} n_{t,w'} + Wγ)    (8)
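
After the Gibbs iterations, the M-step of Equations (6) to (8) reduces to normalizing the accumulated count arrays; a minimal sketch under the same assumed count layout:

```python
# A minimal sketch of the parameter estimation of Equations (6)-(8):
# each posterior mean is a normalized Dirichlet parameter over Gibbs counts.
import numpy as np

def estimate_parameters(n_tv, n_tw, n_bv, beta, gamma, delta):
    V, W = n_tv.shape[1], n_tw.shape[1]
    phi = (n_tv + beta) / (n_tv.sum(axis=1, keepdims=True) + V * beta)    # (6)
    omega = (n_bv + delta) / (n_bv.sum() + V * delta)                     # (7)
    psi = (n_tw + gamma) / (n_tw.sum(axis=1, keepdims=True) + W * gamma)  # (8)
    return phi, psi, omega
```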

Once the model parameters are estimated, the background distribution topic modeling units 150 and 250 according to an embodiment of the present invention are ready to perform inference when a query image is input or when a new image is added to the database. In the inference step, the second terms of Equations (2) and (3) and the first term of Equation (4) are replaced with the estimated parameters φ, Ω, and ψ, from which the topic distribution θ of the new image can be derived.
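
A sketch of this folding-in inference for a new (query or newly added) image: the global parameters φ and Ω stay fixed while only the image's own assignments are resampled; the iteration count and initialization are assumptions of this sketch.

```python
# A minimal sketch of inference for a new image: with phi and omega fixed,
# resample the switch/topic assignments of the image's visual words and
# read off its topic distribution theta. Iteration count is an assumption.
import numpy as np

def infer_theta(visual_words, phi, omega, alpha, eta, rng, n_iter=50):
    K, V = phi.shape
    n_t = np.zeros(K)                            # topic counts in this image
    n_sw = np.array([float(len(visual_words)), 0.0])  # all start as background
    assign = [(0, None)] * len(visual_words)
    for _ in range(n_iter):
        for i, v in enumerate(visual_words):
            s, t = assign[i]                     # remove current assignment
            if s == 1:
                n_t[t] -= 1
            n_sw[s] -= 1
            # Equations (2)/(3) with the word distributions replaced by the
            # estimated parameters phi and omega.
            p_topic = (n_sw[1] + eta) * phi[:, v] * (n_t + alpha)
            p_bg = (n_sw[0] + eta) * omega[v]
            p = np.append(p_topic, p_bg)
            c = rng.choice(K + 1, p=p / p.sum())
            if c == K:
                assign[i] = (0, None)
                n_sw[0] += 1
            else:
                assign[i] = (1, int(c))
                n_t[c] += 1
                n_sw[1] += 1
    return (n_t + alpha) / (n_t.sum() + K * alpha)   # posterior-mean theta
```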

The image retrieval framework according to an embodiment of the present invention considers each image independently. This allows images to be distributed across different systems for processing, so that the framework scales and can handle billions of database images in real-world applications.

The first background distribution topic modeling unit 150 models the topic distribution (θ_d) of a database image using the background distribution topic model including the estimated model parameters, calculates the degree of correspondence of each tag to the database image using the topic-specific text word distribution (ψ) and the image topic proportion (θ_d), and refines the tags accordingly. The topic distribution modeling of the image is as described above, and the tags are refined as described below.

In the background distribution topic model (CTMB) according to an embodiment of the present invention, the degree of correspondence (corresponding probability) of a tag w to a database image d can be calculated according to Equation (9):

P(w | d) = Σ_{t=1}^{K} ψ_{t,w} · θ_{d,t}    (9)

Thus, tag refinement, which predicts tags and improves annotation by adding missing tags, is achieved by ranking the relevant tags based on the computed probability. In other words, the probability of an irrelevant tag should be small, while the probability of a missing tag is increased through the extracted topics. Here, ψ and θ represent the tag-topic distribution and the topic-document proportion, respectively.
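
In code, the tag refinement of Equation (9) is a single matrix-vector product followed by a re-ranking; the top-k selection policy in this sketch is an assumption, not part of the description.

```python
# A minimal sketch of tag refinement via Equation (9):
# P(w | d) = sum_t psi[t, w] * theta[d, t]. Low-probability (irrelevant)
# tags sink in the ranking, while missing but topical tags surface.
import numpy as np

def refine_tags(theta_d, psi, tag_vocab, top_k=5):
    scores = theta_d @ psi                    # P(w | d) for every tag w
    ranked = np.argsort(scores)[::-1]         # sort tags by correspondence
    return [tag_vocab[i] for i in ranked[:top_k]]
```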

When a query image or a search keyword is input to the search module 200, the second preprocessing unit 230 may obtain one or more visual words from the query image and one or more text words from the search keyword. The operation of the second preprocessing unit 230 is the same as that of the first preprocessing unit 130, which preprocesses the database image and the tags mapped to it.

The second background distribution topic modeling unit 250 models the topic distribution (θ_q) of the query image using the background distribution topic model including the estimated model parameters. Here, the second background distribution topic modeling unit 250 performs topic modeling using the model parameters estimated by the first background distribution topic modeling unit 150, in the same manner as the operation of the first background distribution topic modeling unit 150 described above.

The first background distribution topic modeling unit 150 extracts the topics of the database image and refines the tags (170) mapped to the database image, and the second background distribution topic modeling unit 250 extracts the topics of the query image and generates tags using them; the similarity evaluation unit 270 then uses these results to evaluate the similarity between the database image and the query image, or between the search keyword and the database image.

More specifically, the similarity evaluation unit 270 can calculate the similarity score of each database image (I_d) with respect to the query image (I_q) according to Equation (10):

Score(I_d, I_q) = (1 − μ) · sim(θ_d, θ_q) + μ · sim(r_d, r_q)    (10)

In the above equation, θ_d and θ_q are the topic distributions of the database image (I_d) and the query image (I_q), respectively, and r_d and r_q are two W-dimensional vectors representing the textual information of the database image (I_d) and the query image (I_q), respectively. μ is a preset parameter for controlling the weight of the text similarity, and its value can be set based on the user's basic settings or the query type. For example, if the query is a search keyword, μ can be set to 1. In this case, the i-th element of r_q can be set to 1 if the i-th tag appears in the query.

On the other hand, if the search request includes both an image and a search keyword, the topic distribution θ_q and the textual representation r_q of the refined tags can be estimated after applying the CTMB model. In this case, since the textual information and the visual information have the same importance, μ is set to 0.5. If the search request (query) contains only a query image without a search keyword, the text portion of the CTMB is excluded, and the topics can be extracted entirely from the image features.
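
The combined score of Equation (10) can be sketched as below; cosine similarity is used for sim(·, ·), which is an assumption of this sketch rather than a choice fixed by the description.

```python
# A minimal sketch of the query scoring of Equation (10): a weighted
# combination of topic similarity and text similarity with weight mu.
# Cosine similarity for sim(.,.) is an assumption.
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def score(theta_d, theta_q, r_d, r_q, mu=0.5):
    # mu = 1 for keyword-only queries, mu = 0.5 when image and keyword are
    # given together; for image-only queries the text term is dropped.
    return (1.0 - mu) * cosine(theta_d, theta_q) + mu * cosine(r_d, r_q)
```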

Finally, a list providing unit 280 (not shown) may sort the database images in descending order of score according to the evaluation result of the similarity evaluation unit 270 and provide them to the user.

Next, a semantic image search method using topic modeling according to an embodiment of the present invention will be described with reference to FIG. 3. For convenience of explanation, it is assumed that the semantic image search according to an embodiment of the present invention is performed by an arbitrary search system. In practice, the image retrieval method according to an embodiment of the present invention can be performed by a plurality of different subjects (modules), and each step may be performed separately offline or online.

Referring to FIG. 3, in the image search method according to an exemplary embodiment of the present invention, which retrieves an image corresponding to a query image or a search keyword from one or more database images stored in a database, the search system first performs preprocessing to obtain one or more visual words and one or more text words from the database image to which one or more tags are mapped (S100), and estimates, using the visual words and the text words, the model parameters of the background distribution topic model, which include the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω) (S200). Next, the search system models the topic distribution (θ_d) of the database image using the background distribution topic model including the estimated model parameters (S300). Next, the search system calculates the degree of correspondence of the tags to the database image using the topic-specific text word distribution (ψ) and the image topic proportion (θ_d), and refines the tags accordingly (S400).

On the other hand, when a search request is received, the search system can determine whether the search request includes a query image or a search keyword (S500). If, as a result of the determination, a search request including a query image has been input, one or more visual words are obtained from the query image, the topic distribution (θ_q) of the query image is modeled using the background distribution topic model including the estimated model parameters (S630), and the similarity of the database images may be evaluated using the topic distribution of the database image (θ_d) and the topic distribution of the query image (θ_q) (S700). If a search keyword is included in the search request, the search system may look up the tags corresponding to the search keyword among the tags of the database images refined through steps S100 to S400 (S650), and the similarity between the keyword and the tags can be evaluated in step S700. Steps S630 and S650 may be performed simultaneously, and only one of them may be performed if the search request includes only one of a query image and a search keyword.

If the similarity scores of the database images similar to at least one of the query image and the search keyword are calculated according to the similarity evaluation of step S700, the search system can sort the database images in descending order of similarity and provide them to the user (S800).

Referring to FIG. 4, in step S100 the search system extracts image features from the database image (S130), clusters the extracted image features to extract representative values of the clusters (S150), and quantizes the representative values to obtain visual words (S170); in this way, visual words can be obtained from the database image.

Referring to FIG. 5, in step S210 the search system derives the proportion of visual words assigned to the topic distribution by using the proportion of visual words assigned to the topic distribution, the probability that a specific visual word (v) appears under a specific topic (t), and the probability that the topic (t) appears in a specific image (d). Next, in step S230, the search system derives the proportion of visual words assigned to the background distribution by using the proportion of visual words assigned to the background distribution and the probability of the background words. Then, in step S250, the search system derives the probability that a text word (w) is assigned to the specific topic (t) by using the probability that the text word (w) appears under the topic (t) and the probability that the topic (t) appears in the image (d). The search system repeatedly performs steps S210 to S250 on the one or more database images until the parameter values converge, and thereby obtains, as normalized Dirichlet parameters, the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω) (S290). The model parameters thus obtained are then used in step S630 to model the topic distribution of the query image.

In step S290, the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω) are estimated according to Equations (6) to (8) described above, where β, γ, and δ are predetermined Dirichlet parameters, n_{t,v} is the number of times the visual word v is assigned to topic t, n_{t,w} is the number of times the text word w is assigned to topic t, and n_{B,v} is the number of times the visual word v is assigned to the background distribution.

Looking at the similarity evaluation method of step S700 in more detail, the similarity score of each database image (I_d) with respect to the query image (I_q) may be calculated according to Equation (10) described above, where θ_d and θ_q are the topic distributions of the database image (I_d) and the query image (I_q), respectively, r_d and r_q are vectors representing the textual information of the database image (I_d) and the query image (I_q), respectively, and μ may be a parameter for controlling the weight of the preset text similarity.

According to an embodiment of the present invention, the millions of images uploaded to the Internet can be retrieved efficiently using text or images. The present invention is applicable to web applications and mobile applications: a user can very easily and simply query related images using text, images, or both, on a smartphone or an Internet browser. The processing of each image can be performed independently, so fast image processing is possible and practical application is feasible. In addition, each image is represented by small vectors of visual words and text words, which reduces storage space and network latency.

Some embodiments omitted in this specification are equally applicable if their implementing subject is the same. It is to be understood that the foregoing description is exemplary and explanatory only and is not restrictive of the invention; the present invention is not limited to the drawings.

100: search data generation module
110: database
130: first preprocessing unit
150: first background distribution topic modeling unit
200: search module
230: second preprocessing unit
250: second background distribution topic modeling unit
270: similarity evaluation unit

Claims (15)

1. A method of retrieving an image corresponding to a query image or a search keyword from one or more database images stored in a database, comprising:
a preprocessing step of obtaining one or more visual words and one or more text words from the database image to which one or more tags are mapped;
estimating, using the visual words and the text words, the model parameters of a background distribution topic model, which include the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω);
modeling the topic distribution (θ_d) of the database image using the background distribution topic model including the estimated model parameters;
calculating the degree of correspondence of the tags to the database image using the topic-specific text word distribution (ψ) and the image topic proportion (θ_d), and refining the tags accordingly;
obtaining one or more visual words from the query image when a search request including the query image is input, and modeling the topic distribution (θ_q) of the query image using the background distribution topic model including the estimated model parameters; and
evaluating the similarity of the database image using the topic distribution of the database image (θ_d) and the topic distribution of the query image (θ_q).
2. The method according to claim 1, wherein the preprocessing step comprises:
extracting image features from the database image;
clustering the extracted image features to extract representative values of the clusters; and
quantizing the representative values to obtain visual words.
3. The method according to claim 1, wherein the model parameter estimating step comprises:
(a) deriving the proportion of visual words assigned to the topic distribution by using the proportion of visual words assigned to the topic distribution, the probability that a specific visual word (v) appears under a specific topic (t), and the probability that the topic (t) appears in a specific image (d);
(b) deriving the proportion of visual words assigned to the background distribution by using the proportion of visual words assigned to the background distribution and the probability of the background word;
(c) deriving the probability that a text word (w) is assigned to the specific topic (t) by using the probability that the text word (w) appears under the topic (t) and the probability that the topic (t) appears in the image (d); and
obtaining the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω) by repeatedly performing steps (a) to (c) on the one or more database images.
4. The method according to claim 3, wherein, in the model parameter obtaining step, the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω) are estimated according to the following equations, in which β, γ, and δ are predetermined Dirichlet parameters, n_{t,v} is the number of times the visual word v is assigned to topic t, n_{t,w} is the number of times the text word w is assigned to topic t, and n_{B,v} is the number of times the visual word v is assigned to the background distribution:

φ_{t,v} = (n_{t,v} + β) / (Σ_{v'} n_{t,v'} + Vβ)

ψ_{t,w} = (n_{t,w} + γ) / (Σ_{w'} n_{t,w'} + Wγ)

Ω_v = (n_{B,v} + δ) / (Σ_{v'} n_{B,v'} + Vδ)

5. The method according to claim 1, wherein, in the similarity evaluation step, the similarity score of the database image (I_d) with respect to the query image (I_q) is calculated according to the following equation, in which θ_d and θ_q are the topic distributions of the database image (I_d) and the query image (I_q), respectively, r_d and r_q are vectors representing the textual information of the database image (I_d) and the query image (I_q), respectively, and μ is a parameter for controlling the weight of the preset text similarity:

Score(I_d, I_q) = (1 − μ) · sim(θ_d, θ_q) + μ · sim(r_d, r_q)

6. The method according to claim 1, wherein the similarity evaluation step further comprises evaluating the similarity between the refined tags and the search keyword when a search request including the search keyword is input.
7. The method according to claim 1, further comprising sorting the database images similar to at least one of the query image and the search keyword in descending order of similarity according to the result of the similarity evaluation.
8. An image retrieval system comprising:
a database that stores one or more database images;
a first preprocessing unit for obtaining one or more visual words and one or more text words from the database image to which one or more tags are mapped;
a first background distribution topic modeling unit that estimates, using the visual words and the text words, the model parameters of a background distribution topic model, which include the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω), models the topic distribution (θ_d) of the database image using the background distribution topic model including the estimated model parameters, calculates the degree of correspondence of the tags to the database image using the topic-specific text word distribution (ψ) and the image topic proportion (θ_d), and refines the tags accordingly;
a second preprocessing unit for obtaining one or more visual words from a query image when the query image or a search keyword is input, and obtaining one or more text words from the search keyword;
a second background distribution topic modeling unit that models the topic distribution (θ_q) of the query image using the background distribution topic model including the estimated model parameters; and
a similarity evaluation unit that evaluates the similarity of the database image using the topic distribution of the database image (θ_d) and the topic distribution of the query image (θ_q).
9. The image retrieval system according to claim 8, wherein the first preprocessing unit extracts image features from the database image, clusters the extracted image features to extract representative values of the clusters, and quantizes the representative values to obtain visual words.
10. The image retrieval system according to claim 8, wherein the first background distribution topic modeling unit comprises:
a calculation unit that derives the proportion of visual words assigned to the topic distribution by using the proportion of visual words assigned to the topic distribution, the probability that a specific visual word (v) appears under a specific topic (t), and the probability that the topic (t) appears in a specific image (d); derives the proportion of visual words assigned to the background distribution by using the proportion of visual words assigned to the background distribution and the probability of the background word; and derives the probability that a text word (w) is assigned to the specific topic (t) by using the probability that the text word (w) appears under the topic (t) and the probability that the topic (t) appears in the image (d); and
a model parameter estimation unit that obtains the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω) by repeatedly performing the calculations on the one or more database images.
11. The image retrieval system according to claim 10, wherein the model parameter estimation unit estimates the topic-specific visual word distribution (φ), the topic-specific text word distribution (ψ), and the background visual word distribution (Ω) according to the following equations, in which β, γ, and δ are predetermined Dirichlet parameters, n_{t,v} is the number of times the visual word v is assigned to topic t, n_{t,w} is the number of times the text word w is assigned to topic t, and n_{B,v} is the number of times the visual word v is assigned to the background distribution:

φ_{t,v} = (n_{t,v} + β) / (Σ_{v'} n_{t,v'} + Vβ)

ψ_{t,w} = (n_{t,w} + γ) / (Σ_{w'} n_{t,w'} + Wγ)

Ω_v = (n_{B,v} + δ) / (Σ_{v'} n_{B,v'} + Vδ)

12. The image retrieval system according to claim 8, wherein the similarity evaluation unit calculates the similarity score of the database image (I_d) with respect to the query image (I_q) according to the following equation, in which θ_d and θ_q are the topic distributions of the database image (I_d) and the query image (I_q), respectively, r_d and r_q are vectors representing the textual information of the database image (I_d) and the query image (I_q), respectively, and μ is a parameter for controlling the weight of the preset text similarity:

Score(I_d, I_q) = (1 − μ) · sim(θ_d, θ_q) + μ · sim(r_d, r_q)

13. The image retrieval system according to claim 8, wherein the similarity evaluation unit evaluates the similarity between the refined tags and the search keyword when a search request including the search keyword is input.
14. The image retrieval system according to claim 8, further comprising a list providing unit for sorting the database images similar to at least one of the query image and the search keyword in descending order of similarity according to the result of the similarity evaluation by the similarity evaluation unit.
15. An image retrieval application stored on a computer-readable medium for carrying out the method of any one of claims 1 to 7.
KR1020170090390A 2017-07-17 2017-07-17 Method, system and computer program for semantic image retrieval based on topic modeling KR101976081B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020170090390A KR101976081B1 (en) 2017-07-17 2017-07-17 Method, system and computer program for semantic image retrieval based on topic modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020170090390A KR101976081B1 (en) 2017-07-17 2017-07-17 Method, system and computer program for semantic image retrieval based on topic modeling

Publications (2)

Publication Number Publication Date
KR20190008699A true KR20190008699A (en) 2019-01-25
KR101976081B1 KR101976081B1 (en) 2019-08-28

Family

ID=65280573

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020170090390A KR101976081B1 (en) 2017-07-17 2017-07-17 Method, system and computer program for semantic image retrieval based on topic modeling

Country Status (1)

Country Link
KR (1) KR101976081B1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200114708A (en) * 2019-03-29 2020-10-07 경북대학교 산학협력단 Electronic device, image searching system and controlling method thereof
CN113343679A (en) * 2021-07-06 2021-09-03 合肥工业大学 Multi-modal topic mining method based on label constraint
KR20210123119A (en) * 2020-04-02 2021-10-13 네이버 주식회사 Method and system for retrieving associative image through multimodality ranking model using different modal features
WO2022057419A1 (en) * 2020-09-21 2022-03-24 Oppo广东移动通信有限公司 Method and apparatus for acquiring information related to subject, and storage medium and electronic device
WO2022085823A1 (en) * 2020-10-22 2022-04-28 주식회사 데이타솔루션 Device and method for generating positioning map using topic modeling technique

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101255841B1 (en) * 2011-01-06 2013-04-23 서울대학교산학협력단 Method and system for associative image search based on bi-source topic model

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101255841B1 (en) * 2011-01-06 2013-04-23 서울대학교산학협력단 Method and system for associative image search based on bi-source topic model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Nguyen Anh Tu et al., "Topic modeling and improvement of image representation for large-scale image retrieval", Information Sciences, Volume 366, Pages 99-120, 20 October 2016 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200114708A (en) * 2019-03-29 2020-10-07 경북대학교 산학협력단 Electronic device, image searching system and controlling method thereof
KR20210123119A (en) * 2020-04-02 2021-10-13 네이버 주식회사 Method and system for retrieving associative image through multimodality ranking model using different modal features
WO2022057419A1 (en) * 2020-09-21 2022-03-24 Oppo广东移动通信有限公司 Method and apparatus for acquiring information related to subject, and storage medium and electronic device
WO2022085823A1 (en) * 2020-10-22 2022-04-28 주식회사 데이타솔루션 Device and method for generating positioning map using topic modeling technique
CN113343679A (en) * 2021-07-06 2021-09-03 合肥工业大学 Multi-modal topic mining method based on label constraint
CN113343679B (en) * 2021-07-06 2024-02-13 合肥工业大学 Multi-mode subject mining method based on label constraint

Also Published As

Publication number Publication date
KR101976081B1 (en) 2019-08-28

Similar Documents

Publication Publication Date Title
CN106815297B (en) Academic resource recommendation service system and method
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
KR101976081B1 (en) Method, system and computer program for semantic image retrieval based on topic modeling
US9589208B2 (en) Retrieval of similar images to a query image
USRE47340E1 (en) Image retrieval apparatus
US8150170B2 (en) Statistical approach to large-scale image annotation
CN108280114B (en) Deep learning-based user literature reading interest analysis method
US10482146B2 (en) Systems and methods for automatic customization of content filtering
US20180341686A1 (en) System and method for data search based on top-to-bottom similarity analysis
CN112559684A (en) Keyword extraction and information retrieval method
CN110543595A (en) in-station search system and method
Hidayat et al. Automatic text summarization using latent Drichlet allocation (LDA) for document clustering
CN112052356A (en) Multimedia classification method, apparatus and computer-readable storage medium
CN103778206A (en) Method for providing network service resources
TW201243627A (en) Multi-label text categorization based on fuzzy similarity and k nearest neighbors
CN115827990B (en) Searching method and device
CN117349406A (en) Patent information retrieval system and method based on big data
Tian et al. Automatic image annotation with real-world community contributed data set
Sowmyayani et al. STHARNet: Spatio-temporal human action recognition network in content based video retrieval
CN111767404B (en) Event mining method and device
Su et al. Parallel big image data retrieval by conceptualised clustering and un-conceptualised clustering
CN115374781A (en) Text data information mining method, device and equipment
CN114969375A (en) Method and system for giving artificial intelligence learning to machine based on psychological knowledge
CN113704617A (en) Article recommendation method, system, electronic device and storage medium
Nguyen et al. Pagerank-based approach on ranking social events: a case study with flickr

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E90F Notification of reason for final refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant