KR20190008699A - Method, system and computer program for semantic image retrieval based on topic modeling - Google Patents

Method, system and computer program for semantic image retrieval based on topic modeling Download PDF

Info

Publication number
KR20190008699A
KR20190008699A KR1020170090390A KR20170090390A KR20190008699A KR 20190008699 A KR20190008699 A KR 20190008699A KR 1020170090390 A KR1020170090390 A KR 1020170090390A KR 20170090390 A KR20170090390 A KR 20170090390A KR 20190008699 A KR20190008699 A KR 20190008699A
Authority
KR
South Korea
Prior art keywords
distribution
image
visual
subject
word
Prior art date
Application number
KR1020170090390A
Other languages
Korean (ko)
Other versions
KR101976081B1 (en
Inventor
이영구
울아 칸 키파야트
알리 무하마드
안 투 구엔
Original Assignee
경희대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 경희대학교 산학협력단 filed Critical 경희대학교 산학협력단
Priority to KR1020170090390A priority Critical patent/KR101976081B1/en
Publication of KR20190008699A publication Critical patent/KR20190008699A/en
Application granted granted Critical
Publication of KR101976081B1 publication Critical patent/KR101976081B1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method, system, and computer program for semantic image retrieval. The method of retrieving an image corresponding to a query image or retrieval keyword in at least one database image stored in a database comprises: a preprocessing step of acquiring at least one visual word and at least one text word from the database image to at least one tag is mapped; a step of estimating a model parameter of a background distribution topic model; a step of modeling subject distribution of the database image; a step of refining the tag; a step of modeling the subject distribution of the query image; and a step of evaluating the similarity of the database image. According to the present invention, the method may efficiently retrieve millions of images using texts or images.

Description

[0001] METHOD, SYSTEM AND COMPUTER PROGRAM FOR SEMANTIC IMAGE RETRIEVAL BASED ON TOPIC MODELING [0002]

The present invention relates to a semantic image searching method and system, and more particularly, to a semantic image searching method and system using background distribution topic modeling.

Recently, the social media network service is growing rapidly due to the development of the Internet. As a result, the amount of multimedia explosively increases, an effective image search system is required, and image annotation is becoming more important due to the necessity of efficient image search according to the explosively increasing web image.

Most image retrieval researches have been conducted mainly by Content-based Image Retrieval (CBIR) methods, which analyze the contents of images. Content-based image retrieval analyzes the content of images using visual features such as color, texture, and shape. This method works well when the number of defined tags is small, but the performance deteriorates as the data set grows and the kinds of tags become various.

Text-based Image Retrieval (TBIR) is a method of retrieving images corresponding to text using text as a query. This method is represented by a text descriptor where the visual content of the image is manually tagged, and is used to perform an image search in a database management system.

Content-based image retrieval is efficient for handling large databases, but has a semantic gap problem between low-dimensional image characteristics and high-dimensional image characteristics. Text-based image retrieval can support queries with a high level of concept, but it is time-consuming to manually tag individual images in large databases. In addition, in the case of a social network service image, a tag having a low relevance to an image is often tagged.

Therefore, in order to effectively search for an explosively increasing web image, there is a need for a semantic search system capable of reducing the above-described semantic gap and improving image tagging performance.

SUMMARY OF THE INVENTION The present invention has been made to solve the above-described problems and it is an object of the present invention to provide a method and apparatus for forming a correlation between a visual word, a text word and a background, An object of the present invention is to provide an image retrieval method and system.

It is another object of the present invention to provide an image retrieval method and system capable of predicting omission of tags and eliminating noise.

Another object of the present invention is to provide an image retrieval method and system capable of efficiently and effectively calculating the similarity between a query and a database image.

According to an aspect of the present invention, there is provided a method of retrieving an image corresponding to a query image or a search keyword from one or more database images stored in a database, the method comprising: A preprocessing step of obtaining a text word, a visual word distribution by topic using the visual word and the text word

Figure pat00001
), Subject text word distribution (
Figure pat00002
) And background visual word distribution (
Figure pat00003
Estimating a model parameter of a background distribution topic model that includes the estimated model parameters, using the background distribution topic model including the estimated model parameters,
Figure pat00004
), Modeling the thematic text word distribution (
Figure pat00005
) And image theme ratio (
Figure pat00006
Calculating a degree of correspondence of the tag with respect to the database image and refining the tag according to the degree of correspondence, acquiring one or more visual words from the query image when a search request including the query image is input, Using the background distribution topic model including the estimated model parameters,
Figure pat00007
), Modeling the subject distribution of the database image (
Figure pat00008
) And the subject distribution of the query image in the database
Figure pat00009
And evaluating the degree of similarity of the database image.

The present invention also provides an image retrieval system comprising: a database for storing one or more database images; a first preprocessing unit for obtaining one or more visual words and one or more text words from the database image to which one or more tags are mapped; And the word-by-topic visual word distribution (

Figure pat00010
), Subject text word distribution (
Figure pat00011
) And background visual word distribution (
Figure pat00012
) Of the database image using the background distribution topic model including the estimated model parameters,
Figure pat00013
), And the above-mentioned text word distribution by topic
Figure pat00014
) And the specific theme rate in the image (
Figure pat00015
A first background distribution topic modeling unit for calculating a correspondence degree of the tag with respect to the database image and refining the tag according to the degree of correspondence of the tag with respect to the database image, A second preprocessing unit for acquiring a word and obtaining one or more text words from the search keyword, a second preprocessing unit for acquiring a subject distribution of the query image using the background distribution topic model including the estimated model parameter
Figure pat00016
A second background distribution topic modeling unit for modeling the subject distribution of the database image
Figure pat00017
) And the subject distribution of the query image in the database
Figure pat00018
And a similarity evaluating unit for evaluating the similarity of the database image using the similarity estimating unit.

According to the present invention as described above, the correlation between a visual word, a text word, and a background can be formulated and used for image retrieval, thereby enhancing image retrieval accuracy.

In addition, according to the present invention, it is possible to predict the omission of a tag and remove noise.

Further, according to the present invention, it is possible to calculate the similarity between the query and the database image efficiently and effectively.

1 is a diagram for explaining a configuration of an image search system according to an embodiment of the present invention;
2 is a diagram for explaining a background distribution topic model according to an embodiment of the present invention;
3 is a flowchart illustrating an image search method according to an embodiment of the present invention.
FIG. 4 is a flowchart for explaining the preprocessing step according to an embodiment of the present invention in more detail; FIG.
5 is a flowchart for explaining the model parameter estimating step according to an embodiment of the present invention in more detail.

The above and other objects, features, and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings, which are not intended to limit the scope of the present invention. In the following description, well-known functions or constructions are not described in detail since they would obscure the invention in unnecessary detail. Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals are used to designate the same or similar components, and all combinations described in the specification and claims can be combined in any manner. It is to be understood that, unless the context requires otherwise, references to singular forms may include more than one, and references to singular forms may also include plural forms.

To solve the problem of manual image tagging, an automatic annotation system has been proposed which can automatically deduce and modify tags. In the most noteworthy recent studies, image annotation has been regarded as a sorting task, and the classifier has trained to map visual features to tags. The main disadvantage of this approach, however, is that the annotation is limited to a small number of tag words by well-classified training data. Because the tags on images on the Internet actually contain very differentiated noise, the above limitations make it difficult to use in real-world applications.

Also, the classification based approach works well, but it is not easy to implement. The number of classification classes is increased by the number of words, and in order to increase accuracy, it requires a large number of training image sets. In addition, the correlation between visual characteristics and linguistic concepts is not so simple, which is one of the problems that makes implementation difficult.

There is a Generative Probabilistic Model called Corresponding Latent Dirichlet Allocation (hereinafter referred to as "CorrLDA") as a method for breaking the semantic gap problem and predicting image related keywords. The generation probability model is a model of the view that generates data according to a random process from a certain probability distribution and its parameters. CorrLDA is a model for finding the relationship between the image domain and the potential variable representation of a set of text (words). In order to predict related keywords in the image, we can find potential semantic topics from the concurrency pattern of image content and corresponding text . The CorrLDA model provides a way to learn potential topics from image features and text words. This model derives a direct relationship between visual and textual topics, using the degree of correspondence between visual features and text words through document-to-subject ratios.

The CorrLDA model has the advantage that the correlation between visual and textual topics is explicitly exploited through potential topics, and as a result, the semantic content of images can be extracted effectively, and the ability to solve multiple labeling problems directly Can be effectively utilized in real applications where the data set is updated dynamically.

The present invention is designed to solve the problem that CorrLDA is not sufficiently utilized in retrieval work and is characterized by extending the CorrLDA model using a new concept related to background words.

The Latent Dirichlet Allocation (LDA), which is a prototype of the CorrLDA model described above, is a probabilistic model of what subjects are present in each document for a given document. By analyzing the distribution of the number of words found in a given document based on the distribution of the number of words per topic known in advance, it is possible to predict which topics the document will deal with.

Similarly, it can be assumed that the database image and the query image used in one embodiment of the present invention have several subjects, and the subjects follow the Dirichlet distribution. That is, an image can be viewed as a set of one or more visual words, as a potential Dirichlet allocation would see the document as a set of words.

In other words, a topic is a probability distribution of visual words contained in an image, and consists of semantically similar or related words. In the present invention, each image is represented by a stochastic mixture of subjects, and each subject is represented by a distribution of visual words and text words (tags).

In one embodiment of the present invention, an image feature may be defined in the form of a visaul word, which can be obtained through a process of extracting feature information from a database image or a query image and then digitizing it. The image corpus according to one embodiment of the present invention thus generated can be expressed by a Bag of visual word and a Bag of text word. The visual vocabulary can be understood as a result of extracting image feature information from an image, such as SIFT, clustering the extracted image feature information to find a visual word, and then expressing each image with a histogram of visual words. A vocabulary can be understood as a set of one or more tags that are manually attached to each image.

In the present specification, 'at least one visual word' means a visual word book, and 'one or more text words' means a text word book. In addition, 'text word' and 'tag' refer to the same object, 'tag' means a word that is manually mapped to an image by a user at a collection site such as the web or a social network service, Can be understood as a more comprehensive concept as opposed to a 'word'.

Meanwhile, the image retrieval method according to an embodiment of the present invention uses a concept of removing a 'background word' which is a visual word narrow or not meaningful, which is generally included in the background of an image. Since background words are included in almost all images but they are not useful and noise can be measured in the similarity measurement in image search, it is possible to improve the accuracy of image search through topic modeling considering background words.

Further, the present invention proposes a score system for evaluating the similarity between a query and an image stored in a database in extracting a subject and refining a tag.

According to one embodiment of the present invention, visual words can be classified into a subject word and a background word. Therefore, in the present invention, each image can be represented by a visual subject distribution, a background distribution, and a text subject distribution.

Hereinafter, a method of searching for a semantic image based on a topic modeling according to an embodiment of the present invention will be described with reference to the drawings.

1 is a diagram illustrating an image retrieval system according to an embodiment of the present invention.

1, an image search system according to an exemplary embodiment of the present invention may include a search data generation module 100 and a search module 200. The search data generation module 100 may include a database 110, A first pre-processing unit 130 and a first background distribution topic modeling unit 150. The search module 200 may include a second pre-processing unit 230, a second background distribution topic modeling unit 250, And an evaluation unit 270.

According to an embodiment, the search data generation module 100 may be executed off-line, and the search module 200 may be executed on-line.

The database 110 of the search data generation module 100 stores one or more database images. The database image may be an image collected from the web or an image stored by a user, and the image may be one or more tags mapped.

In recent social network services, hash tags (# tags) are described in an image so as to be able to explain the characteristics of the images, and the images related to the tags can be easily retrieved using the tags. Depending on the characteristics of the web gathered image, the collected images may be stored in the database together with the tags mapped to the images.

One or more database images stored in the database may be transferred to the first preprocessing unit 130. The first preprocessor 130 preprocesses the database image to obtain one or more visual words and one or more text words from the database image. For example, the first preprocessing unit 130 extracts an image feature from a database image using an affine invariant detector, and the image feature is a 128-DSIFT (Scale-Invariant Feature Transform) descriptor a discriptor may be used. The first preprocessing unit 130 may extract representative values of the cluster by clustering extracted image features, and quantize representative values to obtain visual words. For example, the first preprocessing unit 130 can quantize the descriptor into visual words using the codebook learned by the k-means clustering. Further, when the tag is not mapped to the database image, the first preprocessing unit 130 may generate a tag (text word) for the database image using the visual word.

The first background distribution topic modeling unit 150 may classify at least one visual word obtained from the first preprocessing unit 130 and one or more text words obtained from the tag into a model parameter of a background distribution topic model according to an embodiment of the present invention Can be used for learning and estimation.

More specifically, the first background distribution topic modeling unit 150 may include a calculation unit 153 (not shown) and a model parameter estimation unit 155 (not shown).

The first background distribution topic modeling unit 150 uses the visual word and the text word to generate a visual word distribution

Figure pat00019
), Subject text word distribution (
Figure pat00020
) And background visual word distribution (
Figure pat00021
), The calculation unit 153 can estimate the model parameters of the background distribution topic model including the ratio of the visual words assigned to the subject distribution, the specific visual words ("
Figure pat00022
), The specific image (
Figure pat00023
), The ratio of the visual words assigned to the subject distribution is derived by using the probability that a specific subject (t) appears in the visual distribution, and the ratio of visual words assigned to the background distribution, (T) to the text word (t)
Figure pat00024
), The specific image (
Figure pat00025
) Using the probability that a particular topic (t) will appear in the text word
Figure pat00026
) May be assigned to the specific subject (t).

The model parameter estimating unit 155 repeatedly performs the operation in the calculating unit 153 for the database image set stored in the database until the model parameters are converged,

Figure pat00027
), The subject-specific text word distribution (
Figure pat00028
) And the visual word distribution of the background (
Figure pat00029
Can be obtained.

A background distribution topic model according to an embodiment of the present invention will be described with reference to FIG. FIG. 2A shows a probability graph of a conventional corresponding latency allocation (hereinafter referred to as CorrLDA), FIG. 2B shows a probability graph of a background distribution topic model (CTMB) according to an embodiment of the present invention, FIG.

The parameters of the CTMB according to the embodiment of the present invention shown in FIG. 2 (b) are shown in Table 1.

parameter Explanation

Figure pat00030
Preset Dirichlet Hyper parameters
Figure pat00031
Bernoulli distribution of word types (visual words / background) in images
Figure pat00032
Visual word of the background Multinomial distribution
Figure pat00033
Theme Multinomial distribution of image d
Figure pat00034
Visual word multinomial distribution of subject t
Figure pat00035
Text word multinomial distribution of topic t

In Fig. 2 (b), the rectangle represents a replica. D database images and individual database images (

Figure pat00036
)
Figure pat00037
And
Figure pat00038
Text words. In Fig. 2 (b), the shaded node represents the observed variable and the white node represents the latent variable. In CTMB, each database image is modeled as a topic distribution, with each topic modeled as a distribution of visual words of size v and text words of size w. The potential variable z represents a visual subject, and y represents a text subject. In CTMB, image features, ie, visual features of images, are used for text word generation. The CTMB probabilistic model goes through the following steps.

1. For each topic t,

a. Visual word distribution:

Figure pat00039
~
Figure pat00040

b. Text word distribution:

Figure pat00041
~
Figure pat00042

2. Background distribution:

Figure pat00043
~
Figure pat00044

3. For each database image (

Figure pat00045
, d = 1, ..., D)

a. Word type distribution:

Figure pat00046
~ Beta (
Figure pat00047
)

b. Topic distribution:

Figure pat00048
~
Figure pat00049

4. For each visual word distribution,

Figure pat00050
,
Figure pat00051

a. Switch Sample

Figure pat00052
~
Figure pat00053

b.

Figure pat00054
If so,

I. Visual subject variable

Figure pat00055
~ Multinomial (
Figure pat00056
)

Ii. Visual subject word

Figure pat00057
~ Multinomial (
Figure pat00058
)

c.

Figure pat00059
If so,

I. Visual background word

Figure pat00060
~ Multinomial (
Figure pat00061
)

5. Each text word

Figure pat00062
,
Figure pat00063

a. Text subject variable

Figure pat00064
~ Unif (
Figure pat00065
)

b. Text word

Figure pat00066
~ Multinomial (
Figure pat00067
)

In the above process, Dirichlet and Multinomial mean Dirichlet distribution and Multinomial distribution, respectively. The multinomial distribution is chosen to establish conjugation with the Dirichlet distributions for the word distribution, thus simplifying computation and ensuring efficient inference. The switch variable s controls the generation of visual words. The image contains two types of visual words, one is the subject distribution Multinomial (

Figure pat00068
) And the other is the background distribution Multinomial (
Figure pat00069
) To derive a direct correlation between the visual word and the background. Otherwise, the text theme y corresponds to one of the visual themes z through a uniform distribution, and the text word corresponds to the theme distribution Multinomial (
Figure pat00070
). Therefore, the use of the CTMB according to an embodiment of the present invention greatly enhances the correlation between visual words and text words.

The goal of the CTMB model is to give a given image corpus

Figure pat00071
(1) < / RTI > to maximize the model parameter < RTI ID =
Figure pat00072
).

Figure pat00073

here,

Figure pat00074
Lt; RTI ID = 0.0 > {
Figure pat00075
}. parameter
Figure pat00076
Wow
Figure pat00077
,
Figure pat00078
Represent the distribution of visual subjects, backgrounds, and text subjects, respectively.

Since it is very difficult to estimate the above distribution accurately, an approximate estimation algorithm can be used. In one embodiment of the parameter estimation of the CTMB, the following Monte Carlo EM algorithm can be used. In this algorithm, a squared Gibbs sampling algorithm can be used for sampling of the latent variables z, s, y, which can be calculated as shown in Equations 2 to 4.

More specifically, according to the Monte Carlo EM algorithm, an image corpus formed of one or more visual words (visual words) and one or more text words (text words) is input, and the output is an estimated parameter

Figure pat00079
. According to the algorithm, first, each parameter is initialized ({
Figure pat00080
}), And if the number of subjects is K, then k = 1, ..., K, {
Figure pat00081
} Is given, it is possible to obtain the respective images (" (1) ") from the crushed Gibbs sampling using the following equations (2) ) Can be obtained by the N-Gibbs step. Next, using Equations (6) to (8), {
Figure pat00083
} Can be completed.

Figure pat00084

Figure pat00085

Figure pat00086

Referring to Equations 2 to 4,

Figure pat00087
and
Figure pat00088
Means the number of visual words in the image d assigned to the associated subject word and background word, respectively.
Figure pat00089
Is the number of visual words in image d assigned to subject t,
Figure pat00090
The number of times the visual word v, the text word w are assigned to the subject t,
Figure pat00091
Is the number of times the word v is assigned to the background word distribution of the image corpus. And
Figure pat00092
Represents all assignments except the current assignment.

The above equation

Figure pat00093
By separately marginalizing it. As can be observed, the first term in equation (2)
Figure pat00094
And background distribution
Figure pat00095
dp represents the percentage of the visual words assigned. The second term in equation (2) is a specific term in a particular subject (t)
Figure pat00096
), And the second term in Equation (3) represents the probability of the background word. The last term in equation (2)
Figure pat00097
) Represents the probability that a particular subject (t) will come. Further, equation (4)
Figure pat00098
Is assigned to the subject t, the last term of which expresses the correspondence with the visual content through the proportion of the visual words assigned to the subject t in one image.

Since all latent variables are calculated from the sampling equation,

Figure pat00099
Is estimated by examining the posterior distribution. Following several iterative steps, the parameters
Figure pat00100
. The rear part of the topic-visual word polynomial is calculated as shown in the following equations (5) to (8).

Figure pat00101

Figure pat00102
,
Figure pat00103
,
Figure pat00104
to be. therefore,
Figure pat00105
The
Figure pat00106
Can be estimated as the post-mortem average. This is simply a normalized dirichlet parameter as follows.

Figure pat00107

Similar to the estimation in equation (6), the distribution corresponding to the background distribution and the text distribution

Figure pat00108
Wow
Figure pat00109
Can be estimated as follows.

Figure pat00110

Figure pat00111

Once the model parameters are estimated, the background distribution topic modeling unit 130, 230 according to an embodiment of the present invention prepares to infer the image when a query image is input or when a new image is added to the database. In the inference step, the second term of Equations (2) and (3) and the first term of Equation (4)

Figure pat00112
≪ / RTI >
Figure pat00113
≪ / RTI > can be derived.

The image retrieval framework according to an embodiment of the present invention considers each image independently. This allows the image to be distributed among the different systems to be processed, allowing the framework to scale and billions of images in the database to work in real-world applications.

The first background distribution topic modeling unit 150 uses the background distribution topic model including the estimated model parameters to calculate the subject distribution of the database image (

Figure pat00114
), And the distribution of thematic text words (
Figure pat00115
) And the specific theme rate in the image (
Figure pat00116
) Can be used to calculate the correspondence of the tag to the database image, and the tag can be refined accordingly. The subject distribution modeling of the image is as described above, and the tag is purified as described below.

In the background distribution topic model (CTMB) according to an embodiment of the present invention,

Figure pat00117
Tags for
Figure pat00118
(Corresponding probability) of Equation (9) can be calculated according to Equation (9).

Figure pat00119

Thus, the refinement of a tag that predicts a tag and improves by adding a missing tag is achieved by aligning the relevant tag based on the computed probability. In other words, the probability of an irrelevant tag should be small, but the probability of a missing tag is increased through the extracted topic. here,

Figure pat00120
Wow
Figure pat00121
Represent the tag-topic distribution and the topic-document ratio, respectively.

When the query image or the search keyword is input to the search module 200, the second preprocessing unit 230 may acquire one or more visual words from the query image and obtain one or more text words from the search keyword, The operation of the first preprocessing unit 230 is the same as the operation of the first preprocessing unit 130 that preprocesses the tag mapped to the database image and the database image.

The second background distribution topic modeling unit 250 uses a background distribution topic model including the estimated model parameters to calculate the subject distribution (

Figure pat00122
) Can be modeled. Here, the second background distribution topic modeling unit 250 performs the topic modeling using the model parameters estimated by the first background distribution topic modeling unit 150. The topic modeling of the second background distribution topic modeling unit 250 is performed in the same manner as the operation in the first background distribution topic modeling unit 150 described above.

The first background distribution topic modeling unit 150 extracts the subject of the database image, refines the tag mapped to the database image 170, and the second background distribution topic modeling unit 250 extracts the subject of the query image When the tag is generated by using it, the similarity evaluating unit 270 evaluates the similarity between the database image and the query image or the similarity between the search keyword and the database image by using them.

More specifically, the similarity evaluator 270 evaluates the similarity between the database image (

Figure pat00123
) For each query image (
Figure pat00124
)
Figure pat00125
) Can be calculated according to Equation (10).

Figure pat00126

In the above formula

Figure pat00127
And
Figure pat00128
The database image (
Figure pat00129
) And query image (
Figure pat00130
), Respectively,
Figure pat00131
And
Figure pat00132
The database image (
Figure pat00133
) And query image (
Figure pat00134
) Are two W-dimensional vectors representing each textual information.
Figure pat00135
Is a parameter for controlling the weight of the preset text similarity, and this value can be set based on the user's basic setting or query type. For example, if the query is a search keyword,
Figure pat00136
Can be set to one. in this case,
Figure pat00137
, Where the ith element can be set to 1 if the ith tag appears in the query.

On the other hand, if the search request includes both an image or an image and a search keyword,

Figure pat00138
And the text representation of the refined tag r can be estimated after applying the CTMB model. In this task, since the text information and the time information have the same importance,
Figure pat00139
Will be set to 0.5. If the search request (query) contains only the query image without the search keyword, the text portion of the CTMB is excluded, and the subject can be extracted entirely from the image feature.

Finally, the list providing unit 280 (not shown) may arrange the database images in descending order according to the score according to the evaluation result in the similarity evaluating unit 270, and provide them to the user.

Next, referring to FIG. 3, a semantic image search method using topic modeling according to an embodiment of the present invention will be described. For convenience of explanation, it is assumed that a semantic image search according to an embodiment of the present invention is performed in an arbitrary search system. Actually, the image retrieval method according to an embodiment of the present invention can be performed by a plurality of different subjects (modules), and it is noted that each step may be separately performed in offline / online.

Referring to FIG. 3, in the image search method according to an exemplary embodiment of the present invention for searching an image corresponding to a query image or a search keyword from one or more database images stored in a database, One or more visual words and one or more text words are obtained from the image (S100), and visual word and text words are used to obtain a visual word distribution

Figure pat00140
), Subject text word distribution (
Figure pat00141
) And background visual word distribution (
Figure pat00142
(S200). ≪ / RTI > Next, the search system uses a background distribution topic model containing estimated model parameters to determine the subject distribution of the database image (
Figure pat00143
Can be modeled (S300). Next, the search system searches for a text word distribution
Figure pat00144
) And image theme ratio (
Figure pat00145
) To calculate the correspondence of the tag to the database image, and refine the tag accordingly (S400).

On the other hand, when a search request is received in the search system, the search system can determine whether the search image includes the query image or whether the search keyword is included in the search request (S500). As a result of the determination, when a search request including the query image is input, one or more visual words are obtained from the query image, and the background distribution topic model including the estimated model parameters is used to calculate the subject distribution

Figure pat00146
(S630), and distributes the subject distribution of the database image
Figure pat00147
) And the subject distribution of query images in the database (
Figure pat00148
) May be used to evaluate the similarity of the database image (S700). If a search keyword is included in the search request, the search system may search the tag corresponding to the search keyword among the tags of the refined database image through steps 100 through 300 (S650). In step 700, the degree of similarity between the keyword and the tag can be evaluated. Steps 630 and 650 may be performed simultaneously, and only one step may be performed if the search request includes either a query image or a search keyword.

If the similarity score of the database image similar to at least one of the query image or the search keyword is calculated according to the result of the similarity evaluation of the step 700, the search system can sort and provide the database images in the order of high similarity (S800)

Referring to FIG. 4, in operation 100, the search system extracts an image feature from a database image (S130), extracts a representative value of the cluster by clustering the extracted image feature (S150) By acquiring a visual word by quantization (S170), a visual word can be obtained from the database image.

5, in step 210, the search system determines the ratio of the visual words assigned to the subject distribution, the specific visual words ("

Figure pat00149
), The specific image (
Figure pat00150
) Can be used to derive the percentage of visual words assigned to the subject distribution using the probability that a particular topic (t) will appear. Next, in step 230, the retrieval system derives the ratio of the visual words assigned to the background distribution using the ratio of the visual words assigned to the background distribution and the probability of the background words. Then, in step 250, a text word ("
Figure pat00151
), The specific image (
Figure pat00152
) Using the probability of a particular topic (t)
Figure pat00153
) Is assigned to a particular subject (t). The search system repeatedly performs the steps a to c for one or more database images until the parameter values are converged to obtain a normalized dirichlet parameter,
Figure pat00154
), Subject text word distribution (
Figure pat00155
) And background visual word distribution (
Figure pat00156
(S290). The model parameters thus obtained are then used in step 630 to model the subject distribution of the query image.

At step 290, the subject's visual word distribution (

Figure pat00157
), The subject-specific text word distribution (
Figure pat00158
) And the visual word distribution of the background (
Figure pat00159
) Is estimated according to the above-described equations (6) to (8), and in this equation
Figure pat00160
,
Figure pat00161
,
Figure pat00162
A predetermined dirichlet parameter,
Figure pat00163
Is the number of times the visual word v is assigned to the subject t,
Figure pat00164
Is the number of times the text word w is assigned to the subject t,
Figure pat00165
Means the number of times the visual word v is assigned to the background distribution.

On the other hand, if the degree of similarity evaluation method of step 700 is examined in more detail,

Figure pat00166
) For each query image (
Figure pat00167
)
Figure pat00168
) May be calculated according to the above-described Equation (10), and in Equation (10)
Figure pat00169
And
Figure pat00170
The database image (
Figure pat00171
) And query image (
Figure pat00172
), Respectively,
Figure pat00173
And
Figure pat00174
The database image (
Figure pat00175
) And query image (
Figure pat00176
), Respectively,
Figure pat00177
May be a parameter for controlling the weight of the preset text similarity.

According to an embodiment of the present invention, since the development of the Internet, millions of images uploaded to the Internet can be efficiently retrieved using text or images. The present invention is applicable to web applications and mobile applications, and a user can very easily and simply query related images using text, images, or both, using a smart phone or an Internet browser. The processing of each image can be performed independently, and according to the present invention, it is possible to perform fast image processing, and thus, practical application to an application is possible. In addition, each image is represented by a small vector of visual words and text words, which reduces storage space and reduces network latency.

Some embodiments omitted in this specification are equally applicable if their implementation subject is the same. It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to be exemplary and explanatory only and are not restrictive of the invention, The present invention is not limited to the drawings.

100: Search data generation module
110: Database
130: first pre-
150: first background distribution topic modeling unit
200: Search module
230: second pre-
250: second background distribution topic modeling unit
270:

Claims (15)

1. A method for retrieving an image corresponding to a query image or a search keyword from one or more database images stored in a database,
A preprocessing step of obtaining one or more visual words and one or more text words from the database image to which one or more tags are mapped;
The visual word and the text word are used to determine a visual word distribution
Figure pat00178
), Subject text word distribution (
Figure pat00179
) And background visual word distribution (
Figure pat00180
Estimating a model parameter of a background distribution topic model including the background model;
Using the background distribution topic model including the estimated model parameters,
Figure pat00181
);
The subject-specific text word distribution (
Figure pat00182
) And image theme ratio (
Figure pat00183
Calculating a correspondence degree of the tag with respect to the database image, and refining the tag accordingly;
Acquiring at least one visual word from the query image when a search request including the query image is input and extracting a subject distribution of the query image using the background distribution topic model including the estimated model parameter
Figure pat00184
);
The subject distribution of the database image (
Figure pat00185
) And the subject distribution of the query image in the database
Figure pat00186
) To evaluate the similarity of the database image.
The method according to claim 1,
The pre-
Extracting image features from the database image;
Clustering the extracted image features to extract representative values of clusters;
And quantizing the representative value to obtain a visual word.
The method according to claim 1,
The model parameter estimating step
The percentage of visual words assigned to the subject distribution, the specific visual word (s)
Figure pat00187
), The specific image (
Figure pat00188
A) deriving a ratio of visual words assigned to the subject distribution using a probability that a specific subject (t) appears in the subject distribution;
A step b of deriving a ratio of visual words assigned to the background distribution by using the ratio of visual words assigned to the background distribution and the probability of the background word;
In a particular topic (t), a text word
Figure pat00189
), The specific image (
Figure pat00190
) Using the probability that a particular topic (t) will appear in the text word
Figure pat00191
(C) deriving a probability of being assigned to the specific subject (t);
The step (a) to the step (c) are repeatedly performed on the one or more database images to obtain a distribution of the topic-specific visual words
Figure pat00192
), The subject-specific text word distribution (
Figure pat00193
) And the visual word distribution of the background (
Figure pat00194
≪ / RTI >
The method of claim 3,
The model parameter obtaining step
The above-mentioned visual word distribution (
Figure pat00195
), The subject-specific text word distribution (
Figure pat00196
) And the visual word distribution of the background (
Figure pat00197
) Is estimated according to the following equation,
In the above formula
Figure pat00198
,
Figure pat00199
,
Figure pat00200
A predetermined dirichlet parameter,
Figure pat00201
Is the number of times the visual word v is assigned to the subject t,
Figure pat00202
Is the number of times the text word w is assigned to the subject t,
Figure pat00203
Is the number of times the visual word v is assigned to the background distribution.
Figure pat00204

Figure pat00205

Figure pat00206

The method according to claim 1,
The similarity evaluation step
The database image (
Figure pat00207
) For each query image (
Figure pat00208
)
Figure pat00209
) According to the following equation,
In the above formula
Figure pat00210
And
Figure pat00211
(&Quot;
Figure pat00212
) And the query image (
Figure pat00213
), Respectively,
Figure pat00214
And
Figure pat00215
(&Quot;
Figure pat00216
) And the query image (
Figure pat00217
), Respectively,
Figure pat00218
Is a parameter for controlling a weight of a preset text similarity.
Figure pat00219

The method according to claim 1,
The similarity evaluation step
Further comprising the step of evaluating a similarity between the refined tag and the search keyword when a search request including the search keyword is input.
The method according to claim 1,
And sorting database images similar to at least one of the query image or the search keyword in descending order of degree of similarity according to the result of the similarity evaluation.
A database that stores one or more database images;
A first preprocessing unit for acquiring one or more visual words and one or more text words from the database image to which one or more tags are mapped;
The visual word and the text word are used to determine a visual word distribution
Figure pat00220
), Subject text word distribution (
Figure pat00221
) And background visual word distribution (
Figure pat00222
) Of the database image using the background distribution topic model including the estimated model parameters,
Figure pat00223
), And the above-mentioned text word distribution by topic
Figure pat00224
) And the specific theme rate in the image (
Figure pat00225
A first background distribution topic modeling unit for calculating a correspondence degree of the tag to the database image by using the first background distribution topic modeling unit,
A second pre-processing unit for acquiring one or more visual words from the query image when a query image or a search keyword is input, and acquiring one or more text words from the search keyword;
Using the background distribution topic model including the estimated model parameters,
Figure pat00226
A second background distribution topic modeling unit that models the second background distribution topic modeling unit;
The subject distribution of the database image (
Figure pat00227
) And the subject distribution of the query image in the database
Figure pat00228
And a similarity evaluator for evaluating the similarity of the database image using the similarity evaluator.
9. The method of claim 8,
The first pre-
Extracting image features from the database image, clustering the extracted image features to extract representative values of clusters, and quantizing the representative values to obtain visual words.
9. The method of claim 8,
The first background distribution topic modeling unit
The percentage of visual words assigned to the subject distribution, the specific visual word (s)
Figure pat00229
), The specific image (
Figure pat00230
), The ratio of the visual words assigned to the subject distribution is derived by using the probability that a specific subject (t) appears in the visual distribution, and the ratio of visual words assigned to the background distribution, (T) to the text word (t)
Figure pat00231
), The specific image (
Figure pat00232
) Using the probability that a particular topic (t) will appear in the text word
Figure pat00233
) To be assigned to the specific subject (t);
The calculations are repeatedly performed on the one or more database images to calculate a distribution of the topic-specific visual words ("
Figure pat00234
), The subject-specific text word distribution (
Figure pat00235
) And the visual word distribution of the background (
Figure pat00236
And a model parameter estimating unit which obtains the model parameter estimating unit.
9. The method of claim 8,
The model parameter estimator
The above-mentioned visual word distribution (
Figure pat00237
), The subject-specific text word distribution (
Figure pat00238
) And the visual word distribution of the background (
Figure pat00239
) Is estimated according to the following equation,
In the above formula
Figure pat00240
,
Figure pat00241
,
Figure pat00242
A predetermined dirichlet parameter,
Figure pat00243
Is the number of times the visual word v is assigned to the subject t,
Figure pat00244
Is the number of times the text word w is assigned to the subject t,
Figure pat00245
Is the number of times the visual word v is assigned to the background distribution.
Figure pat00246

Figure pat00247

Figure pat00248

9. The method of claim 8,
The similarity-
The database image (
Figure pat00249
) For each query image (
Figure pat00250
)
Figure pat00251
) Was calculated according to the following formula,
In the above formula
Figure pat00252
And
Figure pat00253
(&Quot;
Figure pat00254
) And the query image (
Figure pat00255
), Respectively,
Figure pat00256
And
Figure pat00257
(&Quot;
Figure pat00258
) And the query image (
Figure pat00259
), Respectively,
Figure pat00260
Is a parameter for controlling a weight of a preset text similarity.
Figure pat00261

9. The method of claim 8,
The similarity-
When a search request including the search keyword is input
And evaluates the similarity between the refined tag and the search keyword.
9. The method of claim 8,
And a list providing unit for sorting database images similar to at least one of the query image or the search keyword in order of high similarity according to a result of the similarity evaluation in the similarity evaluating unit.
An image retrieval application stored on a computer readable medium for carrying out a method of any one of claims 1 to 7.
KR1020170090390A 2017-07-17 2017-07-17 Method, system and computer program for semantic image retrieval based on topic modeling KR101976081B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020170090390A KR101976081B1 (en) 2017-07-17 2017-07-17 Method, system and computer program for semantic image retrieval based on topic modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020170090390A KR101976081B1 (en) 2017-07-17 2017-07-17 Method, system and computer program for semantic image retrieval based on topic modeling

Publications (2)

Publication Number Publication Date
KR20190008699A true KR20190008699A (en) 2019-01-25
KR101976081B1 KR101976081B1 (en) 2019-08-28

Family

ID=65280573

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020170090390A KR101976081B1 (en) 2017-07-17 2017-07-17 Method, system and computer program for semantic image retrieval based on topic modeling

Country Status (1)

Country Link
KR (1) KR101976081B1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200114708A (en) * 2019-03-29 2020-10-07 경북대학교 산학협력단 Electronic device, image searching system and controlling method thereof
CN113343679A (en) * 2021-07-06 2021-09-03 合肥工业大学 Multi-modal topic mining method based on label constraint
KR20210123119A (en) * 2020-04-02 2021-10-13 네이버 주식회사 Method and system for retrieving associative image through multimodality ranking model using different modal features
WO2022057419A1 (en) * 2020-09-21 2022-03-24 Oppo广东移动通信有限公司 Method and apparatus for acquiring information related to subject, and storage medium and electronic device
WO2022085823A1 (en) * 2020-10-22 2022-04-28 주식회사 데이타솔루션 Device and method for generating positioning map using topic modeling technique

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101255841B1 (en) * 2011-01-06 2013-04-23 서울대학교산학협력단 Method and system for associative image search based on bi-source topic model

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101255841B1 (en) * 2011-01-06 2013-04-23 서울대학교산학협력단 Method and system for associative image search based on bi-source topic model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Nguyen Anh Tu 등, "Topic modeling and improvement of image representation for large-scale image retrieval", Information Sciences, Volume 366, Pages 99-120, 20 October 2016* *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200114708A (en) * 2019-03-29 2020-10-07 경북대학교 산학협력단 Electronic device, image searching system and controlling method thereof
KR20210123119A (en) * 2020-04-02 2021-10-13 네이버 주식회사 Method and system for retrieving associative image through multimodality ranking model using different modal features
WO2022057419A1 (en) * 2020-09-21 2022-03-24 Oppo广东移动通信有限公司 Method and apparatus for acquiring information related to subject, and storage medium and electronic device
WO2022085823A1 (en) * 2020-10-22 2022-04-28 주식회사 데이타솔루션 Device and method for generating positioning map using topic modeling technique
CN113343679A (en) * 2021-07-06 2021-09-03 合肥工业大学 Multi-modal topic mining method based on label constraint
CN113343679B (en) * 2021-07-06 2024-02-13 合肥工业大学 Multi-mode subject mining method based on label constraint

Also Published As

Publication number Publication date
KR101976081B1 (en) 2019-08-28

Similar Documents

Publication Publication Date Title
CN106815297B (en) Academic resource recommendation service system and method
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
KR101976081B1 (en) Method, system and computer program for semantic image retrieval based on topic modeling
US9589208B2 (en) Retrieval of similar images to a query image
USRE47340E1 (en) Image retrieval apparatus
US8150170B2 (en) Statistical approach to large-scale image annotation
CN108280114B (en) Deep learning-based user literature reading interest analysis method
US10482146B2 (en) Systems and methods for automatic customization of content filtering
US20180341686A1 (en) System and method for data search based on top-to-bottom similarity analysis
CN112559684A (en) Keyword extraction and information retrieval method
CN110543595A (en) in-station search system and method
Hidayat et al. Automatic text summarization using latent Drichlet allocation (LDA) for document clustering
CN112052356A (en) Multimedia classification method, apparatus and computer-readable storage medium
CN103778206A (en) Method for providing network service resources
TW201243627A (en) Multi-label text categorization based on fuzzy similarity and k nearest neighbors
CN115827990B (en) Searching method and device
CN117349406A (en) Patent information retrieval system and method based on big data
Tian et al. Automatic image annotation with real-world community contributed data set
Sowmyayani et al. STHARNet: Spatio-temporal human action recognition network in content based video retrieval
CN111767404B (en) Event mining method and device
Su et al. Parallel big image data retrieval by conceptualised clustering and un-conceptualised clustering
CN115374781A (en) Text data information mining method, device and equipment
CN114969375A (en) Method and system for giving artificial intelligence learning to machine based on psychological knowledge
CN113704617A (en) Article recommendation method, system, electronic device and storage medium
Nguyen et al. Pagerank-based approach on ranking social events: a case study with flickr

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E90F Notification of reason for final refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant