CN103559191B - Cross-media ranking method based on latent space learning and bidirectional learning to rank - Google Patents
- Publication number: CN103559191B
- Application number: CN201310410565.2A
- Authority: CN (China)
- Prior art keywords: image, text, learning, retrieval, media
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
Abstract
The invention discloses a cross-media ranking method based on latent space learning and bidirectional learning to rank. The method comprises the following steps: 1) the ranking samples for text-to-image retrieval and the ranking samples for image-to-text retrieval are combined into a unified set of training samples; 2) cross-media learning to rank, based on latent space learning and bidirectional learning to rank, is performed on the constructed training samples to obtain a multimedia semantic space and a cross-media ranking model; 3) the learned cross-media ranking model is used to perform cross-media ranking. The invention applies to both text-to-image and image-to-text retrieval. Because both retrieval directions are modelled simultaneously, the resulting retrieval model has stronger semantic understanding and higher retrieval precision than methods that consider learning to rank in only one direction.
Description
Technical field
The present invention relates to cross-media retrieval, and in particular to a cross-media ranking method based on latent space learning and bidirectional learning to rank.
Background art
Images are among the most common file types and carry semantic content. An image is made up of individual pixels, and a computer cannot directly understand the semantic information it contains. With the development of multimedia and network technology, more and more images are appearing. Retrieval technology helps users quickly find content of interest in massive data and has become one of the most important fields in applied computer technology. Traditional retrieval techniques, whether keyword-based or content-based, cannot adequately meet users' need to retrieve images with text or text with images.
A keyword-based retrieval system requires images to be annotated in advance. Because the number of existing images is enormous, annotation is a huge workload. Annotations are also inevitably influenced by the annotator's subjectivity: different annotators may assign different keywords to the same image, so keywords often fail to reflect objectively all the semantics an image contains.
A content-based retrieval system needs no annotation; the user submits a query example to retrieve images. Traditional content-based retrieval has two weaknesses, however. First, the user can only retrieve media objects of the same modality as the query example, that is, images can only be retrieved with images. Second, a semantic gap exists between the low-level features of an image and its high-level semantics, which limits retrieval performance.
To bridge the semantic gap between data of different modalities, to understand multimedia semantics better, and to meet users' demand for cross-media queries, a semantics-based cross-media ranking method is of great value.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a cross-media ranking method based on latent space learning and bidirectional learning to rank.
The cross-media ranking method based on latent space learning and bidirectional learning to rank comprises the following steps:
1) the ranking samples for text-to-image retrieval and the ranking samples for image-to-text retrieval are combined into a unified set of training samples;
2) cross-media learning to rank, based on latent space learning and bidirectional learning to rank, is performed on the constructed training samples to obtain a multimedia semantic space and a cross-media ranking model;
3) the learned cross-media ranking model is used to perform cross-media ranking: after the user submits a query example, the coordinates of the query example in the multimedia semantic space are found first; then, using the coordinates of the cross-media objects in the multimedia semantic space, the similarity between the query example and every other cross-media object in that space is computed, and all cross-media objects are ranked by this similarity.
Step 1) comprises:
1) all text documents in the training samples are represented with a bag-of-words model, and each word is weighted with the TF-IDF method; a text is finally represented as $t \in \mathbb{R}^m$, where m is the dimension of the text space;
2) SIFT local feature points are extracted from all image documents in the training samples and clustered with K-Means; the cluster centres form the codebook and serve as visual words. Then, for every picture, each local feature point is assigned to a visual word in the codebook by Euclidean-distance nearest neighbour. Finally, as with text documents, the bag-of-words model and the TF-IDF method are used for feature representation; an image is finally represented as $p \in \mathbb{R}^n$, where n is the dimension of the image space;
3) for the text-to-image direction, a ranked list of images is built for each query text, in which every image is labelled as semantically relevant or semantically irrelevant to the query; each text-to-image training sample is thus a triple consisting of the query text $t_i$, the image set $p_i$, and a ranking over that image set taken from the space of all rankings, where N is the number of training samples;
4) for the image-to-text direction, a ranked list of text documents is built for each query image, in which every text document is labelled as semantically relevant or semantically irrelevant to the query; each image-to-text training sample is a triple consisting of the query image $p_j$, the text document set $t_j$, and a ranking over that text document set, where M is the number of training samples;
5) the query lists of the two directions are combined to obtain the unified training samples.
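The bag-of-words and TF-IDF weighting of sub-step 1) can be illustrated with a minimal sketch. The patent does not state which TF-IDF variant is used, so the raw term count and idf = log(N / df) below are assumptions, and the function name `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Assumption: tf is the raw term count and idf = log(N / df).
    """
    vocab = sorted({w for d in docs for w in d})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)  # missing words count as 0
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors

# Toy corpus; each vector has one entry per vocabulary word (dimension m).
docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"], ["dog", "barked"]]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` plays the role of a text representation t with m equal to the vocabulary size; the same weighting would apply to visual-word counts for images.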
Step 2) comprises:
1) a structural support vector machine is used to construct an optimization problem whose objective function makes the mapping functions trade off structural risk against empirical risk:
where $U \in \mathbb{R}^{k \times m}$ is the mapping matrix that maps text into the latent space, $V \in \mathbb{R}^{k \times n}$ is the mapping matrix that maps images into the latent space, k is the dimension of the latent space, and $\xi_{1,i}$ and $\xi_{2,j}$ are slack variables. The function F is defined as follows:
where $p^+$ and $p^-$ denote the set of images relevant to the query text t and the set of images irrelevant to it, and $t^+$ and $t^-$ denote the set of texts relevant to the query image p and the set of texts irrelevant to it. The value of $y_{ij}$ is determined by the ranking y: if document i is ranked ahead of document j, then $y_{ij} = 1$; otherwise $y_{ij} = -1$. The loss function is defined as $\Delta(y^*, y) = 1 - \mathrm{MAP}(y^*, y)$, where MAP is Mean Average Precision, a performance measure commonly used in information retrieval: the larger the MAP value, the better the ranking performance and the smaller the loss;
2) the bidirectional ranking samples are fed in as the training samples of the optimization problem, which is solved to obtain the parameters U and V.
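The pairwise labels used in step 2) can be derived from a ranking as in the following sketch. Restricting the pairs to (relevant, irrelevant) documents is an assumption suggested by the relevant and irrelevant sets defined in the text; the function name `pairwise_labels` and the document identifiers are hypothetical.

```python
def pairwise_labels(ranking, relevant, irrelevant):
    """Map each (relevant i, irrelevant j) pair to +1 or -1.

    The label is +1 when document i is ranked ahead of document j in
    `ranking` (a list of document ids, best first), and -1 otherwise.
    """
    position = {doc: pos for pos, doc in enumerate(ranking)}
    return {
        (i, j): 1 if position[i] < position[j] else -1
        for i in relevant
        for j in irrelevant
    }

# Toy ranking of four images for one query text.
ranking = ["img_a", "img_c", "img_b", "img_d"]
y = pairwise_labels(ranking, relevant=["img_a", "img_b"],
                    irrelevant=["img_c", "img_d"])
```

Here `img_b` is relevant but ranked behind the irrelevant `img_c`, so that one pair receives the label -1 while the other three pairs receive +1.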
Step 3) comprises:
1) when the input is a text query sample t, the similarity between each image $p_i$ and the query sample is computed as $f(t, p_i) = (Ut)^{\mathsf T} V p_i$, and the images are then ranked by similarity in descending order;
2) when the input is an image query sample p, the similarity between each text document $t_i$ and the query sample is computed as $f(t_i, p) = (Ut_i)^{\mathsf T} V p$, and the text documents are then ranked by similarity in descending order.
Compared with the background art, the present invention has the following beneficial effects:
The invention proposes a new semantic-content-based retrieval method for bidirectional ranking training samples. Because the method combines two mechanisms, latent space learning and bidirectional learning to rank, it makes full use of the bidirectional ranking training samples and optimizes ranking performance directly, and therefore achieves better ranking performance.
Brief description of the drawings
Fig. 1 is a schematic diagram of the cross-media ranking method based on latent space learning and bidirectional learning to rank;
Fig. 2 shows examples of query results of the present invention.
Detailed description of the invention
The present invention performs semantic understanding of multimedia documents by combining latent space learning with bidirectional learning to rank, and maps all multimedia documents (text documents and images) into a unified latent multimedia semantic space, thereby enabling cross-media ranking and retrieval.
The cross-media ranking method based on latent space learning and bidirectional learning to rank comprises the following steps:
1) the ranking samples for text-to-image retrieval and the ranking samples for image-to-text retrieval are combined into a unified set of training samples;
2) cross-media learning to rank, based on latent space learning and bidirectional learning to rank, is performed on the constructed training samples to obtain a multimedia semantic space and a cross-media ranking model;
3) the learned cross-media ranking model is used to perform cross-media ranking: after the user submits a query example, the coordinates of the query example in the multimedia semantic space are found first; then, using the coordinates of the cross-media objects in the multimedia semantic space, the similarity between the query example and every other cross-media object in that space is computed, and all cross-media objects are ranked by this similarity.
Step 1) comprises:
1) all text documents in the training samples are represented with a bag-of-words model, and each word is weighted with the TF-IDF method; a text is finally represented as $t \in \mathbb{R}^m$, where m is the dimension of the text space;
2) SIFT local feature points are extracted from all image documents in the training samples and clustered with K-Means; the cluster centres form the codebook and serve as visual words. Then, for every picture, each local feature point is assigned to a visual word in the codebook by Euclidean-distance nearest neighbour. Finally, as with text documents, the bag-of-words model and the TF-IDF method are used for feature representation; an image is finally represented as $p \in \mathbb{R}^n$, where n is the dimension of the image space;
3) for the text-to-image direction, a ranked list of images is built for each query text, in which every image is labelled as semantically relevant or semantically irrelevant to the query; each text-to-image training sample is thus a triple consisting of the query text $t_i$, the image set $p_i$, and a ranking over that image set taken from the space of all rankings, where N is the number of training samples;
4) for the image-to-text direction, a ranked list of text documents is built for each query image, in which every text document is labelled as semantically relevant or semantically irrelevant to the query; each image-to-text training sample is a triple consisting of the query image $p_j$, the text document set $t_j$, and a ranking over that text document set, where M is the number of training samples;
5) the query lists of the two directions are combined to obtain the unified training samples.
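The visual-word assignment of sub-step 2) can be sketched as follows. This is a toy illustration that assumes the codebook (the K-Means cluster centres) has already been computed; it uses 2-dimensional descriptors instead of real 128-dimensional SIFT descriptors, and the function names are hypothetical.

```python
from collections import Counter

def nearest_visual_word(descriptor, codebook):
    """Index of the codebook centre nearest to `descriptor`.

    Squared Euclidean distance is used; it gives the same nearest
    neighbour as the Euclidean distance itself.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)),
               key=lambda k: sq_dist(descriptor, codebook[k]))

def visual_word_histogram(descriptors, codebook):
    """Count how many local descriptors fall on each visual word."""
    counts = Counter(nearest_visual_word(d, codebook) for d in descriptors)
    return [counts[k] for k in range(len(codebook))]

# Toy codebook with three visual words and four local descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 9.5], [0.5, 8.0], [0.2, 0.1]]
hist = visual_word_histogram(descriptors, codebook)
```

The histogram has one bin per visual word, so its length corresponds to the image-space dimension n; TF-IDF weighting would then be applied to these counts in the same way as for text.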
Step 2) comprises:
1) a structural support vector machine is used to construct an optimization problem whose objective function makes the mapping functions trade off structural risk against empirical risk:
where $U \in \mathbb{R}^{k \times m}$ is the mapping matrix that maps text into the latent space, $V \in \mathbb{R}^{k \times n}$ is the mapping matrix that maps images into the latent space, k is the dimension of the latent space, and $\xi_{1,i}$ and $\xi_{2,j}$ are slack variables. The function F is defined as follows:
where $p^+$ and $p^-$ denote the set of images relevant to the query text t and the set of images irrelevant to it, and $t^+$ and $t^-$ denote the set of texts relevant to the query image p and the set of texts irrelevant to it. The value of $y_{ij}$ is determined by the ranking y: if document i is ranked ahead of document j, then $y_{ij} = 1$; otherwise $y_{ij} = -1$. The loss function is defined as $\Delta(y^*, y) = 1 - \mathrm{MAP}(y^*, y)$, where MAP is Mean Average Precision, a performance measure commonly used in information retrieval: the larger the MAP value, the better the ranking performance and the smaller the loss;
2) the bidirectional ranking samples are fed in as the training samples of the optimization problem, which is solved to obtain the parameters U and V. The specific solving algorithm is as follows:
The search for the optimal y in steps 3 and 5 of the algorithm can be solved with the SVM-MAP method. The resulting U and V are, respectively, the linear mapping function from the text space to the latent space and the linear mapping function from the image space to the latent space.
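The objective function and the definition of F are not reproduced in the text of step 2). As a hedged sketch only, a margin-rescaled structural SVM in the style of SVM-MAP, extended to the two retrieval directions, would take roughly the form below. The regularisation weight $\lambda$, the Frobenius-norm regulariser, the $1/N$ and $1/M$ averaging, the normalisation of F, and the notation $y_i^*$, $y_j^*$ for the ground-truth rankings are all assumptions and are not taken from the patent; only U, V, k, the slack variables, $y_{ij}$, $\Delta$, and the similarity $(Ut)^{\mathsf T} V p$ come from the text.

```latex
\[
\begin{aligned}
\min_{U,V,\xi}\quad & \frac{\lambda}{2}\left(\lVert U\rVert_F^2 + \lVert V\rVert_F^2\right)
  + \frac{1}{N}\sum_{i=1}^{N}\xi_{1,i} + \frac{1}{M}\sum_{j=1}^{M}\xi_{2,j} \\
\text{s.t.}\quad & F(t_i, p_i, y_i^*) - F(t_i, p_i, y) \ge \Delta(y_i^*, y) - \xi_{1,i},
  \quad \forall i,\ \forall y \ne y_i^*, \\
& F(p_j, t_j, y_j^*) - F(p_j, t_j, y) \ge \Delta(y_j^*, y) - \xi_{2,j},
  \quad \forall j,\ \forall y \ne y_j^*, \\
& \xi_{1,i} \ge 0,\quad \xi_{2,j} \ge 0, \\[4pt]
F(t, p, y) &= \frac{1}{|p^+|\,|p^-|}\sum_{i \in p^+}\sum_{j \in p^-}
  y_{ij}\,(Ut)^{\mathsf T} V\,(p_i - p_j), \\
F(p, t, y) &= \frac{1}{|t^+|\,|t^-|}\sum_{i \in t^+}\sum_{j \in t^-}
  y_{ij}\,\bigl(U(t_i - t_j)\bigr)^{\mathsf T} V p .
\end{aligned}
\]
```

Under this assumed form, the first term is the structural risk, the slack sums are the empirical risk for the text-to-image and image-to-text directions, and finding the most violated ranking y for each constraint is the step that SVM-MAP style solvers handle.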
Step 3) comprises:
1) when the input is a text query sample t, the similarity between each image $p_i$ and the query sample is computed as $f(t, p_i) = (Ut)^{\mathsf T} V p_i$, and the images are then ranked by similarity in descending order;
2) when the input is an image query sample p, the similarity between each text document $t_i$ and the query sample is computed as $f(t_i, p) = (Ut_i)^{\mathsf T} V p$, and the text documents are then ranked by similarity in descending order.
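The two ranking formulas of step 3) can be exercised with a small pure-Python sketch. The matrices U and V below are hand-picked toy values rather than learned parameters, and the helper names (`matvec`, `similarity`, `rank_images`, `rank_texts`) are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def similarity(U, V, t, p):
    """f(t, p) = (U t)^T (V p): inner product in the latent space."""
    zt = matvec(U, t)   # text mapped to the k-dimensional latent space
    zp = matvec(V, p)   # image mapped to the same latent space
    return sum(a * b for a, b in zip(zt, zp))

def rank_images(U, V, t, images):
    """Indices of `images` sorted by descending similarity to text t."""
    scores = [similarity(U, V, t, p) for p in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)

def rank_texts(U, V, p, texts):
    """Indices of `texts` sorted by descending similarity to image p."""
    scores = [similarity(U, V, t, p) for t in texts]
    return sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)

# Toy setting: k = 2 latent dimensions, m = 3 text and n = 2 image dimensions.
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
t = [1.0, 0.0, 2.0]
images = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
image_order = rank_images(U, V, t, images)
text_order = rank_texts(U, V, [0.0, 1.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

Because both directions share the same U and V, a single learned pair of matrices serves text-to-image and image-to-text queries alike.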
Embodiment
To verify the effect of the present invention, about 2,900 web pages were crawled from Wikipedia's "Picture of the day" pages and divided into 10 broad categories. Each web page contains one image and several paragraphs of related descriptive text, and these pages serve as the experimental data set. An image and a text are considered relevant if both belong to the same one of the 10 categories, and irrelevant otherwise. The data set is split into a training set and a test set; the method is trained on the training set and then evaluated independently on the test set. Features are extracted according to the steps described in the present invention: after common words and rare words are removed, the text space is set to 5,000 dimensions, and the image space is set to 1,000 dimensions. To evaluate the performance of the algorithm objectively, the inventors use Mean Average Precision (MAP). The MAP results are shown in Table 1:
Query direction | MAP@50 | MAP@all |
---|---|---|
Text query image | 0.3981 | 0.2123 |
Image query text | 0.2599 | 0.2528 |
Table 1
MAP@50 is the MAP value computed over the top 50 returned results, and MAP@all is the MAP value computed over all returned results.
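The evaluation measure can be sketched as follows, using the category-based relevance rule of the embodiment (an image and a text are relevant when they share a category). The normalisation of the truncated measure, which divides by the number of relevant items found within the top k, is one common convention and an assumption here; the category names and function names are hypothetical.

```python
def average_precision(relevance, k=None):
    """Average precision of a ranked 0/1 relevance list, optionally top-k."""
    top = relevance if k is None else relevance[:k]
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(top, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / hits if hits else 0.0

def mean_average_precision(relevance_lists, k=None):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r, k) for r in relevance_lists) / len(relevance_lists)

def relevance_from_categories(query_category, ranked_categories):
    """A returned item is relevant when it shares the query's category."""
    return [int(c == query_category) for c in ranked_categories]

# Two toy queries with the categories of their ranked results.
runs = [
    relevance_from_categories("art", ["art", "history", "art", "biology"]),
    relevance_from_categories("history", ["art", "history", "biology", "art"]),
]
map_all = mean_average_precision(runs)       # counterpart of MAP@all
map_at_1 = mean_average_precision(runs, k=1)  # counterpart of MAP@k with k = 1
```

With k set to 50 the same function would give the counterpart of MAP@50, and 1 minus the average precision of a single ranking corresponds to the loss used during training.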
To better illustrate the results of the present invention on cross-media retrieval, Fig. 2 presents examples of query results. Fig. 2 shows the results of two retrievals, one text-to-image and one image-to-text. For image-to-text retrieval, each returned text is displayed using its corresponding image. The results show that, whether text is queried with an image or images are queried with text, the method performs well and can return results that traditional single-modality retrieval cannot achieve.
Claims (2)
1. A cross-media ranking method based on latent space learning and bidirectional learning to rank, characterized by comprising the following steps:
1) the ranking samples for text-to-image retrieval and the ranking samples for image-to-text retrieval are combined into a unified set of training samples;
2) cross-media learning to rank, based on latent space learning and bidirectional learning to rank, is performed on the constructed training samples to obtain a multimedia semantic space and a cross-media ranking model;
3) the learned cross-media ranking model is used to perform cross-media ranking: after the user submits a query example, the coordinates of the query example in the multimedia semantic space are found first; then, using the coordinates of the cross-media objects in the multimedia semantic space, the similarity between the query example and every other cross-media object in that space is computed, and all cross-media objects are ranked by this similarity.
2. The cross-media ranking method based on latent space learning and bidirectional learning to rank according to claim 1, characterized in that step 1) comprises:
1) all text documents in the training samples are represented with a bag-of-words model, and each word is weighted with the TF-IDF method; a text is finally represented as $t \in \mathbb{R}^m$, where m is the dimension of the text space;
2) SIFT local feature points are extracted from all image documents in the training samples and clustered with K-Means; the cluster centres form the codebook and serve as visual words; then, for every picture, each local feature point is assigned to a visual word in the codebook by Euclidean-distance nearest neighbour; finally, as with text documents, the bag-of-words model and the TF-IDF method are used for feature representation, and an image is finally represented as $p \in \mathbb{R}^n$, where n is the dimension of the image space;
3) for the text-to-image direction, a ranked list of images is built for each query text, in which every image is labelled as semantically relevant or semantically irrelevant to the query; each text-to-image training sample is thus a triple consisting of the query text $t_i$, the image set $p_i$, and a ranking over that image set taken from the space of all rankings, where N is the number of training samples;
4) for the image-to-text direction, a ranked list of text documents is built for each query image, in which every text document is labelled as semantically relevant or semantically irrelevant to the query; each image-to-text training sample is a triple consisting of the query image $p_j$, the text document set $t_j$, and a ranking over that text document set, where M is the number of training samples;
5) the query lists of the two directions are combined to obtain the unified training samples.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310410565.2A CN103559191B (en) | 2013-09-10 | 2013-09-10 | Cross-media ranking method based on latent space learning and bidirectional learning to rank |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310410565.2A CN103559191B (en) | 2013-09-10 | 2013-09-10 | Cross-media ranking method based on latent space learning and bidirectional learning to rank |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103559191A (en) | 2014-02-05 |
CN103559191B (en) | 2016-09-14 |
Family ID: 50013438
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310410565.2A Expired - Fee Related CN103559191B (en) | 2013-09-10 | 2013-09-10 | Cross-media ranking method based on latent space learning and bidirectional learning to rank |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103559191B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2787138C1 (en) * | 2021-07-21 | 2022-12-29 | АБИ Девелопмент Инк. | Structure optimization and use of codebooks for document analysis |
US11893818B2 (en) | 2021-07-21 | 2024-02-06 | Abbyy Development Inc. | Optimization and use of codebooks for document analysis |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3158508A1 (en) * | 2014-06-20 | 2017-04-26 | Google, Inc. | Fine-grained image similarity |
CN105446973B (en) * | 2014-06-20 | 2019-02-26 | 华为技术有限公司 | The foundation of user's recommended models and application method and device in social networks |
CN104346440B (en) * | 2014-10-10 | 2017-06-23 | 浙江大学 | A kind of across media hash indexing methods based on neutral net |
CN104317834B (en) * | 2014-10-10 | 2017-09-29 | 浙江大学 | A kind of across media sort methods based on deep neural network |
CN104346450B (en) * | 2014-10-29 | 2017-06-23 | 浙江大学 | A kind of across media sort methods based on multi-modal recessive coupling expression |
CN105701227B (en) * | 2016-01-15 | 2019-02-01 | 北京大学 | A kind of across media method for measuring similarity and search method based on local association figure |
CN106095829B (en) * | 2016-06-01 | 2019-08-06 | 华侨大学 | Cross-media retrieval method based on deep learning and the study of consistency expression of space |
CN106529583A (en) * | 2016-11-01 | 2017-03-22 | 哈尔滨工程大学 | Bag-of-visual-word-model-based indoor scene cognitive method |
CN107357884A (en) * | 2017-07-10 | 2017-11-17 | 中国人民解放军国防科学技术大学 | A kind of different distance measure across media based on two-way study sequence |
CN107562812B (en) * | 2017-08-11 | 2021-01-15 | 北京大学 | Cross-modal similarity learning method based on specific modal semantic space modeling |
CN107657008B (en) * | 2017-09-25 | 2020-11-03 | 中国科学院计算技术研究所 | Cross-media training and retrieval method based on deep discrimination ranking learning |
CN108228757A (en) * | 2017-12-21 | 2018-06-29 | 北京市商汤科技开发有限公司 | Image search method and device, electronic equipment, storage medium, program |
CN108829847B (en) * | 2018-06-20 | 2020-11-17 | 山东大学 | Multi-modal modeling method based on translation and application thereof in commodity retrieval |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1920818A (en) * | 2006-09-14 | 2007-02-28 | 浙江大学 | Transmedia search method based on multi-mode information convergence analysis |
Non-Patent Citations (4)
Title |
---|
A Low Rank Structural Large Margin Method for Cross-Modal Ranking;Xinyan Lu, Fei Wu et al.;《Proceedings of the 36th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval》;20130801;全文 * |
Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval;Yi Yang, Yue-Ting Zhuang et al.;《IEEE Transactions on Multimedia》;20080430;第10卷(第3期);全文 * |
Mining Semantic Correlation of Heterogeneous Multimedia Data for Cross-Media Retrieval;Yue-Ting Zhuang, Yi Yang et al.;《IEEE Transaction on Multimedia》;20080229;第10卷(第2期);全文 * |
互联网跨媒体分析与检索:理论与算法;吴飞,庄越挺;《计算机辅助设计与图形学学报》;20100131;第22卷(第1期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN103559191A (en) | 2014-02-05 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160914; termination date: 20180910 |