CN105760507A - Cross-modal topic correlation modeling method based on deep learning - Google Patents


Info

Publication number
CN105760507A
Authority
CN
China
Prior art keywords
text
theme
image
vocabulary
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610099438.9A
Other languages
Chinese (zh)
Other versions
CN105760507B (en)
Inventor
张玥杰
程勇
刘志鑫
金城
张涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201610099438.9A
Publication of CN105760507A
Application granted
Publication of CN105760507B
Expired - Fee Related
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/93Document management systems
    • G06F16/94Hypermedia
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The invention belongs to the technical field of cross-media correlation learning, and specifically relates to a cross-modal topic correlation modeling method based on deep learning. The method comprises two main algorithms: multi-modal document representation based on deep words, and relational topic model construction fusing cross-modal topic correlation learning. Deep learning techniques are used to construct deep semantic words and deep visual words to describe the semantic-description part and the image part of a multi-modal document. On the basis of this multi-modal document representation, a cross-modal relational topic model is constructed to model the whole multi-modal document set, so as to describe the generation process of multi-modal documents and the correlation between different modalities. The method achieves high accuracy and strong adaptability. It is significant for efficient cross-media information retrieval that takes multi-modal semantic information into account over large-scale multi-modal documents (text and images); it can improve retrieval relevance and enhance user experience, and has great application value in the field of cross-media information retrieval.

Description

Cross-modal topic correlation modeling method based on deep learning
Technical field
The invention belongs to the field of cross-media correlation learning, and specifically relates to a deep-learning-based method for learning the topic correlation between images and text across modalities.
Background art
With the development of Internet technology and the maturation of Web 2.0, massive multi-modal documents have accumulated on the Internet. How to analyze and process the complex structure of these multi-modal documents, and thereby provide theoretical support for practical applications such as cross-media retrieval, has become a very important research topic. In general, a multi-modal document exists as a co-occurrence of several modalities; for example, many web images are accompanied by user-defined descriptions or annotations, and some web documents contain illustrations. Although these multi-modal data are usually associated with each other, the semantic gap means that there is a large difference between the visual information of an image and its textual description [1], which makes it very difficult to fully exploit the semantic association between modalities. Therefore, how to mine the implicit relations behind data of different modalities and model multi-modal documents by better fusing multi-modal information has become very important [2,3]. Using topic models to model multi-modal documents and then mine the association between modalities is a key strategy. In research on cross-modal topic modeling, three interrelated problems must be solved simultaneously:
1. Find and construct more representative and more valuable document elements to represent, respectively, the image content and the text content of a multi-modal document.
2. Establish a more reasonable topic correlation model to describe the association between the data of different modalities in a multi-modal document, i.e., the association between the visual image and the textual description.
3. Establish, through cross-modal topic correlation learning, an objective mechanism for measuring the internal association between image and text content.
To solve the first problem, the key exploration is how to construct an optimized set of document elements, so that these optimized elements express the visual and semantic features of a multi-modal document more accurately and more comprehensively.
To solve the second problem, the key is to build a more robust probabilistic topic model that maximizes the likelihood of the observed multi-modal documents and thereby mines the latent topic information behind them.
To solve the third problem, the most effective approach is to map the attribute features of the different modalities into a common embedding subspace, thereby maximizing the correlation between the information of the different modalities.
Several researchers have proposed methods for multi-modal data modeling. From the modeling perspective, these methods fall roughly into two classes: statistical dependence modeling methods, and methods that build a joint probability generative model.
(1) Statistical dependence modeling methods
The core idea of statistical modeling methods is to map the data features of different modalities into the same latent space, in the hope of maximally exploiting the statistical correlation between the features of different modalities. For images and text, mapping matrices are constructed to project the differently structured image features and text features into the same common subspace, where the correlation between an image and a text is computed: the more relevant an image and a text are, the closer they are in the common subspace; conversely, a larger distance means a lower image-text correlation. Canonical Correlation Analysis (CCA) is the most typical statistical dependence method; it obtains the corresponding basis-vector matrices by maximizing the correlation between the visual feature matrix and the semantic feature matrix. The basis-vector matrices largely preserve the correlation between visual and semantic features and provide the mappings into an isomorphic subspace; the visual and semantic feature vectors of an image are then mapped into this subspace of the same dimension to build a cross-modal fused feature, achieving a unified representation of the different modalities of media data. Later work such as Kernel CCA (KCCA) and Deep CCA (DCCA) explored the dependence between image and text at a deeper level.
[4] combines statistical modeling with topic models: it first uses a latent Dirichlet allocation model to extract the visual topic features of images and the textual topic features of texts, and then uses CCA to map the visual and textual topic features into an isomorphic subspace in order to find and compute their correlation. This work is extended in [5], where KCCA is used to compute the correlation.
(2) Joint probability generative model methods
Multi-modal topic models are the typical representatives of joint probability generative models; in recent years there has been much related work on probabilistic topic modeling of the visual content and semantic descriptions in multi-modal documents [6,7,8,9,10]. In 2003, [Blei 2003] established a series of progressively more complex topic models [11], among which Correspondence Latent Dirichlet Allocation (Corr-LDA) is the best cross-modal topic model; it assumes a correspondence between the latent topics of different modalities, i.e., the latent topic of an annotation comes from the latent topic behind the visual information of the image. This assumption establishes a one-directional mapping, with the generation of text words depending on the visual content of the image. Later, [Wang 2009] proposed a supervised topic model to learn the latent relation between images and annotation words [12], and [Putthividhya 2010] proposed a topic-regression multi-modal latent Dirichlet allocation model [13]. [Rasiwasia 2010] studied joint modeling of the text and image content of multi-modal documents [3]. [Nguyen 2013] proposed an image annotation method based on the joint feature-word distribution and the word-topic distribution [9]. [Niu 2014] proposed a semi-supervised relational topic model that explicitly models the relation between image content and images [14]. [Wang 2014] proposed a semi-supervised multi-modal mutual topic reinforcement model that explores the mutually reinforcing relation between the topics of different modalities [15]. [Zheng 2014] proposed a supervised variant of DocNADE that jointly models the visual words, annotation words, and class labels of an image [16]. [Chen 2015] bridges the modeling gap between image and text by building a visual-emotional LDA model [17].
The above analysis shows that current methods have all made some progress in multi-modal document modeling, but none of them fully considers the impact of the following three aspects:
(1) Deep information mining in multi-modal documents. Most existing image-tag correlation learning methods only explore the association between modalities on top of traditional visual feature representations and annotation features, without considering the deep features contained in the different modalities. This causes serious information loss when building the overall visual semantics and internal semantic associations. Deep exploration of multi-modal documents can make up for this defect, so that the obtained feature elements represent multi-modal documents better.
(2) Relational topic correlation based on deep analysis. When building the topic correlation between different modalities, most existing topic modeling methods assume that the latent topics behind the different modalities are identical. Such an assumption is usually too absolute and introduces unnecessary noise into the constructed topic correlation. It is therefore particularly important to build a more reasonable assumption, fuse deep feature information, and form a better relational topic correlation modeling mechanism.
(3) Cross-modal correlation learning based on deep topic features. When computing the correlation between modalities, most existing multi-modal topic models directly match the latent topic distribution features behind the different modalities in order to capture the internal association between visual images and textual descriptions. However, such direct matching does not properly account for the heterogeneity of image and text; mapping the deep topic features into a common space and learning the correlation there can mine the correlation much better, thereby solving the problem raised above.
It is therefore highly desirable to draw on existing mature techniques while considering all of the above problems, and to analyze and compute the topic correlation between modalities more comprehensively. Motivated by this, the present invention designs, from the local to the global level, a novel technical framework comprising three main algorithms: deep word construction for multi-modal documents, relational topic model construction, and heterogeneous topic correlation learning. This establishes an effective cross-modal topic correlation computation method that ultimately improves cross-media image retrieval performance.
Summary of the invention
The object of the present invention is to propose a cross-modal topic correlation modeling method based on deep learning, so as to improve cross-media social image retrieval performance.
The present invention first proposes a novel deep cross-modal topic correlation model. The model is built over a large-scale multi-modal corpus and can deeply analyze and understand the association between images and text in multi-modal documents; using the constructed model, cross-media retrieval performance can be effectively improved. The model mainly includes the following components:
(1) Deep Word Construction. For a multi-modal document, deep learning techniques are used to construct deep words as its basic representation elements. Deep words include deep visual words and deep text words: deep visual words describe the visual content of the images in the document, while deep text words serve as the basic elements describing the text content. Compared with traditional visual words and text words, deep words can mine the semantic information of a document at a deeper level, so that a multi-modal document can be represented better.
(2) Multi-modal Topic Information Generation. On the basis of the constructed deep words, the topic model LDA is used to further mine the topic information hidden behind the data of each modality. The topic model assumes that a common topic set lies behind the document set and that each word in a document corresponds to a topic; under this assumption, the topic features behind each document can be obtained by inference, giving a further representation of the document.
(3) Cross-modal Topic Correlation Analysis. The topics hidden behind documents of different modalities are assumed to be heterogeneous but correlated; for example, the topic corresponding to "wedding" in a text document is likely to be highly correlated with the topic behind "white" in an image. Therefore, by constructing a common subspace, the topic features of the different modalities are mapped into the common subspace to find the correlation between modalities.
(4) Relational Topic Modeling. When generating the topic features of the different modalities, the relational topic model considers the correlation between image and text at the same time: when building the topics of a document, it considers not only the information of the same modality but also the correlation with the other modality, so that the final topics fuse multi-modal information; this finally yields the topic distribution behind each multi-modal document and the cross-modal correlation.
Compared with existing multi-modal topic modeling methods, the proposed method has two major advantages in application. First, high accuracy: the method replaces traditional words with the constructed deep words, which can mine deep modality information and alleviate the problems brought by the semantic gap, thereby improving the efficiency of cross-media retrieval. Second, strong adaptability: because the constructed model directly models the association between modalities, it is applicable to bidirectional cross-media retrieval (retrieving text by image and retrieving images by text), and the model can easily be extended to cross-media retrieval over other modalities (such as audio).
The cross-modal topic correlation modeling method based on deep learning provided by the present invention comprises the following steps:
(1) Data preprocessing: collect images of different modalities from a multimedia data set to obtain images and image descriptions, and remove annotation words that rarely appear in the image annotation data set or are useless.
(2) Multi-modal deep feature extraction: use deep learning methods to extract the visual features of images and the semantic features of image descriptions. Specifically, a Region-CNN (Regions with Convolutional Neural Network features) model and a Skip-gram model are used to extract the region features of images and the word features of text, respectively. Region-CNN first detects a set of representative region candidates in an image and then uses a pre-trained convolutional neural network to extract the feature of each region; the Skip-gram model directly trains word feature vectors using the co-occurrence information between text words.
(3) Deep bag-of-words construction: cluster the image region features and text word features obtained in step (2) with the K-means clustering algorithm to obtain a deep visual dictionary and a deep text dictionary of limited size; then map all region features of each image to the visual dictionary to build a deep visual bag-of-words model. Similarly, the words in each text are mapped to the text dictionary to obtain a deep text bag-of-words model.
(4) Multi-modal topic generation: use the assumptions of the latent Dirichlet allocation model to simulate the generation process of the whole multi-modal data set and infer the topic distribution features hidden behind the text collection and the image collection, making full use of the co-occurrence information between words.
(5) Relational topic model construction fusing cross-modal topic correlation analysis: build the corresponding relational topic model, i.e., consider the correlation between the topic features of different modalities while building the topic model. Using the multi-modal topic features obtained in step (4) as initial values, the correlation between image and text is computed from their association information, and the computed correlation is then used to update the topic information of the multi-modal documents; correlation computation and topic distribution update thus alternate iteratively, finally yielding the relational topic model.
(6) Topic-correlation-based cross-media retrieval: apply the obtained cross-modal topic correlation to cross-media retrieval; given a query of one modality, use the correlation computation to obtain the most relevant data of the other modality.
Each of the above steps is described in detail below:
(1) Data preprocessing
This step performs preliminary preprocessing on the collected images of different modalities. Specifically, the annotations attached to images contain noise caused by the randomness of user annotation; the annotations are therefore filtered by word frequency, removing words whose frequency falls below a certain threshold and thereby obtaining a new dictionary.
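For illustration, the following is a minimal Python sketch of the word-frequency filtering described above; the threshold MIN_FREQ and the function name are illustrative assumptions, not values or names prescribed by the patent.

```python
# A minimal sketch of annotation filtering by word frequency; MIN_FREQ is an
# assumed threshold, not a value prescribed by the patent.
from collections import Counter

MIN_FREQ = 5  # assumed frequency threshold

def filter_annotations(annotations):
    """annotations: one list of annotation words per image."""
    freq = Counter(w for tags in annotations for w in tags)
    vocab = {w for w, c in freq.items()
             if c >= MIN_FREQ and not any(ch.isdigit() for ch in w)}
    return [[w for w in tags if w in vocab] for tags in annotations], vocab

tags, vocab = filter_annotations([["beach", "sunset", "dsc123"], ["beach", "sea"]] * 5)
```

The digit check also drops meaningless digit-bearing words, matching the annotation cleanup described in the embodiment below.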
(2) Multi-modal deep feature extraction
In the present invention, Region-CNN and the Skip-gram model are used to extract the region features of images and the word features of text, respectively. They are described separately below:
Given an image, Region-CNN first uses selective search to select positions where objects are likely to appear as a candidate set (usually about 2,000 regions). A CNN feature is then extracted for each region. In the implementation, Region-CNN converts each image region to a fixed size of 227x227 pixels, and the convolutional network used for feature extraction consists of 5 convolutional layers and 2 fully connected layers. Compared with traditional visual features, the deep CNN features extracted by Region-CNN are closer to the semantics of the image itself and can alleviate the problem of the semantic gap to a certain extent.
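As a sketch of the per-region feature extraction, the following Python snippet assumes a torchvision (version 0.13 or later) AlexNet pre-trained on ImageNet stands in for the pre-trained CNN described above, and that region proposals are supplied as bounding boxes; selective search itself is not sketched here.

```python
# A minimal sketch of per-region CNN feature extraction; the torchvision
# AlexNet and the given bounding boxes are illustrative assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.eval()
# Keep the classifier up to the second fully connected layer (4096-d output).
feature_head = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])

preprocess = T.Compose([
    T.Resize((227, 227)),  # the fixed region size used in the description
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def region_features(pil_image, boxes):
    """Extract one CNN feature vector per candidate region (x0, y0, x1, y1)."""
    feats = []
    with torch.no_grad():
        for box in boxes:
            crop = preprocess(pil_image.crop(box)).unsqueeze(0)
            conv = alexnet.avgpool(alexnet.features(crop)).flatten(1)
            feats.append(feature_head(conv).squeeze(0))
    return torch.stack(feats)  # shape: (num_regions, 4096)

demo = Image.new("RGB", (500, 375))
feats = region_features(demo, [(0, 0, 250, 200), (100, 50, 400, 300)])
```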
Given a text document, the Skip-gram model is trained to obtain a feature vector for each word that occurs in the text. The Skip-gram model is a very effective method for learning distributed representations of text words; it was first proposed by Mikolov et al. in 2013 and has since been widely used in different natural language processing tasks. Compared with traditional word-vector learning methods, the model captures the syntactic and semantic relations between text words well and clusters semantically similar words together. One important advantage of Skip-gram is its high training efficiency on massive data, because no complex dense matrix operations are involved. Let TD denote the textual description part of the whole multi-modal document set, TW the set of all text words occurring in TD, and TV the dictionary of text words. For each word tw in TW, iv_tw and ov_tw are the input and output feature vectors of tw, and Context(tw) is the set of words occurring around tw (its context); in the present invention the context window size is set to 5. All input and output vectors of the whole text data set are concatenated into one long parameter vector W ∈ R^{2·|TV|·dim}, where dim is the dimension of the input and output vectors. The objective function of the whole Skip-gram model can then be written as:
$$B_{SG}(\omega) = \arg\max_{\omega} \frac{1}{|W|}\sum_{i=1}^{|W|}\;\sum_{j \in Context(w_i)} \log P(w_j \mid w_i) = \arg\max_{\omega} \frac{1}{|W|}\sum_{i=1}^{|W|}\;\sum_{j \in Context(w_i)} \log\frac{\exp(O_{w_j} \cdot I_{w_i})}{\sum_{k=1}^{|TV|}\exp(O_{w_k} \cdot I_{w_i})} \qquad (1)$$
When training Skip-gram, using the traditional softmax incurs a very high computational cost; the negative sampling method is therefore used to approximate log P(tw_j | tw_i), computed as follows:
$$\log P(w_j \mid w_i) = \log\sigma(O_{w_j} \cdot I_{w_i}) + \sum_{k=1}^{m}\mathbb{E}_{w_k \sim P(w)}\left[\log\sigma(-O_{w_k} \cdot I_{w_i})\right] \qquad (2)$$
where σ(·) is the sigmoid function, m is the number of negative samples, and each negative sample is generated from a word-frequency-based noise distribution P(w).
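For illustration, a minimal numpy sketch of one stochastic update for the negative-sampling objective in formula (2) follows; all sizes, the learning rate, and the variable names are illustrative assumptions.

```python
# A minimal numpy sketch of one negative-sampling SGD update per formula (2).
import numpy as np

rng = np.random.default_rng(0)
V, dim, m, lr = 1000, 100, 5, 0.025  # vocabulary, dimension, negatives, step size
in_vecs = rng.normal(scale=0.1, size=(V, dim))    # input vectors I_w
out_vecs = np.zeros((V, dim))                     # output vectors O_w
freq = rng.integers(1, 100, size=V).astype(float)
noise_dist = freq ** 0.75 / (freq ** 0.75).sum()  # word-frequency-based P(w)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(center, context):
    """One update of log sigma(O_ctx.I_c) + sum_k E[log sigma(-O_neg.I_c)]."""
    negatives = rng.choice(V, size=m, p=noise_dist)
    grad_in = np.zeros(dim)
    for w, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        err = label - sigmoid(out_vecs[w] @ in_vecs[center])
        grad_in += err * out_vecs[w]                 # accumulate input gradient
        out_vecs[w] += lr * err * in_vecs[center]    # update output vector
    in_vecs[center] += lr * grad_in

sgd_step(center=3, context=17)
```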
(3) Deep bag-of-words construction
On the basis of the deep words obtained in step (2), the deep bag-of-words model is further built by vector quantization [25]. Specifically, for the region candidate sets extracted by R-CNN and their corresponding features, K-means is first used to cluster the region features of all images in the multi-modal document set into a fixed number of classes; the center of each cluster serves as the representative element of that class, and all the classes together form the corresponding dictionary. Afterwards, each candidate region of an image is mapped to its class: the Euclidean distance between the region feature and each cluster center is computed to find the nearest class, and the count at the corresponding position of the image's vector is incremented. In this way, every image in the whole data set is represented as a deep visual bag-of-words, i.e., each image corresponds to one vector whose dimension equals the number of classes and whose elements are the numbers of occurrences of each class in the image, written VT ∈ R^C, where C is the number of clusters obtained. Similarly, for all the word vectors of the text documents, clustering yields the corresponding deep text dictionary, and each text is finally represented as a deep text bag-of-words by the same mapping.
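As an illustration of this vector-quantization step, the following is a minimal sketch with scikit-learn's K-means; the stand-in region features and the dictionary size C are illustrative assumptions.

```python
# A minimal sketch of the vector-quantization step with scikit-learn K-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
region_feats = [rng.normal(size=(20, 128)) for _ in range(10)]  # stand-in per-image regions

C = 100  # dictionary size (number of clusters)
kmeans = KMeans(n_clusters=C, n_init=10, random_state=0).fit(np.vstack(region_feats))

def deep_bag_of_words(image_regions):
    """Map each region to its nearest cluster center and count occurrences."""
    labels = kmeans.predict(image_regions)
    return np.bincount(labels, minlength=C)  # the vector VT in R^C

bow_matrix = np.array([deep_bag_of_words(r) for r in region_feats])
```

The same quantization applies unchanged to the text word vectors, yielding the deep text bag-of-words.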
(4) Multi-modal topic generation
Multi-modal information is a very important way of expressing the content of a multi-modal document: the visual information of the image is combined with the semantic description. Therefore, to compute the cross-modal correlation between a visual image and its text annotation better, extracting representative multi-modal features accurately is particularly important; multi-modal feature representations can better explore the association between the perceptual attributes of an image and its semantic features.
The latent Dirichlet allocation (LDA) algorithm is a generative probabilistic model for discrete data that has received great attention in image/text research. LDA represents each document by a set of probability distributions, and each word in a document is generated from a single topic. The advantage of LDA is that it considers the internal statistical structure of documents across the whole collection, such as the co-occurrence information of different words: it assumes that each word in a document is generated from a single topic, which is in turn drawn from a Dirichlet-distributed mixture over all topics. LDA represents each document as a probability distribution vector over the topic set; here these vectors represent the visual and textual features of social images.
In step (4), latent Dirichlet allocation models are used to perform probabilistic modeling on the image collection and the text collection, respectively. The LDA model assumes that a common topic set lies hidden behind the document set, that each document corresponds to a probability distribution over this topic set, and that each word in a document corresponds to a topic generated from this distribution; the per-document distributions are independent of each other and are all drawn from a common Dirichlet distribution. Under these assumptions, the deep visual bags-of-words and deep text bags-of-words obtained in step (3) serve as input, and the LDA model is used to infer the probabilistic topic distribution hidden behind the documents of each modality (text documents and visual documents), laying the foundation for the next step, the relational topic model fusing cross-modal correlation.
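For illustration, the following is a minimal sketch of the per-modality topic inference, with scikit-learn's LDA implementation standing in for the model described above; matrix contents and topic counts are illustrative assumptions.

```python
# A minimal sketch of per-modality LDA inference on the deep bags-of-words.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
bow_matrix = rng.integers(0, 5, size=(10, 100))       # deep visual bags-of-words
text_bow_matrix = rng.integers(0, 5, size=(10, 200))  # deep text bags-of-words

visual_lda = LatentDirichletAllocation(n_components=50, random_state=0)
text_lda = LatentDirichletAllocation(n_components=50, random_state=0)

theta_v = visual_lda.fit_transform(bow_matrix)        # per-image topic mixtures
theta_t = text_lda.fit_transform(text_bow_matrix)     # per-text topic mixtures
# theta_v and theta_t serve as initial multi-modal topic features for step (5).
```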
(5) Relational topic model construction fusing cross-modal topic correlation analysis
Building the relational topic model fuses the correlation information between modalities into the topic modeling process. Specifically, the topic distributions of the different modalities obtained in step (4) serve as initial values; the correlation between the topic features of the different modalities is computed by mapping them into a common subspace, and this correlation computation is fused into the topic model, so that when inferring the topics hidden behind a document of one modality, the correlation with the other modality is also considered. The final topic information thus reflects not only the distributional information within the same modality but also the relation to the other modality.
The main goal of this step is to build a joint probability distribution that maximizes the likelihood of the observed multi-modal documents. In building the model, the multi-modal document set D^M is divided into three parts: the first part is the visual image set D^V, the second part is the text description set D^T, and the third part is the link set L^VT (this set indicates the correlation between images and texts). D^V is composed of deep visual words DW^V over the deep visual dictionary DV^V, and D^T is composed of deep text words DW^T over the deep text dictionary DV^T. For l_vt ∈ L^VT, l_vt = 1 means that the visual image d_v ∈ D^V and the text description d_t ∈ D^T are correlated, while l_vt = 0 means that they are uncorrelated. On this basis, the relational topic model is formalized as follows: let DT^V be the visual topic set and DT^T the text topic set; α and β are two hyperparameters, where α parameterizes the Dirichlet distribution of topics and β parameterizes the Dirichlet distribution of topic-deep-word distributions; θ_v is the topic distribution behind the visual image d_v, and θ_t is the topic distribution behind the text d_t; Φ is the multinomial distribution over all deep words for each topic; z is the latent topic of each word, generated from θ; Dir(·) and Mult(·) denote the Dirichlet and multinomial distributions; N_d is the number of deep words in document d, and n indexes the n-th deep word. The generation process of the whole relational topic model is as follows:
(1) For each topic tv ∈ DT^V in the visual topic set:
(a) Sample from the topic-visual-word Dirichlet distribution the multinomial distribution of tv over all visual words: φ^v_tv ~ Dir(β^v).
(2) For each topic tt ∈ DT^T in the text topic set:
(a) Sample from the topic-text-word Dirichlet distribution the multinomial distribution of tt over all text words: φ^t_tt ~ Dir(β^t).
(3) For each visual document d ∈ D^V:
(a) Sample the topic distribution behind d from the Dirichlet distribution over the topic set: θ^v_d ~ Dir(α^v).
(b) For each deep visual word w^v_{d,n} in d:
i. Sample the topic of this word from the topic distribution behind document d: z^v_{d,n} ~ Mult(θ^v_d);
ii. Sample the word at this position of the document from the topic-visual-word distribution: w^v_{d,n} ~ Mult(φ^v_{z_{d,n}});
(4) For each text document d ∈ D^T:
(a) Sample the topic distribution behind d from the Dirichlet distribution over the topic set: θ^t_d ~ Dir(α^t);
(b) For each deep text word w^t_{d,n} in d:
i. Sample the topic of this word from the topic distribution behind document d: z^t_{d,n} ~ Mult(θ^t_d);
ii. Sample the word at this position of the document from the topic-text-word distribution: w^t_{d,n} ~ Mult(φ^t_{z_{d,n}});
(5) For each link l_vt ∈ L^VT, representing the correlation between visual document d_v and text document d_t:
(a) Compute the correlation from the topic features of d_v and d_t and sample l_vt accordingly: l_vt ~ TCor(l_vt | z̄_{d_v}, z̄_{d_t}, M_v, M_t), where z̄_{d_v} and z̄_{d_t} are the empirical topic distributions of documents d_v and d_t respectively, M_v and M_t are two matrices that map the visual and text topic features, respectively, into a common subspace of dimension dim, TCor(l_vt = 1) denotes the topic correlation of documents d_t and d_v, and TCor(l_vt = 0) their topic non-correlation.
Based on the above process, the joint probability distribution modeling the whole multi-modal document set is finally built as formula (3):

$$P(DW^V, DW^T, L^{VT}, Z^V, Z^T, \theta, \Phi \mid \alpha, \beta, M_v, M_t) = \prod_{tv \in DT^V} P(\phi^v_{tv} \mid \beta^v)\prod_{tt \in DT^T} P(\phi^t_{tt} \mid \beta^t) \cdot \prod_{d \in D^V} P(\theta^v_d \mid \alpha^v)\prod_{n=1}^{N_d} P(z^v_{d,n} \mid \theta^v_d)\,P(w^v_{d,n} \mid \phi^v_{z_{d,n}}) \cdot \prod_{d \in D^T} P(\theta^t_d \mid \alpha^t)\prod_{n=1}^{N_d} P(z^t_{d,n} \mid \theta^t_d)\,P(w^t_{d,n} \mid \phi^t_{z_{d,n}}) \cdot \prod_{l_{vt} \in L^{VT}} TCor(l_{vt} \mid \bar{z}_{d_v}, \bar{z}_{d_t}, M_v, M_t) \qquad (3)$$

where the first term corresponds to the generation of the topic-deep-word distributions, the middle two terms correspond to the generation of the deep visual words and the deep text words, and the last term represents the generation of the image-description links.
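For illustration, the following is a minimal numpy sketch of the single-modality part of this generative process, covering steps (1) through (4); sizes and hyperparameter values are illustrative assumptions, and the link step (5) depends on TCor, which is detailed with formula (4) below.

```python
# A minimal numpy sketch of LDA-style generation for one modality.
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N_d = 10, 100, 5, 20  # topics, dictionary size, documents, words/doc
alpha, beta = 0.1, 0.01

phi = rng.dirichlet(np.full(V, beta), size=K)  # steps (1)/(2): topic-word dists
docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))   # steps (3a)/(4a): topic mixture
    z = rng.choice(K, size=N_d, p=theta)       # step i: one topic per word
    w = np.array([rng.choice(V, p=phi[t]) for t in z])  # step ii: word per topic
    docs.append((z, w))
```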
(6) Cross-media retrieval (application of the relational topic model)
Step (6) applies the relational topic model established in step (5) to cross-media retrieval. For images and text, cross-media retrieval divides into two classes, text-query-to-image and image-query-to-text: text-query-to-image ranks all images by the relevance of each image to a given query text, computed with the relational topic model, while image-query-to-text ranks all text documents by their relevance to a given query image.
For a given query (e.g., querying text with an image), the relational topic model is used to infer the corresponding topic features, and the correlation computation of step (5) is used to compute the correlation with the documents of the other modality (e.g., text documents); the text documents are ranked by this correlation, and the text documents most relevant to the query image are returned. The same process applies to cross-media retrieval that queries images with text.
In summary, aiming at the content heterogeneity and relatedness between modalities in multi-modal documents, the present invention proposes a cross-modal topic correlation modeling method based on deep learning; the generation process of whole multi-modal documents is described in the form of a probabilistic model, and the correlation between documents of different modalities is quantified. The method can be effectively applied to large-scale cross-media image retrieval, improving retrieval relevance and enhancing user experience.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 is a schematic diagram of building the deep-word representation of a multi-modal document.
Fig. 3 is a schematic diagram of the cross-modal relational topic correlation modeling process.
Fig. 4 compares the proposed relational topic model with traditional multi-modal topic models.
Fig. 5 shows the effect of cross-media retrieval using the constructed relational topic model.
Detailed description of the invention
The cross-modal correlation computation method of the present invention for social images is discussed in detail below in conjunction with the drawings.
(1) Data object collection
Collect the data objects to obtain images and image annotations, and remove annotation words that rarely appear in the whole data set or are useless. A collected data set typically carries a lot of noise, so it should be properly processed and filtered before being used for feature extraction. The collected images are all in uniform JPG format and need no conversion. The text annotations of the images, however, contain many meaningless words, such as words containing digits that carry no meaning. Some images carry dozens of annotations; to let the annotations describe the main information of an image well, those useless and meaningless annotations should be discarded. The processing steps taken are therefore as follows:
Step 1: count the frequency with which each annotation word occurs in the data set;
Step 2: filter out the meaningless words containing digits;
Step 3: for each image annotation in the whole data set, treat low-frequency words as minor information of the image and delete them.
Through the above steps, the processed image annotations are obtained. As for removing low-frequency words in Step 3, the reason is that the annotations of images in the same cluster contain many identical or synonymous words; filtering by occurrence frequency is therefore entirely reasonable.
(2) Multi-modal feature extraction
Fig. 2 shows the process of extracting features by deep learning and building deep words. In the present invention, Region-CNN is used to detect image regions and extract the corresponding CNN features; the feature dimension is 4,096. In general, Region-CNN selects about 2,000 candidate regions per image, so the feature matrix of one image has size 2,000x4,096. If the regions of all images were clustered directly, the data volume would be M*2,000*4,096, M being the number of images; the space-time cost of such a volume is obviously huge. To solve this practical problem, a combined internal-external clustering method is used: first, an internal clustering (into 10 classes) is performed over the regions of each image; afterwards, an external clustering (into 100 classes) is performed over all the resulting centers, so the external clustering actually operates on only M*10*4,096 values, greatly reducing the space-time cost of clustering. Another point to note is that both the Region-CNN visual feature extraction and the Skip-gram word feature extraction use pre-trained models: Region-CNN uses AlexNet pre-trained on ImageNet, while Skip-gram uses a model trained on a Wikipedia corpus containing 6 billion words. This is mainly because training deep neural networks requires large amounts of data; to avoid over-fitting, models trained on large-scale data sets are used to extract the corresponding features from the real data.
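A minimal sketch of this internal-external clustering follows; the inner and outer cluster counts (10 and 100) follow the description above, while the stand-in features and their dimension are illustrative assumptions.

```python
# A minimal sketch of internal (per-image) then external (global) clustering.
import numpy as np
from sklearn.cluster import KMeans

def internal_external_dictionary(region_feats, inner_k=10, outer_k=100):
    # Internal step: compress each image's regions down to inner_k centers.
    per_image_centers = []
    for feats in region_feats:
        k = min(inner_k, len(feats))
        inner = KMeans(n_clusters=k, n_init=5, random_state=0).fit(feats)
        per_image_centers.append(inner.cluster_centers_)
    # External step: cluster only M * inner_k centers instead of all regions.
    outer = KMeans(n_clusters=outer_k, n_init=5, random_state=0)
    return outer.fit(np.vstack(per_image_centers))  # centers form the dictionary

rng = np.random.default_rng(0)
toy_feats = [rng.normal(size=(50, 64)) for _ in range(30)]  # stand-in region features
dictionary = internal_external_dictionary(toy_feats)
```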
(3) Cross-modal topic correlation computation
Fig. 3 shows the cross-modal relational topic correlation modeling process. As introduced above, TCor(l_vt | z̄_{d_v}, z̄_{d_t}, M_v, M_t) is used to compute the correlation between a visual document d_v and a text document d_t, where M_v and M_t are the mapping matrices for the visual and text topic features, TCor(l_vt = 1) denotes the topic correlation of documents d_t and d_v, and TCor(l_vt = 0) their topic non-correlation. TCor(·) is defined as follows:
$$TCor(l_{vt} \mid \bar{z}_{d_v}, \bar{z}_{d_t}, M_v, M_t) = \begin{cases} \mathrm{sigmoid}(f_v \cdot f_t), & l_{vt}=1 \\ 1-\mathrm{sigmoid}(f_v \cdot f_t), & l_{vt}=0 \end{cases} \quad\text{or}\quad \begin{cases} 0.5+0.5\,\mathrm{cosine}(f_v, f_t), & l_{vt}=1 \\ 0.5-0.5\,\mathrm{cosine}(f_v, f_t), & l_{vt}=0 \end{cases} \qquad (4)$$

$$f_v = \bar{z}_{d_v} M_v, \qquad f_t = \bar{z}_{d_t} M_t$$
Two modes are adopted here for different data types: mode one maps the dot product into the range [0,1] with the sigmoid function, while mode two computes the topic correlation as the normalized cosine similarity of the two vectors. Meanwhile, based on the generated multi-modal topic distributions, the parameters M_v and M_t can be trained by maximum likelihood estimation (MLE), i.e., by maximizing the log-likelihood of formula (4); the objective function is defined as:
$$F(M_v, M_t) = \arg\max_{M_v, M_t} \sum_{l_{vt}=1} \log\frac{1}{1+e^{-(f_v \cdot f_t)}} + \sum_{l_{vt}=0} \log\frac{e^{-(f_v \cdot f_t)}}{1+e^{-(f_v \cdot f_t)}} \quad\text{or}\quad \arg\max_{M_v, M_t} \sum_{l_{vt}=1} \log\left(0.5+\frac{f_v \cdot f_t}{2\,|f_v|\,|f_t|}\right) + \sum_{l_{vt}=0} \log\left(0.5-\frac{f_v \cdot f_t}{2\,|f_v|\,|f_t|}\right) \qquad (5)$$
Based on this objective function, the mapping matrices M_v and M_t can be computed by gradient descent. Note that in actual training, with |D^M| multi-modal documents and each multi-modal document usually comprising one image-text pair, the number of image documents and the number of text documents are essentially equal to the number of multi-modal documents, i.e., |D^V| = |D^T| = |D^M|. If the text and image occurring in the same multi-modal document are taken as correlated and those from different multi-modal documents as uncorrelated, the ratio of positive training samples (correlated image-text pairs) to negative samples (uncorrelated image-text pairs) would be about 1/|D^M|. Such a ratio causes serious imbalance between negative and positive samples; moreover, an image and a text not appearing in the same multi-modal document does not mean they are completely uncorrelated (they may belong to the same category). In practice, therefore, the ratio of negative to positive samples is set to 1:1, and negative samples are randomly chosen under the constraint that the image and text must not come from the same category.
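For illustration, the following is a minimal numpy sketch of learning M_v and M_t by gradient ascent on the sigmoid form of formula (5); sizes, the learning rate, and the toy pairing below are illustrative assumptions (real negatives would be sampled avoiding same-category pairs, as described above).

```python
# A minimal numpy sketch of training the mapping matrices per formula (5).
import numpy as np

rng = np.random.default_rng(0)
K_v, K_t, dim, lr = 50, 50, 20, 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mappings(M_v, M_t, z_v, z_t, pairs, epochs=50):
    for _ in range(epochs):
        for v, t, l in pairs:
            f_v, f_t = z_v[v] @ M_v, z_t[t] @ M_t
            err = l - sigmoid(f_v @ f_t)  # d(log-likelihood)/d(f_v . f_t)
            M_v += lr * err * np.outer(z_v[v], f_t)  # in-place updates
            M_t += lr * err * np.outer(z_t[t], f_v)

M_v = rng.normal(scale=0.01, size=(K_v, dim))
M_t = rng.normal(scale=0.01, size=(K_t, dim))
z_v = rng.dirichlet(np.ones(K_v), size=200)  # empirical visual topic mixtures
z_t = rng.dirichlet(np.ones(K_t), size=200)  # empirical text topic mixtures
pos = [(i, i, 1.0) for i in range(200)]      # co-occurring image-text pairs
neg = [(i, (i + 7) % 200, 0.0) for i in range(200)]  # 1:1 sampled negatives
train_mappings(M_v, M_t, z_v, z_t, pos + neg)
```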
(4) Multi-modal relational topic model inference
Formula (3) specifies the relational topic model constructed in the present invention; the parameters of the model are inferred by Gibbs sampling [26]. The aim of Gibbs sampling is to obtain the topic implied behind each word in the multi-modal documents. The sampling procedure first derives the marginal distribution over the topic assignments of the deep words, the words themselves, and the corresponding cross-modal links, as formula (6):
where m_{d,tt} is the number of times topic tt occurs in document d, and n_{tt,w} is the number of words generated by topic tt in the whole document set. From formula (6), the univariate probability distribution of the topic assignment z can be further derived, giving the sampling rule for the topic behind each word in a document, as shown in formula (7):
$$P(z^v_{d,n} = tv \mid Z_{-d,n}, DW^V, DW^T, L^{VT}) \propto \frac{\hat{m}^v_{d,tv}+\alpha^v}{\sum_{tv \in DT^V}\hat{m}^v_{d,tv}+|DT^V|\,\alpha^v} \cdot \frac{\hat{n}^v_{tv,\,w^v_{d,n}}+\beta^v}{\sum_{w \in DV^V}\hat{n}^v_{tv,w}+|DV^V|\,\beta^v} \cdot \prod_{l_{vt} \in L^{VT},\, d \in l_{vt}} TCor(l_{vt} \mid \bar{z}_{d_v}, \bar{z}_{d_t}, M_v, M_t)$$

$$P(z^t_{d,n} = tt \mid Z_{-d,n}, DW^V, DW^T, L^{VT}) \propto \frac{\hat{m}^t_{d,tt}+\alpha^t}{\sum_{tt \in DT^T}\hat{m}^t_{d,tt}+|DT^T|\,\alpha^t} \cdot \frac{\hat{n}^t_{tt,\,w^t_{d,n}}+\beta^t}{\sum_{w \in DV^T}\hat{n}^t_{tt,w}+|DV^T|\,\beta^t} \cdot \prod_{l_{vt} \in L^{VT},\, d \in l_{vt}} TCor(l_{vt} \mid \bar{z}_{d_v}, \bar{z}_{d_t}, M_v, M_t) \qquad (7)$$
where m̂ denotes the number of occurrences of a topic in document d after removing the current word, and n̂ denotes the number of words assigned to that topic excluding the current word. Based on this sampling rule, the topic implied behind each word in the whole document set can be obtained by sampling. After each sampling pass finishes, formula (5) is used to compute the mapping matrices M_t and M_v on the basis of the topic distributions obtained in the current pass, and the resulting M_t and M_v serve as input to the next sampling pass; this cycle repeats until the iteration termination condition is reached, yielding the final topic information and the mapping matrices M_t and M_v. Correspondingly, the other parameters of the relational topic model, such as Φ^V, Φ^T, θ^V and θ^T, are finally obtained from formula (8), the standard smoothed-count estimates of [26], e.g. θ^v_{d,tv} = (m^v_{d,tv} + α^v) / (Σ_{tv' ∈ DT^V} m^v_{d,tv'} + |DT^V| α^v).
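For illustration, the following is a minimal collapsed Gibbs sampling sketch of the count-based factors of formula (7) for one modality; the TCor link factor would multiply into the unnormalized probabilities before sampling and is omitted here for brevity, and all sizes are illustrative assumptions.

```python
# A minimal collapsed Gibbs sampling sketch of the count factors in formula (7).
import numpy as np

rng = np.random.default_rng(0)
K, V, alpha, beta = 10, 100, 0.1, 0.01
docs = [list(rng.integers(0, V, size=30)) for _ in range(5)]
z = [list(rng.integers(0, K, size=len(d))) for d in docs]

m_dt = np.zeros((len(docs), K))  # topic counts per document
n_tw = np.zeros((K, V))          # word counts per topic
n_t = np.zeros(K)                # total words per topic
for d, words in enumerate(docs):
    for t, w in zip(z[d], words):
        m_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1

def gibbs_pass():
    for d, words in enumerate(docs):
        for n, w in enumerate(words):
            t = z[d][n]  # remove the current assignment from the counts
            m_dt[d, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
            p = (m_dt[d] + alpha) * (n_tw[:, w] + beta) / (n_t + V * beta)
            t = rng.choice(K, p=p / p.sum())  # sample per formula (7)
            z[d][n] = t  # add the new assignment back
            m_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1

for _ in range(10):
    gibbs_pass()
```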
(5) Application example
Fig. 5 shows the effect of cross-media retrieval using the constructed relational topic model, divided into two modes: retrieving text with an image (Image Query-to-Text) and retrieving images with a text (Text Query-to-Image). The relevance score is computed as shown in formula (9):
$$RankingScore(\text{image-query-to-text}) = RankingScore(d_t \mid d_v) = \frac{TCor(l_{vt}=1 \mid \theta^v_{d_v}, \theta^t_{d_t}, M_v, M_t)}{\sum_{d_t \in D^T} TCor(l_{vt}=1 \mid \theta^v_{d_v}, \theta^t_{d_t}, M_v, M_t)}$$

$$RankingScore(\text{text-query-to-image}) = RankingScore(d_v \mid d_t) = \frac{TCor(l_{vt}=1 \mid \theta^v_{d_v}, \theta^t_{d_t}, M_v, M_t)}{\sum_{d_v \in D^V} TCor(l_{vt}=1 \mid \theta^v_{d_v}, \theta^t_{d_t}, M_v, M_t)} \qquad (9)$$
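For illustration, a minimal sketch of image-query-to-text ranking per formula (9) follows, using the sigmoid mode of TCor; all matrix contents below are illustrative assumptions.

```python
# A minimal sketch of image-query-to-text ranking per formula (9).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rank_texts_for_image(q, theta_v, theta_t, M_v, M_t):
    f_v = theta_v[q] @ M_v                 # query image in the common subspace
    scores = sigmoid(theta_t @ M_t @ f_v)  # TCor(l_vt = 1) against every text
    scores = scores / scores.sum()         # normalize as in formula (9)
    return np.argsort(-scores)             # most relevant text indices first

rng = np.random.default_rng(0)
theta_v = rng.dirichlet(np.ones(50), size=10)   # inferred image topic mixtures
theta_t = rng.dirichlet(np.ones(50), size=100)  # inferred text topic mixtures
M_v, M_t = rng.normal(size=(50, 20)), rng.normal(size=(50, 20))
order = rank_texts_for_image(0, theta_v, theta_t, M_v, M_t)
```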
List of references
[1] Fan, J.P.; He, X.F.; Zhou, N.; Peng, J.Y.; and Jain, R. 2012. Quantitative Characterization of Semantic Gaps for Learning Complexity Estimation and Inference Model Selection. IEEE Transactions on Multimedia 14(5): 1414-1428.
[2] Datta, R.; Joshi, D.; Li, J.; and Wang, J.Z. 2008. Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys (CSUR) 40(2), Article 5.
[3] Rasiwasia, N.; Pereira, J.C.; Coviello, E.; Doyle, G.; Lanckriet, G.R.G.; Levy, R.; and Vasconcelos, N. 2010. A New Approach to Cross-modal Multimedia Retrieval. In Proceedings of MM 2010, 251-260.
[4] Pereira, J.C.; Coviello, E.; Doyle, G.; Rasiwasia, N.; Lanckriet, G.R.G.; Levy, R.; and Vasconcelos, N. 2014. On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 36(3): 521-535.
[5] Barnard, K.; Duygulu, P.; Forsyth, D.; Freitas, N.; Blei, D.M.; and Jordan, M.I. 2003. Matching Words and Pictures. Journal of Machine Learning Research 3: 1107-1135.
[6] Wang, X.; Liu, Y.; Wang, D.; and Wu, F. 2013. Cross-media Topic Mining on Wikipedia. In Proceedings of MM 2013, 689-692.
[7] Frome, A.; Corrado, G.S.; Shlens, J.; Bengio, S.; Dean, J.; Ranzato, M.A.; and Mikolov, T. 2013. DeViSE: A Deep Visual-Semantic Embedding Model. In Proceedings of NIPS 2013.
[8] Feng, F.X.; Wang, X.J.; and Li, R.F. 2014. Cross-modal Retrieval with Correspondence Autoencoder. In Proceedings of MM 2014, 7-16.
[9] Nguyen, C.T.; Kaothanthong, N.; Tokuyama, T.; and Phan, X.H. 2013. A Feature-Word-Topic Model for Image Annotation and Retrieval. ACM Transactions on the Web 7(3), Article 12.
[10] Ramage, D.; Heymann, P.; Manning, C.D.; and Molina, H.G. 2009. Clustering the Tagged Web. In Proceedings of WSDM 2009, 54-63.
[11] Blei, D.M.; and Jordan, M.I. 2003. Modeling Annotated Data. In Proceedings of SIGIR 2003, 127-134.
[12] Wang, C.; Blei, D.; and Fei-Fei, L. 2009. Simultaneous Image Classification and Annotation. In Proceedings of CVPR 2009, 1903-1910.
[13] Putthividhya, D.; Attias, H.T.; and Nagarajan, S.S. 2010. Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation. In Proceedings of CVPR 2010, 3408-3415.
[14] Niu, Z.X.; Hua, G.; Gao, X.B.; and Tian, Q. 2014. Semi-supervised Relational Topic Model for Weakly Annotated Image Recognition in Social Media. In Proceedings of CVPR 2014, 4233-4240.
[15] Wang, Y.F.; Wu, F.; Song, J.; Li, X.; and Zhuang, Y.T. 2014. Multi-modal Mutual Topic Reinforce Modeling for Cross-media Retrieval. In Proceedings of MM 2014, 307-316.
[16] Zheng, Y.; Zhang, Y.J.; and Larochelle, H. 2014. Topic Modeling of Multimodal Data: an Autoregressive Approach. In Proceedings of CVPR 2014, 1370-1377.
[17] Chen, T.; SalahEldeen, H.M.; He, X.N.; Kan, M.Y.; and Lu, D.Y. 2015. VELDA: Relating an Image Tweet's Text and Images. In Proceedings of AAAI 2015.
[18] Girshick, R.; Donahue, J.; Darrell, T.; and Malik, J. 2014. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of CVPR 2014, 580-587.
[19] Hariharan, B.; Arbelaez, P.; Girshick, R.; and Malik, J. 2014. Simultaneous Detection and Segmentation. In Proceedings of ECCV 2014, 297-312.
[20] Karpathy, A.; Joulin, A.; and Fei-Fei, L. 2014. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping. In Proceedings of NIPS 2014.
[21] Zhang, N.; Donahue, J.; Girshick, R.; and Darrell, T. 2014. Part-Based R-CNNs for Fine-Grained Category Detection. In Proceedings of ECCV 2014, 834-849.
[22] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS 2013.
[23] Tang, D.Y.; Wei, F.R.; Qin, B.; Zhou, M.; and Liu, T. 2014. Building Large-Scale Twitter-Specific Sentiment Lexicon: A Representation Learning Approach. In Proceedings of COLING 2014, 172-182.
[24] Karpathy, A.; Joulin, A.; and Fei-Fei, L. 2014. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping. In Proceedings of NIPS 2014.
[25] Sivic, J.; and Zisserman, A. 2003. Video Google: A Text Retrieval Approach to Object Matching in Videos. In Proceedings of ICCV 2003, 2: 1470-1477.
[26] Griffiths, T.L.; and Steyvers, M. 2004. Finding Scientific Topics. In Proceedings of the National Academy of Sciences of the United States of America, 101(1): 5228-5235.

Claims (6)

1. A cross-modal topic correlation modeling method based on deep learning, characterized in that it comprises the following steps:
(1) data preprocessing: collecting images of different modalities from a multimedia data set to obtain images and image descriptions, and removing annotation words that rarely appear in the image annotation data set or are useless;
(2) multi-modal deep feature extraction: using deep learning methods to extract the visual features of images and the semantic features of image descriptions; specifically, a Region-CNN model and a Skip-gram model are used to extract the region features of images and the word features of text, respectively; wherein Region-CNN first detects a set of representative region candidates in an image, and then uses a pre-trained convolutional neural network to extract the feature of each region; the Skip-gram model directly trains word feature vectors using the co-occurrence information between text words;
(3) deep bag-of-words construction: clustering the image region features and text word features obtained in step (2) with the K-means clustering algorithm to obtain a deep visual dictionary and a deep text dictionary of limited size, and then mapping all region features of each image to the visual dictionary to build a deep visual bag-of-words model; similarly, the words in each text are mapped to the text dictionary to obtain a deep text bag-of-words model;
(4) multi-modal topic generation: using the assumptions of the latent Dirichlet allocation model to simulate the generation process of the whole multi-modal data set, and inferring the topic distribution features hidden behind the text collection and the image collection, making full use of the co-occurrence information between words;
(5) relational topic model construction fusing cross-modal topic correlation analysis: building the corresponding relational topic model, i.e., considering the correlation between the topic features of different modalities while building the topic model; using the multi-modal topic features obtained in step (4) as initial values, the correlation between image and text is computed from their association information, and the computed correlation is used to update the topic information of the multi-modal documents, so that correlation computation and topic distribution update alternate iteratively, finally yielding the relational topic model;
(6) topic-correlation-based cross-media retrieval: applying the obtained cross-modal topic correlation to cross-media retrieval, where, given a query of one modality, the correlation computation is used to obtain the most relevant data of the other modality.
2. The method according to claim 1, characterized in that in step (2), the Region-CNN and Skip-gram models are used to extract the region features of images and the word features of text, respectively, as follows:
given an image, Region-CNN first uses selective search to select positions where objects are likely to appear as a candidate set, in the form of regions; a CNN feature is then extracted for each region; in the implementation, Region-CNN converts each image region to a fixed size of 227x227 pixels, and the convolutional network used for feature extraction consists of 5 convolutional layers and 2 fully connected layers;
given a text document, the Skip-gram model is trained to obtain a feature vector for each word occurring in the text; TD denotes the textual description part of the whole multi-modal document set, TW is the set of all text words occurring in TD, and TV is the dictionary of text words; for each word tw in TW, iv_tw and ov_tw are the input and output feature vectors of tw, and Context(tw) is the set of words occurring around tw; the context window size is set to 5, and all input and output vectors of the whole text data set are concatenated into one long parameter vector W ∈ R^{2·|TV|·dim}, where dim is the dimension of the input and output vectors; the objective function of the whole Skip-gram model is:
$$B_{SG}(\omega) = \arg\max_{\omega} \frac{1}{|W|}\sum_{i=1}^{|W|}\;\sum_{j \in Context(w_i)} \log P(w_j \mid w_i) = \arg\max_{\omega} \frac{1}{|W|}\sum_{i=1}^{|W|}\;\sum_{j \in Context(w_i)} \log\frac{\exp(O_{w_j} \cdot I_{w_i})}{\sum_{k=1}^{|TV|}\exp(O_{w_k} \cdot I_{w_i})} \qquad (1)$$
the negative sampling method is used to approximate log P(tw_j | tw_i), computed as follows:
$$\log P(w_j \mid w_i) = \log\sigma(O_{w_j} \cdot I_{w_i}) + \sum_{k=1}^{m}\mathbb{E}_{w_k \sim P(w)}\left[\log\sigma(-O_{w_k} \cdot I_{w_i})\right] \qquad (2)$$
wherein σ(·) is the sigmoid function, m is the number of negative samples, and each negative sample is generated from a word-frequency-based noise distribution P(w).
3. The method according to claim 1, characterized in that step (3), on the basis of the deep words obtained in step (2), further builds the deep bag-of-words model by vector quantization, as follows: for the region candidate sets extracted by R-CNN and their corresponding features, K-means is first used to cluster the region features of all images in the multi-modal document set into a fixed number of classes, the center of each cluster serving as the representative element of that class, and all the classes together forming the corresponding dictionary; afterwards, each candidate region of an image is mapped to its class by computing the Euclidean distance between the region feature and each cluster center to find the nearest class, and the count at the corresponding position of the image's vector is incremented, so that every image in the whole data set is represented as a deep visual bag-of-words, i.e., each image corresponds to one vector whose dimension equals the number of classes and whose elements are the numbers of occurrences of each class in the image, written VT ∈ R^C, where C is the number of clusters obtained; similarly, for all the word vectors of the text documents, clustering yields the corresponding deep text dictionary, and each text is finally represented as a deep text bag-of-words by the same mapping.
4. The method according to claim 1, characterized in that: in step (4), latent Dirichlet allocation (LDA) is used to probabilistically model the image and the text collections separately. LDA assumes that a common hidden topic set lies behind the document collection, that each concrete document corresponds to a probability distribution over this topic set, and that each word in the document corresponds to a topic generated from that distribution; the distributions of the individual documents are mutually independent, each generated from a common Dirichlet distribution. Under this model assumption, the deep visual bag-of-words and deep text bag-of-words obtained in step (3) are taken as input, and the LDA model is used to infer the latent topic distribution behind each document of either modality.
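As an illustration of this step, the sketch below fits LDA separately on the two deep bag-of-words matrices with scikit-learn; the topic count and prior values are assumptions.

```python
# Sketch of claim 4: infer latent topic distributions from deep BoW input.
from sklearn.decomposition import LatentDirichletAllocation

def infer_topics(bow_matrix, n_topics=50, alpha=0.1, beta=0.01):
    lda = LatentDirichletAllocation(
        n_components=n_topics,
        doc_topic_prior=alpha,    # Dirichlet prior on document-topic dist.
        topic_word_prior=beta,    # Dirichlet prior on topic-word dist.
        learning_method="batch",
    )
    theta = lda.fit_transform(bow_matrix)  # one topic distribution per document
    return lda, theta

# lda_v, theta_v = infer_topics(visual_bow)   # visual modality
# lda_t, theta_t = infer_topics(text_bow)     # textual modality
```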
5. The method according to claim 1, characterized in that: in step (5), when building the model, the multi-modal document set D^M is divided into three constituent parts: the first part is the visual image set D^V, the second part is the text description set D^T, and the third part is the link set L^VT, which records the association information between images and texts. D^V is built from the deep visual vocabulary set DW^V with deep visual dictionary DV^V, while D^T is built from the deep text vocabulary set DW^T with deep text dictionary DV^T. For l_vt ∈ L^VT, l_vt = 1 means visual image d_v ∈ D^V and text description d_t ∈ D^T are related, and l_vt = 0 means they are unrelated. On this basis, the relational topic model is formalized as follows: DT^V is the visual topic set and DT^T the text topic set; α and β are two hyperparameters, α for the Dirichlet distribution over topics and β for the Dirichlet distribution of the topic-deep-vocabulary multinomials; θ_v is the topic distribution behind visual image d_v, and θ_t the topic distribution behind text document d_t; Φ comprises, for each topic, the multinomial distribution over all deep vocabulary; z holds the latent topics of all words, actually generated from θ; Dir(·) and Mult(·) denote the Dirichlet and multinomial distributions respectively; N_d is the number of deep words in document d, and n indexes the n-th deep word. The generative process of the whole relational topic model is as follows:
(1) For each topic tv ∈ DT^V in the visual topic set: sample the multinomial distribution of tv over all visual words from the topic-visual-word Dirichlet, i.e. φ^v_tv ~ Dir(β^v);
(2) For each topic tt ∈ DT^T in the text topic set: sample the multinomial distribution of tt over all text words from the topic-text-word Dirichlet, i.e. φ^t_tt ~ Dir(β^t);
(3) For each visual document d ∈ D^V:
(a) sample the topic distribution behind d from the Dirichlet over the topic set, i.e. θ^v_d ~ Dir(α^v);
(b) for each deep visual word w^v_{d,n} in d:
i. sample the topic of this word from the topic distribution behind document d, i.e. z^v_{d,n} ~ Mult(θ^v_d);
ii. sample the word at this position of the document from the topic-visual-word multinomial, i.e. w^v_{d,n} ~ Mult(φ^v_{z^v_{d,n}});
(4) For each text document d ∈ D^T:
(a) sample the topic distribution behind d from the Dirichlet over the topic set, i.e. θ^t_d ~ Dir(α^t);
(b) for each deep text word w^t_{d,n} in d:
i. sample the topic of this word from the topic distribution behind document d, i.e. z^t_{d,n} ~ Mult(θ^t_d);
ii. sample the word at this position of the document from the topic-text-word multinomial, i.e. w^t_{d,n} ~ Mult(φ^t_{z^t_{d,n}});
(5) For each link l_vt ∈ L^VT, representing the association information between visual document d_v and text document d_t:
(a) compute the topic correlation of d_v and d_t from their topic features and sample l_vt from it: the correlation is computed from the empirical topic distributions of d_v and d_t, projected by two mapping matrices for the visual and the text topic features respectively into a common subspace of dimension dim (a sketch follows below); TCor(l_vt = 1) expresses the topic correlation of d_t and d_v, and TCor(l_vt = 0) their topic non-correlation.
Based on the above process, a joint probability distribution modeling the whole multi-modal document set is finally built as a product of terms: the first term corresponds to the generation of the topic-deep-vocabulary distributions, the middle two terms to the generation of the deep visual words and the deep text words, and the last term to the generation of the image-description links.
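A hedged sketch of the link step (5)(a): the empirical topic distributions of an image/text pair (written zbar_v and zbar_t below, names assumed) are projected by the two mapping matrices (written Mv and Mt) into the common dim-dimensional subspace, and a correlation score for l_vt is derived; since the exact TCor formula is not reproduced in the text, a sigmoid of the inner product is assumed here.

```python
# Sketch of the link probability in step (5)(a); the sigmoid-of-inner-product
# form of TCor is an assumption, as the claim's formula is not reproduced.
import numpy as np

def empirical_topics(z_assignments, n_topics):
    """Empirical topic distribution: topic frequencies among a doc's words."""
    return np.bincount(z_assignments, minlength=n_topics) / len(z_assignments)

def link_probability(zbar_v, zbar_t, Mv, Mt):
    """P(l_vt = 1): correlate the projected visual and text topic features."""
    u = Mv @ zbar_v          # visual topics -> common dim-d subspace
    v = Mt @ zbar_t          # text topics   -> common dim-d subspace
    return 1.0 / (1.0 + np.exp(-(u @ v)))   # assumed sigmoid correlation
```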
6. The method according to claim 1, characterized in that: in step (6), the relational topic model established in step (5) is applied to cross-media information retrieval. Cross-media retrieval divides into two classes, text-query-image and image-query-text: text-query-image ranks all images for a given query text according to the image-text relevance computed with the relational topic model, while image-query-text ranks all text documents according to their relevance to a given query image.
For image-query-text, the relational topic model is used to infer the topic features of the given query image; the correlation computation of step (5) applied to these topic features then yields the relevance to the documents of the other modality, the text documents are ranked by this relevance, and the text documents most related to the query image are returned. The same procedure applies to cross-media retrieval that queries images with a text.
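For illustration, image-query-text retrieval then reduces to scoring and sorting; the sketch below reuses the assumed link_probability helper defined above.

```python
# Sketch of claim 6's image-query-text retrieval: rank all texts by their
# (assumed) link probability with the query image and return the order.
def rank_texts(query_zbar_v, text_zbars, Mv, Mt):
    scores = [link_probability(query_zbar_v, zbar_t, Mv, Mt)
              for zbar_t in text_zbars]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order, scores   # indices of text documents, most related first

# Text-query-image retrieval is symmetric: fix zbar_t and rank images instead.
```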
CN201610099438.9A 2016-02-23 2016-02-23 Cross-modal subject correlation modeling method based on deep learning Expired - Fee Related CN105760507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610099438.9A CN105760507B (en) 2016-02-23 2016-02-23 Cross-modal subject correlation modeling method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610099438.9A CN105760507B (en) 2016-02-23 2016-02-23 Cross-modal subject correlation modeling method based on deep learning

Publications (2)

Publication Number Publication Date
CN105760507A true CN105760507A (en) 2016-07-13
CN105760507B CN105760507B (en) 2019-05-03

Family

ID=56330274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610099438.9A Expired - Fee Related CN105760507B (en) Cross-modal subject correlation modeling method based on deep learning

Country Status (1)

Country Link
CN (1) CN105760507B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559192A (en) * 2013-09-10 2014-02-05 浙江大学 Media-crossed retrieval method based on modal-crossed sparse topic modeling
CN103559193A (en) * 2013-09-10 2014-02-05 浙江大学 Topic modeling method based on selected cell
CN104317837A (en) * 2014-10-10 2015-01-28 浙江大学 Cross-modal searching method based on topic model
CN104899253A (en) * 2015-05-13 2015-09-09 复旦大学 Cross-modality image-label relevance learning method facing social image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Fei et al.: "Cross-Media Combinational Semantic Deep Learning" (跨媒体组合语义深度学习), 2015 Annual Conference of the Zhejiang Signal Processing Society: Signal Processing in Big Data *

Cited By (93)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11621075B2 (en) 2016-09-07 2023-04-04 Koninklijke Philips N.V. Systems, methods, and apparatus for diagnostic inferencing with a multimodal deep memory network
CN106156374A (en) * 2016-09-13 2016-11-23 华侨大学 A kind of view-based access control model dictionary optimizes and the image search method of query expansion
US11068652B2 (en) * 2016-11-04 2021-07-20 Mitsubishi Electric Corporation Information processing device
CN108073576A (en) * 2016-11-09 2018-05-25 上海诺悦智能科技有限公司 Intelligent search method, searcher and search engine system
WO2018103538A1 (en) * 2016-12-08 2018-06-14 北京推想科技有限公司 Deep learning method and device for analysis of high-dimensional medical data
CN106777050A (en) * 2016-12-09 2017-05-31 大连海事大学 It is a kind of based on bag of words and to take into account the footwear stamp line expression and system of semantic dependency
CN106777050B (en) * 2016-12-09 2019-09-06 大连海事大学 It is a kind of based on bag of words and to take into account the shoes stamp line expression and system of semantic dependency
CN106778880B (en) * 2016-12-23 2020-04-07 南开大学 Microblog topic representation and topic discovery method based on multi-mode deep Boltzmann machine
CN106778880A (en) * 2016-12-23 2017-05-31 南开大学 Microblog topic based on multi-modal depth Boltzmann machine is represented and motif discovery method
CN106650756A (en) * 2016-12-28 2017-05-10 广东顺德中山大学卡内基梅隆大学国际联合研究院 Image text description method based on knowledge transfer multi-modal recurrent neural network
CN106650756B (en) * 2016-12-28 2019-12-10 广东顺德中山大学卡内基梅隆大学国际联合研究院 knowledge migration-based image text description method of multi-mode recurrent neural network
CN106886783A (en) * 2017-01-20 2017-06-23 清华大学 A kind of image search method and system based on provincial characteristics
US11024066B2 (en) 2017-05-08 2021-06-01 Boe Technology Group Co., Ltd. Presentation generating system for medical images, training method thereof and presentation generating method
WO2018205715A1 (en) * 2017-05-08 2018-11-15 京东方科技集团股份有限公司 Medical image representation-generating system, training method therefor and representation generation method
CN107273517A (en) * 2017-06-21 2017-10-20 复旦大学 Picture and text cross-module state search method based on the embedded study of figure
CN107273517B (en) * 2017-06-21 2021-07-23 复旦大学 Graph-text cross-modal retrieval method based on graph embedding learning
CN109213988B (en) * 2017-06-29 2022-06-21 武汉斗鱼网络科技有限公司 Barrage theme extraction method, medium, equipment and system based on N-gram model
CN109213988A (en) * 2017-06-29 2019-01-15 武汉斗鱼网络科技有限公司 Barrage subject distillation method, medium, equipment and system based on N-gram model
CN109325583B (en) * 2017-07-31 2022-03-08 财团法人工业技术研究院 Deep neural network structure, method using deep neural network, and readable medium
CN109325583A (en) * 2017-07-31 2019-02-12 财团法人工业技术研究院 Deep neural network, method and readable media using deep neural network
CN107480289B (en) * 2017-08-24 2020-06-30 成都澳海川科技有限公司 User attribute acquisition method and device
CN107480289A (en) * 2017-08-24 2017-12-15 成都澳海川科技有限公司 User property acquisition methods and device
US11907851B2 (en) * 2017-08-30 2024-02-20 Tencent Technology (Shenzhen) Company Limited Image description generation method, model training method, device and storage medium
CN108305296A (en) * 2017-08-30 2018-07-20 深圳市腾讯计算机系统有限公司 Iamge description generation method, model training method, equipment and storage medium
US11270160B2 (en) * 2017-08-30 2022-03-08 Tencent Technology (Shenzhen) Company Limited Image description generation method, model training method, device and storage medium
WO2019042244A1 (en) * 2017-08-30 2019-03-07 腾讯科技(深圳)有限公司 Image description generation method, model training method and device, and storage medium
US20220156518A1 (en) * 2017-08-30 2022-05-19 Tencent Technology (Shenzhen) Company Limited. Image description generation method, model training method, device and storage medium
TWI803514B (en) * 2017-08-30 2023-06-01 大陸商騰訊科技(深圳)有限公司 Image description generation method, model training method, devices and storage medium
CN107870992A (en) * 2017-10-27 2018-04-03 上海交通大学 Editable image of clothing searching method based on multichannel topic model
CN107798624A (en) * 2017-10-30 2018-03-13 北京航空航天大学 A kind of technical label in software Ask-Answer Community recommends method
CN107798624B (en) * 2017-10-30 2021-09-28 北京航空航天大学 Technical label recommendation method in software question-and-answer community
CN108256549B (en) * 2017-12-13 2019-03-15 北京达佳互联信息技术有限公司 Image classification method, device and terminal
CN108256549A (en) * 2017-12-13 2018-07-06 北京达佳互联信息技术有限公司 Image classification method, device and terminal
CN108399409A (en) * 2018-01-19 2018-08-14 北京达佳互联信息技术有限公司 Image classification method, device and terminal
CN108399409B (en) * 2018-01-19 2019-06-18 北京达佳互联信息技术有限公司 Image classification method, device and terminal
US11048983B2 (en) 2018-01-19 2021-06-29 Beijing Dajia Internet Information Technology Co., Ltd. Method, terminal, and computer storage medium for image classification
WO2019141042A1 (en) * 2018-01-19 2019-07-25 北京达佳互联信息技术有限公司 Image classification method, device, and terminal
WO2019149135A1 (en) * 2018-02-05 2019-08-08 阿里巴巴集团控股有限公司 Word vector generation method, apparatus and device
US11030411B2 (en) 2018-02-05 2021-06-08 Alibaba Group Holding Limited Methods, apparatuses, and devices for generating word vectors
CN108595636A (en) * 2018-04-25 2018-09-28 复旦大学 The image search method of cartographical sketching based on depth cross-module state correlation study
CN108830903A (en) * 2018-04-28 2018-11-16 杨晓春 A kind of steel billet method for detecting position based on CNN
CN109145936A (en) * 2018-06-20 2019-01-04 北京达佳互联信息技术有限公司 A kind of model optimization method and device
CN109145936B (en) * 2018-06-20 2019-07-09 北京达佳互联信息技术有限公司 A kind of model optimization method and device
CN110110122A (en) * 2018-06-22 2019-08-09 北京交通大学 Image based on multilayer semanteme depth hash algorithm-text cross-module state retrieval
CN109214412A (en) * 2018-07-12 2019-01-15 北京达佳互联信息技术有限公司 A kind of training method and device of disaggregated model
CN109213853A (en) * 2018-08-16 2019-01-15 昆明理工大学 A kind of Chinese community's question and answer cross-module state search method based on CCA algorithm
CN109213853B (en) * 2018-08-16 2022-04-12 昆明理工大学 CCA algorithm-based Chinese community question-answer cross-modal retrieval method
CN111078902A (en) * 2018-10-22 2020-04-28 三星电子株式会社 Display device and operation method thereof
CN109472232A (en) * 2018-10-31 2019-03-15 山东师范大学 Video semanteme characterizing method, system and medium based on multi-modal fusion mechanism
CN110442721A (en) * 2018-11-28 2019-11-12 腾讯科技(深圳)有限公司 Neural network language model, training method, device and storage medium
CN110442721B (en) * 2018-11-28 2023-01-06 腾讯科技(深圳)有限公司 Neural network language model, training method, device and storage medium
CN111464881B (en) * 2019-01-18 2021-08-13 复旦大学 Full-convolution video description generation method based on self-optimization mechanism
JP2022509327A (en) * 2019-01-31 2022-01-20 シェンチェン センスタイム テクノロジー カンパニー リミテッド Cross-modal information retrieval method, its device, and storage medium
CN109886326B (en) * 2019-01-31 2022-01-04 深圳市商汤科技有限公司 Cross-modal information retrieval method and device and storage medium
TWI785301B (en) * 2019-01-31 2022-12-01 大陸商深圳市商湯科技有限公司 A cross-modal information retrieval method, device and storage medium
JP7164729B2 (en) 2019-01-31 2022-11-01 シェンチェン センスタイム テクノロジー カンパニー リミテッド CROSS-MODAL INFORMATION SEARCH METHOD AND DEVICE THEREOF, AND STORAGE MEDIUM
WO2020155423A1 (en) * 2019-01-31 2020-08-06 深圳市商汤科技有限公司 Cross-modal information retrieval method and apparatus, and storage medium
WO2020155418A1 (en) * 2019-01-31 2020-08-06 深圳市商汤科技有限公司 Cross-modal information retrieval method and device, and storage medium
JP2022510704A (en) * 2019-01-31 2022-01-27 シェンチェン センスタイム テクノロジー カンパニー リミテッド Cross-modal information retrieval methods, devices and storage media
TWI737006B (en) * 2019-01-31 2021-08-21 大陸商深圳市商湯科技有限公司 Cross-modal information retrieval method, device and storage medium
CN109886326A (en) * 2019-01-31 2019-06-14 深圳市商汤科技有限公司 A kind of cross-module state information retrieval method, device and storage medium
CN110209822A (en) * 2019-06-11 2019-09-06 中译语通科技股份有限公司 Sphere of learning data dependence prediction technique based on deep learning, computer
CN110337016A (en) * 2019-06-13 2019-10-15 山东大学 Short-sighted frequency personalized recommendation method and system based on multi-modal figure convolutional network
CN110647632A (en) * 2019-08-06 2020-01-03 上海孚典智能科技有限公司 Image and text mapping technology based on machine learning
CN110647632B (en) * 2019-08-06 2020-09-04 上海孚典智能科技有限公司 Image and text mapping technology based on machine learning
CN110503147A (en) * 2019-08-22 2019-11-26 山东大学 Multi-mode image categorizing system based on correlation study
CN110503147B (en) * 2019-08-22 2022-04-08 山东大学 Multi-mode image classification system based on correlation learning
CN111310453B (en) * 2019-11-05 2023-04-25 上海金融期货信息技术有限公司 User theme vectorization representation method and system based on deep learning
CN111310453A (en) * 2019-11-05 2020-06-19 上海金融期货信息技术有限公司 User theme vectorization representation method and system based on deep learning
CN111259152A (en) * 2020-01-20 2020-06-09 刘秀萍 Deep multilayer network driven feature aggregation category divider
CN112257445A (en) * 2020-10-19 2021-01-22 浙大城市学院 Multi-modal tweet named entity recognition method based on text-picture relation pre-training
CN112257445B (en) * 2020-10-19 2024-01-26 浙大城市学院 Multi-mode push text named entity recognition method based on text-picture relation pre-training
CN112507064A (en) * 2020-11-09 2021-03-16 国网天津市电力公司 Cross-modal sequence-to-sequence generation method based on topic perception
CN112507064B (en) * 2020-11-09 2022-05-24 国网天津市电力公司 Cross-modal sequence-to-sequence generation method based on topic perception
CN112632969A (en) * 2020-12-13 2021-04-09 复旦大学 Incremental industry dictionary updating method and system
CN112632969B (en) * 2020-12-13 2022-06-21 复旦大学 Incremental industry dictionary updating method and system
CN113157959A (en) * 2020-12-17 2021-07-23 云知声智能科技股份有限公司 Cross-modal retrieval method, device and system based on multi-modal theme supplement
CN112836746B (en) * 2021-02-02 2022-09-09 中国科学技术大学 Semantic correspondence method based on consistency graph modeling
CN112836746A (en) * 2021-02-02 2021-05-25 中国科学技术大学 Semantic correspondence method based on consistency graph modeling
CN113051932B (en) * 2021-04-06 2023-11-03 合肥工业大学 Category detection method for network media event of semantic and knowledge expansion theme model
CN113051932A (en) * 2021-04-06 2021-06-29 合肥工业大学 Method for detecting category of network media event of semantic and knowledge extension topic model
CN113139468A (en) * 2021-04-24 2021-07-20 西安交通大学 Video abstract generation method fusing local target features and global features
CN113298265A (en) * 2021-05-22 2021-08-24 西北工业大学 Heterogeneous sensor potential correlation learning method based on deep learning
CN113298265B (en) * 2021-05-22 2024-01-09 西北工业大学 Heterogeneous sensor potential correlation learning method based on deep learning
CN113297485B (en) * 2021-05-24 2023-01-24 中国科学院计算技术研究所 Method for generating cross-modal representation vector and cross-modal recommendation method
CN113297485A (en) * 2021-05-24 2021-08-24 中国科学院计算技术研究所 Method for generating cross-modal representation vector and cross-modal recommendation method
CN113392196A (en) * 2021-06-04 2021-09-14 北京师范大学 Topic retrieval method and system based on multi-mode cross comparison
CN113343679A (en) * 2021-07-06 2021-09-03 合肥工业大学 Multi-modal topic mining method based on label constraint
CN113343679B (en) * 2021-07-06 2024-02-13 合肥工业大学 Multi-mode subject mining method based on label constraint
CN113516118A (en) * 2021-07-29 2021-10-19 西北大学 Image and text combined embedded multi-mode culture resource processing method
CN113408282A (en) * 2021-08-06 2021-09-17 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for topic model training and topic prediction
CN114880527B (en) * 2022-06-09 2023-03-24 哈尔滨工业大学(威海) Multi-modal knowledge graph representation method based on multi-prediction task
CN114880527A (en) * 2022-06-09 2022-08-09 哈尔滨工业大学(威海) Multi-modal knowledge graph representation method based on multi-prediction task

Also Published As

Publication number Publication date
CN105760507B (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN105760507A (en) Cross-modal subject correlation modeling method based on deep learning
Liu et al. A survey of sentiment analysis based on transfer learning
CN110489395B (en) Method for automatically acquiring knowledge of multi-source heterogeneous data
Vadicamo et al. Cross-media learning for image sentiment analysis in the wild
CN108573411B (en) Mixed recommendation method based on deep emotion analysis and multi-source recommendation view fusion of user comments
Zhang et al. A quantum-inspired multimodal sentiment analysis framework
Zhu et al. Unsupervised visual hashing with semantic assistant for content-based image retrieval
Gan et al. Recognizing an action using its name: A knowledge-based approach
CN110598005A (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
CN104899253A (en) Cross-modality image-label relevance learning method facing social image
CN104933029A (en) Text image joint semantics analysis method based on probability theme model
CN111324765A (en) Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation
Mallik et al. Acquisition of multimedia ontology: an application in preservation of cultural heritage
Zuo et al. Representation learning of knowledge graphs with entity attributes and multimedia descriptions
CN114997288A (en) Design resource association method
Wang et al. Rare-aware attention network for image–text matching
CN113128237B (en) Semantic representation model construction method for service resources
CN112632223B (en) Case and event knowledge graph construction method and related equipment
Lang et al. A Survey on Out-of-Distribution Detection in NLP
Long et al. Bi-calibration networks for weakly-supervised video representation learning
Qian et al. Boosted multi-modal supervised latent Dirichlet allocation for social event classification
Zhang et al. An al-based spatial knowledge graph for enhancing spatial data and knowledge search and discovery
Xiao et al. Research on multimodal emotion analysis algorithm based on deep learning
CN114595370A (en) Model training and sorting method and device, electronic equipment and storage medium
Yang et al. Graph convolutional networks with dependency parser towards multiview representation learning for sentiment analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190503