CN104462489B - Cross-modal retrieval method based on a deep model - Google Patents

Cross-modal retrieval method based on a deep model

Info

Publication number
CN104462489B
Authority
CN
China
Prior art keywords
mode
modality
rbm
corr
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410800393.4A
Other languages
Chinese (zh)
Other versions
CN104462489A (en)
Inventor
李睿凡 (Li Ruifan)
张光卫 (Zhang Guangwei)
鲁鹏 (Lu Peng)
芦效峰 (Lu Xiaofeng)
冯方向 (Feng Fangxiang)
李蕾 (Li Lei)
刘咏彬 (Liu Yongbin)
王小捷 (Wang Xiaojie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201410800393.4A priority Critical patent/CN104462489B/en
Publication of CN104462489A publication Critical patent/CN104462489A/en
Application granted granted Critical
Publication of CN104462489B publication Critical patent/CN104462489B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a cross-modal retrieval method based on a deep model. The method includes: respectively obtaining, by a feature extraction method, a low-level expression vector of a target retrieval modality and of each retrieved modality in a retrieval library; pairing the low-level expression vector of the target retrieval modality with the low-level expression vector of each retrieved modality in the retrieval library, and obtaining the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality through a deep model formed by stacking corresponding restricted Boltzmann machines (Corr-RBMs); calculating the distance between the target retrieval modality and each retrieved modality in the retrieval library using the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality; and determining at least one retrieved modality in the retrieval library closest to the target retrieval modality as an object matching the target retrieval modality.

Description

Cross-modal retrieval method based on a deep model
Technical Field
The invention relates to a multimedia retrieval technology, in particular to a cross-modal retrieval method based on a deep model.
Background
The development of the internet in recent years has led to explosive growth in multimodal data. For example, products on an e-commerce website typically contain main text, a short textual description, and related pictures; pictures shared on social networking sites are often accompanied by tag descriptors; and some online news contains picture and video information that is more attractive than a plain text report. The rapid growth of multi-modal data has created a huge demand for cross-modal retrieval.
Unlike traditional single-modality retrieval, cross-modality retrieval focuses on relationships between different modalities. The cross-modal retrieval problem therefore poses two challenges: first, data from different modalities have completely different statistical characteristics, so the association between data of different modalities is difficult to obtain directly; second, the features extracted from different modal data usually have high dimensionality and the data sets are very large, which makes efficient retrieval difficult to achieve.
Disclosure of Invention
In view of this, the present invention provides a deep model-based cross-modal retrieval method, which applies a deep model to the processing of cross-modal data so that distance calculation can be performed efficiently on the processed cross-modal data, thereby obtaining better retrieval results. The technical scheme provided by the invention is as follows:
a cross-modal retrieval method based on a deep model comprises the following steps:
respectively obtaining a low-level expression vector of a target retrieval modality and of each retrieved modality in a retrieval library by using a feature extraction method;
pairing the low-level expression vector of the target retrieval modality with the low-level expression vector of each retrieved modality in the retrieval library, and obtaining the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality in the retrieval library through a stacked corresponding restricted Boltzmann machine (Corr-RBMs) deep model;
calculating the distance between the target retrieval modality and each retrieved modality in the retrieval library by using the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality in the retrieval library;
determining at least one retrieved modality in the retrieval library which is closest to the target retrieval modality as an object matching the target retrieval modality.
In summary, the technical solution of the present invention provides a deep model-based cross-modal retrieval method, in which the low-level expressions obtained by feature extraction from the cross-modal raw data are processed by a deep model formed by stacking corresponding restricted Boltzmann machines (Corr-RBMs) to obtain low-dimensional high-level expressions of the cross-modal data in the same representation space; distance calculation is then performed on these low-dimensional high-level expressions, and the retrieval result is determined according to the distances.
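The following sketch illustrates the four-step pipeline just described, assuming hypothetical extract_features() and deep_map() callables standing in for the feature extraction step and the trained stacked Corr-RBMs mapping; it is an illustration of the scheme, not the patented implementation itself.

```python
import numpy as np

def retrieve(query_raw, library_raw, extract_features, deep_map, k=10):
    """Return the indices of the k retrieved modalities closest to the query."""
    # Step 1: low-level expression vectors via feature extraction.
    q_low = extract_features(query_raw)
    lib_low = [extract_features(item) for item in library_raw]
    # Step 2: high-level expression vectors in the shared representation space.
    q_high = deep_map(q_low)
    lib_high = np.stack([deep_map(x) for x in lib_low])
    # Step 3: Euclidean distance from the query to every retrieved modality.
    dists = np.linalg.norm(lib_high - q_high, axis=1)
    # Step 4: the k nearest retrieved modalities form the retrieval result.
    return np.argsort(dists)[:k]
```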
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a deep neural network model of Corr-RBMs according to the present invention;
FIG. 3 is a diagram of a neural network structure of a Corr-RBM model according to the present invention;
FIG. 4 is a block diagram of a restricted Boltzmann machine RBM model;
FIG. 5 is a flowchart of a method for determining Θ based on an objective function F;
FIG. 6 is a flow chart of an embodiment of the present invention.
Detailed Description
In order to solve the problem of cross-modal retrieval, the invention provides a cross-modal retrieval method based on a Corr-RBMs deep model. A flow chart of the technical scheme of the invention is shown in Fig. 1, and the scheme comprises the following steps:
Step 101: respectively obtaining a low-level expression vector of the target retrieval modality and of each retrieved modality in the retrieval library by using a feature extraction method.
In this step, in order to find objects matching the target retrieval modality in the retrieval library, a low-level expression vector is needed for the target retrieval modality and for each retrieved modality in the retrieval library. The low-level expression vectors obtained by feature extraction generally have high dimensionality, and the vector elements of different modalities differ, so they generally cannot be used directly for the retrieval operation.
Step 102: obtaining the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality in the retrieval library through the stacked corresponding restricted Boltzmann machine (Corr-RBMs) deep model.
In this step, the low-level expression vector of the target retrieval modality is paired with the low-level expression vector of each retrieved modality in the retrieval library, and the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality are obtained through the stacked corresponding restricted Boltzmann machine (Corr-RBMs) deep model. The high-level expression vectors obtained through the Corr-RBMs deep model have low dimensionality and lie in a consistent representation space, so they can be used for efficient retrieval operations.
Step 103: calculating the distance between the target retrieval modality and each retrieved modality in the retrieval library by using the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality in the retrieval library.
Specifically, the distance between the target retrieval modality and each retrieved modality in the retrieval library may be measured by the Euclidean distance.
Step 104: determining at least one retrieved modality in the retrieval library which is closest to the target retrieval modality as an object matching the target retrieval modality.
In this step, the distances between each retrieved modality and the target retrieval modality in the retrieval library are sorted, and at least one retrieved modality closest to the target retrieval modality is selected and determined as an object matched with the target retrieval modality.
The invention proposes a method of performing cross-modal retrieval using a stacked Corr-RBMs deep model. Fig. 2 is the neural network structure diagram of the Corr-RBMs deep model obtained by stacking Corr-RBMs; as shown in Fig. 2, the Corr-RBMs deep model is stacked from at least two layers of Corr-RBMs and can obtain the high-level expressions of the raw data of two different modalities from their low-level expressions. The neural network structure of each layer's Corr-RBM model is shown in Fig. 3; the Corr-RBM model is built on the basis of the restricted Boltzmann machine (RBM), whose neural network structure is shown in Fig. 4. The RBM model, the Corr-RBM model and the deep Corr-RBMs model are described in turn below.
(I) The RBM model:
FIG. 4 is a diagram of the neural network structure of an RBM. As shown in FIG. 4, the visible layer V of the RBM includes m neural units v_1~v_m; each neural unit v_i has a bias b_i, and there are no connections among the visible-layer neural units. The hidden layer H includes s neural units h_1~h_s; each neural unit h_j has a bias c_j, and there are no connections among the hidden-layer neural units. The connection weight between visible-layer neural unit v_i and hidden-layer neural unit h_j is w_ij. For ease of understanding, only some of the connection weights between visible-layer and hidden-layer neural units are drawn in FIG. 4.
The RBM has an undirected graph structure with the Logistic activation function δ(x) = 1/(1 + exp(−x)); the joint probability distribution of the neural units of visible layer V and hidden layer H is then

p(v, h) = exp(−E(v, h)) / Z,
wherein Z is a normalization constant (the partition function) and E(v, h) is an energy function defined by the configurations of the visible-layer and hidden-layer neural units of the RBM. E(v, h) has different forms for different configurations of the visible-layer and hidden-layer neural units; that is, once the configurations of the RBM's visible-layer and hidden-layer neural units are determined, a corresponding energy function exists, which is not described in detail herein.
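As one concrete case (an assumption for illustration; the patent leaves E(v, h) configuration-dependent), the sketch below computes the common binary-binary energy E(v, h) = −bᵀv − cᵀh − vᵀWh and the corresponding unnormalized joint probability.

```python
import numpy as np

def energy(v, h, W, b, c):
    """E(v, h) of a binary RBM with visible bias b, hidden bias c, weights W."""
    return -(b @ v) - (c @ h) - (v @ W @ h)

def unnormalized_p(v, h, W, b, c):
    """exp(-E(v, h)); dividing by the partition function Z yields p(v, h)."""
    return np.exp(-energy(v, h, W, b, c))
```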
The bias b_i of each visible-layer neural unit v_i of the RBM, the bias c_j of each hidden-layer neural unit h_j, and the connection weight w_ij between visible-layer neural unit v_i and hidden-layer neural unit h_j can be learned by the contrastive divergence estimation algorithm, which is a mature prior art and is not described in detail herein.
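For reference, a minimal sketch of one step of CD-1, the simplest form of the contrastive divergence algorithm referred to above, is given here under the binary-RBM assumption; the learning rate and sampling details are illustrative choices, not the patent's specification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.01, rng=None):
    """One CD-1 update from a batch v0 of visible vectors (shape: batch x m)."""
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities given the data, then a sample.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and back up.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    n = v0.shape[0]
    # <.>_data - <.>_model, with the model term approximated by the reconstruction.
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```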
(II) The corresponding restricted Boltzmann machine (Corr-RBM) model:
Fig. 3 is a structural diagram of the Corr-RBM model of the present invention. As shown in Fig. 3, the Corr-RBM model includes a first-modality RBM and a second-modality RBM; the two RBMs include the same number m of visible-layer neural units and the same number s of hidden-layer neural units, and there is a dependency constraint between the hidden layers of the first-modality RBM and the second-modality RBM.
Let Θ denote the parameter set of the Corr-RBM model, i.e., Θ = {W^I, C^I, B^I, W^T, C^T, B^T}, where superscript I denotes the first modality and superscript T denotes the second modality. Specifically, W^I is the set of connection weight parameters between the visible-layer and hidden-layer neural units of the first-modality RBM, C^I is the set of visible-layer neural unit bias parameters of the first-modality RBM, and B^I is the set of hidden-layer neural unit bias parameters of the first-modality RBM; W^T is the set of connection weight parameters between the visible-layer and hidden-layer neural units of the second-modality RBM, C^T is the set of visible-layer neural unit bias parameters of the second-modality RBM, and B^T is the set of hidden-layer neural unit bias parameters of the second-modality RBM.
The parameter set Θ of the Corr-RBM model is determined by the following parameter learning algorithm:
the objective function F is defined according to the following principle: the set of parameters Θ of the Corr-RBM model can minimize the distance of the first modality from the second modality over the shared representation space and minimize the negative log-likelihood function of the first modality and the second modality. The objective function F is F = l D +αl I +βl T I.e. Θ is the set of parameters that minimizes F.
Wherein the content of the first and second substances,
wherein l D Is the distance between the first mode and the second mode in the nesting space, l I Negative log-likelihood function of the first modality, l T A negative log likelihood function for the second modality, α and β being constants, α ∈ (0,1), β ∈ (0,1); f. of I (. Is a first modality RBM visible layer to hidden layer mapping function, f T () is a second modality RBM visible layer to hidden layer mapping function; p is a radical of I (.) a joint probability distribution of the RBM visible layer and hidden layer neural units in the first modality, p T (.) is the joint probability distribution of the second modality RBM visible layer and hidden layer neural units, | | | | is a two-norm map.
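A sketch of the mapping f(·) and of the embedding-distance term l_D follows; the likelihood terms l_I and l_T are not computed explicitly here, since the partition function Z is intractable and those terms are instead handled by contrastive divergence during learning. Variable names follow the patent's notation (W weights, B hidden biases).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f(v, W, B):
    """Visible-to-hidden mapping of one modality's RBM (Logistic activation)."""
    return sigmoid(v @ W + B)

def l_D(v_I, v_T, W_I, B_I, W_T, B_T):
    """Squared Euclidean distance between the two modalities' hidden codes."""
    diff = f(v_I, W_I, B_I) - f(v_T, W_T, B_T)
    return float(diff @ diff)
```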
To determine Θ from the objective function F, an alternating iterative optimization procedure can be used: the two likelihood terms l_I and l_T are first updated with the contrastive divergence estimation algorithm, and l_D is then updated by gradient descent; convergence can be detected on a validation set using cross-modal retrieval. Specifically, Fig. 5 is a flowchart for determining Θ from the objective function F, which includes the following steps:
Step 501: updating the parameters of the first-modality RBM using the contrastive divergence estimation algorithm.
The set W^I of connection weight parameters between the visible-layer and hidden-layer neural units of the first-modality RBM, the visible-layer neural unit biases C^I, and the hidden-layer neural unit biases B^I are denoted collectively by θ^I and updated according to θ^I ← θ^I + τ·α·Δθ^I, where τ is the learning rate, τ ∈ (0,1), and α ∈ (0,1); and

ΔW^I = <v·h^T>_data − <v·h^T>_model, ΔC^I = <v>_data − <v>_model, ΔB^I = <h>_data − <h>_model,

where <·>_data is the mathematical expectation under the empirical distribution and <·>_model is the mathematical expectation under the model distribution.
Step 502: updating the parameters of the second-modality RBM using the contrastive divergence estimation algorithm.
The set W^T of connection weight parameters between the visible-layer and hidden-layer neural units of the second-modality RBM, the visible-layer neural unit biases C^T, and the hidden-layer neural unit biases B^T are denoted collectively by θ^T and updated according to θ^T ← θ^T + τ·β·Δθ^T, where β ∈ (0,1); and

ΔW^T = <v·h^T>_data − <v·h^T>_model, ΔC^T = <v>_data − <v>_model, ΔB^T = <h>_data − <h>_model.
Step 503: updating the distance between the first modality and the second modality in the embedding space using a gradient descent method.
Specifically, the distance l_D between the first modality and the second modality in the embedding space is updated by a gradient descent step θ ← θ − τ·∇_θ l_D, where the gradient of l_D = ||f^I(v^I) − f^T(v^T)||^2 is obtained through the chain rule using δ′(·) = δ(·)(1 − δ(·)), δ(·) being the Logistic activation function δ(x) = 1/(1 + exp(−x)).
Step 504: repeating steps 501–503 until the algorithm converges.
The parameter set Θ of the Corr-RBM model can thus be obtained.
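Putting steps 501–504 together, a compact sketch of the alternating optimization follows, reusing the cd1_step() and sigmoid() helpers sketched earlier; the fixed iteration count stands in for the validation-set convergence test, and the constant factor 2 in the l_D gradient is absorbed into the learning rate τ.

```python
import numpy as np

def train_corr_rbm(V_I, V_T, W_I, B_I, C_I, W_T, B_T, C_T,
                   tau=0.01, alpha=0.5, beta=0.5, n_iters=100):
    """V_I, V_T: paired low-level vectors (batch x m); B hidden, C visible biases."""
    for _ in range(n_iters):
        # Steps 501/502: CD updates of the two modality RBMs (scaled by alpha, beta).
        W_I, C_I, B_I = cd1_step(V_I, W_I, C_I, B_I, lr=tau * alpha)
        W_T, C_T, B_T = cd1_step(V_T, W_T, C_T, B_T, lr=tau * beta)
        # Step 503: gradient-descent step on l_D = ||f_I(v_I) - f_T(v_T)||^2.
        H_I = sigmoid(V_I @ W_I + B_I)
        H_T = sigmoid(V_T @ W_T + B_T)
        diff = H_I - H_T
        g_I = diff * H_I * (1.0 - H_I)       # chain rule via delta'(x)
        g_T = -diff * H_T * (1.0 - H_T)
        n = V_I.shape[0]
        W_I -= tau * (V_I.T @ g_I) / n
        B_I -= tau * g_I.mean(axis=0)
        W_T -= tau * (V_T.T @ g_T) / n
        B_T -= tau * g_T.mean(axis=0)
    # Step 504 corresponds to running this loop until convergence.
    return W_I, B_I, C_I, W_T, B_T, C_T
```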
(III) The deep model of Corr-RBMs:
Fig. 2 is a diagram of the neural network structure of the Corr-RBMs deep model. As shown in Fig. 2, the Corr-RBMs deep model is formed by stacking at least two corresponding restricted Boltzmann machine (Corr-RBM) models. Each Corr-RBMs deep model includes first-modality Corr-RBMs and second-modality Corr-RBMs: the first-modality Corr-RBMs process the low-level expression of the target retrieval modality, and the second-modality Corr-RBMs process the low-level expression of any retrieved modality in the retrieval library.
The input of the first-modality RBM visible-layer neural units of the bottom-layer Corr-RBM is the low-level expression of the first modality obtained by feature extraction from the first-modality raw data, and the input of the second-modality RBM visible-layer neural units of the bottom-layer Corr-RBM is the low-level expression of the second modality obtained by feature extraction from the second-modality raw data; extracting low-level expressions from raw data is prior art and is not described in detail herein.
The first-modality RBM hidden layer of the top-layer Corr-RBM outputs the high-level expression of the first modality, and the second-modality RBM hidden layer of the top-layer Corr-RBM outputs the high-level expression of the second modality.
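A sketch of this stacked mapping for one modality's pathway follows: each layer's hidden activations feed the next layer's visible units, and the top layer's hidden activations are the high-level expression. Here `layers`, a list of per-layer (W, B) pairs, is a hypothetical container for the trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def high_level_expression(v_low, layers):
    """Propagate a low-level vector through one modality's stacked layers."""
    h = v_low
    for W, B in layers:
        h = sigmoid(h @ W + B)   # visible-to-hidden mapping of this layer
    return h                     # top layer's hidden output: the high-level expression
```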
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
In this embodiment, it is assumed that the retrieval library includes N retrieved modalities, and the technical solution of the present invention is described by taking the retrieval of objects related to a picture P in the retrieval library as an example. Fig. 6 is a flowchart of this embodiment which, as shown in Fig. 6, includes the following steps:
Step 601: acquiring the low-level expression of each retrieved modality in the retrieval library and the low-level expression of the picture P by a feature extraction method.
In this step, the modality type of the retrieved modalities in the retrieval library is not limited and may be an image modality, a text modality, or a voice modality; mature feature extraction methods exist for the raw data of each modality. For example, the image modality may apply MPEG-7 and Gist descriptors for feature extraction, and the text modality may apply a bag-of-words model. The process of obtaining the low-level expressions of the picture P and of each retrieved modality in the retrieval library is therefore not described in detail here.
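As one concrete illustration of such prior-art extraction (scikit-learn's CountVectorizer is an assumed stand-in, not named in the patent), a bag-of-words representation of the text modality can be produced as follows; image descriptors such as Gist would come from their own extractors.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["a dog playing in the park", "a cat sleeping on a sofa"]
vectorizer = CountVectorizer()
low_level_text = vectorizer.fit_transform(docs).toarray()  # one row per document
```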
Step 602: processing the low-level expression of the picture P and the low-level expression of each retrieved modality in the retrieval library through the Corr-RBMs deep model to obtain the high-level expression of the picture P and the high-level expression of each retrieved modality, and then calculating the Euclidean distance between the picture P and each retrieved modality in the retrieval library from these high-level expressions.
In this step, each retrieved modality in the retrieval library is taken together with the picture P as a pair; the low-level expression of the retrieved modality and the low-level expression of the picture P in the pair are processed through the Corr-RBMs deep model to obtain their high-level expressions, and the Euclidean distance between the picture P and the retrieved modality is then calculated according to the Euclidean distance formula.
In general, for two points x and y in an n-dimensional Euclidean space, their distance d is calculated as d(x, y) = sqrt(Σ_{i=1}^{n} (x_i − y_i)^2); the Euclidean distance between the picture P and each retrieved modality is calculated accordingly.
Step 603: sorting the retrieved modalities in the retrieval library from low to high by their Euclidean distance from the picture P, and outputting the top K retrieved modalities as the retrieval result (see the sketch below).
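A minimal sketch of steps 602–603, assuming the high-level expressions have already been computed (for instance with the high_level_expression() helper above):

```python
import numpy as np

def top_k(p_high, library_high, k):
    """p_high: (d,) query code; library_high: (N, d) codes; return top-K matches."""
    dists = np.sqrt(((library_high - p_high) ** 2).sum(axis=1))
    order = np.argsort(dists)          # sort from low to high distance
    return order[:k], dists[order[:k]]
```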
In this embodiment, the low-level expression of the picture modality and the low-level expression of each retrieved modality in the retrieval library are processed through the Corr-RBMs deep model to obtain their respective high-level expressions, and Euclidean distances are then calculated on the high-level expressions, so the retrieval result is obtained efficiently.
The above description covers only preferred embodiments of the present invention and is not intended to limit the scope of the present invention; any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention shall fall within the scope of the present invention.

Claims (3)

1. A cross-modal retrieval method based on a deep model, wherein the deep model is a stacked corresponding restricted Boltzmann machine (Corr-RBMs) deep model, and the method comprises the following steps:
respectively obtaining a low-level expression vector of a target retrieval modality and of each retrieved modality in a retrieval library by using a feature extraction method;
pairing the low-level expression vector of the target retrieval modality with the low-level expression vector of each retrieved modality in the retrieval library, and obtaining the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality in the retrieval library through the stacked corresponding restricted Boltzmann machine (Corr-RBMs) deep model;
calculating the distance between the target retrieval modality and each retrieved modality in the retrieval library by using the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality in the retrieval library;
determining at least one retrieved modality in the retrieval library which is closest to the target retrieval modality as an object matched with the target retrieval modality;
wherein:
the deep layer models of the Corr-RBMs are formed by stacking at least two layers of corresponding limited Boltzmann machine Corr-RBMs, each deep layer model of the Corr-RBMs comprises first-mode Corr-RBMs and second-mode Corr-RBMs, the first-mode Corr-RBMs process the low-level expression vectors of the target retrieval modes, and the second-mode Corr-RBMs process the low-level expression vectors of any retrieved mode in the retrieval library;
the Corr-RBM includes a first-mode restricted boltzmann machine RBM and a second-mode restricted boltzmann machine RBM, the first-mode RBM and the second-mode RBM include the same number of visible layer neural units and the same number of hidden layer neural units, and a hidden layer of the first-mode RBM and the second-mode RBM has a dependency constraint therebetween.
2. The method of claim 1, further comprising:
the configuration parameters of the Corr-RBM are Θ = {W^I, C^I, B^I, W^T, C^T, B^T}, where superscript I denotes the first modality and superscript T denotes the second modality; specifically, W^I is the set of connection weight parameters between the visible-layer and hidden-layer neural units of the first-modality RBM, C^I is the set of visible-layer neural unit bias parameters of the first-modality RBM, and B^I is the set of hidden-layer neural unit bias parameters of the first-modality RBM; W^T is the set of connection weight parameters between the visible-layer and hidden-layer neural units of the second-modality RBM, C^T is the set of visible-layer neural unit bias parameters of the second-modality RBM, and B^T is the set of hidden-layer neural unit bias parameters of the second-modality RBM;
the configuration parameter set Θ of the corresponding restricted Boltzmann machine Corr-RBM is the configuration that minimizes the objective function F = l_D + α·l_I + β·l_T,

where l_D = ||f^I(v^I) − f^T(v^T)||^2 is the distance between the first modality and the second modality in the embedding space, l_I = −log p^I(v^I) is the negative log-likelihood function of the first modality, and l_T = −log p^T(v^T) is the negative log-likelihood function of the second modality; α and β are constants, with α ∈ (0,1) and β ∈ (0,1); f^I(·) is the visible-to-hidden mapping function of the first-modality RBM, and f^T(·) is the visible-to-hidden mapping function of the second-modality RBM; p^I(·) is the joint probability distribution of the visible-layer and hidden-layer neural units of the first-modality RBM, and p^T(·) is that of the second-modality RBM; ||·|| is a two-norm mapping; v refers to a visible unit in the RBM, corresponding to a visible variable; m is the number of modal samples.
3. The method according to claim 2, wherein the algorithm for determining Θ from the objective function F is:
A. the set W^I of connection weight parameters between the visible-layer and hidden-layer neural units of the first-modality RBM, the visible-layer neural unit biases C^I, and the hidden-layer neural unit biases B^I are denoted collectively by θ^I and updated according to θ^I ← θ^I + τ·α·Δθ^I, where τ is the learning rate, τ ∈ (0,1), and α ∈ (0,1); and

ΔW^I = <v·h^T>_data − <v·h^T>_model, ΔC^I = <v>_data − <v>_model, ΔB^I = <h>_data − <h>_model,

where <·>_data is the mathematical expectation under the empirical distribution and <·>_model is the mathematical expectation under the model distribution;
B. the set W^T of connection weight parameters between the visible-layer and hidden-layer neural units of the second-modality RBM, the visible-layer neural unit biases C^T, and the hidden-layer neural unit biases B^T are denoted collectively by θ^T and updated according to θ^T ← θ^T + τ·β·Δθ^T, where β ∈ (0,1); and

ΔW^T = <v·h^T>_data − <v·h^T>_model, ΔC^T = <v>_data − <v>_model, ΔB^T = <h>_data − <h>_model;
C. updating l_D using a gradient descent step θ ← θ − τ·∇_θ l_D, the gradient being obtained through the chain rule, wherein δ′(·) = δ(·)(1 − δ(·)) and δ(·) is the Logistic activation function δ(x) = 1/(1 + exp(−x));
and repeating the steps A to C until the algorithm converges.
CN201410800393.4A 2014-12-18 2014-12-18 Cross-modal retrieval method based on a deep model Active CN104462489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410800393.4A CN104462489B (en) 2014-12-18 2014-12-18 Cross-modal retrieval method based on a deep model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410800393.4A CN104462489B (en) 2014-12-18 2014-12-18 Cross-modal retrieval method based on a deep model

Publications (2)

Publication Number Publication Date
CN104462489A CN104462489A (en) 2015-03-25
CN104462489B true CN104462489B (en) 2018-02-23

Family

ID=52908524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410800393.4A Active CN104462489B (en) 2014-12-18 2014-12-18 Cross-modal retrieval method based on a deep model

Country Status (1)

Country Link
CN (1) CN104462489B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866596B (en) * 2015-05-29 2018-09-14 Beijing University of Posts and Telecommunications Video classification method and device based on an autoencoder
US9984772B2 (en) * 2016-04-07 2018-05-29 Siemens Healthcare Gmbh Image analytics question answering
CN106250878B (en) * 2016-08-19 2019-12-31 中山大学 Multi-modal target tracking method combining visible light and infrared images
CN107832351A (en) * 2017-10-21 2018-03-23 Guilin University of Electronic Technology Cross-modal retrieval method based on a deep correlation network
CN109189968B (en) * 2018-08-31 2020-07-03 深圳大学 Cross-modal retrieval method and system
CN109783657B (en) * 2019-01-07 2022-12-30 北京大学深圳研究生院 Multi-step self-attention cross-media retrieval method and system based on limited text space

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693316A (en) * 2012-05-29 2012-09-26 中国科学院自动化研究所 Linear generalization regression model based cross-media retrieval method
CN103488713A (en) * 2013-09-10 2014-01-01 浙江大学 Cross-modal search method capable of directly measuring similarity of different modal data
CN103793507A (en) * 2014-01-26 2014-05-14 北京邮电大学 Method for obtaining bimodal similarity measure with deep structure

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8566260B2 (en) * 2010-09-30 2013-10-22 Nippon Telegraph And Telephone Corporation Structured prediction model learning apparatus, method, program, and recording medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693316A (en) * 2012-05-29 2012-09-26 中国科学院自动化研究所 Linear generalization regression model based cross-media retrieval method
CN103488713A (en) * 2013-09-10 2014-01-01 浙江大学 Cross-modal search method capable of directly measuring similarity of different modal data
CN103793507A (en) * 2014-01-26 2014-05-14 北京邮电大学 Method for obtaining bimodal similarity measure with deep structure

Also Published As

Publication number Publication date
CN104462489A (en) 2015-03-25

Similar Documents

Publication Publication Date Title
CN104462489B (en) Cross-modal retrieval method based on a deep model
CN108132968B (en) Weak supervision learning method for associated semantic elements in web texts and images
CN110046656B (en) Multi-mode scene recognition method based on deep learning
Caicedo et al. Multimodal representation, indexing, automated annotation and retrieval of image collections via non-negative matrix factorization
CN105022754B (en) Object classification method and device based on social network
CN107683469A (en) A kind of product classification method and device based on deep learning
CN107832663A (en) A kind of multi-modal sentiment analysis method based on quantum theory
TW201324378A (en) Image Classification
Huang et al. Large-scale heterogeneous feature embedding
CN111080551B (en) Multi-label image complement method based on depth convolution feature and semantic neighbor
CN112256965A (en) Neural collaborative filtering model recommendation method based on lambdamat
CN111159473A (en) Deep learning and Markov chain based connection recommendation method
CN111026887B (en) Cross-media retrieval method and system
CN108805280B (en) Image retrieval method and device
CN112632984A (en) Graph model mobile application classification method based on description text word frequency
CN104462485B (en) Cross-modal retrieval method based on a corresponding deep belief network
CN111523586A (en) Noise-aware-based full-network supervision target detection method
CN111079011A (en) Deep learning-based information recommendation method
US20230410465A1 (en) Real time salient object detection in images and videos
CN113553975A (en) Pedestrian re-identification method, system, equipment and medium based on sample pair relation distillation
CN107169830B (en) Personalized recommendation method based on clustering PU matrix decomposition
Wu et al. Content embedding regularized matrix factorization for recommender systems
CN111695570A (en) Variational prototype reasoning-based semantic segmentation method under small sample
CN110085292A (en) Drug recommended method, device and computer readable storage medium
CN109885758A (en) A kind of recommended method of the novel random walk based on bigraph (bipartite graph)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Li Ruifan

Inventor after: Zhang Guangwei

Inventor after: Lu Peng

Inventor after: Lu Xiaofeng

Inventor after: Feng Fangxiang

Inventor after: Li Lei

Inventor after: Liu Yongbin

Inventor after: Wang Xiaojie

Inventor before: Li Ruifan

Inventor before: Lu Peng

Inventor before: Lu Xiaofeng

Inventor before: Feng Fangxiang

Inventor before: Li Lei

Inventor before: Liu Yongbin

Inventor before: Wang Xiaojie

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant