CN109299216B - Cross-modal hashing retrieval method and system fusing supervised information - Google Patents
Application number: CN201811269037.9A
Publication number: CN109299216B
Authority: CN (China)
Legal status: Expired - Fee Related (status as listed by Google Patents; not a legal conclusion)
Classifications
- G06F18/25 - Pattern recognition; analysing; fusion techniques
- G06N3/045 - Neural networks; architecture; combinations of networks
- G06N3/08 - Neural networks; learning methods
Abstract
The invention discloses a cross-modal hashing retrieval method and system that fuse supervised information. The method comprises: constructing an image network, a text network, and a fusion network; obtaining paired image and text feature training samples and feeding them to the image network and the text network respectively; taking the output features of the image network and the text network as the input of the fusion network, and defining the output of the fusion network; constructing an objective function for learning unified hash codes from the output of the fusion network and the pairwise similarity; solving the objective function to obtain the unified hash codes; and, using the unified hash codes as supervised information combined with semantic information, training modality-specific hash networks. Based on an end-to-end deep learning framework that learns feature representations and hash codes simultaneously, the invention captures the correlations between data of different modalities more effectively and thereby improves cross-modal retrieval precision.
Description
Technical field
This disclosure relates to cross-modal retrieval methods, and more specifically to a cross-modal hashing retrieval method and system fusing supervised information.
Background art
In recent years, with the sharp increase of heterogeneous data on the web, approximate nearest neighbor (ANN) search plays an increasingly important role in applications such as information retrieval, data mining, and computer vision. Owing to its low computational cost and high storage efficiency, hashing has become one of the most popular techniques for ANN search. The basic idea of hashing is to learn hash functions that map high-dimensional data to compact binary codes in Hamming space while preserving the similarity structure of the original space as far as possible. Many hashing methods for the single-modality setting have been proposed, but in the real world, data with the same semantics often exist in multiple modalities, e.g., image, text, and video. To fully exploit the relationships between such heterogeneous data, it is necessary to develop cross-modal hashing (CMH) methods for ANN search. Specifically, in cross-modal similarity search the modality of the query differs from the modality of the retrieved data. This disclosure takes image-to-text (I2T) and text-to-image (T2I) retrieval as the tasks for analysis and experiments, while the method extends to retrieval between any other modalities.
Most existing cross-modal hashing (CMH) methods are based on hand-crafted features, so feature extraction and hash-code learning proceed independently. This may limit the discriminative power of the sample representations and in turn hurt the accuracy of the learned hash codes. Recently, deep-learning-based hashing methods have proposed end-to-end frameworks that learn feature representations and hash codes simultaneously, capturing the nonlinear correlations between modalities more effectively than shallow learning methods. As a classical method, deep cross-modal hashing (DCMH) extends traditional deep models to cross-modal retrieval and runs an end-to-end learning framework with a deep neural network for each modality. Pairwise-relationship-guided deep hashing (PRDH) further integrates multiple kinds of pairwise constraints to enhance the similarity of hash codes both across and within modalities.
In the deep cross-modal hashing frameworks mentioned above, the hash codes of a paired sample from two different modalities are usually forced to be identical. Moreover, these methods learn the feature representation of each single sample through the deep neural network of its own modality and then establish cross-modal relationships by minimizing the loss between features of different modalities. They therefore share a drawback: simply imposing constraints on the last layer of each modality's network cannot fully exploit the complex relationships among multi-modal data.
Summary of the invention
To overcome the above deficiencies of the prior art, the present disclosure provides a cross-modal hashing retrieval method and system that fuse supervised information. The method is based on an end-to-end deep learning framework that learns feature representations and hash codes simultaneously, captures the correlations between data of different modalities more effectively than conventional learning algorithms, and thereby improves cross-modal retrieval precision.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
A cross-modal hashing retrieval method fusing supervised information, comprising the following steps:
constructing an image network, a text network, and a fusion network;
obtaining paired image and text feature training samples and feeding them to the image network and the text network respectively;
taking the output features of the image network and the text network as the input of the fusion network, and defining the output of the fusion network;
constructing an objective function for learning unified hash codes from the output of the fusion network and the pairwise similarity;
solving the objective function to obtain the unified hash codes;
using the unified hash codes as supervised information, combined with semantic information, training modality-specific hash networks.
Further, the image network comprises 5 convolutional layers and 3 fully connected layers; the text network comprises two fully connected layers; the fusion network comprises two fully connected layers. The numbers of hidden units in the last layers of the image network and the text network are equal, the second layer of the fusion network is the hash layer, and its activation function is the sign function.
Further, the output features of the image network and the text network are passed through a nonlinear activation function to obtain the input of the fusion network.
Further, the objective function for learning the unified hash codes is:
min_{B,θ} J = -Σ_{s_ij∈S} ( s_ij Θ_ij - log(1 + e^{Θ_ij}) ) + λ||B - H||_F^2 + η||H·1||_F^2, with Θ_ij = (1/2) H_{*i}^T H_{*j},
where the first term is the pairwise embedding constraint: H_{*i} and H_{*j} denote the fusion-network outputs of different training sample pairs, S = {s_ij} is the pairwise similarity matrix, B ∈ {-1, 1}^{k×n} is the unified hash-code matrix, p(s_ij | B) denotes the conditional probability distribution of s_ij given the hash codes B, and λ is a hyper-parameter; the second term minimizes the loss between the fusion-network output and the binary codes, with H = h(Z; θ_z) ∈ R^{k×n} the output of the fusion network; the third term is the balance constraint, used to maximize the information carried by each hash code, with η a hyper-parameter and ||·||_F the Frobenius norm.
Further, solving the objective function comprises:
initializing the image, text and fusion network parameters θ = {θ_v, θ_t, θ_z} and the batch size;
fixing the network parameters θ = {θ_v, θ_t, θ_z} and updating the unified hash codes B;
then fixing B and updating the parameters θ = {θ_v, θ_t, θ_z} with mini-batch stochastic gradient descent;
alternating the updates until convergence.
Further, in the modality-specific hash networks, the image network comprises 5 convolutional layers, 2 fully connected layers and 1 hash layer, and the text network comprises 1 fully connected layer and 1 hash layer; the activation function of the hash layer in both the image network and the text network is the sign function.
Further, training the modality-specific hash networks comprises: solving an overall objective function to obtain the parameters of the image network and the text network. The overall objective function is:
min J = J_1 + αJ_2 + βJ_3 + γJ_4,
where α, β, γ denote hyper-parameters. J_1 = -Σ_{s_ij∈S} ( s_ij Θ_ij - log(1 + e^{Θ_ij}) ) with Θ_ij = (1/2) F_{*i}^T G_{*j} is the pairwise cross-modal embedding constraint, where F_{*i} = f(v_i; θ_v) denotes the feature representation of the i-th sample output by the image network and G_{*j} = g(t_j; θ_t) denotes that of the j-th sample output by the text network. J_2 = ||B - F||_F^2 + ||B - G||_F^2 uses the unified hash codes obtained in the first stage as supervised information to train the modality-specific hash networks, where B ∈ {-1, 1}^{k×n} is the unified hash-code matrix, F is the image feature output and G is the text feature output. J_3 = ||W_1^T F - Y||_F^2 + ||W_2^T G - Y||_F^2 linearly maps the label information into the modality-specific networks, where W_1 and W_2 denote the mapping matrices of the image and text modalities respectively and Y denotes the semantic label matrix. J_4 = ||F·1||_F^2 + ||G·1||_F^2 is the balance constraint, used to maximize the information carried by each bit.
Further, solving the overall objective function comprises:
initializing the image network parameters θ_v, the text network parameters θ_t and the batch size;
fixing the parameters θ_v and θ_t, and solving the objective function to update W_1 and W_2;
then fixing W_1 and W_2 and updating the image parameters θ_v and the text parameters θ_t respectively with mini-batch stochastic gradient descent;
alternating the updates until convergence.
One or more embodiments provide a computer system comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the program, the processor implements the above cross-modal hashing retrieval method fusing supervised information.
One or more embodiments provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the above cross-modal hashing retrieval method fusing supervised information.
One or more of the above technical solutions have the following beneficial effects:
1. In traditional cross-modal hashing methods, feature extraction and hash-code learning are independent of each other. The present disclosure is based on an end-to-end deep learning framework that learns feature representations and hash codes simultaneously, and can therefore capture the correlations between data of different modalities more effectively.
2. The present disclosure feeds the features of different modalities into the fusion network in pairs to explore the correlations among multi-modal data through nonlinear transformations, and obtains high-quality hash codes to supervise the training of the modality-specific hash networks. The optimization problem is solved with an iterative updating strategy that keeps the hash codes discrete without relaxing them, which reduces the quantization error. The pairwise affinity information and the category information are embedded into the hash networks under the same framework, which well preserves the cross-modal similarity and the semantic consistency.
Detailed description of the invention
The accompanying drawings, which constitute a part of this application, are used to provide a further understanding of the application; the illustrative embodiments of the application and their explanations are used to explain the application and do not constitute an undue limitation on it.
Fig. 1 is a flow diagram of the cross-modal hashing retrieval method fusing supervised information in embodiment one;
Fig. 2 is a flow diagram of the cross-modal hashing retrieval method fusing supervised information in embodiment one.
Specific embodiment
It is noted that the following detailed description is exemplary and intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which the application belongs.
It should be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; in addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
In the absence of conflict, the embodiments in the application and the features of the embodiments may be combined with each other.
Embodiment one
This embodiment discloses a cross-modal hashing retrieval method fusing supervised information which, as shown in Figs. 1-2, comprises the following steps:
First stage: unified hash-code learning
Step 1: construct three networks: an image network, a text network and a fusion network. (1) The image network adopts the CNN-F network. The original CNN-F model has 8 layers in total, including 5 convolutional layers and 3 fully connected layers. (2) For the text modality, each text sample is first represented as a bag-of-words (BOW) vector, and the BOW vector is then fed to a text network with two fully connected layers. In particular, the numbers of hidden units in the last layers of the image and text networks are equal, with different values set according to the code length and the dataset. (3) The fusion network consists of two fully connected layers and combines the outputs of the image and text networks in pairs. To obtain the unified hash codes, the second layer of the fusion network is designed as a hash layer with k hidden units, and its activation function is the sign function.
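The three-network layout above can be sketched numerically. This is a minimal sketch, not the patented implementation: the CNN-F image branch is replaced by a single random fully connected layer, and the intermediate layer widths (512, 2048, 256) are illustrative assumptions; only the pairing of the two branch outputs into a two-layer fusion network ending in a k-unit hash layer follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 16                      # code length (illustrative)
d_img, d_txt = 4096, 1386   # CNN-F feature size / MIRFLICKR BOW size from the text

def fc(x, w):
    """A fully connected layer with tanh activation."""
    return np.tanh(x @ w)

# image branch: stand-in for CNN-F (the hidden width 512 is a placeholder)
w_img = rng.standard_normal((d_img, 512)) * 0.01
# text branch: two fully connected layers, same final width as the image branch
w_txt1 = rng.standard_normal((d_txt, 2048)) * 0.01
w_txt2 = rng.standard_normal((2048, 512)) * 0.01
# fusion network: two fully connected layers, the second being the hash layer
w_fus1 = rng.standard_normal((1024, 256)) * 0.01
w_fus2 = rng.standard_normal((256, k)) * 0.01

v = rng.standard_normal((8, d_img))   # a mini-batch of image features
t = rng.standard_normal((8, d_txt))   # the paired text BOW vectors

f = fc(v, w_img)                      # image-network output
g = fc(fc(t, w_txt1), w_txt2)         # text-network output
z = np.tanh(np.concatenate([f, g], axis=1))   # paired outputs fused nonlinearly
h = fc(z, w_fus1) @ w_fus2            # fusion-network output H (pre-activation)
b = np.where(h >= 0, 1, -1)           # hash layer: sign yields k-bit codes
```

The sign nonlinearity at the end is what makes the second fusion layer a hash layer: every sample pair in the batch comes out as a k-dimensional vector over {-1, 1}.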
Step 2: given a dataset O = {o_i = (v_i, t_i, y_i)}_{i=1}^n, where n is the number of training sample pairs, v_i denotes the image feature, t_i denotes the text feature and y_i denotes the semantic label vector. In addition, S = {s_ij} denotes the pairwise similarity matrix. The goal of this stage is to learn a compact binary code b_i ∈ {-1, 1}^k for each sample; B ∈ {-1, 1}^{k×n} denotes the unified hash-code matrix.
Step 3: let F_{*i} = f(v_i; θ_v) denote the feature representation output by the image network and G_{*i} = g(t_i; θ_t) that output by the text network. The outputs of the two modalities are combined through a nonlinear activation function (the tanh function), z_i = tanh([F_{*i}; G_{*i}]), to obtain the input of the fusion network. Further, the output of the fusion network is defined as H = h(Z; θ_z) ∈ R^{k×n}. To learn the unified hash codes, the objective function is constructed as:
min_{B,θ} J = -Σ_{s_ij∈S} ( s_ij Θ_ij - log(1 + e^{Θ_ij}) ) + λ||B - H||_F^2 + η||H·1||_F^2, with Θ_ij = (1/2) H_{*i}^T H_{*j}   (1)
where the first term is the pairwise embedding constraint: H_{*i} and H_{*j} denote the fusion-network outputs of different training sample pairs, S = {s_ij} is the pairwise similarity matrix, B ∈ {-1, 1}^{k×n} is the unified hash-code matrix, and p(s_ij | B) denotes the conditional probability distribution of s_ij given the hash codes B. Minimizing this negative log-likelihood preserves the similarities in matrix S, i.e., it makes the similarity (inner product) between two similar samples as large as possible and that between dissimilar samples as small as possible. The second term minimizes the loss between the fusion-network output and the binary codes, so that the learned unified hash codes preserve the nonlinear correlations among the training samples well. The third term is the balance constraint, used to maximize the information carried by each hash code, i.e., requiring each bit to take 1 or -1 with equal probability. λ and η are hyper-parameters (λ > 0, η > 0), and ||·||_F denotes the Frobenius norm.
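The three-term stage-one loss described above can be evaluated directly on small random stand-ins. The sketch below assumes the standard DCMH-style negative log-likelihood form for the pairwise term (Θ_ij = ½ H_{*i}ᵀ H_{*j}), which matches the inner-product description in the text; the sizes and hyper-parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 8, 6
H = rng.standard_normal((k, n))    # fusion-network outputs, one column per pair
B = np.where(rng.standard_normal((k, n)) >= 0, 1.0, -1.0)  # current binary codes
S = (rng.random((n, n)) > 0.5).astype(float)               # pairwise similarity labels
lam, eta = 1.0, 1.0                # the hyper-parameters λ and η

theta = 0.5 * H.T @ H              # Θ_ij = ½ H_{*i}ᵀ H_{*j}
# first term: negative log-likelihood of the pairwise similarities
j1 = -np.sum(S * theta - np.logaddexp(0.0, theta))  # logaddexp(0,x) = log(1 + e^x)
# second term: quantization loss between fusion output and binary codes
j2 = lam * np.sum((B - H) ** 2)
# third term: balance constraint, pushing each bit toward equal +1/-1 usage
j3 = eta * np.sum((H @ np.ones((n, 1))) ** 2)
J = j1 + j2 + j3
```

Using `np.logaddexp(0, θ)` instead of `np.log(1 + np.exp(θ))` keeps the likelihood term numerically stable when the inner products grow large.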
Step 4: the optimization problem of formula (1) is solved with an iterative updating strategy: fix the network parameters θ = {θ_v, θ_t, θ_z} to learn the unified hash codes B, then fix B and update the parameters θ = {θ_v, θ_t, θ_z} with mini-batch stochastic gradient descent (SGD); the two updates alternate until convergence, yielding the optimal unified hash codes B. Specifically, the steps are:
initialize the image, text and fusion network parameters θ = {θ_v, θ_t, θ_z} and the batch size;
fix the network parameters θ = {θ_v, θ_t, θ_z} and update the unified hash codes according to
B = sign(λH);
then fix B and update the parameters θ = {θ_v, θ_t, θ_z} with mini-batch stochastic gradient descent, computing the gradients by back-propagation;
alternate the updates until convergence.
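The alternation can be illustrated with the network SGD step replaced by plain gradient descent on a free matrix H; this is only a stand-in for back-propagation through the three networks, and the step size and iteration count are arbitrary. The B-step is exactly the discrete update B = sign(λH), with no relaxation.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 8, 20
H = rng.standard_normal((k, n))    # stands in for the fusion-network output
lam, eta, lr = 1.0, 0.1, 0.05
ones = np.ones((n, 1))

for _ in range(200):
    # B-step: with θ fixed, the discrete minimizer of λ||B - H||² is B = sign(λH)
    B = np.where(lam * H >= 0, 1.0, -1.0)
    # θ-step stand-in: descend the quantization and balance terms w.r.t. H
    grad = 2.0 * lam * (H - B) + 2.0 * eta * (H @ ones) @ ones.T
    H -= lr * grad

quant_err = np.mean((B - H) ** 2)  # small once the alternation has settled
```

Because B stays in {-1, 1} throughout, the quantization error measured at the end comes only from the balance term pulling the row sums of H toward zero, not from any continuous relaxation of the codes.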
Second stage: modality-specific hash network training
Step 1: redesign the image network and the text network for training the modality-specific hash networks. Except that the last fully connected layer of each of the image and text networks is replaced with a hash layer (with k hidden units) whose activation function is the sign function, the settings of the other layers are the same as in the previous stage.
Step 2: this stage mainly trains the image network f(V; θ_v) and the text network g(T; θ_t) to obtain the corresponding hash functions h_v(·) and h_t(·), which encode samples outside the training data.
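Once h_v(·) and h_t(·) are available, cross-modal retrieval reduces to Hamming ranking of database codes against a query code. A sketch with the hash networks mocked as random linear maps followed by sign (stand-ins only; the actual networks are the CNN-F and fully connected models described above):

```python
import numpy as np

rng = np.random.default_rng(3)
k = 16

def hash_fn(x, w):
    """Mock modality-specific hash function: sign of a linear map (stand-in)."""
    return np.where(x @ w >= 0, 1, -1)

w_img = rng.standard_normal((64, k))   # mock parameters of h_v
w_txt = rng.standard_normal((32, k))   # mock parameters of h_t

db = hash_fn(rng.standard_normal((100, 32)), w_txt)  # text database codes
q = hash_fn(rng.standard_normal((1, 64)), w_img)     # one image query (I2T task)

# Hamming distance from the code inner product: d = (k - <b_q, b_i>) / 2
ham = (k - q @ db.T) // 2
ranking = np.argsort(ham[0], kind="stable")          # nearest texts first
```

The inner-product identity avoids any bitwise loop: since both codes lie in {-1, 1}^k, agreeing bits contribute +1 and disagreeing bits -1, so the distance follows directly from the dot product.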
Step 3: define the overall objective function:
min J = J_1 + αJ_2 + βJ_3 + γJ_4   (2)
where J_1 is the pairwise cross-modal embedding constraint, used to preserve the cross-modal similarity between the outputs of the image and text networks; J_2 uses the unified hash codes obtained in the first stage as supervised information to train the modality-specific hash networks; J_3 directly maps the label information linearly into the modality-specific networks, to fully exploit the semantic information; J_4 is the balance constraint, used to maximize the information carried by each bit; α, β, γ denote hyper-parameters. The terms are defined as follows:
J_1 = -Σ_{s_ij∈S} ( s_ij Θ_ij - log(1 + e^{Θ_ij}) ), with Θ_ij = (1/2) F_{*i}^T G_{*j}, where F_{*i} = f(v_i; θ_v) denotes the feature representation of the i-th sample output by the image network and G_{*j} = g(t_j; θ_t) denotes that of the j-th sample output by the text network;
J_2 = ||B - F||_F^2 + ||B - G||_F^2, where B ∈ {-1, 1}^{k×n} is the unified hash-code matrix, F is the image feature output and G is the text feature output;
J_3 = ||W_1^T F - Y||_F^2 + ||W_2^T G - Y||_F^2, where W_1 and W_2 denote the mapping matrices of the image and text modalities respectively and Y denotes the semantic label matrix;
J_4 = ||F·1||_F^2 + ||G·1||_F^2.
Step 4: the optimization problem of formula (2) is likewise solved with an iterative updating strategy: each parameter is updated while the others are fixed. In particular, mini-batch stochastic gradient descent with the back-propagation (BP) algorithm is used to update the parameters θ_v and θ_t. Specifically, the steps are:
initialize the image network parameters θ_v, the text network parameters θ_t and the batch size;
fix the parameters θ_v and θ_t, and solve the objective function to update W_1 and W_2;
then fix W_1 and W_2 and update the image parameters θ_v and the text parameters θ_t respectively with mini-batch stochastic gradient descent, computing the gradients by back-propagation;
alternate the updates until convergence.
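With θ_v and θ_t fixed, updating a mapping matrix is a linear least-squares problem with a closed-form solution. The sketch below adds a small ridge term μ (an assumption, included only so the normal-equation matrix is guaranteed invertible) and verifies the stationarity condition for W1; W2 follows symmetrically with G in place of F.

```python
import numpy as np

rng = np.random.default_rng(5)
k, n, c = 8, 50, 4
F = rng.standard_normal((k, n))               # fixed image-network outputs
Y = (rng.random((c, n)) > 0.5).astype(float)  # label matrix

# minimize ||W1.T @ F - Y||_F^2 + mu * ||W1||_F^2  (mu is an assumed regularizer)
mu = 1e-3
W1 = np.linalg.solve(F @ F.T + mu * np.eye(k), F @ Y.T)

# stationarity check: the gradient w.r.t. W1 vanishes at the solution
grad = 2.0 * F @ (F.T @ W1 - Y.T) + 2.0 * mu * W1
assert np.allclose(grad, 0.0, atol=1e-8)
```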
Experiments were conducted on the MIRFLICKR-25K and NUS-WIDE datasets respectively.
The MIRFLICKR-25K dataset contains 25,000 samples collected from the Flickr website; each sample comprises one picture and several textual tags. There are 24 labels in total, and each sample is annotated with at least one of them. Samples with at least 20 textual tags were selected for the experiments, giving 20,015 image-text pairs in total. The text modality is represented as a 1,386-dimensional BOW vector, and for the image modality the raw pixels are used directly as input. In the experiments, 2,000 samples were taken at random as queries, and the rest served as the retrieval database. To reduce the computational cost, 5,000 samples were taken from the database for training.
NUS-WIDE is a real-world web image database containing 269,648 samples annotated with 81 concept labels; each sample comprises one picture and its associated textual tags. In the experiments, the 10 largest classes were chosen to form a subset containing 186,577 image-text pairs in total. For each sample, the text modality is represented as a 1,000-dimensional BOW vector, and the image modality directly uses the raw pixels as input. On this dataset, 2,000 samples were randomly sampled as queries and the rest served as the database; similarly, 5,000 data points were taken at random from the database for training.
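The query/database/training split described for both datasets amounts to the following simple protocol (sizes taken from the MIRFLICKR-25K numbers in the text):

```python
import numpy as np

rng = np.random.default_rng(6)
n_total, n_query, n_train = 20015, 2000, 5000  # MIRFLICKR-25K sizes from the text

idx = rng.permutation(n_total)
query_idx = idx[:n_query]     # 2,000 random samples form the query set
db_idx = idx[n_query:]        # the remaining samples form the retrieval database
train_idx = rng.choice(db_idx, size=n_train, replace=False)  # 5,000 for training
```

Sampling the training points from the database (not from the full set) keeps the query set strictly unseen during training, as the protocol requires.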
This embodiment is implemented under the MatConvNet framework. For the image network, the CNN-F network pre-trained on the ImageNet dataset is used for initialization; the parameters of the other deep neural networks are initialized randomly. For the text network with two fully connected layers, the dimensions are set to [8192 → 2500] on the MIRFLICKR-25K dataset; on the NUS-WIDE dataset, the dimensions are set to [8192 → 1000] for code lengths 16 and 32, and to [8192 → 600] for code length 64. For the fusion network, which combines the outputs of the image and text networks in pairs, the dimensions of its fully connected layers are set to [4096 → k] on all datasets. In the experiments, all hyper-parameters are empirically set to 1, the learning rate is varied from 10^{-1.5} to 10^{-3}, and the number of outer-loop iterations in the algorithm is set to 500. The algorithm proceeds as follows.
Stage 1: unified hash-code learning
Input: image set V and text set T; pairwise similarity matrix S; hyper-parameters λ, η; code length k.
Output: unified hash-code matrix B.
Initialization: initialize the image, text and fusion network parameters θ = {θ_v, θ_t, θ_z}; batch size N_v = N_t = 128; iteration counts.
Repeat:
1. Fix the parameters θ = {θ_v, θ_t, θ_z} and update B according to B = sign(λH).
2. For iter = 1, 2, ..., t_z:
(a) randomly sample N_v and N_t data points from V and T respectively to construct a mini-batch;
(b) for each paired sample v_i and t_i in the mini-batch, compute f(v_i; θ_v), g(t_i; θ_t) and h(z_i; θ_z) by forward propagation;
(c) compute the gradient of the top layer;
(d) back-propagate through the image, text and fusion networks and update the parameters θ = {θ_v, θ_t, θ_z}.
Until convergence.
Stage 2: modality-specific hash network training
Input: image set V and text set T; pairwise similarity matrix S; label matrix Y; the learned hash-code matrix B; hyper-parameters α, β, γ; code length k.
Output: modality-specific hash network parameters θ_v and θ_t.
Initialization: initialize the image and text network parameters θ_v and θ_t; batch size N_v = N_t = 128; iteration counts.
Repeat:
1. Fix the parameters θ_v and θ_t, and update W_1 and W_2 by solving the objective function.
2. For iter = 1, 2, ..., t_v:
(a) randomly sample N_v data points from V to construct a mini-batch;
(b) for each sample v_i, compute f(v_i; θ_v) by forward propagation;
(c) back-propagate the derivatives and update θ_v.
3. For iter = 1, 2, ..., t_t:
(a) randomly sample N_t data points from T to construct a mini-batch;
(b) for each sample t_i, compute g(t_i; θ_t) by forward propagation;
(c) back-propagate the derivatives and update θ_t.
Until convergence.
The method was tested on both datasets and compared with 6 popular existing methods (LSSH, CMFH, DCH, SCM, SePHkm, DCMH). To ensure a fair comparison, the CNN features extracted from the 7th layer of this method's image network were used for the shallow baselines. As can be seen from Tables 1-2, the method provided in this embodiment shows better retrieval performance than the other methods on all datasets.
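Retrieval quality in such comparisons is commonly reported as mean average precision (MAP) over the Hamming ranking. The metric itself (general background, not this patent's reported numbers) can be computed as:

```python
import numpy as np

def average_precision(relevance_ranked):
    """AP for one query: `relevance_ranked` is the 0/1 relevance of database
    items in ranked (nearest-first) order. MAP averages this over all queries."""
    rel = np.asarray(relevance_ranked, dtype=float)
    if rel.sum() == 0:
        return 0.0
    hits = np.flatnonzero(rel)                       # 0-based ranks of relevant items
    precision_at_hits = np.cumsum(rel)[hits] / (hits + 1)
    return float(precision_at_hits.mean())

# toy check: relevant items returned at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
ap = average_precision([1, 0, 1, 0])
assert abs(ap - 5.0 / 6.0) < 1e-12
```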
Table 1
Table 2
Embodiment two
The purpose of this embodiment is to provide a computing device.
A computer system comprises a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the program, the processor implements:
constructing an image network, a text network, and a fusion network;
obtaining paired image and text feature training samples and feeding them to the image network and the text network respectively;
taking the output features of the image network and the text network as the input of the fusion network, and defining the output of the fusion network;
constructing an objective function for learning unified hash codes from the output of the fusion network and the pairwise similarity;
solving the objective function to obtain the unified hash codes;
using the unified hash codes as supervised information, combined with semantic information, training modality-specific hash networks.
Embodiment three
The purpose of this embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium has a computer program stored thereon; when executed by a processor, the program implements the following steps:
constructing an image network, a text network, and a fusion network;
obtaining paired image and text feature training samples and feeding them to the image network and the text network respectively;
taking the output features of the image network and the text network as the input of the fusion network, and defining the output of the fusion network;
constructing an objective function for learning unified hash codes from the output of the fusion network and the pairwise similarity;
solving the objective function to obtain the unified hash codes;
using the unified hash codes as supervised information, combined with semantic information, training modality-specific hash networks.
The steps involved in embodiments two and three above correspond to method embodiment one; for specific implementations, refer to the relevant description of embodiment one. The term "computer-readable storage medium" should be understood as a single medium or multiple media including one or more instruction sets, and as any medium capable of storing, encoding or carrying an instruction set for execution by a processor that causes the processor to perform any method of the present disclosure.
Those skilled in the art will understand that the modules or steps of the application described above can be implemented by a general-purpose computing device; optionally, they can be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, or fabricated into individual integrated circuit modules, or multiple modules or steps among them fabricated into a single integrated circuit module. The application is not limited to any specific combination of hardware and software.
The above descriptions are merely preferred embodiments of the application and are not intended to limit it; for those skilled in the art, various modifications and changes may be made to the application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the application shall be included within its scope of protection.
Although the specific embodiments of the application have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the application; those skilled in the art should understand that, on the basis of the technical solutions of the application, various modifications or variations that can be made without creative effort still fall within the protection scope of the application.
Claims (8)
1. a kind of cross-module state Hash search method for merging supervision message, which comprises the following steps:
Construct image network, text network and converged network;
Image and text feature training sample pair are obtained, respectively input picture network and text network;
Using image network and the output feature of text network as the input of the converged network, and define the converged network
Output;
According to the output of the converged network and pair between similitude building learn the objective functions of unified Hash codes;
The objective function is solved, unified Hash codes are obtained;
Using the unified Hash codes as supervision message, in conjunction with semantic information, the Hash network of training modality-specific;
The objective function for learning unified Hash codes are as follows:
Wherein, embedded constraint item between first item is pair, and
Wherein H*i、H*jRespectively indicate the converged network output of different training samples pair, S={ sijExpression pair
Between similarity matrix, B ∈ { -1,1 }k×nIndicate unified Hash codes matrix, p (sij| B) when indicating given Hash codes B, sijItem
Part probability distribution, λ indicate super ginseng;Section 2 minimizes the loss between the output and binary code of converged network, H=h (Z;
θz)∈Rk×nFor the output of converged network;Section 3 is Constraints of Equilibrium item, for maximizing the information of each Hash codes, η table
Show super ginseng,Indicate F norm;
the training of the modality-specific hash networks comprises: solving an overall objective function to obtain the parameters of the image network and the text network; the overall objective function combines four terms J1, J2, J3 and J4, weighted by hyperparameters α, β and γ;
wherein J1 is the inter-modal pairwise embedding constraint, in which F*i = f(vi; θv) denotes the feature representation of the i-th sample output by the image network, and G*j = g(tj; θt) denotes the feature representation of the j-th sample output by the text network; J2 trains the modality-specific hash networks using the unified hash codes obtained in the first stage as supervision information, where B ∈ {−1, 1}^(k×n) denotes the unified hash code matrix, F denotes the image feature output and G denotes the text feature output; J3 linearly maps the label information to the modality-specific networks, where W1 and W2 respectively denote the mapping matrices of the image and text modalities, and Y denotes the semantic matrix; J4 is the balance constraint, used to maximize the information of each hash bit.
2. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that the image network comprises 5 convolutional layers and 3 fully connected layers; the text network comprises two fully connected layers; the fusion network comprises two fully connected layers; wherein the numbers of hidden units in the last layers of the image network and the text network are equal, the second layer of the fusion network is a hash layer, and its activation function is a discriminant function.
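As a concrete reading of the fusion network just described, its forward pass might be sketched as follows; the layer sizes, the ReLU in the first layer, and the use of sign(·) as the "discriminant" activation of the hash layer are all illustrative assumptions rather than details given in the claim.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden, k = 16, 32, 8   # illustrative sizes; k is the hash code length
W1 = rng.normal(scale=0.1, size=(d_hidden, d_in))
W2 = rng.normal(scale=0.1, size=(k, d_hidden))

def fusion_forward(z):
    """Two fully connected layers; the second is the hash layer."""
    h = np.maximum(W1 @ z, 0.0)   # first fully connected layer (ReLU assumed)
    return np.sign(W2 @ h)        # hash layer with a sign-type discriminant activation
```

The input z here stands for the concatenated output features of the image and text networks, whose last-layer widths are equal per the claim.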
3. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that the output features of the image network and the text network are passed through a nonlinear activation function to obtain the input of the fusion network.
4. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that solving the objective function comprises:
initializing the image, text and fusion network parameters θ = {θv, θt, θz} and the batch size;
fixing the network parameters θ = {θv, θt, θz}, and updating the unified hash codes B;
then fixing B, and updating the parameters θ = {θv, θt, θz} using mini-batch stochastic gradient descent;
alternating the two updates until convergence.
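The alternating scheme of claim 4 can be sketched with toy stand-ins. Below, the fusion output is modelled as h(θ) = tanh(θ) with θ a k×n matrix, the B-step uses the closed form B = sign(H) implied by a quantization term λ‖B − H‖F², and the θ-step takes a full-batch gradient step on that term only, as a stand-in for mini-batch SGD on the full objective; all of these choices are illustrative assumptions, not the patent's actual networks.

```python
import numpy as np

def alternate_optimize(theta0, lam=1.0, lr=0.1, iters=50):
    """Alternate between a closed-form B-update and a gradient theta-update."""
    theta = theta0.copy()
    losses = []
    for _ in range(iters):
        H = np.tanh(theta)                 # toy fusion-network output h(theta)
        B = np.sign(H)                     # theta fixed: closed-form update of B
        losses.append(lam * np.sum((B - H) ** 2))
        # B fixed: gradient of lam*||B - H||^2 through tanh (1 - H^2 is tanh')
        grad = lam * 2.0 * (H - B) * (1.0 - H ** 2)
        theta -= lr * grad
    return B, losses
```

With a small learning rate the quantization loss decreases monotonically, illustrating why the "constantly alternate until convergence" step is well behaved.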
5. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that, in the modality-specific hash networks, the image network comprises 5 convolutional layers, 2 fully connected layers and 1 hash layer, and the text network comprises 1 fully connected layer and 1 hash layer; wherein the activation functions of the hash layers in the image network and the text network are discriminant functions.
6. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that solving the overall objective function comprises:
initializing the image network parameters θv, the text network parameters θt and the batch size;
fixing the parameters θv and θt, and solving the objective function to obtain W1 and W2;
then fixing W1 and W2, and updating the network parameters using mini-batch stochastic gradient descent;
alternating the two updates until convergence.
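The "solve the objective function to obtain W1 and W2" step of claim 6 admits a closed form if the label-mapping term J3 is a regularized least-squares expression of the form ‖Y − W F‖F² + reg·‖W‖F² (an assumption; the claim does not give the exact expression). Under that assumption, each mapping matrix has the ridge-regression solution sketched below.

```python
import numpy as np

def solve_mapping(F, Y, reg=1.0):
    """Closed-form update for a linear label-mapping matrix W (assumed form).

    Minimizes ||Y - W F||_F^2 + reg * ||W||_F^2 for fixed features F
    (k x n) and label matrix Y (c x n), giving
    W = Y F^T (F F^T + reg * I)^{-1}.
    """
    k = F.shape[0]
    return Y @ F.T @ np.linalg.inv(F @ F.T + reg * np.eye(k))
```

The same formula would be applied once with the image features (for W1) and once with the text features (for W2) in each outer iteration, before the SGD step on the network parameters.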
7. A computer system comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the cross-modal hash retrieval method fusing supervision information according to any one of claims 1 to 6.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the cross-modal hash retrieval method fusing supervision information according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811269037.9A CN109299216B (en) | 2018-10-29 | 2018-10-29 | A kind of cross-module state Hash search method and system merging supervision message |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109299216A CN109299216A (en) | 2019-02-01 |
CN109299216B true CN109299216B (en) | 2019-07-23 |
Family
ID=65158169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811269037.9A Expired - Fee Related CN109299216B (en) | 2018-10-29 | 2018-10-29 | A kind of cross-module state Hash search method and system merging supervision message |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109299216B (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960732B (en) * | 2019-03-29 | 2023-04-18 | 广东石油化工学院 | Deep discrete hash cross-modal retrieval method and system based on robust supervision |
CN110059198B (en) * | 2019-04-08 | 2021-04-13 | 浙江大学 | Discrete hash retrieval method of cross-modal data based on similarity maintenance |
CN110059154B (en) * | 2019-04-10 | 2022-04-15 | 山东师范大学 | Cross-modal migration hash retrieval method based on inheritance mapping |
CN110083532B (en) * | 2019-04-12 | 2023-05-23 | 中科寒武纪科技股份有限公司 | Method and device for positioning operation errors in fusion mode based on deep learning framework |
CN110222140B (en) * | 2019-04-22 | 2021-07-13 | 中国科学院信息工程研究所 | Cross-modal retrieval method based on counterstudy and asymmetric hash |
CN110188209B (en) * | 2019-05-13 | 2021-06-04 | 山东大学 | Cross-modal Hash model construction method based on hierarchical label, search method and device |
CN110188223B (en) * | 2019-06-06 | 2022-10-04 | 腾讯科技(深圳)有限公司 | Image processing method and device and computer equipment |
CN111127385B (en) * | 2019-06-06 | 2023-01-13 | 昆明理工大学 | Medical information cross-modal Hash coding learning method based on generative countermeasure network |
CN110298395B (en) * | 2019-06-18 | 2023-04-18 | 天津大学 | Image-text matching method based on three-modal confrontation network |
CN110647804A (en) * | 2019-08-09 | 2020-01-03 | 中国传媒大学 | Violent video identification method, computer system and storage medium |
CN110597878B (en) * | 2019-09-16 | 2023-09-15 | 广东工业大学 | Cross-modal retrieval method, device, equipment and medium for multi-modal data |
CN110750660B (en) * | 2019-10-08 | 2023-03-10 | 西北工业大学 | Half-pairing multi-mode data hash coding method |
CN110765281A (en) * | 2019-11-04 | 2020-02-07 | 山东浪潮人工智能研究院有限公司 | Multi-semantic depth supervision cross-modal Hash retrieval method |
CN113064959B (en) * | 2020-01-02 | 2022-09-23 | 南京邮电大学 | Cross-modal retrieval method based on deep self-supervision sorting Hash |
CN111241310A (en) * | 2020-01-10 | 2020-06-05 | 济南浪潮高新科技投资发展有限公司 | Deep cross-modal Hash retrieval method, equipment and medium |
CN111353076B (en) * | 2020-02-21 | 2023-10-10 | 华为云计算技术有限公司 | Method for training cross-modal retrieval model, cross-modal retrieval method and related device |
CN111460201B (en) * | 2020-03-04 | 2022-09-23 | 南京邮电大学 | Cross-modal retrieval method for modal consistency based on generative countermeasure network |
CN111782921A (en) * | 2020-03-25 | 2020-10-16 | 北京沃东天骏信息技术有限公司 | Method and device for searching target |
CN111599438B (en) * | 2020-04-02 | 2023-07-28 | 浙江工业大学 | Real-time diet health monitoring method for diabetics based on multi-mode data |
CN111753190A (en) * | 2020-05-29 | 2020-10-09 | 中山大学 | Meta learning-based unsupervised cross-modal Hash retrieval method |
CN111914156B (en) * | 2020-08-14 | 2023-01-20 | 中国科学院自动化研究所 | Cross-modal retrieval method and system for self-adaptive label perception graph convolution network |
CN111914950B (en) * | 2020-08-20 | 2021-04-16 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Unsupervised cross-modal retrieval model training method based on depth dual variational hash |
CN112559820B (en) * | 2020-12-17 | 2022-08-30 | 中国科学院空天信息创新研究院 | Sample data set intelligent question setting method, device and equipment based on deep learning |
CN112667841B (en) * | 2020-12-28 | 2023-03-24 | 山东建筑大学 | Weak supervision depth context-aware image characterization method and system |
CN112817604B (en) * | 2021-02-18 | 2022-08-05 | 北京邮电大学 | Android system control intention identification method and device, electronic equipment and storage medium |
CN112989097A (en) * | 2021-03-23 | 2021-06-18 | 北京百度网讯科技有限公司 | Model training and picture retrieval method and device |
CN113095415B (en) * | 2021-04-15 | 2022-06-14 | 齐鲁工业大学 | Cross-modal hashing method and system based on multi-modal attention mechanism |
CN113157739B (en) * | 2021-04-23 | 2024-01-09 | 平安科技(深圳)有限公司 | Cross-modal retrieval method and device, electronic equipment and storage medium |
CN113449849B (en) * | 2021-06-29 | 2022-05-27 | 桂林电子科技大学 | Learning type text hash method based on self-encoder |
CN113763441B (en) * | 2021-08-25 | 2024-01-26 | 中国科学院苏州生物医学工程技术研究所 | Medical image registration method and system without supervision learning |
CN114329109B (en) * | 2022-03-15 | 2022-06-03 | 山东建筑大学 | Multimodal retrieval method and system based on weakly supervised Hash learning |
CN114942984B (en) * | 2022-05-26 | 2023-11-21 | 北京百度网讯科技有限公司 | Pre-training and image-text retrieval method and device for visual scene text fusion model |
CN115687571B (en) * | 2022-10-28 | 2024-01-26 | 重庆师范大学 | Depth unsupervised cross-modal retrieval method based on modal fusion reconstruction hash |
CN115840827B (en) * | 2022-11-07 | 2023-09-19 | 重庆师范大学 | Deep unsupervised cross-modal hash retrieval method |
CN115982403B (en) * | 2023-01-12 | 2024-02-02 | 之江实验室 | Multi-mode hash retrieval method and device |
CN115880556B (en) * | 2023-02-21 | 2023-05-02 | 北京理工大学 | Multi-mode data fusion processing method, device, equipment and storage medium |
CN116049459B (en) * | 2023-03-30 | 2023-07-14 | 浪潮电子信息产业股份有限公司 | Cross-modal mutual retrieval method, device, server and storage medium |
CN116594994B (en) * | 2023-03-30 | 2024-02-23 | 重庆师范大学 | Application method of visual language knowledge distillation in cross-modal hash retrieval |
CN116244484B (en) * | 2023-05-11 | 2023-08-08 | 山东大学 | Federal cross-modal retrieval method and system for unbalanced data |
CN116431847B (en) * | 2023-06-14 | 2023-11-14 | 北京邮电大学 | Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure |
CN116825210B (en) * | 2023-08-28 | 2023-11-17 | 山东大学 | Hash retrieval method, system, equipment and medium based on multi-source biological data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107273505A (en) * | 2017-06-20 | 2017-10-20 | 西安电子科技大学 | Supervision cross-module state Hash search method based on nonparametric Bayes model |
CN107402993A (en) * | 2017-07-17 | 2017-11-28 | 山东师范大学 | The cross-module state search method for maximizing Hash is associated based on identification |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8131556B2 (en) * | 2007-04-03 | 2012-03-06 | Microsoft Corporation | Communications using different modalities |
CN104317837B (en) * | 2014-10-10 | 2017-06-23 | 浙江大学 | A kind of cross-module state search method based on topic model |
CN104899253B (en) * | 2015-05-13 | 2018-06-26 | 复旦大学 | Towards the society image across modality images-label degree of correlation learning method |
JP6656570B2 (en) * | 2015-07-13 | 2020-03-04 | 国立大学法人 筑波大学 | Cross-modal sensory analysis system, presentation information determination system, information presentation system, cross-modal sensory analysis program, presentation information determination program, and information presentation program |
CN107256271B (en) * | 2017-06-27 | 2020-04-03 | 鲁东大学 | Cross-modal Hash retrieval method based on mapping dictionary learning |
CN107871014A (en) * | 2017-11-23 | 2018-04-03 | 清华大学 | A kind of big data cross-module state search method and system based on depth integration Hash |
- 2018-10-29: CN CN201811269037.9A, granted as CN109299216B, status not active (Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
CN109299216A (en) | 2019-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109299216B (en) | A kind of cross-module state Hash search method and system merging supervision message | |
CN110334219B (en) | Knowledge graph representation learning method based on attention mechanism integrated with text semantic features | |
CN109165306B (en) | Image retrieval method based on multitask Hash learning | |
CN113707235B (en) | Drug micromolecule property prediction method, device and equipment based on self-supervision learning | |
CN109299341A (en) | One kind confrontation cross-module state search method dictionary-based learning and system | |
CN111753189A (en) | Common characterization learning method for few-sample cross-modal Hash retrieval | |
CN110826303A (en) | Joint information extraction method based on weak supervised learning | |
CN114418954A (en) | Mutual learning-based semi-supervised medical image segmentation method and system | |
CN110516095A (en) | Weakly supervised depth Hash social activity image search method and system based on semanteme migration | |
CN109840322A (en) | It is a kind of based on intensified learning cloze test type reading understand analysis model and method | |
CN112561064B (en) | Knowledge base completion method based on OWKBC model | |
CN112000770B (en) | Semantic feature graph-based sentence semantic matching method for intelligent question and answer | |
CN112199532B (en) | Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN114896434B (en) | Hash code generation method and device based on center similarity learning | |
CN111460824A (en) | Unmarked named entity identification method based on anti-migration learning | |
CN116932722A (en) | Cross-modal data fusion-based medical visual question-answering method and system | |
CN109960732A (en) | A kind of discrete Hash cross-module state search method of depth and system based on robust supervision | |
CN110598022B (en) | Image retrieval system and method based on robust deep hash network | |
CN116258990A (en) | Cross-modal affinity-based small sample reference video target segmentation method | |
CN115827954A (en) | Dynamically weighted cross-modal fusion network retrieval method, system and electronic equipment | |
CN114021584A (en) | Knowledge representation learning method based on graph convolution network and translation model | |
CN116720519B (en) | Seedling medicine named entity identification method | |
CN112668633B (en) | Adaptive graph migration learning method based on fine granularity field | |
CN109978013A (en) | A kind of depth clustering method for figure action identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20190723 Termination date: 20211029 |