CN109299216B - Cross-modal hash retrieval method and system fusing supervision information - Google Patents

Info

Publication number
CN109299216B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201811269037.9A
Other languages
Chinese (zh)
Other versions
CN109299216A (en
Inventor
张化祥
王粒
冯珊珊
任玉伟
刘丽
张庆科
朱磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN201811269037.9A
Publication of CN109299216A
Application granted
Publication of CN109299216B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a cross-modal hash retrieval method and system that fuse supervision information. The method comprises: constructing an image network, a text network and a fusion network; obtaining image and text feature training sample pairs and feeding them into the image network and the text network, respectively; taking the output features of the image network and the text network as the input of the fusion network, and defining the output of the fusion network; constructing an objective function for learning unified hash codes from the output of the fusion network and the pairwise similarity; solving the objective function to obtain the unified hash codes; and training modality-specific hash networks using the unified hash codes as supervision information, combined with semantic information. The invention is based on an end-to-end deep learning framework that learns feature representations and hash codes simultaneously, captures the correlation between data of different modalities more effectively, and helps improve cross-modal retrieval accuracy.

Description

Cross-modal hash retrieval method and system fusing supervision information
Technical field
The present disclosure relates to cross-modal retrieval methods, and more specifically to a cross-modal hash retrieval method and system that fuse supervision information.
Background
In recent years, with the sharp growth of different types of data on the Internet, approximate nearest neighbor (ANN) search has played an increasingly important role in related applications such as information retrieval, data mining and computer vision. Because of its low computational cost and high storage efficiency, hashing has become one of the most popular techniques for ANN search. The basic idea of hashing is to learn hash functions that map high-dimensional data into a Hamming space of compact binary codes while preserving the similarity structure of the original space as far as possible. Many hashing methods for single-modality scenarios have been proposed. In the real world, however, data with the same semantics often exist in multiple modalities, such as images, text and video. To make full use of the relationships among heterogeneous data, it is necessary to develop cross-modal hashing (CMH) methods for ANN search. Specifically, in cross-modal similarity search, the modality of the query differs from the modality of the retrieved data. The present disclosure analyzes and tests the image-to-text (I2T) and text-to-image (T2I) retrieval tasks, and the method can be extended to retrieval between any other modalities.
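To make the retrieval step concrete, the following minimal sketch ranks database items by Hamming distance to a query code. It assumes codes are stored as lists of +1/-1 values; the function names and the toy data are hypothetical and are not taken from the patent.

```python
def hamming_distance(a, b):
    """Number of positions where two equal-length +1/-1 codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query_code, database_codes):
    """Return database indices sorted by increasing Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Toy example: a 4-bit image query code against three text codes.
image_query = [1, -1, 1, 1]
text_database = [[-1, 1, -1, -1], [1, -1, 1, -1], [1, -1, 1, 1]]
ranking = rank_by_hamming(image_query, text_database)
```

Because the codes of both modalities live in the same Hamming space, the same ranking routine serves the I2T and T2I directions.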
Most existing cross-modal hashing (CMH) methods are based on hand-crafted features, and feature extraction and hash code learning are carried out independently. This may limit the discriminative representation of samples and in turn harm the accuracy of the learned hash codes. Recently, deep-learning-based hashing methods have proposed end-to-end learning frameworks that learn feature representations and hash codes simultaneously, and they capture the nonlinear correlations between modalities more effectively than shallow learning methods. As a classic method, deep cross-modal hashing (DCMH) extends the traditional deep model to cross-modal retrieval and applies an end-to-end learning framework with a deep neural network to each modality. Pairwise relationship guided deep hashing (PRDH) further integrates several kinds of pairwise constraints to enhance the similarity of hash codes both between and within modalities.
In the deep cross-modal hashing frameworks mentioned above, the hash codes of paired samples from two different modalities are usually forced to be identical. Moreover, these methods learn the feature representation of each single sample through the deep neural network of its modality, and then minimize the loss between the features of different modalities to establish the heterogeneous relationship. They therefore share a drawback: simply imposing constraints on the last layer of the modality-specific neural networks cannot fully exploit the complex relationships in multi-modal data.
Summary of the invention
To overcome the above deficiencies of the prior art, the present disclosure provides a cross-modal hash retrieval method and system that fuse supervision information. The method is based on an end-to-end deep learning framework that learns feature representations and hash codes simultaneously, captures the correlation between data of different modalities more effectively than conventional learning algorithms, and helps improve cross-modal retrieval accuracy.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
A cross-modal hash retrieval method fusing supervision information, comprising the following steps:
Constructing an image network, a text network and a fusion network;
Obtaining image and text feature training sample pairs and feeding them into the image network and the text network, respectively;
Taking the output features of the image network and the text network as the input of the fusion network, and defining the output of the fusion network;
Constructing an objective function for learning unified hash codes from the output of the fusion network and the pairwise similarity;
Solving the objective function to obtain the unified hash codes;
Training modality-specific hash networks using the unified hash codes as supervision information, combined with semantic information.
Further, the image network comprises 5 convolutional layers and 3 fully connected layers; the text network comprises two fully connected layers; the fusion network comprises two fully connected layers; the last layers of the image network and the text network have the same number of hidden units, and the second layer of the fusion network is a hash layer whose activation function is a discriminant function.
Further, the output features of the image network and the text network are passed through a nonlinear activation function to obtain the input of the fusion network.
Further, the objective function for learning the unified hash codes is as follows:
Where the first term is the pairwise embedding constraint, in which H_{*i} and H_{*j} denote the fusion network outputs for different training sample pairs, S = {s_ij} denotes the pairwise similarity matrix, B ∈ {-1,1}^{k×n} denotes the unified hash code matrix, p(s_ij | B) denotes the conditional probability distribution of s_ij given the hash codes B, and λ is a hyperparameter; the second term minimizes the loss between the output of the fusion network and the binary codes, where H = h(Z; θ_z) ∈ R^{k×n} is the output of the fusion network; the third term is the balance constraint, which maximizes the information carried by each hash bit, η is a hyperparameter, and ‖·‖_F denotes the Frobenius norm.
Further, solving the objective function comprises:
Initializing the image, text and fusion network parameters θ = {θ_v, θ_t, θ_z} and the batch size;
Fixing the network parameters θ = {θ_v, θ_t, θ_z} and updating the unified hash codes B;
Then fixing B and updating the parameters θ = {θ_v, θ_t, θ_z} by mini-batch stochastic gradient descent;
Alternating these updates until convergence.
Further, in the modality-specific hash networks, the image network comprises 5 convolutional layers, 2 fully connected layers and 1 hash layer, and the text network comprises 1 fully connected layer and 1 hash layer; the activation function of the hash layers in the image network and the text network is a discriminant function.
Further, training the modality-specific hash networks comprises solving an overall objective function to obtain the parameters of the image network and the text network; the objective function is as follows:
Where α, β and γ are hyperparameters; J_1 is the inter-modal pairwise embedding constraint, in which F_{*i} = f(v_i; θ_v) denotes the feature representation of the i-th sample output by the image network and G_{*j} = g(t_j; θ_t) denotes the feature representation of the j-th sample output by the text network; J_2 uses the unified hash codes obtained in the first stage as supervision information to train the modality-specific hash networks, where B ∈ {-1,1}^{k×n} denotes the unified hash code matrix, F denotes the image feature output and G denotes the text feature output; J_3 linearly maps the label information to the modality-specific networks, where W_1 and W_2 denote the mapping matrices of the image and text modalities, respectively, and Y denotes the semantic label matrix; J_4 is the balance constraint, which maximizes the information carried by each bit.
Further, solving the overall objective function comprises:
Initializing the image network parameters θ_v, the text network parameters θ_t and the batch size;
Fixing the parameters θ_v and θ_t, solving the objective function and updating W_1 and W_2;
Then fixing W_1 and W_2 and updating the image parameters θ_v and the text parameters θ_t separately by mini-batch stochastic gradient descent;
Alternating these updates until convergence.
One or more embodiments provide a computer system comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the cross-modal hash retrieval method fusing supervision information when executing the program.
One or more embodiments provide a computer-readable storage medium on which a computer program is stored, wherein the program implements the cross-modal hash retrieval method fusing supervision information when executed by a processor.
One or more of the above technical solutions have the following beneficial effects:
1. In traditional cross-modal hashing methods, feature extraction and hash code learning are independent of each other. The present disclosure is based on an end-to-end deep learning framework that learns feature representations and hash codes simultaneously, and can capture the correlation between data of different modalities more effectively.
2. The present disclosure feeds the features of different modalities into the fusion network in pairs, explores the correlation in multi-modal data through a nonlinear transformation, and obtains high-quality hash codes to supervise the training of the modality-specific hash networks. The optimization problem is solved with an iterative update strategy that keeps the hash codes discrete during optimization instead of relaxing them, which reduces quantization error. Pairwise similarity information and class information are embedded into the hash networks within a single framework, which preserves inter-modal similarity and semantic consistency well.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the application. The illustrative embodiments of the application and their descriptions serve to explain the application and do not unduly limit it.
Fig. 1 is a flow diagram of the cross-modal hash retrieval method fusing supervision information in Embodiment 1;
Fig. 2 is a flow diagram of the cross-modal hash retrieval method fusing supervision information in Embodiment 1.
Detailed description of the embodiments
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the art to which this application belongs.
It should be noted that the terminology used herein is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. In addition, it should be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
Where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another.
Embodiment 1
This embodiment discloses a cross-modal hash retrieval method fusing supervision information which, as shown in Figs. 1-2, comprises the following steps:
First stage: unified hash code learning
Step 1: Construct three networks: an image network, a text network and a fusion network. (1) The image network uses the CNN-F network. The original CNN-F model has 8 layers: 5 convolutional layers and 3 fully connected layers. (2) For the text modality, each text sample is first represented as a bag-of-words (BoW) vector, which is then fed into a text network with two fully connected layers. In particular, the last layers of the image and text networks have the same number of hidden units, and this number is set to different values depending on the code length and the dataset. (3) The fusion network consists of two fully connected layers and combines the outputs of the image and text networks in pairs. To obtain unified hash codes, the second layer of the fusion network is designed as a hash layer with k hidden units, and its activation function is a discriminant function.
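As an illustration of the text input described in Step 1, the sketch below builds a bag-of-words count vector from a list of tags. The vocabulary, the tag list and the choice to ignore out-of-vocabulary tags are assumptions made for this example; the patent does not specify how the BoW vectors are constructed.

```python
def bow_vector(tags, vocabulary):
    """Map a list of text tags to a count vector over a fixed vocabulary."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for tag in tags:
        if tag in index:  # tags outside the vocabulary are ignored (assumption)
            vector[index[tag]] += 1
    return vector


# Hypothetical 4-word vocabulary; the experiments use 1,386 or 1,000 dimensions.
vocabulary = ["sky", "dog", "beach", "sunset"]
sample_tags = ["beach", "sunset", "beach", "car"]
vec = bow_vector(sample_tags, vocabulary)
```

The resulting fixed-length vector is what a fully connected text network can take as input.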
Step 2: Given a dataset of n training sample pairs, where n is the total number of training sample pairs, v_i denotes the image feature, t_i denotes the text feature and y_i denotes the semantic label vector of the i-th pair. In addition, S = {s_ij} denotes the pairwise similarity matrix. The goal of this stage is to learn a compact binary code b_i ∈ {-1,1}^k for each sample; B ∈ {-1,1}^{k×n} denotes the unified hash code matrix.
Step 3: Let F denote the feature representations output by the image network and G denote the feature representations output by the text network. The outputs of the two modalities are combined through a nonlinear activation function (the tanh function) to obtain the input Z of the fusion network. The output of the fusion network is then defined as H = h(Z; θ_z) ∈ R^{k×n}. To learn the unified hash codes, the following objective function, formula (1), is constructed:
Where the first term is the pairwise embedding constraint, in which H_{*i} and H_{*j} denote the fusion network outputs for different training sample pairs, S = {s_ij} denotes the pairwise similarity matrix, B ∈ {-1,1}^{k×n} denotes the unified hash code matrix, and p(s_ij | B) denotes the conditional probability distribution of s_ij given the hash codes B. Minimizing the negative log-likelihood in the first term preserves the similarities in the matrix S: the similarity (inner product) between two similar samples is made as large as possible, and the similarity (inner product) between dissimilar samples as small as possible. The second term minimizes the loss between the output of the fusion network and the binary codes, so that the learned unified hash codes preserve the nonlinear correlations between training samples well. The third term is the balance constraint, which maximizes the information carried by each hash bit, that is, each bit is required to have an equal chance of being 1 or -1. λ and η are hyperparameters (λ > 0, η > 0), and ‖·‖_F denotes the Frobenius norm.
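The formula itself is not reproduced in this text. One form consistent with the three terms described here, and with the update B = sign(λH) used in Step 4, is sketched below. The pairwise likelihood with Θ_ij defined as half the inner product follows the formulation commonly used in deep pairwise hashing; that definition, the scaling factor 1/2 and the all-ones vector in the balance term are assumptions rather than the patent's exact expression.

```latex
\min_{B,\theta}\; J =
  -\sum_{i,j=1}^{n}\Bigl(s_{ij}\,\Theta_{ij} - \log\bigl(1 + e^{\Theta_{ij}}\bigr)\Bigr)
  + \lambda\,\lVert B - H\rVert_F^{2}
  + \eta\,\lVert H\mathbf{1}\rVert_F^{2},
\qquad \text{s.t. } B \in \{-1,1\}^{k\times n},

\text{where}\quad \Theta_{ij} = \tfrac{1}{2}\,H_{*i}^{\top}H_{*j},
\qquad
p(s_{ij}\mid B) =
\begin{cases}
\sigma(\Theta_{ij}), & s_{ij}=1,\\
1-\sigma(\Theta_{ij}), & s_{ij}=0,
\end{cases}
\qquad
\sigma(x) = \frac{1}{1+e^{-x}}.
```

Under this reading, the first sum is exactly the negative log-likelihood of S, the second term pulls H toward the binary codes, and the third term drives each row of H to sum to zero so that each bit is balanced.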
Step 4: The optimization problem of formula (1) is solved with an iterative update strategy. The unified hash codes B are learned with the network parameters θ = {θ_v, θ_t, θ_z} fixed; then B is fixed and the parameters θ = {θ_v, θ_t, θ_z} are updated by mini-batch stochastic gradient descent (SGD). These updates alternate until convergence, yielding the optimal unified hash codes B. Specifically, the procedure comprises the following steps:
Initialize the image, text and fusion network parameters θ = {θ_v, θ_t, θ_z} and the batch size;
Fix the network parameters θ = {θ_v, θ_t, θ_z} and update the unified hash codes B according to the following formula:
B = sign(λH)
Then fix B and update the parameters θ = {θ_v, θ_t, θ_z} by mini-batch stochastic gradient descent, with the gradient calculated as follows;
Alternate these updates until convergence.
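The discrete update of B in the steps above can be sketched as follows. The example uses nested Python lists for a toy k × n matrix H; mapping zero to +1 is a convention assumed here, and the helper names are hypothetical. The quantization loss is included only to show that the element-wise sign is the binary matrix closest to H.

```python
def sign(x):
    """Map a real value to +1 or -1; zero is mapped to +1 (assumed convention)."""
    return 1 if x >= 0 else -1


def update_codes(H, lam=1.0):
    """B = sign(lam * H), applied element-wise to a k x n matrix of nested lists."""
    return [[sign(lam * h) for h in row] for row in H]


def quantization_loss(B, H):
    """Squared Frobenius distance ||B - H||_F^2."""
    return sum((b - h) ** 2 for rb, rh in zip(B, H) for b, h in zip(rb, rh))


# Toy fusion-network output with k = 2 bits and n = 3 samples.
H = [[0.8, -0.3, 0.1], [-0.9, 0.4, -0.2]]
B = update_codes(H)
```

Since λ > 0, scaling H by λ does not change the signs, so the update keeps B discrete without any relaxation step.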
Second stage: modality-specific hash network training
Step 1: Redesign the image network and the text network for training the modality-specific hash networks. Apart from replacing the last fully connected layer of the image and text networks with a hash layer (with k hidden units) that uses a discriminant function as its activation function, the settings of the other layers are the same as in the previous stage.
Step 2: In this stage, the image network f(V; θ_v) and the text network g(T; θ_t) are trained to obtain the corresponding hash functions h_v(·) and h_t(·), which encode samples outside the training data.
Step 3: Define the overall objective function, formula (2):
Where J_1 is the inter-modal pairwise embedding constraint, which preserves the cross-modal similarity between the outputs of the image and text networks; J_2 uses the unified hash codes obtained in the first stage as supervision information to train the modality-specific hash networks; J_3 directly maps the label information linearly to the modality-specific networks in order to fully exploit the semantic information; J_4 is the balance constraint, which maximizes the information carried by each bit. They are defined as follows:
Where α, β and γ are hyperparameters; J_1 is the inter-modal pairwise embedding constraint, in which F_{*i} = f(v_i; θ_v) denotes the feature representation of the i-th sample output by the image network and G_{*j} = g(t_j; θ_t) denotes the feature representation of the j-th sample output by the text network; J_2 uses the unified hash codes obtained in the first stage as supervision information to train the modality-specific hash networks, where B ∈ {-1,1}^{k×n} denotes the unified hash code matrix, F denotes the image feature output and G denotes the text feature output; J_3 linearly maps the label information to the modality-specific networks, where W_1 and W_2 denote the mapping matrices of the image and text modalities, respectively, and Y denotes the semantic label matrix; J_4 is the balance constraint, which maximizes the information carried by each bit.
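The definitions of J_1 to J_4 are not reproduced in this text. The sketch below gives one possible set of expressions that matches the verbal description of each term. Every detail in it is an assumption: which hyperparameter weights which term, the factor 1/2 in Θ_ij, the direction of the linear label mapping in J_3, and the use of an all-ones vector in J_4 cannot be recovered from the source.

```latex
\min_{\theta_v,\theta_t,W_1,W_2}\; J = J_1 + \alpha J_2 + \beta J_3 + \gamma J_4,

J_1 = -\sum_{i,j=1}^{n}\Bigl(s_{ij}\,\Theta_{ij} - \log\bigl(1+e^{\Theta_{ij}}\bigr)\Bigr),
\qquad \Theta_{ij} = \tfrac{1}{2}\,F_{*i}^{\top}G_{*j},

J_2 = \lVert B - F\rVert_F^{2} + \lVert B - G\rVert_F^{2},

J_3 = \lVert F - W_1 Y\rVert_F^{2} + \lVert G - W_2 Y\rVert_F^{2},

J_4 = \lVert F\mathbf{1}\rVert_F^{2} + \lVert G\mathbf{1}\rVert_F^{2}.
```

Read this way, J_2 is where the stage-one codes B act as the fused supervision signal, while J_3 adds the class labels as a second source of supervision.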
Step 4: The optimization problem of formula (2) is likewise solved with an iterative update strategy: one group of parameters is updated while the others are fixed. In particular, the parameters θ_v and θ_t are updated by mini-batch stochastic gradient descent with the backpropagation (BP) algorithm. Specifically, the procedure comprises the following steps:
Initialize the image network parameters θ_v, the text network parameters θ_t and the batch size;
Fix the parameters θ_v and θ_t, solve the objective function and update W_1 and W_2 according to the following formulas;
Then fix W_1 and W_2 and update the image parameters θ_v and the text parameters θ_t separately by mini-batch stochastic gradient descent, with the gradients calculated as follows;
Alternate these updates until convergence.
We run experiments on two datasets, MIRFLICKR-25K and NUS-WIDE.
The MIRFLICKR-25K dataset contains 25,000 samples collected from the Flickr website; each sample consists of an image and several text tags. There are 24 labels in total, and each sample is annotated with at least one of them. We select the samples with at least 20 text tags for the experiments, 20,015 image-text pairs in total. The text modality is represented as a 1,386-dimensional BoW vector, and the image modality directly uses the raw pixels as input. In the experiments, we randomly take 2,000 samples as queries and use the rest as the retrieval database. To reduce computational cost, we take 5,000 samples from the database for training.
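The split described above can be sketched as follows: queries are drawn at random, the remaining samples form the database, and the training set is drawn from the database. The function name, the fixed seed and the use of index lists are assumptions for illustration only.

```python
import random


def split_dataset(num_samples, num_query, num_train, seed=0):
    """Randomly split sample indices into query, database and training subsets.

    The training subset is drawn from the database, as described in the text.
    """
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    query = indices[:num_query]
    database = indices[num_query:]
    train = rng.sample(database, num_train)
    return query, database, train


# MIRFLICKR-25K sizes from the text: 20,015 pairs, 2,000 queries, 5,000 training samples.
query, database, train = split_dataset(20015, 2000, 5000)
```

Fixing the seed makes the split reproducible across runs, which matters when comparing several methods on the same partition.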
NUS-WIDE is an image database collected from real web pages. It contains 269,648 samples annotated with 81 topic labels. Each sample consists of an image and its associated text tags. In the experiments, we choose the 10 largest classes to form a subset containing 186,577 image-text pairs in total. For each sample, the text modality is represented as a 1,000-dimensional BoW vector, and the image modality directly uses the raw pixels as input. On this dataset, we randomly sample 2,000 samples as queries and use the rest as the database. Similarly, 5,000 data points are randomly taken from the database for training.
This embodiment is implemented with the MatConvNet framework. The image network is initialized with the CNN-F network pre-trained on the ImageNet dataset. The parameters of the other deep neural networks are initialized randomly. In addition, for the text network with two fully connected layers, the dimensions are set to [8192 → 2500] on the MIRFLICKR-25K dataset; on the NUS-WIDE dataset they are set to [8192 → 1000] when the code length is 16 or 32 and to [8192 → 600] when the code length is 64. For the fusion network, which combines the outputs of the image and text networks in pairs, the dimensions of its fully connected layers are set to [4096 → k] on all datasets. In the experiments, all parameters are empirically set to 1, the learning rate varies from 10^-1.5 to 10^-3, and the number of outer-loop iterations in the algorithm is set to 500. The algorithm proceeds as follows.
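The text states only the end points of the learning rate, 10^-1.5 and 10^-3, and the 500 outer iterations. The sketch below assumes, without support from the source, that the rate decays evenly in log10 space with one value per outer iteration; the function name is hypothetical.

```python
def log_spaced_learning_rates(start_exp=-1.5, end_exp=-3.0, num_iters=500):
    """Learning rates from 10**start_exp to 10**end_exp, evenly spaced in log10."""
    if num_iters < 2:
        return [10.0 ** start_exp]
    step = (end_exp - start_exp) / (num_iters - 1)
    return [10.0 ** (start_exp + i * step) for i in range(num_iters)]


# One learning rate per outer-loop iteration (assumed coupling).
rates = log_spaced_learning_rates()
```

Other schedules, such as step-wise decay, would also satisfy the stated end points.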
Stage 1: unified hash code learning
Input: image set V and text set T; pairwise similarity matrix S; parameters γ, β, α; code length k
Output: unified hash code matrix B
Initialization: initialize the image, text and fusion network parameters θ = {θ_v, θ_t, θ_z}, the batch size N_v = N_t = 128,
and the iteration count t_z
Repeat
1. Fix the parameters θ = {θ_v, θ_t, θ_z} and update B according to the formula B = sign(λH)
2. for iter = 1, 2, ..., t_z {
(1) Randomly sample N_v and N_t data points from V and T, respectively, to build a mini-batch
(2) For each sample pair v_i and t_i in the mini-batch, compute f(v_i; θ_v), g(t_i; θ_t) and h(z_i; θ_z) by forward propagation
(3) Compute the gradient of the top layer according to the following formula:
(4) Backpropagate through the image, text and fusion networks and update the parameters θ = {θ_v, θ_t, θ_z} }
Until convergence
Stage 2: modality-specific hash network training
Input: image set V and text set T; pairwise similarity matrix S; label matrix Y; learned hash code matrix B;
parameters γ, β, α; code length k
Output: parameters θ_v and θ_t of the modality-specific hash networks
Initialization: initialize the image and text network parameters θ_v and θ_t, the batch size N_v = N_t = 128, and the iteration counts t_v and t_t
Repeat
1. Fix the parameters θ_v and θ_t, and update W_1 and W_2 according to their respective update formulas
2. for iter = 1, 2, ..., t_v {
(1) Randomly sample N_v data points from V to build a mini-batch
(2) For each sample v_i, compute f(v_i; θ_v) by forward propagation
(3) Backpropagate the derivative given by the following formula and update the parameters θ_v }
3. for iter = 1, 2, ..., t_t {
(1) Randomly sample N_t data points from T to build a mini-batch
(2) For each sample t_i, compute g(t_i; θ_t) by forward propagation
(3) Backpropagate the derivative given by the following formula and update the parameters θ_t }
Until convergence
Experiments were run on both datasets, and the method was compared with six currently popular methods (LSSH, CMFH, DCH, SCM, SePHkm, DCMH). To keep the comparison fair, CNN features extracted from the 7th layer of the image network of this method are used for the shallow baseline methods. Tables 1-2 show that the method provided in this embodiment achieves better retrieval performance than the other methods on both datasets.
Table 1
Table 2
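The text does not name the metric reported in Tables 1-2. Mean average precision (MAP) over a Hamming ranking is a common choice for cross-modal hashing, so the sketch below assumes it. The relevance flags are taken as given; treating two samples as relevant when they share at least one label would be a further assumption.

```python
def average_precision(ranked_relevance):
    """Average precision for one query, given 0/1 relevance flags in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_ranked_relevance):
    """Mean of the per-query average precision values."""
    if not all_ranked_relevance:
        return 0.0
    total = sum(average_precision(r) for r in all_ranked_relevance)
    return total / len(all_ranked_relevance)


# Toy query whose first and third retrieved items are relevant.
ap = average_precision([1, 0, 1, 0])
```

The ranked relevance list for each query would come from sorting the database by Hamming distance to the query code.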
Embodiment 2
The purpose of this embodiment is to provide a computing device.
A computer system comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the following when executing the program:
Constructing an image network, a text network and a fusion network;
Obtaining image and text feature training sample pairs and feeding them into the image network and the text network, respectively;
Taking the output features of the image network and the text network as the input of the fusion network, and defining the output of the fusion network;
Constructing an objective function for learning unified hash codes from the output of the fusion network and the pairwise similarity;
Solving the objective function to obtain the unified hash codes;
Training modality-specific hash networks using the unified hash codes as supervision information, combined with semantic information.
Embodiment 3
The purpose of this embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium on which a computer program is stored, wherein the program implements the following steps when executed by a processor:
Constructing an image network, a text network and a fusion network;
Obtaining image and text feature training sample pairs and feeding them into the image network and the text network, respectively;
Taking the output features of the image network and the text network as the input of the fusion network, and defining the output of the fusion network;
Constructing an objective function for learning unified hash codes from the output of the fusion network and the pairwise similarity;
Solving the objective function to obtain the unified hash codes;
Training modality-specific hash networks using the unified hash codes as supervision information, combined with semantic information.
The steps involved in Embodiments 2 and 3 correspond to method Embodiment 1; for specific implementations, see the relevant description in Embodiment 1. The term "computer-readable storage medium" should be understood to include a single medium or multiple media containing one or more instruction sets; it should also be understood to include any medium capable of storing, encoding or carrying an instruction set for execution by a processor that causes the processor to perform any of the methods in the present disclosure.
The above one or more embodiments have the following beneficial effects:
1. In traditional cross-modal hashing methods, feature extraction and hash code learning are independent of each other. The present disclosure is based on an end-to-end deep learning framework that learns feature representations and hash codes simultaneously, and can capture the correlation between data of different modalities more effectively.
2. The present disclosure feeds the features of different modalities into the fusion network in pairs, explores the correlation in multi-modal data through a nonlinear transformation, and obtains high-quality hash codes to supervise the training of the modality-specific hash networks. The optimization problem is solved with an iterative update strategy that keeps the hash codes discrete during optimization instead of relaxing them, which reduces quantization error. Pairwise similarity information and class information are embedded into the hash networks within a single framework, which preserves inter-modal similarity and semantic consistency well.
Those skilled in the art will understand that the modules or steps of the application described above can be implemented with a general-purpose computing device. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they can each be fabricated as an individual integrated circuit module, or several of the modules or steps can be fabricated as a single integrated circuit module. The application is not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the application and is not intended to limit it. For those skilled in the art, various modifications and changes to the application are possible. Any modification, equivalent replacement or improvement made within the spirit and principles of the application shall fall within its scope of protection.
Although specific embodiments of the application have been described above with reference to the drawings, they do not limit the scope of protection of the application. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solutions of the application without creative effort remain within its scope of protection.

Claims (8)

1. A cross-modal hash retrieval method fusing supervision information, characterized by comprising the following steps:
constructing an image network, a text network, and a fusion network;
obtaining training sample pairs of image and text features, and inputting them into the image network and the text network respectively;
taking the output features of the image network and the text network as the input of the fusion network, and defining the output of the fusion network;
constructing, from the output of the fusion network and the inter-pair similarity, an objective function for learning unified hash codes;
solving the objective function to obtain the unified hash codes;
training modality-specific hash networks using the unified hash codes as supervision information in combination with semantic information;
wherein the objective function for learning the unified hash codes is:
[formula not reproduced in the source text]
where the first term is the inter-pair embedding constraint, and
[formula not reproduced in the source text]
where H_{*i} and H_{*j} denote the fusion network outputs for different training sample pairs, S = {s_ij} denotes the inter-pair similarity matrix, B ∈ {-1, 1}^{k×n} denotes the unified hash code matrix, p(s_ij | B) denotes the conditional probability distribution of s_ij given the hash codes B, and λ denotes a hyperparameter; the second term minimizes the loss between the output of the fusion network and the binary codes, with H = h(Z; θ_z) ∈ R^{k×n} being the output of the fusion network; the third term is a balance constraint that maximizes the information carried by each bit of the hash codes, η denotes a hyperparameter, and ||·||_F denotes the Frobenius norm;
training the modality-specific hash networks comprises: solving an overall objective function to obtain the parameters of the image network and the text network; the overall objective function is:
[formula not reproduced in the source text]
where α, β, and γ denote hyperparameters; J_1 is the inter-modality pairwise embedding constraint, [formula not reproduced in the source text] where F_{*i} = f(v_i; θ_v) denotes the feature representation of the i-th sample output by the image network and G_{*j} = g(t_j; θ_t) denotes the feature representation of the j-th sample output by the text network; J_2 uses the unified hash codes obtained in the first stage as supervision information to train the modality-specific hash networks, where B ∈ {-1, 1}^{k×n} denotes the unified hash code matrix, F denotes the image feature output, and G denotes the text feature output; J_3 linearly maps the label information to the modality-specific networks, where W_1 and W_2 denote the mapping matrices of the image and text modalities respectively and Y denotes the semantic matrix; J_4 is a balance constraint that maximizes the information carried by each bit.
2. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that the image network comprises 5 convolutional layers and 3 fully connected layers; the text network comprises two fully connected layers; the fusion network comprises two fully connected layers; wherein the last layers of the image network and the text network have the same number of hidden units, the second layer of the fusion network is a hash layer, and its activation function is a discriminant function.
3. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that the output features of the image network and the text network are passed through a nonlinear activation function to obtain the input of the fusion network.
4. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that solving the objective function comprises:
initializing the image, text, and fusion network parameters θ = {θ_v, θ_t, θ_z} and the batch size;
fixing the network parameters θ = {θ_v, θ_t, θ_z} and updating the unified hash codes B;
then fixing B and updating the parameters θ = {θ_v, θ_t, θ_z} using mini-batch stochastic gradient descent;
alternating these updates until convergence.
5. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that, in the modality-specific hash networks, the image network comprises 5 convolutional layers, 2 fully connected layers, and 1 hash layer, and the text network comprises 1 fully connected layer and 1 hash layer; wherein the activation function of the hash layers in the image network and the text network is a discriminant function.
6. The cross-modal hash retrieval method fusing supervision information according to claim 1, characterized in that solving the overall objective function comprises:
initializing the image network parameters θ_v, the text network parameters θ_t, and the batch size;
fixing the parameters θ_v and θ_t, and solving the objective function to obtain W_1 and W_2;
then fixing W_1 and W_2 and updating the network parameters using mini-batch stochastic gradient descent;
alternating these updates until convergence.
7. A computer system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the cross-modal hash retrieval method fusing supervision information according to any one of claims 1 to 6.
8. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the cross-modal hash retrieval method fusing supervision information according to any one of claims 1 to 6.
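
The alternating scheme recited in claim 4 (fix the parameters and update B, then fix B and take a gradient step) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. The claim's objective formula is not reproduced in this text, so the sketch assumes a form common in deep hashing: a pairwise negative log-likelihood with Θ_ij = ½ H_{*i}ᵀ H_{*j}, plus λ‖B − H‖_F² and η‖H·1‖_F². The real-valued codes H stand in directly for the network parameters θ, full-batch gradient descent replaces mini-batch stochastic gradient descent, and all function names and toy data are hypothetical.

```python
import math


def sign_codes(H):
    """B-step: with H fixed, B = sign(H) minimises ||B - H||_F^2 over {-1, +1}."""
    return [[1 if h >= 0 else -1 for h in row] for row in H]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def loss(H, B, S, lam, eta):
    """Assumed objective. H is stored as n rows of k values (transpose of k x n)."""
    n, k = len(H), len(H[0])
    nll = 0.0
    for i in range(n):
        for j in range(n):
            theta = 0.5 * dot(H[i], H[j])
            nll -= S[i][j] * theta - math.log1p(math.exp(theta))
    quant = sum((B[i][b] - H[i][b]) ** 2 for i in range(n) for b in range(k))
    balance = sum(sum(H[i][b] for i in range(n)) ** 2 for b in range(k))
    return nll + lam * quant + eta * balance


def gradient(H, B, S, lam, eta):
    """Gradient of the assumed objective w.r.t. H (S is assumed symmetric)."""
    n, k = len(H), len(H[0])
    col_sums = [sum(H[i][b] for i in range(n)) for b in range(k)]
    G = [[0.0] * k for _ in range(n)]
    for a in range(n):
        for j in range(n):
            w = sigmoid(0.5 * dot(H[a], H[j])) - S[a][j]
            for b in range(k):
                G[a][b] += w * H[j][b]
        for b in range(k):
            G[a][b] += 2 * lam * (H[a][b] - B[a][b]) + 2 * eta * col_sums[b]
    return G


def alternate(H, S, lam=1.0, eta=0.1, lr=0.02, steps=300):
    """Alternate the B-step and a gradient step on H; return H, B, loss history."""
    H = [row[:] for row in H]
    B = sign_codes(H)  # parameters fixed, update B
    history = [loss(H, B, S, lam, eta)]
    for _ in range(steps):
        G = gradient(H, B, S, lam, eta)  # B fixed, gradient step on H
        H = [[h - lr * g for h, g in zip(hr, gr)] for hr, gr in zip(H, G)]
        B = sign_codes(H)  # H fixed, update B
        history.append(loss(H, B, S, lam, eta))
    return H, B, history


# Toy data (hypothetical): samples 0 and 1 are similar, samples 2 and 3 are similar.
S = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
H0 = [[0.3, -0.1], [0.2, 0.1], [-0.2, 0.1], [-0.1, -0.3]]
H_final, B_final, history = alternate(H0, S)
print("unified codes:", B_final)
print("loss: %.4f -> %.4f" % (history[0], history[-1]))
```

Under the assumed objective, the B-step has the closed form B = sign(H) because only the quantization term depends on B. Each gradient step then pulls H toward the current codes while the similarity term shapes the inner products between samples.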
CN201811269037.9A 2018-10-29 2018-10-29 Cross-modal hash retrieval method and system fusing supervision information Expired - Fee Related CN109299216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811269037.9A CN109299216B (en) 2018-10-29 2018-10-29 A kind of cross-module state Hash search method and system merging supervision message

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811269037.9A CN109299216B (en) 2018-10-29 2018-10-29 A kind of cross-module state Hash search method and system merging supervision message

Publications (2)

Publication Number Publication Date
CN109299216A CN109299216A (en) 2019-02-01
CN109299216B true CN109299216B (en) 2019-07-23

Family

ID=65158169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811269037.9A Expired - Fee Related CN109299216B (en) 2018-10-29 2018-10-29 Cross-modal hash retrieval method and system fusing supervision information

Country Status (1)

Country Link
CN (1) CN109299216B (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960732B (en) * 2019-03-29 2023-04-18 广东石油化工学院 Deep discrete hash cross-modal retrieval method and system based on robust supervision
CN110059198B (en) * 2019-04-08 2021-04-13 浙江大学 Discrete hash retrieval method of cross-modal data based on similarity maintenance
CN110059154B (en) * 2019-04-10 2022-04-15 山东师范大学 Cross-modal migration hash retrieval method based on inheritance mapping
CN110083532B (en) * 2019-04-12 2023-05-23 中科寒武纪科技股份有限公司 Method and device for positioning operation errors in fusion mode based on deep learning framework
CN110222140B (en) * 2019-04-22 2021-07-13 中国科学院信息工程研究所 Cross-modal retrieval method based on counterstudy and asymmetric hash
CN110188209B (en) * 2019-05-13 2021-06-04 山东大学 Cross-modal Hash model construction method based on hierarchical label, search method and device
CN110188223B (en) * 2019-06-06 2022-10-04 腾讯科技(深圳)有限公司 Image processing method and device and computer equipment
CN111127385B (en) * 2019-06-06 2023-01-13 昆明理工大学 Medical information cross-modal Hash coding learning method based on generative countermeasure network
CN110298395B (en) * 2019-06-18 2023-04-18 天津大学 Image-text matching method based on three-modal confrontation network
CN110647804A (en) * 2019-08-09 2020-01-03 中国传媒大学 Violent video identification method, computer system and storage medium
CN110597878B (en) * 2019-09-16 2023-09-15 广东工业大学 Cross-modal retrieval method, device, equipment and medium for multi-modal data
CN110750660B (en) * 2019-10-08 2023-03-10 西北工业大学 Half-pairing multi-mode data hash coding method
CN110765281A (en) * 2019-11-04 2020-02-07 山东浪潮人工智能研究院有限公司 Multi-semantic depth supervision cross-modal Hash retrieval method
CN113064959B (en) * 2020-01-02 2022-09-23 南京邮电大学 Cross-modal retrieval method based on deep self-supervision sorting Hash
CN111241310A (en) * 2020-01-10 2020-06-05 济南浪潮高新科技投资发展有限公司 Deep cross-modal Hash retrieval method, equipment and medium
CN111353076B (en) * 2020-02-21 2023-10-10 华为云计算技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111460201B (en) * 2020-03-04 2022-09-23 南京邮电大学 Cross-modal retrieval method for modal consistency based on generative countermeasure network
CN111782921A (en) * 2020-03-25 2020-10-16 北京沃东天骏信息技术有限公司 Method and device for searching target
CN111599438B (en) * 2020-04-02 2023-07-28 浙江工业大学 Real-time diet health monitoring method for diabetics based on multi-mode data
CN111753190A (en) * 2020-05-29 2020-10-09 中山大学 Meta learning-based unsupervised cross-modal Hash retrieval method
CN111914156B (en) * 2020-08-14 2023-01-20 中国科学院自动化研究所 Cross-modal retrieval method and system for self-adaptive label perception graph convolution network
CN111914950B (en) * 2020-08-20 2021-04-16 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Unsupervised cross-modal retrieval model training method based on depth dual variational hash
CN112559820B (en) * 2020-12-17 2022-08-30 中国科学院空天信息创新研究院 Sample data set intelligent question setting method, device and equipment based on deep learning
CN112667841B (en) * 2020-12-28 2023-03-24 山东建筑大学 Weak supervision depth context-aware image characterization method and system
CN112817604B (en) * 2021-02-18 2022-08-05 北京邮电大学 Android system control intention identification method and device, electronic equipment and storage medium
CN112989097A (en) * 2021-03-23 2021-06-18 北京百度网讯科技有限公司 Model training and picture retrieval method and device
CN113095415B (en) * 2021-04-15 2022-06-14 齐鲁工业大学 Cross-modal hashing method and system based on multi-modal attention mechanism
CN113157739B (en) * 2021-04-23 2024-01-09 平安科技(深圳)有限公司 Cross-modal retrieval method and device, electronic equipment and storage medium
CN113449849B (en) * 2021-06-29 2022-05-27 桂林电子科技大学 Learning type text hash method based on self-encoder
CN113763441B (en) * 2021-08-25 2024-01-26 中国科学院苏州生物医学工程技术研究所 Medical image registration method and system without supervision learning
CN114329109B (en) * 2022-03-15 2022-06-03 山东建筑大学 Multimodal retrieval method and system based on weakly supervised Hash learning
CN114942984B (en) * 2022-05-26 2023-11-21 北京百度网讯科技有限公司 Pre-training and image-text retrieval method and device for visual scene text fusion model
CN115687571B (en) * 2022-10-28 2024-01-26 重庆师范大学 Depth unsupervised cross-modal retrieval method based on modal fusion reconstruction hash
CN115840827B (en) * 2022-11-07 2023-09-19 重庆师范大学 Deep unsupervised cross-modal hash retrieval method
CN115982403B (en) * 2023-01-12 2024-02-02 之江实验室 Multi-mode hash retrieval method and device
CN115880556B (en) * 2023-02-21 2023-05-02 北京理工大学 Multi-mode data fusion processing method, device, equipment and storage medium
CN116049459B (en) * 2023-03-30 2023-07-14 浪潮电子信息产业股份有限公司 Cross-modal mutual retrieval method, device, server and storage medium
CN116594994B (en) * 2023-03-30 2024-02-23 重庆师范大学 Application method of visual language knowledge distillation in cross-modal hash retrieval
CN116244484B (en) * 2023-05-11 2023-08-08 山东大学 Federal cross-modal retrieval method and system for unbalanced data
CN116431847B (en) * 2023-06-14 2023-11-14 北京邮电大学 Cross-modal hash retrieval method and device based on multiple contrast and double-way countermeasure
CN116825210B (en) * 2023-08-28 2023-11-17 山东大学 Hash retrieval method, system, equipment and medium based on multi-source biological data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273505A (en) * 2017-06-20 2017-10-20 西安电子科技大学 Supervised cross-modal hash retrieval method based on a nonparametric Bayesian model
CN107402993A (en) * 2017-07-17 2017-11-28 山东师范大学 Cross-modal retrieval method based on discriminative correlation maximization hashing

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131556B2 (en) * 2007-04-03 2012-03-06 Microsoft Corporation Communications using different modalities
CN104317837B (en) * 2014-10-10 2017-06-23 浙江大学 Cross-modal retrieval method based on a topic model
CN104899253B (en) * 2015-05-13 2018-06-26 复旦大学 Cross-modal image-label relevance learning method for social images
JP6656570B2 (en) * 2015-07-13 2020-03-04 国立大学法人 筑波大学 Cross-modal sensory analysis system, presentation information determination system, information presentation system, cross-modal sensory analysis program, presentation information determination program, and information presentation program
CN107256271B (en) * 2017-06-27 2020-04-03 鲁东大学 Cross-modal Hash retrieval method based on mapping dictionary learning
CN107871014A (en) * 2017-11-23 2018-04-03 清华大学 Big-data cross-modal retrieval method and system based on deep fusion hashing

Also Published As

Publication number Publication date
CN109299216A (en) 2019-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2019-07-23

Termination date: 2021-10-29