CN109299342A - Cross-modal retrieval method based on a cycle generative adversarial network - Google Patents
Cross-modal retrieval method based on a cycle generative adversarial network
- Publication number
- CN109299342A (application number CN201811455802.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- network
- loss function
- cross-modal
- generator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a cross-modal retrieval method based on a cycle generative adversarial network. The method designs a novel dual-channel cycle generative adversarial neural network and establishes semantic correlations across modalities by training it. Data of different modalities flow bidirectionally through the network: each modality is translated into the other by one generative adversarial pair, and the generated data in turn serve as input to the next generative adversarial pair, so that data are generated in a bidirectional cycle and the network continuously learns the semantic relations between modalities. To improve retrieval efficiency, the method also approximates the outputs of the generators' intermediate layer as binary hash codes using a threshold function and an approximation function, and designs several constraints that preserve the similarity of data within the same modality and of same-class data across modalities, and the dissimilarity of data across classes, further improving the stability and accuracy of retrieval.
Description
Technical field
The invention belongs to the technical field of multimedia information retrieval, and in particular relates to a cross-modal retrieval method based on a cycle generative adversarial network.
Technical background
With the arrival of the Internet era, people can access, anytime and anywhere, massive amounts of information in multiple modalities, including pictures, video, text, and audio. How to obtain the content one needs from this flood of information has become a central concern of Internet users, who frequently rely on search engines such as Google, Baidu, and Bing for accurate retrieval services. However, traditional Internet search services still largely remain at the level of single-modality retrieval; retrieval across modalities is rarely applied, and its efficiency, accuracy, and stability all leave room for improvement. Moreover, such services mostly depend on existing data labels and cannot perform cross-modal retrieval of unlabeled data. Studying novel cross-modal retrieval methods therefore has strong practical significance and value. The key is to establish the semantic relations between multimodal heterogeneous data so that similar data of other modalities can be retrieved directly, realizing direct retrieval across modalities without having to label all modality data, and ultimately further improving retrieval performance.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a cross-modal retrieval method based on a cycle generative adversarial network that can effectively improve the performance of existing cross-modal retrieval techniques.
To achieve the above goal, the cross-modal retrieval method based on a cycle generative adversarial network designed by the present invention comprises the following steps:
Two cycle modules are designed. The two cycle modules share two generators with identical network structure, and hash coding is performed on the output data of the generators' intermediate layer; the purpose of each generator is to generate, through training, cross-modal data that are as realistic as possible.
One of the cycle modules realizes the process modality m → modality t → modality m through the two generators; the other cycle module realizes the process modality t → modality m → modality t through the same two generators.
A dedicated discriminator is designed for each modality in each cycle module. The discriminator attempts to classify the generated data and the original data of that modality, engaging in a dynamic adversarial game with the generator; under the given training conditions, the generator and discriminator finally reach a dynamic equilibrium.
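The two cycle modules above can be sketched as plain function composition; the generators here are hypothetical toy stand-ins (simple invertible maps), not the patent's convolutional networks:

```python
def g_m_to_t(m):          # toy stand-in for generator G_m->t
    return [x * 2.0 for x in m]

def g_t_to_m(t):          # toy stand-in for generator G_t->m
    return [x / 2.0 for x in t]

def cycle_m(m):
    """Cycle module 1: modality m -> t_gen -> m_cyc."""
    t_gen = g_m_to_t(m)
    return g_t_to_m(t_gen)

def cycle_t(t):
    """Cycle module 2: modality t -> m_gen -> t_cyc."""
    m_gen = g_t_to_m(t)
    return g_m_to_t(m_gen)

m_ori = [1.0, 2.0, 3.0]
m_cyc = cycle_m(m_ori)    # with these toy inverse maps, m_cyc == m_ori
```

The point of the cycle losses described later is to push the real, learned generators toward exactly this round-trip behavior.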
Further, to handle the multimodal, multi-class nature of the data flow, a manifold constraint is used under unsupervised conditions to guarantee the similarity and dissimilarity of data across modalities and classes. Under supervised conditions, where class labels are given, a triplet constraint is used to minimize the feature distance between same-class data of different modalities and to maximize the feature distance between data of different classes, whether of the same or of different modalities.
Further, the loss function of the discriminators is defined over the original data and the finally produced cycle data of each modality; in addition, a cycle loss function is obtained by comparing the finally produced cycle data of each modality with the original data. Here i denotes the data of the i-th computation among a total of n training samples, and during training the discriminators iteratively learn in the direction that reduces L_disc. D_img and D_txt denote the two discriminators, (m_ori, t_ori) denote the original feature vectors of modality m and modality t, and (m_cyc, t_cyc) denote the feature vectors of modality m and modality t produced by the cycle network.
Still further, the loss function of the generators is defined accordingly, where θ1 is a hyperparameter of the network and ||·||₂ denotes the L2 distance.
Further, let the feature vectors output by the intermediate layers of the two generators be m_com and t_com; the hash codes are then generated by:
m_hash = sgn(m_com − 0.5)
t_hash = sgn(t_com − 0.5)
where sgn is a threshold function. The formulas mean that, for each floating-point number in the intermediate-layer feature vector, the corresponding hash bit is set to +1 when the value is greater than 0.5 and to −1 when the value is less than 0.5.
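A minimal NumPy sketch of this thresholding (the function name is illustrative):

```python
import numpy as np

def to_hash(com):
    """Binarize an intermediate-layer feature vector:
    components > 0.5 map to +1, all others to -1."""
    return np.where(com > 0.5, 1, -1)

m_com = np.array([0.9, 0.2, 0.51, 0.4])
m_hash = to_hash(m_com)   # -> [ 1, -1,  1, -1]
```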
Still further, to quantify the approximation error between the feature vectors and the generated hash codes, the method designs a corresponding loss function as a constraint. Specifically, it uses the likelihood of a hash code conditioned on the feature vector, taking the j-th hash bit and the j-th feature component of the i-th sample as an example (the sample may be either an image or a text), where the likelihood is built from a sigmoid function of the feature vector. Based on this likelihood, a loss function is further designed to evaluate the approximation error between the feature vectors and the generated hash codes, where n is the total number of samples and d_hash is the number of bits in the vector.
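Since the exact formulas are not reproduced above, the following is only a plausible sketch of such a likelihood-based approximation loss, assuming a mean negative log-likelihood over ±1 bits under a sigmoid of the shifted feature components (the 0.5 shift mirrors the thresholding and is an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hash_approx_loss(com, hashes):
    """Hedged sketch: mean negative log-likelihood of +1/-1 hash bits
    under a sigmoid of the shifted feature components."""
    p = sigmoid(com - 0.5)            # P(bit = +1 | feature); shift is an assumption
    is_pos = (hashes + 1) / 2.0       # map {-1, +1} -> {0, 1}
    ll = is_pos * np.log(p) + (1 - is_pos) * np.log(1 - p)
    return -ll.mean()
```

Features that agree with their hash bits yield a smaller loss than features that contradict them, which is the behavior the constraint needs.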
Still further, in the present invention a classification constraint is imposed on the intermediate-layer feature vectors of the generators, for which a classification loss function is designed, where the predicted class of the i-th sample's feature vector is obtained through a small classification network and c_i is the actual class label of that sample; the classification loss function in fact computes the L2 distance between the two.
To constrain the similarity of same-class data across modalities, the method links each training image sample with its semantically similar text sample and designs a loss function that constrains cross-modal same-class data. In this loss, the feature vectors produced in the common subspace by generators G_t→m and G_m→t for images and text are compared, and the loss computes the L2 distance between semantically similar corresponding cross-modal same-class data.
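Both constraints above reduce to L2 distances; a minimal sketch with illustrative toy vectors (names and dimensions are not from the patent):

```python
import numpy as np

def l2_loss(a, b):
    """L2 distance, the form both the classification loss and the
    cross-modal pair loss take in the text."""
    return np.linalg.norm(a - b)

# classification constraint: predicted class vector vs. one-hot label
pred = np.array([0.8, 0.1, 0.1])
label = np.array([1.0, 0.0, 0.0])
cls_loss = l2_loss(pred, label)

# cross-modal pair constraint: common-subspace image vs. text features
m_tilde = np.array([0.2, 0.4])
t_tilde = np.array([0.2, 0.4])
pair_loss = l2_loss(m_tilde, t_tilde)   # identical features -> 0
```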
Under supervised training, since the data all carry class labels, a triplet constraint is used to minimize the distance between cross-modal data vectors under the same semantic label; in the designed triplet loss function, m and t represent image and text data respectively, α and β represent two class labels, * denotes generated data, and i denotes the data of the i-th computation. For unsupervised training, the method designs a manifold constraint to guarantee the similarity of semantically similar data within a modality and across modalities: after computing a kNN matrix, a similarity matrix is established for the data to be constrained, and the manifold constraint is then applied to the feature vectors in the common subspace. In the designed manifold constraint loss function, neib and non represent neighboring and non-neighboring data respectively, and the other symbols have the same meanings as before.
Further, summarizing the loss function designs above, the generator loss function in the supervised case and the generator loss function in the unsupervised case are assembled from the respective terms, where θ2, θ3, θ4, θ5 are weight hyperparameters of the network. The whole network is trained iteratively using the RMSProp stochastic gradient descent optimization algorithm. Since in practice the discriminators' gradients descend quickly, the method trains the discriminators for one iteration only after every S generator iterations, and uses hyperparameters c_gen and c_disc to clip the network weights, preventing them from becoming too large.
The present invention has the following advantages: by constructing a cycle generative adversarial network from two pairs of generators and discriminators, it better establishes the semantic relations between multimodal data; it designs several constraints to improve the stability and accuracy of retrieval; and it substitutes binary hash codes for the original features to improve retrieval efficiency. It thereby studies and explores a novel cross-modal retrieval method based on a cycle generative adversarial network, applied specifically to cross-modal retrieval between images and text.
Detailed description of the invention
Fig. 1 is the neural network general frame figure of the embodiment of the present invention.
Fig. 2 is the triple constraint schematic diagram of the embodiment of the present invention.
Fig. 3 is the manifold constraint schematic diagram of the embodiment of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments:
In recent years, along with the surge of artificial intelligence, deep learning technology has gradually risen and influenced every field of computer science, and more and more people in the field of multimedia information retrieval are using deep learning to improve the stability and accuracy of existing retrieval. The generative adversarial network (GAN) used in this method is a new kind of neural network, widely applied in recent years, that estimates a generative model through an adversarial process. The network simultaneously trains a generator that learns the data distribution and a discriminator that judges whether data are real or fake; the generator and discriminator oppose each other during training and finally reach a dynamic equilibrium. Generative adversarial networks are widely used in fields such as image generation, semantic segmentation, and data augmentation; guided by a loss function, they can learn the distribution of the training samples well and generate new data similar to the training samples. This method uses two generative adversarial networks to form a novel cycle network, and improves, through hash codes and several constraints, the network's efficiency, stability, and accuracy when used for multimodal retrieval.
The cross-modal retrieval method based on a cycle generative adversarial network provided by the invention mainly designs a novel neural network, whose overall structure is shown in Fig. 1. The embodiment takes mutual retrieval between image and text data as an example to describe the neural network architecture and the data-processing flow of the invention in detail, as follows:
First, in the embodiment, the original two-dimensional image data need preliminary processing. The present embodiment selects the 19-layer VGGNet popular in the deep learning field and uses the 4096-dimensional feature vector output by VGGNet's fc7 layer as the original input image feature m_ori, i.e. the image feature dimension d_img is 4096. At the same time, the input raw text data are also processed into preliminary feature vectors; the present embodiment uses the conventional Bag-of-Words (BoW) model to process the text data. The length of the resulting BoW vector depends on the text data and on the specific processing method selected; for ease of reference in this implementation, the BoW vector dimension in the present embodiment is set to 2000, i.e. the text feature dimension d_txt is 2000, and this vector serves as the original input text feature t_ori.
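A minimal Bag-of-Words featurizer can be sketched as follows (the three-word vocabulary is purely illustrative; the embodiment uses a 2000-dimensional vocabulary):

```python
def bow_vector(text, vocab):
    """Count occurrences of each vocabulary word in the text;
    out-of-vocabulary words are ignored."""
    counts = [0] * len(vocab)
    index = {w: i for i, w in enumerate(vocab)}
    for word in text.lower().split():
        if word in index:
            counts[index[word]] += 1
    return counts

vocab = ["dog", "cat", "runs"]
vec = bow_vector("Dog runs and runs", vocab)   # -> [1, 0, 2]
```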
Step 1: design the first generative adversarial network, containing generator G_m→t and discriminator D_txt. From the input original image-text pair (m_ori, t_ori) it obtains the generated text data t_gen, thereby extracting the mapping that generates text data from image data and obtaining the semantic relation between image and text data. The specific implementation process is as follows:
As shown in Fig. 1, the upper half of the network can be regarded as the first generative adversarial network, mainly containing generator G_m→t and discriminator D_txt; the input at this point is the original image-text pair (m_ori, t_ori). As data flow through the network, the original image m_ori passes through generator G_m→t to obtain the generated text t_gen, i.e. t_gen = G_m→t(m_ori), and the generated text t_gen is expected to be as similar as possible to the original text t_ori. Generator G_m→t is composed of multiple one-dimensional convolutional layers, with feature dimensions changing as d_img → 512 → d_hash → 100 → d_txt. d_img denotes the dimension of the input original image features, 4096 in this embodiment; d_hash is the dimension of the intermediate-layer features used for hash code generation, whose size is determined by the required hash code length and can be 64, 128, 256, etc.; d_txt is the dimension of the original text features input to the network and also the length of the generated text features, 2000 in this embodiment. Meanwhile, discriminator D_txt engages in a dynamic adversarial game with generator G_m→t, attempting to distinguish the original text features t_ori from the generated text features t_gen. Discriminator D_txt is a feed-forward neural network of fully connected layers, with feature dimensions changing as d_txt → 512 → 16. When generator and discriminator reach dynamic equilibrium under the given training conditions, generator G_m→t can extract well the mapping that generates text data from image data, and thereby obtain the semantic relation between the original image and the generated text data.
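The stated layer dimensions can be sketched as a plain multilayer perceptron (the patent uses one-dimensional convolutions; dense layers, the tanh activation, and the random weights here are simplifying assumptions, shown only to make the dimension flow concrete):

```python
import numpy as np

# Dimensions stated for generator G_m->t: d_img -> 512 -> d_hash -> 100 -> d_txt
d_img, d_hash, d_txt = 4096, 64, 2000
dims = [d_img, 512, d_hash, 100, d_txt]

rng = np.random.default_rng(0)
weights = [rng.standard_normal((a, b)) * 0.01 for a, b in zip(dims, dims[1:])]

def g_m_to_t(m_ori):
    """Forward pass; returns the generated text feature and the
    d_hash-dimensional intermediate (common-subspace) layer."""
    h = m_ori
    acts = []
    for w in weights:
        h = np.tanh(h @ w)        # activation choice is an assumption
        acts.append(h)
    return h, acts[1]

t_gen, m_com = g_m_to_t(rng.standard_normal(d_img))
# t_gen has dimension d_txt = 2000; m_com has dimension d_hash = 64
```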
Step 2: design the second generative adversarial network, containing generator G_t→m and discriminator D_img. Its input is the original image-generated text pair (m_ori, t_gen) obtained in the previous step; it obtains the cycled image m_cyc and extracts the mapping that generates image data from text data, thereby obtaining the semantic relation between text and image data. The specific implementation process is as follows:
As shown in Fig. 1, the lower half of the network can be regarded as the second generative adversarial network, mainly containing generator G_t→m and discriminator D_img; the input at this point is the original image-generated text pair (m_ori, t_gen). As data flow through the network, the generated text t_gen passes through generator G_t→m to obtain the cycled image m_cyc, i.e. m_cyc = G_t→m(t_gen) = G_t→m(G_m→t(m_ori)), and the cycled image features m_cyc are expected to be as similar as possible to the original image features m_ori. Generator G_t→m is composed of multiple one-dimensional deconvolutional layers, with feature dimensions changing as d_txt → 100 → d_hash → 512 → d_img. d_txt is the dimension of the original text features input to the network, 2000 in this embodiment; d_hash is the dimension of the intermediate-layer features used for hash code generation, whose size is determined by the required hash code length, can be 64, 128, 256, etc., and must equal the hash code length in the first generative adversarial network; d_img denotes the dimension of the input original image features and also the length of the finally produced cycled image features, 4096 in this embodiment. Meanwhile, discriminator D_img engages in a dynamic adversarial game with generator G_t→m, attempting to distinguish the cycled image features m_cyc from the original image features m_ori. Discriminator D_img is a feed-forward neural network of fully connected layers, with feature dimensions changing as d_img → 512 → 100 → 16. When generator and discriminator reach dynamic equilibrium under the given training conditions, the network can extract well the mapping that generates image data from text data, and thereby obtain the semantic relation between the generated text and the cycled image data.
Step 3: with the two generative adversarial networks designed in the previous two steps, the direction of data flow can likewise be reversed, finally realizing the generation mappings in the opposite direction and obtaining the semantic relations between image and text data both ways. That is, combining the first two steps, the second generative adversarial network first turns the input original text feature t_ori into the generated image feature m_gen, obtaining the semantic relation between text and image data; the first generative adversarial network then turns the generated image feature m_gen into the cycled text feature t_cyc, obtaining the semantic relation between image and text data. This finally achieves the goal that, during training, image data and text data cycle through the two generative adversarial networks, generating and opposing each other and continuously optimizing the network. The specific implementation process is as follows:
The input data are still the original image-text pair (m_ori, t_ori), and the order of execution is opposite to that of the two steps above. First, the generator G_t→m of the second generative adversarial network turns the input original text feature t_ori into the generated image feature m_gen, i.e. m_gen = G_t→m(t_ori); the feature dimensions in generator G_t→m change as before, d_txt → 100 → d_hash → 512 → d_img. Meanwhile, discriminator D_img engages in a dynamic adversarial game with generator G_t→m, attempting to distinguish the original image features m_ori from the generated image features m_gen. After this adversarial game reaches dynamic equilibrium, generator G_t→m has learned the semantic relation between the original text and the generated image data. Then the generator G_m→t of the first generative adversarial network turns the generated image feature m_gen into the cycled text feature t_cyc, i.e. t_cyc = G_m→t(m_gen) = G_m→t(G_t→m(t_ori)); the feature dimensions in generator G_m→t change as before, d_img → 512 → d_hash → 100 → d_txt. Meanwhile, discriminator D_txt engages in a dynamic adversarial game with generator G_m→t, attempting to distinguish the original text features t_ori from the cycled text features t_cyc. After this adversarial game reaches dynamic equilibrium, generator G_m→t has learned the semantic relation between the generated image and the cycled text data.
Through steps 1, 2, and 3, the bidirectional cycle flow channels of image and text data in the network of the embodiment are established. In one channel, the original image feature data m_ori pass through the first generative adversarial network to obtain the generated text feature t_gen, and t_gen then passes through the second generative adversarial network to produce the cycled image feature m_cyc. In the other channel, the original text data t_ori first pass through the second generative adversarial network to obtain the generated image feature m_gen, and m_gen then passes through the first generative adversarial network to produce the cycled text feature t_cyc. In this way, image and text data are generated in a bidirectional cycle through the two networks, while discriminators D_img and D_txt simultaneously oppose the generators, improving the network's ability to learn the semantic relations between modalities. In the designed loss function of discriminators D_img and D_txt, i denotes the data of the i-th computation among a total of n training samples, and during training the discriminators iteratively learn in the direction that reduces L_disc. Once the bidirectional cycle generative adversarial network is constructed, one of its advantages is that the finally obtained cycle data can be compared with the original data to obtain the cycle loss function, which is also an important component of the generator loss function, where θ1 is a network hyperparameter, set to 0.001 in the present embodiment, and ||·||₂ denotes the L2 distance.
Step 4: to improve cross-modal retrieval efficiency in practice, this method applies a threshold function to extract, from the common subspace of the two generative adversarial networks' generators, hash codes m_hash and t_hash that can represent the image and text features respectively, and designs a likelihood function to evaluate the approximation error between the two kinds of hash codes. The specific implementation process is as follows:
In the two generative adversarial networks, since the generators' inputs and outputs are feature data of different modalities, this example treats the generators' intermediate layer as the common subspace across modalities (as shown in Fig. 1), and in the steps above the feature length of this layer was designed to be the required hash code length d_hash. Let the feature vectors of the intermediate layer be m_com and t_com; the generation formulas are m_hash = sgn(m_com − 0.5) and t_hash = sgn(t_com − 0.5), where sgn is a threshold function. The formulas mean that, for each floating-point number in the intermediate-layer feature vector, the corresponding hash bit is set to +1 when the value is greater than 0.5 and to −1 when the value is less than 0.5. This threshold transformation applies to every component of each training sample's feature vector, so every training sample obtains a hash code of the same length as its feature vector. In the embodiment, the hash codes m_hash, t_hash are used in place of the common-subspace feature vectors m_com, t_com for retrieval, so that the distance computations between floating-point feature vectors in the original retrieval can be replaced by Hamming distance computations between hash codes, greatly improving the retrieval speed.
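The speedup comes from replacing floating-point distance computations with Hamming distances between ±1 codes; a minimal retrieval sketch (the toy database is illustrative):

```python
import numpy as np

def hamming(h1, h2):
    """Hamming distance between two +1/-1 hash codes."""
    return int(np.sum(h1 != h2))

# toy database of text hash codes and one image query hash code
db = np.array([[ 1, -1,  1,  1],
               [-1, -1,  1, -1],
               [ 1,  1,  1,  1]])
query = np.array([1, -1, 1, 1])

dists = [hamming(query, row) for row in db]   # -> [0, 2, 1]
best = int(np.argmin(dists))                  # index of the closest item
```

In practice the Hamming distance over packed bit codes can be computed with XOR and popcount, which is far cheaper than L2 distances over 4096-dimensional float vectors.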
To quantify the approximation error between the feature vectors and the generated hash codes, the present embodiment designs a corresponding loss function as a constraint. The example uses the likelihood of a hash code conditioned on the feature vector, taking the j-th hash bit and the j-th feature component of the i-th sample as an example (the sample may be either an image or a text), where the likelihood is built from a sigmoid function of the feature vector. Based on this likelihood, the embodiment designs a loss function to evaluate the approximation error between the feature vectors and the generated hash codes, where n is the total number of samples and d_hash is the number of bits in the vector. This loss function for assessing the hash-code approximation error acts as one of the network's constraints during training.
Step 5: to build a better-performing network model, the present embodiment uses several constraints to restrict the data features generated during network training, so that they retain strong class characteristics and thereby improve retrieval accuracy. For the multimodal, multi-class nature of the data flow, a manifold constraint is used under unsupervised conditions to guarantee the similarity and dissimilarity of data across modalities and classes; under supervised conditions, where sample class labels are given, a triplet constraint is used to minimize the feature distance between same-class data of different modalities and to maximize the feature distance between data of different classes, whether of the same or of different modalities. The specific implementation process is as follows:
In the supervised case, another small classification network is introduced to impose a classification constraint on the feature vectors obtained from the generators' common subspace. For supervised cross-modal datasets, i.e. when the training data samples carry class labels, in order to make fuller use of the class labels, the present embodiment uses the small classification network to express classes over the common subspace and designs a classification loss function to constrain the generation of the common-subspace feature vectors, so that they differ from the vectors of other layers, carry stronger class information, and can also be classified correctly when predicting classes. In the classification loss function, the predicted class of the i-th sample's feature vector is obtained through the small classification network and c_i is the actual class label of that sample; the classification loss function in fact computes the L2 distance between the two.
A constraint is imposed on the similarity of cross-modal same-class data pairs. Among cross-modal data there are many semantically similar training pairs; for example, an image sample and a text sample in the training data may be highly similar in semantics and share similar class attributes. To exploit this characteristic, the present embodiment links each training image sample with its semantically similar text sample and designs a loss function to constrain cross-modal same-class data. In this loss function, the feature vectors produced in the common subspace by generators G_t→m and G_m→t for images and text are compared, and the loss computes the L2 distance between semantically similar corresponding cross-modal data.
Extending this further, the present embodiment simultaneously considers similarity constraints between same-class data across modalities and within a modality: the distance between the feature vectors of a semantically similar cross-modal training pair should be smaller than the distance to semantically dissimilar feature vectors of the same modality. Under supervised training, since the data all carry class labels, a triplet constraint is used to minimize the distance between cross-modal data vectors under the same semantic label. The triplet constraint is illustrated in Fig. 2, where icons of different shapes represent data of different classes and different textures represent different modalities; in the feature space, data lie close to same-class data of the same modality or of the other modality, and farther from cross-modal data of different classes. In the embodiment, taking a generated image datum as an example (the feature label of a generated datum is simply the class label of its original input), a text datum t_α,i with the same label is chosen first, and a text datum t_β,i of a different class is chosen at random, where α, β represent the two class labels, * denotes generated data, and i denotes the data of the i-th computation. The triplet constraint for the generated image seeks to minimize the distance to t_α,i while maximizing the distance to t_β,i. Likewise, for a generated text datum, its triplet constraint involves m_α,i and m_β,i. The triplet constraint loss function is designed accordingly.
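A hedged sketch of such a triplet objective (the hinge form and the margin value are assumptions not stated in the text):

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=1.0):
    """Pull the same-label cross-modal pair together and push the
    different-label sample away, up to a margin."""
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

m_gen = np.array([0.0, 0.0])     # generated image feature, label alpha
t_same = np.array([0.1, 0.0])    # text feature with the same label
t_diff = np.array([3.0, 4.0])    # text feature with a different label
loss = triplet_loss(m_gen, t_same, t_diff)   # constraint already satisfied
```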
For unsupervised training, the present embodiment designs a manifold constraint to guarantee the similarity of semantically similar data within a modality and across modalities. Since data trained without supervision carry no class labels, the present embodiment constructs a k-nearest-neighbor matrix to ensure that semantically similar data are aggregated and semantically different data are separated. As shown in Fig. 3, after computing the kNN matrix the embodiment establishes a similarity matrix for the data to be constrained, and then applies the manifold constraint to the feature vectors in the common subspace. Taking the generated image data obtained from text data t_α as an example, according to the kNN computation for t_α, its k nearest data (k is set to 2 in the present embodiment) are recorded as 1 in the similarity matrix and the non-neighboring data as 0. After the text data are generated into image feature vectors, generated image feature vectors whose corresponding text data are marked 1 in the similarity matrix are randomly selected as neighbors, and those whose corresponding text data are marked 0 as non-neighbors. The manifold constraint then minimizes the distance between a generated feature vector and its neighbors, guaranteeing high similarity among the generated feature vectors of semantically close data, and maximizes the distance to its non-neighbors, guaranteeing low similarity among the generated feature vectors of semantically different data. The same is done symmetrically for the generated text data. The manifold constraint loss function is designed accordingly.
In conclusion, the generator loss function is composed of the loss functions of the constraints above. Under supervised data training, the generator loss function consists of the cycle loss function, the hash-code loss function, the triplet constraint loss function, the cross-modal same-semantics data loss function, and the classification loss function, with the formula as follows:

wherein θ2, θ3, θ4 and θ5 are adjustable hyperparameters of the network, set to 5, 5, 0.001 and 20 respectively in this embodiment. Under unsupervised data training, the generator loss function consists of the cycle loss function, the hash-code loss function, the manifold constraint loss function, and the cross-modal same-semantics data loss function, with the formula as follows:

The hyperparameter values are the same as set above.
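As an illustration of the weighted combination above, the sketch below uses the embodiment's values θ2..θ5 = 5, 5, 0.001, 20; the assignment of each θ to a loss term in the listed order is an assumption, and the individual loss values are stand-ins computed elsewhere:

```python
# Hyperparameters θ2..θ5 as set in this embodiment.
THETA2, THETA3, THETA4, THETA5 = 5.0, 5.0, 0.001, 20.0

def generator_loss(l_cyc, l_hash, l_constraint, l_cm, l_cls=None):
    """Weighted sum of the generator's loss terms.

    l_constraint is the triplet loss (supervised) or the manifold loss
    (unsupervised); l_cls is only supplied under supervised training.
    The theta-to-term assignment follows the listed order (assumption).
    """
    loss = l_cyc + THETA2 * l_hash + THETA3 * l_constraint + THETA4 * l_cm
    if l_cls is not None:                  # supervised case adds classification
        loss += THETA5 * l_cls
    return loss
```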
Having designed the discriminator loss function and the generator loss function in the five steps above, a common minimax algorithm is used to iteratively minimize the network losses, thereby establishing the semantic relations among the multi-modal data. In this embodiment the minimax algorithm uses stochastic gradient descent optimization, specifically the more stable RMSProp optimizer. Because the discriminator and the generator confront each other, their update directions are opposite: in every round of iteration each side counters the other side's result from the previous round, and in this mutual confrontation they reach a dynamic equilibrium. The calculation method is as follows:
Because the discriminator trains relatively fast in practice, this method trains the discriminator for one iteration after every S generator iterations. The training hyperparameter S is set to 10 in this embodiment, the learning rate μ of the network is set to 0.0001, and the sample size of each training batch (batch size) is set to 64. Meanwhile, the weights learned by the network are clipped: after each training step, weights in the generator greater than c_gen are set to c_gen, and weights in the discriminator greater than c_disc are set to c_disc, so that the learned weights do not grow too large.
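The alternating schedule with weight clipping can be sketched as follows (illustrative only; the gen_step/disc_step update functions are placeholders, the clipping thresholds C_GEN and C_DISC are assumed values, and symmetric magnitude clipping is an assumption where the text only mentions the upper bound):

```python
import numpy as np

S, LR, BATCH = 10, 1e-4, 64        # schedule hyperparameters from this embodiment
C_GEN, C_DISC = 0.5, 0.5           # clipping thresholds (illustrative values)

def clip_weights(w, c):
    """Clip weight magnitudes to c (symmetric clipping is an assumption)."""
    return np.clip(w, -c, c)

def train(gen_step, disc_step, gen_w, disc_w, iters):
    """Update the generator every step; update the discriminator once per S steps."""
    for t in range(iters):
        gen_w = clip_weights(gen_step(gen_w), C_GEN)
        if (t + 1) % S == 0:               # discriminator learns faster, so it
            disc_w = clip_weights(disc_step(disc_w), C_DISC)   # trains less often
    return gen_w, disc_w
```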
Step 6: the trained neural network is used for cross-modal data retrieval. The feature vectors obtained by the generators in the common subspace are compressed into hash codes, and the Hamming distance between the hash codes of different data is then used for retrieval. The specific implementation process is described as follows:
After the network training and learning described above, the generators in this embodiment have acquired the ability to extract the semantic-relation information between the image and text modalities. The embodiment can now perform bidirectional cross-modal retrieval. First, the trained weight parameters of the network are fixed. The image and text data to be retrieved, m_test and t_test, are passed through the trained generators G_m→t and G_t→m to obtain the feature vectors m_com and t_com in the common subspace, which are then converted into the hash codes m_hash and t_hash for use. When retrieving text with an image, the hash code of that image is taken out and its Hamming distance to the hash codes of all texts is computed; the text represented by the nearest hash code is the image→text cross-modal retrieval result. When retrieving images with text, the hash code of that text is taken out and its Hamming distance to the hash codes of all images is computed; the image represented by the nearest hash code is the text→image cross-modal retrieval result.
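The quantization and Hamming-distance retrieval steps can be sketched as follows (a minimal NumPy sketch of the sgn(x − 0.5) coding and nearest-neighbor lookup; function names are illustrative):

```python
import numpy as np

def to_hash(features):
    """Quantize common-subspace features: bit = +1 if value > 0.5, else -1."""
    return np.where(features > 0.5, 1, -1)

def hamming(a, b):
    """Hamming distance between two ±1 hash codes of equal length."""
    return int(np.sum(a != b))

def retrieve(query_hash, db_hashes):
    """Index of the database hash code nearest to the query in Hamming distance."""
    return int(np.argmin([hamming(query_hash, h) for h in db_hashes]))
```

For image→text retrieval the query is an image hash code and the database holds all text hash codes; for text→image retrieval the roles are swapped.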
The above embodiments merely illustrate the design philosophy and features of the present invention; their purpose is to enable those skilled in the art to understand the content of the present invention and implement it accordingly, and the protection scope of the present invention is not limited to the above embodiments. Therefore, all equivalent variations or modifications made according to the principles and design ideas disclosed by the present invention fall within the protection scope of the present invention.
Claims (7)
1. A cross-modal retrieval method based on a cycle generative adversarial network, characterized by comprising the following steps:
designing two cycle modules, wherein the two cycle modules share two generators with an identical network structure, and hash coding is applied to the output data of the generators' middle layers; the purpose of a generator is to generate, through training, cross-modal data that are as realistic as possible;
one of the cycle modules realizes the process modality m → modality t → modality m through the two generators, and the other cycle module realizes the process modality t → modality m → modality t through the same two generators;
designing a discriminator in each cycle module, wherein the discriminator classifies the generated data and the original data of the same modality and carries out dynamic confrontation with the generator, so that the generator and the discriminator finally reach a dynamic equilibrium under the given training conditions.
2. The cross-modal retrieval method based on a cycle generative adversarial network according to claim 1, characterized in that: for the multi-modal, multi-class characteristics of the data stream, a manifold constraint is used under unsupervised conditions to guarantee the similarity and difference of data between modalities and between classes; since class labels are given under supervised conditions, a triplet constraint is used to minimize the feature distance between data of the same class but different modalities, and to maximize the feature distance between data of different classes.
3. The cross-modal retrieval method based on a cycle generative adversarial network according to claim 2, characterized in that the loss function of the discriminator is specifically:

obtained by comparing the finally generated data of the same modality with the original data, namely:

wherein i denotes the data of the i-th computation and there are n training sample data in total; during training the discriminator iteratively learns in the direction of reducing L_disc; D_img and D_txt denote the two discriminators, (m_ori, t_ori) denote the original modality-m and original modality-t data, m_cyc denotes the generated modality-m features, and t_cyc denotes the generated modality-t features.
4. The cross-modal retrieval method based on a cycle generative adversarial network according to claim 3, characterized in that the loss function of the generator is specifically:

wherein θ1 is a hyperparameter of the network and ||*||2 denotes the L2 distance.
5. The cross-modal retrieval method based on a cycle generative adversarial network according to claim 4, characterized in that: letting the middle-layer feature vectors of the two generators be m_com and t_com, the hash codes are generated by the formulas:
m_hash = sgn(m_com - 0.5)
t_hash = sgn(t_com - 0.5)
wherein sgn is a threshold function; the formulas mean that, for each floating-point number in the middle-layer floating-point feature vector, the corresponding hash code bit is set to +1 when the value is greater than 0.5 and to -1 when the value is less than 0.5.
6. The cross-modal retrieval method based on a cycle generative adversarial network according to claim 5, characterized in that: to quantify the approximation error between the feature vectors and the generated hash codes, this method designs a corresponding loss function as a constraint, specifically using the likelihood function of the hash codes conditioned on the feature vectors; taking the j-th bit of the hash code of the i-th sample and the j-th bit of its feature vector as an example (the sample may be either an image or a text):

wherein the sigmoid function of the feature vector is:

a loss function is further designed from the likelihood function to evaluate the approximation error between the feature vectors and the generated hash codes:

wherein n is the total number of samples and d_hash is the number of bits of the vector.
7. The cross-modal retrieval method based on a cycle generative adversarial network according to claim 6, characterized in that: a classification constraint is applied to the middle-layer feature vectors of the generators, and the classification loss function is designed as:
wherein the predicted class of the i-th sample is obtained from its feature vector by a small classification network, c_i is the actual class label of that sample, and the classification loss function actually computes the L2 distance between the two; to constrain the similarity of cross-modal data with the same semantics, this method links the training image sample data with their semantically similar text sample data, and designs a loss function to constrain the cross-modal same-semantics data; the loss function formula is as follows:
wherein the feature vectors of the generated images and texts in the common subspace are produced by the generators G_t→m and G_m→t respectively, and the loss function computes the L2 distance between the semantically similar corresponding cross-modal data; under supervised data training, since all data have class labels, a triplet constraint is used to minimize the distance between the cross-modal data vectors under the same semantic label; the designed triplet loss function is:
wherein m and t denote the image and text data respectively, α and β denote two class labels, * denotes generated data, and i denotes the data of the i-th computation; for unsupervised training, this method designs a manifold constraint to guarantee the similarity of semantically similar data within the same modality and across modalities: after computing the kNN matrix, a similarity matrix is established for the data to be constrained, and the manifold constraint is then applied to the feature vectors in the common subspace; the manifold constraint loss function is designed as follows:
wherein neib and non denote neighboring and non-neighboring data respectively, and the other symbols have the same meanings as before; combining the above functions, the generator loss function under supervised data training is designed as:

and the generator loss function under unsupervised data training is designed as:
wherein θ2, θ3, θ4 and θ5 are weight hyperparameters of the network; the whole network is trained iteratively using the RMSProp stochastic gradient descent optimization algorithm, with the iteration formula:

since the discriminator's gradient descends quickly in practice, this method iterates the discriminator once for every S generator training iterations, and uses the hyperparameters c_gen and c_disc to clip the network weights, preventing the network weights from growing too large.
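As a hedged illustration of the likelihood-based quantization loss of claim 6, the sketch below assumes p(bit = +1 | f) = sigmoid(f) and averages the negative log-likelihood over n samples and d_hash bits; the exact patented formula is not reproduced in the text, so this form is an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hash_likelihood_loss(features, hash_codes):
    """Average negative log-likelihood of the ±1 hash bits given the features.

    Assumes p(bit = +1 | f) = sigmoid(f); 'features' is (n, d_hash),
    'hash_codes' holds ±1 entries of the same shape.
    """
    n, d_hash = features.shape
    p = sigmoid(features)                      # probability each bit is +1
    p_bit = np.where(hash_codes > 0, p, 1.0 - p)
    return float(-np.sum(np.log(p_bit)) / (n * d_hash))
```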
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811455802.6A CN109299342B (en) | 2018-11-30 | 2018-11-30 | Cross-modal retrieval method based on cycle generation type countermeasure network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811455802.6A CN109299342B (en) | 2018-11-30 | 2018-11-30 | Cross-modal retrieval method based on cycle generation type countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109299342A true CN109299342A (en) | 2019-02-01 |
CN109299342B CN109299342B (en) | 2021-12-17 |
Family
ID=65142338
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811455802.6A Active CN109299342B (en) | 2018-11-30 | 2018-11-30 | Cross-modal retrieval method based on cycle generation type countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109299342B (en) |
- 2018-11-30: CN CN201811455802.6A patent/CN109299342B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140168077A1 (en) * | 2012-12-14 | 2014-06-19 | Barnesandnoble.Com Llc | Multi-touch navigation mode |
CN103473307A (en) * | 2013-09-10 | 2013-12-25 | 浙江大学 | Cross-media sparse Hash indexing method |
CN106547826A (en) * | 2016-09-30 | 2017-03-29 | 西安电子科技大学 | A kind of cross-module state search method, device and computer-readable medium |
CN108510559A (en) * | 2017-07-19 | 2018-09-07 | 哈尔滨工业大学深圳研究生院 | It is a kind of based on have supervision various visual angles discretization multimedia binary-coding method |
CN107871014A (en) * | 2017-11-23 | 2018-04-03 | 清华大学 | A kind of big data cross-module state search method and system based on depth integration Hash |
CN108256627A (en) * | 2017-12-29 | 2018-07-06 | 中国科学院自动化研究所 | The mutual generating apparatus of audio-visual information and its training system that generation network is fought based on cycle |
Non-Patent Citations (1)
Title |
---|
OU Weihua et al., "A Survey of Cross-modal Retrieval Research", Journal of Guizhou Normal University (Natural Science Edition) * |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019652A (en) * | 2019-03-14 | 2019-07-16 | 九江学院 | A kind of cross-module state Hash search method based on deep learning |
CN110019652B (en) * | 2019-03-14 | 2022-06-03 | 九江学院 | Cross-modal Hash retrieval method based on deep learning |
CN110032734B (en) * | 2019-03-18 | 2023-02-28 | 百度在线网络技术(北京)有限公司 | Training method and device for similar meaning word expansion and generation of confrontation network model |
CN110032734A (en) * | 2019-03-18 | 2019-07-19 | 百度在线网络技术(北京)有限公司 | Near synonym extension and generation confrontation network model training method and device |
CN110059157A (en) * | 2019-03-18 | 2019-07-26 | 华南师范大学 | A kind of picture and text cross-module state search method, system, device and storage medium |
CN110222140A (en) * | 2019-04-22 | 2019-09-10 | 中国科学院信息工程研究所 | A kind of cross-module state search method based on confrontation study and asymmetric Hash |
CN110222140B (en) * | 2019-04-22 | 2021-07-13 | 中国科学院信息工程研究所 | Cross-modal retrieval method based on counterstudy and asymmetric hash |
CN111127385A (en) * | 2019-06-06 | 2020-05-08 | 昆明理工大学 | Medical information cross-modal Hash coding learning method based on generative countermeasure network |
CN110309861A (en) * | 2019-06-10 | 2019-10-08 | 浙江大学 | A kind of multi-modal mankind's activity recognition methods based on generation confrontation network |
CN110309861B (en) * | 2019-06-10 | 2021-05-25 | 浙江大学 | Multi-modal human activity recognition method based on generation of confrontation network |
US11823429B2 (en) | 2019-07-03 | 2023-11-21 | Institute Of Automation, Chinese Academy Of Sciences | Method, system and device for difference automatic calibration in cross modal target detection |
WO2021000664A1 (en) * | 2019-07-03 | 2021-01-07 | 中国科学院自动化研究所 | Method, system, and device for automatic calibration of differences in cross-modal target detection |
CN110443309A (en) * | 2019-08-07 | 2019-11-12 | 浙江大学 | A kind of electromyography signal gesture identification method of combination cross-module state association relation model |
CN112487217A (en) * | 2019-09-12 | 2021-03-12 | 腾讯科技(深圳)有限公司 | Cross-modal retrieval method, device, equipment and computer-readable storage medium |
CN110909181A (en) * | 2019-09-30 | 2020-03-24 | 中国海洋大学 | Cross-modal retrieval method and system for multi-type ocean data |
CN110930469A (en) * | 2019-10-25 | 2020-03-27 | 北京大学 | Text image generation method and system based on transition space mapping |
CN110930469B (en) * | 2019-10-25 | 2021-11-16 | 北京大学 | Text image generation method and system based on transition space mapping |
CN110990595A (en) * | 2019-12-04 | 2020-04-10 | 成都考拉悠然科技有限公司 | Zero sample cross-mode retrieval method for cross-domain alignment embedding space |
CN110990595B (en) * | 2019-12-04 | 2023-05-05 | 成都考拉悠然科技有限公司 | Cross-domain alignment embedded space zero sample cross-modal retrieval method |
CN111104982B (en) * | 2019-12-20 | 2021-09-24 | 电子科技大学 | Label-independent cross-task confrontation sample generation method |
CN111104982A (en) * | 2019-12-20 | 2020-05-05 | 电子科技大学 | Label-independent cross-task confrontation sample generation method |
CN111353076B (en) * | 2020-02-21 | 2023-10-10 | 华为云计算技术有限公司 | Method for training cross-modal retrieval model, cross-modal retrieval method and related device |
CN111353076A (en) * | 2020-02-21 | 2020-06-30 | 华为技术有限公司 | Method for training cross-modal retrieval model, cross-modal retrieval method and related device |
WO2021189383A1 (en) * | 2020-03-26 | 2021-09-30 | 深圳先进技术研究院 | Training and generation methods for generating high-energy ct image model, device, and storage medium |
CN111523663A (en) * | 2020-04-22 | 2020-08-11 | 北京百度网讯科技有限公司 | Model training method and device and electronic equipment |
CN111523663B (en) * | 2020-04-22 | 2023-06-23 | 北京百度网讯科技有限公司 | Target neural network model training method and device and electronic equipment |
CN111581405B (en) * | 2020-04-26 | 2021-10-26 | 电子科技大学 | Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning |
CN111581405A (en) * | 2020-04-26 | 2020-08-25 | 电子科技大学 | Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning |
CN111783980A (en) * | 2020-06-28 | 2020-10-16 | 大连理工大学 | Ranking learning method based on dual cooperation generation type countermeasure network |
CN111881884A (en) * | 2020-08-11 | 2020-11-03 | 中国科学院自动化研究所 | Cross-modal transformation assistance-based face anti-counterfeiting detection method, system and device |
CN112199462A (en) * | 2020-09-30 | 2021-01-08 | 三维通信股份有限公司 | Cross-modal data processing method and device, storage medium and electronic device |
WO2022068195A1 (en) * | 2020-09-30 | 2022-04-07 | 三维通信股份有限公司 | Cross-modal data processing method and device, storage medium and electronic device |
CN112364192A (en) * | 2020-10-13 | 2021-02-12 | 中山大学 | Zero sample Hash retrieval method based on ensemble learning |
WO2022104540A1 (en) * | 2020-11-17 | 2022-05-27 | 深圳大学 | Cross-modal hash retrieval method, terminal device, and storage medium |
CN113706646A (en) * | 2021-06-30 | 2021-11-26 | 酷栈(宁波)创意科技有限公司 | Data processing method for generating landscape painting |
CN113204522A (en) * | 2021-07-05 | 2021-08-03 | 中国海洋大学 | Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network |
CN113779283A (en) * | 2021-11-11 | 2021-12-10 | 南京码极客科技有限公司 | Fine-grained cross-media retrieval method with deep supervision and feature fusion |
CN116524420A (en) * | 2023-07-03 | 2023-08-01 | 武汉大学 | Key target detection method and system in traffic scene |
CN116524420B (en) * | 2023-07-03 | 2023-09-12 | 武汉大学 | Key target detection method and system in traffic scene |
Also Published As
Publication number | Publication date |
---|---|
CN109299342B (en) | 2021-12-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109299342A (en) | Cross-modal retrieval method based on a cycle generative adversarial network | |
Yu et al. | Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering | |
Li et al. | Factorizable net: an efficient subgraph-based framework for scene graph generation | |
CN110298037B (en) | Convolutional neural network matching text recognition method based on enhanced attention mechanism | |
Yan et al. | Image classification by cross-media active learning with privileged information | |
Lai et al. | Instance-aware hashing for multi-label image retrieval | |
Qu et al. | Joint hierarchical category structure learning and large-scale image classification | |
CN109918528A (en) | A kind of compact Hash code learning method based on semanteme protection | |
CN109558487A (en) | Document Classification Method based on the more attention networks of hierarchy | |
CN111753189A (en) | Common characterization learning method for few-sample cross-modal Hash retrieval | |
Wang et al. | Facilitating image search with a scalable and compact semantic mapping | |
Zhang et al. | Deep relation embedding for cross-modal retrieval | |
Liu et al. | Improving cross-modal image-text retrieval with teacher-student learning | |
CN108427740B (en) | Image emotion classification and retrieval algorithm based on depth metric learning | |
CN109271539A (en) | A kind of image automatic annotation method and device based on deep learning | |
Islam et al. | InceptB: a CNN based classification approach for recognizing traditional bengali games | |
CN110688502A (en) | Image retrieval method and storage medium based on depth hash and quantization | |
Kim et al. | Exploiting web images for video highlight detection with triplet deep ranking | |
Shen et al. | Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description. | |
Feng et al. | Learning to rank image tags with limited training examples | |
CN109960732A (en) | A kind of discrete Hash cross-module state search method of depth and system based on robust supervision | |
Wang et al. | A deep clustering via automatic feature embedded learning for human activity recognition | |
CN115827954A (en) | Dynamically weighted cross-modal fusion network retrieval method, system and electronic equipment | |
Lu et al. | A sustainable solution for IoT semantic interoperability: Dataspaces model via distributed approaches | |
CN113779283B (en) | Fine-grained cross-media retrieval method with deep supervision and feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||