CN109299342A - Cross-modal retrieval method based on a cyclic generative adversarial network - Google Patents


Info

Publication number
CN109299342A
Authority
CN
China
Prior art keywords: data, network, loss function, cross-modal, generator
Prior art date
Legal status
Granted
Application number
CN201811455802.6A
Other languages
Chinese (zh)
Other versions
CN109299342B (en)
Inventor
倪立昊
王骞
邹勤
李明慧
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201811455802.6A
Publication of CN109299342A
Application granted
Publication of CN109299342B
Status: Active
Anticipated expiration

Classifications

    • G06F18/2413 Pattern recognition; classification techniques based on distances to training or reference patterns
    • G06N3/045 Neural networks; combinations of networks
    • G06N3/048 Neural networks; activation functions
    • G06N3/08 Neural networks; learning methods

Abstract

The invention discloses a cross-modal retrieval method based on a cyclic generative adversarial network. The method designs a novel two-channel cyclic generative adversarial neural network and establishes semantic correlations across modalities by training that network. Data of different modalities can flow through the network in both directions: each modality's data is translated into the other modality by one generative adversarial network, and the generated data in turn serves as input to the next generative adversarial network, so that data is generated in a bidirectional cycle and the network continuously learns the semantic relations between modalities. To improve retrieval efficiency, the method also approximates the output of the generators' intermediate layer as binary hash codes using a threshold function and an approximating function, and designs several constraints to guarantee similarity within a modality and among same-class data across modalities, and dissimilarity between classes, thereby further improving the stability and accuracy of retrieval.

Description

Cross-modal retrieval method based on a cyclic generative adversarial network
Technical field
The invention belongs to the technical field of multimedia information retrieval, and in particular relates to a cross-modal retrieval method based on a cyclic generative adversarial network.
Technical background
With the arrival of the Internet era, people are exposed anytime and anywhere to massive information spanning multiple modalities, including pictures, video, text and audio. How to obtain the content one needs from this mass of information has become the chief concern of Internet users, who frequently rely on search engines such as Google and Baidu to provide accurate retrieval services. Traditional Internet search services, however, largely remain at the level of single-modality retrieval. Retrieval across modalities is rarely applied; its efficiency, accuracy and stability all leave room for improvement, and it depends heavily on existing data labels, so cross-modal retrieval of unlabeled data is not possible. Studying novel cross-modal retrieval methods therefore has strong practical significance and value. The key is to establish semantic relations among multi-modal heterogeneous data so that similar data of another modality can be retrieved directly, realizing direct cross-modal retrieval without labeling all modality data and ultimately further improving retrieval performance.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a cross-modal retrieval method based on a cyclic generative adversarial network, which can effectively improve on the performance of existing cross-modal retrieval techniques.
To achieve the above goals, the cross-modal retrieval method based on a cyclic generative adversarial network designed by the present invention comprises the following steps:
Two loop modules are designed. The two loop modules share two generators with identical network structure, and hash coding is applied to the output of each generator's intermediate layer; the purpose of a generator is to learn, through training, to generate cross-modal data that is as realistic as possible;
One loop module realizes the process modality m → modality t → modality m through the two generators, and the other loop module likewise realizes the process modality t → modality m → modality t through the two generators;
A respective discriminator is designed for each modality in each loop module. The discriminator attempts to classify the generated data and the original data of that modality, engaging in a dynamic contest with the generator; generator and discriminator ultimately reach a dynamic equilibrium under the given training conditions.
Further, given the multi-modal, multi-class nature of the data flow, a manifold constraint is used under unsupervised conditions to guarantee the similarity and dissimilarity of data across modalities and across classes. Under supervised conditions, where class labels are given, a triplet constraint is used to minimize the feature distance between same-class data of different modalities and to maximize the feature distance between data that differ in class or in modality.
Further, the loss function of the discriminators is as follows:
where i denotes the data of the i-th computation, there are n training sample data in total, and during training the discriminators iteratively learn in the direction of reducing L_disc; D_img and D_txt denote the two discriminators, (m_ori, t_ori) denote the original feature vectors of modality m and modality t, and (m_cyc, t_cyc) denote the feature vectors of modality m and modality t generated by the cyclic network.
The cycle loss function, ultimately obtained by comparing the generated data of each modality with the original data, is as follows:
Still further, the loss function of the generators is as follows:
where θ1 is a hyperparameter of the network and ||*||2 denotes the L2 distance.
Further, let the feature vectors output by the intermediate layers of the two generators be m_com and t_com. The hash codes are generated by the formulas:
m_hash = sgn(m_com - 0.5)
t_hash = sgn(t_com - 0.5)
where sgn is a threshold function: for each floating-point component of the intermediate-layer feature vector, the corresponding hash bit is set to +1 when the value is greater than 0.5 and to -1 when the value is less than 0.5.
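As an illustration, the thresholding just described can be sketched in a few lines of Python (a minimal sketch; the vectors are invented, and a component exactly equal to 0.5 is mapped to -1 here since the text leaves that boundary case open):

```python
def sgn_hash(features, threshold=0.5):
    """Map a real-valued common-subspace vector to a binary hash code.

    Each component greater than the threshold becomes +1, otherwise -1,
    mirroring m_hash = sgn(m_com - 0.5) in the text.
    """
    return [1 if x > threshold else -1 for x in features]

# Illustrative (invented) 8-dimensional common-subspace vectors
m_com = [0.91, 0.12, 0.55, 0.49, 0.80, 0.03, 0.67, 0.21]
t_com = [0.88, 0.20, 0.61, 0.45, 0.75, 0.10, 0.70, 0.30]

m_hash = sgn_hash(m_com)
t_hash = sgn_hash(t_com)
print(m_hash)  # [1, -1, 1, -1, 1, -1, 1, -1]
```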
Still further, to quantify the approximation error between a feature vector and the hash code generated from it, the method designs a corresponding loss function as a constraint. Specifically, it uses the likelihood function of a hash code conditioned on its feature vector; taking the j-th hash bit and the j-th feature component of the i-th sample as an example (the sample may be either an image or a text):
where the term involved is a sigmoid function of the feature component:
A loss function is further designed from this likelihood to assess the approximation error between the feature vectors and the generated hash codes:
where n is the total number of samples and d_hash is the number of bits of the vector.
Still further, in the present invention a classification constraint is applied to the intermediate-layer feature vectors of the generators, for which a classification loss function is designed:
where the predicted class of the i-th sample is obtained by feeding its feature vector through a small classification network, c_i is the actual class label of that sample, and the classification loss actually computes the L2 distance between the two.
To constrain the similarity of same-class data pairs across modalities, the method links each training image sample with its semantically similar text sample and designs a loss function that constrains same-class cross-modal data. The loss function is as follows:
where the two feature vectors are those produced in the common subspace of images and texts by generators G_t→m and G_m→t respectively, and the loss computes the L2 distance between semantically similar corresponding cross-modal same-class data.
Under supervised data training, since all data carry class labels, a triplet constraint is used to minimize the distance between cross-modal data vectors under the same semantic label. The triplet loss function is designed as:
where m and t denote image and text data respectively, α and β denote two class labels, * marks generated data, and i denotes the data of the i-th computation. For unsupervised training, the method designs a manifold constraint to guarantee the similarity of semantically close data within a modality and across modalities: after computing a kNN matrix, a similarity matrix is established for the data to be constrained, and the manifold constraint is then applied to the feature vectors in the common subspace. The manifold-constraint loss function is designed as follows:
where neib and non denote neighbouring and non-neighbouring data respectively; the other symbols have the same meanings as before.
Further, summarizing the loss-function designs above, the generator loss function for the supervised case is designed as:
and for the unsupervised case as:
where θ2, θ3, θ4 and θ5 are weight hyperparameters of the network. The whole network is trained iteratively with the RMSProp stochastic-gradient-descent optimization algorithm; the iteration formula is:
Since in practice the discriminator gradient descends quickly, the method trains the discriminators for one iteration only after every S iterations of generator training, and uses hyperparameters c_gen and c_disc to clip the network weights so that they do not grow too large.
The present invention has the following advantages:
By building a cyclic generative adversarial network from two pairs of generators and discriminators, the invention better establishes the semantic relations among multi-modal data; it designs several constraints to improve the stability and accuracy of retrieval, and substitutes binary hash codes for the original features to improve retrieval efficiency, thereby studying and exploring a novel cross-modal retrieval method based on a cyclic generative adversarial network, applied specifically to cross-modal retrieval between images and text.
Description of the drawings
Fig. 1 is the overall framework diagram of the neural network of the embodiment of the present invention.
Fig. 2 is a schematic diagram of the triplet constraint of the embodiment of the present invention.
Fig. 3 is a schematic diagram of the manifold constraint of the embodiment of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and a specific embodiment:
In recent years, along with the upsurge of artificial intelligence, deep-learning techniques have gradually risen and influenced every field of computer science, and more and more people in the field of multimedia information retrieval use deep learning to improve the stability and accuracy of existing retrieval. The generative adversarial network used in this method is a new kind of neural network, widely applied in recent years, that estimates a generative model through an adversarial process: a generator that learns the data distribution and a discriminator that judges whether data is real are trained simultaneously, the two contesting each other during training until they reach a dynamic equilibrium. Generative adversarial networks are widely used in fields such as image generation, semantic segmentation and data augmentation; guided by their loss functions they can learn the distribution of the training samples well and generate new data similar to those samples. This method uses two generative adversarial networks to form a novel cyclic network, and improves the network's efficiency, stability and accuracy for multi-modal retrieval through hash codes and several constraints.
The cross-modal retrieval method based on a cyclic generative adversarial network provided by the invention mainly designs a novel neural network, whose overall structure is shown in Fig. 1. The embodiment takes mutual retrieval between image and text data as an example to describe the neural network architecture and data-processing flow of the invention in detail, as follows:
First, in the embodiment the original two-dimensional image data needs preliminary processing. The present embodiment selects the 19-layer VGGNet popular in deep learning, and takes the 4096-dimensional feature vector output by the fc7 layer of VGGNet as the original image feature m_ori of the input, i.e. the image feature dimension d_img is 4096. At the same time, the original text data of the input is also processed into a preliminary feature vector: the present embodiment processes text data with the conventional bag-of-words (BoW) model. The length of the resulting BoW vector depends on the text data and the specific processing method chosen; for ease of reference, the BoW vector dimension in this embodiment is set to 2000, i.e. the text feature dimension d_txt is 2000, and this vector serves as the original text feature t_ori of the input.
Step 1: design the first generative adversarial network, containing generator G_m→t and discriminator D_txt, which from the input original image/original text data pair (m_ori, t_ori) obtains generated text data t_gen, thereby extracting the mapping that generates text data from image data and so obtaining the semantic relation between image and text data. The specific implementation is as follows:
As shown in Fig. 1, the upper half of the network can be regarded as the first generative adversarial network, mainly containing generator G_m→t and discriminator D_txt; at this point the input is the original image/original text data pair (m_ori, t_ori). Data flows through the network: the original image m_ori passes through generator G_m→t to obtain the generated text t_gen, i.e. t_gen = G_m→t(m_ori), and we want the generated text t_gen to be as similar as possible to the original text t_ori. Generator G_m→t is composed of multiple one-dimensional convolutional layers, in which the feature-vector dimension changes as d_img → 512 → d_hash → 100 → d_txt. d_img denotes the dimension of the input original image feature, 4096 in this embodiment; d_hash is the dimension of the intermediate-layer feature that will be used for hash-code generation, determined by the required hash-code length, and may be 64, 128, 256 and so on; d_txt is the dimension of the original text feature input to the network and also the feature length of the generated text, 2000 in this embodiment. Meanwhile discriminator D_txt contests dynamically with generator G_m→t, attempting to distinguish the original text feature t_ori from the generated text feature t_gen. Discriminator D_txt is a feed-forward network of fully connected layers, in which the feature dimension changes as d_txt → 512 → 16. When generator and discriminator reach dynamic equilibrium under the given training conditions, generator G_m→t has well extracted the mapping that generates text data from image data, obtaining the semantic relation between the original image and the generated text data.
Step 2: design the second generative adversarial network, containing generator G_t→m and discriminator D_img. Its input is the original image/generated text data pair (m_ori, t_gen) obtained in the previous step; it obtains the cycled image m_cyc and extracts the mapping that generates image data from text data, thereby obtaining the semantic relation between text and image data. The specific implementation is as follows:
As shown in Fig. 1, the lower half of the network can be regarded as the second generative adversarial network, mainly containing generator G_t→m and discriminator D_img; at this point the input is the original image/generated text data pair (m_ori, t_gen). Data flows through the network: the generated text t_gen passes through generator G_t→m to obtain the cycled image m_cyc, i.e. m_cyc = G_t→m(t_gen) = G_t→m(G_m→t(m_ori)), and we want the cycled image feature m_cyc to be as similar as possible to the original image feature m_ori. Generator G_t→m is composed of multiple one-dimensional deconvolutional layers, in which the feature-vector dimension changes as d_txt → 100 → d_hash → 512 → d_img. d_txt is the dimension of the original text feature input to the network, 2000 in this embodiment; d_hash is the dimension of the intermediate-layer feature used for hash-code generation, determined by the required hash-code length, may be 64, 128, 256 and so on, and is identical to the hash-code length in the first generative adversarial network; d_img denotes the dimension of the original image feature and also the length of the finally generated cycled image feature, 4096 in this embodiment. Meanwhile discriminator D_img contests dynamically with generator G_t→m, attempting to distinguish the cycled image feature m_cyc from the original image feature m_ori. Discriminator D_img is a feed-forward network of fully connected layers, in which the feature dimension changes as d_img → 512 → 100 → 16. When generator and discriminator reach dynamic equilibrium under the given training conditions, the mapping that generates image data from text data has been well extracted, obtaining the semantic relation between the generated text and the cycled image data.
Step 3: using the two generative adversarial networks designed in the previous two steps, the direction of the data flow can likewise be reversed, finally realizing the reverse mapping and so obtaining the semantic relation between text and image data from the other direction. That is, combining the first two steps, the second generative adversarial network first turns the input original text feature t_ori into the generated image feature m_gen, obtaining the semantic relation between text and image data; the first generative adversarial network then turns the generated image feature m_gen into the cycled text feature t_cyc, obtaining the semantic relation between image and text data. In this way, during training, image data and text data circulate through the two generative adversarial networks, adversarial generation takes place, and the network is continuously optimized. The specific implementation is as follows:
The input data is still the original image/original text data pair (m_ori, t_ori), but the order of execution is the reverse of the two steps above. First the generator G_t→m of the second generative adversarial network turns the input original text feature t_ori into the generated image feature m_gen, i.e. m_gen = G_t→m(t_ori); the feature-vector dimensions in generator G_t→m change as before, d_txt → 100 → d_hash → 512 → d_img. Meanwhile discriminator D_img contests dynamically with generator G_t→m, attempting to distinguish the original image feature m_ori from the generated image feature m_gen. Once the contest reaches dynamic equilibrium, generator G_t→m has learned the semantic relation between the original text and the generated image data. Then the generator G_m→t of the first generative adversarial network turns the generated image feature m_gen into the cycled text feature t_cyc, i.e. t_cyc = G_m→t(m_gen) = G_m→t(G_t→m(t_ori)); the feature-vector dimensions in generator G_m→t change as before, d_img → 512 → d_hash → 100 → d_txt. Meanwhile discriminator D_txt contests dynamically with generator G_m→t, attempting to distinguish the original text feature t_ori from the cycled text feature t_cyc. Once the contest reaches dynamic equilibrium, generator G_m→t has learned the semantic relation between the generated image and the cycled text data.
Through steps 1, 2 and 3, the bidirectional cyclic flow channels of image data and text data through the network have been established in the embodiment. In one channel, the original image feature m_ori passes through the first generative adversarial network to obtain the generated text feature t_gen, and t_gen then passes through the second generative adversarial network to produce the cycled image feature m_cyc; in the other channel, the original text feature t_ori first passes through the second adversarial network to obtain the generated image feature m_gen, and m_gen then passes through the first generative adversarial network to produce the cycled text feature t_cyc. In this way image and text data can be generated in a bidirectional cycle through the two networks, while the discriminators D_img and D_txt contest the generators, improving the network's learning of the semantic relations across modalities. The loss function of the discriminators D_img and D_txt is designed as:
where i denotes the data of the i-th computation, there are n training sample data in total, and during training the discriminators iteratively learn in the direction of reducing L_disc. Once the bidirectional cyclic generative adversarial network has been built, one of its advantages is that the finally obtained cycled data can be compared with the original data to obtain the cycle loss function, which is also an important component of the generator loss function:
where θ1 is a hyperparameter of the network, 0.001 in this embodiment, and ||*||2 denotes the L2 distance.
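The bidirectional cycle and the cycle-loss comparison can be sketched with toy stand-in generators. This is pure illustration under stated assumptions: the real G_m→t and G_t→m are the multilayer (de)convolutional networks described above, whereas here each "generator" is an invented elementwise affine map chosen so the cycle reconstructs its input exactly:

```python
# Toy stand-ins for the two generators (hypothetical affine maps).
def g_mt(m):  # modality m -> modality t
    return [2.0 * x + 1.0 for x in m]

def g_tm(t):  # modality t -> modality m (toy inverse of g_mt)
    return [(x - 1.0) / 2.0 for x in t]

# Channel 1: m_ori -> t_gen -> m_cyc
m_ori = [0.5, -1.0, 3.0]
t_gen = g_mt(m_ori)
m_cyc = g_tm(t_gen)

# Channel 2: t_ori -> m_gen -> t_cyc
t_ori = [1.0, 2.0]
m_gen = g_tm(t_ori)
t_cyc = g_mt(m_gen)

# The cycle loss compares cycled features with the originals via L2 distance
cycle_gap = sum((a - b) ** 2 for a, b in zip(m_cyc, m_ori)) ** 0.5
print(m_cyc, cycle_gap)  # with exact inverses the gap is 0.0
```

With the real (non-invertible) generators the gap is nonzero, and minimizing it is exactly what the cycle loss term drives.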
Step 4: to improve cross-modal retrieval efficiency in practice, the method applies a threshold function to extract, from the common subspace of the generators of the two generative adversarial networks, the hash codes m_hash and t_hash that can represent image and text features respectively, and designs a likelihood function to assess the approximation error between the features and the hash codes. The specific implementation is as follows:
In the two generative adversarial networks, since the inputs and outputs of the generators are feature data of different modalities, this example treats the intermediate layer of the generators as a common subspace across modalities (as shown in Fig. 1); in the steps above the feature length of this layer was designed to be the required hash-code length d_hash. Let the intermediate-layer feature vectors be m_com and t_com; the generation formulas are m_hash = sgn(m_com - 0.5) and t_hash = sgn(t_com - 0.5), where sgn is a threshold function: for each floating-point component of the intermediate-layer feature vector, the corresponding hash bit is set to +1 when the value is greater than 0.5 and to -1 when the value is less than 0.5. This threshold transformation applies to every component of each training sample's feature vector, so every training sample yields a hash code of the same length as the feature vector. In the embodiment the hash codes m_hash, t_hash are used instead of the common-subspace feature vectors m_com, t_com for retrieval, so that the distance computations between floating-point feature vectors in the original retrieval are replaced by Hamming-distance computations between hash codes, greatly improving the computation speed of retrieval.
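The retrieval speed-up can be illustrated as follows, with an invented toy database of ±1 hash codes; ranking candidates by Hamming distance to the query code replaces floating-point distance computations:

```python
def hamming(h1, h2):
    """Hamming distance between two ±1 hash codes of equal length."""
    return sum(1 for a, b in zip(h1, h2) if a != b)

# Hypothetical database of 4-bit text hash codes and an image query code
db = {
    "text_0": [1, -1, 1, -1],
    "text_1": [1, 1, 1, -1],
    "text_2": [-1, 1, -1, 1],
}
query = [1, -1, 1, 1]  # hash code of a query image

# Rank database entries by Hamming distance to the query
ranked = sorted(db, key=lambda k: hamming(query, db[k]))
print(ranked[0])  # text_0 (distance 1) ranks above text_1 (2) and text_2 (3)
```

In practice the ±1 codes would be bit-packed so the distance reduces to an XOR and a popcount, which is where the large speed-up over floating-point distances comes from.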
To quantify the approximation error between a feature vector and the hash code generated from it, the present embodiment designs a corresponding loss function as a constraint. The example uses the likelihood function of a hash code conditioned on its feature vector, taking the j-th hash bit and the j-th feature component of the i-th sample as an example (the sample may be either an image or a text):
where the term involved is a sigmoid function of the feature component:
The embodiment then designs a loss function from this likelihood to assess the approximation error between the feature vectors and the generated hash codes:
where n is the total number of samples and d_hash is the number of bits of the vector. This loss function assessing the hash-code approximation error acts as one of the network's constraints during training.
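A hedged sketch of such a quantization loss is shown below. The exact sigmoid parameterization is not reproduced in this text, so the sketch assumes p(bit = +1) = sigmoid(f - 0.5), centered on the 0.5 threshold, which makes hash bits that agree with the thresholding decision cheap and disagreeing bits costly:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hash_approx_loss(features, hash_code, center=0.5):
    """Mean negative log-likelihood of a ±1 hash code given its features.

    Assumption: p(h_j = +1 | f_j) = sigmoid(f_j - center); the patent's
    exact sigmoid form is not reproduced here.
    """
    loss = 0.0
    for f, h in zip(features, hash_code):
        p_plus = sigmoid(f - center)
        p = p_plus if h == 1 else 1.0 - p_plus
        loss -= math.log(p)
    return loss / len(features)

f = [0.9, 0.1, 0.7, 0.3]
consistent = [1, -1, 1, -1]    # matches thresholding at 0.5
inconsistent = [-1, 1, -1, 1]  # every bit flipped
print(hash_approx_loss(f, consistent) < hash_approx_loss(f, inconsistent))  # True
```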
Step 5: to build a better network model, the present embodiment uses several constraints to constrain the data features generated during network training, so that they retain strong class characteristics and improve retrieval accuracy. Given the multi-modal, multi-class nature of the data flow, a manifold constraint is used under unsupervised conditions to guarantee the similarity and dissimilarity of data across modalities and across classes; under supervised conditions, where sample class labels are given, a triplet constraint is used to minimize the feature distance between same-class data of different modalities and to maximize the feature distance between data that differ in class or in modality. The specific implementation is as follows:
In the supervised case, another small classification network is introduced to apply a classification constraint to the feature vectors obtained in the generators' common subspace. For a supervised cross-modal dataset, i.e. when the training samples carry class labels, in order to make fuller use of the labels, the present embodiment uses the small classification network to express the classes of the common subspace, and designs a classification loss function to constrain the generation of the common-subspace feature vectors so that, unlike the vectors of other layers, they carry stronger class information and can also be classified correctly when predicting classes. The classification loss function is:
where the predicted class of the i-th sample is obtained by feeding its feature vector through the small classification network, c_i is the actual class label of that sample, and the classification loss actually computes the L2 distance between the two.
A constraint on the similarity of same-class cross-modal data pairs is then applied. Cross-modal data contain many semantically similar training pairs; for example, an image sample and a text sample in the training data may be highly similar semantically, carrying similar class attributes. To exploit this property, the present embodiment links each training image sample with its semantically similar text sample, and designs a loss function that constrains same-class cross-modal data. The loss function is as follows:
where the two feature vectors are the common-subspace feature vectors of the image and the text produced by generators G_t→m and G_m→t respectively, and the loss computes the L2 distance between semantically similar corresponding cross-modal data.
Extending the pairwise loss, the present embodiment simultaneously considers similarity constraints between same-class data across modalities and within a modality: the distance between the feature vectors of a semantically similar cross-modal training pair should be smaller than the distance to semantically dissimilar feature vectors of the same modality. Under supervised training, since all data carry class labels, a triplet constraint is used to minimize the distance between cross-modal data vectors under the same semantic label. The triplet constraint is illustrated in Fig. 2: icons of different shapes represent data of different classes, and different textures represent different modalities; in the feature space, a sample lies close to same-class data of the same modality or across modalities, and farther from cross-modal data of other classes. In the embodiment, taking a generated image-data sample as an example (the feature label of generated data is simply the class label of its original input data), a text sample t_α,i with the same label is chosen first, and a text sample t_β,i of a different class is drawn at random, where α and β denote the two class labels, * marks generated data, and i denotes the data of the i-th computation. The triplet constraint for the generated image seeks to minimize the distance to t_α,i while maximizing the distance to t_β,i. Likewise, for generated text the triplet constraint involves m_α,i and m_β,i. The triplet-constraint loss function is therefore designed as follows:
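A sketch of the triplet idea follows, using a standard margin-based form as an assumption (the exact formula is not reproduced in this text); the anchor is a generated image feature, the positive a same-label text feature, the negative a different-label one:

```python
def l2(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet penalty (a common form, used here as an
    illustrative assumption). It pulls the same-label cross-modal pair
    together and pushes the different-label pair apart."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# Hypothetical common-subspace features: a generated image feature,
# a same-label text feature t_alpha, and a different-label one t_beta
m_star = [0.0, 0.0]
t_alpha = [0.1, 0.0]   # same label, close
t_beta = [3.0, 4.0]    # different label, far

print(triplet_loss(m_star, t_alpha, t_beta))  # 0.0: constraint already satisfied
```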
For unsupervised training, the present embodiment designs a manifold constraint to guarantee the similarity of semantically close data within a modality and across modalities. Since data trained without supervision carry no class labels, the present embodiment constructs a k-nearest-neighbour matrix to ensure that semantically similar data are pulled together and semantically different data are separated. As shown in Fig. 3, after computing the kNN matrix, the embodiment establishes a similarity matrix for the data to be constrained and then applies the manifold constraint to the feature vectors in the common subspace. Taking the generated image data obtained from a text sample t_α as an example: according to the kNN computation for t_α, its k nearest data (k is set to 2 in this embodiment) are recorded as 1 in the similarity matrix, and data not adjacent to it are recorded as 0. After the text data have been generated into image feature vectors, generated image feature vectors whose text data are recorded as 1 in the similarity matrix are drawn as neighbours, and those recorded as 0 as non-neighbours. The manifold constraint then minimizes the distance between a generated feature vector and its neighbours, guaranteeing high similarity of the generated feature vectors of semantically close data, and maximizes the distance to its non-neighbours, guaranteeing low similarity of the generated feature vectors of semantically different data. The same applies to generated text data. The manifold-constraint loss function is therefore designed as follows:
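The kNN-based similarity matrix can be sketched as below (an illustrative toy; the distance metric, tie handling and the toy features are assumptions, with k = 2 as in the embodiment):

```python
def l2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def knn_similarity_matrix(data, k=2):
    """Binary similarity matrix from k-nearest neighbours, as described:
    the k closest samples of each point are marked 1, all others 0."""
    n = len(data)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: l2(data[i], data[j]))
        for j in order[:k]:
            sim[i][j] = 1
    return sim

# Toy text features forming two tight clusters
feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
sim = knn_similarity_matrix(feats, k=2)
print(sim[0])  # [0, 1, 1, 0, 0, 0]: sample 0's neighbours stay in its cluster
```

Rows of 1s then select the "neighbour" generated feature vectors the manifold constraint pulls together, and rows of 0s the non-neighbours it pushes apart.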
In summary, the generator loss function is obtained from the loss functions of the various constraints above. Under supervised data training, the generator loss function consists of the cycle loss function, the hash-code loss function, the triplet-constraint loss function, the cross-modal same-class-data loss function, and the classification loss function, with the following formula:
where θ2, θ3, θ4, and θ5 are tunable hyperparameters of the network, set in this embodiment to 5, 5, 0.001, and 20 respectively. Under unsupervised data training, the generator loss function consists of the cycle loss function, the hash-code loss function, the manifold-constraint loss function, and the cross-modal same-class-data loss function, with the following formula:
The hyperparameters take the values set above.
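Since the two composite formulas are images in the original filing, the weighting below is only a hedged sketch: the component losses are taken as precomputed scalars, and the pairing of each θ with its term is an assumption (the source fixes only the values θ2…θ5 = 5, 5, 0.001, 20).

```python
def generator_loss_supervised(l_cyc, l_hash, l_tri, l_inter, l_cls,
                              theta2=5.0, theta3=5.0, theta4=0.001, theta5=20.0):
    """Weighted sum of the five constraint losses (supervised case).
    Which theta multiplies which term is an assumption, not from the source."""
    return l_cyc + theta2 * l_hash + theta3 * l_tri + theta4 * l_inter + theta5 * l_cls

def generator_loss_unsupervised(l_cyc, l_hash, l_manifold, l_inter,
                                theta2=5.0, theta3=5.0, theta4=0.001):
    """Unsupervised case: the triplet and classification terms are replaced
    by the manifold constraint, per the description above."""
    return l_cyc + theta2 * l_hash + theta3 * l_manifold + theta4 * l_inter
```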
After the discriminator loss function and the generator loss function of the above five steps have been designed, the common minimax algorithm is used to iteratively minimize the network losses, so as to establish the semantic relations between the multi-modal data. In this embodiment the minimax algorithm uses stochastic-gradient-descent optimization, specifically the more stable RMSProp optimizer. Because the discriminator and the generator oppose each other, their updates run in opposite directions: in each round of iteration each side counters the other side's result from the previous round, and through this mutual opposition the two reach a dynamic equilibrium. The computation is as follows:
Since the discriminator trains faster in practice, the method is designed so that the network performs one discriminator iteration for every S generator iterations. In this embodiment the training hyperparameter S is set to 10, the learning rate μ of the network is set to 0.0001, and the batch size of each training step is set to 64. At the same time, the learned network weights are trimmed after every training step: generator weights exceeding cgen are clipped to cgen, and discriminator weights exceeding cdisc are clipped to cdisc, to prevent the learned weights from growing too large.
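The alternating schedule and weight trimming can be sketched as follows. The update functions are stand-ins for RMSProp steps on the respective losses, and the clipping thresholds shown are placeholder values, since the filing does not state cgen or cdisc numerically.

```python
import numpy as np

def clip_weights(weights, c):
    """Trim each weight array into [-c, c] after an update."""
    return [np.clip(w, -c, c) for w in weights]

def train(n_iters, gen_step, disc_step, gen_weights, disc_weights,
          S=10, c_gen=0.01, c_disc=0.01):
    """Alternating adversarial training: S generator iterations per
    discriminator iteration, clipping weights after every update."""
    for it in range(n_iters):
        gen_weights = clip_weights(gen_step(gen_weights), c_gen)
        if (it + 1) % S == 0:          # discriminator updated every S steps
            disc_weights = clip_weights(disc_step(disc_weights), c_disc)
    return gen_weights, disc_weights
```

In a real implementation `gen_step` / `disc_step` would each run one RMSProp update (learning rate 0.0001, batch size 64 in the embodiment) against the generator and discriminator losses respectively.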
Step 6: the trained neural network is applied to cross-modal data retrieval. The feature vectors obtained by the generators in the common subspace are compressed into hash codes, and retrieval is then performed using the Hamming distance between the hash codes of different data. The specific implementation is described as follows:
After the network training and learning described above, the generators have acquired the ability to extract the semantic-relation information between cross-modal data, here the image and text data of the embodiment. Bidirectional cross-modal retrieval can now be performed. First the trained weight parameters of the network are fixed; the image and text data to be retrieved, mtest and ttest, are passed through the trained generators Gm→t and Gt→m to obtain the feature vectors mcom and tcom in the common subspace, which are then converted into the hash codes mhash and thash for later use. When retrieving text with an image, the hash code of that image is taken and its Hamming distance to the hash codes of all texts is computed; the text represented by the nearest hash code is the image→text cross-modal retrieval result. When retrieving images with a text, the hash code of that text is taken and its Hamming distance to the hash codes of all images is computed; the image represented by the nearest hash code is the text→image cross-modal retrieval result.
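The hash-and-retrieve step can be sketched as follows. The threshold sgn(f − 0.5) follows the encoding formula of claim 5; the function names are illustrative.

```python
import numpy as np

def to_hash(feature):
    """sgn(f - 0.5): components greater than 0.5 map to +1, the rest to -1."""
    return np.where(np.asarray(feature) > 0.5, 1, -1)

def hamming(a, b):
    """Hamming distance between two ±1 hash codes of equal length."""
    return int(np.sum(a != b))

def retrieve(query_code, database_codes):
    """Index of the database hash code nearest to the query in Hamming distance."""
    dists = [hamming(query_code, c) for c in database_codes]
    return int(np.argmin(dists))
```

For image→text retrieval, `query_code` would be the hash of the image's common-subspace feature and `database_codes` the hashes of all text features; text→image retrieval swaps the roles.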
The above embodiments merely illustrate the design concept and features of the present invention; their purpose is to enable those skilled in the art to understand the content of the present invention and implement it accordingly, and the protection scope of the present invention is not limited to the above embodiments. Therefore, all equivalent variations or modifications made according to the principles and design ideas disclosed by the present invention fall within the protection scope of the present invention.

Claims (7)

1. A cross-modal retrieval method based on a cycle-generative adversarial network, characterized by comprising the following steps:
designing two cycle modules, wherein the two cycle modules share two generators with identical network structures, the output data of the generators' intermediate layer are hash-encoded, and the purpose of the generators is to generate, through training, cross-modal data that are as realistic as possible;
one cycle module realizes the process modality m → modality t → modality m through the two generators, and the other cycle module likewise realizes the process modality t → modality m → modality t through the two generators;
designing a discriminator in each cycle module, wherein the discriminator classifies the generated data and the original data of the same modality and engages in dynamic adversarial training with the generators, until the generators and discriminators finally reach a dynamic equilibrium under the given training conditions.
2. The cross-modal retrieval method based on a cycle-generative adversarial network according to claim 1, characterized in that:
for the multi-modal, multi-class nature of the data stream, a manifold constraint is used under unsupervised conditions to guarantee the similarity and distinctness of data across modalities and across classes; since class labels are given under supervised conditions, a triplet constraint is used to minimize the feature distance between same-class data of different modalities and to maximize the feature distance between data of different classes.
3. The cross-modal retrieval method based on a cycle-generative adversarial network according to claim 2, characterized in that:
the loss function of the discriminator is specifically:
and the cycle loss function, obtained by comparing the finally generated data of a modality with the original data of that modality, is:
where i denotes the data of the i-th computation, there are n training sample data in total, the discriminator continually iterates during training in the direction that reduces Ldisc, Dimg and Dtxt denote the two discriminators, (mori, tori) denote the original modality-m and original modality-t data, mcyc is the generated modality-m feature, and tcyc is the generated modality-t feature.
4. The cross-modal retrieval method based on a cycle-generative adversarial network according to claim 3, characterized in that:
the loss function of the generator is specifically:
where θ1 is a hyperparameter of the network, and || · ||2 denotes taking the L2 distance.
5. The cross-modal retrieval method based on a cycle-generative adversarial network according to claim 4, characterized in that:
letting the intermediate-layer feature vectors of the two generators be mcom and tcom, the hash codes are generated by the formulas:
mhash=sgn (mcom-0.5)
thash=sgn (tcom-0.5)
where sgn is a threshold function, meaning that for each floating-point value in the intermediate-layer feature vector, the corresponding hash-code bit is set to +1 when the value is greater than 0.5 and to -1 when the value is less than 0.5.
6. The cross-modal retrieval method based on a cycle-generative adversarial network according to claim 5, characterized in that: to quantify the approximation error between the feature vectors and the generated hash codes, the method designs a corresponding loss function as a constraint, specifically using the likelihood function of the hash codes conditioned on the feature vectors; taking the j-th bit of the hash code and the j-th element of the feature vector of the i-th sample as an example (the sample may be either an image or a text):
where the probability term is the sigmoid function of the feature-vector element:
a loss function is then designed from this likelihood function to evaluate the approximation error between the feature vectors and the generated hash codes:
where n is the total number of samples and dhash is the number of bits of the vector.
7. The cross-modal retrieval method based on a cycle-generative adversarial network according to claim 6, characterized in that: a classification constraint is applied to the generator's intermediate-layer feature vectors, and the classification loss function is designed as:
where ĉi is the predicted class of the i-th sample obtained by passing its feature vector through a small classification network, ci is the actual class label of that sample, and what the classification loss function actually computes is the L2 distance between the two; to constrain the similarity of same-class cross-modal data, the method associates each training image sample with its semantically similar text sample and designs a loss function that constrains the same-class cross-modal data; the loss function formula is as follows:
where the two terms are the common-subspace feature vectors of the image and the text generated by the generators Gt→m and Gm→t respectively, and the loss function computes the L2 distance between corresponding semantically similar cross-modal data; under supervised data training, since all data carry class labels, a triplet constraint is used to minimize the distance between cross-modal data vectors under the same semantic label, the designed triplet loss function being:
where m and t denote image and text data respectively, α and β denote two class labels, * marks generated data, and i denotes the data of the i-th computation; for unsupervised training, the method designs a manifold constraint to guarantee the similarity of semantically similar data within a modality and across modalities: after computing the kNN matrix of the data to be constrained, a similarity matrix is built, and the manifold constraint is then applied to the feature vectors in the common subspace; the manifold-constraint loss function is designed as follows:
where the subscripts neib and non denote neighbouring and non-neighbouring data respectively, and the other symbols have the same meanings as before; combining the above functions, the generator loss function under supervised data training is designed as:
and the generator loss function under unsupervised data training is designed as:
where θ2, θ3, θ4, and θ5 are weight hyperparameters of the network; the whole network is trained iteratively using the RMSProp stochastic-gradient-descent optimization algorithm, with the iteration formula:
since the discriminator's gradient descends faster in practice, the method is designed so that the network performs one discriminator iteration for every S generator iterations, and uses the hyperparameters cgen and cdisc to trim the network weights and prevent them from growing too large.
CN201811455802.6A 2018-11-30 2018-11-30 Cross-modal retrieval method based on cycle generation type countermeasure network Active CN109299342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811455802.6A CN109299342B (en) 2018-11-30 2018-11-30 Cross-modal retrieval method based on cycle generation type countermeasure network


Publications (2)

Publication Number Publication Date
CN109299342A true CN109299342A (en) 2019-02-01
CN109299342B CN109299342B (en) 2021-12-17

Family

ID=65142338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811455802.6A Active CN109299342B (en) 2018-11-30 2018-11-30 Cross-modal retrieval method based on cycle generation type countermeasure network

Country Status (1)

Country Link
CN (1) CN109299342B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473307A (en) * 2013-09-10 2013-12-25 浙江大学 Cross-media sparse Hash indexing method
US20140168077A1 (en) * 2012-12-14 2014-06-19 Barnesandnoble.Com Llc Multi-touch navigation mode
CN106547826A (en) * 2016-09-30 2017-03-29 西安电子科技大学 A kind of cross-module state search method, device and computer-readable medium
CN107871014A (en) * 2017-11-23 2018-04-03 清华大学 A kind of big data cross-module state search method and system based on depth integration Hash
CN108256627A (en) * 2017-12-29 2018-07-06 中国科学院自动化研究所 The mutual generating apparatus of audio-visual information and its training system that generation network is fought based on cycle
CN108510559A (en) * 2017-07-19 2018-09-07 哈尔滨工业大学深圳研究生院 It is a kind of based on have supervision various visual angles discretization multimedia binary-coding method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
OU Weihua et al.: "A Survey of Cross-Modal Retrieval Research", Journal of Guizhou Normal University (Natural Science Edition) *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019652A (en) * 2019-03-14 2019-07-16 九江学院 A kind of cross-module state Hash search method based on deep learning
CN110019652B (en) * 2019-03-14 2022-06-03 九江学院 Cross-modal Hash retrieval method based on deep learning
CN110032734B (en) * 2019-03-18 2023-02-28 百度在线网络技术(北京)有限公司 Training method and device for similar meaning word expansion and generation of confrontation network model
CN110032734A (en) * 2019-03-18 2019-07-19 百度在线网络技术(北京)有限公司 Near synonym extension and generation confrontation network model training method and device
CN110059157A (en) * 2019-03-18 2019-07-26 华南师范大学 A kind of picture and text cross-module state search method, system, device and storage medium
CN110222140A (en) * 2019-04-22 2019-09-10 中国科学院信息工程研究所 A kind of cross-module state search method based on confrontation study and asymmetric Hash
CN110222140B (en) * 2019-04-22 2021-07-13 中国科学院信息工程研究所 Cross-modal retrieval method based on counterstudy and asymmetric hash
CN111127385A (en) * 2019-06-06 2020-05-08 昆明理工大学 Medical information cross-modal Hash coding learning method based on generative countermeasure network
CN110309861A (en) * 2019-06-10 2019-10-08 浙江大学 A kind of multi-modal mankind's activity recognition methods based on generation confrontation network
CN110309861B (en) * 2019-06-10 2021-05-25 浙江大学 Multi-modal human activity recognition method based on generation of confrontation network
US11823429B2 (en) 2019-07-03 2023-11-21 Institute Of Automation, Chinese Academy Of Sciences Method, system and device for difference automatic calibration in cross modal target detection
WO2021000664A1 (en) * 2019-07-03 2021-01-07 中国科学院自动化研究所 Method, system, and device for automatic calibration of differences in cross-modal target detection
CN110443309A (en) * 2019-08-07 2019-11-12 浙江大学 A kind of electromyography signal gesture identification method of combination cross-module state association relation model
CN112487217A (en) * 2019-09-12 2021-03-12 腾讯科技(深圳)有限公司 Cross-modal retrieval method, device, equipment and computer-readable storage medium
CN110909181A (en) * 2019-09-30 2020-03-24 中国海洋大学 Cross-modal retrieval method and system for multi-type ocean data
CN110930469A (en) * 2019-10-25 2020-03-27 北京大学 Text image generation method and system based on transition space mapping
CN110930469B (en) * 2019-10-25 2021-11-16 北京大学 Text image generation method and system based on transition space mapping
CN110990595A (en) * 2019-12-04 2020-04-10 成都考拉悠然科技有限公司 Zero sample cross-mode retrieval method for cross-domain alignment embedding space
CN110990595B (en) * 2019-12-04 2023-05-05 成都考拉悠然科技有限公司 Cross-domain alignment embedded space zero sample cross-modal retrieval method
CN111104982B (en) * 2019-12-20 2021-09-24 电子科技大学 Label-independent cross-task confrontation sample generation method
CN111104982A (en) * 2019-12-20 2020-05-05 电子科技大学 Label-independent cross-task confrontation sample generation method
CN111353076B (en) * 2020-02-21 2023-10-10 华为云计算技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN111353076A (en) * 2020-02-21 2020-06-30 华为技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
WO2021189383A1 (en) * 2020-03-26 2021-09-30 深圳先进技术研究院 Training and generation methods for generating high-energy ct image model, device, and storage medium
CN111523663A (en) * 2020-04-22 2020-08-11 北京百度网讯科技有限公司 Model training method and device and electronic equipment
CN111523663B (en) * 2020-04-22 2023-06-23 北京百度网讯科技有限公司 Target neural network model training method and device and electronic equipment
CN111581405B (en) * 2020-04-26 2021-10-26 电子科技大学 Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
CN111581405A (en) * 2020-04-26 2020-08-25 电子科技大学 Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
CN111783980A (en) * 2020-06-28 2020-10-16 大连理工大学 Ranking learning method based on dual cooperation generation type countermeasure network
CN111881884A (en) * 2020-08-11 2020-11-03 中国科学院自动化研究所 Cross-modal transformation assistance-based face anti-counterfeiting detection method, system and device
CN112199462A (en) * 2020-09-30 2021-01-08 三维通信股份有限公司 Cross-modal data processing method and device, storage medium and electronic device
WO2022068195A1 (en) * 2020-09-30 2022-04-07 三维通信股份有限公司 Cross-modal data processing method and device, storage medium and electronic device
CN112364192A (en) * 2020-10-13 2021-02-12 中山大学 Zero sample Hash retrieval method based on ensemble learning
WO2022104540A1 (en) * 2020-11-17 2022-05-27 深圳大学 Cross-modal hash retrieval method, terminal device, and storage medium
CN113706646A (en) * 2021-06-30 2021-11-26 酷栈(宁波)创意科技有限公司 Data processing method for generating landscape painting
CN113204522A (en) * 2021-07-05 2021-08-03 中国海洋大学 Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
CN113779283A (en) * 2021-11-11 2021-12-10 南京码极客科技有限公司 Fine-grained cross-media retrieval method with deep supervision and feature fusion
CN116524420A (en) * 2023-07-03 2023-08-01 武汉大学 Key target detection method and system in traffic scene
CN116524420B (en) * 2023-07-03 2023-09-12 武汉大学 Key target detection method and system in traffic scene

Also Published As

Publication number Publication date
CN109299342B (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN109299342A (en) A kind of cross-module state search method based on circulation production confrontation network
Yu et al. Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering
Li et al. Factorizable net: an efficient subgraph-based framework for scene graph generation
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
Yan et al. Image classification by cross-media active learning with privileged information
Lai et al. Instance-aware hashing for multi-label image retrieval
Qu et al. Joint hierarchical category structure learning and large-scale image classification
CN109918528A (en) A kind of compact Hash code learning method based on semanteme protection
CN109558487A (en) Document Classification Method based on the more attention networks of hierarchy
CN111753189A (en) Common characterization learning method for few-sample cross-modal Hash retrieval
Wang et al. Facilitating image search with a scalable and compact semantic mapping
Zhang et al. Deep relation embedding for cross-modal retrieval
Liu et al. Improving cross-modal image-text retrieval with teacher-student learning
CN108427740B (en) Image emotion classification and retrieval algorithm based on depth metric learning
CN109271539A (en) A kind of image automatic annotation method and device based on deep learning
Islam et al. InceptB: a CNN based classification approach for recognizing traditional bengali games
CN110688502A (en) Image retrieval method and storage medium based on depth hash and quantization
Kim et al. Exploiting web images for video highlight detection with triplet deep ranking
Shen et al. Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description.
Feng et al. Learning to rank image tags with limited training examples
CN109960732A (en) A kind of discrete Hash cross-module state search method of depth and system based on robust supervision
Wang et al. A deep clustering via automatic feature embedded learning for human activity recognition
CN115827954A (en) Dynamically weighted cross-modal fusion network retrieval method, system and electronic equipment
Lu et al. A sustainable solution for IoT semantic interoperability: Dataspaces model via distributed approaches
CN113779283B (en) Fine-grained cross-media retrieval method with deep supervision and feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant