CN109213876A - Cross-modal retrieval method based on a generative adversarial network - Google Patents
Cross-modal retrieval method based on a generative adversarial network Download PDF Info
- Publication number
- CN109213876A (application number CN201810871910.5A / CN201810871910A)
- Authority
- CN
- China
- Prior art keywords
- data
- cross-modal
- discriminator
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a cross-modal retrieval method based on a generative adversarial network (GAN), in the technical field of multimedia data retrieval. The method comprises the following steps: step 1, extract features from the data of the input modality and the data of the target modality using feature extraction methods; step 2, build and train a GAN model, so that the GAN model can generate target-modality data from input-modality data; step 3, match the target-modality data generated by the GAN model against the corresponding-modality data obtained in step 1 by computing Euclidean distances; step 4, sort the computed Euclidean distances in ascending order to obtain the cross-modal retrieval result; the smaller the Euclidean distance, the higher the similarity between the higher-ranked result and the retrieval target. Compared with existing cross-modal retrieval models, the present invention makes fuller use of the learning and mapping ability of deep neural networks and improves cross-modal retrieval accuracy.
Description
Technical field
The present invention relates to the technical field of multimedia data retrieval, and more particularly to a cross-modal retrieval method based on a generative adversarial network.
Background technique
With the development of Internet technology, media data carrying the same semantics increasingly appear in multiple modalities at the same time, for example a news report and its accompanying photos, or a patient's diagnostic notes and medical images. People now need to retrieve one modality by means of another, rather than retrieving within a single modality. For example, a user may submit a photo to a retrieval system and obtain text information related to that photo; a patient may submit an X-ray image and obtain diagnostic text that matches the X-ray. This method of retrieving data of one modality using data of another modality is called cross-modal retrieval.
Traditional cross-modal retrieval methods, such as retrieving images by text, in principle rely on the textual annotations attached to the images, and are therefore essentially single-modality retrieval. However, the Internet develops rapidly, and large volumes of images and text emerge continuously, which makes annotating images time-consuming and laborious; moreover, manual annotation often fails to fully express the content of an image, which degrades the retrieval results. Deep learning has meanwhile achieved good results in processing both text and images, which opens a path for new cross-modal retrieval techniques built on deep learning.
Current deep-learning-based algorithms can be divided into two classes: 1) the first class abstracts each modality's data separately and then maps the abstracted results into a common representation space, thereby establishing associations between the modalities; however, such methods lack a connection between representation learning and association learning, so the common representation space contains both the information shared across modalities and the information specific to a single modality, which is unfavorable for cross-modal retrieval; 2) the second class fuses association learning and representation learning into a whole, but current methods of this class still suffer from unstable retrieval performance and limited retrieval precision.
Therefore, those skilled in the art are dedicated to developing a better cross-modal retrieval method that alleviates the above problems of unstable retrieval performance and limited retrieval precision.
Summary of the invention
Aiming at the problem of cross-modal retrieval, the present invention proposes a cross-modal retrieval method based on generative adversarial networks (Generative Adversarial Networks, GAN). It uses the strong encoding capability of deep neural networks to build a bridge that converts between data of different modalities, so that the deep model has better representational power and higher cross-modal retrieval accuracy.
To achieve the above object, the present invention provides a cross-modal retrieval method based on a generative adversarial network, characterized in that the method comprises the following steps:
Step 1: extract features from the data of the input modality and the data of the target modality using feature extraction methods;
Step 2: build and train a GAN model, so that the GAN model can generate target-modality data from input-modality data;
Step 3: match the target-modality data generated by the GAN model against the corresponding-modality data obtained in step 1 by computing Euclidean distances;
Step 4: sort the computed Euclidean distances in ascending order to obtain the cross-modal retrieval result; the smaller the Euclidean distance, the higher the similarity between the higher-ranked result and the retrieval target.
Further, the feature extraction in step 1 comprises the following steps:
Step 1.1: when text data is the input-modality data, image data is the target-modality data, and vice versa;
Step 1.2: extract features with methods suited to each modality: image features are extracted with VGG-16 or FCN; text-modality features are extracted with word2vec; for both image and text data, the extracted features are represented as vectors.
Further, building and training the GAN model in step 2 comprises the following steps:
Step 2.1: build the GAN network model based on the Tensorflow framework;
Step 2.2: train the GAN model with the training-set data to obtain the parameters of the GAN model.
Further, training the GAN model with the training-set data in step 2.2 comprises the following steps:
Step 2.2.1: initialize the discriminator parameters θ_d and the generator parameters θ_g;
Step 2.2.2: train the discriminator in the GAN: feed the target-modality data set into the discriminator for training, so that the discriminator learns the semantic information of the input data;
Step 2.2.3: train the generator in the GAN: take data of one modality as the input-modality data and feed it into the generator; the generator produces target-modality data from the input-modality data and sends it to the discriminator; the discriminator judges the generated target-modality data and feeds the result back to the generator;
Step 2.2.4: repeat step 2.2.2 and step 2.2.3 until the discriminator and the generator converge, obtaining the parameter set θ of the GAN model.
Further, training the discriminator in step 2.2.2 comprises the following steps:
Step 2.2.2.1: take m training samples of the input-modality data {x_1, x_2, ..., x_m} from the training set P_data(x);
Step 2.2.2.2: take m samples of the target-modality data {z_1, z_2, ..., z_m} from the training set P_data(x);
Step 2.2.2.3: obtain the generated data x̃_i = G(z_i), i = 1, ..., m;
Step 2.2.2.4: update the discriminator parameters θ_d by gradient ascent to maximize:
(1/m) Σ_{i=1}^{m} [log D(x_i) + log(1 − D(x̃_i))]
where P_data(x) is the training set represented as vectors, containing both input-modality and target-modality data, G denotes the generator distribution, and D denotes the discriminator output.
Further, training the generator in step 2.2.3 comprises the following steps:
Step 2.2.3.1: take m samples {z_1, z_2, ..., z_m}, different from those in step 2.2.2.2, from the pre-set training set P_data(x);
Step 2.2.3.2: update the generator parameters θ_g by gradient descent to minimize:
(1/m) Σ_{i=1}^{m} log(1 − D(G(z_i)))
Further, the Euclidean distance computation in step 3 is as follows: after the input-modality data passes through the GAN model, target-modality data is obtained; this generated data is compared against all data of the true corresponding modality by computing Euclidean distances, and the Euclidean distance reflects the degree of similarity between two vectors.
Further, in an n-dimensional space, the Euclidean distance d in step 3 is computed as:
d = sqrt( Σ_{i=1}^{n} (t_i − y_i)^2 )
where t_i and y_i are the components of the two n-dimensional vectors.
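As a quick sanity check, the distance formula above can be computed directly; `euclidean_distance` is an illustrative helper, not part of the patented method:

```python
import numpy as np

def euclidean_distance(t, y):
    """d = sqrt(sum_i (t_i - y_i)^2) for two n-dimensional vectors."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((t - y) ** 2)))

d = euclidean_distance([0.0, 3.0], [4.0, 0.0])  # classic 3-4-5 triangle: d = 5.0
```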
The invention has the following advantages: it makes full use of the encoding capability of GANs to build a mapping bridge between data of different modalities, and it avoids the more complicated network structures of existing deep cross-modal retrieval models; compared with existing cross-modal retrieval models, it makes fuller use of the learning and mapping ability of deep neural networks and improves cross-modal retrieval accuracy.
The concept, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the purpose, features, and effects of the present invention can be fully understood.
Detailed description of the invention
Fig. 1 is the flow chart of the technical solution of the present invention;
Fig. 2 is the structure diagram of the generative adversarial network model in the present invention;
Fig. 3 is the flow chart of training the generative adversarial network model in the present invention;
Fig. 4 is the flow chart of the embodiment of the present invention;
Fig. 5 is the comparison chart of cross-modal retrieval results.
Specific embodiment
Several preferred embodiments of the invention are introduced below with reference to the accompanying drawings to make the technical content clearer and easier to understand. The present invention can be embodied in many different forms, and its protection scope is not limited to the embodiments mentioned herein.
To solve the cross-modal retrieval problem, the present invention proposes a cross-modal retrieval method based on GAN. The flow chart of the technical solution is shown in Fig. 1 and comprises the following steps:
Step 1: extract features from the data of the input modality and the data of the target modality using feature extraction methods, obtaining vector representations of the input-modality and target-modality data;
Step 2: build and train a GAN model, so that the GAN model can generate data of one modality (the target modality) from data of another modality (the input modality);
Step 3: match the target-modality data generated by the GAN model against the corresponding-modality data obtained in step 1 by computing Euclidean distances;
Step 4: sort the computed Euclidean distances in ascending order to obtain the cross-modal retrieval result.
Fig. 2 is the neural network structure diagram of the GAN model in the present invention. The GAN model consists of a generator and a discriminator. The generator produces the feature-vector representation of the corresponding target-modality data from the feature-vector representation of the input-modality data; the discriminator ensures during training that the generator learns to produce correct target-modality feature vectors, guaranteeing a correct mapping between the input modality and the output modality. The GAN model is described in detail below.
GAN model:
The purpose of a generative adversarial network (Generative Adversarial Network, GAN) is to generate target data from input data. Unlike an ordinary encoder, a GAN consists of two networks, a generator and a discriminator; the two networks play a game against each other to reach the best generation effect. Through the discriminator's judgments, the generator is trained iteratively until the discriminator can no longer distinguish generated data from real data; at that point the generator has reached its fitted state and can serve as the bridge that converts input data into output data.
The working principle of the GAN model is illustrated with image generation:
For the distribution of real images P_data(x), x is a real image that can be represented as a vector, and the distribution of these vectors is P_data; the task is to generate images under this distribution.
Suppose the generator's distribution is P_G(x; θ), controlled by the parameters θ (if it is a Gaussian mixture model, θ is the mean and variance of each Gaussian component). Given real data {x_1, x_2, ..., x_m}, the likelihood of these data under the generator's model is
L = Π_{i=1}^{m} P_G(x_i; θ)
If we want the generator to produce real images with maximum probability, we need a θ* that maximizes L.
Making the generator produce real images with maximum probability means finding a θ that brings P_G closer to P_data. Here P_G(x; θ) is assumed to be a neural network. First a vector z is sampled at random, and the network G generates an image x = G(z); to compare whether z and x are similar, one takes a set of samples of z that obey a known distribution, passes them through the network to obtain another distribution P_G, and then compares it with the true distribution P_data.
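The maximum-likelihood idea above can be checked numerically. A unit-variance Gaussian with unknown mean stands in for P_G(x; θ); the θ that maximizes the likelihood of the sampled data should coincide with the sample mean, the classic closed-form MLE. The function names and the grid search are illustrative assumptions:

```python
import numpy as np

# Log-likelihood of data under a unit-variance Gaussian model P_G(x; theta),
# where theta is the mean -- a stand-in for the generator's density.
def log_likelihood(theta, xs):
    return float(np.sum(-0.5 * (xs - theta) ** 2 - 0.5 * np.log(2 * np.pi)))

rng = np.random.default_rng(0)
xs = rng.normal(3.0, 1.0, size=1000)          # "real" data centred near 3

# Scan candidate theta values; the maximizer should sit at the sample mean.
grid = np.linspace(0.0, 6.0, 601)
best = grid[np.argmax([log_likelihood(t, xs) for t in grid])]
```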
The objective function of the GAN is as follows:
min_G max_D V(G, D) = E_{x~P_data}[log D(x)] + E_{x~P_G}[log(1 − D(x))]
where G denotes the generator distribution, D denotes the discriminator output, P_data is the real data, and P_G is the generated data. With G fixed, max_D V(G, D) measures the difference between P_G and P_data; one then only needs to find the best G that minimizes max_D V, i.e. minimizes the difference between the two distributions.
First fix G and solve for the optimal D: for a given x, maximize
P_data(x) log D(x) + P_G(x) log(1 − D(x))
Writing f(D) = a log(D) + b log(1 − D) with a = P_data(x) and b = P_G(x), setting f′(D) = 0 gives the optimal discriminator
D*(x) = P_data(x) / (P_data(x) + P_G(x))
Substituting the optimal D* into V(G, D) yields
max_D V(G, D) = −2 log 2 + 2·JSD(P_data ‖ P_G)
where JSD, the Jensen–Shannon divergence, is the symmetrized and smoothed version of the KL divergence and measures the difference between the two distributions. This shows that for a fixed G, max_D V(G, D) expresses the difference between the two distributions, with minimum value −2 log 2 and maximum value 0. When P_G(x) = P_data(x), G is optimal.
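The optimal-discriminator formula and the identity max_D V = −2 log 2 + 2·JSD can be verified numerically on a pair of small discrete distributions; the two distributions chosen here are arbitrary illustrations:

```python
import numpy as np

# Two discrete distributions over the same support, standing in for
# P_data and P_G.
p_data = np.array([0.5, 0.3, 0.2])
p_g    = np.array([0.2, 0.3, 0.5])

# Optimal discriminator from the derivation: D*(x) = P_data / (P_data + P_G).
d_star = p_data / (p_data + p_g)

# Value of V(G, D*) = sum_x [P_data log D* + P_G log(1 - D*)].
v_star = float(np.sum(p_data * np.log(d_star) + p_g * np.log(1 - d_star)))

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# JSD(P || Q) = 0.5 KL(P || M) + 0.5 KL(Q || M), with M the mixture.
m = 0.5 * (p_data + p_g)
jsd = 0.5 * kl(p_data, m) + 0.5 * kl(p_g, m)

identity = -2 * np.log(2) + 2 * jsd          # should equal v_star
```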
Training of the GAN model:
The GAN network consists of a generator G and a discriminator D, and the two networks are trained alternately. Suppose the initial generator and discriminator are G_0 and D_0: first train D_0 to find max_D V(G_0, D), then fix D_0 and train G_0 using gradient descent; repeating in this manner yields D_1, G_1, D_2, G_2, ...
The training steps are shown in Fig. 3, and in detail are as follows:
1) Step 2.1: initialize the discriminator and generator parameters θ_d and θ_g;
2) Step 2.2: train the discriminator;
3) Step 2.3: train the generator;
4) Step 2.4: alternate step 2) and step 3) until the algorithm converges.
In step 2.2, training the discriminator comprises the following steps:
1) take m training samples {x_1, x_2, ..., x_m} from the data P_data(x);
2) take m samples {z_1, z_2, ..., z_m} from the pre-set random vector distribution P_prior(z);
3) obtain the generated data x̃_i = G(z_i), i = 1, ..., m;
4) update the discriminator parameters θ_d by gradient ascent to maximize:
(1/m) Σ_{i=1}^{m} [log D(x_i) + log(1 − D(x̃_i))]
In step 2.3, training the generator comprises the following steps:
1) take m samples {z_1, z_2, ..., z_m}, different from those in step 2), from the pre-set random vector distribution P_prior(z);
2) update the generator parameters θ_g by gradient descent to minimize:
(1/m) Σ_{i=1}^{m} log(1 − D(G(z_i)))
The parameter set θ of the GAN model can be obtained by the above method.
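The alternating updates above can be sketched in a toy 1-D setting with a linear generator and a logistic discriminator; every parameter value, learning rate, and distribution here is an illustrative assumption, not the patent's TensorFlow implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Toy 1-D GAN: generator G(z) = wg*z + bg, discriminator D(x) = sigmoid(wd*x + bd).
wg, bg = 1.0, 0.0          # theta_g
wd, bd = 0.1, 0.0          # theta_d

def d_objective(xs, zs):
    """(1/m) sum [log D(x_i) + log(1 - D(G(z_i)))] -- ascended by D."""
    fake = wg * zs + bg
    return np.mean(np.log(sigmoid(wd * xs + bd))
                   + np.log(1.0 - sigmoid(wd * fake + bd)))

def d_step(xs, zs, lr=1e-2):
    """One gradient-ascent step on the discriminator objective."""
    global wd, bd
    fake = wg * zs + bg
    dr, df = sigmoid(wd * xs + bd), sigmoid(wd * fake + bd)
    grad_wd = np.mean((1 - dr) * xs) - np.mean(df * fake)
    grad_bd = np.mean(1 - dr) - np.mean(df)
    wd += lr * grad_wd
    bd += lr * grad_bd

def g_step(zs, lr=1e-2):
    """One gradient-descent step on (1/m) sum log(1 - D(G(z_i)))."""
    global wg, bg
    fake = wg * zs + bg
    df = sigmoid(wd * fake + bd)
    grad_wg = np.mean(-df * wd * zs)
    grad_bg = np.mean(-df * wd)
    wg -= lr * grad_wg
    bg -= lr * grad_bg

xs = rng.normal(3.0, 1.0, 256)   # "real" target-modality features
zs = rng.normal(0.0, 1.0, 256)   # input noise / input-modality codes
before = d_objective(xs, zs)
d_step(xs, zs)                   # step 2.2: train discriminator
after = d_objective(xs, zs)
g_step(zs)                       # step 2.3: train generator
```

Because the discriminator objective is concave in (wd, bd), a sufficiently small ascent step is guaranteed to increase it on the same batch; in the patented method both networks are deep and trained with stochastic gradients in TensorFlow.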
For the purpose, technical solution, and advantages of the present invention to be understood more clearly, the present invention is further described in detail below with reference to the accompanying drawings and a specific embodiment.
Embodiment:
Suppose there are m pairs of text and image data with known correspondence, i.e. the training data set, and n text items and n image items with unknown correspondence, i.e. the test data set. Retrieval of text by image is taken as the example: the retrieval target is some image s in the test data set, and the search library contains k retrieval members from the test set, each of which is text-modality data. As shown in Fig. 4, the embodiment comprises the following 4 steps:
1) Step 401: extract features from the text and image data in the training set and test set; for text data, vector representations can be obtained with methods such as word2vec; for image data, features can be extracted with methods such as VGG16 or FCN to obtain vector representations. Through this step, the feature vectors of the m pairs of different-modality data with known correspondence are obtained, as well as n feature vectors each for the text and image modality data with unknown correspondence;
2) Step 402: train the GAN model with the feature vectors of the m pairs of different-modality data with known correspondence in the training set; after this step, the GAN can generate approximately semantically matching text-modality data from input image-modality data, or image-modality data from input text-modality data.
The specific steps for training the GAN model in this step are:
1) initialize the discriminator and generator parameters θ_d and θ_g;
2) train the discriminator;
3) train the generator;
4) alternate step 2) and step 3) until the algorithm converges.
In step 2), training the discriminator comprises the following steps:
1. take m text-modality training samples {x_1, x_2, ..., x_m} from the training set;
2. take m image-modality samples {z_1, z_2, ..., z_m} from the training set;
3. obtain the generated data x̃_i = G(z_i), i = 1, ..., m;
4. update the discriminator parameters θ_d by gradient ascent to maximize:
(1/m) Σ_{i=1}^{m} [log D(x_i) + log(1 − D(x̃_i))]
In step 3), training the generator comprises the following steps:
1. take m image-modality samples {z_1, z_2, ..., z_m}, different from those in step 2), from the training set;
2. update the generator parameters θ_g by gradient descent to minimize:
(1/m) Σ_{i=1}^{m} log(1 − D(G(z_i)))
The parameter set θ of the GAN model can be obtained by the above method.
3) Step 403: represent the image s to be retrieved by the vector of s obtained in step 401, and feed this vector representation into the trained GAN model; the GAN model generates the target-modality vector representation of s, i.e. a text vector representation s′ carrying the same semantics as s;
4) Step 404: compute the Euclidean distance between the generated vector s′ and the vector representation of each member of the target-modality data, i.e. each of the k text-modality retrieval members, and generate the result list in ascending order of Euclidean distance.
In this step, the Euclidean distance d_i is computed as:
d_i = sqrt( Σ_{j=1}^{n} (s′_j − k_{i,j})^2 )
where s′ is the vector representation of the target image s to be retrieved, k_i denotes the i-th of the k text-modality retrieval members, and d_i is the Euclidean distance between s′ and k_i. Computing the Euclidean distance between s′ and each k_i yields the d_i; sorting the d_i in ascending order together with the corresponding k_i yields the retrieval result list.
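Step 404 can be sketched as follows; `retrieve` and the toy vectors are illustrative assumptions rather than the patent's data:

```python
import numpy as np

def retrieve(s_prime, members):
    """Rank retrieval members by Euclidean distance to the generated vector s'.

    `members` is a (k, n) array: one row per text-modality retrieval member.
    Returns member indices sorted from nearest (most similar) to farthest.
    """
    d = np.sqrt(np.sum((members - s_prime) ** 2, axis=1))   # d_i for each k_i
    return list(np.argsort(d))

s_prime = np.array([1.0, 0.0])
members = np.array([[4.0, 4.0],    # far
                    [1.0, 0.1],    # nearest
                    [0.0, 1.0]])   # middle
ranking = retrieve(s_prime, members)   # → [1, 2, 0]
```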
As shown in Fig. 5, the results of cross-modal retrieval with the GAN model are compared against those of existing cross-modal retrieval methods; the evaluation metric is mAP (mean Average Precision), a common standard for measuring the quality of information retrieval results. For a given query, the first R results are returned, and the average precision is computed as:
AP = (1/M) Σ_{r=1}^{R} p(r)·rel(r)
where M is the number of relevant results retrieved for the query image s, p(r) is the precision at position r, and rel(r) is the relevance of the result at position r to s (1 for relevant, 0 for irrelevant); the relevance criterion is whether s and the result at position r share the same semantics; mAP is the mean of AP over all queries. In the present invention, the first 50 results of each retrieval are returned (R = 50).
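The AP formula can be computed directly from a ranked relevance list; `average_precision` and the toy relevance values are illustrative:

```python
import numpy as np

def average_precision(rel, R=None):
    """AP = (1/M) * sum_{r=1}^{R} p(r) * rel(r).

    `rel` holds the relevance (1/0) of each ranked result; p(r) is the
    precision over the first r results; M is the number of relevant
    results among the first R.
    """
    rel = np.asarray(rel[:R], dtype=float)
    M = rel.sum()
    if M == 0:
        return 0.0
    ranks = np.arange(1, len(rel) + 1)
    p = np.cumsum(rel) / ranks            # precision at each position r
    return float(np.sum(p * rel) / M)

# Relevant results at ranks 1 and 3: AP = (1/2) * (1/1 + 2/3) = 5/6.
ap = average_precision([1, 0, 1, 0])
```

mAP is then the mean of this AP over all queries in the test set.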
In Fig. 5, i-t denotes retrieving text by image, t-i denotes retrieving images by text, and AVG denotes the average mAP of the two directions. Fig. 5 shows that the retrieval precision of the method of the invention on both the Wikipedia and NUS-WIDE-10K data sets exceeds that of the other methods, demonstrating that the GAN-based cross-modal retrieval method learns the semantic relations between modalities more accurately and achieves higher cross-modal retrieval accuracy.
The preferred embodiments of the present invention have been described in detail above. It should be appreciated that a person of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative labor. Therefore, any technical solution that a technician in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation under the concept of this invention shall fall within the protection scope determined by the claims.
Claims (8)
1. A cross-modal retrieval method based on a generative adversarial network, characterized in that the method comprises the following steps:
Step 1: extract features from the data of the input modality and the data of the target modality using feature extraction methods;
Step 2: build and train a GAN model, so that the GAN model can generate target-modality data from input-modality data;
Step 3: match the target-modality data generated by the GAN model against the corresponding-modality data obtained in step 1, i.e. compute the Euclidean distances;
Step 4: sort the computed Euclidean distances in ascending order to obtain the cross-modal retrieval result; the smaller the Euclidean distance, the higher the similarity between the higher-ranked result and the retrieval target.
2. The cross-modal retrieval method based on a generative adversarial network according to claim 1, characterized in that the feature extraction in step 1 comprises the following steps:
Step 1.1: when text data is the input-modality data, image data is the target-modality data, and vice versa;
Step 1.2: extract features with methods suited to each modality: image features are extracted with VGG-16 or FCN; text-modality features are extracted with word2vec; for both image and text data, the extracted features are represented as vectors.
3. The cross-modal retrieval method based on a generative adversarial network according to claim 1, characterized in that building and training the GAN model in step 2 comprises the following steps:
Step 2.1: build the GAN network model based on the Tensorflow framework;
Step 2.2: train the GAN model with the training-set data to obtain the parameters of the GAN model.
4. The cross-modal retrieval method based on a generative adversarial network according to claim 3, characterized in that training the GAN model with the training-set data in step 2.2 comprises the following steps:
Step 2.2.1: initialize the discriminator parameters θ_d and the generator parameters θ_g;
Step 2.2.2: train the discriminator in the GAN: feed the target-modality data set into the discriminator for training, so that the discriminator learns the semantic information of the input data;
Step 2.2.3: train the generator in the GAN: take data of one modality as the input-modality data and feed it into the generator; the generator produces target-modality data from the input-modality data and sends it to the discriminator; the discriminator judges the generated target-modality data and feeds the result back to the generator;
Step 2.2.4: repeat step 2.2.2 and step 2.2.3 until the discriminator and the generator converge, obtaining the parameter set θ of the GAN model.
5. The cross-modal retrieval method based on a generative adversarial network according to claim 4, characterized in that training the discriminator in step 2.2.2 comprises the following steps:
Step 2.2.2.1: take m training samples of the input-modality data {x_1, x_2, ..., x_m} from the training set P_data(x);
Step 2.2.2.2: take m samples of the target-modality data {z_1, z_2, ..., z_m} from the training set P_data(x);
Step 2.2.2.3: obtain the generated data x̃_i = G(z_i), i = 1, ..., m;
Step 2.2.2.4: update the discriminator parameters θ_d by gradient ascent to maximize:
(1/m) Σ_{i=1}^{m} [log D(x_i) + log(1 − D(x̃_i))]
where P_data(x) is the training set represented as vectors, containing both input-modality and target-modality data, G denotes the generator distribution, and D denotes the discriminator output.
6. The cross-modal retrieval method based on a generative adversarial network according to claim 4, characterized in that training the generator in step 2.2.3 comprises the following steps:
Step 2.2.3.1: take m samples {z_1, z_2, ..., z_m}, different from those in step 2.2.2.2, from the pre-set training set P_data(x);
Step 2.2.3.2: update the generator parameters θ_g by gradient descent to minimize:
(1/m) Σ_{i=1}^{m} log(1 − D(G(z_i)))
7. The cross-modal retrieval method based on a generative adversarial network according to claim 1, characterized in that the Euclidean distance computation in step 3 is as follows: after the input-modality data passes through the GAN model, target-modality data is obtained; this generated data is compared against all data of the true corresponding modality by computing Euclidean distances, and the Euclidean distance reflects the degree of similarity between two vectors.
8. The cross-modal retrieval method based on a generative adversarial network according to claim 1, characterized in that, in an n-dimensional space, the Euclidean distance d in step 3 is computed as:
d = sqrt( Σ_{i=1}^{n} (t_i − y_i)^2 )
where t_i and y_i are the components of the two n-dimensional vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810871910.5A CN109213876B (en) | 2018-08-02 | 2018-08-02 | Cross-modal retrieval method based on generative adversarial network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810871910.5A CN109213876B (en) | 2018-08-02 | 2018-08-02 | Cross-modal retrieval method based on generative adversarial network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109213876A true CN109213876A (en) | 2019-01-15 |
CN109213876B CN109213876B (en) | 2022-12-02 |
Family
ID=64988109
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810871910.5A Active CN109213876B (en) | 2018-08-02 | 2018-08-02 | Cross-modal retrieval method based on generative adversarial network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109213876B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783644A (en) * | 2019-01-18 | 2019-05-21 | 福州大学 | Cross-domain sentiment classification system and method based on text representation learning |
CN110502743A (en) * | 2019-07-12 | 2019-11-26 | 北京邮电大学 | Cross-media retrieval method for social networks based on adversarial learning and semantic similarity |
CN110827232A (en) * | 2019-11-14 | 2020-02-21 | 四川大学 | Cross-modal MRI (magnetic resonance imaging) synthesis method based on morphological feature GAN (gain) |
CN110909181A (en) * | 2019-09-30 | 2020-03-24 | 中国海洋大学 | Cross-modal retrieval method and system for multi-type ocean data |
CN111179207A (en) * | 2019-12-05 | 2020-05-19 | 浙江工业大学 | Cross-modal medical image synthesis method based on parallel generation network |
CN111782921A (en) * | 2020-03-25 | 2020-10-16 | 北京沃东天骏信息技术有限公司 | Method and device for searching target |
CN111861949A (en) * | 2020-04-21 | 2020-10-30 | 北京联合大学 | Multi-exposure image fusion method and system based on generation countermeasure network |
CN111985243A (en) * | 2019-05-23 | 2020-11-24 | 中移(苏州)软件技术有限公司 | Emotion model training method, emotion analysis device and storage medium |
CN112487217A (en) * | 2019-09-12 | 2021-03-12 | 腾讯科技(深圳)有限公司 | Cross-modal retrieval method, device, equipment and computer-readable storage medium |
CN113420166A (en) * | 2021-03-26 | 2021-09-21 | 阿里巴巴新加坡控股有限公司 | Commodity mounting, retrieving, recommending and training processing method and device and electronic equipment |
CN113435206A (en) * | 2021-05-26 | 2021-09-24 | 卓尔智联(武汉)研究院有限公司 | Image-text retrieval method and device and electronic equipment |
CN117390210A (en) * | 2023-12-07 | 2024-01-12 | 山东建筑大学 | Building indoor positioning method, positioning system, storage medium and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102629275A (en) * | 2012-03-21 | 2012-08-08 | 复旦大学 | Face and name aligning method and system facing to cross media news retrieval |
CN102663447A (en) * | 2012-04-28 | 2012-09-12 | 中国科学院自动化研究所 | Cross-media searching method based on discrimination correlation analysis |
CN107832351A (en) * | 2017-10-21 | 2018-03-23 | 桂林电子科技大学 | Cross-modal retrieval method based on deep correlation network |
- 2018-08-02: CN CN201810871910.5A patent/CN109213876B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102629275A (en) * | 2012-03-21 | 2012-08-08 | 复旦大学 | Face and name aligning method and system facing to cross media news retrieval |
CN102663447A (en) * | 2012-04-28 | 2012-09-12 | 中国科学院自动化研究所 | Cross-media searching method based on discrimination correlation analysis |
CN107832351A (en) * | 2017-10-21 | 2018-03-23 | 桂林电子科技大学 | Cross-modal retrieval method based on deep correlation network |
Non-Patent Citations (2)
Title |
---|
JIUXIANG GU et al.: "Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition * |
YUXIN PENG et al.: "CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning", https://arxiv.org/abs/1710.05106v2 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109783644A (en) * | 2019-01-18 | 2019-05-21 | 福州大学 | Cross-domain sentiment classification system and method based on text representation learning |
CN111985243A (en) * | 2019-05-23 | 2020-11-24 | 中移(苏州)软件技术有限公司 | Emotion model training method, emotion analysis device and storage medium |
CN111985243B (en) * | 2019-05-23 | 2023-09-08 | 中移(苏州)软件技术有限公司 | Emotion model training method, emotion analysis device and storage medium |
CN110502743A (en) * | 2019-07-12 | 2019-11-26 | 北京邮电大学 | Cross-media retrieval method for social networks based on adversarial learning and semantic similarity |
CN112487217A (en) * | 2019-09-12 | 2021-03-12 | 腾讯科技(深圳)有限公司 | Cross-modal retrieval method, device, equipment and computer-readable storage medium |
CN110909181A (en) * | 2019-09-30 | 2020-03-24 | 中国海洋大学 | Cross-modal retrieval method and system for multi-type ocean data |
CN110827232B (en) * | 2019-11-14 | 2022-07-15 | 四川大学 | Cross-modal MRI (magnetic resonance imaging) synthesis method based on morphological feature GAN |
CN110827232A (en) * | 2019-11-14 | 2020-02-21 | 四川大学 | Cross-modal MRI synthesis method based on morphological feature GAN |
CN111179207B (en) * | 2019-12-05 | 2022-04-08 | 浙江工业大学 | Cross-modal medical image synthesis method based on parallel generative networks |
CN111179207A (en) * | 2019-12-05 | 2020-05-19 | 浙江工业大学 | Cross-modal medical image synthesis method based on parallel generative networks |
CN111782921A (en) * | 2020-03-25 | 2020-10-16 | 北京沃东天骏信息技术有限公司 | Target retrieval method and device |
CN111861949A (en) * | 2020-04-21 | 2020-10-30 | 北京联合大学 | Multi-exposure image fusion method and system based on generative adversarial network |
CN111861949B (en) * | 2020-04-21 | 2023-07-04 | 北京联合大学 | Multi-exposure image fusion method and system based on generative adversarial network |
CN113420166A (en) * | 2021-03-26 | 2021-09-21 | 阿里巴巴新加坡控股有限公司 | Product listing, retrieval, recommendation and training processing method and device, and electronic device |
CN113435206A (en) * | 2021-05-26 | 2021-09-24 | 卓尔智联(武汉)研究院有限公司 | Image-text retrieval method and device and electronic equipment |
CN113435206B (en) * | 2021-05-26 | 2023-08-01 | 卓尔智联(武汉)研究院有限公司 | Image-text retrieval method and device and electronic equipment |
CN117390210A (en) * | 2023-12-07 | 2024-01-12 | 山东建筑大学 | Building indoor positioning method, positioning system, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109213876B (en) | 2022-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109213876A (en) | Cross-modal retrieval method based on generative adversarial network | |
CN110147457B (en) | Image-text matching method, device, storage medium and equipment | |
Wu et al. | Cascaded fully convolutional networks for automatic prenatal ultrasound image segmentation | |
Alfarisy et al. | Deep learning based classification for paddy pests & diseases recognition | |
Huang et al. | Instance-aware image and sentence matching with selective multimodal lstm | |
CN108509463B (en) | Question answering method and device | |
CN106909924B (en) | Remote sensing image rapid retrieval method based on depth significance | |
CN112384948A (en) | Generative adversarial networks for image segmentation | |
Ahishali et al. | Advance warning methodologies for covid-19 using chest x-ray images | |
KR102265573B1 (en) | Method and system for reconstructing mathematics learning curriculum based on artificial intelligence | |
WO2018196718A1 (en) | Image disambiguation method and device, storage medium, and electronic device | |
EP3968337A1 (en) | Target object attribute prediction method based on machine learning and related device | |
Bu | Human motion gesture recognition algorithm in video based on convolutional neural features of training images | |
JP2018022496A (en) | Method and equipment for creating training data to be used for natural language processing device | |
CN110163130B (en) | Feature pre-alignment random forest classification system and method for gesture recognition | |
Zhou et al. | Learn fine-grained adaptive loss for multiple anatomical landmark detection in medical images | |
Zhou et al. | Model uncertainty guides visual object tracking | |
Heidler et al. | A deep active contour model for delineating glacier calving fronts | |
CN108804470B (en) | Image retrieval method and device | |
CN116934747A (en) | Fundus image segmentation model training method, fundus image segmentation model training equipment and glaucoma auxiliary diagnosis system | |
CN116416334A (en) | Scene graph generation method of embedded network based on prototype | |
Zachmann et al. | Random forests for tracking on ultrasonic images | |
Rochmawati et al. | Brain tumor classification using transfer learning | |
Wibowo | Performances of Chimpanzee Leader Election Optimization and K-Means in Multilevel Color Image Segmentation | |
Wu et al. | Loop Closure Detection for Visual SLAM Based on SuperPoint Network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||