CN109711442A - Unsupervised layer-by-layer generation confrontation feature representation learning method - Google Patents
- Publication number
- CN109711442A CN109711442A CN201811536668.2A CN201811536668A CN109711442A CN 109711442 A CN109711442 A CN 109711442A CN 201811536668 A CN201811536668 A CN 201811536668A CN 109711442 A CN109711442 A CN 109711442A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses an unsupervised layer-by-layer generative adversarial feature representation learning method. A stacked network (SGANs) is formed from two or more generative adversarial networks (GANs): the first GAN takes multidimensional random noise as input, and each remaining GAN takes the random noise together with the hidden feature of the previous branch as input. Each generator produces an image of the corresponding size; each discriminator is optimized according to its cross entropy, and the whole generator stack is optimized according to the expected values of the discriminators' intermediate-layer features and the similar statistical characteristics shared by the generated images. The optimized SGANs is then used to extract an abstract semantic feature representation vector, and, combined with a hashing method, the hash feature representation of the image is determined. The invention can produce high-level abstract semantic features and better learn the distribution of real images.
Description
Technical field
The present invention relates to machine learning techniques, and in particular to an unsupervised layer-by-layer generative adversarial feature representation learning method.
Background art
Content-based retrieval over massive image collections is widely used in e-commerce, medical diagnosis, and trademark and intellectual-property domains. It usually performs similarity matching on extracted image features to measure which images have similar content. Manually designed features such as color, texture, shape, and contour serve as the image representation and improve retrieval accuracy to a certain extent, but such hand-crafted features capture only low-level color and texture information and cannot express the high-level abstract semantics of an image. Feature representation learning can automatically extract features useful for classification, retrieval, or prediction tasks; deep neural networks, for example, can map an image from raw pixel-level information to abstract semantic concepts and automatically learn rich semantic features. However, massive image collections contain large amounts of unlabeled data, and supervised deep learning methods can only learn semantic features from the small labeled subset, which easily overfits and generalizes poorly. A generative adversarial network (GAN) is an unsupervised deep learning model that can learn the distribution of real images through the game between a discriminator network and a generator network, but a single generator struggles to produce images that are both diverse and rich in detail.
Summary of the invention
The purpose of the present invention is to provide an unsupervised layer-by-layer generative adversarial feature representation learning method.
The technical solution realizing the aim of the invention comprises the following steps:
Step 1, build the SGANs architecture: compose a stacked network (SGANs) from 2 or more generative adversarial networks (GANs), where the first GAN takes multidimensional random noise as input and each remaining GAN takes the random noise together with the hidden feature of the previous branch as input.
Step 2, optimize the network: each generator produces an image of the corresponding size; each discriminator is optimized according to its cross entropy, and the whole generator stack is optimized according to the expected values of the discriminators' intermediate-layer features and the similar statistical characteristics shared by the generated images.
Step 3, perform feature representation: use the optimized SGANs to extract an abstract semantic feature representation vector, and determine the hash feature representation of the image with a hashing method.
As a preferred implementation manner, in step 2 each discriminator is optimized separately with the objective

L_Di = E_{xi ~ p_data_i}[log D_i(xi)] + E_{xi^syn ~ p_G_i}[log(1 − D_i(xi^syn))]

where L_Di is the optimization target of the i-th discriminator, which is trained by maximizing L_Di; xi denotes a real image, p_data_i the real-image distribution, xi^syn a generated image, p_G_i the model distribution of generated images, E the expectation, and D_i(·) the probability that the input is a real image.
As a preferred implementation manner, in step 2, after expected-value matching optimization and structural-consistency optimization are applied to each generator, the whole generator stack is optimized. Specifically:
Expected-value matching optimization is applied to each generator with the objective

L_Gi^fm = || E_{xi ~ p_data_i} f_i(xi) − E_{xi^syn ~ p_G_i} f_i(xi^syn) ||²

where L_Gi^fm is the expected-value matching target of the i-th generator, f_i(x) is the activation of the i-th discriminator's intermediate layer, E denotes expectation, xi denotes a real image, p_data_i the real-image distribution, xi^syn a generated image, and p_G_i the model distribution of generated images.
Structural-consistency optimization is applied to each generator with the objective

L_Gi^sc = λ1 ||μ_i − μ_{i−1}||² + λ2 ||Σ_i − Σ_{i−1}||²

where L_Gi^sc is the structural-consistency target of the i-th generator, λ1 and λ2 are weighting coefficients, μ = Σ_k x_k / N is the mean of the pixels of a generated image, Σ = Σ_k (x_k − μ)(x_k − μ)^T / N is the variance of the pixels of a generated image, x_k = (R, G, B)^T is a pixel of a generated image, and N is the number of pixels in a generated image.
The whole generator stack is then optimized with the objective

L_G = Σ_{i=0}^{n−1} L_Gi

where L_G is the optimization target of the whole generator stack, minimized during training, L_Gi is the optimization target of the i-th generator, and n is the number of generative adversarial networks.
As a more preferred embodiment, λ1 and λ2 are 1 and 4 respectively.
As a preferred implementation manner, in step 3 the hashing methods used include locality-sensitive hashing, spectral hashing, iterative quantization hashing, and kernel hashing.
Compared with the prior art, the remarkable advantages of the present invention are: 1) the feature representation learning task for images with a complex hierarchical structure is divided into multiple subtasks, which gradually learn from low-level color and texture features up to high-level abstract semantic features; 2) since the images produced by the different generators should share similar statistical characteristics, a structural-consistency optimization target is added to the network optimization, which stabilizes training and better learns the distribution of real images.
Brief description of the drawings
Fig. 1 is the schematic network structure of a single GAN.
Fig. 2 is the network structure of the unsupervised SGANs of the present invention.
Fig. 3 is the residual network of the present invention.
Fig. 4 is the adversarial feature representation flow chart of the present invention.
Specific embodiments
The present invention is further illustrated below with reference to the drawings and specific embodiments.
The present invention designs the stacked network SGANs, which divides the feature representation learning task for images with a complex hierarchical structure into multiple subtasks, gradually learns from low-level color and texture features to high-level abstract semantic features, and applies the learned features to image retrieval. Because the images produced by different generators should share similar statistical characteristics, a structural-consistency optimization target is added to the network optimization, which stabilizes training and better learns the distribution of real images. The network structure, optimization targets, and implementation details are described below.
1. The SGANs network architecture
A GAN is used to capture the distribution of real image data. The structure of a single GAN is shown in Fig. 1. Each GAN consists of a generator network G and a discriminator network D. The generator G is a deconvolutional neural network: its input is a multidimensional random noise vector z ∈ N(0, 1), and it produces an image x = G(z) of the same size as the real images by up-sampling. The discriminator D is a convolutional neural network that judges whether an image is a real image or a generated one, making it equivalent to a binary classifier:

P(x = real) = D(x)

SGANs is composed of multiple GANs. Fig. 2 gives the structure of the unsupervised SGANs used here, a stacked network composed of 3 GANs. The real images are resized to different sizes, and each generator produces an image of the corresponding size. The neural network F_i obtains the hidden feature h_i by up-sampling, and G_i generates an image from h_i. The discriminator D_i judges the probability that the image of the i-th GAN is a real image. Generator 0 takes the multidimensional random noise z ∈ N(0, 1) as input and obtains its hidden feature through the up-sampling network F_0:

h_0 = F_0(z)

The hidden feature h_0 is passed through the network G_0 to generate an image:

x_0^syn = G_0(h_0)

For generator i (i > 0), the hidden feature is

h_i = F_i(h_{i−1}, z)

and an image is generated from h_i:

x_i^syn = G_i(h_i)
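The hidden-feature chain above can be illustrated with a minimal numpy sketch; the fully connected maps standing in for F_i and G_i, and all layer widths, are assumptions for illustration, not the patent's networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def F(h_prev, z, out_dim):
    """Toy stand-in for the up-sampling network F_i: mixes the previous
    hidden feature (if any) with the noise via a random linear map."""
    x = z if h_prev is None else np.concatenate([h_prev, z])
    W = rng.standard_normal((out_dim, x.size)) * 0.01
    return np.tanh(W @ x)

def G(h, side):
    """Toy stand-in for the image head G_i: maps a hidden feature
    to a side x side x 3 'image' with values in [-1, 1]."""
    W = rng.standard_normal((side * side * 3, h.size)) * 0.01
    return np.tanh(W @ h).reshape(side, side, 3)

z = rng.standard_normal(100)   # multidimensional random noise
h0 = F(None, z, 256)           # first GAN: noise only
x0 = G(h0, 32)                 # 32x32 image
h1 = F(h0, z, 256)             # later GANs: previous hidden feature + noise
x1 = G(h1, 64)                 # 64x64 image
h2 = F(h1, z, 256)
x2 = G(h2, 128)                # 128x128 image
print([x.shape for x in (x0, x1, x2)])  # [(32, 32, 3), (64, 64, 3), (128, 128, 3)]
```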
2. Optimization targets of the network
2.1 Optimization target of the discriminators
In a GAN, the discriminator judges whether an image comes from the real images or the generated ones, discriminating the source of the image as well as possible. In SGANs, the real images are resized to x_0, x_1, ..., x_{n−1}, where n is the number of GANs in the whole network. The optimization objective of a discriminator is similar to that of a binary classification problem; for the i-th GAN, the cross-entropy target of the discriminator is:

L_Di = E_{xi ~ p_data_i}[log D_i(xi)] + E_{xi^syn ~ p_G_i}[log(1 − D_i(xi^syn))]

During training, every real image xi comes from the real-image distribution p_data_i and every generated image xi^syn from the model distribution p_G_i. Each discriminator attends only to the images of its own scale, and the discriminators are optimized separately.
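The cross-entropy target can be written out directly; a small numpy sketch (batch averages standing in for the expectations) shows that a discriminator that separates real from generated scores strictly higher:

```python
import numpy as np

def discriminator_objective(d_real, d_fake, eps=1e-12):
    """L_Di: mean log D(x) over real images plus mean log(1 - D(x_syn))
    over generated images; the discriminator ascends this value."""
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# Confident, correct D (real scores near 1, fake near 0) vs. an uninformative D
good = discriminator_objective(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
bad = discriminator_objective(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
print(good > bad)  # True
```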
2.2 Optimization target of the generators
In the original GAN, the generator G tries to fool the discriminator into judging generated images as real, maximizing the probability that a generated image is classified as a real image. To avoid vanishing gradients early in training, the original objective of minimizing log(1 − D(x_syn)) is replaced by maximizing log D(x_syn), so the generator target of the original GAN is:

L_G = E_{x_syn ~ p_G}[log D(x_syn)]

To stabilize the training of GANs, several optimization methods for the generator have been proposed. Feature matching prevents the generator from over-training on the current discriminator and stabilizes training: it requires the generated images to match the statistical characteristics of the real images as far as possible, and the discriminator is used to specify which statistics need to be matched. In this implementation, the generator is trained to match the expected value of the discriminator's intermediate-layer features. Let f(x) be the activation of the discriminator's intermediate layer; then the generator objective is:

L_G^fm = || E_{x ~ p_data} f(x) − E_{x_syn ~ p_G} f(x_syn) ||²

In SGANs, for the i-th GAN, with f_i the activation of discriminator D_i's intermediate layer, the target is:

L_Gi^fm = || E_{xi ~ p_data_i} f_i(xi) − E_{xi^syn ~ p_G_i} f_i(xi^syn) ||²
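Feature matching reduces to a squared distance between batch means of intermediate-layer features; a numpy sketch (the feature matrices here are synthetic stand-ins for f_i outputs):

```python
import numpy as np

def feature_matching_loss(f_real, f_fake):
    """|| E f(x) - E f(x_syn) ||^2 with the expectations estimated
    as means over a batch of intermediate-layer feature vectors."""
    return float(np.sum((f_real.mean(axis=0) - f_fake.mean(axis=0)) ** 2))

rng = np.random.default_rng(1)
f_real = rng.standard_normal((64, 128))              # real-batch features
f_close = f_real + 0.01 * rng.standard_normal((64, 128))  # well-matched generator
f_far = f_real + 1.0                                 # systematically shifted generator
print(feature_matching_loss(f_real, f_close) < feature_matching_loss(f_real, f_far))
```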
The different GANs in SGANs generate images of different sizes, and these images should possess some similar statistical characteristics. By minimizing the difference in mean and variance of the pixels of each channel between the generated images, the images produced by the generators are kept structurally consistent; adding this structural-consistency target improves the quality of the generated images.

Let x_k = (R, G, B)^T denote a pixel of a generated image. The mean of the pixels of a generated image is

μ = Σ_k x_k / N

and the variance of the pixels of a generated image is

Σ = Σ_k (x_k − μ)(x_k − μ)^T / N

where N is the number of pixels in the image.

The structural-consistency target of the i-th GAN (i > 0) penalizes the difference between its color statistics and those of the previous scale:

L_Gi^sc = λ1 ||μ_i − μ_{i−1}||² + λ2 ||Σ_i − Σ_{i−1}||²

Here λ1 and λ2 take the values 1 and 4 respectively. The optimization target of generator 0 is L_G0 = L_G0^fm, so for the i-th GAN (i > 0) the target becomes:

L_Gi = L_Gi^fm + L_Gi^sc
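The per-image color statistics μ and Σ, and a consistency penalty between two scales, can be sketched in numpy; the exact pairing of scales and the squared-difference form are one plausible reading of the objective above, and the λ values follow the text:

```python
import numpy as np

def color_stats(img):
    """mu = sum_k x_k / N and Sigma = sum_k (x_k - mu)(x_k - mu)^T / N
    over the N RGB pixels x_k of one image."""
    x = img.reshape(-1, 3)
    mu = x.mean(axis=0)
    d = x - mu
    sigma = d.T @ d / x.shape[0]
    return mu, sigma

def structural_consistency(img_a, img_b, lam1=1.0, lam2=4.0):
    """Weighted squared difference of the color statistics of two
    generated scales (illustrative form of the lambda1/lambda2 target)."""
    mu_a, s_a = color_stats(img_a)
    mu_b, s_b = color_stats(img_b)
    return lam1 * np.sum((mu_a - mu_b) ** 2) + lam2 * np.sum((s_a - s_b) ** 2)

rng = np.random.default_rng(2)
small = rng.random((32, 32, 3))
# Nearest-neighbor upsampling preserves the pixel statistics exactly
big_similar = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
big_other = rng.random((64, 64, 3)) * 0.2  # different color distribution
print(structural_consistency(small, big_similar) < structural_consistency(small, big_other))
```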
Real images at multiple scales can be obtained from the same continuous image signal with different sampling rates. The generators of SGANs approach multiple different but related image distributions through joint training, so the optimization target of the generator stack of the whole stacked network is:

L_G = Σ_{i=0}^{n−1} L_Gi

where n is the number of GANs in the whole stacked network.
2.3 Network training process
Algorithm 1 is the training process of SGANs. First each generator produces an image of the corresponding size; then each discriminator is optimized in turn; finally the whole generator stack is optimized with the combined target of all generators.
Algorithm 1: SGANs training process
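The update order of Algorithm 1 can be sketched as a Python skeleton; the generators, discriminators, and update rules here are caller-supplied placeholders (assumptions for illustration), only the generate / per-discriminator / joint-generator ordering is from the text:

```python
import numpy as np

rng = np.random.default_rng(3)

def train_sgans(generators, discriminators, d_step, g_step, steps):
    """Per-iteration order described for Algorithm 1: generate every scale,
    optimize each discriminator separately, then optimize the whole
    generator stack jointly with the combined target."""
    for _ in range(steps):
        z = rng.standard_normal(100)
        images = [g(z) for g in generators]          # each scale's generated image
        for d, img in zip(discriminators, images):   # each D_i optimized alone
            d_step(d, img)
        g_step(generators, images)                   # joint generator update
    return generators

log = []
gens = [lambda z, i=i: ("x_syn", i) for i in range(3)]
discs = [object() for _ in range(3)]
train_sgans(gens, discs,
            d_step=lambda d, img: log.append(("D", img[1])),
            g_step=lambda gs, imgs: log.append(("G", len(imgs))),
            steps=1)
print(log)  # [('D', 0), ('D', 1), ('D', 2), ('G', 3)]
```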
3. Implementation of SGANs
Taking an unsupervised SGANs composed of 3 GANs as an example, the implementation details of the generator and discriminator in each GAN are described below, including the number of layers and the structure of each layer. There are 3 discriminator networks, D_NET32, D_NET64, and D_NET128, whose input image sizes are 32x32, 64x64, and 128x128 respectively. As introduced in the model, the discriminators judge whether images are real or generated. Because the input of the first GAN differs from that of the later ones, the generator is implemented as two variants, Init_G and Next_G. Init_G is used for the first generator: it takes a 100-dimensional random noise z ∈ N(0, 1) as input and produces the hidden feature h_0. The subsequent generators are implemented by Next_G, which takes the hidden feature of the previous branch and the random noise z as input and produces the hidden features h_1 and h_2 respectively. Image_Net generates an image from a hidden feature: Image_Net(h_0) generates a 32x32 image, Image_Net(h_1) a 64x64 image, and Image_Net(h_2) a 128x128 image.
3.1 Generator implementation
Table 1 is the upblock module, which mainly up-samples its input. UpSample turns an input tensor of shape (N, C, H, W) into (N, C, H*scale_factor, W*scale_factor), where N is the batch size, C the number of input channels, H the input image height, and W the input image width.

Conv denotes a convolutional layer. in_channels is the number of input feature maps and out_channels the number of output feature maps; the third parameter of Conv is the kernel size, the fourth the stride, and the last the padding.

BN (Batch Normalization) mitigates vanishing and exploding gradients in backpropagation. Its formula is:

y = γ (x − E[x]) / sqrt(Var[x] + ε) + β

where E[x] and Var[x] are the expectation and variance of the corresponding dimension, and γ and β are learnable parameters.

GLU (Gated Linear Units) is an activation function that adds nonlinearity to the network and can alleviate vanishing gradients to a certain extent. Unlike ReLU, GLU is a continuous, non-monotonic function. Suppose Y = [A, B] ∈ R^{2d}; then

GLU(Y) = A ⊗ σ(B)

where A, B ∈ R^d and ⊗ is element-wise multiplication. The output tensor of GLU([A, B]) is half the size of Y.
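GLU's halve-and-gate behavior is easy to verify with a numpy sketch:

```python
import numpy as np

def glu(y):
    """GLU([A, B]) = A * sigmoid(B): split the last axis in half and gate
    the first half by the sigmoid of the second, halving the tensor size."""
    a, b = np.split(y, 2, axis=-1)
    return a / (1.0 + np.exp(-b))  # a * sigmoid(b)

y = np.array([2.0, -1.0, 0.0, 0.0])  # A = [2, -1], B = [0, 0]
out = glu(y)
print(out, out.shape)  # sigmoid(0) = 0.5 -> [1.0, -0.5], half the input size
```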
Table 1: up-sampling module upblock
Fig. 3 shows a residual network. In a deep neural network, let F(x) := H(x) − x. If a combination of multiple nonlinear layers can approximate a complicated function, then the residual of the hidden layers can equally be assumed to approximate some complicated function, and the hidden layer can be expressed as H(x) = F(x) + x. The residual network solves the vanishing-gradient problem caused by an excessive number of layers while also improving network performance. Table 2 is the residual block used here.
Table 2: residual block resblock
| Operation | Specific implementation |
|---|---|
| conv | Conv(channel_num, channel_num*2, 3, 1, 1) |
| BN | Batch_norm2d(channel_num*2) |
| activation | GLU() |
| conv | Conv(channel_num, channel_num, 3, 1, 1) |
| BN | Batch_norm2d(channel_num) |
| add | add(x) |
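The H(x) = F(x) + x identity path can be sketched in numpy; `tiny_f` is an illustrative stand-in for the block's conv/BN/GLU stack:

```python
import numpy as np

def resblock(x, f):
    """Residual connection H(x) = F(x) + x: the stacked layers f only
    have to model the residual, which eases gradient flow when deep."""
    return f(x) + x

# With f near zero the block is near-identity, so stacking many blocks
# cannot degrade the signal the way plain deep stacks can
x = np.array([1.0, 2.0, 3.0])
tiny_f = lambda v: 1e-3 * v
out = resblock(x, tiny_f)
print(out)  # [1.001 2.002 3.003]
```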
Table 3 is the structure of the Init_G network. Init_G takes the random noise vector z as input and consists of a linear layer, a BN layer, an activation layer, and 3 up-sampling blocks. The output of the linear layer is activated with GLU to add nonlinearity to the network.
Table 3: Init_G network structure

| Operation | Specific implementation |
|---|---|
| linear | Linear(in_dim, img_size*32) |
| BN | BatchNorm1d(img_size*32) |
| activation | GLU() |
| upblock | upblock(img_size, img_size) |
| upblock | upblock(img_size, img_size/2) |
| upblock | upblock(img_size/2, img_size/4) |
Table 4 is the network structure of Next_G. Next_G consists of a convolutional layer, a residual block, and an up-sampling block. The random vector z is first replicated and then concatenated with the hidden feature h_{i−1} along the axis=1 dimension to obtain the tensor t, which is the input of Next_G. The Next_G network is primarily used to obtain the hidden feature h_i from h_{i−1}, in preparation for generating an image.
Table 4: Next_G network structure
| Operation | Specific implementation |
|---|---|
| conv | Conv(t, img_size, 3, 1, 1) |
| BN | Batch_norm2d(img_size*2) |
| activation | GLU() |
| resblock | resblock(img_size) |
| upblock | upblock(img_size, img_size/2) |
Table 5 is the Image_Net network, which consists of one convolutional layer followed by a tanh activation function. Image_Net takes the hidden feature h_i as input and generates an image of the corresponding size.
Table 5: Image_Net network structure
| Operation | Specific implementation |
|---|---|
| conv | Conv(in_channels, out_channels, 3, 1, 1) |
| activation | tanh() |
3.2 Discriminator implementation
Table 6 is the encode_image module. It takes image data as input and consists of multiple convolutional layers, activations, and BN layers.
Table 6: encode_image module
Table 7 is the network structure of D_NET32. D_NET32 consists of an encode_image module, a linear layer, and an activation layer.
Table 7: D_NET32 network structure
| Operation | Specific implementation |
|---|---|
| encode_image | encode_image(img_size) |
| linear | Linear(img_size×image_size, 1) |
| activation | Sigmoid(): judges whether the image is a real image |
Table 8 is the block_leakyRelu module, composed of a convolutional layer, BN, and an activation layer.
Table 8: block_leakyRelu module
| Operation | Specific implementation |
|---|---|
| conv | Conv(in_channels, out_channels, 3, 1, 1) |
| BN | BatchNorm2d(out_channels) |
| activation | LeakyRelu(0.2) |
Table 9 is the down_block module, composed, like block_leakyRelu, of a convolutional layer, BN, and an activation layer. down_block halves the spatial size of the tensor.
Table 9: down_block module
| Operation | Specific implementation |
|---|---|
| conv | Conv(in_channels, out_channels, 4, 2, 1) |
| BN | BatchNorm2d(out_channels) |
| activation | LeakyRelu(0.2) |
Table 10 is the network structure of D_NET64, which consists of an encode_image module and several BN and activation layers.
Table 10: D_NET64 network structure
Table 11 is the network structure of D_NET128, which consists of an encode_image module, 2 down_block modules, 2 block_leakyRelu modules, a linear layer, and an activation layer.
Table 11: D_NET128 network structure
| Operation | Specific implementation |
|---|---|
| encode_image | encode_image(img_size) |
| down_block | down_block(img_size×8, img_size×16) |
| down_block | down_block(img_size×16, img_size×32) |
| block_leakyRelu | block_leakyRelu(img_size×32, img_size×16) |
| block_leakyRelu | block_leakyRelu(img_size×16, img_size×8) |
| linear | Linear(img_size×image_size, hash_dim) |
| activation | Sigmoid(): judges whether the image is a real image |
4. Hierarchical feature representation and image retrieval
High-dimensional image feature representations make it difficult to build effective indexes and improve retrieval efficiency. With massive image data, computing distances between images also takes a great deal of time, and exact nearest-neighbor retrieval is impractical. Approximate nearest-neighbor methods realize similarity retrieval with much better time efficiency, which satisfies the needs of most applications.

Hashing, one of the approximate nearest-neighbor methods, is widely used in large-scale information retrieval tasks. A hashing method maps the high-dimensional feature representation of an image to a low-dimensional Hamming space, using a short binary code of 0s and 1s as the hash feature representation of the image. Because the hash codes of similar images have good aggregation properties, retrieval time efficiency is significantly improved. If an image is represented by a binary code of length L, the feature space is divided into 2^L subspaces: images with identical hash codes fall into the same subspace, and images at Hamming distance 1 lie in adjacent subspaces. Suppose the images belong to 1000 classes; if the hash features capture the aggregation properties between images well, a hash code of only 10 bits suffices to assign images of different classes accurately into different subspaces.
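The bucket arithmetic above can be checked with a tiny numpy sketch; the codes and labels are made up for illustration:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return int(np.sum(a != b))

# L-bit codes partition the feature space into 2**L buckets; equal codes
# share a bucket, and distance-1 codes sit in adjacent buckets
L = 10
codes = {
    "cat_1": np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1]),
    "cat_2": np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1]),  # same bucket
    "dog_1": np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0]),  # adjacent bucket
}
print(2 ** L, hamming(codes["cat_1"], codes["cat_2"]), hamming(codes["cat_1"], codes["dog_1"]))
```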
After the hash feature representation of an image is obtained, there are two basic ways to retrieve similar images with good accuracy and time efficiency. One is to build a hash table; unlike traditional hashing, the hash table here should increase the collision rate of the hash codes of similar images as much as possible, so that similar images fall into the same bucket. The other is to directly compute the distance between the query image and the images in the database. Since hash codes are compared with the Hamming distance, the computation is very fast, and because hash codes usually use few bits they can be loaded into memory to accelerate retrieval. In addition, an inverted index can reduce the number of candidate images that need to be compared, further speeding up retrieval.
As shown in Fig. 4, the high-level abstract feature representation of an image is extracted by SGANs, and a hashing technique then maps the extracted high-dimensional representation to a low-dimensional hash feature representation. Here the output of the layer preceding the discriminator's linear layer is used as the abstract semantic feature of the image. LSH, SH, ITQ, and KSH are common methods for mapping high-dimensional features to a low-dimensional Hamming space to obtain hash representations of data. Locality-sensitive hashing (LSH) uses a Gaussian random projection matrix to hash similar images into the same bucket with high probability, carrying the local structure of the original space into the Hamming space as far as possible. Spectral hashing (SH) performs spectral analysis on the high-dimensional data and, by relaxing the constraints, converts the problem into a dimensionality-reduction problem on a Laplacian eigenmap to obtain the hash feature representation. Iterative quantization (ITQ) is a PCA-based image hashing algorithm that rotates the principal components so that the variance in each direction stays as balanced as possible. Kernel hashing (KSH) learns hash functions with kernel methods to handle linearly inseparable data. These hashing methods have been applied to image retrieval, but the image representations they previously used were all extracted by hand; hand-crafted features reflect only low-level color and texture information, cannot express the abstract semantics of an image, and yield low retrieval accuracy.
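Random-projection LSH, the first of the methods above, can be sketched in a few lines of numpy; the feature vectors here are synthetic, and this is a minimal illustration rather than the exact variant used in the invention:

```python
import numpy as np

def lsh_hash(features, n_bits, rng):
    """Random-projection LSH: the sign of Gaussian projections gives an
    n_bits binary code; nearby vectors collide with high probability."""
    proj = rng.standard_normal((features.shape[1], n_bits))
    return (features @ proj > 0).astype(np.uint8)

rng = np.random.default_rng(4)
base = rng.standard_normal((1, 64))
near = base + 0.01 * rng.standard_normal((1, 64))  # almost identical feature
far = -base                                        # maximally different feature
codes = lsh_hash(np.vstack([base, near, far]), 16, rng)
d_near = int(np.sum(codes[0] != codes[1]))
d_far = int(np.sum(codes[0] != codes[2]))
print(d_near, d_far)  # near code is much closer than the far one
```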
Here the hashing of hand-crafted image features is compared with the hashing of the abstract semantic features extracted by SGANs, to study the effect of SGANs-based image feature representations in image retrieval.
Claims (5)
1. An unsupervised layer-by-layer generative adversarial feature representation learning method, characterized by comprising the following steps:
Step 1, build the SGANs architecture: compose a stacked network from 2 or more generative adversarial networks (GANs), wherein the first GAN takes multidimensional random noise as input and each remaining GAN takes the random noise together with the hidden feature of the previous branch as input;
Step 2, optimize the network: each generator produces an image of the corresponding size; each discriminator is optimized according to its cross entropy, and the whole generator stack is optimized according to the expected values of the discriminators' intermediate-layer features and the similar statistical characteristics between the generated images;
Step 3, perform feature representation: use the optimized SGANs to extract an abstract semantic feature representation vector, and determine the hash feature representation of the image with a hashing method.
2. The unsupervised layer-by-layer generative adversarial feature representation learning method according to claim 1, characterized in that in step 2 each discriminator is optimized separately with the objective

L_Di = E_{xi ~ p_data_i}[log D_i(xi)] + E_{xi^syn ~ p_G_i}[log(1 − D_i(xi^syn))]

where L_Di is the optimization target of each discriminator, maximized to optimize that discriminator; xi denotes a real image, p_data_i the real-image distribution, xi^syn a generated image, p_G_i the model distribution of generated images, E the expectation, and D_i(·) the probability that the input is a real image.
3. The unsupervised layer-by-layer generation confrontation feature representation learning method according to claim 1, characterized in that, in step 2, after performing expected-value matching optimization and structural consistency optimization on each generation network, the whole generation network is optimized, specifically:

Expected-value matching optimization is performed on each generation network, with the objective function:

$$L_{G_i}^{(1)} = \Big\| \mathbb{E}_{x_i \sim p_{data}}\big[f_i(x_i)\big] - \mathbb{E}_{\hat{x}_i \sim p_g}\big[f_i(\hat{x}_i)\big] \Big\|_2^2$$

where $L_{G_i}^{(1)}$ denotes the expected-value matching objective of each generation network, $f_i(x)$ is the activation function of the middle layer of the $i$-th discriminator network, $\mathbb{E}$ denotes the expectation, $x_i$ a real image, $p_{data}$ the real-image distribution, $\hat{x}_i$ a generated image, and $p_g$ the model distribution of generated images;

Structural consistency optimization is performed on each generation network, with the objective function:

$$L_{G_i}^{(2)} = \lambda_1 \big\| \mu - \mu_r \big\|_2^2 + \lambda_2 \big\| \Sigma - \Sigma_r \big\|_F^2$$

where $L_{G_i}^{(2)}$ denotes the structural consistency objective of each generation network, $\lambda_1$ and $\lambda_2$ are weighting coefficients, $\mu = \sum_k x_k / N$ denotes the mean of the pixels of the generated image, $\Sigma = \sum_k (x_k - \mu)(x_k - \mu)^T / N$ denotes the covariance of the pixels of the generated image, $\mu_r$ and $\Sigma_r$ the corresponding statistics of the real image, $x_k = (R, G, B)^T$ a pixel of the generated image, and $N$ the number of pixels of the generated image;

The whole generation network is optimized, with the objective function:

$$L_G = \sum_{i=1}^{n} L_{G_i}$$

where $L_G$ is the optimization objective of the entire generation network, through which the entire generation network is optimized, $L_{G_i}$ denotes the optimization objective of the $i$-th generation network, and $n$ denotes the number of generative adversarial networks.
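The two per-generator terms and their combination can be sketched as follows; the batch shapes, the λ values (those stated in claim 4), and the squared-distance form of the structural term (with the real image's pixel statistics as the comparison target) are assumptions, since the formula images are not reproduced in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_matching_loss(f_real, f_fake):
    # || E[f_i(x)] - E[f_i(x_hat)] ||^2 over batches of middle-layer activations.
    return float(np.sum((f_real.mean(axis=0) - f_fake.mean(axis=0)) ** 2))

def pixel_stats(pixels):
    # mu = sum_k x_k / N and Sigma = sum_k (x_k - mu)(x_k - mu)^T / N,
    # computed over the N RGB pixels x_k = (R, G, B)^T of one image.
    mu = pixels.mean(axis=0)
    centered = pixels - mu
    sigma = centered.T @ centered / pixels.shape[0]
    return mu, sigma

def structural_loss(real_pixels, fake_pixels, lam1=1.0, lam2=4.0):
    # Weighted distance between generated-image and real-image pixel statistics.
    mu_r, sig_r = pixel_stats(real_pixels)
    mu_f, sig_f = pixel_stats(fake_pixels)
    return float(lam1 * np.sum((mu_r - mu_f) ** 2) +
                 lam2 * np.sum((sig_r - sig_f) ** 2))

# Toy data: 32 middle-layer activation vectors of size 16 per batch,
# and one image represented as N = 1024 RGB pixels.
f_real = rng.normal(size=(32, 16))
f_fake = rng.normal(size=(32, 16))
real_px = rng.random(size=(1024, 3))
fake_px = rng.random(size=(1024, 3))

# Per-generator objective; the whole-network objective would sum this over i.
total = feature_matching_loss(f_real, f_fake) + structural_loss(real_px, fake_px)
```

Both terms vanish when the generated statistics equal the real ones, so driving them toward zero pushes the generator's outputs toward the real-image distribution.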
4. The unsupervised layer-by-layer generation confrontation feature representation learning method according to claim 3, characterized in that $\lambda_1$ and $\lambda_2$ are 1 and 4, respectively.
5. The unsupervised layer-by-layer generation confrontation feature representation learning method according to claim 1, characterized in that, in step 3, the hash representation methods used include locality-sensitive hashing, spectral hashing, iterative quantization hashing, and kernel hashing.
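Of the listed families, locality-sensitive hashing via random hyperplanes admits a very short sketch (the dimensions, bit count, and data below are illustrative assumptions, not parameters from the patent); nearby feature vectors receive binary codes with small Hamming distance:

```python
import numpy as np

rng = np.random.default_rng(1)

def lsh_hash(features, planes):
    # Sign of the projection onto each random hyperplane gives one code bit.
    return (features @ planes.T > 0).astype(np.uint8)

def hamming(a, b):
    # Number of differing bits between two binary codes.
    return int(np.sum(a != b))

dim, bits = 128, 32                       # feature size and code length (assumed)
planes = rng.normal(size=(bits, dim))     # random hyperplane normals

x = rng.normal(size=dim)                  # a feature vector from the SGANs
x_near = x + 0.01 * rng.normal(size=dim)  # a slightly perturbed feature
x_far = rng.normal(size=dim)              # an unrelated feature

h, h_near, h_far = (lsh_hash(v, planes) for v in (x, x_near, x_far))
```

The compact codes make nearest-neighbor search over image features a matter of cheap Hamming-distance comparisons.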
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811536668.2A CN109711442B (en) | 2018-12-15 | 2018-12-15 | Unsupervised layer-by-layer generation confrontation feature representation learning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811536668.2A CN109711442B (en) | 2018-12-15 | 2018-12-15 | Unsupervised layer-by-layer generation confrontation feature representation learning method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109711442A true CN109711442A (en) | 2019-05-03 |
CN109711442B CN109711442B (en) | 2021-04-16 |
Family
ID=66256561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811536668.2A Active CN109711442B (en) | 2018-12-15 | 2018-12-15 | Unsupervised layer-by-layer generation confrontation feature representation learning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109711442B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287357A (en) * | 2019-05-31 | 2019-09-27 | 浙江工业大学 | A kind of iamge description generation method generating confrontation network based on condition |
CN110414593A (en) * | 2019-07-24 | 2019-11-05 | 北京市商汤科技开发有限公司 | Image processing method and device, processor, electronic equipment and storage medium |
CN112967379A (en) * | 2021-03-03 | 2021-06-15 | 西北工业大学深圳研究院 | Three-dimensional medical image reconstruction method for generating confrontation network based on perception consistency |
CN111311702B (en) * | 2020-01-15 | 2023-04-28 | 浙江传媒学院 | Image generation and identification module and method based on BlockGAN |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105808723A (en) * | 2016-03-07 | 2016-07-27 | 南京邮电大学 | Image retrieval method based on image semantics and visual hashing |
CN107220600A (en) * | 2017-05-17 | 2017-09-29 | 清华大学深圳研究生院 | A kind of Picture Generation Method and generation confrontation network based on deep learning |
US20180201164A1 (en) * | 2017-01-16 | 2018-07-19 | Textron Inc. | Auxiliary seating system for light weight utility vehicle |
CN108665005A (en) * | 2018-05-16 | 2018-10-16 | 南京信息工程大学 | A method of it is improved based on CNN image recognition performances using DCGAN |
CN108694721A (en) * | 2017-04-04 | 2018-10-23 | 通用电气公司 | Light stream determines system |
- 2018-12-15: CN201811536668.2A filed in China; granted as CN109711442B (status: Active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105808723A (en) * | 2016-03-07 | 2016-07-27 | 南京邮电大学 | Image retrieval method based on image semantics and visual hashing |
US20180201164A1 (en) * | 2017-01-16 | 2018-07-19 | Textron Inc. | Auxiliary seating system for light weight utility vehicle |
CN108694721A (en) * | 2017-04-04 | 2018-10-23 | 通用电气公司 | Light stream determines system |
CN107220600A (en) * | 2017-05-17 | 2017-09-29 | 清华大学深圳研究生院 | A kind of Picture Generation Method and generation confrontation network based on deep learning |
CN108665005A (en) * | 2018-05-16 | 2018-10-16 | 南京信息工程大学 | A method of it is improved based on CNN image recognition performances using DCGAN |
Non-Patent Citations (1)
Title |
---|
ZHAOFAN QIU et al.: "Deep Semantic Hashing with Generative Adversarial Networks", arXiv *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287357A (en) * | 2019-05-31 | 2019-09-27 | 浙江工业大学 | A kind of iamge description generation method generating confrontation network based on condition |
CN110287357B (en) * | 2019-05-31 | 2021-05-18 | 浙江工业大学 | Image description generation method for generating countermeasure network based on condition |
CN110414593A (en) * | 2019-07-24 | 2019-11-05 | 北京市商汤科技开发有限公司 | Image processing method and device, processor, electronic equipment and storage medium |
CN111311702B (en) * | 2020-01-15 | 2023-04-28 | 浙江传媒学院 | Image generation and identification module and method based on BlockGAN |
CN112967379A (en) * | 2021-03-03 | 2021-06-15 | 西北工业大学深圳研究院 | Three-dimensional medical image reconstruction method for generating confrontation network based on perception consistency |
Also Published As
Publication number | Publication date |
---|---|
CN109711442B (en) | 2021-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670528B (en) | Data expansion method facing pedestrian re-identification task and based on paired sample random occlusion strategy | |
CN110348399B (en) | Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network | |
CN111382300B (en) | Multi-view three-dimensional model retrieval method and system based on pairing depth feature learning | |
CN113469236B (en) | Self-tag learning deep clustering image recognition system and method | |
CN109711442A (en) | Unsupervised layer-by-layer generation confrontation feature representation learning method | |
CN108108751B (en) | Scene recognition method based on convolution multi-feature and deep random forest | |
WO2019179403A1 (en) | Fraud transaction detection method based on sequence width depth learning | |
Sinha et al. | Class-wise difficulty-balanced loss for solving class-imbalance | |
CN110321805B (en) | Dynamic expression recognition method based on time sequence relation reasoning | |
Mia et al. | Computer vision based local fruit recognition | |
CN106778768A (en) | Image scene classification method based on multi-feature fusion | |
CN107358172B (en) | Human face feature point initialization method based on human face orientation classification | |
CN110211127A (en) | Image partition method based on bicoherence network | |
CN114821022A (en) | Credible target detection method integrating subjective logic and uncertainty distribution modeling | |
CN112818774A (en) | Living body detection method and device | |
CN114092819B (en) | Image classification method and device | |
Cai et al. | Rgb-d scene classification via multi-modal feature learning | |
CN109948662B (en) | Face image depth clustering method based on K-means and MMD | |
CN107766792A (en) | A kind of remote sensing images ship seakeeping method | |
CN113850182B (en) | DAMR _ DNet-based action recognition method | |
Song et al. | Srrm: Semantic region relation model for indoor scene recognition | |
CN117853810A (en) | Hyperspectral image classification method for generating countermeasure network based on semi-supervised residual error | |
CN117765258A (en) | Large-scale point cloud semantic segmentation method based on density self-adaption and attention mechanism | |
CN117422876A (en) | Image instance segmentation method based on enhanced disposable aggregation network | |
CN112364193A (en) | Image retrieval-oriented method for fusing multilayer characteristic deep neural network model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||