CN110175251A - Zero-sample sketch retrieval method based on a semantic adversarial network - Google Patents

Zero-sample sketch retrieval method based on a semantic adversarial network

Info

Publication number
CN110175251A
CN110175251A
Authority
CN
China
Prior art keywords
semantic
network
sketch
layer
image
Prior art date
Legal status
Pending
Application number
CN201910442481.4A
Other languages
Chinese (zh)
Inventor
杨延华 (Yanhua Yang)
许欣勋 (Xinxun Xu)
张啸哲 (Xiaozhe Zhang)
邓成 (Cheng Deng)
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN201910442481.4A
Publication of CN110175251A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval of still image data
    • G06F 16/53 Querying
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks


Abstract

The invention proposes a zero-sample sketch retrieval method based on a semantic adversarial network, mainly addressing two problems in the prior art: the large intra-class variance of sketches, and the difficulty of transferring visual knowledge from known classes to unseen classes under the zero-sample setting. The scheme is as follows: obtain a training sample set; construct a semantic adversarial network, and extract RGB image features with a VGG16 network; construct a generation network to produce discriminative RGB image features; input the sketch to be retrieved into the semantic adversarial network to generate semantic features; feed these semantic features, together with random Gaussian noise, into the generation network to generate RGB image features; and take as the retrieval result the 200 images in the image retrieval library most similar to the generated RGB image features. The invention reduces the intra-class variance of sketch image features, guarantees that the RGB image features generated from a sketch are discriminative within each category, and improves the retrieval performance of zero-sample sketch retrieval. It can be used for electronic commerce, medical diagnosis and remote sensing imaging.

Description

Zero-sample sketch retrieval method based on a semantic adversarial network
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a zero-sample sketch retrieval method which can be used for electronic commerce, medical diagnosis and remote sensing imaging.
Background
Sketch retrieval refers to retrieving real natural images from a hand-drawn sketch; zero-sample sketch retrieval retrieves real natural images from hand-drawn sketches of previously unseen classes. Existing sketch retrieval methods fall into two main types: methods based on hand-crafted features and methods based on deep learning. Hand-crafted approaches include the gradient-field HOG descriptor and the SIFT descriptor, while deep learning approaches include siamese networks, triplet networks, deep sketch hashing and the like. The main idea of these methods is to extract discriminative features from images or text and project them into a common feature space for similarity measurement. However, existing sketch retrieval methods assume that all categories are known at the training stage; since the training data cannot be guaranteed to cover every category found in real scenes, retrieval performance drops sharply on categories unseen at test time. Meanwhile, different people understand sketches differently, so drawn sketches exhibit large intra-class variance, which makes the sketch retrieval task even more challenging.
Zero-sample sketch retrieval aims to transfer visual knowledge from known categories to unseen categories under the zero-sample setting, thereby addressing the above problem of existing sketch retrieval. Researchers have proposed two methods for zero-sample sketch retrieval. An article entitled "Zero-Shot Sketch-Image Hashing" published by Yuming Shen, Li Liu et al. at the 2018 Conference on Computer Vision and Pattern Recognition discloses a zero-sample sketch hashing retrieval method that constructs an end-to-end three-network framework: the first two networks are binary encoders, while the third uses a Kronecker fusion layer and graph convolution to reduce the heterogeneity between sketches and images and to enhance the semantic relations between data; it also proposes a hash generation scheme that reconstructs semantic knowledge representations for zero-sample retrieval. An article entitled "A Zero-Shot Framework for Sketch Based Image Retrieval" published by Sasi Kiran Yelamarthi et al. at the 2018 European Conference on Computer Vision discloses a generative method based on deep conditional adversarial autoencoders and variational autoencoders, which takes a sketch feature vector as input, randomly fills in missing information with the generative model to produce natural image feature vectors, and then uses these generated feature vectors to retrieve images from a database. Although both methods achieve good performance, neither considers the problem of large intra-class variance in sketches, so the semantic information extracted by a pre-trained convolutional neural network has weak discriminative power, and it is difficult to accurately transfer the visual knowledge of sketches from known classes to unseen classes.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a zero-sample sketch retrieval method based on a semantic adversarial network, so that more discriminative semantic information is extracted by a pre-trained convolutional neural network and the visual knowledge of sketches is accurately transferred from known classes to unseen classes.
The technical idea of the invention is to learn the semantic features of sketches with a semantic adversarial module in an end-to-end semantic adversarial network, thereby reducing the intra-class variance of sketch features; and to add a triplet loss to the generation module, ensuring that the RGB image features generated within each category are discriminative, thereby solving the problem that visual knowledge is difficult to transfer from known categories to unseen categories under the zero-sample setting.
According to the above idea, the implementation of the invention includes the following steps:
(1) obtaining a training sample set:
(1a) respectively extracting 10,400 RGB images and the corresponding 10,400 binary sketch images from the Sketchy sketch retrieval database to form paired first training samples, and extracting 138,839 RGB images and 138,839 binary sketch images of corresponding categories from the TU-Berlin sketch retrieval database to form paired second training samples;
(1b) randomly horizontally flipping all 298,478 extracted pictures to obtain 298,478 randomly flipped images;
(1c) resizing the 298,478 randomly flipped images to 224 × 224, and forming from them a training sample set S1 containing the first training samples and a training sample set S2 containing the second training samples;
(2) constructing a semantic adversarial network:
a semantic adversarial network is set up consisting of a semantic feature extraction network, a word embedding network and a semantic discriminator, wherein:
the semantic feature extraction network is used for extracting semantic features of the binary sketch image;
the word embedding network is used for extracting word vectors of the category information corresponding to the binary sketch image;
the semantic discriminator is used for performing adversarial learning between the extracted sketch semantic features and the word vectors of the class labels; through an adversarial loss L_adv(θ_S, θ_D) the parameters of the semantic feature extraction network are updated, improving the discriminability of the output sketch semantic features;
the outputs of the semantic feature extraction network and the word embedding network are fed into the semantic discriminator for adversarial learning;
(3) performing feature extraction on the RGB images in the training sample set:
(3a) performing feature extraction on the RGB images in the first training sample set with a VGG16 network pre-trained on the ImageNet data set, taking the output of the second fully-connected layer as the final RGB image features of the first training sample set, the feature dimension being 4096;
(3b) performing feature extraction on the RGB images in the second training sample set with the same pre-trained VGG16 network, likewise taking the 4096-dimensional output of the second fully-connected layer as the final RGB image features of the second training sample set;
(4) constructing a generation network:
a generation network is constructed consisting, in order, of a concatenate layer, a conditional encoder, a triplet loss layer, a KL loss layer, a decoder, an image reconstruction loss layer, a regressor and a semantic reconstruction loss layer, wherein:
the concatenate layer concatenates, along the feature dimension, the sketch semantic feature vector x_sem output by the semantic feature extraction network and the RGB image feature vector x_img;
the conditional encoder takes the output of the concatenate layer as input and maps the data distribution P(x_img, x_sem) to the prior distribution P(z) of the hidden latent variable z, computing its mean vector μ and standard deviation vector σ;
the triplet loss layer keeps the generated features discriminative within each training class, taking the mean vector μ output by the conditional encoder as input and training the encoder with a triplet loss function; the loss function of this layer is L_tri;
the KL loss layer makes the variational distribution Q(z | x_img, x_sem) approximate the data distribution P(x_img, x_sem) and determines the variational lower bound through the loss function L_KL;
the decoder takes the concatenation of the learned 1024-dimensional latent vector z and the 300-dimensional semantic feature x_sem as input to generate the RGB image feature x̂_img corresponding to the sketch image; the decoding process is expressed as
x̂_img = Dec(z ⊕ x_sem), z = μ + σ ⊙ noise,
wherein noise denotes random Gaussian noise z ~ N(0, 1) of dimension 1024, ⊕ denotes concatenation, and Dec(·) denotes the decoder;
the image reconstruction loss layer ensures that the generated RGB image features are sufficiently discriminative, training the decoder with the reconstruction loss L_recon_img = ‖x̂_img − x_img‖₂², wherein x̂_img denotes the generated RGB image feature corresponding to the sketch image, x_img denotes the original RGB image feature, and ‖·‖₂ denotes the 2-norm;
the regressor takes the decoder output x̂_img as input and reconstructs the semantic feature x̂_sem; the regression process is expressed as x̂_sem = Reg(x̂_img), wherein Reg(·) denotes the regressor;
the semantic reconstruction loss layer ensures that the generated RGB image features x̂_img preserve category-level semantic information, with loss L_recon_sem = ‖x̂_sem − x_sem‖₂², wherein x̂_sem denotes the reconstructed sketch semantic feature and x_sem denotes the sketch semantic feature;
(5) training the semantic adversarial network and the generation network:
(5a) initializing the semantic adversarial network and the generation network, the randomly initialized network parameters following a Gaussian distribution with mean 0 and standard deviation 0.1, to obtain the initialized semantic adversarial network and generation network;
(5b) letting the loss function of the whole network be L = L_adv + L_tri + L_KL + L_recon_img + L_recon_sem;
(5c) taking the sketch images preprocessed in step 1 and their corresponding category information as input to the initialized semantic adversarial network, which outputs the semantic features of the sketches; taking these sketch semantic features together with the RGB image features extracted by the pre-trained VGG16 network as input to the generation network; and training the semantic adversarial network and the generation network by minimizing the loss function L, to obtain the trained semantic adversarial network and generation network;
(6) performing zero-sample sketch retrieval for the sketch image to be retrieved:
(6a) extracting a sketch image from a test sample set whose categories do not intersect those of the training sample set, and cropping it to obtain the sketch image to be retrieved;
(6b) inputting the sketch image to be retrieved into the trained semantic feature extraction network and outputting the semantic feature vector corresponding to the sketch image;
(6c) concatenating the semantic feature vector with random Gaussian noise, inputting the result into the trained generation network, and generating several RGB image features corresponding to the sketch through the encoder and decoder;
(6d) taking the mean of the several generated RGB image features as the final RGB image feature, and retrieving from the image retrieval library, by cosine distance, the 200 images most similar to this final feature.
Compared with the prior art, the invention has the following advantages:
in the training stage, leveraging category-level semantic information, the semantic adversarial module of the end-to-end semantic adversarial network learns the semantic features of sketches, reducing the intra-class variance of sketch image features; a triplet loss added to the generation network ensures that the RGB image features generated within each class are discriminative, solving the problem that visual knowledge is difficult to transfer from known classes to unseen classes under the zero-sample setting.
Compared with the prior art, the method simplifies the training process and effectively improves the retrieval performance of zero-sample sketch retrieval.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a graph comparing the search results of the present invention with the conventional method.
Detailed description of the preferred embodiments
The invention is described in further detail below with reference to the figures and specific implementations.
Referring to FIG. 1, the zero-sample sketch retrieval method based on the semantic adversarial network of the invention comprises the following implementation steps.
Step 1, obtain the training sample set.
1.1) Respectively extract 10,400 RGB images and the corresponding 10,400 binary sketch images from the Sketchy sketch retrieval database to form paired first training samples, and extract 138,839 RGB images and 138,839 binary sketch images of corresponding categories from the TU-Berlin sketch retrieval database to form paired second training samples.
1.2) Randomly horizontally flip all 298,478 extracted pictures to obtain 298,478 randomly flipped images.
1.3) Resize the 298,478 randomly flipped images to 224 × 224, and form from them a training sample set S1 containing the first training samples and a training sample set S2 containing the second training samples:
S1 = {(x_i^img, x_i^ske)}, i = 1, …, 10400, and S2 = {(x_j^img, x_j^ske)}, j = 1, …, 138839,
wherein x_i^img is the i-th RGB image in the Sketchy database, x_i^ske is the binary sketch image of its corresponding category, x_j^img is the j-th RGB image in the TU-Berlin database, and x_j^ske is the binary sketch image of its corresponding category.
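For illustration, step 1's preprocessing can be sketched in PyTorch/torchvision as below. Whether the random flip is shared within an (RGB, sketch) pair is not specified in the text, so the shared coin flip here is an assumption, and the paths and helper names are hypothetical:

```python
# Illustrative preprocessing sketch for step 1 (not the patent's reference code).
import random
from PIL import Image
from torchvision import transforms

resize = transforms.Compose([
    transforms.Resize((224, 224)),   # (1c): resize to 224 x 224
    transforms.ToTensor(),
])

def load_pair(rgb_path, sketch_path):
    """Load one (RGB image, binary sketch) pair, flip both with probability 0.5,
    then resize; sharing the flip decision keeps the pair aligned (assumption)."""
    rgb = Image.open(rgb_path).convert("RGB")
    sketch = Image.open(sketch_path).convert("RGB")
    if random.random() < 0.5:        # (1b): random horizontal flip
        rgb = rgb.transpose(Image.FLIP_LEFT_RIGHT)
        sketch = sketch.transpose(Image.FLIP_LEFT_RIGHT)
    return resize(rgb), resize(sketch)
```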
Step 2, construct the semantic adversarial network.
The semantic adversarial network consists of a semantic feature extraction network, a word embedding network and a semantic discriminator, wherein:
the semantic feature extraction network is used for extracting semantic features of the binary sketch image; specifically, it is a VGG16 network pre-trained on ImageNet that takes the fifth convolutional layer of the VGG16 network as the convolutional output and outputs a semantic feature vector of dimension 300 through a fully-connected layer;
the word embedding network is used for extracting word vectors of the category information corresponding to the binary sketch image, using a word vector model pre-trained on Wikipedia to obtain category-level word vector representations of dimension 300;
the semantic discriminator is used for performing adversarial learning between the extracted sketch semantic features and the word vectors of the class labels, updating the parameters of the semantic feature extraction network through an adversarial loss function and improving the discriminability of the output sketch semantic features; the loss function L_adv(θ_S, θ_D) is
L_adv(θ_S, θ_D) = E_y[log D(W(y))] + E_x_ske[log(1 − D(S(x_ske)))],
wherein E denotes expectation, y denotes the class semantic information of the sketch, W(·) denotes the word embedding network, D denotes the semantic discriminator, θ_D denotes the parameters of the semantic discriminator, S denotes the semantic feature extraction network, θ_S denotes the parameters of the semantic feature extraction network, and x_ske denotes the sketch image;
the outputs of the semantic feature extraction network and the word embedding network are fed into the semantic discriminator for adversarial learning.
Step 3, extract features of the RGB images in the training sample set.
3.1) Perform feature extraction on the RGB images in the first training sample set with a VGG16 network pre-trained on the ImageNet data set, taking the output of the second fully-connected layer as the final RGB image features of the first training sample set; the feature dimension is 4096.
3.2) Perform feature extraction on the RGB images in the second training sample set with the same pre-trained VGG16 network, likewise taking the 4096-dimensional output of the second fully-connected layer as the final RGB image features of the second training sample set.
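A sketch of the 4096-dimensional second-fully-connected-layer (fc7) feature extraction of step 3, assuming torchvision's VGG16 layer layout (the patent does not name a framework for this step):

```python
import torch
import torchvision.models as models

# Newer torchvision prefers models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).
vgg16 = models.vgg16(pretrained=True).eval()
# vgg16.classifier = [fc6, ReLU, Dropout, fc7, ReLU, Dropout, fc8]; truncating
# after index 3 keeps the output of the second fully-connected layer (fc7).
fc7_head = torch.nn.Sequential(*list(vgg16.classifier.children())[:4])

@torch.no_grad()
def extract_rgb_features(img_batch):                  # img_batch: (N, 3, 224, 224)
    conv = vgg16.avgpool(vgg16.features(img_batch)).flatten(1)   # (N, 25088)
    return fc7_head(conv)                                        # (N, 4096)
```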
Step 4, construct the generation network.
The generation network consists, in order, of a concatenate layer, a conditional encoder, a triplet loss layer, a KL loss layer, a decoder, an image reconstruction loss layer, a regressor and a semantic reconstruction loss layer, wherein:
the concatenate layer concatenates, along the feature dimension, the 300-dimensional sketch semantic feature vector x_sem output by the semantic feature extraction network and the 4096-dimensional RGB image feature vector x_img, outputting a feature vector of dimension 4396;
the conditional encoder consists, in order, of a first fully-connected layer with input dimension 4396 and output dimension 4096, a nonlinear activation layer ReLU, a one-dimensional batch normalization layer with momentum 0.99 and eps 1e-3, a Dropout layer with drop rate 0.3, a second fully-connected layer with output dimension 2048, a nonlinear activation layer ReLU, and a one-dimensional batch normalization layer with momentum 0.99 and eps 1e-3; it takes the output of the concatenate layer as input and maps the data distribution P(x_img, x_sem) to a mean vector μ and a standard deviation vector σ that form the prior distribution P(z) of the hidden latent variable z;
the triple loss layer is used for keeping the discriminability of the generated features in each training category, taking the mean vector output mu of the conditional encoder as input, and training the encoder by using a triple loss function LtriThe mathematical expression of (a) is:
wherein d (·,. cndot.) representsA distance function, E (-) represents a potential embedding function to obtain the mean vector μ,which represents a fixed sample of the specimen that is,which is indicative of a positive sample,represents a negative sample, δ represents an edge value;
the KL loss layer is used for enabling the data distribution P (x)img,xsem) And a variation distribution Q (z | x)img,xsem) Approximation, then by applying a loss function LKLDetermining a lower bound of variation, LKLThe mathematical expression of (a) is:
wherein,θEparameter, theta, representing a conditional encoder networkD'A parameter indicative of a network of decoders,indicating expectation, ximgAnd xsemRespectively representing RGB image characteristics and semantic characteristics, KL (. | ·) represents solving KL divergence, Q (z | x)img,xsem) Representing the output variation distribution of the encoder network,a posteriori distribution, P (x), representing the semantic feature xsemimg|z,xsem) Representing the distribution of output conditions of the decoder network;
the decoder consists of a first full-link layer with an input dimension of 1324 and an output dimension of 4096, a nonlinear active layer ReLU, a second full-link layer with an output dimension of 4096 and the nonlinear active layer ReLU in sequence, and is used for learning a potential vector z with a dimension of 1024 to obtain a semantic feature x with a dimension of 300semStitching as input to generate RGB image features corresponding to the sketch imagesGeneratingThe mathematical expression of (a) is:
wherein noise represents random Gaussian noise Z-N (0,1), the noise dimension is 1024,represents a decoder;
the image reconstruction loss layer is used for ensuring that the generated RGB image features have enough discriminability, and uses a reconstruction loss function:the decoder is trained, wherein,representing RGB image features, x, corresponding to the generated sketch imageimgRepresenting the characteristics of the original RGB image,represents a 2 norm;
the regressor consists of a first full-link layer with an input dimension of 4096 and an output dimension of 2048, a nonlinear active layer ReLU, a second full-link layer with an output dimension of 300 and a nonlinear active layer Tanh in sequence, and is used for outputting the output of the decoderAs input, semantic features are reconstructed by a regressorReconstructionThe mathematical expression of (a) is:
wherein noise represents random Gaussian noise Z-N (0,1), the noise dimension is 1024,representing a regressor;
the semantic reconstruction loss layer is used for ensuring that the generated RGB image features can store category-level semantic information, and the loss function of the layer is as follows:wherein,representing reconstructed sketch semantic features, xsemRepresenting semantic features of the sketch.
Step 5, train the semantic adversarial network and the generation network.
5.1) Initialize the semantic adversarial network and the generation network; the randomly initialized network parameters follow a Gaussian distribution with mean 0 and standard deviation 0.1, yielding the initialized semantic adversarial network and generation network.
5.2) Let the loss function of the whole network be L = L_adv + L_tri + L_KL + L_recon_img + L_recon_sem.
5.3) Take the sketch images preprocessed in step 1 and their corresponding category information as input to the initialized semantic adversarial network, which outputs the semantic features of the sketches; take these semantic features together with the RGB image features extracted by the pre-trained VGG16 network as input to the generation network; and train both networks by minimizing the loss function L. Training uses the Adam optimizer of the deep learning toolbox PyTorch with initial learning rate 0.0001, β1 = 0.5 and β2 = 0.99. For training stability, the semantic adversarial network and the generation network are trained alternately for the first 2 epochs and the whole network is trained end-to-end for the following 18 epochs, 20 epochs in total, yielding the trained semantic adversarial network and generation network.
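A condensed sketch of the optimization in step 5 under the stated hyper-parameters (Adam, learning rate 0.0001, β1 = 0.5, β2 = 0.99, 20 epochs). The data loader and the triplet_loss helper are hypothetical stand-ins, and the alternating schedule for the first 2 epochs is only noted in a comment:

```python
import torch

# Hypothetical handles: semantic_net, discriminator, encoder, decoder, regressor,
# train_loader (yielding sketch, word_vec, x_img) and the helpers sketched earlier.
gen_params = (list(semantic_net.parameters()) + list(encoder.parameters()) +
              list(decoder.parameters()) + list(regressor.parameters()))
opt_g = torch.optim.Adam(gen_params, lr=1e-4, betas=(0.5, 0.99))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.99))

for epoch in range(20):              # 20 epochs in total, per the description
    for sketch, word_vec, x_img in train_loader:
        # Update the semantic discriminator on detached sketch features.
        x_sem = semantic_net(sketch)
        d_loss, _ = adversarial_losses(discriminator, word_vec, x_sem.detach())
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Update feature extractor + generation network with the joint loss L.
        _, adv = adversarial_losses(discriminator, word_vec, x_sem)
        mu, logvar = encoder(x_img, x_sem)
        z = reparameterize(mu, logvar)
        x_img_hat = decoder(z, x_sem)
        x_sem_hat = regressor(x_img_hat)
        loss = (adv
                + triplet_loss(mu)   # hypothetical helper mining (a, p, n) triplets
                + kl_loss(mu, logvar)
                + (x_img_hat - x_img).pow(2).sum(1).mean()             # L_recon_img
                + (x_sem_hat - x_sem.detach()).pow(2).sum(1).mean())   # L_recon_sem
        opt_g.zero_grad(); loss.backward(); opt_g.step()
        # For the first 2 epochs the text trains the adversarial and generation
        # parts alternately rather than jointly; that schedule is omitted here.
```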
Step 6, perform zero-sample sketch retrieval for the sketch image to be retrieved.
6.1) Extract a sketch image from a test sample set whose categories do not intersect those of the training sample set, and crop it to obtain the sketch image to be retrieved.
6.2) Input the sketch image to be retrieved into the trained semantic feature extraction network and output the semantic feature vector corresponding to the sketch image.
6.3) Concatenate the semantic feature vector with random Gaussian noise and input the result into the trained generation network, generating several RGB image features corresponding to the sketch through the encoder and decoder.
6.4) Take the mean of the several generated RGB image features as the final RGB image feature, retrieve from the image retrieval library, by cosine distance, the 200 images most similar to this final feature, and finally compute the retrieval precision over the 200 retrieved images.
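Step 6 can be sketched as follows: generate several RGB features with independent noise draws, average them, and rank the gallery by cosine similarity. The number of noise draws is not given in the text, so n_samples = 10 is illustrative, and semantic_net / decoder refer to the modules sketched above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(sketch, gallery_feats, n_samples=10, top_k=200):
    """gallery_feats: (M, 4096) precomputed VGG16 fc7 features of the image library."""
    x_sem = semantic_net(sketch.unsqueeze(0))                     # (1, 300)
    # Generate several RGB features with independent 1024-d noise draws, then average.
    feats = torch.stack([decoder(torch.randn(1, 1024), x_sem)
                         for _ in range(n_samples)]).mean(dim=0)  # (1, 4096)
    sims = F.cosine_similarity(feats, gallery_feats)              # (M,)
    return sims.topk(top_k).indices                               # top-200 image indices
```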
The technical effects of the invention are further explained below in combination with simulation experiments.
1. Simulation conditions:
The simulation experiments were carried out on an NVIDIA GTX TITAN V GPU using the deep learning toolbox PyTorch.
2. Simulation content:
the invention carries out simulation experiments on two data sets Sketchy and TU-Berlin which are disclosed to be specially used for the performance test of a sketch retrieval method, wherein:
the data set Sketchy contains 75,479 sketch images and 73,002 RGB images from 125 different classes, and 104 training classes in 125 classes are used as known classes and 21 test classes are used as unseen classes according to the experimental setting of standard zero sample learning;
the data set TU-Berlin contains 20,000 sketch images and 204,070 RGB images from 250 different classes, 194 training classes out of the 250 classes as known classes and 56 test classes as unseen classes according to the experimental setup of standard zero sample learning.
The results of simulation comparison experiments on the two public data sets Sketchy and TU-Berlin, comparing the invention with existing deep-convolutional-neural-network-based sketch retrieval and zero-sample learning methods, are shown in Table 1.
Table 1 (comparison results; the table graphic is not reproduced in this text)
Precision@200 and mAP@200 in Table 1 are, respectively, the precision and the mean average precision computed over the top 200 retrieved images.
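For reference, Precision@200 and mAP@200 as defined above can be computed along these lines (a common formulation; the exact evaluation protocol behind Table 1 is not spelled out in the text):

```python
import numpy as np

def precision_at_k(relevant, k=200):
    """relevant: 0/1 array over the ranked gallery (1 = same class as the query)."""
    return float(np.asarray(relevant)[:k].mean())

def average_precision_at_k(relevant, k=200):
    """AP@k for one query; mAP@200 is this value averaged over all query sketches."""
    rel = np.asarray(relevant)[:k]
    if rel.sum() == 0:
        return 0.0
    precisions = np.cumsum(rel) / (np.arange(len(rel)) + 1.0)
    return float((precisions * rel).sum() / rel.sum())
```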
As can be seen from the simulation results in Table 1, the precision and mean average precision of the invention on both data sets are higher than those of the prior art.
The retrieval results of the invention and of CVAE, the best method in the prior art, are visualized on the Sketchy data set by comparing the top 10 of the top 200 retrieved images; the results are shown in FIG. 2.
As can be seen from FIG. 2, when retrieving sketch images of 3 different test categories, the top 10 images retrieved by the invention all belong to the same category as the query sketch, whereas the results of the CVAE method contain incorrectly retrieved images.

Claims (10)

1. A zero-sample sketch retrieval method based on a semantic adversarial network, characterized by comprising the following steps:
(1) obtaining a training sample set:
(1a) respectively extracting 10,400 RGB images and the corresponding 10,400 binary sketch images from the Sketchy sketch retrieval database to form paired first training samples, and extracting 138,839 RGB images and 138,839 binary sketch images of corresponding categories from the TU-Berlin sketch retrieval database to form paired second training samples;
(1b) randomly horizontally flipping all 298,478 extracted pictures to obtain 298,478 randomly flipped images;
(1c) resizing the 298,478 randomly flipped images to 224 × 224, and forming from them a training sample set S1 containing the first training samples and a training sample set S2 containing the second training samples;
(2) constructing a semantic adversarial network:
a semantic adversarial network is set up consisting of a semantic feature extraction network, a word embedding network and a semantic discriminator, wherein:
the semantic feature extraction network is used for extracting semantic features of the binary sketch image;
the word embedding network is used for extracting word vectors of the category information corresponding to the binary sketch image;
the semantic discriminator is used for performing adversarial learning between the extracted sketch semantic features and the word vectors of the class labels; through an adversarial loss L_adv(θ_S, θ_D) the parameters of the semantic feature extraction network are updated, improving the discriminability of the output sketch semantic features;
the outputs of the semantic feature extraction network and the word embedding network are fed into the semantic discriminator for adversarial learning;
(3) performing feature extraction on the RGB images in the training sample set:
(3a) performing feature extraction on the RGB images in the first training sample set with a VGG16 network pre-trained on the ImageNet data set, taking the output of the second fully-connected layer as the final RGB image features of the first training sample set, the feature dimension being 4096;
(3b) performing feature extraction on the RGB images in the second training sample set with the same pre-trained VGG16 network, likewise taking the 4096-dimensional output of the second fully-connected layer as the final RGB image features of the second training sample set;
(4) constructing a generation network:
a generation network is constructed consisting, in order, of a concatenate layer, a conditional encoder, a triplet loss layer, a KL loss layer, a decoder, an image reconstruction loss layer, a regressor and a semantic reconstruction loss layer, wherein:
the concatenate layer concatenates, along the feature dimension, the sketch semantic feature vector x_sem output by the semantic feature extraction network and the RGB image feature vector x_img;
the conditional encoder takes the output of the concatenate layer as input and maps the data distribution P(x_img, x_sem) to a mean vector μ and a standard deviation vector σ that form the prior distribution P(z) of the hidden latent variable z;
the triplet loss layer keeps the generated features discriminative within each training class, taking the mean vector μ output by the conditional encoder as input and training the encoder with a triplet loss function; the loss function of this layer is L_tri;
the KL loss layer makes the variational distribution Q(z | x_img, x_sem) approximate the data distribution P(x_img, x_sem) and determines the variational lower bound through the loss function L_KL;
the decoder takes the concatenation of the learned 1024-dimensional latent vector z and the 300-dimensional semantic feature x_sem as input to generate the RGB image feature x̂_img corresponding to the sketch image; the decoding process is expressed as
x̂_img = Dec(z ⊕ x_sem), z = μ + σ ⊙ noise,
wherein noise denotes random Gaussian noise z ~ N(0, 1) of dimension 1024, ⊕ denotes concatenation, and Dec(·) denotes the decoder;
the image reconstruction loss layer ensures that the generated RGB image features are sufficiently discriminative, training the decoder with the reconstruction loss L_recon_img = ‖x̂_img − x_img‖₂², wherein x̂_img denotes the generated RGB image feature corresponding to the sketch image, x_img denotes the original RGB image feature, and ‖·‖₂ denotes the 2-norm;
the regressor takes the decoder output x̂_img as input and reconstructs the semantic feature x̂_sem; the regression process is expressed as x̂_sem = Reg(x̂_img), wherein Reg(·) denotes the regressor;
the semantic reconstruction loss layer ensures that the generated RGB image features x̂_img preserve category-level semantic information, with loss L_recon_sem = ‖x̂_sem − x_sem‖₂², wherein x̂_sem denotes the reconstructed sketch semantic feature and x_sem denotes the sketch semantic feature;
(5) training the semantic adversarial network and the generation network:
(5a) initializing the semantic adversarial network and the generation network, the randomly initialized network parameters following a Gaussian distribution with mean 0 and standard deviation 0.1, to obtain the initialized semantic adversarial network and generation network;
(5b) letting the loss function of the whole network be L = L_adv + L_tri + L_KL + L_recon_img + L_recon_sem;
(5c) taking the sketch images preprocessed in step 1 and their corresponding category information as input to the initialized semantic adversarial network, which outputs the semantic features of the sketches; taking these sketch semantic features together with the RGB image features extracted by the pre-trained VGG16 network as input to the generation network; and training the semantic adversarial network and the generation network by minimizing the loss function L, to obtain the trained semantic adversarial network and generation network;
(6) performing zero-sample sketch retrieval for the sketch image to be retrieved:
(6a) extracting a sketch image from a test sample set whose categories do not intersect those of the training sample set, and cropping it to obtain the sketch image to be retrieved;
(6b) inputting the sketch image to be retrieved into the trained semantic feature extraction network and outputting the semantic feature vector corresponding to the sketch image;
(6c) concatenating the semantic feature vector with random Gaussian noise, inputting the result into the trained generation network, and generating several RGB image features corresponding to the sketch through the encoder and decoder;
(6d) taking the mean of the several generated RGB image features as the final RGB image feature, and retrieving from the image retrieval library, by cosine distance, the 200 images most similar to this final feature.
2. The method of claim 1, wherein the training sample set S1 of the first training samples and the training sample set S2 of the second training samples in (1c) are respectively:
S1 = {(x_i^img, x_i^ske)}, i = 1, …, 10400, and S2 = {(x_j^img, x_j^ske)}, j = 1, …, 138839,
wherein x_i^img is the i-th RGB image in the Sketchy database, x_i^ske is the binary sketch image of its corresponding category, x_j^img is the j-th RGB image in the TU-Berlin database, and x_j^ske is the binary sketch image of its corresponding category.
3. The method of claim 1, wherein the semantic feature extraction network in (2) adopts a VGG16 network pre-trained on the ImageNet data set, takes the fifth convolutional layer of the VGG16 network as the convolutional output, and outputs a semantic feature vector of dimension 300 through a fully-connected layer.
4. The method of claim 1, wherein the word embedding network in (2) employs a word vector model pre-trained on Wikipedia to obtain category-level word vector representations of dimension 300.
5. The method according to claim 1, wherein the semantic discriminator in (2) consists of a first fully-connected layer with input dimension 300, a first sigmoid nonlinear activation layer with output dimension 200, and a second fully-connected layer with output dimension 1, and the output of the semantic discriminator updates the parameters of the semantic feature extraction network through the adversarial loss
L_adv(θ_S, θ_D) = E_y[log D(W(y))] + E_x_ske[log(1 − D(S(x_ske)))],
wherein E denotes expectation, y denotes the class semantic information of the sketch, W(·) denotes the word embedding network, D denotes the semantic discriminator, θ_D denotes the parameters of the semantic discriminator, S denotes the semantic feature extraction network, θ_S denotes the parameters of the semantic feature extraction network, and x_ske denotes the sketch image.
6. The method according to claim 1, wherein the conditional encoder in (4) is composed of a first fully-connected layer with input dimension of 4396 and output dimension of 4096, a nonlinear active layer ReLU, a one-dimensional batch normalization layer with momentum parameters of 0.99 and eps of 1e-3, a Dropout layer with deactivation rate of 0.3, a second fully-connected layer with output dimension of 2048, a nonlinear active layer ReLU, and a one-dimensional batch normalization layer with momentum parameters of 0.99 and eps of 1e-3, in that order.
7. The method of claim 1, wherein the triplet loss layer in (4) takes the mean vector μ output by the conditional encoder as input and trains the encoder with the triplet loss function
L_tri = max(0, δ + d(E(x^a), E(x^p)) − d(E(x^a), E(x^n))),
wherein d(·,·) denotes the l2 distance function, E(·) denotes the latent embedding function that yields the mean vector μ, x^a denotes the anchor (fixed) sample, x^p denotes a positive sample, x^n denotes a negative sample, and δ denotes the margin.
8. The method according to claim 1, wherein the KL loss layer in (4) determines the variational lower bound through the loss function
L_KL(θ_E, θ_D') = KL(Q(z | x_img, x_sem) ‖ P(z | x_sem)) − E_Q[log P(x_img | z, x_sem)],
wherein θ_E denotes the parameters of the conditional encoder network, θ_D' denotes the parameters of the decoder network, E denotes expectation, x_img and x_sem denote the RGB image feature and the semantic feature respectively, KL(· ‖ ·) denotes the KL divergence, Q(z | x_img, x_sem) denotes the variational distribution output by the encoder network, P(z | x_sem) denotes the posterior distribution given the semantic feature x_sem, and P(x_img | z, x_sem) denotes the conditional output distribution of the decoder network.
9. The method of claim 1, wherein the decoder in (4) consists of a first fully-connected layer with an input dimension of 1324 and an output dimension of 4096, a non-linear active layer ReLU, a second fully-connected layer with an output dimension of 4096, and a non-linear active layer ReLU, in that order.
10. The method of claim 1, wherein the regressor in (4) consists of a first fully-connected layer with an input dimension of 4096 and an output dimension of 2048, a nonlinear active layer ReLU, a second fully-connected layer with an output dimension of 300, and a nonlinear active layer Tanh in that order.
CN201910442481.4A 2019-05-25 2019-05-25 Zero-sample sketch retrieval method based on a semantic adversarial network Pending CN110175251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910442481.4A CN110175251A (en) 2019-05-25 2019-05-25 Zero-sample sketch retrieval method based on a semantic adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910442481.4A CN110175251A (en) 2019-05-25 2019-05-25 Zero-sample sketch retrieval method based on a semantic adversarial network

Publications (1)

Publication Number Publication Date
CN110175251A 2019-08-27

Family

ID=67695694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910442481.4A Pending CN110175251A (en) 2019-05-25 2019-05-25 Zero-sample sketch retrieval method based on a semantic adversarial network

Country Status (1)

Country Link
CN (1) CN110175251A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976435A (en) * 2010-10-07 2011-02-16 西安电子科技大学 Combination learning super-resolution method based on dual constraint
CN104751182A (en) * 2015-04-02 2015-07-01 中国人民解放军空军工程大学 DDAG-based SVM multi-class classification active learning algorithm
US10248664B1 (en) * 2018-07-02 2019-04-02 Inception Institute Of Artificial Intelligence Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dizaji, K. G., et al.: "Unsupervised Deep Generative Adversarial Hashing Network", IEEE *
Xinxun Xu et al.: "Semantic Adversarial Network for Zero-Shot Sketch-Based Image Retrieval", arXiv preprint *

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634170A (en) * 2019-08-30 2019-12-31 福建帝视信息科技有限公司 Photo-level image generation method based on semantic content and rapid image retrieval
CN110634170B (en) * 2019-08-30 2022-09-13 福建帝视信息科技有限公司 Photo-level image generation method based on semantic content and rapid image retrieval
CN110648294A (en) * 2019-09-19 2020-01-03 北京百度网讯科技有限公司 Image restoration method and device and electronic equipment
CN110648294B (en) * 2019-09-19 2022-08-30 北京百度网讯科技有限公司 Image restoration method and device and electronic equipment
CN112686277A (en) * 2019-10-18 2021-04-20 北京大学 Method and device for model training
CN113128530B (en) * 2019-12-30 2023-11-03 上海高德威智能交通系统有限公司 Data classification method and device
CN113128530A (en) * 2019-12-30 2021-07-16 上海高德威智能交通系统有限公司 Data classification method and device
CN111274424A (en) * 2020-01-08 2020-06-12 大连理工大学 Semantic enhanced hash method for zero sample image retrieval
CN111274424B (en) * 2020-01-08 2021-01-19 大连理工大学 Semantic enhanced hash method for zero sample image retrieval
CN111274430A (en) * 2020-01-19 2020-06-12 易拍全球(北京)科贸有限公司 Porcelain field image retrieval algorithm based on feature reconstruction supervision
CN111291212A (en) * 2020-01-24 2020-06-16 复旦大学 Zero sample sketch image retrieval method and system based on graph convolution neural network
CN111291212B (en) * 2020-01-24 2022-10-11 复旦大学 Zero sample sketch image retrieval method and system based on graph convolution neural network
CN111915693B (en) * 2020-05-22 2023-10-24 中国科学院计算技术研究所 Sketch-based face image generation method and sketch-based face image generation system
CN111915693A (en) * 2020-05-22 2020-11-10 中国科学院计算技术研究所 Sketch-based face image generation method and system
CN111898645A (en) * 2020-07-03 2020-11-06 贵州大学 Movable sample attack resisting method based on attention mechanism
CN111914929A (en) * 2020-07-30 2020-11-10 南京邮电大学 Zero sample learning method
CN111914929B (en) * 2020-07-30 2022-08-23 南京邮电大学 Zero sample learning method
CN112101470A (en) * 2020-09-18 2020-12-18 上海电力大学 Guide zero sample identification method based on multi-channel Gauss GAN
CN112101470B (en) * 2020-09-18 2023-04-11 上海电力大学 Guide zero sample identification method based on multi-channel Gauss GAN
CN112364894A (en) * 2020-10-23 2021-02-12 天津大学 Zero sample image classification method of countermeasure network based on meta-learning
CN113361251B (en) * 2021-05-13 2023-06-30 山东师范大学 Text generation image method and system based on multi-stage generation countermeasure network
CN113361251A (en) * 2021-05-13 2021-09-07 山东师范大学 Text image generation method and system based on multi-stage generation countermeasure network
CN113393546A (en) * 2021-05-17 2021-09-14 杭州电子科技大学 Fashion clothing image generation method based on clothing category and texture pattern control
CN113393546B (en) * 2021-05-17 2024-02-02 杭州电子科技大学 Fashion clothing image generation method based on clothing type and texture pattern control
CN113392906B (en) * 2021-06-16 2022-04-22 西华大学 Confrontation sample recovery method and system based on image high-order guide coding recombination
CN113392906A (en) * 2021-06-16 2021-09-14 西华大学 Confrontation sample recovery method and system based on image high-order guide coding recombination
CN113435396A (en) * 2021-07-13 2021-09-24 大连海洋大学 Underwater fish school detection method based on image self-adaptive noise resistance
CN113435396B (en) * 2021-07-13 2022-05-20 大连海洋大学 Underwater fish school detection method based on image self-adaptive noise resistance
CN113722528A (en) * 2021-08-03 2021-11-30 南京邮电大学 Method and system for rapidly retrieving photos facing sketch
CN113628329A (en) * 2021-08-20 2021-11-09 天津大学 Zero-sample sketch three-dimensional point cloud retrieval method
CN113628329B (en) * 2021-08-20 2023-06-06 天津大学 Zero-sample sketch three-dimensional point cloud retrieval method
CN113723431A (en) * 2021-09-01 2021-11-30 上海云从汇临人工智能科技有限公司 Image recognition method, image recognition device and computer-readable storage medium
CN113723431B (en) * 2021-09-01 2023-08-18 上海云从汇临人工智能科技有限公司 Image recognition method, apparatus and computer readable storage medium
CN113903043A (en) * 2021-12-11 2022-01-07 绵阳职业技术学院 Method for identifying printed Chinese character font based on twin metric model
CN113903043B (en) * 2021-12-11 2022-05-06 绵阳职业技术学院 Method for identifying printed Chinese character font based on twin metric model
CN115146711A (en) * 2022-06-15 2022-10-04 北京芯联心科技发展有限公司 Cross-modal data retrieval method and system
CN115496824B (en) * 2022-09-27 2023-08-18 北京航空航天大学 Multi-class object-level natural image generation method based on hand drawing
CN115496824A (en) * 2022-09-27 2022-12-20 北京航空航天大学 Multi-class object-level natural image generation method based on hand drawing
CN115878833B (en) * 2023-02-20 2023-06-13 中山大学 Appearance patent image retrieval method and system based on hand-drawn sketch semantics
CN115878833A (en) * 2023-02-20 2023-03-31 中山大学 Appearance patent image retrieval method and system based on hand-drawn sketch semantics

Similar Documents

Publication Publication Date Title
CN110175251A (en) Zero-sample sketch retrieval method based on a semantic adversarial network
Hussain et al. A real time face emotion classification and recognition using deep learning model
CN109948425B (en) Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching
CN112990054B (en) Compact linguistics-free facial expression embedding and novel triple training scheme
Kadam et al. Detection and localization of multiple image splicing using MobileNet V1
CN108595558B (en) Image annotation method based on data equalization strategy and multi-feature fusion
CN107944459A (en) A kind of RGB D object identification methods
CN110674685B (en) Human body analysis segmentation model and method based on edge information enhancement
CN112257665A (en) Image content recognition method, image recognition model training method, and medium
CN106408037A (en) Image recognition method and apparatus
CN117033609B (en) Text visual question-answering method, device, computer equipment and storage medium
CN113269224A (en) Scene image classification method, system and storage medium
Xu et al. A novel image feature extraction algorithm based on the fusion AutoEncoder and CNN
CN114495163B (en) Pedestrian re-identification generation learning method based on category activation mapping
CN113408651B (en) Unsupervised three-dimensional object classification method based on local discriminant enhancement
CN114299590A (en) Training method of face completion model, face completion method and system
Abdelaziz et al. Few-shot learning with saliency maps as additional visual information
Liu et al. A3GAN: An attribute-aware attentive generative adversarial network for face aging
CN113191381B (en) Image zero-order classification model based on cross knowledge and classification method thereof
CN112463936A (en) Visual question answering method and system based on three-dimensional information
Ghorai et al. Bishnupur heritage image dataset (BHID) a resource for various computer vision applications
Kim What makes the difference in visual styles of comics: from classification to style transfer
Guan et al. Synthetic region screening and adaptive feature fusion for constructing a flexible object detection database
CN117541810B (en) Three-dimensional feature extraction method, three-dimensional feature extraction device, electronic equipment and readable storage medium
US20240169701A1 (en) Affordance-based reposing of an object in a scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20190827)