CN110175251A - Zero-shot sketch retrieval method based on a semantic adversarial network - Google Patents
- Publication number
- CN110175251A (application number CN201910442481.4A)
- Authority
- CN
- China
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention proposes a zero-shot sketch retrieval method based on a semantic adversarial network. It mainly addresses two problems of the prior art: hand-drawn sketches have large intra-class variance, and under the zero-shot setting visual knowledge is difficult to transfer from seen classes to unseen classes. The scheme is as follows: obtain a training sample set; construct a semantic adversarial network; extract RGB image features with a VGG16 network; construct a generation network that generates discriminative RGB image features; input the sketch to be retrieved into the semantic adversarial network to obtain its semantic features, feed these semantic features together with random Gaussian noise into the generation network to generate RGB image features, and retrieve from the image retrieval library the 200 images most similar to the generated RGB image features as the retrieval result. The invention reduces the intra-class variance of sketch image features, keeps the RGB image features generated from sketches discriminative within each class, and improves the retrieval performance of zero-shot sketch retrieval. It can be used in e-commerce, medical diagnosis and remote sensing imaging.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a zero-shot sketch retrieval method that can be used in e-commerce, medical diagnosis and remote sensing imaging.
Background
Sketch retrieval refers to retrieving real natural images that match a hand-drawn sketch; zero-shot sketch retrieval retrieves real natural images for hand-drawn sketches of categories not seen during training. Existing sketch retrieval methods fall into two types: methods based on hand-crafted features and methods based on deep learning. Hand-crafted approaches include the gradient-field HOG descriptor and the SIFT descriptor, while deep-learning approaches include Siamese networks, triplet networks, deep sketch hashing and the like. Their main idea is to extract discriminative features of images or text and project them into a common feature space in which similarity is measured. However, existing sketch retrieval methods assume that all categories are known in the training stage. Since the training data cannot be guaranteed to cover every category that occurs in a real scene, retrieval performance drops sharply when unseen categories appear at test time. Moreover, different people understand and draw the same object differently, so the intra-class variance of hand-drawn sketches is large, which makes sketch retrieval even more challenging.
Zero-shot sketch retrieval aims to transfer visual knowledge from seen categories to unseen categories under the zero-shot setting, thereby addressing the above problem of existing sketch retrieval. Researchers have so far proposed two methods for zero-shot sketch retrieval. The article "Zero-Shot Sketch-Image Hashing", published by Yuming Shen, Li Liu et al. at the 2018 Conference on Computer Vision and Pattern Recognition, discloses a zero-shot sketch hashing retrieval method that builds an end-to-end three-network framework: the first two networks are binary encoders, and the third uses a Kronecker fusion layer and graph convolution to reduce the heterogeneity between sketches and images and strengthen the semantic relations among the data; the article also proposes a hash generation method that reconstructs semantic knowledge representations for zero-shot retrieval. The article "A Zero-Shot Framework for Sketch-Based Image Retrieval", published by Sasi Kiran Yelamarthi et al. at the 2018 European Conference on Computer Vision, discloses deep conditional generative models based on an adversarial autoencoder and a variational autoencoder: taking a sketch feature vector as input, the generative model randomly fills in the missing information to produce a natural-image feature vector, and the generated natural-image feature vectors are then used to retrieve images from the database. Although both methods perform well, neither accounts for the large intra-class variance of sketches, so the semantic information extracted by a pre-trained convolutional neural network has weak discriminative ability, and the visual knowledge of sketches is difficult to transfer accurately from seen classes to unseen classes.
Disclosure of Invention
The purpose of the invention is to overcome the above defects of the prior art by providing a zero-shot sketch retrieval method based on a semantic adversarial network, so that more discriminative semantic information is extracted through a pre-trained convolutional neural network and the visual knowledge of sketches is accurately transferred from seen classes to unseen classes.
The technical idea of the invention is as follows: a semantic adversarial module within an end-to-end semantic adversarial network learns the semantic features of sketches, which reduces the intra-class variance of sketch features; and a triplet loss added to the generation module keeps the RGB image features generated within each category discriminative, which addresses the difficulty of transferring visual knowledge from seen to unseen categories under the zero-shot setting.
Following this idea, the invention is implemented in the following steps:
(1) obtaining a training sample set:
(1a) extracting 10,400 RGB images and the 10,400 corresponding binary sketch images from the Sketchy sketch retrieval database to form the first training sample pairs, and extracting 138,839 RGB images and 138,839 binary sketch images of the corresponding categories from the TU-Berlin sketch retrieval database to form the second training sample pairs;
(1b) applying random horizontal flipping to all 298,478 extracted images to obtain 298,478 randomly flipped images;
(1c) resizing the 298,478 randomly flipped images to 224 × 224 and grouping them into a training sample set $S_1$ containing the first training samples and a training sample set $S_2$ containing the second training samples;
(2) Constructing a semantic adversarial network:
setting up a semantic adversarial network consisting of a semantic feature extraction network, a word embedding network and a semantic discriminator, wherein:
the semantic feature extraction network extracts semantic features from the binary sketch images;
the word embedding network extracts word vectors for the category information corresponding to the binary sketch images;
the semantic discriminator performs adversarial learning between the extracted sketch semantic features and the word vectors of the corresponding class labels, and the adversarial loss $L_{adv}(\theta_S,\theta_D)$ is used to update the parameters of the semantic feature extraction network, improving the discriminability of the output sketch semantic features;
the outputs of the semantic feature extraction network and the word embedding network are fed into the semantic discriminator for adversarial learning;
(3) extracting features from the RGB images in the training sample sets:
(3a) extracting features from the RGB images in the first training sample set with a VGG16 network pre-trained on the ImageNet dataset, and taking the output of the network's second fully connected layer, of dimension 4096, as the final RGB image feature of the first training sample set;
(3b) extracting features from the RGB images in the second training sample set with a VGG16 network pre-trained on the ImageNet dataset, and taking the output of the network's second fully connected layer, of dimension 4096, as the final RGB image feature of the second training sample set;
(4) constructing a generation network:
constructing a generation network consisting, in sequence, of a concatenation layer, a conditional encoder, a triplet loss layer, a KL loss layer, a decoder, an image reconstruction loss layer, a regressor and a semantic reconstruction loss layer, wherein:
the concatenation layer concatenates the sketch semantic feature vector $x_{sem}$ output by the semantic feature extraction network with the RGB image feature vector $x_{img}$ along the feature dimension;
the conditional encoder takes the output of the concatenation layer as input, maps the data distribution $P(x_{img}, x_{sem})$ to the prior distribution $P(z)$ of the latent variable $z$, and computes the mean vector $\mu$ and the standard deviation vector $\sigma$ of $P(z)$;
the triplet loss layer keeps the generated features discriminative within each training class: it takes the mean vector $\mu$ output by the conditional encoder as input and trains the encoder with the triplet loss $L_{tri}$;
the KL loss layer makes the variational distribution $Q(z \mid x_{img}, x_{sem})$ approximate the data distribution $P(x_{img}, x_{sem})$ and determines the variational lower bound through the loss function $L_{KL}$;
the decoder takes the 1024-dimensional latent vector $z$ concatenated with the 300-dimensional semantic feature $x_{sem}$ as input and generates the RGB image feature $\hat{x}_{img}$ corresponding to the sketch image; the decoding process is:
$$\hat{x}_{img} = D'([\mathrm{noise}, x_{sem}])$$
where noise denotes random Gaussian noise $z \sim N(0,1)$ of dimension 1024 and $D'(\cdot)$ denotes the decoder;
the image reconstruction loss layer ensures that the generated RGB image features are sufficiently discriminative; the decoder is trained with the reconstruction loss
$$L_{recon\_img} = \lVert \hat{x}_{img} - x_{img} \rVert_2$$
where $\hat{x}_{img}$ denotes the generated RGB image feature corresponding to the sketch image, $x_{img}$ denotes the original RGB image feature, and $\lVert \cdot \rVert_2$ denotes the 2-norm;
the regressor takes the decoder output $\hat{x}_{img}$ as input and reconstructs the semantic feature $\hat{x}_{sem}$; the regression process is:
$$\hat{x}_{sem} = R(\hat{x}_{img}) = R(D'([\mathrm{noise}, x_{sem}]))$$
where noise denotes random Gaussian noise $z \sim N(0,1)$ of dimension 1024 and $R(\cdot)$ denotes the regressor;
the semantic reconstruction loss layer ensures that the generated RGB image features $\hat{x}_{img}$ preserve category-level semantic information; the loss function of this layer is
$$L_{recon\_sem} = \lVert \hat{x}_{sem} - x_{sem} \rVert_2$$
where $\hat{x}_{sem}$ denotes the reconstructed sketch semantic feature and $x_{sem}$ denotes the semantic feature of the sketch;
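The triplet loss layer and the KL loss layer described above can be illustrated with a minimal plain-Python sketch. It assumes a Euclidean distance for $d(\cdot,\cdot)$, a default margin $\delta = 1.0$ (the text gives no value), and a diagonal Gaussian variational distribution compared against a Gaussian prior (standard normal by default) using the closed-form KL divergence. The function names are hypothetical and this is not presented as the patent's implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance, assumed here as the distance function d(., .)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(mu_anchor, mu_pos, mu_neg, margin=1.0, dist=euclidean):
    """Hinge triplet loss on encoder means: max(0, margin + d(a, p) - d(a, n)).

    The default margin of 1.0 is an illustrative assumption.
    """
    return max(0.0, margin + dist(mu_anchor, mu_pos) - dist(mu_anchor, mu_neg))


def kl_diag_gaussian(mu_q, sigma_q, mu_p=None, sigma_p=None):
    """KL( N(mu_q, diag(sigma_q^2)) || N(mu_p, diag(sigma_p^2)) ).

    Uses a standard normal N(0, I) prior when mu_p / sigma_p are omitted.
    """
    n = len(mu_q)
    if mu_p is None:
        mu_p = [0.0] * n
    if sigma_p is None:
        sigma_p = [1.0] * n
    kl = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        # Per-dimension closed form for two univariate Gaussians.
        kl += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return kl
```

For example, an anchor that is already closer to its positive than to its negative by more than the margin contributes zero triplet loss, and a posterior identical to the prior has zero KL divergence.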
(5) training the semantic adversarial network and the generation network:
(5a) initializing the semantic adversarial network and the generation network, with the randomly initialized network parameters drawn from a Gaussian distribution with mean 0 and standard deviation 0.1, to obtain the initialized semantic adversarial network and generation network;
(5b) setting the loss function of the whole network to $L = L_{adv} + L_{tri} + L_{KL} + L_{recon\_img} + L_{recon\_sem}$;
(5c) taking the sketch images preprocessed in step (1) and their category information as input to the initialized semantic adversarial network, which outputs the semantic features of the sketches; taking these semantic features together with the RGB image features extracted by the pre-trained VGG16 network as input to the generation network; and training both networks by minimizing the loss function $L$ to obtain the trained semantic adversarial network and generation network;
(6) performing zero-shot sketch retrieval on the sketch image to be retrieved:
(6a) taking a sketch image from a test sample set whose categories do not intersect those of the training sample set, and cropping it to obtain the sketch image to be retrieved;
(6b) inputting the sketch image to be retrieved into the trained semantic feature extraction network, which outputs the corresponding semantic feature vector;
(6c) concatenating the semantic feature vector with random Gaussian noise and inputting the result into the trained generation network, which generates multiple RGB image features for the sketch through the encoder and decoder;
(6d) taking the average of the generated RGB image features as the final RGB image feature, and retrieving from the image retrieval library the 200 images most similar to it according to cosine distance.
Compared with the prior art, the invention has the following advantages:
in the training stage, the invention exploits category-level semantic information: the semantic adversarial module of the end-to-end semantic adversarial network learns the semantic features of sketches, which reduces the intra-class variance of sketch image features; and the triplet loss added to the generation network keeps the RGB image features generated within each class discriminative, which addresses the difficulty of transferring visual knowledge from seen to unseen classes under the zero-shot setting.
Compared with the prior art, the method simplifies the training process and effectively improves the retrieval performance of zero-shot sketch retrieval.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a graph comparing the search results of the present invention with the conventional method.
Detailed description of the preferred embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, the zero-shot sketch retrieval method based on the semantic adversarial network of the invention is implemented as follows:
Step 1: obtain the training sample set.
1.1) Extract 10,400 RGB images and the 10,400 corresponding binary sketch images from the Sketchy sketch retrieval database to form the first training sample pairs, and extract 138,839 RGB images and 138,839 binary sketch images of the corresponding categories from the TU-Berlin sketch retrieval database to form the second training sample pairs;
1.2) Apply random horizontal flipping to all 298,478 extracted images to obtain 298,478 randomly flipped images;
1.3) Resize the 298,478 randomly flipped images to 224 × 224 and group them into a training sample set $S_1$ containing the first training samples and a training sample set $S_2$ containing the second training samples:
$$S_1 = \{(x^{img}_{1,i},\ x^{ske}_{1,i})\}_{i=1}^{10400}, \qquad S_2 = \{(x^{img}_{2,j},\ x^{ske}_{2,j})\}_{j=1}^{138839}$$
where $x^{img}_{1,i}$ is the $i$-th RGB image in the Sketchy database, $x^{ske}_{1,i}$ is the binary sketch image of the category corresponding to $x^{img}_{1,i}$, $x^{img}_{2,j}$ is the $j$-th RGB image in the TU-Berlin database, and $x^{ske}_{2,j}$ is the binary sketch image of the category corresponding to $x^{img}_{2,j}$.
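The sample counts and preprocessing of step 1 can be checked with a small plain-Python sketch: each pair contributes one RGB image and one sketch, so 2 × (10,400 + 138,839) = 298,478 images. Images are modelled as nested lists of pixel values; the flip probability of 0.5 and the nearest-neighbour interpolation are assumptions, since the text only says "random horizontal flipping" and "resize to 224 × 224". The helper names are hypothetical.

```python
import random

# Pair counts stated in step 1 (one RGB image plus one binary sketch per pair).
SKETCHY_PAIRS = 10_400
TU_BERLIN_PAIRS = 138_839


def total_images(pairs_per_dataset):
    """Each pair contributes two images: an RGB image and a sketch."""
    return sum(2 * n for n in pairs_per_dataset)


def random_hflip(image, p=0.5, rng=random):
    """Flip a 2-D image (list of rows) left-to-right with probability p.

    p = 0.5 is an assumed value; the text does not state the probability.
    """
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


def resize_nearest(image, out_h=224, out_w=224):
    """Nearest-neighbour resize to out_h x out_w (interpolation is assumed)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    print(total_images([SKETCHY_PAIRS, TU_BERLIN_PAIRS]))  # 298478
    print(random_hflip([[0, 1, 2], [3, 4, 5]], p=1.0))  # [[2, 1, 0], [5, 4, 3]]
```

In a real pipeline the same two operations would be applied to image tensors by an image library; the list-based version only makes the bookkeeping explicit.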
Step 2: construct the semantic adversarial network.
A semantic adversarial network is set up, consisting of a semantic feature extraction network, a word embedding network and a semantic discriminator, wherein:
the semantic feature extraction network extracts semantic features from the binary sketch images; specifically, it is a VGG16 network pre-trained on ImageNet whose fifth convolutional layer serves as the convolutional output, followed by a fully connected layer that outputs a 300-dimensional semantic feature vector;
the word embedding network extracts word vectors for the category information corresponding to the binary sketch images, using a word vector model pre-trained on Wikipedia to obtain a 300-dimensional category-level word vector representation;
the semantic discriminator performs adversarial learning between the extracted sketch semantic features and the word vectors of the corresponding class labels, updating the parameters of the semantic feature extraction network through the adversarial loss and improving the discriminability of the output sketch semantic features; the adversarial loss $L_{adv}(\theta_S,\theta_D)$ is:
$$L_{adv}(\theta_S,\theta_D) = \mathbb{E}_{y}\left[\log D_{\theta_D}\left(W(y)\right)\right] + \mathbb{E}_{x_{ske}}\left[\log\left(1 - D_{\theta_D}\left(S_{\theta_S}(x_{ske})\right)\right)\right]$$
where $\mathbb{E}$ denotes expectation, $y$ denotes the category semantic information of the sketch, $W(\cdot)$ denotes the word embedding network, $D_{\theta_D}(\cdot)$ denotes the semantic discriminator with parameters $\theta_D$, $S_{\theta_S}(\cdot)$ denotes the semantic feature extraction network with parameters $\theta_S$, and $x_{ske}$ denotes the sketch image;
the outputs of the semantic feature extraction network and the word embedding network are fed into the semantic discriminator for adversarial learning.
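The adversarial objective above can be sketched in plain Python, assuming the semantic discriminator ends in a sigmoid so that its outputs are probabilities, with class word vectors $W(y)$ treated as "real" and sketch features $S(x_{ske})$ as "fake". The discriminator minimises the negated objective (binary cross-entropy); for the feature extractor the sketch swaps in the common non-saturating GAN loss, which is an assumption rather than something the text specifies. Names are hypothetical.

```python
import math

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss minimised by the semantic discriminator D.

    d_real: D's output probabilities on class word vectors W(y) (target 1).
    d_fake: D's output probabilities on sketch features S(x_ske) (target 0).
    Equals -(mean log D(W(y)) + mean log(1 - D(S(x_ske)))).
    """
    real_term = sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def extractor_loss(d_fake):
    """Non-saturating loss for the feature extractor S (a common GAN variant).

    S is pushed to make D output values close to 1 on sketch features,
    i.e. to make them indistinguishable from the class word vectors.
    """
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)
```

When the discriminator cannot tell the two sources apart (all outputs 0.5), its loss is $2\ln 2$; it decreases as the discriminator separates word vectors from sketch features, while the extractor's loss decreases as its features fool the discriminator.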
Step 3: extract features from the RGB images in the training sample sets.
3.1) Extract features from the RGB images in the first training sample set with a VGG16 network pre-trained on the ImageNet dataset, taking the output of the network's second fully connected layer, of dimension 4096, as the final RGB image feature of the first training sample set;
3.2) Extract features from the RGB images in the second training sample set with a VGG16 network pre-trained on the ImageNet dataset, taking the output of the network's second fully connected layer, of dimension 4096, as the final RGB image feature of the second training sample set.
Step 4: construct the generation network.
A generation network is constructed, consisting, in sequence, of a concatenation layer, a conditional encoder, a triplet loss layer, a KL loss layer, a decoder, an image reconstruction loss layer, a regressor and a semantic reconstruction loss layer, wherein:
the concatenation layer concatenates the 300-dimensional sketch semantic feature vector $x_{sem}$ output by the semantic feature extraction network with the 4096-dimensional RGB image feature vector $x_{img}$ and outputs a 4396-dimensional feature vector;
the conditional encoder consists, in sequence, of a first fully connected layer with input dimension 4396 and output dimension 4096, a ReLU nonlinear activation layer, a one-dimensional batch normalization layer with momentum 0.99 and eps 1e-3, a Dropout layer with a drop rate of 0.3, a second fully connected layer with output dimension 2048, a ReLU nonlinear activation layer, and a one-dimensional batch normalization layer with momentum 0.99 and eps 1e-3; it takes the output of the concatenation layer as input and maps the data distribution $P(x_{img}, x_{sem})$ to a mean vector $\mu$ and a standard deviation vector $\sigma$, which form the prior distribution $P(z)$ of the latent variable $z$;
the triplet loss layer keeps the generated features discriminative within each training class: it takes the mean vector $\mu$ output by the conditional encoder as input and trains the encoder with the triplet loss $L_{tri}$, whose expression is:
$$L_{tri} = \max\left(0,\ \delta + d\left(E(x_a), E(x_p)\right) - d\left(E(x_a), E(x_n)\right)\right)$$
where $d(\cdot,\cdot)$ denotes a distance function, $E(\cdot)$ denotes the latent embedding function that yields the mean vector $\mu$, $x_a$ denotes the anchor sample, $x_p$ the positive sample, $x_n$ the negative sample, and $\delta$ the margin;
the KL loss layer makes the variational distribution $Q(z \mid x_{img}, x_{sem})$ approximate the data distribution $P(x_{img}, x_{sem})$ and determines the variational lower bound through the loss $L_{KL}$ (the negative of the variational lower bound), whose expression is:
$$L_{KL}(\theta_E,\theta_{D'}) = \mathrm{KL}\left(Q(z \mid x_{img}, x_{sem}) \,\|\, P(z \mid x_{sem})\right) - \mathbb{E}_{Q(z \mid x_{img}, x_{sem})}\left[\log P(x_{img} \mid z, x_{sem})\right]$$
where $\theta_E$ denotes the parameters of the conditional encoder network, $\theta_{D'}$ the parameters of the decoder network, $\mathbb{E}$ expectation, $x_{img}$ and $x_{sem}$ the RGB image feature and the semantic feature respectively, $\mathrm{KL}(\cdot \| \cdot)$ the KL divergence, $Q(z \mid x_{img}, x_{sem})$ the variational distribution output by the encoder network, $P(z \mid x_{sem})$ the distribution of $z$ conditioned on the semantic feature $x_{sem}$, and $P(x_{img} \mid z, x_{sem})$ the conditional distribution output by the decoder network;
the decoder consists, in sequence, of a first fully connected layer with input dimension 1324 and output dimension 4096, a ReLU nonlinear activation layer, a second fully connected layer with output dimension 4096 and a ReLU nonlinear activation layer; it takes the 1024-dimensional latent vector $z$ concatenated with the 300-dimensional semantic feature $x_{sem}$ as input and generates the RGB image feature $\hat{x}_{img}$ corresponding to the sketch image:
$$\hat{x}_{img} = D'([\mathrm{noise}, x_{sem}])$$
where noise denotes random Gaussian noise $z \sim N(0,1)$ of dimension 1024 and $D'(\cdot)$ denotes the decoder;
the image reconstruction loss layer ensures that the generated RGB image features are sufficiently discriminative; the decoder is trained with the reconstruction loss
$$L_{recon\_img} = \lVert \hat{x}_{img} - x_{img} \rVert_2$$
where $\hat{x}_{img}$ denotes the generated RGB image feature corresponding to the sketch image, $x_{img}$ denotes the original RGB image feature, and $\lVert \cdot \rVert_2$ denotes the 2-norm;
the regressor consists, in sequence, of a first fully connected layer with input dimension 4096 and output dimension 2048, a ReLU nonlinear activation layer, a second fully connected layer with output dimension 300 and a Tanh nonlinear activation layer; it takes the decoder output $\hat{x}_{img}$ as input and reconstructs the semantic feature $\hat{x}_{sem}$:
$$\hat{x}_{sem} = R(\hat{x}_{img}) = R(D'([\mathrm{noise}, x_{sem}]))$$
where noise denotes random Gaussian noise $z \sim N(0,1)$ of dimension 1024 and $R(\cdot)$ denotes the regressor;
the semantic reconstruction loss layer ensures that the generated RGB image features preserve category-level semantic information; the loss function of this layer is
$$L_{recon\_sem} = \lVert \hat{x}_{sem} - x_{sem} \rVert_2$$
where $\hat{x}_{sem}$ denotes the reconstructed sketch semantic feature and $x_{sem}$ denotes the semantic feature of the sketch.
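The layer dimensions of step 4 fit together as 300 + 4096 = 4396 at the encoder input and 1024 + 300 = 1324 at the decoder input. The sketch below records the fully connected layers as (name, input, output) tuples, checks that consecutive layers agree, and shows the standard VAE reparameterisation $z = \mu + \sigma \odot \varepsilon$. Splitting the 2048-dimensional encoder output into a 1024-dimensional $\mu$ and a 1024-dimensional $\log\sigma$ is an inference from 2048 = 2 × 1024, not something the text states; the log-σ parameterisation and all names are hypothetical.

```python
import math
import random

SEM_DIM, IMG_DIM, Z_DIM = 300, 4096, 1024

# (name, input_dim, output_dim) for the fully connected layers listed in step 4.
ENCODER = [("enc_fc1", SEM_DIM + IMG_DIM, 4096), ("enc_fc2", 4096, 2048)]
DECODER = [("dec_fc1", Z_DIM + SEM_DIM, 4096), ("dec_fc2", 4096, IMG_DIM)]
REGRESSOR = [("reg_fc1", IMG_DIM, 2048), ("reg_fc2", 2048, SEM_DIM)]


def check_chain(layers):
    """Assert consecutive layers agree on dimensions; return the final output dim."""
    for (_, _, out_dim), (name, in_dim, _) in zip(layers, layers[1:]):
        assert out_dim == in_dim, f"{name}: expected input {out_dim}, got {in_dim}"
    return layers[-1][2]


def split_encoder_output(h):
    """Assumption: the 2048-d encoder output is [mu (1024) | log_sigma (1024)]."""
    assert len(h) == 2 * Z_DIM
    return h[:Z_DIM], h[Z_DIM:]


def reparameterize(mu, log_sigma, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]
```

At retrieval time the encoder is bypassed in the same way the decoding formula describes: a 1024-dimensional noise vector is drawn directly from $N(0,1)$ and concatenated with $x_{sem}$.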
Step 5: train the semantic adversarial network and the generation network.
5.1) Initialize the semantic adversarial network and the generation network, drawing the randomly initialized network parameters from a Gaussian distribution with mean 0 and standard deviation 0.1, to obtain the initialized semantic adversarial network and generation network;
5.2) Set the loss function of the whole network to $L = L_{adv} + L_{tri} + L_{KL} + L_{recon\_img} + L_{recon\_sem}$;
5.3) Take the sketch images preprocessed in step 1 and their category information as input to the initialized semantic adversarial network, which outputs the semantic features of the sketches; take these semantic features together with the RGB image features extracted by the pre-trained VGG16 network as input to the generation network; and train both networks by minimizing the loss function $L$. Training uses the Adam optimizer of the deep learning toolbox PyTorch with an initial learning rate of 0.0001, $\beta_1 = 0.5$ and $\beta_2 = 0.99$. For training stability, the semantic adversarial network and the generation network are trained alternately in the first 2 epochs, and the whole network is trained end to end in the remaining 18 epochs, 20 epochs in total, yielding the trained semantic adversarial network and generation network.
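The overall loss and the two-phase schedule of step 5 can be summarised in a small plain-Python sketch. The unweighted sum follows the formula for $L$; the hyperparameters are those stated in step 5.3. How "alternately" is realised is not specified, so switching the updated sub-network on every iteration is an assumption, and the helper names are hypothetical. In PyTorch the optimizer settings would correspond to `torch.optim.Adam(params, lr=1e-4, betas=(0.5, 0.99))`.

```python
# Hypothetical helper names; loss terms and hyperparameters come from step 5.
LOSS_TERMS = ("adv", "tri", "kl", "recon_img", "recon_sem")
ADAM_CONFIG = {"lr": 1e-4, "betas": (0.5, 0.99)}  # initial lr and (beta1, beta2)
TOTAL_EPOCHS = 20
WARMUP_EPOCHS = 2  # epochs in which the two networks are trained alternately


def total_loss(losses):
    """L = L_adv + L_tri + L_KL + L_recon_img + L_recon_sem (unweighted sum)."""
    return sum(losses[name] for name in LOSS_TERMS)


def networks_to_update(epoch, step):
    """Return which sub-networks receive a gradient step at (epoch, step).

    Assumption: "alternately" means switching network on every iteration.
    """
    if epoch >= WARMUP_EPOCHS:
        return ("semantic_adversarial", "generation")  # end-to-end phase
    if step % 2 == 0:
        return ("semantic_adversarial",)
    return ("generation",)


def schedule_summary():
    """Count how many of the 20 epochs fall in each training phase."""
    alternating = sum(1 for e in range(TOTAL_EPOCHS) if e < WARMUP_EPOCHS)
    return {"alternating": alternating, "end_to_end": TOTAL_EPOCHS - alternating}
```

Keeping the schedule in a pure function makes it easy to swap in another reading of "alternately", for example one full epoch per network, without touching the loss code.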
Step 6: perform zero-shot sketch retrieval on the sketch image to be retrieved.
6.1) Take a sketch image from a test sample set whose categories do not intersect those of the training sample set, and crop it to obtain the sketch image to be retrieved;
6.2) Input the sketch image to be retrieved into the trained semantic feature extraction network, which outputs the corresponding semantic feature vector;
6.3) Concatenate the semantic feature vector with random Gaussian noise and input the result into the trained generation network, which generates multiple RGB image features for the sketch through the encoder and decoder;
6.4) Take the average of the generated RGB image features as the final RGB image feature, retrieve from the image retrieval library the 200 images most similar to it according to cosine distance, and finally compute the retrieval precision on these 200 retrieved images.
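Step 6.4 can be sketched in plain Python: the generated RGB image features are averaged into one query vector, and gallery images are ranked by cosine distance (1 minus cosine similarity). The gallery is assumed to be a list of (image id, feature) pairs; a real system would use a vectorised library, and the function names here are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(generated_feats, gallery, k=200):
    """Average the generated features, then rank gallery items by cosine distance.

    gallery: list of (image_id, feature) pairs. Returns the ids of the top k.
    """
    query = mean_vector(generated_feats)
    ranked = sorted(gallery, key=lambda item: 1.0 - cosine_similarity(query, item[1]))
    return [image_id for image_id, _ in ranked[:k]]
```

Averaging several features generated from different noise samples reduces the influence of any single noise draw on the query.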
The technical effects of the invention are further illustrated below by simulation experiments.
1. Simulation conditions:
The simulation experiments were run on an NVIDIA GTX TITAN V GPU using the deep learning toolbox PyTorch.
2. Simulation content:
Simulation experiments were carried out on two public datasets dedicated to evaluating sketch retrieval methods, Sketchy and TU-Berlin, wherein:
the Sketchy dataset contains 75,479 sketch images and 73,002 RGB images from 125 different classes; following the standard zero-shot learning setup, 104 of the 125 classes are used for training as seen classes and 21 are used for testing as unseen classes;
the TU-Berlin dataset contains 20,000 sketch images and 204,070 RGB images from 250 different classes; following the standard zero-shot learning setup, 194 of the 250 classes are used for training as seen classes and 56 are used for testing as unseen classes.
Table 1 shows the results of comparative simulation experiments on the two public datasets Sketchy and TU-Berlin between the invention and existing sketch retrieval and zero-shot learning methods based on deep convolutional neural networks.
TABLE 1
Precision@200 and mAP@200 in Table 1 denote, respectively, the precision and the mean average precision over the top 200 retrieved images.
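The two metrics can be computed from per-query relevance flags (1 if a retrieved image belongs to the query sketch's class, else 0), as in the plain-Python sketch below. Several conventions exist for normalising AP@k; normalising by the number of relevant images found in the top k is one common choice and is an assumption here, as are the function names.

```python
def precision_at_k(relevance, k=200):
    """Fraction of the top-k ranked results that are relevant (0/1 flags)."""
    return sum(relevance[:k]) / k


def average_precision_at_k(relevance, k=200):
    """AP@k, normalised by the number of relevant items in the top k (one convention)."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at each relevant position
    return score / hits if hits else 0.0


def mean_over_queries(metric, all_relevance, k=200):
    """Average a per-query metric (e.g. AP@k to obtain mAP@k) over all queries."""
    return sum(metric(r, k) for r in all_relevance) / len(all_relevance)
```

Precision@200 is then `mean_over_queries(precision_at_k, ...)` and mAP@200 is `mean_over_queries(average_precision_at_k, ...)` over all test sketches.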
As the simulation results in Table 1 show, the invention achieves higher precision and mean average precision than the existing methods on both datasets.
The retrieval results of the invention and of the best existing method, CVAE, are visualized on the Sketchy dataset by comparing the top 10 of the 200 retrieved images; the results are shown in Fig. 2.
As can be seen from Fig. 2, when sketches from 3 different test categories are queried, all of the top 10 images retrieved by the invention belong to the same category as the query sketch, whereas the results of the CVAE method contain incorrectly retrieved images.
Claims (10)
1. A zero-shot sketch retrieval method based on a semantic adversarial network, characterized by comprising the following steps:
(1) obtaining a training sample set:
(1a) extracting 10,400 RGB images and the 10,400 corresponding binary sketch images from the Sketchy sketch retrieval database to form the first training sample pairs, and extracting 138,839 RGB images and 138,839 binary sketch images of the corresponding categories from the TU-Berlin sketch retrieval database to form the second training sample pairs;
(1b) applying random horizontal flipping to all 298,478 extracted images to obtain 298,478 randomly flipped images;
(1c) resizing the 298,478 randomly flipped images to 224 × 224 and grouping them into a training sample set $S_1$ containing the first training samples and a training sample set $S_2$ containing the second training samples;
(2) constructing a semantic adversarial network:
constructing a semantic adversarial network consisting of a semantic feature extraction network, a word embedding network and a semantic discriminator, wherein:
the semantic feature extraction network is used for extracting semantic features of the binary sketch image;
the word embedding network is used for extracting the word vector of the class information corresponding to the binary sketch image;
the semantic discriminator performs adversarial learning between the extracted sketch semantic features and the word vectors of the corresponding class labels; the parameters of the semantic feature extraction network are updated through the adversarial loss $L_{adv}(\theta_S, \theta_D)$, which improves the discriminability of the semantic features output for the sketch image;
the outputs of the semantic feature extraction network and the word embedding network are fed into the semantic discriminator for adversarial learning;
(3) performing feature extraction on the RGB images in the training sample set:
(3a) extracting features from the RGB images in the first training sample set with a VGG16 network pre-trained on the ImageNet dataset, and taking the output of the second fully connected layer (fc7) of the network as the final RGB image feature of the first training sample set, the feature dimension being 4096;
(3b) extracting features from the RGB images in the second training sample set with a VGG16 network pre-trained on the ImageNet dataset, and taking the output of the second fully connected layer (fc7) of the network as the final RGB image feature of the second training sample set, the feature dimension being 4096;
(4) constructing a generation network:
constructing a generation network consisting, in sequence, of a concatenate layer, a conditional encoder, a triplet loss layer, a KL loss layer, a decoder, an image reconstruction loss layer, a regressor and a semantic reconstruction loss layer, wherein:
a concatenate layer, which concatenates the sketch semantic feature vector $x_{sem}$ output by the semantic feature extraction network with the RGB image feature vector $x_{img}$ along the feature dimension;
a conditional encoder, which takes the output of the concatenate layer, i.e. data from the distribution $P(x_{img}, x_{sem})$, as input and produces a mean vector $\mu$ and a standard deviation vector $\sigma$ that form the prior distribution $P(z)$ of the latent variable $z$;
a triplet loss layer, which preserves the discriminability of the generated features within each training class; it takes the mean vector $\mu$ output by the conditional encoder as input and trains the encoder with the triplet loss function $L_{tri}$;
a KL loss layer, which makes the variational distribution $Q(z \mid x_{img}, x_{sem})$ approximate the data distribution $P(x_{img}, x_{sem})$ and determines the variational lower bound through the loss function $L_{KL}$;
a decoder, which takes as input the 1024-dimensional latent vector $z$ concatenated with the 300-dimensional semantic feature $x_{sem}$ and generates the RGB image feature $\hat{x}_{img}$ corresponding to the sketch image; the decoding process is expressed as:
$$\hat{x}_{img} = D'([\mathrm{noise}, x_{sem}])$$
wherein noise denotes random Gaussian noise $z \sim N(0, 1)$ of dimension 1024, and $D'(\cdot)$ denotes the decoder;
an image reconstruction loss layer, which ensures that the generated RGB image features are sufficiently discriminative; the decoder is trained with the reconstruction loss $L_{recon\_img} = \lVert \hat{x}_{img} - x_{img} \rVert_2$, wherein $\hat{x}_{img}$ denotes the generated RGB image feature corresponding to the sketch image, $x_{img}$ denotes the original RGB image feature, and $\lVert \cdot \rVert_2$ denotes the 2-norm;
a regressor, which takes the decoder output $\hat{x}_{img}$ as input and reconstructs the semantic feature $\hat{x}_{sem}$; the regression process is expressed as:
$$\hat{x}_{sem} = R(\hat{x}_{img}) = R(D'([\mathrm{noise}, x_{sem}]))$$
wherein noise denotes random Gaussian noise $z \sim N(0, 1)$ of dimension 1024, and $R(\cdot)$ denotes the regressor;
a semantic reconstruction loss layer, which ensures that the generated RGB image features $\hat{x}_{img}$ preserve class-level semantic information; its loss $L_{recon\_sem}$ is computed between the reconstructed sketch semantic features $\hat{x}_{sem}$ and the sketch semantic features $x_{sem}$;
(5) training the semantic adversarial network and the generation network:
(5a) initializing the semantic adversarial network and the generation network with parameters randomly drawn from a Gaussian distribution with mean 0 and standard deviation 0.1, to obtain the initialized semantic adversarial network and generation network;
(5b) letting the loss function of the whole network be $L = L_{adv} + L_{tri} + L_{KL} + L_{recon\_img} + L_{recon\_sem}$;
(5c) taking the sketch images preprocessed in step (1) and their class information as input to the initialized semantic adversarial network, which outputs the semantic features of the sketches; taking these semantic features together with the RGB image features extracted by the pre-trained VGG16 network as input to the generation network; and training both networks by minimizing the loss function $L$, to obtain the trained semantic adversarial network and generation network;
(6) performing zero-shot retrieval for a sketch image to be retrieved:
(6a) extracting a sketch image from a test sample set disjoint from the training sample set and cropping it to obtain the sketch image to be retrieved;
(6b) inputting the sketch image to be retrieved into the trained semantic feature extraction network, which outputs the corresponding semantic feature vector;
(6c) concatenating the semantic feature vector with random Gaussian noise, inputting the result into the trained generation network, and generating several RGB image features for the sketch through the encoder and the decoder;
(6d) taking the average of the generated RGB image features as the final RGB image feature, and retrieving from the image gallery the 200 images closest to this final feature according to cosine distance.
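Steps (6c) and (6d) reduce to averaging the generated features and ranking a gallery by cosine distance. The sketch below assumes the gallery is a mapping from a hypothetical image id to a precomputed feature vector, and takes cosine distance as 1 minus cosine similarity; how the gallery features are produced is not specified here.

```python
import math

def mean_vector(vectors):
    """Element-wise average of several generated RGB features (step 6d)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(generated_features, gallery, k=200):
    """Return the ids of the k gallery images closest to the averaged feature.

    `gallery` (hypothetical layout) maps image id -> feature vector; images are
    ordered by cosine distance 1 - similarity, smallest first.
    """
    query = mean_vector(generated_features)
    ranked = sorted(gallery, key=lambda img_id: 1.0 - cosine_similarity(query, gallery[img_id]))
    return ranked[:k]
```

Averaging several noise-conditioned samples before ranking smooths out the randomness introduced by the Gaussian noise in step (6c).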
2. The method of claim 1, wherein the training sample set $S_1$ of first training samples and the training sample set $S_2$ of second training samples in (1c) are respectively expressed as follows:
$$S_1 = \{(x_i^{img}, x_i^{ske})\}_{i=1}^{10400}, \qquad S_2 = \{(x_j^{img}, x_j^{ske})\}_{j=1}^{138839}$$
wherein $x_i^{img}$ is the $i$-th RGB image in the Sketchy database, $x_i^{ske}$ is the binary sketch image of the class corresponding to $x_i^{img}$, $x_j^{img}$ is the $j$-th RGB image in the TU-Berlin database, and $x_j^{ske}$ is the binary sketch image of the class corresponding to $x_j^{img}$.
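The image counts in step (1) and the preprocessing of steps (1b) and (1c) can be illustrated in plain Python, treating an image as a list of pixel rows. The flip probability of 0.5 and the nearest-neighbour resize are assumptions made for illustration; the claims only say the flip is random and give the 224 × 224 target size, and a real pipeline would normally use an image library's interpolating resize.

```python
import random

# Pair counts from step (1a); each pair holds one RGB image and one sketch.
SKETCHY_PAIRS = 10_400
TU_BERLIN_PAIRS = 138_839
TOTAL_IMAGES = 2 * (SKETCHY_PAIRS + TU_BERLIN_PAIRS)
assert TOTAL_IMAGES == 298_478  # the total quoted in steps (1b) and (1c)

def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror every row of an H x W image with probability p (p is assumed)."""
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]

def resize_nearest(image, size=224):
    """Nearest-neighbour resize of an H x W image to size x size."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]
```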
3. The method of claim 1, wherein the semantic feature extraction network in (2) adopts a VGG16 network pre-trained on the ImageNet dataset, takes the fifth convolutional layer of the VGG16 network as the convolutional output, and maps it through a fully connected layer to a 300-dimensional semantic feature vector.
4. The method of claim 1, wherein the word embedding network in (2) employs a word vector model pre-trained on Wikipedia to obtain a 300-dimensional class-level word vector representation.
5. The method according to claim 1, wherein the semantic discriminator in (2) consists of a first fully connected layer with input dimension 300 and output dimension 200, a sigmoid nonlinear activation layer, and a second fully connected layer with output dimension 1; the output of the semantic discriminator updates the parameters of the semantic feature extraction network through the adversarial loss $L_{adv}$, whose mathematical expression is:
$$L_{adv}(\theta_S, \theta_D) = \mathbb{E}_{y}\left[\log D(W(y); \theta_D)\right] + \mathbb{E}_{x_{ske}}\left[\log\left(1 - D(S(x_{ske}; \theta_S); \theta_D)\right)\right]$$
wherein $\mathbb{E}$ denotes expectation, $y$ denotes the class semantic information of the sketch, $W(\cdot)$ denotes the word embedding network, $D(\cdot)$ denotes the semantic discriminator, $\theta_D$ denotes the parameters of the semantic discriminator, $S(\cdot)$ denotes the semantic feature extraction network, $\theta_S$ denotes the parameters of the semantic feature extraction network, and $x_{ske}$ denotes a sketch image.
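A mini-batch estimate of the adversarial loss can be sketched as below, assuming it takes the standard GAN form in which word vectors W(y) play the role of real samples and sketch semantic features S(x_ske) the role of generated ones. It also assumes the discriminator outputs have already been squashed into (0, 1); the claim lists no activation after the second fully connected layer, so that squashing step is an assumption.

```python
import math

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Mini-batch estimate of E[log D(W(y))] + E[log(1 - D(S(x_ske)))].

    d_real: discriminator probabilities for class word vectors W(y).
    d_fake: discriminator probabilities for sketch semantic features S(x_ske).
    The discriminator is trained to maximize this value and the semantic
    feature extraction network to minimize it; eps guards against log(0).
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

When the discriminator cannot tell the two sources apart (all outputs 0.5), the value is 2 log 0.5; a perfect discriminator drives it toward 0.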
6. The method according to claim 1, wherein the conditional encoder in (4) consists, in sequence, of a first fully connected layer with input dimension 4396 and output dimension 4096, a ReLU nonlinear activation layer, a one-dimensional batch normalization layer with momentum 0.99 and eps 1e-3, a Dropout layer with a dropout rate of 0.3, a second fully connected layer with output dimension 2048, a ReLU nonlinear activation layer, and a one-dimensional batch normalization layer with momentum 0.99 and eps 1e-3.
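The data flow around the conditional encoder can be traced with plain lists: concatenation gives the 4396-dimensional encoder input, and a latent sample concatenated with the semantic feature gives the 1324-dimensional decoder input of claim 9. Two points are assumptions, not stated in the claims: that the 2048-dimensional encoder output is split into a 1024-dimensional $\mu$ and a 1024-dimensional $\sigma$, and that $z$ is drawn with the usual VAE reparameterization $z = \mu + \sigma \epsilon$. The zero and one vectors are placeholders, not real features.

```python
import random

IMG_DIM, SEM_DIM, LATENT_DIM = 4096, 300, 1024  # dimensions given in the claims

def concat(*vectors):
    """Concatenate layer: join feature vectors along the feature dimension."""
    out = []
    for v in vectors:
        out.extend(v)
    return out

def split_stats(encoder_output):
    """Hypothetical split of the 2048-d encoder output into mu and sigma."""
    return encoder_output[:LATENT_DIM], encoder_output[LATENT_DIM:]

def reparameterize(mu, sigma, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element by element."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

x_img = [0.0] * IMG_DIM                  # placeholder VGG16 fc7 feature
x_sem = [0.0] * SEM_DIM                  # placeholder sketch semantic feature
encoder_input = concat(x_img, x_sem)     # 4096 + 300 = 4396
encoder_output = [0.0] * LATENT_DIM + [1.0] * LATENT_DIM  # stand-in 2048-d output
mu, sigma = split_stats(encoder_output)
z = reparameterize(mu, sigma)
decoder_input = concat(z, x_sem)         # 1024 + 300 = 1324
```

At retrieval time (step 6c), $z$ would be replaced by pure N(0, 1) noise of the same length before concatenation.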
7. The method of claim 1, wherein the triplet loss layer in (4) takes the mean vector $\mu$ output by the conditional encoder as input and trains the encoder with the triplet loss function $L_{tri}$, whose mathematical expression is:
$$L_{tri} = \max\left(0,\ \delta + d\left(E(x_a), E(x_p)\right) - d\left(E(x_a), E(x_n)\right)\right)$$
wherein $d(\cdot, \cdot)$ denotes the $\ell_2$ distance function, $E(\cdot)$ denotes the latent embedding function that yields the mean vector $\mu$, $x_a$ denotes the anchor sample, $x_p$ denotes a positive sample, $x_n$ denotes a negative sample, and $\delta$ denotes the margin.
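A minimal sketch of the triplet loss on encoder mean vectors, assuming the standard hinge form with Euclidean distance. The margin value is not given in the claims, so it is left as a parameter.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(mu_anchor, mu_pos, mu_neg, margin):
    """max(0, margin + d(anchor, positive) - d(anchor, negative)) on mean vectors."""
    return max(0.0, margin + l2_distance(mu_anchor, mu_pos) - l2_distance(mu_anchor, mu_neg))
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only violating triplets contribute gradients.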
8. The method according to claim 1, wherein the KL loss layer in (4) determines the variational lower bound through the loss function $L_{KL}$, whose mathematical expression is:
$$L_{KL}(\theta_E, \theta_{D'}) = -\mathrm{KL}\left(Q(z \mid x_{img}, x_{sem}) \,\|\, P(z \mid x_{sem})\right) + \mathbb{E}_{Q(z \mid x_{img}, x_{sem})}\left[\log P(x_{img} \mid z, x_{sem})\right]$$
wherein $\theta_E$ denotes the parameters of the conditional encoder network, $\theta_{D'}$ denotes the parameters of the decoder network, $\mathbb{E}$ denotes expectation, $x_{img}$ and $x_{sem}$ denote the RGB image feature and the semantic feature respectively, $\mathrm{KL}(\cdot \| \cdot)$ denotes the KL divergence, $Q(z \mid x_{img}, x_{sem})$ denotes the variational distribution output by the encoder network, $P(z \mid x_{sem})$ denotes the distribution of $z$ conditioned on the semantic feature $x_{sem}$, and $P(x_{img} \mid z, x_{sem})$ denotes the conditional distribution output by the decoder network.
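If $Q(z \mid x_{img}, x_{sem})$ is a diagonal Gaussian with the encoder's $\mu$ and $\sigma$, and $P(z \mid x_{sem})$ is assumed to be the standard normal N(0, I) (consistent with the N(0, 1) noise used by the decoder, but not stated explicitly for this term), the KL part has a closed form. The sketch returns the negated lower bound so that it can be minimized; the reconstruction log-likelihood is taken as an externally supplied Monte Carlo estimate.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))

def negative_lower_bound(mu, sigma, log_likelihood):
    """KL(Q || N(0, I)) - E_Q[log P(x_img | z, x_sem)], i.e. the bound negated.

    `log_likelihood` is an estimate of the expectation term, e.g. from one
    reparameterized sample of z (assumed to be computed elsewhere).
    """
    return kl_to_standard_normal(mu, sigma) - log_likelihood
```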
9. The method of claim 1, wherein the decoder in (4) consists, in sequence, of a first fully connected layer with input dimension 1324 and output dimension 4096, a ReLU nonlinear activation layer, a second fully connected layer with output dimension 4096, and a ReLU nonlinear activation layer.
10. The method of claim 1, wherein the regressor in (4) consists, in sequence, of a first fully connected layer with input dimension 4096 and output dimension 2048, a ReLU nonlinear activation layer, a second fully connected layer with output dimension 300, and a Tanh nonlinear activation layer.
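The layer sizes in claims 6, 9 and 10 can be cross-checked against each other and against the feature dimensions in the description. The sketch lists only the fully connected layers, since ReLU, Tanh, batch normalization and Dropout keep the dimension unchanged; the input sizes of the second layers are inferred from the preceding layer's output. The check that the encoder output equals twice the latent size relies on the assumption that it holds both $\mu$ and $\sigma$.

```python
# (name, in_dim, out_dim) of the fully connected layers in claims 6, 9 and 10.
ENCODER = [("fc1", 4396, 4096), ("fc2", 4096, 2048)]
DECODER = [("fc1", 1324, 4096), ("fc2", 4096, 4096)]
REGRESSOR = [("fc1", 4096, 2048), ("fc2", 2048, 300)]

IMG_DIM, SEM_DIM, LATENT_DIM = 4096, 300, 1024

def check_chain(layers):
    """Verify consecutive layer sizes match; return (input_dim, output_dim)."""
    for (prev_name, _, prev_out), (name, n_in, _) in zip(layers, layers[1:]):
        if prev_out != n_in:
            raise ValueError(f"{prev_name} outputs {prev_out} but {name} expects {n_in}")
    return layers[0][1], layers[-1][2]

enc_in, enc_out = check_chain(ENCODER)
dec_in, dec_out = check_chain(DECODER)
reg_in, reg_out = check_chain(REGRESSOR)
assert enc_in == IMG_DIM + SEM_DIM       # output of the concatenate layer
assert enc_out == 2 * LATENT_DIM         # assumption: mu and sigma, 1024 each
assert dec_in == LATENT_DIM + SEM_DIM    # [z or noise, x_sem]
assert dec_out == reg_in == IMG_DIM      # generated RGB feature feeds the regressor
assert reg_out == SEM_DIM                # reconstructed semantic feature
```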
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910442481.4A CN110175251A (en) | 2019-05-25 | 2019-05-25 | The zero sample Sketch Searching method based on semantic confrontation network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910442481.4A CN110175251A (en) | 2019-05-25 | 2019-05-25 | The zero sample Sketch Searching method based on semantic confrontation network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110175251A | 2019-08-27 |
Family
ID=67695694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910442481.4A Pending CN110175251A (en) | 2019-05-25 | 2019-05-25 | The zero sample Sketch Searching method based on semantic confrontation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110175251A (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110634170A (en) * | 2019-08-30 | 2019-12-31 | 福建帝视信息科技有限公司 | Photo-level image generation method based on semantic content and rapid image retrieval |
CN110648294A (en) * | 2019-09-19 | 2020-01-03 | 北京百度网讯科技有限公司 | Image restoration method and device and electronic equipment |
CN111274424A (en) * | 2020-01-08 | 2020-06-12 | 大连理工大学 | Semantic enhanced hash method for zero sample image retrieval |
CN111274430A (en) * | 2020-01-19 | 2020-06-12 | 易拍全球(北京)科贸有限公司 | Porcelain field image retrieval algorithm based on feature reconstruction supervision |
CN111291212A (en) * | 2020-01-24 | 2020-06-16 | 复旦大学 | Zero sample sketch image retrieval method and system based on graph convolution neural network |
CN111898645A (en) * | 2020-07-03 | 2020-11-06 | 贵州大学 | Movable sample attack resisting method based on attention mechanism |
CN111914929A (en) * | 2020-07-30 | 2020-11-10 | 南京邮电大学 | Zero sample learning method |
CN111915693A (en) * | 2020-05-22 | 2020-11-10 | 中国科学院计算技术研究所 | Sketch-based face image generation method and system |
CN112101470A (en) * | 2020-09-18 | 2020-12-18 | 上海电力大学 | Guide zero sample identification method based on multi-channel Gauss GAN |
CN112364894A (en) * | 2020-10-23 | 2021-02-12 | 天津大学 | Zero sample image classification method of countermeasure network based on meta-learning |
CN112686277A (en) * | 2019-10-18 | 2021-04-20 | 北京大学 | Method and device for model training |
CN113128530A (en) * | 2019-12-30 | 2021-07-16 | 上海高德威智能交通系统有限公司 | Data classification method and device |
CN113361251A (en) * | 2021-05-13 | 2021-09-07 | 山东师范大学 | Text image generation method and system based on multi-stage generation countermeasure network |
CN113392906A (en) * | 2021-06-16 | 2021-09-14 | 西华大学 | Confrontation sample recovery method and system based on image high-order guide coding recombination |
CN113393546A (en) * | 2021-05-17 | 2021-09-14 | 杭州电子科技大学 | Fashion clothing image generation method based on clothing category and texture pattern control |
CN113435396A (en) * | 2021-07-13 | 2021-09-24 | 大连海洋大学 | Underwater fish school detection method based on image self-adaptive noise resistance |
CN113628329A (en) * | 2021-08-20 | 2021-11-09 | 天津大学 | Zero-sample sketch three-dimensional point cloud retrieval method |
CN113723431A (en) * | 2021-09-01 | 2021-11-30 | 上海云从汇临人工智能科技有限公司 | Image recognition method, image recognition device and computer-readable storage medium |
CN113722528A (en) * | 2021-08-03 | 2021-11-30 | 南京邮电大学 | Method and system for rapidly retrieving photos facing sketch |
CN113903043A (en) * | 2021-12-11 | 2022-01-07 | 绵阳职业技术学院 | Method for identifying printed Chinese character font based on twin metric model |
CN115146711A (en) * | 2022-06-15 | 2022-10-04 | 北京芯联心科技发展有限公司 | Cross-modal data retrieval method and system |
CN115496824A (en) * | 2022-09-27 | 2022-12-20 | 北京航空航天大学 | Multi-class object-level natural image generation method based on hand drawing |
CN115878833A (en) * | 2023-02-20 | 2023-03-31 | 中山大学 | Appearance patent image retrieval method and system based on hand-drawn sketch semantics |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101976435A (en) * | 2010-10-07 | 2011-02-16 | 西安电子科技大学 | Combination learning super-resolution method based on dual constraint |
CN104751182A (en) * | 2015-04-02 | 2015-07-01 | 中国人民解放军空军工程大学 | DDAG-based SVM multi-class classification active learning algorithm |
US10248664B1 (en) * | 2018-07-02 | 2019-04-02 | Inception Institute Of Artificial Intelligence | Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval |
- 2019-05-25 CN CN201910442481.4A (CN110175251A), Pending
Non-Patent Citations (2)
Title |
---|
DIZAJI K G ET AL.: "Unsupervised deep generative adversarial hashing network", 《IEEE》 * |
XINXUN XU ET AL.: "Semantic Adversarial Network for Zero-Shot Sketch-Based Image Retrieval", 《arXiv preprint arXiv》 * |
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110634170A (en) * | 2019-08-30 | 2019-12-31 | 福建帝视信息科技有限公司 | Photo-level image generation method based on semantic content and rapid image retrieval |
CN110634170B (en) * | 2019-08-30 | 2022-09-13 | 福建帝视信息科技有限公司 | Photo-level image generation method based on semantic content and rapid image retrieval |
CN110648294A (en) * | 2019-09-19 | 2020-01-03 | 北京百度网讯科技有限公司 | Image restoration method and device and electronic equipment |
CN110648294B (en) * | 2019-09-19 | 2022-08-30 | 北京百度网讯科技有限公司 | Image restoration method and device and electronic equipment |
CN112686277A (en) * | 2019-10-18 | 2021-04-20 | 北京大学 | Method and device for model training |
CN113128530B (en) * | 2019-12-30 | 2023-11-03 | 上海高德威智能交通系统有限公司 | Data classification method and device |
CN113128530A (en) * | 2019-12-30 | 2021-07-16 | 上海高德威智能交通系统有限公司 | Data classification method and device |
CN111274424A (en) * | 2020-01-08 | 2020-06-12 | 大连理工大学 | Semantic enhanced hash method for zero sample image retrieval |
CN111274424B (en) * | 2020-01-08 | 2021-01-19 | 大连理工大学 | Semantic enhanced hash method for zero sample image retrieval |
CN111274430A (en) * | 2020-01-19 | 2020-06-12 | 易拍全球(北京)科贸有限公司 | Porcelain field image retrieval algorithm based on feature reconstruction supervision |
CN111291212A (en) * | 2020-01-24 | 2020-06-16 | 复旦大学 | Zero sample sketch image retrieval method and system based on graph convolution neural network |
CN111291212B (en) * | 2020-01-24 | 2022-10-11 | 复旦大学 | Zero sample sketch image retrieval method and system based on graph convolution neural network |
CN111915693B (en) * | 2020-05-22 | 2023-10-24 | 中国科学院计算技术研究所 | Sketch-based face image generation method and sketch-based face image generation system |
CN111915693A (en) * | 2020-05-22 | 2020-11-10 | 中国科学院计算技术研究所 | Sketch-based face image generation method and system |
CN111898645A (en) * | 2020-07-03 | 2020-11-06 | 贵州大学 | Movable sample attack resisting method based on attention mechanism |
CN111914929A (en) * | 2020-07-30 | 2020-11-10 | 南京邮电大学 | Zero sample learning method |
CN111914929B (en) * | 2020-07-30 | 2022-08-23 | 南京邮电大学 | Zero sample learning method |
CN112101470A (en) * | 2020-09-18 | 2020-12-18 | 上海电力大学 | Guide zero sample identification method based on multi-channel Gauss GAN |
CN112101470B (en) * | 2020-09-18 | 2023-04-11 | 上海电力大学 | Guide zero sample identification method based on multi-channel Gauss GAN |
CN112364894A (en) * | 2020-10-23 | 2021-02-12 | 天津大学 | Zero sample image classification method of countermeasure network based on meta-learning |
CN113361251B (en) * | 2021-05-13 | 2023-06-30 | 山东师范大学 | Text generation image method and system based on multi-stage generation countermeasure network |
CN113361251A (en) * | 2021-05-13 | 2021-09-07 | 山东师范大学 | Text image generation method and system based on multi-stage generation countermeasure network |
CN113393546A (en) * | 2021-05-17 | 2021-09-14 | 杭州电子科技大学 | Fashion clothing image generation method based on clothing category and texture pattern control |
CN113393546B (en) * | 2021-05-17 | 2024-02-02 | 杭州电子科技大学 | Fashion clothing image generation method based on clothing type and texture pattern control |
CN113392906B (en) * | 2021-06-16 | 2022-04-22 | 西华大学 | Confrontation sample recovery method and system based on image high-order guide coding recombination |
CN113392906A (en) * | 2021-06-16 | 2021-09-14 | 西华大学 | Confrontation sample recovery method and system based on image high-order guide coding recombination |
CN113435396A (en) * | 2021-07-13 | 2021-09-24 | 大连海洋大学 | Underwater fish school detection method based on image self-adaptive noise resistance |
CN113435396B (en) * | 2021-07-13 | 2022-05-20 | 大连海洋大学 | Underwater fish school detection method based on image self-adaptive noise resistance |
CN113722528A (en) * | 2021-08-03 | 2021-11-30 | 南京邮电大学 | Method and system for rapidly retrieving photos facing sketch |
CN113628329A (en) * | 2021-08-20 | 2021-11-09 | 天津大学 | Zero-sample sketch three-dimensional point cloud retrieval method |
CN113628329B (en) * | 2021-08-20 | 2023-06-06 | 天津大学 | Zero-sample sketch three-dimensional point cloud retrieval method |
CN113723431A (en) * | 2021-09-01 | 2021-11-30 | 上海云从汇临人工智能科技有限公司 | Image recognition method, image recognition device and computer-readable storage medium |
CN113723431B (en) * | 2021-09-01 | 2023-08-18 | 上海云从汇临人工智能科技有限公司 | Image recognition method, apparatus and computer readable storage medium |
CN113903043A (en) * | 2021-12-11 | 2022-01-07 | 绵阳职业技术学院 | Method for identifying printed Chinese character font based on twin metric model |
CN113903043B (en) * | 2021-12-11 | 2022-05-06 | 绵阳职业技术学院 | Method for identifying printed Chinese character font based on twin metric model |
CN115146711A (en) * | 2022-06-15 | 2022-10-04 | 北京芯联心科技发展有限公司 | Cross-modal data retrieval method and system |
CN115496824B (en) * | 2022-09-27 | 2023-08-18 | 北京航空航天大学 | Multi-class object-level natural image generation method based on hand drawing |
CN115496824A (en) * | 2022-09-27 | 2022-12-20 | 北京航空航天大学 | Multi-class object-level natural image generation method based on hand drawing |
CN115878833B (en) * | 2023-02-20 | 2023-06-13 | 中山大学 | Appearance patent image retrieval method and system based on hand-drawn sketch semantics |
CN115878833A (en) * | 2023-02-20 | 2023-03-31 | 中山大学 | Appearance patent image retrieval method and system based on hand-drawn sketch semantics |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110175251A (en) | The zero sample Sketch Searching method based on semantic confrontation network | |
Hussain et al. | A real time face emotion classification and recognition using deep learning model | |
CN109948425B (en) | Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching | |
CN112990054B (en) | Compact linguistics-free facial expression embedding and novel triple training scheme | |
Kadam et al. | Detection and localization of multiple image splicing using MobileNet V1 | |
CN108595558B (en) | Image annotation method based on data equalization strategy and multi-feature fusion | |
CN107944459A (en) | A kind of RGB D object identification methods | |
CN110674685B (en) | Human body analysis segmentation model and method based on edge information enhancement | |
CN112257665A (en) | Image content recognition method, image recognition model training method, and medium | |
CN106408037A (en) | Image recognition method and apparatus | |
CN117033609B (en) | Text visual question-answering method, device, computer equipment and storage medium | |
CN113269224A (en) | Scene image classification method, system and storage medium | |
Xu et al. | A novel image feature extraction algorithm based on the fusion AutoEncoder and CNN | |
CN114495163B (en) | Pedestrian re-identification generation learning method based on category activation mapping | |
CN113408651B (en) | Unsupervised three-dimensional object classification method based on local discriminant enhancement | |
CN114299590A (en) | Training method of face completion model, face completion method and system | |
Abdelaziz et al. | Few-shot learning with saliency maps as additional visual information | |
Liu et al. | A3GAN: An attribute-aware attentive generative adversarial network for face aging | |
CN113191381B (en) | Image zero-order classification model based on cross knowledge and classification method thereof | |
CN112463936A (en) | Visual question answering method and system based on three-dimensional information | |
Ghorai et al. | Bishnupur heritage image dataset (BHID) a resource for various computer vision applications | |
Kim | What makes the difference in visual styles of comics: from classification to style transfer | |
Guan et al. | Synthetic region screening and adaptive feature fusion for constructing a flexible object detection database | |
CN117541810B (en) | Three-dimensional feature extraction method, three-dimensional feature extraction device, electronic equipment and readable storage medium | |
US20240169701A1 (en) | Affordance-based reposing of an object in a scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190827 |