CN108446334B - Content-based image retrieval method using unsupervised adversarial training - Google Patents

Content-based image retrieval method using unsupervised adversarial training

Info

Publication number
CN108446334B
CN108446334B (application CN201810154813.4A)
Authority
CN
China
Prior art keywords
model
data set
picture
pictures
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810154813.4A
Other languages
Chinese (zh)
Other versions
CN108446334A (en)
Inventor
白琮 (Bai Cong)
黄玲 (Huang Ling)
郝鹏翼 (Hao Pengyi)
陈胜勇 (Chen Shengyong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810154813.4A
Publication of CN108446334A
Application granted
Publication of CN108446334B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata automatically derived from the content
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A content-based image retrieval method using unsupervised adversarial training, comprising the following steps: step one, network construction: the unsupervised adversarial network framework consists of a generative model and a discriminative model, each formed by a three-layer fully connected network; step two, data set preprocessing; step three, network training, as follows: step 3.1, initialize the generative and discriminative model parameters with random weights; step 3.2, train the generative model; step 3.3, train the discriminative model; step four, precision testing. The invention provides a content-based image retrieval method using unsupervised adversarial training which offers better robustness, lower requirements on training data, and no need for a large amount of label information.

Description

Content-based image retrieval method using unsupervised adversarial training
Technical Field
The invention relates to multimedia big data processing and analysis in the field of computer vision, and in particular to an unsupervised adversarial content-based picture retrieval method; it belongs to the field of image retrieval.
Background
With the development of network sharing technology, more and more pictures on the network can be shared and received in real time, and content-based image retrieval techniques occupy a significant part of the image processing pipeline. Thanks to the accurate representation of image content by deep features, the performance of content-based image retrieval has improved greatly with the rapid development of deep learning in recent years. However, this improvement depends on labeled training data: supervised training methods may not work well when labels are unavailable or training data is scarce.
Disclosure of Invention
To overcome the defects of poor robustness, high demands on training data, and the need for a large amount of label information in existing picture retrieval technology, the invention provides a content-based image retrieval method using unsupervised adversarial training, which offers better robustness, lower requirements on training data, and no need for a large amount of label information.
The technical solution adopted by the invention to solve this problem is as follows:
a content-based image retrieval method using unsupervised adversarial training, comprising the following steps:
step one, network construction, as follows:
step 1.1: the unsupervised adversarial network framework consists of a generative model and a discriminative model, each formed by a three-layer fully connected network;
step 1.2: a ReLU activation function follows the first fully connected layer of the generative model;
step 1.3: a tanh activation function follows the second fully connected layer of the generative model, constraining the output to the range [0, 1];
step 1.4: a distance measurement function follows the third fully connected layer of the generative model;
step 1.5: a ReLU activation function follows the first fully connected layer of the discriminative model;
step 1.6: a tanh activation function follows the second fully connected layer of the discriminative model, constraining the output to the range [0, 1];
step 1.7: a similarity score function follows the third fully connected layer of the discriminative model;
step 1.8: the discriminative model feeds the calculated similarity scores back to the generative model;
step two, data set preprocessing, as follows:
step 2.1: divide the data into a query data set Q, a test data set Q' and a data set D to be retrieved, and randomly extract part of the pictures from the data set to be retrieved as a data set F for fine-tuning network parameters during feature extraction;
step 2.2: extract picture features with a VGG model pre-trained on ImageNet; before using VGG to extract features, a small number of pictures must be used to fine-tune the network parameters;
step 2.3: input the pictures into the unsupervised adversarial network in the form of feature vectors;
step three, network training, as follows:
step 3.1: initialize the generative and discriminative model parameters with random weights;
step 3.2: train the generative model, as follows:
step 3.2.1: send the picture features of the query data set Q and the data set D to be retrieved, extracted by the VGG network, into the generative model;
step 3.2.2: the generative model optimizes the weights applied to the features of the input query data set Q and data set D to be retrieved;
step 3.2.3: for each query image, the generative model calculates the cosine distance to the images in the data set to be retrieved, converts the similarity into a selection probability with a softmax function, and selects K image features from the data set D to be retrieved according to that probability as the generator's output;
step 3.2.4: using a logistic loss function, maximize the difference between 1 and the similarity of the query picture to the selected K pictures;
step 3.3: train the discriminative model, as follows:
step 3.3.1: take the features of the K pictures returned by the generator as the discriminator's input, and re-optimize the weights of the query picture and of the K returned picture features;
step 3.3.2: recalculate the cosine distance between each query picture and the K returned pictures, and assign a similarity score according to the distance;
step 3.3.3: the discriminator feeds the calculated similarity scores back to the generator, which uses them to select the next pictures to retrieve; a logistic regression function is used to reduce the difference between 0 and the distance from the query picture to the K returned pictures;
step 3.4: minimize the loss function with a stochastic gradient descent algorithm;
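As a toy illustration of the stochastic gradient descent step above (step 3.4), the sketch below takes one gradient step on a stand-in logistic loss; the learning rate, the tensor shapes, and the loss form are illustrative assumptions, not values fixed by the method:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(4)
w = rng.standard_normal(10)      # stand-in generator weights
x = rng.standard_normal(10)      # one sampled picture feature (the "stochastic" part)
lr = 1e-4                        # assumed learning rate

score = sigmoid(w @ x)
loss_before = -np.log(score)     # logistic loss pushing the score toward 1
grad = (score - 1.0) * x         # gradient of the loss with respect to w
w -= lr * grad                   # one SGD update
loss_after = -np.log(sigmoid(w @ x))
```

With a sufficiently small step size the loss decreases monotonically, which is the behaviour the training loop relies on.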
Step four, testing the precision, and the process is as follows:
step 4.1: sending the preprocessed test data set Q' into an optimal generator model;
step 4.2: the generator selects the picture with the highest degree of similarity of topK sheets from the data set D to be retrieved according to the given inquiry picture
Step 4.3: comparing whether the tags of the inquired pictures are consistent with the tags of the K pictures returned by the generator or not, and calculating the average accuracy of all the inquired pictures according to the evaluation criteria in the information retrieval;
through the operation of the steps, the retrieval of the test picture can be realized.
The invention has the following beneficial effects: it provides an unsupervised adversarial training image retrieval method. Given unlabeled input data, the generative model and the discriminative model improve each other's performance through minimax adversarial training: the generative model learns to find the K pictures most similar to the query picture, while the discriminative model learns to judge, as well as possible, whether the pictures output by the generator are similar to the query picture. The method addresses deep learning's need for large amounts of label information during training, and at the same time successfully applies a generative adversarial network to the content-based picture retrieval task.
Drawings
FIG. 1 is a diagram of the unsupervised adversarial training picture retrieval network framework used in the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, a content-based image retrieval method using unsupervised adversarial training includes four processes: construction of the unsupervised adversarial training network, data set preprocessing, network training, and picture retrieval testing.
The pictures in this embodiment are divided into 10 classes of 600 pictures each. In each class, 20 pictures are randomly selected and split evenly into two parts, the query pictures Q and the test pictures Q'; the remaining 580 pictures per class constitute the data set D to be retrieved. The picture retrieval network framework is shown in fig. 1, and the operation comprises the four processes of network construction, data set preprocessing, network training and picture retrieval testing.
The unsupervised adversarial training image retrieval method comprises the following steps:
step one, network construction, as follows:
step 1.1: the unsupervised adversarial network framework consists of a generative model and a discriminative model, each formed by a three-layer fully connected network;
step 1.2: set the number of neurons in the first fully connected layer of the generative model to 48, with weight W_1 and bias b_1, both defined as floating-point variables, followed by a ReLU activation function;
step 1.3: set the number of neurons in the second fully connected layer of the generative model to 32, with weight W_2 and bias b_2, both floating-point variables, followed by a tanh activation function constraining the output to the range [0, 1];
step 1.4: set the number of neurons in the third fully connected layer of the generative model to 10, with weight W_3 defined as a floating-point variable and no bias, followed by a distance measurement function;
step 1.5: set the number of neurons in the first fully connected layer of the discriminative model to 48, with weight W_4 and bias b_4, both floating-point variables, followed by a ReLU activation function;
step 1.6: set the number of neurons in the second fully connected layer of the discriminative model to 32, with weight W_5 and bias b_5, both floating-point variables, followed by a tanh activation function constraining the output to the range [0, 1];
step 1.7: set the number of neurons in the third fully connected layer of the discriminative model to 10, with weight W_6 defined as a floating-point variable and no bias, followed by a similarity score function;
step 1.8: the discriminative model feeds the calculated similarity scores back to the generative model in the form of a generator loss function weight;
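The three-layer generator of steps 1.2 to 1.4 can be sketched at face value as a forward pass. The layer sizes (48, 32, 10) come from the embodiment; the initialisation scale, the helper names, and the use of cosine similarity as the distance measurement function are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Random-weight initialisation (step 3.1); W1..W3 and b1, b2 mirror the
# names W_1..W_3, b_1, b_2 in the embodiment; the 0.1 scale is assumed.
W1, b1 = 0.1 * rng.standard_normal((48, 48)), np.zeros(48)
W2, b2 = 0.1 * rng.standard_normal((48, 32)), np.zeros(32)
W3 = 0.1 * rng.standard_normal((32, 10))   # third layer has no bias (step 1.4)

def generator_embed(x):
    h1 = relu(x @ W1 + b1)        # 48 units + ReLU (step 1.2)
    h2 = np.tanh(h1 @ W2 + b2)    # 32 units + tanh (step 1.3)
    return h2 @ W3                # 10-unit embedding (step 1.4)

def cosine_similarity(a, b):
    # the distance measurement attached after the third layer
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q = generator_embed(rng.random(48))   # a 48-dim query feature
d = generator_embed(rng.random(48))   # a 48-dim database feature
sim = cosine_similarity(q, d)
```

The discriminative model of steps 1.5 to 1.7 has the same shape, differing only in its weight names and the similarity score function at the end.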
step two, data set preprocessing, as follows:
step 2.1: divide the data into a query data set Q, a test data set Q' and a data set D to be retrieved; randomly extract 5000 pictures from the data set D to be retrieved as the data set F for fine-tuning network parameters during feature extraction;
step 2.2: fine-tune a VGG model pre-trained on ImageNet with the data set F, and set the output picture feature dimension to 48;
step 2.3: extract the feature vectors of the query data set Q and the data set D to be retrieved with the fine-tuned VGG network, normalize the feature values into (0, 1) with a sigmoid function, and save the feature vectors to a TXT-format file;
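The normalisation and TXT round-trip of step 2.3 might look like the following sketch, with random arrays standing in for the fine-tuned VGG outputs and an in-memory buffer standing in for the file:

```python
import numpy as np
from io import StringIO

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
features = rng.standard_normal((5, 48))   # stand-in for 48-dim VGG features
normalized = sigmoid(features)            # squashed into (0, 1), step 2.3

buf = StringIO()                          # stands in for the TXT file
np.savetxt(buf, normalized)               # plain-text storage
buf.seek(0)
restored = np.loadtxt(buf)                # read back as network input
```

The same `savetxt`/`loadtxt` pair works on an actual file path; the buffer just keeps the sketch self-contained.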
step three, network training, as follows:
step 3.1: initialize the parameters of the generative and discriminative models with random weights; the generative model iterates 10 times and the discriminative model 3 times per complete round of network training, with 5 complete rounds in total;
step 3.2: train the generative model:
step 3.2.1: set the learning rate to 0.0001 and K to 500;
step 3.2.2: send the query data set Q and the data set D to be retrieved, in TXT format, into the network as the input of the generative model;
step 3.2.3: the generative model uses its three-layer fully connected network to optimize the weights of the features of the input query data set Q and data set D to be retrieved;
step 3.2.4: for each query image, the generative model calculates its similarity to all images in the data set to be retrieved, converts the similarity into a selection probability with a softmax function, and selects the 500 image features with the highest selection probability from the data set D to be retrieved as the generator's output;
step 3.2.5: maximize the similarity between the query picture and the selected 500 pictures with a logistic regression function, iteratively optimizing the network weights of the generative model; calculate the mean average precision over all query pictures from the generator's output;
step 3.2.6: minimize the loss function with a stochastic gradient descent algorithm over 10 iterations, and save the generative network model whose mean average precision over all query pictures is highest;
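At toy scale, the generator's selection mechanism in step 3.2 (cosine similarity, softmax, probabilistic choice of K candidates) can be sketched as follows; K is shrunk from 500 to 5 and the embeddings are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine_to_all(query, db):
    # cosine similarity between one query and every database row
    return (db @ query) / (np.linalg.norm(db, axis=1) * np.linalg.norm(query))

def softmax(z):
    e = np.exp(z - z.max())      # shift for numerical stability
    return e / e.sum()

query = rng.random(10)           # a 10-dim generator embedding
database = rng.random((50, 10))  # 50 candidate embeddings

probs = softmax(cosine_to_all(query, database))   # selection probabilities
K = 5                                             # the embodiment uses K = 500
chosen = rng.choice(len(database), size=K, replace=False, p=probs)
```

Drawing without replacement keeps the K returned pictures distinct, which is what the discriminator then receives as input.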
step 3.3: train the discriminative model:
step 3.3.1: set the learning rate to 0.0001;
step 3.3.2: take the 500 picture features returned by the generator as the discriminator's input, and re-optimize the picture feature weights with the three-layer fully connected discriminative model;
step 3.3.3: recalculate the distance between each query picture and the 500 returned pictures, and assign a similarity score according to the distance;
step 3.3.4: minimize the distance between the query picture and the K returned pictures with a logistic regression function;
step 3.3.5: minimize the loss function with the Adam optimizer over 3 iterations; the similarity scores calculated in the discriminator's last iteration are fed back to the generator and act directly on the optimization of the generator's weights as loss function weights;
step 3.4: save the optimal generator model as the output of training;
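The discriminator's scoring and feedback in step 3.3 can be sketched the same way. The sigmoid scoring and the weighted-loss expression are stand-ins consistent with the logistic functions named above; the shapes and the tiny K are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
query = rng.random(10)            # re-weighted query embedding
returned = rng.random((5, 10))    # K = 5 pictures back from the generator

# Recompute cosine similarity and turn it into a similarity score.
sims = (returned @ query) / (np.linalg.norm(returned, axis=1)
                             * np.linalg.norm(query))
scores = sigmoid(sims)            # similarity scores in (0, 1)

# These scores are the feedback of step 1.8: they weight the generator's
# loss so that selections the discriminator rates highly are reinforced.
generator_loss = float(-np.mean(np.log(scores)))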
step four, precision testing, as follows:
step 4.1: send the preprocessed test data set Q' into the optimal generator model;
step 4.2: for a given query picture, the generator selects the top-500 pictures with the highest similarity from the data set D to be retrieved;
step 4.3: check whether the label of each query picture matches the labels of the K pictures returned by the generator, calculate the mean average precision over all query pictures according to the evaluation criteria of information retrieval, and output the test results;
through the above operations, unsupervised adversarial retrieval of pictures is achieved.
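The evaluation of step 4.3 uses mean average precision in the usual information-retrieval sense, which a short sketch makes concrete; the labels and ranked lists below are made up purely to demonstrate the computation:

```python
# Average precision for one query: precision at each rank where the
# returned label matches the query label, averaged over the hits.
def average_precision(query_label, ranked_labels):
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

results = [(0, [0, 1, 0]),   # query of class 0 and its returned classes
           (1, [1, 1, 2])]   # query of class 1 and its returned classes
mean_ap = sum(average_precision(q, r) for q, r in results) / len(results)
# mean_ap == (5/6 + 1) / 2 == 11/12
```

The first query hits at ranks 1 and 3 (AP = (1 + 2/3) / 2 = 5/6) and the second at ranks 1 and 2 (AP = 1), so the mean over both queries is 11/12.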
The above detailed description illustrates the objects, technical solutions and advantages of the present invention. It should be understood that it describes only an exemplary embodiment and does not limit the scope of the invention; any modifications, equivalent substitutions or improvements made within the spirit and principles of the invention fall within its scope of protection.

Claims (1)

1. A method of content-based image retrieval using unsupervised adversarial training, the method comprising the steps of:
step one, network construction, as follows:
step 1.1: the unsupervised adversarial network framework consists of a generative model and a discriminative model, each formed by a three-layer fully connected network;
step 1.2: a ReLU activation function follows the first fully connected layer of the generative model;
step 1.3: a tanh activation function follows the second fully connected layer of the generative model, constraining the output to the range [0, 1];
step 1.4: a cosine distance measurement function follows the third fully connected layer of the generative model;
step 1.5: a ReLU activation function follows the first fully connected layer of the discriminative model;
step 1.6: a tanh activation function follows the second fully connected layer of the discriminative model, constraining the output to the range [0, 1];
step 1.7: a similarity score function follows the third fully connected layer of the discriminative model;
step 1.8: the discriminative model feeds the calculated similarity scores back to the generative model;
step two, data set preprocessing, as follows:
step 2.1: divide the data into a query data set Q, a test data set Q' and a data set D to be retrieved, and randomly extract part of the pictures from the data set to be retrieved as a data set F for fine-tuning network parameters during feature extraction;
step 2.2: extract picture features with a VGG model pre-trained on ImageNet; before using VGG to extract features, the small picture data set F of step 2.1 must be used to fine-tune the network parameters;
step 2.3: input the pictures into the unsupervised adversarial network in the form of feature vectors;
step three, network training, as follows:
step 3.1: initialize the generative and discriminative model parameters with random weights;
step 3.2: train the generative model, as follows:
step 3.2.1: send the picture features of the query data set Q and the data set D to be retrieved, extracted by the VGG network, into the generative model;
step 3.2.2: the generative model optimizes the weights applied to the features of the input query data set Q and data set D to be retrieved;
step 3.2.3: for each query image, the generative model calculates the cosine distance to the images in the data set to be retrieved, converts the cosine distance into a selection probability with a softmax function, and selects K image features from the data set D to be retrieved according to that probability as the generator's output;
step 3.2.4: using a logistic loss function, maximize the difference between 1 and the similarity of the query picture to the selected K pictures;
step 3.3: train the discriminative model, as follows:
step 3.3.1: take the features of the K pictures returned by the generator as the discriminator's input, and re-optimize the weights of the query picture and of the K returned picture features;
step 3.3.2: recalculate the cosine distance between each query picture and the K returned pictures, and assign a similarity score according to the distance;
step 3.3.3: the discriminator feeds the calculated similarity scores back to the generator, which uses them to select the next pictures to retrieve; a logistic regression function is used to reduce the difference between 0 and the distance from the query picture to the K returned pictures;
step 3.4: minimize the loss function with a stochastic gradient descent algorithm;
step four, precision testing, as follows:
step 4.1: send the preprocessed test data set Q' into the optimal generator model;
step 4.2: for a given query picture, the generator selects the top-K pictures with the highest similarity from the data set D to be retrieved;
step 4.3: check whether the label of each query picture matches the labels of the K pictures returned by the generator, and calculate the mean average precision over all query pictures according to the evaluation criteria of information retrieval;
through the above operations, retrieval of the test pictures is achieved.
CN201810154813.4A 2018-02-23 2018-02-23 Content-based image retrieval method using unsupervised adversarial training Active CN108446334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810154813.4A CN108446334B (en) 2018-02-23 2018-02-23 Content-based image retrieval method using unsupervised adversarial training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810154813.4A CN108446334B (en) 2018-02-23 2018-02-23 Content-based image retrieval method using unsupervised adversarial training

Publications (2)

Publication Number Publication Date
CN108446334A (en) 2018-08-24
CN108446334B (en) 2018-08-24 2021-08-03

Family

ID=63192735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810154813.4A Active CN108446334B (en) 2018-02-23 2018-02-23 Content-based image retrieval method using unsupervised adversarial training

Country Status (1)

Country Link
CN (1) CN108446334B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543674B (en) * 2018-10-19 2023-04-07 天津大学 Image copy detection method based on a generative adversarial network
CN109635273B (en) * 2018-10-25 2023-04-25 平安科技(深圳)有限公司 Text keyword extraction method, device, equipment and storage medium
CN109785399B (en) * 2018-11-19 2021-01-19 北京航空航天大学 Synthetic lesion image generation method, device, equipment and readable storage medium
CN110287357B (en) * 2019-05-31 2021-05-18 浙江工业大学 Image description generation method based on a conditional generative adversarial network
CN112712094B (en) * 2019-10-24 2024-08-02 北京四维图新科技股份有限公司 Model training method, device, equipment and storage medium
CN113269256B (en) * 2021-05-26 2024-08-27 广州密码营地信息科技有限公司 Construction method and application of MiSrc-GAN medical image model
CN113887504B (en) * 2021-10-22 2023-03-24 大连理工大学 Strong-generalization remote sensing image target identification method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951919A * 2017-03-02 2017-07-14 浙江工业大学 Traffic monitoring implementation method based on a generative adversarial network
CN107563428A * 2017-08-25 2018-01-09 西安电子科技大学 Polarimetric SAR image classification method based on a generative adversarial network
US10275473B2 * 2017-04-27 2019-04-30 Sk Telecom Co., Ltd. Method for learning cross-domain relations based on generative adversarial networks
CN107464210B * 2017-07-06 2020-02-21 浙江工业大学 Image style transfer method based on a generative adversarial network


Also Published As

Publication number Publication date
CN108446334A (en) 2018-08-24

Similar Documents

Publication Publication Date Title
CN108446334B (en) Content-based image retrieval method using unsupervised adversarial training
US11195051B2 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
US20230108692A1 (en) Semi-Supervised Person Re-Identification Using Multi-View Clustering
CN110162593B (en) Search result processing and similarity model training method and device
US20220147836A1 (en) Method and device for text-enhanced knowledge graph joint representation learning
CN108416384B (en) Image label labeling method, system, equipment and readable storage medium
CN111353076B (en) Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN107657008B (en) Cross-media training and retrieval method based on deep discrimination ranking learning
Huang et al. Cost-effective vehicle type recognition in surveillance images with deep active learning and web data
CN113806746B (en) Malicious code detection method based on improved CNN (CNN) network
CN110222218B (en) Image retrieval method based on multi-scale NetVLAD and depth hash
CN108399185B (en) Multi-label image binary vector generation method and image semantic similarity query method
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
CN109635140B (en) Image retrieval method based on deep learning and density peak clustering
CN106503661B (en) Face gender identification method based on fireworks deepness belief network
CN114298122B (en) Data classification method, apparatus, device, storage medium and computer program product
CN112434628B (en) Small sample image classification method based on active learning and collaborative representation
CN114358188A (en) Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment
CN109800768B (en) Hash feature representation learning method of semi-supervised GAN
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN113806580B (en) Cross-modal hash retrieval method based on hierarchical semantic structure
CN112036511B (en) Image retrieval method based on attention mechanism graph convolution neural network
CN113537304A (en) Cross-modal semantic clustering method based on bidirectional CNN
CN112115806A (en) Remote sensing image scene accurate classification method based on Dual-ResNet small sample learning
CN110704665A (en) Image feature expression method and system based on visual attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared