CN108446334B - Content-based image retrieval method using unsupervised adversarial training - Google Patents

Content-based image retrieval method using unsupervised adversarial training

Info

Publication number
CN108446334B
CN108446334B (application CN201810154813.4A)
Authority
CN
China
Prior art keywords
model
data set
picture
pictures
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810154813.4A
Other languages
Chinese (zh)
Other versions
CN108446334A (en)
Inventor
白琮 (Bai Cong)
黄玲 (Huang Ling)
郝鹏翼 (Hao Pengyi)
陈胜勇 (Chen Shengyong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810154813.4A
Publication of CN108446334A
Application granted
Publication of CN108446334B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata automatically derived from the content
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A content-based image retrieval method using unsupervised adversarial training, comprising the following steps: step one, network construction: the unsupervised adversarial network framework consists of a generative model and a discriminative model, each formed by a three-layer fully connected network; step two, data set preprocessing; step three, network training, as follows: step 3.1, initialize the generative and discriminative model parameters with random weights; step 3.2, train the generative model; step 3.3, train the discriminative model; step four, precision testing. The invention provides a content-based image retrieval method using unsupervised adversarial training which offers better robustness, lower requirements on training data, and no need for a large amount of label information.

Description

Content-based image retrieval method using unsupervised adversarial training
Technical Field
The invention relates to multimedia big data processing and analysis in the field of computer vision, and in particular to an unsupervised adversarial content-based picture retrieval method; it belongs to the field of image retrieval.
Background
With the development of network sharing technology, more and more pictures on the network can be shared and received in real time, and content-based image retrieval techniques occupy a significant part of the image processing pipeline. Thanks to the accurate representation of image content by deep features, the performance of content-based image retrieval has improved greatly with the rapid development of deep learning in recent years. However, this improvement depends on labeled training data: supervised training methods may not work well when labels are unavailable or training data is scarce.
Disclosure of Invention
To overcome the defects of poor robustness, high demands on training data, and the need for a large amount of label information in existing picture retrieval technology, the invention provides a content-based image retrieval method using unsupervised adversarial training, which offers better robustness, lower requirements on training data, and no need for a large amount of label information.
The technical solution adopted by the invention to solve this problem is as follows:
a content-based image retrieval method using unsupervised adversarial training, comprising the following steps:
step one, network construction, as follows:
step 1.1: the unsupervised adversarial network framework consists of a generative model and a discriminative model, each formed by a three-layer fully connected network;
step 1.2: a ReLU activation function follows the first fully connected layer of the generative model;
step 1.3: a tanh activation function follows the second fully connected layer of the generative model, constraining the output to the range [0, 1];
step 1.4: a distance measurement function follows the third fully connected layer of the generative model;
step 1.5: a ReLU activation function follows the first fully connected layer of the discriminative model;
step 1.6: a tanh activation function follows the second fully connected layer of the discriminative model, constraining the output to the range [0, 1];
step 1.7: a similarity score function follows the third fully connected layer of the discriminative model;
step 1.8: the discriminative model feeds the calculated similarity scores back to the generative model;
step two, data set preprocessing, as follows:
step 2.1: divide the data into a query data set Q, a test data set Q' and a data set D to be retrieved, and randomly extract part of the pictures from the data set to be retrieved as a data set F for fine-tuning network parameters during feature extraction;
step 2.2: extract picture features with a VGG model pre-trained on ImageNet; before using VGG to extract features, a small number of pictures must be used to fine-tune the network parameters;
step 2.3: input the pictures into the unsupervised adversarial network in the form of feature vectors;
step three, network training, as follows:
step 3.1: initialize the generative and discriminative model parameters with random weights;
step 3.2: train the generative model, as follows:
step 3.2.1: send the picture features of the query data set Q and the data set D to be retrieved, extracted by the VGG network, into the generative model;
step 3.2.2: the generative model optimizes the weights applied to the features of the input query data set Q and data set D to be retrieved;
step 3.2.3: for each query image, the generative model calculates the cosine distance to the images in the data set to be retrieved, converts the similarity into a selection probability with a softmax function, and selects K image features from the data set D to be retrieved according to that probability as the generator's output;
step 3.2.4: using a logistic loss function, maximize the difference between 1 and the similarity of the query picture to the selected K pictures;
step 3.3: train the discriminative model, as follows:
step 3.3.1: take the features of the K pictures returned by the generator as the discriminator's input, and re-optimize the weights of the query picture and of the K returned picture features;
step 3.3.2: recalculate the cosine distance between each query picture and the K returned pictures, and assign a similarity score according to the distance;
step 3.3.3: the discriminator feeds the calculated similarity scores back to the generator, which uses them to select the next pictures to retrieve; a logistic regression function is used to reduce the difference between 0 and the distance from the query picture to the K returned pictures;
step 3.4: minimize the loss function with a stochastic gradient descent algorithm;
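As a toy illustration of the stochastic gradient descent step above (step 3.4), the sketch below takes one gradient step on a stand-in logistic loss; the learning rate, the tensor shapes, and the loss form are illustrative assumptions, not values fixed by the method:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(4)
w = rng.standard_normal(10)      # stand-in generator weights
x = rng.standard_normal(10)      # one sampled picture feature (the "stochastic" part)
lr = 1e-4                        # assumed learning rate

score = sigmoid(w @ x)
loss_before = -np.log(score)     # logistic loss pushing the score toward 1
grad = (score - 1.0) * x         # gradient of the loss with respect to w
w -= lr * grad                   # one SGD update
loss_after = -np.log(sigmoid(w @ x))
```

With a sufficiently small step size the loss decreases monotonically, which is the behaviour the training loop relies on.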
Step four, testing the precision, and the process is as follows:
step 4.1: sending the preprocessed test data set Q' into an optimal generator model;
step 4.2: the generator selects the picture with the highest degree of similarity of topK sheets from the data set D to be retrieved according to the given inquiry picture
Step 4.3: comparing whether the tags of the inquired pictures are consistent with the tags of the K pictures returned by the generator or not, and calculating the average accuracy of all the inquired pictures according to the evaluation criteria in the information retrieval;
through the operation of the steps, the retrieval of the test picture can be realized.
The invention has the following beneficial effects: it provides an unsupervised adversarial training image retrieval method. Given unlabeled input data, the generative model and the discriminative model improve each other's performance through minimax adversarial training: the generative model learns to find the K pictures most similar to the query picture, while the discriminative model learns to judge, as well as possible, whether the pictures output by the generator are similar to the query picture. The method addresses deep learning's need for large amounts of label information during training, and at the same time successfully applies a generative adversarial network to the content-based picture retrieval task.
Drawings
FIG. 1 is a diagram of the unsupervised adversarial training picture retrieval network framework used in the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, a content-based image retrieval method using unsupervised adversarial training includes four processes: construction of the unsupervised adversarial training network, data set preprocessing, network training, and picture retrieval testing.
The pictures in this embodiment are divided into 10 classes of 600 pictures each. In each class, 20 pictures are randomly selected and split evenly into two parts, the query pictures Q and the test pictures Q'; the remaining 580 pictures per class constitute the data set D to be retrieved. The picture retrieval network framework is shown in fig. 1, and the operation comprises the four processes of network construction, data set preprocessing, network training and picture retrieval testing.
The unsupervised adversarial training image retrieval method comprises the following steps:
step one, network construction, as follows:
step 1.1: the unsupervised adversarial network framework consists of a generative model and a discriminative model, each formed by a three-layer fully connected network;
step 1.2: set the number of neurons in the first fully connected layer of the generative model to 48, with weight W_1 and bias b_1, both defined as floating-point variables, followed by a ReLU activation function;
step 1.3: set the number of neurons in the second fully connected layer of the generative model to 32, with weight W_2 and bias b_2, both floating-point variables, followed by a tanh activation function constraining the output to the range [0, 1];
step 1.4: set the number of neurons in the third fully connected layer of the generative model to 10, with weight W_3 defined as a floating-point variable and no bias, followed by a distance measurement function;
step 1.5: set the number of neurons in the first fully connected layer of the discriminative model to 48, with weight W_4 and bias b_4, both floating-point variables, followed by a ReLU activation function;
step 1.6: set the number of neurons in the second fully connected layer of the discriminative model to 32, with weight W_5 and bias b_5, both floating-point variables, followed by a tanh activation function constraining the output to the range [0, 1];
step 1.7: set the number of neurons in the third fully connected layer of the discriminative model to 10, with weight W_6 defined as a floating-point variable and no bias, followed by a similarity score function;
step 1.8: the discriminative model feeds the calculated similarity scores back to the generative model in the form of a generator loss function weight;
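The three-layer generator of steps 1.2 to 1.4 can be sketched at face value as a forward pass. The layer sizes (48, 32, 10) come from the embodiment; the initialisation scale, the helper names, and the use of cosine similarity as the distance measurement function are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Random-weight initialisation (step 3.1); W1..W3 and b1, b2 mirror the
# names W_1..W_3, b_1, b_2 in the embodiment; the 0.1 scale is assumed.
W1, b1 = 0.1 * rng.standard_normal((48, 48)), np.zeros(48)
W2, b2 = 0.1 * rng.standard_normal((48, 32)), np.zeros(32)
W3 = 0.1 * rng.standard_normal((32, 10))   # third layer has no bias (step 1.4)

def generator_embed(x):
    h1 = relu(x @ W1 + b1)        # 48 units + ReLU (step 1.2)
    h2 = np.tanh(h1 @ W2 + b2)    # 32 units + tanh (step 1.3)
    return h2 @ W3                # 10-unit embedding (step 1.4)

def cosine_similarity(a, b):
    # the distance measurement attached after the third layer
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q = generator_embed(rng.random(48))   # a 48-dim query feature
d = generator_embed(rng.random(48))   # a 48-dim database feature
sim = cosine_similarity(q, d)
```

The discriminative model of steps 1.5 to 1.7 has the same shape, differing only in its weight names and the similarity score function at the end.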
step two, data set preprocessing, as follows:
step 2.1: divide the data into a query data set Q, a test data set Q' and a data set D to be retrieved; randomly extract 5000 pictures from the data set D to be retrieved as the data set F for fine-tuning network parameters during feature extraction;
step 2.2: fine-tune a VGG model pre-trained on ImageNet with the data set F, and set the output picture feature dimension to 48;
step 2.3: extract the feature vectors of the query data set Q and the data set D to be retrieved with the fine-tuned VGG network, normalize the feature values into (0, 1) with a sigmoid function, and save the feature vectors to a TXT-format file;
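The normalisation and TXT round-trip of step 2.3 might look like the following sketch, with random arrays standing in for the fine-tuned VGG outputs and an in-memory buffer standing in for the file:

```python
import numpy as np
from io import StringIO

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
features = rng.standard_normal((5, 48))   # stand-in for 48-dim VGG features
normalized = sigmoid(features)            # squashed into (0, 1), step 2.3

buf = StringIO()                          # stands in for the TXT file
np.savetxt(buf, normalized)               # plain-text storage
buf.seek(0)
restored = np.loadtxt(buf)                # read back as network input
```

The same `savetxt`/`loadtxt` pair works on an actual file path; the buffer just keeps the sketch self-contained.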
step three, network training, as follows:
step 3.1: initialize the parameters of the generative and discriminative models with random weights; the generative model iterates 10 times and the discriminative model 3 times per complete round of network training, with 5 complete rounds in total;
step 3.2: train the generative model:
step 3.2.1: set the learning rate to 0.0001 and K to 500;
step 3.2.2: send the query data set Q and the data set D to be retrieved, in TXT format, into the network as the input of the generative model;
step 3.2.3: the generative model uses its three-layer fully connected network to optimize the weights of the features of the input query data set Q and data set D to be retrieved;
step 3.2.4: for each query image, the generative model calculates its similarity to all images in the data set to be retrieved, converts the similarity into a selection probability with a softmax function, and selects the 500 image features with the highest selection probability from the data set D to be retrieved as the generator's output;
step 3.2.5: maximize the similarity between the query picture and the selected 500 pictures with a logistic regression function, iteratively optimizing the network weights of the generative model; calculate the mean average precision over all query pictures from the generator's output;
step 3.2.6: minimize the loss function with a stochastic gradient descent algorithm over 10 iterations, and save the generative network model whose mean average precision over all query pictures is highest;
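At toy scale, the generator's selection mechanism in step 3.2 (cosine similarity, softmax, probabilistic choice of K candidates) can be sketched as follows; K is shrunk from 500 to 5 and the embeddings are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine_to_all(query, db):
    # cosine similarity between one query and every database row
    return (db @ query) / (np.linalg.norm(db, axis=1) * np.linalg.norm(query))

def softmax(z):
    e = np.exp(z - z.max())      # shift for numerical stability
    return e / e.sum()

query = rng.random(10)           # a 10-dim generator embedding
database = rng.random((50, 10))  # 50 candidate embeddings

probs = softmax(cosine_to_all(query, database))   # selection probabilities
K = 5                                             # the embodiment uses K = 500
chosen = rng.choice(len(database), size=K, replace=False, p=probs)
```

Drawing without replacement keeps the K returned pictures distinct, which is what the discriminator then receives as input.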
step 3.3: train the discriminative model:
step 3.3.1: set the learning rate to 0.0001;
step 3.3.2: take the 500 picture features returned by the generator as the discriminator's input, and re-optimize the picture feature weights with the three-layer fully connected discriminative model;
step 3.3.3: recalculate the distance between each query picture and the 500 returned pictures, and assign a similarity score according to the distance;
step 3.3.4: minimize the distance between the query picture and the K returned pictures with a logistic regression function;
step 3.3.5: minimize the loss function with the Adam optimizer over 3 iterations; the similarity scores calculated in the discriminator's last iteration are fed back to the generator and act directly on the optimization of the generator's weights as loss function weights;
step 3.4: save the optimal generator model as the output of training;
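The discriminator's scoring and feedback in step 3.3 can be sketched the same way. The sigmoid scoring and the weighted-loss expression are stand-ins consistent with the logistic functions named above; the shapes and the tiny K are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
query = rng.random(10)            # re-weighted query embedding
returned = rng.random((5, 10))    # K = 5 pictures back from the generator

# Recompute cosine similarity and turn it into a similarity score.
sims = (returned @ query) / (np.linalg.norm(returned, axis=1)
                             * np.linalg.norm(query))
scores = sigmoid(sims)            # similarity scores in (0, 1)

# These scores are the feedback of step 1.8: they weight the generator's
# loss so that selections the discriminator rates highly are reinforced.
generator_loss = float(-np.mean(np.log(scores)))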
step four, precision testing, as follows:
step 4.1: send the preprocessed test data set Q' into the optimal generator model;
step 4.2: for a given query picture, the generator selects the top-500 pictures with the highest similarity from the data set D to be retrieved;
step 4.3: check whether the label of each query picture matches the labels of the K pictures returned by the generator, calculate the mean average precision over all query pictures according to the evaluation criteria of information retrieval, and output the test results;
through the above operations, unsupervised adversarial retrieval of pictures is achieved.
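The evaluation of step 4.3 uses mean average precision in the usual information-retrieval sense, which a short sketch makes concrete; the labels and ranked lists below are made up purely to demonstrate the computation:

```python
# Average precision for one query: precision at each rank where the
# returned label matches the query label, averaged over the hits.
def average_precision(query_label, ranked_labels):
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

results = [(0, [0, 1, 0]),   # query of class 0 and its returned classes
           (1, [1, 1, 2])]   # query of class 1 and its returned classes
mean_ap = sum(average_precision(q, r) for q, r in results) / len(results)
# mean_ap == (5/6 + 1) / 2 == 11/12
```

The first query hits at ranks 1 and 3 (AP = (1 + 2/3) / 2 = 5/6) and the second at ranks 1 and 2 (AP = 1), so the mean over both queries is 11/12.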
The above detailed description illustrates the objects, technical solutions and advantages of the present invention. It should be understood that it describes only an exemplary embodiment and does not limit the scope of the invention; any modifications, equivalent substitutions or improvements made within the spirit and principles of the invention fall within its scope of protection.

Claims (1)

1. A method of content-based image retrieval using unsupervised adversarial training, the method comprising the steps of:
step one, network construction, as follows:
step 1.1: the unsupervised adversarial network framework consists of a generative model and a discriminative model, each formed by a three-layer fully connected network;
step 1.2: a ReLU activation function follows the first fully connected layer of the generative model;
step 1.3: a tanh activation function follows the second fully connected layer of the generative model, constraining the output to the range [0, 1];
step 1.4: a cosine distance measurement function follows the third fully connected layer of the generative model;
step 1.5: a ReLU activation function follows the first fully connected layer of the discriminative model;
step 1.6: a tanh activation function follows the second fully connected layer of the discriminative model, constraining the output to the range [0, 1];
step 1.7: a similarity score function follows the third fully connected layer of the discriminative model;
step 1.8: the discriminative model feeds the calculated similarity scores back to the generative model;
step two, data set preprocessing, as follows:
step 2.1: divide the data into a query data set Q, a test data set Q' and a data set D to be retrieved, and randomly extract part of the pictures from the data set to be retrieved as a data set F for fine-tuning network parameters during feature extraction;
step 2.2: extract picture features with a VGG model pre-trained on ImageNet; before using VGG to extract features, the small picture data set F of step 2.1 must be used to fine-tune the network parameters;
step 2.3: input the pictures into the unsupervised adversarial network in the form of feature vectors;
step three, network training, as follows:
step 3.1: initialize the generative and discriminative model parameters with random weights;
step 3.2: train the generative model, as follows:
step 3.2.1: send the picture features of the query data set Q and the data set D to be retrieved, extracted by the VGG network, into the generative model;
step 3.2.2: the generative model optimizes the weights applied to the features of the input query data set Q and data set D to be retrieved;
step 3.2.3: for each query image, the generative model calculates the cosine distance to the images in the data set to be retrieved, converts the cosine distance into a selection probability with a softmax function, and selects K image features from the data set D to be retrieved according to that probability as the generator's output;
step 3.2.4: using a logistic loss function, maximize the difference between 1 and the similarity of the query picture to the selected K pictures;
step 3.3: train the discriminative model, as follows:
step 3.3.1: take the features of the K pictures returned by the generator as the discriminator's input, and re-optimize the weights of the query picture and of the K returned picture features;
step 3.3.2: recalculate the cosine distance between each query picture and the K returned pictures, and assign a similarity score according to the distance;
step 3.3.3: the discriminator feeds the calculated similarity scores back to the generator, which uses them to select the next pictures to retrieve; a logistic regression function is used to reduce the difference between 0 and the distance from the query picture to the K returned pictures;
step 3.4: minimize the loss function with a stochastic gradient descent algorithm;
step four, precision testing, as follows:
step 4.1: send the preprocessed test data set Q' into the optimal generator model;
step 4.2: for a given query picture, the generator selects the top-K pictures with the highest similarity from the data set D to be retrieved;
step 4.3: check whether the label of each query picture matches the labels of the K pictures returned by the generator, and calculate the mean average precision over all query pictures according to the evaluation criteria of information retrieval;
through the above operations, retrieval of the test pictures is achieved.
CN201810154813.4A 2018-02-23 2018-02-23 Content-based image retrieval method using unsupervised adversarial training Active CN108446334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810154813.4A CN108446334B (en) 2018-02-23 2018-02-23 Content-based image retrieval method using unsupervised adversarial training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810154813.4A CN108446334B (en) 2018-02-23 2018-02-23 Content-based image retrieval method using unsupervised adversarial training

Publications (2)

Publication Number Publication Date
CN108446334A (en) 2018-08-24
CN108446334B (en) 2018-08-24 2021-08-03

Family

ID=63192735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810154813.4A Active CN108446334B (en) 2018-02-23 2018-02-23 Content-based image retrieval method using unsupervised adversarial training

Country Status (1)

Country Link
CN (1) CN108446334B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543674B (en) * 2018-10-19 2023-04-07 天津大学 Image copy detection method based on a generative adversarial network
CN109635273B (en) * 2018-10-25 2023-04-25 平安科技(深圳)有限公司 Text keyword extraction method, device, equipment and storage medium
CN109785399B (en) * 2018-11-19 2021-01-19 北京航空航天大学 Synthetic lesion image generation method, device, equipment and readable storage medium
CN110287357B (en) * 2019-05-31 2021-05-18 浙江工业大学 Image description generation method based on a conditional generative adversarial network
CN112712094B (en) * 2019-10-24 2024-08-02 北京四维图新科技股份有限公司 Model training method, device, equipment and storage medium
CN113269256B (en) * 2021-05-26 2024-08-27 广州密码营地信息科技有限公司 Construction method and application of MiSrc-GAN medical image model
CN113887504B (en) * 2021-10-22 2023-03-24 大连理工大学 Strong-generalization remote sensing image target identification method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951919A * 2017-03-02 2017-07-14 浙江工业大学 Traffic monitoring implementation method based on a generative adversarial network
CN107563428A * 2017-08-25 2018-01-09 西安电子科技大学 Polarimetric SAR image classification method based on a generative adversarial network
US10275473B2 * 2017-04-27 2019-04-30 Sk Telecom Co., Ltd. Method for learning cross-domain relations based on generative adversarial networks
CN107464210B * 2017-07-06 2020-02-21 浙江工业大学 Image style transfer method based on a generative adversarial network


Also Published As

Publication number Publication date
CN108446334A (en) 2018-08-24

Similar Documents

Publication Publication Date Title
CN108446334B (en) Content-based image retrieval method using unsupervised adversarial training
US11195051B2 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
US20230108692A1 (en) Semi-Supervised Person Re-Identification Using Multi-View Clustering
CN110162593B (en) Search result processing and similarity model training method and device
US20220147836A1 (en) Method and device for text-enhanced knowledge graph joint representation learning
CN108416384B (en) Image label labeling method, system, equipment and readable storage medium
CN111353076B (en) Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN107657008B (en) Cross-media training and retrieval method based on deep discrimination ranking learning
Huang et al. Cost-effective vehicle type recognition in surveillance images with deep active learning and web data
CN113806746B (en) Malicious code detection method based on improved CNN (CNN) network
CN110222218B (en) Image retrieval method based on multi-scale NetVLAD and depth hash
CN108399185B (en) Multi-label image binary vector generation method and image semantic similarity query method
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
CN109635140B (en) Image retrieval method based on deep learning and density peak clustering
CN106503661B (en) Face gender identification method based on fireworks deepness belief network
CN114298122B (en) Data classification method, apparatus, device, storage medium and computer program product
CN112434628B (en) Small sample image classification method based on active learning and collaborative representation
CN114358188A (en) Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment
CN109800768B (en) Hash feature representation learning method of semi-supervised GAN
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN113806580B (en) Cross-modal hash retrieval method based on hierarchical semantic structure
CN112036511B (en) Image retrieval method based on attention mechanism graph convolution neural network
CN113537304A (en) Cross-modal semantic clustering method based on bidirectional CNN
CN112115806A (en) Remote sensing image scene accurate classification method based on Dual-ResNet small sample learning
CN110704665A (en) Image feature expression method and system based on visual attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared