CN112883216B - Semi-supervised image retrieval method and device based on disturbance consistency self-integration - Google Patents

Semi-supervised image retrieval method and device based on disturbance consistency self-integration

Info

Publication number
CN112883216B
CN112883216B
Authority
CN
China
Prior art keywords
data
image
semi
hash
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110226266.8A
Other languages
Chinese (zh)
Other versions
CN112883216A (en)
Inventor
周玉灿
程帅
吴大衍
李波
王伟平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202110226266.8A priority Critical patent/CN112883216B/en
Publication of CN112883216A publication Critical patent/CN112883216A/en
Application granted granted Critical
Publication of CN112883216B publication Critical patent/CN112883216B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semi-supervised image retrieval method and device based on disturbance consistency self-integration. An image is input into a trained semi-supervised image feature extraction model to obtain its features, the model comprising a convolutional neural network, a hash layer and a disturbance consistency self-integration module; the features of the image are converted into a discrete binary hash code of the image; and retrieval is performed according to the binary hash code to obtain the image retrieval result. By integrating the features of the same sample under different data enhancement conditions, the method can discover the distinguishing characteristics of each category; a designed disturbance consistency loss function maximizes the similarity between the hash layer output of unlabeled data and the corresponding integrated features, fully exploiting the unlabeled data to improve the generalization ability of the network; a better retrieval effect can thereby be obtained.

Description

Semi-supervised image retrieval method and device based on disturbance consistency self-integration
Technical Field
The invention belongs to the technical field of software, and particularly relates to a semi-supervised image retrieval method and device based on disturbance consistency self-integration.
Background
With the explosive growth of image data on the Internet, the huge volume of image data and high-dimensional image features pose a great challenge to image retrieval. The deep hashing method has become a research hotspot in recent years owing to its low storage cost and high retrieval speed.
Generally, a deep hashing method maps high-dimensional real-valued image features into compact binary hash codes to enable fast retrieval, and constrains the hash codes with the semantic similarity of images during the mapping to preserve retrieval accuracy. In a big-data environment, supervised hashing methods usually depend on a large amount of labeled image data to reach high retrieval accuracy, and their performance degrades greatly when only a small amount of labeled data is available. Chinese patent application CN109800314A discloses a method for generating hash codes for image retrieval with a deep convolutional network, in which a hash layer is added before the classification layer and the output of the hash layer is binarized to obtain the hash codes of images. That method, however, relies on a large amount of labeled data to train the hash model in order to reach good retrieval performance, and in practical scenarios labeling a large amount of data consumes enormous manpower and material resources. Deep semi-supervised hashing methods have therefore been proposed, which learn a better hash function from a small amount of labeled data and a large amount of unlabeled data.
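The core mechanics of hash-based retrieval described above can be sketched in a few lines: real-valued features are binarized, and database items are ranked by Hamming distance to the query's code. This is a minimal, framework-agnostic illustration with toy values (in the actual method, the real-valued features come from the trained network):

```python
# Minimal sketch of hash-based retrieval: real-valued features are
# binarized with the sign function, then ranked by Hamming distance.

def sign_binarize(features):
    """Map a real-valued feature vector to a {-1, +1} hash code."""
    return [1 if x >= 0 else -1 for x in features]

def hamming_distance(a, b):
    """Number of positions where two hash codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def retrieve(query_features, database_features):
    """Rank database items by Hamming distance to the query's hash code."""
    q = sign_binarize(query_features)
    codes = [sign_binarize(f) for f in database_features]
    return sorted(range(len(codes)), key=lambda i: hamming_distance(q, codes[i]))

# Toy example: item 0 shares the query's sign pattern exactly.
db = [[0.9, -0.2, 0.4], [-0.5, 0.3, -0.8], [0.1, 0.2, -0.9]]
ranking = retrieve([0.8, -0.1, 0.5], db)   # item 0 ranked first
```

Comparing packed binary codes in this way is what gives deep hashing its low storage cost and high retrieval speed relative to comparing high-dimensional real-valued features.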
Existing semi-supervised hashing methods mainly use the visual similarity between unlabeled and labeled data to guide the learning of hash codes for the unlabeled data, realizing hash function learning by preserving the visual neighbor relations between unlabeled and labeled samples in the hash space. Many researchers have therefore tried to construct reliable sample neighbor relations. These efforts can be broadly divided into graph-based methods and relation-consistency-based methods. Graph-based methods construct an approximate graph from the visual similarity between samples, where the nodes represent labeled and unlabeled data and the edges reflect the visual similarity between samples. Relation-consistency-based methods adopt a self-ensembling model to generate an integrated feature for each sample, and the visual similarity of the integrated features of paired samples represents the semantic similarity between those samples.
At present, semi-supervised hashing methods use the visual similarity between samples to represent their semantic similarity, but visual similarity cannot reflect the true semantic similarity: two samples with similar visual information may come from two different categories. Guiding hash code learning with wrong visual similarity therefore makes the similarity of the hash codes learned for the two samples inconsistent with their true semantic similarity relation.
Disclosure of Invention
Aiming at the problems of the existing method, the invention aims to design a semi-supervised image retrieval method and device based on disturbance consistency self-integration.
The technical content of the invention comprises:
a semi-supervised image retrieval method based on disturbance consistency self-integration comprises the following steps:
1) inputting the image into a trained semi-supervised image feature extraction model to obtain the features of the image, wherein the semi-supervised image feature extraction model comprises a convolutional neural network, a hash layer and a disturbance consistency self-integration module, and is trained using a small amount of labeled data and a large amount of unlabeled data as follows:
1.1) training a pre-trained convolutional neural network and a hash layer with a small amount of labeled data to obtain a preliminarily trained convolutional neural network and hash layer;
1.2) maximizing, through the disturbance consistency self-integration module, the similarity between the hash layer output h_k of unlabeled data x_k and the integrated feature ê_k^t, thereby training the preliminarily trained convolutional neural network and hash layer to obtain the trained convolutional neural network and hash layer, and generating the integrated feature ê_k^t, where t is the number of iterations and k is the index of the unlabeled data; the integrated feature ê_k^t is obtained by weighted summation of h_k and ê_k^{t-1};
2) converting the features of the image into a discrete binary hash code of the image;
3) retrieving according to the binary hash code to obtain the image retrieval result.
Further, before the labeled data and unlabeled data are input into the trained convolutional neural network, enhanced versions of the labeled and unlabeled data are respectively acquired, and the semi-supervised image feature extraction model is obtained by training on these enhanced data.
Further, the semi-supervised image feature extraction model further comprises a classification layer; before the preliminarily trained convolutional neural network and hash layer are trained with unlabeled data, the classification layer is trained using the fc7 features of the labeled data to obtain the trained classification layer, where the fc7 features are the fully-connected-layer output of the convolutional neural network.
Further, the classification loss function for classification training is L_c = Σ_{j∈L} −y_j log f_j, where y_j is the true label of labeled data x_j, f_j is the classification-layer prediction for labeled data x_j, j is the index of the labeled data, and L is the labeled data set.
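The classification loss above is a standard cross-entropy over the labeled set with one-hot labels y_j; a small pure-Python sketch with hypothetical toy values:

```python
import math

def classification_loss(true_labels, predictions):
    """L_c = sum_{j in L} -y_j * log f_j with one-hot y_j:
    only the predicted probability of the true class contributes."""
    total = 0.0
    for y, f in zip(true_labels, predictions):
        for y_c, f_c in zip(y, f):
            if y_c == 1:
                total += -math.log(f_c)
    return total

# Two labeled samples, three classes (toy probabilities).
y = [[1, 0, 0], [0, 1, 0]]
f = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = classification_loss(y, f)   # -log(0.7) - log(0.8)
```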
Further, the hash layer is trained on the labeled data through a pairwise similarity preserving loss function L_s, where S is the semantic similarity matrix, and h_i and h_j are the hash layer outputs of labeled data x_i and x_j, respectively.
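The patent's exact pairwise loss is rendered only as an image. As an illustration, a widely used form of pairwise similarity preservation in deep hashing (e.g. DPSH-style) uses the inner product Θ_ij = ½ h_i^T h_j; the sketch below assumes that form and is not necessarily the patent's exact formula:

```python
import math

def pairwise_similarity_loss(h, S):
    """An assumed, commonly used pairwise similarity preserving loss:
    L_s = -sum_{i,j} (S_ij * theta_ij - log(1 + exp(theta_ij))),
    theta_ij = 0.5 * <h_i, h_j>.  Similar pairs (S_ij = 1) are pushed
    toward large inner products, dissimilar pairs toward small ones."""
    n = len(h)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            theta = 0.5 * sum(a * b for a, b in zip(h[i], h[j]))
            loss -= S[i][j] * theta - math.log(1.0 + math.exp(theta))
    return loss

# Two samples from one class and one from another; S reflects the classes.
h = [[1.0, -1.0], [0.9, -0.8], [-1.0, 1.0]]
S = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
loss = pairwise_similarity_loss(h, S)
```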
Further, the disturbance consistency self-integration module further comprises a memory bank; the integrated features ê_k^t are stored in the memory bank.
Further, the integrated feature is updated as ê_k^t = α·ê_k^{t-1} + (1 − α)·h_k, where α is the momentum coefficient.
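The exponential-moving-average update of the memory bank can be sketched directly (variable names are illustrative):

```python
def ema_update(memory_bank, k, h_k, alpha):
    """Update the integrated feature of sample k in the memory bank:
    e_k^t = alpha * e_k^{t-1} + (1 - alpha) * h_k, alpha = momentum."""
    prev = memory_bank[k]
    memory_bank[k] = [alpha * p + (1.0 - alpha) * h for p, h in zip(prev, h_k)]
    return memory_bank[k]

# The integrated feature drifts toward the recent hash-layer outputs.
bank = {0: [0.0, 0.0]}
ema_update(bank, 0, [1.0, -1.0], alpha=0.5)   # [0.5, -0.5]
ema_update(bank, 0, [1.0, -1.0], alpha=0.5)   # [0.75, -0.75]
```

A larger α makes the integrated feature change more slowly, smoothing out the perturbations introduced by different data enhancements.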
Further, a disturbance consistency loss function L_u maximizes the similarity between the hash layer output h_k of unlabeled data x_k and the integrated feature ê_k^t, where U is the unlabeled data set, μ is the scaling factor, and α is the momentum coefficient.
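The exact form of L_u appears only as an image in the source. One natural reading of "maximize the similarity between h_k and ê_k with scaling factor μ" is a μ-scaled cosine similarity; the sketch below uses that assumed form, not necessarily the patent's exact formula:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def perturbation_consistency_loss(outputs, integrated, mu):
    """Assumed form: L_u = -sum_{k in U} cos(h_k, e_k) / mu.
    Minimizing L_u maximizes the similarity between each unlabeled
    sample's hash-layer output and its integrated feature."""
    return -sum(cosine(h, e) for h, e in zip(outputs, integrated)) / mu

h = [[1.0, -1.0], [0.5, 0.5]]
e_aligned = [[0.9, -0.9], [0.5, 0.5]]      # agrees with the outputs
e_opposed = [[-1.0, 1.0], [-0.5, -0.5]]    # disagrees with the outputs
```

Under this form, outputs that agree with their integrated features yield a lower loss than outputs that disagree, which is the behavior the prose describes.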
Further, the method for converting the hash-layer output features of the image into the discrete binary hash code of the image comprises: inputting the features of the image into the sign function sgn(·).
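In practice the ±1 codes produced by the sign function are packed into machine words, so that Hamming distance becomes an XOR plus a popcount — one reason hash codes are cheap to store and fast to compare. A sketch with illustrative names:

```python
def pack_code(features):
    """Binarize with the sign function and pack into an integer
    bit mask (bit i is set iff feature i >= 0)."""
    code = 0
    for i, x in enumerate(features):
        if x >= 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Hamming distance between two packed codes: XOR then popcount."""
    return bin(a ^ b).count("1")

c1 = pack_code([0.9, -0.2, 0.4, -0.7])   # bits 0 and 2 set -> 0b0101
c2 = pack_code([0.8, 0.1, -0.3, -0.9])   # bits 0 and 1 set -> 0b0011
d = hamming(c1, c2)                       # codes differ in two bits
```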
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above-mentioned method when executed.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method as described above.
Compared with the prior art, the invention has the following positive effects:
1) by integrating the hash layer characteristics of the same sample under different data enhancement conditions, the distinguishing characteristics of each category can be found;
2) the similarity between the hash layer output of the unmarked data and the corresponding integrated features is maximized through the designed disturbance consistency loss function, and the generalization capability of the unmarked data is fully utilized to improve the network;
3) better search effect can be obtained.
Drawings
FIG. 1 is a diagram of a semi-supervised hashing framework in accordance with the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features, and advantages of the present invention more comprehensible, the technical core of the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention maximizes the similarity between the hash layer output of unlabeled data and the corresponding integrated features, which improves the generalization ability of the network, and designs a Disturbance Consistency based Self-Ensembling semi-supervised hashing framework (DCSE), as shown in FIG. 1. The framework comprises three parts: (1) a backbone network comprising a convolutional neural network, a hash layer and a classification layer; (2) a pairwise similarity preserving loss function and a classification loss function for learning hash codes and performing image classification on the labeled data set; (3) a disturbance consistency self-integration module, which integrates the network outputs of the same unlabeled sample under different data enhancement conditions into a global feature and then maximizes, with the designed disturbance consistency loss function, the similarity between the sample's network output and the corresponding integrated feature.
The specific method is that labeled data and unlabeled data under different data enhancement conditions are input into a neural network to obtain fc7 layer characteristics.
In the marked data stream, the output fc7 feature of the fully-connected layer of the marked data is transferred to a classification layer for classification, and the classification loss function is as follows:
L_c = Σ_{j∈L} −y_j log f_j (1)
where y_j and f_j are the true label and the classification-layer prediction of labeled data x_j, respectively, and L denotes the labeled data set. Simultaneously, the fc7 features of the labeled data are passed to the hash layer for hash code learning through the pairwise similarity preserving loss function L_s (2), where h_i is the hash layer output of labeled data x_i, and S is the semantic similarity matrix: if samples x_i and x_j have the same class, then S_ij = 1; otherwise S_ij = 0.
In the unlabeled data stream, a memory bank is established to store the integrated global feature of each sample, and a novel disturbance consistency loss function L_u is designed to maximize the similarity between the output h_k of the current unlabeled sample x_k and the corresponding integrated feature ê_k^t.
In the loss L_u (3), μ is the scaling factor. The memory bank is then updated using an exponential moving average (EMA):
ê_k^t = α·ê_k^{t-1} + (1 − α)·h_k (4)
where ê_k^t is the integrated feature of x_k at the t-th training iteration and α is the momentum coefficient.
When image retrieval is actually performed, the image features output by the hash layer of the semi-supervised hashing framework are input into the sign function sgn(·) to obtain the discrete binary hash code of the image, and retrieval is performed according to this binary hash code to obtain the image retrieval result.
To validate the invention, we performed a number of experiments to evaluate the retrieval effect of DCSE. Our model was trained and tested on the image data sets CIFAR-10 and NUS-WIDE. CIFAR-10 contains 60,000 images; we randomly select 100 images per class as the query set and use the remaining images as the retrieval set, in which 500 images per class are selected as the labeled data set and the rest serve as the unlabeled data set. The NUS-WIDE data set contains about 270,000 images; we select the 21 most frequent categories, each with at least 5,000 images, then randomly select 100 images per category as the query set and take the remaining images as the retrieval set. In the training phase, 500 images per category are randomly selected from the retrieval set as the labeled data set, and the rest serve as the unlabeled data set. Our base network uses a pre-trained VGG16.
Table 1 shows the mAP results of DCSE and other image retrieval methods on CIFAR-10 and NUS-WIDE, including: locality-sensitive hashing (LSH), iterative quantization (ITQ), supervised discrete hashing (SDH), convolutional neural network hashing (CNNH), network-in-network hashing (NINH), semi-supervised deep hashing (SSDH), bipartite graph deep hashing (BGDH), semi-supervised generative adversarial hashing (SSGAH), semi-supervised deep pairwise hashing (SSDPH) and generalized product quantization (GPQ). The experimental results show that the invention outperforms the other compared methods.
Table 2 shows the ablation results for DCSE, where DCSE-1 is a variant of DCSE with the disturbance consistency self-integration module removed. The results show that the proposed disturbance consistency self-integration module significantly improves semi-supervised retrieval performance.
Table 3 shows the results of an unseen-class experiment, in which 75% of the classes in the data set are used for training and the remaining 25% for testing. Specifically, we divide the data set into four parts: train75, test75, train25 and test25, where train75 and test75 belong to the 75% of classes and train25 and test25 belong to the remaining 25%. We use train75 as the labeled training set, train25 and test75 as the retrieval set, and test25 as the query set. The results show that the invention outperforms the other compared methods.
Table 1: mAP results for different bit lengths on the two data sets for the different methods
Table 2: ablation experiment results
Table 3: results of the unseen-class experiments
The above embodiments merely illustrate implementations of the present invention; although the description is specific, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of this patent should be subject to the appended claims.

Claims (5)

1. A semi-supervised image retrieval method based on disturbance consistency self-integration comprises the following steps:
1) inputting the image into a trained semi-supervised image feature extraction model to obtain the features of the image, wherein the semi-supervised image feature extraction model comprises a convolutional neural network, a classification layer, a hash layer and a disturbance consistency self-integration module, and is trained using a small amount of labeled data and a large amount of unlabeled data as follows:
1.1) inputting labeled data and unlabeled data under different data enhancement conditions into the convolutional neural network to obtain fc7 layer features, the fc7 layer features being the fully-connected-layer output of the convolutional neural network;
1.2) passing the fc7 layer features of the labeled data to the classification layer for classification learning and to the hash layer for hash code learning, respectively, wherein the loss function of the classification learning is L_c = Σ_{j∈L} −y_j log f_j, with y_j and f_j the true label and the classification-layer prediction of labeled data x_j and L the labeled data set; in the loss function of the hash code learning, h_i is the hash layer output of labeled data x_i and S_ij is an element of the semantic similarity matrix S, with S_ij = 1 when labeled data x_i and x_j belong to the same class and S_ij = 0 otherwise;
1.3) the disturbance consistency self-integration module integrates the hash layer outputs h_k of the same unlabeled sample x_k under different data enhancement conditions to form a global feature, and a disturbance consistency loss function is used to maximize the similarity between the hash layer output h_k of the unlabeled sample x_k and the corresponding integrated feature, where μ is a scaling factor and U is the unlabeled data set; the integrated feature is updated using an exponential moving average ê_k^t = α·ê_k^{t-1} + (1 − α)·h_k, where α is the momentum coefficient and t denotes the number of iterations in training;
2) converting the features of the image into a discrete binary hash code of the image;
3) and retrieving according to the binary hash code to obtain an image retrieval result.
2. The method of claim 1, wherein before the labeled data and the unlabeled data are input into the trained convolutional neural network, enhanced versions of the labeled and unlabeled data are respectively acquired, and the semi-supervised image feature extraction model is obtained by training on these enhanced data.
3. The method of claim 1, wherein the disturbance consistency self-integration module further comprises a storage space, and the integrated features ê_k^t are stored in the storage space.
4. The method of claim 1, wherein converting the hash-layer output features of an image into the discrete binary hash code of the image comprises: inputting the features of the image into the sign function sgn(·).
5. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-4.
CN202110226266.8A 2021-03-01 2021-03-01 Semi-supervised image retrieval method and device based on disturbance consistency self-integration Active CN112883216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110226266.8A CN112883216B (en) 2021-03-01 2021-03-01 Semi-supervised image retrieval method and device based on disturbance consistency self-integration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110226266.8A CN112883216B (en) 2021-03-01 2021-03-01 Semi-supervised image retrieval method and device based on disturbance consistency self-integration

Publications (2)

Publication Number Publication Date
CN112883216A CN112883216A (en) 2021-06-01
CN112883216B true CN112883216B (en) 2022-09-16

Family

ID=76055106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110226266.8A Active CN112883216B (en) 2021-03-01 2021-03-01 Semi-supervised image retrieval method and device based on disturbance consistency self-integration

Country Status (1)

Country Link
CN (1) CN112883216B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762393B (en) * 2021-09-08 2024-04-30 杭州网易智企科技有限公司 Model training method, gaze point detection method, medium, device and computing equipment
CN114972118B (en) * 2022-06-30 2023-04-28 抖音视界有限公司 Noise reduction method and device for inspection image, readable medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN109165306A (en) * 2018-08-09 2019-01-08 长沙理工大学 Image search method based on the study of multitask Hash
CN109241313A (en) * 2018-08-14 2019-01-18 大连大学 A kind of image search method based on the study of high-order depth Hash
CN110309331A (en) * 2019-07-04 2019-10-08 哈尔滨工业大学(深圳) A kind of cross-module state depth Hash search method based on self-supervisory

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512273A (en) * 2015-12-03 2016-04-20 中山大学 Image retrieval method based on variable-length depth hash learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN109165306A (en) * 2018-08-09 2019-01-08 长沙理工大学 Image search method based on the study of multitask Hash
CN109241313A (en) * 2018-08-14 2019-01-18 大连大学 A kind of image search method based on the study of high-order depth Hash
CN110309331A (en) * 2019-07-04 2019-10-08 哈尔滨工业大学(深圳) A kind of cross-module state depth Hash search method based on self-supervisory

Also Published As

Publication number Publication date
CN112883216A (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN109165306B (en) Image retrieval method based on multitask Hash learning
Cao et al. Deep visual-semantic quantization for efficient image retrieval
Wang et al. Semi-supervised hashing for scalable image retrieval
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN111046179B (en) Text classification method for open network question in specific field
CN111914156A (en) Cross-modal retrieval method and system for self-adaptive label perception graph convolution network
EP3166020A1 (en) Method and apparatus for image classification based on dictionary learning
Wu et al. Distance metric learning from uncertain side information with application to automated photo tagging
CN112883216B (en) Semi-supervised image retrieval method and device based on disturbance consistency self-integration
US11803971B2 (en) Generating improved panoptic segmented digital images based on panoptic segmentation neural networks that utilize exemplar unknown object classes
Ma et al. A weighted KNN-based automatic image annotation method
Sumbul et al. Deep learning for image search and retrieval in large remote sensing archives
Niu et al. Knowledge-based topic model for unsupervised object discovery and localization
CN112507912B (en) Method and device for identifying illegal pictures
CN114461804B (en) Text classification method, classifier and system based on key information and dynamic routing
Zhang et al. ObjectPatchNet: Towards scalable and semantic image annotation and retrieval
CN112163114B (en) Image retrieval method based on feature fusion
Shen et al. DSRPH: deep semantic-aware ranking preserving hashing for efficient multi-label image retrieval
Yu et al. Text-image matching for cross-modal remote sensing image retrieval via graph neural network
Zhang et al. Image region annotation based on segmentation and semantic correlation analysis
Dong et al. Training inter-related classifiers for automatic image classification and annotation
Tian et al. Automatic image annotation with real-world community contributed data set
CN115994239A (en) Prototype comparison learning-based semi-supervised remote sensing image retrieval method and system
CN116363460A (en) High-resolution remote sensing sample labeling method based on topic model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant