CN112883216B - Semi-supervised image retrieval method and device based on disturbance consistency self-integration - Google Patents

Semi-supervised image retrieval method and device based on disturbance consistency self-integration

Info

Publication number
CN112883216B
CN112883216B
Authority
CN
China
Prior art keywords
data
image
semi
hash
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110226266.8A
Other languages
Chinese (zh)
Other versions
CN112883216A (en)
Inventor
周玉灿
程帅
吴大衍
李波
王伟平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202110226266.8A priority Critical patent/CN112883216B/en
Publication of CN112883216A publication Critical patent/CN112883216A/en
Application granted granted Critical
Publication of CN112883216B publication Critical patent/CN112883216B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a semi-supervised image retrieval method and device based on disturbance consistency self-integration. An image is input into a trained semi-supervised image feature extraction model to obtain its features, the model comprising a convolutional neural network, a hash layer and a disturbance consistency self-integration module; the features of the image are converted into a discrete binary hash code of the image; and retrieval is performed according to the binary hash code to obtain the image retrieval result. By integrating the features of the same sample under different data enhancement conditions, the method can discover the distinguishing characteristics of each category; a designed disturbance consistency loss function maximizes the similarity between the hash layer output of unlabeled data and the corresponding integrated features, fully exploiting the unlabeled data to improve the generalization ability of the network; a better retrieval effect can thereby be obtained.

Description

Semi-supervised image retrieval method and device based on disturbance consistency self-integration
Technical Field
The invention belongs to the technical field of software, and particularly relates to a semi-supervised image retrieval method and device based on disturbance consistency self-integration.
Background
With the explosive growth of image data on the Internet, the huge volume of image data and high-dimensional image features pose a great challenge to image retrieval. The deep hashing method has become a research hotspot in recent years owing to its low storage cost and high retrieval speed.
Generally, a deep hashing method maps high-dimensional real-valued image features into compact binary hash codes to enable fast retrieval, and constrains the hash codes with the semantic similarity of images during the mapping to preserve retrieval accuracy. In a big-data environment, supervised hashing methods usually depend on a large amount of labeled image data to reach high retrieval accuracy, and their performance degrades greatly when only a small amount of labeled data is available. Chinese patent application CN109800314A discloses a method for generating hash codes for image retrieval with a deep convolutional network, in which a hash layer is added before the classification layer and the output of the hash layer is binarized to obtain the hash codes of images. That method, however, relies on a large amount of labeled data to train the hash model in order to reach good retrieval performance, and in practical scenarios labeling a large amount of data consumes enormous manpower and material resources. Deep semi-supervised hashing methods have therefore been proposed, which learn a better hash function from a small amount of labeled data and a large amount of unlabeled data.
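The core mechanics of hash-based retrieval described above can be sketched in a few lines: real-valued features are binarized, and database items are ranked by Hamming distance to the query's code. This is a minimal, framework-agnostic illustration with toy values (in the actual method, the real-valued features come from the trained network):

```python
# Minimal sketch of hash-based retrieval: real-valued features are
# binarized with the sign function, then ranked by Hamming distance.

def sign_binarize(features):
    """Map a real-valued feature vector to a {-1, +1} hash code."""
    return [1 if x >= 0 else -1 for x in features]

def hamming_distance(a, b):
    """Number of positions where two hash codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def retrieve(query_features, database_features):
    """Rank database items by Hamming distance to the query's hash code."""
    q = sign_binarize(query_features)
    codes = [sign_binarize(f) for f in database_features]
    return sorted(range(len(codes)), key=lambda i: hamming_distance(q, codes[i]))

# Toy example: item 0 shares the query's sign pattern exactly.
db = [[0.9, -0.2, 0.4], [-0.5, 0.3, -0.8], [0.1, 0.2, -0.9]]
ranking = retrieve([0.8, -0.1, 0.5], db)   # item 0 ranked first
```

Comparing packed binary codes in this way is what gives deep hashing its low storage cost and high retrieval speed relative to comparing high-dimensional real-valued features.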
Existing semi-supervised hashing methods mainly use the visual similarity between unlabeled and labeled data to guide the learning of hash codes for the unlabeled data, realizing hash function learning by preserving the visual neighbor relations between unlabeled and labeled samples in the hash space. Many researchers have therefore tried to construct reliable sample neighbor relations. These efforts can be broadly divided into graph-based methods and relation-consistency-based methods. Graph-based methods construct an approximate graph from the visual similarity between samples, where the nodes represent labeled and unlabeled data and the edges reflect the visual similarity between samples. Relation-consistency-based methods adopt a self-ensembling model to generate an integrated feature for each sample, and the visual similarity of the integrated features of paired samples represents the semantic similarity between those samples.
At present, semi-supervised hashing methods use the visual similarity between samples to represent their semantic similarity, but visual similarity cannot reflect the true semantic similarity: two samples with similar visual information may come from two different categories. Guiding hash code learning with wrong visual similarity therefore makes the similarity of the hash codes learned for the two samples inconsistent with their true semantic similarity relation.
Disclosure of Invention
Aiming at the problems of the existing method, the invention aims to design a semi-supervised image retrieval method and device based on disturbance consistency self-integration.
The technical content of the invention comprises:
a semi-supervised image retrieval method based on disturbance consistency self-integration comprises the following steps:
1) inputting the image into a trained semi-supervised image feature extraction model to obtain the features of the image, wherein the semi-supervised image feature extraction model comprises a convolutional neural network, a hash layer and a disturbance consistency self-integration module, and is trained using a small amount of labeled data and a large amount of unlabeled data as follows:
1.1) training a pre-trained convolutional neural network and a hash layer with a small amount of labeled data to obtain a preliminarily trained convolutional neural network and hash layer;
1.2) maximizing, through the disturbance consistency self-integration module, the similarity between the hash layer output h_k of unlabeled data x_k and the integrated feature ê_k^t, thereby training the preliminarily trained convolutional neural network and hash layer to obtain the trained convolutional neural network and hash layer, and generating the integrated feature ê_k^t, where t is the number of iterations and k is the index of the unlabeled data; the integrated feature ê_k^t is obtained by weighted summation of h_k and ê_k^{t-1};
2) converting the features of the image into a discrete binary hash code of the image;
3) retrieving according to the binary hash code to obtain the image retrieval result.
Further, before the labeled data and unlabeled data are input into the trained convolutional neural network, enhanced versions of the labeled and unlabeled data are respectively acquired, and the semi-supervised image feature extraction model is obtained by training on these enhanced data.
Further, the semi-supervised image feature extraction model further comprises a classification layer; before the preliminarily trained convolutional neural network and hash layer are trained with unlabeled data, the classification layer is trained using the fc7 features of the labeled data to obtain the trained classification layer, where the fc7 features are the fully-connected-layer output of the convolutional neural network.
Further, the classification loss function for classification training is L_c = Σ_{j∈L} −y_j log f_j, where y_j is the true label of labeled data x_j, f_j is the classification-layer prediction for labeled data x_j, j is the index of the labeled data, and L is the labeled data set.
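The classification loss above is a standard cross-entropy over the labeled set with one-hot labels y_j; a small pure-Python sketch with hypothetical toy values:

```python
import math

def classification_loss(true_labels, predictions):
    """L_c = sum_{j in L} -y_j * log f_j with one-hot y_j:
    only the predicted probability of the true class contributes."""
    total = 0.0
    for y, f in zip(true_labels, predictions):
        for y_c, f_c in zip(y, f):
            if y_c == 1:
                total += -math.log(f_c)
    return total

# Two labeled samples, three classes (toy probabilities).
y = [[1, 0, 0], [0, 1, 0]]
f = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = classification_loss(y, f)   # -log(0.7) - log(0.8)
```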
Further, the hash layer is trained on the labeled data through a pairwise similarity preserving loss function L_s, where S is the semantic similarity matrix, and h_i and h_j are the hash layer outputs of labeled data x_i and x_j, respectively.
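The patent's exact pairwise loss is rendered only as an image. As an illustration, a widely used form of pairwise similarity preservation in deep hashing (e.g. DPSH-style) uses the inner product Θ_ij = ½ h_i^T h_j; the sketch below assumes that form and is not necessarily the patent's exact formula:

```python
import math

def pairwise_similarity_loss(h, S):
    """An assumed, commonly used pairwise similarity preserving loss:
    L_s = -sum_{i,j} (S_ij * theta_ij - log(1 + exp(theta_ij))),
    theta_ij = 0.5 * <h_i, h_j>.  Similar pairs (S_ij = 1) are pushed
    toward large inner products, dissimilar pairs toward small ones."""
    n = len(h)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            theta = 0.5 * sum(a * b for a, b in zip(h[i], h[j]))
            loss -= S[i][j] * theta - math.log(1.0 + math.exp(theta))
    return loss

# Two samples from one class and one from another; S reflects the classes.
h = [[1.0, -1.0], [0.9, -0.8], [-1.0, 1.0]]
S = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
loss = pairwise_similarity_loss(h, S)
```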
Further, the disturbance consistency self-integration module further comprises a memory bank; the integrated features ê_k^t are stored in the memory bank.
Further, the integrated feature is updated as ê_k^t = α·ê_k^{t-1} + (1 − α)·h_k, where α is the momentum coefficient.
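The exponential-moving-average update of the memory bank can be sketched directly (variable names are illustrative):

```python
def ema_update(memory_bank, k, h_k, alpha):
    """Update the integrated feature of sample k in the memory bank:
    e_k^t = alpha * e_k^{t-1} + (1 - alpha) * h_k, alpha = momentum."""
    prev = memory_bank[k]
    memory_bank[k] = [alpha * p + (1.0 - alpha) * h for p, h in zip(prev, h_k)]
    return memory_bank[k]

# The integrated feature drifts toward the recent hash-layer outputs.
bank = {0: [0.0, 0.0]}
ema_update(bank, 0, [1.0, -1.0], alpha=0.5)   # [0.5, -0.5]
ema_update(bank, 0, [1.0, -1.0], alpha=0.5)   # [0.75, -0.75]
```

A larger α makes the integrated feature change more slowly, smoothing out the perturbations introduced by different data enhancements.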
Further, a disturbance consistency loss function L_u maximizes the similarity between the hash layer output h_k of unlabeled data x_k and the integrated feature ê_k^t, where U is the unlabeled data set, μ is the scaling factor, and α is the momentum coefficient.
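The exact form of L_u appears only as an image in the source. One natural reading of "maximize the similarity between h_k and ê_k with scaling factor μ" is a μ-scaled cosine similarity; the sketch below uses that assumed form, not necessarily the patent's exact formula:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def perturbation_consistency_loss(outputs, integrated, mu):
    """Assumed form: L_u = -sum_{k in U} cos(h_k, e_k) / mu.
    Minimizing L_u maximizes the similarity between each unlabeled
    sample's hash-layer output and its integrated feature."""
    return -sum(cosine(h, e) for h, e in zip(outputs, integrated)) / mu

h = [[1.0, -1.0], [0.5, 0.5]]
e_aligned = [[0.9, -0.9], [0.5, 0.5]]      # agrees with the outputs
e_opposed = [[-1.0, 1.0], [-0.5, -0.5]]    # disagrees with the outputs
```

Under this form, outputs that agree with their integrated features yield a lower loss than outputs that disagree, which is the behavior the prose describes.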
Further, the method for converting the hash-layer output features of the image into the discrete binary hash code of the image comprises: inputting the features of the image into the sign function sgn(·).
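In practice the ±1 codes produced by the sign function are packed into machine words, so that Hamming distance becomes an XOR plus a popcount — one reason hash codes are cheap to store and fast to compare. A sketch with illustrative names:

```python
def pack_code(features):
    """Binarize with the sign function and pack into an integer
    bit mask (bit i is set iff feature i >= 0)."""
    code = 0
    for i, x in enumerate(features):
        if x >= 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Hamming distance between two packed codes: XOR then popcount."""
    return bin(a ^ b).count("1")

c1 = pack_code([0.9, -0.2, 0.4, -0.7])   # bits 0 and 2 set -> 0b0101
c2 = pack_code([0.8, 0.1, -0.3, -0.9])   # bits 0 and 1 set -> 0b0011
d = hamming(c1, c2)                       # codes differ in two bits
```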
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above-mentioned method when executed.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method as described above.
Compared with the prior art, the invention has the following positive effects:
1) by integrating the hash layer characteristics of the same sample under different data enhancement conditions, the distinguishing characteristics of each category can be found;
2) the similarity between the hash layer output of the unmarked data and the corresponding integrated features is maximized through the designed disturbance consistency loss function, and the generalization capability of the unmarked data is fully utilized to improve the network;
3) better search effect can be obtained.
Drawings
FIG. 1 is a diagram of a semi-supervised hashing framework in accordance with the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features, and advantages of the present invention more comprehensible, the technical core of the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention maximizes the similarity between the hash layer output of unlabeled data and the corresponding integrated features, which improves the generalization ability of the network, and designs a Disturbance Consistency based Self-Ensembling semi-supervised hashing framework (DCSE), as shown in FIG. 1. The framework comprises three parts: (1) a backbone network comprising a convolutional neural network, a hash layer and a classification layer; (2) a pairwise similarity preserving loss function and a classification loss function for learning hash codes and performing image classification on the labeled data set; (3) a disturbance consistency self-integration module, which integrates the network outputs of the same unlabeled sample under different data enhancement conditions into a global feature and then maximizes, with the designed disturbance consistency loss function, the similarity between the sample's network output and the corresponding integrated feature.
The specific method is that labeled data and unlabeled data under different data enhancement conditions are input into a neural network to obtain fc7 layer characteristics.
In the marked data stream, the output fc7 feature of the fully-connected layer of the marked data is transferred to a classification layer for classification, and the classification loss function is as follows:
L_c = Σ_{j∈L} −y_j log f_j (1)
where y_j and f_j are the true label and the classification-layer prediction of labeled data x_j, respectively, and L denotes the labeled data set. Simultaneously, the fc7 features of the labeled data are passed to the hash layer for hash code learning through the pairwise similarity preserving loss function L_s (2), where h_i is the hash layer output of labeled data x_i, and S is the semantic similarity matrix: if samples x_i and x_j have the same class, then S_ij = 1; otherwise S_ij = 0.
In the unlabeled data stream, a memory bank is established to store the integrated global feature of each sample, and a novel disturbance consistency loss function L_u is designed to maximize the similarity between the output h_k of the current unlabeled sample x_k and the corresponding integrated feature ê_k^t.
In the loss L_u (3), μ is the scaling factor. The memory bank is then updated using an exponential moving average (EMA):
ê_k^t = α·ê_k^{t-1} + (1 − α)·h_k (4)
where ê_k^t is the integrated feature of x_k at the t-th training iteration and α is the momentum coefficient.
When image retrieval is actually performed, the image features output by the hash layer of the semi-supervised hashing framework are input into the sign function sgn(·) to obtain the discrete binary hash code of the image, and retrieval is performed according to this binary hash code to obtain the image retrieval result.
To validate the invention, we performed a number of experiments to evaluate the retrieval effect of DCSE. Our model was trained and tested on the image data sets CIFAR-10 and NUS-WIDE. CIFAR-10 contains 60,000 images; we randomly select 100 images per class as the query set and use the remaining images as the retrieval set, in which 500 images per class are selected as the labeled data set and the rest serve as the unlabeled data set. The NUS-WIDE data set contains about 270,000 images; we select the 21 most frequent categories, each with at least 5,000 images, then randomly select 100 images per category as the query set and take the remaining images as the retrieval set. In the training phase, 500 images per category are randomly selected from the retrieval set as the labeled data set, and the rest serve as the unlabeled data set. Our base network uses a pre-trained VGG16.
Table 1 shows the mAP results of DCSE and other image retrieval methods on CIFAR-10 and NUS-WIDE, including: locality-sensitive hashing (LSH), iterative quantization (ITQ), supervised discrete hashing (SDH), convolutional neural network hashing (CNNH), network-in-network hashing (NINH), semi-supervised deep hashing (SSDH), bipartite graph deep hashing (BGDH), semi-supervised generative adversarial hashing (SSGAH), semi-supervised deep pairwise hashing (SSDPH) and generalized product quantization (GPQ). The experimental results show that the invention outperforms the other compared methods.
Table 2 shows the ablation results for DCSE, where DCSE-1 is a variant of DCSE with the disturbance consistency self-integration module removed. The results show that the proposed disturbance consistency self-integration module significantly improves semi-supervised retrieval performance.
Table 3 shows the results of an unseen-class experiment, in which 75% of the classes in the data set are used for training and the remaining 25% for testing. Specifically, we divide the data set into four parts: train75, test75, train25 and test25, where train75 and test75 belong to the 75% of classes and train25 and test25 belong to the remaining 25%. We use train75 as the labeled training set, train25 and test75 as the retrieval set, and test25 as the query set. The results show that the invention outperforms the other compared methods.
Table 1: mAP results for different bit lengths on the two data sets for the different methods
Table 2: ablation experiment results
Table 3: results of the unseen-class experiments
The above embodiments merely illustrate implementations of the present invention; although the description is specific, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of this patent should be subject to the appended claims.

Claims (5)

1. A semi-supervised image retrieval method based on disturbance consistency self-integration comprises the following steps:
1) inputting the image into a trained semi-supervised image feature extraction model to obtain the features of the image, wherein the semi-supervised image feature extraction model comprises a convolutional neural network, a classification layer, a hash layer and a disturbance consistency self-integration module, and is trained using a small amount of labeled data and a large amount of unlabeled data as follows:
1.1) inputting labeled data and unlabeled data under different data enhancement conditions into the convolutional neural network to obtain fc7 layer features, the fc7 layer features being the fully-connected-layer output of the convolutional neural network;
1.2) passing the fc7 layer features of the labeled data to the classification layer for classification learning and to the hash layer for hash code learning, respectively, wherein the loss function of the classification learning is L_c = Σ_{j∈L} −y_j log f_j, with y_j and f_j the true label and the classification-layer prediction of labeled data x_j and L the labeled data set; in the loss function of the hash code learning, h_i is the hash layer output of labeled data x_i and S_ij is an element of the semantic similarity matrix S, with S_ij = 1 when labeled data x_i and x_j belong to the same class and S_ij = 0 otherwise;
1.3) the disturbance consistency self-integration module integrates the hash layer outputs h_k of the same unlabeled sample x_k under different data enhancement conditions to form a global feature, and a disturbance consistency loss function is used to maximize the similarity between the hash layer output h_k of the unlabeled sample x_k and the corresponding integrated feature, where μ is a scaling factor and U is the unlabeled data set; the integrated feature is updated using an exponential moving average ê_k^t = α·ê_k^{t-1} + (1 − α)·h_k, where α is the momentum coefficient and t denotes the number of iterations in training;
2) converting the features of the image into a discrete binary hash code of the image;
3) and retrieving according to the binary hash code to obtain an image retrieval result.
2. The method of claim 1, wherein before the labeled data and the unlabeled data are input into the trained convolutional neural network, enhanced versions of the labeled and unlabeled data are respectively acquired, and the semi-supervised image feature extraction model is obtained by training on these enhanced data.
3. The method of claim 1, wherein the disturbance consistency self-integration module further comprises a storage space, and the integrated features ê_k^t are stored in the storage space.
4. The method of claim 1, wherein converting the hash-layer output features of an image into the discrete binary hash code of the image comprises: inputting the features of the image into the sign function sgn(·).
5. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-4.
CN202110226266.8A 2021-03-01 2021-03-01 Semi-supervised image retrieval method and device based on disturbance consistency self-integration Active CN112883216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110226266.8A CN112883216B (en) 2021-03-01 2021-03-01 Semi-supervised image retrieval method and device based on disturbance consistency self-integration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110226266.8A CN112883216B (en) 2021-03-01 2021-03-01 Semi-supervised image retrieval method and device based on disturbance consistency self-integration

Publications (2)

Publication Number Publication Date
CN112883216A CN112883216A (en) 2021-06-01
CN112883216B true CN112883216B (en) 2022-09-16

Family

ID=76055106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110226266.8A Active CN112883216B (en) 2021-03-01 2021-03-01 Semi-supervised image retrieval method and device based on disturbance consistency self-integration

Country Status (1)

Country Link
CN (1) CN112883216B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762393B (en) * 2021-09-08 2024-04-30 杭州网易智企科技有限公司 Model training method, gaze point detection method, medium, device and computing equipment
CN114972118B (en) * 2022-06-30 2023-04-28 抖音视界有限公司 Noise reduction method and device for inspection image, readable medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN109165306A (en) * 2018-08-09 2019-01-08 长沙理工大学 Image search method based on the study of multitask Hash
CN109241313A (en) * 2018-08-14 2019-01-18 大连大学 A kind of image search method based on the study of high-order depth Hash
CN110309331A (en) * 2019-07-04 2019-10-08 哈尔滨工业大学(深圳) A kind of cross-module state depth Hash search method based on self-supervisory

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512273A (en) * 2015-12-03 2016-04-20 中山大学 Image retrieval method based on variable-length depth hash learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN109165306A (en) * 2018-08-09 2019-01-08 长沙理工大学 Image search method based on the study of multitask Hash
CN109241313A (en) * 2018-08-14 2019-01-18 大连大学 A kind of image search method based on the study of high-order depth Hash
CN110309331A (en) * 2019-07-04 2019-10-08 哈尔滨工业大学(深圳) A kind of cross-module state depth Hash search method based on self-supervisory

Also Published As

Publication number Publication date
CN112883216A (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN109165306B (en) Image retrieval method based on multitask Hash learning
Cao et al. Deep visual-semantic quantization for efficient image retrieval
Wang et al. Semi-supervised hashing for scalable image retrieval
CN112819023B (en) Sample set acquisition method, device, computer equipment and storage medium
CN111046179B (en) Text classification method for open network question in specific field
CN111914156A (en) Cross-modal retrieval method and system for self-adaptive label perception graph convolution network
EP3166020A1 (en) Method and apparatus for image classification based on dictionary learning
Wu et al. Distance metric learning from uncertain side information with application to automated photo tagging
CN112883216B (en) Semi-supervised image retrieval method and device based on disturbance consistency self-integration
US11803971B2 (en) Generating improved panoptic segmented digital images based on panoptic segmentation neural networks that utilize exemplar unknown object classes
Ma et al. A weighted KNN-based automatic image annotation method
Sumbul et al. Deep learning for image search and retrieval in large remote sensing archives
Niu et al. Knowledge-based topic model for unsupervised object discovery and localization
CN112507912B (en) Method and device for identifying illegal pictures
CN114461804B (en) Text classification method, classifier and system based on key information and dynamic routing
Zhang et al. ObjectPatchNet: Towards scalable and semantic image annotation and retrieval
CN112163114B (en) Image retrieval method based on feature fusion
Shen et al. DSRPH: deep semantic-aware ranking preserving hashing for efficient multi-label image retrieval
Yu et al. Text-image matching for cross-modal remote sensing image retrieval via graph neural network
Zhang et al. Image region annotation based on segmentation and semantic correlation analysis
Dong et al. Training inter-related classifiers for automatic image classification and annotation
Tian et al. Automatic image annotation with real-world community contributed data set
CN115994239A (en) Prototype comparison learning-based semi-supervised remote sensing image retrieval method and system
CN116363460A (en) High-resolution remote sensing sample labeling method based on topic model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant