CN112364192A - Zero sample Hash retrieval method based on ensemble learning - Google Patents


Info

Publication number: CN112364192A
Application number: CN202011092264.6A
Authority: CN (China)
Filing/priority date: 2020-10-13
Publication date: 2021-02-12
Legal status: Pending
Prior art keywords: training, sample, model, hash, picture
Other languages: Chinese (zh)
Inventors: 赵钰莹, 赖韩江, 印鉴
Current Assignee: Sun Yat-sen University
Original Assignee: Sun Yat-sen University
Application filed by Sun Yat-sen University

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval of still image data
    • G06F 16/53: Querying
    • G06F 16/51: Indexing; Data structures therefor; Storage structures
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval using metadata automatically derived from the content
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/20: Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a zero-sample hash retrieval method based on ensemble learning, which applies ensemble learning to the zero-sample picture retrieval problem. High-dimensional real-valued features of a picture are extracted with VGG-16 and converted into a low-dimensional binary hash code by a fully connected layer and an activation function, which reduces storage space while preserving retrieval quality. The hash model is then updated with an ensemble learning training procedure, giving the model stronger generalization ability and greatly improving its retrieval performance on pictures of new classes.

Description

Zero sample Hash retrieval method based on ensemble learning
Technical Field
The invention relates to the field of computer vision, in particular to a zero sample hash retrieval method based on ensemble learning.
Background
With the rapid development of the internet, data of all kinds, including pictures, text and video, are growing explosively. Users therefore spend a great deal of time searching for content of interest: whether browsing web pages with a goal in mind or using mobile applications, they face huge databases in which thousands of items are displayed on the interface at once, and it is often difficult to find all the target information quickly by eye. Retrieval systems arose to address this problem.
Picture retrieval is an important component of retrieval systems, and hashing has been applied to fast picture retrieval. Training a deep neural network with picture labels as supervision further improves the effectiveness of hashing. However, new concepts and new pictures appear on the network every day, which poses a new challenge for retrieval systems: the zero-sample problem, i.e., when a trained model encounters pictures of classes it has never seen, retrieval quality degrades sharply. To solve this problem, the invention provides a zero-sample hash retrieval method that uses the idea of ensemble learning to improve the generalization ability of the model, so that the model still retrieves well when it encounters pictures of new classes.
Ensemble learning combines several weakly supervised models into a stronger, more comprehensive model: a meta-algorithm that fuses several machine learning techniques into one predictive model in order to reduce variance and bias or to improve predictions. Ensemble methods fall into two main categories: ensembles over the data set and ensembles by model fusion. Data-set ensembles obtain multiple data sets by bootstrap sampling, or change the data distribution by re-weighting each sample, and then train and combine several models, as in the Bagging and Boosting methods; model-fusion ensembles combine several learners in different ways to obtain a better result, for example averaging for regression problems and majority voting for classification problems.
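As a minimal sketch of the two model-fusion strategies just mentioned (averaging for regression, majority voting for classification); the function names and toy predictions below are invented for illustration, not taken from the patent:

```python
from collections import Counter

def average_predictions(predictions):
    """Combine regressors by averaging their real-valued outputs per sample."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]

def majority_vote(predictions):
    """Combine classifiers by taking the most common label per sample."""
    return [Counter(vals).most_common(1)[0][0] for vals in zip(*predictions)]

# Three hypothetical base learners scoring the same four samples
reg_preds = [[0.2, 0.8, 0.5, 0.1],
             [0.4, 0.6, 0.5, 0.3],
             [0.3, 0.7, 0.5, 0.2]]
print(average_predictions(reg_preds))

# Three hypothetical classifiers labelling the same three samples
cls_preds = [["cat", "dog", "dog"],
             ["cat", "cat", "dog"],
             ["dog", "cat", "dog"]]
print(majority_vote(cls_preds))  # → ['cat', 'cat', 'dog']
```

The same column-wise combination idea underlies the model averaging used later in step S5.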
The patent specification with application number 201510200864.2 discloses a fast image retrieval method based on integrated hash coding. It first extracts SIFT features of the training and query images and produces initial hash codes of the training images with M hash algorithms; it then relearns the initial hash coding results under a consistency constraint from ensemble learning to obtain an integrated hash mapping matrix; finally it re-encodes the training and query images with the integrated hash code and retrieves quickly by computing Hamming distances between the query image and the training images. The integrated hash code in that invention combines the characteristics and advantages of different hash algorithms, overcoming the insufficient discrimination and limited applicability of a single hash algorithm, and thereby makes fast image retrieval more accurate and efficient. That patent, however, does not combine the two strategies of partitioning the data set and fusing models to improve the generalization ability of the model and greatly improve retrieval performance.
Disclosure of Invention
The invention provides a zero-sample hash retrieval method based on ensemble learning, which combines the two strategies of partitioning the data set and fusing models to improve the generalization ability of the model and greatly improve retrieval performance.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a zero sample hash retrieval method based on ensemble learning comprises the following steps:
S1: divide the training set by class label into two parts A and B with non-overlapping classes;
S2: using A, B and A+B as training data respectively, obtain the hash codes of the training samples through the VGG-16 model and a fully connected layer;
S3: compute the training loss using the triplet loss;
S4: train and update the network using the SWA method to obtain a converged model;
S5: training on the 3 data sets of step S2 yields 3 different models; average them to obtain the final ensemble model;
S6: evaluate the retrieval result of the ensemble model on the test set.
Further, the specific process of step S1 is:
The training set is divided by class label into two parts A and B with non-overlapping classes. For the cifar10 data set, classes 1-9 form the training set and class 10 forms the test set. During training, the training set is further divided by class into A (classes 1-5) and B (classes 6-9).
Further, the hash model of step S2 is designed as follows:
S21: first, the data sets A, B and A+B are used as training sets respectively to train 3 different models; the specific training steps are as follows:
S22: extract the high-dimensional real-valued features (4096-dimensional) of the image samples in the training set using the VGG-16 model;
S23: input the high-dimensional real-valued features obtained in step S22 into the fully connected layer and the tanh activation function to obtain a real-valued vector v, then binarize v (elements greater than 0 are set to 1, the rest to 0) to obtain the binary code b, i.e., the hash code. The quantization formula is:

$$b_j = \begin{cases} 1, & v_j > 0 \\ 0, & v_j \le 0 \end{cases}$$
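A minimal sketch of step S23, assuming a toy fully connected layer (the weights, bias and dimensions here are illustrative stand-ins, not values from the patent):

```python
import math

def hash_code(features, weights, bias):
    # Fully connected layer followed by tanh: one real value v_j per hash bit
    v = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
         for row, b in zip(weights, bias)]
    # Quantize: elements greater than 0 become 1, the rest become 0
    return [1 if vj > 0 else 0 for vj in v]

# Toy 3-dimensional feature mapped to a 2-bit hash code
print(hash_code([0.5, -1.0, 2.0],
                [[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]],
                [0.0, 0.0]))  # → [1, 0]
```

In the patent the input features would be the 4096-dimensional VGG-16 output; here a 3-dimensional toy vector keeps the sketch readable.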
Further, the specific steps of the triplet loss in step S3 are as follows:
S31: in the training samples of each batch, construct triplets $\langle I, I_{pos}, I_{neg} \rangle$, where the anchor I is a randomly selected training sample, the positive sample $I_{pos}$ is a sample of the same class as I, and the negative sample $I_{neg}$ is a sample of a different class from I;
S32: the triplet loss is computed as:

$$\mathcal{L}_{triplet} = \max\big(0,\; \|f(I) - f(I_{pos})\|_2 - \|f(I) - f(I_{neg})\|_2 + margin\big)$$
where the hyper-parameter margin is the minimum required difference between the distance from I to the negative sample $I_{neg}$ and the distance from I to the positive sample $I_{pos}$; the distance between the real-valued features of two samples is measured by the Euclidean distance.
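The triplet loss of S31-S32 can be sketched as follows, using plain Euclidean distance on real-valued feature vectors; the `margin` default is an illustrative value:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two real-valued feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge form of the triplet loss: pushes the positive at least
    `margin` closer to the anchor than the negative."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Positive already margin-closer than negative -> zero loss
print(triplet_loss([0, 0], [0, 1], [0, 3], margin=1.0))  # → 0.0
# Violating triplet -> positive loss
print(triplet_loss([0, 0], [0, 2], [0, 1], margin=1.0))  # → 2.0
```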
Further, the training process of SWA in step S4 is:
S41: initialize the feature extraction model with the pre-trained parameters of VGG-16, then randomly initialize the last fully connected layer (the layer that produces the hash code) to obtain the initialized weights $\tilde{w}$ and $w_{swa}$ (with $w_{swa} = \tilde{w}$ initially);
S42: train the model for n iterations;
S43: for the ith iteration, update the learning rate and the model weights in turn, using the following formulas:

cyclic learning rate (decaying from $\alpha_1$ to $\alpha_2$ within each cycle of length c):

$$\alpha(i) = (1 - t(i))\,\alpha_1 + t(i)\,\alpha_2, \qquad t(i) = \frac{1}{c}\big(\operatorname{mod}(i-1, c) + 1\big)$$

network weight update:

$$w \leftarrow w - \alpha(i)\,\nabla_w \mathcal{L}(w)$$
S44: for the ith iteration, if mod(i, c) = 0, where c is a preset hyper-parameter denoting the cycle length, update the final network weight $w_{swa}$ by moving average:

$$n_{models} = i/c, \qquad w_{swa} \leftarrow \frac{w_{swa} \cdot n_{models} + w}{n_{models} + 1}$$
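Steps S41-S44 can be sketched as the loop below; the gradient step is a stand-in, the learning-rate endpoints are illustrative values, and flat lists stand in for network weight tensors:

```python
def cyclic_lr(i, c, alpha1=0.01, alpha2=0.001):
    """Cyclic learning rate: decays from alpha1 to alpha2 within each cycle of length c."""
    t = ((i - 1) % c + 1) / c
    return (1 - t) * alpha1 + t * alpha2

def swa_update(w_swa, w, i, c):
    """Moving-average update of the SWA weights, applied when mod(i, c) == 0."""
    n_models = i // c
    return [(ws * n_models + wi) / (n_models + 1) for ws, wi in zip(w_swa, w)]

# Sketch of the training loop; the true gradient of the triplet loss is elided
w, w_swa, c = [0.0], [0.0], 3
for i in range(1, 10):
    lr = cyclic_lr(i, c)
    w = [wi - lr * 1.0 for wi in w]   # stand-in for w <- w - lr * grad
    if i % c == 0:                    # end of a cycle: fold w into the average
        w_swa = swa_update(w_swa, w, i, c)
print(w_swa)
```

In practice frameworks ship ready-made SWA utilities (e.g. averaged-model helpers); the sketch just mirrors the update formulas above.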
further, the process of obtaining the integration model in step S5 is:
S51: train the model with the training sets A, B and A+B respectively to obtain three different model weights $w_1$, $w_2$ and $w_3$, and obtain the final ensemble model weight by averaging:

$$w_{final} = \frac{w_1 + w_2 + w_3}{3}$$
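Step S51's averaging can be sketched as follows, again with flat lists standing in for the three networks' weight tensors:

```python
def average_weights(models):
    """Average corresponding parameters of several converged models."""
    return [sum(ws) / len(ws) for ws in zip(*models)]

# Three hypothetical converged weight vectors w1, w2, w3
w1, w2, w3 = [1.0, 4.0], [2.0, 5.0], [3.0, 6.0]
print(average_weights([w1, w2, w3]))  # → [2.0, 5.0]
```

Averaging parameters element-wise like this assumes the three models share one architecture, which holds here since all three are the same VGG-16-based network trained on different splits.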
Further, in step S6, the process of computing the retrieval accuracy (mAP) of the ensemble model on the test set is as follows:
S61: compute the Hamming distance between the hash code of the query image and all the image hash codes in the database (the Hamming distance is obtained by XOR-ing the two hash codes bit by bit and summing the result, i.e., it is the number of bit positions at which the values differ);
s62: and sorting the hash codes in the database from small to large according to the Hamming distance between the hash codes and the query image, and sequentially judging whether the image and the text belong to the same type or not according to the label, wherein the retrieval is correct when the image and the text belong to the same type, so that the AP value is calculated.
Overall retrieval index:

$$\mathrm{MAP} = \frac{1}{I}\sum_{i=1}^{I} \mathrm{AP}_i$$

average precision:

$$\mathrm{AP} = \frac{1}{M}\sum_{k} P_k \cdot rel_k$$

where i denotes the ith test-set picture and I is the number of test-set pictures; k denotes a rank position in the retrieval list obtained when the ith picture is used as the query picture; $P_k$ is the precision of the first k results, i.e.

$$P_k = \frac{\text{number of pictures among the top } k \text{ results that are related to the query}}{k}$$
(note: picture a is considered related to picture b if they have the same label); $rel_k$ equals 1 if the picture at rank k is related to the query picture and 0 otherwise; M denotes the number of pictures in the whole ranked list that have the same label as the query picture.
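A small sketch of S61-S62 on toy binary codes and labels (the database contents and names are invented for illustration); note that when the ranked list is the whole database, dividing by the number of relevant items found equals dividing by M:

```python
def hamming(b1, b2):
    """Number of differing bits (bitwise XOR and sum)."""
    return sum(x != y for x, y in zip(b1, b2))

def average_precision(query_label, ranked_labels):
    """AP over a list already sorted by ascending Hamming distance:
    mean of precision-at-k over the positions holding relevant items."""
    hits, precisions = 0, []
    for k, label in enumerate(ranked_labels, start=1):
        if label == query_label:          # rel_k = 1 at this rank
            hits += 1
            precisions.append(hits / k)   # P_k
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy database of four (hash code, label) pairs; query is class "a"
db = [([0, 0, 1], "a"), ([1, 1, 1], "b"), ([0, 0, 0], "a"), ([1, 0, 1], "b")]
query = [0, 0, 1]
ranked = sorted(db, key=lambda entry: hamming(query, entry[0]))
print(average_precision("a", [label for _, label in ranked]))  # → 1.0
```

The mAP of step S6 is then just this AP averaged over every test-set query.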
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention applies the ensemble learning method to the zero sample picture retrieval problem. The VGG-16 is used for extracting the high-dimensional real number features of the picture, and then the full connection layer and the activation function are used for converting the high-dimensional real number features into the low-dimensional binary hash code, so that the storage space is reduced on the premise of ensuring the retrieval effect. And then, updating the Hash model by utilizing a training method of ensemble learning, so that the model has stronger generalization capability, and the retrieval effect of the model on the new type of pictures is greatly improved.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 is a schematic of the SWA process of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1, a zero sample hash retrieval method based on ensemble learning includes the following steps:
S1: divide the training set by class label into two parts A and B with non-overlapping classes;
S2: using A, B and A+B as training data respectively, obtain the hash codes of the training samples through the VGG-16 model and a fully connected layer;
S3: compute the training loss using the triplet loss;
S4: train and update the network using the SWA method to obtain a converged model;
S5: training on the 3 data sets of step S2 yields 3 different models; average them to obtain the final ensemble model;
S6: evaluate the retrieval result of the ensemble model on the test set.
The specific process of step S1 is:
The training set is divided by class label into two parts A and B with non-overlapping classes. For the cifar10 data set, classes 1-9 form the training set and class 10 forms the test set. During training, the training set is further divided by class into A (classes 1-5) and B (classes 6-9).
The hash model of step S2 is designed as follows:
S21: first, the data sets A, B and A+B are used as training sets respectively to train 3 different models; the specific training steps are as follows:
S22: extract the high-dimensional real-valued features (4096-dimensional) of the image samples in the training set using the VGG-16 model;
S23: input the high-dimensional real-valued features obtained in step S22 into the fully connected layer and the tanh activation function to obtain a real-valued vector v, then binarize v (elements greater than 0 are set to 1, the rest to 0) to obtain the binary code b, i.e., the hash code. The quantization formula is:

$$b_j = \begin{cases} 1, & v_j > 0 \\ 0, & v_j \le 0 \end{cases}$$
the specific steps of the triplet loss of step S3 are:
s 31: in each batch's training sample, construct the triple < I, Ipos,Ineg> (where the origin I is a randomly selected sample in the training samples, homogeneous sample point IposIs a sample of the same class as I, and a heterogeneous sample point InegIs a different class of sample than I);
S32: the triplet loss is computed as:

$$\mathcal{L}_{triplet} = \max\big(0,\; \|f(I) - f(I_{pos})\|_2 - \|f(I) - f(I_{neg})\|_2 + margin\big)$$
where the hyper-parameter margin is the minimum required difference between the distance from I to the negative sample $I_{neg}$ and the distance from I to the positive sample $I_{pos}$; the distance between the real-valued features of two samples is measured by the Euclidean distance.
As shown in fig. 2, the training process of SWA in step S4 is:
S41: initialize the feature extraction model with the pre-trained parameters of VGG-16, then randomly initialize the last fully connected layer (the layer that produces the hash code) to obtain the initialized weights $\tilde{w}$ and $w_{swa}$ (with $w_{swa} = \tilde{w}$ initially);
S42: train the model for n iterations;
S43: for the ith iteration, update the learning rate and the model weights in turn, using the following formulas:

cyclic learning rate (decaying from $\alpha_1$ to $\alpha_2$ within each cycle of length c):

$$\alpha(i) = (1 - t(i))\,\alpha_1 + t(i)\,\alpha_2, \qquad t(i) = \frac{1}{c}\big(\operatorname{mod}(i-1, c) + 1\big)$$

network weight update:

$$w \leftarrow w - \alpha(i)\,\nabla_w \mathcal{L}(w)$$
S44: for the ith iteration, if mod(i, c) = 0, where c is a preset hyper-parameter denoting the cycle length, update the final network weight $w_{swa}$ by moving average:

$$n_{models} = i/c, \qquad w_{swa} \leftarrow \frac{w_{swa} \cdot n_{models} + w}{n_{models} + 1}$$
the process of obtaining the integration model in step S5 is:
S51: train the model with the training sets A, B and A+B respectively to obtain three different model weights $w_1$, $w_2$ and $w_3$, and obtain the final ensemble model weight by averaging:

$$w_{final} = \frac{w_1 + w_2 + w_3}{3}$$
Further, in step S6, the process of computing the retrieval accuracy (mAP) of the ensemble model on the test set is as follows:
S61: compute the Hamming distance between the hash code of the query image and all the image hash codes in the database (the Hamming distance is obtained by XOR-ing the two hash codes bit by bit and summing the result, i.e., it is the number of bit positions at which the values differ);
s62: and sorting the hash codes in the database from small to large according to the Hamming distance between the hash codes and the query image, and sequentially judging whether the image and the text belong to the same type or not according to the label, wherein the retrieval is correct when the image and the text belong to the same type, so that the AP value is calculated.
Overall retrieval index:

$$\mathrm{MAP} = \frac{1}{I}\sum_{i=1}^{I} \mathrm{AP}_i$$

average precision:

$$\mathrm{AP} = \frac{1}{M}\sum_{k} P_k \cdot rel_k$$

where i denotes the ith test-set picture and I is the number of test-set pictures; k denotes a rank position in the retrieval list obtained when the ith picture is used as the query picture; $P_k$ is the precision of the first k results, i.e.

$$P_k = \frac{\text{number of pictures among the top } k \text{ results that are related to the query}}{k}$$
(note: picture a is considered related to picture b if they have the same label); $rel_k$ equals 1 if the picture at rank k is related to the query picture and 0 otherwise; M denotes the number of pictures in the whole ranked list that have the same label as the query picture.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A zero sample hash retrieval method based on ensemble learning is characterized by comprising the following steps:
S1: divide the training set by class label into two parts A and B with non-overlapping classes;
S2: using A, B and A+B as training data respectively, obtain the hash codes of the training samples through the VGG-16 model and a fully connected layer;
S3: compute the training loss using the triplet loss;
S4: train and update the network using the SWA method to obtain a converged model;
S5: training on the 3 data sets of step S2 yields 3 different models; average them to obtain the final ensemble model;
S6: evaluate the retrieval result of the ensemble model on the test set.
2. The ensemble learning-based zero-sample hash retrieval method according to claim 1, wherein the specific process of step S1 is:
dividing the training set by class label into two non-overlapping parts A and B; for the cifar10 data set, classes 1-9 form the training set and class 10 forms the test set; during training, the training set is further divided by class into A (classes 1-5) and B (classes 6-9).
3. The ensemble learning-based zero-sample hash retrieval method according to claim 2, wherein the hash model of the step S2 is designed by:
s21: firstly, respectively using data sets A, B and A + B as training sets to train 3 different models;
s22: extracting high-dimensional real number features of the image samples in the training set by using a VGG-16 model;
S23: inputting the high-dimensional real-valued features obtained in step S22 into the fully connected layer and the tanh activation function to obtain a real-valued vector v, and binarizing v, with elements greater than 0 set to 1 and the rest set to 0, to obtain the binary code b, i.e., the hash code, wherein the quantization formula is:

$$b_j = \begin{cases} 1, & v_j > 0 \\ 0, & v_j \le 0 \end{cases}$$
4. the ensemble learning-based zero-sample hash retrieval method according to claim 3, wherein the triplet loss step of step S3 is:
S31: in the training samples of each batch, constructing triplets $\langle I, I_{pos}, I_{neg} \rangle$, wherein the anchor I is a randomly selected training sample, the positive sample $I_{pos}$ is a sample of the same class as I, and the negative sample $I_{neg}$ is a sample of a different class from I;
S32: the triplet loss is computed as:

$$\mathcal{L}_{triplet} = \max\big(0,\; \|f(I) - f(I_{pos})\|_2 - \|f(I) - f(I_{neg})\|_2 + margin\big)$$
wherein the hyper-parameter margin is the minimum required difference between the distance from I to the negative sample $I_{neg}$ and the distance from I to the positive sample $I_{pos}$; the distance between the real-valued features of two samples is measured by the Euclidean distance.
5. The ensemble learning-based zero-sample hash retrieval method according to claim 4, wherein the training process of SWA in step S4 is:
S41: initializing the feature extraction model with the pre-trained parameters of VGG-16, then randomly initializing the last fully connected layer (the layer that produces the hash code) to obtain the initialized weights $\tilde{w}$ and $w_{swa}$ (with $w_{swa} = \tilde{w}$ initially);
S42: training the model for n iterations;
S43: for the ith iteration, updating the learning rate and the model weights in turn, using the following formulas:

cyclic learning rate (decaying from $\alpha_1$ to $\alpha_2$ within each cycle of length c):

$$\alpha(i) = (1 - t(i))\,\alpha_1 + t(i)\,\alpha_2, \qquad t(i) = \frac{1}{c}\big(\operatorname{mod}(i-1, c) + 1\big)$$

network weight update:

$$w \leftarrow w - \alpha(i)\,\nabla_w \mathcal{L}(w)$$
S44: for the ith iteration, if mod(i, c) = 0, where c is a preset hyper-parameter denoting the cycle length, updating the final network weight $w_{swa}$ by moving average:

$$n_{models} = i/c, \qquad w_{swa} \leftarrow \frac{w_{swa} \cdot n_{models} + w}{n_{models} + 1}$$
6. the ensemble learning-based zero-sample hash retrieval method according to claim 5, wherein in step S5, the process of obtaining the ensemble model is:
training the model with the training sets A, B and A+B respectively to obtain three different model weights $w_1$, $w_2$ and $w_3$, and obtaining the final ensemble model weight by averaging:

$$w_{final} = \frac{w_1 + w_2 + w_3}{3}$$
7. the ensemble learning-based zero-sample hash retrieval method according to claim 6, wherein in step S6, the process of calculating the retrieval accuracy of the ensemble model on the test set is as follows:
s61: calculating Hamming distances between the query image hash code and all image hash codes in the database;
S62: sorting the hash codes in the database in ascending order of their Hamming distance to the query image, judging in turn according to the labels whether each returned image belongs to the same class as the query image, counting a retrieval result as correct when they belong to the same class, and computing the AP value from the result:
overall retrieval index:

$$\mathrm{MAP} = \frac{1}{I}\sum_{i=1}^{I} \mathrm{AP}_i$$

average precision:

$$\mathrm{AP} = \frac{1}{M}\sum_{k} P_k \cdot rel_k$$
wherein i denotes the ith test-set picture, I is the number of test-set pictures, and k denotes a rank position in the retrieval list obtained when the ith picture is used as the query picture; $P_k$ is the precision of the first k results, i.e.

$$P_k = \frac{\text{number of pictures among the top } k \text{ results that are related to the query}}{k}$$
$rel_k$ equals 1 if the picture at rank k is related to the query picture and 0 otherwise; M denotes the number of pictures in the whole ranked list that have the same label as the query picture.
8. The ensemble learning-based zero-sample hash retrieval method according to claim 7, wherein in step S61, the Hamming distance is obtained by XOR-ing the two hash codes bit by bit and summing the result, i.e., it is the number of bit positions at which the values differ.
9. The ensemble learning-based zero-sample hash retrieval method according to claim 8, wherein in step S62, if picture a and picture b have the same label, picture a and picture b are related.
10. The ensemble learning-based zero sample hash retrieval method of any one of claims 1 to 9, wherein the dimension of the high-dimensional real numbers for extracting the image samples in the training set using the VGG-16 model is 4096-dimensional.
CN202011092264.6A 2020-10-13 2020-10-13 Zero sample Hash retrieval method based on ensemble learning Pending CN112364192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011092264.6A CN112364192A (en) 2020-10-13 2020-10-13 Zero sample Hash retrieval method based on ensemble learning


Publications (1)

Publication Number Publication Date
CN112364192A true CN112364192A (en) 2021-02-12

Family

ID=74507176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011092264.6A Pending CN112364192A (en) 2020-10-13 2020-10-13 Zero sample Hash retrieval method based on ensemble learning

Country Status (1)

Country Link
CN (1) CN112364192A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491115A (en) * 2022-02-17 2022-05-13 重庆邮电大学 Integrated image retrieval method based on depth hash and multi-model fusion


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512273A (en) * 2015-12-03 2016-04-20 中山大学 Image retrieval method based on variable-length depth hash learning
US20180276528A1 (en) * 2015-12-03 2018-09-27 Sun Yat-Sen University Image Retrieval Method Based on Variable-Length Deep Hash Learning
WO2018010365A1 (en) * 2016-07-11 2018-01-18 北京大学深圳研究生院 Cross-media search method
CN109241313A (en) * 2018-08-14 2019-01-18 大连大学 A kind of image search method based on the study of high-order depth Hash
CN109299342A (en) * 2018-11-30 2019-02-01 武汉大学 A kind of cross-module state search method based on circulation production confrontation network
CN111125396A (en) * 2019-12-07 2020-05-08 复旦大学 Image retrieval method of single-model multi-branch structure
CN111753190A (en) * 2020-05-29 2020-10-09 中山大学 Meta learning-based unsupervised cross-modal Hash retrieval method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUANDAO YANG 等: ""SWALP: Stochastic Weight Averaging in Low-Precision Training"", 《ARXIV》 *


Similar Documents

Publication Publication Date Title
US20220309762A1 (en) Generating scene graphs from digital images using external knowledge and image reconstruction
CN110188227B (en) Hash image retrieval method based on deep learning and low-rank matrix optimization
Yan et al. Image classification by cross-media active learning with privileged information
CN111753190B (en) Meta-learning-based unsupervised cross-modal hash retrieval method
Cheng et al. Sign: Spatial-information incorporated generative network for generalized zero-shot semantic segmentation
CN112528928B (en) Commodity identification method based on self-attention depth network
CN114067385B (en) Cross-modal face retrieval hash method based on metric learning
CN112651940B (en) Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network
CN110457514A (en) A kind of multi-tag image search method based on depth Hash
CN112861976B (en) Sensitive image identification method based on twin graph convolution hash network
CN111461175B (en) Label recommendation model construction method and device of self-attention and cooperative attention mechanism
CN112949740B (en) Small sample image classification method based on multilevel measurement
CN110598022B (en) Image retrieval system and method based on robust deep hash network
CN110188827A (en) A kind of scene recognition method based on convolutional neural networks and recurrence autocoder model
CN110516098A (en) Image labeling method based on convolutional neural networks and binary coding feature
Altintakan et al. Towards effective image classification using class-specific codebooks and distinctive local features
Wang et al. 3D model retrieval with weighted locality-constrained group sparse coding
CN113656700A (en) Hash retrieval method based on multi-similarity consistent matrix decomposition
Xu et al. Idhashgan: deep hashing with generative adversarial nets for incomplete data retrieval
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN114138971A (en) Genetic algorithm-based maximum multi-label classification method
Cai et al. Learning pose dictionary for human action recognition
CN112364192A (en) Zero sample Hash retrieval method based on ensemble learning
CN106033546A (en) Behavior classification method based on top-down learning
CN111259176B (en) Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination