CN117370594A - Distributed difference self-adaptive image retrieval method based on space-frequency interaction - Google Patents
- Publication number
- CN117370594A (application number CN202311424869.4A)
- Authority
- CN
- China
- Prior art keywords
- hash
- quantization
- image
- code
- hash quantization
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
Abstract
The invention relates to a distributed difference self-adaptive image retrieval method based on space-frequency interaction. First, original training images are obtained, and strongly and weakly transformed images are generated from them by data augmentation. A deep hash network is then constructed: the strongly transformed image is input into a student model and the weakly transformed image into a teacher model to obtain hash quantization codes, from which a self-distillation difference quantization loss, a hash proxy loss and a binary cross-entropy loss are computed. Next, a distribution migration module is constructed that uses the distribution center and degree of dispersion of the hash quantization codes extracted by the student model to migrate the hash quantization codes extracted by the teacher model, yielding a distribution migration loss. A frequency component extraction module is then constructed that extracts the frequency-domain information of the hash quantization codes with the fast Fourier transform and their frequency components with an arctangent transform, yielding a frequency component loss. Finally, an objective optimization function is constructed from all of the losses, the student and teacher models are trained, and the trained student or teacher model is used for image retrieval. By more fully quantifying the difference information between hash codes, the method improves retrieval performance.
Description
Technical Field
The invention belongs to the field of image retrieval within information retrieval, and in particular relates to a distributed difference self-adaptive image retrieval method based on space-frequency interaction.
Background
Image retrieval uses a query image to search for matching images, the main goal being to find the images in a large-scale image database that are semantically most relevant to the query. It has wide application value and plays a key role in many fields, including image search engines, medical image analysis and security monitoring; it helps people access and manage image data more easily, which improves work efficiency and user experience and shortens decision-making. As databases grow, traversing them to find the required images consumes ever more manpower and time. Representing both the database images and the query image by semantic features converts the search problem into a problem of judging the similarity between semantic features, which can greatly improve retrieval efficiency.
Hash algorithms offer significant advantages in speed and storage and are widely used in large-scale image retrieval. They fall into traditional hash algorithms and deep hash algorithms. Early hash algorithms were mostly traditional: they extract image features with manually designed convolution kernels, determine the matching images in the database from the similarity between features, and return those images as the retrieval result. Compared with early retrieval approaches that relied on manually entered metadata and tags, traditional hash algorithms are easier to implement, but because of the hand-designed kernels and limited model depth, the generated hash codes carry only a small amount of semantic information. With the development of deep learning, image retrieval has made tremendous progress. Thanks to the stronger feature-learning capability of deep neural networks, deep-learning-based retrieval algorithms obtain feature codes containing more high-level semantic information. To make retrieval faster, the features extracted by the deep neural network are then compressed into Hamming space, so that computing the similarity between discrete feature quantization codes becomes computing the Hamming distance between binary hash codes.
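A minimal NumPy sketch of this Hamming-space lookup, offered as an illustration rather than the patent's implementation: it assumes real-valued codes are binarized with the sign function (zero mapped to +1) and that database items are ranked by plain Hamming distance. The helper names `binarize`, `hamming_distance` and `retrieve` are hypothetical. It relies on the identity that for codes in {−1, +1}^K the Hamming distance equals (K − ⟨a, b⟩)/2, so it reduces to an inner product.

```python
import numpy as np


def binarize(codes: np.ndarray) -> np.ndarray:
    """Map real-valued codes (e.g. tanh outputs in [-1, 1]) to {-1, +1} bits.

    Zero is sent to +1; this tie-break is an arbitrary choice for the sketch.
    """
    return np.where(codes >= 0, 1, -1).astype(np.int32)


def hamming_distance(query_bits: np.ndarray, db_bits: np.ndarray) -> np.ndarray:
    """Hamming distance between one {-1,+1} query (length K) and each database row.

    For +/-1 codes, d(a, b) = (K - <a, b>) / 2, so an inner product suffices.
    """
    k = query_bits.shape[-1]
    return (k - db_bits @ query_bits) // 2


def retrieve(query_code: np.ndarray, db_codes: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Indices of the top_k database items, nearest first by Hamming distance."""
    dist = hamming_distance(binarize(query_code), binarize(db_codes))
    return np.argsort(dist, kind="stable")[:top_k]


db = np.array([[0.9, -0.8, 0.7, -0.6],
               [-0.9, 0.8, -0.7, 0.6],
               [0.5, -0.4, -0.3, -0.2]])
query = np.array([0.8, -0.9, 0.6, -0.1])
print(retrieve(query, db))  # [0 2 1]
```

Because the distance is an integer inner-product expression, a database of packed bit codes can be scanned far faster than comparing real-valued features.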
Current ways of quantifying the distance between hash codes fall into two types: tuple losses and class-center losses. Tuple losses include the pairwise loss and the triplet loss. The pairwise loss takes a pair of images as a group, converts the extracted image features into codes and uses the distance between the codes as the loss. Although it pulls similar images together and pushes dissimilar images apart, computing the code distance for every pair of images is very time-consuming; and because it only models whether two samples are similar or dissimilar, the large difference between the numbers of positive and negative pairs causes a positive/negative imbalance, and intra-class and inter-class relations cannot be captured. The triplet loss alleviates this imbalance to some extent, but owing to the numbers of samples within and between classes, the trained model is somewhat biased and still cannot capture inter-class relations. Class-center losses predefine class centers or build them by clustering, and convert the coding loss between image pairs into the distance between an image's code and its class-center code. Compared with tuple losses, class-center losses do not need to compute the distances between all pairs of samples, which greatly reduces training time, and the learned codes also inherit some inter-class relations from the relations between the class centers.
Existing deep-learning-based image retrieval methods focus mostly on how to better quantify the difference between image codes. However, making fuller use of the intra-class and inter-class relations between images during coding, so that the codes disentangle class information more fully, and exploiting the distribution differences between classes are also important for improving image retrieval performance.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a distributed difference self-adaptive image retrieval method based on space-frequency interaction.
The invention adopts the following technical solution to solve the technical problem:
A distributed difference self-adaptive image retrieval method based on space-frequency interaction, characterized by comprising the following steps:
First step: acquire original images for training, and apply data augmentation to them to obtain a strongly transformed image and a weakly transformed image;
Second step: construct a deep hash network. The strongly transformed image and the weakly transformed image are input into the student model and the teacher model of the deep hash network, respectively, to obtain the hash quantization code extracted by the student model and the hash quantization code extracted by the teacher model. From these codes, the self-distillation difference quantization loss $L_{Sdh}$, the hash proxy loss $L_{HP}$ and the binary cross-entropy loss $L_{bce\text{-}Q}$ are obtained:
$$L_{Sdh} = 1 - \cos(H_T, H_S) \tag{1}$$
$$L_{HP} = H\big(y, \mathrm{Softmax}(P_T / T)\big) \tag{4}$$
where $H_T$ and $H_S$ denote the hash quantization codes extracted by the teacher model and the student model, $P_T$ denotes the proxy samples, $T$ is the temperature-scale hyperparameter, $H(\cdot)$ denotes the quantization error between the true class label and the predicted class label of the image, and $y$ denotes the true class-label sequence of the image. The binary cross-entropy loss $L_{bce\text{-}Q}$ is computed over the code bits from the indicator that the $k$-th hash bit takes the value 1 and the maximum likelihood estimate of the $k$-th hash bit, where $H_k$ denotes the $k$-th element of the hash quantization code $H_T$ and $K$ denotes the code length;
Third step: construct a distribution migration module. The module uses the distribution center and degree of dispersion of the hash quantization code extracted by the student model to migrate the hash quantization code extracted by the teacher model, obtaining a distribution-migrated hash quantization code; the distribution migration loss is obtained by quantifying the difference between the distribution-migrated hash quantization code and the hash quantization code extracted by the teacher model. The distribution migration loss $L_{DIT}$ is expressed as:
$$L_{DIT} = 1 - \cos(H_T, H_{T\_S}) \tag{12}$$
where $H_{T\_S}$ is the distribution-migrated hash quantization code after range constraint;
Fourth step: construct a frequency component extraction module, which extracts the frequency-domain information of the hash quantization code by the fast Fourier transform and extracts its frequency component by an arctangent transform,
where $x$ denotes the hash quantization code input to the fast Fourier transform, $F(x)(u,v)$ is the information of the hash quantization code at frequency-domain coordinates $(u,v)$, $(h,w)$ denotes a spatial coordinate of the hash quantization code, $x(h,w)$ is the value of the hash quantization code at spatial coordinate $(h,w)$, and $H$ and $W$ denote the length and width of the hash quantization code;
where $PH$ denotes the frequency component of the hash quantization code, and $R(x)(u,v)$ and $I(x)(u,v)$ are the real and imaginary parts of the frequency-domain representation of the hash quantization code at coordinates $(u,v)$;
The frequency component loss $L_{ph}$ is expressed as:
$$L_{ph} = 1 - \cos(PH_T, PH_S) \tag{15}$$
where $PH_T$ denotes the frequency component of the hash quantization code $H_T$ and $PH_S$ the frequency component of the hash quantization code $H_S$;
Fifth step: construct an objective optimization function, train the student model and the teacher model, and measure the training loss with the objective optimization function, in which $N_B$ is the total number of samples and $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are weights;
The image to be queried is input into the trained student model or teacher model, and the retrieved images are output.
Compared with the prior art, the invention has the advantages that:
1. Current self-distillation-based image retrieval methods transform images by data augmentation to change their distribution, and use the hash-code difference between the transformed image pair to guide hash-code generation. However, when the distribution difference between the two images is too large, the difference information produced by data augmentation cannot be fully quantized, so the hash codes of the two images cannot be fully exploited, which degrades retrieval performance and lowers retrieval accuracy. The invention therefore designs a distribution migration module that uses the distribution center and degree of dispersion of the hash quantization code extracted by the student model to migrate the hash quantization code extracted by the teacher model, obtaining a distribution-migrated hash quantization code. The similarity between the distribution-migrated code and the hash quantization code extracted by the teacher model is computed to assist in quantizing the difference between the codes extracted by the teacher and student models, so that the distribution difference information generated by data augmentation is used more fully and retrieval performance improves.
2. Information changes in an image are often more clearly reflected in the frequency domain. When current deep hash networks quantize image codes, they consider only code quantization in the spatial domain and ignore the relative transformation of the image. The invention therefore designs a frequency component extraction module to analyze the frequency components of the hash quantization codes; by capturing the relative changes produced by image transformation, the generated image codes contain higher-level semantic information with more pronounced differences. The module first converts the spatial-domain image code into the frequency domain by the fast Fourier transform, then performs phase analysis by an arctangent transform, and captures relative changes in the frequency domain by quantizing the phase difference between codes, thereby quantizing the difference information generated by data augmentation more fully.
3. The invention has been tested on the single-label dataset ImageNet and the multi-label datasets MS COCO, NUS-WIDE and NUS-WIDE_M. The experimental results show that, compared with the currently popular image retrieval model DHD, the method improves performance on the simpler single-label dataset at code lengths of 32, 48 and 64 bits and achieves comparable results at 16 bits. On the multi-label datasets it achieves improvements of varying degree at the different code lengths tested. These results indicate that the method adapts better to the relative changes produced by self-distillation data augmentation and achieves a better retrieval effect.
Drawings
FIG. 1 is a schematic diagram of the deep hash network of the present invention in a training phase;
FIG. 2 is a schematic diagram of the distribution migration module according to the present invention;
FIG. 3 is a schematic diagram of the frequency component extraction module according to the present invention.
Detailed Description
The embodiments of the invention are described in detail below with reference to the accompanying drawings; they are not intended to limit the scope of protection of the invention.
The distributed difference self-adaptive image retrieval method based on space-frequency interaction of the invention (hereinafter "the method"; see FIGS. 1 to 3) comprises the following steps:
First step: acquire original images; a plurality of original images forms a dataset;
In this embodiment, four datasets are used: MS COCO, ImageNet, NUS-WIDE and NUS-WIDE_M. ImageNet is a single-label dataset and the other three are multi-label datasets; every image is processed to a size of 256 × 256 pixels. Each dataset is divided into a database, a training set and a test set. The ImageNet dataset has 100 categories, with 128,503, 13,000 and 5,000 images in the database, training set and test set, respectively; the NUS-WIDE and NUS-WIDE_M datasets have 21 categories, with 149,736, 10,500 and 2,100 images, respectively; and the MS COCO dataset has 80 categories, with 117,218, 10,000 and 5,000 images, respectively.
Data augmentation is applied to each original image through operations such as random cropping, horizontal flipping, Gaussian blur, and brightness, contrast and saturation transformations, with the overall augmentation strength of the image expressed as a probability, to obtain a strongly transformed image and a weakly transformed image. In practical applications the query image may itself be transformed, and data augmentation simulates the image transformations that occur in practice.
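The augmentation step can be sketched in NumPy as follows. This is an illustrative assumption, not the embodiment's pipeline: the crop size (224), the jitter range (0.6 to 1.4), the 5-tap blur kernel and the probabilities 0.8 (strong) and 0.2 (weak) are hypothetical values, and "strength expressed as a probability" is modeled by letting each operation fire independently with probability `p`. All function names are hypothetical.

```python
import numpy as np


def gaussian_blur(img: np.ndarray, sigma: float = 1.0, radius: int = 2) -> np.ndarray:
    """Separable Gaussian blur of an (H, W, C) image; borders are zero-padded."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()

    def blur_1d(v):
        return np.convolve(v, kernel, mode="same")

    out = np.apply_along_axis(blur_1d, 0, img)   # blur along height
    return np.apply_along_axis(blur_1d, 1, out)  # blur along width


def augment(img: np.ndarray, p: float, rng: np.random.Generator, crop: int = 224) -> np.ndarray:
    """Random crop, then each further operation is applied with probability p.

    A larger p means more operations fire on average, i.e. a stronger transformation.
    img is a float array in [0, 1] with shape (H, W, 3).
    """
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    out = img[top:top + crop, left:left + crop]
    if rng.random() < p:                      # horizontal flip
        out = out[:, ::-1]
    if rng.random() < p:                      # brightness
        out = out * rng.uniform(0.6, 1.4)
    if rng.random() < p:                      # contrast around the mean
        mean = out.mean()
        out = (out - mean) * rng.uniform(0.6, 1.4) + mean
    if rng.random() < p:                      # saturation: blend with grey
        grey = out.mean(axis=2, keepdims=True)
        out = grey + (out - grey) * rng.uniform(0.6, 1.4)
    if rng.random() < p:                      # Gaussian blur
        out = gaussian_blur(out)
    return np.clip(out, 0.0, 1.0)


def strong_weak_pair(img, rng, p_strong=0.8, p_weak=0.2):
    """Return (strongly transformed, weakly transformed) views of one image."""
    return augment(img, p_strong, rng), augment(img, p_weak, rng)


rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))             # stand-in for a 256 x 256 training image
strong, weak = strong_weak_pair(image, rng)
print(strong.shape, weak.shape)               # (224, 224, 3) (224, 224, 3)
```

In a real training loop a library such as torchvision would normally supply these transforms; the sketch only shows how a single probability can control the overall strength.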
Second step: construct a deep hash network;
The Deep Hash Distillation (DHD) network uses a Siamese (twin) network as its basic framework and comprises a student model and a teacher model that share parameters. Each of the student and teacher models comprises a feature extraction network and a code generation network. The feature extraction network generally adopts ResNet50 or AlexNet; the features it extracts are input into the code generation network, which generates the hash quantization code. The code generation network comprises a fully connected layer, a layer normalization layer and a tanh activation function, and the tanh activation constrains the values of the hash quantization code output by the layer normalization layer to the range [−1, 1]. The strongly and weakly transformed images are input into the student model and the teacher model, respectively, to extract their hash quantization codes. For a distillation model, fixing a single branch can effectively improve model performance, and quantizing the distribution difference information between the two branches assists code generation. Through the distribution difference between the strongly and weakly transformed images, the DHD network implements the self-distillation difference quantization loss between the student model and the teacher model, computed by cosine similarity as follows:
$$L_{Sdh} = 1 - \cos(H_T, H_S) \tag{1}$$
$$H_T = \tanh(h_T) \tag{2}$$
$$H_S = \tanh(h_S) \tag{3}$$
where $L_{Sdh}$ is the self-distillation difference quantization loss between the student model and the teacher model, $h_T$ and $h_S$ are the hash quantization codes output by the layer normalization layers of the teacher model and the student model, $H_T$ and $H_S$ are the hash quantization codes extracted by the teacher model and the student model, and $\tanh(\cdot)$ is the tanh activation function;
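A NumPy sketch of the code generation network and Eqs. (1) to (3), under stated assumptions: the backbone is replaced by random 2048-dimensional stand-in features (the pooled output size of ResNet50), the code length is a hypothetical 64 bits, the learnable affine parameters of layer normalization are omitted, and the same weights are used for both branches to mimic parameter sharing. All names are hypothetical.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Layer normalization over the last axis (no learnable scale/shift here)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def code_head(features, weight, bias):
    """Fully connected -> layer norm -> tanh.

    Returns h (the layer-norm output) and H = tanh(h), which lies in [-1, 1].
    """
    h = layer_norm(features @ weight + bias)
    return h, np.tanh(h)


def cosine(a, b, eps=1e-8):
    """Row-wise cosine similarity."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den


def sdh_loss(H_T, H_S):
    """Eq. (1), L_Sdh = 1 - cos(H_T, H_S), averaged over the batch."""
    return float(np.mean(1.0 - cosine(H_T, H_S)))


rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(2048, 64))              # shared weights, K = 64
b = np.zeros(64)
feat_weak, feat_strong = rng.normal(size=(2, 4, 2048))   # stand-in backbone features
h_T, H_T = code_head(feat_weak, W, b)     # teacher receives the weak view
h_S, H_S = code_head(feat_strong, W, b)   # student receives the strong view
print(sdh_loss(H_T, H_S))
```

Keeping `h` alongside `H` matters later: the distribution migration module takes the pre-tanh codes as input.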
In the code difference quantization stage, the similarity between the hash quantization code $H_T$ and the proxy samples $P_T$ is evaluated, the proxy samples $P_T$ representing the centers of the image samples. The hash proxy loss is calculated as:
$$L_{HP} = H\big(y, \mathrm{Softmax}(P_T / T)\big) \tag{4}$$
where $T$ is the temperature-scale hyperparameter, $H(\cdot)$ denotes the quantization error between the true class label and the predicted class label of the image, the predicted class label being obtained with the Softmax function, and $y$ denotes the true class-label sequence of the image;
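One possible reading of Eq. (4), sketched in NumPy. It assumes that $P_T$ is the vector of cosine similarities between $H_T$ and one learnable proxy per class, that $H(\cdot)$ is the cross-entropy, and that multi-hot labels are normalized to sum to 1; the default temperature `T=0.1` and the toy proxies are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def hash_proxy_loss(H, proxies, y, T=0.1):
    """Cross-entropy between label distribution y and Softmax(similarity / T).

    H: (B, K) codes, proxies: (C, K) one proxy per class (assumed),
    y: (B, C) one-hot or multi-hot labels.
    """
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    Pn = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    sim = Hn @ Pn.T                               # cosine similarity to each proxy
    pred = softmax(sim / T, axis=1)               # predicted class distribution
    target = y / y.sum(axis=1, keepdims=True)     # normalize multi-hot labels
    return float(np.mean(-np.sum(target * np.log(pred + 1e-12), axis=1)))


proxies = np.eye(3, 8)              # 3 classes, 8-bit codes (toy values)
codes = proxies.copy()              # each code sits exactly on its class proxy
right = np.eye(3)                   # correct labels
wrong = np.roll(right, 1, axis=1)   # every label shifted to another class
print(hash_proxy_loss(codes, proxies, right), hash_proxy_loss(codes, proxies, wrong))
```

The low temperature sharpens the softmax, so a code aligned with its own proxy yields a loss near zero while a mislabeled one is penalized heavily.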
Deep hash algorithms reduce the quantization error by regression, minimizing the distance between the hash code and its binary target. Because hash quantization coding quantizes each bit of the code, the quantization of each bit is treated as a binary classification, and the coding result of each bit is predicted by a Gaussian distribution estimator $g(h)$,
where $m$ and $\sigma$ are the mean and standard deviation of the Gaussian estimator $g(h)$; $m$ takes the value $+1$ or $-1$, being $+1$ when $h$ is positive and $-1$ when $h$ is negative;
In summary, the binary cross-entropy (BCE) loss is calculated from the indicator that the $k$-th hash bit takes the value 1 and the maximum likelihood estimate of the $k$-th hash bit, where $H_k$ denotes the $k$-th element of the hash quantization code $H_T$ and $K$ denotes the code length;
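Because only the symbols of the estimator and the loss are described here, the sketch below is one plausible instantiation under stated assumptions, not the patent's formula: the probability that a bit equals +1 is obtained by normalizing two unnormalized Gaussian likelihoods centred at $m = +1$ and $m = -1$ (which simplifies to $\mathrm{sigmoid}(2h/\sigma^2)$), the target bit is 1 when $h \ge 0$, and `sigma=0.5` is a hypothetical value.

```python
import numpy as np


def gaussian_estimator(h: np.ndarray, m: float, sigma: float) -> np.ndarray:
    """Unnormalized Gaussian likelihood g(h) with mean m and standard deviation sigma."""
    return np.exp(-((h - m) ** 2) / (2.0 * sigma ** 2))


def bce_q_loss(H: np.ndarray, sigma: float = 0.5, eps: float = 1e-7) -> float:
    """Per-bit binary cross-entropy quantization loss for codes H in [-1, 1], shape (B, K)."""
    g_pos = gaussian_estimator(H, 1.0, sigma)     # likelihood under m = +1
    g_neg = gaussian_estimator(H, -1.0, sigma)    # likelihood under m = -1
    p_one = np.clip(g_pos / (g_pos + g_neg), eps, 1.0 - eps)  # P(bit = +1 | h)
    target = (H >= 0).astype(float)               # 1 where the bit binarizes to +1
    bce = -(target * np.log(p_one) + (1.0 - target) * np.log(1.0 - p_one))
    return float(bce.mean())                      # average over all B*K bits


print(bce_q_loss(np.zeros((2, 4))))        # ln 2: bits stuck at 0 are maximally uncertain
print(bce_q_loss(np.full((2, 4), 0.95)))   # near-binary codes give a small loss
```

Minimizing this loss pushes each element of the code away from 0 and toward ±1, which is the stated purpose of the quantization term.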
Third step: construct a distribution migration module for quantifying the distribution migration loss. The distribution migration module uses the distribution center and degree of dispersion of the hash quantization code extracted by the student model to migrate the hash quantization code extracted by the teacher model, obtaining a distribution-migrated hash quantization code, and the distribution migration loss is obtained by quantifying the difference between the distribution-migrated hash quantization code and the hash quantization code extracted by the teacher model;
Existing deep hash networks consider only the transformation difference produced by data augmentation: the teacher and student models quantize the difference between the transformed images, and hash quantization codes are generated with the help of nothing more than a direct quantization of the difference information between the teacher's and the student's hash quantization codes. However, when the distribution difference between the strongly and weakly transformed images is large, directly quantizing their hash quantization codes cannot fully exploit the distribution difference information between them. Making full use of the distribution difference between the strongly and weakly transformed images caused by data augmentation, together with the intra-class and inter-class relations between images, so that the network builds hash codes carrying more image difference information, is therefore critical to improving image retrieval performance.
Therefore, in the quantization code generation stage, the distribution migration module (Distribution Information Transformation Block, DIT-Block) uses the distribution information of the hash quantization code extracted by the student model to guide the hash quantization code generated by the teacher model, migrating the teacher's code to obtain a distribution-migrated hash quantization code. Quantizing the distribution difference between the distribution-migrated code and the teacher's code then assists in extracting the difference between the student's and teacher's hash quantization codes, so that the distribution difference information caused by the data augmentation transformation is used more fully and the DIT-Block makes the network pay more attention to the distribution difference produced by data augmentation.
The inputs of the DIT-Block are the hash quantization codes extracted by the teacher model and the student model. Because constraining the range with the tanh activation loses some information, the hash quantization codes $h_T$ and $h_S$ output by the layer normalization layers are used as the inputs of the DIT-Block. Taking the mean and variance of a hash quantization code to represent its distribution center and degree of dispersion, respectively, the module first computes the mean and variance of $h_T$ and $h_S$, and then uses the distribution center and degree of dispersion of $h_S$ to guide $h_T$, obtaining the distribution-migrated hash quantization code, which serves as a constraint term guiding the quantization of the distribution difference between the hash quantization codes extracted by the teacher model and the student model;
In the formulas, $h_{T\_S}$ denotes the distribution-migrated hash quantization code; $\mu(\cdot)$ denotes the mean of a hash quantization code, i.e. its distribution center; $\sigma(\cdot)$ denotes its variance, i.e. its degree of dispersion; $x_{hw}$ denotes the value of the hash quantization code at coordinates $(h, w)$; $H$ and $W$ denote the length and width of the hash quantization code; and $\epsilon$ is an offset value;
The DIT-Block thus migrates the hash quantization code $h_T$ to obtain the distribution-migrated hash quantization code $h_{T\_S}$; the tanh activation function then constrains the range of $h_{T\_S}$ to give the hash quantization code $H_{T\_S}$. The difference between $H_{T\_S}$ and $H_T$, i.e. the distribution migration loss, is measured by cosine similarity as follows:
$$L_{DIT} = 1 - \cos(H_T, H_{T\_S}) \tag{12}$$
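The DIT-Block's mean/variance migration resembles adaptive instance normalization (AdaIN). The NumPy sketch below assumes exactly that form, $h_{T\_S} = \sigma(h_S)\,\frac{h_T - \mu(h_T)}{\sigma(h_T)} + \mu(h_S)$, with $\sigma(\cdot)$ taken as the standard deviation $\sqrt{\mathrm{Var} + \epsilon}$, the statistics computed over all $H \times W$ entries of one code, and the code laid out as a 2-D array; these are assumptions rather than the patent's stated formulas, and the function names are hypothetical.

```python
import numpy as np


def mean_std(x: np.ndarray, eps: float = 1e-5):
    """Distribution center mu and spread sigma = sqrt(var + eps) over all H*W entries."""
    mu = x.mean()
    return mu, np.sqrt(((x - mu) ** 2).mean() + eps)


def dit_block(h_T: np.ndarray, h_S: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Shift and rescale the teacher code h_T to the student's mean and spread."""
    mu_T, sd_T = mean_std(h_T, eps)
    mu_S, sd_S = mean_std(h_S, eps)
    return sd_S * (h_T - mu_T) / sd_T + mu_S       # h_{T_S}


def dit_loss(h_T: np.ndarray, h_S: np.ndarray) -> float:
    """Eq. (12): L_DIT = 1 - cos(H_T, H_{T_S}), with H = tanh(h)."""
    a = np.tanh(h_T).ravel()
    b = np.tanh(dit_block(h_T, h_S)).ravel()
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(1)
h_T = rng.normal(0.3, 2.0, size=(8, 8))    # toy teacher code
h_S = rng.normal(-0.5, 0.5, size=(8, 8))   # toy student code
h_TS = dit_block(h_T, h_S)
print(mean_std(h_TS), mean_std(h_S))       # the two (mean, spread) pairs nearly match
print(dit_loss(h_T, h_S))
```

The migrated code keeps the teacher's pattern of values but adopts the student's statistics, so the loss isolates how much the distribution shift alone changes the teacher's code.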
Fourth step: construct a frequency component extraction module for quantifying the frequency component loss;
Data augmentation transforms the image to some extent. When quantizing codes, current image retrieval networks usually consider only the spatial information difference between codes and ignore the frequency-domain information of the codes, so they cannot fully exploit the relative change information caused by image transformation to capture the relative difference between image pairs. The invention therefore extracts the frequency-domain information of the codes with a frequency component extraction module (Frequency Component Extraction Block, FCE-Block). By focusing on the frequency-domain difference between the hash quantization code $H_T$ extracted by the teacher model and the hash quantization code $H_S$ extracted by the student model, the influence of data augmentation on the relative transformation relation of the images is captured during hash coding, which improves retrieval accuracy;
The FCE-Block extracts the frequency-domain information of the hash quantization code with the fast Fourier transform, converting the spatial-domain representation of the image code into a frequency-domain representation,
where $x$ denotes the hash quantization code input to the fast Fourier transform, $F(x)(u,v)$ is the information of the hash quantization code at frequency-domain coordinates $(u,v)$, $(h,w)$ denotes a spatial coordinate of the hash quantization code, $x(h,w)$ is the value of the hash quantization code at spatial coordinate $(h,w)$, and $(u,v)$ denotes the frequency-domain coordinates of the hash quantization code;
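For reference, the fast Fourier transform computes the standard two-dimensional discrete Fourier transform of an $H \times W$ array. In the unnormalized convention (an assumption here; some conventions scale by $1/(HW)$), with $j$ the imaginary unit, it reads:

```latex
F(x)(u, v) = \sum_{h=0}^{H-1} \sum_{w=0}^{W-1} x(h, w)\,
             e^{-j 2\pi \left( \frac{u h}{H} + \frac{v w}{W} \right)},
\qquad u = 0, \dots, H-1, \quad v = 0, \dots, W-1
```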
After the FFT converts the original representation from the spatial domain to the frequency domain, phase analysis by an arctangent transform extracts the frequency components;
where PH represents the frequency component of the hash quantization code, R(x')(u, v) is the real part of the hash quantization code x at frequency-domain coordinates (u, v), and I(x')(u, v) is the imaginary part of the hash quantization code x at frequency-domain coordinates (u, v);
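Read this way, the frequency component is the phase of each Fourier coefficient. A form consistent with the definitions above, assuming x' denotes the transformed code F(x), is:

```latex
PH(u, v) = \arctan\!\left( \frac{I(x')(u, v)}{R(x')(u, v)} \right), \qquad x' = F(x)
```

Plain arctan(I/R) folds the phase into (−π/2, π/2) and is undefined where R(x')(u, v) = 0; implementations often use the quadrant-aware atan2(I, R), which returns the full phase angle in [−π, π]. Which variant the invention uses is not stated here.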
The hash quantization codes H_T and H_S are passed through the frequency component extraction module to obtain the frequency components PH_T and PH_S, and the similarity between these frequency components is quantified by cosine similarity. This focuses the loss on the frequency-domain difference between the hash quantization codes extracted by the teacher model and by the student model, making fuller use of the relative transformations that data augmentation applies to the images. The frequency component loss is calculated as:
$$L_{ph} = 1 - \cos\left(PH_T, PH_S\right) \tag{15}$$
where L_ph is the frequency component loss; the closer the cosine similarity is to 1 (that is, the closer the loss is to 0), the more similar or correlated the two frequency components are;
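The frequency component extraction and equation (15) can be sketched end to end as below. To stay dependency-free, this sketch uses a naive O((HW)²) DFT instead of an FFT (an FFT such as numpy.fft.fft2 or torch.fft.fft2 returns the same coefficients faster), uses atan2 as the quadrant-aware arctangent, and assumes a length-K code is reshaped row by row into a grid of a chosen width. The helper names as_grid, dft2, frequency_components and frequency_loss are hypothetical.

```python
import cmath
import math


def as_grid(code, width):
    """Hypothetical helper: reshape a length-K code row by row into a (K // width) x width grid."""
    if len(code) % width:
        raise ValueError("code length must be divisible by width")
    return [code[i:i + width] for i in range(0, len(code), width)]


def dft2(x):
    """Naive 2-D DFT: F(x)(u, v) = sum over (h, w) of x(h, w) * exp(-j*2*pi*(u*h/H + v*w/W))."""
    H, W = len(x), len(x[0])
    return [
        [
            sum(
                x[h][w] * cmath.exp(-2j * math.pi * (u * h / H + v * w / W))
                for h in range(H)
                for w in range(W)
            )
            for v in range(W)
        ]
        for u in range(H)
    ]


def frequency_components(code, width):
    """PH: the phase of every Fourier coefficient, flattened; atan2 is the quadrant-aware arctan(I/R)."""
    return [math.atan2(c.imag, c.real) for row in dft2(as_grid(code, width)) for c in row]


def cosine(a, b, eps=1e-12):
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / max(norm, eps)


def frequency_loss(H_T, H_S, width):
    """Equation (15): L_ph = 1 - cos(PH_T, PH_S)."""
    return 1.0 - cosine(frequency_components(H_T, width), frequency_components(H_S, width))
```

Because the DFT is linear, multiplying a code by a positive constant scales every coefficient without changing its phase, so frequency_loss(code, [2 * v for v in code], width) is 0: the loss responds to the arrangement of values across the code rather than to their overall magnitude.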
Fifth step: constructing the objective optimization function;
The hash proxy loss L_HP, the self-distillation difference quantization loss L_Sdh, the binary cross-entropy loss L_bce-Q, the distribution migration loss L_DIT and the frequency component loss L_ph are combined to obtain the objective optimization function:
where N_B is the total number of samples and λ1, λ2, λ3 and λ4 are weights. In this embodiment, λ1 and λ2 are set to 0.1; λ3 is set to 1 when the feature extraction network is ResNet50 and to 0.7 when it is AlexNet. When the frequency-domain differences between images in the dataset are small, λ4 is adjusted for each code length to balance frequency-domain and spatial-domain quantization so that retrieval performance is optimal;
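The weight settings of this embodiment can be captured in a small configuration helper. The objective equation itself is not reproduced in this text, so which loss term each λ multiplies is not stated here and the sketch deliberately stops at selecting values. λ4 has no fixed value in the embodiment (it is tuned per code length), so the caller must supply it. The function name embodiment_weights and the backbone strings are hypothetical.

```python
def embodiment_weights(backbone: str, lambda4: float) -> dict:
    """Return the embodiment's loss weights; raises for backbones the text does not cover."""
    lambda3_by_backbone = {"resnet50": 1.0, "alexnet": 0.7}  # λ3 depends on the feature extractor
    key = backbone.lower()
    if key not in lambda3_by_backbone:
        raise ValueError(f"no embodiment value of lambda3 for backbone {backbone!r}")
    return {
        "lambda1": 0.1,  # λ1 = 0.1 in this embodiment
        "lambda2": 0.1,  # λ2 = 0.1 in this embodiment
        "lambda3": lambda3_by_backbone[key],
        "lambda4": lambda4,  # tuned per code length; no fixed value is given
    }
```

Raising on an unknown backbone, rather than defaulting, avoids silently applying a λ3 value the embodiment never specified.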
The deep hash network is trained on a dataset in mini-batches, where x_i represents an input image and … represents the label corresponding to that input image. The original images of the dataset undergo data augmentation to obtain strongly transformed and weakly transformed images, which are input into the student model and the teacher model, respectively; the training loss is measured by the objective optimization function until the loss converges, yielding the trained student model and teacher model;
Finally, the image to be queried is input into the trained student model or teacher model, which outputs the retrieved images, completing the image retrieval.
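The text states only that the trained model outputs the retrieved images. A common way deep hashing systems do this, assumed here rather than taken from the patent, is to binarize the tanh-range codes with the sign function and rank the database by Hamming distance to the query. A minimal sketch with hypothetical helper names:

```python
def binarize(code):
    """Sign binarization of a tanh-range code; sign(0) is taken as +1 here."""
    return [1 if v >= 0 else -1 for v in code]


def hamming(a, b):
    """Number of positions where two equal-length bit vectors differ."""
    return sum(1 for p, q in zip(a, b) if p != q)


def retrieve(query_code, database_codes, top_k):
    """Indices of the top_k database codes closest to the query in Hamming distance."""
    q = binarize(query_code)
    db = [binarize(c) for c in database_codes]
    ranked = sorted(range(len(db)), key=lambda i: (hamming(q, db[i]), i))
    return ranked[:top_k]
```

Ties are broken by database index so the ranking is deterministic; in a real system the database codes would be binarized once and stored rather than recomputed per query.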
Matters not described in this invention can be implemented with the prior art.
Claims (4)
1. A distributed difference self-adaptive image retrieval method based on space-frequency interaction, characterized by comprising the following steps:
First step: acquiring original images for training and applying data augmentation to the original images to obtain a strongly transformed image and a weakly transformed image;
Second step: constructing a deep hash network; inputting the strongly transformed image and the weakly transformed image into the student model and the teacher model of the deep hash network, respectively, to obtain the hash quantization code extracted by the student model and the hash quantization code extracted by the teacher model; and obtaining the self-distillation difference quantization loss L_Sdh, the hash proxy loss L_HP and the binary cross-entropy loss L_bce-Q from the hash quantization codes extracted by the student model and the teacher model;
$$L_{Sdh} = 1 - \cos\left(H_T, H_S\right) \tag{1}$$
$$L_{HP} = H\left(y, \mathrm{Softmax}\left(P_T / T\right)\right) \tag{4}$$
where H_T and H_S represent the hash quantization codes extracted by the teacher model and the student model, P_T denotes the proxy samples, T represents the temperature scaling hyperparameter, H(·) represents the error between the true class labels and the predicted class labels of the image, y represents the true class label sequence of the image, … represents a hash code value of 1, … represents the maximum likelihood estimate of the k-th hash bit, H_k represents the k-th bit of the hash quantization code H_T, and K represents the code length;
Third step: constructing a distribution migration module, which migrates the hash quantization code extracted by the teacher model using the distribution center and the degree of dispersion of the hash quantization code extracted by the student model to obtain a distribution-migrated hash quantization code, and obtains the distribution migration loss by quantifying the difference between the distribution-migrated hash quantization code and the hash quantization code extracted by the teacher model; the distribution migration loss L_DIT is expressed as:
$$L_{DIT} = 1 - \cos\left(H_T, H_{T\_S}\right) \tag{12}$$
where H_T_S is the range-constrained, distribution-migrated hash quantization code;
Fourth step: constructing a frequency component extraction module, which extracts the frequency-domain information of the hash quantization code by fast Fourier transform and extracts the frequency components of the hash quantization code by arctangent transform;
where x represents the hash quantization code input to the fast Fourier transform, F(x)(u, v) is the information of the hash quantization code at frequency-domain coordinates (u, v), (h, w) represents the spatial coordinates of the hash quantization code, x(h, w) represents the value of the hash quantization code at spatial coordinates (h, w), and H and W represent the length and width of the hash quantization code;
where PH represents the frequency component of the hash quantization code, R(x')(u, v) is the real part of the hash quantization code at frequency-domain coordinates (u, v), and I(x')(u, v) is the imaginary part of the hash quantization code at frequency-domain coordinates (u, v);
the frequency component loss L_ph is expressed as:
$$L_{ph} = 1 - \cos\left(PH_T, PH_S\right) \tag{15}$$
where PH_T represents the frequency component of the hash quantization code H_T, and PH_S represents the frequency component of the hash quantization code H_S;
Fifth step: constructing an objective optimization function, training the student model and the teacher model, and measuring the training loss with the objective optimization function; the objective optimization function is:
where N_B is the total number of samples and λ1, λ2, λ3 and λ4 are all weights;
Finally, inputting the image to be queried into the trained student model or teacher model and outputting the retrieved images.
2. The distributed difference self-adaptive image retrieval method based on space-frequency interaction according to claim 1, wherein in the third step, the hash quantization code H_T_S is obtained by passing the distribution-migrated hash quantization code h_T_S through a tanh activation function, and h_T_S is expressed as:
where μ(·) represents the mean of the hash quantization code, characterizing its distribution center; σ(·) represents the variance of the hash quantization code, characterizing its degree of dispersion; and h_T is the hash quantization code output by the layer normalization layer of the teacher model.
3. The distributed difference self-adaptive image retrieval method based on space-frequency interaction according to claim 1 or 2, wherein the student model and the teacher model each comprise a feature extraction network and a code generation network, the feature extraction network adopts a ResNet50 or an AlexNet network, and the code generation network comprises a fully connected layer, a layer normalization layer and a tanh activation function.
4. The distributed difference self-adaptive image retrieval method based on space-frequency interaction according to claim 3, wherein in the first step, the data augmentation includes random cropping, horizontal flipping, Gaussian blurring, and brightness, contrast and saturation transformations.
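The per-sample pieces recited in claims 1 and 3 can be sketched as follows: the code generation head (fully connected layer, layer normalization, tanh), the self-distillation loss of equation (1), and the hash proxy loss of equation (4). This is a dependency-free illustration under assumptions, not the claimed implementation: the layer normalization omits learnable scale and shift, H(·) is taken to be cross-entropy against a label distribution y (one-hot, or multi-hot normalized to sum to 1), and P_T is assumed to be the cosine similarity between the teacher code and one proxy vector per class. All function names, including the hypothetical proxy_scores, are illustrative.

```python
import math


def linear(x, weight, bias):
    """Fully connected layer: one row of `weight` per output bit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def layer_norm(z, eps=1e-5):
    """Layer normalization without learnable scale/shift (simplification)."""
    mu = sum(z) / len(z)
    var = sum((v - mu) ** 2 for v in z) / len(z)
    return [(v - mu) / math.sqrt(var + eps) for v in z]


def code_head(features, weight, bias):
    """Claim 3 head: FC -> LayerNorm -> tanh. Returns (h, H) with H = tanh(h)."""
    h = layer_norm(linear(features, weight, bias))
    return h, [math.tanh(v) for v in h]


def cosine(a, b, eps=1e-12):
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / max(norm, eps)


def self_distillation_loss(H_T, H_S):
    """Equation (1): L_Sdh = 1 - cos(H_T, H_S)."""
    return 1.0 - cosine(H_T, H_S)


def proxy_scores(H_T, proxies):
    """Hypothetical: P_T as the cosine similarity between H_T and each class proxy."""
    return [cosine(H_T, p) for p in proxies]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hash_proxy_loss(y, P_T, temperature):
    """Equation (4): L_HP = H(y, Softmax(P_T / T)), with H taken as cross-entropy (assumed)."""
    probs = softmax([p / temperature for p in P_T])
    return -sum(t * math.log(max(q, 1e-12)) for t, q in zip(y, probs))
```

The head returns both h and H = tanh(h), consistent with claim 2, which treats h_T as the layer-normalized output of the teacher model. A lower temperature T sharpens the softmax, so when the true class already has the highest proxy score the loss decreases as T decreases.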
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311424869.4A CN117370594A (en) | 2023-10-31 | 2023-10-31 | Distributed difference self-adaptive image retrieval method based on space-frequency interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117370594A (en) | 2024-01-09 |
Family ID: 89400187
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202311424869.4A CN117370594A (en) | Pending | 2023-10-31 | 2023-10-31 | Distributed difference self-adaptive image retrieval method based on space-frequency interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117370594A (en) |
2023-10-31: CN application CN202311424869.4A (publication CN117370594A), status active, pending
Similar Documents
Publication | Title |
---|---|
CN113743119B (en) | Chinese named entity recognition module, method and device and electronic equipment |
CN108806718B (en) | Audio identification method based on analysis of ENF phase spectrum and instantaneous frequency spectrum |
CN113392191B (en) | Text matching method and device based on multi-dimensional semantic joint learning |
CN111524593A (en) | Medical question-answering method and system based on context language model and knowledge embedding |
CN117408311B (en) | CNN, transformer and transfer learning-based small sample malicious website detection method |
Peng et al. | Detection of double JPEG compression with the same quantization matrix based on convolutional neural networks |
CN112270358A (en) | Code annotation generation model robustness improving method based on deep learning |
López-Cifuentes et al. | Attention-based knowledge distillation in scene recognition: the impact of a dct-driven loss |
CN114118058A (en) | Emotion analysis system and method based on fusion of syntactic characteristics and attention mechanism |
CN109934248B (en) | Multi-model random generation and dynamic self-adaptive combination method for transfer learning |
CN117370594A (en) | Distributed difference self-adaptive image retrieval method based on space-frequency interaction |
Iskra et al. | Temporal convolutional and recurrent networks for image captioning |
Ma et al. | Perceptual image hashing with bidirectional generative adversarial networks for copy detection |
Huang et al. | A method for extracting fingerprint feature of communication satellite signal |
CN114842246B (en) | Social media pressure type detection method and device |
CN115640577B (en) | Vulnerability detection method and system for binary Internet of things firmware program |
Onder | Frame similarity detection and frame clustering using variational autoencoders and k-means on news videos from different affinity groups |
CN118193774A (en) | Image retrieval method based on frequency spectrum information guidance |
CN117669593B (en) | Zero sample relation extraction method, system, equipment and medium based on equivalent semantics |
Pei et al. | Drp: Discrete rank pruning for neural network |
CN117521658B (en) | RPA process mining method and system based on chapter-level event extraction |
Zhang et al. | Class-based Core Feature Extraction Network for Few-shot Classification |
CN118820882A (en) | Digital twinning-based fault diagnosis method, system, electronic equipment and medium |
Yang et al. | Employee Fingerprint Identification Management System Based on Bidirectional ResNext Network with Triplet Loss |
CN116958585A (en) | Image processing method, device, electronic equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |