CN117370594A

CN117370594A - Distributed difference self-adaptive image retrieval method based on space-frequency interaction

Info

Publication number: CN117370594A
Application number: CN202311424869.4A
Authority: CN
Inventors: 张军; 张智铭; 杨召云; 张旭鹏; 高英杰; 王仪诺; 王朝权; 张程
Original assignee: Hebei University of Technology
Current assignee: Hebei University of Technology
Priority date: 2023-10-31
Filing date: 2023-10-31
Publication date: 2024-01-09

Abstract

The invention relates to a distribution difference adaptive image retrieval method based on space-frequency interaction. Original images for training are first obtained, and strong and weak transformed images are produced by data augmentation. A deep hash network is then constructed; the strong and weak transformed images are input into a student model and a teacher model respectively to obtain hash quantization codes, from which a self-distillation difference quantization loss, a hash proxy loss and a binary cross-entropy loss are obtained. Next, a distribution migration module is constructed, which uses the distribution center and dispersion of the hash quantization codes extracted by the student model to migrate the hash quantization codes extracted by the teacher model, yielding a distribution migration loss. A frequency component extraction module is constructed, which extracts the frequency-domain information of the hash quantization codes by fast Fourier transform and their frequency components by arctangent transform, yielding a frequency component loss. Finally, a target optimization function is constructed from all the losses, the student and teacher models are trained, and the trained student model or teacher model is used for image retrieval. By fully quantifying the difference information between hash codes, retrieval performance is improved.

Description

Distribution difference adaptive image retrieval method based on space-frequency interaction

Technical field

The invention belongs to the field of image retrieval within information retrieval, and specifically concerns a distribution difference adaptive image retrieval method based on space-frequency interaction.

Background

Image retrieval is a technique for finding images that match a query image; its main purpose is to retrieve, from a large image database, the images most semantically relevant to the query. Image retrieval has broad application value and plays a key role in many fields, including image search engines, medical image analysis and security monitoring. It helps people access and manage image data more easily, improving work efficiency and user experience and shortening decision-making. As databases grow, exhaustively traversing them for the required images consumes ever more manpower and time. Representing the database images and the query image by semantic features turns the search problem into a similarity judgment between semantic features, which greatly improves retrieval efficiency.

Hashing algorithms have significant advantages in speed and storage and are therefore widely used in large-scale image retrieval. They fall into two categories: traditional hashing and deep hashing. Early hashing algorithms were mostly traditional ones based on image features: features are extracted with hand-designed convolution kernels, and the database images matching the query are determined by the degree of similarity between features and returned as the retrieval result. Compared with still earlier retrieval that relied on manually entered metadata and tags, traditional hashing is easier to implement, but because of the hand-designed kernels and limited model depth, the generated hash codes contain little semantic information. With the development of deep learning, image retrieval has made great progress. Thanks to the stronger representation learning ability of deep neural networks, deep-learning-based retrieval algorithms obtain feature codes containing more high-level semantic information. To make retrieval faster, the features extracted by the deep network are then compressed into Hamming space, so that the similarity computation between discrete quantized feature codes becomes a Hamming distance computation between binary hash codes.
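The Hamming-space retrieval described above can be sketched as follows. This is a minimal illustration, not the patented method: the function names `binarize` and `hamming_rank` and the toy codes are hypothetical, and binarization by sign is an assumption.

```python
import numpy as np

def binarize(codes):
    """Map real-valued codes in [-1, 1] to {0, 1} bits by sign (assumed rule)."""
    return (np.asarray(codes) > 0).astype(np.uint8)

def hamming_rank(query_bits, db_bits):
    """Return database indices sorted by Hamming distance to the query."""
    dists = np.count_nonzero(db_bits != query_bits, axis=1)
    order = np.argsort(dists, kind="stable")
    return order, dists[order]

# Toy database of three 4-bit codes and one query (illustrative values only).
db = binarize([[0.9, -0.8, 0.7, -0.6],
               [-0.9, 0.8, -0.7, 0.6],
               [0.9, -0.8, -0.7, -0.6]])
query = binarize([0.8, -0.9, 0.6, -0.7])
order, dists = hamming_rank(query, db)
print(order.tolist(), dists.tolist())
```

Because the distance is a bit count, ranking a large database needs only integer comparisons, which is the speed and storage advantage the paragraph refers to.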

At present, the distance between hash codes is quantified by two kinds of loss: tuple losses and class-center losses. Tuple losses include pairwise loss and triplet loss. Pairwise loss takes a pair of images as a group, converts the extracted image features into codes, and uses the distance between the codes as the loss. Because the relationship between two samples is either similar or dissimilar, pairwise loss does pull similar images together and push dissimilar images apart, but the difference in magnitude between the two kinds of pair causes an imbalance between positive and negative samples, and intra-class and inter-class relationships cannot be captured. Computing the code distance for every pair of images also incurs a large time overhead. Triplet loss alleviates the imbalance to some extent, but because of the numbers of intra-class and inter-class samples, the trained model is biased and inter-class relationships still cannot be captured. Class-center loss constructs class centers in advance, by definition or by clustering, and converts the coding loss between image pairs into the distance between an image code and a class-center code. Compared with tuple losses, it does not require pairwise distance computation over all samples, which greatly reduces training time, and the relationships between class centers give the learned codes a degree of class structure.

Existing deep-learning-based image retrieval methods focus on how to better quantify the differences between image codes. However, making fuller use of the intra-class and inter-class relationships between images during coding, so that the codes decouple category information more completely, and exploiting the distribution differences between categories are also important for improving retrieval performance.

Summary of the invention

In view of the shortcomings of the prior art, the technical problem to be solved by the present invention is to provide a distribution difference adaptive image retrieval method based on space-frequency interaction.

The present invention solves the technical problem with the following technical solution:

A distribution difference adaptive image retrieval method based on space-frequency interaction, characterized in that the method comprises the following steps:

Step 1: obtain original images for training and perform data augmentation on them to obtain strong transformed images and weak transformed images;

Step 2: construct a deep hash network; input the strong and weak transformed images into the student model and the teacher model of the deep hash network respectively, to obtain the hash quantization codes extracted by the student model and by the teacher model; from these codes, obtain the self-distillation difference quantization loss L_Sdh, the hash proxy loss L_HP and the binary cross-entropy loss L_bce-Q;

L_Sdh = 1 - cos(H_T, H_S) (1)

L_HP = H(y, Softmax(P_T/T)) (4)

where H_T and H_S denote the hash quantization codes extracted by the teacher model and the student model, P_T is the proxy sample, T is the temperature scaling hyperparameter, H(·) denotes the quantified error between the true and predicted category labels of an image, and y denotes the true category label sequence of the image; [symbol missing in source] indicates that the value of a hash code bit is 1, [symbol missing in source] denotes the maximum-likelihood estimate of the k-th hash code bit, H_k denotes the k-th bit of the hash quantization code H_T, and K denotes the code length;

Step 3: construct a distribution migration module, which uses the distribution center and dispersion of the hash quantization codes extracted by the student model to migrate the hash quantization codes extracted by the teacher model, obtaining distribution-migrated hash quantization codes; the distribution migration loss is obtained by quantifying the difference between the distribution-migrated codes and the codes extracted by the teacher model; the distribution migration loss L_DIT is expressed as:

L_DIT = 1 - cos(H_T, H_{T_S}) (12)

where H_{T_S} is the range-constrained hash quantization code after distribution migration;

Step 4: construct a frequency component extraction module, which extracts the frequency-domain information of a hash quantization code by fast Fourier transform and then extracts its frequency components by arctangent transform;

where x denotes the hash quantization code input to the fast Fourier transform, F(x)(u, v) is the information of the hash quantization code at frequency-domain coordinates (u, v), (h, w) denotes the spatial-domain coordinates of the hash quantization code, x(h, w) denotes the value of the hash quantization code at spatial-domain coordinates (h, w), and H and W denote the length and width of the hash quantization code;

where PH denotes the frequency component of the hash quantization code, and R(x′)(u, v) and I(x′)(u, v) are the real and imaginary parts of the hash quantization code at frequency-domain coordinates (u, v);
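The symbol definitions in Step 4 match the standard two-dimensional discrete Fourier transform and its phase angle. Assuming that is what is intended (the absence of a normalisation factor and the reading of x′ as the transformed code F(x) are assumptions, not taken from the source), the two relations could be written as:

```latex
F(x)(u,v) = \sum_{h=0}^{H-1} \sum_{w=0}^{W-1} x(h,w)\,
            e^{-j 2\pi \left( \frac{uh}{H} + \frac{vw}{W} \right)}
\qquad
PH = \arctan\!\left( \frac{I(x')(u,v)}{R(x')(u,v)} \right)
```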

The frequency component loss L_ph is expressed as:

L_ph = 1 - cos(PH_T, PH_S) (15)

where PH_T is the frequency component of the hash quantization code H_T, and PH_S is the frequency component of the hash quantization code H_S;

Step 5: construct a target optimization function and train the student model and the teacher model, measuring the training loss with the target optimization function; the target optimization function is:

where N_B is the total number of samples, and λ₁, λ₂, λ₃ and λ₄ are weights;

The image to be queried is input into the trained student model or teacher model, and the retrieved images are output.

Compared with the prior art, the advantages and beneficial effects of the present invention are:

1. Current image retrieval methods based on self-distillation transform images by data augmentation, which changes the image distribution, and guide hash code generation by quantifying the hash code difference between image pairs. When the distribution difference between the two images is too large, however, directly quantifying their hash codes leaves the difference information produced by augmentation underused, and insufficiently quantified difference information degrades retrieval performance and accuracy. A distribution migration module is therefore designed. It uses the distribution center and dispersion of the hash quantization codes extracted by the student model to migrate the codes extracted by the teacher model, obtaining distribution-migrated hash quantization codes. The similarity between the distribution-migrated codes and the codes extracted by the student model is computed to help quantify the difference between the codes extracted by the teacher and student models, so that the distribution difference information produced by data augmentation is used more fully and retrieval performance improves.

2. Changes in image information are often more evident in the frequency domain. When quantizing image codes, current deep hash networks consider only code quantization in the spatial domain and ignore the relative transformation of the image itself. The present invention therefore designs a frequency component extraction module that analyzes the frequency components of the hash quantization codes; by capturing the relative changes produced by image transformation, it lets the generated image codes contain high-level semantic information with more pronounced differences. The module first converts the spatial-domain image code into frequency-domain form by fast Fourier transform, then performs phase analysis by arctangent transform, and captures relative changes in the frequency domain by quantifying the phase difference between codes, thereby quantifying the difference information produced by data augmentation more fully.

3. Experiments were conducted on the single-label dataset ImageNet and the multi-label datasets MS COCO, NUS-WIDE and NUS-WIDE_M. Compared with the currently popular image retrieval model DHD, the present invention improves performance on the single-label dataset, whose distribution is simpler, at code lengths of 32, 48 and 64, and achieves comparable results at a code length of 16. On the multi-label datasets it achieves improvements of varying degree at every code length. The results indicate that the method adapts better to, and performs better on, the relative changes produced by self-distillation data augmentation, and achieves better retrieval results.

Description of the drawings

Figure 1 is a schematic structural diagram of the deep hash network of the present invention in the training stage;

Figure 2 is a schematic diagram of the distribution migration module of the present invention;

Figure 3 is a schematic diagram of the frequency component extraction module of the present invention.

Detailed description

The technical solution of the present invention is described in detail below with reference to the drawings and specific embodiments, which do not limit the scope of protection of the present application.

The present invention is a distribution difference adaptive image retrieval method based on space-frequency interaction (hereinafter the method; see Figures 1 to 3), comprising the following steps:

Step 1: obtain original images; a number of original images form a dataset;

This embodiment uses four datasets: MS COCO, ImageNet, NUS-WIDE and NUS-WIDE_M. ImageNet is a single-label dataset and the other three are multi-label datasets. Every image is resized to 256*256 pixels, and each dataset is divided into a database, a training set and a test set. The ImageNet dataset contains 100 categories, with 128503, 13000 and 5000 images in the database, training set and test set respectively. The NUS-WIDE and NUS-WIDE_M datasets each contain 21 categories, with 149736, 10500 and 2100 images in the database, training set and test set respectively. The MS COCO dataset contains 80 categories, with 117218, 10000 and 5000 images in the database, training set and test set respectively.

The original images are augmented by operations such as random cropping, horizontal flipping, Gaussian blur, and brightness, contrast and saturation transformation. The overall strength of the augmentation applied to an image is characterized by a probability, yielding strong transformed images and weak transformed images. In practical applications the image to be queried may itself have been transformed, so data augmentation also simulates the image transformations encountered in practice.
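One way to read "strength characterized by a probability" is that each transform is applied independently with probability p, with a larger p for the strong view. The sketch below illustrates that reading only: the function `augment`, the values of p and the scaling ranges are hypothetical, and random cropping, Gaussian blur and saturation are omitted for brevity.

```python
import numpy as np

def augment(image, p, rng):
    """Apply each transform independently with probability p.

    image: float array in [0, 1] with shape (H, W, 3).
    p: hypothetical strength knob; a larger p gives a stronger overall transform.
    """
    out = image.copy()
    if rng.random() < p:                      # horizontal flip
        out = out[:, ::-1, :]
    if rng.random() < p:                      # brightness scaling
        out = np.clip(out * rng.uniform(0.6, 1.4), 0.0, 1.0)
    if rng.random() < p:                      # contrast scaling around the mean
        mean = out.mean()
        out = np.clip((out - mean) * rng.uniform(0.6, 1.4) + mean, 0.0, 1.0)
    return out

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))             # stand-in for a 256*256 image
weak_view = augment(image, p=0.2, rng=rng)    # would feed the teacher model
strong_view = augment(image, p=0.8, rng=rng)  # would feed the student model
```

Both views come from the same original image, so any difference between their codes is attributable to the augmentation alone.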

Step 2: construct a deep hash network;

The deep hash network (Deep-Hash-Distillation, DHD) uses a Siamese network as its basic framework and comprises a student model and a teacher model that share parameters. Each model consists of a feature extraction network and a code generation network. The feature extraction network is typically ResNet50 or AlexNet; the features it extracts are input into the code generation network, which generates the hash quantization code. The code generation network comprises a fully connected layer, a layer normalization layer and a tanh activation function; the tanh activation constrains the values of the hash quantization code output by the layer normalization layer to the range [-1, 1]. The strong and weak transformed images are input into the student model and the teacher model respectively to extract their hash quantization codes. For a distillation model, fixing a single branch effectively improves performance, and code generation is assisted by quantifying the distribution difference information between the two branches. The DHD network uses the distribution difference between the strong and weak transformed images to form the self-distillation difference quantization loss between the student model and the teacher model, computed by cosine similarity as follows:

L_Sdh = 1 - cos(H_T, H_S) (1)

H_T = tanh(h_T) (2)

H_S = tanh(h_S) (3)

where L_Sdh is the self-distillation difference quantization loss between the student model and the teacher model, h_T and h_S denote the hash quantization codes output by the layer normalization layers of the teacher model and the student model, H_T and H_S denote the hash quantization codes extracted by the teacher model and the student model, and tanh(·) denotes the tanh activation function;
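Equations (1) to (3) can be illustrated numerically as below. This is a sketch under assumptions: the random arrays stand in for the layer normalization outputs, and averaging the loss over the batch is an assumption the source does not state.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Row-wise cosine similarity between two (N, K) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, eps)

def self_distillation_loss(h_t, h_s):
    """Batch mean of 1 - cos(H_T, H_S) with H = tanh(h), per equations (1)-(3)."""
    H_t, H_s = np.tanh(h_t), np.tanh(h_s)
    return float(np.mean(1.0 - cosine_similarity(H_t, H_s)))

rng = np.random.default_rng(0)
h_t = rng.normal(size=(4, 16))               # stand-in teacher codes, K = 16
h_s = h_t + 0.1 * rng.normal(size=(4, 16))   # stand-in student codes, slightly perturbed
loss = self_distillation_loss(h_t, h_s)
```

The loss is 0 when the two codes point in the same direction and 2 when they are exactly opposed, so it penalises disagreement in code direction rather than in magnitude.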

In the code difference quantization stage, the similarity between the hash quantization code H_T and the proxy sample P_T is evaluated, where the proxy sample P_T represents the center of the image samples; the hash proxy loss is computed as follows:

L_HP = H(y, Softmax(P_T/T)) (4)

where T is the temperature scaling hyperparameter, H(·) denotes the quantified error between the true and predicted category labels of an image, the predicted category label being obtained by the Softmax function, and y denotes the true category label sequence of the image;
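A minimal numerical sketch of equation (4) follows. It rests on assumptions that are not in the source: H(·) is taken to be cross-entropy, P_T is taken to be the cosine similarity between each teacher code and a set of per-class proxy vectors, and the proxy values and T = 0.2 are arbitrary illustrative choices.

```python
import numpy as np

def hash_proxy_loss(H_t, proxies, y, T=0.2):
    """Cross-entropy between labels y and Softmax(P_T / T).

    H_t: (N, K) teacher codes; proxies: (C, K) class proxies (hypothetical
    parameters); y: (N, C) label distribution with rows summing to 1.
    P_T is assumed to be the cosine similarity of each code to each proxy.
    """
    Hn = H_t / np.linalg.norm(H_t, axis=1, keepdims=True)
    Pn = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    logits = (Hn @ Pn.T) / T
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(np.mean(-(y * log_prob).sum(axis=1)))

proxies = np.array([[1.0, 1.0, -1.0, -1.0],
                    [-1.0, 1.0, 1.0, -1.0]])
H_t = np.tanh(np.array([[2.0, 1.5, -1.0, -2.0]]))   # code close to proxy 0
loss_correct = hash_proxy_loss(H_t, proxies, np.array([[1.0, 0.0]]))
loss_wrong = hash_proxy_loss(H_t, proxies, np.array([[0.0, 1.0]]))
```

A smaller T sharpens the softmax, so a code that is only slightly closer to the wrong proxy is penalised more heavily.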

Deep hashing algorithms reduce the quantization error by regression, minimizing the distance between the hash code and its binary target. Because hash quantization coding quantizes each bit separately, code quantization is treated as binary classification, and the coding result of each bit is predicted by a Gaussian distribution estimator g(h), given by:

where m and σ are the mean and standard deviation of the Gaussian distribution estimator g(h); m takes the value +1 or -1: m is +1 when [condition missing in source] and -1 when [condition missing in source];

In summary, the binary cross-entropy (BCE) loss is computed as follows:

where [symbol missing in source] indicates that the value of a hash code bit is 1, [symbol missing in source] denotes the maximum-likelihood estimate of the k-th hash code bit, H_k denotes the k-th bit of the hash quantization code H_T, and K denotes the code length;

Step 3: construct a distribution migration module, used to quantify the distribution migration loss; the module uses the distribution center and dispersion of the hash quantization codes extracted by the student model to migrate the hash quantization codes extracted by the teacher model, obtaining distribution-migrated hash quantization codes; the distribution migration loss is obtained by quantifying the difference between the distribution-migrated codes and the codes extracted by the teacher model;

Existing deep hash networks consider only the transformation difference produced by augmenting the image. They quantify the difference between the transformed images through the teacher and student models, and assist the network in generating hash quantization codes solely by directly quantifying the difference information between the two models' codes. When the distributions of the strong and weak transformed images differ greatly, however, hash quantization codes obtained by quantifying the two images directly cannot make full use of the distribution difference information between them. Making full use of the distribution difference caused by data augmentation, and of the intra-class and inter-class relationships between images, so that the network builds hash codes carrying more image difference information, is therefore key to improving retrieval performance.

In the quantized code generation stage, the present invention therefore uses a distribution migration module (Distribution Information Transformation Block, DIT-Block). The distribution information of the hash quantization codes extracted by the student model guides the codes generated by the teacher model: the teacher's codes are migrated to obtain distribution-migrated hash quantization codes. Quantifying the distribution difference between the distribution-migrated codes and the codes extracted by the teacher model helps extract the difference between the student's and the teacher's hash quantization codes, so that the distribution difference information introduced by the augmentation transforms is used more fully. The DIT-Block thus directs more attention to the distribution differences produced by data augmentation.

The inputs of the DIT-Block are the hash quantization codes extracted by the teacher model and the student model. Because the tanh activation constrains the range of the codes, the range-constrained codes lose some information; the hash quantization codes h_T and h_S output by the layer normalization layers are therefore used as the inputs of the DIT-Block. The mean and variance of a hash quantization code are assumed to represent its distribution center and dispersion respectively. The mean and variance of h_T and h_S are computed first; the distribution center and dispersion of h_S then guide the migration of h_T, giving the distribution-migrated hash quantization code. This code serves as a constraint term guiding the quantification of the distribution difference between the codes extracted by the teacher model and by the student model;

where h_{T_S} denotes the distribution-migrated hash quantization code; μ(·) denotes the mean of a hash quantization code, i.e. its distribution center; σ(·) denotes the variance of a hash quantization code, i.e. its dispersion; x_hw denotes the value of the hash quantization code at coordinates (h, w); H and W denote the length and width of the hash quantization code; and ε is an offset value;

The DIT-Block thus migrates the distribution of the hash quantization code h_T, giving the distribution-migrated code h_{T_S}. The tanh activation function then constrains the range of h_{T_S}, giving the hash quantization code H_{T_S}. The difference between H_{T_S} and H_T, i.e. the distribution migration loss, is measured by cosine similarity as follows:

L_DIT = 1 - cos(H_T, H_{T_S}) (12)
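The migration formulas themselves are not reproduced in this text. The sketch below shows one plausible reading of Step 3, in which the teacher code is standardised with its own statistics and then re-scaled and re-centred with the student's. Everything specific here is an assumption: the exact form of the transfer, the use of the square root of the variance, the placement of ε, computing the statistics per code, and the function names `distribution_transfer` and `dit_loss`.

```python
import numpy as np

def distribution_transfer(h_t, h_s, eps=1e-5):
    """Re-centre and re-scale teacher codes with the student's statistics.

    h_t, h_s: (N, K) layer normalization outputs of teacher and student.
    Statistics are taken per code, over its K positions (an assumption).
    """
    mu_t = h_t.mean(axis=1, keepdims=True)
    mu_s = h_s.mean(axis=1, keepdims=True)
    sd_t = np.sqrt(h_t.var(axis=1, keepdims=True) + eps)
    sd_s = np.sqrt(h_s.var(axis=1, keepdims=True) + eps)
    return sd_s * (h_t - mu_t) / sd_t + mu_s

def dit_loss(h_t, h_s):
    """Batch mean of 1 - cos(H_T, H_{T_S}) after the tanh range constraint, eq. (12)."""
    H_t = np.tanh(h_t)
    H_ts = np.tanh(distribution_transfer(h_t, h_s))
    cos = np.sum(H_t * H_ts, axis=1) / (
        np.linalg.norm(H_t, axis=1) * np.linalg.norm(H_ts, axis=1))
    return float(np.mean(1.0 - cos))

rng = np.random.default_rng(0)
h_t = rng.normal(size=(4, 32))                 # stand-in teacher codes
h_s = 1.5 * rng.normal(size=(4, 32)) + 0.3     # student codes with shifted statistics
h_ts = distribution_transfer(h_t, h_s)
loss = dit_loss(h_t, h_s)
```

Under this reading the migrated code keeps the teacher's bit pattern but carries the student's mean and spread, so the loss grows only when that change of statistics alters the direction of the range-constrained code.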

Step 4: construct a frequency component extraction module, used to quantify the frequency component loss;

Data augmentation transforms the image to some degree. When quantizing codes, current image retrieval networks usually consider only the spatial-domain information difference between codes and ignore their frequency-domain information, so the relative change information caused by the image transformation cannot be fully used to capture the relative difference between an image pair. The present invention therefore extracts the frequency-domain information of the codes with a frequency component extraction module (Frequency Component Extraction Block, FCE-Block). By attending to the frequency-domain information difference between the hash quantization code H_T extracted by the teacher model and the hash quantization code H_S extracted by the student model, it captures the effect that the relative transformation applied to the image by augmentation has on the hash coding process, thereby improving retrieval precision;

The FCE-Block module extracts the frequency-domain information of a hash quantization code through the fast Fourier transform, which converts the spatial-domain representation of the image code into a frequency-domain representation;

where x denotes the hash quantization code input to the fast Fourier transform, F(x)(u, v) is the information of the hash quantization code at frequency-domain coordinates (u, v), (h, w) denotes the spatial-domain coordinates of the hash quantization code, x(h, w) denotes the value of the hash quantization code at spatial-domain coordinates (h, w), and (u, v) denotes the frequency-domain coordinates of the hash quantization code;

After the fast Fourier transform converts the original representation from the spatial domain to the frequency domain, phase analysis is performed through the arctangent transform to extract the frequency component;

where PH denotes the frequency component of the hash quantization code, R(x′)(u, v) is the real part of the hash quantization code x at frequency-domain coordinates (u, v), and I(x′)(u, v) is the imaginary part of the hash quantization code x at frequency-domain coordinates (u, v);
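The two operations described above, a fast Fourier transform followed by an arctangent of the imaginary and real parts, can be sketched with NumPy. The sketch assumes the hash quantization code is arranged as a 2-D array indexed by (h, w); the function name `frequency_component` is hypothetical, and `np.arctan2` is used here as a quadrant-aware arctangent, which may differ from the exact form of the patent's formula.

```python
import numpy as np

def frequency_component(code):
    """Return the phase PH(u, v) of the 2-D FFT of a hash quantization code."""
    x = np.asarray(code, dtype=float)
    spectrum = np.fft.fft2(x)  # F(x)(u, v), complex-valued
    # Arctangent of imaginary part over real part, evaluated per (u, v).
    return np.arctan2(spectrum.imag, spectrum.real)
```

For example, a 2 x 2 code whose only non-zero entry is x(0, 1) = 1 has the spectrum F(u, v) = (-1)^v, so the phase magnitude is 0 in the column v = 0 and π in the column v = 1.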

The hash quantization codes H_T and H_S pass through the frequency component extraction module to give the frequency components PH_T and PH_S. Cosine similarity quantifies the similarity between the two frequency components, so that the method attends to the frequency-domain difference between the hash quantization code extracted by the teacher model and that extracted by the student model, and makes fuller use of the relative transformation that data enhancement applies to the images. The frequency component loss is calculated as:

L_ph = 1 - cos(PH_T, PH_S)  (15)

where L_ph is the frequency component loss; the closer the cosine similarity is to 1, that is, the closer the loss value is to 0, the more similar or correlated the two frequency components are;
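Evaluating formula (15) on a few hand-picked vectors shows the range of the loss. The snippet is an illustrative sketch; the function name and the sample vectors are hypothetical, and the small `eps` term is added only to avoid division by zero.

```python
import numpy as np

def frequency_component_loss(ph_t, ph_s, eps=1e-12):
    """L_ph = 1 - cos(PH_T, PH_S) for two already-extracted frequency components."""
    ph_t = np.asarray(ph_t, dtype=float).ravel()
    ph_s = np.asarray(ph_s, dtype=float).ravel()
    cos = np.dot(ph_t, ph_s) / (np.linalg.norm(ph_t) * np.linalg.norm(ph_s) + eps)
    return 1.0 - float(cos)

ph = np.array([0.3, -1.2, 2.0, 0.0])
same = frequency_component_loss(ph, ph)                         # identical components
opposite = frequency_component_loss(ph, -ph)                    # opposite components
orthogonal = frequency_component_loss([1.0, 0.0], [0.0, 1.0])   # uncorrelated components
```

Identical components give a loss near 0, uncorrelated components give 1, and opposite components give a loss near 2, so minimizing L_ph pulls the teacher and student frequency components together.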

Step 5: Construct the objective optimization function;

Combining the hash proxy loss L_HP, the self-distillation difference quantization loss L_Sdh, the binary cross-entropy loss L_bce-Q, the distribution migration loss L_DIT and the frequency component loss L_ph gives the objective optimization function:

where N_B is the total number of samples and λ₁, λ₂, λ₃ and λ₄ are weights. In this embodiment, λ₁ and λ₂ are set to 0.1; λ₃ is set to 1 when the feature extraction network is ResNet50 and to 0.7 when it is AlexNet. When the frequency-domain differences between the images in the data set are small, λ₄ is adjusted for each code length to balance frequency-domain and spatial-domain quantization so that the retrieval performance is optimal;
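A weighted combination of per-sample losses averaged over N_B samples can be sketched as below. The objective formula itself is not reproduced in this text, so the pairing of each weight with a loss term (λ₁ with L_Sdh, λ₂ with L_bce-Q, λ₃ with L_DIT, λ₄ with L_ph, and L_HP unweighted) is an assumption made purely for illustration, as are the function name, the dictionary keys and the λ₄ value of 0.5.

```python
import numpy as np

def objective(losses, weights):
    """Weighted sum of per-sample losses, averaged over the N_B samples.

    losses:  dict mapping a loss name to an array of N_B per-sample values.
    weights: dict mapping a loss name to its weight; missing names use 1.0.
    """
    n_b = len(next(iter(losses.values())))
    total = np.zeros(n_b)
    for name, values in losses.items():
        total += weights.get(name, 1.0) * np.asarray(values, dtype=float)
    return float(total.mean())

# ASSUMED pairing of weights and losses for a ResNet50 backbone.
# 0.1, 0.1 and 1.0 come from the embodiment; 0.5 for L_ph is a placeholder.
weights_resnet50 = {"L_Sdh": 0.1, "L_bce_Q": 0.1, "L_DIT": 1.0, "L_ph": 0.5}
```

Keeping the weights in a dictionary makes the assumed pairing explicit and easy to change once the actual formula is consulted.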

The deep hash network is trained on the data set using mini-batches of samples, each consisting of an input image x_i and its corresponding label. The original images of the data set undergo data enhancement to produce strong transformation images and weak transformation images, which are input into the student model and the teacher model, respectively. The training loss is measured by the objective optimization function until the loss converges, yielding the trained student model and teacher model;

The image to be queried is input into the trained student model or teacher model, and the retrieved images are output, completing the image retrieval.
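The text does not spell out how retrieved images are ranked. A common approach in hash-based retrieval, assumed here and not taken from the patent, is to binarize the tanh-range codes by sign and rank database images by Hamming distance to the query code. The function names and `top_k` parameter are hypothetical.

```python
import numpy as np

def binarize(codes):
    """Map real-valued codes in (-1, 1) to binary codes in {-1, +1} by sign."""
    return np.where(np.asarray(codes) >= 0, 1, -1).astype(np.int8)

def hamming_rank(query_code, database_codes, top_k=5):
    """Return indices and Hamming distances of the top_k closest database codes."""
    q = binarize(query_code)              # shape (K,)
    db = binarize(database_codes)         # shape (N, K)
    distances = np.count_nonzero(db != q, axis=1)
    order = np.argsort(distances, kind="stable")[:top_k]
    return order, distances[order]
```

In this sketch `query_code` would be the model output for the query image and `database_codes` the outputs for the images being searched.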

Matters not described in the present invention follow the prior art.

Claims

1. A distribution difference adaptive image retrieval method based on space-frequency interaction, characterized in that the method includes the following steps:

Step 1: Obtain the original image for training and perform data enhancement on the original image to obtain a strong transformation image and a weak transformation image;

Step 2: Construct a deep hash network; input the strong and weak transformation images into the student model and the teacher model of the deep hash network, respectively, to obtain the hash quantization code extracted by the student model and the hash quantization code extracted by the teacher model; based on the hash quantization codes extracted by the student model and the teacher model, obtain the self-distillation difference quantization loss L_Sdh, the hash proxy loss L_HP and the binary cross-entropy loss L_bce-Q;

L_Sdh = 1 - cos(H_T, H_S)  (1)

L_HP = H(y, Softmax(P_T / T))  (4)

In the formulas, H_T and H_S denote the hash quantization codes extracted by the teacher model and the student model, respectively; P_T is the proxy sample; T is the temperature scaling hyperparameter; H(·) denotes the quantified error between the real category label and the predicted category label of the image; y denotes the real category label sequence of the image; [symbol not reproduced] indicates that the value of the hash code is 1; [symbol not reproduced] denotes the maximum likelihood estimate of the k-th hash code; H_k denotes the k-th bit of the hash quantization code H_T; and K denotes the code length;

Step 3: Construct a distribution migration module. The distribution migration module uses the distribution center and degree of dispersion of the hash quantization code extracted by the student model to migrate the hash quantization code extracted by the teacher model, obtaining the distribution-migrated hash quantization code; the difference between the distribution-migrated hash quantization code and the hash quantization code extracted by the teacher model is quantified to obtain the distribution migration loss; the distribution migration loss L_DIT is expressed as:

L_DIT = 1 - cos(H_T, H_{T_S})  (12)

In the formula, H_{T_S} is the range-constrained hash quantization code after distribution migration;

Step 4: Construct a frequency component extraction module. The frequency component extraction module extracts the frequency domain information of hash quantization coding through fast Fourier transform, and then extracts the frequency component of hash quantization coding through arctangent transformation;

where x denotes the hash quantization code input to the fast Fourier transform, F(x)(u, v) is the information of the hash quantization code at frequency-domain coordinates (u, v), (h, w) denotes the spatial-domain coordinates of the hash quantization code, x(h, w) denotes the value of the hash quantization code at spatial-domain coordinates (h, w), and H and W denote the length and width of the hash quantization code;

where PH denotes the frequency component of the hash quantization code, R(x′)(u, v) is the real part of the hash quantization code at frequency-domain coordinates (u, v), and I(x′)(u, v) is the imaginary part of the hash quantization code at frequency-domain coordinates (u, v);

The frequency component loss L_ph is expressed as:

L_ph = 1 - cos(PH_T, PH_S)  (15)

where PH_T is the frequency component of the hash quantization code H_T, and PH_S is the frequency component of the hash quantization code H_S;

Step 5: Construct the objective optimization function, train the student model and teacher model, and measure the training loss through the objective optimization function; the objective optimization function is:

where N_B is the total number of samples, and λ₁, λ₂, λ₃ and λ₄ are weights;

Input the image to be queried into the trained student model or teacher model, and output the retrieved image.

2. The distribution difference adaptive image retrieval method based on space-frequency interaction according to claim 1, characterized in that, in the third step, the hash quantization code H_{T_S} is obtained by passing the distribution-migrated hash quantization code h_{T_S} through the tanh activation function, and the distribution-migrated hash quantization code h_{T_S} is expressed as:

where μ(·) denotes the mean of a hash quantization code, which characterizes its distribution center; σ(·) denotes the variance of a hash quantization code, which characterizes its degree of dispersion; and h_T is the hash quantization code output by the layer normalization layer of the teacher model.

3. The distribution difference adaptive image retrieval method based on space-frequency interaction according to claim 1 or 2, characterized in that the student model and the teacher model each include a feature extraction network and a code generation network; the feature extraction network is ResNet50 or AlexNet, and the code generation network includes a fully connected layer, a layer normalization layer and a tanh activation function.

4. The distribution difference adaptive image retrieval method based on space-frequency interaction according to claim 3, characterized in that, in the first step, data enhancement includes random cropping, horizontal flipping, Gaussian blur, and brightness, contrast and saturation transformations.