CN114610941A

CN114610941A - Cultural relic image retrieval system based on contrastive learning

Info

Publication number: CN114610941A
Application number: CN202210253589.0A
Authority: CN
Inventors: 周圆; 郭阿欣; 霍树伟; 陈克然; 李硕士
Original assignee: Tianjin University
Current assignee: Yiyuan Digital Beijing Technology Group Co ltd; Tianjin University
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2022-06-10
Anticipated expiration: 2042-03-15
Also published as: CN114610941B

Abstract

The invention discloses a cultural relic image retrieval system based on contrastive learning. First, a model is trained with a supervised contrastive learning algorithm to obtain a feature extractor whose features accurately represent the semantic information contained in an image. Retrieval is then performed by calculating the similarity between feature representations, and the accuracy of the retrieval results is further improved through average query expansion and database-side feature enhancement. Because the network is trained with a supervised contrastive learning algorithm, the feature extractor yields effective and discriminative feature representations of images.

Description

Image retrieval system of cultural relics based on contrastive learning

Technical field

The invention relates to feature extraction and comparison-matching technology for cultural relic image data, and more particularly to an image retrieval system for cultural relic data based on a contrastive learning algorithm.

Background

The review work at each stage of the trading and circulation of folk cultural relics relies too heavily on empirical analysis and visual judgment, which makes the process complicated and inefficient; this has created a demand for automatic computer retrieval of cultural relic images. Image retrieval aims to build an index between a query image and an image database and, according to some metric, to output the database images that match or resemble the query image. Given the current large volume of image data and the high demand for retrieval, there is an urgent need for high-fidelity digital information acquisition technology adapted to the diversity of folk cultural relics and the complexity of their scenes, and for methods of extracting and comparing the key feature information of cultural relic data.

Summary of the invention

To solve the problems in the prior art, the present invention provides a cultural relic image retrieval system based on contrastive learning, which addresses the low retrieval accuracy and efficiency and the high computing-power requirements of the prior art.

The technical scheme of the present invention is as follows.

A cultural relic image retrieval system based on contrastive learning comprises a feature extractor and a retrieval module. The feature extractor performs preprocessing and feature extraction; the retrieval module performs similarity calculation, and sorting and indexing. The image to be retrieved is input and is preprocessed and passed through feature extraction to obtain its feature vector; all images in the image database are likewise preprocessed and passed through feature extraction to obtain the corresponding image feature library. The retrieval module then calculates the similarity between the feature vector of the query image and the feature vectors in the image feature library, uses that similarity to sort and index the images in the image database, and returns the images matching the query image as the final retrieval result.

The feature extractor is trained with a supervised contrastive learning algorithm. The contrastive learning model uses two fully symmetric, parameter-sharing branches; each branch comprises data augmentation, an encoder network and a projection network, and the encoder network and projection network together constitute the feature extractor. Any image x is turned into two augmented views x_i and x_j by two different data augmentations. Because the upper and lower branches are fully symmetric, x_i in the upper branch is first converted by the encoder network into the feature representation h_i = f_θ(x_i); the projection network, a nonlinear transformation, then maps this to the final feature representation z_i = g_θ(h_i). Similarly, the augmented view in the lower branch passes through the same two nonlinear transformations to give the final feature representation z_j = g_θ(f_θ(x_j)).

The network training is as follows. N samples are randomly sampled to form a batch, denoted {x_k, y_k}_{k=1,2,...,N}, where y_k is the label of x_k. Data augmentation yields 2N samples, in which the two views derived from the same original sample form a data pair produced by two random data augmentations; the label information never changes during augmentation. In supervised contrastive learning, one sample has multiple positive samples: the samples in the batch with the same label are its positives, and the samples with a different label are its negatives. The known label information is thereby used effectively for supervision, so that samples of the same class lie closer together in the representation space while samples of different classes are pushed apart, which improves the discriminative power of the feature representation. The loss function of supervised contrastive learning is therefore defined as:

Here 1_{i≠j} ∈ {0, 1} is an indicator function that takes the value 1 if and only if i ≠ j and 0 otherwise; τ > 0 is a temperature parameter; z_{j(i)} denotes a positive sample of z_i; and z_i·z_{j(i)} denotes the inner product of the two vectors. The count appearing in the loss is the total number of samples in the batch that have the same label as sample z_i. The network is trained by optimizing the loss function in equation (4), and the trained encoder network and projection network are then used as the feature extractor for the query image and for the images in the image database.
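The loss equation was not preserved in this text. A form consistent with the symbol definitions above, following the commonly published supervised contrastive loss, is sketched here. The symbols ỹ_i (the label of augmented sample i) and N_{ỹ_i} (the number of original samples in the batch sharing that label), the 1/(2N_{ỹ_i} - 1) normalization, and the summation over the batch are assumptions rather than text recovered from the patent.

```latex
\mathcal{L}^{sup} = \sum_{i=1}^{2N} \mathcal{L}^{sup}_{i},
\qquad
\mathcal{L}^{sup}_{i} = \frac{-1}{2N_{\tilde{y}_i} - 1}
\sum_{j=1}^{2N} \mathbb{1}_{i \neq j}\, \mathbb{1}_{\tilde{y}_i = \tilde{y}_j}\,
\log \frac{\exp\left(z_i \cdot z_j / \tau\right)}
          {\sum_{k=1}^{2N} \mathbb{1}_{i \neq k} \exp\left(z_i \cdot z_k / \tau\right)}
```

Under this reading, every sample with the same label as z_i contributes one log-ratio term, so the single positive z_{j(i)} of the self-supervised case becomes a sum over all positives in the batch.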

The similarity between feature vectors is calculated either as the dot product of the L2-normalized feature vectors or, equivalently, as the cosine similarity between the feature vectors:

$$\mathrm{sim}(z_i, z_j) = \frac{z_i \cdot z_j}{\lVert z_i \rVert_2 \, \lVert z_j \rVert_2}$$

where z_i and z_j are one-dimensional vectors and ||·||₂ denotes the L2 norm of a vector.
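The similarity calculation and the ranking it drives can be sketched in a few lines of plain Python. This is a minimal illustration, not the patented implementation: the function names `cosine_similarity` and `rank_by_similarity` are hypothetical, features are plain lists of floats, and returning 0 for a zero vector is an assumption.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       feature_library: Sequence[Sequence[float]]) -> List[int]:
    """Return library indices ordered from most to least similar to the query."""
    sims = [cosine_similarity(query, feat) for feat in feature_library]
    return sorted(range(len(feature_library)), key=lambda i: -sims[i])


# Toy feature library with three 2-D features.
library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(rank_by_similarity([2.0, 0.1], library))
```

Because the cosine is scale-invariant, the query `[2.0, 0.1]` ranks the same way as its L2-normalized version would under a plain dot product.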

During indexing and sorting, average query expansion and database-side feature enhancement are used to further improve the accuracy of the retrieval results.

In average query expansion, the images in the database are first sorted by the similarity between the feature vector of the original query Q₀ and the feature vectors in the feature library, and the top m (m < 50) results are returned. The original query Q₀ and the m results are then averaged to form a new query Q_avg, which is used to generate the final retrieval result:

$$Q_{avg} = \frac{1}{m+1}\left(z_0 + \sum_{i=1}^{m} z_i\right)$$

where z₀ is the feature vector of the original query and z_i is the feature vector of the i-th result.
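A minimal sketch of average query expansion follows, assuming cosine similarity for the initial ranking and plain Python lists as features. The helper names (`_cos`, `_rank`, `average_query_expansion`) are hypothetical, and the default m = 5 is only an illustrative choice within the stated bound m < 50.

```python
import math


def _cos(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def _rank(query, library):
    """Library indices ordered from most to least similar to the query."""
    return sorted(range(len(library)), key=lambda i: -_cos(query, library[i]))


def average_query_expansion(z0, library, m=5):
    """Average the query with its top-m results, then rank again with the new query."""
    top = _rank(z0, library)[:m]
    q_avg = [
        (z0[d] + sum(library[i][d] for i in top)) / (len(top) + 1)
        for d in range(len(z0))
    ]
    return q_avg, _rank(q_avg, library)


library = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
q_avg, final_ranking = average_query_expansion([1.0, 0.0], library, m=2)
print(q_avg, final_ranking)
```

The second ranking pass is what produces the final result; the first pass only selects which m features are folded into the query.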

Database-side feature enhancement replaces each original image feature with a combination of the features of that image and of its nearby images in the database, so that the features of an image's neighborhood improve the quality of its representation. Pairwise similarities are first calculated between the feature vectors in the image feature library; then, for each image, the features of its K nearest images are summed, optionally with the sum weighted by the rank of each feature. In the weighting, r is the rank of an image feature and k is the total number of nearby images considered.
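The following sketch illustrates database-side feature enhancement under stated assumptions. The weighting formula itself is not preserved in this text, so the weight (k - r) / k with ranks starting at r = 0 is an assumed choice borrowed from common practice, not the patent's confirmed formula. The image itself is counted among its own k nearest features, and all function names are hypothetical.

```python
import math


def _cos(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def _rank(query, library):
    """Library indices ordered from most to least similar to the query."""
    return sorted(range(len(library)), key=lambda i: -_cos(query, library[i]))


def database_side_augmentation(library, k=5, weighted=False):
    """Replace every feature by the (optionally rank-weighted) sum of its k nearest features."""
    augmented = []
    for feat in library:
        nearest = _rank(feat, library)[:k]  # rank 0 is normally the image itself
        # Assumed weighting: (k - r) / k for rank r = 0, 1, ..., k - 1.
        weights = [(k - r) / k if weighted else 1.0 for r in range(len(nearest))]
        augmented.append([
            sum(w * library[i][d] for w, i in zip(weights, nearest))
            for d in range(len(feat))
        ])
    return augmented


library = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
print(database_side_augmentation(library, k=2))
print(database_side_augmentation(library, k=2, weighted=True))
```

The augmented features are not re-normalized here; with cosine similarity at query time that makes no difference to the ranking.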

Beneficial effects

In the cultural relic image retrieval system based on contrastive learning proposed by the present invention, the network is trained with a supervised contrastive learning algorithm to obtain a feature extractor that yields effective and discriminative feature representations of images, and retrieval accuracy is further improved by average query expansion and database-side feature enhancement. The user inputs an image of a cultural relic as the query, and the system retrieves from the image database and returns the results (one result, or several in ranked order) that match the query image. Quantitative and qualitative results on the common image dataset CIFAR-10 both indicate that the retrieval system is effective.

Description of drawings

Fig. 1: Cultural relic image retrieval system based on contrastive learning;

Fig. 2: Contrastive learning model;

Fig. 3: Quantitative and qualitative results of the system.

Detailed description

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

The cultural relic image retrieval system based on contrastive learning is shown in Fig. 1. An image to be retrieved is input and preprocessed (for example by rescaling and random flipping), and the corresponding query image feature is extracted. All images in the image database are likewise preprocessed and passed through feature extraction to obtain the image feature library. The similarity between the query feature and every feature in the library is then calculated and used to sort and index the images in the image database, and the images matching the query (one image, or several in ranked order) are returned as the final retrieval result. The training of the feature extractor and the implementation of the retrieval module are described in detail below.

1. Feature extractor

The feature extractor extracts features from the query image and from the images in the image database so that the image information is represented effectively. It is the basis on which the retrieval module measures the similarity between images from the similarity between their feature representations, and it is the key to the retrieval accuracy of the whole system. Here the feature extractor is trained with a supervised contrastive learning algorithm.

The core idea of contrastive learning is to pull a sample closer to its positive samples while pushing it away from its negative samples. In supervised contrastive learning, the label information in the dataset serves as supervision, and each sample has multiple positive and negative samples; training the feature extractor in this way makes it produce discriminative feature representations, which benefits the image retrieval task. The contrastive learning model is shown in Fig. 2. It uses two fully symmetric, parameter-sharing branches; each branch comprises data augmentation, an encoder network and a projection network, and the encoder network and projection network together form the feature extractor.

Any image x is turned into two augmented views x_i and x_j by two different data augmentations. Because the two branches are fully symmetric, take the upper branch as an example: x_i is first converted by the encoder network (typically a ResNet) into the feature representation h_i = f_θ(x_i). The projection network, a nonlinear transformation consisting of a two-layer MLP [FC -> BN -> ReLU -> FC], then maps this to the final feature representation z_i = g_θ(h_i). Similarly, the augmented view in the lower branch passes through the same two nonlinear transformations to give z_j = g_θ(f_θ(x_j)). The aim of contrastive learning is that positive samples lie close together in the representation space while negative samples lie far apart.

During network training, N samples are randomly sampled to form a batch, denoted {x_k, y_k}_{k=1,2,...,N}, where y_k is the label of x_k. Data augmentation yields 2N samples, in which the two views derived from the same original sample form a data pair produced by two random data augmentations; the label information never changes during augmentation. If the class supervision is ignored, the two views of a data pair are each other's positive samples, and each of them treats the other 2N-2 samples in the batch as negative samples. This is the self-supervised contrastive learning algorithm, whose loss function is defined as:

Here 1_{i≠k} ∈ {0, 1} is an indicator function that takes the value 1 if and only if i ≠ k and 0 otherwise; τ > 0 is a temperature parameter; z_{j(i)} denotes the positive sample of z_i; and z_i·z_{j(i)} denotes the inner product of the two vectors. The numerator of the loss encourages the similarity between a sample and its positive sample to be as high as possible, that is, their distance in the representation space to be as small as possible; the denominator encourages the similarity between a sample and its negative samples to be as low as possible, that is, their distance to be as large as possible.
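A self-supervised contrastive loss matching this description can be sketched as follows. Several points are assumptions rather than text from the patent: the views are stored so that indices (0, 1), (2, 3), ... form the data pairs, the per-sample losses are averaged over the batch, and the inputs are already L2-normalized projections. The function name `nt_xent_loss` is hypothetical.

```python
import math


def nt_xent_loss(z, tau=0.5):
    """Mean over i of -log( exp(z_i.z_j(i)/tau) / sum_{k != i} exp(z_i.z_k/tau) )."""
    n = len(z)  # n = 2N augmented views
    total = 0.0
    for i in range(n):
        # Assumed pairing: views 2k and 2k+1 come from the same original sample.
        j = i + 1 if i % 2 == 0 else i - 1
        logits = [sum(a * b for a, b in zip(z[i], z[k])) / tau for k in range(n)]
        # The indicator 1_{i != k} removes the anchor itself from the denominator.
        denom = sum(math.exp(logits[k]) for k in range(n) if k != i)
        total += -(logits[j] - math.log(denom))
    return total / n


aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # pairs agree
shuffled = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # pairs disagree
print(nt_xent_loss(aligned, tau=1.0), nt_xent_loss(shuffled, tau=1.0))
```

The loss is lower for `aligned`, where each view is most similar to its own pair partner, than for `shuffled`, where the partner is orthogonal to the anchor.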

The self-supervised contrastive loss thus treats every sample as its own class and cannot exploit labels in the dataset, that is, the knowledge that several samples belong to the same class. In supervised contrastive learning, by contrast, one sample has multiple positive samples: the samples in the batch with the same label are its positives, and the samples with a different label are its negatives. The known label information is thereby used effectively for supervision, so that samples of the same class lie closer together in the representation space while samples of different classes are pushed apart, which improves the discriminative power of the feature representation. The loss function of supervised contrastive learning is therefore defined as:

Here the count appearing in the loss is the total number of samples in the batch that have the same label as sample z_i. The network is trained by optimizing the loss function in equation (4), and the trained encoder network and projection network are then used as the feature extractor for the query image and for the images in the image database.
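The supervised variant can be sketched in the same style. This is an illustration under assumptions, not the patent's exact equation (4): each anchor's loss is averaged over its positives (all other batch samples with the same label), the per-anchor losses are averaged over the batch, anchors without any positive are skipped, and inputs are assumed L2-normalized. The function name is hypothetical.

```python
import math


def supervised_contrastive_loss(z, labels, tau=0.5):
    """Label-aware contrastive loss over a batch of augmented views z with labels."""
    n = len(z)
    total = 0.0
    for i in range(n):
        logits = [sum(a * b for a, b in zip(z[i], z[k])) / tau for k in range(n)]
        # Denominator runs over every sample except the anchor itself.
        log_denom = math.log(sum(math.exp(logits[k]) for k in range(n) if k != i))
        # Positives: every other sample in the batch with the anchor's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # assumption: an anchor with no positive contributes nothing
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n


z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(supervised_contrastive_loss(z, [0, 0, 1, 1], tau=1.0))  # labels match the geometry
print(supervised_contrastive_loss(z, [0, 1, 0, 1], tau=1.0))  # labels contradict it
```

When labels agree with the feature geometry the loss is lower, which is the behaviour the text describes: same-class samples are rewarded for being close, different-class samples for being far apart.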

2. Retrieval module

The feature extractor is applied to all images in the database to obtain the image feature library. At retrieval time, a query image is input and its feature vector is extracted. The retrieval module then calculates the similarity between the feature vector of the query image and the feature vectors in the image feature library, indexes and sorts by similarity, and outputs the sorted images as the final result (the number of images returned is set manually).

The similarity between feature vectors is generally calculated either as the dot product of the L2-normalized feature vectors or as the cosine similarity between the feature vectors:

$$\mathrm{sim}(z_i, z_j) = \frac{z_i \cdot z_j}{\lVert z_i \rVert_2 \, \lVert z_j \rVert_2}$$

During indexing and sorting, average query expansion and database-side feature enhancement are used to further improve the accuracy of the retrieval results. Average query expansion first sorts the images in the database by the similarity between the feature vector of the original query Q₀ and the feature vectors in the feature library and returns the top m (m < 50) results; the original query Q₀ and the m results are then averaged to form a new query Q_avg, which is used to generate the final retrieval result:

$$Q_{avg} = \frac{1}{m+1}\left(z_0 + \sum_{i=1}^{m} z_i\right)$$

where z₀ is the feature vector of the original query and z_i is the feature vector of the i-th result. Database-side feature enhancement replaces each original image feature with a combination of the features of that image and of its nearby images in the database, so that the features of an image's neighborhood improve the quality of its representation. Pairwise similarities are first calculated between the feature vectors in the image feature library; then, for each image, the features of its K nearest images are summed, optionally with the sum weighted by the rank of each feature.

In the proposed system, the encoder network in the feature extractor uses the ResNet50 architecture, the similarity between image feature vectors is calculated with cosine similarity, and the retrieval module uses the five results with the highest feature similarity when performing average query expansion and database-side feature enhancement. The feature extractor is first trained with the supervised contrastive learning algorithm, and the trained extractor is applied to the images in the image database to obtain the feature database. To query, the user inputs a query image; the system preprocesses it and extracts the query feature with the feature extractor, measures the similarity between the query feature and the features in the feature database, sorts and indexes by feature similarity, and finally outputs to the user the images matching the query (one image, or several in ranked order). The retrieval system achieves fast and effective retrieval: Table 1 shows its retrieval accuracy on the common image dataset CIFAR-10, and Fig. 3 shows retrieval results when 10 images are output.

Table 1

Dataset     Precision@1 (%)   Precision@10 (%)   mAP@all (%)
CIFAR-10    98.0              100                98.1
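The patent does not spell out how the metrics in Table 1 are computed. The sketch below uses the standard definitions of Precision@k and average precision for a single query, on the assumption that a retrieved image counts as correct when its class label equals the query's label. Function names are hypothetical, and mAP would be the mean of `average_precision` over all queries.

```python
from typing import Hashable, Sequence


def precision_at_k(ranked_labels: Sequence[Hashable], query_label: Hashable, k: int) -> float:
    """Fraction of the top-k retrieved items whose label matches the query label."""
    top = ranked_labels[:k]
    return sum(1 for label in top if label == query_label) / k


def average_precision(ranked_labels: Sequence[Hashable], query_label: Hashable) -> float:
    """Mean of the precision values taken at the rank of each relevant item."""
    hits = 0
    score = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            score += hits / rank
    return score / hits if hits else 0.0


# Labels of the gallery images in the order the system returned them (toy data).
ranked = ["cat", "dog", "cat", "cat"]
print(precision_at_k(ranked, "cat", 1), precision_at_k(ranked, "cat", 2))
print(average_precision(ranked, "cat"))
```

These toy values are only a check of the metric code; they are unrelated to the figures reported in Table 1.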

The technical solutions disclosed and proposed in the present invention can be implemented by those skilled in the art by drawing on the content herein and appropriately changing conditions, routes and similar aspects. Although the method and technique of the present invention have been described by way of preferred embodiments, those skilled in the art can evidently modify or recombine the methods and technical routes described herein without departing from the content, spirit and scope of the invention to arrive at the final technique. In particular, all similar substitutions and modifications that are apparent to those skilled in the art are deemed to be included within the spirit, scope and content of the invention. Matters not addressed by the present invention belong to the known art.

Claims

1. A cultural relic image retrieval system based on contrastive learning, characterized in that it comprises a feature extractor and a retrieval module; the feature extractor comprises preprocessing and feature extraction, and the retrieval module comprises similarity calculation, and sorting and indexing; the image to be retrieved is input and subjected to preprocessing and feature extraction to obtain the corresponding feature vector, while all images in the image database are subjected to preprocessing and feature extraction to obtain the corresponding image feature library; the retrieval module then calculates the similarity between the feature vector of the query image and the feature vectors in the image feature library, and the similarity is used to sort and index the images in the image database, the images matching the query image being obtained as the final retrieval result.

2. The cultural relic image retrieval system based on contrastive learning according to claim 1, characterized in that the feature extractor is trained with a supervised contrastive learning algorithm; the contrastive learning model uses two fully symmetric, parameter-sharing branches, each branch comprising data augmentation, an encoder network and a projection network, wherein the encoder network and the projection network constitute the feature extractor; any image x is turned into two augmented views x_i and x_j by two different data augmentations; because the upper and lower branches are fully symmetric, x_i in the upper branch is first converted by the encoder network into the feature representation h_i = f_θ(x_i); the projection network, a nonlinear transformation, then maps this feature representation to the final feature representation z_i = g_θ(h_i); similarly, the augmented view in the lower branch passes through the two nonlinear transformations to give the final feature representation z_j = g_θ(f_θ(x_j)).

3. The cultural relic image retrieval system based on contrastive learning according to claim 2, wherein the network training is: N samples are randomly sampled to form a batch, denoted {x_k, y_k}_{k=1,2,...,N}, where y_k is the label of x_k; data augmentation yields 2N samples, in which the two views derived from the same original sample form a data pair produced by two random data augmentations, the label information never changing during augmentation;

in supervised contrastive learning, one sample has multiple positive samples, that is, the samples in the batch with the same label are its positive samples and the samples with a different label are its negative samples, so that the known label information is used effectively for supervision, samples of the same class lie closer together in the representation space while samples of different classes are pushed apart, and the discriminative power of the feature representation is improved; the loss function of supervised contrastive learning is therefore defined as:

where 1_{i≠j} ∈ {0, 1} is an indicator function that takes the value 1 if and only if i ≠ j and 0 otherwise; τ > 0 is a temperature parameter; z_{j(i)} denotes a positive sample of z_i; and z_i·z_{j(i)} denotes the inner product of the two vectors.

4. The cultural relic image retrieval system based on contrastive learning according to claim 1, characterized in that the similarity between the feature vectors is calculated either as the dot product of the L2-normalized feature vectors or as the cosine similarity between the feature vectors:

$$\mathrm{sim}(z_i, z_j) = \frac{z_i \cdot z_j}{\lVert z_i \rVert_2 \, \lVert z_j \rVert_2}$$

where z_i and z_j are one-dimensional vectors and ||·||₂ denotes the L2 norm of a vector.

5. The cultural relic image retrieval system based on contrastive learning according to claim 1, wherein, during indexing and sorting, average query expansion and database-side feature enhancement are used to further improve the accuracy of the retrieval results.

6. The cultural relic image retrieval system based on contrastive learning according to claim 5, characterized in that, in the average query expansion, the images in the database are first sorted by the similarity between the feature vector of the original query Q₀ and the feature vectors in the feature library, and the top m (m < 50) results are returned; the original query Q₀ and the m results are then averaged to form a new query Q_avg, and the new query is used to generate the final retrieval result:

$$Q_{avg} = \frac{1}{m+1}\left(z_0 + \sum_{i=1}^{m} z_i\right)$$

where z₀ is the feature vector of the original query and z_i is the feature vector of the i-th result.

7. The cultural relic image retrieval system based on contrastive learning according to claim 5, characterized in that the database-side feature enhancement replaces each original image feature with a combination of the features of that image and of its nearby images in the database, so that the features of an image's neighborhood improve the quality of its representation; pairwise similarities are first calculated between the feature vectors in the image feature library, and then, for each image, the features of its K nearest images are summed, or the sum is weighted according to the rank of each feature, where r is the rank of an image feature and k is the total number of nearby images considered.