CN114610941A - Cultural relic image retrieval system based on comparison learning - Google Patents

Cultural relic image retrieval system based on comparison learning

Info

Publication number
CN114610941A
Authority
CN
China
Prior art keywords
feature
image
samples
query
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210253589.0A
Other languages
Chinese (zh)
Other versions
CN114610941B (en
Inventor
周圆
郭阿欣
霍树伟
陈克然
李硕士
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yiyuan Digital Beijing Technology Group Co ltd
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202210253589.0A
Publication of CN114610941A
Application granted
Publication of CN114610941B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cultural relic image retrieval system based on contrastive learning. First, a model is trained with a supervised contrastive learning algorithm to obtain a good feature extractor, so that the extracted features accurately represent the semantic information contained in an image. Retrieval is then performed by computing the similarity between feature representations, and the accuracy of the retrieval results is further improved through average query expansion and database-side feature augmentation. Because the network is trained with supervised contrastive learning, the resulting feature extractor yields effective and discriminative feature representations of images.

Description

Cultural relic image retrieval system based on contrastive learning

Technical field

The invention relates to feature extraction and comparison-matching technology for cultural relic image data, and more particularly to an image retrieval system for cultural relic data based on a contrastive learning algorithm.

Background

Review work at each stage of the trading and circulation of folk cultural relics relies too heavily on experience-based analysis and visual inspection, which makes the process cumbersome and inefficient. This has created a need for automatic, computer-based retrieval of cultural relic images. Image retrieval aims to build an index between a query image and an image database and, according to some metric, to output the images in the database that match or resemble the query image. Given the current large volume of image data and high retrieval demand, there is an urgent need for high-fidelity digital information acquisition technology adapted to the diversity of folk cultural relics and the complexity of scenes, and for methods to extract and compare the key feature information of cultural relic data.

Summary of the invention

To solve the problems in the prior art, the present invention provides a cultural relic image retrieval system based on contrastive learning, which addresses the low retrieval accuracy and efficiency and the high computing-power requirements of the prior art.

The technical solution of the present invention is:

A cultural relic image retrieval system based on contrastive learning comprises a feature extractor and a retrieval module. The feature extractor performs preprocessing and feature extraction; the retrieval module performs similarity calculation, sorting and indexing. An image to be retrieved is input, then preprocessed and passed through feature extraction to obtain its feature vector. All images in the image database are likewise preprocessed and passed through feature extraction to obtain the corresponding image feature library. The retrieval module then calculates the similarity between the feature vector of the query image and the feature vectors in the image feature library, sorts and indexes the images in the image database by that similarity, and returns the images that match the query image as the final retrieval result.

The feature extractor network is trained with a supervised contrastive learning algorithm. The contrastive learning model uses two fully symmetric branches with shared parameters. Each branch comprises data augmentation, an encoder network and a projection network, and the encoder network and projection network together form the feature extractor. Any image $x$ is turned into two augmented views $x_i$ and $x_j$ by two different data augmentations. Because the upper and lower branches are fully symmetric, consider the upper branch: $x_i$ is first converted by the encoder network into the feature representation $h_i = f_\theta(x_i)$; the projection network, a nonlinear transformation, then maps it to the final feature representation $z_i = g_\theta(h_i)$. Similarly, the augmented view in the lower branch passes through the two nonlinear transformations to give the final feature representation $z_j = g_\theta(f_\theta(x_j))$.

The network training is as follows. $N$ samples are randomly sampled to form a batch, denoted $\{x_k, y_k\}_{k=1,2,\ldots,N}$, where $y_k$ is the label of $x_k$. Data augmentation yields $2N$ samples $\{\tilde{x}_k, \tilde{y}_k\}_{k=1,2,\ldots,2N}$, where $\tilde{x}_{2k}$ and $\tilde{x}_{2k-1}$ are the pair obtained from the same sample by two random data augmentations; the label information never changes during data augmentation. In supervised contrastive learning, one sample has multiple positive samples: samples in the batch with the same label are positive samples, and samples with a different label are negative samples. This makes effective use of the known label information for supervision, so that samples of the same class move closer together in the representation space while samples of different classes move apart, which improves the discriminative power of the feature representation. The loss function of supervised contrastive learning is therefore defined as:

$$\mathcal{L}^{sup} = \sum_{i=1}^{2N} \mathcal{L}_i^{sup} \tag{3}$$

$$\mathcal{L}_i^{sup} = \frac{-1}{2N_{\tilde{y}_i} - 1} \sum_{j=1}^{2N} \mathbb{1}_{i \neq j} \cdot \mathbb{1}_{\tilde{y}_i = \tilde{y}_j} \cdot \log \frac{\exp(z_i \cdot z_j / \tau)}{\sum_{k=1}^{2N} \mathbb{1}_{i \neq k} \exp(z_i \cdot z_k / \tau)} \tag{4}$$

where $\mathbb{1}_{i \neq j} \in \{0, 1\}$ is an indicator function that equals 1 if and only if $i \neq j$ and 0 otherwise; $\tau > 0$ is the temperature parameter; a sample $z_j$ with $\tilde{y}_j = \tilde{y}_i$ is a positive sample of $z_i$, and $z_i \cdot z_j$ denotes the inner product between the vectors; $N_{\tilde{y}_i}$ denotes the total number of samples in the batch that have the same label as sample $z_i$. The network is trained by optimizing the loss function in equation (4), and the trained encoder network and projection network are used as the feature extractor to extract features from the query image and from the images in the image database.

The similarity between feature vectors is calculated either as the dot product of the L2-normalized feature vectors or, equivalently, as the cosine similarity between the feature vectors:

$$\mathrm{sim}(z_i, z_j) = \frac{z_i \cdot z_j}{\|z_i\|_2 \, \|z_j\|_2}$$

where $z_i$ and $z_j$ are one-dimensional vectors and $\|\cdot\|_2$ denotes the L2 norm of a vector.
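The equivalence between the two similarity options can be checked numerically. The following is a minimal NumPy sketch; the function names `l2_normalize` and `cosine_similarity` and the example vectors are illustrative and do not come from the patent.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    v = np.asarray(v, dtype=np.float64)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, eps)

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    return float(np.dot(a, b) / denom)

z_i = np.array([3.0, 4.0])
z_j = np.array([4.0, 3.0])

# Both routes give the same value: 24 / (5 * 5) = 0.96
via_cosine = cosine_similarity(z_i, z_j)
via_normalized_dot = float(np.dot(l2_normalize(z_i), l2_normalize(z_j)))
```

Normalizing the feature library once and storing it means each query only needs a matrix-vector product, which is presumably why the two forms are treated as interchangeable.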

During indexing and sorting, average query expansion and database-side feature augmentation are used to further improve the accuracy of the retrieval results.

In average query expansion, the images in the database are first sorted by the similarity between the feature vector of the original query $Q_0$ and the feature vectors in the feature library, and the top $m$ ($m < 50$) results are returned. The original query $Q_0$ and the $m$ results are then averaged to form a new query $Q_{avg}$, which is used to generate the final retrieval result:

$$Q_{avg} = \frac{1}{m+1}\left(z_0 + \sum_{i=1}^{m} z_i\right)$$

where $z_0$ is the feature vector of the original query and $z_i$ is the feature vector of the $i$-th result.
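The two-pass procedure above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the patented implementation: the function name `average_query_expansion` is hypothetical, features are L2-normalized before averaging (the text does not say whether they are), and zero-length feature vectors are not handled.

```python
import numpy as np

def average_query_expansion(query, features, m=5):
    """Return the expanded query Q_avg and the final ranking.

    query:    (d,) feature vector z_0 of the original query.
    features: (n, d) image feature library, one row per database image.
    m:        number of top results averaged with the query.
    """
    q = query / np.linalg.norm(query)
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    # First pass: top-m database images by cosine similarity to the query.
    first_pass = np.argsort(-(feats @ q))[:m]
    # Average the query with its top-m results (m + 1 vectors in total).
    q_avg = (q + feats[first_pass].sum(axis=0)) / (len(first_pass) + 1)
    # Second pass: re-rank the whole database with the expanded query.
    final_rank = np.argsort(-(feats @ q_avg))
    return q_avg, final_rank

library = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
q_avg, rank = average_query_expansion(np.array([1.0, 0.2]), library, m=1)
```

Because the library rows are unit length, ranking by the dot product with `q_avg` gives the same order as ranking by cosine similarity.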

Database-side feature augmentation replaces each original image feature with a combination of the feature of that image and the features of its nearby images in the database, with the aim of using the features of an image's neighborhood to improve the quality of the image representation. Similarities are first computed pairwise between the feature vectors in the image feature library. For any image, the features of its $k$ nearest images are then summed, or the sum is weighted according to the rank of each feature:

$$w_r = \frac{k - r}{k}$$

where $r$ is the rank of the image feature and $k$ is the total number of neighboring images considered.
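A rough NumPy sketch of database-side feature augmentation follows. Several details are assumptions made for illustration only: the function name, the rank-based weight $(k - r)/k$ with $r$ starting at 0 for the nearest neighbor, keeping the image's own feature with weight 1, and re-normalizing the result.

```python
import numpy as np

def database_side_augmentation(features, k=5, weighted=True):
    """Replace each feature with a combination of itself and its k nearest neighbours."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T                         # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)                # an image is not its own neighbour
    k = min(k, len(feats) - 1)
    neighbours = np.argsort(-sims, axis=1)[:, :k]  # (n, k) indices, nearest first
    if weighted:
        weights = (k - np.arange(k)) / k           # ASSUMED: rank r = 0..k-1 gets (k - r) / k
    else:
        weights = np.ones(k)                       # plain sum of the k neighbours
    neighbour_sum = (weights[None, :, None] * feats[neighbours]).sum(axis=1)
    augmented = feats + neighbour_sum              # ASSUMED: own feature kept with weight 1
    return augmented / np.linalg.norm(augmented, axis=1, keepdims=True)

library = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
augmented = database_side_augmentation(library, k=1, weighted=False)
```

The pairwise similarity matrix costs memory quadratic in the database size, so a larger collection would likely need an approximate nearest-neighbor index instead; that is outside what the text describes.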

Beneficial effects:

In the cultural relic image retrieval system based on contrastive learning proposed by the present invention, the network is trained with a supervised contrastive learning algorithm to obtain the feature extractor, which extracts effective and discriminative feature representations from images. Average query expansion and database-side feature augmentation further improve retrieval accuracy. A user inputs a cultural relic image as a query, and the retrieval system searches the image database and returns the results that match the query image (either one result or several sorted results). Both the quantitative and the qualitative results obtained on the common image dataset CIFAR-10 indicate that the retrieval system is effective.

Description of drawings

Fig. 1: Cultural relic image retrieval system based on contrastive learning;

Fig. 2: Contrastive learning model;

Fig. 3: Quantitative and qualitative results of the system.

Detailed description

The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

The cultural relic image retrieval system based on contrastive learning is shown in Fig. 1. An image to be retrieved is input and preprocessed (for example by rescaling and random flipping), and the corresponding query image features are extracted. All images in the image database are likewise preprocessed and passed through feature extraction to obtain the corresponding image feature library. The similarity between the query image features and every feature in the image feature library is then calculated, and the images in the image database are sorted and indexed by that similarity, giving the images that match the query image (one image or several sorted images) as the final retrieval result. The training of the feature extractor and the implementation of the retrieval module are described in detail below.

1. Feature extractor

The feature extractor must extract features from the query image and from the images in the image database so that the image information is represented effectively. These representations are the basis on which the retrieval module measures the similarity between images, and they are the key to the retrieval accuracy of the whole system. Here the feature extractor is trained with a supervised contrastive learning algorithm.

The core idea of contrastive learning is to pull a sample closer to its positive samples while pushing it away from its negative samples. In the supervised contrastive learning algorithm, the label information in the dataset serves as supervision, and each sample has multiple positive and negative samples. Training the feature extractor in this way lets it produce discriminative feature representations, which benefits the image retrieval task. The contrastive learning model is shown in Fig. 2. It uses two fully symmetric branches with shared parameters. Each branch comprises data augmentation, an encoder network and a projection network, and the encoder network and projection network together form the feature extractor.

Any image $x$ is turned into two augmented views $x_i$ and $x_j$ by two different data augmentations. Because the upper and lower branches are fully symmetric, take the upper branch as an example: $x_i$ is first converted by the encoder network (generally with ResNet as the model structure) into the feature representation $h_i = f_\theta(x_i)$. The projection network, a nonlinear transformation consisting of a two-layer MLP [FC -> BN -> ReLU -> FC], then maps it to the final feature representation $z_i = g_\theta(h_i)$. Similarly, the augmented view in the lower branch passes through the two nonlinear transformations to give the final feature representation $z_j = g_\theta(f_\theta(x_j))$. The purpose of contrastive learning is to make positive samples close to each other in the representation space and negative samples far apart.
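The projection network described above, [FC -> BN -> ReLU -> FC], can be illustrated with a small NumPy forward pass. This is only a sketch: the layer sizes, the random weights, the batch normalization without learned scale and shift, and the final L2 normalization are assumptions, and random vectors stand in for the encoder outputs $h_i$.

```python
import numpy as np

def projection_head(h, w1, b1, w2, b2, eps=1e-5):
    """Map encoder outputs h of shape (batch, d_in) to projections z of shape (batch, d_out)."""
    a = h @ w1 + b1                                          # FC
    a = (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)  # BN using batch statistics only
    a = np.maximum(a, 0.0)                                   # ReLU
    z = a @ w2 + b2                                          # FC
    # ASSUMED: unit-length outputs, so that inner products are cosine similarities.
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 2048, 512, 128     # ASSUMED sizes; 2048 is the pooled output width of ResNet-50
w1 = rng.normal(scale=0.02, size=(d_in, d_hidden))
b1 = np.zeros(d_hidden)
w2 = rng.normal(scale=0.02, size=(d_hidden, d_out))
b2 = np.zeros(d_out)

h = rng.normal(size=(8, d_in))             # stand-in for encoder outputs of 8 augmented views
z = projection_head(h, w1, b1, w2, b2)
```

Both branches would call the same function with the same weights, which is what parameter sharing between the symmetric branches amounts to.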

During network training, $N$ samples are randomly sampled to form a batch, denoted $\{x_k, y_k\}_{k=1,2,\ldots,N}$, where $y_k$ is the label of $x_k$. Data augmentation yields $2N$ samples $\{\tilde{x}_k, \tilde{y}_k\}_{k=1,2,\ldots,2N}$, where $\tilde{x}_{2k}$ and $\tilde{x}_{2k-1}$ are the pair obtained from the same sample by two random data augmentations; the label information never changes during data augmentation. If the class supervision is ignored, the pair $(\tilde{x}_{2k-1}, \tilde{x}_{2k})$ are positive samples of each other, while $\tilde{x}_{2k-1}$ and each of the other $2N-2$ samples in the batch apart from $\tilde{x}_{2k}$ are negative samples of each other. This is the self-supervised contrastive learning algorithm, whose loss function is defined as:

$$\mathcal{L}^{self} = \sum_{i=1}^{2N} \mathcal{L}_i^{self} \tag{1}$$

$$\mathcal{L}_i^{self} = -\log \frac{\exp(z_i \cdot z_{j(i)} / \tau)}{\sum_{k=1}^{2N} \mathbb{1}_{i \neq k} \exp(z_i \cdot z_k / \tau)} \tag{2}$$

where $\mathbb{1}_{i \neq k} \in \{0, 1\}$ is an indicator function that equals 1 if and only if $i \neq k$ and 0 otherwise; $\tau > 0$ is the temperature parameter; $z_{j(i)}$ denotes the positive sample of $z_i$, and $z_i \cdot z_{j(i)}$ denotes the inner product between the vectors. The numerator of the loss rewards high similarity between a sample and its positive sample, that is, a small distance in the representation space; the denominator rewards low similarity between a sample and its negative samples, that is, a large distance in the representation space.
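A self-supervised contrastive loss of this form can be sketched in NumPy as follows. The function name, the default temperature and the row ordering (rows 0 and 1 are the two views of the first image, rows 2 and 3 of the second, and so on) are assumptions made for the example.

```python
import numpy as np

def self_supervised_contrastive_loss(z, tau=0.5):
    """Loss summed over all 2N anchors.

    z: (2N, d) L2-normalised projections, ordered so that rows 2k and 2k+1
       (0-based) are the two views of the same image. ASSUMED ordering.
    """
    z = np.asarray(z, dtype=np.float64)
    n = z.shape[0]
    logits = z @ z.T / tau
    np.fill_diagonal(logits, -np.inf)      # indicator 1_{i != k}: drop self-similarity
    # Log of the denominator, computed stably with the log-sum-exp trick.
    row_max = logits.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    positive = np.arange(n) ^ 1            # partner index: 0<->1, 2<->3, ...
    log_numerator = logits[np.arange(n), positive]
    return float(np.sum(log_denominator - log_numerator))

aligned = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
misaligned = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
loss_aligned = self_supervised_contrastive_loss(aligned, tau=1.0)
loss_misaligned = self_supervised_contrastive_loss(misaligned, tau=1.0)
```

In the `aligned` batch each view coincides with its partner, so the loss is lower than in the `misaligned` batch, where each view coincides with a negative instead.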

As can be seen, the self-supervised contrastive loss treats each sample as its own class, so it cannot exploit labels in the dataset, that is, the knowledge that several samples belong to the same class. In supervised contrastive learning, one sample has multiple positive samples: samples in the batch with the same label are positive samples, and samples with a different label are negative samples. This makes effective use of the known label information for supervision, so that samples of the same class move closer together in the representation space while samples of different classes move apart, which improves the discriminative power of the feature representation. The loss function of supervised contrastive learning is therefore defined as:

$$\mathcal{L}^{sup} = \sum_{i=1}^{2N} \mathcal{L}_i^{sup} \tag{3}$$

$$\mathcal{L}_i^{sup} = \frac{-1}{2N_{\tilde{y}_i} - 1} \sum_{j=1}^{2N} \mathbb{1}_{i \neq j} \cdot \mathbb{1}_{\tilde{y}_i = \tilde{y}_j} \cdot \log \frac{\exp(z_i \cdot z_j / \tau)}{\sum_{k=1}^{2N} \mathbb{1}_{i \neq k} \exp(z_i \cdot z_k / \tau)} \tag{4}$$

where $N_{\tilde{y}_i}$ denotes the total number of samples in the batch that have the same label as sample $z_i$. The network is trained by optimizing the loss function in equation (4), and the trained encoder network and projection network are used as the feature extractor to extract features from the query image and from the images in the image database.

2. Retrieval module

The feature extractor is applied to all images in the database to obtain the corresponding image feature library. At retrieval time, a query image is input and its feature vector is extracted. The retrieval module then calculates the similarity between the feature vector of the query image and the feature vectors in the image feature library, indexes and sorts the images by that similarity, and outputs the sorted images as the final result (the number of images returned is set manually).
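The ranking step can be sketched as a single matrix-vector product followed by a sort. This is a minimal NumPy illustration; the function name `retrieve` and the `top_k` parameter are hypothetical names for the manually set number of returned images.

```python
import numpy as np

def retrieve(query_feature, feature_library, top_k=10):
    """Return (indices, scores) of the top_k database images, most similar first."""
    q = query_feature / np.linalg.norm(query_feature)
    lib = feature_library / np.linalg.norm(feature_library, axis=1, keepdims=True)
    scores = lib @ q                                   # cosine similarity to every database image
    order = np.argsort(-scores, kind="stable")[:top_k]
    return order, scores[order]

library = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
indices, scores = retrieve(np.array([2.0, 0.1]), library, top_k=2)
```

In practice the library would be normalized once when it is built rather than on every query; it is done inside the function here only to keep the example self-contained.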

The similarity between feature vectors is generally calculated either as the dot product of the L2-normalized feature vectors or, equivalently, as the cosine similarity between the feature vectors:

$$\mathrm{sim}(z_i, z_j) = \frac{z_i \cdot z_j}{\|z_i\|_2 \, \|z_j\|_2} \tag{5}$$

During indexing and sorting, average query expansion and database-side feature augmentation are used to further improve the accuracy of the retrieval results. In average query expansion, the images in the database are first sorted by the similarity between the feature vector of the original query $Q_0$ and the feature vectors in the feature library, and the top $m$ ($m < 50$) results are returned. The original query $Q_0$ and the $m$ results are then averaged to form a new query $Q_{avg}$, which is used to generate the final retrieval result.

$$Q_{avg} = \frac{1}{m+1}\left(z_0 + \sum_{i=1}^{m} z_i\right) \tag{6}$$

where $z_0$ is the feature vector of the original query and $z_i$ is the feature vector of the $i$-th result. Database-side feature augmentation replaces each original image feature with a combination of the feature of that image and the features of its nearby images in the database, with the aim of using the features of an image's neighborhood to improve the quality of the image representation. Similarities are first computed pairwise between the feature vectors in the image feature library. For any image, the features of its $k$ nearest images are then summed, or the sum is weighted according to the rank of each feature:

$$w_r = \frac{k - r}{k} \tag{7}$$

where $r$ is the rank of the image feature and $k$ is the total number of neighboring images considered.

In the proposed cultural relic image retrieval system based on contrastive learning, the encoder network in the feature extractor uses the ResNet50 architecture, and the similarity between image feature vectors is the cosine similarity. For average query expansion and database-side feature augmentation, the retrieval module uses the five results with the highest feature similarity. The feature extractor is first trained with the supervised contrastive learning algorithm, and the trained feature extractor is then applied to the images in the image database to obtain the feature database. To run a query, the user inputs a query image; the system preprocesses it and extracts its query feature with the feature extractor, measures the similarity between the query feature and the features in the feature database, sorts and indexes by feature similarity, and finally outputs to the user the images that match the query image (either one image or several sorted images). The retrieval system achieves fast and effective retrieval. Table 1 shows its retrieval accuracy on the common image dataset CIFAR-10, and Fig. 3 shows retrieval results with 10 output images.

Table 1

            Precision@1 (%)   Precision@10 (%)   mAP@all (%)
CIFAR-10    98.0              100                98.1
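The patent does not spell out how the metrics in Table 1 are computed. Assuming the usual definitions, where a returned image counts as relevant when it has the same class as the query, Precision@k and the per-query average precision behind mAP could be sketched as follows; the function names and the example ranking are illustrative only.

```python
import numpy as np

def precision_at_k(relevant, k):
    """relevant: boolean array in ranked order (True = same class as the query)."""
    return float(np.mean(relevant[:k]))

def average_precision(relevant):
    """Average precision over the full ranking (the per-query term of mAP@all)."""
    relevant = np.asarray(relevant, dtype=bool)
    if not relevant.any():
        return 0.0
    hits = np.cumsum(relevant)                     # relevant items seen up to each rank
    ranks = np.arange(1, len(relevant) + 1)
    return float(np.sum((hits / ranks)[relevant]) / relevant.sum())

ranking = np.array([True, False, True, True, False])
p_at_1 = precision_at_k(ranking, 1)                # 1.0
p_at_3 = precision_at_k(ranking, 3)                # 2 of the first 3 are relevant
ap = average_precision(ranking)                    # (1/1 + 2/3 + 3/4) / 3
```

mAP@all would then be the mean of `average_precision` over all query images, each evaluated against its full ranking of the database.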

Those skilled in the art can implement the technical solutions disclosed and proposed by the present invention by drawing on the content herein and appropriately changing conditions, routes and other aspects. Although the method and preparation technology of the present invention have been described through preferred embodiments, those skilled in the art can clearly modify or recombine the methods and technical routes described herein to realize the final preparation technology without departing from the content, spirit and scope of the present invention. In particular, all similar substitutions and modifications that are obvious to those skilled in the art are deemed to be included in the spirit, scope and content of the present invention. Matters not covered by the present invention belong to the known art.

Claims (7)

1. A cultural relic image retrieval system based on contrastive learning, characterized in that it comprises a feature extractor and a retrieval module, the feature extractor comprising preprocessing and feature extraction, and the retrieval module comprising sorting and indexing and similarity calculation; an image to be retrieved is input, then preprocessed and passed through feature extraction to obtain the corresponding feature vector, while all images in the image database are preprocessed and passed through feature extraction to obtain the corresponding image feature library; the retrieval module then calculates the similarity between the feature vector of the query image and the feature vectors in the image feature library, and sorts and indexes the images in the image database by that similarity, to obtain the images matching the query image as the final retrieval result.
2. The cultural relic image retrieval system based on contrastive learning according to claim 1, characterized in that the feature extractor network is trained with a supervised contrastive learning algorithm; the contrastive learning model uses two fully symmetric branches with shared parameters, each branch comprising data augmentation, an encoder network and a projection network, wherein the encoder network and the projection network form the feature extractor; any image $x$ is turned into two augmented views $x_i$ and $x_j$ by two different data augmentations; because the upper and lower branches are fully symmetric, in the upper branch $x_i$ is first converted by the encoder network into the corresponding feature representation $h_i = f_\theta(x_i)$; the projection network, a nonlinear transformation, then maps the feature representation to the final feature representation $z_i = g_\theta(h_i)$; similarly, the augmented view of the lower branch passes through the two nonlinear transformations to give the final feature representation $z_j = g_\theta(f_\theta(x_j))$.
3. The cultural relic image retrieval system based on contrastive learning according to claim 2, characterized in that the network training is: $N$ samples are randomly sampled to form a batch, denoted $\{x_k, y_k\}_{k=1,2,\ldots,N}$, where $y_k$ is the label of $x_k$; data augmentation yields $2N$ samples $\{\tilde{x}_k, \tilde{y}_k\}_{k=1,2,\ldots,2N}$, where $\tilde{x}_{2k}$ and $\tilde{x}_{2k-1}$ are the pair obtained from the same sample by two random data augmentations, and the label information never changes during data augmentation;
in supervised contrastive learning, one sample has multiple positive samples: samples in the batch with the same label are treated as positive samples, and samples with a different label are treated as negative samples; this makes effective use of the known label information for supervised learning, so that samples of the same class lie closer together in the representation space while samples of different classes lie farther apart, which improves the discriminative power of the feature representation; the loss function of supervised contrastive learning is therefore defined as:
Figure FDA0003547981990000014
Figure FDA0003547981990000015
where 1_{i≠j} ∈ {0, 1} is the indicator function, equal to 1 if and only if i ≠ j and 0 otherwise; τ > 0 is the temperature parameter; z_{j(i)} denotes a positive sample of z_i, and z_i·z_{j(i)} denotes the inner product of the two vectors;
Figure FDA0003547981990000016
denotes the total number of samples in the batch that have the same label as sample z_i; the network is trained by optimizing the loss function in equation (4), and the trained encoder network and projection network are used as the feature extractor to extract features from the query image and from the images in the image database.
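The loss in claim 3 survives here only as equation images, so the following is a minimal sketch rather than the patent's exact formula. It assumes the commonly used form of the supervised contrastive loss: each anchor is compared against every other sample in the batch, and the log-softmax terms are averaged over the anchor's positives (same label). The function names, the default `tau=0.1`, and the choice to normalize by the number of positives are assumptions made for illustration.

```python
import math

def l2_normalize(v):
    # Assumes v is a non-zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(features, labels, tau=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    features: list of projected feature vectors z (one per augmented view)
    labels:   list of class labels, same length as features
    tau:      temperature parameter (> 0); 0.1 is an assumed default
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        # Scaled inner products of anchor i with every other sample (i != j).
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / tau
                  for j in range(n) if j != i}
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Numerically stable log of the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability over the anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

With features that already cluster by class, labels that agree with the clusters give a much lower loss than labels that cut across them, which is the behaviour the claim describes: same-class samples are pulled together and different-class samples pushed apart.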
4. The cultural relic image retrieval system based on contrastive learning according to claim 1, characterized in that the similarity between feature vectors is calculated as the dot product of the L2-normalized feature vectors or as the cosine similarity between the feature vectors:
Figure FDA0003547981990000021
where z_i and z_j are one-dimensional vectors and ||·||_2 denotes the L2 norm of a vector.
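The two similarity options named in claim 4 give the same value, which the sketch below illustrates in plain Python. The function names are hypothetical, and the inputs are assumed to be non-zero vectors of equal length.

```python
import math

def l2_normalize(v):
    # Divide by the L2 norm so the vector has unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    # Inner product divided by the product of the two L2 norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalized_dot(a, b):
    # Dot product taken after L2-normalizing both vectors.
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

In practice the normalized form is convenient for a feature library: the database vectors can be normalized once when the library is built, after which each query needs only dot products.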
5. The cultural relic image retrieval system based on contrastive learning according to claim 1, characterized in that, during indexing and ranking, average query expansion and database-side feature augmentation are used to further improve the accuracy of the retrieval results.
6. The cultural relic image retrieval system based on contrastive learning according to claim 5, characterized in that the average query expansion first ranks the images in the database by the similarity between the feature vector of the original query Q_0 and the feature vectors in the feature library and returns the top m (m < 50) results, then averages the original query Q_0 with the m results to form a new query Q_avg, and uses the new query to generate the final retrieval result;
Figure FDA0003547981990000022
where z_0 is the feature vector of the original query and z_i is the feature vector of the i-th result.
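Average query expansion as described in claim 6 can be sketched as follows. Because the formula itself is only available as an image, the sketch assumes a plain arithmetic mean over the original query and its top m results, with cosine similarity for ranking; `rank`, `average_query_expansion` and the default `m=10` are hypothetical names and values.

```python
import math

def cosine(a, b):
    # Assumes non-zero vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank(query, database):
    """Return database indices sorted by descending similarity to the query."""
    return sorted(range(len(database)),
                  key=lambda i: cosine(query, database[i]),
                  reverse=True)

def average_query_expansion(query, database, m=10):
    """Re-rank the database with the mean of the query and its top-m results."""
    top = rank(query, database)[:m]
    vectors = [query] + [database[i] for i in top]
    # Element-wise mean over the original query and the m retrieved features.
    q_avg = [sum(column) / len(vectors) for column in zip(*vectors)]
    return q_avg, rank(q_avg, database)
```

The second ranking pass costs one more sweep over the feature library, so m is kept small; the claim bounds it at m < 50.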
7. The cultural relic image retrieval system based on contrastive learning according to claim 5, characterized in that the database-side feature augmentation replaces each original image feature with a combination of the features of that image and of the images close to it in the database, so as to use the features of the image's neighborhood to improve the quality of the image representation; similarities are first calculated pairwise between the feature vectors in the image feature library, and for any image the features of its K nearest images are summed, or the sum is weighted according to the rank of each feature:
Figure FDA0003547981990000023
where r is the rank of an image feature and k is the total number of nearby images considered.
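The database-side augmentation in claim 7 can be sketched as below. The weighting formula is only available as an image, so the rank weight used here, (k + 1 - r) / (k + 1) with the image itself at rank r = 0, is an assumed linear decay and not the patent's formula. Including the image itself in the sum and L2-normalizing the result are likewise assumptions, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    # Assumes non-zero vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def database_augmentation(features, k=2, weighted=False):
    """Replace each feature by a combination of itself and its k nearest features."""
    augmented = []
    for i, feat in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        # The k most similar other images, most similar first.
        neighbors = sorted(others,
                           key=lambda j: cosine(feat, features[j]),
                           reverse=True)[:k]
        members = [i] + neighbors  # rank 0 is the image itself
        combined = [0.0] * len(feat)
        for r, j in enumerate(members):
            # Assumed linear decay by rank; uniform weights when weighted=False.
            w = (k + 1 - r) / (k + 1) if weighted else 1.0
            for d, value in enumerate(features[j]):
                combined[d] += w * value
        # Normalize so the augmented features stay comparable by dot product.
        norm = math.sqrt(sum(x * x for x in combined))
        augmented.append([x / norm for x in combined])
    return augmented
```

The pairwise similarity step is quadratic in the size of the feature library, but it runs offline when the library is built, so it adds nothing to query time.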
CN202210253589.0A 2022-03-15 2022-03-15 Cultural Relics Image Retrieval System Based on Contrastive Learning Active CN114610941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210253589.0A CN114610941B (en) 2022-03-15 2022-03-15 Cultural Relics Image Retrieval System Based on Contrastive Learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210253589.0A CN114610941B (en) 2022-03-15 2022-03-15 Cultural Relics Image Retrieval System Based on Contrastive Learning

Publications (2)

Publication Number Publication Date
CN114610941A true CN114610941A (en) 2022-06-10
CN114610941B CN114610941B (en) 2025-01-14

Family

ID=81862722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210253589.0A Active CN114610941B (en) 2022-03-15 2022-03-15 Cultural Relics Image Retrieval System Based on Contrastive Learning

Country Status (1)

Country Link
CN (1) CN114610941B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019148898A1 (en) * 2018-02-01 2019-08-08 北京大学深圳研究生院 Adversarial cross-media retrieving method based on restricted text space
CN110851645A (en) * 2019-11-08 2020-02-28 吉林大学 A Similarity Preserving Image Retrieval Method Based on Deep Metric Learning
CN113127661A (en) * 2021-04-06 2021-07-16 中国科学院计算技术研究所 Multi-supervision medical image retrieval method and system based on cyclic query expansion
CN113743251A (en) * 2021-08-17 2021-12-03 华中科技大学 Target searching method and device based on weak supervision scene
CN113822368A (en) * 2021-09-29 2021-12-21 成都信息工程大学 Anchor-free incremental target detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
陈阳;周圆;: "一种基于深度学习模型的图像模糊自动分析处理算法", 小型微型计算机系统, no. 03, 15 March 2018 (2018-03-15) *
项圣凯;曹铁勇;方正;洪施展;: "使用密集弱注意力机制的图像显著性检测", 中国图象图形学报, no. 01, 16 January 2020 (2020-01-16) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580268A (en) * 2023-07-11 2023-08-11 腾讯科技(深圳)有限公司 Training method of image target positioning model, image processing method and related products
CN116580268B (en) * 2023-07-11 2023-10-03 腾讯科技(深圳)有限公司 Training method of image target positioning model, image processing method and related products

Also Published As

Publication number Publication date
CN114610941B (en) 2025-01-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221101

Address after: 300072 Tianjin City, Nankai District Wei Jin Road No. 92

Applicant after: Tianjin University

Applicant after: Yiyuan digital (Beijing) Technology Group Co.,Ltd.

Address before: 300072 Tianjin City, Nankai District Wei Jin Road No. 92

Applicant before: Tianjin University

GR01 Patent grant