CN115205554A - Retrieval method based on semantic concept extraction - Google Patents

Retrieval method based on semantic concept extraction

Info

Publication number
CN115205554A
Authority
CN
China
Prior art keywords
image
semantic
features
feature
semantic concept
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210725320.8A
Other languages
Chinese (zh)
Inventor
赵万磊 (Zhao Wanlei)
洪义耕 (Hong Yigeng)
雷蕴奇 (Lei Yunqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202210725320.8A
Publication of CN115205554A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation


Abstract

The invention discloses a retrieval method based on semantic concept extraction, which comprises the following steps: acquiring query features; performing similarity calculation between the query features and the candidate features in a candidate feature database to obtain a similarity ranking result; and returning the similarity ranking result as the retrieval result. The candidate features are obtained by extracting features from the images in an image database: basic semantic elements are extracted from each image; semantic concept segmentation is performed to obtain semantic concept features; and the candidate features are obtained by applying L2 normalization, PCA whitening, and a second round of L2 normalization to the semantic concept features. The granularity of the semantic concept features extracted by the method covers both the instance level and the image level, so the extracted features can describe global and local image semantic information, and image retrieval and instance retrieval can be unified within a single framework; the method can therefore be used for both the instance retrieval task and the image retrieval task.

Description

Retrieval method based on semantic concept extraction
Technical Field
The invention relates to computer vision and information retrieval technologies, and in particular to a retrieval method based on semantic concept extraction, which can be applied to scenarios such as search engines of internet companies, e-commerce, and security monitoring.
Background
Image retrieval and instance retrieval have long been treated by researchers as two distinct problems. Surveying the main ideas of current methods: image retrieval extracts image-level features for each image, while instance retrieval extracts instance-level features for the instances in each image. In image-level features, the features of local instances may be drowned out by the background or by the subject instance; in instance-level features, the representation centers on the individual visual instances in the image. An image usually contains multiple visual instances, and similar images share one or several similar visual instances, so the two problems are clearly related. However, because image retrieval and instance retrieval target features of different granularity, current solutions still address the two problems separately.
In the feature map that a deep convolutional neural network produces for an image, each pixel of the feature map corresponds to a spatial region of a certain extent in the original image, owing to the stacked convolution and pooling operations. The feature vector at a given pixel position of the feature map therefore expresses the semantic information of the corresponding spatial range of the original image. Image retrieval methods generally aggregate all feature vectors of the feature map by global pooling or feature encoding to obtain image-level features; instance retrieval methods instead aggregate the feature vectors within local regions of the feature map according to instance bounding boxes and generate features for the instances appearing at those local positions. However, a high-level semantic concept of an image may not be spatially continuous: it may be expressed jointly by the semantics of several scattered parts of the image. Therefore, neither current image-level features nor current instance-level features can express the high-level semantic concepts in an image; conversely, such semantic concept features can express features at both the image level and the instance level.
In the feature map of a convolutional neural network, the features at different spatial positions carry specific semantic information, and the spatial features that jointly express a certain semantic concept are related to each other; if these related features can be aggregated, the semantic concept features in the image can be extracted. However, when aggregating the semantic concept features of an image, not all regions of the image contribute, so interference from non-primary information must be eliminated during aggregation. Furthermore, among all the semantic concepts extracted from an image there may be non-primary semantic concepts that are unimportant in the dataset; storing these non-primary semantic concept features in the database occupies considerable space and requires additional comparisons, thereby reducing retrieval speed.
Disclosure of Invention
Aiming at the problems in the prior art, the invention aims to provide a retrieval method based on semantic concept extraction, which can realize image retrieval and instance retrieval in a unified framework.
In order to achieve the purpose, the invention adopts the technical scheme that:
a retrieval method based on semantic concept extraction comprises:
acquiring query features;
performing similarity calculation between the query features and the candidate features in the candidate feature database to obtain a similarity ranking result;
returning the similarity ranking result as the retrieval result;
the candidate features in the candidate feature library are obtained by extracting the features of the images in the image database; the feature extraction method comprises the following steps:
extracting basic semantic elements from the image;
performing semantic concept segmentation:
constructing an undirected graph G_image over the basic semantic elements, where the weight of the edge between two nodes of G_image is defined as:

e_{i,j} = 1, if cos(v_i, v_j) > τ_b; 0, otherwise,

where cos(v_i, v_j) is the cosine similarity between v_i and v_j;
cutting connected components out of the undirected graph, where each connected component contains similar basic semantic elements; calculating the average feature of the nodes in each connected component to aggregate the basic semantic element features and obtain semantic concept features;
and obtaining the candidate features by applying L2 normalization, PCA whitening, and a second round of L2 normalization to the semantic concept features.
The basic semantic elements are extracted from the image as follows:
inputting the image into a convolutional neural network to obtain an output H × W × C dimensional feature map X;
averaging the feature map X over the channel dimension C to obtain an H × W dimensional average activation map X̄;
searching the average activation map X̄ with an N × N window to obtain a set of peak points;
for each peak point of the average activation map X̄, performing back propagation layer by layer, from back to front, through the contribution probability formula until the input image is reached, obtaining for each peak point a contribution probability map M with the same scale as the original image;
estimating the spatial position information of the basic semantic elements on the contribution probability map M in the form of rectangular boxes.
The contribution probability formula is:

P(I_{i,j}) = Σ_{p,q} P(I_{i,j} | O_{p,q}) · P(O_{p,q}),

where P(I_{i,j}) is the contribution probability of pixel position I_{i,j} of the input feature map, expressed through its contributions to the pixel positions O_{p,q} of the output feature map; the conditional probability P(I_{i,j} | O_{p,q}) is defined as:

P(I_{i,j} | O_{p,q}) = Z_{p,q} · Î_{i,j} · F_{i-p,j-q},

where Î_{i,j} is the bottom-up activation value computed for I at spatial location (i, j) by forward propagation, F_{i-p,j-q} is the convolution kernel weight linking the two positions, and Z_{p,q} is a normalization term ensuring that the conditional contribution probabilities sum to 1.
Before the spatial positions of the basic semantic elements are estimated, the contribution probability map M is processed as follows:
normalizing the values on the contribution probability map M to the range [0, 1];
setting a threshold τ_a and filtering out the pixels that do not contribute to the peak point;
estimating the activation region on the contribution probability map M as an ellipse whose parameters are obtained by modeling the image second moments of the pixel locations whose values in M are greater than τ_a.
The rectangular box corresponding to each basic semantic element is the circumscribed rectangle of its ellipse, and the features of the basic semantic elements are obtained by average pooling over the feature map within the rectangular box corresponding to each basic semantic element.
The undirected graph G_image is expressed as:

G_image = <V_image, E_image>,

where V_image is the set of nodes of the undirected graph, consisting of the basic semantic element features {v_1, v_2, ..., v_n}, and E_image is the set of edges, denoted {e_{i,j} | i, j = 1, ..., n}.
After semantic concept segmentation yields the semantic concept features, non-primary semantic concepts are removed. The specific steps are as follows:
First, a dataset-level undirected graph G_dataset = <V_dataset, E_dataset> is built, where V_dataset is the set of nodes of the dataset-level undirected graph, consisting of the semantic concept features, and E_dataset is the set of edges. The edge weight in G_dataset is defined as follows: if the similarity between two nodes is greater than a threshold τ_c, the weight between the two nodes is 1, otherwise it is 0.
Then, the importance of a node is measured by its degree centrality in the dataset-level undirected graph G_dataset. Degree centrality is defined as the degree of a node, i.e., the number of edges incident to the node in G_dataset; the higher a node's degree centrality, the more nodes it is connected to and the more important it is in G_dataset. The degree D_i of a node v_i is calculated by:

D_i = Σ_{j≠i} e_{i,j},

where e_{i,j} is the edge weight between nodes v_i and v_j.
Finally, nodes whose degree centrality score is below a threshold τ_d are removed as non-primary semantic concepts, and the meaningful semantic concept features are retained.
The query feature is an image query feature or an instance query feature.
When the query feature is an image query feature, it is obtained as follows:
inputting a query image, extracting a feature map through a convolutional neural network, and performing global pooling on the feature map to obtain the image query feature; or performing semantic concept segmentation on the feature map to obtain a plurality of semantic concept features and selecting one of them as the image query feature.
When the query feature is an instance query feature, it is obtained as follows:
inputting an instance image, or inputting a query image and cropping it with a query instance rectangular box to obtain an instance image; extracting a feature map from the instance image through a convolutional neural network; and performing global pooling on the feature map to obtain the instance query feature.
After the above scheme is adopted, the granularity of the semantic concept features extracted by the method covers both the instance level and the image level, so the extracted features can describe global and local image semantic information, and image retrieval and instance retrieval can be unified within a single framework; the method can therefore be used for both the instance retrieval task and the image retrieval task.
In particular, in the semantic concept segmentation of the proposed method, the undirected graph is built over the basic semantic elements, and whether two basic semantic elements are connected by an edge depends on whether their similarity is higher than a threshold τ_b, so the level of τ_b affects the degree of node connectivity in the undirected graph. The semantic concept features are subsequently obtained by fusing the basic semantic element features within the connected components of the undirected graph. Therefore, the higher the node connectivity in the undirected graph, the more nodes each connected component contains and the closer the semantic concept features generated from the connected components are to image-level features; conversely, the lower the node connectivity, the closer the semantic concept features generated from the connected components are to instance-level features. The semantic concept features can thus serve both the instance retrieval task and the image retrieval task.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a flow chart of image query feature acquisition;
FIG. 3 is a flow diagram of an example query feature acquisition.
Detailed Description
As shown in Fig. 1, the present invention discloses a retrieval method based on semantic concept extraction, which comprises the following steps:
Step 1: acquiring query features.
the query feature is an image query feature or an instance query feature.
As shown in Fig. 2, when the query feature is an image query feature, it is obtained as follows:
inputting a query image, extracting a feature map through a convolutional neural network, and performing global pooling on the feature map to obtain the image query feature.
Alternatively, semantic concept segmentation is performed on the feature map to obtain a plurality of semantic concept features, and one of them is selected as the image query feature. The semantic concept segmentation here is the same as that set forth below; see below for details.
As shown in Fig. 3, when the query feature is an instance query feature, it is obtained as follows:
inputting an instance image, or inputting a query image and cropping it with a query instance rectangular box to obtain an instance image;
extracting a feature map from the instance image through a convolutional neural network;
and performing global pooling on the feature map to obtain the instance query feature.
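As an illustration of the two query-feature paths just described, the following Python sketch pools an image query feature or an instance query feature from a pre-trained CNN feature map. The choice of backbone (torchvision's ResNet-50), the 224 × 224 resize, and the helper names are illustrative assumptions; the patent does not prescribe a particular network or input size.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pre-trained backbone truncated before global pooling, so it yields a C x H x W feature map.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_query_feature(img: Image.Image) -> torch.Tensor:
    """Image query: global average pooling of the feature map of the whole query image."""
    with torch.no_grad():
        fmap = extractor(preprocess(img).unsqueeze(0))     # 1 x C x H x W
    feat = fmap.mean(dim=(2, 3)).squeeze(0)                # C-dimensional global feature
    return feat / feat.norm()

def instance_query_feature(img: Image.Image, box) -> torch.Tensor:
    """Instance query: crop the query image with the instance box (left, upper, right, lower),
    then pool the cropped instance image globally."""
    return image_query_feature(img.crop(box))
```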
Step 2: similarity calculation is performed between the query features and the candidate features in the candidate feature database to obtain a similarity ranking result.
Step 3: the similarity ranking result is returned as the retrieval result.
In step 2, the candidate features in the candidate feature library are obtained by extracting features from the images in the image database. The feature extraction method specifically comprises the following steps:
s1, extracting basic semantic elements
Specifically, for a feature map X with dimensions of H × W × C output by the convolutional neural network, an average activation map with dimensions of H × W can be obtained after averaging X on the dimension C of a channel
Figure BDA0003710809790000081
Since the high-level convolution kernels of different channels in the convolutional neural network usually represent different semantics, the value at each pixel position on the average activation map represents the high-level semantic response sum in the original image space range. Since the main part of the image generally has richer semantic information than the non-main part, a higher response value is presented in the average activation map. The set of peak points found on the average activation map using a 3 x 3 window then summarizes the position of the main part of the image, i.e. the position of the elementary semantic elements.
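A minimal sketch of the peak-point search described above, assuming a 3 × 3 (more generally N × N) neighbourhood; a position is kept when it attains its window's maximum. The use of scipy's maximum_filter and the extra mean-based test for weak plateaus are implementation conveniences, not steps specified by the patent.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def find_peak_points(feature_map, window=3):
    """feature_map: H x W x C output of the CNN; returns (row, col) peak points
    of the average activation map."""
    avg_act = feature_map.mean(axis=2)                    # H x W average activation map
    local_max = maximum_filter(avg_act, size=window)      # window-wise local maxima
    # Keep positions that attain their window maximum; the mean test drops weak plateaus (assumption).
    is_peak = (avg_act == local_max) & (avg_act > avg_act.mean())
    return [tuple(p) for p in np.argwhere(is_peak)]
```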
Since the feature map is obtained by down-sampling the original image through a plurality of pooling layers, the peak point on the feature map actually corresponds to a local area in the original image. Therefore, the features of each basic semantic element can be obtained more accurately by finding the local area corresponding to the peak point in the original image and then performing feature pooling.
For a convolutional layer of the convolutional neural network, denote for brevity its H_f × W_f convolution kernel as F, where H_f and W_f are the height and width of the kernel, and denote the input and output feature maps of the layer as I and O, respectively. During forward propagation, the value I_{i,j} at position (i, j) of the input feature map and the value O_{p,q} at position (p, q) of the output feature map are linked by the convolution kernel weight F_{i-p,j-q}. The output feature map O is computed as:

O_{p,q} = σ( Σ_{i,j} I_{i,j} · F_{i-p,j-q} + b ),   (1)

where b is the bias of the convolutional layer and σ is the nonlinear activation function following the convolutional layer.
The contribution probability of a pixel position I_{i,j} of the input feature map can then be expressed, through its contributions to the pixel positions O_{p,q} of the output feature map, as:

P(I_{i,j}) = Σ_{p,q} P(I_{i,j} | O_{p,q}) · P(O_{p,q}),   (2)

where the conditional probability P(I_{i,j} | O_{p,q}) is defined as:

P(I_{i,j} | O_{p,q}) = Z_{p,q} · Î_{i,j} · F_{i-p,j-q},   (3)

where Î_{i,j} is the bottom-up activation value computed for I at spatial location (i, j) by forward propagation, and Z_{p,q} is a normalization term ensuring that the conditional contribution probabilities sum to 1.
Equation (2) gives a way to compute the probability that each position in the input feature map contributes to a position in the output feature map. Thus, for each peak point of the average activation map X̄, back propagation can be carried out layer by layer, from back to front, through this formula until the input image is reached, finally yielding for each peak point a contribution probability map M with the same scale as the original image.
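The following single-channel sketch illustrates, under simplifying assumptions, how contribution probabilities could be propagated backwards through one convolutional layer in the spirit of Equations (2)-(3): stride-1 valid convolution, bias and negative links ignored, and Z_{p,q} read as the term that normalizes the contributions received by each output position. These simplifications and the function name are assumptions for illustration only.

```python
import numpy as np

def propagate_contribution(P_out, I_hat, F):
    """Propagate contribution probabilities backwards through one convolutional layer
    (single channel, stride 1, valid convolution), in the spirit of Equations (2)-(3).

    P_out : (Ho, Wo) contribution probabilities of the output positions; at the last
            layer this is 1 at the peak point and 0 elsewhere.
    I_hat : (Hi, Wi) bottom-up activation values of the input feature map.
    F     : (Hf, Wf) convolution kernel linking input and output positions.
    Returns an (Hi, Wi) contribution probability map for the input positions.
    """
    Ho, Wo = P_out.shape
    Hi, Wi = I_hat.shape
    Hf, Wf = F.shape
    P_in = np.zeros_like(I_hat, dtype=np.float64)

    for p in range(Ho):
        for q in range(Wo):
            if P_out[p, q] == 0:
                continue
            # Conditional contribution of each input position to output position (p, q):
            # proportional to its bottom-up activation times the linking kernel weight.
            cond = np.zeros_like(I_hat, dtype=np.float64)
            for di in range(Hf):
                for dj in range(Wf):
                    i, j = p + di, q + dj
                    if i < Hi and j < Wi:
                        cond[i, j] = I_hat[i, j] * F[di, dj]
            cond = np.maximum(cond, 0.0)   # negative links ignored in this sketch (assumption)
            Z = cond.sum()
            if Z > 0:
                # Normalize the contributions received by output position (p, q), then
                # accumulate them weighted by P_out[p, q], as in Equation (2).
                P_in += (cond / Z) * P_out[p, q]
    return P_in
```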
Next, the spatial position of the basic semantic element needs to be estimated on the contribution probability map M in the form of a rectangular box. First, the values on M are normalized to the range [0, 1]. Since the starting point of the back propagation is the feature map output by the last convolutional layer, which has the largest receptive field, a large range of pixel positions is activated by the time propagation reaches the input image, and even pixels that contribute little to the peak point carry extremely low but nonzero activation values. A threshold τ_a is therefore set to filter out the pixels that contribute little to the peak point; in this method τ_a is set to 0.1. Subsequently, the activation region of M is estimated as an ellipse whose parameters are obtained by modeling the image second moments of all pixel locations of M whose values are greater than τ_a:

μ_{pq} = (1/|Ω|) Σ_{(x,y)∈Ω} (x − x̄)^p (y − ȳ)^q,  p + q = 2,   (4)

where Ω = {(x, y) : M(x, y) > τ_a} and (x̄, ȳ) is the centroid of Ω; the orientation and axis lengths of the ellipse follow from the second-moment matrix [[μ_{20}, μ_{11}], [μ_{11}, μ_{02}]].
The rectangular box of each basic semantic element's activation region is the circumscribed rectangle of its ellipse. Finally, average pooling is performed on the feature map within the rectangular box corresponding to each basic semantic element to obtain the features of the basic semantic elements.
S2. Semantic concept segmentation
After the basic semantic elements of an image have been extracted, an undirected graph G_image = <V_image, E_image> is constructed over them, where V_image is the set of nodes of the graph, consisting of the basic semantic element features {v_1, v_2, ..., v_n}, and E_image is the set of edges, denoted {e_{i,j} | i, j = 1, ..., n}. The weight of the edge between two nodes of G_image is defined as:

e_{i,j} = 1, if cos(v_i, v_j) > τ_b; 0, otherwise.   (5)
In formula (5), cos(v_i, v_j) is the cosine similarity between v_i and v_j, so similar basic semantic elements in the graph are all connected by edges. Connected components are then cut out of the graph by a breadth-first search algorithm. Specifically, every node is initially marked as unvisited. Starting from some unvisited node v_i, a new connected component is created; all reachable unvisited neighbours are visited in turn, added to the connected component, and marked as visited. When no further node can be reached, the generated connected component is added to the connected-component set S, and the process is repeated from the next unvisited node until all nodes have been visited. Each connected component therefore contains similar basic semantic elements that together describe a certain semantic concept in the image. The features of the basic semantic elements are then aggregated by computing the average feature of the nodes in each connected component, which gives the semantic concept feature:

c_i = (1/n) Σ_{j=1}^{n} S_{i,j},   (6)

where c_i is the semantic concept feature of connected component S_i, S_{i,j} is the j-th basic semantic element feature in S_i, and n is the number of basic semantic elements in S_i.
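A sketch of the semantic concept segmentation just described: build the image-level graph with the thresholded cosine similarities of formula (5), cut out connected components with breadth-first search, and average the element features within each component as in formula (6). The threshold value used here is a placeholder.

```python
import numpy as np
from collections import deque

def segment_concepts(element_feats, tau_b=0.55):
    """element_feats: list of C-dimensional basic semantic element features.
    Returns one averaged semantic concept feature per connected component."""
    n = len(element_feats)
    V = [f / (np.linalg.norm(f) + 1e-12) for f in element_feats]
    # Formula (5): an edge exists when the cosine similarity exceeds tau_b.
    adj = [[j for j in range(n) if j != i and float(V[i] @ V[j]) > tau_b] for i in range(n)]

    visited = [False] * n
    concepts = []
    for start in range(n):
        if visited[start]:
            continue
        component, queue = [], deque([start])              # breadth-first search from `start`
        visited[start] = True
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in adj[u]:
                if not visited[v]:
                    visited[v] = True
                    queue.append(v)
        # Formula (6): average the element features belonging to the component.
        concepts.append(np.mean([element_feats[k] for k in component], axis=0))
    return concepts
```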
S3. Removing non-primary semantic concepts
The semantic concept extraction process is performed per image, and besides the primary semantics of the image (such as the subject instance), the extracted semantic concepts also include non-primary semantics (such as cluttered background and irrelevant instances). These non-primary semantic concept features not only increase the storage burden and slow down retrieval, but also dilute the significance of the primary semantic features in the dataset, thereby hurting retrieval performance. However, it is hard to distinguish non-primary semantics on a single image, because the extraction process is not tied to any category: none of the semantic concepts extracted from one image carries a class label, so it cannot be decided which semantic concepts should be kept or removed. When the view is raised to the level of the whole image dataset, however, a semantic concept that is primary in the dataset will inevitably occur frequently in it; conversely, a semantic concept that occurs only a very small number of times in the dataset is likely a non-primary semantic. The non-primary semantics are removed as follows:
First, a dataset-level undirected graph G_dataset = <V_dataset, E_dataset> is built, where V_dataset is the set of nodes of the graph, consisting of the semantic concept features, and E_dataset is the set of edges. The edge weight in G_dataset is defined similarly to that in G_image: if the similarity between two nodes is greater than a threshold τ_c, the weight between the two nodes is 1, otherwise it is 0.
Then, the importance of a node is measured by its degree centrality in the graph. Degree centrality is defined as the degree of a node, i.e., the number of edges incident to the node in the undirected graph; the higher a node's degree centrality, the more nodes it is connected to and therefore the more important it is in the graph. The degree D_i of a node v_i is calculated by:

D_i = Σ_{j≠i} e_{i,j},

where e_{i,j} is the edge weight between nodes v_i and v_j in G_dataset.
Finally, nodes whose degree centrality score is below a threshold τ_d are removed as non-primary semantic concepts, and the meaningful semantic concept features are retained, completing the removal of non-primary semantic concepts.
S4. The semantic concept features are processed by L2 normalization, PCA whitening, and a second round of L2 normalization to obtain the candidate features.
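A sketch of step S4, with the PCA-whitening transform fitted offline on a sample of semantic concept features. The use of scikit-learn, the feature dimension, and the number of retained components are assumptions for illustration; the patent only names the operations.

```python
import numpy as np
from sklearn.decomposition import PCA

def l2n(x):
    """Row-wise L2 normalization."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

# Fit the PCA-whitening transform offline on a sample of semantic concept features.
# The random matrix is only a stand-in for real features; 512 and 128 are placeholder dimensions.
train_feats = l2n(np.random.rand(5000, 512).astype(np.float32))
pca = PCA(n_components=128, whiten=True).fit(train_feats)

def to_candidate_features(concept_feats):
    """Step S4: L2-normalize, PCA-whiten, and L2-normalize again."""
    X = l2n(np.asarray(concept_feats, dtype=np.float32))   # first L2 normalization
    X = pca.transform(X)                                   # PCA whitening
    return l2n(X)                                          # second L2 normalization
```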
The granularity of the semantic concept features extracted by the method covers both the instance level and the image level, so the extracted features can describe global and local image semantic information, and image retrieval and instance retrieval can be unified within a single framework; the method can therefore be used for both the instance retrieval task and the image retrieval task. In particular, in the semantic concept segmentation of the proposed method, the undirected graph is built over the basic semantic elements, and whether two basic semantic elements are connected by an edge depends on whether their similarity is higher than the threshold τ_b, so the level of τ_b determines the degree of node connectivity in the undirected graph. The semantic concept features are then obtained by fusing the basic semantic element features within the connected components of the undirected graph. Therefore, the higher the node connectivity in the undirected graph, the more nodes each connected component contains and the closer the semantic concept features generated from the connected components are to image-level features; conversely, the lower the node connectivity, the closer they are to instance-level features. The semantic concept features can thus serve both the instance retrieval task and the image retrieval task.
On the instance retrieval task, the retrieval target is an instance in an image, so the target features tend toward the local, spatial level. Accordingly, when extracting semantic concept features on an instance retrieval dataset, the threshold τ_b is set to a higher value so that the generated semantic concept features lean toward the instance level. When an instance retrieval query arrives, the method extracts the feature map of the query image with the same feature extraction network at the same output layer, and then performs average region pooling on the query feature map within the query instance box to obtain the query feature. The query feature is then ranked against all semantic concept features in the candidate database by a nearest-neighbour search algorithm, and the rank of each image is the best rank attained by the semantic concept features belonging to that image. Finally, the resulting ranking of images is returned as the retrieval result.
On the image retrieval task, the retrieval target is the whole image, so the target features tend toward the global, image level. Accordingly, unlike on the instance retrieval task, the threshold τ_b is set to a lower value when extracting semantic concept features on an image retrieval dataset, so that the generated semantic concept features lean toward the image level. When an image retrieval query arrives, the feature map of the query image is extracted at the same output layer of the same feature extraction network, and average pooling over the spatial dimensions of the feature map gives the query feature. The subsequent retrieval step is the same as on the instance retrieval task, and the retrieval ranking of the images is obtained according to the image to which each semantic concept feature belongs.
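A sketch of the retrieval step shared by both tasks: rank all candidate semantic concept features by similarity to the query, then score each image by the most similar concept it contains and return the images in that order. The flat data layout (parallel arrays of concept features and image ids) is an assumption for illustration.

```python
import numpy as np

def retrieve(query_feat, candidate_feats, candidate_image_ids, top_k=100):
    """query_feat: C-dim query feature; candidate_feats: N x C semantic concept features;
    candidate_image_ids[i]: id of the image the i-th concept was extracted from.
    Returns image ids ranked by their best-matching semantic concept."""
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    C = candidate_feats / (np.linalg.norm(candidate_feats, axis=1, keepdims=True) + 1e-12)
    sims = C @ q                                           # similarity of every concept to the query

    best = {}                                              # image id -> its best concept similarity
    for img_id, s in zip(candidate_image_ids, sims):
        if img_id not in best or s > best[img_id]:
            best[img_id] = s
    ranking = sorted(best, key=best.get, reverse=True)     # images ordered by their best concept
    return ranking[:top_k]
```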
In summary, by mining semantic concepts on the feature map, the invention makes the extracted features describe both global and local image semantic information, so that image retrieval and instance retrieval can be unified within a single framework. That is, when retrieval is built on the method of the present invention, developers only need to develop one system to serve both image retrieval and instance retrieval; specifically, only the threshold τ_b needs to be adjusted, which greatly reduces development cost. Furthermore, the present invention achieves excellent retrieval performance on multiple instance retrieval and image retrieval datasets.
In order to illustrate the effect of the present invention, the retrieval method of the present invention is compared with existing retrieval methods, as follows.
Table 1 compares the retrieval accuracy of the method of the present invention on the Instance-335 dataset with that of R-MAC, CroW, CAM, BLCF, BLCF-SalGAN, Regional Attention, DeepVision, FCIS+XD, PCL*+SPN and DASR.
Method    Top 50    Top 100    All
R-MAC 23.4 31.5 37.5
CroW 15.9 22.5 32.1
CAM 19.4 26.3 34.7
BLCF 24.6 35.8 48.3
BLCF-SalGAN 24.5 35.0 46.9
Regional Attention 24.2 35.1 48.8
DeepVision 40.2 52.1 62.0
FCIS+XD 40.3 50.0 59.3
PCL*+SPN 38.5 47.9 57.9
DASR 41.9 55.8 69.9
Method of the invention 43.6 57.4 72.1
TABLE 1
Table 2 compares the retrieval accuracy of the method of the present invention on the INSTRE dataset with that of R-MAC, CroW, CAM, BLCF, BLCF-SalGAN, Regional Attention, DeepVision, FCIS+XD, PCL*+SPN and DASR.
Method    Retrieval accuracy
R-MAC 52.3
CroW 41.6
CAM 32.0
BLCF 63.6
BLCF-SalGAN 69.8
Regional Attention 54.2
DeepVision 19.7
FCIS+XD 6.7
PCL*+SPN 56.9
DASR 62.9
Method of the invention 69.7
TABLE 2
Table 3 compares the retrieval accuracy of the method of the present invention on the Holidays, Oxford5k and Paris6k datasets with that of SIFT+VLAD, BoVW+HE, NeuralCodes, R-MAC, CroW, BLCF, BLCF-SalGAN, CAM, Regional Attention, DeepVision and DASR+VLAD.
Method Holidays Oxford5k Paris6k
SIFT+VLAD 66.4 35.9 39.1
BoVW+HE 74.2 50.3 50.1
NeuralCodes 74.9 43.5 -
R-MAC - 66.9 83.0
CroW 85.1 70.8 79.7
BLCF 85.4 72.2 79.8
BLCF-SalGAN 83.5 74.6 81.2
CAM 78.5 71.2 80.5
Regional Attention - 76.8 87.5
DeepVision - 71.0 79.8
DASR+VLAD 83.4 59.4 69.0
Method of the invention 91.9 73.5 84.2
TABLE 3
In Tables 1-3 above, R-MAC corresponds to the method proposed by Tolias G et al. (Tolias G, Sicre R, Jégou H. Particular object retrieval with integral max-pooling of CNN activations [J]. arXiv preprint arXiv:1511.05879, 2015);
CroW corresponds to the method proposed by Kalantidis Y et al. (Kalantidis Y, Mellina C, Osindero S. Cross-dimensional weighting for aggregated deep convolutional features [C]// European Conference on Computer Vision. Springer, 2016);
CAM corresponds to the method proposed by Jimenez A et al. (Jimenez A, Alvarez J M, Giró-i-Nieto X. Class-weighted convolutional features for visual instance search [C]// 2017);
BLCF and BLCF-SalGAN correspond to the methods proposed by Mohedano E et al. (Mohedano E, McGuinness K, Giró-i-Nieto X, et al. Saliency weighted convolutional features for instance search [C]// 2018 International Conference on Content-Based Multimedia Indexing (CBMI). IEEE, 2018);
Regional Attention corresponds to the method proposed by Kim J et al. (Kim J, Yoon S E. Regional attention based deep feature for image retrieval [C]// BMVC. 2018: 209);
DeepVision corresponds to the method proposed by Salvador A et al. (Salvador A, Giró-i-Nieto X, Marqués F, et al. Faster R-CNN features for instance search [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2016: 9-16);
FCIS+XD corresponds to the method proposed by Zhan Y et al. (Zhan Y, Zhao W L. Instance search via instance level segmentation and feature representation [J]. Journal of Visual Communication and Image Representation, 2021, 79);
PCL*+SPN corresponds to the method proposed by Lin J et al. (Lin J, Zhan Y, Zhao W L. Instance search based on weakly supervised feature learning [J]. Neurocomputing, 2021, 424);
DASR and DASR+VLAD correspond to the methods proposed by Xiao H C et al. (Xiao H C, Zhao W L, Lin J, et al. Deeply activated salient region for instance search [J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021);
SIFT+VLAD and BoVW+HE correspond to the methods proposed by Zhao W L et al. (Zhao W L, Ngo C W, Wang H. Fast covariant VLAD for image search [J]. IEEE Transactions on Multimedia, 2016, 18(9): 1843-1854);
NeuralCodes corresponds to the method proposed by Babenko A et al. (Babenko A, Slesarev A, Chigorin A, et al. Neural codes for image retrieval [C]// European Conference on Computer Vision. Springer, 2014: 584-599).
As can be seen from Tables 1, 2 and 3, the method of the present invention achieves the best results on two instance retrieval datasets and three image retrieval datasets. In addition, the algorithm relies only on a pre-trained convolutional neural network, requires no fine-tuning and no additional manually labelled data, and can extract semantic concept features of any category.
The above description is only an example of the present invention, and does not limit the technical scope of the present invention, so that any minor modifications, equivalent changes and modifications made to the above embodiment according to the technical essence of the present invention are within the technical scope of the present invention.

Claims (10)

1. A retrieval method based on semantic concept extraction, characterized in that the method comprises the following steps:
acquiring query features;
performing similarity calculation between the query features and the candidate features in the candidate feature database to obtain a similarity ranking result;
returning the similarity ranking result as the retrieval result;
the candidate features in the candidate feature library are obtained by extracting the features of the images in the image database; the feature extraction method comprises the following steps:
extracting basic semantic elements from the image;
performing semantic concept segmentation:
constructing an undirected graph G_image over the basic semantic elements, wherein the weight of the edge between two nodes of G_image is defined as:

e_{i,j} = 1, if cos(v_i, v_j) > τ_b; 0, otherwise,

wherein cos(v_i, v_j) is the cosine similarity between v_i and v_j;
cutting connected components out of the undirected graph, wherein each connected component comprises similar basic semantic elements; calculating the average feature of the nodes in each connected component to aggregate the basic semantic element features and obtain semantic concept features;
and obtaining the candidate features by applying L2 normalization, PCA whitening, and a second round of L2 normalization to the semantic concept features.
2. The retrieval method based on semantic concept extraction as claimed in claim 1, wherein the basic semantic elements are extracted from the image as follows:
inputting the image into a convolutional neural network to obtain an output H × W × C dimensional feature map X;
averaging the feature map X over the channel dimension C to obtain an H × W dimensional average activation map X̄;
searching the average activation map X̄ with an N × N window to obtain a set of peak points;
for each peak point of the average activation map X̄, performing back propagation layer by layer, from back to front, through the contribution probability formula until the input image is reached, obtaining for each peak point a contribution probability map M with the same scale as the original image;
estimating the spatial position information of the basic semantic elements on the contribution probability map M in the form of rectangular boxes.
3. The retrieval method based on semantic concept extraction as claimed in claim 2, wherein the contribution probability formula is:

P(I_{i,j}) = Σ_{p,q} P(I_{i,j} | O_{p,q}) · P(O_{p,q}),

wherein P(I_{i,j}) is the contribution probability of pixel position I_{i,j} of the input feature map, expressed through its contributions to the pixel positions O_{p,q} of the output feature map; the conditional probability P(I_{i,j} | O_{p,q}) is defined as:

P(I_{i,j} | O_{p,q}) = Z_{p,q} · Î_{i,j} · F_{i-p,j-q},

wherein Î_{i,j} is the bottom-up activation value computed for I at spatial location (i, j) by forward propagation, F_{i-p,j-q} is the convolution kernel weight linking the two positions, and Z_{p,q} is a normalization term ensuring that the conditional contribution probabilities sum to 1.
4. The retrieval method based on semantic concept extraction as claimed in claim 2, wherein, before the spatial positions of the basic semantic elements are estimated, the contribution probability map M is processed as follows:
normalizing the values on the contribution probability map M to the range [0, 1];
setting a threshold τ_a and filtering out the pixels that do not contribute to the peak point;
estimating the activation region on the contribution probability map M as an ellipse, the parameters of which are obtained by modeling the image second moments of the pixel locations whose values in the contribution probability map M are greater than τ_a.
5. the search method based on semantic concept extraction as claimed in claim 4, wherein: the rectangular frames corresponding to the basic semantic elements are obtained by the external rectangles of the ellipses, and the features of the basic semantic elements are obtained by performing average pooling on the feature map through the rectangular frames corresponding to each basic semantic element.
6. The retrieval method based on semantic concept extraction as claimed in claim 1, wherein the undirected graph G_image is expressed as:

G_image = <V_image, E_image>,

wherein V_image is the set of nodes of the undirected graph, consisting of the basic semantic element features {v_1, v_2, ..., v_n}, and E_image is the set of edges, denoted {e_{i,j} | i, j = 1, ..., n}.
7. The retrieval method based on semantic concept extraction as claimed in claim 1, wherein, after semantic concept segmentation yields the semantic concept features, non-primary semantic concepts are removed, specifically as follows:
first, a dataset-level undirected graph G_dataset = <V_dataset, E_dataset> is built, wherein V_dataset is the set of nodes of the dataset-level undirected graph, consisting of the semantic concept features, and E_dataset is the set of edges; the edge weight in G_dataset is defined as follows: if the similarity between two nodes is greater than a threshold τ_c, the weight between the two nodes is 1, otherwise it is 0;
then, the importance of a node is measured by its degree centrality in the dataset-level undirected graph G_dataset; degree centrality is defined as the degree of the node, i.e., the number of edges incident to the node in the undirected graph G_dataset; the higher a node's degree centrality, the more nodes it is connected to and the more important it is in the undirected graph G_dataset; the degree D_i of a node v_i is calculated by:

D_i = Σ_{j≠i} e_{i,j},

wherein e_{i,j} is the edge weight between nodes v_i and v_j;
finally, nodes whose degree centrality score is below a threshold τ_d are removed as non-primary semantic concepts, and the meaningful semantic concept features are retained.
8. The retrieval method based on semantic concept extraction as claimed in claim 1, wherein the query feature is an image query feature or an instance query feature.
9. The retrieval method based on semantic concept extraction as claimed in claim 8, wherein, when the query feature is an image query feature, the query feature is obtained as follows:
inputting a query image, extracting a feature map through a convolutional neural network, and performing global pooling on the feature map to obtain the image query feature; or performing semantic concept segmentation on the feature map to obtain a plurality of semantic concept features and selecting one of them as the image query feature.
10. The retrieval method based on semantic concept extraction as claimed in claim 8, wherein, when the query feature is an instance query feature, the query feature is obtained as follows:
inputting an instance image, or inputting a query image and cropping it with a query instance rectangular box to obtain an instance image; extracting a feature map from the instance image through a convolutional neural network; and performing global pooling on the feature map to obtain the instance query feature.
CN202210725320.8A 2022-06-23 2022-06-23 Retrieval method based on semantic concept extraction Pending CN115205554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210725320.8A CN115205554A (en) 2022-06-23 2022-06-23 Retrieval method based on semantic concept extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210725320.8A CN115205554A (en) 2022-06-23 2022-06-23 Retrieval method based on semantic concept extraction

Publications (1)

Publication Number Publication Date
CN115205554A true CN115205554A (en) 2022-10-18

Family

ID=83578013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210725320.8A Pending CN115205554A (en) 2022-06-23 2022-06-23 Retrieval method based on semantic concept extraction

Country Status (1)

Country Link
CN (1) CN115205554A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453944A (en) * 2023-12-25 2024-01-26 厦门大学 Multi-level significant region decomposition unsupervised instance retrieval method and system
CN117453944B (en) * 2023-12-25 2024-04-09 厦门大学 Multi-level significant region decomposition unsupervised instance retrieval method and system

Similar Documents

Publication Publication Date Title
CN107679250B (en) Multi-task layered image retrieval method based on deep self-coding convolutional neural network
Xiao et al. A weakly supervised semantic segmentation network by aggregating seed cues: the multi-object proposal generation perspective
dos Santos et al. A relevance feedback method based on genetic programming for classification of remote sensing images
CN113378632A (en) Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization
CN107633226B (en) Human body motion tracking feature processing method
CN107239565B (en) Image retrieval method based on saliency region
Zhu et al. A multisize superpixel approach for salient object detection based on multivariate normal distribution estimation
US11816149B2 (en) Electronic device and control method thereof
WO2021088365A1 (en) Method and apparatus for determining neural network
CN113408605A (en) Hyperspectral image semi-supervised classification method based on small sample learning
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
Wang et al. Aspect-ratio-preserving multi-patch image aesthetics score prediction
CN107563406B (en) Image fine classification method for autonomous learning
CN111539444A (en) Gaussian mixture model method for modified mode recognition and statistical modeling
Chen et al. A saliency map fusion method based on weighted DS evidence theory
Ma et al. The BYY annealing learning algorithm for Gaussian mixture with automated model selection
CN110347853B (en) Image hash code generation method based on recurrent neural network
CN115205554A (en) Retrieval method based on semantic concept extraction
Sreeja et al. A unified model for egocentric video summarization: an instance-based approach
Meng et al. Concept-concept association information integration and multi-model collaboration for multimedia semantic concept detection
CN108717436B (en) Commodity target rapid retrieval method based on significance detection
Moradi et al. A salient object segmentation framework using diffusion-based affinity learning
Turtinen et al. Contextual analysis of textured scene images.
CN115019342A (en) Endangered animal target detection method based on class relation reasoning
Li et al. Feature proposal model on multidimensional data clustering and its application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination