CN115205554A - Retrieval method based on semantic concept extraction - Google Patents

Retrieval method based on semantic concept extraction

Info

Publication number
CN115205554A
Authority
CN
China
Prior art keywords
image
semantic
features
feature
semantic concept
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210725320.8A
Other languages
Chinese (zh)
Inventor
赵万磊 (Zhao Wanlei)
洪义耕 (Hong Yigeng)
雷蕴奇 (Lei Yunqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202210725320.8A
Publication of CN115205554A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation


Abstract

The invention discloses a retrieval method based on semantic concept extraction, which comprises the following steps: acquiring query features; performing similarity calculation between the query features and the candidate features in a candidate feature database to obtain a similarity ranking result; and returning the similarity ranking result as the retrieval result. The candidate features are obtained by extracting features from the images in an image database: basic semantic elements are extracted from each image; semantic concept segmentation is performed to obtain semantic concept features; and the candidate features are obtained by applying L2 normalization, PCA whitening, and a second round of L2 normalization to the semantic concept features. The granularity of the semantic concept features extracted by the method covers both the instance level and the image level, so the extracted features can describe global and local image semantic information, and image retrieval and instance retrieval can be unified within a single framework; the method can therefore be used for both the instance retrieval task and the image retrieval task.

Description

Retrieval method based on semantic concept extraction
Technical Field
The invention relates to computer vision and information retrieval technologies, and in particular to a retrieval method based on semantic concept extraction, which can be applied to scenarios such as search engines of internet companies, e-commerce, and security monitoring.
Background
Image retrieval and instance retrieval have long been treated by researchers as two distinct problems. Surveying the main ideas of current methods: image retrieval extracts image-level features for each image, while instance retrieval extracts instance-level features for the instances in each image. In image-level features, the features of local instances may be drowned out by the background or by the subject instance; in instance-level features, the representation centers on the individual visual instances in the image. An image usually contains multiple visual instances, and similar images share one or several similar visual instances, so the two problems are clearly related. However, because image retrieval and instance retrieval target features of different granularity, current solutions still address the two problems separately.
In the feature map that a deep convolutional neural network produces for an image, each pixel of the feature map corresponds to a spatial region of a certain extent in the original image, owing to the stacked convolution and pooling operations. The feature vector at a given pixel position of the feature map therefore expresses the semantic information of the corresponding spatial range of the original image. Image retrieval methods generally aggregate all feature vectors of the feature map by global pooling or feature encoding to obtain image-level features; instance retrieval methods instead aggregate the feature vectors within local regions of the feature map according to instance bounding boxes and generate features for the instances appearing at those local positions. However, a high-level semantic concept of an image may not be spatially continuous: it may be expressed jointly by the semantics of several scattered parts of the image. Therefore, neither current image-level features nor current instance-level features can express the high-level semantic concepts in an image; conversely, such semantic concept features can express features at both the image level and the instance level.
In the feature map of a convolutional neural network, the features at different spatial positions carry specific semantic information, and the spatial features that jointly express a certain semantic concept are related to each other; if these related features can be aggregated, the semantic concept features in the image can be extracted. However, when aggregating the semantic concept features of an image, not all regions of the image contribute, so interference from non-primary information must be eliminated during aggregation. Furthermore, among all the semantic concepts extracted from an image there may be non-primary semantic concepts that are unimportant in the dataset; storing these non-primary semantic concept features in the database occupies considerable space and requires additional comparisons, thereby reducing retrieval speed.
Disclosure of Invention
Aiming at the problems in the prior art, the invention aims to provide a retrieval method based on semantic concept extraction, which can realize image retrieval and instance retrieval in a unified framework.
In order to achieve the purpose, the invention adopts the technical scheme that:
a retrieval method based on semantic concept extraction comprises:
acquiring query features;
performing similarity calculation between the query features and the candidate features in the candidate feature database to obtain a similarity ranking result;
returning the similarity ranking result as the retrieval result;
the candidate features in the candidate feature library are obtained by extracting the features of the images in the image database; the feature extraction method comprises the following steps:
extracting basic semantic elements from the image;
performing semantic concept segmentation:
constructing an undirected graph G_image over the basic semantic elements, where the weight of the edge between two nodes of G_image is defined as:

e_{i,j} = 1, if cos(v_i, v_j) > τ_b; 0, otherwise,

where cos(v_i, v_j) is the cosine similarity between v_i and v_j;
cutting connected components out of the undirected graph, where each connected component contains similar basic semantic elements; calculating the average feature of the nodes in each connected component to aggregate the basic semantic element features and obtain semantic concept features;
and obtaining the candidate features by applying L2 normalization, PCA whitening, and a second round of L2 normalization to the semantic concept features.
The basic semantic elements are extracted from the image as follows:
inputting the image into a convolutional neural network to obtain an output H × W × C dimensional feature map X;
averaging the feature map X over the channel dimension C to obtain an H × W dimensional average activation map X̄;
searching the average activation map X̄ with an N × N window to obtain a set of peak points;
for each peak point of the average activation map X̄, performing back propagation layer by layer, from back to front, through the contribution probability formula until the input image is reached, obtaining for each peak point a contribution probability map M with the same scale as the original image;
estimating the spatial position information of the basic semantic elements on the contribution probability map M in the form of rectangular boxes.
The contribution probability formula is:

P(I_{i,j}) = Σ_{p,q} P(I_{i,j} | O_{p,q}) · P(O_{p,q}),

where P(I_{i,j}) is the contribution probability of pixel position I_{i,j} of the input feature map, expressed through its contributions to the pixel positions O_{p,q} of the output feature map; the conditional probability P(I_{i,j} | O_{p,q}) is defined as:

P(I_{i,j} | O_{p,q}) = Z_{p,q} · Î_{i,j} · F_{i-p,j-q},

where Î_{i,j} is the bottom-up activation value computed for I at spatial location (i, j) by forward propagation, F_{i-p,j-q} is the convolution kernel weight linking the two positions, and Z_{p,q} is a normalization term ensuring that the conditional contribution probabilities sum to 1.
Before the spatial positions of the basic semantic elements are estimated, the contribution probability map M is processed as follows:
normalizing the values on the contribution probability map M to the range [0, 1];
setting a threshold τ_a and filtering out the pixels that do not contribute to the peak point;
estimating the activation region on the contribution probability map M as an ellipse whose parameters are obtained by modeling the image second moments of the pixel locations whose values in M are greater than τ_a.
The rectangular box corresponding to each basic semantic element is the circumscribed rectangle of its ellipse, and the features of the basic semantic elements are obtained by average pooling over the feature map within the rectangular box corresponding to each basic semantic element.
The undirected graph G_image is expressed as:

G_image = <V_image, E_image>,

where V_image is the set of nodes of the undirected graph, consisting of the basic semantic element features {v_1, v_2, ..., v_n}, and E_image is the set of edges, denoted {e_{i,j} | i, j = 1, ..., n}.
After semantic concept segmentation yields the semantic concept features, non-primary semantic concepts are removed. The specific steps are as follows:
First, a dataset-level undirected graph G_dataset = <V_dataset, E_dataset> is built, where V_dataset is the set of nodes of the dataset-level undirected graph, consisting of the semantic concept features, and E_dataset is the set of edges. The edge weight in G_dataset is defined as follows: if the similarity between two nodes is greater than a threshold τ_c, the weight between the two nodes is 1, otherwise it is 0.
Then, the importance of a node is measured by its degree centrality in the dataset-level undirected graph G_dataset. Degree centrality is defined as the degree of a node, i.e., the number of edges incident to the node in G_dataset; the higher a node's degree centrality, the more nodes it is connected to and the more important it is in G_dataset. The degree D_i of a node v_i is calculated by:

D_i = Σ_{j≠i} e_{i,j},

where e_{i,j} is the edge weight between nodes v_i and v_j.
Finally, nodes whose degree centrality score is below a threshold τ_d are removed as non-primary semantic concepts, and the meaningful semantic concept features are retained.
The query feature is an image query feature or an instance query feature.
When the query feature is an image query feature, it is obtained as follows:
inputting a query image, extracting a feature map through a convolutional neural network, and performing global pooling on the feature map to obtain the image query feature; or performing semantic concept segmentation on the feature map to obtain a plurality of semantic concept features and selecting one of them as the image query feature.
When the query feature is an instance query feature, it is obtained as follows:
inputting an instance image, or inputting a query image and cropping it with a query instance rectangular box to obtain an instance image; extracting a feature map from the instance image through a convolutional neural network; and performing global pooling on the feature map to obtain the instance query feature.
After the above scheme is adopted, the granularity of the semantic concept features extracted by the method covers both the instance level and the image level, so the extracted features can describe global and local image semantic information, and image retrieval and instance retrieval can be unified within a single framework; the method can therefore be used for both the instance retrieval task and the image retrieval task.
In particular, in the semantic concept segmentation of the proposed method, the undirected graph is built over the basic semantic elements, and whether two basic semantic elements are connected by an edge depends on whether their similarity is higher than a threshold τ_b, so the level of τ_b affects the degree of node connectivity in the undirected graph. The semantic concept features are subsequently obtained by fusing the basic semantic element features within the connected components of the undirected graph. Therefore, the higher the node connectivity in the undirected graph, the more nodes each connected component contains and the closer the semantic concept features generated from the connected components are to image-level features; conversely, the lower the node connectivity, the closer the semantic concept features generated from the connected components are to instance-level features. The semantic concept features can thus serve both the instance retrieval task and the image retrieval task.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a flow chart of image query feature acquisition;
FIG. 3 is a flow diagram of an example query feature acquisition.
Detailed Description
As shown in Fig. 1, the present invention discloses a retrieval method based on semantic concept extraction, which comprises the following steps:
Step 1: acquiring query features.
the query feature is an image query feature or an instance query feature.
As shown in Fig. 2, when the query feature is an image query feature, it is obtained as follows:
inputting a query image, extracting a feature map through a convolutional neural network, and performing global pooling on the feature map to obtain the image query feature.
Alternatively, semantic concept segmentation is performed on the feature map to obtain a plurality of semantic concept features, and one of them is selected as the image query feature. The semantic concept segmentation here is the same as that set forth below; see below for details.
As shown in Fig. 3, when the query feature is an instance query feature, it is obtained as follows:
inputting an instance image, or inputting a query image and cropping it with a query instance rectangular box to obtain an instance image;
extracting a feature map from the instance image through a convolutional neural network;
and performing global pooling on the feature map to obtain the instance query feature.
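As an illustration of the two query-feature paths just described, the following Python sketch pools an image query feature or an instance query feature from a pre-trained CNN feature map. The choice of backbone (torchvision's ResNet-50), the 224 × 224 resize, and the helper names are illustrative assumptions; the patent does not prescribe a particular network or input size.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pre-trained backbone truncated before global pooling, so it yields a C x H x W feature map.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_query_feature(img: Image.Image) -> torch.Tensor:
    """Image query: global average pooling of the feature map of the whole query image."""
    with torch.no_grad():
        fmap = extractor(preprocess(img).unsqueeze(0))     # 1 x C x H x W
    feat = fmap.mean(dim=(2, 3)).squeeze(0)                # C-dimensional global feature
    return feat / feat.norm()

def instance_query_feature(img: Image.Image, box) -> torch.Tensor:
    """Instance query: crop the query image with the instance box (left, upper, right, lower),
    then pool the cropped instance image globally."""
    return image_query_feature(img.crop(box))
```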
Step 2: similarity calculation is performed between the query features and the candidate features in the candidate feature database to obtain a similarity ranking result.
Step 3: the similarity ranking result is returned as the retrieval result.
In step 2, the candidate features in the candidate feature library are obtained by extracting features from the images in the image database. The feature extraction method specifically comprises the following steps:
s1, extracting basic semantic elements
Specifically, for a feature map X with dimensions of H × W × C output by the convolutional neural network, an average activation map with dimensions of H × W can be obtained after averaging X on the dimension C of a channel
Figure BDA0003710809790000081
Since the high-level convolution kernels of different channels in the convolutional neural network usually represent different semantics, the value at each pixel position on the average activation map represents the high-level semantic response sum in the original image space range. Since the main part of the image generally has richer semantic information than the non-main part, a higher response value is presented in the average activation map. The set of peak points found on the average activation map using a 3 x 3 window then summarizes the position of the main part of the image, i.e. the position of the elementary semantic elements.
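A minimal sketch of the peak-point search described above, assuming a 3 × 3 (more generally N × N) neighbourhood; a position is kept when it attains its window's maximum. The use of scipy's maximum_filter and the extra mean-based test for weak plateaus are implementation conveniences, not steps specified by the patent.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def find_peak_points(feature_map, window=3):
    """feature_map: H x W x C output of the CNN; returns (row, col) peak points
    of the average activation map."""
    avg_act = feature_map.mean(axis=2)                    # H x W average activation map
    local_max = maximum_filter(avg_act, size=window)      # window-wise local maxima
    # Keep positions that attain their window maximum; the mean test drops weak plateaus (assumption).
    is_peak = (avg_act == local_max) & (avg_act > avg_act.mean())
    return [tuple(p) for p in np.argwhere(is_peak)]
```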
Since the feature map is obtained by down-sampling the original image through a plurality of pooling layers, the peak point on the feature map actually corresponds to a local area in the original image. Therefore, the features of each basic semantic element can be obtained more accurately by finding the local area corresponding to the peak point in the original image and then performing feature pooling.
For a convolutional layer of the convolutional neural network, denote for brevity its H_f × W_f convolution kernel as F, where H_f and W_f are the height and width of the kernel, and denote the input and output feature maps of the layer as I and O, respectively. During forward propagation, the value I_{i,j} at position (i, j) of the input feature map and the value O_{p,q} at position (p, q) of the output feature map are linked by the convolution kernel weight F_{i-p,j-q}. The output feature map O is computed as:

O_{p,q} = σ( Σ_{i,j} I_{i,j} · F_{i-p,j-q} + b ),   (1)

where b is the bias of the convolutional layer and σ is the nonlinear activation function following the convolutional layer.
The contribution probability of a pixel position I_{i,j} of the input feature map can then be expressed, through its contributions to the pixel positions O_{p,q} of the output feature map, as:

P(I_{i,j}) = Σ_{p,q} P(I_{i,j} | O_{p,q}) · P(O_{p,q}),   (2)

where the conditional probability P(I_{i,j} | O_{p,q}) is defined as:

P(I_{i,j} | O_{p,q}) = Z_{p,q} · Î_{i,j} · F_{i-p,j-q},   (3)

where Î_{i,j} is the bottom-up activation value computed for I at spatial location (i, j) by forward propagation, and Z_{p,q} is a normalization term ensuring that the conditional contribution probabilities sum to 1.
Equation (2) gives a way to compute the probability that each position in the input feature map contributes to a position in the output feature map. Thus, for each peak point of the average activation map X̄, back propagation can be carried out layer by layer, from back to front, through this formula until the input image is reached, finally yielding for each peak point a contribution probability map M with the same scale as the original image.
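The following single-channel sketch illustrates, under simplifying assumptions, how contribution probabilities could be propagated backwards through one convolutional layer in the spirit of Equations (2)-(3): stride-1 valid convolution, bias and negative links ignored, and Z_{p,q} read as the term that normalizes the contributions received by each output position. These simplifications and the function name are assumptions for illustration only.

```python
import numpy as np

def propagate_contribution(P_out, I_hat, F):
    """Propagate contribution probabilities backwards through one convolutional layer
    (single channel, stride 1, valid convolution), in the spirit of Equations (2)-(3).

    P_out : (Ho, Wo) contribution probabilities of the output positions; at the last
            layer this is 1 at the peak point and 0 elsewhere.
    I_hat : (Hi, Wi) bottom-up activation values of the input feature map.
    F     : (Hf, Wf) convolution kernel linking input and output positions.
    Returns an (Hi, Wi) contribution probability map for the input positions.
    """
    Ho, Wo = P_out.shape
    Hi, Wi = I_hat.shape
    Hf, Wf = F.shape
    P_in = np.zeros_like(I_hat, dtype=np.float64)

    for p in range(Ho):
        for q in range(Wo):
            if P_out[p, q] == 0:
                continue
            # Conditional contribution of each input position to output position (p, q):
            # proportional to its bottom-up activation times the linking kernel weight.
            cond = np.zeros_like(I_hat, dtype=np.float64)
            for di in range(Hf):
                for dj in range(Wf):
                    i, j = p + di, q + dj
                    if i < Hi and j < Wi:
                        cond[i, j] = I_hat[i, j] * F[di, dj]
            cond = np.maximum(cond, 0.0)   # negative links ignored in this sketch (assumption)
            Z = cond.sum()
            if Z > 0:
                # Normalize the contributions received by output position (p, q), then
                # accumulate them weighted by P_out[p, q], as in Equation (2).
                P_in += (cond / Z) * P_out[p, q]
    return P_in
```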
Next, the spatial position of the basic semantic element needs to be estimated on the contribution probability map M in the form of a rectangular box. First, the values on M are normalized to the range [0, 1]. Since the starting point of the back propagation is the feature map output by the last convolutional layer, which has the largest receptive field, a large range of pixel positions is activated by the time propagation reaches the input image, and even pixels that contribute little to the peak point carry extremely low but nonzero activation values. A threshold τ_a is therefore set to filter out the pixels that contribute little to the peak point; in this method τ_a is set to 0.1. Subsequently, the activation region of M is estimated as an ellipse whose parameters are obtained by modeling the image second moments of all pixel locations of M whose values are greater than τ_a:

μ_{pq} = (1/|Ω|) Σ_{(x,y)∈Ω} (x − x̄)^p (y − ȳ)^q,  p + q = 2,   (4)

where Ω = {(x, y) : M(x, y) > τ_a} and (x̄, ȳ) is the centroid of Ω; the orientation and axis lengths of the ellipse follow from the second-moment matrix [[μ_{20}, μ_{11}], [μ_{11}, μ_{02}]].
The rectangular box of each basic semantic element's activation region is the circumscribed rectangle of its ellipse. Finally, average pooling is performed on the feature map within the rectangular box corresponding to each basic semantic element to obtain the features of the basic semantic elements.
S2. Semantic concept segmentation
After the basic semantic elements of an image have been extracted, an undirected graph G_image = <V_image, E_image> is constructed over them, where V_image is the set of nodes of the graph, consisting of the basic semantic element features {v_1, v_2, ..., v_n}, and E_image is the set of edges, denoted {e_{i,j} | i, j = 1, ..., n}. The weight of the edge between two nodes of G_image is defined as:

e_{i,j} = 1, if cos(v_i, v_j) > τ_b; 0, otherwise.   (5)
In formula (5), cos(v_i, v_j) is the cosine similarity between v_i and v_j, so similar basic semantic elements in the graph are all connected by edges. Connected components are then cut out of the graph by a breadth-first search algorithm. Specifically, every node is initially marked as unvisited. Starting from some unvisited node v_i, a new connected component is created; all reachable unvisited neighbours are visited in turn, added to the connected component, and marked as visited. When no further node can be reached, the generated connected component is added to the connected-component set S, and the process is repeated from the next unvisited node until all nodes have been visited. Each connected component therefore contains similar basic semantic elements that together describe a certain semantic concept in the image. The features of the basic semantic elements are then aggregated by computing the average feature of the nodes in each connected component, which gives the semantic concept feature:

c_i = (1/n) Σ_{j=1}^{n} S_{i,j},   (6)

where c_i is the semantic concept feature of connected component S_i, S_{i,j} is the j-th basic semantic element feature in S_i, and n is the number of basic semantic elements in S_i.
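A sketch of the semantic concept segmentation just described: build the image-level graph with the thresholded cosine similarities of formula (5), cut out connected components with breadth-first search, and average the element features within each component as in formula (6). The threshold value used here is a placeholder.

```python
import numpy as np
from collections import deque

def segment_concepts(element_feats, tau_b=0.55):
    """element_feats: list of C-dimensional basic semantic element features.
    Returns one averaged semantic concept feature per connected component."""
    n = len(element_feats)
    V = [f / (np.linalg.norm(f) + 1e-12) for f in element_feats]
    # Formula (5): an edge exists when the cosine similarity exceeds tau_b.
    adj = [[j for j in range(n) if j != i and float(V[i] @ V[j]) > tau_b] for i in range(n)]

    visited = [False] * n
    concepts = []
    for start in range(n):
        if visited[start]:
            continue
        component, queue = [], deque([start])              # breadth-first search from `start`
        visited[start] = True
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in adj[u]:
                if not visited[v]:
                    visited[v] = True
                    queue.append(v)
        # Formula (6): average the element features belonging to the component.
        concepts.append(np.mean([element_feats[k] for k in component], axis=0))
    return concepts
```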
S3. Removing non-primary semantic concepts
The semantic concept extraction process is performed per image, and besides the primary semantics of the image (such as the subject instance), the extracted semantic concepts also include non-primary semantics (such as cluttered background and irrelevant instances). These non-primary semantic concept features not only increase the storage burden and slow down retrieval, but also dilute the significance of the primary semantic features in the dataset, thereby hurting retrieval performance. However, it is hard to distinguish non-primary semantics on a single image, because the extraction process is not tied to any category: none of the semantic concepts extracted from one image carries a class label, so it cannot be decided which semantic concepts should be kept or removed. When the view is raised to the level of the whole image dataset, however, a semantic concept that is primary in the dataset will inevitably occur frequently in it; conversely, a semantic concept that occurs only a very small number of times in the dataset is likely a non-primary semantic. The non-primary semantics are removed as follows:
First, a dataset-level undirected graph G_dataset = <V_dataset, E_dataset> is built, where V_dataset is the set of nodes of the graph, consisting of the semantic concept features, and E_dataset is the set of edges. The edge weight in G_dataset is defined similarly to that in G_image: if the similarity between two nodes is greater than a threshold τ_c, the weight between the two nodes is 1, otherwise it is 0.
Then, the importance of a node is measured by its degree centrality in the graph. Degree centrality is defined as the degree of a node, i.e., the number of edges incident to the node in the undirected graph; the higher a node's degree centrality, the more nodes it is connected to and therefore the more important it is in the graph. The degree D_i of a node v_i is calculated by:

D_i = Σ_{j≠i} e_{i,j},

where e_{i,j} is the edge weight between nodes v_i and v_j in G_dataset.
Finally, nodes whose degree centrality score is below a threshold τ_d are removed as non-primary semantic concepts, and the meaningful semantic concept features are retained, completing the removal of non-primary semantic concepts.
S4. The semantic concept features are processed by L2 normalization, PCA whitening, and a second round of L2 normalization to obtain the candidate features.
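A sketch of step S4, with the PCA-whitening transform fitted offline on a sample of semantic concept features. The use of scikit-learn, the feature dimension, and the number of retained components are assumptions for illustration; the patent only names the operations.

```python
import numpy as np
from sklearn.decomposition import PCA

def l2n(x):
    """Row-wise L2 normalization."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

# Fit the PCA-whitening transform offline on a sample of semantic concept features.
# The random matrix is only a stand-in for real features; 512 and 128 are placeholder dimensions.
train_feats = l2n(np.random.rand(5000, 512).astype(np.float32))
pca = PCA(n_components=128, whiten=True).fit(train_feats)

def to_candidate_features(concept_feats):
    """Step S4: L2-normalize, PCA-whiten, and L2-normalize again."""
    X = l2n(np.asarray(concept_feats, dtype=np.float32))   # first L2 normalization
    X = pca.transform(X)                                   # PCA whitening
    return l2n(X)                                          # second L2 normalization
```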
The granularity of the semantic concept features extracted by the method covers both the instance level and the image level, so the extracted features can describe global and local image semantic information, and image retrieval and instance retrieval can be unified within a single framework; the method can therefore be used for both the instance retrieval task and the image retrieval task. In particular, in the semantic concept segmentation of the proposed method, the undirected graph is built over the basic semantic elements, and whether two basic semantic elements are connected by an edge depends on whether their similarity is higher than the threshold τ_b, so the level of τ_b determines the degree of node connectivity in the undirected graph. The semantic concept features are then obtained by fusing the basic semantic element features within the connected components of the undirected graph. Therefore, the higher the node connectivity in the undirected graph, the more nodes each connected component contains and the closer the semantic concept features generated from the connected components are to image-level features; conversely, the lower the node connectivity, the closer they are to instance-level features. The semantic concept features can thus serve both the instance retrieval task and the image retrieval task.
On the instance retrieval task, the retrieval target is an instance in an image, so the target features tend toward the local, spatial level. Accordingly, when extracting semantic concept features on an instance retrieval dataset, the threshold τ_b is set to a higher value so that the generated semantic concept features lean toward the instance level. When an instance retrieval query arrives, the method extracts the feature map of the query image with the same feature extraction network at the same output layer, and then performs average region pooling on the query feature map within the query instance box to obtain the query feature. The query feature is then ranked against all semantic concept features in the candidate database by a nearest-neighbour search algorithm, and the rank of each image is the best rank attained by the semantic concept features belonging to that image. Finally, the resulting ranking of images is returned as the retrieval result.
On the image retrieval task, the retrieval target is the whole image, so the target features tend toward the global, image level. Accordingly, unlike on the instance retrieval task, the threshold τ_b is set to a lower value when extracting semantic concept features on an image retrieval dataset, so that the generated semantic concept features lean toward the image level. When an image retrieval query arrives, the feature map of the query image is extracted at the same output layer of the same feature extraction network, and average pooling over the spatial dimensions of the feature map gives the query feature. The subsequent retrieval step is the same as on the instance retrieval task, and the retrieval ranking of the images is obtained according to the image to which each semantic concept feature belongs.
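A sketch of the retrieval step shared by both tasks: rank all candidate semantic concept features by similarity to the query, then score each image by the most similar concept it contains and return the images in that order. The flat data layout (parallel arrays of concept features and image ids) is an assumption for illustration.

```python
import numpy as np

def retrieve(query_feat, candidate_feats, candidate_image_ids, top_k=100):
    """query_feat: C-dim query feature; candidate_feats: N x C semantic concept features;
    candidate_image_ids[i]: id of the image the i-th concept was extracted from.
    Returns image ids ranked by their best-matching semantic concept."""
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    C = candidate_feats / (np.linalg.norm(candidate_feats, axis=1, keepdims=True) + 1e-12)
    sims = C @ q                                           # similarity of every concept to the query

    best = {}                                              # image id -> its best concept similarity
    for img_id, s in zip(candidate_image_ids, sims):
        if img_id not in best or s > best[img_id]:
            best[img_id] = s
    ranking = sorted(best, key=best.get, reverse=True)     # images ordered by their best concept
    return ranking[:top_k]
```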
In summary, by mining semantic concepts on the feature map, the invention makes the extracted features describe both global and local image semantic information, so that image retrieval and instance retrieval can be unified within a single framework. That is, when retrieval is built on the method of the present invention, developers only need to develop one system to serve both image retrieval and instance retrieval; specifically, only the threshold τ_b needs to be adjusted, which greatly reduces development cost. Furthermore, the present invention achieves excellent retrieval performance on multiple instance retrieval and image retrieval datasets.
In order to illustrate the effect of the present invention, the retrieval method of the present invention is compared with existing retrieval methods, as follows.
Table 1 compares the retrieval accuracy of the method of the present invention on the Instance-335 dataset with that of R-MAC, CroW, CAM, BLCF, BLCF-SalGAN, Regional Attention, DeepVision, FCIS+XD, PCL*+SPN and DASR.
Method    Top 50    Top 100    All
R-MAC 23.4 31.5 37.5
CroW 15.9 22.5 32.1
CAM 19.4 26.3 34.7
BLCF 24.6 35.8 48.3
BLCF-SalGAN 24.5 35.0 46.9
Regional Attention 24.2 35.1 48.8
DeepVision 40.2 52.1 62.0
FCIS+XD 40.3 50.0 59.3
PCL*+SPN 38.5 47.9 57.9
DASR 41.9 55.8 69.9
Method of the invention 43.6 57.4 72.1
TABLE 1
Table 2 compares the retrieval accuracy of the method of the present invention on the INSTRE dataset with that of R-MAC, CroW, CAM, BLCF, BLCF-SalGAN, Regional Attention, DeepVision, FCIS+XD, PCL*+SPN and DASR.
Method    Retrieval accuracy
R-MAC 52.3
CroW 41.6
CAM 32.0
BLCF 63.6
BLCF-SalGAN 69.8
Regional Attention 54.2
DeepVision 19.7
FCIS+XD 6.7
PCL*+SPN 56.9
DASR 62.9
Method of the invention 69.7
TABLE 2
Table 3 compares the retrieval accuracy of the method of the present invention on the Holidays, Oxford5k and Paris6k datasets with that of SIFT+VLAD, BoVW+HE, NeuralCodes, R-MAC, CroW, BLCF, BLCF-SalGAN, CAM, Regional Attention, DeepVision and DASR+VLAD.
Method Holidays Oxford5k Paris6k
SIFT+VLAD 66.4 35.9 39.1
BoVW+HE 74.2 50.3 50.1
NeuralCodes 74.9 43.5 -
R-MAC - 66.9 83.0
CroW 85.1 70.8 79.7
BLCF 85.4 72.2 79.8
BLCF-SalGAN 83.5 74.6 81.2
CAM 78.5 71.2 80.5
Regional Attention - 76.8 87.5
DeepVision - 71.0 79.8
DASR+VLAD 83.4 59.4 69.0
Method of the invention 91.9 73.5 84.2
TABLE 3
In Tables 1-3 above, R-MAC corresponds to the method proposed by Tolias G et al. (Tolias G, Sicre R, Jégou H. Particular object retrieval with integral max-pooling of CNN activations [J]. arXiv preprint arXiv:1511.05879, 2015);
CroW corresponds to the method proposed by Kalantidis Y et al. (Kalantidis Y, Mellina C, Osindero S. Cross-dimensional weighting for aggregated deep convolutional features [C]// European Conference on Computer Vision. Springer, 2016);
CAM corresponds to the method proposed by Jimenez A et al. (Jimenez A, Alvarez J M, Giró-i-Nieto X. Class-weighted convolutional features for visual instance search [C]// 2017);
BLCF and BLCF-SalGAN correspond to the methods proposed by Mohedano E et al. (Mohedano E, McGuinness K, Giró-i-Nieto X, et al. Saliency weighted convolutional features for instance search [C]// 2018 International Conference on Content-Based Multimedia Indexing (CBMI). IEEE, 2018);
Regional Attention corresponds to the method proposed by Kim J et al. (Kim J, Yoon S E. Regional attention based deep feature for image retrieval [C]// BMVC. 2018: 209);
DeepVision corresponds to the method proposed by Salvador A et al. (Salvador A, Giró-i-Nieto X, Marqués F, et al. Faster R-CNN features for instance search [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2016: 9-16);
FCIS+XD corresponds to the method proposed by Zhan Y et al. (Zhan Y, Zhao W L. Instance search via instance level segmentation and feature representation [J]. Journal of Visual Communication and Image Representation, 2021, 79);
PCL*+SPN corresponds to the method proposed by Lin J et al. (Lin J, Zhan Y, Zhao W L. Instance search based on weakly supervised feature learning [J]. Neurocomputing, 2021, 424);
DASR and DASR+VLAD correspond to the methods proposed by Xiao H C et al. (Xiao H C, Zhao W L, Lin J, et al. Deeply activated salient region for instance search [J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021);
SIFT+VLAD and BoVW+HE correspond to the methods proposed by Zhao W L et al. (Zhao W L, Ngo C W, Wang H. Fast covariant VLAD for image search [J]. IEEE Transactions on Multimedia, 2016, 18(9): 1843-1854);
NeuralCodes corresponds to the method proposed by Babenko A et al. (Babenko A, Slesarev A, Chigorin A, et al. Neural codes for image retrieval [C]// European Conference on Computer Vision. Springer, 2014: 584-599).
As can be seen from Tables 1, 2 and 3, the method of the present invention achieves the best results on two instance retrieval datasets and three image retrieval datasets. In addition, the algorithm relies only on a pre-trained convolutional neural network, requires no fine-tuning and no additional manually labelled data, and can extract semantic concept features of any category.
The above description is only an example of the present invention, and does not limit the technical scope of the present invention, so that any minor modifications, equivalent changes and modifications made to the above embodiment according to the technical essence of the present invention are within the technical scope of the present invention.

Claims (10)

1. A retrieval method based on semantic concept extraction, characterized in that the method comprises the following steps:
acquiring query features;
performing similarity calculation between the query features and the candidate features in the candidate feature database to obtain a similarity ranking result;
returning the similarity ranking result as the retrieval result;
the candidate features in the candidate feature library are obtained by extracting the features of the images in the image database; the feature extraction method comprises the following steps:
extracting basic semantic elements from the image;
performing semantic concept segmentation:
constructing an undirected graph G_image over the basic semantic elements, wherein the weight of the edge between two nodes of G_image is defined as:

e_{i,j} = 1, if cos(v_i, v_j) > τ_b; 0, otherwise,

wherein cos(v_i, v_j) is the cosine similarity between v_i and v_j;
cutting connected components out of the undirected graph, wherein each connected component comprises similar basic semantic elements; calculating the average feature of the nodes in each connected component to aggregate the basic semantic element features and obtain semantic concept features;
and obtaining the candidate features by applying L2 normalization, PCA whitening, and a second round of L2 normalization to the semantic concept features.
2. The retrieval method based on semantic concept extraction as claimed in claim 1, wherein the basic semantic elements are extracted from the image as follows:
inputting the image into a convolutional neural network to obtain an output H × W × C dimensional feature map X;
averaging the feature map X over the channel dimension C to obtain an H × W dimensional average activation map X̄;
searching the average activation map X̄ with an N × N window to obtain a set of peak points;
for each peak point of the average activation map X̄, performing back propagation layer by layer, from back to front, through the contribution probability formula until the input image is reached, obtaining for each peak point a contribution probability map M with the same scale as the original image;
estimating the spatial position information of the basic semantic elements on the contribution probability map M in the form of rectangular boxes.
3. The retrieval method based on semantic concept extraction as claimed in claim 2, wherein the contribution probability formula is:

P(I_{i,j}) = Σ_{p,q} P(I_{i,j} | O_{p,q}) · P(O_{p,q}),

wherein P(I_{i,j}) is the contribution probability of pixel position I_{i,j} of the input feature map, expressed through its contributions to the pixel positions O_{p,q} of the output feature map; the conditional probability P(I_{i,j} | O_{p,q}) is defined as:

P(I_{i,j} | O_{p,q}) = Z_{p,q} · Î_{i,j} · F_{i-p,j-q},

wherein Î_{i,j} is the bottom-up activation value computed for I at spatial location (i, j) by forward propagation, F_{i-p,j-q} is the convolution kernel weight linking the two positions, and Z_{p,q} is a normalization term ensuring that the conditional contribution probabilities sum to 1.
4. The retrieval method based on semantic concept extraction as claimed in claim 2, wherein, before the spatial positions of the basic semantic elements are estimated, the contribution probability map M is processed as follows:
normalizing the values on the contribution probability map M to the range [0, 1];
setting a threshold τ_a and filtering out the pixels that do not contribute to the peak point;
estimating the activation region on the contribution probability map M as an ellipse, the parameters of which are obtained by modeling the image second moments of the pixel locations whose values in the contribution probability map M are greater than τ_a.
5. the search method based on semantic concept extraction as claimed in claim 4, wherein: the rectangular frames corresponding to the basic semantic elements are obtained by the external rectangles of the ellipses, and the features of the basic semantic elements are obtained by performing average pooling on the feature map through the rectangular frames corresponding to each basic semantic element.
6. The retrieval method based on semantic concept extraction as claimed in claim 1, wherein the undirected graph G_image is expressed as:

G_image = <V_image, E_image>,

wherein V_image is the set of nodes of the undirected graph, consisting of the basic semantic element features {v_1, v_2, ..., v_n}, and E_image is the set of edges, denoted {e_{i,j} | i, j = 1, ..., n}.
7. The retrieval method based on semantic concept extraction as claimed in claim 1, wherein, after semantic concept segmentation yields the semantic concept features, non-primary semantic concepts are removed, specifically as follows:
first, a dataset-level undirected graph G_dataset = <V_dataset, E_dataset> is built, wherein V_dataset is the set of nodes of the dataset-level undirected graph, consisting of the semantic concept features, and E_dataset is the set of edges; the edge weight in G_dataset is defined as follows: if the similarity between two nodes is greater than a threshold τ_c, the weight between the two nodes is 1, otherwise it is 0;
then, the importance of a node is measured by its degree centrality in the dataset-level undirected graph G_dataset; degree centrality is defined as the degree of the node, i.e., the number of edges incident to the node in the undirected graph G_dataset; the higher a node's degree centrality, the more nodes it is connected to and the more important it is in the undirected graph G_dataset; the degree D_i of a node v_i is calculated by:

D_i = Σ_{j≠i} e_{i,j},

wherein e_{i,j} is the edge weight between nodes v_i and v_j;
finally, nodes whose degree centrality score is below a threshold τ_d are removed as non-primary semantic concepts, and the meaningful semantic concept features are retained.
8. The retrieval method based on semantic concept extraction as claimed in claim 1, wherein the query feature is an image query feature or an instance query feature.
9. The retrieval method based on semantic concept extraction as claimed in claim 8, wherein, when the query feature is an image query feature, the query feature is obtained as follows:
inputting a query image, extracting a feature map through a convolutional neural network, and performing global pooling on the feature map to obtain the image query feature; or performing semantic concept segmentation on the feature map to obtain a plurality of semantic concept features and selecting one of them as the image query feature.
10. The retrieval method based on semantic concept extraction as claimed in claim 8, wherein, when the query feature is an instance query feature, the query feature is obtained as follows:
inputting an instance image, or inputting a query image and cropping it with a query instance rectangular box to obtain an instance image; extracting a feature map from the instance image through a convolutional neural network; and performing global pooling on the feature map to obtain the instance query feature.
CN202210725320.8A 2022-06-23 2022-06-23 Retrieval method based on semantic concept extraction Pending CN115205554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210725320.8A CN115205554A (en) 2022-06-23 2022-06-23 Retrieval method based on semantic concept extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210725320.8A CN115205554A (en) 2022-06-23 2022-06-23 Retrieval method based on semantic concept extraction

Publications (1)

Publication Number Publication Date
CN115205554A true CN115205554A (en) 2022-10-18

Family

ID=83578013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210725320.8A Pending CN115205554A (en) 2022-06-23 2022-06-23 Retrieval method based on semantic concept extraction

Country Status (1)

Country Link
CN (1) CN115205554A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453944A (en) * 2023-12-25 2024-01-26 厦门大学 Multi-level significant region decomposition unsupervised instance retrieval method and system
CN117453944B (en) * 2023-12-25 2024-04-09 厦门大学 Multi-level significant region decomposition unsupervised instance retrieval method and system

Similar Documents

Publication Publication Date Title
CN107679250B (en) Multi-task layered image retrieval method based on deep self-coding convolutional neural network
Xiao et al. A weakly supervised semantic segmentation network by aggregating seed cues: the multi-object proposal generation perspective
dos Santos et al. A relevance feedback method based on genetic programming for classification of remote sensing images
CN113378632A (en) Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization
CN107633226B (en) Human body motion tracking feature processing method
CN107239565B (en) Image retrieval method based on saliency region
Zhu et al. A multisize superpixel approach for salient object detection based on multivariate normal distribution estimation
US11816149B2 (en) Electronic device and control method thereof
WO2021088365A1 (en) Method and apparatus for determining neural network
CN113408605A (en) Hyperspectral image semi-supervised classification method based on small sample learning
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
Wang et al. Aspect-ratio-preserving multi-patch image aesthetics score prediction
CN107563406B (en) Image fine classification method for autonomous learning
CN111539444A (en) Gaussian mixture model method for modified mode recognition and statistical modeling
Chen et al. A saliency map fusion method based on weighted DS evidence theory
Ma et al. The BYY annealing learning algorithm for Gaussian mixture with automated model selection
CN110347853B (en) Image hash code generation method based on recurrent neural network
CN115205554A (en) Retrieval method based on semantic concept extraction
Sreeja et al. A unified model for egocentric video summarization: an instance-based approach
Meng et al. Concept-concept association information integration and multi-model collaboration for multimedia semantic concept detection
CN108717436B (en) Commodity target rapid retrieval method based on significance detection
Moradi et al. A salient object segmentation framework using diffusion-based affinity learning
Turtinen et al. Contextual analysis of textured scene images.
CN115019342A (en) Endangered animal target detection method based on class relation reasoning
Li et al. Feature proposal model on multidimensional data clustering and its application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination