CN116701695B - Image retrieval method and system for cascading corner features and twin network - Google Patents

Image retrieval method and system for cascading corner features and twin network

Info

Publication number
CN116701695B
CN116701695B (application CN202310640768.4A)
Authority
CN
China
Prior art keywords
image
images
searched
similar
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310640768.4A
Other languages
Chinese (zh)
Other versions
CN116701695A (en)
Inventor
陈程立诏
李潞铭
卢博
宋佳
宋梦柯
胡诗语
赵一汎
王子铭
张明月
杨龙燕
崔爽锌
薛子玥
刘新宇
梁少峰
朱晓东
尹涵冰
张钰
袁千禧
刘伊凡
崔奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China
Priority to CN202310640768.4A
Publication of CN116701695A
Application granted
Publication of CN116701695B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5854 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an image retrieval method and system that cascade corner features with a twin (Siamese) network. The method comprises the following steps: extracting key point features of the noise-reduced image to be retrieved and of each image in the retrieval dataset through corner detection, and screening out similar images according to the key point features; cropping regions of interest from the image to be retrieved and its similar images, and inputting the resulting local images into a trained twin network model, constructed on a deep residual network with a deformable attention mechanism, to obtain two groups of depth features; scoring the similarity of the depth features, and taking the similar images whose scores exceed a threshold as the retrieval results. The image retrieval method is particularly suitable for LOGO image retrieval, and can accurately and robustly retrieve target images containing the corresponding LOGO from a dataset even when the number of image types is uncertain.

Description

Image retrieval method and system for cascading corner features and twin network
Technical Field
The invention belongs to the technical field of digital image processing methods, and particularly relates to the technical field of content-based image retrieval methods.
Background
How to conveniently, rapidly and accurately find the images a user needs or is interested in within an image library is a research hotspot in the field of multimedia information retrieval. The most studied image retrieval methods fall into two categories: text-based image retrieval (TBIR, Text-Based Image Retrieval) and content-based image retrieval (CBIR, Content-Based Image Retrieval).
In text-based image retrieval, the content of each image must first be annotated with text, so that after a user provides retrieval keywords, the images of interest are retrieved through the correspondence between the annotated text and the keywords. In content-based image retrieval, a computer analyzes the images and builds an image feature library from the image vector features; when a user inputs a query image, its features are extracted in the same way, compared for similarity against the features in the feature library, and the images are output in order of similarity.
Content-based image retrieval has the following drawback: retrieval is easily disturbed by image scale changes and complex backgrounds, making it difficult to accurately retrieve similar images that have undergone certain changes under complex backgrounds, such as LOGO images with rotation, scale zooming and image quality differences. Accurate retrieval of LOGO images therefore remains a technical problem to be solved in the prior art, and the above drawback needs to be overcome by a new technique.
Disclosure of Invention
Aiming at the defects of the prior art, the invention seeks to overcome the limitations caused by image scale variation and by interference from complex background regions in existing content-based image retrieval methods, and provides a novel image retrieval method and system capable of accurately retrieving LOGO images.
The technical scheme of the invention is as follows:
an image retrieval method of cascading corner features and a twin network comprises the following steps:
s1, carrying out noise reduction treatment on each image in an image to be searched and a search data set to obtain a noise-reduced image to be searched and a noise-reduced search data set image, wherein the search data set is an image data set for carrying out image search according to the image to be searched;
s2, respectively carrying out global feature extraction on the image to be searched after noise reduction and the image of the search dataset after noise reduction through a SIFT angular point detection algorithm, carrying out similarity comparison on the extracted key point vectors, screening out similar images which are similar to the image to be searched in the search dataset, and forming a matching pair image by the image to be searched and the similar images;
s3, in an image coordinate system, based on the key points with similarity in the matching pair images, namely the matching points, respectively carrying out region-of-interest clipping on the obtained matching pair images to obtain local images of the images to be retrieved and local images of the similar images;
s4, inputting the local images of the images to be searched and the local images of the similar images into a trained twin network model, obtaining the depth characteristics of the two local images and the similarity scores between the two obtained sets of depth characteristics, screening out the local images with scores exceeding a similarity threshold, and taking the corresponding similar images as search results;
the twin network model is constructed based on a depth residual network and provided with a deformable attention mechanism.
According to some preferred embodiments of the invention, the noise reduction is achieved by gaussian filtering.
According to some preferred embodiments of the present invention, the similarity comparison is evaluated by euclidean distance between the key point vectors, that is, when the euclidean distance between the key point vectors extracted from the image to be retrieved after noise reduction and the key point vectors extracted from the image of the retrieved dataset after noise reduction is less than or equal to a priori threshold, the two are considered to be similar.
According to some preferred embodiments of the invention, the similarity score is obtained by linear cross-correlation.
According to some preferred embodiments of the invention, the a priori threshold is set to 0.6.
According to some preferred embodiments of the invention, the similarity threshold is set to 0.9.
According to some preferred embodiments of the invention, step S2 further comprises:
s21, extracting key point vectors of the noise-reduced image to be retrieved and the noise-reduced retrieval data set image through SIFT angular point detection;
s22, calculating Euclidean distance between key point vectors of the image to be retrieved after noise reduction and the image of the retrieval data set after noise reduction, and screening out the similar images;
s23, screening out images with the width and the height respectively larger than 256 pixels in the similar images to obtain secondary screening images;
s24, carrying out two-dimensional gridding treatment on the noise-reduced image to be searched and the secondary screening image to obtain the image to be searched and the secondary screening image with a two-dimensional coordinate system;
and S25, matching the key point pair with the highest similarity between the image to be retrieved and the secondary screening image with the two-dimensional coordinate system, to serve as the most similar key points of the image to be retrieved and its secondary screening image.
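As a minimal illustration of steps S21-S25 (not code from the patent), the key point screening and most-similar-pair selection can be sketched in numpy. The SIFT descriptors are assumed to have been extracted already (e.g., with an OpenCV SIFT detector) and L2-normalized so that the prior threshold of 0.6 is meaningful; the function name is illustrative:

```python
import numpy as np

def match_keypoints(desc_q, desc_d, threshold=0.6):
    """Match SIFT descriptors of a query image against one dataset image.

    desc_q, desc_d: (Nq, 128) and (Nd, 128) float arrays of SIFT
    descriptors, assumed L2-normalized so the 0.6 prior threshold
    from the method is meaningful.
    Returns the index pairs below the threshold (S22) and the single
    most similar pair with its distance (S25).
    """
    # Pairwise Euclidean distances, shape (Nq, Nd).
    diff = desc_q[:, None, :] - desc_d[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # S22: keep pairs whose distance is <= the prior threshold.
    pairs = np.argwhere(dist <= threshold)
    # S25: the most similar key point pair overall.
    best = np.unravel_index(np.argmin(dist), dist.shape)
    return pairs, best, dist[best]
```

In the full method this comparison is run against every image of the retrieval dataset, and images with at least one pair below the threshold become the similar images of S22.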
According to some preferred embodiments of the invention, step S3 further comprises:
cutting the image to be searched and the secondary screening image with the two-dimensional coordinate system according to the most similar key points to obtain the local image; wherein the cropping comprises:
when the most similar key point is located in the central area of the image to be retrieved or of the secondary screening image with the two-dimensional coordinate system, that is, when its distance to each of the four sides of the image is greater than or equal to 256 in the two-dimensional coordinate system, taking the most similar key point as the center point, cropping a 128×128 rectangular image from the image to be retrieved with the two-dimensional coordinate system as its local image, and cropping a 256×256 rectangular image from the secondary screening image with the two-dimensional coordinate system as its local image;
when the most similar key point is located in the edge area of the image to be retrieved or of the secondary screening image with the two-dimensional coordinate system, that is, when its distance to any of the four sides of the image is smaller than 256 in the two-dimensional coordinate system, taking the most similar key point as one corner of a rectangular cropping area that extends from that corner toward the central area of the image, and cropping once a cropping area of the target size is obtained, so as to obtain the local image; wherein the target size is: a 128×128 rectangular image cropped from the image to be retrieved with the two-dimensional coordinate system, and a 256×256 rectangular image cropped from the secondary screening image with the two-dimensional coordinate system.
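The two cropping cases above can be sketched as a single helper. This is a numpy sketch under our reading of the rule: the 256-pixel center margin and the 128/256 crop sizes follow the text, while the helper name, the direction-of-extension logic and the clamping at image borders are assumptions:

```python
import numpy as np

def crop_roi(img, kp, crop, center_margin=256):
    """Crop a square region around a key point.

    img: (H, W[, C]) array with H, W >= crop; kp: (x, y) key point.
    crop: side length of the crop (128 for the query image, 256 for
    the dataset image in the method).
    If the key point is at least `center_margin` pixels from every
    border, the crop is centered on it (central case); otherwise the
    key point becomes a corner of the crop rectangle, which extends
    toward the image center (edge case).
    """
    h, w = img.shape[:2]
    x, y = kp
    if min(x, y, w - x, h - y) >= center_margin:
        x0, y0 = x - crop // 2, y - crop // 2          # centered crop
    else:
        # extend from the key point toward the image center
        x0 = x if x <= w // 2 else x - crop
        y0 = y if y <= h // 2 else y - crop
        x0 = max(0, min(x0, w - crop))                 # keep inside image
        y0 = max(0, min(y0, h - crop))
    return img[y0:y0 + crop, x0:x0 + crop]
```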
According to some preferred embodiments of the invention, the twin network model is constructed by a modified Resnet50 network, the modified Resnet50 network having the following structure:
on the basis of the ResNet50 network, the third convolution layer is changed into a deformable convolution network layer, and a channel attention module is added after each of the fourth and fifth convolution layers.
According to some preferred embodiments of the invention, training the twin network model comprises:
forming a training set by the template images of the same type as the images to be searched and the cut search data set images;
five anchor boxes are arranged pixel by pixel on each input training set image by coordinate mapping, arranged as follows: the area of each anchor box is 1/64 of the original image, and the length-width ratios are 0.33, 0.5, 1, 2 and 3 respectively;
training the twin network model with a triplet loss function, using the weights of the pre-trained improved ResNet50 network as initialization weights, wherein during training the template image is set as the positive label, and retrieval dataset images whose key points have a Euclidean distance greater than 0.6 from the key points of the template image are set as negative labels;
and obtaining the trained twin network model once the triplet loss is stable, that is, once the network has converged.
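The triplet loss named above can be written out as a short numpy sketch. The margin value is an assumed hyperparameter (the patent does not specify one), and the function operates on pre-computed embeddings rather than on the network itself:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss over batches of embeddings (numpy sketch).

    anchor/positive/negative: (B, D) arrays. In the method the
    positive comes from the template image (positive label) and the
    negative from dataset images whose key points are farther than
    0.6 from the template's. `margin` is an assumed hyperparameter.
    """
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    # pull positives closer than negatives by at least `margin`
    return np.maximum(0.0, d_ap - d_an + margin).mean()
```

Training stops when this loss stabilizes, which is the convergence criterion stated above.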
According to some preferred embodiments of the invention, the obtaining of the search result includes:
s51, extracting feature vectors of local images of the images to be retrieved and local images of similar images of the images to be retrieved through the trained improved ResNet50 network;
s52, taking the extracted depth features of the images to be retrieved as convolution kernels, and carrying out convolution processing on the depth features of the images to be retrieved and the depth features of the similar images in a linear cross-correlation mode to obtain a similarity score graph;
s53, classifying the similarity score graph through a softmax function, wherein samples with the maximum value and the minimum value of the similarity score graph being larger than 0.5 are positive samples, and samples with the similarity score being larger than a preset similarity threshold value of 0.9 are retrieval results in the positive samples.
According to some preferred embodiments of the invention, the image to be retrieved is a LOGO-like image.
The invention further provides a retrieval system implementing the above image retrieval method, comprising: a database module storing the image to be retrieved and the retrieval dataset; a noise reduction processing module performing noise reduction on the image to be retrieved and the images in the retrieval dataset; a SIFT corner detection and similarity matching module performing all feature extraction and similar-image screening; an image cropping module performing the region-of-interest cropping; a model processing module performing the construction and training of the twin network model; and a secondary retrieval module capable of performing secondary retrieval on the retrieval results.
The image retrieval method is particularly suitable for LOGO image retrieval: it first applies Gaussian filtering to the LOGO image as a noise reduction and sampling preprocessing operation, then extracts SIFT corner features of the LOGO image and of the retrieval dataset images, crops a search region of interest from each dataset image that satisfies the matching condition through key point matching, and generates the corresponding candidate search region images.
The invention solves the poor practicability of LOGO image retrieval in complex scenes that affects existing retrieval methods, and achieves LOGO image retrieval with high accuracy and good robustness in such scenes.
The invention can effectively improve the accuracy and speed of LOGO image retrieval under complex application conditions; in particular, even when the number of LOGO image types is uncertain, the target images containing the corresponding LOGO can still be retrieved accurately from the image dataset.
Drawings
Fig. 1 is a flow chart of the search method of the present invention.
Fig. 2 is a schematic structural diagram of a specific deep learning network according to the present invention.
FIG. 3 is a schematic diagram of a particular attention module employed in the present invention.
Fig. 4 is a schematic diagram of the components of the retrieval system in the embodiment.
Detailed Description
The present invention will be described in detail with reference to the following examples and drawings, but it should be understood that the examples and drawings are only for illustrative purposes and are not intended to limit the scope of the present invention in any way. All reasonable variations and combinations that are included within the scope of the inventive concept fall within the scope of the present invention.
Referring to fig. 1, a specific embodiment of the image retrieval method of cascading corner features and a twin network model provided by the invention comprises the following steps:
s1, carrying out noise reduction processing on the LOGO image to be searched and each image in the search data set through Gaussian filtering, and obtaining the LOGO image to be searched after noise reduction and each image in the search data set after noise reduction.
In more specific embodiments, the gaussian filter is a two-dimensional gaussian filter.
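As a rough illustration of this two-dimensional Gaussian filtering (in practice a library routine such as cv2.GaussianBlur or scipy.ndimage.gaussian_filter would typically be used), a minimal numpy sketch follows; the kernel size and sigma are illustrative choices not specified by the patent:

```python
import numpy as np

def gaussian_kernel_2d(size=5, sigma=1.0):
    """Build a normalized 2-D Gaussian kernel (illustrative parameters)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def gaussian_denoise(img, size=5, sigma=1.0):
    """Noise reduction by 2-D Gaussian filtering (step S1 sketch).

    Plain valid-mode convolution, so the output shrinks by size-1 in
    each dimension; border handling is omitted for brevity.
    """
    k = gaussian_kernel_2d(size, sigma)
    h, w = img.shape
    out = np.empty((h - size + 1, w - size + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + size, j:j + size] * k)
    return out
```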
S2, performing global feature extraction on the noise-reduced LOGO image to be retrieved and on each image in the retrieval dataset respectively, and screening out the images in the retrieval dataset that are similar to the LOGO image to be retrieved, namely the similar images, through similarity comparison of the extracted global features; the LOGO image to be retrieved and a similar image in the retrieval dataset then form a matching pair image; the global features are SIFT key point features, and the SIFT key points with similarity within a matching pair image are the matching points.
More specific embodiments thereof are as follows: and extracting SIFT key point feature vectors of the LOGO image to be searched and each image in the search data set through a SIFT key point algorithm, calculating the similarity of the SIFT key point feature vectors by using Euclidean distance, and primarily screening out images with similarity Euclidean distance less than or equal to a priori threshold value, wherein the images are used as matching pair images with similarity.
Preferably, the a priori threshold is set to 0.6.
More specific embodiments thereof include:
s21, extracting key point vectors of LOGO images to be searched and each image in the search data set through a SIFT angular point detection algorithm;
s22, calculating Euclidean distance between the LOGO image to be searched and the key point vectors of all the images in the search data set, and screening out images with Euclidean distance smaller than 0.6 between the key point vectors of the LOGO image to be searched in the search data set, namely similar images of the LOGO image to be searched in the search data set;
s23, carrying out secondary screening on similar images in the search data set according to the pixel size of the images, and screening out images with the width and the height respectively larger than 256 pixels, namely secondary screening images;
s24, taking a point at the upper left corner of the secondary screening image as an origin, taking left and right as an X-axis extending direction and Y-axis extending directions respectively, and generating grid point two-dimensional coordinates of the searching and the image to be searched by using a merhgrid function in a python numpy library;
s25, matching the key point pair with the highest similarity between the LOGO image to be searched and the secondary screening image in the searching data set, and taking the key point pair as the most similar key point between the LOGO image to be searched and the secondary screening image.
S3, in the image coordinate system, cropping the regions of interest of the images based on the obtained matching points, so as to obtain the local images of the LOGO image to be retrieved and the local images of the similar images in the retrieval dataset.
A specific embodiment is as follows: according to the obtained matching points, the matching pair images are each mapped into an image coordinate system; further, taking the coordinates of either key point of a matching pair image as the center coordinates of the region of interest, the regions of interest of the matching pair images are respectively cropped to generate the local image of the LOGO image to be retrieved and the local image of the similar image in the retrieval dataset.
In more specific embodiments, the partial image of the LOGO image to be searched is set to be an image of a rectangular area with a size of 128×128 in the LOGO image to be searched, and the partial image of the similar image is an image of a rectangular area with a size of 256×256 in the similar image.
Other more specific embodiments thereof include:
s31, obtaining a LOGO image to be searched and a secondary screening image of the LOGO image in a search data set and the most similar key points through the steps S21-S25;
s32, in the LOGO image to be searched and the secondary screening image, local image cutting is carried out by taking the most similar key points as cutting basis, and the local image is obtained.
Preferably, the local image cropping includes:
when the most similar key points are located in the central area of the LOGO image to be retrieved or of the secondary screening image, that is, when their distances to each of the four sides of the image are greater than or equal to 256 in the two-dimensional coordinate system, the most similar key points are taken as center points: a 128×128 rectangular image is cropped from the LOGO image to be retrieved as its local image, and a 256×256 rectangular image is cropped from the secondary screening image as the local image of the similar image;
when the most similar key points are located in the edge area of the LOGO image to be retrieved or of the secondary screening image, that is, when their distance to any of the four sides of the image is smaller than 256 in the two-dimensional coordinate system, the most similar key points are taken as corner points of rectangular cropping areas that extend from those corners toward the central area of the image, and cropping is performed once a cropping area of the target size is obtained, yielding the local images of the LOGO image to be retrieved and of the secondary screening image; the target size is: a 128×128 rectangular image cropped from the LOGO image to be retrieved, and a 256×256 rectangular image cropped from the secondary screening image.
S4, inputting the local images of the LOGO images to be searched and the local images of the similar images in the search data set into a trained twin network model, obtaining the depth characteristics of the two local images and the similarity scores of the two obtained sets of depth characteristics, screening out the local images with scores exceeding a similarity threshold, and taking the corresponding original similar images as search results.
In more specific embodiments, the similarity score is obtained by linear cross-correlation analysis.
In more specific embodiments, the similarity threshold is set to 0.9.
Since the twin networks in a twin network model share network weights, the similarity of two inputs can be measured accurately; however, an ordinary fully convolutional twin network has difficulty extracting, accurately and robustly, image features with unknown deformation and complex background features of unknown origin. Therefore, in step S4, the invention further builds a twin network model with a deformable attention mechanism, making the retrieval more accurate and robust.
Further, the twin network model is a twin network model with a deformable attention (Deformable Attention) mechanism constructed on a deep residual network (Deep Residual Network, ResNet).
Further, referring to fig. 2 and 3, a more specific implementation manner of the twin network model is as follows:
with the ResNet50 deep fully convolutional network as the basic network structure, referring to FIG. 2, the third convolution layer of ResNet50 is changed into a deformable convolution network layer, and a channel attention module is added after the fourth and fifth convolution layers.
Further, referring to fig. 3, a more specific embodiment of the channel attention module is as follows:
The channel attention modules take the outputs of the fourth and fifth convolution layers as inputs. Let the input of a channel attention module be X_i ∈ ℝ^(C×H×W). The input feature is first downsampled using a convolution layer with a convolution kernel size of 1×1, reducing the number of input channels to one quarter, i.e., C' = C/4. The resulting downsampled feature is then deformed, using the reshape function in python's torch library, into a two-dimensional matrix V ∈ ℝ^(C'×N) with N = H×W. V is multiplied with its transposed matrix V^T, and after the matrix multiplication a channel attention map A ∈ ℝ^(C'×C') is obtained using a softmax function, that is, A = softmax(VV^T).
Then, a weighted residual addition is performed between the channel attention map A and the two-dimensional matrix V, that is, V' = γAV + V, where γ is a learnable weight initialized to 0 and trained with the network. Finally, V' is deformed again, using the reshape function in python's torch library, into a tensor with the same resolution as the input X.
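The attention computation just described can be sketched in numpy (the 1×1 channel-reduction convolution is omitted, and since the patent's matrix-product order is garbled in this text, we use the shape-consistent reading V' = γAV + V; the function name is illustrative):

```python
import numpy as np

def channel_attention(x, gamma=0.0):
    """Numpy sketch of the channel attention module (Fig. 3).

    x: (C', H, W) feature after the 1x1 channel reduction (the
    reduction itself is omitted here). Computes A = softmax(V V^T)
    for V in R^(C' x N) with N = H*W, then the weighted residual
    V' = gamma * A V + V, reshaped back to x's shape. gamma is the
    learnable weight initialized to 0, so the module starts as an
    identity mapping.
    """
    c, h, w = x.shape
    v = x.reshape(c, h * w)                       # V in R^(C' x N)
    s = v @ v.T                                   # V V^T, shape (C', C')
    e = np.exp(s - s.max(axis=1, keepdims=True))  # stable row-wise softmax
    a = e / e.sum(axis=1, keepdims=True)          # A = softmax(V V^T)
    out = gamma * (a @ v) + v                     # V' = gamma*A*V + V
    return out.reshape(c, h, w)
```

With γ = 0 the output equals the input, matching the stated initialization; during training γ grows and the attention term contributes.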
In the above embodiment, the improved ResNet50 network is used as the backbone network of the twin network model. The improvement replaces the convolution in the third convolution layer of the ResNet50 network with a deformable convolution, which adds an offset variable to the position of each sampling point in the convolution kernel; with these offsets, the convolution kernel can sample freely near its current position instead of being confined to the regular grid points. Once the offsets are learned, the size and position of the deformable convolution kernel can be adjusted dynamically according to the image content currently being recognized. The visible effect is that the sampling point positions of the convolution kernel adapt to the image content at different locations, thereby accommodating geometric deformations such as the shape and size of different objects in the complex and changeable dataset to be retrieved.
After the fourth and fifth layers, a channel attention mechanism is used: a bypass branch splits off after the ordinary convolution operation and first performs a reshape operation to compress the spatial dimension, so that each two-dimensional feature map becomes a real number, equivalent to a pooling operation with a global receptive field, while the number of feature channels remains unchanged. The softmax operation then generates a weight for each feature channel through learned parameters, explicitly modeling the correlation between feature channels. After the weight of each feature channel is obtained, it is applied to the corresponding original feature channel, so that the importance of different channels can be learned.
After that, the final LOGO feature tensor is used as a convolution kernel to perform a convolution operation on the image to be retrieved, and the similarity is calculated through channel-by-channel multiplication.
Further, training of the twin network model may include:
and taking the LOGO template image and the image area cut in the retrieval data set as the input of the twin network model, namely a training set, and carrying out network parameter training on the twin network model by adopting a triplet loss function.
More specific embodiments thereof are as follows:
five anchor boxes are arranged pixel by pixel on each input training set image by coordinate mapping, arranged as follows: the area of each anchor box is 1/64 of the original image, and the length-width ratios are 0.33, 0.5, 1, 2 and 3 respectively;
training a twin network model by using a triple loss function by using a Resnet50 network weight trained on 1400 ten thousand images of an ImageNet as an initialization weight of a backbone network in the twin network model, using a LOGO template image as a positive label, using an image with a Euclidean distance between a key point of the LOGO template image and a key point of the LOGO template image being larger than 0.6 as a negative label;
and obtaining the trained twin network model until the triplet loss is stable, namely the network converges.
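The triplet training step described above can be sketched as follows. This is a minimal illustration under stated assumptions: `embed` is a toy stand-in for the modified ResNet50 backbone, the tensor shapes are invented, and reusing the 0.6 key-point distance as the loss margin is a guess, not the patent's configuration.

```python
import torch
import torch.nn as nn

# Toy stand-in backbone (assumption); the patent uses a modified ResNet50.
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
triplet = nn.TripletMarginLoss(margin=0.6)  # margin value is an assumption
opt = torch.optim.SGD(embed.parameters(), lr=0.001,
                      momentum=0.9, weight_decay=0.0001)

anchor   = torch.randn(4, 3, 64, 64)  # LOGO template crops
positive = torch.randn(4, 3, 64, 64)  # crops matching the template (positive label)
negative = torch.randn(4, 3, 64, 64)  # crops with key-point distance > 0.6 (negative label)

# One training step; in practice this repeats until the loss stabilizes.
loss = triplet(embed(anchor), embed(positive), embed(negative))
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss) >= 0.0)  # True -- triplet loss is non-negative
```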
Further, in a specific embodiment of the present invention, the process of obtaining the search result through the trained twin network model includes:
S51, extracting feature vectors of the local images of the LOGO image to be retrieved and the local images of its similar images through the trained improved ResNet50 model;
S52, taking the extracted depth features of the LOGO image to be retrieved as convolution kernels, and convolving them with the depth features of the similar images by linear cross-correlation to obtain a similarity score map with 2 channels and a pixel size of 17 × 17;
S53, classifying the similarity score map through a softmax function: the maximum value and the minimum value of the similarity score map are calculated, and if both are greater than 0.5, the sample is regarded as a positive sample; among the positive samples, a sample whose maximum score in the similarity score map is greater than the preset similarity threshold of 0.9 is taken as an image required by the retrieval.
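Step S53 might be sketched as follows. The choice of channel 1 as the "similar" class, the function name, and the decision to apply softmax over the channel dimension are assumptions made for the illustration.

```python
import torch

def classify(score_map: torch.Tensor, sim_threshold: float = 0.9) -> bool:
    """Hedged sketch of S53 on a (2, 17, 17) two-channel similarity score map.

    A crop counts as a positive sample when both the maximum and the minimum
    of the softmax scores exceed 0.5, and is returned as a retrieval hit when
    its top score also exceeds the 0.9 similarity threshold.
    """
    # Softmax over the two channels; channel 1 taken as "similar" (assumption).
    probs = torch.softmax(score_map, dim=0)[1]
    is_positive = probs.max() > 0.5 and probs.min() > 0.5
    return bool(is_positive and probs.max() > sim_threshold)

print(classify(torch.zeros(2, 17, 17)))  # False: a uniform map scores 0.5 everywhere
```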
Example 1
According to the above embodiments, the present invention further provides the following examples:
The method was implemented in Python using PyTorch and trained on 2 RTX 2080Ti cards. During training, the batch size was set to 16, and 20 epochs were run using stochastic gradient descent (SGD). The initial learning rate was set to 0.001, with a warm-up from 0.001 to 0.005 over the first 5 epochs and an exponential decay from 0.005 to 0.00005 over the last 15 epochs. The weight decay and momentum were set to 0.0001 and 0.9, respectively. The query images and the images to be retrieved are screenshots captured manually and at random in the national 242 information security plan small emergency special project; they exhibit large scale changes, complex background interference, rotation changes, poor image quality, and similar characteristics.
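The learning-rate schedule quoted above can be reproduced with a small helper. The exact interpolation (linear warm-up, per-epoch exponential decay) is an assumption, since the text only gives the endpoint values.

```python
def lr_at(epoch: int, total: int = 20, warmup: int = 5) -> float:
    """Sketch of the schedule: linear warm-up 0.001 -> 0.005 over the first
    5 epochs, then exponential decay 0.005 -> 0.00005 over the last 15."""
    if epoch < warmup:
        # Linear interpolation across the warm-up epochs (assumption).
        return 0.001 + (0.005 - 0.001) * epoch / (warmup - 1)
    # Exponential decay so the final epoch lands on 0.00005.
    t = (epoch - warmup) / (total - warmup - 1)
    return 0.005 * (0.00005 / 0.005) ** t

print(lr_at(0), lr_at(4), lr_at(19))
```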
The search system is shown in fig. 4, and the search result statistics are as follows:
Table 1. Summary of LOGO image search results
It can be seen that the retrieval method of the invention has excellent accuracy and robustness.
The above examples are only preferred embodiments of the present invention, and the scope of the present invention is not limited to them. All technical solutions falling within the concept of the invention fall within its protection scope. It should be noted that, for those skilled in the art, modifications and adaptations made without departing from the principles of the present invention are also within the protection scope of the present invention.

Claims (9)

1. An image retrieval method cascading corner features and a twin network, characterized by comprising the following steps:
S1, performing noise reduction on the image to be retrieved and each image in a retrieval dataset to obtain a noise-reduced image to be retrieved and noise-reduced retrieval dataset images, wherein the retrieval dataset is the image dataset searched according to the image to be retrieved;
S2, performing global feature extraction on the noise-reduced image to be retrieved and the noise-reduced retrieval dataset images through SIFT corner detection, comparing the similarity of the extracted key point vectors, screening out the images in the retrieval dataset similar to the image to be retrieved, and forming matched image pairs from the image to be retrieved and the similar images;
S3, in the image coordinate system, cropping regions of interest from the matched image pairs based on the key points with similarity in the matched image pairs, i.e. the matching points, to obtain a local image of the image to be retrieved and local images of the similar images;
S4, inputting the local image of the image to be retrieved and the local images of the similar images into a trained twin network model, obtaining the depth features of the two local images and the similarity score between the two sets of depth features, screening out the local images whose scores exceed a similarity threshold, and taking the corresponding similar images as the retrieval results;
wherein the twin network model is a twin network model with a deformable attention mechanism constructed on a depth residual network and built from a modified ResNet50 network, the modified ResNet50 network having the following structure:
on the basis of a ResNet50 network, the third convolution layer is changed into a deformable convolution layer, and a channel attention module is added after each of the fourth and fifth convolution layers;
step S2 further comprises:
S21, extracting key point vectors of the noise-reduced image to be retrieved and the noise-reduced retrieval dataset images through SIFT corner detection;
S22, calculating the Euclidean distance between the key point vectors of the noise-reduced image to be retrieved and those of the noise-reduced retrieval dataset images, and screening out the similar images;
S23, screening out the images in the similar images whose width and height are each larger than 256 pixels to obtain secondary screening images;
S24, performing two-dimensional gridding on the noise-reduced image to be retrieved and the secondary screening images to obtain the image to be retrieved and the secondary screening images with a two-dimensional coordinate system;
S25, matching the key points with the highest similarity in the image to be retrieved and the secondary screening images with the two-dimensional coordinate system as the most similar key points of the image to be retrieved and the secondary screening images;
step S3 further comprises:
cutting the image to be searched and the secondary screening image with the two-dimensional coordinate system according to the most similar key points to obtain the local image; wherein the cropping comprises:
when the most similar key point lies in the central area of the image to be retrieved or of the secondary screening image with the two-dimensional coordinate system, that is, its distance to each of the four sides of the image in the two-dimensional coordinate system is greater than or equal to 256, the most similar key point is taken as the center point: a rectangular image of size 128 × 128 is cropped from the image to be retrieved with the two-dimensional coordinate system as a local image, and a rectangular image of size 256 × 256 is cropped from the secondary screening image with the two-dimensional coordinate system as a local image;
when the most similar key point lies in the edge area of the image to be retrieved or of the secondary screening image with the two-dimensional coordinate system, that is, its distance to one of the four sides of the image in the two-dimensional coordinate system is smaller than 256, the most similar key point is taken as a corner point of a rectangular cropping area that extends from the corner point toward the central area of the image, and cropping is performed once a cropping area of the target size is obtained, yielding a local image, wherein the target size comprises: a rectangular image of size 128 × 128 cropped from the image to be retrieved with the two-dimensional coordinate system, and a rectangular image of size 256 × 256 cropped from the secondary screening image with the two-dimensional coordinate system.
2. The image retrieval method according to claim 1, wherein the noise reduction processing is realized by Gaussian filtering.
3. The image retrieval method of claim 1, wherein the similarity score is obtained by linear cross-correlation.
4. The image retrieval method according to claim 1, wherein the similarity comparison is evaluated by the Euclidean distance between key point vectors: when the Euclidean distance between a key point vector extracted from the noise-reduced image to be retrieved and a key point vector extracted from a noise-reduced retrieval dataset image is less than or equal to a priori threshold, the two are considered similar.
5. The image retrieval method as recited in claim 4, wherein the a priori threshold is set to 0.6; and/or, the similarity threshold is set to 0.9.
6. The image retrieval method of claim 1, wherein training the twin network model comprises:
forming a training set from template images of the same type as the image to be retrieved and the cropped retrieval dataset images;
arranging five anchor boxes pixel by pixel on each input training-set image by coordinate mapping, as follows: the area of each anchor box is 1/64 of the original image, with aspect ratios of 0.33, 0.5, 1, 2, and 3, respectively;
training the twin network model based on the triplet loss function, using the trained weights of the improved ResNet50 network as initialization weights, wherein the template images are set as positive labels during training, and the cropped retrieval dataset images whose key-point Euclidean distance to the key points of the template image is greater than 0.6 are set as negative labels;
and obtaining the trained twin network model once the triplet loss function is stable, i.e. the network has converged.
7. The image retrieval method according to claim 1, wherein the obtaining of the retrieval result includes:
S51, extracting feature vectors of the local image of the image to be retrieved and the local images of its similar images through the trained improved ResNet50 network;
S52, taking the extracted depth features of the image to be retrieved as convolution kernels, and convolving them with the depth features of the similar images by linear cross-correlation to obtain a similarity score map with 2 channels and a pixel size of 17 × 17;
S53, classifying the similarity score map through a softmax function, including: calculating the maximum value and the minimum value of the similarity score map, and if both are greater than 0.5, regarding the sample as a positive sample; among the positive samples, a sample whose similarity score is greater than the preset similarity threshold of 0.9 is taken as an image required by the retrieval.
8. The image retrieval method according to claim 1, wherein the image to be retrieved is a LOGO-like image.
9. A retrieval system implementing the image retrieval method of any one of claims 1 to 8, comprising: a database module that stores the image to be retrieved and the retrieval dataset; a noise reduction module that performs noise reduction on the image to be retrieved and the images in the retrieval dataset; a SIFT corner detection and similarity matching module that performs global feature extraction and similar-image screening; an image cropping module that performs region-of-interest cropping; a model processing module that builds and trains the twin network model; and a secondary retrieval module that can perform secondary retrieval on the retrieval results.
CN202310640768.4A 2023-06-01 2023-06-01 Image retrieval method and system for cascading corner features and twin network Active CN116701695B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310640768.4A CN116701695B (en) 2023-06-01 2023-06-01 Image retrieval method and system for cascading corner features and twin network


Publications (2)

Publication Number Publication Date
CN116701695A CN116701695A (en) 2023-09-05
CN116701695B true CN116701695B (en) 2024-01-30

Family

ID=87828619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310640768.4A Active CN116701695B (en) 2023-06-01 2023-06-01 Image retrieval method and system for cascading corner features and twin network

Country Status (1)

Country Link
CN (1) CN116701695B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013196458A (en) * 2012-03-21 2013-09-30 Casio Comput Co Ltd Image search system, image search device, image search method and program
CN105550381A (en) * 2016-03-17 2016-05-04 北京工业大学 Efficient image retrieval method based on improved SIFT (scale invariant feature transform) feature
CN107403407A (en) * 2017-08-04 2017-11-28 深圳市唯特视科技有限公司 A kind of breathing tracking based on thermal imaging
CN110825899A (en) * 2019-09-18 2020-02-21 武汉纺织大学 Clothing image retrieval method integrating color features and residual network depth features
CN111028277A (en) * 2019-12-10 2020-04-17 中国电子科技集团公司第五十四研究所 SAR and optical remote sensing image registration method based on pseudo-twin convolutional neural network
CN111881906A (en) * 2020-06-18 2020-11-03 广州万维创新科技有限公司 LOGO identification method based on attention mechanism image retrieval
CN112966137A (en) * 2021-01-27 2021-06-15 中国电子进出口有限公司 Image retrieval method and system based on global and local feature rearrangement
CN113223068A (en) * 2021-05-31 2021-08-06 西安电子科技大学 Multi-modal image registration method and system based on depth global features
CN113705588A (en) * 2021-10-28 2021-11-26 南昌工程学院 Twin network target tracking method and system based on convolution self-attention module
CN113742504A (en) * 2021-09-13 2021-12-03 城云科技(中国)有限公司 Method, device, computer program product and computer program for searching images by images
CN114168768A (en) * 2021-12-07 2022-03-11 深圳市华尊科技股份有限公司 Image retrieval method and related equipment
CN114299559A (en) * 2021-12-27 2022-04-08 杭州电子科技大学 Finger vein identification method based on lightweight fusion global and local feature network
CN115129920A (en) * 2022-06-16 2022-09-30 武汉大学 Cross-modal retrieval method and device for local feature enhanced optical SAR remote sensing image
CN115937552A (en) * 2022-10-21 2023-04-07 华南理工大学 Image matching method based on fusion of manual features and depth features

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7450740B2 (en) * 2005-09-28 2008-11-11 Facedouble, Inc. Image classification and information retrieval over wireless digital networks and the internet
US20210118136A1 (en) * 2019-10-22 2021-04-22 Novateur Research Solutions LLC Artificial intelligence for personalized oncology


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"图像特征检测与匹配方法研究综述" (A survey of image feature detection and matching methods); Tang Can et al.; Journal of Nanjing University of Information Science and Technology (Natural Science Edition); 261-273 *

Also Published As

Publication number Publication date
CN116701695A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
Su et al. A fast forgery detection algorithm based on exponential-Fourier moments for video region duplication
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
Lomio et al. Classification of building information model (BIM) structures with deep learning
Wang et al. Multiscale deep alternative neural network for large-scale video classification
Hamida et al. Handwritten arabic words recognition system based on hog and gabor filter descriptors
CN105654122B (en) Based on the matched spatial pyramid object identification method of kernel function
Zhang et al. Automatic discrimination of text and non-text natural images
CN112580480A (en) Hyperspectral remote sensing image classification method and device
CN110659374A (en) Method for searching images by images based on neural network extraction of vehicle characteristic values and attributes
Wu et al. Deep texture exemplar extraction based on trimmed T-CNN
CN109902690A (en) Image recognition technology
CN116701695B (en) Image retrieval method and system for cascading corner features and twin network
CN112446372B (en) Text detection method based on channel grouping attention mechanism
Kota et al. Summarizing lecture videos by key handwritten content regions
CN114359786A (en) Lip language identification method based on improved space-time convolutional network
Essa et al. High order volumetric directional pattern for video-based face recognition
Saudagar et al. Efficient Arabic text extraction and recognition using thinning and dataset comparison technique
Wang Extraction algorithm of English text information from color images based on radial wavelet transform
Wadhwa et al. Dissected Urdu Dots Recognition Using Image Compression and KNN Classifier
Shri et al. Video Analysis for Crowd and Traffic Management
Guyomard et al. Contextual detection of drawn symbols in old maps
Liu et al. TFPGAN: Tiny Face Detection with Prior Information and GAN
Raveendra et al. A novel automatic system for logo-based document image retrieval using hybrid SVDM-DLNN
Isnanto et al. Determination of the optimal threshold value and number of keypoints in scale invariant feature transform-based copy-move forgery detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant