CN114168768A - Image retrieval method and related equipment - Google Patents


Info

Publication number
CN114168768A
CN114168768A (application CN202111486875.3A)
Authority
CN
China
Prior art keywords
key point
image
loss function
feature
descriptor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111486875.3A
Other languages
Chinese (zh)
Inventor
施宏恩
禹世杰
梅术正
吴伟华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN HARZONE TECHNOLOGY CO LTD
Original Assignee
SHENZHEN HARZONE TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN HARZONE TECHNOLOGY CO LTD filed Critical SHENZHEN HARZONE TECHNOLOGY CO LTD
Priority to CN202111486875.3A
Publication of CN114168768A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/55 Clustering; Classification
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Abstract

The embodiment of the application discloses an image retrieval method and related equipment, wherein the method comprises the following steps: acquiring an image to be queried, determining a target category type of the image to be queried, and extracting a region of interest of the image to be queried; inputting the region of interest into a feature extraction network to obtain a first key point set; screening the targets in a preset base library according to the target category type to obtain P second key point sets, wherein the preset base library comprises Q second key point sets, each key point set corresponds to one target, Q is a positive integer greater than or equal to P, and P is a positive integer; inputting the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships; and determining, according to the P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets, and displaying the images corresponding to the P second key point sets according to the similarity. By adopting the embodiment of the application, efficient image retrieval can be realized.

Description

Image retrieval method and related equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image retrieval method and related equipment.
Background
In the prior art, image retrieval, also called search-by-image, is a technology that converts a query image into vector features and returns, through a similarity search engine, the images with the highest similarity to the query image. With the rapid development of multimedia technology, the volume of image data grows geometrically, and how to retrieve images quickly and effectively is a problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides an image retrieval method and related equipment, which can realize efficient image retrieval.
In a first aspect, an embodiment of the present application provides an image retrieval method, where the method includes:
acquiring an image to be queried;
extracting a region of interest of the image to be queried;
inputting the region of interest into a feature extraction network to obtain a first key point set;
determining a target category type of the image to be queried, and screening key points in a preset base library according to the target category type to obtain P second key point sets, wherein the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer;
inputting the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships;
and determining, according to the P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets, and displaying the images corresponding to the P second key point sets according to the similarity.
In a second aspect, an embodiment of the present application provides an image retrieval apparatus, including: an acquisition unit, an extraction unit, an input unit, a determination unit, and a display unit, wherein,
the acquisition unit is used for acquiring an image to be queried;
the extraction unit is used for extracting a region of interest of the image to be queried;
the input unit is used for inputting the region of interest into a feature extraction network to obtain a first key point set;
the determining unit is used for determining a target category type of the image to be queried, and screening the key points in a preset base library according to the target category type to obtain P second key point sets, wherein the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer;
the input unit is further configured to input the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships;
and the display unit is used for determining, according to the P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets, and displaying the images corresponding to the P second key point sets according to the similarity.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps in the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program for electronic data exchange, the computer program enabling a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, including a non-transitory computer-readable storage medium that stores a computer program, the computer program being operable to cause a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
it can be seen that the image retrieval method and the related equipment described in the embodiments of the present application acquire an image to be queried, extract a region of interest of the image to be queried, input the region of interest into a feature extraction network to obtain a first key point set, determine a target category type of the image to be queried, and screen key points in a preset base library according to the target category type to obtain P second key point sets, where the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer; the first key point set and the P second key point sets are input into a feature matching network to obtain P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets is determined according to the P groups of matching relationships, and the images corresponding to the P second key point sets are displayed according to the similarity. On one hand, the key points of the corresponding category in the base library can be screened out through the region of interest of the image to be queried; on the other hand, only the corresponding key points are matched, so that the image retrieval efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of an image retrieval method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another image retrieval method provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 4 is a block diagram of functional units of an image retrieval apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic devices described in the embodiments of the present application may include a smart phone (e.g., an Android phone, an iOS phone, a Windows phone, etc.), a tablet computer, a palm computer, a vehicle data recorder, a traffic guidance platform, a server, a notebook computer, a Mobile Internet Device (MID), or a wearable device (e.g., a smart watch or a Bluetooth headset). These are merely examples rather than an exhaustive list, and the electronic devices include, but are not limited to, the above.
The following describes embodiments of the present application in detail.
Referring to fig. 1, fig. 1 is a schematic flow chart of an image retrieval method according to an embodiment of the present application, where the image retrieval method includes:
101. Acquiring an image to be queried.
In this embodiment of the present application, the image to be queried may be an image containing a target object, and the target object may include at least one of the following: a face, a fingerprint, an iris, an animal, a moving object, and the like, which is not limited herein.
In specific implementation, the image to be queried can be loaded from a memory of the electronic device or from the cloud.
102. Extracting a region of interest of the image to be queried.
In a specific implementation, a region of interest of the image to be queried may be extracted; for example, the region of interest may be the target region where a target is located. There may be one or more regions of interest, and a region of interest is obtained by framing it on the image to be queried.
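For illustration only (this sketch is not part of the disclosed embodiments), extracting a region of interest can be as simple as slicing the image array with a target detection box; the box format and the use of an off-the-shelf detector are assumptions:

```python
import numpy as np

def crop_rois(image: np.ndarray, boxes: list[tuple[int, int, int, int]]) -> list[np.ndarray]:
    """Crop one region of interest per detected target box.

    `boxes` holds (x1, y1, x2, y2) pixel coordinates, e.g. from any
    off-the-shelf object detector (an assumption for illustration).
    """
    h, w = image.shape[:2]
    rois = []
    for x1, y1, x2, y2 in boxes:
        # Clamp the box to the image bounds before slicing.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        rois.append(image[y1:y2, x1:x2].copy())
    return rois
```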
103. Inputting the region of interest into a feature extraction network to obtain a first key point set.
The feature extraction network may include a neural network model, and the neural network model may include at least one of the following: a convolutional neural network model, a fully connected neural network model, a recurrent neural network model, and the like, which is not limited herein. That is, the region of interest of the query image is input into the feature extraction network, and the key points of the region of interest of the target object are extracted.
In the embodiment of the application, the feature extraction network can use a residual convolutional network as a backbone network and simultaneously extract the coordinates and descriptors of key points on an image. The network finally outputs a dense descriptor feature map (each pixel corresponds to one descriptor) and two confidence maps, which estimate, respectively, the confidence of the key point location (the reliability of the key point position) and the confidence of its descriptor (the reliability of the key point descriptor). Finally, key points are taken from the positions where the responses of these two maps are maximal, so as to select local key points with better expressive power and robustness. After the key points are selected, their coordinates and descriptors are input into a preset graph convolutional neural network to obtain the matching relationship between the key points of two images. The designed graph convolutional neural network is a GNN based on an attention mechanism that performs feature matching by imitating a human, for example, tentatively screening matching key points by browsing two images back and forth and checking them repeatedly, which can improve the accuracy of key point matching. After the key points on the two images are matched, the final number of matched key point pairs is used as the matching similarity of the two images: the more matched point pairs, the more similar the two images.
104. Determining a target category type of the image to be queried, and screening key points in a preset base library according to the target category type to obtain P second key point sets, wherein the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer.
The target category type may be the category type of the target object; for example, the category type may be pedestrian, tree, puppy, car, or the like. The preset base library may comprise Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer. In specific implementation, preliminary filtering can be performed according to the target category of the query image to obtain the key point features in the base library that have the same target category.
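A toy sketch of the preliminary filtering step, assuming the preset base library is held in memory as a category-indexed mapping (the data layout and names are assumptions for illustration):

```python
from collections import defaultdict

class BaseLibrary:
    """Toy in-memory base library: Q key point sets grouped by category."""

    def __init__(self):
        self._by_category = defaultdict(list)  # category -> [(image_id, keypoint_set)]

    def add(self, category, image_id, keypoint_set):
        self._by_category[category].append((image_id, keypoint_set))

    def filter_by_category(self, target_category):
        # Returns the P key point sets whose category matches the query target,
        # so only P of the Q stored sets go on to feature matching.
        return self._by_category.get(target_category, [])
```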
105. Inputting the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships.
The feature matching network is used for realizing a feature matching function. The key point features of the region of interest of the query image and the key point features of the targets of the same category in the base library can be respectively input into the feature matching network to obtain the matching relationship between the key points of the two images.
106. Determining, according to the P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets, and displaying the images corresponding to the P second key point sets according to the similarity.
In the specific implementation, the number of matched key point pairs is taken as the matching similarity of the two images, the base library images are sorted accordingly, and the top-K base library images with the highest similarity are returned, where K is a positive integer. Each of the P groups of matching relationships may include the number of successfully matched key point pairs, and the K images are displayed in descending order of this number.
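A sketch of the ranking rule under these assumptions: the number of successfully matched key point pairs is used directly as the similarity score, and the top-K base library images are returned (function and variable names are illustrative):

```python
def rank_matches(match_counts: dict[str, int], k: int = 10) -> list[tuple[str, int]]:
    """Sort base-library images by matched-pair count, descending,
    and keep the top-K. `match_counts` maps image_id -> number of
    key point pairs matched against the query's region of interest.
    """
    ranked = sorted(match_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Example: image 'b' has the most matched pairs, so it ranks first.
print(rank_matches({"a": 12, "b": 47, "c": 3}, k=2))  # [('b', 47), ('a', 12)]
```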
According to this image retrieval method based on the region of interest, a deep learning algorithm is adopted to extract the key points of the region of interest of the query image and the key points of the base library images respectively. The key points contain fine-grained features of the images, and matching the key points yields the similarity between images, so that the accuracy of image retrieval is improved. In addition, the method focuses on the feature region of interest on the image and quickly finds the targets in the base library that have the same region of interest. The technology can also be applied to image retrieval tasks based on regions of interest of pedestrians and commodities.
Optionally, the method may further include the following steps:
S1, acquiring a first sample image, wherein the first sample image corresponds to an initial label;
S2, carrying out affine transformation on the first sample image through an affine transformation matrix, and carrying out enhancement processing on the image after the affine transformation to obtain a second sample image;
S3, inputting the first sample image and the second sample image into the feature extraction network respectively to obtain two groups of outputs, wherein the two groups of outputs are respectively a first confidence map and a first descriptor feature map of the first sample image, and a second confidence map and a second descriptor feature map of the second sample image;
S4, determining the coordinates and descriptors of corresponding key points in the two groups of outputs according to the initial label and the affine transformation matrix, calculating a total loss function according to the coordinates and descriptors of the corresponding key points, and updating network parameters of the feature extraction network by using the total loss function;
S5, executing steps S2 to S4 with the feature extraction network whose network parameters have been updated until the feature extraction network reaches a stable condition, and removing initial key point labels according to the initial label and the confidence maps output by the feature extraction network to obtain an updated initial label;
S6, executing steps S2 to S4 with the feature extraction network that has reached the stable condition, using the updated key point labels in place of the initial labels to optimize the network until training is finished.
In the embodiment of the application, the feature extraction network may use a residual convolutional network as a backbone network, and the network finally outputs a dense descriptor feature map (each pixel corresponds to one descriptor) and two confidence maps. The two confidence maps estimate, respectively, the confidence of the key point location (the reliability of the key point position) and the confidence of its descriptor (the reliability of the key point descriptor). Finally, key points are taken from the positions where the responses of the two maps are maximal. The method extracts the coordinates and descriptors of the key points on the image simultaneously, and can select local key points with better expressive power and robustness.
In specific implementation, in order to make the extracted feature points more robust and avoid the training difficulty caused by random initialization, a Scale-Invariant Feature Transform (SIFT) algorithm is used to extract the key points of an image as the initial label of the image. Then, a training image is randomly selected and denoted as P1; affine transformation is performed on the image P1, with the affine transformation matrix denoted as M; and data enhancement (including local or global enhancement of brightness, saturation, and the like) is performed on the image after the affine transformation, with the result denoted as P2.
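A sketch of this data preparation step using OpenCV's SIFT and warpAffine; the rotation range and the brightness/contrast jitter parameters are assumptions for illustration:

```python
import cv2
import numpy as np

def prepare_pair(p1: np.ndarray):
    """Build the training pair (P1, P2) and the initial SIFT labels."""
    # Initial key point labels from SIFT, avoiding random initialization.
    sift = cv2.SIFT_create()
    keypoints = sift.detect(cv2.cvtColor(p1, cv2.COLOR_BGR2GRAY), None)
    labels = np.array([kp.pt for kp in keypoints], dtype=np.float32)  # (N, 2)

    # Random affine transform M, then photometric enhancement -> P2.
    h, w = p1.shape[:2]
    angle = np.random.uniform(-30, 30)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, np.random.uniform(0.8, 1.2))
    p2 = cv2.warpAffine(p1, m, (w, h))
    p2 = cv2.convertScaleAbs(p2, alpha=np.random.uniform(0.8, 1.2),  # contrast
                             beta=np.random.uniform(-20, 20))        # brightness
    return p2, m, labels
```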
Then, the images P1 and P2 may be respectively input into the designed feature extraction network to obtain two groups of outputs O1 and O2 (each including two confidence maps and a descriptor feature map). The coordinates and descriptors of corresponding key points in the two groups of outputs O1 and O2 are then found according to the initial label and the transformation relationship M between the two images, a total loss function is calculated, and the network parameters are updated.
Further, steps S2, S3, and S4 may be repeated until the network reaches a stable condition; then, according to the initial label of the image and the two confidence maps output by the network, the initial key point labels whose responses on the two maps are less than a threshold are removed, so as to update the initial label of the image. The stable condition may be preset or a system default; for example, the stable condition may be that the recognition accuracy of the feature extraction network reaches a certain threshold, or that the number of training iterations reaches a specified number.
Further, steps S2, S3, and S4 may be repeated with the updated key point labels used in place of the initial labels, so as to perform optimization training on the network until the training is completed; for example, the training is considered complete when the feature extraction network meets a convergence condition, or when the number of training iterations reaches a certain value.
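Putting steps S2 to S4 together, the two-phase schedule might look like the following sketch; `warp_and_augment`, `total_loss`, and `refresh_labels` are hypothetical placeholders for the operations described above, not functions from any real library:

```python
import torch

def train(net, loader, optimizer, num_epochs, warp_and_augment, total_loss):
    """Two-phase training: first drive the network to a stable state,
    then refresh the key point labels and keep optimizing."""
    for epoch in range(num_epochs):
        for p1, labels in loader:
            p2, m = warp_and_augment(p1)               # step S2
            out1, out2 = net(p1), net(p2)              # step S3: maps for P1 and P2
            loss = total_loss(out1, out2, labels, m)   # step S4: location + descriptor loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if epoch == num_epochs // 2:
            # Once training is stable, prune low-response labels (step S5)
            # and continue optimizing against the refreshed labels (step S6).
            loader.dataset.refresh_labels(net)  # hypothetical helper
```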
Optionally, in the step S4, the calculating of the total loss function according to the coordinates and the descriptors of the corresponding key points may include the following steps:
S41, determining a key point location loss function and a key point descriptor loss function respectively according to the coordinates and descriptors of the corresponding key points;
S42, determining the total loss function according to the key point location loss function and the key point descriptor loss function;
wherein the key point location loss function calculates the error of the coordinate locations of corresponding key points through an L2 loss function, and the key point descriptor loss function calculates, through a triplet loss function, the error between the descriptor of a corresponding key point and the descriptor of any other non-corresponding key point.
In the embodiment of the present application, the loss function of the feature extraction network may be composed of two parts, which include a keypoint location loss and a descriptor loss:
L = L_loc + L_des
where L_loc is the key point location loss, reflecting the similarity of the local areas of corresponding key points on the two images, and L_des is the key point descriptor loss, reflecting the similarity of the corresponding key point descriptors on the two images.
The key point location loss function calculates the error of the coordinate locations of corresponding key points through the L2 loss function. The key point descriptor loss function calculates, through the triplet loss function, the error between the descriptor of a corresponding key point and the descriptor of any other non-corresponding key point.
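A minimal sketch of the two loss terms in PyTorch, assuming the corresponding (positive) and non-corresponding (negative) descriptors have already been paired up; the margin value is an assumption:

```python
import torch
import torch.nn.functional as F

def total_loss(pred_xy, gt_xy, desc_anchor, desc_positive, desc_negative, margin=1.0):
    """L = L_loc + L_des.

    L_loc: L2 error between predicted and corresponding key point coordinates.
    L_des: triplet loss pulling a descriptor toward its corresponding
           descriptor and pushing it away from a non-corresponding one.
    """
    l_loc = F.mse_loss(pred_xy, gt_xy)                       # L2 location loss
    l_des = F.triplet_margin_loss(desc_anchor, desc_positive,
                                  desc_negative, margin=margin)
    return l_loc + l_des
```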
In the embodiment of the application, the feature points on the region of interest of the query image and the feature points on the base library images are respectively extracted, the feature points are matched, and the base library images are sorted according to the feature point matching result, thereby completing the image retrieval task.
Optionally, the initial label is obtained by a manual labeling or deep learning method;
in the step S5, the removing of the initial key point labels according to the initial label and the confidence maps output by the feature extraction network to obtain an updated initial label may include the following steps:
S51, determining a threshold corresponding to each confidence map output by the feature extraction network;
S52, eliminating the initial key points whose responses on the confidence maps are less than the threshold, and updating the initial labels of the corresponding images.
Different confidence maps may correspond to different thresholds, or the two confidence maps may correspond to the same threshold.
In specific implementation, the threshold corresponding to each confidence map output by the feature extraction network may be determined, the initial key points whose responses on the confidence maps are less than the threshold are eliminated, and the initial labels of the corresponding images are updated.
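A sketch of the pruning rule, assuming the confidence maps are sampled at the initial label coordinates and that the two maps share one threshold (the simpler of the two options mentioned above):

```python
import torch

def prune_labels(labels_xy, reliability, repeatability, threshold=0.5):
    """Keep only the initial key point labels whose responses on both
    confidence maps meet the threshold.

    labels_xy:     (N, 2) integer (x, y) coordinates of initial labels
    reliability:   (H, W) descriptor-confidence map
    repeatability: (H, W) location-confidence map
    """
    xs, ys = labels_xy[:, 0].long(), labels_xy[:, 1].long()
    keep = (reliability[ys, xs] >= threshold) & (repeatability[ys, xs] >= threshold)
    return labels_xy[keep]  # the updated initial labels for this image
```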
Optionally, the feature matching network includes a feature point encoder, and the feature point encoder is configured to integrate the positions and descriptors of the previously extracted key points together by using a multi-layer perceptron; the method may further include the following steps:
A1, connecting the features output by the feature point encoder in the form of a graph, using intra-image and inter-image edges between feature points, adding a learnable weight on each edge of the graph as an attention mechanism, and finally obtaining matching descriptors of the feature points through iteration of the graph network;
A2, scoring the match of each pair of points according to the matching descriptors to obtain a matching score matrix;
A3, determining an assignment matrix according to the score matrix, wherein the assignment matrix is used for generating the matching relationships.
In the embodiment of the present application, a key point is also called a feature point and is generally represented by coordinates (x, y) and a descriptor (a string of numbers). The coordinates represent the position of the key point, and the descriptor may be understood as a string of numbers representing the appearance of the key point; integrating them together is equivalent to obtaining a new key point representation, i.e., coupling the visual appearance with the feature point position.
In the embodiment of the present application, the feature matching network may include a feature point encoder, which integrates the positions and descriptors of the previously extracted key points using a multi-layer perceptron, and then connects the feature points within and between images in the form of a graph (equivalent to prior knowledge of the information flow direction). The graph is a single complete graph whose nodes are the feature points in the images, and it includes two different kinds of undirected edges: one kind is "intra-image edges" (self edges), which connect feature points within the same image; the other kind is "inter-image edges" (cross edges), which connect the feature points of one image with all the feature points of the other image. Moreover, a learnable weight can be added on each edge of the graph as an attention mechanism, and through iteration of the graph network, a matching descriptor f of each feature point is finally obtained. The matching descriptor f may be understood as a string of numbers, similar to the feature descriptor, that is specially used for feature matching.
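The encoder step alone might look like the following sketch: an MLP lifts the (x, y) position into the descriptor space so the two can be summed into the new key point representation (the layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class KeypointEncoder(nn.Module):
    """Couple visual appearance and position: lift (x, y) with an MLP
    and add it to the descriptor, yielding the new key point representation."""

    def __init__(self, desc_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(),
            nn.Linear(64, desc_dim),
        )

    def forward(self, coords, descriptors):
        # coords: (N, 2) key point positions; descriptors: (N, desc_dim)
        return descriptors + self.mlp(coords)
```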
Furthermore, the match between each pair of points is scored according to the matching descriptors (similarity is measured by the inner product of the descriptors), and a matching score matrix is obtained. Because of factors such as occlusion, some points may have no match, so a row and a column are added at the end of the score matrix to hold the feature points without a match. Going from the score matrix to the final assignment matrix can then be treated as an optimal transport problem, which can be solved by the classical differentiable Sinkhorn algorithm. Finally, the points without a match are discarded to obtain the assignment matrix P. For example, if there are M feature points in image a and N feature points in image b, the matching score matrix is an M x N matrix calculated from the matching descriptors described above; it expresses, in matrix form, the matching scores between the feature points on the two images.
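A simplified sketch of the score-to-assignment step: a dustbin row and column are appended for unmatched points, and a few Sinkhorn normalization iterations are run in log space (the iteration count, dustbin score, and plain alternating normalization are assumptions; the description above only specifies that a differentiable Sinkhorn algorithm is used):

```python
import torch

def match_from_scores(scores, dustbin=0.0, iters=20):
    """scores: (M, N) matching scores (descriptor inner products).
    Appends a dustbin row/column for unmatched points, runs Sinkhorn
    normalization in log space, and returns the (M, N) assignment block."""
    m, n = scores.shape
    z = torch.full((m + 1, n + 1), dustbin)
    z[:m, :n] = scores
    log_p = z.clone()
    for _ in range(iters):
        # Alternately normalize rows and columns (optimal transport style).
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)
    return log_p[:m, :n].exp()  # drop the dustbin: assignment matrix P
```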
It can be seen that, in the image retrieval method described in the embodiment of the present application, an image to be queried is acquired, a region of interest of the image to be queried is extracted, the region of interest is input into a feature extraction network to obtain a first key point set, a target category type of the image to be queried is determined, and key points in a preset base library are screened according to the target category type to obtain P second key point sets, where the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer; the first key point set and the P second key point sets are input into a feature matching network to obtain P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets is determined according to the P groups of matching relationships, and the images corresponding to the P second key point sets are displayed according to the similarity. On one hand, the key points of the corresponding category in the base library can be screened out through the region of interest of the image to be queried; on the other hand, only the corresponding key points are matched, so that the image retrieval efficiency is improved.
For example, in the embodiment of the present application, motor vehicles, non-motor vehicles, and pedestrians are taken as targets, and many key points can be extracted from each target and recorded as a set (a key point set). The key point sets extracted from a large number of various targets are stored in the base library. When the image to be queried contains a vehicle, the key point sets corresponding to vehicles are found in the base library (non-vehicle targets are filtered out) and are respectively matched with the key point set extracted from the region of interest of the image to be queried, where "respectively" means that the P second key point sets are matched against the key point set extracted from the region of interest one at a time, for P matchings in total.
Referring to fig. 2, fig. 2 is a schematic flowchart of another image retrieval method provided in an embodiment of the present application, and the image retrieval method is applied to an electronic device. As shown in the figure, the image retrieval method includes:
201. Acquiring an image to be queried.
202. Extracting a region of interest of the image to be queried.
203. Inputting the region of interest into a feature extraction network to obtain a first key point set.
204. Determining a target category type of the image to be queried, and screening key points in a preset base library according to the target category type to obtain P second key point sets, wherein the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer.
205. Inputting the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships.
206. Sorting the images corresponding to the Q second key point sets according to the P groups of matching relationships.
207. Displaying the top-K images after sorting.
For detailed descriptions of steps 201 to 207, reference may be made to the corresponding steps of the image retrieval method described above with respect to fig. 1, and details are not repeated herein.
It can be seen that, in the image retrieval method described in the embodiment of the present application, an image to be queried is acquired, a region of interest of the image to be queried is extracted, the region of interest is input into a feature extraction network to obtain a first key point set, a target category type of the image to be queried is determined, and key points in a preset base library are screened according to the target category type to obtain P second key point sets, where the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer; the first key point set and the P second key point sets are input into a feature matching network to obtain P groups of matching relationships, the images corresponding to the Q second key point sets are sorted according to the P groups of matching relationships, and the top-K images after sorting are displayed. On one hand, the key points of the corresponding category in the base library can be screened out through the region of interest of the image to be queried; on the other hand, only the corresponding key points are matched, so that the image retrieval efficiency is improved.
In accordance with the foregoing embodiments, please refer to fig. 3, which is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in the figure, the electronic device includes a processor, a memory, a communication interface, and one or more programs applied to the electronic device, where the one or more programs are stored in the memory and configured to be executed by the processor. In the embodiment of the present application, the programs include instructions for performing the following steps:
acquiring an image to be queried;
extracting a region of interest of the image to be queried;
inputting the region of interest into a feature extraction network to obtain a first key point set;
determining a target category type of the image to be queried, and screening key points in a preset base library according to the target category type to obtain P second key point sets, wherein the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer;
inputting the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships;
and determining, according to the P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets, and displaying the images corresponding to the P second key point sets according to the similarity.
Optionally, the program further includes instructions for performing the following steps:
S1, acquiring a first sample image, wherein the first sample image corresponds to an initial label;
S2, carrying out affine transformation on the first sample image through an affine transformation matrix, and carrying out enhancement processing on the image after the affine transformation to obtain a second sample image;
S3, inputting the first sample image and the second sample image into the feature extraction network respectively to obtain two groups of outputs, wherein the two groups of outputs are respectively a first confidence map and a first descriptor feature map of the first sample image, and a second confidence map and a second descriptor feature map of the second sample image;
S4, determining the coordinates and descriptors of corresponding key points in the two groups of outputs according to the initial label and the affine transformation matrix, calculating a total loss function according to the coordinates and descriptors of the corresponding key points, and updating network parameters of the feature extraction network by using the total loss function;
S5, executing steps S2 to S4 with the feature extraction network whose network parameters have been updated until the feature extraction network reaches a stable condition, and removing initial key point labels according to the initial label and the confidence maps output by the feature extraction network to obtain an updated initial label;
S6, executing steps S2 to S4 with the feature extraction network that has reached the stable condition, using the updated key point labels in place of the initial labels to optimize the network until training is finished.
Optionally, in the aspect of calculating the total loss function according to the coordinates and the descriptors of the corresponding key points, the program includes instructions for performing the following steps:
determining a key point location loss function and a key point descriptor loss function respectively according to the coordinates and descriptors of the corresponding key points;
determining the total loss function according to the key point location loss function and the key point descriptor loss function;
wherein the key point location loss function calculates the error of the coordinate locations of corresponding key points through an L2 loss function, and the key point descriptor loss function calculates, through a triplet loss function, the error between the descriptor of a corresponding key point and the descriptor of any other non-corresponding key point.
Optionally, the initial label is obtained by a manual labeling or deep learning method;
in the aspect of removing the initial key point labels according to the initial label and the confidence maps output by the feature extraction network to obtain an updated initial label, the program includes instructions for performing the following steps:
determining a threshold corresponding to each confidence map output by the feature extraction network;
and eliminating the initial key points whose responses on the confidence maps are less than the threshold, and updating the initial labels of the corresponding images.
Optionally, the feature matching network includes a feature point encoder, and the feature point encoder is configured to integrate the positions and descriptors of the previously extracted key points together by using a multi-layer perceptron; the program further includes instructions for performing the following steps:
connecting the features output by the feature point encoder in the form of a graph, using intra-image and inter-image edges between feature points, adding a learnable weight on each edge of the graph as an attention mechanism, and finally obtaining matching descriptors of the feature points through iteration of the graph network;
scoring the match of each pair of points according to the matching descriptors to obtain a matching score matrix;
and determining an assignment matrix according to the score matrix, wherein the assignment matrix is used for generating the matching relationships.
It can be seen that, in the electronic device described in the embodiment of the present application, an image to be queried is acquired, a region of interest of the image to be queried is extracted, the region of interest is input into a feature extraction network to obtain a first key point set, a target category type of the image to be queried is determined, and key points in a preset base library are screened according to the target category type to obtain P second key point sets, where the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer; the first key point set and the P second key point sets are input into a feature matching network to obtain P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets is determined according to the P groups of matching relationships, and the images corresponding to the P second key point sets are displayed according to the similarity. On one hand, the key points of the corresponding category in the base library can be screened out through the region of interest of the image to be queried; on the other hand, only the corresponding key points are matched, so that the image retrieval efficiency is improved.
Fig. 4 is a block diagram of functional units of an image retrieval device 400 according to an embodiment of the present application. The image retrieval device 400 is applied to an electronic device, and the device 400 comprises: an acquisition unit 401, an extraction unit 402, an input unit 403, a determination unit 404, and a display unit 405, wherein,
the acquiring unit 401 is configured to acquire an image to be queried;
the extracting unit 402 is configured to extract a region of interest of the image to be queried;
the input unit 403 is configured to input the region of interest into a feature extraction network to obtain a first key point set;
the determining unit 404 is configured to determine a target category type of the image to be queried, and screen the key points in a preset base library according to the target category type to obtain P second key point sets, where the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer;
the input unit 403 is further configured to input the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships;
and the display unit 405 is configured to determine, according to the P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets, and display the images corresponding to the P second key point sets according to the similarity.
Optionally, the apparatus 400 is further specifically configured to:
S1, acquiring a first sample image, wherein the first sample image corresponds to an initial label;
S2, carrying out affine transformation on the first sample image through an affine transformation matrix, and carrying out enhancement processing on the image after the affine transformation to obtain a second sample image;
S3, inputting the first sample image and the second sample image into the feature extraction network respectively to obtain two groups of outputs, wherein the two groups of outputs are respectively a first confidence map and a first descriptor feature map of the first sample image, and a second confidence map and a second descriptor feature map of the second sample image;
S4, determining the coordinates and descriptors of corresponding key points in the two groups of outputs according to the initial label and the affine transformation matrix, calculating a total loss function according to the coordinates and descriptors of the corresponding key points, and updating network parameters of the feature extraction network by using the total loss function;
S5, executing steps S2 to S4 with the feature extraction network whose network parameters have been updated until the feature extraction network reaches a stable condition, and removing initial key point labels according to the initial label and the confidence maps output by the feature extraction network to obtain an updated initial label;
S6, executing steps S2 to S4 with the feature extraction network that has reached the stable condition, using the updated key point labels in place of the initial labels to optimize the network until training is finished.
Optionally, in the aspect of calculating the total loss function according to the coordinates and the descriptors of the corresponding key points, the apparatus 400 is specifically configured to:
determine a key point location loss function and a key point descriptor loss function respectively according to the coordinates and descriptors of the corresponding key points; and
determine the total loss function according to the key point location loss function and the key point descriptor loss function;
wherein the key point location loss function calculates the error of the coordinate locations of corresponding key points through an L2 loss function, and the key point descriptor loss function calculates, through a triplet loss function, the error between the descriptor of a corresponding key point and the descriptor of any other non-corresponding key point.
Optionally, the initial label is obtained by a manual labeling or deep learning method;
in the aspect of removing the initial key point labels according to the initial label and the confidence maps output by the feature extraction network to obtain an updated initial label, the apparatus 400 is further specifically configured to:
determine a threshold corresponding to each confidence map output by the feature extraction network; and
eliminate the initial key points whose responses on the confidence maps are less than the threshold, and update the initial labels of the corresponding images.
Optionally, the feature matching network includes a feature point encoder, and the feature point encoder is configured to integrate the positions and descriptors of the previously extracted key points together by using a multi-layer perceptron; the apparatus 400 is further specifically configured to:
connect the features output by the feature point encoder in the form of a graph, using intra-image and inter-image edges between feature points, add a learnable weight on each edge of the graph as an attention mechanism, and finally obtain matching descriptors of the feature points through iteration of the graph network;
score the match of each pair of points according to the matching descriptors to obtain a matching score matrix;
and determine an assignment matrix according to the score matrix, wherein the assignment matrix is used for generating the matching relationships.
It can be seen that the image retrieval device described in the embodiment of the present application acquires an image to be queried, extracts a region of interest of the image to be queried, inputs the region of interest into a feature extraction network to obtain a first key point set, determines a target category type of the image to be queried, and screens key points in a preset base library according to the target category type to obtain P second key point sets, where the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer; the device inputs the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships, determines the similarity between the image to be queried and the images corresponding to the P second key point sets according to the P groups of matching relationships, and displays the images corresponding to the P second key point sets according to the similarity. On one hand, the key points of the corresponding category in the base library can be screened out through the region of interest of the image to be queried; on the other hand, only the corresponding key points are matched, so that the image retrieval efficiency is improved.
It can be understood that the functions of each program module of the image retrieval apparatus of this embodiment can be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process thereof can refer to the related description of the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative; the above division of units is only a division by logical function, and other divisions may be adopted in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application essentially, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program codes, such as a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. An image retrieval method, characterized in that the method comprises:
acquiring an image to be queried;
extracting a region of interest of the image to be queried;
inputting the region of interest into a feature extraction network to obtain a first key point set;
determining a target category type of the image to be queried, and screening key points in a preset base library according to the target category type to obtain P second key point sets, wherein the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer;
inputting the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships;
and determining, according to the P groups of matching relationships, the similarity between the image to be queried and the images corresponding to the P second key point sets, and displaying the images corresponding to the P second key point sets according to the similarity.
2. The method of claim 1, further comprising:
s1, acquiring a first sample image, wherein the first sample image corresponds to the initial label;
s2, carrying out affine transformation on the first sample image through an affine transformation matrix, and carrying out enhancement processing on the image after the affine transformation to obtain a second sample image;
S3, inputting the first sample image and the second sample image into a feature extraction network respectively to obtain two groups of outputs, namely a first confidence map and a first descriptor feature map of the first sample image, and a second confidence map and a second descriptor feature map of the second sample image;
S4, determining coordinates and descriptors of corresponding key points in the two groups of outputs according to the initial labels and the affine transformation matrix, calculating a total loss function according to the coordinates and descriptors of the corresponding key points, and updating the network parameters of the feature extraction network by using the total loss function;
S5, repeating steps S2-S4 with the feature extraction network whose network parameters have been updated, until the feature extraction network reaches a stable condition, and pruning the initial key point labels according to the initial labels and the confidence map output by the feature extraction network to obtain updated initial labels;
S6, repeating steps S2-S4 with the feature extraction network that has reached the stable condition, so that the updated key point labels replace the initial labels to optimize the network, until training is finished.
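For concreteness, a minimal PyTorch sketch of one training round covering steps S2-S4 is given below. The network interface (a batch of images in, a confidence map and a descriptor feature map out), the augmentation strength, and the loss interface are all assumptions, as the claim fixes none of them.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of steps S2-S4 of claim 2. `net` is assumed to map a
# batch of images to (confidence map, descriptor feature map); `loss_fn`
# is assumed to implement the total loss of claim 3.

def random_affine(batch: torch.Tensor):
    """S2: warp a batch with a random affine matrix; return both."""
    theta = torch.eye(2, 3).unsqueeze(0).repeat(batch.size(0), 1, 1)
    theta = theta + 0.1 * torch.randn_like(theta)          # small random warp
    grid = F.affine_grid(theta, list(batch.shape), align_corners=False)
    return F.grid_sample(batch, grid, align_corners=False), theta

def train_round(net, optimizer, images, labels, loss_fn, steps=1000):
    for _ in range(steps):
        warped, theta = random_affine(images)              # S2: second sample
        warped = warped + 0.01 * torch.randn_like(warped)  # enhancement (noise)
        conf1, desc1 = net(images)                         # S3: first outputs
        conf2, desc2 = net(warped)                         # S3: second outputs
        # S4: corresponding key points are the labelled points and their
        # images under theta; loss_fn compares coordinates and descriptors
        loss = loss_fn(conf1, desc1, conf2, desc2, labels, theta)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                   # update parameters
    # S5/S6 then prune low-confidence labels and repeat with updated labels
```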
3. The method of claim 2, wherein said calculating a total loss function according to the coordinates and descriptors of the corresponding key points comprises:
determining a key point location loss function and a key point descriptor loss function respectively according to the coordinates and descriptors of the corresponding key points;
determining the total loss function according to the key point location loss function and the key point descriptor loss function;
wherein the key point location loss function calculates the error of the coordinate positions of corresponding key points through an L2 loss function, and the key point descriptor loss function calculates the error between the descriptor of a corresponding key point and the descriptor of any other non-corresponding key point through a triplet loss function.
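A minimal sketch of this total loss follows, assuming an equal weighting of the two terms and a unit triplet margin; the claim fixes neither value.

```python
import torch.nn.functional as F

# Sketch of the total loss of claim 3: L2 loss on the coordinates of
# corresponding key points plus a triplet loss on their descriptors.
# `alpha` and `margin` are assumed values.

def total_loss(pred_xy, gt_xy, anchor_desc, pos_desc, neg_desc,
               margin=1.0, alpha=1.0):
    # key point location loss: L2 error of the coordinate positions
    loc_loss = F.mse_loss(pred_xy, gt_xy)
    # key point descriptor loss: a corresponding (positive) descriptor must
    # be closer to the anchor than a non-corresponding (negative) descriptor
    desc_loss = F.triplet_margin_loss(anchor_desc, pos_desc, neg_desc,
                                      margin=margin)
    return loc_loss + alpha * desc_loss
```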
4. The method of claim 2, wherein the initial label is obtained by manual labeling or by a deep learning method;
the pruning of the initial key point labels according to the initial labels and the confidence map output by the feature extraction network to obtain updated initial labels comprises:
determining a threshold corresponding to the confidence map output by the feature extraction network;
and eliminating the initial key points whose responses on the confidence map are smaller than the threshold, and updating the initial labels of the corresponding images.
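The pruning step can be sketched as follows; reading each key point's response directly off the confidence map and the fixed threshold value are both assumptions.

```python
import numpy as np

# Sketch of the label pruning of claim 4: initial key points whose response
# on the confidence map falls below the threshold are removed, yielding the
# updated initial labels. The threshold value is an assumed placeholder.

def prune_labels(keypoints_xy: np.ndarray, confidence: np.ndarray,
                 threshold: float = 0.5) -> np.ndarray:
    xs = keypoints_xy[:, 0].astype(int)            # column index
    ys = keypoints_xy[:, 1].astype(int)            # row index
    responses = confidence[ys, xs]                 # response at each point
    return keypoints_xy[responses >= threshold]    # updated initial labels
```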
5. The method according to any one of claims 1-4, wherein the feature matching network comprises a feature point encoder for integrating the locations and descriptors of the previously extracted key points together using a multi-layer perceptron; the method further comprises:
connecting the features output by the feature point encoder by using the feature points within a graph and between graphs, adding a learnable weight on the edges of the graph as an attention mechanism, and finally obtaining matching descriptors of the feature points through iterations of the graph network;
scoring the match of each point according to the matching descriptors to obtain a matching score matrix;
and determining an assignment matrix according to the score matrix, wherein the assignment matrix is used for generating the matching relationship.
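Claim 5 describes an attention-based graph matching network in the spirit of SuperGlue. The sketch below is one plausible reading: an MLP folds key point positions into the descriptors, self-attention connects feature points within a graph and cross-attention connects them between graphs, and a softmax over the score matrix stands in for the assignment step. The layer sizes and that softmax substitution are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the feature matching network of claim 5.

class KeypointEncoder(nn.Module):
    """Integrates key point location and descriptor via an MLP."""
    def __init__(self, desc_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, desc_dim))

    def forward(self, xy, desc):
        return desc + self.mlp(xy)              # positional encoding added

class AttentionalMatcher(nn.Module):
    """Alternates self-attention (within a graph) and cross-attention
    (between graphs), then scores all pairs to build the score matrix."""
    def __init__(self, desc_dim=256, iters=4):
        super().__init__()
        self.enc = KeypointEncoder(desc_dim)
        self.attn = nn.MultiheadAttention(desc_dim, 4, batch_first=True)
        self.iters = iters

    def forward(self, xy_a, desc_a, xy_b, desc_b):
        fa, fb = self.enc(xy_a, desc_a), self.enc(xy_b, desc_b)
        for _ in range(self.iters):             # graph network iterations
            fa = fa + self.attn(fa, fa, fa)[0]  # self-attention in graph A
            fb = fb + self.attn(fb, fb, fb)[0]  # self-attention in graph B
            fa = fa + self.attn(fa, fb, fb)[0]  # cross-attention A <- B
            fb = fb + self.attn(fb, fa, fa)[0]  # cross-attention B <- A
        scores = fa @ fb.transpose(-2, -1)      # matching score matrix
        # a row-wise softmax stands in for the assignment matrix
        return scores.softmax(dim=-1)
```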
6. An image retrieval apparatus, characterized in that the apparatus comprises: an acquisition unit, an extraction unit, an input unit, a determination unit and a display unit, wherein,
the acquisition unit is used for acquiring an image to be queried;
the extraction unit is used for extracting a region of interest of the image to be queried;
the input unit is used for inputting the region of interest into a feature extraction network to obtain a first key point set;
the determination unit is used for determining a target category type of the image to be queried, and screening the key points in a preset base library according to the target category type to obtain P second key point sets, wherein the category type of each key point set is the same as the target category type; the preset base library comprises Q second key point sets, each key point set corresponds to one image, Q is a positive integer greater than or equal to P, and P is a positive integer;
the input unit is further used for inputting the first key point set and the P second key point sets into a feature matching network to obtain P groups of matching relationships;
and the display unit is used for determining the similarity between the image to be queried and the images corresponding to the P second key point sets according to the P groups of matching relationships, and displaying the images corresponding to the P second key point sets according to the similarity.
7. The apparatus of claim 6, wherein the apparatus is further specifically configured to:
S1, acquiring a first sample image, wherein the first sample image corresponds to an initial label;
s2, carrying out affine transformation on the first sample image through an affine transformation matrix, and carrying out enhancement processing on the image after the affine transformation to obtain a second sample image;
S3, inputting the first sample image and the second sample image into a feature extraction network respectively to obtain two groups of outputs, namely a first confidence map and a first descriptor feature map of the first sample image, and a second confidence map and a second descriptor feature map of the second sample image;
S4, determining coordinates and descriptors of corresponding key points in the two groups of outputs according to the initial labels and the affine transformation matrix, calculating a total loss function according to the coordinates and descriptors of the corresponding key points, and updating the network parameters of the feature extraction network by using the total loss function;
S5, repeating steps S2-S4 with the feature extraction network whose network parameters have been updated, until the feature extraction network reaches a stable condition, and pruning the initial key point labels according to the initial labels and the confidence map output by the feature extraction network to obtain updated initial labels;
S6, repeating steps S2-S4 with the feature extraction network that has reached the stable condition, so that the updated key point labels replace the initial labels to optimize the network, until training is finished.
8. The apparatus according to claim 7, wherein, in said calculating a total loss function according to the coordinates and descriptors of the corresponding key points, the apparatus is specifically configured to:
determine a key point location loss function and a key point descriptor loss function respectively according to the coordinates and descriptors of the corresponding key points;
determine the total loss function according to the key point location loss function and the key point descriptor loss function;
wherein the key point location loss function calculates the error of the coordinate positions of corresponding key points through an L2 loss function, and the key point descriptor loss function calculates the error between the descriptor of a corresponding key point and the descriptor of any other non-corresponding key point through a triplet loss function.
9. An electronic device, characterized by comprising a processor and a memory, the memory being used for storing one or more programs configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any one of claims 1-5.
10. A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-5.
CN202111486875.3A 2021-12-07 2021-12-07 Image retrieval method and related equipment Pending CN114168768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111486875.3A CN114168768A (en) 2021-12-07 2021-12-07 Image retrieval method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111486875.3A CN114168768A (en) 2021-12-07 2021-12-07 Image retrieval method and related equipment

Publications (1)

Publication Number Publication Date
CN114168768A true CN114168768A (en) 2022-03-11

Family

ID=80483998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111486875.3A Pending CN114168768A (en) 2021-12-07 2021-12-07 Image retrieval method and related equipment

Country Status (1)

Country Link
CN (1) CN114168768A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115527050A (en) * 2022-11-29 2022-12-27 南方科技大学 Image feature matching method, computer device and readable storage medium
CN116701695A (en) * 2023-06-01 2023-09-05 中国石油大学(华东) Image retrieval method and system for cascading corner features and twin network
CN116701695B (en) * 2023-06-01 2024-01-30 中国石油大学(华东) Image retrieval method and system for cascading corner features and twin network

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN109960742B (en) Local information searching method and device
CN111797893B (en) Neural network training method, image classification system and related equipment
CN107944020B (en) Face image searching method and device, computer device and storage medium
CN105354307B (en) Image content identification method and device
CN107944450B (en) License plate recognition method and device
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
JP2017062781A (en) Similarity-based detection of prominent objects using deep cnn pooling layers as features
CN110851641B (en) Cross-modal retrieval method and device and readable storage medium
CN110781911B (en) Image matching method, device, equipment and storage medium
CN110765954A (en) Vehicle weight recognition method, equipment and storage device
CN109492576B (en) Image recognition method and device and electronic equipment
CN110222718B (en) Image processing method and device
CN114168768A (en) Image retrieval method and related equipment
KR102468309B1 (en) Method for searching building based on image and apparatus for the same
CN112418195A (en) Face key point detection method and device, electronic equipment and storage medium
CN115115825A (en) Method and device for detecting object in image, computer equipment and storage medium
CN114972947B (en) Depth scene text detection method and device based on fuzzy semantic modeling
CN117011566A (en) Target detection method, detection model training method, device and electronic equipment
CN115100469A (en) Target attribute identification method, training method and device based on segmentation algorithm
CN110852102B (en) Chinese part-of-speech tagging method and device, storage medium and electronic equipment
CN113569600A (en) Method and device for identifying weight of object, electronic equipment and storage medium
CN111814865A (en) Image identification method, device, equipment and storage medium
CN111931680A (en) Vehicle weight recognition method and system based on multiple scales
CN117351246B (en) Mismatching pair removing method, system and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination