CN112765381A - Image retrieval method, electronic equipment and related product - Google Patents

Image retrieval method, electronic equipment and related product

Info

Publication number
CN112765381A
Authority
CN
China
Prior art keywords
feature
features
local
target
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110063246.3A
Other languages
Chinese (zh)
Inventor
贺武
范艳
张鹏
吴伟华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN HARZONE TECHNOLOGY CO LTD
Original Assignee
SHENZHEN HARZONE TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN HARZONE TECHNOLOGY CO LTD filed Critical SHENZHEN HARZONE TECHNOLOGY CO LTD
Priority to CN202110063246.3A
Publication of CN112765381A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 - Querying
    • G06F 16/532 - Query formulation, e.g. graphical querying
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/55 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an image retrieval method, electronic equipment and related products, wherein the method comprises the following steps: acquiring a target area in a target image; when the target area is a region of interest, determining a first global descriptor according to the index of the target object where the region of interest is located, and searching according to the first global descriptor to obtain a first search result set, wherein the first search result set comprises N reference objects; extracting features of each reference object in the N reference objects to obtain keypoint local features, and aggregating them to obtain N first region aggregated local features; acquiring the reference keypoint local features of the region of interest, and aggregating the reference keypoint local features into a second region aggregated local feature with a fixed length; and comparing the second region aggregated local feature with the N first region aggregated local features to obtain comparison results, and determining K reference objects based on the comparison results. By adopting the embodiment of the application, the image retrieval efficiency can be improved.

Description

Image retrieval method, electronic equipment and related product
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image retrieval method, an electronic device, and a related product.
Background
Image retrieval, also known as search-by-image, is a technique that converts a query image into vector features and returns the images most similar to the query image through a similarity search engine. Common commercial image search engines include Google image search, Baidu image search, and Taobao Pailitao. However, the efficiency of current search-by-image schemes is low; therefore, how to improve search-by-image efficiency is a problem that needs to be solved.
Disclosure of Invention
The embodiment of the application provides an image retrieval method and related products, which can improve image retrieval efficiency.
In a first aspect, an embodiment of the present application provides an image retrieval method, which is applied to an electronic device, and the method includes:
acquiring a target area in a target image;
when the target area is a region of interest, determining a first global descriptor according to an index of a target object where the region of interest is located, and searching according to the first global descriptor to obtain a first search result set, wherein the first search result set comprises N reference objects, and N is a positive integer;
extracting features of each reference object in the N reference objects to obtain local features of key points, and aggregating to obtain N first region aggregated local features corresponding to the region of interest of the query object;
acquiring the local features of the reference key points of the region of interest, and aggregating the local features of the reference key points into a second region aggregated local feature with a fixed length;
comparing the second region aggregated local feature with the N first region aggregated local features according to region overlapping area weighted similarity to obtain comparison results, determining K reference objects based on the comparison results, and taking the K reference objects as second search results, wherein the N reference objects comprise the K reference objects.
In a second aspect, an embodiment of the present application provides an image retrieval apparatus, which is applied to an electronic device, and the apparatus includes: an acquisition unit, a search unit, an extraction unit, an aggregation unit, and a determination unit, wherein,
the acquisition unit is used for acquiring a target area in a target image;
the searching unit is configured to, when the target area is a region of interest, determine a first global descriptor according to an index of a target object where the region of interest is located, and perform a search according to the first global descriptor to obtain a first search result set, where the first search result set includes N reference objects, and N is a positive integer;
the extraction unit is used for extracting features of each reference object in the N reference objects to obtain local features of key points, and aggregating to obtain N first region aggregated local features corresponding to the region of interest of the query object;
the aggregation unit is used for acquiring the reference keypoint local features of the region of interest and aggregating the reference keypoint local features into a second region aggregated local feature with a fixed length;
the determining unit is configured to compare the second region aggregated local feature with the N first region aggregated local features according to the region overlapping area weighted similarity to obtain a comparison result, determine K reference objects based on the comparison result, and use the K reference objects as a second search result, where the N reference objects include the K reference objects.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the steps in the first aspect of the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program enables a computer to perform some or all of the steps described in the first aspect of the embodiment of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
It can be seen that the image retrieval method, the electronic device and the related products described in the embodiments of the present application are applied to an electronic device: a target region in a target image is acquired; when the target region is a region of interest, a first global descriptor is determined according to the index of the target object where the region of interest is located, and a search is performed according to the first global descriptor to obtain a first search result set, where the first search result set includes N reference objects and N is a positive integer; feature extraction is performed on each of the N reference objects to obtain keypoint local features, which are aggregated into N first region aggregated local features corresponding to the region of interest of the query object; the reference keypoint local features of the region of interest are acquired and aggregated into a second region aggregated local feature of fixed length; the second region aggregated local feature is compared with the N first region aggregated local features, the comparison result is obtained according to the region overlapping area weighted similarity, K reference objects are determined based on the comparison result and taken as the second search results, and the N reference objects include the K reference objects. In this way, by extracting global features and local features simultaneously and combining keypoint feature selection, the number of keypoints and the compressed dimensionality of the aggregated keypoint features are reduced; meanwhile, the memory and processor requirements are met through the image ROI region retrieval strategy, and the image retrieval efficiency based on the ROI region is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1A is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
fig. 1B is a schematic flowchart of an image retrieval method according to an embodiment of the present application;
fig. 1C is a schematic diagram illustrating key point selection provided in an embodiment of the present application;
FIG. 1D is a schematic diagram of learnable re-ordering model training provided by an embodiment of the present application;
fig. 1E is a schematic structural diagram of a Transformer network according to an embodiment of the present application;
FIG. 1F is a schematic illustration of a stacked Transformer encoder module according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another image retrieval method provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of another electronic device provided in an embodiment of the present application;
fig. 4 is a block diagram of functional units of an image retrieval apparatus according to an embodiment of the present application.
Detailed Description
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may include other steps or elements not listed or inherent to such process, method, article, or apparatus in one possible example.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The electronic device according to the embodiment of the present application may be a handheld device, an intelligent robot, a vehicle-mounted device, a wearable device, a computing device or other processing devices connected to a wireless modem, and various forms of User Equipment (UE), a mobile station (mobile station, MS), a terminal device (terminal device), and the like, and the electronic device may also be a server or an intelligent home device.
In the embodiment of the application, the smart home device may be at least one of the following: refrigerator, washing machine, electric rice cooker, smart curtain, smart lamp, smart bed, smart trash can, microwave oven, steamer, air conditioner, range hood, server, smart door, smart window, smart wardrobe, smart speaker, smart house, smart chair, smart clothes hanger, smart shower, water dispenser, water purifier, air purifier, doorbell, monitoring system, smart garage, television, projector, smart dining table, smart sofa, massage chair, treadmill and the like; of course, other devices may also be included.
As shown in fig. 1A, fig. 1A is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device includes a processor, a Memory, a signal processor, a transceiver, a display screen, a speaker, a microphone, a Random Access Memory (RAM), a camera, a sensor, a network module, and the like. The storage, the signal processor DSP, the loudspeaker, the microphone, the RAM, the camera, the sensor and the network module are connected with the processor, and the transceiver is connected with the signal processor.
The Processor is a control center of the electronic device, connects various parts of the whole electronic device by using various interfaces and lines, executes various functions and processes data of the electronic device by running or executing software programs and/or modules stored in the memory and calling the data stored in the memory, thereby performing overall monitoring on the electronic device, and may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU) or a Network Processing Unit (NPU).
Further, the processor may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor.
The memory is used for storing software programs and/or modules, and the processor executes various functional applications and image retrieval of the electronic equipment by running the software programs and/or modules stored in the memory. The memory mainly comprises a program storage area and a data storage area, wherein the program storage area can store an operating system, a software program required by at least one function and the like; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
Wherein the sensor comprises at least one of: light-sensitive sensors, gyroscopes, infrared proximity sensors, vibration detection sensors, pressure sensors, etc. Among them, the light sensor, also called an ambient light sensor, is used to detect the ambient light brightness. The light sensor may include a light sensitive element and an analog to digital converter. The photosensitive element is used for converting collected optical signals into electric signals, and the analog-to-digital converter is used for converting the electric signals into digital signals. Optionally, the light sensor may further include a signal amplifier, and the signal amplifier may amplify the electrical signal converted by the photosensitive element and output the amplified electrical signal to the analog-to-digital converter. The photosensitive element may include at least one of a photodiode, a phototransistor, a photoresistor, and a silicon photocell.
The camera may be a visible light camera (general view angle camera, wide angle camera), an infrared camera, or a dual camera (having a distance measurement function), which is not limited herein.
The network module may be at least one of the following: a Bluetooth module, a wireless fidelity (Wi-Fi) module, etc., which is not limited herein.
Referring to fig. 1B, fig. 1B is a schematic flowchart of an image retrieval method according to an embodiment of the present application, and as shown in the drawing, the image retrieval method is applied to the electronic device shown in fig. 1A, and includes:
101. a target region in a target image is acquired.
In this embodiment of the present application, the target image may be an image including a target, the target may be set by a user or default by a system, and the target may be at least one of the following: human face, human body, license plate, vehicle, etc., without limitation. The target region may be a target object or a region of interest in a target object.
102. When the target area is a region of interest, determining a first global descriptor according to an index of a target object where the region of interest is located, and searching according to the first global descriptor to obtain a first search result set, wherein the first search result set comprises N reference objects, and N is a positive integer.
In a specific implementation, when the target area is a region of interest, the electronic device may search its database according to the index of the target object where the region of interest is located to obtain a first global descriptor, and search according to the first global descriptor to obtain a first search result set, where the first search result set includes N reference objects, and N is a positive integer.
103. And extracting features of each reference object in the N reference objects to obtain local features of key points, and aggregating to obtain N first region aggregated local features corresponding to the region of interest of the query object.
The electronic device may perform feature extraction on each of the N reference objects, where the feature extraction uses a multi-branch convolutional network to extract features, and the aggregation manner of each branch may be at least one of the following: GeM, REMAP, MAC, SPoC, etc., without limitation, so as to derive global descriptor features. In a specific implementation, based on the above operation for extracting the global descriptor, a plurality of aggregation features are fused, usually one aggregation feature per branch; all branches are fused to generate a descriptor with discriminative power, and the fused features are subjected to PCA whitening dimensionality reduction to serve as the first global descriptor.
Optionally, in the step 103, the feature extraction performed on each of the N reference objects may include the following steps:
31. acquiring a characteristic diagram of a reference object a, wherein the reference object a is any one of the N reference objects;
32. acquiring a first feature of the feature map after the channel average reduction, acquiring a second feature of the feature map after the global average pooling, and acquiring an L2 distance of a neighborhood feature of the feature map in a preset range;
33. determining a standard deviation of the first feature and the second feature;
34. performing weighting operation according to the standard deviation and the L2 distance of the neighborhood characteristics to obtain a first operation result;
35. and selecting the key point local features of the reference object a based on the first operation result and a threshold value.
In a specific implementation, taking the reference object a as an example, the reference object a is any one of the N reference objects, and the preset range may be set by a user or default by a system. Specifically, the electronic device may obtain a feature map of the reference object a, obtain a first feature of the feature map after channel average reduction, obtain a second feature of the feature map after global average pooling, obtain an L2 distance of a neighborhood feature of the feature map within a preset range, determine a standard deviation of the first feature and the second feature, perform weighting operation according to the standard deviation and the L2 distance of the neighborhood feature, obtain a first operation result, and determine a local feature of a key point of the reference object a based on the first operation result in combination with a threshold selection, for example, a result within a certain range is selected.
In a specific implementation, the key point feature extraction is to select a key point with high responsivity (responsivity is in a preset responsivity range) and high distinguishability (distinguishability is in a preset distinguishability range) as a descriptor of a local feature. High responsiveness, i.e. features that contain a rich amount of information, and features of the same region can be found in different images. And high discrimination, i.e. features, are representative areas and can distinguish between different images. For example, the upper surface of a label at the upper right corner of a front window of a car is white, and the lower surface of the label is blue, or an inside rearview mirror is provided with a red ornament, such a region belongs to high-response high-discrimination, while most of other regions are basically similar to the same type of car, and may have high-response alone and not necessarily have high-discrimination. In order to enable the features to have two characteristics at the same time, a high-response high-discrimination attention module is adopted to learn the key point features with high responsiveness and high discrimination.
Further, as shown in fig. 1C, the high-response high-discrimination attention module specifically operates as follows: first, a Conv-BN-ReLU operation with kernel size 1 is applied to the input feature map to reduce the channels, and the result is split into two branches. The high-responsiveness branch computes a standard deviation between the channel mean feature obtained after channel mean reduction and the global mean feature obtained after global average pooling; the high-discrimination branch computes the L2 distances to the neighborhood features within a certain range of each feature point and performs a weighted summation. The two branches are then combined by an element-wise product, a Conv operation with kernel size 1 reduces the channels to 1, and an attention score map is obtained through softplus activation. Optionally, in the training phase, the attention score map may further be dot-multiplied with the L2-normalized input feature map, that is, the L2-normalized result of the feature map is obtained, a vector dot product operation is performed on the normalized result and the first operation result to obtain an HxWx1 attention score map, cross entropy loss optimization is performed after reshape, and finally all channels at the corresponding positions of the input feature map are selected according to the attention score map and a threshold value to obtain the high-response high-discrimination keypoint local features.
In a specific implementation, a larger standard deviation indicates more dispersed data and larger amplitude variation. The high-responsiveness branch essentially screens out high-responsiveness edge points with large gradient changes; the standard deviation of the channel features at a pixel is expressed as:
[formula image in the original: the standard deviation of the per-pixel channel features]
The high-discrimination branch uses a Gaussian function to generate weights within a radius of 2 and computes a weighted sum of the L2 norm distances between a feature point and its neighborhood features, expressed as:
[formula image in the original: the Gaussian-weighted sum of neighborhood L2 distances]
Finally, the element-wise product of the two branches represents an attention score map that is simultaneously high-response and high-discrimination.
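The following is a minimal PyTorch-style sketch of the two-branch attention described above (kernel-1 Conv-BN-ReLU channel reduction, a standard-deviation responsiveness branch, a neighborhood L2-distance discrimination branch, element-wise product, kernel-1 Conv and softplus). The channel width, the neighborhood radius and Gaussian sigma, and the exact form of the standard-deviation computation are illustrative assumptions, not the patent's implementation.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighResponseHighDiscriminationAttention(nn.Module):
    """Sketch of the two-branch attention score map (assumed shapes and layer sizes)."""
    def __init__(self, in_channels: int, mid_channels: int = 128, radius: int = 2):
        super().__init__()
        # kernel-size-1 Conv-BN-ReLU to reduce channels before the two branches
        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
        )
        # kernel-size-1 Conv producing the single-channel score map
        self.score_conv = nn.Conv2d(1, 1, kernel_size=1)
        self.radius = radius

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        x = self.reduce(feat)                                   # B x C x H x W
        # high-responsiveness branch: per-pixel spread of the channel features around
        # the channel-mean feature and the global-average-pooled feature (std-dev style)
        channel_mean = x.mean(dim=1, keepdim=True)              # B x 1 x H x W
        global_mean = F.adaptive_avg_pool2d(x, 1)               # B x C x 1 x 1
        response = (((x - channel_mean) ** 2 + (x - global_mean) ** 2)
                    .mean(dim=1, keepdim=True).sqrt())
        # high-discrimination branch: Gaussian-weighted sum of L2 distances to
        # neighborhood features within the given radius
        discrimination = torch.zeros_like(channel_mean)
        for dy in range(-self.radius, self.radius + 1):
            for dx in range(-self.radius, self.radius + 1):
                if dx == 0 and dy == 0:
                    continue
                neighbor = torch.roll(x, shifts=(dy, dx), dims=(2, 3))
                weight = math.exp(-(dx * dx + dy * dy) / 2.0)   # Gaussian weight, sigma assumed 1
                discrimination = discrimination + weight * (x - neighbor).norm(dim=1, keepdim=True)
        fused = response * discrimination                       # element-wise product of the branches
        return F.softplus(self.score_conv(fused))               # B x 1 x H x W attention score map
```
During training, this score map would be dot-multiplied with the L2-normalized input features and supervised with a cross-entropy loss as described above; at selection time, positions whose score exceeds the threshold are kept as keypoint local features.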
104. And acquiring the reference keypoint local features of the region of interest, and aggregating the reference keypoint local features into a second region aggregated local feature with a fixed length.
In a specific implementation, if each keypoint feature descriptor were independently stored in the database, the memory and search computation costs would be huge; therefore, in the embodiment of the present application, an arbitrary number of keypoint features in a region are further aggregated on the basis of keypoint feature selection to generate a discriminative regional local aggregation descriptor vector. The human-vehicle target feature map in a traffic scene is divided into k x k grids, where each grid corresponds to one local aggregation descriptor vector and is responsible for the feature representation of the original-image receptive field region centered on that grid. In image ROI region retrieval, the local aggregation descriptor of the grid whose receptive field region overlaps most with the ROI region is used as the query feature, as sketched below.
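To illustrate how that query feature could be picked, the sketch below selects the grid cell whose receptive field overlaps the ROI the most; treating each cell's receptive field as its own cell in image coordinates, as well as the function names, are simplifying assumptions.
```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image coordinates

def overlap_area(a: Box, b: Box) -> float:
    """Intersection area of two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def select_query_grid(roi: Box, image_w: float, image_h: float, k: int) -> Tuple[int, int]:
    """Return the (row, col) of the k x k grid cell whose receptive field region
    overlaps the ROI the most; its local aggregation descriptor is the query feature."""
    cell_w, cell_h = image_w / k, image_h / k
    best_area, best_cell = -1.0, (0, 0)
    for row in range(k):
        for col in range(k):
            cell = (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)
            area = overlap_area(roi, cell)
            if area > best_area:
                best_area, best_cell = area, (row, col)
    return best_cell
```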
Optionally, in the step 104, aggregating the reference keypoint local features into a second region aggregated local feature with a fixed length may include the following steps:
41. acquiring a local aggregation descriptor matrix corresponding to the local feature of the reference key point;
42. acquiring a characteristic vector of the target area, and dividing the characteristic vector into c sections;
43. b key point features are obtained from the reference key point local features;
44. traversing each key point feature in the b key point features, determining the clustering center of each segment vector in the c segments, and obtaining the nearest clustering center vector of each clustering center;
45. determining a residual vector according to the clustering center vector and the sub-segment vectors corresponding to the clustering center vector;
46. accumulating the residual vector to the segmentation position of the local aggregation descriptor matrix corresponding to the residual vector, and recording the accumulation times corresponding to the corresponding clustering center;
47. determining an average result matrix based on the accumulated local aggregation descriptor matrix and the accumulation times;
48. and determining the second region aggregation local features according to the average result matrix.
In a specific implementation, the electronic device may obtain the local aggregation descriptor matrix corresponding to the reference keypoint local features; obtain the feature vector of the target region and divide it into c segments, where c is a positive integer; obtain b keypoint features from the reference keypoint local features, where b is a positive integer; traverse each of the b keypoint features, determine the cluster center of each segment vector in the c segments, and obtain the nearest cluster center vector of each cluster center; determine a residual vector from the cluster center vector and its corresponding sub-segment vector; accumulate the residual vector at the segment position of the local aggregation descriptor matrix corresponding to the residual vector and record the accumulation count of the corresponding cluster center; determine an average result matrix based on the accumulated local aggregation descriptor matrix and the accumulation counts, i.e., divide the accumulated local aggregation descriptor matrix by the accumulation counts; and finally carry out PCA whitening dimensionality reduction to obtain a regional local aggregation descriptor vector with a fixed length.
Before step 101, in a training phase, keypoint features may also be extracted from all samples in the training set. Assuming the feature vector dimension D is 1024, the original feature vector is divided into M = 32 segments, so each sub-segment sub-D has 32 dimensions. Each segment is clustered with K-means; assuming each segment is clustered into K = 16 classes, a codebook of size M x K is obtained after clustering. With the codebook, the reference keypoint local features may be converted into a local aggregation descriptor matrix.
Taking the traffic scene image as an example, in the embodiment of the present application, the electronic device may divide the human-vehicle object target feature map in the traffic scene into k × k grids, where each grid corresponds to one local aggregation descriptor vector and is responsible for feature representation of an original image receptive field region centered on the grid. In image ROI region retrieval, the local aggregation descriptor of the mesh where the ROI region overlaps most with the receptive field region is used as a query feature.
In a specific implementation, a method for generating a local aggregation descriptor vector is as follows: keypoint features are extracted for all samples of the training set; assuming the feature vector dimension D is 1024, the original feature vector is divided into M = 32 segments, so each sub-segment sub-D has 32 dimensions. Each segment is clustered with K-means; assuming each segment is clustered into K = 16 classes, a codebook of size M x K is obtained after clustering. With the codebook, a local aggregation descriptor matrix O of the query target is initialized; after the query target passes through the keypoint feature selection method, an arbitrary number of keypoint features are obtained. Each keypoint feature is traversed, the nearest cluster center id of each sub-segment vector is found, the nearest cluster center vector is taken according to the id, and it is subtracted from the sub-D feature vector of the original sub-segment to obtain a residual vector resD. The residual vector resD is accumulated at the position of the local aggregation descriptor matrix O segment corresponding to the cluster center id, and the accumulation count of the id is recorded. The local aggregation descriptor matrix O is divided by the accumulation counts of the corresponding segment positions to obtain an average result matrix. The rows of the average result matrix are flattened and concatenated to obtain a vector of dimension K x D, and finally PCA dimensionality reduction is carried out to obtain a local aggregation descriptor vector with a fixed length.
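The aggregation described above is essentially a segmented, VLAD-like residual encoding. A minimal NumPy sketch under the stated assumptions (D = 1024, M = 32, K = 16) follows; the function signature and the PCA step (represented here by a caller-supplied projection matrix) are illustrative, not the patent's exact code.
```python
import numpy as np

D, M, K = 1024, 32, 16            # feature dim, number of segments, clusters per segment
SUB_D = D // M                    # 32-dimensional sub-segments

def aggregate_region(keypoint_feats: np.ndarray, codebook: np.ndarray,
                     pca_matrix: np.ndarray) -> np.ndarray:
    """keypoint_feats: (num_keypoints, D) selected keypoint local features.
    codebook: (M, K, SUB_D) per-segment K-means cluster centers.
    pca_matrix: (K * D, out_dim) assumed PCA-whitening projection.
    Returns a fixed-length region aggregated local feature."""
    O = np.zeros((M, K, SUB_D), dtype=np.float32)   # local aggregation descriptor matrix
    counts = np.zeros((M, K), dtype=np.int64)       # accumulation counts per cluster center
    for feat in keypoint_feats:
        subs = feat.reshape(M, SUB_D)
        for m in range(M):
            # nearest cluster center id for this sub-segment
            dists = np.linalg.norm(codebook[m] - subs[m], axis=1)
            cid = int(np.argmin(dists))
            # residual between the sub-segment and its nearest center, accumulated
            O[m, cid] += subs[m] - codebook[m, cid]
            counts[m, cid] += 1
    # average the accumulated residuals where counts are non-zero
    nonzero = counts > 0
    O[nonzero] /= counts[nonzero][:, None]
    # flatten and reduce with the assumed PCA-whitening projection
    vec = O.reshape(-1)                              # length M * K * SUB_D = K * D
    vec = vec @ pca_matrix
    return vec / (np.linalg.norm(vec) + 1e-12)       # L2-normalized fixed-length descriptor
```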
105. Comparing the second region aggregation local features with the N first region aggregation local features to obtain comparison results according to region overlapping area weighted similarity, determining K reference objects based on the comparison results, wherein the N reference objects comprise the K reference objects, and taking the K reference objects as second search results.
In a specific implementation, the electronic device may compare the second region aggregated local feature with the N first region aggregated local features, that is, compare their similarity to obtain comparison results, and determine K reference objects based on the comparison results, that is, the reference objects corresponding to the top K similarity values, where the N reference objects include the K reference objects and the K reference objects are used as the second search results.
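The patent does not give the exact form of the region overlapping area weighted similarity; one plausible reading, sketched below, weights the cosine similarity between the query's region aggregated feature and each candidate's region aggregated feature by the overlap ratio between the ROI and the corresponding receptive field region. The function names and the multiplicative weighting are assumptions.
```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def overlap_weighted_scores(query_feat: np.ndarray,
                            candidate_feats: list,      # N first region aggregated local features
                            overlap_ratios: list) -> list:
    """Score each of the N reference objects: cosine similarity weighted by the
    overlap ratio between the ROI and the candidate grid's receptive field (assumed form)."""
    return [w * cosine(query_feat, f) for f, w in zip(candidate_feats, overlap_ratios)]

def top_k(scores: list, k: int) -> list:
    """Indices of the K reference objects with the highest weighted similarity."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```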
Optionally, in the step 105, determining K reference objects based on the comparison result, and using the K reference objects as a second search result, the method may include the following steps:
51. obtaining image indexes for determining K reference objects based on the comparison result;
52. performing a learnable reordering check based on the image index;
53. when the verification result is consistent, directly taking the K reference objects as the second search result;
54. and when the check result is inconsistent, rearranging the K reference objects, and taking the K rearranged reference objects as the second search result.
In specific implementation, the electronic device may obtain image indexes for determining the K reference objects based on the comparison result, further perform learnable reordering verification based on the image indexes, directly use the K reference objects as the second search result when the verification results are consistent, rearrange the K reference objects when the verification results are inconsistent, use the rearranged K reference objects as the second search result, and display the second search result.
In specific implementation, reordering (also called query expansion) is a commonly used technique in image retrieval. Due to changes of illumination, posture, viewing angle and occlusion, a positive sample image may not be among the topK similar features but may be in the retrieval subset of the topK similar features; therefore, correlation information can be obtained by processing the originally queried topK similar features, a new expanded query feature is generated, and the original query results are reordered with the new query feature, thereby improving recall and accuracy. In the embodiment of the application, the topK most similar features of the original query may be used as input, a Transformer model built from stacked self-attention encoder modules is trained, and the correlation information between different topK similar items is learned, so that an optimal new expanded query feature is generated from all the top-ranked related features. The method can perform the reordering operation on the retrieval results of both the global features and the keypoint aggregated features. In the training phase, the trained image ROI retrieval model can be used to extract features from the training data set, a new training data set is generated according to the retrieval results, each class of data comprises a query feature and its topK similar features, and a Triplet Loss function is adopted to optimize the Transformer self-attention model.
In particular, as shown in FIG. 1D, self-attention is the mechanism by which the Transformer model relates the current input feature to the other input features, and this correlation information can be used to learn the weight parameters needed to generate the new query feature. In actual training, the sum of the input features and the ranking position codes shares information through a multi-head attention mechanism, followed by a layer normalization layer and a fully connected layer, and a feed-forward neural network remaps the input features into a new embedding space. By stacking multiple identical encoders to increase the capacity of the model, more context information can be shared. Finally, in the transformed embedding space, the dot product (similarity) between the query embedding feature and the other topK embedding features is taken as the weight measuring the similarity between the original query feature and the topK features (formula 1). This query feature weighting is also used in other reordering methods; in the embodiment of the application, the weights are obtained by learning, the weights and the original topK similar features are dot-multiplied and summed, and finally L2 normalization is applied to obtain the new query feature (formula 2), which is iteratively optimized with a Triplet Loss function. The schematic diagram of the inference process structure is shown in fig. 1E, and the stacked Transformer encoder module is shown in fig. 1F.
Formula 1: w_i = sim(q, e_i), i = 1, ..., K
Formula 2: new query feature = Norm( Σ_i w_i · f_i )
where sim(·, ·) is a dot product operation; Norm(·) is L2 norm normalization; w_i is the learnable similarity weight; q is the query embedding feature, e_i is the i-th topK embedding feature, and f_i is the i-th original topK similar feature; the learnable similarity weights are dot-multiplied with the topK similar features and summed.
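Under the definitions above, formula 1 and formula 2 amount to weighting the original topK similar features by dot products computed in the Transformer embedding space and L2-normalizing the sum. A minimal sketch with assumed tensor shapes:
```python
import torch
import torch.nn.functional as F

def expanded_query(query_emb: torch.Tensor,      # transformed query embedding, shape (d,)
                   topk_embs: torch.Tensor,      # transformed topK embeddings, shape (K, d)
                   topk_feats: torch.Tensor) -> torch.Tensor:  # original topK similar features, shape (K, d)
    """Formula 1: w_i = sim(query_emb, topk_embs[i]) as a dot product.
    Formula 2: new query = L2-normalized sum_i w_i * topk_feats[i]."""
    weights = topk_embs @ query_emb               # (K,) learnable similarity weights
    new_query = (weights.unsqueeze(1) * topk_feats).sum(dim=0)
    return F.normalize(new_query, p=2, dim=0)     # Norm(.): L2 normalization
```
At re-ranking time, the new query feature would be used to search again, and the refreshed topK would be taken as the retrieval result.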
Optionally, in the step 54, the rearranging the K reference objects may include the following steps:
541. inputting the feature to be queried and the features corresponding to the K reference objects into a preset learnable reordering model to obtain a new expanded query feature; the preset learnable reordering model is obtained by performing retrieval with the region aggregated local features of all training sets to obtain X query features and the corresponding X retrieval results, ordering and position-encoding each query feature and its retrieval results and sending them into a stacked self-attention coding model to obtain expanded query features, and iteratively optimizing the model parameters of the stacked self-attention coding model with the expanded query features and a specified loss function, where X is a positive integer;
542. searching based on the new expanded query feature to obtain the K rearranged reference objects.
In specific implementation, the specified loss function may be a Triplet Loss function. The electronic device may obtain the region aggregated local features of all training sets, perform N-to-N retrieval with the region aggregated local features to obtain N query features and the N most similar features corresponding to each of them, order and encode each query feature and its topK most similar features and send them into a stacked self-attention coding model to generate a new expanded query feature, and perform iterative optimization with the Triplet Loss function; the converged stacked self-attention coding model is the preset learnable reordering model. The electronic device can thus obtain a learnable reordering model of the region aggregated local features; at reordering time, the query feature and its topK most similar features are directly input into the learnable reordering model to obtain a new expanded query feature, the search is performed again to obtain new topK similar features, and these are taken as the retrieval results. A learnable reordering model of the global descriptor can also be obtained through the above steps. The new expanded query feature contains the correlation information between the query feature and its topK most similar features, and therefore carries richer discriminative information; querying again with this feature can effectively improve retrieval precision.
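As an illustration of how such a stacked self-attention coding model might be built and optimized with a triplet loss, the sketch below uses PyTorch's standard Transformer encoder as a stand-in; the module name, layer sizes, and the omission of the ranking position codes are simplifications, not the patent's actual model.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RerankEncoder(nn.Module):
    """Hypothetical stand-in for the stacked self-attention encoder described above
    (ranking position codes omitted for brevity)."""
    def __init__(self, dim: int = 512, heads: int = 8, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, query_feat: torch.Tensor, topk_feats: torch.Tensor) -> torch.Tensor:
        # sequence = query feature followed by its ordered topK similar features
        seq = torch.cat([query_feat.unsqueeze(1), topk_feats], dim=1)   # (B, 1+K, dim)
        emb = self.encoder(seq)                                         # shared attention context
        weights = (emb[:, 1:] * emb[:, :1]).sum(-1)                     # formula 1: dot products
        new_query = (weights.unsqueeze(-1) * topk_feats).sum(1)         # formula 2: weighted sum
        return F.normalize(new_query, dim=-1)

model = RerankEncoder()
triplet = nn.TripletMarginLoss(margin=0.3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(query_feat, topk_feats, positive, negative):
    """One optimization step; the (query, topK, positive, negative) tensors would come
    from the new training data set generated by the retrieval results (hypothetical loader)."""
    new_query = model(query_feat, topk_feats)
    loss = triplet(new_query, positive, negative)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```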
Further, optionally, after the step 101, the following step may be further included:
a1, when the target area is a target object, determining a second global feature descriptor of the target object;
and A2, searching according to the second global feature descriptor to obtain a third retrieval result.
In a specific implementation, when the target area is the target object, the electronic device may determine a second global feature descriptor of the target object, and then search in the preset database according to the second global feature descriptor to obtain a third search result, and may further display the third search result.
In specific implementation, the embodiment of the application provides an end-to-end global and local image retrieval algorithm. The algorithm uses a residual convolutional network as the backbone network, adopts multi-branch joint training, and extracts the global descriptor and the local descriptors simultaneously. The global descriptor may be fused from a plurality of feature aggregation methods to generate a descriptor with discriminative power, and the fused aggregation features may include at least one of the following: GeM, REMAP, MAC, SPoC, etc., without limitation; the fused features can be subjected to PCA whitening dimensionality reduction to serve as the final global descriptor. The local descriptors use an attention module on the dense features to select keypoint features, so as to obtain descriptors with high responsiveness and high discrimination; finally, since the number of keypoints in the selected image ROI region is arbitrary, the keypoint feature aggregation algorithm in the embodiment of the application is used to generate a fixed-length regional local descriptor. The algorithm is one component of the whole search-by-image system and serves as the core function for image retrieval. The overall processing flow of the system is as follows:
for example, the electronic device may be a server, and the image retrieval method in the embodiment of the present application may include the following steps:
1. uploading an image containing a target object by a client;
2. the server downloads the image through the process and calls a structured algorithm;
3. carrying out target detection identification and feature extraction by a structured algorithm;
4. returning the structured data and the search-by-image features, and storing the global features into a global descriptor library;
5. the client selects a target object or a target ROI area;
6. if a target object is selected, finding the corresponding global feature descriptor in the library according to the index;
7. carrying out 1vN retrieval comparison on the global feature descriptor of the target object, and returning a topK retrieval result of the global feature;
8. if a target ROI region is selected, finding the corresponding global feature descriptor according to the index of the target object where the ROI region is located to perform target retrieval;
9. returning topN target object indexes with the highest feature similarity, and temporarily extracting key point local features of the topN target;
10. selecting local characteristics of key points in a target ROI area, and aggregating the local characteristics into a local characteristic descriptor with a fixed length;
11. carrying out local feature retrieval comparison within the topN object range, and obtaining the topK images with the highest similarity;
12. performing learnable reordering verification on the topK images with the highest similarity, and returning the topK retrieval results after local feature rearrangement.
Wherein, the retrieval results of step 7 and step 12 can also be displayed at the client.
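Putting steps 8 to 12 together, the ROI retrieval branch of the flow could be organized roughly as below; every callable passed in is a placeholder for the corresponding module described in this application rather than an actual API.
```python
from typing import Callable

def retrieve_by_roi(
    roi,                                     # region of interest selected at the client
    lookup_global_descriptor: Callable,      # step 8: ROI -> global descriptor of the object containing it
    global_search: Callable,                 # step 9: (descriptor, n) -> topN reference objects
    extract_keypoints: Callable,             # step 9/10: object or ROI -> keypoint local features
    aggregate_region: Callable,              # step 10: keypoint features -> fixed-length aggregated feature
    compare: Callable,                       # step 11: (query feature, candidate features) -> similarity scores
    rerank: Callable,                        # step 12: (query feature, topK candidates) -> re-ordered results
    top_n: int = 100,
    top_k: int = 10,
):
    """Rough sketch of steps 8-12 of the overall flow; all callables are hypothetical placeholders."""
    global_desc = lookup_global_descriptor(roi)
    candidates = global_search(global_desc, top_n)                      # topN by global feature
    query_feat = aggregate_region(extract_keypoints(roi))               # ROI keypoints -> query feature
    candidate_feats = [aggregate_region(extract_keypoints(c)) for c in candidates]
    scores = compare(query_feat, candidate_feats)                       # overlap-area weighted similarity
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)[:top_k]
    return rerank(query_feat, [candidates[i] for i in ranked])          # learnable re-ordering check
```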
In a specific implementation, taking a traffic scene as an example, after a large amount of feature data has accumulated in the database, query performance and precision gradually decrease. Specifically, the traffic scene image may be divided into k x k grids, which are physically stored in a plurality of parts according to a certain rule; a query on a given grid then only needs to compare against the vectors of the corresponding grid in the database. The image ROI region retrieval method is described in detail below as a sub-module of the search-by-image system (steps 8 to 12 of the overall flow): first, the target image is structurally analyzed, and the global feature vector is extracted and stored into the database; the target ROI region is selected at the client, the original image is structurally analyzed a second time, and the global feature vector and the keypoint local feature vectors are extracted; a Milvus search is called with the global feature vector to return the topN most similar images; the topN most similar images are structurally analyzed a second time, and their global features and keypoint local features are extracted; the corresponding grid is found according to the image ROI region, and the keypoint local features in the grid are aggregated according to the codebook to generate topN local aggregation descriptor vectors; a 1vN brute-force retrieval comparison is carried out within the topN object range, and the topK images most similar by local features are returned; and learnable reordering verification is performed on the keypoints of the topK images most similar by local features, and the topK retrieval results after local feature reordering are returned.
In one possible example, when the target object is a human face, between steps 101 to 102, the following steps may be further included:
b1, acquiring an image quality evaluation value of the target object in the target image;
b2, when the image quality evaluation value is larger than the preset image quality evaluation value, executing step 102.
In this embodiment, the preset image quality evaluation value may be pre-stored in the electronic device, and may be set by the user or default by the system.
In a specific implementation, the electronic device may perform image quality evaluation on the facial images in the facial image set by using at least one image quality evaluation index to obtain a plurality of facial image quality evaluation values, where the image quality evaluation index may be at least one of the following: face deviation degree, face integrity degree, definition degree, feature point distribution density, average gradient, information entropy, signal-to-noise ratio and the like, which are not limited herein. Further, the electronic device may execute step 102 when the image quality evaluation value is greater than the preset image quality evaluation value. The human face deviation degree is the deviation degree between the human face angle in the image and the human face angle of the front face, and the human face integrity degree is the ratio of the area of the human face in the image to the area of the complete human face.
In one possible example, the step B1, acquiring the image quality evaluation value of the target object in the target image, may include the following steps:
b11, acquiring a target face deviation degree of a target object, a target face integrity degree of the target object, a target feature point distribution density of the target object and a target information entropy;
b12, when the target face deviation degree is greater than a preset deviation degree and the target face integrity degree is greater than a preset integrity degree, determining a target first reference evaluation value corresponding to the target face deviation degree according to a mapping relation between the preset face deviation degree and the first reference evaluation value;
b13, determining a target second reference evaluation value corresponding to the target face integrity according to a preset mapping relation between the face integrity and the second reference evaluation value;
b14, determining a target weight pair corresponding to the target feature point distribution density according to a preset mapping relation between the feature point distribution density and the weight pair, wherein the target weight pair comprises a target first weight and a target second weight, the target first weight is a weight corresponding to the first reference evaluation value, and the target second weight is a weight corresponding to the second reference evaluation value;
b15, performing weighted operation according to the target first weight, the target second weight, the target first reference evaluation value and the target second reference evaluation value to obtain a first reference evaluation value;
b16, determining a first image quality evaluation value corresponding to the target feature point distribution density according to a preset mapping relation between the feature point distribution density and the image quality evaluation value;
b17, determining a target image quality deviation value corresponding to the target information entropy according to a mapping relation between a preset information entropy and an image quality deviation value;
b18, acquiring a first shooting parameter of the target object;
b19, determining a target optimization coefficient corresponding to the first shooting parameter according to a preset mapping relation between the shooting parameter and the optimization coefficient;
b20, adjusting the first image quality evaluation value according to the target optimization coefficient and the target image quality deviation value to obtain a second reference evaluation value;
b21, acquiring a target environment parameter corresponding to the target object;
b22, determining a target weight coefficient pair corresponding to the target environment parameter according to a mapping relation between preset environment parameters and the weight coefficient pair, wherein the target weight coefficient pair comprises a target first weight coefficient and a target second weight coefficient, the target first weight coefficient is a weight coefficient corresponding to the first reference evaluation value, and the target second weight coefficient is a weight coefficient corresponding to the second reference evaluation value;
b23, performing weighting operation according to the target first weight coefficient, the target second weight coefficient, the first reference evaluation value and the second reference evaluation value to obtain a face image quality evaluation value of the target object.
In the embodiment of the application, the preset deviation degree and the preset integrity degree can be set by a user or defaulted by a system, and the preset deviation degree and the preset integrity degree can be successfully recognized by the human face only if the preset deviation degree and the preset integrity degree are within a certain range. The electronic device may pre-store a mapping relationship between a preset face deviation degree and a first reference evaluation value, a mapping relationship between a preset face integrity degree and a second reference evaluation value, and a mapping relationship between a preset feature point distribution density and a weight pair, where the weight pair may include a first weight and a second weight, a sum of the first weight and the second weight is 1, the first weight is a weight corresponding to the first reference evaluation value, and the second weight is a weight corresponding to the second reference evaluation value. The electronic device may further store a mapping relationship between a preset feature point distribution density and an image quality evaluation value, a mapping relationship between a preset information entropy and an image quality deviation value, a mapping relationship between a preset shooting parameter and an optimization coefficient, and a mapping relationship between a preset environment parameter and a weight coefficient pair in advance. The weight coefficient pair may include a first weight coefficient and a second weight coefficient, the first weight coefficient is a weight coefficient corresponding to the first reference evaluation value, the second weight coefficient is a weight coefficient corresponding to the second reference evaluation value, and a sum of the first weight coefficient and the second weight coefficient is 1.
The value range of the image quality evaluation value can be 0-1, or 0-100. The image quality deviation value may be a positive real number, for example, 0 to 1, or may be greater than 1. The value range of the optimization coefficient can be-1 to 1, for example, the optimization coefficient can be-0.1 to 0.1. In the embodiment of the present application, the shooting parameter may be at least one of the following: exposure time, shooting mode, sensitivity ISO, white balance parameters, focal length, focus, region of interest, etc., without limitation. The environmental parameter may be at least one of: ambient brightness, ambient temperature, ambient humidity, weather, atmospheric pressure, magnetic field interference strength, etc., and are not limited thereto.
In specific implementation, the electronic device may obtain a target face deviation degree of the target object, a target face integrity degree of the target object, a target feature point distribution density of the target object, and a target information entropy, where the target feature point distribution density may be a ratio between a total number of feature points of the target object and an area of the target object.
Furthermore, when the target face deviation degree is greater than the preset deviation degree and the target face integrity degree is greater than the preset integrity degree, the electronic device may determine a target first reference evaluation value corresponding to the target face deviation degree according to the mapping relationship between the preset face deviation degree and the first reference evaluation value, may also determine a target second reference evaluation value corresponding to the target face integrity degree according to the mapping relationship between the preset face integrity degree and the second reference evaluation value, and may determine a target weight pair corresponding to the target feature point distribution density according to the mapping relationship between the preset feature point distribution density and the weight pair, where the target weight pair includes a target first weight and a target second weight, the target first weight is the weight corresponding to the first reference evaluation value, and the target second weight is the weight corresponding to the second reference evaluation value; then, a weighted operation may be performed on the target first weight, the target second weight, the target first reference evaluation value and the target second reference evaluation value to obtain the first reference evaluation value, where the specific calculation formula is as follows:
first reference evaluation value = target first reference evaluation value × target first weight + target second reference evaluation value × target second weight
In this way, the image quality can be evaluated in terms of the face angle and the face integrity.
Further, the electronic device may determine a first image quality evaluation value corresponding to the target feature point distribution density according to the mapping relationship between the preset feature point distribution density and the image quality evaluation value, and determine a target image quality deviation value corresponding to the target information entropy according to the mapping relationship between the preset information entropy and the image quality deviation value. Because some noise is introduced when the image is generated, for external reasons (weather, light, angle, jitter and the like) or internal reasons (system, GPU), and this noise affects the image quality, the image quality evaluation can be adjusted to a certain degree to ensure that it remains objective.
Further, the electronic device may obtain a first shooting parameter of the target face image and determine a target optimization coefficient corresponding to the first shooting parameter according to the mapping relationship between the preset shooting parameter and the optimization coefficient. Since the shooting parameter setting also affects the image quality evaluation, its influence component needs to be determined. Finally, the first image quality evaluation value is adjusted according to the target optimization coefficient and the target image quality deviation value to obtain a second reference evaluation value, which may be calculated as follows:
When the image quality evaluation value is on a percentile (0-100) scale, the specific calculation formula is as follows:
second reference evaluation value = (first image quality evaluation value + target image quality deviation value) × (1 + target optimization coefficient)
When the image quality evaluation value is on a 0-1 scale, the specific calculation formula is as follows:
second reference evaluation value = first image quality evaluation value × (1 + target image quality deviation value) × (1 + target optimization coefficient)
Further, the electronic device may acquire a target environment parameter corresponding to the target object and determine a target weight coefficient pair corresponding to the target environment parameter according to the mapping relationship between the preset environment parameter and the weight coefficient pair, where the target weight coefficient pair includes a target first weight coefficient corresponding to the first reference evaluation value and a target second weight coefficient corresponding to the second reference evaluation value. A weighted operation may then be performed on the target first weight coefficient, the target second weight coefficient, the first reference evaluation value and the second reference evaluation value to obtain the face image quality evaluation value of the target object, with the specific calculation formula as follows:
face image quality evaluation value of the target object = first reference evaluation value × target first weight coefficient + second reference evaluation value × target second weight coefficient
In this way, the image quality can be evaluated objectively by combining the influences of internal and external environment factors, shooting settings, face angle, face integrity and the like, which improves the accuracy of face image quality evaluation.
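To make the two-stage weighting above concrete, the following Python sketch strings the three evaluation values together. It is only an illustration: every mapping (deviation-to-score, density-to-weights, entropy-to-deviation, and so on) is passed in as a placeholder callable, and the toy values at the end are invented for the example; only the combination formulas follow the description above.

```python
# Illustrative sketch of the two-stage weighting described above. All mapping
# functions, parameter names and numeric values are invented placeholders
# (the text only names the mappings); only the combination formulas follow it.

def first_reference_value(face_deviation, face_integrity, density,
                          deviation_to_score, integrity_to_score, density_to_weights):
    """Fuse the face-angle score and the face-integrity score."""
    r1 = deviation_to_score(face_deviation)   # target first reference evaluation value
    r2 = integrity_to_score(face_integrity)   # target second reference evaluation value
    w1, w2 = density_to_weights(density)      # target weight pair, w1 + w2 == 1
    return r1 * w1 + r2 * w2

def second_reference_value(density, entropy, shot_params,
                           density_to_quality, entropy_to_deviation, params_to_coeff,
                           percentile=True):
    """Feature-density quality score adjusted for noise (entropy) and shooting parameters."""
    q = density_to_quality(density)           # first image quality evaluation value
    d = entropy_to_deviation(entropy)         # target image quality deviation value
    k = params_to_coeff(shot_params)          # target optimization coefficient, e.g. -0.1..0.1
    if percentile:                            # evaluation value on a 0-100 scale
        return (q + d) * (1 + k)
    return q * (1 + d) * (1 + k)              # evaluation value on a 0-1 scale

def face_quality(first_ref, second_ref, env_params, env_to_coeffs):
    """Final face image quality evaluation value."""
    c1, c2 = env_to_coeffs(env_params)        # target weight coefficient pair, c1 + c2 == 1
    return first_ref * c1 + second_ref * c2

# Toy usage with constant mappings (purely illustrative values):
q1 = first_reference_value(0.8, 0.9, 120,
                           lambda d: 80.0, lambda i: 90.0, lambda n: (0.6, 0.4))
q2 = second_reference_value(120, 6.5, {"iso": 400},
                            lambda n: 85.0, lambda e: 2.0, lambda p: 0.05)
score = face_quality(q1, q2, {"brightness": 0.7}, lambda e: (0.5, 0.5))
```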
It can be seen that the image retrieval method described in the embodiment of the present application is applied to an electronic device. A target region in a target image is obtained; when the target region is a region of interest, a first global descriptor is determined according to the index of the target object where the region of interest is located, and a search is performed according to the first global descriptor to obtain a first search result set containing N reference objects, where N is a positive integer. Feature extraction is performed on each of the N reference objects to obtain keypoint local features, which are aggregated into N first region aggregated local features corresponding to the query object's region of interest; the reference keypoint local features of the region of interest are obtained and aggregated into a fixed-length second region aggregated local feature; the second region aggregated local feature is compared with the N first region aggregated local features to obtain a comparison result according to the region overlapping area weighted similarity, K reference objects are determined based on the comparison result and taken as the second search result, and the N reference objects include the K reference objects. In this way, global features and local features are extracted simultaneously, keypoint feature selection reduces the number of keypoints and the dimensionality of the aggregated, compressed keypoint features, and the strategy of retrieving by image ROI keeps the memory and processor requirements within bounds, improving the efficiency of ROI-based image retrieval.
Compared with image retrieval in the related art, the embodiment of the present application realizes local image retrieval: the user selects an ROI (region of interest), and images that contain the ROI features or similar features are retrieved. Because the ROI is not known in advance, retrieval cannot rely on global features alone, and keypoint feature retrieval is usually adopted instead, which places heavy demands on memory and processor performance. The present scheme realizes local image retrieval in traffic scenes without a significant increase in memory and processor requirements, which yields considerable economic benefit and is of practical significance to the system. In the embodiment of the present application, global features and local features are extracted simultaneously, keypoint feature selection reduces the number of keypoints and the dimensionality of the aggregated, compressed keypoint features, and the strategy of retrieving by image ROI keeps the memory and processor requirements within bounds.
Referring to fig. 2, fig. 2 is a schematic flow chart of an image retrieval method according to an embodiment of the present application, applied to the electronic device shown in fig. 1A, where the image retrieval method includes:
201. Acquiring a target region in a target image.
202. When the target area is a region of interest, determining a first global descriptor according to an index of a target object where the region of interest is located, and searching according to the first global descriptor to obtain a first search result set, wherein the first search result set comprises N reference objects, and N is a positive integer.
203. Performing feature extraction on each reference object in the N reference objects to obtain keypoint local features, and aggregating them to obtain N first region aggregated local features corresponding to the query object's region of interest.
204. Acquiring the reference keypoint local features of the region of interest, and aggregating the reference keypoint local features into a second region aggregated local feature with a fixed length.
205. Comparing the second region aggregated local feature with the N first region aggregated local features to obtain a comparison result according to the region overlapping area weighted similarity, determining K reference objects based on the comparison result, and taking the K reference objects as a second search result, wherein the N reference objects comprise the K reference objects.
206. When the target area is a target object, determining a second global feature descriptor of the target object.
207. Searching according to the second global feature descriptor to obtain a third retrieval result.
The detailed description of the steps 201 to 207 may refer to the corresponding steps of the image retrieval method described in the above fig. 1B, and will not be repeated herein.
It can be seen that the image retrieval method described in the embodiment of the present application is applied to an electronic device. When the target region is a region of interest, global features and local features are extracted simultaneously, keypoint feature selection reduces the number of keypoints and the dimensionality of the aggregated, compressed keypoint features, and the ROI retrieval strategy keeps the memory and processor requirements within bounds, improving the efficiency of ROI-based image retrieval; when the target region is a whole target object, the search is performed directly according to the global feature.
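For the comparison in step 205, the following sketch ranks the N candidates' aggregated local features against the query region's aggregated local feature and weights the similarity by region overlap. The use of cosine similarity and of a per-candidate overlap ratio are assumptions made for this sketch; the description only states that the comparison result is weighted by the region overlapping area.

```python
import numpy as np

# Illustrative sketch of the step-205 comparison with overlap-area weighting.

def overlap_weighted_ranking(query_feat, ref_feats, overlap_ratios, k):
    q = query_feat / np.linalg.norm(query_feat)
    refs = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sims = refs @ q                              # similarity to each of the N reference objects
    scores = sims * np.asarray(overlap_ratios)   # weight by region overlapping area
    order = np.argsort(-scores)
    return order[:k], scores[order[:k]]          # indices and scores of the K reference objects

# Example: 5 candidate features of dimension 128, keep the top K=2.
rng = np.random.default_rng(0)
top_k, top_scores = overlap_weighted_ranking(rng.normal(size=128),
                                             rng.normal(size=(5, 128)),
                                             [0.9, 0.4, 0.7, 1.0, 0.2], k=2)
```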
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in the drawing, the electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and in an embodiment of the present application, the programs include instructions for performing the following steps:
acquiring a target area in a target image;
when the target area is a region of interest, determining a first global descriptor according to an index of a target object where the region of interest is located, and searching according to the first global descriptor to obtain a first search result set, wherein the first search result set comprises N reference objects, and N is a positive integer;
extracting features of each reference object in the N reference objects to obtain local features of key points, and aggregating to obtain N first region aggregated local features corresponding to the region of interest of the query object;
acquiring the local features of the reference key points of the region of interest, and aggregating the local features of the reference key points into a second region aggregated local feature with a fixed length;
comparing the second region aggregation local features with the N first region aggregation local features to obtain comparison results according to region overlapping area weighted similarity, determining K reference objects based on the comparison results, and taking the K reference objects as second search results, wherein the N reference objects comprise the K reference objects.
Optionally, in terms of performing feature extraction on each reference object in the N reference objects, the program includes instructions for performing the following steps (an illustrative sketch is given after these steps):
acquiring a characteristic diagram of a reference object a, wherein the reference object a is any one of the N reference objects;
acquiring a first feature of the feature map after the channel average reduction, acquiring a second feature of the feature map after the global average pooling, and acquiring an L2 distance of a neighborhood feature of the feature map in a preset range;
determining a standard deviation of the first feature and the second feature;
performing weighting operation according to the standard deviation and the L2 distance of the neighborhood characteristics to obtain a first operation result;
and determining the local feature of the key point of the reference object a based on the first operation result and threshold selection.
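The keypoint selection described in the preceding steps can be sketched roughly as follows for a feature map of shape (C, H, W). How the standard deviation and the neighborhood L2 distance are combined, the neighborhood radius, the weight alpha and the threshold are all assumptions; the description only names the quantities involved.

```python
import numpy as np

# Rough sketch of the keypoint local feature selection described above.

def select_keypoint_features(fmap, radius=1, alpha=0.5, threshold=None):
    C, H, W = fmap.shape
    first = fmap.mean(axis=0)                      # channel-average reduction, (H, W)
    second = fmap.mean(axis=(1, 2))                # global average pooling, (C,)
    # per-position deviation of the channel vector from the globally pooled vector
    std = np.sqrt(((fmap - second[:, None, None]) ** 2).mean(axis=0))
    # L2 distance between each position of the reduced map and its neighborhood
    pad = np.pad(first, radius, mode='edge')
    neigh = np.empty_like(first)
    for y in range(H):
        for x in range(W):
            win = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            neigh[y, x] = np.linalg.norm(first[y, x] - win)
    score = alpha * std + (1 - alpha) * neigh      # weighted "first operation result"
    if threshold is None:
        threshold = score.mean()                   # placeholder threshold
    ys, xs = np.where(score > threshold)           # threshold selection
    return fmap[:, ys, xs].T                       # (num_keypoints, C) keypoint local features

feats = select_keypoint_features(np.random.default_rng(0).normal(size=(256, 14, 14)))
```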
Optionally, in terms of aggregating the reference keypoint local features into a second region aggregated local feature of fixed length, the program includes instructions for performing the following steps (a sketch follows these steps):
acquiring a local aggregation descriptor matrix corresponding to the local feature of the reference key point;
acquiring a characteristic vector of the target area, and dividing the characteristic vector into c sections;
b key point features are obtained from the reference key point local features;
traversing each key point feature in the b key point features, determining the clustering center of each segment vector in the c segments, and obtaining the nearest clustering center vector of each clustering center;
determining a residual vector according to the clustering center vector and the sub-segment vectors corresponding to the clustering center vector;
accumulating the residual vector to the segmentation position of the local aggregation descriptor matrix corresponding to the residual vector, and recording the accumulation times corresponding to the corresponding clustering center;
determining an average result matrix based on the accumulated local aggregation descriptor matrix and the accumulation times;
and determining the second region aggregation local features according to the average result matrix.
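The aggregation in the preceding steps resembles a segmented, VLAD-like encoding of the keypoint features. The sketch below assumes the per-segment cluster centers (codebooks) were learned offline, for example with k-means; the segment count c, the codebook size m and the final normalization are illustrative choices, not values taken from the description.

```python
import numpy as np

# VLAD-like sketch of the fixed-length aggregation described above.

def aggregate_fixed_length(keypoints, codebooks):
    """keypoints: (b, d) local features; codebooks: list of c arrays of shape (m, d // c)."""
    b, d = keypoints.shape
    c = len(codebooks)
    seg_len = d // c                               # d is assumed divisible by c
    m = codebooks[0].shape[0]
    vlad = np.zeros((c, m, seg_len))               # local aggregation descriptor matrix
    counts = np.zeros((c, m))                      # accumulation count per cluster center
    for kp in keypoints:                           # traverse the b keypoint features
        segments = kp.reshape(c, seg_len)          # split the feature vector into c segments
        for i, seg in enumerate(segments):
            dists = np.linalg.norm(codebooks[i] - seg, axis=1)
            j = int(np.argmin(dists))              # nearest cluster center
            vlad[i, j] += seg - codebooks[i][j]    # accumulate the residual vector
            counts[i, j] += 1
    counts = np.maximum(counts, 1)                 # avoid division by zero
    avg = vlad / counts[:, :, None]                # average result matrix
    feat = avg.reshape(-1)                         # fixed-length aggregated local feature
    return feat / (np.linalg.norm(feat) + 1e-12)

# Example: d = 64 split into c = 4 segments of 16 dims, m = 8 centers per segment.
rng = np.random.default_rng(0)
books = [rng.normal(size=(8, 16)) for _ in range(4)]
feature = aggregate_fixed_length(rng.normal(size=(50, 64)), books)
```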
Optionally, in the aspect of determining K reference objects based on the comparison result and using the K reference objects as a second search result, the program includes instructions for performing the following steps:
obtaining image indexes for determining K reference objects based on the comparison result;
performing a learnable reordering check based on the image index;
when the verification result is consistent, directly taking the K reference objects as the second search result;
and when the check result is inconsistent, rearranging the K reference objects, and taking the K rearranged reference objects as the second search result.
Optionally, in the aspect of rearranging the K reference objects, the program includes instructions for performing the following steps (a sketch of the reordering model follows these steps):
inputting the feature to be queried and the features corresponding to the K reference objects into a preset learnable reordering model to obtain a new expanded query feature, wherein the preset learnable reordering model is obtained as follows: retrieval is performed with the aggregated local features of all training sets to obtain X query features and the corresponding X retrieval results, each query feature and its retrieval results are sorted and encoded, the encoded sequence is fed into a stacked self-attention encoding model to obtain an expanded query feature, and the model parameters of the stacked self-attention encoding model are iteratively optimized by using the expanded query features and a specified loss function, where X is a positive integer;
and searching based on the new expanded query characteristics to obtain the K rearranged reference objects.
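One plausible shape for the stacked self-attention encoding model mentioned above is sketched below in PyTorch: the query feature and the K rank-ordered candidate features are stacked into a sequence, a transformer encoder processes it, and the output at the query position is read out as the expanded query feature. The dimensions, depth and readout are assumptions; the training loss and exact encoding are not specified in the description.

```python
import torch
import torch.nn as nn

# Assumed sketch of a stacked self-attention model for query expansion.

class QueryExpander(nn.Module):
    def __init__(self, dim=256, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, query_feat, candidate_feats):
        # query_feat: (B, dim); candidate_feats: (B, K, dim), already sorted by rank
        seq = torch.cat([query_feat.unsqueeze(1), candidate_feats], dim=1)
        out = self.encoder(seq)
        return out[:, 0]                           # expanded query feature

# The expanded query feature would then be used for a second search to obtain
# the K rearranged reference objects.
expander = QueryExpander()
expanded = expander(torch.randn(1, 256), torch.randn(1, 10, 256))
```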
Optionally, the program further comprises instructions for performing the steps of:
when the target area is a target object, determining a second global feature descriptor of the target object;
and searching according to the second global feature descriptor to obtain a third retrieval result.
The above description has introduced the solution of the embodiment of the present application mainly from the perspective of the method-side implementation process. It can be understood that, in order to implement the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be considered as going beyond the scope of the present application.
In the embodiment of the present application, the functional units may be divided according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 4 is a block diagram of functional units of an image retrieval apparatus 400 according to an embodiment of the present application, where the apparatus 400 is applied to an electronic device, and the apparatus 400 includes: an acquisition unit 401, a search unit 402, an extraction unit 403, an aggregation unit 404, and a determination unit 405, wherein,
the acquiring unit 401 is configured to acquire a target region in a target image;
the searching unit 402 is configured to, when the target area is a region of interest, determine a first global descriptor according to an index of a target object where the region of interest is located, and perform a search according to the first global descriptor to obtain a first search result set, where the first search result set includes N reference objects, and N is a positive integer;
the extracting unit 403 is configured to perform feature extraction on each of the N reference objects to obtain local features of key points, and aggregate the local features to obtain N first region aggregated local features corresponding to the query object region of interest;
the aggregating unit 404 is configured to obtain the local reference keypoint features of the region of interest, and aggregate the local reference keypoint features into a second region aggregated local feature with a fixed length;
the determining unit 405 is configured to compare the second region aggregated local feature with the N first region aggregated local features to obtain a comparison result according to the region overlapping area weighted similarity, determine K reference objects based on the comparison result, and use the K reference objects as a second search result, where the N reference objects include the K reference objects.
Optionally, in terms of performing feature extraction on each reference object in the N reference objects, the extraction unit 403 is specifically configured to:
acquiring a characteristic diagram of a reference object a, wherein the reference object a is any one of the N reference objects;
acquiring a first feature of the feature map after the channel average reduction, acquiring a second feature of the feature map after the global average pooling, and acquiring an L2 distance of a neighborhood feature of the feature map in a preset range;
determining a standard deviation of the first feature and the second feature;
performing weighting operation according to the standard deviation and the L2 distance of the neighborhood characteristics to obtain a first operation result;
and determining the local feature of the key point of the reference object a based on the first operation result and threshold selection.
Optionally, in terms of the aggregating the reference keypoint local features into second region aggregated local features of a fixed length, the aggregating unit 404 is specifically configured to:
acquiring a local aggregation descriptor matrix corresponding to the local feature of the reference key point;
acquiring a characteristic vector of the target area, and dividing the characteristic vector into c sections;
b key point features are obtained from the reference key point local features;
traversing each key point feature in the b key point features, determining the clustering center of each segment vector in the c segments, and obtaining the nearest clustering center vector of each clustering center;
determining a residual vector according to the clustering center vector and the sub-segment vectors corresponding to the clustering center vector;
accumulating the residual vector to the segmentation position of the local aggregation descriptor matrix corresponding to the residual vector, and recording the accumulation times corresponding to the corresponding clustering center;
determining an average result matrix based on the accumulated local aggregation descriptor matrix and the accumulation times;
and determining the second region aggregation local features according to the average result matrix.
Optionally, in the aspect that the K reference objects are determined based on the comparison result and the K reference objects are used as the second search result, the determining unit 405 is specifically configured to:
obtaining image indexes for determining K reference objects based on the comparison result;
performing a learnable reordering check based on the image index;
when the verification result is consistent, directly taking the K reference objects as the second search result;
and when the check result is inconsistent, rearranging the K reference objects, and taking the K rearranged reference objects as the second search result.
Optionally, in the aspect of rearranging the K reference objects, the determining unit 405 is specifically configured to:
inputting the feature to be queried and the features corresponding to the K reference objects into a preset learnable reordering model to obtain a new expanded query feature, wherein the preset learnable reordering model is obtained as follows: retrieval is performed with the aggregated local features of all training sets to obtain X query features and the corresponding X retrieval results, each query feature and its retrieval results are sorted and encoded, the encoded sequence is fed into a stacked self-attention encoding model to obtain an expanded query feature, and the model parameters of the stacked self-attention encoding model are iteratively optimized by using the expanded query features and a specified loss function, where X is a positive integer;
and searching based on the new expanded query characteristics to obtain the K rearranged reference objects.
Optionally, the apparatus 400 is further specifically configured to:
when the target area is a target object, determining a second global feature descriptor of the target object;
and searching according to the second global feature descriptor to obtain a third retrieval result.
It can be understood that the functions of each program module of the image retrieval apparatus of this embodiment can be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process thereof can refer to the related description of the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division of logical functions, and other divisions may be used in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program codes, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. An image retrieval method applied to an electronic device, the method comprising:
acquiring a target area in a target image;
when the target area is a region of interest, determining a first global descriptor according to an index of a target object where the region of interest is located, and searching according to the first global descriptor to obtain a first search result set, wherein the first search result set comprises N reference objects, and N is a positive integer;
extracting features of each reference object in the N reference objects to obtain local features of key points, and aggregating to obtain N first region aggregated local features corresponding to the region of interest of the query object;
acquiring the local features of the reference key points of the region of interest, and aggregating the local features of the reference key points into a second region aggregated local feature with a fixed length;
comparing the second region aggregation local features with the N first region aggregation local features to obtain comparison results according to region overlapping area weighted similarity, determining K reference objects based on the comparison results, and taking the K reference objects as second search results, wherein the N reference objects comprise the K reference objects.
2. The method according to claim 1, wherein the performing feature extraction on each reference object in the N reference objects to obtain keypoint local features comprises:
acquiring a characteristic diagram of a reference object a, wherein the reference object a is any one of the N reference objects;
acquiring a first feature of the feature map after the channel average reduction, acquiring a second feature of the feature map after the global average pooling, and acquiring an L2 distance of a neighborhood feature of the feature map in a preset range;
determining a standard deviation of the first feature and the second feature;
performing weighting operation according to the standard deviation and the L2 distance of the neighborhood characteristics to obtain a first operation result;
and selecting the key point local features of the reference object a based on the first operation result and a threshold value.
3. The method according to claim 1 or 2, wherein the aggregating the reference keypoint local features into second region aggregated local features of fixed length comprises:
acquiring a local aggregation descriptor matrix corresponding to the local feature of the reference key point;
acquiring a characteristic vector of the target area, and dividing the characteristic vector into c sections;
b key point features are obtained from the reference key point local features;
traversing each key point feature in the b key point features, determining the clustering center of each segment vector in the c segments, and obtaining the nearest clustering center vector of each clustering center;
determining a residual vector according to the clustering center vector and the sub-segment vectors corresponding to the clustering center vector;
accumulating the residual vector to the segmentation position of the local aggregation descriptor matrix corresponding to the residual vector, and recording the accumulation times corresponding to the corresponding clustering center;
determining an average result matrix based on the accumulated local aggregation descriptor matrix and the accumulation times;
and determining the second region aggregation local features according to the average result matrix.
4. The method according to claim 1 or 2, wherein the determining K reference objects based on the comparison result, and using the K reference objects as a second search result, comprises:
obtaining image indexes for determining K reference objects based on the comparison result;
performing a learnable reordering check based on the image index;
when the verification result is consistent, directly taking the K reference objects as the second search result;
and when the check result is inconsistent, rearranging the K reference objects, and taking the K rearranged reference objects as the second search result.
5. The method of claim 4, wherein the reordering of the K reference objects comprises:
inputting the feature to be queried and the features corresponding to the K reference objects into a preset learnable reordering model to obtain a new expanded query feature, wherein the preset learnable reordering model is obtained as follows: retrieval is performed with the aggregated local features of all training sets to obtain X query features and the corresponding X retrieval results, each query feature and its retrieval results are sorted and encoded, the encoded sequence is fed into a stacked self-attention encoding model to obtain an expanded query feature, and the model parameters of the stacked self-attention encoding model are iteratively optimized by using the expanded query features and a specified loss function, where X is a positive integer;
and searching based on the new expanded query characteristics to obtain the K rearranged reference objects.
6. The method according to claim 1 or 2, characterized in that the method further comprises:
when the target area is a target object, determining a second global feature descriptor of the target object;
and searching according to the second global feature descriptor to obtain a third retrieval result.
7. An image retrieval apparatus applied to an electronic device, the apparatus comprising: an acquisition unit, a search unit, an extraction unit, an aggregation unit, and a determination unit, wherein,
the acquisition unit is used for acquiring a target area in a target image;
the searching unit is configured to, when the target area is a region of interest, determine a first global descriptor according to an index of a target object where the region of interest is located, and perform a search according to the first global descriptor to obtain a first search result set, where the first search result set includes N reference objects, and N is a positive integer;
the extraction unit is used for extracting features of each reference object in the N reference objects to obtain local features of key points, and aggregating the local features to obtain N first region aggregated local features;
the aggregation unit is used for acquiring the local features of the reference key points of the region of interest and aggregating the local features of the reference key points into the aggregated local features of the second region with fixed length;
the determining unit is configured to compare the second region aggregation local feature with the N first region aggregation local features to obtain a comparison result according to the region overlapping area weighted similarity, determine K reference objects based on the comparison result, and use the K reference objects as a second search result, where the N reference objects include the K reference objects.
8. The apparatus according to claim 7, wherein, in said extracting features from each of the N reference objects, the extracting unit is specifically configured to:
acquiring a characteristic diagram of a reference object a, wherein the reference object a is any one of the N reference objects;
acquiring a first feature of the feature map after the channel average reduction, acquiring a second feature of the feature map after the global average pooling, and acquiring an L2 distance of a neighborhood feature of the feature map in a preset range;
determining a standard deviation of the first feature and the second feature;
performing weighting operation according to the standard deviation and the L2 distance of the neighborhood characteristics to obtain a first operation result;
and selecting the key point local features of the reference object a based on the first operation result and a threshold value.
9. An electronic device, comprising a processor and a memory, wherein the memory is configured to store one or more programs configured to be executed by the processor, and the programs comprise instructions for performing the steps in the method of any one of claims 1-6.
10. A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-6.
CN202110063246.3A 2021-01-18 2021-01-18 Image retrieval method, electronic equipment and related product Pending CN112765381A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110063246.3A CN112765381A (en) 2021-01-18 2021-01-18 Image retrieval method, electronic equipment and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110063246.3A CN112765381A (en) 2021-01-18 2021-01-18 Image retrieval method, electronic equipment and related product

Publications (1)

Publication Number Publication Date
CN112765381A true CN112765381A (en) 2021-05-07

Family

ID=75702798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110063246.3A Pending CN112765381A (en) 2021-01-18 2021-01-18 Image retrieval method, electronic equipment and related product

Country Status (1)

Country Link
CN (1) CN112765381A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344027A (en) * 2021-05-10 2021-09-03 北京迈格威科技有限公司 Retrieval method, device, equipment and storage medium for object in image
CN113344027B (en) * 2021-05-10 2024-04-23 北京迈格威科技有限公司 Method, device, equipment and storage medium for retrieving objects in image
CN113378902B (en) * 2021-05-31 2024-02-23 深圳神目信息技术有限公司 Video plagiarism detection method based on optimized video features
CN113378902A (en) * 2021-05-31 2021-09-10 深圳神目信息技术有限公司 Video plagiarism detection method based on optimized video characteristics
CN113505256A (en) * 2021-07-02 2021-10-15 北京达佳互联信息技术有限公司 Feature extraction network training method, image processing method and device
CN113505256B (en) * 2021-07-02 2022-09-02 北京达佳互联信息技术有限公司 Feature extraction network training method, image processing method and device
CN113742504A (en) * 2021-09-13 2021-12-03 城云科技(中国)有限公司 Method, device, computer program product and computer program for searching images by images
CN113591883A (en) * 2021-09-30 2021-11-02 中国人民解放军国防科技大学 Image recognition method, system, device and storage medium based on attention mechanism
CN113591883B (en) * 2021-09-30 2021-12-03 中国人民解放军国防科技大学 Image recognition method, system, device and storage medium based on attention mechanism
CN114139013A (en) * 2021-11-29 2022-03-04 深圳集智数字科技有限公司 Image searching method and device, electronic equipment and computer readable storage medium
WO2023185545A1 (en) * 2022-03-28 2023-10-05 华为技术有限公司 Method for acquiring region of interest, and related device
CN116796021B (en) * 2023-08-28 2023-12-05 上海任意门科技有限公司 Image retrieval method, system, electronic device and medium
CN116796021A (en) * 2023-08-28 2023-09-22 上海任意门科技有限公司 Image retrieval method, system, electronic device and medium

Similar Documents

Publication Publication Date Title
CN112765381A (en) Image retrieval method, electronic equipment and related product
Song et al. A novel convolutional neural network based indoor localization framework with WiFi fingerprinting
CN112733749B (en) Real-time pedestrian detection method integrating attention mechanism
CN111104898B (en) Image scene classification method and device based on target semantics and attention mechanism
Arietta et al. City forensics: Using visual elements to predict non-visual city attributes
CN110209859B (en) Method and device for recognizing places and training models of places and electronic equipment
Song et al. Cnnloc: Deep-learning based indoor localization with wifi fingerprinting
CN108984785B (en) Historical data and increment-based fingerprint database updating method and device
CN112699808B (en) Dense target detection method, electronic equipment and related products
CN107885778B (en) Personalized recommendation method based on dynamic near point spectral clustering
CN111198959A (en) Two-stage image retrieval method based on convolutional neural network
CN110414550B (en) Training method, device and system of face recognition model and computer readable medium
CN112489081B (en) Visual target tracking method and device
CN109934258B (en) Image retrieval method based on feature weighting and region integration
KR102467890B1 (en) Method and apparatus for providing information about items related to a drawing using a neural network by a sever
CN108627798B (en) WLAN indoor positioning algorithm based on linear discriminant analysis and gradient lifting tree
CN113326930A (en) Data processing method, neural network training method, related device and equipment
JP2023510945A (en) Scene identification method and apparatus, intelligent device, storage medium and computer program
CN112767443A (en) Target tracking method, electronic equipment and related product
Liu et al. A semi-supervised method for surveillance-based visual location recognition
CN114743139A (en) Video scene retrieval method and device, electronic equipment and readable storage medium
CN111291785A (en) Target detection method, device, equipment and storage medium
CN108257148B (en) Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking
TW202207155A (en) Model determination method and related terminal and computer readable storage medium
CN110602827B (en) Kara OK light effect implementation method, intelligent projector and related product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination