CN112818148B - Visual retrieval sequencing optimization method and device, electronic equipment and storage medium - Google Patents

Visual retrieval sequencing optimization method and device, electronic equipment and storage medium

Publication number
CN112818148B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110411184.0A
Other languages
Chinese (zh)
Other versions
CN112818148A (en)
Inventor
王海
刘朝振
刘邦长
常德杰
赵洪文
谷书锋
赵进
罗晓斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Miaoyijia Health Technology Group Co ltd
Original Assignee
Beijing Miaoyijia Health Technology Group Co ltd
Application filed by Beijing Miaoyijia Health Technology Group Co ltd
Priority to CN202110411184.0A
Publication of CN112818148A
Application granted
Publication of CN112818148B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/738 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G06F16/785 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using colour or luminescence

Abstract

The invention provides a ranking optimization method and apparatus for visual retrieval, an electronic device, and a storage medium. The method comprises the following steps: establishing a visual entity database; acquiring a visual entity to be queried; extracting features of the visual entity to be queried and of the visual entities in the set to be searched; finding, in the set to be searched, target retrieval entities similar to the visual entity to be queried; and outputting the target retrieval entities in descending order of feature similarity to the entity to be queried. The method optimizes the retrieval ranking directly, using average precision as the loss function instead of a distance-based loss function. This overcomes two defects of distance-based loss functions: they attend only to the similarity between features, and they do not impose a larger penalty on errors near the front of the ranked list. The accuracy of visual retrieval is thereby significantly improved.

Description

Visual retrieval sequencing optimization method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for optimizing a ranking of a visual search, an electronic device, and a storage medium.
Background
Existing visual retrieval systems can be divided into two major categories from a technical perspective:
(1) Approaches based on features extracted by traditional computer vision, which may be global features or local features;
global features include color histograms, texture features, etc.; local features include SIFT, SURF, ORB, etc.
(2) Deep learning based approaches. This broad category in turn includes two approaches:
a. directly extracting the output of a specific layer (such as a convolutional layer or a fully connected layer) as the feature vector for searching;
b. performing end-to-end training in combination with a metric function.
The goal of visual retrieval is that entities of the same category (or the same entity, or the same semantics) as the visual entity to be queried appear as near the front of the ranked list as possible; in particular, the accuracy of the top K results should be ensured. The accuracy of a retrieval system is generally measured by Average Precision (AP), as shown in formula 1-1:
$$AP_q = \frac{1}{|S^+|} \sum_{i \in S^+} \frac{\mathrm{Rank}(i, S^+)}{\mathrm{Rank}(i, S)} \qquad (1\text{-}1)$$
Take an image as an example of a visual entity. Here q denotes the image to be queried and $AP_q$ the average precision for q. The data set to be searched in the database, $S = \{I_i,\ i = 1, 2, \ldots, n\}$, is divided into $S^+$ and $S^-$ according to whether each image belongs to the same category as the image to be queried: $S^+$ is the set of images of the same category as the image to be queried and $S^-$ is the set of images of a different category, with $S = S^+ \cup S^-$. $\mathrm{Rank}(i, S)$ denotes the rank of image i within image set S.
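As an illustration of the AP metric described above, the following is a minimal NumPy sketch, not the patent's implementation. The function name `average_precision`, the strict tie handling, and the toy similarity values are assumptions made for the example; the rank of an item is taken as one plus the number of items that are strictly more similar to the query.

```python
import numpy as np

def average_precision(similarities, is_positive):
    """AP for one query: mean over positives of Rank(i, S+) / Rank(i, S)."""
    s = np.asarray(similarities, dtype=float)
    pos = np.asarray(is_positive, dtype=bool)
    if not pos.any():
        raise ValueError("the query needs at least one same-category item")
    # greater[i, j] is True when item j is strictly more similar to the query than item i.
    greater = s[None, :] > s[:, None]
    rank_all = 1 + greater.sum(axis=1)                    # Rank(i, S)
    rank_pos = 1 + (greater & pos[None, :]).sum(axis=1)   # Rank(i, S+)
    return float(np.mean(rank_pos[pos] / rank_all[pos]))

# Hypothetical similarities of four database images to the query, best first.
sims = [0.9, 0.8, 0.7, 0.6]
error_at_front = average_precision(sims, [0, 1, 1, 1])  # wrong item ranked first
error_at_back = average_precision(sims, [1, 1, 0, 1])   # wrong item ranked third
print(error_at_front, error_at_back)
```

With the same number of wrong items, the list whose error sits at the front receives the lower AP, which is the position sensitivity that the background section asks of a retrieval metric.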
The prior art has the following defects: none of these approaches explicitly optimizes the ranking described above. Approaches based on features extracted by traditional computer vision do not consider the ranking at all, so they cannot guarantee the property described above. Among the deep learning based approaches, extracting a specific layer (such as a convolutional layer or a fully connected layer) as the feature vector for searching does not explicitly optimize the ranked list either. End-to-end training in combination with a metric function can implicitly control the order of the ranked list through the loss function, but it is essentially distance-based optimization rather than explicit ranking-based loss optimization. As a result, for the same distance loss, an error near the front of the ranked list and an error near the back produce the same loss, whereas the goal of image retrieval is to make the top K items correct as far as possible, with lower priority for items further back.
Disclosure of Invention
In order to solve the above problems, the invention provides a ranking optimization method and apparatus for visual retrieval, an electronic device, and a storage medium, which directly optimize average precision as the loss function. This overcomes the defects that a distance-based loss function attends only to the similarity between features and does not impose a larger penalty on errors near the front of the ranked list, and it significantly improves the accuracy of visual retrieval.
In order to achieve the purpose, the invention provides the following specific technical scheme:
in a first aspect, the present application provides a ranking optimization method for visual search, including:
establishing a visual entity database;
acquiring a visual entity to be queried;
extracting features of the visual entity to be queried and of the visual entities in the set to be searched, wherein the set to be searched is the set of all the visual entities in the visual entity database;
calculating the distance between each visual entity in the set to be searched and the visual entity to be queried according to a distance metric function, and identifying visual entities whose distance is smaller than a preset threshold as similar target retrieval entities;
according to a loss function
$$L = \frac{1}{n} \sum_{q=1}^{n} \left(1 - AP_q\right)$$
performing loss calculation on the target retrieval entities to obtain a list of target retrieval entities arranged in descending order of feature similarity to the visual entity to be queried, and outputting the list;
wherein
$$AP_q = \frac{1}{|S^+|} \sum_{i \in S^+} \frac{1 + \sum_{j \in S^+,\, j \neq i} \sigma(s_j - s_i; \tau)}{1 + \sum_{j \in S^+,\, j \neq i} \sigma(s_j - s_i; \tau) + \sum_{j \in S^-} \sigma(s_j - s_i; \tau)}$$
q represents the visual entity to be queried, $AP_q$ represents the average precision for the visual entity q to be queried, S represents the data set of the set to be searched in the visual entity database, $s_i$ and $s_j$ respectively represent the similarity between visual entity i, visual entity j in the set to be searched and the visual entity q to be queried, n represents the number of visual entities in the set to be searched, $S^+$ represents the set of visual entities of the same category as the visual entity to be queried, $S^-$ represents the set of visual entities of a different category from the visual entity to be queried, and $\tau$ represents the temperature parameter in the Sigmoid function $\sigma$.
With reference to the first aspect, in some possible implementations, the visual entity includes a key frame or an image frame in image data or video data.
With reference to the first aspect, in some possible implementations, the distance metric function comprises a Euclidean distance, a cosine similarity, a Manhattan distance, a Chebyshev distance, a Minkowski distance, a Mahalanobis distance, or a Hamming distance.
With reference to the first aspect, in some possible implementations, feature extraction on the visual entity to be queried and the visual entities in the set to be searched is performed using either a conventional computer vision feature extraction approach or a deep learning approach.
With reference to the first aspect, in some possible implementations, performing the feature extraction using the deep learning approach includes:
using image data of a training data set and label data of the images;
and constructing a deep learning feature extraction network.
In a second aspect, the present application further provides a ranking optimization apparatus for visual search, including:
the storage module is used for establishing a visual entity database;
the acquisition module is used for acquiring the visual entity to be queried;
the feature extraction module is used for extracting features of the visual entity to be queried and of the visual entities in the set to be searched, the set to be searched being the set of all the visual entities in the visual entity database;
the identification module is used for calculating the distance between each visual entity in the set to be searched and the visual entity to be queried according to a distance metric function, and identifying visual entities whose distance is smaller than a preset threshold as similar target retrieval entities;
the processing module is used for, according to the loss function
$$L = \frac{1}{n} \sum_{q=1}^{n} \left(1 - AP_q\right)$$
performing loss calculation on the target retrieval entities to obtain a list of target retrieval entities arranged in descending order of feature similarity to the visual entity to be queried, and outputting the list;
wherein
$$AP_q = \frac{1}{|S^+|} \sum_{i \in S^+} \frac{1 + \sum_{j \in S^+,\, j \neq i} \sigma(s_j - s_i; \tau)}{1 + \sum_{j \in S^+,\, j \neq i} \sigma(s_j - s_i; \tau) + \sum_{j \in S^-} \sigma(s_j - s_i; \tau)}$$
q represents the visual entity to be queried, $AP_q$ represents the average precision for the visual entity q to be queried, S represents the data set of the set to be searched in the visual entity database, $s_i$ and $s_j$ respectively represent the similarity between visual entity i, visual entity j in the set to be searched and the visual entity q to be queried, n represents the number of visual entities in the set to be searched, $S^+$ represents the set of visual entities of the same category as the visual entity to be queried, $S^-$ represents the set of visual entities of a different category from the visual entity to be queried, and $\tau$ represents the temperature parameter in the Sigmoid function $\sigma$.
In a third aspect, the present application provides an electronic device comprising a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and to implement the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the method of the first aspect.
Therefore, the embodiment of the invention provides a ranking optimization method and apparatus for visual retrieval, an electronic device, and a storage medium, in which the retrieval ranking is optimized directly by using average precision as the loss function instead of optimizing a distance-based loss function. This overcomes the defects that a distance-based loss function attends only to the similarity between features and does not impose a larger penalty on errors near the front of the ranked list; the accuracy of visual retrieval is significantly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
FIG. 1 is a schematic flow chart diagram of a method for rank optimization for visual search in accordance with an embodiment of the present invention;
FIG. 2 is a diagram of an Indicator function in an embodiment of the present invention;
FIG. 3 is a diagram illustrating an approximate Indicator function of a Sigmoid function according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of Sigmoid functions with different values of the parameter $\tau$ in an embodiment of the present invention;
FIG. 5 is an overall block diagram of the ranking optimization device for visual search according to the embodiment of the present invention.
Detailed Description
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
The embodiment of the invention provides a ranking optimization method for visual retrieval, and FIG. 1 is a schematic flow chart of the method. As shown in FIG. 1, the method includes: step S110, establishing a visual entity database; step S120, acquiring a visual entity to be queried; step S130, extracting features of the visual entity to be queried and of the visual entities in the set to be searched; step S140, calculating the distance between each visual entity in the set to be searched and the visual entity to be queried according to a distance metric function, and identifying visual entities whose distance is smaller than a preset threshold as similar target retrieval entities; step S150, according to the loss function
$$L = \frac{1}{n} \sum_{q=1}^{n} \left(1 - AP_q\right)$$
performing loss calculation on the target retrieval entities to obtain a list of target retrieval entities arranged in descending order of feature similarity to the visual entity to be queried, and outputting the list.
In an embodiment of the invention, the visual entity comprises a key frame or an image frame in the image data or the video data.
In step S130, the set to be searched is the set of all visual entities in the visual entity database, and feature extraction is performed using either a conventional computer vision feature extraction approach or a deep learning approach. The features extracted by traditional computer vision can be global features or local features; global features include color histograms, shape features, texture features, etc.; local features include SIFT, SURF, ORB, etc. The deep learning approach includes either directly extracting the output of a specific layer (such as a convolutional layer or a fully connected layer) as the feature vector for searching, or performing end-to-end training in combination with a metric function.
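One of the global features named above, the color histogram, can be sketched as follows. This is a minimal illustration under stated assumptions, not the feature extractor of the embodiment: the function name `color_histogram`, the choice of 8 bins per channel, the 0 to 255 value range, and the L2 normalisation are all hypothetical choices made for the example.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global feature: per-channel intensity histogram, L2-normalised."""
    img = np.asarray(image)
    if img.ndim != 3:
        raise ValueError("expected an H x W x C array")
    feats = []
    for c in range(img.shape[2]):
        # Count how many pixels of channel c fall into each intensity bin.
        hist, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        feats.append(hist)
    feat = np.concatenate(feats).astype(float)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat

# Hypothetical 4 x 4 RGB image with random 8-bit values.
rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
demo_feature = color_histogram(demo_image)
print(demo_feature.shape)
```

The resulting vector has `bins` entries per channel and can be compared with any of the distance metric functions listed for step S140.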
In a preferred embodiment of the present invention, an image is used as the representation of a visual entity. Step S140 uses a deep learning model together with a distance metric function such as Euclidean distance, cosine similarity, Manhattan distance, Chebyshev distance, Minkowski distance, Mahalanobis distance or Hamming distance to perform visual retrieval and ranking. Step S150 specifically includes: defining the image data set to be searched in the visual entity database as $S = \{I_i,\ i = 1, 2, \ldots, n\}$, where n is the number of images in the image data set to be searched; defining the image to be queried as q; and dividing S into $S^+$ and $S^-$ according to whether each image belongs to the same category as the image to be queried, where $S^+$ is the set of images of the same category as the image to be queried and $S^-$ is the set of images of a different category, with $S = S^+ \cup S^-$. $\mathrm{Rank}(i, S)$ denotes the rank of image i within image set S.
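The thresholding in step S140 can be sketched as below. This is a minimal sketch assuming feature vectors are already available as NumPy arrays; the function names, the default threshold, and the use of `1 - cosine similarity` as a distance are assumptions made for the example and are not specified by the source.

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def manhattan(a, b):
    return float(np.abs(a - b).sum())

def chebyshev(a, b):
    return float(np.abs(a - b).max())

def cosine_distance(a, b):
    # Cosine similarity is a similarity, so 1 - similarity is used as the distance here.
    # Assumes neither vector is all zeros.
    return float(1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_candidates(query_feat, db_feats, metric=euclidean, threshold=1.0):
    """Indices of database items with distance < threshold, nearest first."""
    q = np.asarray(query_feat, dtype=float)
    dists = np.array([metric(q, np.asarray(f, dtype=float)) for f in db_feats])
    keep = np.flatnonzero(dists < threshold)
    # Ascending distance corresponds to descending similarity.
    return keep[np.argsort(dists[keep], kind="stable")].tolist()

# Hypothetical 2-D features for three database entities.
database = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
print(select_candidates([2.0, 0.0], database, threshold=2.5))
```

Any other metric from the list (Minkowski, Mahalanobis, Hamming) could be passed through the same `metric` argument, provided it takes two vectors and returns a number that shrinks as the entities become more alike.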
The deep learning feature extraction network is constructed by training on the image data of the data set and the label data of the images in a metric learning manner; network architectures such as a twin (Siamese) network can be adopted. The method adopts
$$L = \frac{1}{n} \sum_{q=1}^{n} \left(1 - AP_q\right)$$
as the loss function, where
$$AP_q = \frac{1}{|S^+|} \sum_{i \in S^+} \frac{\mathrm{Rank}(i, S^+)}{\mathrm{Rank}(i, S)}$$
and $AP_q$ represents the average precision for the image q to be queried.
In the training process, any one picture is selected from the training data set to serve as the image q to be queried, and the training set is divided into $S^+$ and $S^-$ according to the label data of the image q to be queried. Gradients are back-propagated using the loss function defined by the formula
$$L = \frac{1}{n} \sum_{q=1}^{n} \left(1 - AP_q\right)$$
to optimize the neural network. After the training is finished, a feature extractor is obtained, which performs feature extraction on an input image I to obtain the image feature $f_I$.
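The split of the training set into same-category and different-category items for a chosen query can be sketched as follows. The function name `split_by_label` and the decision to leave the query itself out of both sets are assumptions made for this example; the source only states that the split follows the query's label data.

```python
import numpy as np

def split_by_label(labels, query_index):
    """Return (indices of S+, indices of S-) for the item at query_index."""
    labels = np.asarray(labels)
    others = np.arange(len(labels)) != query_index   # assumption: query excluded
    same = labels == labels[query_index]
    s_pos = np.flatnonzero(others & same)    # same category as the query
    s_neg = np.flatnonzero(others & ~same)   # different category
    return s_pos, s_neg

# Hypothetical label data for five training images; image 0 is the query.
demo_pos, demo_neg = split_by_label(["cat", "dog", "cat", "bird", "cat"], 0)
print(demo_pos, demo_neg)
```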
When defining the loss function, the method specifically adopts
$$\mathrm{Rank}(i, S) = 1 + \sum_{j \in S,\, j \neq i} \mathbb{1}\{s_j - s_i > 0\}$$
as the ranking function, where $\mathbb{1}\{\cdot\}$ is the Indicator function shown in FIG. 2, and $s_i$ and $s_j$ respectively represent the similarity between image i, image j in the image data set to be searched and the image q to be queried.
As can be seen from FIG. 2, the Indicator function is discontinuous at x = 0 (and its gradient is zero everywhere else), which makes the AP-based loss function discontinuous, so end-to-end training by gradient-descent-based optimization methods is not possible. Therefore, following the form of the Indicator function, a Sigmoid function is adopted to approximate it, i.e.
$$\mathbb{1}\{x > 0\} \approx \sigma(x; \tau) = \frac{1}{1 + e^{-x/\tau}}$$
where x is the independent variable and $\tau$ is a temperature parameter that controls the shape of the function. The effect of $\tau$ on the function values is shown in FIG. 4.
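The effect of the temperature parameter can be checked numerically. The sketch below assumes the common temperature-scaled form of the Sigmoid, 1 / (1 + exp(-x / tau)), written through the equivalent `tanh` identity so that very small temperatures do not overflow; the function names and sample points are hypothetical.

```python
import numpy as np

def indicator(x):
    """Indicator function: 1 where x > 0, else 0."""
    return (np.asarray(x, dtype=float) > 0).astype(float)

def sigmoid(x, tau):
    """Temperature-scaled Sigmoid, equal to 1 / (1 + exp(-x / tau))."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.tanh(x / (2.0 * tau)))

def max_gap(tau, xs):
    """Largest difference between the Sigmoid and the Indicator on the points xs."""
    return float(np.max(np.abs(sigmoid(xs, tau) - indicator(xs))))

# Hypothetical sample points on both sides of zero.
xs = np.array([-1.0, -0.1, 0.1, 1.0])
for tau in (1.0, 0.1, 0.01):
    print(tau, max_gap(tau, xs))
```

The gap shrinks as the temperature decreases, which matches the statement that the Sigmoid approaches the Indicator function as the temperature approaches zero.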
As can be seen from FIG. 3, the Sigmoid function is continuous everywhere on the real numbers and fits the Indicator function well, so the Indicator function is approximately replaced by the Sigmoid function. Because the Sigmoid function is continuous and differentiable, end-to-end optimization can be performed by optimization methods such as gradient descent, and the ranking objective function is optimized directly instead of a distance-based loss function. The optimization target is therefore consistent with the desired outcome, which addresses the requirement in visual retrieval that similar images appear at the front of the ranked list.
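The differentiability claim can be made concrete with a short derivation. Assuming the temperature-scaled form of the Sigmoid given below (an assumption about the exact parameterisation, with $\tau$ as the temperature), its derivative exists for every real x:

```latex
\sigma(x;\tau) = \frac{1}{1 + e^{-x/\tau}}, \qquad
\frac{\partial \sigma}{\partial x}
  = \frac{1}{\tau}\,\sigma(x;\tau)\,\bigl(1 - \sigma(x;\tau)\bigr), \qquad
\left.\frac{\partial \sigma}{\partial x}\right|_{x=0} = \frac{1}{4\tau}
```

Under this form, a smaller temperature gives a closer fit to the Indicator function but concentrates the gradient in a narrower band around x = 0, so the temperature trades approximation quality against the range of similarity differences that receive a useful gradient.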
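After replacing the Indicator function with the Sigmoid function $\sigma(x; \tau)$,
$$\mathrm{Rank}(i, S) \approx 1 + \sum_{j \in S,\, j \neq i} \sigma(s_j - s_i; \tau)$$
and finally
$$AP_q \approx \frac{1}{|S^+|} \sum_{i \in S^+} \frac{1 + \sum_{j \in S^+,\, j \neq i} \sigma(s_j - s_i; \tau)}{1 + \sum_{j \in S^+,\, j \neq i} \sigma(s_j - s_i; \tau) + \sum_{j \in S^-} \sigma(s_j - s_i; \tau)}$$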
As can be seen from FIG. 4, as $\tau$ approaches zero, the right-hand side of the above expression approaches the left-hand side. For the n images in the set S, each image is used in turn as the query image and its $AP_q$ from the above expression is substituted into the loss function, finally giving
$$L = \frac{1}{n} \sum_{q=1}^{n} \left(1 - AP_q\right)$$
as the loss function. According to the loss calculation result, the images in the image data set to be searched are arranged in descending order of feature similarity to the image to be queried and output.
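The Sigmoid-smoothed AP and the resulting loss can be sketched in NumPy as follows. This is a forward-pass illustration only, under several assumptions that are not in the source: the similarities are supplied as a precomputed n x n matrix, the query is not ranked against itself, queries with no same-category item are skipped, and the names `smooth_ap` and `smooth_ap_loss` are hypothetical. In actual training the same computation would be written in an automatic differentiation framework so that gradients can be back-propagated.

```python
import numpy as np

def sigmoid(x, tau):
    """Temperature-scaled Sigmoid, equal to 1 / (1 + exp(-x / tau))."""
    return 0.5 * (1.0 + np.tanh(np.asarray(x, dtype=float) / (2.0 * tau)))

def smooth_ap(similarities, is_positive, tau=0.01):
    """Smoothed AP for one query: Indicator terms replaced by the Sigmoid."""
    s = np.asarray(similarities, dtype=float)
    pos = np.asarray(is_positive, dtype=bool)
    soft = sigmoid(s[None, :] - s[:, None], tau)   # soft[i, j] ~ 1{s_j - s_i > 0}
    np.fill_diagonal(soft, 0.0)                    # exclude j == i
    rank_pos = 1.0 + soft[:, pos].sum(axis=1)      # soft Rank(i, S+)
    rank_all = 1.0 + soft.sum(axis=1)              # soft Rank(i, S)
    return float(np.mean(rank_pos[pos] / rank_all[pos]))

def smooth_ap_loss(sim_matrix, labels, tau=0.01):
    """Mean of (1 - AP_q), with each item serving once as the query."""
    sim = np.asarray(sim_matrix, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    losses = []
    for q in range(n):
        keep = np.arange(n) != q                   # assumption: query not ranked
        pos = labels[keep] == labels[q]
        if not pos.any():                          # assumption: skip such queries
            continue
        losses.append(1.0 - smooth_ap(sim[q, keep], pos, tau))
    return float(np.mean(losses))

# Hypothetical labels and two similarity matrices: one ideal, one inverted.
demo_labels = np.array([0, 0, 1, 1])
good_sim = (demo_labels[:, None] == demo_labels[None, :]).astype(float)
bad_sim = 1.0 - good_sim
print(smooth_ap_loss(good_sim, demo_labels), smooth_ap_loss(bad_sim, demo_labels))
```

When same-category items are the most similar to each query the loss is close to zero, and when they are the least similar the loss rises, so minimising it pushes same-category items toward the front of the ranked list.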
In another aspect, an embodiment of the present invention provides a ranking optimization apparatus for visual search, and FIG. 5 is an overall framework diagram of the apparatus. As shown in FIG. 5, the ranking optimization apparatus for visual search according to an embodiment of the present invention includes: a storage module 501, configured to establish a visual entity database; an obtaining module 502, configured to obtain a visual entity to be queried; a feature extraction module 503, configured to perform feature extraction on the visual entity to be queried and on the visual entities in a set to be searched, where the set to be searched is the set of all visual entities in the visual entity database; an identifying module 504, configured to calculate the distance between each visual entity in the set to be searched and the visual entity to be queried according to a distance metric function, and to identify visual entities whose distance is smaller than a predetermined threshold as similar target retrieval entities; and a processing module 505, configured to, according to the loss function
$$L = \frac{1}{n} \sum_{q=1}^{n} \left(1 - AP_q\right)$$
perform loss calculation on the target retrieval entities to obtain a list of target retrieval entities arranged in descending order of feature similarity to the visual entity to be queried, and output the list; wherein
$$AP_q = \frac{1}{|S^+|} \sum_{i \in S^+} \frac{1 + \sum_{j \in S^+,\, j \neq i} \sigma(s_j - s_i; \tau)}{1 + \sum_{j \in S^+,\, j \neq i} \sigma(s_j - s_i; \tau) + \sum_{j \in S^-} \sigma(s_j - s_i; \tau)}$$
q represents the visual entity to be queried, $AP_q$ represents the average precision for the visual entity q to be queried, S represents the data set of the set to be searched in the visual entity database, $s_i$ and $s_j$ respectively represent the similarity between visual entity i, visual entity j in the set to be searched and the visual entity q to be queried, n represents the number of visual entities in the set to be searched, $S^+$ represents the set of visual entities of the same category as the visual entity to be queried, $S^-$ represents the set of visual entities of a different category from the visual entity to be queried, and $\tau$ represents the temperature parameter in the Sigmoid function $\sigma$. In one possible design, the structure of the ranking optimization apparatus for visual search includes a processor and a memory; the memory is used for storing a program that supports the apparatus in executing the ranking optimization method for visual search, and the processor is configured to execute the program stored in the memory.
In yet another aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor; the memory is used for storing a computer program; the processor is used for executing the computer program and, when executing it, implementing any of the above-described ranking optimization methods for visual search.
In still another aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the program is executed by a processor, the program causes the processor to implement any one of the above-mentioned methods for optimizing ranking of visual search. Compared with the prior art, the beneficial effect of this application lies in:
A ranking optimization method and apparatus for visual retrieval, an electronic device, and a storage medium are provided for end-to-end visual search. The application identifies that metric-based loss functions lead to inaccurate search results and instead directly optimizes average precision as the loss function. Through the indicator function, it explicitly establishes the relationship between an item's position in the ranked list and the similarities between the query feature and the features being searched. Because the indicator function is non-differentiable, gradient-descent-based optimization methods cannot be applied to it, so optimization based on a ranking loss cannot be carried out directly. Observing the form of the indicator function, the application approximates it with the Sigmoid(x; $\tau$) family of functions, which is continuous and differentiable and can therefore be optimized by gradient-descent-based methods.
By directly optimizing the retrieval ranking instead of a distance-based loss function, the method overcomes the defects that a distance-based loss function attends only to the similarity between features and does not impose a larger penalty on errors near the front of the ranked list; the accuracy of visual retrieval is significantly improved. Furthermore, the technical scheme in the application is easy to implement, clear in structure, and easy to maintain and upgrade; the neural network trained by the method serves as a feature extractor and can be applied to downstream tasks such as visual clustering and visual recognition; and the modular structure can be combined with different network structures and batch sampling functions in a plug-and-play manner, giving high practicability. In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments. Because the device embodiments correspond to the method embodiments, they are described only briefly; for details, refer to the description of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

Claims (8)

1. A method for rank optimization for visual search, comprising:
establishing a visual entity database;
acquiring a visual entity to be queried;
extracting features of the visual entity to be queried and of the visual entities in the set to be searched, wherein the set to be searched is the set of all the visual entities in the visual entity database;
calculating the distance between each visual entity in the set to be searched and the visual entity to be queried according to a distance metric function, and identifying each visual entity whose distance is smaller than a preset threshold as a similar target retrieval entity;
according to a loss function
[loss function formula: image in the original publication, not reproduced here]
performing loss calculation on the target retrieval entities to obtain a target retrieval entity list arranged in descending order of feature similarity to the visual entity to be queried, and outputting the target retrieval entity list;
wherein
[formula: image in the original publication, not reproduced here]
q represents the visual entity to be queried, AP_q represents the average precision of the visual entity q to be queried, S represents the set to be searched in the visual entity database, s_i and s_j represent the similarities between visual entity i and visual entity j, respectively, in the set to be searched and the visual entity q to be queried, n represents the number of visual entities in the set to be searched, S+ represents the set of visual entities of the same category as the visual entity to be queried, S- represents the set of visual entities of a different category from the visual entity to be queried, and
[symbol: image in the original publication, not reproduced here]
represents the temperature parameter in the Sigmoid function.
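The quantities defined in claim 1 can be illustrated with a short sketch. The formula images are not available in this text, so the code below does not restate the claimed formula. It assumes a commonly used sigmoid-relaxed average precision, in which the rank of each entity in S+ is approximated by sums of temperature-scaled sigmoids of similarity differences s_j - s_i, and it takes 1 - AP_q as the loss. The function names (sigmoid, smooth_ap, ap_loss) and the default temperature of 0.01 are hypothetical.

```python
import math


def sigmoid(x, tau):
    """Temperature-scaled sigmoid, written to avoid overflow for large |x / tau|."""
    z = x / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_ap(sims, labels, tau=0.01):
    """Sigmoid-relaxed average precision for a single query q (assumed form).

    sims[i]   -- similarity s_i between entity i of the search set and q
    labels[i] -- True if entity i belongs to S+ (same category as q)
    """
    positives = [i for i, is_pos in enumerate(labels) if is_pos]
    if not positives:
        raise ValueError("S+ is empty; average precision is undefined")
    total = 0.0
    for i in positives:
        # Smoothed rank of i among the positives only.
        rank_pos = 1.0 + sum(
            sigmoid(sims[j] - sims[i], tau) for j in positives if j != i
        )
        # Smoothed rank of i among all n entities of the search set.
        rank_all = 1.0 + sum(
            sigmoid(sims[j] - sims[i], tau) for j in range(len(sims)) if j != i
        )
        total += rank_pos / rank_all
    return total / len(positives)


def ap_loss(sims, labels, tau=0.01):
    """Loss that decreases as same-category entities move up the ranking."""
    return 1.0 - smooth_ap(sims, labels, tau)
```

With a small temperature the sigmoid approaches a step function and the relaxed value approaches the exact average precision; a larger temperature gives smoother gradients at the cost of a looser approximation.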
2. The method of claim 1, wherein the visual entity comprises a key frame or an image frame in image data or video data.
3. The method according to claim 2, wherein the distance metric function comprises a Euclidean distance, a cosine similarity, a Manhattan distance, a Chebyshev distance, a Minkowski distance, a Mahalanobis distance, or a Hamming distance.
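As an illustration of how several of the listed metrics plug into the thresholding step of claim 1, the following is a minimal sketch in plain Python. The function names, the dictionary-based database, and the retrieve helper are hypothetical; cosine similarity is converted to a distance as 1 minus the similarity so that "smaller is closer" holds for every metric, and the Mahalanobis distance is omitted because it additionally requires a covariance estimate.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    return minkowski(a, b, 2)


def manhattan(a, b):
    return minkowski(a, b, 1)


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query, database, metric, threshold):
    """Return ids of entities closer than `threshold`, closest first.

    database -- dict mapping an entity id to its feature vector
    """
    scored = [(metric(query, feat), name) for name, feat in database.items()]
    return [name for dist, name in sorted(scored) if dist < threshold]
```

For example, retrieve([0, 0], {"a": [0, 1], "b": [3, 4], "c": [0, 0.5]}, euclidean, 2.0) keeps "c" and "a" and drops "b", whose distance of 5 exceeds the threshold.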
4. The method according to claim 3, wherein, when performing feature extraction on the visual entity to be queried and the visual entities in the set to be searched, a deep-learning-based manner is selected for the feature extraction.
5. The method according to claim 4, wherein performing the feature extraction in a deep-learning-based manner on the visual entity to be queried and the visual entities in the set to be searched comprises:
preparing image data of a training dataset and label data of the images;
and constructing a deep learning feature extraction network.
6. An apparatus for ranking optimization for visual search, comprising:
a storage module, used for establishing a visual entity database;
an acquisition module, used for acquiring the visual entity to be queried;
a feature extraction module, used for extracting features of the visual entity to be queried and of the visual entities in the set to be searched, wherein the set to be searched is the set of all the visual entities in the visual entity database;
an identification module, used for calculating the distance between each visual entity in the set to be searched and the visual entity to be queried according to a distance metric function, and identifying each visual entity whose distance is smaller than a preset threshold as a similar target retrieval entity;
a processing module, used for performing, according to a loss function
[loss function formula: image in the original publication, not reproduced here]
loss calculation on the target retrieval entities to obtain a target retrieval entity list arranged in descending order of feature similarity to the visual entity to be queried, and outputting the target retrieval entity list;
wherein
[formula: image in the original publication, not reproduced here]
q represents the visual entity to be queried, AP_q represents the average precision of the visual entity q to be queried, S represents the set to be searched in the visual entity database, s_i and s_j represent the similarities between visual entity i and visual entity j, respectively, in the set to be searched and the visual entity q to be queried, n represents the number of visual entities in the set to be searched, S+ represents the set of visual entities of the same category as the visual entity to be queried, S- represents the set of visual entities of a different category from the visual entity to be queried, and
[symbol: image in the original publication, not reproduced here]
represents the temperature parameter in the Sigmoid function.
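The module structure of claim 6 can be pictured as a small class whose members play the roles of the claimed modules. This is only a sketch under stated assumptions: the class name VisualRetrievalDevice, its method names, and the injected extract and distance callables are hypothetical, and the processing step is simplified to sorting the retained entities by distance at query time (closest, i.e. most similar, first) rather than evaluating the claimed loss, whose formula is not reproduced in this text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


@dataclass
class VisualRetrievalDevice:
    extract: Callable[[object], Vector]          # feature extraction module
    distance: Callable[[Vector, Vector], float]  # metric used by the identification module
    threshold: float                             # preset distance threshold
    database: Dict[str, object] = field(default_factory=dict)  # storage module

    def store(self, entity_id: str, entity: object) -> None:
        """Storage module: add a visual entity to the database."""
        self.database[entity_id] = entity

    def query(self, entity: object) -> List[str]:
        """Acquire a query entity and return ids of similar entities, best first."""
        query_feat = self.extract(entity)
        scored = []
        for entity_id, stored in self.database.items():
            dist = self.distance(query_feat, self.extract(stored))
            # Identification module: keep entities under the threshold.
            if dist < self.threshold:
                scored.append((dist, entity_id))
        # Simplified processing module: smallest distance first.
        scored.sort()
        return [entity_id for _, entity_id in scored]
```

Passing the extractor and the metric in as callables keeps the sketch independent of any particular network or distance function; a real system would more likely cache the stored features instead of re-extracting them on every query.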
7. An electronic device, comprising a memory and a processor;
the memory is used for storing a computer program;
the processor is used for executing the computer program and, when executing the computer program, implementing the method according to any one of claims 1-5.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the method according to any one of claims 1-5.
CN202110411184.0A 2021-04-16 2021-04-16 Visual retrieval sequencing optimization method and device, electronic equipment and storage medium Active CN112818148B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110411184.0A CN112818148B (en) 2021-04-16 2021-04-16 Visual retrieval sequencing optimization method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112818148A 2021-05-18
CN112818148B 2021-11-05

Family

ID=75863630

Country Status (1)

Country Link
CN (1) CN112818148B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventor after: Wang Hai, Liu Chaozhen, Liu Bangchang, Chang Dejie, Zhao Hongwen, Gu Shufeng, Zhao Jin, Luo Xiaobin
Inventor before: Wang Hai, Liu Chaozhen, Liu Bangchang, Chang Dejie, Zhao Hongwen, Gu Shufeng, Zhao Jin, Luo Xiaobin