WO2019100348A1

WO2019100348A1 - Image retrieval method and device, and image library generation method and device

Info

Publication number: WO2019100348A1
Application number: PCT/CN2017/112956
Authority: WO
Inventors: 付宇新; 温丰; 薛常亮
Original assignee: 华为技术有限公司
Priority date: 2017-11-24
Filing date: 2017-11-24
Publication date: 2019-05-31
Also published as: CN111373393A; CN111373393B

Abstract

An image retrieval method and device, and an image library generation method and device, the image retrieval method comprising: acquiring a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved; determining, according to the image category information of the image to be retrieved and a stop word lexicon, visual stop words corresponding to the image category of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved comprising visual words which are not relevant to the image category of the image to be retrieved; removing, from the plurality of visual words of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved so as to obtain target visual words of the image to be retrieved (S130); and determining a retrieval result according to the target visual words of the image to be retrieved and a retrieval image library, the retrieval image library comprising a plurality of retrieval images (S140). Thus, the efficiency and accuracy of image retrieval may be improved.

Description

Image retrieval method and device, and image library generation method and device

Technical field

The present application relates to the field of image retrieval technology, and more particularly to an image retrieval method and apparatus and an image library generation method and apparatus in the field of image retrieval technology.

Background technique

The bag of visual words (BoVW) model is widely applied in the field of image retrieval. The bag of visual words model includes a plurality of visual words, which are obtained by clustering a plurality of visual feature descriptors extracted from a plurality of images; each of the plurality of visual words is a cluster center.

In the existing image retrieval process, a plurality of visual feature descriptors of the image to be retrieved are first acquired, and these descriptors are matched and mapped to the visual words in the bag of visual words model to obtain a plurality of visual words of the image to be retrieved, which are used to represent that image. A similarity between the image to be retrieved and each search image in the search image library is then calculated according to the plurality of visual words of the image to be retrieved, and at least one image in the search image library having the highest similarity with the image to be retrieved is output as the image retrieval result.

However, when the content of the image to be retrieved is cluttered, or the image to be retrieved contains a large amount of information, the image to be retrieved has a large number of visual words. Image retrieval is therefore inefficient and its accuracy is poor.

Summary of the invention

The present application provides an image retrieval method and apparatus, and an image processing method and apparatus, which are advantageous for improving the efficiency and accuracy of image retrieval.

In a first aspect, the present application provides an image retrieval method, the method comprising:

acquiring a plurality of visual words of the image to be retrieved and image category information of the image to be retrieved, wherein the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to the visual words in the bag of visual words model, and the image category information of the image to be retrieved is used to indicate an image category of the image to be retrieved;

determining, according to the image category information of the image to be retrieved and a stop word dictionary, the visual stop words corresponding to the image category of the image to be retrieved, wherein the visual stop words corresponding to the image category of the image to be retrieved include visual words that are not related to that image category, and the stop word dictionary includes a mapping relationship between the image category of the image to be retrieved and the visual stop words corresponding to that image category;

removing the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved, to obtain target visual words of the image to be retrieved; and

determining a retrieval result according to the target visual words of the image to be retrieved and a search image library, the search image library including a plurality of search images.
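The stop word removal step above can be illustrated with a minimal sketch. It assumes that visual words are represented by integer indices and that the stop word dictionary is a plain mapping from a category name to a set of such indices; the names `remove_visual_stop_words` and `stop_word_dict`, and the sample values, are hypothetical and are not taken from the application.

```python
from typing import Dict, List, Set

# Hypothetical stop word dictionary: image category -> visual stop word ids.
StopWordDict = Dict[str, Set[int]]


def remove_visual_stop_words(
    visual_words: List[int], category: str, stop_words: StopWordDict
) -> List[int]:
    """Return the target visual words: the input words minus the
    visual stop words registered for the image's category."""
    to_drop = stop_words.get(category, set())  # unknown category: drop nothing
    return [w for w in visual_words if w not in to_drop]


# Illustrative data only.
stop_word_dict: StopWordDict = {"forest": {3, 7}, "rainy": {12}}
query_words = [1, 3, 3, 5, 7, 9]
target_words = remove_visual_stop_words(query_words, "forest", stop_word_dict)
print(target_words)  # [1, 5, 9]
```

Only the target visual words are then passed on to the similarity computation, so fewer words take part in matching.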

In the image retrieval method provided by the embodiments of the present application, the target visual words of the image to be retrieved are obtained by removing the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved. That is, visual words that have no significant effect on recognizing the image to be retrieved, or that interfere with image recognition, are removed from the plurality of visual words of the image to be retrieved, so the remaining target visual words are more significant for recognizing that image. Therefore, performing retrieval with the target visual words of the image to be retrieved and the search image library helps improve the efficiency and accuracy of image retrieval.

In a possible implementation, the search image library further includes target visual words corresponding to each of the plurality of search images, and the target visual words corresponding to each search image are obtained by removing, from the plurality of visual words of that search image, the visual stop words corresponding to the image category of that search image.

In the image retrieval method provided by the embodiments of the present application, the target visual words of each search image stored in the search image library are obtained by removing the visual stop words corresponding to the image category of the search image from the plurality of visual words of the search image, which helps reduce the memory usage of the search image library.

In addition, determining the retrieval result according to the similarity between the target visual word of the image to be retrieved and the target visual word of the retrieved image in the search image library is beneficial to improving the efficiency and accuracy of the image retrieval.

It should be understood that the image retrieval device may be a first device having computing and storage functions, for example a computer, or the image retrieval device may be a functional module in the first device, which is not limited by the embodiments of the present application.

It should also be understood that the visual feature points of an image in the embodiments of the present application refer to pixels that remain consistent under image transformations such as scaling, rotation, translation, and change of viewing angle, that is, the most easily recognized pixels in the image, such as corner points or edge points in texture-rich regions. The quality of the visual feature points of an image directly affects the efficiency and accuracy of image retrieval.

Optionally, the type of the visual feature points of the image may include scale-invariant feature transform (SIFT), oriented FAST and rotated BRIEF (ORB), speeded up robust features (SURF), features from accelerated segment test (FAST), and the like, which is not limited by the embodiments of the present application.

Optionally, the visual feature points of the image may be one or more, which is not limited by the embodiment of the present application.

It should also be understood that the visual feature descriptor of an image in the embodiments of the present application refers to a mathematical description of a visual feature point of the image.

For example, taking ORB as an example, the main steps of acquiring a visual feature descriptor of an image include: randomly selecting a plurality of pixel pairs in the neighborhood of a visual feature point of the image, and comparing the values of the two pixels in each pixel pair to obtain a code of 0 or 1; and applying a rotation according to the orientation information of the visual feature point, to obtain a robust binary-vector visual feature descriptor.
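The following is a simplified sketch of the pairwise-comparison idea described above, written with NumPy. It is not the ORB implementation: real ORB uses a learned sampling pattern and smoothed intensities, whereas this sketch assumes random pixel pairs, a square grayscale patch with odd side length, and a keypoint orientation supplied by the caller. The function name `binary_descriptor` and all parameter values are hypothetical.

```python
import numpy as np


def binary_descriptor(patch, pairs, angle):
    """Rotated BRIEF-style descriptor for a square patch centred on a keypoint.

    patch: 2-D grayscale array with odd side length.
    pairs: (n, 4) integer offsets (x1, y1, x2, y2) relative to the centre.
    angle: keypoint orientation in radians.
    """
    c = patch.shape[0] // 2
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    rot = np.array([[cos_a, -sin_a], [sin_a, cos_a]])
    bits = np.empty(len(pairs), dtype=np.uint8)
    for i, (x1, y1, x2, y2) in enumerate(pairs):
        # Rotate both sample offsets by the keypoint orientation.
        p1 = np.rint(rot @ np.array([x1, y1])).astype(int)
        p2 = np.rint(rot @ np.array([x2, y2])).astype(int)
        # Clip so that rotated offsets stay inside the patch.
        p1 = np.clip(p1, -c, c)
        p2 = np.clip(p2, -c, c)
        a = patch[c + p1[1], c + p1[0]]
        b = patch[c + p2[1], c + p2[0]]
        bits[i] = 1 if a < b else 0  # one bit per pixel pair
    return bits


# Illustrative data only: a random 31x31 patch and 256 random pixel pairs.
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(31, 31))
pairs = rng.integers(-10, 11, size=(256, 4))
descriptor = binary_descriptor(patch, pairs, angle=np.pi / 6)
print(descriptor.shape)  # (256,)
```

Rotating the sampling offsets by the keypoint orientation is what makes the bit string roughly stable when the image itself is rotated.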

Optionally, the image may have one or more visual feature descriptors, which is not limited by the embodiments of the present application.

It should also be understood that the bag of visual words model in the embodiments of the present application includes a plurality of visual words, each of which is a cluster center obtained by clustering visual feature descriptors extracted from a plurality of images.
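As an illustration of how cluster centers can serve as visual words, the sketch below runs a plain k-means (Lloyd's algorithm) over a handful of descriptors. The text does not say which clustering algorithm is used, so k-means with Euclidean distance is an assumption here; it suits real-valued descriptors such as SIFT, while binary descriptors are usually clustered with other distances. The function name `build_vocabulary` and the toy data are hypothetical.

```python
import numpy as np


def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster descriptors with plain k-means (Lloyd's algorithm);
    the k cluster centres act as the visual words."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct descriptors picked at random.
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[picks].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every centre.
        d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old centre if a cluster is empty
                centres[j] = members.mean(axis=0)
    return centres


# Illustrative data only: four 2-D "descriptors" forming two obvious groups.
descriptors = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
vocabulary = build_vocabulary(descriptors, k=2)
print(vocabulary)
```

Each row of `vocabulary` is one visual word; with this toy data the two centres settle at the means of the two groups.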

It should also be understood that a visual word of an image in the embodiments of the present application refers to the visual word in the bag of visual words model that is nearest to a visual feature descriptor of the image, obtained by matching and mapping the visual feature descriptor of the image to the visual words in the bag of visual words model.

Optionally, the visual word of the image may be one or more, which is not limited by the embodiment of the present application.

It should also be understood that, in the embodiment of the present application, a plurality of images are classified according to different classification methods, and an image category of each image can be obtained.

As an alternative embodiment, if the images are classified by scene, the image categories of the images may include forest scenes, suburban scenes, indoor scenes, and the like.

As another alternative embodiment, if the images are classified by weather, the image categories of the images may include sunny, rainy, snowy, and the like.

It should also be understood that different visual words appearing in the same image may have different effects on recognizing the image, and the same visual word appearing in different images may have the same effect on recognizing those images. In the embodiments of the present application, a visual stop word corresponding to an image category refers to a visual word that has no significant effect on recognizing images of that category, or that interferes with image recognition, that is, a visual word that is not related to images of that category.

It should be understood that the visual stop words that are not related to a certain image category described in the embodiments of the present application refer to visual words whose correlation with the image of the image category is lower than a preset threshold.

Optionally, the visual stop words corresponding to the image categories may include one or more visual words, which are not limited by the embodiment of the present application.

For example, in forest scenes and suburban scenes, almost every image contains a large number of trees. Feature points extracted from the trees in an image contribute little to distinguishing whether the image is a forest scene or a suburban scene. Therefore, trees can be treated as visual stop words for the forest and suburban categories.

For another example, on rainy days, falling rain leaves traces in the image, and feature points extracted from the rain pollute the plurality of visual words of the image. Therefore, rain can be treated as a visual stop word for the rainy-day category.

It should also be understood that the target visual words of an image in the embodiments of the present application are the visual words that remain after the visual stop words corresponding to the image category of the image are removed from the plurality of visual words of the image.

Optionally, the target visual word of the image may include one or more visual words, which is not limited by the embodiment of the present application.

It should also be understood that a positive sample image set in the embodiments of the present application includes manually labeled images that can be considered identical or highly similar.

For example, two images of the same object captured in different scenes, such as a school on a rainy day and the same school on a snowy day.

For another example, two images of the same scene captured at different times, such as the current pose and a historical pose of the same scene in loop closure detection.

In a possible implementation, before determining the visual stop words corresponding to the image category of the image to be retrieved according to the image category information of the image to be retrieved and the stop word dictionary, the method further includes: acquiring a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, wherein the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in the bag of visual words model, the image category information of each training image is used to indicate an image category of that training image, and the positive sample image set information is used to indicate at least one positive sample image set, each positive sample image set including a plurality of manually labeled similar training images in the training image library; and generating the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.

In a possible implementation, generating the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, wherein the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word dictionary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.

As an optional embodiment, the image retrieval device may use, as the visual stop words corresponding to each image category of the training image library, at least one visual word, among the plurality of visual words of the training image library, that has the lowest correlation with that image category.

As another optional embodiment, the image retrieval device may use, as the visual stop words corresponding to each image category of the training image library, at least one visual word, among the plurality of visual words of the training image library, whose correlation with that image category is less than a first preset threshold.

Optionally, the visual stop words corresponding to each image category may include one or more visual words, which are not limited by the embodiment of the present application.

Optionally, the stop word dictionary may include a mapping relationship between each image category and a visual stop word corresponding to each image category.
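The two selection rules above (lowest correlation, or correlation below a first preset threshold) and the resulting category-to-stop-word mapping can be sketched as follows. The layout of the correlation table, the function name `build_stop_word_dict`, and the sample scores are assumptions made for illustration only.

```python
from typing import Dict, Optional, Set

# Hypothetical input: correlation[category][word_id] -> correlation score.
Correlation = Dict[str, Dict[int, float]]


def build_stop_word_dict(
    correlation: Correlation,
    threshold: Optional[float] = None,
    n_lowest: int = 1,
) -> Dict[str, Set[int]]:
    """Pick visual stop words per image category.

    If `threshold` is given, keep every word whose correlation is below it;
    otherwise keep the `n_lowest` least correlated words.
    """
    result: Dict[str, Set[int]] = {}
    for category, scores in correlation.items():
        if threshold is not None:
            result[category] = {w for w, s in scores.items() if s < threshold}
        else:
            ranked = sorted(scores, key=lambda w: scores[w])  # ascending
            result[category] = set(ranked[:n_lowest])
    return result


# Illustrative scores only.
correlation = {
    "forest": {1: 0.9, 3: 0.05, 7: 0.1},
    "rainy": {1: 0.4, 12: 0.02},
}
print(build_stop_word_dict(correlation, threshold=0.2))
print(build_stop_word_dict(correlation, n_lowest=1))
```

The threshold rule lets the number of stop words vary per category, while the lowest-n rule fixes it in advance; which is preferable depends on how the correlation scores are distributed.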

It should be understood that different visual words appearing in the same image may have different effects on recognizing the image, and the same visual word appearing in different images may have the same effect on recognizing those images.

In a possible implementation, determining the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining a correlation between a first image category and a first visual word according to a probability of occurrence of a first event, a probability of occurrence of a second event, and a probability that the first event and the second event occur simultaneously, wherein the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word, the image category of the first training image being the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.

For example, assume that the training image set contains P training images, M image categories, and L visual words, and that N of the P training images belong to the first image category. The correlation between the first visual word and the first image category can then be determined by equations (1) to (6):
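One plausible form of these six equations is sketched below. It assumes the standard definition of pointwise mutual information, reads the information entropy of the second event as its self-information, and normalizes the counts by a total number of image pairs T; the symbol T and the exact normalization are assumptions and are not defined in the text.

```latex
\begin{align}
p(x) &= \frac{\mathrm{count}(x)}{T} \tag{1}\\
p(y) &= \frac{\mathrm{count}(y)}{T} \tag{2}\\
p(x, y) &= \frac{\mathrm{count}(x, y)}{T} \tag{3}\\
\mathrm{PMI}(x, y) &= \log \frac{p(x, y)}{p(x)\,p(y)} \tag{4}\\
H(y) &= -\log p(y) \tag{5}\\
\mathrm{RATE\_PMI}(x, y) &= \frac{\mathrm{PMI}(x, y)}{H(y)} \tag{6}
\end{align}
```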

Where x represents the first event, namely that the plurality of visual words of a first training image among the N training images and the plurality of visual words of a second training image, among the P training images other than the first training image, each include a first visual word of the L visual words; y represents the second event, namely that the first training image and the second training image belong to the same positive sample image set; count(x) represents the number of occurrences of the first event, count(y) represents the number of occurrences of the second event, and count(x, y) represents the number of simultaneous occurrences of the first event and the second event; p(x) represents the probability of occurrence of the first event, p(y) represents the probability of occurrence of the second event, and p(x, y) represents the probability that the first event and the second event occur simultaneously; PMI(x, y) represents the pointwise mutual information of the first event and the second event; H(y) represents the information entropy of the second event; and RATE_PMI(x, y) represents the pointwise mutual information rate of the first event and the second event, that is, the correlation between the first visual word and the first image category. P, L, M, and N are positive integers greater than one.
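To make the counting of the two events concrete, the sketch below estimates a RATE_PMI value for one (image category, visual word) pair from a toy training set. Several choices here are assumptions rather than statements from the text: probabilities are normalized by the number of ordered image pairs (first image in the category, second image any other image), PMI is taken in its standard form log(p(x, y) / (p(x) p(y))), and H(y) is taken as the self-information -log p(y). The function name `rate_pmi` and the sample data are hypothetical.

```python
import math
from typing import List, Optional, Set


def rate_pmi(
    words: List[Set[int]],    # visual words of each training image
    categories: List[str],    # image category of each training image
    positive_set: List[int],  # id of the positive sample set of each image
    category: str,
    word: int,
) -> Optional[float]:
    """Estimate RATE_PMI for (category, word) by counting over image pairs.

    Returns None when a count is zero or H(y) is zero, because the
    ratio is undefined in those cases.
    """
    count_x = count_y = count_xy = total = 0
    for i, cat in enumerate(categories):
        if cat != category:
            continue  # the first image must belong to the category
        for j in range(len(words)):
            if i == j:
                continue
            total += 1
            x = word in words[i] and word in words[j]  # first event
            y = positive_set[i] == positive_set[j]     # second event
            count_x += x
            count_y += y
            count_xy += x and y
    if min(count_x, count_y, count_xy) == 0 or count_y == total:
        return None
    p_x, p_y, p_xy = count_x / total, count_y / total, count_xy / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_y)


# Illustrative data only: four training images.
words = [{1, 2}, {1, 3}, {2, 3}, {1, 2}]
categories = ["forest", "forest", "indoor", "indoor"]
positive_set = [0, 0, 1, 1]
score = rate_pmi(words, categories, positive_set, "forest", 1)
print(score)  # about 0.369
```

Under these assumptions, a low score means that sharing the word says little about whether two images belong to the same positive sample set, which is the property a visual stop word is expected to have.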

Optionally, the image retrieval device may generate the search image library by itself, or may acquire the search image library from the image library generation device, which is not limited by the embodiments of the present application.

Optionally, the search image library may be obtained by training on the plurality of training images, or by training on the historical images to be retrieved that the image retrieval device retrieved before the current retrieval, which is not limited by the embodiments of the present application.

As an optional embodiment, the image retrieval device may generate the search image library according to a plurality of visual words of each of the plurality of training images, image category information of each training image, and the stop word dictionary. That is, the search image library is obtained by training on the plurality of training images.

Specifically, the image retrieval device may determine, according to the image category information of each training image and the stop word dictionary, the visual stop words corresponding to the image category of each training image; remove those visual stop words from the plurality of visual words of the training image to obtain the target visual words of each training image; and add the target visual words of each training image to the search image library.

Optionally, the image retrieval device may obtain the target visual words of each training image by using the stop word dictionary in the embodiments of the present application, or by other means, which is not limited by the embodiments of the present application.

For example, in electronic commerce, the search image library includes target visual words of all product images provided by the user, and the search images are product images that the user wants to purchase.

As another alternative embodiment, the image retrieval device may add a target visual word of the historical to-be-retrieved image retrieved prior to S140 to the retrieval image library to generate the retrieval image library.

For example, in loop closure detection, the search image library includes all historical pose images, and the image to be retrieved is the current pose image.

In the loop closure detection scenario, the image retrieval method provided by the embodiments of the present application can store historical images of the scene, use the current image to perform retrieval and recognize a loop, construct a constraint between the current pose and a historical pose, and reduce the overall error through optimization to obtain a globally consistent map. In an e-commerce scenario, when the product name is not known, the user submits an image of the product, and the system searches according to that image and returns images with higher similarity as the retrieval result.

Optionally, the image retrieval device may select at least one search image that is the most similar to the image to be retrieved as the search result, which is not limited by the embodiment of the present application.

As an optional embodiment, the image retrieval device may calculate a similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image having the highest similarity with the image to be retrieved as the retrieval result.

As another optional embodiment, the image retrieval device may calculate a similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image whose similarity is greater than a second preset threshold as the retrieval result.
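Both selection rules (highest similarity, or similarity above a second preset threshold) can be sketched as follows. The text does not name a similarity measure, so cosine similarity between visual word histograms is an assumption here; the names `cosine_similarity` and `retrieve`, and the sample library, are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[int], b: List[int]) -> float:
    """Cosine similarity between the visual word histograms of two images."""
    ha, hb = Counter(a), Counter(b)
    dot = sum(ha[w] * hb[w] for w in ha)
    norm = math.sqrt(sum(v * v for v in ha.values())) * math.sqrt(
        sum(v * v for v in hb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(
    query: List[int],
    library: Dict[str, List[int]],
    top_k: int = 1,
    threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Rank library images by similarity to the query.

    With `threshold` set, return every image scoring above it;
    otherwise return the `top_k` best matches.
    """
    scored = sorted(
        ((name, cosine_similarity(query, ws)) for name, ws in library.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    if threshold is not None:
        return [item for item in scored if item[1] > threshold]
    return scored[:top_k]


# Illustrative data only: visual words of three search images.
library = {"a": [1, 5, 9], "b": [2, 4], "c": [1, 5]}
query = [1, 5, 9]
print([name for name, _ in retrieve(query, library, threshold=0.5)])  # ['a', 'c']
```

The same functions work whether the lists hold all visual words or only the target visual words; using the target visual words simply shortens the histograms being compared.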

In addition, determining at least one search image similar to the image to be retrieved as the retrieval result, according to the similarity between the target visual words of the image to be retrieved and the target visual words of the search images in the search image library, helps improve the efficiency and accuracy of image retrieval.

In a second aspect, the present application provides an image processing method, the method comprising:

acquiring a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, wherein the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in the bag of visual words model, the image category information of each training image is used to indicate an image category of that training image, and the positive sample image set information is used to indicate at least one positive sample image set, each positive sample image set including a plurality of manually labeled similar training images in the training image library; and

generating a stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, wherein the stop word dictionary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to the image category of each training image include visual words that are not related to that image category.

In the image library generation method provided by the embodiments of the present application, a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information are acquired, and a stop word dictionary is generated according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, which helps improve the efficiency and accuracy of image retrieval.

It should be understood that the image library generation device may be a second device having computing and storage functions, for example a computer, or the image library generation device may be a functional module in the second device, which is not limited by the embodiments of the present application.

Optionally, the second device and the first device in the first aspect may be the same device or different devices, which is not limited in this embodiment of the present application.

Optionally, when the second device is the same as the first device, the image library generation device and the image retrieval device in the first aspect are different functional modules in the same device, or the image library generation device is a functional module in the image retrieval device.

In a possible implementation, generating the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, wherein the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word dictionary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.

In a possible implementation, determining the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining a correlation between a first image category and a first visual word according to a probability of occurrence of a first event, a probability of occurrence of a second event, and a probability that the first event and the second event occur simultaneously, wherein the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word, the image category of the first training image being the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.

In a possible implementation, after the stop word dictionary is generated according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the method further includes: determining, according to the image category information of each training image and the stop word dictionary, the visual stop words corresponding to the image category of each training image; removing those visual stop words from the plurality of visual words of the training image to obtain the target visual words of each training image; and adding the target visual words of each training image to the search image library.

In a possible implementation, acquiring the plurality of visual words of each training image in the training image library includes: acquiring each training image; extracting a plurality of visual feature descriptors of each training image, wherein the plurality of visual feature descriptors are used to describe a plurality of visual feature points of the training image and are in one-to-one correspondence with the plurality of visual feature points; acquiring the bag of visual words model; and determining, as the plurality of visual words of each training image, the visual words in the bag of visual words model that are nearest to each of the plurality of visual feature descriptors.
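The final step above, finding the nearest visual word for each descriptor, can be sketched as follows. The sketch assumes real-valued descriptors compared by Euclidean distance (binary descriptors such as ORB would normally use Hamming distance instead); the function name `quantize` and the toy vocabulary are hypothetical.

```python
import numpy as np


def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Illustrative data only: a two-word vocabulary and three descriptors.
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 1.0], [9.0, 8.0], [0.0, 2.0]])
image_words = quantize(descriptors, vocabulary)
print(image_words.tolist())  # [0, 1, 0]
```

The resulting indices are the visual words of the image, one per visual feature descriptor.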

In a third aspect, the present application provides an image retrieval apparatus for performing the method of any of the above first aspect or any of the possible implementations of the first aspect.

In a fourth aspect, the present application provides an image processing apparatus for performing the method of any of the above-described second aspect or any possible implementation of the second aspect.

In a fifth aspect, the present application provides an image retrieval apparatus, the apparatus comprising: a memory, a processor, a communication interface, and a computer program stored on the memory and executable on the processor, wherein the processor The method of any of the above-described first aspects or any of the possible implementations of the first aspect is performed when the computer program is executed.

In a sixth aspect, the present application provides an image processing apparatus including: a memory, a processor, a communication interface, and a computer program stored on the memory and operable on the processor, wherein the processor The method of any of the above-described second aspect or any of the possible implementations of the second aspect is performed when the computer program is executed.

In a seventh aspect, the application provides a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of the first aspect or any of the possible implementations of the first aspect.

In an eighth aspect, the present application provides a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of any of the second aspect or any of the possible implementations of the second aspect.

In a ninth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any of the possible implementations of the first aspect.

In a tenth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of the above-described second aspect or any of the possible implementations of the second aspect.

In an eleventh aspect, the present application provides a chip, including: an input interface, an output interface, at least one processor, and a memory, wherein the input interface, the output interface, the processor, and the memory communicate with each other through an internal connection path, the processor is configured to execute code in the memory, and when the code is executed, the processor is configured to perform the method of the first aspect or any of the possible implementations of the first aspect.

In a twelfth aspect, the present application provides a chip, including: an input interface, an output interface, at least one processor, and a memory, wherein the input interface, the output interface, the processor, and the memory communicate with each other through an internal connection path, the processor is configured to execute code in the memory, and when the code is executed, the processor is configured to perform the method of the second aspect or any of the possible implementations of the second aspect.

DRAWINGS

FIG. 1 is a schematic flowchart of an image retrieval method according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for generating an image library according to an embodiment of the present application;

FIG. 3 is a schematic block diagram of an image retrieval apparatus according to an embodiment of the present application;

FIG. 4 is a schematic block diagram of an apparatus for generating an image library according to an embodiment of the present application;

FIG. 5 is a schematic block diagram of another image retrieval apparatus according to an embodiment of the present application;

FIG. 6 is a schematic block diagram of another image library generating apparatus according to an embodiment of the present application.

Detailed description

The technical solutions in the present application will be described below with reference to the accompanying drawings.

For the sake of clarity, the terms used in this application are first explained.

1. Visual feature points of an image

The visual feature points of an image are pixels that remain consistent under transformations such as scaling, rotation, translation, and change of viewing angle, that is, the most easily recognized pixels in the image, such as corner points or texture-rich edge points. The quality of the visual feature points of an image directly affects the efficiency and accuracy of image retrieval.

For example, taking ORB as an example, the main steps of extracting FAST corner points of an image include: calculating the difference in brightness between each pixel in the image and its neighboring pixels, where a pixel that differs greatly from the pixels in its neighborhood is more likely to be a corner point; then, by non-maximum suppression, retaining only the corner points with the maximum response within a certain area, which avoids the problem of corner points being concentrated; and, because FAST corner points lack orientation and scale information, adding a description of scale and rotation. Scale invariance is achieved by constructing an image pyramid, that is, downsampling the image at different levels to obtain images of different resolutions. Rotation invariance is achieved by the gray-scale centroid method, that is, the direction vector from the geometric center of an image block to the centroid of its gray values is used as the description of the feature point direction.
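The gray-scale centroid method can be illustrated with a minimal sketch. The function name patch_orientation and the representation of an image block as a list of rows of gray values are assumptions made for this illustration only; they are not part of the application.

```python
import math

def patch_orientation(patch):
    """Return the orientation (in radians) of a square gray-scale patch.

    Moments are taken about the geometric center of the patch, so the
    angle points from the center toward the gray-value centroid.
    """
    size = len(patch)
    center = (size - 1) / 2.0
    m10 = 0.0  # first-order moment along x (columns)
    m01 = 0.0  # first-order moment along y (rows)
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            m10 += (x - center) * value
            m01 += (y - center) * value
    return math.atan2(m01, m10)

# Hypothetical 3x3 patch that is brighter on its right-hand side.
right_bright = [
    [0, 0, 10],
    [0, 0, 10],
    [0, 0, 10],
]
# Hypothetical 3x3 patch that is brighter along its bottom row.
bottom_bright = [
    [0, 0, 0],
    [0, 0, 0],
    [10, 10, 10],
]
angle_right = patch_orientation(right_bright)
angle_bottom = patch_orientation(bottom_bright)
```

In this sketch the patch that is brighter on the right has its centroid displaced along the positive x axis, and the patch that is brighter at the bottom has its centroid displaced along the positive y axis (rows increase downward).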

2. Visual feature descriptor of an image

The visual feature descriptor of an image is a mathematical representation that describes a visual feature point of the image.

For example, taking ORB as an example, the main steps of acquiring a visual feature descriptor of an image include: randomly selecting a plurality of pixel pairs in the vicinity of a visual feature point of the image, and comparing the magnitudes of the two pixels in each pixel pair to obtain a code of 0 or 1; and using the direction information of the visual feature point to rotate the pixel pairs, so as to obtain a robust binary-vector visual feature descriptor.

3. Visual word bag model

The visual word bag model includes a plurality of visual words, each of the plurality of visual words being a cluster center obtained by clustering visual feature descriptors extracted from the plurality of images.

4. Visual words of an image

A visual word of an image is the visual word in the visual word bag model that is closest to a visual feature descriptor of the image, and is obtained by matching and mapping the visual feature descriptor of the image to the visual words in the visual word bag model.
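The matching and mapping step can be sketched as a nearest-neighbor lookup. The sketch below assumes binary descriptors stored as Python integers and compared by Hamming distance, which suits ORB-style descriptors; the application itself only requires "the closest distance", and the names hamming_distance and quantize are hypothetical.

```python
def hamming_distance(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word."""
    words = []
    for descriptor in descriptors:
        nearest = min(
            range(len(vocabulary)),
            key=lambda i: hamming_distance(descriptor, vocabulary[i]),
        )
        words.append(nearest)
    return words

# Hypothetical 8-bit vocabulary (cluster centers) and image descriptors.
vocabulary = [0b00000000, 0b11110000, 0b11111111]
descriptors = [0b00000001, 0b11100000, 0b01111111]
image_words = quantize(descriptors, vocabulary)
```

Each descriptor yields exactly one visual word, so the image is represented by as many visual words as it has visual feature points.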

5. Image category of an image

By classifying a plurality of images according to different classification methods, an image category of each image can be obtained.

As an optional embodiment, if the images are classified according to the scene, the image categories of the images may include forest scenes, suburban scenes, indoor scenes, and the like.

6. Visual stop words corresponding to an image category

Different visual words appearing in the same image may have different effects on recognizing the image, and the same visual word appearing in different images may have the same effect on recognizing those images. The visual stop word corresponding to an image category therefore refers to a visual word that has no significant effect on recognizing images of that image category, or that interferes with image recognition, that is, a visual word that is irrelevant to images of that image category.

For example, an image captured on a rainy day contains traces of falling rain. The feature points extracted from the rain in the image pollute the visual word representation of the image. Therefore, the visual words corresponding to the rain can be visual stop words for the rainy-day image category.

7. Target visual words of an image

The target visual words of an image include the visual words that remain after the visual stop words corresponding to the image category of the image are removed from the plurality of visual words of the image.

8. Positive sample image set

A positive sample image set includes a plurality of manually labeled images that are considered similar to one another.

The applicable scenarios of the embodiments of the present application include loop closure detection in simultaneous localization and mapping (SLAM), product image retrieval in e-commerce, and the like.

Loop closure detection detects scenes that have appeared before: the current image is used for retrieval to recognize a loop, a constraint between the current pose and a historical pose is constructed, and the overall error is reduced by optimization to obtain a globally consistent map.

In e-commerce, when the name of a product is not known, the user submits an image of the product, and the system searches according to the image of the product and returns images with a higher similarity as the retrieval result.

FIG. 1 is a schematic flowchart of an image retrieval method 100 provided by an embodiment of the present application. The method can be performed by an image retrieval device.

S110. Acquire a plurality of visual words of the image to be retrieved and image category information of the image to be retrieved, where the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to the visual words in the visual word bag model, and the image category information of the image to be retrieved is used to indicate the image category of the image to be retrieved.

S120. Determine, according to the image category information of the image to be retrieved and the stop word vocabulary, the visual stop words corresponding to the image category of the image to be retrieved, where the visual stop words corresponding to the image category of the image to be retrieved include visual words that are irrelevant to the image category of the image to be retrieved, and the stop word vocabulary includes a mapping relationship between the image category of the image to be retrieved and the visual stop words corresponding to that image category.

S130. Remove the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved, to obtain the target visual words of the image to be retrieved.

S140. Determine a retrieval result according to the target visual words of the image to be retrieved and a retrieval image library, where the retrieval image library includes a plurality of retrieval images.
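Steps S120 and S130 amount to a lookup followed by a filter. The following is a minimal sketch, assuming that visual words are represented by integer indices and that the stop word vocabulary is a dictionary from an image category label to a set of visual words; the function name target_visual_words and the sample data are hypothetical.

```python
def target_visual_words(visual_words, category, stop_word_vocabulary):
    """Remove the visual stop words of the given category (S120 and S130)."""
    # S120: look up the visual stop words for the image category.
    stop_words = stop_word_vocabulary.get(category, set())
    # S130: keep only the visual words that are not stop words.
    return [word for word in visual_words if word not in stop_words]

# Hypothetical stop word vocabulary and visual words of a query image.
stop_word_vocabulary = {"rainy": {3, 7}, "indoor": {1}}
query_words = [1, 3, 5, 7, 5]
query_targets = target_visual_words(query_words, "rainy", stop_word_vocabulary)
```

A category that has no entry in the vocabulary is treated here as having no visual stop words, which is a choice made for the sketch rather than something the application specifies.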

Optionally, in S110, the image retrieval device may acquire a plurality of visual words of the image to be retrieved in a plurality of manners, which is not limited by the embodiment of the present application.

As an optional embodiment, the image retrieval device may acquire the image to be retrieved and extract a plurality of visual feature descriptors of the image to be retrieved, where the plurality of visual feature descriptors are used to describe a plurality of visual feature points of the image to be retrieved, and the plurality of visual feature descriptors are in one-to-one correspondence with the plurality of visual feature points; acquire the visual word bag model; and determine, as the plurality of visual words of the image to be retrieved, the visual words in the visual word bag model that are closest in distance to each of the plurality of visual feature descriptors.

Optionally, the visual word bag model may be an existing trained visual word bag model, or may be obtained by the image retrieval device by clustering the visual feature descriptors of the training images in the training image library, which is not limited by the embodiment of the present application.

Optionally, the image retrieval device may obtain the image to be retrieved in a plurality of manners, for example, by camera shooting, local disk reading, network downloading, or other manners, which is not limited by the embodiment of the present application.

Optionally, the image to be retrieved obtained by the image retrieving device may be an image after de-distortion, denoising, or other pre-processing operations, which is not limited in this embodiment of the present application.

Optionally, in S110, the image retrieving device may obtain the image category information of the image to be retrieved in a plurality of manners, which is not limited in this embodiment of the present application.

As an optional embodiment, the image retrieval device may determine the image category information of the image to be retrieved according to the image to be retrieved and an image classification model, where the image classification model includes a mapping relationship between the image to be retrieved and the image category of the image to be retrieved.

As another optional embodiment, the image retrieval device may determine image category information of the image to be retrieved according to the image to be retrieved and a preset classification algorithm.

As still another optional embodiment, the image retrieval device may acquire image category information of the image to be retrieved manually labeled.

Optionally, the image category information of the image to be retrieved may be one or more bits, that is, the image type of the image to be retrieved is indicated by the one or more bits, which is not limited in this embodiment of the present application.

As an optional embodiment, the image category information of the image to be retrieved may be 2 bits. For example, when the 2 bits are "00", the image to be retrieved is indicated as a first type of image; when the 2 bits are "01", the image to be retrieved is indicated as a second type of image; when the 2 bits are "10", the image to be retrieved is indicated as a third type of image; and when the 2 bits are "11", the image to be retrieved is indicated as a fourth type of image.

Optionally, the image retrieval device may acquire the stop word vocabulary before S120.

Optionally, the stop word vocabulary may include a mapping relationship between an identifier of each of a plurality of image categories and the visual stop words corresponding to the identifier of each image category.

Optionally, the image retrieving device may generate the stop word vocabulary by itself, or may acquire the stop word vocabulary from the image library generating device, which is not limited by the embodiment of the present application.

Optionally, the image retrieval device may acquire a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of each training image to the visual words in the visual word bag model, the image category information of each training image is used to indicate the image category of each training image, the positive sample image set information is used to indicate at least one positive sample image set, and a positive sample image set includes a plurality of manually labeled similar training images in the training image library; and generate a stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, where the stop word vocabulary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to the image category of each training image, and the visual stop words corresponding to the image category of each training image include visual words that are irrelevant to the image category of each training image.

Specifically, the image retrieval device generating the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information may be: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.

As an optional embodiment, the image retrieval device may determine the correlation between a first image category and a first visual word according to the probability that a first event occurs, the probability that a second event occurs, and the probability that the first event and the second event occur simultaneously. The plurality of image categories of the training image library include the first image category, and the plurality of visual words of the training image library include the first visual word. The first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word, and that the image category of the first training image is the first image category. The second event indicates that the first training image and the second training image belong to the same positive sample image set.
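As a rough illustration of how the three probabilities might be estimated and combined, the sketch below counts the first event and the second event over all ordered pairs of training images and combines the resulting probabilities with pointwise mutual information. The use of pointwise mutual information is an assumption made here for illustration; the formulas actually used by the application may differ. The dictionary keys, function names, and sample data are likewise hypothetical.

```python
import math
from itertools import permutations

def event_probabilities(images, word, category):
    """Estimate P(first event), P(second event), and P(both) over ordered pairs."""
    pairs = list(permutations(images, 2))
    count_first = count_second = count_both = 0
    for first, second in pairs:
        # First event: both images contain the word, and the first image
        # belongs to the given image category.
        first_event = (word in first["words"] and word in second["words"]
                       and first["category"] == category)
        # Second event: both images belong to the same positive sample image set.
        second_event = (first["positive_set"] is not None
                        and first["positive_set"] == second["positive_set"])
        count_first += first_event
        count_second += second_event
        count_both += first_event and second_event
    total = len(pairs)
    return count_first / total, count_second / total, count_both / total

def correlation(p_first, p_second, p_both):
    """Pointwise mutual information (an assumed measure, not the patent's formula)."""
    if p_first == 0 or p_second == 0 or p_both == 0:
        # Undefined or log(0): treated as least correlated in this sketch.
        return float("-inf")
    return math.log(p_both / (p_first * p_second))

# Hypothetical training images: word 2 appears in every image.
images = [
    {"words": {1, 2}, "category": "rainy", "positive_set": 0},
    {"words": {1, 2}, "category": "rainy", "positive_set": 0},
    {"words": {2, 3}, "category": "rainy", "positive_set": 1},
    {"words": {2, 3}, "category": "rainy", "positive_set": 1},
]
corr_word_1 = correlation(*event_probabilities(images, 1, "rainy"))
corr_word_2 = correlation(*event_probabilities(images, 2, "rainy"))
```

In this toy data, word 2 appears in every image and so says nothing about whether two images are similar, while word 1 appears only in images of the same positive sample image set. Under the assumed measure, word 2 therefore receives the lower correlation and is the better stop word candidate.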

Optionally, the image retrieval device may obtain the positive sample image set information in a plurality of manners, which is not limited by the embodiment of the present application.

As an optional embodiment, the image retrieval device may acquire one or more bits carried in each training image, and acquire the positive sample image set information according to one or more bits of each training image. For example, if the first training image and the second training image of the plurality of training images carry the same bit, it is determined that the first training image and the second training image belong to the same positive sample image set.

As another optional embodiment, the image retrieval device may acquire first information, where the first information includes a mapping relationship between an identifier of each positive sample image set of a plurality of positive sample image sets and identifiers of the training images included in each positive sample image set, and the image retrieval device may acquire the positive sample image set information according to the first information.

Optionally, the image retrieving device may determine the visual stop words corresponding to each of the image categories from the plurality of visual words of the training image library in a plurality of manners, which is not limited by the embodiment of the present application.

As an optional embodiment, the image retrieval device may use, as the visual stop words corresponding to each image category, at least one visual word of the plurality of visual words of the training image library that has the least correlation with that image category.

As another optional embodiment, the image retrieval device may use, as the visual stop words corresponding to each image category, at least one visual word of the plurality of visual words of the training image library whose correlation with that image category is less than a first preset threshold.
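The two selection strategies can be sketched as follows, assuming that the correlation values have already been computed and stored as a dictionary from visual word to correlation for each image category. The function names, the correlation values, and the threshold of 0.5 are hypothetical.

```python
def least_correlated(correlations, k=1):
    """Return the k visual words with the smallest correlation."""
    ranked = sorted(correlations, key=correlations.get)
    return set(ranked[:k])

def below_threshold(correlations, threshold):
    """Return the visual words whose correlation is below the threshold."""
    return {word for word, value in correlations.items() if value < threshold}

# Hypothetical correlations between each image category and each visual word.
correlation_by_category = {
    "rainy": {1: 1.1, 2: 0.0, 3: 0.9},
    "indoor": {1: 0.2, 2: 0.8, 3: 0.1},
}

# Build a stop word vocabulary with the threshold strategy.
stop_word_vocabulary = {
    category: below_threshold(correlations, 0.5)
    for category, correlations in correlation_by_category.items()
}
```

The first strategy fixes the number of stop words per category, whereas the threshold strategy lets that number vary with how many visual words are weakly correlated with the category.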

Optionally, the visual stop words corresponding to each image category may be one or more visual words, which are not limited in this embodiment of the present application.

Optionally, the image retrieval device may acquire the retrieval image library before S140.

Optionally, the retrieval image library includes a plurality of retrieval images and the target visual words corresponding to each of the plurality of retrieval images, and the target visual words corresponding to each retrieval image are obtained by removing, from the plurality of visual words corresponding to that retrieval image, the visual stop words corresponding to the image category of that retrieval image.

As an optional embodiment, the image retrieval device may calculate a similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each retrieval image in the retrieval image library, and determine at least one retrieval image having the highest similarity to the image to be retrieved as the retrieval result.
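The application does not fix a particular similarity measure. As one possible illustration, the sketch below compares visual word histograms by cosine similarity and returns the best-matching retrieval images. The choice of cosine similarity, the function names, and the sample library are assumptions; in line with S140, the inputs are taken to be target visual words.

```python
import math
from collections import Counter

def cosine_similarity(words_a, words_b):
    """Cosine similarity between two visual word histograms."""
    hist_a, hist_b = Counter(words_a), Counter(words_b)
    dot = sum(hist_a[word] * hist_b[word] for word in hist_a)
    norm_a = math.sqrt(sum(count * count for count in hist_a.values()))
    norm_b = math.sqrt(sum(count * count for count in hist_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_words, library, top_k=1):
    """Return the identifiers of the top_k most similar retrieval images."""
    ranked = sorted(
        library,
        key=lambda image_id: cosine_similarity(query_words, library[image_id]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical retrieval image library: image identifier -> target visual words.
library = {"img_a": [1, 5, 5], "img_b": [2, 4], "img_c": [1, 5, 9]}
result = retrieve([1, 5, 5], library)
top_two = retrieve([1, 5, 5], library, top_k=2)
```

Because the visual stop words have already been removed from both the query and the library entries, the histograms are shorter and the comparison is not diluted by visual words that carry no information about the image category.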

In the image retrieval method provided by the embodiment of the present application, the target visual words of the image to be retrieved are obtained by removing the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved. That is, visual words that have no significant effect on recognizing the image to be retrieved, or that interfere with image recognition, are removed, so that the target visual words of the image to be retrieved are comparatively significant for recognizing the image to be retrieved. Therefore, retrieving by means of the target visual words of the image to be retrieved and the retrieval image library is beneficial to improving the efficiency and accuracy of image retrieval.

In addition, the target visual words of the retrieval images stored in the retrieval image library are obtained by removing, from the plurality of visual words of each retrieval image, the visual stop words corresponding to the image category of that retrieval image, which helps reduce the memory usage of the retrieval image library.

FIG. 2 is a schematic flowchart of a method 200 for generating an image library according to an embodiment of the present disclosure. The method 200 may be performed by a device for generating an image library, which is not limited by the embodiment of the present application.

Optionally, the image library generating device and the image retrieval device described in FIG. 1 may be the same device or different devices, which is not limited in this embodiment of the present application.

Optionally, when the image library generating device and the image retrieval device described in FIG. 1 are the same device, the two may be different functional modules in the same device, or the image library generating device may be a functional module in the image retrieval device.

S210. Acquire a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of each training image to the visual words in the visual word bag model, the image category information of each training image is used to indicate the image category of each training image, the positive sample image set information is used to indicate at least one positive sample image set, and a positive sample image set includes a plurality of manually labeled similar training images in the training image library.

S220. Generate a stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, where the stop word vocabulary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to the image category of each training image, and the visual stop words corresponding to the image category of each training image include visual words that are irrelevant to the image category of each training image.

The method for generating an image library provided by the embodiment of the present application acquires a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, and generates a stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information. The stop word vocabulary is used to obtain the target visual words of the image to be retrieved, which is beneficial to improving the efficiency and accuracy of image retrieval.

Optionally, in S210, the generating device of the image library may acquire the plurality of visual words of the training image in a plurality of manners, which is not limited by the embodiment of the present application.

As an optional embodiment, the generating device of the image library may acquire a training image and extract a plurality of visual feature descriptors of the training image, where the plurality of visual feature descriptors are used to describe a plurality of visual feature points of the training image, and the plurality of visual feature descriptors are in one-to-one correspondence with the plurality of visual feature points; acquire a visual word bag model; and determine, as the plurality of visual words of the training image, the visual words in the visual word bag model that are closest in distance to each of the plurality of visual feature descriptors.

Optionally, the visual word bag model may be an existing trained visual word bag model, or may be obtained by the generating device of the image library itself by clustering a plurality of visual feature descriptors corresponding to a plurality of images, which is not limited by the embodiment of the present application.

Optionally, the generating device of the image library may obtain the training image in a plurality of manners, for example, by camera shooting, local disk reading, network downloading, or other manners, which is not limited by the embodiment of the present application.

Optionally, the multiple images obtained by the generating device of the image library may be images after de-distortion, de-noising, or other pre-processing operations, which are not limited in this embodiment of the present application.

Optionally, in S210, the generating device of the image library may obtain the image category information of the training image in a plurality of manners, which is not limited by the embodiment of the present application.

As an optional embodiment, the image library generating device may determine the image category information of the training image according to the training image and an image classification model, where the image classification model includes a mapping relationship between the training image and the image category of the training image.

As another optional embodiment, the image library generating device may determine image category information of the training image according to the training image and a preset classification algorithm.

As still another optional embodiment, the image library generating device may acquire image category information of the training image manually labeled.

Optionally, the image category information of the training image may be one or more bits, that is, the image type of the training image is indicated by the one or more bits, which is not limited in this embodiment of the present application.

As an optional embodiment, the image category information of the training image may be 2 bits. For example, when the 2 bits are "00", the training image is indicated as a first type of image; when the 2 bits are "01", the training image is indicated as a second type of image; when the 2 bits are "10", the training image is indicated as a third type of image; and when the 2 bits are "11", the training image is indicated as a fourth type of image.

Optionally, in S220, the image library generating device generating the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information may be: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.

As an optional embodiment, the generating device of the image library may determine the correlation between a first image category and a first visual word according to the probability that a first event occurs, the probability that a second event occurs, and the probability that the first event and the second event occur simultaneously. The plurality of image categories of the training image library include the first image category, and the plurality of visual words of the training image library include the first visual word. The first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word, and that the image category of the first training image is the first image category. The second event indicates that the first training image and the second training image belong to the same positive sample image set.

For example, assuming that the training image library contains P training images, M image categories, and L visual words, and that N of the P training images belong to the first image category, the correlation between the first visual word and the first image category can be determined by the above formulas (1) to (6).

Optionally, the generating device of the image library can obtain the positive sample image set information in a plurality of manners, which is not limited by the embodiment of the present application.

As an optional embodiment, the generating device of the image library may acquire one or more bits carried in each training image, and acquire the positive sample image set information according to one or more bits of each training image. For example, if the first training image and the second training image of the plurality of training images carry the same bit, it is determined that the first training image and the second training image belong to the same positive sample image set.

As another optional embodiment, the generating device of the image library may acquire first information, where the first information includes a mapping relationship between an identifier of each positive sample image set of a plurality of positive sample image sets and identifiers of the training images included in each positive sample image set, and the generating device of the image library may acquire the positive sample image set information according to the first information.

Optionally, the generating device of the image library may determine the visual stop words corresponding to each of the image categories from the plurality of visual words of the training image library in a plurality of manners, which is not limited by the embodiment of the present application.

As an optional embodiment, the image library generating device may use, as the visual stop words corresponding to each image category, at least one visual word of the plurality of visual words of the training image library that has the least correlation with that image category.

As another optional embodiment, the image library generating device may use, as the visual stop words corresponding to each image category, at least one visual word of the plurality of visual words of the training image library whose correlation with that image category is less than a first preset threshold.

Optionally, after S220, the generating device of the image library may determine, according to the image category information of each training image and the stop word vocabulary, the visual stop words corresponding to the image category of each training image; remove the visual stop words corresponding to the image category of each training image from the plurality of visual words of each training image, to obtain the target visual words of each training image; and add the target visual words of each training image to the retrieval image library.
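The library construction step can be sketched as a single pass over the training images. The sketch below assumes that each training image is given as a pair of visual words and an image category label, and that the retrieval image library is a dictionary keyed by image identifier; these representations and the function name build_retrieval_library are hypothetical.

```python
def build_retrieval_library(training_images, stop_word_vocabulary):
    """Store, for each training image, only its target visual words."""
    library = {}
    for image_id, (words, category) in training_images.items():
        # Look up the visual stop words for this image's category.
        stop_words = stop_word_vocabulary.get(category, set())
        # Keep the remaining visual words as the target visual words.
        library[image_id] = [word for word in words if word not in stop_words]
    return library

# Hypothetical training images: identifier -> (visual words, image category).
training_images = {
    "t0": ([1, 2, 2, 3], "rainy"),
    "t1": ([1, 3, 4], "indoor"),
}
stop_word_vocabulary = {"rainy": {2}, "indoor": {1, 3}}
retrieval_library = build_retrieval_library(training_images, stop_word_vocabulary)

# Number of visual words stored before and after stop word removal.
words_before = sum(len(words) for words, _ in training_images.values())
words_after = sum(len(words) for words in retrieval_library.values())
```

In this toy example the library stores three visual words instead of seven, which illustrates how removing visual stop words can reduce the storage required for the retrieval image library.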

The image retrieval method and the method for generating an image library provided by the embodiments of the present application are described in detail above with reference to FIG. 1 and FIG. 2. The image retrieval apparatus and the apparatus for generating an image library provided by the embodiments of the present application will be described below with reference to FIG. 3 to FIG. 6.

FIG. 3 is a schematic block diagram of an image retrieval apparatus 300 provided by an embodiment of the present application. The device 300 includes:

The acquiring unit 310 is configured to acquire a plurality of visual words of the image to be retrieved and image category information of the image to be retrieved, where the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to visual words in the visual word bag model, and the image category information of the image to be retrieved is used to indicate an image category of the image to be retrieved;

The processing unit 320 is configured to: determine, according to the image category information of the image to be retrieved acquired by the acquiring unit 310 and the stop word vocabulary, the visual stop words corresponding to the image category of the image to be retrieved, where the visual stop words corresponding to the image category of the image to be retrieved include visual words irrelevant to the image category of the image to be retrieved, and the stop word vocabulary includes a mapping relationship between the image category of the image to be retrieved and the visual stop words corresponding to that image category; and remove the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved acquired by the acquiring unit 310, to obtain target visual words of the image to be retrieved;

The searching unit 330 is configured to determine a search result according to the search image library and the target visual words of the image to be retrieved obtained by the processing unit 320, where the search image library includes a plurality of search images.

Optionally, the search image library includes a mapping relationship between the plurality of search images and the target visual words corresponding to each of the plurality of search images, where the target visual words corresponding to each search image are obtained by removing the visual stop words corresponding to the image category of that search image from the plurality of visual words corresponding to that search image.

Optionally, the device further includes a generating unit. The acquiring unit is further configured to: before the visual stop words corresponding to the image category of the image to be retrieved are determined according to the image category information of the image to be retrieved and the stop word vocabulary, acquire a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to visual words in the visual word bag model, the image category information of each training image is used to indicate an image category of that training image, the positive sample image set information is used to indicate at least one positive sample image set, and each positive sample image set includes a plurality of manually labeled similar training images in the training image library. The generating unit is configured to generate the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.

Optionally, the generating unit is specifically configured to: determine, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, correlations between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generate the stop word vocabulary according to the correlations between the plurality of image categories of the training image library and the plurality of visual words of the training image library.

Optionally, the generating unit is specifically configured to determine a correlation between a first image category and a first visual word according to a probability that a first event occurs, a probability that a second event occurs, and a probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word, the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
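The paragraph above names the three probabilities but, in this passage, not the formula that combines them. The following sketch is therefore only one possible reading: it estimates the probabilities by counting over ordered pairs of distinct training images whose first image has the first image category, and it uses pointwise mutual information as a stand-in correlation measure. The sample space, the choice of pointwise mutual information, the assumption that each image belongs to at most one positive sample image set, and the names `estimate_event_probabilities` and `correlation` are all assumptions made for this example.

```python
import math
from itertools import permutations


def estimate_event_probabilities(images, positive_sets, category, word):
    """Estimate P(first event), P(second event), and P(both) by counting pairs.

    images:        dict mapping image id -> (category, set of visual word ids).
    positive_sets: list of sets of image ids (each image in at most one set).
    Only ordered pairs whose first image has the given category are counted.
    """
    set_of = {}  # image id -> index of its positive sample image set
    for idx, members in enumerate(positive_sets):
        for image_id in members:
            set_of[image_id] = idx

    pairs = n_a = n_b = n_ab = 0
    for first, second in permutations(images, 2):
        if images[first][0] != category:
            continue
        pairs += 1
        # First event: both images contain the visual word.
        a = word in images[first][1] and word in images[second][1]
        # Second event: both images lie in the same positive sample image set.
        b = first in set_of and set_of.get(first) == set_of.get(second)
        n_a += a
        n_b += b
        n_ab += a and b
    if pairs == 0:
        return 0.0, 0.0, 0.0
    return n_a / pairs, n_b / pairs, n_ab / pairs


def correlation(p_a, p_b, p_ab):
    """Pointwise mutual information of the two events (assumed measure)."""
    if p_a == 0 or p_b == 0 or p_ab == 0:
        return float("-inf")
    return math.log(p_ab / (p_a * p_b))
```

Under this reading, a visual word that tends to co-occur in images of the same positive sample image set receives a high score for the category, while a word that co-occurs mostly across unrelated images receives a low score and becomes a candidate visual stop word.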

Optionally, the searching unit is specifically configured to: determine a similarity between the target visual words of the image to be retrieved and the target visual words of each search image in the search image library; and determine, as the search result, at least one search image whose similarity with the target visual words of the image to be retrieved is greater than a first preset value.
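A minimal sketch of this thresholded search follows. The passage does not fix a similarity measure, so Jaccard similarity over sets of target visual words is used here purely as an assumption; the names `jaccard`, `retrieve`, and `min_similarity` are likewise hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of visual word ids."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query_words, search_library, min_similarity):
    """Return (image id, similarity) pairs whose similarity exceeds the preset value.

    search_library: dict mapping image id -> target visual words of that image.
    Results are sorted by descending similarity, then by image id.
    """
    hits = []
    for image_id, words in search_library.items():
        score = jaccard(query_words, words)
        if score > min_similarity:
            hits.append((image_id, score))
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits
```

Because both the query and the library entries have already had their visual stop words removed, the comparison only involves words considered relevant to the respective image categories.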

It should be understood that the image retrieval device 300 herein is embodied in the form of functional units. The term "unit" as used herein may refer to an application specific integrated circuit (ASIC), an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group processor) for executing one or more software or firmware programs, a memory, a combinational logic circuit, and/or other suitable components that support the described functionality. In an optional example, those skilled in the art will understand that the image retrieval device 300 may specifically be the image retrieval device in the foregoing method 100 and method 200 embodiments, and that the image retrieval device 300 may be used to execute the processes and/or steps corresponding to the image retrieval device in the foregoing method 100 and method 200. To avoid repetition, details are not described here again.

FIG. 4 is a schematic block diagram of an image library generating apparatus 400 provided by an embodiment of the present application. The apparatus 400 includes:

The acquiring unit 410 is configured to acquire a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to visual words in the visual word bag model, the image category information of each training image is used to indicate an image category of that training image, the positive sample image set information is used to indicate at least one positive sample image set, and each positive sample image set includes a plurality of manually labeled similar training images in the training image library;

The generating unit 420 is configured to generate a stop word vocabulary according to the plurality of visual words of each training image acquired by the acquiring unit 410, the image category information of each training image, and the positive sample image set information, where the stop word vocabulary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to the image category of each training image include visual words irrelevant to the image category of that training image.

Optionally, the generating unit is configured to: determine, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, correlations between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generate the stop word vocabulary according to the correlations between the plurality of image categories of the training image library and the plurality of visual words of the training image library.

Optionally, the generating unit is specifically configured to determine a correlation between a first image category and a first visual word according to a probability that a first event occurs, a probability that a second event occurs, and a probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word, the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.

Optionally, the generating unit is further configured to: after generating the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, determine, according to the image category information of each training image and the stop word vocabulary, the visual stop words corresponding to the image category of each training image; remove the visual stop words corresponding to the image category of each training image from the plurality of visual words of that training image to obtain the target visual words of each training image; and add the target visual words of each training image to the search image library.

Optionally, the acquiring unit is configured to: acquire each training image; extract a plurality of visual feature descriptors of each training image, where the plurality of visual feature descriptors are used to describe a plurality of visual feature points of that training image and are in one-to-one correspondence with the plurality of visual feature points; acquire the visual word bag model; and determine, as the plurality of visual words of each training image, the visual words in the visual word bag model that are closest to each of the plurality of visual feature descriptors.
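The mapping from descriptors to their closest visual words can be sketched as a nearest-center lookup. The example assumes that each visual word is a cluster center stored as a numeric vector and that closeness is measured by squared Euclidean distance; the distance measure and the names `nearest_word` and `quantize` are assumptions, and descriptor extraction itself is outside the sketch.

```python
def nearest_word(descriptor, codebook):
    """Return the index of the cluster center closest to the descriptor.

    Closeness is squared Euclidean distance (an assumption for this sketch).
    """
    best_index, best_dist = -1, float("inf")
    for index, center in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize(descriptors, codebook):
    """Map every descriptor of an image to the id of its closest visual word."""
    return [nearest_word(d, codebook) for d in descriptors]
```

Since the descriptors are in one-to-one correspondence with the feature points, the returned list contains one visual word identifier per feature point of the image.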

It should be understood that the image library generating apparatus 400 herein is embodied in the form of functional units. The term "unit" here may refer to an ASIC, an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group processor) for executing one or more software or firmware programs, a memory, a combinational logic circuit, and/or other suitable components that support the described functionality. In an optional example, those skilled in the art will understand that the image library generating apparatus 400 may specifically be the image library generating apparatus in the foregoing method 100 and method 200 embodiments, and that the image library generating apparatus 400 may be configured to execute the processes and/or steps corresponding to the image library generating apparatus in the foregoing method 100 and method 200. To avoid repetition, details are not described here again.

FIG. 5 is a schematic block diagram of an image retrieval device 500 provided by an embodiment of the present application. The image retrieval device 500 may be the image retrieval device described in FIG. 1 and FIG. 2, and the image retrieval device may adopt the hardware architecture shown in FIG. 5. The image retrieval device may include a processor 510, a communication interface 520, and a memory 530 that communicate with one another via internal connection paths. The related functions implemented by the processing unit 320 and the searching unit 330 in FIG. 3 may be implemented by the processor 510, and the related functions implemented by the acquiring unit 310 may be implemented by the processor 510 controlling the communication interface 520.

The processor 510 may include one or more processors, for example, one or more central processing units (CPUs). In the case where the processor is a CPU, the CPU may be a single-core CPU or a multi-core CPU.

The communication interface 520 is for transmitting and/or receiving data. The communication interface may include a transmission interface for transmitting data and a receiving interface for receiving data.

The memory 530 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and a compact disc read-only memory (CD-ROM), and is used for storing related instructions and data.

The memory 530 is used to store program code and data of the image retrieval device, and may be a separate device or integrated in the processor 510.

Specifically, the processor 510 is configured to control the communication interface to perform data transmission with other devices, such as a generating device of the image library. For details, refer to the description in the method embodiment, and details are not described herein again.

It will be appreciated that FIG. 5 only shows a simplified design of the image retrieval device. In an actual application, the image retrieval device may further include other necessary components, including but not limited to any number of communication interfaces, processors, controllers, memories, etc., and all image retrieval devices that can implement the present application fall within the scope of protection of the present application.

In one possible design, the image retrieval device 500 can be replaced with a chip device, for example, a chip that can be used in an image retrieval device to implement the related functions of the processor 510 in the image retrieval device. The chip device can be a field programmable gate array, an application specific integrated chip, a system chip, a central processing unit, a network processor, a digital signal processing circuit, a microcontroller, a programmable controller, or another integrated chip for implementing the related functions. Optionally, the chip may include one or more memories for storing program code that, when executed, causes the processor to perform the corresponding functions.

FIG. 6 is a schematic block diagram of an image library generating apparatus 600 provided by an embodiment of the present application. The image library generating apparatus 600 may be the image library generating apparatus described in FIG. 1 and FIG. 2, and the image library generating apparatus may adopt the hardware architecture shown in FIG. 6. The image library generating apparatus may include a processor 610, a communication interface 620, and a memory 630, which communicate with each other through an internal connection path. The related functions implemented by the generating unit 420 in FIG. 4 may be implemented by the processor 610, and the related functions implemented by the acquiring unit 410 may be implemented by the processor 610 controlling the communication interface 620.

The processor 610 may include one or more processors, for example, one or more central processing units (CPUs). In the case where the processor is a CPU, the CPU may be a single-core CPU or a multi-core CPU.

The communication interface 620 is for transmitting and/or receiving data. The communication interface may include a transmission interface for transmitting data and a receiving interface for receiving data.

The memory 630 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and a compact disc read-only memory (CD-ROM), and is used for storing related instructions and data.

The memory 630 is used to store the program code and data of the generating means of the image library, and may be a separate device or integrated in the processor 610.

Specifically, the processor 610 is configured to control the communication interface to perform data transmission with other devices, such as an image retrieval device. For details, refer to the description in the method embodiment, and details are not described herein again.

It will be appreciated that FIG. 6 only shows a simplified design of the image library generating device. In practical applications, the image library generating device may further include other necessary components, including but not limited to any number of communication interfaces, processors, controllers, memories, etc., and all image library generating devices that can implement the present application fall within the scope of protection of the present application.

In one possible design, the image library generating device 600 may be replaced with a chip device, for example, a chip that can be used in an image library generating device to implement the related functions of the processor 610 in the image library generating device. The chip device can be a field programmable gate array, an application specific integrated chip, a system chip, a central processing unit, a network processor, a digital signal processing circuit, a microcontroller, a programmable controller, or another integrated chip for implementing the related functions. Optionally, the chip may include one or more memories for storing program code that, when executed, causes the processor to perform the corresponding functions.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.

A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division; in actual implementation, there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in an electrical, mechanical, or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

The foregoing is only a specific embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of the claims.

Claims

An image retrieval method, comprising:

Obtaining a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved, wherein the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to visual words in a visual word bag model, and the image category information of the image to be retrieved is used to indicate an image category of the image to be retrieved;

Determining, according to the image category information of the image to be retrieved and a stop word vocabulary, a visual stop word corresponding to the image category of the image to be retrieved, wherein the visual stop word corresponding to the image category of the image to be retrieved includes a visual word irrelevant to the image category of the image to be retrieved, and the stop word vocabulary includes a mapping relationship between the image category of the image to be retrieved and the visual stop word corresponding to the image category of the image to be retrieved;

Removing a visual stop word corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved, to obtain a target visual word of the image to be retrieved;

Determining a search result according to the target visual word of the image to be retrieved and a search image library, wherein the search image library includes a plurality of search images.
The method according to claim 1, wherein the search image library includes a mapping relationship between the plurality of search images and target visual words corresponding to each of the plurality of search images, and the target visual words corresponding to each search image are obtained by removing the visual stop words corresponding to the image category of that search image from the plurality of visual words corresponding to that search image.
The method according to claim 1 or 2, wherein before the visual stop word corresponding to the image category of the image to be retrieved is determined according to the image category information of the image to be retrieved and the stop word vocabulary, the method further includes:

Obtaining a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, wherein the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to visual words in the visual word bag model, the image category information of each training image is used to indicate an image category of that training image, the positive sample image set information is used to indicate at least one positive sample image set, and the positive sample image set includes a plurality of manually labeled similar training images in the training image library;

The stop word vocabulary is generated according to the plurality of visual words of each training image, image category information of each training image, and positive sample image set information.
The method according to claim 3, wherein the generating the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes:

Determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, wherein the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image;

Generating the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
The method according to claim 4, wherein the determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library includes:

Determining a correlation between a first image category and a first visual word according to a probability that a first event occurs, a probability that a second event occurs, and a probability that the first event and the second event occur simultaneously, wherein the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word, the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
The method according to any one of claims 2 to 5, wherein the determining a search result according to the target visual word of the image to be retrieved and a search image library includes:

Determining a similarity between the target visual word of the image to be retrieved and the target visual word of the search image in the search image library;

Determining, as the search result, at least one search image whose similarity with the target visual word of the image to be retrieved is greater than a first preset value.
A method for generating an image library, comprising:

Obtaining a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, wherein the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to visual words in a visual word bag model, the image category information of each training image is used to indicate an image category of that training image, the positive sample image set information is used to indicate at least one positive sample image set, and the positive sample image set includes a plurality of manually labeled similar training images in the training image library;

Generating a stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, wherein the stop word vocabulary includes a mapping relationship between the image category of each training image and a visual stop word corresponding to the image category of that training image, and the visual stop word corresponding to the image category of each training image includes a visual word irrelevant to the image category of that training image.
The method according to claim 7, wherein the generating a stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes:

Determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, wherein the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image;

The stop word vocabulary is generated according to a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library.
An image retrieval device, comprising:

An acquiring unit, configured to acquire a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved, wherein the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to visual words in a visual word bag model, and the image category information of the image to be retrieved is used to indicate an image category of the image to be retrieved;

a processing unit, configured to: determine, according to the image category information of the image to be retrieved acquired by the acquiring unit and a stop word vocabulary, a visual stop word corresponding to the image category of the image to be retrieved, wherein the visual stop word corresponding to the image category of the image to be retrieved includes a visual word irrelevant to the image category of the image to be retrieved, and the stop word vocabulary includes a mapping relationship between the image category of the image to be retrieved and the visual stop word corresponding to the image category of the image to be retrieved; and remove the visual stop word corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved acquired by the acquiring unit, to obtain a target visual word of the image to be retrieved;

a retrieval unit, configured to determine a retrieval result according to the target visual words of the image to be retrieved obtained by the processing unit and a retrieval image library, where the retrieval image library includes a plurality of retrieval images.
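The stop word removal performed by the processing unit can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `remove_visual_stop_words`, the use of integer IDs for visual words, string labels for image categories, and a dictionary for the stop word vocabulary are all assumptions made for the example.

```python
from typing import Dict, List, Set


def remove_visual_stop_words(
    visual_words: List[int],
    image_category: str,
    stop_word_vocabulary: Dict[str, Set[int]],
) -> List[int]:
    """Return the target visual words of an image.

    visual_words: visual word IDs of the image (hypothetical representation).
    image_category: category label of the image (hypothetical representation).
    stop_word_vocabulary: maps each category to its visual stop words.
    """
    # Look up the stop words for this category; an unknown category has none.
    stop_words = stop_word_vocabulary.get(image_category, set())
    # Keep only the visual words that are not stop words for this category.
    return [word for word in visual_words if word not in stop_words]


# Illustrative data only.
stop_word_vocabulary = {"indoor": {3, 7}, "outdoor": {1}}
target_words = remove_visual_stop_words([1, 3, 5, 7, 5], "indoor", stop_word_vocabulary)
```

Because the stop words are looked up per category, the same visual word may be kept for one image and removed for another.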
The device according to claim 9, wherein the retrieval image library includes a mapping relationship between the plurality of retrieval images and the target visual words corresponding to each of the plurality of retrieval images, the target visual words corresponding to each retrieval image being obtained by removing the visual stop words corresponding to the image category of that retrieval image from the plurality of visual words corresponding to that retrieval image.
The device according to claim 9 or 10, wherein the device further comprises a generating unit;

the acquiring unit is further configured to: before the visual stop words corresponding to the image category of the image to be retrieved are determined according to the image category information of the image to be retrieved and the stop word vocabulary, acquire a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by mapping a plurality of visual feature descriptors of that training image to the visual words in the visual word bag model, the image category information of each training image is used to indicate an image category of that training image, and the positive sample image set information is used to indicate at least one positive sample image set, each positive sample image set including a plurality of manually labeled similar training images in the training image library; and

the generating unit is configured to generate the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.
The device according to claim 11, wherein the generating unit is specifically configured to:

determine, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and

generate the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
The device according to claim 12, wherein the generating unit is specifically configured to:

determine a correlation between a first image category and a first visual word according to a probability of occurrence of a first event, a probability of occurrence of a second event, and a probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library both include the first visual word, the image category of the first training image being the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
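The claim states only that the correlation is determined from the three probabilities; it does not fix a formula. As one possible illustration, the sketch below uses the Pearson correlation of two binary events (the phi coefficient). That choice of measure and the function name `event_correlation` are assumptions made for the example, not something taken from the claim.

```python
import math


def event_correlation(p_first: float, p_second: float, p_both: float) -> float:
    """Pearson correlation (phi coefficient) of two binary events.

    p_first:  probability of the first event (both images contain the visual word).
    p_second: probability of the second event (both images are in the same
              positive sample image set).
    p_both:   probability that both events occur simultaneously.
    The phi coefficient is an assumed measure, used here only for illustration.
    """
    variance_product = p_first * (1 - p_first) * p_second * (1 - p_second)
    if variance_product <= 0:
        # One of the events is certain or impossible; correlation is undefined,
        # so this sketch reports 0.0.
        return 0.0
    return (p_both - p_first * p_second) / math.sqrt(variance_product)
```

Under this measure, a value near zero means that sharing the visual word says little about whether two images of that category are similar, which matches the notion of a visual word that is irrelevant to the category.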
The device according to any one of claims 10 to 13, wherein the retrieval unit is specifically configured to:

determine a similarity between the target visual words of the image to be retrieved and the target visual words of each retrieval image in the retrieval image library; and

determine, as the retrieval result, at least one retrieval image whose similarity to the target visual words of the image to be retrieved is greater than a first preset value.
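The two steps above can be sketched as follows. The claim does not specify how the similarity is computed; cosine similarity over visual word counts is an assumption made for this example, as are the names `cosine_similarity` and `retrieve` and the dictionary form of the retrieval image library.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(words_a: List[int], words_b: List[int]) -> float:
    """Cosine similarity between two bags of visual words (assumed measure)."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_target_words: List[int],
    retrieval_library: Dict[str, List[int]],
    first_preset_value: float,
) -> List[str]:
    """Return retrieval images whose similarity exceeds the first preset value,
    ordered from most to least similar."""
    scored = [
        (name, cosine_similarity(query_target_words, words))
        for name, words in retrieval_library.items()
    ]
    hits = [(name, score) for name, score in scored if score > first_preset_value]
    return [name for name, _ in sorted(hits, key=lambda item: item[1], reverse=True)]


# Illustrative data only: image name -> target visual words.
library = {"a": [1, 2, 3], "b": [4, 5], "c": [1, 2, 9]}
results = retrieve([1, 2, 3], library, first_preset_value=0.5)
```

Sorting the hits is a convenience of this sketch; the claim only requires that images above the first preset value be returned.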
An apparatus for generating an image library, comprising:

an acquiring unit, configured to acquire a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in a visual word bag model, the image category information of each training image is used to indicate an image category of that training image, and the positive sample image set information is used to indicate at least one positive sample image set, each positive sample image set including a plurality of manually labeled similar training images in the training image library; and

a generating unit, configured to generate a stop word vocabulary according to the plurality of visual words of each training image acquired by the acquiring unit, the image category information of each training image, and the positive sample image set information, where the stop word vocabulary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to the image category of each training image include visual words that are irrelevant to that image category.
The device according to claim 15, wherein the generating unit is specifically configured to:

determine, according to the plurality of visual words corresponding to each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and

generate the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
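Generating the stop word vocabulary from the correlations can be sketched as follows. The claim does not say how a correlation value is turned into a stop word decision; treating a visual word as a stop word for a category when its correlation falls below a threshold is an assumption made for this example, as are the name `build_stop_word_vocabulary` and the data layout.

```python
from typing import Dict, Set, Tuple


def build_stop_word_vocabulary(
    correlation: Dict[Tuple[str, int], float],
    threshold: float,
) -> Dict[str, Set[int]]:
    """Map each image category to its visual stop words.

    correlation: (image category, visual word ID) -> correlation value.
    threshold:   hypothetical cut-off; words whose correlation with a category
                 is below it are treated as irrelevant to that category.
    """
    vocabulary: Dict[str, Set[int]] = {}
    for (category, word), score in correlation.items():
        # Ensure every category appears, even if it has no stop words.
        stop_words = vocabulary.setdefault(category, set())
        if score < threshold:
            stop_words.add(word)
    return vocabulary


# Illustrative correlation values only.
correlation = {
    ("indoor", 1): 0.80,
    ("indoor", 3): 0.05,
    ("outdoor", 1): 0.02,
    ("outdoor", 3): 0.60,
}
vocabulary = build_stop_word_vocabulary(correlation, threshold=0.1)
```

In this example visual word 3 is a stop word only for the "indoor" category and visual word 1 only for "outdoor", which shows how the mapping is specific to each category.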
An image retrieval device, the device comprising a memory, a processor, a communication interface, and a computer program stored on the memory and executable on the processor, wherein the memory, the processor, and the communication interface communicate with one another via internal connection paths, characterized in that the processor performs the method of any one of claims 1 to 6 when executing the computer program.
An apparatus for generating an image library, the apparatus comprising a memory, a processor, a communication interface, and a computer program stored on the memory and executable on the processor, wherein the memory, the processor, and the communication interface communicate with one another via internal connection paths, characterized in that the processor performs the method of claim 7 or claim 8 when executing the computer program.
A computer readable medium for storing a computer program, characterized in that the computer program comprises instructions for performing the method of any of the preceding claims 1 to 6.
A computer readable medium for storing a computer program, characterized in that the computer program comprises instructions for performing the method of claim 7 or claim 8.
A computer program product comprising instructions, wherein when the instructions are run on a computer, the computer is caused to perform the method of any one of claims 1 to 6.
A computer program product comprising instructions, wherein when the instructions are run on a computer, the computer is caused to perform the method of claim 7 or claim 8.