WO2019100348A1 - Image retrieval method and device, and image library generation method and device - Google Patents

Publication number
WO2019100348A1
WO2019100348A1 PCT/CN2017/112956 CN2017112956W WO2019100348A1 WO 2019100348 A1 WO2019100348 A1 WO 2019100348A1 CN 2017112956 W CN2017112956 W CN 2017112956W WO 2019100348 A1 WO2019100348 A1 WO 2019100348A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
visual
training
retrieved
library
Prior art date
Application number
PCT/CN2017/112956
Other languages
French (fr)
Chinese (zh)
Inventor
付宇新
温丰
薛常亮
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2017/112956 (WO2019100348A1)
Priority to CN201780097137.5A (CN111373393B)
Publication of WO2019100348A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features

Definitions

  • the present application relates to the field of image retrieval technology, and more particularly to an image retrieval method and apparatus and an image library generation method and apparatus in the field of image retrieval technology.
  • the bag of visual words (BoVW) model is widely applied to the field of image retrieval.
  • The visual word bag (BoVW) model includes a plurality of visual words obtained by clustering a plurality of visual feature descriptors extracted from a plurality of images; each of the plurality of visual words is a cluster center.
  • During image retrieval, a plurality of visual feature descriptors of the image to be retrieved are first acquired, and the plurality of visual feature descriptors are matched and mapped to the visual words in the visual word bag model to obtain a plurality of visual words of the image to be retrieved. The plurality of visual words are used to represent the image to be retrieved. A similarity between the image to be retrieved and each search image in the search image library is then calculated according to the plurality of visual words of the image to be retrieved, and at least one image in the search image library having the highest similarity to the image to be retrieved is output as the image retrieval result.
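  • The matching-and-mapping step described above can be sketched as follows. This is a minimal illustration, not the method of the application: it assumes binary descriptors stored as Python integers and compared by Hamming distance, and the names `hamming`, `quantize`, `vocabulary`, and `descriptors` are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word (cluster center)."""
    return [
        min(range(len(vocabulary)), key=lambda i: hamming(d, vocabulary[i]))
        for d in descriptors
    ]


# Assumed toy vocabulary of three 8-bit visual words.
vocabulary = [0b00000000, 0b11110000, 0b00001111]
# Assumed descriptors extracted from one image to be retrieved.
descriptors = [0b00000001, 0b11100000, 0b00011111, 0b11110001]

words = quantize(descriptors, vocabulary)
print(words)  # indices of the visual words that represent the image
```

  • Each descriptor is replaced by the index of one visual word, so the image is represented by a short list of word indices rather than by raw descriptors.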
  • the present application provides an image retrieval method and apparatus, and an image processing method and apparatus, which are advantageous for improving the efficiency and accuracy of image retrieval.
  • the present application provides an image retrieval method, the method comprising:
  • Acquiring a plurality of visual words of the image to be retrieved and image category information of the image to be retrieved, where the plurality of visual words of the image to be retrieved are obtained by matching and mapping the plurality of visual feature descriptors of the image to be retrieved to the visual words in the visual word bag model, and the image category information of the image to be retrieved indicates an image category of the image to be retrieved;
  • Determining, according to the image category information of the image to be retrieved and the stop word vocabulary, a visual stop word corresponding to the image category of the image to be retrieved, where the visual stop word corresponding to the image category of the image to be retrieved includes a visual word unrelated to the image category of the image to be retrieved, and the stop word vocabulary includes a mapping relationship between the image category of the image to be retrieved and the visual stop word corresponding to that image category;
  • Removing the visual stop word corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved to obtain a target visual word of the image to be retrieved;
  • Determining a search result according to the target visual word of the image to be retrieved and the search image library, where the search image library includes a plurality of search images.
  • In the image retrieval method provided by the embodiment of the present application, the target visual word of the image to be retrieved is obtained by removing the visual stop word corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved. That is, visual words that have no significant effect on recognizing the image to be retrieved, or that interfere with image recognition, are removed from the plurality of visual words of the image to be retrieved, so the target visual words of the image to be retrieved are more significant for identifying it. Therefore, searching with the target visual words of the image to be retrieved and the search image library helps improve the efficiency and accuracy of image retrieval.
  • The search image library further includes target visual words corresponding to each of the plurality of search images, and the target visual words corresponding to each search image are obtained by removing, from the plurality of visual words of that search image, the visual stop words corresponding to the image category of that search image.
  • the target visual word of the search image stored in the search image library is obtained by removing the visual stop word corresponding to the image category of the search image from the plurality of visual words of the search image. It is beneficial to reduce the memory usage of the search image library.
  • determining the retrieval result according to the similarity between the target visual word of the image to be retrieved and the target visual word of the retrieved image in the search image library is beneficial to improving the efficiency and accuracy of the image retrieval.
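  • The effect of removing visual stop words before computing similarity can be sketched as follows. The application does not state which similarity measure is used; cosine similarity over word counts is assumed here purely for illustration, and all names (`remove_stop_words`, `cosine_similarity`, `retrieve`, the toy library) are hypothetical.

```python
from collections import Counter
from math import sqrt


def remove_stop_words(words, category, stop_word_dictionary):
    """Drop the visual stop words mapped to the given image category."""
    stop_words = stop_word_dictionary.get(category, set())
    return [w for w in words if w not in stop_words]


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bags of visual words (assumed measure)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query_words, query_category, library, stop_word_dictionary, top_k=1):
    """Rank search images by similarity to the target visual words of the query."""
    target = remove_stop_words(query_words, query_category, stop_word_dictionary)
    ranked = sorted(
        library, key=lambda name: cosine_similarity(target, library[name]), reverse=True
    )
    return ranked[:top_k]


# Assumed data: word 7 stands for rain streaks, a stop word for the "rainy" category.
stop_word_dictionary = {"rainy": {7}}
library = {"school": [1, 2, 3], "street": [4, 5, 7]}  # target words of search images
query_words = [1, 2, 7, 7, 7]  # a rainy photo of the school

result = retrieve(query_words, "rainy", library, stop_word_dictionary)
```

  • In this toy data, the raw query words score higher against "street" because of the repeated rain word, while the target visual words rank "school" first.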
  • The image retrieval device may be a first device having computing and storage functions, and the first device may be, for example, a computer; or the image retrieval device may be a functional module in the first device, which is not limited by the embodiment of the present application.
  • The visual feature points of the image in the embodiment of the present application refer to pixels that remain consistent under image transformations such as scaling, rotation, translation, and viewing-angle change, that is, the most easily recognized pixels in the image, such as corner points or texture-rich edge points. The quality of the visual feature points of the image directly affects the efficiency and accuracy of image retrieval.
  • The type of the visual feature points of the image may include scale-invariant feature transform (SIFT), oriented FAST and rotated BRIEF (ORB), speeded up robust features (SURF), features from accelerated segment test (FAST), and the like, which is not limited by the embodiment of the present application.
  • the visual feature points of the image may be one or more, which is not limited by the embodiment of the present application.
  • The visual feature descriptor of the image in the embodiment of the present application refers to a description of a visual feature point of the image by mathematical features.
  • The main steps of acquiring the visual feature descriptor of the image include: randomly selecting a plurality of pixel pairs in the vicinity of a visual feature point of the image, and comparing the intensities of the two pixels in each pixel pair to obtain a code of 0 or 1; and rotating the visual feature point using its direction information to obtain a robust binary-vector visual feature descriptor.
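  • The descriptor steps above can be sketched as follows. This is a simplified illustration in the spirit of a rotated binary descriptor, not the exact procedure of the application: the sampling pattern, the patch radius, the rounding of rotated offsets, and the names `make_pattern` and `describe` are all assumptions.

```python
import math
import random


def make_pattern(n_pairs, radius, seed=0):
    """Randomly choose pixel-pair offsets (dx1, dy1, dx2, dy2) around a feature point."""
    rng = random.Random(seed)
    return [tuple(rng.randint(-radius, radius) for _ in range(4)) for _ in range(n_pairs)]


def describe(image, cx, cy, angle, pattern):
    """Build a binary descriptor as an int: one bit per intensity comparison."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def sample(dx, dy):
        # Rotate the offset by the feature point direction before sampling.
        x = cx + round(cos_a * dx - sin_a * dy)
        y = cy + round(sin_a * dx + cos_a * dy)
        return image[y][x]

    bits = 0
    for dx1, dy1, dx2, dy2 in pattern:
        bits = (bits << 1) | (1 if sample(dx1, dy1) < sample(dx2, dy2) else 0)
    return bits


# Assumed synthetic 21x21 gray image and a feature point at its center.
image = [[(7 * x + 13 * y) % 31 for x in range(21)] for y in range(21)]
pattern = make_pattern(n_pairs=32, radius=4)
descriptor = describe(image, cx=10, cy=10, angle=0.0, pattern=pattern)
```

  • Rotating the sampling pattern by the feature point direction is what lets the same point yield the same bits after the image is rotated.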
  • There may be one or more visual feature descriptors of the image, which is not limited by the embodiment of the present application.
  • The visual word bag model in the embodiment of the present application includes a plurality of visual words, each of the plurality of visual words being a cluster center obtained by clustering visual feature descriptors extracted from a plurality of images.
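  • How cluster centers become visual words can be sketched with a plain k-means loop. The application does not name a clustering algorithm; k-means, the real-valued toy descriptors, and the naive initialization from the first k points are assumptions made only for illustration.

```python
def kmeans(points, k, iterations=10):
    """Naive k-means: returns k cluster centers, used here as visual words."""
    centers = [list(p) for p in points[:k]]  # assumed naive initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers


# Assumed 2-D descriptors pooled from several images; they form two obvious groups.
descriptors = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
visual_words = kmeans(descriptors, k=2)
```

  • For binary descriptors such as those of ORB, a practical system would more likely cluster under Hamming distance; the Euclidean version is used here only because it is the shortest to write.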
  • The visual word of the image in the embodiment of the present application refers to the visual word in the visual word bag model that is nearest to the visual feature descriptor of the image, obtained by matching and mapping the visual feature descriptor of the image to the visual words in the visual word bag model.
  • the visual word of the image may be one or more, which is not limited by the embodiment of the present application.
  • The image categories of the images may include forest scenes, suburban scenes, indoor scenes, and the like.
  • the image categories of the images may include sunny, rainy, snowy, and the like.
  • In the embodiment of the present application, a visual stop word corresponding to an image category refers to a visual word that has no significant effect on recognizing images of that image category, or that interferes with image recognition, that is, a visual word unrelated to images of that image category.
  • the visual stop words that are not related to a certain image category described in the embodiments of the present application refer to visual words whose correlation with the image of the image category is lower than a preset threshold.
  • the visual stop words corresponding to the image categories may include one or more visual words, which are not limited by the embodiment of the present application.
  • The trees can be visual stop words for the forest or suburban categories.
  • the image will leave traces of rain falling.
  • The feature points extracted from the rain in the image will also pollute the plurality of visual words of the image. Therefore, rain can be a visual stop word for the rainy-day category.
  • the target visual word of the image in the embodiment of the present application includes a visual word after the visual stop word corresponding to the image category of the image is removed from the plurality of visual words of the image.
  • the target visual word of the image may include one or more visual words, which is not limited by the embodiment of the present application.
  • The positive sample image set in the embodiment of the present application includes manually labeled images that can be considered highly similar or identical.
  • For example, two images of the same object taken in different scenes, such as a school on a rainy day and the same school on a snowy day.
  • For example, two images of the same scene taken at different times, such as the current pose and the historical pose of the same scene in loop detection.
  • Before determining the visual stop word corresponding to the image category of the image to be retrieved according to the image category information of the image to be retrieved and the stop word dictionary, the method further includes: acquiring a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of each training image to the visual words in the visual word bag model, the image category information of each training image indicates an image category of each training image, and the positive sample image set information indicates at least one manually labeled positive sample image set, each including a plurality of similar training images in the training image library; and generating the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.
  • Generating the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word vocabulary according to the correlation between the plurality of image categories and the plurality of visual words of the training image library.
  • The image retrieval device may use, as the visual stop words corresponding to each image category of the training image library, at least one visual word, among the plurality of visual words of the training image library, that has the lowest correlation with that image category.
  • The image retrieval device may use, as the visual stop words corresponding to each image category, at least one visual word, among the plurality of visual words of the training image library, whose correlation with that image category of the training image library is less than a first preset threshold.
  • the visual stop words corresponding to each image category may include one or more visual words, which are not limited by the embodiment of the present application.
  • the stop word dictionary may include a mapping relationship between each image category and a visual stop word corresponding to each image category.
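  • The threshold-based selection of visual stop words and the resulting category-to-stop-word mapping can be sketched as follows. The correlation scores, the threshold value, and the function name `build_stop_word_dictionary` are hypothetical; only the rule "correlation below a first preset threshold" comes from the text.

```python
def build_stop_word_dictionary(correlation, threshold):
    """Map each image category to the visual words whose correlation is below the threshold."""
    return {
        category: {word for word, score in scores.items() if score < threshold}
        for category, scores in correlation.items()
    }


# Assumed correlation scores: correlation[category][visual_word_index].
correlation = {
    "forest": {0: 0.8, 1: 0.05, 2: 0.4},
    "rainy": {0: 0.1, 1: 0.6, 2: 0.02},
}

stop_word_dictionary = build_stop_word_dictionary(correlation, threshold=0.15)
```

  • A dictionary of sets keeps the lookup of stop words for a given image category cheap at retrieval time.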
  • Determining the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining a correlation between a first image category and a first visual word according to a probability of occurrence of a first event, a probability of occurrence of a second event, and a probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
  • x represents the first event, where the first event is that the plurality of visual words of the first training image among the N training images and the plurality of visual words of a second training image among the P training images other than the first training image each include a first visual word of the L visual words.
  • y represents the second event, where the second event is that the first training image and the second training image belong to the same positive sample image set.
  • count(x) indicates the number of times the first event occurred, count(y) indicates the number of times the second event occurred, and count(x, y) indicates the number of times the first event and the second event occurred simultaneously.
  • p(x) represents the probability of occurrence of the first event, p(y) represents the probability of occurrence of the second event, and p(x, y) represents the probability that the first event and the second event occur simultaneously.
  • PMI(x, y) represents the pointwise mutual information of the first event and the second event.
  • H(y) represents the information entropy of the second event.
  • RATE_PMI(x, y) represents the pointwise mutual information rate of the first event and the second event, that is, the correlation between the first visual word and the first image category.
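  • The formulas themselves are not reproduced in this text, so the following is only a sketch under stated assumptions. PMI(x, y) is taken in its standard form log2(p(x, y) / (p(x) p(y))). It is assumed that probabilities are counts divided by a total number of image pairs (`total`, a hypothetical parameter), that H(y) = -log2 p(y), and that RATE_PMI(x, y) = PMI(x, y) / H(y). The application may define these quantities differently.

```python
from math import log2


def pmi_rate(count_x, count_y, count_xy, total):
    """Assumed form: RATE_PMI(x, y) = PMI(x, y) / H(y), with H(y) = -log2 p(y)."""
    p_x, p_y, p_xy = count_x / total, count_y / total, count_xy / total
    if p_xy == 0:
        return float("-inf")  # the events never co-occur
    pmi = log2(p_xy / (p_x * p_y))  # standard pointwise mutual information
    h_y = -log2(p_y)  # assumption: information content of the second event
    return pmi / h_y if h_y else 0.0


# Assumed counts over 100 image pairs.
strong = pmi_rate(count_x=10, count_y=10, count_xy=10, total=100)  # x always implies y
independent = pmi_rate(count_x=20, count_y=10, count_xy=2, total=100)  # x tells nothing about y
```

  • Under these assumptions the rate is close to 1 when sharing the visual word always coincides with belonging to the same positive sample image set, and close to 0 when the two events are independent, which is the behaviour a stop word threshold would rely on.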
  • the image retrieval device may generate the search image library by itself, or may acquire the search image library from the image library generation device, which is not limited in this embodiment of the present application.
  • The search image library may be trained according to the plurality of training images, or may be trained according to historical images to be retrieved that the image retrieval device retrieved before the current retrieval, which is not limited by the embodiment of the present application.
  • The image retrieval device may generate the search image library according to a plurality of visual words of each of the plurality of training images, image category information of each training image, and the stop word dictionary. That is, the search image library is trained based on the plurality of training images.
  • The image retrieval device may determine, according to the image category information of each training image and the stop word vocabulary, a visual stop word corresponding to the image category of each training image, remove the visual stop word corresponding to the image category of each training image from the plurality of visual words of each training image to obtain the target visual words of each training image, and add the target visual words of each training image to the search image library.
  • The image retrieval device may obtain the target visual words of each training image by using the stop word vocabulary in the embodiment of the present application, or may obtain the target visual words of each training image in other ways, which is not limited by the embodiment of the present application.
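  • Building the search image library from training images can be sketched as follows. The data layout (a dictionary from image identifier to a pair of visual words and image category) and the name `build_search_library` are assumptions for illustration only.

```python
def build_search_library(training_images, stop_word_dictionary):
    """Store, for each training image, its visual words minus its category's stop words."""
    library = {}
    for image_id, (words, category) in training_images.items():
        stop_words = stop_word_dictionary.get(category, set())
        library[image_id] = [w for w in words if w not in stop_words]
    return library


# Assumed training data: image id -> (visual word indices, image category).
training_images = {
    "img_a": ([1, 2, 7, 7], "rainy"),
    "img_b": ([3, 7, 9], "sunny"),
}
stop_word_dictionary = {"rainy": {7}, "sunny": {9}}

search_library = build_search_library(training_images, stop_word_dictionary)
```

  • Word 7 is dropped from the rainy image but kept in the sunny one, since stop words are defined per image category; storing only target visual words is also what reduces the size of the library.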
  • the search image library includes target visual words of all product images provided by the user, and the search images are product images that the user wants to purchase.
  • the image retrieval device may add a target visual word of the historical to-be-retrieved image retrieved prior to S140 to the retrieval image library to generate the retrieval image library.
  • the search image library includes all historical pose images, and the image to be retrieved is the current pose image.
  • The image retrieval method provided by the embodiment of the present application can save the history of the scene, use the current image for retrieval to recognize loops, construct a constraint between the current pose and the historical pose, and reduce the overall error through optimization to obtain a globally consistent map.
  • the system searches according to the image of the product, and returns an image with a higher similarity as a retrieval result.
  • the image retrieval device may select at least one search image that is the most similar to the image to be retrieved as the search result, which is not limited by the embodiment of the present application.
  • The image retrieval device may calculate a similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image having the highest similarity to the image to be retrieved as the search result.
  • The image retrieval device may calculate a similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image whose similarity is greater than a second preset threshold as the search result.
  • the target visual word of the search image stored in the search image library is obtained by removing the visual stop word corresponding to the image category of the search image from the plurality of visual words of the search image. It is beneficial to reduce the memory usage of the search image library.
  • At least one search image similar to the image to be retrieved is determined to obtain a search result, which helps improve the efficiency and accuracy of image retrieval.
  • the present application provides an image processing method, the method comprising:
  • The stop word vocabulary includes a mapping relationship between the image category of each training image and the visual stop word corresponding to that image category, and the visual stop word corresponding to the image category of each training image includes a visual word unrelated to the image category of each training image.
  • The image library generation method provided by the embodiment of the present application acquires a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, and generates a stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, which helps improve the efficiency and accuracy of image retrieval.
  • The image library generating device may be a second device having computing and storage functions, and the second device may be, for example, a computer; or the image library generating device may be a functional module in the second device, which is not limited by the embodiment of the present application.
  • the second device and the first device in the first aspect may be the same device or different devices, which is not limited in this embodiment of the present application.
  • The image library generating device and the image retrieval device in the first aspect are different functional modules in the same device, or the image library generating device is a functional module in the image retrieval device.
  • Generating the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word vocabulary according to the correlation between the plurality of image categories and the plurality of visual words of the training image library.
  • Determining the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining a correlation between a first image category and a first visual word according to a probability of occurrence of a first event, a probability of occurrence of a second event, and a probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
  • The method further includes: determining, according to the image category information of each training image and the stop word vocabulary, a visual stop word corresponding to the image category of each training image; removing the visual stop word corresponding to the image category of each training image from the plurality of visual words of each training image to obtain the target visual words of each training image; and adding the target visual words of each training image to the search image library.
  • Acquiring a plurality of visual words of each training image in the training image library includes: acquiring each training image; extracting a plurality of visual feature descriptors of each training image, where the plurality of visual feature descriptors describe a plurality of visual feature points of each training image and are in one-to-one correspondence with the plurality of visual feature points; acquiring the visual word bag model; and determining, as the plurality of visual words of each training image, the visual words in the visual word bag model that are nearest to each of the plurality of visual feature descriptors.
  • The present application provides an image retrieval apparatus for performing the method of the first aspect or any possible implementation of the first aspect.
  • The present application provides an image processing apparatus for performing the method of the second aspect or any possible implementation of the second aspect.
  • The present application provides an image retrieval apparatus, the apparatus comprising: a memory, a processor, a communication interface, and a computer program stored on the memory and executable on the processor, where the processor performs the method of the first aspect or any possible implementation of the first aspect when executing the computer program.
  • The present application provides an image processing apparatus, the apparatus comprising: a memory, a processor, a communication interface, and a computer program stored on the memory and executable on the processor, where the processor performs the method of the second aspect or any possible implementation of the second aspect when executing the computer program.
  • The present application provides a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of the first aspect or any possible implementation of the first aspect.
  • The present application provides a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of the second aspect or any possible implementation of the second aspect.
  • The present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any possible implementation of the first aspect.
  • The present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the second aspect or any possible implementation of the second aspect.
  • The present application provides a chip, including: an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with each other through an internal connection path, the processor is configured to execute code in the memory, and when the code is executed, the processor performs the method of the first aspect or any possible implementation of the first aspect.
  • The present application provides a chip, including: an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with each other through an internal connection path, the processor is configured to execute code in the memory, and when the code is executed, the processor performs the method of the second aspect or any possible implementation of the second aspect.
  • FIG. 1 is a schematic flowchart of an image retrieval method according to an embodiment of the present application.
  • FIG. 2 is a schematic block diagram of a method for generating an image library according to an embodiment of the present application.
  • FIG. 3 is a schematic block diagram of an image retrieval apparatus according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an apparatus for generating an image library according to an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of another image retrieval apparatus according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of another image library generating apparatus according to an embodiment of the present application.
  • The visual feature points of an image refer to pixels that remain consistent under transformations such as scaling, rotation, translation, and viewing-angle change, that is, the most easily recognized pixels in the image, such as corner points or texture-rich edge points.
  • The quality of the visual feature points of the image directly affects the efficiency and accuracy of image retrieval.
  • The type of the visual feature points of the image may include scale-invariant feature transform (SIFT), oriented FAST and rotated BRIEF (ORB), speeded up robust features (SURF), features from accelerated segment test (FAST), and the like, which is not limited by the embodiment of the present application.
  • the visual feature points of the image may be one or more, which is not limited by the embodiment of the present application.
  • The main steps of extracting the FAST corner points of the image include: calculating the difference in brightness between each pixel in the image and its neighboring pixels, where a pixel that differs greatly from the pixels in its neighborhood is more likely to be a corner point; then, through non-maximum suppression, retaining only the corner points with the maximum response in a given area, which avoids the problem of corner points being concentrated; and, because FAST corner points lack orientation and scale, adding descriptions of scale and rotation. Scale invariance is achieved by constructing an image pyramid, that is, downsampling the image at different levels to obtain images of different resolutions. Rotation invariance is achieved by the gray-scale centroid method, that is, the direction vector from the geometric center of an image block to the centroid of its gray values is used as the description of the feature point direction.
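  • The gray-scale centroid method mentioned above can be sketched as follows. This is a minimal illustration on a small square patch given as nested lists; the function name `patch_orientation` and the toy patches are hypothetical, and a real implementation would typically restrict the sums to a circular region.

```python
import math


def patch_orientation(patch):
    """Angle of the vector from the patch's geometric center to its gray-value centroid."""
    height, width = len(patch), len(patch[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    # First-order image moments measured from the geometric center.
    m10 = sum((x - cx) * patch[y][x] for y in range(height) for x in range(width))
    m01 = sum((y - cy) * patch[y][x] for y in range(height) for x in range(width))
    return math.atan2(m01, m10)


# Assumed toy patches: one bright on the right side, one bright at the bottom.
bright_right = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
bright_bottom = [[0, 0, 0], [0, 0, 0], [9, 9, 9]]

angle_right = patch_orientation(bright_right)
angle_bottom = patch_orientation(bright_bottom)
```

  • The angle follows the bright side of the patch, so rotating the image rotates the estimated direction by the same amount, which is what allows the descriptor to be aligned before comparison.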
  • a visual feature descriptor of the image is a mathematical description of a visual feature point of the image.
  • the main steps of acquiring a visual feature descriptor of the image include: randomly selecting a plurality of pixel pairs in the vicinity of a visual feature point of the image, and comparing the values of the two pixels in each pixel pair to obtain a 0 or 1 code; and applying a rotation according to the direction information of the visual feature point, so as to obtain a robust binary-vector visual feature descriptor.
  • the image may have one or more visual feature descriptors, which is not limited in the embodiments of the present application.
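  • the pixel-pair comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `binary_descriptor`, the representation of each pixel pair as four offsets, the nearest-pixel sampling, and the clamping at the image border are all choices of this sketch rather than details given in the embodiments of the present application.

```python
import numpy as np


def binary_descriptor(image, keypoint, angle, pairs):
    """Build a 0/1 descriptor for one feature point.

    image:    2-D grayscale array
    keypoint: (row, col) of the feature point
    angle:    direction of the feature point, in radians
    pairs:    sequence of (dy1, dx1, dy2, dx2) offsets, one per pixel pair
    """
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    height, width = image.shape
    row0, col0 = keypoint

    def sample(dy, dx):
        # Rotate the offset by the feature point direction, then read the
        # nearest pixel, clamping to the image border.
        rx = cos_a * dx - sin_a * dy
        ry = sin_a * dx + cos_a * dy
        r = min(max(int(round(row0 + ry)), 0), height - 1)
        c = min(max(int(round(col0 + rx)), 0), width - 1)
        return image[r, c]

    # One bit per pixel pair: 1 if the first pixel is darker than the second.
    bits = [1 if sample(dy1, dx1) < sample(dy2, dx2) else 0
            for dy1, dx1, dy2, dx2 in pairs]
    return np.array(bits, dtype=np.uint8)
```

  • in practice the offsets would be drawn randomly once, for example with `np.random.default_rng(0).integers(-3, 4, size=(32, 4))`, and then reused for every feature point so that descriptors remain comparable.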
  • the visual word bag model includes a plurality of visual words, each of the plurality of visual words being a cluster center obtained by clustering visual feature descriptors extracted from the plurality of images.
  • a visual word of the image is the visual word in the visual word bag model that is closest to a visual feature descriptor of the image, obtained by matching and mapping the visual feature descriptor to the visual words in the visual word bag model.
  • the visual word of the image may be one or more, which is not limited by the embodiment of the present application.
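  • the mapping from visual feature descriptors to visual words can be sketched as a nearest-cluster-center lookup. The function name `assign_visual_words` and the choice of squared Euclidean distance are assumptions of this sketch; the embodiments do not fix a distance measure. For 0/1 descriptors, squared Euclidean distance equals Hamming distance, so the same lookup also serves binary descriptors.

```python
import numpy as np


def assign_visual_words(descriptors, vocabulary):
    """Return, for each descriptor, the index of the closest visual word.

    descriptors: array of shape (n_descriptors, dim)
    vocabulary:  array of shape (n_words, dim), one cluster center per row
    """
    d = np.asarray(descriptors, dtype=np.float64)
    v = np.asarray(vocabulary, dtype=np.float64)
    # Pairwise squared distances, shape (n_descriptors, n_words).
    dists = ((d[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
```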
  • an image category of each image can be obtained.
  • the image categories of the images may include forest scenes, suburban scenes, indoor scenes, and the like.
  • the image categories of the images may include sunny, rainy, snowy, and the like.
  • the same visual words appearing in different images may have different effects on recognizing the image
  • the same visual words appearing in different images may have the same effect on recognizing the two images
  • the visual stop words corresponding to an image category are visual words that have no significant effect on recognizing images of that image category, or that interfere with image recognition, that is, visual words that are irrelevant to images of the image category.
  • the visual stop words that are not related to a certain image category described in the embodiments of the present application refer to visual words whose correlation with the image of the image category is lower than a preset threshold.
  • the visual stop words corresponding to the image categories may include one or more visual words, which are not limited by the embodiment of the present application.
  • the trees can be visual stop words for the forest or suburban categories.
  • traces of falling rain will be left in the image.
  • the feature points extracted from the rain in the image will also pollute the visual word representation of the image. Therefore, the rain can be a visual stop word for the rainy-day category.
  • the target visual word of the image includes a visual word after the visual stop word corresponding to the image category of the image is removed from the plurality of visual words of the image.
  • the target visual word of the image may include one or more visual words, which is not limited by the embodiment of the present application.
  • the positive sample image set includes images that have been manually labeled as highly similar.
  • for example, two images of the same object shot in different scenes, such as a school in the rain and the school in the snow.
  • two images of the same scene are taken at different times, such as the current pose and the historical pose of the same scene in the loop detection.
  • the applicable scenarios of the embodiments of the present application include loop closure detection in simultaneous localization and mapping (SLAM), product image retrieval in e-commerce, and the like.
  • the loop detection detects the scenes that have appeared in the history, uses the current image to retrieve and recognize the loop, constructs a constraint of the current pose and the historical pose, and reduces the overall error by optimization to obtain a globally consistent map.
  • the system searches according to the image of the product, and returns an image with a higher similarity as a retrieval result.
  • FIG. 1 is a schematic flowchart of an image retrieval method 100 provided by an embodiment of the present application. The method can be performed by an image retrieval device.
  • the image retrieval device may be a first device having computing and storage functions, for example a computer, or the image retrieval device may be a functional module in the first device, which is not limited in the embodiments of the present application.
  • S120 Determine, according to the image category information of the image to be retrieved and a stop word vocabulary, the visual stop words corresponding to the image category of the image to be retrieved, where the visual stop words corresponding to the image category of the image to be retrieved include visual words irrelevant to the image category of the image to be retrieved, and the stop word vocabulary includes a mapping relationship between the image category of the image to be retrieved and the visual stop words corresponding to that image category.
  • the visual stop words corresponding to the image categories of the image to be retrieved are removed from the plurality of visual words of the image to be retrieved, and the target visual words of the image to be retrieved are obtained.
  • S140 Determine a search result according to the target visual words of the image to be retrieved and a search image library, where the search image library includes a plurality of search images.
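  • the stop word lookup of S120 and the subsequent removal step can be sketched as follows. The function name `target_visual_words`, the representation of visual words as integer indices, and the representation of the stop word vocabulary as a dictionary from a category label to a set of visual words are assumptions made for illustration only.

```python
def target_visual_words(visual_words, category, stop_word_table):
    """Remove the visual stop words of the image's category.

    visual_words:    list of visual word indices of the image to be retrieved
    category:        image category label of the image to be retrieved
    stop_word_table: dict mapping a category label to a set of stop words
    """
    # S120: look up the stop words for this image category.
    stop_words = stop_word_table.get(category, set())
    # Removal step: keep only the words that are not stop words.
    return [w for w in visual_words if w not in stop_words]
```

  • the returned list corresponds to the target visual words that are then matched against the search image library in S140. A category with no entry in the table is treated here as having no stop words, which is a design choice of the sketch.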
  • the image retrieval device may acquire a plurality of visual words of the image to be retrieved in a plurality of manners, which is not limited by the embodiment of the present application.
  • the image retrieval device may acquire the image to be retrieved and extract a plurality of visual feature descriptors of the image to be retrieved, where the plurality of visual feature descriptors describe a plurality of visual feature points of the image to be retrieved and are in one-to-one correspondence with the plurality of visual feature points; acquire the visual word bag model; and determine, as the plurality of visual words of the image to be retrieved, the visual words in the visual word bag model that are closest to each of the plurality of visual feature descriptors.
  • the visual word bag model may be an existing trained visual word bag model, or may be obtained by the image retrieval device by clustering the visual feature descriptors of the training images in the training image set, which is not limited in the embodiments of the present application.
  • the image retrieval device may obtain the image to be retrieved in a plurality of manners, for example, by camera shooting, local disk reading, network downloading, or other manners, which is not limited by the embodiment of the present application.
  • the image to be retrieved obtained by the image retrieving device may be an image after de-distortion, denoising, or other pre-processing operations, which is not limited in this embodiment of the present application.
  • the image retrieving device may obtain the image category information of the image to be retrieved in a plurality of manners, which is not limited in this embodiment of the present application.
  • the image retrieval device may determine the image category information of the image to be retrieved according to the image to be retrieved and an image classification model, where the image classification model includes a mapping relationship between the image to be retrieved and the image category of the image to be retrieved.
  • the image retrieval device may determine image category information of the image to be retrieved according to the image to be retrieved and a preset classification algorithm.
  • the image retrieval device may acquire image category information of the image to be retrieved manually labeled.
  • the image category information of the image to be retrieved may be one or more bits, that is, the image category of the image to be retrieved is indicated by the one or more bits, which is not limited in the embodiments of the present application.
  • the image category information of the image to be retrieved may be 2 bits. For example, when the 2 bits are "00", the image to be retrieved is indicated as a first type of image; when the 2 bits are "01", the image to be retrieved is indicated as a second type of image; when the 2 bits are "10", the image to be retrieved is indicated as a third type of image; and when the 2 bits are "11", the image to be retrieved is indicated as a fourth type of image.
  • the image retrieval device may acquire the stop word dictionary before S120.
  • the stop word dictionary may include a mapping of an identifier of each of the plurality of image categories and a visual stop word corresponding to the identifier of each of the image categories.
  • the image retrieving device may generate the stop word vocabulary by itself, or may acquire the stop word vocabulary from the image library generating device, which is not limited by the embodiment of the present application.
  • the image retrieval device may acquire a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of each training image to the visual words in the visual word bag model, the image category information of each training image indicates the image category of each training image, the positive sample image set information indicates at least one positive sample image set, and each positive sample image set includes a plurality of manually labeled similar training images in the training image library; and generate the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, where the stop word vocabulary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to the image category of each training image include visual words irrelevant to the image category of each training image.
  • the image retrieval device may generate the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information as follows: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  • the image retrieval device may determine a correlation between a first image category and a first visual word according to the probability of occurrence of a first event, the probability of occurrence of a second event, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word, the image category of the first training image being the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
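  • one possible way to count the first event and the second event over a training image library is sketched below. The function name `count_events`, the dictionary layout of each training image, and in particular the choice to count over ordered pairs in which the first training image belongs to the given image category are assumptions of this sketch; the embodiments do not spell out how the pairs are enumerated.

```python
from itertools import permutations


def count_events(images, category, word):
    """Count event occurrences for one (image category, visual word) pair.

    images: list of dicts with keys
            'category'     -> image category label,
            'words'        -> set of visual words of the image,
            'positive_set' -> positive sample image set id, or None
    Returns (count_x, count_y, count_xy, total_pairs).
    """
    count_x = count_y = count_xy = total = 0
    for first, second in permutations(images, 2):
        if first["category"] != category:
            continue  # the first training image must be of this category
        total += 1
        # First event: both images contain the visual word.
        x = word in first["words"] and word in second["words"]
        # Second event: both images are in the same positive sample set.
        y = (first["positive_set"] is not None
             and first["positive_set"] == second["positive_set"])
        count_x += int(x)
        count_y += int(y)
        count_xy += int(x and y)
    return count_x, count_y, count_xy, total
```

  • dividing each count by the total number of pairs considered gives an empirical estimate of the corresponding probability.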
  • x represents the first event, where the first event is that the plurality of visual words of a first training image of the N training images and the plurality of visual words of a second training image of the P training images other than the first training image each include a first visual word of the L visual words
  • y represents the second event, where the second event is that the first training image and the second training image belong to the same positive sample image set
  • count(x) indicates the number of times the first event occurs
  • count(y) indicates the number of times the second event occurs
  • count(x, y) indicates the number of times the first event and the second event occur simultaneously
  • p(x) represents the probability of occurrence of the first event
  • p(y) represents the probability of occurrence of the second event
  • p(x, y) represents the probability that the first event and the second event occur simultaneously
  • PMI(x, y) represents the pointwise mutual information of the first event and the second event
  • H(y) represents the information entropy of the second event
  • RATE_PMI(x, y) represents the pointwise mutual information rate of the first event and the second event, that is, the correlation between the first visual word and the first image category.
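  • the formulas to which these symbols belong are not reproduced in this text, so the following Python sketch is only one plausible reading of the definitions. It assumes that each probability is the corresponding count divided by the total number of image pairs considered, that PMI(x, y) takes its standard form log(p(x, y) / (p(x) p(y))), that H(y) is taken as -log p(y), and that RATE_PMI(x, y) is PMI(x, y) divided by H(y). The function name `pmi_rate` and the parameter `total` are likewise assumptions.

```python
import math


def pmi_rate(count_x, count_y, count_xy, total):
    """Pointwise mutual information rate under the assumptions stated above."""
    if min(count_x, count_y, count_xy) <= 0 or count_y >= total:
        raise ValueError("counts must be positive and count_y below total")
    p_x = count_x / total    # assumed estimate of p(x)
    p_y = count_y / total    # assumed estimate of p(y)
    p_xy = count_xy / total  # assumed estimate of p(x, y)
    pmi = math.log(p_xy / (p_x * p_y))  # standard PMI definition
    h_y = -math.log(p_y)                # assumed form of H(y)
    return pmi / h_y
```

  • under these assumptions, a visual word whose co-occurrence is statistically independent of positive sample membership receives a rate close to 0, which makes it a candidate visual stop word for the image category.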
  • the image retrieval device may obtain the positive sample image set information in a plurality of manners, which is not limited by the embodiment of the present application.
  • the image retrieval device may acquire one or more bits carried in each training image, and acquire the positive sample image set information according to one or more bits of each training image. For example, if the first training image and the second training image of the plurality of training images carry the same bit, it is determined that the first training image and the second training image belong to the same positive sample image set.
  • the image retrieval device may acquire first information, where the first information includes a mapping relationship between an identifier of each positive sample image set of the plurality of positive sample image sets and the identifiers of the training images included in each positive sample image set, and the image retrieval device may acquire the positive sample image set information according to the first information.
  • the image retrieving device may determine the visual stop words corresponding to each of the image categories from the plurality of visual words of the training image library in a plurality of manners, which is not limited by the embodiment of the present application.
  • the image retrieval device may use at least one visual word of the plurality of visual words of the training image library that has the least correlation with each image category as the visual stop word corresponding to each image category.
  • the image retrieval device may use, as the visual stop word corresponding to each image category, at least one visual word of the plurality of visual words of the training image library whose correlation with that image category is less than a first preset threshold.
  • the visual stop words corresponding to each image category may be one or more visual words, which are not limited in this embodiment of the present application.
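  • the two selection manners described above, namely taking the least correlated visual words or taking the visual words whose correlation is below the first preset threshold, can be sketched as follows. The function names and the representation of the correlations for one image category as a dictionary from visual word to correlation value are assumptions of this sketch.

```python
def stop_words_least_correlated(correlation, k):
    """Return the k visual words with the smallest correlation.

    correlation: dict mapping a visual word to its correlation with
                 one image category
    """
    ranked = sorted(correlation, key=lambda word: correlation[word])
    return set(ranked[:k])


def stop_words_below_threshold(correlation, threshold):
    """Return the visual words whose correlation is below the threshold."""
    return {word for word, value in correlation.items() if value < threshold}
```

  • applying either function once per image category, and storing the result under the category label, yields a stop word vocabulary in the form of a category-to-stop-words mapping.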
  • the image retrieval device may acquire the retrieval image library before S140.
  • the search image library includes a plurality of search images and the target visual words corresponding to each of the plurality of search images, where the target visual words corresponding to each search image are obtained by removing, from the plurality of visual words corresponding to that search image, the visual stop words corresponding to the image category of that search image.
  • the image retrieval device may generate the search image library by itself, or may acquire the search image library from the image library generation device, which is not limited in this embodiment of the present application.
  • the search image library may be obtained from the plurality of training images, or may be obtained from historical images to be retrieved that the image retrieval device retrieved before the current retrieval, which is not limited in the embodiments of the present application.
  • the image retrieval device may generate the search image library according to a plurality of visual words of each of the plurality of training images, image category information of each of the training images, and the stop word dictionary. That is, the search image library is obtained based on the plurality of training images.
  • the image retrieval device may determine, according to the image category information of each training image and the stop word vocabulary, the visual stop words corresponding to the image category of each training image; remove, from the plurality of visual words of each training image, the visual stop words corresponding to the image category of that training image, to obtain the target visual words of each training image; and add the target visual words of each training image to the search image library.
  • the image retrieval device may obtain the target visual words of each training image by using the stop word vocabulary in the embodiments of the present application, or may obtain the target visual words of each training image by other methods, which is not limited in the embodiments of the present application.
  • the search image library includes target visual words of all product images provided by the user, and the search images are product images that the user wants to purchase.
  • the image retrieval device may add a target visual word of the historical to-be-retrieved image retrieved prior to S140 to the retrieval image library to generate the retrieval image library.
  • the search image library includes all historical pose images, and the image to be retrieved is the current pose image.
  • the image retrieval method provided by the embodiments of the present application can detect scenes that have appeared in the history, use the current image to retrieve and recognize the loop, construct a constraint between the current pose and the historical pose, and reduce the overall error by optimization to obtain a globally consistent map.
  • the system searches according to the image of the product, and returns an image with a higher similarity as a retrieval result.
  • the image retrieval device may select at least one search image that is the most similar to the image to be retrieved as the search result, which is not limited by the embodiment of the present application.
  • the image retrieval device may calculate a similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image with the highest similarity to the image to be retrieved as the search result.
  • the image retrieval device may calculate a similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image whose similarity is greater than a second preset threshold as the search result.
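  • both manners of determining the search result can be sketched with a single ranking routine. The embodiments do not fix a similarity measure; this sketch assumes cosine similarity between visual word histograms. The function names, the parameters `top_k` and `threshold`, and the representation of the search image library as a dictionary from image identifier to a list of visual word indices are also assumptions.

```python
import numpy as np


def word_histogram(words, vocab_size):
    """Count how often each visual word index occurs."""
    counts = np.bincount(np.asarray(words, dtype=np.int64),
                         minlength=vocab_size)
    return counts.astype(np.float64)


def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def retrieve(query_words, library, vocab_size, top_k=None, threshold=None):
    """Rank search images by similarity to the image to be retrieved.

    library: dict mapping an image id to its list of visual word indices
    Returns a list of (image_id, similarity), most similar first.
    """
    query = word_histogram(query_words, vocab_size)
    scored = [(image_id,
               cosine_similarity(query, word_histogram(words, vocab_size)))
              for image_id, words in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    if threshold is not None:
        # Keep only images above the (second preset) threshold.
        scored = [item for item in scored if item[1] > threshold]
    if top_k is not None:
        # Keep only the most similar images.
        scored = scored[:top_k]
    return scored
```

  • passing `top_k` corresponds to returning the search images with the highest similarity, and passing `threshold` corresponds to returning the search images whose similarity exceeds the second preset threshold.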
  • the target visual words of the image to be retrieved are obtained by removing, from the plurality of visual words of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved, that is, by removing the visual words that have no significant effect on recognizing the image to be retrieved or that interfere with image recognition, so that the target visual words of the image to be retrieved are comparatively significant for recognizing the image to be retrieved. Therefore, searching with the target visual words of the image to be retrieved and the search image library helps improve the efficiency and accuracy of image retrieval.
  • the target visual words of a search image stored in the search image library are obtained by removing, from the plurality of visual words of the search image, the visual stop words corresponding to the image category of the search image, which helps reduce the memory usage of the search image library.
  • FIG. 2 is a schematic flowchart of a method 200 for generating an image library according to an embodiment of the present disclosure.
  • the method 200 may be performed by a device for generating an image library, which is not limited by the embodiment of the present application.
  • the generating device of the image library may be a second device having computing and storage functions, for example a computer, or the generating device of the image library may be a functional module in the second device, which is not limited in the embodiments of the present application.
  • the second device and the first device in FIG. 1 may be the same device or different devices, which is not limited in this embodiment of the present application.
  • the image library generating device and the image retrieval device described in FIG. 1 are different functional modules in the same device, or the image library generating device is a functional module in the image retrieval device.
  • the stop word vocabulary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, where the visual stop words corresponding to the image category of each training image include visual words irrelevant to the image category of each training image.
  • in the method for generating an image library provided by the embodiments of the present application, a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information are acquired, and a stop word vocabulary is generated according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, where the stop word vocabulary is used to obtain the target visual words of the image to be retrieved, which helps improve the efficiency and precision of image retrieval.
  • the generating device of the image library may acquire the plurality of visual words of the training image in multiple manners, which is not limited in the embodiments of the present application.
  • the generating device of the image library may acquire a training image and extract a plurality of visual feature descriptors of the training image, where the plurality of visual feature descriptors describe a plurality of visual feature points of the training image and are in one-to-one correspondence with the plurality of visual feature points; acquire the visual word bag model; and determine, as the plurality of visual words of the training image, the visual words in the visual word bag model that are closest to each of the plurality of visual feature descriptors.
  • the visual word bag model may be an existing trained visual word bag model, or may be obtained by the generating device of the image library itself by clustering a plurality of visual feature descriptors corresponding to the plurality of images, which is not limited in the embodiments of the present application.
  • the generating device of the image library may obtain the training image in a plurality of manners, for example, by camera shooting, local disk reading, network downloading, or other manners, which is not limited in the embodiments of the present application.
  • the multiple images obtained by the generating device of the image library may be images after de-distortion, de-noising, or other pre-processing operations, which are not limited in this embodiment of the present application.
  • the generating device of the image library may obtain the image category information of the training image in a plurality of manners, which is not limited by the embodiment of the present application.
  • the image library generating device may determine image category information of the training image according to the training image and the image classification model, where the image classification model includes a mapping relationship between the training image and an image category of the training image. .
  • the image library generating device may determine image category information of the training image according to the training image and a preset classification algorithm.
  • the image library generating device may acquire image category information of the training image manually labeled.
  • the image category information of the training image may be one or more bits, that is, the image category of the training image is indicated by the one or more bits, which is not limited in the embodiments of the present application.
  • the image category information of the training image may be 2 bits. For example, when the 2 bits are "00", the training image is indicated as a first type of image; when the 2 bits are "01", the training image is indicated as a second type of image; when the 2 bits are "10", the training image is indicated as a third type of image; and when the 2 bits are "11", the training image is indicated as a fourth type of image.
  • the image library generating device may generate the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information as follows: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, a correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  • the generating device of the image library may determine a correlation between a first image category and a first visual word according to the probability of occurrence of a first event, the probability of occurrence of a second event, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word, the image category of the first training image being the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
  • the correlation of the training images can be determined by the above formulas (1) to (6).
  • the generating device of the image library can obtain the positive sample image set information in a plurality of manners, which is not limited by the embodiment of the present application.
  • the generating device of the image library may acquire one or more bits carried in each training image, and acquire the positive sample image set information according to one or more bits of each training image. For example, if the first training image and the second training image of the plurality of training images carry the same bit, it is determined that the first training image and the second training image belong to the same positive sample image set.
  • the generating device of the image library may acquire first information, where the first information includes a mapping relationship between an identifier of each positive sample image set of the plurality of positive sample image sets and the identifiers of the training images included in each positive sample image set, and the generating device of the image library may acquire the positive sample image set information according to the first information.
  • the generating device of the image library may determine the visual stop words corresponding to each of the image categories from the plurality of visual words of the training image library in a plurality of manners, which is not limited by the embodiment of the present application.
  • the image library generating device may use at least one visual word of the plurality of visual words of the training image library that has the least correlation with each image category as the visual stop word corresponding to that image category.
  • the image library generating device may use, as the visual stop word corresponding to each image category, at least one visual word of the plurality of visual words of the training image library whose correlation with that image category is less than a first preset threshold.
  • the visual stop words corresponding to each image category may be one or more visual words, which are not limited in this embodiment of the present application.
  • the generating device of the image library may determine, according to the image category information of each training image and the stop word vocabulary, the visual stop words corresponding to the image category of each training image; remove, from the plurality of visual words of each training image, the visual stop words corresponding to the image category of that training image, to obtain the target visual words of each training image; and add the target visual words of each training image to the search image library.
  • the target visual words of a search image stored in the search image library are obtained by removing, from the plurality of visual words of the search image, the visual stop words corresponding to the image category of the search image, which helps reduce the memory usage of the search image library.
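  • generating the search image library from the training images can be sketched as a single pass over the training image library. The function name `build_search_library` and the dictionary layouts used for the training images and for the stop word vocabulary are assumptions of this sketch, not structures defined by the embodiments of the present application.

```python
def build_search_library(training_images, stop_word_table):
    """Store only the target visual words of each training image.

    training_images: dict mapping an image id to a tuple
                     (category label, list of visual word indices)
    stop_word_table: dict mapping a category label to a set of stop words
    """
    library = {}
    for image_id, (category, words) in training_images.items():
        stop_words = stop_word_table.get(category, set())
        # Drop the stop words of this image's category before storing.
        library[image_id] = [w for w in words if w not in stop_words]
    return library
```

  • because the stop words are removed before storage, each entry is no longer than the original word list, which is where the reduction in memory usage comes from.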
  • FIG. 3 is a schematic block diagram of an image retrieval apparatus 300 provided by an embodiment of the present application.
  • the device 300 includes:
  • the acquiring unit 310 is configured to acquire a plurality of visual words of the image to be retrieved and image category information of the image to be retrieved, where the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to the visual words in the visual word bag model, and the image category information of the image to be retrieved is used to indicate the image category of the image to be retrieved;
  • the processing unit 320 is configured to determine, according to the image category information of the image to be retrieved acquired by the acquiring unit 310 and the stop word vocabulary, the visual stop words corresponding to the image category of the image to be retrieved, where the visual stop words corresponding to the image category of the image to be retrieved include visual words irrelevant to the image category of the image to be retrieved, and the stop word vocabulary includes a mapping relationship between the image category of the image to be retrieved and the visual stop words corresponding to that image category; and remove, from the plurality of visual words of the image to be retrieved acquired by the acquiring unit 310, the visual stop words corresponding to the image category of the image to be retrieved, to obtain the target visual words of the image to be retrieved;
  • the searching unit 330 is configured to determine a search result according to the target visual word and the search image library of the image to be retrieved obtained by the processing unit 320, where the search image library includes a plurality of search images.
  • the search image library includes a mapping relationship between the plurality of search images and target visual words corresponding to each of the plurality of search images, and the target visual words corresponding to each of the search images are from each of the And obtaining a visual stop word corresponding to the image category of each of the search images among the plurality of visual words corresponding to the image.
  • the device further includes a generating unit. The acquiring unit is further configured to: before the visual stop words corresponding to the image category of the image to be retrieved are determined according to the image category information of the image to be retrieved and the stop word lexicon, acquire a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of each training image to visual words in the visual word bag model, the image category information of each training image is used to indicate an image category of each training image, the positive sample image set information is used to indicate at least one positive sample image set, and each positive sample image set includes a plurality of manually labeled similar training images in the training image library. The generating unit is configured to generate the stop word lexicon according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.
  • the generating unit is specifically configured to: determine, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, correlations between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generate the stop word lexicon according to the correlations between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  • the generating unit is specifically configured to: determine a correlation between a first image category and a first visual word according to a probability that a first event occurs, a probability that a second event occurs, and a probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
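  • the passage above names the three probabilities that enter the correlation but not how they are combined. The sketch below estimates the probabilities by counting ordered pairs of training images and, purely as an assumption, combines them as the ratio P(A and B) / (P(A) · P(B)); the application may use a different formula. The function name and data layout are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def category_word_correlation(
    images: Dict[str, Tuple[str, Set[int]]],
    positive_sets: List[Set[str]],
    category: str,
    word: int,
) -> float:
    """Correlation between one image category and one visual word.

    images[image_id] = (image category, set of visual words).
    Event A: both images of an ordered pair contain `word` and the first
             image belongs to `category`.
    Event B: both images belong to the same positive sample image set.
    The ratio returned here is an assumed way to combine the probabilities.
    """
    def same_positive_set(a: str, b: str) -> bool:
        return any(a in s and b in s for s in positive_sets)

    n_pairs = n_a = n_b = n_ab = 0
    for (id1, (cat1, words1)), (id2, (_, words2)) in permutations(images.items(), 2):
        n_pairs += 1
        event_a = cat1 == category and word in words1 and word in words2
        event_b = same_positive_set(id1, id2)
        n_a += event_a
        n_b += event_b
        n_ab += event_a and event_b
    if n_a == 0 or n_b == 0:
        return 0.0
    p_a, p_b, p_ab = n_a / n_pairs, n_b / n_pairs, n_ab / n_pairs
    return p_ab / (p_a * p_b)


images = {
    "f1": ("forest", {1, 2}),
    "f2": ("forest", {1, 2}),
    "f3": ("forest", {2, 3}),
    "s1": ("suburb", {2, 4}),
}
positive_sets = [{"f1", "f2"}]
print(category_word_correlation(images, positive_sets, "forest", 1))
print(category_word_correlation(images, positive_sets, "forest", 2))
```

  • in this toy data, visual word 2 appears in every image and therefore scores lower than visual word 1, which appears only in the two images labeled as similar; a word like 2 is the kind of candidate a stop word lexicon would pick up.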
  • the searching unit is specifically configured to: determine a similarity between the target visual words of the image to be retrieved and the target visual words of each search image in the search image library; and determine, as the search result, at least one search image whose target visual words have a similarity greater than a first preset value with the target visual words of the image to be retrieved.
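  • the search step can be sketched as follows. The application does not fix a similarity measure at this point, so the Jaccard similarity between sets of target visual words is used here only as an assumed stand-in; `search` and `min_similarity` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple


def jaccard(a: Iterable[int], b: Iterable[int]) -> float:
    """Jaccard similarity of two collections of visual word ids."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def search(
    query_words: Iterable[int],
    library: Dict[str, List[int]],
    min_similarity: float,
) -> List[Tuple[str, float]]:
    """Return (image id, similarity) for every search image whose similarity
    to the query exceeds `min_similarity`, best match first."""
    query = list(query_words)
    hits = [(image_id, jaccard(query, words)) for image_id, words in library.items()]
    hits = [hit for hit in hits if hit[1] > min_similarity]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


search_library = {"img_a": [5, 9], "img_b": [2, 3], "img_c": [5, 9, 11]}
results = search([5, 9], search_library, min_similarity=0.5)
print(results)
```

  • because both the query and the library entries have already had their visual stop words removed, fewer words take part in each comparison.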
  • the image retrieval device 300 herein is embodied in the form of a functional unit.
  • the term "unit" as used herein may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (e.g., a shared processor, a dedicated processor, or a group processor) and memory, a combinational logic circuit, and/or other suitable components that support the described functionality.
  • the image retrieval device 300 may specifically be the image retrieval device in the foregoing method 100 and method 200 embodiments.
  • the image retrieval device 300 may be used to perform the processes and/or steps corresponding to the image retrieval device in the method 100 and the method 200 described above, which are not repeated here.
  • FIG. 4 is a schematic block diagram of an image library generating apparatus 400 provided by an embodiment of the present application.
  • the apparatus 400 includes:
  • the acquiring unit 410 is configured to acquire a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of each training image to visual words in the visual word bag model, the image category information of each training image is used to indicate an image category of each training image, the positive sample image set information is used to indicate at least one positive sample image set, and each positive sample image set includes a plurality of manually labeled similar training images in the training image library;
  • the generating unit 420 is configured to generate a stop word lexicon according to the plurality of visual words of each training image acquired by the acquiring unit 410, the image category information of each training image, and the positive sample image set information, where the stop word lexicon includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to the image category of each training image include visual words that are irrelevant to the image category of that training image.
  • the generating unit is configured to: determine, according to the multiple visual words corresponding to each training image, the image category information of each training image, and the positive sample image set information, multiple images of the training image library. a correlation between a category and a plurality of visual words of the training image library, the plurality of image categories of the training image library including an image category of the each training image, the plurality of visual words of the training image library including the each training a plurality of visual words of the image; generating the stop word lexicon according to a correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  • the generating unit is specifically configured to: determine a correlation between a first image category and a first visual word according to a probability that a first event occurs, a probability that a second event occurs, and a probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
  • the generating unit is further configured to: after generating the stop word lexicon according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, determine, according to the image category information of each training image and the stop word lexicon, the visual stop words corresponding to the image category of each training image; remove those visual stop words from the plurality of visual words of each training image to obtain the target visual words of each training image; and add the target visual words of each training image to the search image library.
  • the acquiring unit is specifically configured to: acquire each training image; extract a plurality of visual feature descriptors of each training image, where the plurality of visual feature descriptors are used to describe a plurality of visual feature points of each training image and are in one-to-one correspondence with the plurality of visual feature points; acquire the visual word bag model; and determine, as the plurality of visual words of each training image, the visual words in the visual word bag model that are closest to each of the plurality of visual feature descriptors.
  • the image library generating apparatus 400 herein is embodied in the form of a functional unit.
  • the term "unit" here may refer to an ASIC, an electronic circuit, a processor for executing one or more software or firmware programs (e.g., a shared processor, a dedicated processor, or a group processor) and memory, a combinational logic circuit, and/or other suitable components that support the described functionality.
  • those skilled in the art may understand that the image library generating apparatus 400 may specifically be the image library generating apparatus in the foregoing method 100 and method 200 embodiments, and that the image library generating apparatus 400 may be used to perform the processes and/or steps corresponding to the image library generating apparatus in the method 100 and the method 200, which are not repeated here.
  • FIG. 5 is a schematic block diagram of an image retrieval device 500 provided by an embodiment of the present application.
  • the image retrieval device 500 may be the image retrieval device described with reference to FIG. 1 and FIG. 2, and may adopt the hardware architecture shown in this block diagram.
  • the image retrieval device can include a processor 510, a communication interface 520, and a memory 530 that communicate with one another via internal connection paths.
  • the related functions implemented by the processing unit 320 and the searching unit 330 in FIG. 3 may be implemented by the processor 510, and the related functions implemented by the acquiring unit 310 may be implemented by the processor 510 controlling the communication interface 520.
  • the processor 510 may include one or more processors, for example, including one or more central processing units (CPUs).
  • the CPU may be a single-core CPU or a multi-core CPU.
  • the communication interface 520 is for transmitting and/or receiving data.
  • the communication interface may include a transmission interface for transmitting data and a receiving interface for receiving data.
  • the memory 530 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and a compact disc read-only memory (CD-ROM).
  • the memory 530 is used to store program code and data of the image retrieval device, and may be a separate device or integrated in the processor 510.
  • the processor 510 is configured to control the communication interface to perform data transmission with other devices, such as a generating device of the image library.
  • Figure 5 only shows a simplified design of the image retrieval device.
  • the image retrieval device may further include other necessary components, including but not limited to any number of communication interfaces, processors, controllers, and memories, and all image retrieval devices that can implement the present application fall within the protection scope of the present application.
  • image retrieval device 500 can be replaced with a chip device, such as a chip that can be used in an image retrieval device for implementing related functions of processor 510 in an image retrieval device.
  • the chip device can be a field programmable gate array for implementing related functions, a dedicated integrated chip, a system chip, a central processing unit, a network processor, a digital signal processing circuit, a microcontroller, or a programmable controller or other integrated chip.
  • the chip may include one or more memories for storing program code that, when executed, causes the processor to perform the corresponding functions.
  • FIG. 6 is a schematic block diagram of an image library generating apparatus 600 provided by an embodiment of the present application.
  • the image library generating apparatus 600 may be the image library generating apparatus described with reference to FIG. 1 and FIG. 2, and may adopt the hardware architecture shown in FIG. 6.
  • the image library generating means may include a processor 610, a communication interface 620, and a memory 630, and the processor 610, the communication interface 620, and the memory 630 communicate with each other through an internal connection path.
  • the related functions implemented by the generating unit 420 in FIG. 4 may be implemented by the processor 610, and the related functions implemented by the acquiring unit 410 may be implemented by the processor 610 controlling the communication interface 620.
  • the processor 610 may include one or more processors, for example, including one or more central processing units (CPUs).
  • the CPU may be a single-core CPU or a multi-core CPU.
  • the communication interface 620 is for transmitting and/or receiving data.
  • the communication interface may include a transmission interface for transmitting data and a receiving interface for receiving data.
  • the memory 630 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and a compact disc read-only memory (CD-ROM).
  • the memory 630 is used to store the program code and data of the generating means of the image library, and may be a separate device or integrated in the processor 610.
  • the processor 610 is configured to control the communication interface to perform data transmission with other devices, such as an image retrieval device.
  • Figure 6 only shows a simplified design of the image library generation device.
  • the image library generating device may further include other necessary components, including but not limited to any number of communication interfaces, processors, controllers, and memories, and all image library generating devices that can implement the present application fall within the protection scope of the present application.
  • the image library generating device 600 may be replaced with a chip device, for example, a chip that can be used in an image library generating device to implement the related functions of the processor 610 in the image library generating device.
  • the chip device can be a field programmable gate array for implementing related functions, a dedicated integrated chip, a system chip, a central processing unit, a network processor, a digital signal processing circuit, a microcontroller, or a programmable controller or other integrated chip.
  • the chip may include one or more memories for storing program code that, when executed, causes the processor to perform the corresponding functions.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

Abstract

An image retrieval method and device, and an image library generation method and device, the image retrieval method comprising: acquiring a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved; determining, according to the image category information of the image to be retrieved and a stop word lexicon, visual stop words corresponding to the image category of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved comprising visual words which are not relevant to the image category of the image to be retrieved; removing, from the plurality of visual words of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved so as to obtain target visual words of the image to be retrieved (S130); and determining a retrieval result according to the target visual words of the image to be retrieved and a retrieval image library, the retrieval image library comprising a plurality of retrieval images (S140). Thus, the efficiency and accuracy of image retrieval may be improved.

Description

Image retrieval method and device, and image library generation method and device
Technical field
The present application relates to the field of image retrieval technology, and more particularly, to an image retrieval method and apparatus and an image library generation method and apparatus in that field.
Background
The bag of visual words (BoVW) model is widely used in the field of image retrieval. The visual word bag model includes a plurality of visual words, which are obtained by clustering a plurality of visual feature descriptors extracted from a plurality of images, and each of the plurality of visual words is a cluster center.
In an existing image retrieval process, a plurality of visual feature descriptors of an image to be retrieved are first acquired and are matched and mapped to the visual words in the visual word bag model to obtain a plurality of visual words of the image to be retrieved, which are used to represent the image to be retrieved. A similarity between the image to be retrieved and each search image in a search image library is then calculated according to the plurality of visual words of the image to be retrieved, and at least one image in the search image library having the highest similarity with the image to be retrieved is output as the image retrieval result.
However, when the content of the image to be retrieved is cluttered, or the image to be retrieved carries a large amount of information, the image to be retrieved has a large number of visual words. As a result, image retrieval is inefficient and its accuracy is poor.
Summary of the invention
The present application provides an image retrieval method and apparatus and an image processing method and apparatus, which help improve the efficiency and accuracy of image retrieval.
In a first aspect, the present application provides an image retrieval method, the method including:
acquiring a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved, where the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to visual words in a visual word bag model, and the image category information of the image to be retrieved is used to indicate an image category of the image to be retrieved;
determining, according to the image category information of the image to be retrieved and a stop word lexicon, visual stop words corresponding to the image category of the image to be retrieved, where the visual stop words corresponding to the image category of the image to be retrieved include visual words irrelevant to the image category of the image to be retrieved, and the stop word lexicon includes a mapping relationship between the image category of the image to be retrieved and the visual stop words corresponding to that image category;
removing the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved, to obtain target visual words of the image to be retrieved; and
determining a search result according to the target visual words of the image to be retrieved and a search image library, where the search image library includes a plurality of search images.
According to the image retrieval method provided by the embodiments of the present application, the target visual words of the image to be retrieved are obtained by removing, from the plurality of visual words of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved. That is, visual words that have no significant effect on identifying the image, or that interfere with image recognition, are removed from the plurality of visual words of the image to be retrieved, so the remaining target visual words contribute more significantly to identifying the image to be retrieved. Therefore, searching the search image library with the target visual words of the image to be retrieved helps improve the efficiency and accuracy of image retrieval.
In a possible implementation, the search image library further includes the target visual words corresponding to each of the plurality of search images, where the target visual words corresponding to each search image are obtained by removing the visual stop words corresponding to the image category of that search image from the plurality of visual words corresponding to that search image.
According to the image retrieval method provided by the embodiments of the present application, the target visual words of each search image stored in the search image library are obtained by removing the visual stop words corresponding to the image category of that search image from the plurality of visual words of the search image, which helps reduce the memory usage of the search image library.
In addition, determining the search result according to the similarity between the target visual words of the image to be retrieved and the target visual words of the search images in the search image library helps improve the efficiency and accuracy of image retrieval.
It should be understood that the image retrieval apparatus may be a first device having computing and storage functions, for example a computer, or may be a functional module in the first device, which is not limited in this embodiment of the present application.
It should also be understood that the visual feature points of an image in the embodiments of the present application refer to pixels that remain consistent under transformations such as scaling, rotation, translation, and change of viewing angle, that is, the most easily identifiable pixels in the image, such as corner points or edge points in richly textured regions. The quality of the visual feature points of an image directly affects the efficiency and accuracy of image retrieval.
Optionally, the types of visual feature points of an image may include scale-invariant feature transform (SIFT), ORB, speeded up robust features (SURF), features from accelerated segment test (FAST), and the like, which is not limited in this embodiment of the present application.
Optionally, an image may have one or more visual feature points, which is not limited in this embodiment of the present application.
It should also be understood that a visual feature descriptor of an image in the embodiments of the present application refers to a description of a visual feature point of the image by means of mathematical features.
For example, taking ORB as an example, the main steps of acquiring a visual feature descriptor of an image include: randomly selecting several pixel pairs near a visual feature point of the image, and comparing the values of the two pixels in each pixel pair to obtain a code of 0 or 1; and using the orientation information of the visual feature point to rotate the visual feature point, so as to obtain a robust binary-vector visual feature descriptor.
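The two steps above can be illustrated with a much simplified, BRIEF-style sketch. It is not the actual ORB implementation: there is no image smoothing or learned sampling pattern, the orientation angle `theta` is passed in rather than computed, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Offset = Tuple[int, int]  # (row offset, column offset) relative to the keypoint


def sampling_pattern(n_pairs: int, radius: int, seed: int = 0) -> List[Tuple[Offset, Offset]]:
    """Randomly choose pixel pairs inside a square patch around the keypoint."""
    rng = random.Random(seed)

    def point() -> Offset:
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))

    return [(point(), point()) for _ in range(n_pairs)]


def rotate(offset: Offset, theta: float) -> Offset:
    """Rotate an offset by the keypoint orientation theta (radians)."""
    dy, dx = offset
    c, s = math.cos(theta), math.sin(theta)
    return (round(dx * s + dy * c), round(dx * c - dy * s))


def binary_descriptor(
    image: Sequence[Sequence[int]],
    keypoint: Tuple[int, int],
    pattern: Sequence[Tuple[Offset, Offset]],
    theta: float = 0.0,
) -> List[int]:
    """One bit per pixel pair: 1 if the first pixel is darker than the second."""
    height, width = len(image), len(image[0])
    row0, col0 = keypoint

    def pixel(offset: Offset) -> int:
        dy, dx = rotate(offset, theta)
        # Clamp coordinates so that samples never leave the image.
        row = min(max(row0 + dy, 0), height - 1)
        col = min(max(col0 + dx, 0), width - 1)
        return image[row][col]

    return [1 if pixel(p) < pixel(q) else 0 for p, q in pattern]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of differing bits between two binary descriptors."""
    return sum(x != y for x, y in zip(a, b))


# A 9x9 synthetic gradient image used only for illustration.
image = [[row * 9 + col for col in range(9)] for row in range(9)]
pattern = sampling_pattern(n_pairs=32, radius=3)
descriptor = binary_descriptor(image, (4, 4), pattern)
flipped = binary_descriptor(image, (4, 4), pattern, theta=math.pi)
print(descriptor)
print(hamming(descriptor, flipped))
```

Rotating the sampling pattern by the keypoint orientation is what lets the same physical point yield a similar bit string when the image itself is rotated; binary descriptors of this kind are compared with the Hamming distance.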
Optionally, an image may have one or more visual feature descriptors, which is not limited in this embodiment of the present application.
It should also be understood that the visual word bag model in the embodiments of the present application includes a plurality of visual words, and each of the plurality of visual words is a cluster center obtained by clustering the visual feature descriptors extracted from a plurality of images.
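As a rough sketch of how such cluster centers might be obtained, the code below runs a plain k-means (Lloyd's algorithm) over descriptor vectors. The text only says "clustering", so the choice of k-means, the Euclidean distance, and the name `build_vocabulary` are assumptions; binary descriptors such as ORB are often clustered with a Hamming-based variant instead.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: Sequence[Vector]) -> List[float]:
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_vocabulary(
    descriptors: Sequence[Vector], k: int, iterations: int = 20, seed: int = 0
) -> List[List[float]]:
    """Cluster descriptors into k visual words (the cluster centers)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(list(descriptors), k)]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda i: squared_distance(d, centers[i]))
            clusters[nearest].append(d)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


# Two obvious groups of 2-D "descriptors", for illustration only.
descriptors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
vocabulary = sorted(build_vocabulary(descriptors, k=2))
print(vocabulary)
```

Each returned center plays the role of one visual word; the number of clusters `k` sets the size of the vocabulary.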
It should also be understood that a visual word of an image in the embodiments of the present application refers to the visual word in the visual word bag model that is closest to a visual feature descriptor of the image, obtained by matching and mapping the visual feature descriptor of the image to the visual words in the visual word bag model.
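The matching and mapping step amounts to a nearest-neighbor lookup, which might be sketched as follows. The squared Euclidean distance is an assumed choice (for 0/1 vectors it equals the Hamming distance), and the vocabulary and descriptors are toy values.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Index of the visual word closest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda i: squared_distance(descriptor, vocabulary[i]),
    )


def to_visual_words(descriptors: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[int]:
    """Map every descriptor of an image to its nearest visual word."""
    return [nearest_word(d, vocabulary) for d in descriptors]


word_vocabulary = [
    (0, 0, 0, 0, 0, 0),
    (1, 1, 1, 0, 0, 0),
    (1, 1, 1, 1, 1, 1),
]
image_descriptors = [
    (0, 0, 0, 0, 0, 1),
    (1, 1, 0, 0, 0, 0),
    (1, 1, 1, 1, 0, 1),
    (0, 1, 1, 0, 0, 0),
]
print(to_visual_words(image_descriptors, word_vocabulary))  # [0, 1, 2, 1]
```

An image is then represented by the list of word indices rather than by its raw descriptors.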
Optionally, an image may have one or more visual words, which is not limited in this embodiment of the present application.
It should also be understood that, in the embodiments of the present application, classifying a plurality of images according to different classification methods yields the image category of each image.
As an optional embodiment, if images are classified by scene, the image categories may include forest scenes, suburban scenes, indoor scenes, and the like.
As another optional embodiment, if images are classified by weather, the image categories may include sunny, rainy, snowy, and the like.
还应理解,由于同一个图像中出现的不同视觉词可能对辨识该图像具有不同的作用,不同图像中出现的相同视觉词可能对辨识这两个图像具有相同的作用,因此,本申请实施例中图像类别对应的视觉停用词是指对于辨识某种图像类别的图像无显著作用、或影响图像识别的视觉词,即与该图像类别的图像无关的视觉词。It should also be understood that since different visual words appearing in the same image may have different effects on recognizing the image, the same visual words appearing in different images may have the same effect on recognizing the two images. A visual stop word corresponding to a medium image category refers to a visual word that has no significant effect on an image that recognizes a certain image category, or affects image recognition, that is, a visual word that is not related to an image of the image category.
应理解,本申请实施例中所述的与某种图像类别无关的视觉停用词,是指与该种图像类别的图像的相关性低于预设阈值的视觉词。It should be understood that the visual stop words that are not related to a certain image category described in the embodiments of the present application refer to visual words whose correlation with the image of the image category is lower than a preset threshold.
可选地,图像类别对应的视觉停用词可以包括一个或多个视觉词,本申请实施例对此不作限定。Optionally, the visual stop words corresponding to the image categories may include one or more visual words, which are not limited by the embodiment of the present application.
例如,在森林场景和郊区场景下,几乎每个图像都包含大量的树木,从图像中的树木提取的特征点对辨识该图像是森林场景还是郊区场景的辨识度较低,因此,树木可以为森林类或郊区类的视觉停用词。For example, in forest scenes and suburban scenes, almost every image contains a large number of trees. The feature points extracted from the trees in the image are less recognizable to identify whether the image is a forest scene or a suburban scene. Therefore, the trees can be Visual stop words for forest or suburban categories.
又例如,在雨天,图像中会留下雨水降落的痕迹,从图像中的雨水提取的特征点也会对该图像的多个视觉词造成污染,因此,雨水可以为雨天的视觉停用词。For another example, on rainy days, the image will leave traces of rain falling. The feature points extracted from the rainwater in the image will also pollute multiple visual words of the image. Therefore, the rainwater can be a visual stoppage for rainy days.
还应理解,本申请实施例中的图像的目标视觉词包括从该图像的多个视觉词中除去该图像的图像类别对应的视觉停用词后的视觉词。It should also be understood that the target visual word of the image in the embodiment of the present application includes a visual word after the visual stop word corresponding to the image category of the image is removed from the plurality of visual words of the image.
可选地,图像的目标视觉词可以包括一个或多个视觉词,本申请实施例对此不作限定。Optionally, the target visual word of the image may include one or more visual words, which is not limited by the embodiment of the present application.
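The removal of visual stop words described above amounts to a filtered lookup. The following is a minimal sketch; the function name `target_visual_words` and the representation of the stop word dictionary as a mapping from image category to a set of visual words are assumptions made for illustration.

```python
def target_visual_words(visual_words, image_category, stop_word_dictionary):
    """Return the visual words of an image without its category's stop words.

    visual_words         -- list of visual words of the image
    image_category       -- category label of the image, e.g. "forest"
    stop_word_dictionary -- dict mapping category -> set of visual stop words
    """
    # A category with no entry in the dictionary has no stop words.
    stop_words = stop_word_dictionary.get(image_category, set())
    return [word for word in visual_words if word not in stop_words]
```

With a dictionary such as `{"forest": {"tree"}, "rainy": {"rain"}}`, a forest image keeps every visual word except `"tree"`, while the same visual words in an image of another category are left untouched.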
It should also be understood that a positive sample image set in the embodiments of the present application includes manually annotated images that can be regarded as highly similar or identical.
For example, two images of the same object captured in different scenes, such as a school on a rainy day and the school on a snowy day.
For another example, two images of the same scene captured at different times, such as the current pose and a historical pose of the same scene in loop closure detection.
In a possible implementation, before the determining, according to the image category information of the image to be retrieved and the stop word dictionary, of the visual stop words corresponding to the image category of the image to be retrieved, the method further includes: obtaining a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of the training image to the visual words in the bag of visual words model, the image category information of each training image indicates the image category of the training image, the positive sample image set information indicates at least one positive sample image set, and each positive sample image set includes a plurality of manually annotated similar training images in the training image library; and generating the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.
In a possible implementation, the generating of the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word dictionary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
As an optional embodiment, for each image category of the training image library, the image retrieval apparatus may use, as the visual stop words corresponding to that image category, at least one visual word that has the lowest correlation with that image category among the plurality of visual words of the training image library.
As another optional embodiment, for each image category of the training image library, the image retrieval apparatus may use, as the visual stop words corresponding to that image category, at least one visual word whose correlation with that image category is less than a first preset threshold among the plurality of visual words of the training image library.
Optionally, the visual stop words corresponding to each image category may include one or more visual words, which is not limited in the embodiments of the present application.
Optionally, the stop word dictionary may include a mapping relationship between each image category and the visual stop words corresponding to that image category.
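The two optional selection rules above (lowest correlation, or correlation below the first preset threshold) and the resulting category-to-stop-word mapping can be sketched as follows. The function name `build_stop_word_dictionary`, its parameters `threshold` and `k`, and the nested-dict layout of the correlation scores are hypothetical; the correlation values themselves are taken as given.

```python
def build_stop_word_dictionary(correlation, threshold=None, k=1):
    """Build a mapping from image category to its set of visual stop words.

    correlation -- dict: category -> dict: visual word -> correlation score
    threshold   -- if given, every word scoring below it becomes a stop word
    k           -- otherwise, the k lowest-scoring words become stop words
    """
    dictionary = {}
    for category, scores in correlation.items():
        if threshold is not None:
            # Rule 2: correlation less than the first preset threshold.
            stop_words = {w for w, s in scores.items() if s < threshold}
        else:
            # Rule 1: the k words with the lowest correlation.
            ranked = sorted(scores, key=scores.get)
            stop_words = set(ranked[:k])
        dictionary[category] = stop_words
    return dictionary
```

The threshold rule can yield an empty stop word set for a category in which every visual word is informative, whereas the lowest-k rule always selects k words when at least k are available.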
It should be understood that different visual words appearing in the same image may play different roles in identifying that image, and the same visual word appearing in different images may play the same role in identifying those images.
In a possible implementation, the determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, of the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library includes: determining the correlation between a first image category and a first visual word according to the probability that a first event occurs, the probability that a second event occurs, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library both include the first visual word, the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
For example, assume that the training image set contains P training images in total, M image categories, and L visual words, and that N of the P training images belong to the first image category. The correlation between the first visual word and the training images of the first image category can then be determined by formulas (1) to (6):
[Formulas (1) to (6) appear in the original publication as images PCTCN2017112956-appb-000001 to PCTCN2017112956-appb-000006 and are not reproduced in this text.]
where x denotes the first event, namely that the plurality of visual words of a first training image among the N training images and the plurality of visual words of a second training image, among the P training images other than the first training image, both include a first visual word of the L visual words; y denotes the second event, namely that the first training image and the second training image belong to the same positive sample image set; count(x) denotes the number of times the first event occurs; count(y) denotes the number of times the second event occurs; count(x, y) denotes the number of times the first event and the second event occur simultaneously; p(x) denotes the probability that the first event occurs; p(y) denotes the probability that the second event occurs; p(x, y) denotes the probability that the first event and the second event occur simultaneously; PMI(x, y) denotes the pointwise mutual information of the first event and the second event; H(y) denotes the information entropy of the second event; RATEPMI(x, y) denotes the pointwise mutual information rate of the first event and the second event, that is, the correlation between the first visual word and the first image category; and P, L, M, and N are all positive integers greater than 1.
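Since the formulas themselves are not available in this text, the following sketch only illustrates how the quantities defined above could fit together; it is not a reconstruction of formulas (1) to (6). Several points are assumptions and are marked as such in the code: the probabilities are taken as counts divided by the number of ordered image pairs N x (P - 1), PMI uses its standard definition log2(p(x, y) / (p(x) p(y))), H(y) is taken as the binary entropy of the second event, and the rate is taken as PMI(x, y) / H(y). The function name `rate_pmi` and its data layout are hypothetical.

```python
import math


def rate_pmi(images, positive_sets, category, word):
    """Illustrative correlation between a visual word and an image category.

    images        -- list of (category, set of visual words), one per image
    positive_sets -- list of sets of image indices annotated as similar
    Returns None when a probability is degenerate and the score is undefined.
    """
    total = count_x = count_y = count_xy = 0
    for i, (category_i, words_i) in enumerate(images):
        if category_i != category:
            continue  # the first training image must be of the given category
        for j, (_, words_j) in enumerate(images):
            if i == j:
                continue  # the second training image is any other image
            total += 1
            x = word in words_i and word in words_j          # first event
            y = any(i in s and j in s for s in positive_sets)  # second event
            count_x += x
            count_y += y
            count_xy += x and y
    if total == 0:
        return None
    # ASSUMPTION: probabilities are relative frequencies over all pairs.
    p_x, p_y, p_xy = count_x / total, count_y / total, count_xy / total
    if p_xy == 0 or p_y == 1:
        return None
    pmi = math.log2(p_xy / (p_x * p_y))  # standard pointwise mutual information
    # ASSUMPTION: H(y) is the binary entropy of the second event.
    h_y = -(p_y * math.log2(p_y) + (1 - p_y) * math.log2(1 - p_y))
    # ASSUMPTION: the PMI rate is the ratio PMI(x, y) / H(y).
    return pmi / h_y
```

Under these assumptions, a visual word that appears in every image of a category (such as trees in forest scenes) scores 0, because its co-occurrence says nothing about whether two images belong to the same positive sample image set, while a visual word shared only by truly similar images scores higher.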
Optionally, the image retrieval apparatus may generate the retrieval image library itself, or may obtain the retrieval image library from an image library generation apparatus, which is not limited in the embodiments of the present application.
Optionally, the retrieval image library may be obtained by training on the plurality of training images, or may be obtained by training on historical images to be retrieved that the image retrieval apparatus retrieved before the current retrieval, or may be obtained by training on other images, which is not limited in the embodiments of the present application.
As an optional embodiment, the image retrieval apparatus may generate the retrieval image library according to the plurality of visual words of each of the plurality of training images, the image category information of each training image, and the stop word dictionary. In other words, the retrieval image library is obtained by training on the plurality of training images.
Specifically, the image retrieval apparatus may determine, according to the image category information of each training image and the stop word dictionary, the visual stop words corresponding to the image category of each training image, remove those visual stop words from the plurality of visual words of the training image to obtain the target visual words of the training image, and add the target visual words of each training image to the retrieval image library.
Optionally, the image retrieval apparatus may obtain the target visual words of each training image by using the stop word dictionary in the embodiments of the present application, or may obtain the target visual words of each training image in other ways, which is not limited in the embodiments of the present application.
For example, in e-commerce, the retrieval image library includes the target visual words of all product images provided by users, and the retrieval image is an image of a product that a user wants to purchase.
As another optional embodiment, the image retrieval apparatus may add the target visual words of historical images to be retrieved, which were retrieved before S140, to the retrieval image library, so as to generate the retrieval image library.
For example, in loop closure detection, the retrieval image library includes all historical pose images, and the image to be retrieved is the current pose image.
With the image retrieval method provided in the embodiments of the present application, in a loop closure detection scenario, previously seen scenes are stored, the current image is used for retrieval to identify a loop closure, and a constraint between the current pose and a historical pose is constructed; the overall error is then reduced through optimization to obtain a globally consistent map. In an e-commerce scenario, when the name of a product is unknown, the user submits an image of the product, and the system performs retrieval based on that image and returns images with high similarity as the retrieval result.
Optionally, the image retrieval apparatus may select, in various ways, at least one retrieval image most similar to the image to be retrieved from the retrieval image library and output it as the retrieval result, which is not limited in the embodiments of the present application.
As an optional embodiment, the image retrieval apparatus may calculate the similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each retrieval image in the retrieval image library, and determine at least one retrieval image having the highest similarity to the image to be retrieved as the retrieval result.
As another optional embodiment, the image retrieval apparatus may calculate the similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each retrieval image in the retrieval image library, and determine at least one retrieval image whose similarity is greater than a second preset threshold as the retrieval result.
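The two selection rules above (highest similarity, or similarity greater than the second preset threshold) can be sketched as follows. The present application does not fix a similarity measure, so the Jaccard index over sets of visual words is used here purely as an assumed stand-in; the names `jaccard` and `retrieve`, the parameters `top_k` and `threshold`, and the dict layout of the library are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections of visual words (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query_words, library, top_k=None, threshold=None):
    """Return identifiers of retrieval images, most similar first.

    query_words -- visual words of the image to be retrieved
    library     -- dict: image identifier -> visual words of that image
    top_k       -- if given, keep only the top_k most similar images
    threshold   -- if given, keep only images with similarity > threshold
    """
    scored = sorted(
        ((jaccard(query_words, words), image_id)
         for image_id, words in library.items()),
        reverse=True,
    )
    if threshold is not None:
        scored = [(s, image_id) for s, image_id in scored if s > threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [image_id for _, image_id in scored]
```

Passing target visual words (with stop words already removed) for both the query and the library entries gives the variant in which stop words no longer influence the similarity score.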
In the image retrieval method provided in the embodiments of the present application, the target visual words of a retrieval image stored in the retrieval image library are obtained by removing, from the plurality of visual words of the retrieval image, the visual stop words corresponding to the image category of the retrieval image, which helps reduce the memory footprint of the retrieval image library.
In addition, at least one retrieval image similar to the image to be retrieved is determined according to the similarity between the target visual words of the image to be retrieved and the target visual words of the retrieval images in the retrieval image library, so as to obtain the retrieval result, which helps improve the efficiency and accuracy of image retrieval.
In a second aspect, the present application provides an image processing method, the method including:
obtaining a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of the training image to the visual words in the bag of visual words model, the image category information of each training image indicates the image category of the training image, the positive sample image set information indicates at least one positive sample image set, and each positive sample image set includes a plurality of manually annotated similar training images in the training image library; and
generating a stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, where the stop word dictionary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to the image category of each training image include visual words unrelated to that image category.
In the database generation method provided in the embodiments of the present application, a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information are obtained, and a stop word dictionary is generated according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, which helps improve the efficiency and accuracy of image retrieval.
It should be understood that the image library generation apparatus may be a second device having computing and storage functions, for example a computer, or the image library generation apparatus may be a functional module in the second device, which is not limited in the embodiments of the present application.
Optionally, the second device and the first device in the first aspect may be the same device or different devices, which is not limited in the embodiments of the present application.
Optionally, when the second device is the same as the first device, the image library generation apparatus and the image retrieval apparatus in the first aspect are different functional modules in the same device, or the image library generation apparatus is a functional module in the image retrieval apparatus.
In a possible implementation, the generating of the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word dictionary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
In a possible implementation, the determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, of the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library includes: determining the correlation between a first image category and a first visual word according to the probability that a first event occurs, the probability that a second event occurs, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library both include the first visual word, the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
In a possible implementation, after the generating of the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the method further includes: determining, according to the image category information of each training image and the stop word dictionary, the visual stop words corresponding to the image category of each training image; removing, from the plurality of visual words of each training image, the visual stop words corresponding to the image category of the training image, to obtain the target visual words of the training image; and adding the target visual words of each training image to the retrieval image library.
In a possible implementation, the obtaining of the plurality of visual words of each training image in the training image library includes: obtaining each training image; extracting a plurality of visual feature descriptors of each training image, where the plurality of visual feature descriptors describe a plurality of visual feature points of the training image and are in one-to-one correspondence with the plurality of visual feature points; obtaining the bag of visual words model; and determining, as the plurality of visual words of each training image, the visual words in the bag of visual words model that are closest to each of the plurality of visual feature descriptors.
In a third aspect, the present application provides an image retrieval apparatus configured to perform the method in the first aspect or in any possible implementation of the first aspect.
In a fourth aspect, the present application provides an image processing apparatus configured to perform the method in the second aspect or in any possible implementation of the second aspect.
In a fifth aspect, the present application provides an image retrieval apparatus, including a memory, a processor, a communication interface, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, performs the method in the first aspect or in any possible implementation of the first aspect.
In a sixth aspect, the present application provides an image processing apparatus, including a memory, a processor, a communication interface, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, performs the method in the second aspect or in any possible implementation of the second aspect.
In a seventh aspect, the present application provides a computer-readable medium for storing a computer program, the computer program including instructions for performing the method in the first aspect or in any possible implementation of the first aspect.
In an eighth aspect, the present application provides a computer-readable medium for storing a computer program, the computer program including instructions for performing the method in the second aspect or in any possible implementation of the second aspect.
In a ninth aspect, the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method in the first aspect or in any possible implementation of the first aspect.
In a tenth aspect, the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method in the second aspect or in any possible implementation of the second aspect.
In an eleventh aspect, the present application provides a chip, including an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with one another through an internal connection path, the processor is configured to execute code in the memory, and when the code is executed, the processor performs the method in the first aspect or in any possible implementation of the first aspect.
In a twelfth aspect, the present application provides a chip, including an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with one another through an internal connection path, the processor is configured to execute code in the memory, and when the code is executed, the processor performs the method in the second aspect or in any possible implementation of the second aspect.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an image retrieval method according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of an image library generation method according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of an image retrieval apparatus according to an embodiment of the present application;
FIG. 4 is a schematic block diagram of an image library generation apparatus according to an embodiment of the present application;
FIG. 5 is a schematic block diagram of another image retrieval apparatus according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of another image library generation apparatus according to an embodiment of the present application.
Detailed Description
下面将结合附图,对本申请中的技术方案进行描述。The technical solutions in the present application will be described below with reference to the accompanying drawings.
为了清楚起见,首先对本申请中所使用的术语作以解释。For the sake of clarity, the terms used in this application are first explained.
1. Visual feature points of an image
A visual feature point of an image is a pixel that remains consistent when the image undergoes transformations such as scaling, rotation, translation, and a change of viewing angle. In other words, it is one of the most easily recognized pixels in the image, such as a corner point or an edge point in a richly textured region. The quality of the visual feature points directly affects the efficiency and accuracy of image retrieval.
Optionally, the type of the visual feature points of an image may include scale-invariant feature transform (SIFT), oriented FAST and rotated BRIEF (ORB), speeded up robust features (SURF), features from accelerated segment test (FAST), and the like. This is not limited in the embodiments of this application.
Optionally, an image may have one or more visual feature points. This is not limited in the embodiments of this application.
For example, with ORB, the main steps of extracting FAST corner points from the image include: computing the brightness difference between each pixel in the image and the pixels in its neighborhood, where a pixel that differs greatly from its neighborhood pixels is more likely to be a corner point; then applying non-maximum suppression so that only the corner point with the maximum response is retained within a given region, which prevents corner points from clustering; and, because FAST corner points lack orientation and scale, adding descriptions of scale and rotation. Scale invariance is achieved by building an image pyramid, that is, downsampling the image at different levels to obtain images of different resolutions. Rotation invariance is achieved by the intensity centroid method, in which the direction vector connecting the geometric center of an image patch to the centroid of its gray values describes the orientation of the feature point.
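The intensity centroid step can be illustrated with a minimal sketch. It assumes a square grayscale patch given as a list of rows; the function name `patch_orientation` is hypothetical, and the sketch covers only the orientation computation, not corner detection or the image pyramid.

```python
import math


def patch_orientation(patch):
    """Return the orientation (in radians) of a square grayscale patch.

    The orientation is the angle of the vector from the geometric center
    of the patch to the centroid of its gray values (intensity centroid).
    """
    size = len(patch)
    center = (size - 1) / 2.0
    m10 = 0.0  # first-order moment along the column (x) axis
    m01 = 0.0  # first-order moment along the row (y) axis
    for row, line in enumerate(patch):
        for col, value in enumerate(line):
            m10 += (col - center) * value
            m01 += (row - center) * value
    # The centroid offset is (m10 / m00, m01 / m00); atan2 is unaffected by
    # the common positive factor 1 / m00, so the division is omitted.
    return math.atan2(m01, m10)


# A patch that is brighter on its right side points along the +x axis,
# and a patch that is brighter at the bottom points along the +y axis.
bright_right = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
bright_bottom = [[0, 0, 0], [0, 0, 0], [9, 9, 9]]
angle_right = patch_orientation(bright_right)
angle_bottom = patch_orientation(bright_bottom)
```

Rows increase downward in this sketch, so a positive angle corresponds to a rotation toward the bottom of the patch.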
2. Visual feature descriptors of an image
A visual feature descriptor of an image describes a visual feature point of the image by means of mathematical features.
For example, with ORB, the main steps of obtaining a visual feature descriptor of the image include: randomly selecting several pixel pairs near a visual feature point of the image, and comparing the values of the two pixels in each pair to obtain a code of 0 or 1; and then using the orientation information of the visual feature point to rotate it, which yields a robust binary-vector visual feature descriptor.
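The pixel-pair comparison can be sketched as follows. This is a simplified illustration that omits the rotation step; the helper names `make_pairs` and `binary_descriptor`, the sampling radius, and the choice of "first pixel darker than second gives 1" are assumptions rather than details taken from this application.

```python
import random


def make_pairs(num_pairs, radius, seed=0):
    """Randomly choose pixel-pair offsets (row, col) around a feature point."""
    rng = random.Random(seed)

    def offset():
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))

    return [(offset(), offset()) for _ in range(num_pairs)]


def binary_descriptor(image, row, col, pairs):
    """Compare the two pixels of every pair and emit one bit per pair."""
    bits = []
    for (dr1, dc1), (dr2, dc2) in pairs:
        first = image[row + dr1][col + dc1]
        second = image[row + dr2][col + dc2]
        bits.append(1 if first < second else 0)
    return bits


# A 5x5 image whose gray value grows from left to right and top to bottom.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
pairs = make_pairs(num_pairs=8, radius=2)
descriptor = binary_descriptor(image, 2, 2, pairs)
```

The caller is responsible for keeping the feature point at least `radius` pixels away from the image border.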
Optionally, an image may have one or more visual feature descriptors. This is not limited in the embodiments of this application.
3. Bag of visual words model
The bag of visual words model includes a plurality of visual words, each of which is a cluster center obtained by clustering the visual feature descriptors extracted from a plurality of images.
4. Visual words of an image
A visual word of an image is the visual word in the bag of visual words model that is closest to a visual feature descriptor of the image. It is obtained by matching and mapping the visual feature descriptor against the visual words in the bag of visual words model.
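As an illustration of this matching, the sketch below maps each binary descriptor to the index of its nearest cluster center. Hamming distance is assumed here because ORB-style descriptors are binary; the application itself does not prescribe a distance measure, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_visual_word(descriptor, vocabulary):
    """Index of the cluster center (visual word) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: hamming(descriptor, vocabulary[i]))


def image_visual_words(descriptors, vocabulary):
    """Map every descriptor of an image to its nearest visual word."""
    return [nearest_visual_word(d, vocabulary) for d in descriptors]


# Three cluster centers (visual words 0, 1, 2) and three descriptors.
vocabulary = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
descriptors = [
    [1, 1, 1, 0, 0, 1],  # one bit away from word 2
    [0, 1, 0, 0, 0, 0],  # one bit away from word 0
    [1, 1, 1, 1, 1, 0],  # one bit away from word 1
]
words = image_visual_words(descriptors, vocabulary)  # [2, 0, 1]
```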
Optionally, an image may have one or more visual words. This is not limited in the embodiments of this application.
5. Image category of an image
Classifying a plurality of images according to a given classification method yields the image category of each image.
In an optional embodiment, if images are classified by scene, the image categories may include forest scenes, suburban scenes, indoor scenes, and the like.
In another optional embodiment, if images are classified by weather, the image categories may include sunny days, rainy days, snowy days, and the like.
6. Visual stop words corresponding to an image category
Different visual words that appear in the same image may contribute differently to recognizing that image, and the same visual word that appears in different images may contribute equally to recognizing those images. A visual stop word corresponding to an image category is a visual word that has no significant effect on recognizing images of that category, or that interferes with image recognition; that is, a visual word unrelated to images of that category.
It should be understood that a visual stop word unrelated to an image category, as described in the embodiments of this application, is a visual word whose correlation with images of that category is lower than a preset threshold.
Optionally, the visual stop words corresponding to an image category may include one or more visual words. This is not limited in the embodiments of this application.
For example, in forest scenes and suburban scenes, almost every image contains a large number of trees, and the feature points extracted from the trees do little to distinguish whether the image is a forest scene or a suburban scene. Therefore, trees may serve as visual stop words for the forest or suburban category.
For another example, on a rainy day the image contains streaks of falling rain, and the feature points extracted from the rain contaminate the word representation of the image. Therefore, rain may serve as a visual stop word for the rainy-day category.
7. Target visual words of an image
The target visual words of an image are the visual words that remain after the visual stop words corresponding to the image category of the image are removed from the plurality of visual words of the image.
Optionally, the target visual words of an image may include one or more visual words. This is not limited in the embodiments of this application.
8. Positive sample image set
A positive sample image set includes manually labeled images that can be regarded as highly similar or identical.
For example, two images of the same object captured in different scenes, such as a school on a rainy day and the same school on a snowy day.
For another example, two images of the same scene captured at different moments, such as the current pose and a historical pose of the same scene in loop closure detection.
Scenarios to which the embodiments of this application are applicable include loop closure detection in simultaneous localization and mapping (SLAM), product image retrieval in e-commerce, and the like.
Loop closure detection stores previously visited scenes, uses the current image to retrieve and recognize a loop closure, and builds a constraint between the current pose and a historical pose; optimization then reduces the overall error to obtain a globally consistent map.
In e-commerce, when the name of a product is unknown, the user submits an image of the product, and the system performs retrieval based on that image and returns images with high similarity as the retrieval result.
FIG. 1 is a schematic flowchart of an image retrieval method 100 according to an embodiment of this application. The method may be performed by an image retrieval apparatus.
It should be understood that the image retrieval apparatus may be a first device having computing and storage functions, for example a computer, or may be a functional module in the first device. This is not limited in the embodiments of this application.
S110. Obtain a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved, where the plurality of visual words are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved against the visual words in a bag of visual words model, and the image category information indicates the image category of the image to be retrieved.
S120. Determine, based on the image category information of the image to be retrieved and a stop word library, the visual stop words corresponding to the image category of the image to be retrieved, where these visual stop words include visual words unrelated to that image category, and the stop word library includes a mapping between the image category of the image to be retrieved and the visual stop words corresponding to that image category.
S130. Remove the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved, to obtain the target visual words of the image to be retrieved.
S140. Determine a retrieval result based on the target visual words of the image to be retrieved and a search image library, where the search image library includes a plurality of search images.
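Steps S120 and S130 amount to a lookup followed by a filter. The sketch below is one possible reading, assuming that visual words are integer indices and that the stop word library is a dictionary from category name to a set of word indices; both representations and the function name are assumptions.

```python
def target_visual_words(visual_words, image_category, stop_word_library):
    """Return the visual words left after removing category stop words."""
    # S120: look up the visual stop words mapped to the image category.
    stop_words = stop_word_library.get(image_category, set())
    # S130: drop every visual word that is a stop word for this category.
    return [word for word in visual_words if word not in stop_words]


# Hypothetical stop word library: category -> indices of visual stop words.
stop_word_library = {"forest": {3, 7}, "rainy": {5}}

query_words = [1, 3, 4, 7, 7, 9]
targets = target_visual_words(query_words, "forest", stop_word_library)
# targets == [1, 4, 9]
```

A category that has no entry in the library leaves the visual words unchanged, which is a design choice of this sketch rather than a requirement stated in the text.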
Optionally, in S110, the image retrieval apparatus may obtain the plurality of visual words of the image to be retrieved in a plurality of manners. This is not limited in the embodiments of this application.
In an optional embodiment, the image retrieval apparatus may obtain the image to be retrieved; extract a plurality of visual feature descriptors of the image, where the descriptors describe a plurality of visual feature points of the image and are in one-to-one correspondence with those points; obtain the bag of visual words model; and, for each of the visual feature descriptors, determine the visual word in the model that is closest to that descriptor as one of the plurality of visual words of the image to be retrieved.
Optionally, the bag of visual words model may be an existing trained model, or may be obtained by the image retrieval apparatus by clustering the visual feature descriptors of the training images in a training image set. This is not limited in the embodiments of this application.
Optionally, the image retrieval apparatus may obtain the image to be retrieved in a plurality of manners, for example by capturing it with a camera, reading it from a local disk, downloading it from a network, or in another manner. This is not limited in the embodiments of this application.
Optionally, the image to be retrieved that is obtained by the image retrieval apparatus may be an image that has undergone undistortion, denoising, or other preprocessing. This is not limited in the embodiments of this application.
Optionally, in S110, the image retrieval apparatus may obtain the image category information of the image to be retrieved in a plurality of manners. This is not limited in the embodiments of this application.
In an optional embodiment, the image retrieval apparatus may determine the image category information based on the image to be retrieved and an image classification model, where the image classification model includes a mapping between the image to be retrieved and its image category.
In another optional embodiment, the image retrieval apparatus may determine the image category information based on the image to be retrieved and a preset classification algorithm.
In still another optional embodiment, the image retrieval apparatus may obtain manually labeled image category information of the image to be retrieved.
Optionally, the image category information of the image to be retrieved may be one or more bits, that is, the one or more bits indicate the image category of the image to be retrieved. This is not limited in the embodiments of this application.
In an optional embodiment, the image category information may be 2 bits. For example, "00" indicates that the image to be retrieved is a first-category image, "01" indicates a second-category image, "10" indicates a third-category image, and "11" indicates a fourth-category image.
Optionally, before S120, the image retrieval apparatus may obtain the stop word library.
Optionally, the stop word library may include a mapping between an identifier of each of a plurality of image categories and the visual stop words corresponding to that identifier.
Optionally, the image retrieval apparatus may generate the stop word library itself, or may obtain it from an apparatus for generating an image library. This is not limited in the embodiments of this application.
Optionally, the image retrieval apparatus may obtain a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information. The plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image against the visual words in the bag of visual words model. The image category information of each training image indicates the image category of that training image. The positive sample image set information indicates at least one positive sample image set, and each positive sample image set includes a plurality of manually labeled similar training images in the training image library. The apparatus then generates the stop word library based on the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information. The stop word library includes a mapping between the image category of each training image and the visual stop words corresponding to that image category, and those visual stop words include visual words unrelated to that image category.
Specifically, to generate the stop word library from these inputs, the image retrieval apparatus may first determine, based on the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image. The apparatus may then generate the stop word library based on that correlation.
In an optional embodiment, the image retrieval apparatus may determine the correlation between a first image category and a first visual word based on the probability that a first event occurs, the probability that a second event occurs, and the probability that the first event and the second event occur simultaneously. The plurality of image categories of the training image library include the first image category, and the plurality of visual words of the training image library include the first visual word. The first event is that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library both include the first visual word, where the image category of the first training image is the first image category. The second event is that the first training image and the second training image belong to the same positive sample image set.
For example, assume that the training image library contains P training images, M image categories, and L visual words, and that N of the P training images belong to the first category. The correlation between the first visual word and training images of the first image category can then be determined by formulas (1) to (6):
Figure PCTCN2017112956-appb-000007
Figure PCTCN2017112956-appb-000008
Figure PCTCN2017112956-appb-000009
Figure PCTCN2017112956-appb-000010
Figure PCTCN2017112956-appb-000011
Figure PCTCN2017112956-appb-000012
Here, x denotes the first event: the plurality of visual words of a first training image among the N training images and the plurality of visual words of a second training image, which is one of the P training images other than the first training image, both include a first visual word among the L visual words. y denotes the second event: the first training image and the second training image belong to the same positive sample image set. count(x) is the number of times the first event occurs, count(y) is the number of times the second event occurs, and count(x, y) is the number of times the first event and the second event occur simultaneously. p(x) is the probability that the first event occurs, p(y) is the probability that the second event occurs, and p(x, y) is the probability that the first event and the second event occur simultaneously. PMI(x, y) is the pointwise mutual information of the first event and the second event, H(y) is the information entropy of the second event, and RATE_PMI(x, y) is the pointwise mutual information rate of the first event and the second event, that is, the correlation between the first visual word and the first image category. P, L, M, and N are all positive integers greater than 1.
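Because formulas (1) to (6) are available here only as image placeholders, the following sketch is a guess at how the listed quantities could fit together, not a reproduction of the formulas. It assumes that the probabilities are relative frequencies over a hypothetical number of examined image pairs (`total_pairs`), that PMI takes its standard form log2(p(x, y) / (p(x) p(y))), that H(y) is the binary entropy of the second event, and that RATE_PMI is the ratio PMI / H(y). Each of these is an assumption.

```python
import math


def rate_pmi(count_x, count_y, count_xy, total_pairs):
    """Pointwise mutual information rate of events x and y (assumed form)."""
    # Assumed: probabilities are relative frequencies over all image pairs.
    p_x = count_x / total_pairs
    p_y = count_y / total_pairs
    p_xy = count_xy / total_pairs
    if not 0.0 < p_y < 1.0 or p_x == 0.0:
        raise ValueError("p(x) must be positive and p(y) strictly inside (0, 1)")
    if p_xy == 0.0:
        # The events never co-occur: PMI tends to minus infinity.
        return float("-inf")
    # Standard pointwise mutual information, in bits.
    pmi = math.log2(p_xy / (p_x * p_y))
    # Assumed: H(y) is the binary entropy of "y occurs / y does not occur".
    h_y = -p_y * math.log2(p_y) - (1.0 - p_y) * math.log2(1.0 - p_y)
    return pmi / h_y


# Sharing the visual word makes a positive pair more likely: score above 0.
related = rate_pmi(count_x=50, count_y=20, count_xy=20, total_pairs=100)
# Sharing the visual word makes a positive pair less likely: score below 0.
unrelated = rate_pmi(count_x=50, count_y=20, count_xy=2, total_pairs=100)
```

Under these assumptions, a visual word whose score for a category is low carries little information about whether two images match, which is what makes it a candidate visual stop word for that category.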
Optionally, the image retrieval apparatus may obtain the positive sample image set information in a plurality of manners. This is not limited in the embodiments of this application.
In an optional embodiment, the image retrieval apparatus may obtain one or more bits carried in each training image and derive the positive sample image set information from those bits. For example, if a first training image and a second training image among the plurality of training images carry the same bits, the apparatus determines that the first training image and the second training image belong to the same positive sample image set.
In another optional embodiment, the image retrieval apparatus may obtain first information, where the first information includes a mapping between an identifier of each of a plurality of positive sample image sets and the identifiers of the training images included in that positive sample image set. The image retrieval apparatus may then obtain the positive sample image set information from the first information.
Optionally, the image retrieval apparatus may determine the visual stop words corresponding to each image category from the plurality of visual words of the training image library in a plurality of manners. This is not limited in the embodiments of this application.
In an optional embodiment, the image retrieval apparatus may take, as the visual stop words corresponding to each image category, at least one visual word that has the lowest correlation with that image category among the plurality of visual words of the training image library.
In another optional embodiment, the image retrieval apparatus may take, as the visual stop words corresponding to each image category, at least one visual word whose correlation with that image category is lower than a first preset threshold among the plurality of visual words of the training image library.
Optionally, the visual stop words corresponding to each image category may be one or more visual words. This is not limited in the embodiments of this application.
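The two selection rules can be sketched as follows, assuming that the correlation scores for one image category are held in a dictionary from visual word index to score. The function names, the dictionary layout, and the example numbers are illustrative assumptions.

```python
def stop_words_least_correlated(correlations, k):
    """Pick the k visual words with the lowest correlation scores."""
    ranked = sorted(correlations, key=correlations.get)
    return set(ranked[:k])


def stop_words_below_threshold(correlations, threshold):
    """Pick every visual word whose correlation is below the threshold."""
    return {word for word, score in correlations.items() if score < threshold}


def build_stop_word_library(correlation_by_category, threshold):
    """Map each image category to its visual stop words (threshold rule)."""
    return {
        category: stop_words_below_threshold(scores, threshold)
        for category, scores in correlation_by_category.items()
    }


# Hypothetical correlation scores of four visual words for two categories.
correlation_by_category = {
    "forest": {1: 0.9, 2: -0.3, 3: 0.05, 4: 0.4},
    "rainy": {1: 0.02, 2: 0.6, 3: 0.7, 4: 0.5},
}
library = build_stop_word_library(correlation_by_category, threshold=0.1)
# library == {"forest": {2, 3}, "rainy": {1}}
```

The first rule fixes the number of stop words per category, while the second lets that number vary with the data.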
Optionally, before S140, the image retrieval apparatus may obtain the search image library.
Optionally, the search image library includes a plurality of search images and the target visual words corresponding to each of them, where the target visual words of each search image are obtained by removing the visual stop words corresponding to the image category of that search image from the plurality of visual words of that search image.
Optionally, the image retrieval apparatus may generate the search image library itself, or may obtain it from the apparatus for generating an image library. This is not limited in the embodiments of this application.
Optionally, the search image library may be built from the plurality of training images, from historical images to be retrieved that the image retrieval apparatus retrieved before the current retrieval, or from other images. This is not limited in the embodiments of this application.
In an optional embodiment, the image retrieval apparatus may generate the search image library based on the plurality of visual words of each of the plurality of training images, the image category information of each training image, and the stop word library. In other words, the search image library is built from the plurality of training images.
Specifically, the image retrieval apparatus may determine, based on the image category information of each training image and the stop word library, the visual stop words corresponding to the image category of that training image; remove those visual stop words from the plurality of visual words of that training image to obtain its target visual words; and add the target visual words of each training image to the search image library.
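A minimal sketch of this library-building loop is shown below. It assumes that each training image is stored as a (category, visual words) pair keyed by an image identifier and that the stop word library maps categories to sets of word indices; the data layout and the function name are assumptions.

```python
def build_search_image_library(training_images, stop_word_library):
    """Store, for every image, only its target visual words."""
    search_library = {}
    for image_id, (category, visual_words) in training_images.items():
        # Visual stop words mapped to this image's category.
        stop_words = stop_word_library.get(category, set())
        # Target visual words: everything that is not a stop word.
        search_library[image_id] = [
            word for word in visual_words if word not in stop_words
        ]
    return search_library


# Hypothetical inputs: image id -> (category, visual word indices).
training_images = {
    "img_a": ("forest", [1, 3, 4, 7]),
    "img_b": ("rainy", [2, 5, 5, 6]),
}
stop_word_library = {"forest": {3, 7}, "rainy": {5}}
search_library = build_search_image_library(training_images, stop_word_library)
# search_library == {"img_a": [1, 4], "img_b": [2, 6]}
```

Each stored entry is no longer than the image's original list of visual words, since stop words are dropped before storage.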
Optionally, the image retrieval apparatus may obtain the target visual words of each training image by using the stop word library in the embodiments of this application, or may obtain them in another manner. This is not limited in the embodiments of this application.
For example, in e-commerce, the search image library includes the target visual words of all product images provided by users, and a search image is an image of a product that the user wants to purchase.
In another optional embodiment, the image retrieval apparatus may add the target visual words of historical images to be retrieved, which were retrieved before S140, to the search image library to generate the search image library.
For example, in loop closure detection, the search image library includes all historical pose images, and the image to be retrieved is the current pose image.
With the image retrieval method provided in the embodiments of this application, in a loop closure detection scenario, previously visited scenes are stored, the current image is used to retrieve and recognize a loop closure, and a constraint is built between the current pose and a historical pose; optimization then reduces the overall error to obtain a globally consistent map. In an e-commerce scenario, when the name of a product is unknown, the user submits an image of the product, and the system performs retrieval based on that image and returns images with high similarity as the retrieval result.
Optionally, the image retrieval apparatus may select, in a plurality of manners, at least one search image that is most similar to the image to be retrieved from the search image library and output it as the retrieval result. This is not limited in the embodiments of this application.
In an optional embodiment, the image retrieval apparatus may calculate the similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image with the highest similarity to the image to be retrieved as the retrieval result.
In another optional embodiment, the image retrieval apparatus may calculate the similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image whose similarity is greater than a second preset threshold as the retrieval result.
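Both selection rules can be sketched as below. The text does not name a similarity measure, so cosine similarity between visual word histograms is assumed here purely for illustration; the function names and the example library are likewise hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two visual word histograms."""
    hist_a, hist_b = Counter(words_a), Counter(words_b)
    dot = sum(hist_a[word] * hist_b[word] for word in hist_a)
    norm_a = math.sqrt(sum(v * v for v in hist_a.values()))
    norm_b = math.sqrt(sum(v * v for v in hist_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_words, search_library, k=1):
    """Return the ids of the k search images most similar to the query."""
    ranked = sorted(
        search_library,
        key=lambda image_id: cosine_similarity(query_words, search_library[image_id]),
        reverse=True,
    )
    return ranked[:k]


def retrieve_above_threshold(query_words, search_library, threshold):
    """Return the ids of search images whose similarity exceeds the threshold."""
    return [
        image_id
        for image_id, words in search_library.items()
        if cosine_similarity(query_words, words) > threshold
    ]


search_library = {"a": [1, 4, 9], "b": [2, 5, 6], "c": [1, 4, 8]}
best = retrieve_top_k([1, 4, 9], search_library, k=2)               # ["a", "c"]
similar = retrieve_above_threshold([1, 4, 9], search_library, 0.5)  # ["a", "c"]
```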
In the image retrieval method provided in the embodiments of this application, the target visual words of the image to be retrieved are obtained by removing the visual stop words corresponding to the image category of the image to be retrieved from its plurality of visual words. That is, the visual words that have no significant effect on recognizing the image, or that interfere with image recognition, are removed, so the remaining target visual words contribute significantly to recognizing the image to be retrieved. Therefore, performing retrieval against the search image library with the target visual words of the image to be retrieved helps improve the efficiency and accuracy of image retrieval.
In addition, the target visual words of each search image stored in the search image library are obtained by removing the visual stop words corresponding to the image category of that search image from its plurality of visual words, which helps reduce the memory occupied by the search image library.
FIG. 2 is a schematic flowchart of an image library generation method 200 provided by an embodiment of the present application. The method 200 may be performed by an image library generating device, which is not limited in the embodiments of the present application.
It should be understood that the image library generating device may be a second device having computing and storage functions, for example a computer, or may be a functional module in the second device, which is not limited in the embodiments of the present application.
Optionally, the second device and the first device described in FIG. 1 may be the same device or different devices, which is not limited in the embodiments of the present application.
Optionally, when the second device is the same as the first device, the image library generating device and the image retrieval device described in FIG. 1 are different functional modules in the same device, or the image library generating device is a functional module in the image retrieval device.
S210: Acquire a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information. The plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in the visual word bag model. The image category information of each training image indicates the image category of that training image. The positive sample image set information indicates at least one positive sample image set, and each positive sample image set includes a plurality of manually labeled similar training images in the training image library.
S220: Generate a stop word lexicon according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information. The stop word lexicon includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to an image category include visual words that are unrelated to that image category.
In the image library generation method provided by the embodiments of the present application, a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information are acquired, and a stop word lexicon is generated from them. The stop word lexicon is used to obtain the target visual words of an image to be retrieved, which helps improve the efficiency and accuracy of image retrieval.
Optionally, in S210, the image library generating device may acquire the plurality of visual words of a training image in a variety of ways, which is not limited in the embodiments of the present application.
As an optional embodiment, the image library generating device may acquire a training image and extract a plurality of visual feature descriptors of the training image, where the visual feature descriptors describe a plurality of visual feature points of the training image and correspond one-to-one with those feature points. The device may then acquire the visual word bag model and, for each of the visual feature descriptors, find the visual word in the model that is closest to that descriptor; the visual words found in this way are determined as the plurality of visual words of the training image.
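The nearest-word mapping described above can be sketched in a few lines. The sketch assumes that descriptors and visual words are plain numeric vectors and that "closest" means smallest Euclidean distance; the application does not specify the descriptor type or the distance, and the names `squared_distance` and `quantize` are illustrative.

```python
from typing import List, Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(descriptors: Sequence[Sequence[float]],
             vocabulary: Sequence[Sequence[float]]) -> List[int]:
    """Map each descriptor to the index of its nearest visual word (cluster center)."""
    return [
        min(range(len(vocabulary)), key=lambda i: squared_distance(d, vocabulary[i]))
        for d in descriptors
    ]


# Hypothetical 2-D vocabulary of three visual words and three descriptors.
vocabulary = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
descriptors = [[1.0, 1.0], [9.0, 9.0], [1.0, 8.0]]
visual_words = quantize(descriptors, vocabulary)
```

A brute-force search like this costs one distance computation per descriptor and visual word; real vocabularies are usually large enough that an approximate nearest-neighbor index would be used instead.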
Optionally, the visual word bag model may be an existing trained visual word bag model, or may be obtained by the image library generating device itself by clustering a plurality of visual feature descriptors corresponding to a plurality of images, which is not limited in the embodiments of the present application.
Optionally, the image library generating device may acquire training images in a variety of ways, for example by capturing them with a camera, reading them from a local disk, downloading them over a network, or by other means, which is not limited in the embodiments of the present application.
Optionally, the plurality of images acquired by the image library generating device may be images that have undergone distortion removal, denoising, or other preprocessing operations, which is not limited in the embodiments of the present application.
Optionally, in S210, the image library generating device may acquire the image category information of a training image in a variety of ways, which is not limited in the embodiments of the present application.
As an optional embodiment, the image library generating device may determine the image category information of the training image according to the training image and an image classification model, where the image classification model includes a mapping relationship between the training image and the image category of the training image.
As another optional embodiment, the image library generating device may determine the image category information of the training image according to the training image and a preset classification algorithm.
As yet another optional embodiment, the image library generating device may acquire manually labeled image category information of the training image.
Optionally, the image category information of the training image may be one or more bits, that is, the one or more bits indicate the image category of the training image, which is not limited in the embodiments of the present application.
As an optional embodiment, the image category information of the training image may be 2 bits. For example, "00" indicates that the training image is a first-category image, "01" indicates a second-category image, "10" indicates a third-category image, and "11" indicates a fourth-category image.
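The 2-bit encoding in this example can be written down directly. The enum and function names below are illustrative assumptions; only the bit-to-category mapping comes from the text.

```python
from enum import IntEnum


class ImageCategory(IntEnum):
    """Four image categories, numbered by their 2-bit code."""
    FIRST = 0b00
    SECOND = 0b01
    THIRD = 0b10
    FOURTH = 0b11


def encode_category(category: ImageCategory) -> str:
    """Return the 2-bit string that indicates the category."""
    return format(int(category), "02b")


def decode_category(bits: str) -> ImageCategory:
    """Recover the category from its 2-bit string."""
    return ImageCategory(int(bits, 2))


third_bits = encode_category(ImageCategory.THIRD)
second = decode_category("01")
```

With n bits, up to 2^n image categories can be distinguished, which is why the text allows "one or more bits".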
Optionally, in S220, the image library generating device may generate the stop word lexicon in two steps. First, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, it determines the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library, where the image categories of the library include the image category of each training image and the visual words of the library include the visual words of each training image. Second, it generates the stop word lexicon according to those correlations.
As an optional embodiment, the image library generating device may determine the correlation between a first image category and a first visual word according to the probability that a first event occurs, the probability that a second event occurs, and the probability that the first event and the second event occur simultaneously. The plurality of image categories of the training image library include the first image category, and the plurality of visual words of the training image library include the first visual word. The first event is that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library both include the first visual word, where the image category of the first training image is the first image category. The second event is that the first training image and the second training image belong to the same positive sample image set.
For example, assume that the training image set contains P training images in total, M image categories, and L visual words, and that N of the P training images belong to the first category. The correlation between the first visual word and the training images of the first image category can then be determined by formulas (1) to (6) above.
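To make the roles of the two events concrete, the following sketch estimates the three probabilities by counting over image pairs and combines them with pointwise mutual information. This is explicitly a stand-in: formulas (1) to (6) are defined elsewhere in the application and may combine the probabilities differently. The `TrainingImage` type, the pair-counting scheme, and the function name `correlation` are all assumptions.

```python
import math
from typing import List, NamedTuple, Optional, Set


class TrainingImage(NamedTuple):
    words: Set[int]               # visual-word IDs of the image
    category: str                 # image category
    positive_set: Optional[int]   # positive sample set ID, or None if in no set


def correlation(word: int, category: str, images: List[TrainingImage]) -> float:
    """PMI-style correlation between a visual word and an image category (assumed form)."""
    # Ordered pairs: the first image has the given category, the second is any other image.
    pairs = [(a, b) for i, a in enumerate(images) if a.category == category
             for j, b in enumerate(images) if i != j]
    if not pairs:
        return float("-inf")
    # First event: both images contain the word.
    first = [word in a.words and word in b.words for a, b in pairs]
    # Second event: both images belong to the same positive sample set.
    second = [a.positive_set is not None and a.positive_set == b.positive_set
              for a, b in pairs]
    n = len(pairs)
    p_first = sum(first) / n
    p_second = sum(second) / n
    p_joint = sum(x and y for x, y in zip(first, second)) / n
    if p_joint == 0.0:
        return float("-inf")
    return math.log(p_joint / (p_first * p_second))


# Hypothetical data: word 9 appears everywhere, word 1 only within positive set 0.
images = [
    TrainingImage({1, 9}, "building", 0),
    TrainingImage({1, 9}, "building", 0),
    TrainingImage({2, 9}, "building", 1),
    TrainingImage({2, 9}, "building", 1),
]
common_word_score = correlation(9, "building", images)
distinctive_word_score = correlation(1, "building", images)
```

In this toy data the word that occurs in every image carries no information about whether two images are similar and scores zero, while the word shared only by images of the same positive set scores higher; the former is the kind of word one would expect to end up as a visual stop word.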
Optionally, the image library generating device may acquire the positive sample image set information in a variety of ways, which is not limited in the embodiments of the present application.
As an optional embodiment, the image library generating device may acquire one or more bits carried in each training image and obtain the positive sample image set information according to those bits. For example, if a first training image and a second training image among the plurality of training images carry the same bits, the first training image and the second training image are determined to belong to the same positive sample image set.
As another optional embodiment, the image library generating device may acquire first information, where the first information includes a mapping relationship between the identifier of each positive sample image set among a plurality of positive sample image sets and the identifiers of the training images included in that positive sample image set. The image library generating device may then obtain the positive sample image set information according to the first information.
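Both ways of obtaining the positive sample image set information reduce to simple grouping. The sketch below assumes that the bits carried by each training image are available as a string tag and that the first information is a dictionary from set identifier to image identifiers; these representations and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Set


def sets_from_tags(image_tags: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group training images that carry the same bits into one positive sample set."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for image_id, tag in image_tags.items():
        groups[tag].add(image_id)
    return dict(groups)


def image_to_set(first_information: Dict[str, Set[str]]) -> Dict[str, str]:
    """Invert 'set ID -> image IDs' into 'image ID -> set ID' for quick lookup."""
    return {
        image_id: set_id
        for set_id, members in first_information.items()
        for image_id in members
    }


# Option 1: bits carried by each training image.
by_tag = sets_from_tags({"img1": "01", "img2": "01", "img3": "10"})

# Option 2: explicit first information (set identifier -> image identifiers).
lookup = image_to_set({"set_a": {"img1", "img2"}, "set_b": {"img3"}})
same_set = lookup["img1"] == lookup["img2"]
```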
Optionally, the image library generating device may determine the visual stop words corresponding to each image category from the plurality of visual words of the training image library in a variety of ways, which is not limited in the embodiments of the present application.
As an optional embodiment, the image library generating device may take, from the plurality of visual words of the training image library, at least one visual word having the lowest correlation with each image category as the visual stop words corresponding to that image category.
As another optional embodiment, the image library generating device may take, from the plurality of visual words of the training image library, at least one visual word whose correlation with each image category is less than a first preset threshold as the visual stop words corresponding to that image category.
Optionally, the visual stop words corresponding to each image category may be one or more visual words, which is not limited in the embodiments of the present application.
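The two selection rules above, lowest correlation or correlation below a first preset threshold, can be sketched for a single image category as follows. The input is assumed to be a dictionary from visual-word ID to its correlation with that category; the function names and sample values are hypothetical.

```python
from typing import Dict, Set


def stop_words_lowest(correlations: Dict[int, float], count: int = 1) -> Set[int]:
    """Select the `count` visual words with the lowest correlation to the category."""
    ranked = sorted(correlations, key=lambda word: correlations[word])
    return set(ranked[:count])


def stop_words_below(correlations: Dict[int, float], threshold: float) -> Set[int]:
    """Select every visual word whose correlation is below the threshold."""
    return {word for word, value in correlations.items() if value < threshold}


# Hypothetical correlations of three visual words with one image category.
correlations = {10: 0.9, 11: 0.1, 12: 0.4}
lowest = stop_words_lowest(correlations, count=1)
below = stop_words_below(correlations, threshold=0.5)
```

Repeating the selection for every image category and storing the results under the category yields the category-to-stop-word mapping that makes up the stop word lexicon.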
Optionally, after S220, the image library generating device may determine the visual stop words corresponding to the image category of each training image according to the image category information of that training image and the stop word lexicon, remove those visual stop words from the plurality of visual words of the training image to obtain the target visual words of the training image, and add the target visual words of each training image to the search image library.
In the image retrieval method provided by the embodiments of the present application, the target visual words of each search image stored in the search image library are obtained by removing, from the plurality of visual words of that search image, the visual stop words corresponding to its image category, which helps reduce the memory occupied by the search image library.
The image retrieval method and the image processing method provided by the embodiments of the present application have been described in detail above with reference to FIG. 1 and FIG. 2. The image retrieval device and the image processing device provided by the embodiments of the present application are described below with reference to FIG. 3 to FIG. 6.
FIG. 3 is a schematic block diagram of an image retrieval device 300 provided by an embodiment of the present application. The device 300 includes:
an acquiring unit 310, configured to acquire a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved, where the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to the visual words in a visual word bag model, and the image category information of the image to be retrieved indicates the image category of the image to be retrieved;
a processing unit 320, configured to determine the visual stop words corresponding to the image category of the image to be retrieved according to the image category information acquired by the acquiring unit 310 and a stop word lexicon, where the visual stop words corresponding to the image category of the image to be retrieved include visual words unrelated to that image category, and the stop word lexicon includes a mapping relationship between the image category of the image to be retrieved and the visual stop words corresponding to that image category; and to remove those visual stop words from the plurality of visual words of the image to be retrieved acquired by the acquiring unit 310, to obtain the target visual words of the image to be retrieved;
a retrieval unit 330, configured to determine a search result according to the target visual words of the image to be retrieved obtained by the processing unit 320 and a search image library, where the search image library includes a plurality of search images.
Optionally, the search image library includes a mapping relationship between the plurality of search images and the target visual words corresponding to each of the search images, where the target visual words corresponding to each search image are obtained by removing, from the plurality of visual words corresponding to that search image, the visual stop words corresponding to its image category.
Optionally, the device further includes a generating unit. The acquiring unit is further configured to, before the visual stop words corresponding to the image category of the image to be retrieved are determined according to the image category information of the image to be retrieved and the stop word lexicon, acquire a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information. The plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in the visual word bag model; the image category information of each training image indicates the image category of that training image; and the positive sample image set information indicates at least one positive sample image set, each of which includes a plurality of manually labeled similar training images in the training image library. The generating unit is configured to generate the stop word lexicon according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.
Optionally, the generating unit is specifically configured to: determine, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library, where the image categories of the training image library include the image category of each training image and the visual words of the training image library include the visual words of each training image; and generate the stop word lexicon according to that correlation.
Optionally, the generating unit is specifically configured to determine the correlation between a first image category and a first visual word according to the probability that a first event occurs, the probability that a second event occurs, and the probability that the first event and the second event occur simultaneously. The plurality of image categories of the training image library include the first image category, and the plurality of visual words of the training image library include the first visual word. The first event is that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library both include the first visual word, where the image category of the first training image is the first image category. The second event is that the first training image and the second training image belong to the same positive sample image set.
Optionally, the retrieval unit is specifically configured to: determine the similarity between the target visual words of the image to be retrieved and the target visual words of each search image in the search image library; and determine at least one search image whose similarity to the target visual words of the image to be retrieved is greater than a first preset value as the search result.
It should be understood that the image retrieval device 300 here is embodied in the form of functional units. The term "unit" here may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functions. In an optional example, a person skilled in the art will understand that the image retrieval device 300 may specifically be the image retrieval device in the embodiments of method 100 and method 200 above, and may be used to perform the processes and/or steps corresponding to the image retrieval device in those embodiments. To avoid repetition, details are not described here again.
FIG. 4 shows a schematic block diagram of an image library generating device 400 provided by an embodiment of the present application. The device 400 includes:
an acquiring unit 410, configured to acquire a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in the visual word bag model, the image category information of each training image indicates the image category of that training image, and the positive sample image set information indicates at least one positive sample image set, each of which includes a plurality of manually labeled similar training images in the training image library;
a generating unit 420, configured to generate a stop word lexicon according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information acquired by the acquiring unit 410, where the stop word lexicon includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to an image category include visual words unrelated to that image category.
Optionally, the generating unit is specifically configured to: determine, according to the plurality of visual words corresponding to each training image, the image category information of each training image, and the positive sample image set information, the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library, where the image categories of the training image library include the image category of each training image and the visual words of the training image library include the visual words of each training image; and generate the stop word lexicon according to that correlation.
Optionally, the generating unit is specifically configured to determine the correlation between a first image category and a first visual word according to the probability that a first event occurs, the probability that a second event occurs, and the probability that the first event and the second event occur simultaneously. The plurality of image categories of the training image library include the first image category, and the plurality of visual words of the training image library include the first visual word. The first event is that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library both include the first visual word, where the image category of the first training image is the first image category. The second event is that the first training image and the second training image belong to the same positive sample image set.
Optionally, the generating unit is further configured to, after the stop word lexicon is generated according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, determine the visual stop words corresponding to the image category of each training image according to the image category information of that training image and the stop word lexicon, remove those visual stop words from the plurality of visual words of the training image to obtain the target visual words of the training image, and add the target visual words of each training image to the search image library.
Optionally, the acquiring unit is specifically configured to acquire each training image and extract a plurality of visual feature descriptors of the training image, where the visual feature descriptors describe a plurality of visual feature points of the training image and correspond one-to-one with those feature points; acquire the visual word bag model; and, for each of the visual feature descriptors, find the visual word in the model that is closest to that descriptor, the visual words found in this way being determined as the plurality of visual words of the training image.
It should be understood that the image library generating device 400 here is embodied in the form of functional units. The term "unit" here may refer to an ASIC, an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functions. In an optional example, a person skilled in the art will understand that the image library generating device 400 may specifically be the image library generating device in the embodiments of method 100 and method 200 above, and may be used to perform the processes and/or steps corresponding to the image library generating device in those embodiments. To avoid repetition, details are not described here again.
FIG. 5 shows a schematic block diagram of an image retrieval device 500 provided by an embodiment of the present application. The image retrieval device 500 may be the image retrieval device described in FIG. 1 and FIG. 2, and may adopt the hardware architecture shown in FIG. 5. The image retrieval device may include a processor 510, a communication interface 520, and a memory 530, which communicate with one another through an internal connection path. The functions implemented by the processing unit 320 and the retrieval unit 330 in FIG. 3 may be implemented by the processor 510, and the functions implemented by the acquiring unit 310 may be implemented by the processor 510 controlling the communication interface 520.
The processor 510 may include one or more processors, for example, one or more central processing units (CPUs). Where the processor is a single CPU, that CPU may be a single-core CPU or a multi-core CPU.
The communication interface 520 is configured to send and/or receive data. The communication interface may include a sending interface, which sends data, and a receiving interface, which receives data.
The memory 530 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), and compact disc read-only memory (CD-ROM). The memory 530 is configured to store related instructions and data.
The memory 530 is configured to store the program code and data of the image retrieval apparatus, and may be a separate component or be integrated into the processor 510.
Specifically, the processor 510 is configured to control the communication interface to exchange data with other apparatuses, for example, the image library generating apparatus. For details, refer to the description in the method embodiments; they are not repeated here.
It can be understood that FIG. 5 shows only a simplified design of the image retrieval apparatus. In practice, the image retrieval apparatus may also include other necessary elements, including but not limited to any number of communication interfaces, processors, controllers, and memories, and all image retrieval apparatuses that can implement this application fall within its scope of protection.
In one possible design, the image retrieval apparatus 500 may be replaced with a chip apparatus, for example, a chip usable in an image retrieval apparatus, to implement the functions of the processor 510 in the image retrieval apparatus. The chip apparatus may be a field-programmable gate array, an application-specific integrated chip, a system on chip, a central processing unit, a network processor, a digital signal processing circuit, or a microcontroller that implements the related functions, or may be a programmable controller or another integrated chip. The chip may optionally include one or more memories for storing program code that, when executed, causes the processor to implement the corresponding functions.
FIG. 6 is a schematic block diagram of an image library generating apparatus 600 provided by an embodiment of this application. The image library generating apparatus 600 may be the image library generating apparatus described in FIG. 1 and FIG. 2, and may adopt the hardware architecture shown in FIG. 6. The image library generating apparatus may include a processor 610, a communication interface 620, and a memory 630, which communicate with one another through an internal connection path. The functions implemented by the generating unit 420 in FIG. 4 may be implemented by the processor 610, and the functions implemented by the obtaining unit 410 may be implemented by the processor 610 controlling the communication interface 620.
The processor 610 may include one or more processors, for example, one or more central processing units (CPUs). Where the processor is a single CPU, that CPU may be a single-core CPU or a multi-core CPU.
The communication interface 620 is configured to send and/or receive data. The communication interface may include a sending interface, which sends data, and a receiving interface, which receives data.
The memory 630 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), and compact disc read-only memory (CD-ROM). The memory 630 is configured to store related instructions and data.
The memory 630 is configured to store the program code and data of the image library generating apparatus, and may be a separate component or be integrated into the processor 610.
Specifically, the processor 610 is configured to control the communication interface to exchange data with other apparatuses, for example, the image retrieval apparatus. For details, refer to the description in the method embodiments; they are not repeated here.
It can be understood that FIG. 6 shows only a simplified design of the image library generating apparatus. In practice, the image library generating apparatus may also include other necessary elements, including but not limited to any number of communication interfaces, processors, controllers, and memories, and all image library generating apparatuses that can implement this application fall within its scope of protection.
In one possible design, the image library generating apparatus 600 may be replaced with a chip apparatus, for example, a chip usable in an image library generating apparatus, to implement the functions of the processor 610 in the image library generating apparatus. The chip apparatus may be a field-programmable gate array, an application-specific integrated chip, a system on chip, a central processing unit, a network processor, a digital signal processing circuit, or a microcontroller that implements the related functions, or may be a programmable controller or another integrated chip. The chip may optionally include one or more memories for storing program code that, when executed, causes the processor to implement the corresponding functions.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such an implementation should not be considered to go beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, or an optical disc.
The foregoing is merely specific implementations of this application, but the scope of protection of this application is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.

Claims (22)

  1. An image retrieval method, comprising:
    obtaining a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved, wherein the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to visual words in a bag-of-visual-words model, and the image category information of the image to be retrieved indicates an image category of the image to be retrieved;
    determining, according to the image category information of the image to be retrieved and a stop word library, visual stop words corresponding to the image category of the image to be retrieved, wherein the visual stop words corresponding to the image category of the image to be retrieved comprise visual words irrelevant to the image category of the image to be retrieved, and the stop word library comprises a mapping between the image category of the image to be retrieved and the visual stop words corresponding to the image category of the image to be retrieved;
    removing the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved, to obtain target visual words of the image to be retrieved; and
    determining a retrieval result according to the target visual words of the image to be retrieved and a retrieval image library, wherein the retrieval image library comprises a plurality of retrieval images.
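As a non-limiting illustration, the stop word removal step of claim 1 can be sketched as a dictionary lookup followed by a filter. The function name `target_visual_words`, the use of integer identifiers for visual words, and the representation of the stop word library as a dictionary from category to a set of word identifiers are all assumptions made for this sketch; the claim does not prescribe any data structure.

```python
from typing import Dict, Hashable, List, Set


def target_visual_words(
    visual_words: List[int],
    image_category: Hashable,
    stop_word_library: Dict[Hashable, Set[int]],
) -> List[int]:
    """Drop the visual stop words mapped to the image's category.

    All names and types here are hypothetical; the stop word library is
    modelled as a plain mapping {image category -> set of stop word ids}.
    """
    # Assumption: a category missing from the library has no stop words.
    stop_words = stop_word_library.get(image_category, set())
    # Keep the original order and multiplicity of the remaining words.
    return [word for word in visual_words if word not in stop_words]
```

For example, with a library `{"indoor": {7}}`, an "indoor" image whose visual words are `[3, 7, 7, 12]` keeps only words 3 and 12 as its target visual words.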
  2. The method according to claim 1, wherein the retrieval image library comprises a mapping between the plurality of retrieval images and target visual words corresponding to each of the plurality of retrieval images, and the target visual words corresponding to each retrieval image are obtained by removing, from a plurality of visual words corresponding to that retrieval image, the visual stop words corresponding to the image category of that retrieval image.
  3. The method according to claim 1 or 2, wherein before the determining, according to the image category information of the image to be retrieved and the stop word library, of the visual stop words corresponding to the image category of the image to be retrieved, the method further comprises:
    obtaining a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, wherein the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in the bag-of-visual-words model, the image category information of each training image indicates an image category of that training image, the positive sample image set information indicates at least one positive sample image set, and each positive sample image set comprises a plurality of manually annotated similar training images in the training image library; and
    generating the stop word library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.
  4. The method according to claim 3, wherein the generating of the stop word library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information comprises:
    determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, correlations between a plurality of image categories of the training image library and a plurality of visual words of the training image library, wherein the plurality of image categories of the training image library comprise the image category of each training image, and the plurality of visual words of the training image library comprise the plurality of visual words of each training image; and
    generating the stop word library according to the correlations between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  5. The method according to claim 4, wherein the determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, of the correlations between the plurality of image categories of the training image library and the plurality of visual words of the training image library comprises:
    determining a correlation between a first image category and a first visual word according to a probability that a first event occurs, a probability that a second event occurs, and a probability that the first event and the second event occur simultaneously, wherein the plurality of image categories of the training image library comprise the first image category, the plurality of visual words of the training image library comprise the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library both comprise the first visual word, the image category of the first training image being the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
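Claim 5 states only that the correlation is determined from the three probabilities; it does not give a formula. The sketch below is one possible reading under stated assumptions: probabilities are estimated as frequencies over ordered pairs of distinct training images whose first image belongs to the category, and the correlation is taken to be the ratio P(first and second) / (P(first) * P(second)), which equals 1 when the two events are independent. The ratio, the pair-counting scheme, the record layout (`words`, `category`, `positive_set`), and the function name are all assumptions of this illustration, not the claimed method.

```python
from itertools import permutations
from typing import Dict, List


def correlation(images: List[Dict], category: str, word: int) -> float:
    """Estimate a category/word correlation from pair frequencies.

    Each image is a hypothetical record with keys 'words' (set of ids),
    'category', and 'positive_set' (a set id, or None if unlabelled).
    """
    # Ordered pairs (first, second) whose first image is in the category.
    pairs = [(a, b) for a, b in permutations(images, 2)
             if a["category"] == category]
    if not pairs:
        return 0.0

    def first_event(a: Dict, b: Dict) -> bool:
        # Both images contain the visual word.
        return word in a["words"] and word in b["words"]

    def second_event(a: Dict, b: Dict) -> bool:
        # Both images belong to the same positive sample image set.
        return (a["positive_set"] is not None
                and a["positive_set"] == b["positive_set"])

    n = len(pairs)
    p_first = sum(first_event(a, b) for a, b in pairs) / n
    p_second = sum(second_event(a, b) for a, b in pairs) / n
    p_both = sum(first_event(a, b) and second_event(a, b)
                 for a, b in pairs) / n
    # Assumption: with no evidence for either event, report zero correlation.
    if p_first == 0 or p_second == 0:
        return 0.0
    return p_both / (p_first * p_second)
```

Under this reading, a word that tends to be shared exactly by the image pairs annotated as similar scores above 1, while a word shared only by dissimilar pairs scores 0 and becomes a candidate visual stop word for the category.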
  6. The method according to any one of claims 2 to 5, wherein the determining of the retrieval result according to the target visual words of the image to be retrieved and the retrieval image library comprises:
    determining a similarity between the target visual words of the image to be retrieved and the target visual words of the retrieval images in the retrieval image library; and
    determining, as the retrieval result, at least one retrieval image whose similarity to the target visual words of the image to be retrieved is greater than a first preset value.
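The thresholding in claim 6 can be illustrated as follows. The claim does not name a similarity measure, so this sketch assumes Jaccard similarity over sets of target visual words; the function names, the library layout (image identifier mapped to its target visual words), and the threshold value in the example are hypothetical.

```python
from typing import Dict, Iterable, List


def jaccard(a: Iterable[int], b: Iterable[int]) -> float:
    """Jaccard similarity of two collections of visual word ids."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Two empty collections are treated as having zero similarity.
    return len(set_a & set_b) / len(union) if union else 0.0


def retrieve(
    query_words: Iterable[int],
    library: Dict[str, List[int]],
    first_preset_value: float,
) -> List[str]:
    """Return the library images whose similarity exceeds the preset value."""
    query = list(query_words)
    return [image_id for image_id, words in library.items()
            if jaccard(query, words) > first_preset_value]
```

With a library `{"a.jpg": [1, 2, 3], "b.jpg": [7, 8]}`, a query with target visual words `[1, 2, 4]`, and a preset value of 0.4, only `a.jpg` is returned: its similarity is 2/4 = 0.5, while `b.jpg` shares no words with the query.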
  7. An image library generation method, comprising:
    obtaining a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, wherein the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in the bag-of-visual-words model, the image category information of each training image indicates an image category of that training image, the positive sample image set information indicates at least one positive sample image set, and each positive sample image set comprises a plurality of manually annotated similar training images in the training image library; and
    generating a stop word library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, wherein the stop word library comprises a mapping between the image category of each training image and visual stop words corresponding to the image category of that training image, and the visual stop words corresponding to the image category of each training image comprise visual words irrelevant to the image category of that training image.
  8. The method according to claim 7, wherein the generating of the stop word library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information comprises:
    determining, according to the plurality of visual words corresponding to each training image, the image category information of each training image, and the positive sample image set information, correlations between a plurality of image categories of the training image library and a plurality of visual words of the training image library, wherein the plurality of image categories of the training image library comprise the image category of each training image, and the plurality of visual words of the training image library comprise the plurality of visual words of each training image; and
    generating the stop word library according to the correlations between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
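The second step of claim 8 can be pictured as turning a table of category/word correlations into the stop word library. The rule used below, marking a word as a stop word for a category when its correlation falls below a threshold, is an assumption of this sketch; the claim says only that the library is generated according to the correlations. The function name, the input layout, and the threshold are hypothetical.

```python
from typing import Dict, Set, Tuple


def build_stop_word_library(
    correlations: Dict[Tuple[str, int], float],
    threshold: float,
) -> Dict[str, Set[int]]:
    """Map each image category to its visual stop words.

    `correlations` maps (category, visual word id) to a correlation score.
    """
    library: Dict[str, Set[int]] = {}
    for (category, word), score in correlations.items():
        # Every category gets an entry, even if it ends up with no stop words.
        stop_words = library.setdefault(category, set())
        # Assumed rule: weakly correlated words are irrelevant to the category.
        if score < threshold:
            stop_words.add(word)
    return library
```

The resulting dictionary has the shape of the mapping described in claim 7: one entry per image category, holding the visual words treated as irrelevant to that category.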
  9. An image retrieval apparatus, comprising:
    an obtaining unit, configured to obtain a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved, wherein the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved to visual words in a bag-of-visual-words model, and the image category information of the image to be retrieved indicates an image category of the image to be retrieved;
    a processing unit, configured to: determine, according to the image category information of the image to be retrieved obtained by the obtaining unit and a stop word library, visual stop words corresponding to the image category of the image to be retrieved, wherein the visual stop words corresponding to the image category of the image to be retrieved comprise visual words irrelevant to the image category of the image to be retrieved, and the stop word library comprises a mapping between the image category of the image to be retrieved and the visual stop words corresponding to the image category of the image to be retrieved; and remove the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved obtained by the obtaining unit, to obtain target visual words of the image to be retrieved; and
    a retrieval unit, configured to determine a retrieval result according to the target visual words of the image to be retrieved obtained by the processing unit and a retrieval image library, wherein the retrieval image library comprises a plurality of retrieval images.
  10. The apparatus according to claim 9, wherein the retrieval image library comprises a mapping between the plurality of retrieval images and target visual words corresponding to each of the plurality of retrieval images, and the target visual words corresponding to each retrieval image are obtained by removing, from a plurality of visual words corresponding to that retrieval image, the visual stop words corresponding to the image category of that retrieval image.
  11. The apparatus according to claim 9 or 10, wherein the apparatus further comprises a generating unit;
    the obtaining unit is further configured to obtain, before the visual stop words corresponding to the image category of the image to be retrieved are determined according to the image category information of the image to be retrieved and the stop word library, a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, wherein the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in the bag-of-visual-words model, the image category information of each training image indicates an image category of that training image, the positive sample image set information indicates at least one positive sample image set, and each positive sample image set comprises a plurality of manually annotated similar training images in the training image library; and
    the generating unit is configured to generate the stop word library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.
  12. The apparatus according to claim 11, wherein the generating unit is specifically configured to:
    determine, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, correlations between a plurality of image categories of the training image library and a plurality of visual words of the training image library, wherein the plurality of image categories of the training image library comprise the image category of each training image, and the plurality of visual words of the training image library comprise the plurality of visual words of each training image; and
    generate the stop word library according to the correlations between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  13. The apparatus according to claim 12, wherein the generating unit is specifically configured to:
    determine a correlation between a first image category and a first visual word according to a probability that a first event occurs, a probability that a second event occurs, and a probability that the first event and the second event occur simultaneously, wherein the plurality of image categories of the training image library comprise the first image category, the plurality of visual words of the training image library comprise the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library both comprise the first visual word, the image category of the first training image being the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
  14. The apparatus according to any one of claims 10 to 13, wherein the retrieval unit is specifically configured to:
    determine a similarity between the target visual words of the image to be retrieved and the target visual words of the retrieval images in the retrieval image library; and
    determine, as the retrieval result, at least one retrieval image whose similarity to the target visual words of the image to be retrieved is greater than a first preset value.
  15. An image library generating apparatus, comprising:
    an obtaining unit, configured to obtain a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, wherein the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image to the visual words in the bag-of-visual-words model, the image category information of each training image indicates an image category of that training image, the positive sample image set information indicates at least one positive sample image set, and each positive sample image set comprises a plurality of manually annotated similar training images in the training image library; and
    a generating unit, configured to generate a stop word library according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information obtained by the obtaining unit, wherein the stop word library comprises a mapping between the image category of each training image and visual stop words corresponding to the image category of that training image, and the visual stop words corresponding to the image category of each training image comprise visual words irrelevant to the image category of that training image.
  16. The apparatus according to claim 15, wherein the generating unit is specifically configured to:
    determine, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, correlations between a plurality of image categories of the training image library and a plurality of visual words of the training image library, wherein the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and
    generate the stop-word library according to the correlations between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  17. An image retrieval apparatus, comprising a memory, a processor, a communication interface, and a computer program stored in the memory and executable on the processor, wherein the memory, the processor, and the communication interface communicate with one another through an internal connection path, and the processor, when executing the computer program, performs the method according to any one of claims 1 to 6.
  18. An apparatus for generating an image library, comprising a memory, a processor, a communication interface, and a computer program stored in the memory and executable on the processor, wherein the memory, the processor, and the communication interface communicate with one another through an internal connection path, and the processor, when executing the computer program, performs the method according to claim 7 or claim 8.
  19. A computer-readable medium for storing a computer program, wherein the computer program comprises instructions for performing the method according to any one of claims 1 to 6.
  20. A computer-readable medium for storing a computer program, wherein the computer program comprises instructions for performing the method according to claim 7 or claim 8.
  21. A computer program product comprising instructions, wherein the instructions, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 6.
  22. A computer program product comprising instructions, wherein the instructions, when run on a computer, cause the computer to perform the method according to claim 7 or claim 8.
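
The stop-word library generation recited in claims 15 and 16, and the thresholding step of the retrieval claims, can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The claims do not define how the correlation or the similarity is computed, so everything below is an assumption made for illustration: the function names `build_stop_word_library` and `retrieve`, the cutoff `min_corr`, the correlation measure (the share of a category's training images that contain a word), the way the positive sample image sets are used (words shared by every image of a set are never treated as stop words for the categories in that set), the removal of the category's stop words from the query to obtain the target visual words, and the Jaccard similarity.

```python
from collections import defaultdict


def build_stop_word_library(image_words, image_category, positive_sets, min_corr=0.2):
    """Map each image category to the visual words treated as unrelated to it.

    image_words:    image id -> set of visual word ids
    image_category: image id -> category label
    positive_sets:  list of sets of image ids annotated as mutually similar
    min_corr:       hypothetical cutoff below which a word counts as unrelated
    """
    vocabulary = set().union(*image_words.values())

    images_by_category = defaultdict(list)
    for image_id, category in image_category.items():
        images_by_category[category].append(image_id)

    # Assumed use of the positive sets: a word shared by every image of a
    # positive set stays relevant for the categories present in that set.
    protected = defaultdict(set)
    for positive_set in positive_sets:
        if not positive_set:
            continue
        shared = set.intersection(*(image_words[i] for i in positive_set))
        for image_id in positive_set:
            protected[image_category[image_id]] |= shared

    library = {}
    for category, image_ids in images_by_category.items():
        stop_words = set()
        for word in vocabulary:
            # Assumed correlation: share of the category's images containing the word.
            corr = sum(word in image_words[i] for i in image_ids) / len(image_ids)
            if corr < min_corr and word not in protected[category]:
                stop_words.add(word)
        library[category] = stop_words
    return library


def retrieve(query_words, query_category, image_words, library, first_preset_value=0.5):
    """Return ids of images whose similarity to the query's target words exceeds the threshold."""
    # Assumed definition of the target visual words: the query's words minus
    # the stop words mapped to the query's image category.
    target = set(query_words) - library.get(query_category, set())
    results = []
    for image_id, words in image_words.items():
        union = target | words
        # Jaccard similarity, chosen only for illustration.
        similarity = len(target & words) / len(union) if union else 0.0
        if similarity > first_preset_value:
            results.append(image_id)
    return results
```

In this sketch the library is built once from the annotated training images, and at query time only a dictionary lookup by image category is needed to discard the stop words before the similarity against the first preset value is evaluated.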
PCT/CN2017/112956 2017-11-24 2017-11-24 Image retrieval method and device, and image library generation method and device WO2019100348A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/112956 WO2019100348A1 (en) 2017-11-24 2017-11-24 Image retrieval method and device, and image library generation method and device
CN201780097137.5A CN111373393B (en) 2017-11-24 2017-11-24 Image retrieval method and device and image library generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/112956 WO2019100348A1 (en) 2017-11-24 2017-11-24 Image retrieval method and device, and image library generation method and device

Publications (1)

Publication Number Publication Date
WO2019100348A1 true WO2019100348A1 (en) 2019-05-31

Family

ID=66630527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/112956 WO2019100348A1 (en) 2017-11-24 2017-11-24 Image retrieval method and device, and image library generation method and device

Country Status (2)

Country Link
CN (1) CN111373393B (en)
WO (1) WO2019100348A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114264297B (en) * 2021-12-01 2022-10-18 清华大学 Positioning and mapping method and system for UWB and visual SLAM fusion algorithm

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844299A (en) * 2016-03-23 2016-08-10 浙江理工大学 Image classification method based on bag of words
CN106250909A (en) * 2016-07-11 2016-12-21 南京邮电大学 A kind of based on the image classification method improving visual word bag model
CN106354735A (en) * 2015-07-22 2017-01-25 杭州海康威视数字技术股份有限公司 Image target searching method and device
CN106407327A (en) * 2016-08-31 2017-02-15 广州精点计算机科技有限公司 Similar image searching method and device based on HOG (Histogram of Oriented Gradient) and visual word bag
CN106855883A (en) * 2016-12-21 2017-06-16 中国科学院上海高等研究院 The Research on face image retrieval of view-based access control model bag of words
CN106919920A (en) * 2017-03-06 2017-07-04 重庆邮电大学 Scene recognition method based on convolution feature and spatial vision bag of words
CN107133640A (en) * 2017-04-24 2017-09-05 河海大学 Image classification method based on topography's block description and Fei Sheer vectors

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8073818B2 (en) * 2008-10-03 2011-12-06 Microsoft Corporation Co-location visual pattern mining for near-duplicate image retrieval
CN103235955A (en) * 2013-05-03 2013-08-07 中国传媒大学 Extraction method of visual word in image retrieval
CN104424226B (en) * 2013-08-26 2018-08-24 阿里巴巴集团控股有限公司 A kind of method and device obtaining visual word dictionary, image retrieval
CN103838864B (en) * 2014-03-20 2017-02-22 北京工业大学 Visual saliency and visual phrase combined image retrieval method
US9697234B1 (en) * 2014-12-16 2017-07-04 A9.Com, Inc. Approaches for associating terms with image regions
CN104615676B (en) * 2015-01-20 2018-08-24 同济大学 One kind being based on the matched picture retrieval method of maximum similarity

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276348A (en) * 2019-06-20 2019-09-24 腾讯科技(深圳)有限公司 A kind of image position method, device, server and storage medium
CN110276348B (en) * 2019-06-20 2022-11-25 腾讯科技(深圳)有限公司 Image positioning method, device, server and storage medium
CN112348885A (en) * 2019-08-09 2021-02-09 华为技术有限公司 Visual feature library construction method, visual positioning method, device and storage medium
CN113591865A (en) * 2021-07-28 2021-11-02 深圳甲壳虫智能有限公司 Loop detection method and device and electronic equipment
CN113591865B (en) * 2021-07-28 2024-03-26 深圳甲壳虫智能有限公司 Loop detection method and device and electronic equipment

Also Published As

Publication number Publication date
CN111373393A (en) 2020-07-03
CN111373393B (en) 2022-05-31

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17932942

Country of ref document: EP

Kind code of ref document: A1