WO2019100348A1 - Image retrieval method and device, and image library generation method and device - Google Patents

Image retrieval method and device, and image library generation method and device

Info

Publication number
WO2019100348A1
Authority
WIPO (PCT)
Prior art keywords
image, visual, training, retrieved, library
Application number
PCT/CN2017/112956
Other languages
English (en)
Chinese (zh)
Inventor
付宇新
温丰
薛常亮
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to CN201780097137.5A (CN111373393B)
Priority to PCT/CN2017/112956 (WO2019100348A1)
Publication of WO2019100348A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features

Definitions

  • The present application relates to the field of image retrieval technology, and more particularly to an image retrieval method and apparatus and an image library generation method and apparatus.
  • the bag of visual words (BoVW) model is widely applied to the field of image retrieval.
  • The visual word bag model includes a plurality of visual words obtained by clustering a plurality of visual feature descriptors extracted from a plurality of images; each of the plurality of visual words is a cluster center.
  • In a conventional retrieval flow, a plurality of visual feature descriptors of the image to be retrieved are first acquired and are matched and mapped with the visual words in the visual word bag model to obtain a plurality of visual words of the image to be retrieved, which are used to represent the image to be retrieved. A similarity between the image to be retrieved and each search image in the search image library is then calculated according to the plurality of visual words of the image to be retrieved, and at least one search image with the highest similarity to the image to be retrieved is output as the image retrieval result.
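  • To make this conventional flow concrete, the following Python sketch implements a minimal bag-of-visual-words retrieval pipeline. The use of ORB descriptors, a MiniBatchKMeans vocabulary, and cosine similarity over normalized word histograms are illustrative assumptions, not requirements of the scheme described above.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def extract_descriptors(image_path):
    """Extract binary ORB descriptors from one image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=500)
    _, descriptors = orb.detectAndCompute(img, None)
    return descriptors  # (num_keypoints, 32) uint8, or None if no keypoints

def train_vocabulary(all_descriptors, num_words=1000):
    """Cluster descriptors from many images; each cluster center is one visual word."""
    data = np.vstack(all_descriptors).astype(np.float32)
    return MiniBatchKMeans(n_clusters=num_words, random_state=0).fit(data)

def to_word_histogram(descriptors, vocabulary):
    """Map each descriptor to its nearest visual word; return an L2-normalized histogram."""
    words = vocabulary.predict(descriptors.astype(np.float32))
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-12)

def retrieve(query_hist, library_hists, top_k=5):
    """Return indices of the top_k most similar library images (cosine similarity)."""
    sims = library_hists @ query_hist  # rows of library_hists are normalized histograms
    return np.argsort(sims)[::-1][:top_k]
```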
  • In view of this, the present application provides an image retrieval method and apparatus, and an image library generation method and apparatus, which help improve the efficiency and accuracy of image retrieval.
  • In a first aspect, the present application provides an image retrieval method, the method comprising:
  • acquiring a plurality of visual words of the image to be retrieved and image category information of the image to be retrieved, where the plurality of visual words of the image to be retrieved are obtained by matching and mapping the plurality of visual feature descriptors of the image to be retrieved with the visual words in the visual word bag model, and the image category information of the image to be retrieved is used to indicate an image category of the image to be retrieved;
  • determining, according to the image category information of the image to be retrieved and the stop word dictionary, a visual stop word corresponding to the image category of the image to be retrieved, where the visual stop word corresponding to the image category of the image to be retrieved includes visual words unrelated to the image category of the image to be retrieved, and the stop word dictionary includes a mapping relationship between the image category of the image to be retrieved and the visual stop word corresponding to the image category of the image to be retrieved;
  • removing, from the plurality of visual words of the image to be retrieved, the visual stop word corresponding to the image category of the image to be retrieved, to obtain a target visual word of the image to be retrieved;
  • determining a retrieval result according to the target visual word of the image to be retrieved and the search image library, where the search image library includes a plurality of search images.
  • In the embodiment of the present application, the target visual word of the image to be retrieved is obtained by removing, from the plurality of visual words of the image to be retrieved, the visual stop word corresponding to the image category of the image to be retrieved. That is, visual words that have no significant effect on recognizing the image, or that interfere with image recognition, are removed from the plurality of visual words of the image to be retrieved, so the target visual words of the image to be retrieved are more significant for identifying the image to be retrieved. Therefore, retrieving through the target visual words of the image to be retrieved and the search image library helps improve the efficiency and accuracy of image retrieval.
  • In one possible implementation, the search image library further includes target visual words corresponding to each of the plurality of search images, where the target visual words corresponding to each search image are obtained by removing, from the plurality of visual words corresponding to each search image, the visual stop words corresponding to the image category of each search image.
  • In this implementation, the target visual words of the search images stored in the search image library are obtained by removing, from the plurality of visual words of each search image, the visual stop words corresponding to the image category of that search image, which helps reduce the memory usage of the search image library.
  • determining the retrieval result according to the similarity between the target visual word of the image to be retrieved and the target visual word of the retrieved image in the search image library is beneficial to improving the efficiency and accuracy of the image retrieval.
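  • The core steps of the first aspect can be sketched as follows, assuming the stop word dictionary is represented as a mapping from an image-category label to a set of visual-word ids and that similarity is measured by set overlap; both representations are illustrative choices, not the only ones the method admits.

```python
from typing import Dict, List, Set

def target_visual_words(visual_words: List[int],
                        category: str,
                        stop_word_dict: Dict[str, Set[int]]) -> List[int]:
    """Remove the visual stop words of the image's category."""
    stop_words = stop_word_dict.get(category, set())
    return [w for w in visual_words if w not in stop_words]

def similarity(words_a: List[int], words_b: List[int]) -> float:
    """Jaccard overlap of two target-word sets (one possible measure)."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def search(query_words, query_category, stop_word_dict, library):
    """library: list of (image_id, target_words), already stop-word-filtered."""
    q = target_visual_words(query_words, query_category, stop_word_dict)
    ranked = sorted(library, key=lambda entry: similarity(q, entry[1]), reverse=True)
    return ranked[0]  # at least one most-similar search image as the result
```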
  • The image retrieval device may be a first device having computing and storage functions; the first device may be, for example, a computer; or the image retrieval device may be a functional module in the first device. This is not limited in this embodiment of the present application.
  • The visual feature points of an image in the embodiment of the present application refer to pixels that remain stable under image transformations such as scaling, rotation, translation, and viewing-angle changes, that is, the most easily recognized pixels in the image, such as corner points or texture-rich edge points. The quality of the visual feature points of the image directly affects the efficiency and accuracy of image retrieval.
  • The types of the visual feature points of the image may include scale-invariant feature transform (SIFT), oriented FAST and rotated BRIEF (ORB), speeded up robust features (SURF), features from accelerated segment test (FAST), and the like, which is not limited in the embodiment of the present application.
  • the visual feature points of the image may be one or more, which is not limited by the embodiment of the present application.
  • The visual feature descriptor of an image in the embodiment of the present application refers to a description of a visual feature point of the image by means of mathematical features.
  • The main steps of acquiring the visual feature descriptor of an image include: randomly selecting a plurality of pixel pairs in the vicinity of a visual feature point of the image, and comparing the values of the two pixels in each pixel pair to obtain a 0 or 1 encoding; then, using the direction information of the visual feature point, rotating the sampling pattern to obtain a robust binary-vector visual feature descriptor.
  • The image may have one or more visual feature descriptors, which is not limited in this embodiment of the present application.
  • The visual word bag model in the embodiment of the present application includes a plurality of visual words, each of which is a cluster center obtained by clustering the visual feature descriptors extracted from a plurality of images.
  • The visual word of an image in the embodiment of the present application refers to the visual word in the visual word bag model that is closest to a visual feature descriptor of the image, obtained by matching and mapping the visual feature descriptor of the image with the visual words in the visual word bag model.
  • the visual word of the image may be one or more, which is not limited by the embodiment of the present application.
  • For example, the image categories of the images may include forest scenes, suburban scenes, indoor scenes, and the like.
  • the image categories of the images may include sunny, rainy, snowy, and the like.
  • A visual stop word corresponding to an image category in the embodiment of the present application refers to a visual word that has no significant effect on recognizing an image of that image category, or that interferes with image recognition, that is, a visual word unrelated to images of that image category.
  • the visual stop words that are not related to a certain image category described in the embodiments of the present application refer to visual words whose correlation with the image of the image category is lower than a preset threshold.
  • the visual stop words corresponding to the image categories may include one or more visual words, which are not limited by the embodiment of the present application.
  • For example, visual words extracted from trees can be visual stop words for the forest or suburban category.
  • For another example, an image captured on a rainy day will contain traces of falling rain, and the feature points extracted from the rain will contaminate multiple visual words of the image. Therefore, visual words extracted from rain can be visual stop words for the rainy-day category.
  • The target visual words of an image in the embodiment of the present application are the visual words that remain after the visual stop words corresponding to the image category of the image are removed from the plurality of visual words of the image.
  • the target visual word of the image may include one or more visual words, which is not limited by the embodiment of the present application.
  • The positive sample image set in the embodiment of the present application includes manually labeled images that are considered highly similar or identical.
  • For example, two images of the same object captured in different scenes, such as a school on a rainy day and the same school on a snowy day, or two images of the same scene captured at different times, such as the current pose and a historical pose of the same scene in loop closure detection.
  • In one possible implementation, before the determining the visual stop word corresponding to the image category of the image to be retrieved according to the image category information of the image to be retrieved and the stop word dictionary, the method further includes: acquiring a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping the plurality of visual feature descriptors of each training image with the visual words in the visual word bag model, the image category information of each training image is used to indicate an image category of each training image, and the positive sample image set information is used to indicate at least one positive sample image set, each positive sample image set including a plurality of manually labeled similar training images in the training image library; and generating the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.
  • In one possible implementation, the generating the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word dictionary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  • For example, the image retrieval device may use, as the visual stop words corresponding to each image category, at least one visual word of the plurality of visual words of the training image library that has the lowest correlation with that image category.
  • Alternatively, the image retrieval device may use, as the visual stop words corresponding to each image category, at least one visual word of the plurality of visual words of the training image library whose correlation with that image category is less than a first preset threshold.
  • the visual stop words corresponding to each image category may include one or more visual words, which are not limited by the embodiment of the present application.
  • the stop word dictionary may include a mapping relationship between each image category and a visual stop word corresponding to each image category.
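  • For illustration, such a stop word dictionary can be held as a plain mapping from an image-category identifier to the set of visual-word identifiers deemed unrelated to that category; all labels and ids below are hypothetical.

```python
# Hypothetical stop word dictionary: image category -> visual stop words.
stop_word_dict = {
    "forest":   {12, 87, 301},  # e.g. visual words dominated by trees
    "suburban": {12, 87},
    "rainy":    {45, 199},      # e.g. visual words produced by rain streaks
}

# Lookup step: the visual stop words for a query image's category.
visual_stop_words = stop_word_dict.get("rainy", set())
```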
  • In one possible implementation, the determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library includes: determining the correlation between a first image category and a first visual word according to the probability of occurrence of a first event, the probability of occurrence of a second event, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
  • Specifically, x represents the first event, namely that the plurality of visual words of the first training image among the N training images and the plurality of visual words of a second training image among the other P training images each include the first visual word among the L visual words; y represents the second event, namely that the first training image and the second training image belong to the same positive sample image set. count(x), count(y), and count(x, y) represent the number of occurrences of the first event, of the second event, and of the two events simultaneously, and p(x), p(y), and p(x, y) represent the corresponding probabilities. The correlation is then computed as:

    PMI(x, y) = log( p(x, y) / ( p(x) p(y) ) )

    H(y) = -p(y) log p(y) - (1 - p(y)) log(1 - p(y))

    RATE_PMI(x, y) = PMI(x, y) / H(y)

  • Here PMI(x, y) is the point-wise mutual information of the first event and the second event, H(y) is the information entropy of the second event, and RATE_PMI(x, y), the point-wise mutual information rate of the first event and the second event, is the correlation between the first visual word and the first image category.
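  • As a sketch of this correlation measure, the following Python function computes the point-wise mutual information rate for one (visual word, image category) pair from labeled training-image pairs. The pair representation and the normalization of counts over the number of pairs are assumptions made for the example; the embodiment does not prescribe them.

```python
import math

def pmi_rate(pairs, word, category):
    """pairs: iterable of (words_i, category_i, words_j, same_positive_set) tuples,
    one per ordered pair of training images."""
    n = cx = cy = cxy = 0
    for words_i, cat_i, words_j, same_set in pairs:
        n += 1
        x = word in words_i and word in words_j and cat_i == category  # first event
        y = same_set                                                   # second event
        cx += x
        cy += y
        cxy += x and y
    if cx == 0 or cy == 0 or cxy == 0 or cy == n:
        return float("-inf")  # degenerate statistics: no usable correlation
    px, py, pxy = cx / n, cy / n, cxy / n
    pmi = math.log(pxy / (px * py))                            # point-wise mutual information
    h_y = -(py * math.log(py) + (1 - py) * math.log(1 - py))   # entropy of the second event
    return pmi / h_y                                           # PMI rate = correlation
```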
  • the image retrieval device may generate the search image library by itself, or may acquire the search image library from the image library generation device, which is not limited in this embodiment of the present application.
  • The search image library may be trained according to the plurality of training images, or may be trained according to historical images to be retrieved that the image retrieval device retrieved before the current retrieval, or may be obtained in another manner; this is not limited in this embodiment of the present application.
  • For example, the image retrieval device may generate the search image library according to the plurality of visual words of each of the plurality of training images, the image category information of each training image, and the stop word dictionary; that is, the search image library is trained based on the plurality of training images.
  • Specifically, the image retrieval device may determine, according to the image category information of each training image and the stop word dictionary, the visual stop words corresponding to the image category of each training image; remove, from the plurality of visual words of each training image, the visual stop words corresponding to the image category of that training image to obtain the target visual words of each training image; and add the target visual words of each training image to the search image library, as sketched below.
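  • A minimal sketch of this library-construction step, reusing the hypothetical target_visual_words() helper shown earlier:

```python
def build_search_image_library(training_images, stop_word_dict):
    """training_images: iterable of (image_id, visual_words, category) tuples."""
    library = []
    for image_id, visual_words, category in training_images:
        # Keep only the target visual words; the stop words are never stored,
        # which is what reduces the memory usage of the search image library.
        target = target_visual_words(visual_words, category, stop_word_dict)
        library.append((image_id, target))
    return library
```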
  • The image retrieval device may obtain the target visual words of each training image by using the stop word dictionary in the embodiment of the present application, or may obtain them by other methods; this is not limited in this embodiment of the present application.
  • In an e-commerce scenario, for example, the search image library includes the target visual words of all product images provided to the user, and the image to be retrieved is an image of the product that the user wants to purchase.
  • the image retrieval device may add a target visual word of the historical to-be-retrieved image retrieved prior to S140 to the retrieval image library to generate the retrieval image library.
  • the search image library includes all historical pose images, and the image to be retrieved is the current pose image.
  • In the loop closure scenario, the image retrieval method provided by the embodiment of the present application can detect scenes that have appeared in history, use the current image to retrieve and recognize loops, construct a constraint between the current pose and a historical pose, and reduce the accumulated error through optimization to obtain a globally consistent map.
  • In the e-commerce scenario, the system searches according to the image of the product and returns the images with the highest similarity as the retrieval result.
  • The image retrieval device may determine the retrieval result in a plurality of manners, for example by selecting at least one search image most similar to the image to be retrieved as the retrieval result; this is not limited in this embodiment of the present application.
  • For example, the image retrieval device may calculate the similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine the at least one search image with the highest similarity to the image to be retrieved as the retrieval result.
  • Alternatively, the image retrieval device may calculate the similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image whose similarity is greater than a second preset threshold as the retrieval result.
  • In the embodiment of the present application, the target visual words of the search images stored in the search image library are obtained by removing, from the plurality of visual words of each search image, the visual stop words corresponding to the image category of that search image, which helps reduce the memory usage of the search image library. Determining at least one search image similar to the image to be retrieved as the retrieval result helps improve the efficiency and accuracy of image retrieval.
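  • The two ways of selecting the retrieval result described above, taking the most similar images or taking all images above a second preset threshold, can be sketched as follows; the array-based representation is an assumption of the example.

```python
import numpy as np

def select_results(similarities, image_ids, top_k=None, threshold=None):
    """similarities: 1-D array aligned with image_ids; pass top_k or threshold."""
    order = np.argsort(similarities)[::-1]  # most similar first
    if top_k is not None:
        return [image_ids[i] for i in order[:top_k]]  # at least one best match
    return [image_ids[i] for i in order
            if similarities[i] > threshold]           # above the second preset threshold
```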
  • In a second aspect, the present application provides an image library generation method, the method comprising:
  • acquiring a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, and generating a stop word dictionary therefrom, where the stop word dictionary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to the image category of each training image, and the visual stop words corresponding to the image category of each training image include visual words unrelated to the image category of each training image.
  • According to the image library generation method provided by the embodiment of the present application, a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information are acquired, and the stop word dictionary is generated according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, which helps improve the efficiency and accuracy of image retrieval.
  • The image library generating device may be a second device having computing and storage functions; the second device may be, for example, a computer; or the image library generating device may be a functional module in the second device. This is not limited in this embodiment of the present application.
  • the second device and the first device in the first aspect may be the same device or different devices, which is not limited in this embodiment of the present application.
  • In one possible implementation, the image library generating device and the image retrieval device in the first aspect are different functional modules in the same device, or the image library generating device is a functional module in the image retrieval device.
  • In one possible implementation, the generating the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information includes: determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generating the stop word dictionary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  • In one possible implementation, the determining, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library includes: determining the correlation between a first image category and a first visual word according to the probability of occurrence of a first event, the probability of occurrence of a second event, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
  • In one possible implementation, the method further includes: determining, according to the image category information of each training image and the stop word dictionary, the visual stop words corresponding to the image category of each training image; removing, from the plurality of visual words of each training image, the visual stop words corresponding to the image category of that training image to obtain the target visual words of each training image; and adding the target visual words of each training image to the search image library.
  • In one possible implementation, the acquiring a plurality of visual words of each training image in the training image library includes: acquiring each training image; extracting a plurality of visual feature descriptors of each training image, where the plurality of visual feature descriptors are used to describe a plurality of visual feature points of each training image and are in one-to-one correspondence with the plurality of visual feature points; acquiring the visual word bag model; and determining the plurality of visual words in the visual word bag model that are closest to each of the plurality of visual feature descriptors as the plurality of visual words of each training image.
  • The present application provides an image retrieval apparatus for performing the method of the first aspect or any possible implementation of the first aspect.
  • The present application provides an image library generation apparatus for performing the method of the second aspect or any possible implementation of the second aspect.
  • The present application provides an image retrieval apparatus, the apparatus comprising: a memory, a processor, a communication interface, and a computer program stored in the memory and executable on the processor, where the processor performs the method of the first aspect or any possible implementation of the first aspect when executing the computer program.
  • The present application provides an image library generation apparatus, the apparatus comprising: a memory, a processor, a communication interface, and a computer program stored in the memory and executable on the processor, where the processor performs the method of the second aspect or any possible implementation of the second aspect when executing the computer program.
  • the application provides a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of the first aspect or any of the possible implementations of the first aspect.
  • the present application provides a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of any of the second aspect or any of the possible implementations of the second aspect.
  • The present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of the above-described second aspect or any of the possible implementations of the second aspect.
  • The present application provides a chip, including: an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with each other through an internal connection path; the processor is configured to execute code in the memory, and when the code is executed, the processor performs the method of the first aspect or any possible implementation of the first aspect.
  • The present application provides a chip, including: an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with each other through an internal connection path; the processor is configured to execute code in the memory, and when the code is executed, the processor performs the method of the second aspect or any possible implementation of the second aspect.
  • FIG. 1 is a schematic flowchart of an image retrieval method according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for generating an image library according to an embodiment of the present application.
  • FIG. 3 is a schematic block diagram of an image retrieval apparatus according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an apparatus for generating an image library according to an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of another image retrieval apparatus according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of another image library generating apparatus according to an embodiment of the present application.
  • The visual feature points of an image refer to pixels that remain stable under image transformations such as scaling, rotation, translation, and viewing-angle changes, that is, the most easily recognized pixels in the image, such as corner points or texture-rich edge points.
  • the quality of the visual feature points of the image will directly affect the efficiency and accuracy of image retrieval.
  • The types of the visual feature points of the image may include scale-invariant feature transform (SIFT), oriented FAST and rotated BRIEF (ORB), speeded up robust features (SURF), features from accelerated segment test (FAST), and the like, which is not limited in the embodiment of the present application.
  • the visual feature points of the image may be one or more, which is not limited by the embodiment of the present application.
  • The main steps of extracting FAST corner points from an image include: calculating the difference between the brightness of each pixel and that of the pixels in its neighborhood; if a pixel differs greatly from the pixels in its neighborhood, it is more likely to be a corner point. Then, through non-maximum suppression, only the corner point with the maximum response is retained within a given area, avoiding the problem of corner-point clustering. Because FAST corner points lack orientation and scale information, descriptions of scale and rotation are added: scale invariance is achieved by constructing an image pyramid, that is, downsampling the image at different levels to obtain images of different resolutions; rotation invariance is achieved by the gray-scale centroid method, in which the direction vector from the geometric center of an image block to the centroid of its gray values is used to describe the feature-point direction.
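  • The following OpenCV sketch mirrors the steps just described: FAST detection with non-maximum suppression, then ORB, which adds the image pyramid for scale and the gray-scale centroid orientation on top of FAST. The file name and parameter values are illustrative.

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

# Plain FAST corners with non-maximum suppression (no orientation or scale).
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
corners = fast.detect(img, None)
print(len(corners), "FAST corners after non-maximum suppression")

# ORB: FAST corners plus an image pyramid (scaleFactor, nlevels) for scale
# invariance and the intensity-centroid angle for rotation invariance.
orb = cv2.ORB_create(nfeatures=500, scaleFactor=1.2, nlevels=8)
keypoints = orb.detect(img, None)
for kp in keypoints[:3]:
    print(kp.pt, kp.angle, kp.octave)  # position, orientation, pyramid level
```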
  • The visual feature descriptor of an image describes a visual feature point of the image by means of mathematical features.
  • The main steps of acquiring the visual feature descriptor of an image include: randomly selecting a plurality of pixel pairs in the vicinity of a visual feature point of the image, and comparing the values of the two pixels in each pixel pair to obtain a 0 or 1 encoding; then, using the direction information of the visual feature point, rotating the sampling pattern to obtain a robust binary-vector visual feature descriptor.
  • The image may have one or more visual feature descriptors, which is not limited in this embodiment of the present application.
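  • As a toy illustration of the pixel-pair comparison and rotation described above, the following sketch builds a 256-bit binary descriptor for one feature point by rotating the random sampling offsets by the point's orientation; the patch size, pair count, and lack of border handling are simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
# 256 random pixel pairs inside a 31x31 patch around the feature point.
OFFSETS = rng.integers(-15, 16, size=(256, 2, 2))

def binary_descriptor(image: np.ndarray, x: int, y: int, angle: float) -> np.ndarray:
    """image: 2-D grayscale array; (x, y): feature point; angle: orientation in radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    bits = np.zeros(256, dtype=np.uint8)
    for i, (p, q) in enumerate(OFFSETS):
        px, py = (rot @ p).astype(int)  # rotate both sampling offsets by the
        qx, qy = (rot @ q).astype(int)  # feature-point direction
        # Compare the two pixel values to obtain one 0/1 bit of the encoding.
        bits[i] = image[y + py, x + px] < image[y + qy, x + qx]
    return bits  # robust binary-vector visual feature descriptor
```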
  • the visual word bag model includes a plurality of visual words, each of the plurality of visual words being a cluster center obtained by clustering visual feature descriptors extracted from the plurality of images.
  • the visual word of the image refers to a visual word in the visual word bag model that is closest to the visual feature descriptor by matching and mapping the visual feature descriptor of the image with the visual word in the visual bag model.
  • the visual word of the image may be one or more, which is not limited by the embodiment of the present application.
  • By classifying each image, an image category of each image can be obtained.
  • the image categories of the images may include forest scenes, suburban scenes, indoor scenes, and the like.
  • the image categories of the images may include sunny, rainy, snowy, and the like.
  • The same visual word appearing in different images may have different effects on recognizing those images, or it may have the same effect on recognizing both images.
  • The visual stop words corresponding to an image category refer to visual words that have no significant effect on recognizing an image of that image category, or that interfere with image recognition, that is, visual words unrelated to images of that image category.
  • the visual stop words that are not related to a certain image category described in the embodiments of the present application refer to visual words whose correlation with the image of the image category is lower than a preset threshold.
  • the visual stop words corresponding to the image categories may include one or more visual words, which are not limited by the embodiment of the present application.
  • For example, visual words extracted from trees can be visual stop words for the forest or suburban category.
  • For another example, an image captured on a rainy day will contain traces of falling rain, and the feature points extracted from the rain will contaminate the visual-word representation of the image. Therefore, visual words extracted from rain can be visual stop words for the rainy-day category.
  • The target visual words of an image are the visual words that remain after the visual stop words corresponding to the image category of the image are removed from the plurality of visual words of the image.
  • the target visual word of the image may include one or more visual words, which is not limited by the embodiment of the present application.
  • The positive sample image set includes manually labeled images that are considered highly similar or identical.
  • For example, two images of the same object captured in different scenes, such as a school on a rainy day and the same school on a snowy day, or two images of the same scene captured at different times, such as the current pose and a historical pose of the same scene in loop closure detection.
  • The applicable scenarios of the embodiments of the present application include loop closure detection in simultaneous localization and mapping (SLAM), product image retrieval in e-commerce, and the like.
  • Loop closure detection detects scenes that have appeared in history: it uses the current image to retrieve and recognize loops, constructs a constraint between the current pose and a historical pose, and reduces the accumulated error through optimization to obtain a globally consistent map.
  • In the e-commerce scenario, the system searches according to the image of the product and returns the images with the highest similarity as the retrieval result.
  • FIG. 1 is a schematic flowchart of an image retrieval method 100 provided by an embodiment of the present application. The method can be performed by an image retrieval device.
  • The image retrieval device may be a first device having computing and storage functions; the first device may be, for example, a computer; or the image retrieval device may be a functional module in the first device. This is not limited in this embodiment of the present application.
  • S110: Acquire a plurality of visual words of the image to be retrieved and image category information of the image to be retrieved.
  • S120: Determine, according to the image category information of the image to be retrieved and the stop word dictionary, the visual stop words corresponding to the image category of the image to be retrieved, where the visual stop words corresponding to the image category of the image to be retrieved include visual words unrelated to the image category of the image to be retrieved, and the stop word dictionary includes a mapping relationship between the image category of the image to be retrieved and the visual stop words corresponding to the image category of the image to be retrieved.
  • S130: Remove, from the plurality of visual words of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved, to obtain the target visual words of the image to be retrieved.
  • S140: Determine the retrieval result according to the target visual words of the image to be retrieved and the search image library, where the search image library includes a plurality of search images.
  • the image retrieval device may acquire a plurality of visual words of the image to be retrieved in a plurality of manners, which is not limited by the embodiment of the present application.
  • For example, the image retrieval device may acquire the image to be retrieved and extract a plurality of visual feature descriptors of the image to be retrieved, where the plurality of visual feature descriptors are used to describe a plurality of visual feature points of the image to be retrieved and are in one-to-one correspondence with the plurality of visual feature points; acquire the visual word bag model; and determine the plurality of visual words in the visual word bag model that are closest to each of the plurality of visual feature descriptors as the plurality of visual words of the image to be retrieved.
  • The visual word bag model may be an existing trained visual word bag model, or may be obtained by the image retrieval device itself by clustering the visual feature descriptors of the training images in a training image set; this is not limited in this embodiment of the present application.
  • the image retrieval device may obtain the image to be retrieved in a plurality of manners, for example, by camera shooting, local disk reading, network downloading, or other manners, which is not limited by the embodiment of the present application.
  • the image to be retrieved obtained by the image retrieving device may be an image after de-distortion, denoising, or other pre-processing operations, which is not limited in this embodiment of the present application.
  • the image retrieving device may obtain the image category information of the image to be retrieved in a plurality of manners, which is not limited in this embodiment of the present application.
  • For example, the image retrieval device may determine the image category information of the image to be retrieved according to the image to be retrieved and an image classification model, where the image classification model includes a mapping relationship between the image to be retrieved and the image category of the image to be retrieved.
  • the image retrieval device may determine image category information of the image to be retrieved according to the image to be retrieved and a preset classification algorithm.
  • the image retrieval device may acquire image category information of the image to be retrieved manually labeled.
  • the image category information of the image to be retrieved may be one or more bits, that is, the image type of the image to be retrieved is indicated by the one or more bits, which is not limited in this embodiment of the present application.
  • the image category information of the image to be retrieved may be 2 bits. For example, when the 2 bits are “00”, the image to be retrieved is indicated as the first type of image, and when the 2 bits are “01”. The image to be retrieved is indicated as a second type of image. When the 2 bits are "10”, the image to be retrieved is indicated as a third type of image, and when the 2 bits are "11”, the image to be retrieved is indicated as a fourth type of image.
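  • A tiny illustration of this 2-bit encoding (the type labels are hypothetical):

```python
CATEGORY_BITS = {0b00: "first type", 0b01: "second type",
                 0b10: "third type", 0b11: "fourth type"}

def decode_category(bits: int) -> str:
    """Map the 2-bit image category information to an image type."""
    return CATEGORY_BITS[bits & 0b11]

assert decode_category(0b10) == "third type"
```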
  • the image retrieval device may acquire the stop word dictionary before S120.
  • the stop word dictionary may include a mapping of an identifier of each of the plurality of image categories and a visual stop word corresponding to the identifier of each of the image categories.
  • the image retrieving device may generate the stop word vocabulary by itself, or may acquire the stop word vocabulary from the image library generating device, which is not limited by the embodiment of the present application.
  • For example, the image retrieval device may acquire a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping the plurality of visual feature descriptors of each training image with the visual words in the visual word bag model, the image category information of each training image is used to indicate the image category of each training image, and the positive sample image set information is used to indicate at least one positive sample image set, each positive sample image set including a plurality of manually labeled similar training images in the training image library. The image retrieval device may then generate the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, where the stop word dictionary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to the image category of each training image include visual words unrelated to the image category of each training image.
  • The image retrieval device may generate the stop word dictionary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information as follows: determine, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image, and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generate the stop word dictionary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
  • For example, the image retrieval device may determine the correlation between the first image category and the first visual word according to the probability of occurrence of the first event, the probability of occurrence of the second event, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category, the plurality of visual words of the training image library include the first visual word, the first event indicates that the plurality of visual words of the first training image in the training image library and the plurality of visual words of the second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event indicates that the first training image and the second training image belong to the same positive sample image set.
  • Specifically, x represents the first event, namely that the plurality of visual words of the first training image among the N training images and the plurality of visual words of a second training image among the other P training images each include the first visual word among the L visual words; y represents the second event, namely that the first training image and the second training image belong to the same positive sample image set. count(x), count(y), and count(x, y) represent the number of occurrences of the first event, of the second event, and of the two events simultaneously, and p(x), p(y), and p(x, y) represent the corresponding probabilities. The correlation is then computed as:

    PMI(x, y) = log( p(x, y) / ( p(x) p(y) ) )

    H(y) = -p(y) log p(y) - (1 - p(y)) log(1 - p(y))

    RATE_PMI(x, y) = PMI(x, y) / H(y)

  • Here PMI(x, y) is the point-wise mutual information of the first event and the second event, H(y) is the information entropy of the second event, and RATE_PMI(x, y), the point-wise mutual information rate of the first event and the second event, is the correlation between the first visual word and the first image category.
  • the image retrieval device may obtain the positive sample image set information in a plurality of manners, which is not limited by the embodiment of the present application.
  • the image retrieval device may acquire one or more bits carried in each training image, and acquire the positive sample image set information according to one or more bits of each training image. For example, if the first training image and the second training image of the plurality of training images carry the same bit, it is determined that the first training image and the second training image belong to the same positive sample image set.
  • Alternatively, the image retrieval device may acquire first information, where the first information includes a mapping relationship between the identifier of each positive sample image set of the plurality of positive sample image sets and the identifiers of the training images included in each positive sample image set, and the image retrieval device may acquire the positive sample image set information according to the first information.
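  • A sketch of deriving the positive sample image set information from such first information, assuming the first information arrives as (set identifier, training-image identifier) pairs:

```python
from collections import defaultdict

def group_positive_sets(first_information):
    """first_information: iterable of (set_id, image_id) pairs."""
    sets = defaultdict(set)
    for set_id, image_id in first_information:
        sets[set_id].add(image_id)
    return dict(sets)

def same_positive_set(sets, image_a, image_b):
    """True if the two training images belong to one positive sample image set."""
    return any(image_a in s and image_b in s for s in sets.values())
```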
  • the image retrieving device may determine the visual stop words corresponding to each of the image categories from the plurality of visual words of the training image library in a plurality of manners, which is not limited by the embodiment of the present application.
  • the image retrieval device may use at least one visual word of the plurality of visual words of the training image library that has the least correlation with each image category as the visual stop word corresponding to each image category.
  • Alternatively, the image retrieval device may use, as the visual stop words corresponding to each image category, at least one visual word of the plurality of visual words of the training image library whose correlation with that image category is less than a first preset threshold.
  • the visual stop words corresponding to each image category may be one or more visual words, which are not limited in this embodiment of the present application.
  • the image retrieval device may acquire the retrieval image library before S140.
  • The search image library includes a plurality of search images and target visual words corresponding to each of the plurality of search images, where the target visual words corresponding to each search image are obtained by removing, from the plurality of visual words corresponding to each search image, the visual stop words corresponding to the image category of each search image.
  • the image retrieval device may generate the search image library by itself, or may acquire the search image library from the image library generation device, which is not limited in this embodiment of the present application.
  • The search image library may be trained according to the plurality of training images, or may be trained according to historical images to be retrieved that the image retrieval device retrieved before the current retrieval, or may be obtained in another manner; this is not limited in this embodiment of the present application.
  • For example, the image retrieval device may generate the search image library according to the plurality of visual words of each of the plurality of training images, the image category information of each training image, and the stop word dictionary; that is, the search image library is trained based on the plurality of training images.
  • Specifically, the image retrieval device may determine, according to the image category information of each training image and the stop word dictionary, the visual stop words corresponding to the image category of each training image; remove, from the plurality of visual words of each training image, the visual stop words corresponding to the image category of that training image to obtain the target visual words of each training image; and add the target visual words of each training image to the search image library.
  • The image retrieval device may obtain the target visual words of each training image by using the stop word dictionary in the embodiment of the present application, or may obtain them by other methods; this is not limited in this embodiment of the present application.
  • In an e-commerce scenario, for example, the search image library includes the target visual words of all product images provided to the user, and the image to be retrieved is an image of the product that the user wants to purchase.
  • the image retrieval device may add a target visual word of the historical to-be-retrieved image retrieved prior to S140 to the retrieval image library to generate the retrieval image library.
  • the search image library includes all historical pose images, and the image to be retrieved is the current pose image.
  • In the loop closure scenario, the image retrieval method provided by the embodiment of the present application can detect scenes that have appeared in history, use the current image to retrieve and recognize loops, construct a constraint between the current pose and a historical pose, and reduce the accumulated error through optimization to obtain a globally consistent map.
  • In the e-commerce scenario, the system searches according to the image of the product and returns the images with the highest similarity as the retrieval result.
  • The image retrieval device may determine the retrieval result in a plurality of manners, for example by selecting at least one search image most similar to the image to be retrieved as the retrieval result; this is not limited in this embodiment of the present application.
  • For example, the image retrieval device may calculate the similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine the at least one search image with the highest similarity to the image to be retrieved as the retrieval result.
  • Alternatively, the image retrieval device may calculate the similarity between the plurality of visual words of the image to be retrieved and the plurality of visual words of each search image in the search image library, and determine at least one search image whose similarity is greater than a second preset threshold as the retrieval result.
  • In the embodiment of the present application, the target visual words of the image to be retrieved are obtained by removing, from the plurality of visual words of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved. That is, visual words that have no significant effect on recognizing the image, or that interfere with image recognition, are removed from the plurality of visual words of the image to be retrieved, so the target visual words of the image to be retrieved are more significant for identifying the image to be retrieved. Therefore, retrieving through the target visual words of the image to be retrieved and the search image library helps improve the efficiency and accuracy of image retrieval.
  • In the embodiment of the present application, the target visual words of the search images stored in the search image library are obtained by removing, from the plurality of visual words of each search image, the visual stop words corresponding to the image category of that search image, which helps reduce the memory usage of the search image library.
  • FIG. 2 is a schematic flowchart of a method 200 for generating an image library according to an embodiment of the present disclosure.
  • the method 200 may be performed by a device for generating an image library, which is not limited by the embodiment of the present application.
  • The image library generating device may be a second device having computing and storage functions; the second device may be, for example, a computer; or the image library generating device may be a functional module in the second device. This is not limited in this embodiment of the present application.
  • the second device and the first device in FIG. 1 may be the same device or different devices, which is not limited in this embodiment of the present application.
  • In one possible implementation, the image library generating device and the image retrieval device described in FIG. 1 are different functional modules in the same device, or the image library generating device is a functional module in the image retrieval device.
  • The stop word dictionary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to the image category of each training image, where the visual stop words corresponding to the image category of each training image are visual words unrelated to the image category of each training image.
  • According to the image library generation method provided by the embodiment of the present application, a plurality of visual words of each training image in the training image library, image category information of each training image, and positive sample image set information are acquired, and the stop word dictionary is generated according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information; the stop word dictionary is used to obtain the target visual words of the image to be retrieved, which helps improve the efficiency and accuracy of image retrieval.
  • The image library generating device may acquire the plurality of visual words of each training image in a plurality of manners, which is not limited in this embodiment of the present application.
  • For example, the image library generating device may acquire a training image and extract a plurality of visual feature descriptors of the training image, where the plurality of visual feature descriptors are used to describe a plurality of visual feature points of the training image and are in one-to-one correspondence with the plurality of visual feature points; acquire the visual word bag model; and determine the plurality of visual words in the visual word bag model that are closest to each of the plurality of visual feature descriptors as the plurality of visual words of the training image.
  • The visual word bag model may be an existing trained visual word bag model, or may be obtained by the image library generating device itself by clustering a plurality of visual feature descriptors corresponding to a plurality of images; this is not limited in this embodiment of the present application.
• the image library generating device can obtain the training image in a plurality of manners, for example, by camera shooting, local disk reading, network downloading, or other manners, which is not limited by the embodiment of the present application.
  • the multiple images obtained by the generating device of the image library may be images after de-distortion, de-noising, or other pre-processing operations, which are not limited in this embodiment of the present application.
  • the generating device of the image library may obtain the image category information of the training image in a plurality of manners, which is not limited by the embodiment of the present application.
• the image library generating device may determine image category information of the training image according to the training image and an image classification model, where the image classification model includes a mapping relationship between the training image and an image category of the training image.
  • the image library generating device may determine image category information of the training image according to the training image and a preset classification algorithm.
  • the image library generating device may acquire image category information of the training image manually labeled.
  • the image category information of the training image may be one or more bits, that is, the image type of the training image is indicated by the one or more bits, which is not limited in this embodiment of the present application.
• For example, the image category information of the training image may be 2 bits: when the 2 bits are "00", the training image is indicated as a first type of image; when the 2 bits are "01", the training image is indicated as a second type of image; when the 2 bits are "10", the training image is indicated as a third type of image; and when the 2 bits are "11", the training image is indicated as a fourth type of image.
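• A toy decoding of the 2-bit encoding above might look like this (the category names are placeholders, not defined by the specification):

```python
# Hypothetical decoding of the 2-bit image category information.
CATEGORY_BY_BITS = {
    0b00: "first type",
    0b01: "second type",
    0b10: "third type",
    0b11: "fourth type",
}

def decode_category(bits: int) -> str:
    return CATEGORY_BY_BITS[bits & 0b11]

print(decode_category(0b10))  # -> "third type"
```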
• Optionally, the image library generating device generates the stop word vocabulary as follows: first, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, it determines the correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image and the plurality of visual words of the training image library include the plurality of visual words of each training image; then it generates the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
• Optionally, the image library generating device may determine the correlation between a first image category and a first visual word according to the probability of occurrence of a first event, the probability of occurrence of a second event, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category and the plurality of visual words of the training image library include the first visual word; the first event represents that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event represents that the first training image and the second training image belong to the same positive sample image set.
• the correlation between the image categories and the visual words of the training image library can be determined by the above formulas (1) to (6).
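• Formulas (1) to (6) appear earlier in the specification and are not reproduced in this passage. Purely as a hedged illustration of the idea, and not the patented formulas, a pointwise-mutual-information-style correlation between one image category and one visual word could be estimated from pair counts as follows (the data layout and all names are hypothetical):

```python
import math
from itertools import combinations

def category_word_correlation(images, category, word):
    """Illustrative PMI-style correlation between an image category and a
    visual word, estimated over pairs of training images.

    images: list of dicts with keys
        'words'    - set of visual word IDs of the training image
        'category' - image category of the training image
        'set_id'   - ID of the positive sample image set it belongs to
    """
    n = n_a = n_b = n_ab = 0
    for a, b in combinations(images, 2):
        # Event A: both images contain `word`, and image a has `category`.
        ev_a = (word in a['words'] and word in b['words']
                and a['category'] == category)
        # Event B: the two images belong to the same positive sample set.
        ev_b = a['set_id'] == b['set_id']
        n += 1
        n_a += ev_a
        n_b += ev_b
        n_ab += ev_a and ev_b
    if not (n_a and n_b and n_ab):
        return float('-inf')  # the events never co-occur
    # PMI: log( P(A and B) / (P(A) * P(B)) )
    return math.log(n * n_ab / (n_a * n_b))
```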
  • the generating device of the image library can obtain the positive sample image set information in a plurality of manners, which is not limited by the embodiment of the present application.
  • the generating device of the image library may acquire one or more bits carried in each training image, and acquire the positive sample image set information according to one or more bits of each training image. For example, if the first training image and the second training image of the plurality of training images carry the same bit, it is determined that the first training image and the second training image belong to the same positive sample image set.
• Optionally, the image library generating device may acquire first information, where the first information includes a mapping relationship between an identifier of each positive sample image set of the plurality of positive sample image sets and identifiers of the training images included in that positive sample image set; the image library generating device may then acquire the positive sample image set information according to the first information.
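• For concreteness, the first information could be organized as a simple mapping from set identifiers to the identifiers of the training images in each set; a hypothetical layout:

```python
# Hypothetical "first information": positive sample set IDs mapped to the
# identifiers of the training images each set contains.
first_info = {
    "set_0": ["img_001", "img_007", "img_012"],
    "set_1": ["img_003", "img_004"],
}

# Invert it so each image identifier resolves to its positive sample set.
set_of_image = {img: s for s, imgs in first_info.items() for img in imgs}
assert set_of_image["img_001"] == set_of_image["img_007"]  # same set
```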
  • the generating device of the image library may determine the visual stop words corresponding to each of the image categories from the plurality of visual words of the training image library in a plurality of manners, which is not limited by the embodiment of the present application.
• Optionally, the image library generating device may use at least one visual word, of the plurality of visual words of the training image library, that has the least correlation with each image category as the visual stop words corresponding to that image category.
• Optionally, the image library generating device may use at least one visual word, of the plurality of visual words of the training image library, whose correlation with each image category is less than a first preset threshold as the visual stop words corresponding to that image category.
  • the visual stop words corresponding to each image category may be one or more visual words, which are not limited in this embodiment of the present application.
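• A minimal sketch of the two selection strategies above, assuming the per-category correlations have already been computed (function and parameter names are hypothetical):

```python
def stop_words_for_category(corr, k=None, threshold=None):
    """Select the visual stop words for one image category.

    corr:      dict mapping visual word ID -> correlation with the category
    k:         keep the k least-correlated visual words (first strategy)
    threshold: keep visual words whose correlation is below the first
               preset threshold (second strategy)
    """
    if k is not None:
        # Sort word IDs by ascending correlation and take the first k.
        return set(sorted(corr, key=corr.get)[:k])
    return {w for w, c in corr.items() if c < threshold}
```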
• Optionally, the image library generating device may determine, according to the image category information of each training image and the stop word vocabulary, the visual stop words corresponding to the image category of each training image, remove the visual stop words corresponding to the image category of each training image from the plurality of visual words of that training image to obtain the target visual words of that training image, and add the target visual words of each training image to the search image library.
• In this way, the target visual words of a search image stored in the search image library are obtained by removing the visual stop words corresponding to the image category of the search image from the plurality of visual words of the search image, which is beneficial to reducing the memory usage of the search image library.
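• A minimal sketch of this construction step, assuming the stop word vocabulary and the per-image annotations are available as plain Python structures (names are hypothetical, not the claimed implementation):

```python
def build_search_library(training_images, stop_word_vocab):
    """Store, for each training image, only its target visual words.

    training_images: list of dicts with 'id', 'words' (set of visual word
                     IDs), and 'category'
    stop_word_vocab: dict mapping image category -> set of visual stop words
    """
    library = {}
    for img in training_images:
        stop_words = stop_word_vocab.get(img['category'], set())
        library[img['id']] = img['words'] - stop_words  # target visual words
    return library
```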
  • FIG. 3 is a schematic block diagram of an image retrieval apparatus 300 provided by an embodiment of the present application.
  • the device 300 includes:
• the acquiring unit 310 is configured to acquire a plurality of visual words of the image to be retrieved and image category information of the image to be retrieved, where the plurality of visual words of the image to be retrieved are obtained by matching and mapping a plurality of visual feature descriptors of the image to be retrieved with the visual words in the visual word bag model, and the image category information of the image to be retrieved is used to indicate an image category of the image to be retrieved;
• the processing unit 320 is configured to determine, according to the image category information of the image to be retrieved acquired by the acquiring unit 310 and a stop word vocabulary, visual stop words corresponding to the image category of the image to be retrieved, where the visual stop words corresponding to the image category of the image to be retrieved include visual words irrelevant to the image category of the image to be retrieved, and the stop word vocabulary includes a mapping relationship between the image category of the image to be retrieved and the visual stop words corresponding to that image category; and to remove the visual stop words corresponding to the image category of the image to be retrieved from the plurality of visual words of the image to be retrieved acquired by the acquiring unit 310, to obtain target visual words of the image to be retrieved;
• the searching unit 330 is configured to determine a search result according to the target visual words of the image to be retrieved obtained by the processing unit 320 and a search image library, where the search image library includes a plurality of search images.
• Optionally, the search image library includes a mapping relationship between the plurality of search images and the target visual words corresponding to each of the plurality of search images, where the target visual words corresponding to each search image are obtained by removing, from the plurality of visual words corresponding to that search image, the visual stop words corresponding to the image category of that search image.
• Optionally, the device further includes a generating unit. The acquiring unit is further configured to: before the visual stop words corresponding to the image category of the image to be retrieved are determined according to the image category information of the image to be retrieved and the stop word vocabulary, acquire a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image with the visual words in the visual word bag model, the image category information of each training image is used to indicate an image category of that training image, and the positive sample image set information is used to indicate at least one positive sample image set, each positive sample image set including a plurality of manually labeled similar training images in the training image library; the generating unit is configured to generate the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information.
• Optionally, the generating unit is specifically configured to: determine, according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, the correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generate the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
• Optionally, the generating unit is specifically configured to: determine the correlation between a first image category and a first visual word according to the probability of occurrence of a first event, the probability of occurrence of a second event, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category and the plurality of visual words of the training image library include the first visual word; the first event represents that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event represents that the first training image and the second training image belong to the same positive sample image set.
• Optionally, the searching unit is specifically configured to: determine the similarity between the target visual words of the image to be retrieved and the target visual words of each search image in the search image library; and determine, as the search result, at least one search image whose similarity with the target visual words of the image to be retrieved is greater than a first preset value.
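• The similarity measure itself is not fixed in this passage; purely as an illustrative sketch, ranking search images by Jaccard similarity of target visual word sets and keeping those above the first preset value could look as follows (the names and the choice of Jaccard are assumptions):

```python
def retrieve(query_words, library, first_preset_value=0.3):
    """Return search images whose target-visual-word similarity with the
    image to be retrieved exceeds the preset value, best match first.

    query_words: set of target visual word IDs of the image to be retrieved
    library:     dict mapping search image ID -> set of target visual words
    """
    results = []
    for image_id, words in library.items():
        union = query_words | words
        sim = len(query_words & words) / len(union) if union else 0.0
        if sim > first_preset_value:
            results.append((image_id, sim))
    return sorted(results, key=lambda r: r[1], reverse=True)
```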
  • the image retrieval device 300 herein is embodied in the form of a functional unit.
  • the term "unit" as used herein may refer to an application specific integrated circuit (ASIC), an electronic circuit, a processor (eg, a shared processor, a proprietary processor, or a group) for executing one or more software or firmware programs. Processors, etc.) and memory, merge logic, and/or other suitable components that support the described functionality.
• In an optional example, those skilled in the art may understand that the image retrieval device 300 may be specifically the image retrieval device in the foregoing method 100 and method 200, and the image retrieval device 300 may be configured to execute the various processes and/or steps corresponding to the image retrieval device in the foregoing method 100 and method 200; to avoid repetition, details are not described herein again.
  • FIG. 4 is a schematic block diagram of an image library generating apparatus 400 provided by an embodiment of the present application.
  • the apparatus 400 includes:
• the acquiring unit 410 is configured to acquire a plurality of visual words of each training image in a training image library, image category information of each training image, and positive sample image set information, where the plurality of visual words of each training image are obtained by matching and mapping a plurality of visual feature descriptors of that training image with the visual words in the visual word bag model, the image category information of each training image is used to indicate an image category of that training image, and the positive sample image set information is used to indicate at least one positive sample image set, each positive sample image set including a plurality of manually labeled similar training images in the training image library;
• the generating unit 420 is configured to generate a stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information acquired by the acquiring unit 410, where the stop word vocabulary includes a mapping relationship between the image category of each training image and the visual stop words corresponding to that image category, and the visual stop words corresponding to the image category of each training image include visual words that are irrelevant to that image category.
• Optionally, the generating unit is specifically configured to: determine, according to the plurality of visual words corresponding to each training image, the image category information of each training image, and the positive sample image set information, the correlation between a plurality of image categories of the training image library and a plurality of visual words of the training image library, where the plurality of image categories of the training image library include the image category of each training image and the plurality of visual words of the training image library include the plurality of visual words of each training image; and generate the stop word vocabulary according to the correlation between the plurality of image categories of the training image library and the plurality of visual words of the training image library.
• Optionally, the generating unit is specifically configured to: determine the correlation between a first image category and a first visual word according to the probability of occurrence of a first event, the probability of occurrence of a second event, and the probability that the first event and the second event occur simultaneously, where the plurality of image categories of the training image library include the first image category and the plurality of visual words of the training image library include the first visual word; the first event represents that the plurality of visual words of a first training image in the training image library and the plurality of visual words of a second training image in the training image library each include the first visual word and that the image category of the first training image is the first image category, and the second event represents that the first training image and the second training image belong to the same positive sample image set.
• Optionally, the generating unit is further configured to: after generating the stop word vocabulary according to the plurality of visual words of each training image, the image category information of each training image, and the positive sample image set information, determine, according to the image category information of each training image and the stop word vocabulary, the visual stop words corresponding to the image category of each training image; remove the visual stop words corresponding to the image category of each training image from the plurality of visual words of that training image to obtain the target visual words of that training image; and add the target visual words of each training image to the search image library.
• Optionally, the acquiring unit is specifically configured to: acquire each training image and extract a plurality of visual feature descriptors of each training image, where the plurality of visual feature descriptors are used to describe a plurality of visual feature points of that training image and are in one-to-one correspondence with the plurality of visual feature points; obtain a visual word bag model; and determine, for each of the plurality of visual feature descriptors, the closest visual word in the visual word bag model, the resulting plurality of visual words being determined as the plurality of visual words of that training image.
  • the image library generating apparatus 400 herein is embodied in the form of a functional unit.
  • the term "unit” here May be referred to as an ASIC, an electronic circuit, a processor for executing one or more software or firmware programs (eg, a shared processor, a proprietary processor, or a group processor, etc.) and memory, merge logic, and/or other support described.
  • the right component for the function eg, those skilled in the art may understand that the image library generating apparatus 400 may be specifically the image library generating apparatus in the foregoing method 100 and the method 100 embodiment, and the image library generating apparatus 400 may be configured to execute the above.
  • the various processes and/or steps corresponding to the image library generating device in the method 100 and the method 200 are not repeated here to avoid repetition.
  • FIG. 5 is a schematic block diagram of an image retrieval device 500 provided by an embodiment of the present application.
• the image retrieval device 500 may be the image retrieval device described in FIG. 1 and FIG. 2, and the image retrieval device may adopt a hardware architecture as shown in FIG. 5.
  • the image retrieval device can include a processor 510, a communication interface 520, and a memory 530 that communicate with one another via internal connection paths.
  • the related functions implemented by the processing unit 320 and the retrieval unit 330 in FIG. 3 may be implemented by the processor 510, and the related functions implemented by the acquisition unit 310 may be implemented by the processor 510 controlling the communication interface 520.
• the processor 510 may include one or more processors, for example, one or more central processing units (CPUs); a CPU may be a single-core CPU or a multi-core CPU.
  • the communication interface 520 is for transmitting and/or receiving data.
  • the communication interface may include a transmission interface for transmitting data and a receiving interface for receiving data.
• the memory 530 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and a compact disc read-only memory (CD-ROM).
  • the memory 530 is used to store program code and data of the image retrieval device, and may be a separate device or integrated in the processor 510.
  • the processor 510 is configured to control the communication interface to perform data transmission with other devices, such as a generating device of the image library.
  • Figure 5 only shows a simplified design of the image retrieval device.
• the image retrieval device may further include other necessary components, including but not limited to any number of communication interfaces, processors, controllers, and memories, and all image retrieval devices that can implement the present application are within the protection scope of the present application.
  • image retrieval device 500 can be replaced with a chip device, such as a chip that can be used in an image retrieval device for implementing related functions of processor 510 in an image retrieval device.
• the chip device can be a field programmable gate array, an application-specific integrated chip, a system chip, a central processing unit, a network processor, a digital signal processing circuit, a microcontroller, a programmable controller, or another integrated chip for implementing related functions.
  • the chip may include one or more memories for storing program code that, when executed, causes the processor to perform the corresponding functions.
  • FIG. 6 is a schematic block diagram of an image library generating apparatus 600 provided by an embodiment of the present application.
• the image library generating apparatus 600 may be the image library generating device described in FIG. 1 and FIG. 2, and the image library generating device may adopt a hardware architecture as shown in FIG. 6.
  • the image library generating means may include a processor 610, a communication interface 620, and a memory 630, and the processor 610, the communication interface 620, and the memory 630 communicate with each other through an internal connection path.
• the related functions implemented by the generating unit 420 in FIG. 4 may be implemented by the processor 610, and the related functions implemented by the acquiring unit 410 may be implemented by the processor 610 controlling the communication interface 620.
• the processor 610 may include one or more processors, for example, one or more central processing units (CPUs); a CPU may be a single-core CPU or a multi-core CPU.
  • the communication interface 620 is for transmitting and/or receiving data.
  • the communication interface may include a transmission interface for transmitting data and a receiving interface for receiving data.
• the memory 630 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and a compact disc read-only memory (CD-ROM).
• the memory 630 is used to store program code and data of the image library generating device, and may be a separate device or integrated in the processor 610.
  • the processor 610 is configured to control the communication interface to perform data transmission with other devices, such as an image retrieval device.
  • Figure 6 only shows a simplified design of the image library generation device.
• the image library generating device may further include other necessary components, including but not limited to any number of communication interfaces, processors, controllers, and memories, and all image library generating devices that can implement the present application are within the protection scope of the present application.
• the image library generating device 600 may be replaced with a chip device, for example, a chip that can be used in an image library generating device for implementing related functions of the processor 610 in the image library generating device.
• the chip device can be a field programmable gate array, an application-specific integrated chip, a system chip, a central processing unit, a network processor, a digital signal processing circuit, a microcontroller, a programmable controller, or another integrated chip for implementing related functions.
  • the chip may include one or more memories for storing program code that, when executed, causes the processor to perform the corresponding functions.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
• the division of units is only a logical function division; in actual implementation there may be another division manner, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
• the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed onto multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
• the technical solution of the present application, or the part that is essential or contributes to the prior art, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present application relates to an image retrieval method and device, and an image library generation method and device. The image retrieval method comprises: acquiring a plurality of visual words of an image to be retrieved and image category information of the image to be retrieved; determining, according to the image category information of the image to be retrieved and a stop word vocabulary, visual stop words corresponding to the image category of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved comprising visual words that are irrelevant to the image category of the image to be retrieved; removing, from the plurality of visual words of the image to be retrieved, the visual stop words corresponding to the image category of the image to be retrieved, so as to obtain target visual words of the image to be retrieved (S130); and determining a retrieval result according to the target visual words of the image to be retrieved and a search image library, the search image library comprising a plurality of search images (S140). In this way, the efficiency and accuracy of image retrieval can be improved.
PCT/CN2017/112956 2017-11-24 2017-11-24 Image retrieval method and device, and image library generation method and device WO2019100348A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780097137.5A CN111373393B (zh) 2017-11-24 2017-11-24 Image retrieval method and apparatus, and image library generation method and apparatus
PCT/CN2017/112956 WO2019100348A1 (fr) 2017-11-24 2017-11-24 Image retrieval method and device, and image library generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/112956 WO2019100348A1 (fr) 2017-11-24 2017-11-24 Image retrieval method and device, and image library generation method and device

Publications (1)

Publication Number Publication Date
WO2019100348A1 true WO2019100348A1 (fr) 2019-05-31

Family

ID=66630527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/112956 WO2019100348A1 (fr) 2017-11-24 2017-11-24 Image retrieval method and device, and image library generation method and device

Country Status (2)

Country Link
CN (1) CN111373393B (fr)
WO (1) WO2019100348A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276348A (zh) * 2019-06-20 2019-09-24 腾讯科技(深圳)有限公司 Image positioning method and apparatus, server, and storage medium
CN112348885A (zh) * 2019-08-09 2021-02-09 华为技术有限公司 Method for constructing a visual feature library, visual positioning method, apparatus, and storage medium
CN113591865A (zh) * 2021-07-28 2021-11-02 深圳甲壳虫智能有限公司 Loop closure detection method and apparatus, and electronic device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114264297B (zh) * 2021-12-01 2022-10-18 清华大学 Positioning and mapping method and system based on a fusion algorithm of UWB and visual SLAM

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105844299A (zh) * 2016-03-23 2016-08-10 浙江理工大学 Image classification method based on a bag-of-words model
CN106250909A (zh) * 2016-07-11 2016-12-21 南京邮电大学 Image classification method based on an improved bag-of-visual-words model
CN106354735A (zh) * 2015-07-22 2017-01-25 杭州海康威视数字技术股份有限公司 Method and apparatus for retrieving a target in an image
CN106407327A (zh) * 2016-08-31 2017-02-15 广州精点计算机科技有限公司 Similar image search method and apparatus based on HOG and bag of visual words
CN106855883A (zh) * 2016-12-21 2017-06-16 中国科学院上海高等研究院 Face image retrieval method based on a bag-of-visual-words model
CN106919920A (zh) * 2017-03-06 2017-07-04 重庆邮电大学 Scene recognition method based on convolutional features and a spatial bag-of-visual-words model
CN107133640A (zh) * 2017-04-24 2017-09-05 河海大学 Image classification method based on local image patch descriptors and Fisher vectors

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8073818B2 (en) * 2008-10-03 2011-12-06 Microsoft Corporation Co-location visual pattern mining for near-duplicate image retrieval
CN103235955A (zh) * 2013-05-03 2013-08-07 中国传媒大学 Method for extracting visual words in image retrieval
CN104424226B (zh) * 2013-08-26 2018-08-24 阿里巴巴集团控股有限公司 Method and apparatus for obtaining a visual word dictionary and for image retrieval
CN103838864B (zh) * 2014-03-20 2017-02-22 北京工业大学 Image retrieval method combining visual saliency and phrases
US9697234B1 (en) * 2014-12-16 2017-07-04 A9.Com, Inc. Approaches for associating terms with image regions
CN104615676B (zh) * 2015-01-20 2018-08-24 同济大学 Image retrieval method based on maximum similarity matching

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354735A (zh) * 2015-07-22 2017-01-25 杭州海康威视数字技术股份有限公司 Method and apparatus for retrieving a target in an image
CN105844299A (zh) * 2016-03-23 2016-08-10 浙江理工大学 Image classification method based on a bag-of-words model
CN106250909A (zh) * 2016-07-11 2016-12-21 南京邮电大学 Image classification method based on an improved bag-of-visual-words model
CN106407327A (zh) * 2016-08-31 2017-02-15 广州精点计算机科技有限公司 Similar image search method and apparatus based on HOG and bag of visual words
CN106855883A (zh) * 2016-12-21 2017-06-16 中国科学院上海高等研究院 Face image retrieval method based on a bag-of-visual-words model
CN106919920A (zh) * 2017-03-06 2017-07-04 重庆邮电大学 Scene recognition method based on convolutional features and a spatial bag-of-visual-words model
CN107133640A (zh) * 2017-04-24 2017-09-05 河海大学 Image classification method based on local image patch descriptors and Fisher vectors

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276348A (zh) * 2019-06-20 2019-09-24 腾讯科技(深圳)有限公司 Image positioning method and apparatus, server, and storage medium
CN110276348B (zh) * 2019-06-20 2022-11-25 腾讯科技(深圳)有限公司 Image positioning method and apparatus, server, and storage medium
CN112348885A (zh) * 2019-08-09 2021-02-09 华为技术有限公司 Method for constructing a visual feature library, visual positioning method, apparatus, and storage medium
CN113591865A (zh) * 2021-07-28 2021-11-02 深圳甲壳虫智能有限公司 Loop closure detection method and apparatus, and electronic device
CN113591865B (zh) * 2021-07-28 2024-03-26 深圳甲壳虫智能有限公司 Loop closure detection method and apparatus, and electronic device

Also Published As

Publication number Publication date
CN111373393A (zh) 2020-07-03
CN111373393B (zh) 2022-05-31

Similar Documents

Publication Publication Date Title
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
US20230087526A1 (en) Neural network training method, image classification system, and related device
WO2019001481A1 (fr) Procédé et appareil de recherche de véhicule et d'identification de caractéristique d'aspect de véhicule, support de stockage et dispositif électronique
WO2022111069A1 (fr) Procédé et appareil de traitement d'images, dispositif électronique et support de stockage
CN108734210B (zh) 一种基于跨模态多尺度特征融合的对象检测方法
CN112016638B (zh) 一种钢筋簇的识别方法、装置、设备及存储介质
WO2019018063A1 (fr) Reconnaissance d'image à grain fin
Xia et al. Loop closure detection for visual SLAM using PCANet features
Ali et al. A real-time deformable detector
US20130121600A1 (en) Methods and Apparatus for Visual Search
CN110765860A (zh) 摔倒判定方法、装置、计算机设备及存储介质
CN112861575A (zh) 一种行人结构化方法、装置、设备和存储介质
Ning et al. Occluded person re-identification with deep learning: a survey and perspectives
CN110532413B (zh) 基于图片匹配的信息检索方法、装置、计算机设备
Yang et al. Binary descriptor based nonparametric background modeling for foreground extraction by using detection theory
WO2019100348A1 (fr) Procédé et dispositif de récupération d'images, ainsi que procédé et dispositif de génération de bibliothèques d'images
CN114898266B (zh) 训练方法、图像处理方法、装置、电子设备以及存储介质
CN110909817B (zh) 分布式聚类方法及系统、处理器、电子设备及存储介质
CN115203408A (zh) 一种多模态试验数据智能标注方法
Liao et al. Multi-scale saliency features fusion model for person re-identification
AU2011265494A1 (en) Kernalized contextual feature
US11880405B2 (en) Method for searching similar images in an image database using global values of a similarity measure for discarding partitions of the image database
CN114462479A (zh) 模型训练方法、检索方法以及模型、设备和介质
CN113569934A (zh) Logo分类模型构建方法、系统、电子设备及存储介质
CN112131902A (zh) 闭环检测方法及装置、存储介质和电子设备

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17932942

Country of ref document: EP

Kind code of ref document: A1