WO2022227760A1 - Image retrieval method, apparatus, electronic device, and computer-readable storage medium - Google Patents

Info

Publication number
WO2022227760A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature point
feature
local
retrieved
Prior art date
Application number
PCT/CN2022/074951
Other languages
English (en)
French (fr)
Inventor
杨敏
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2022227760A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Definitions

  • the present disclosure relates to the technical field of artificial intelligence, and in particular, to an image retrieval method, apparatus, electronic device, and computer-readable storage medium in the fields of computer vision and deep learning.
  • image retrieval technology has been applied to all aspects of life, such as commodity retrieval, landmark retrieval and so on.
  • the following specific implementation methods can be adopted: perform global feature extraction on each image in the gallery (for ease of expression, it is referred to as the image to be selected), and the global feature usually refers to a descriptor extracted for the entire image.
  • the global features of the images to be retrieved can also be extracted, and the global features of the images to be retrieved can be compared with the global features of the images to be selected, and the retrieval results can be determined according to the comparison results.
  • When the image to be retrieved is only a part of an image while the image to be selected in the gallery is a complete image, the retrieval effect obtained by the above method is usually poor.
  • the present disclosure provides an image retrieval method, apparatus, electronic device, and computer-readable storage medium.
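  • An image retrieval method comprising:
  • acquiring local features of the image to be retrieved;
  • screening out, from the gallery, the images to be selected that match the image to be retrieved according to the local features of the image to be retrieved and the local features of each image to be selected in the gallery;
  • verifying the screened-out images to be selected according to the local features of the image to be retrieved and of the screened-out images to be selected, to obtain the image to be selected that serves as the retrieval result.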
  • An image retrieval device comprising: an acquisition module, a screening module and a verification module;
  • the acquisition module is used to acquire local features of the image to be retrieved
  • the screening module is configured to screen out the image to be selected that matches the image to be retrieved from the gallery according to the local features of the image to be retrieved and the local features of each image to be selected in the gallery;
  • the verification module is used for verifying the selected images to be selected according to the images to be retrieved and the local features of the selected images to be selected to obtain the images to be selected as the retrieval results.
  • An electronic device comprising at least one processor and a memory, wherein:
  • the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
  • a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described above.
  • a computer program product comprising a computer program which, when executed by a processor, implements the method as described above.
  • An embodiment in the above disclosure has the following advantages or beneficial effects: image retrieval can be realized based on the local features of the image, which has good applicability to various scenarios, thereby improving the retrieval effect, that is, improving the accuracy of the retrieval results. Moreover, a two-level retrieval mechanism of screening and verification based on local features is adopted, which further improves the accuracy of the retrieval results.
  • FIG. 1 is a flowchart of the first embodiment of the image retrieval method described in the present disclosure
  • FIG. 2 is a flowchart of a second embodiment of the image retrieval method described in the present disclosure
  • FIG. 3 is a schematic diagram of the composition and structure of the embodiment 300 of the image retrieval apparatus according to the disclosure.
  • FIG. 4 shows a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure.
  • FIG. 1 is a flowchart of the first embodiment of the image retrieval method described in the present disclosure. As shown in Figure 1, the following specific implementations are included.
  • In step 101, local features of the image to be retrieved are acquired.
  • In step 102, according to the local features of the image to be retrieved and the local features of each image to be selected in the gallery, the images to be selected that match the image to be retrieved are screened out from the gallery.
  • In step 103, according to the local features of the image to be retrieved and of the screened-out images to be selected, the screened-out images to be selected are verified, and the image to be selected that serves as the retrieval result is obtained.
  • the image retrieval can be realized based on the local features of the image, which has good applicability to various scenarios, thereby improving the retrieval effect, that is, improving the accuracy of the retrieval results.
  • a two-level retrieval mechanism based on local features of screening and verification is adopted, which further improves the accuracy of retrieval results.
  • For each candidate image in the gallery, its local features can be acquired in advance and stored in a certain format.
  • Specifically, feature point extraction can be performed on each candidate image, and the local feature corresponding to each extracted feature point can be obtained, such as a P-bit local feature, where P is a positive integer greater than one whose specific value can be determined according to actual needs, such as 32.
  • How to extract the feature points of the image is not limited, for example, various existing feature point extraction methods can be used. For two different images, the number of feature points extracted from them may be the same or different.
  • The local features in the method described in the present disclosure can be quantized features; as described above, a P-bit local feature can be obtained for each feature point.
  • The local features can be deep features obtained based on a convolutional neural network (CNN); for example, they can be obtained by means of a pre-trained CNN model.
  • features can be divided into traditional features and deep features.
  • Traditional features can include Scale-Invariant Feature Transform (SIFT) features, etc.
  • Deep features are mainly high-dimensional features obtained by a CNN. Compared with traditional features, deep features contain more semantic information. Accordingly, using deep features as the local features in the method of the present disclosure can improve the accuracy of subsequent retrieval results.
  • the image to be retrieved can also be processed in the above manner, that is, feature points of the image to be retrieved can be extracted, and local features corresponding to the extracted feature points can be obtained separately, such as local features of P bits.
  • an inverted index between each feature point and the image to be selected including the feature point can also be established, that is, the corresponding relationship between each feature point and the image to be selected including the feature point can be established.
  • In this way, given a feature point, the candidate images that include this feature point can be located.
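The inverted index described above can be pictured as a mapping from a feature point to the images to be selected that contain it. The following is a minimal sketch, not the disclosed implementation: the function name `build_inverted_index`, the image identifiers, and the choice to key the index by the integer value of each local feature are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(gallery):
    """Map each local feature to the set of image ids that contain it.

    `gallery` is a hypothetical dict: image id -> list of local features,
    each local feature stored as a non-negative Python int.
    """
    index = defaultdict(set)
    for image_id, features in gallery.items():
        for feature in features:
            index[feature].add(image_id)
    return index


# Toy gallery with 4-bit features (real features would be P bits, e.g. 32).
gallery = {
    "img_1": [0b1010, 0b0111],
    "img_2": [0b1010, 0b0001],
}
index = build_inverted_index(gallery)
print(sorted(index[0b1010]))  # images that contain feature 0b1010
```

Looking up a feature in the index returns the images containing it directly, without scanning every image in the gallery.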
  • the image to be selected that matches the image to be retrieved can be screened out from the gallery.
  • For any feature point in the image to be retrieved, the following processing can be performed: the feature point is taken as the feature point to be processed; according to the local feature corresponding to the feature point to be processed and the local features corresponding to the feature points in the inverted index, the feature points that are the same feature point as the feature point to be processed are screened out from the feature points in the inverted index; and the images to be selected corresponding to the screened-out feature points are taken as the images to be selected that match the image to be retrieved.
  • For any feature point in the inverted index, the content of the 1st to M-th bits in the local feature corresponding to the feature point can be compared with the content of the 1st to M-th bits in the local feature corresponding to the feature point to be processed, and whether the feature point and the feature point to be processed are the same feature point is determined according to the comparison result.
  • M is a positive integer greater than one and less than P, and its specific value can be determined according to actual needs.
  • If the Hamming distance between the content of the 1st to M-th bits in the local feature corresponding to the feature point and the content of the 1st to M-th bits in the local feature corresponding to the feature point to be processed is smaller than a predetermined threshold, it can be determined that the feature point and the feature point to be processed are the same feature point.
  • The specific value of the above threshold can be determined according to actual needs. For example, if the threshold is 1, a Hamming distance smaller than the threshold means that the content of the 1st to M-th bits in the local feature corresponding to the feature point is identical to the content of the 1st to M-th bits in the local feature corresponding to the feature point to be processed; if the threshold is 2, at most one bit may differ between the two.
  • For example, the threshold can take the value 1, that is, if the content of the 1st to M-th bits in the local feature corresponding to the feature point is the same as the content of the 1st to M-th bits in the local feature corresponding to the feature point to be processed, it can be determined that the feature point and the feature point to be processed are the same feature point.
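The comparison of the 1st to M-th bits can be illustrated with a short sketch. It assumes that each P-bit local feature is stored as an integer and that bit 1 is the most significant bit; the disclosure does not fix a bit order, and the function names below are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ (XOR, then count ones)."""
    return bin(a ^ b).count("1")


def first_bits(feature: int, p: int, m: int) -> int:
    """Bits 1..M of a P-bit feature, assuming bit 1 is the most significant."""
    return feature >> (p - m)


def is_same_feature_point(f1: int, f2: int, p: int, m: int, threshold: int = 1) -> bool:
    """True if the first M bits differ in fewer than `threshold` positions."""
    return hamming_distance(first_bits(f1, p, m), first_bits(f2, p, m)) < threshold


# Toy example with P = 8 and M = 4.
a = 0b10110001
b = 0b10111110  # same first 4 bits as a
c = 0b10010001  # first 4 bits differ from a in exactly one position
print(is_same_feature_point(a, b, p=8, m=4))               # True
print(is_same_feature_point(a, c, p=8, m=4))               # False
print(is_same_feature_point(a, c, p=8, m=4, threshold=2))  # True
```

With a threshold of 1 the test reduces to equality of the two M-bit prefixes; a threshold of 2 tolerates a single differing bit.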
  • For example, assume that the image to be retrieved includes 3 feature points, namely feature point a, feature point b and feature point c, and assume that there are 50 feature points for which the inverted index is established (in practice the number may be much larger; this is just an example), namely feature point 1 to feature point 50.
  • For feature point a, the feature points that are the same feature point as feature point a can be screened out from feature point 1 to feature point 50.
  • For example, the content of the 1st to M-th bits in the local feature corresponding to feature point 1 can be compared with the content of the 1st to M-th bits in the local feature corresponding to feature point a. If the two are the same, it can be determined that feature point 1 and feature point a are the same feature point; otherwise, they are different feature points. In other words, the feature-to-feature comparison can be performed with an exclusive OR (XOR) operation.
  • For feature point b, the feature points that are the same feature point as feature point b can be screened out from feature point 1 to feature point 50.
  • the image to be selected corresponding to the feature point 10 can be used as the selected image that matches the image to be retrieved. It is assumed that the number of images to be selected this time is 2.
  • For feature point c, the feature points that are the same feature point as feature point c can be screened out from feature point 1 to feature point 50.
  • the image to be selected corresponding to the feature point 20 can be used as the selected image that matches the image to be retrieved. It is assumed that the number of images to be selected this time is 4.
  • In the above processing, each feature point in the image to be retrieved can be used to recall images to be selected, thereby improving the coverage of the recall.
  • In addition, the inverted index can be used to carry out the recall of the images to be selected, which improves the recall efficiency.
  • Furthermore, the recall of candidate images can be performed by comparing only part of the content of the local features, which improves the comparison speed and further improves the recall efficiency.
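The recall stage can be sketched as follows for the case where the threshold is 1, so that two feature points match exactly when their first M bits are equal. Keying the index by the M-bit prefix is an implementation choice made here for illustration, not something the disclosure specifies; the names `prefix`, `build_prefix_index` and `recall`, the toy sizes P = 8 and M = 4, and the bit order (bit 1 as the most significant bit) are likewise assumptions.

```python
from collections import defaultdict

P, M = 8, 4  # toy sizes; the text mentions P = 32 as one possible value


def prefix(feature: int) -> int:
    """First M bits of a P-bit feature (bit 1 assumed most significant)."""
    return feature >> (P - M)


def build_prefix_index(gallery):
    """Inverted index: M-bit prefix -> ids of images containing such a feature."""
    index = defaultdict(set)
    for image_id, features in gallery.items():
        for feature in features:
            index[prefix(feature)].add(image_id)
    return index


def recall(query_features, index):
    """Union of the images recalled by every feature point of the query image."""
    recalled = set()
    for feature in query_features:
        recalled |= index.get(prefix(feature), set())
    return recalled


gallery = {
    "A": [0b10110000],
    "B": [0b10111111, 0b00010000],
    "C": [0b01100000],
}
index = build_prefix_index(gallery)
print(sorted(recall([0b10110101, 0b00011111], index)))  # ['A', 'B']
```

Because every query feature point contributes its own lookups, an image is recalled as soon as any one of its feature points matches, which is what gives the recall its coverage.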
  • After the images to be selected that match the image to be retrieved are screened out from the gallery, the screened-out images to be selected can be verified according to the local features of the image to be retrieved and of the screened-out images to be selected, to obtain the image to be selected that serves as the retrieval result.
  • For any screened-out image to be selected, the following processing may be performed: according to the local features corresponding to the feature points in the image to be selected and the local features corresponding to the feature points in the image to be retrieved, the feature points that meet a predetermined requirement are screened out from the feature points in the image to be selected, where meeting the predetermined requirement includes being the same feature point as a feature point in the image to be retrieved; the number of screened-out feature points is taken as the score of the image to be selected.
  • The screened-out images to be selected can then be sorted in descending order of score, and the images to be selected in the top Q positions after sorting are used as the retrieval results, where Q is a positive integer smaller than the number of screened-out images to be selected.
  • the specific value of Q can be determined according to actual needs, for example, it can be 1 or greater than one.
  • When the feature points that meet the predetermined requirement are screened out from the feature points in the image to be selected, for any feature point in the image to be selected, the content of the N-th to P-th bits in the local feature corresponding to the feature point can be compared with the content of the N-th to P-th bits in the local feature corresponding to each feature point in the image to be retrieved, and whether the feature point meets the predetermined requirement is determined according to the comparison results, where N is a positive integer greater than one and less than P whose specific value can be determined according to actual needs.
  • If the Hamming distance between the content of the N-th to P-th bits in the local feature corresponding to the feature point and the content of the N-th to P-th bits in the local feature corresponding to any feature point in the image to be retrieved is less than a predetermined threshold, it can be determined that the feature point meets the predetermined requirement.
  • the specific value of the threshold may be determined according to actual needs.
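The verification test on the N-th to P-th bits can be sketched in the same spirit. As before, this is only an illustration under stated assumptions: features are stored as integers, bit 1 is the most significant bit (so bits N to P are the lowest P - N + 1 bits), and the function names are hypothetical.

```python
def last_bits(feature: int, p: int, n: int) -> int:
    """Bits N..P of a P-bit feature, i.e. its lowest P - N + 1 bits."""
    width = p - n + 1
    return feature & ((1 << width) - 1)


def meets_requirement(candidate_feature, query_features, p, n, threshold):
    """True if bits N..P of the candidate feature are within `threshold`
    (exclusive) Hamming distance of bits N..P of any query feature."""
    c = last_bits(candidate_feature, p, n)
    return any(
        bin(c ^ last_bits(q, p, n)).count("1") < threshold
        for q in query_features
    )


# Toy example with P = 8 and N = 5, so the last 4 bits are compared.
query = [0b00000100, 0b00001010]
print(meets_requirement(0b11110101, query, p=8, n=5, threshold=2))  # True
print(meets_requirement(0b11110101, query, p=8, n=5, threshold=1))  # False
```

The first M bits used for recall and the N-th to P-th bits used for verification may overlap, as they do when P = 32 and both stages use 24 bits.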
  • For example, assume that the image to be retrieved includes 3 feature points, namely feature point a, feature point b and feature point c, and suppose that 9 candidate images matching the image to be retrieved are screened out from the gallery, namely image to be selected 1 to image to be selected 9.
  • For image to be selected 1, it is assumed that it also includes three feature points, namely feature point 1, feature point 2, and feature point 3.
  • For feature point 1, the Hamming distances between the content of the N-th to P-th bits in the local feature corresponding to feature point 1 and the content of the N-th to P-th bits in the local features corresponding to feature point a, feature point b and feature point c can be calculated respectively. Assuming that the Hamming distance for feature point c is less than the predetermined threshold, feature point 1 can be considered to meet the predetermined requirement, that is, feature point 1 and feature point c are the same feature point.
  • feature point 2 and feature point 3 can be processed separately, assuming that feature point 2 also meets the predetermined requirements and is the same feature point as feature point a, but feature point 3 does not meet the predetermined requirements. Then, it can be determined that the number of feature points selected from the feature points in the image 1 to be selected that meet the predetermined requirements is 2, and accordingly, the score of the image 1 to be selected can be 2.
  • In the same way, the scores of image to be selected 2 to image to be selected 9 can be obtained respectively, image to be selected 1 to image to be selected 9 can be sorted in descending order of score, and the image to be selected ranked first can then be taken as the desired retrieval result.
  • In the above processing, the screened-out images to be selected are further verified based on the local features, thereby further improving the accuracy of the retrieval results. Moreover, the verification compares only part of the content of the local features, which improves the comparison speed and further improves the retrieval efficiency.
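The scoring and ranking step can be sketched as a sort over per-image scores followed by a cut at the top Q positions. The function name `top_q` and all scores below are hypothetical, except that a score of 2 for image to be selected 1 follows the worked example above.

```python
def top_q(scores, q=1):
    """Return the ids of the Q highest-scoring images, best first.

    `scores` maps an image id to the number of its feature points that
    met the predetermined requirement.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [image_id for image_id, _ in ranked[:q]]


# Hypothetical scores for three of the screened-out images.
scores = {"image_1": 2, "image_2": 0, "image_3": 3}
print(top_q(scores))       # ['image_3']
print(top_q(scores, q=2))  # ['image_3', 'image_1']
```

Python's `sorted` is stable, so images with equal scores keep their original relative order; the disclosure does not say how ties should be broken.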
  • FIG. 2 is a flowchart of a second embodiment of the image retrieval method described in the present disclosure. As shown in FIG. 2 , the following specific implementations are included.
  • In step 201, feature points are extracted from each candidate image in the gallery, and the 32-bit local features corresponding to the extracted feature points are obtained respectively.
  • The local features may be deep features obtained based on a convolutional neural network.
  • In step 202, for the gallery, an inverted index between each feature point and the candidate images including the feature point is established.
  • In step 203, feature point extraction is performed on the image to be retrieved, and the 32-bit local features corresponding to the extracted feature points are obtained respectively.
  • In step 204, for each feature point in the image to be retrieved, the processing shown in steps 205 to 206 is performed respectively.
  • In step 205, the feature point is taken as the feature point to be processed, and for each feature point in the inverted index, the content of the first 24 bits (i.e. the 1st to 24th bits) in the local feature corresponding to the feature point is compared with the content of the first 24 bits in the local feature corresponding to the feature point to be processed. If the content of the first 24 bits in the local feature corresponding to any feature point is the same as the content of the first 24 bits in the local feature corresponding to the feature point to be processed, the feature point is taken as a screened-out feature point that is the same feature point as the feature point to be processed.
  • In step 206, the images to be selected corresponding to the screened-out feature points are taken as the screened-out images to be selected that match the image to be retrieved.
  • In step 207, the processing shown in steps 208 to 209 is performed respectively for each screened-out image to be selected.
  • In step 208, for each feature point in the image to be selected, the content of the last 24 bits in the local feature corresponding to the feature point is compared with the content of the last 24 bits in the local feature corresponding to each feature point in the image to be retrieved. If the Hamming distance between the content of the last 24 bits in the local feature corresponding to the feature point and the content of the last 24 bits in the local feature corresponding to any feature point in the image to be retrieved is less than the predetermined threshold, it is determined that the feature point meets the predetermined requirement.
  • In step 209, the number of feature points that meet the predetermined requirement is taken as the score of the image to be selected.
  • In step 210, the screened-out images to be selected are sorted in descending order of score, and the image to be selected in the first position after sorting is used as the retrieval result.
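The second embodiment can be condensed into one small sketch: 32-bit features, exact matching on the first 24 bits for recall, and a Hamming-distance test on the last 24 bits for verification. Feature extraction (steps 201 and 203) is outside the sketch, so features are passed in as integers. The function `retrieve`, the threshold value of 2, the image identifiers, the bit order (bit 1 as the most significant bit) and the tie-breaking rule are assumptions made for illustration.

```python
from collections import defaultdict

P = 32
FIRST = 24                # bits 1..24 are used for recall
LAST_MASK = (1 << 24) - 1  # bits 9..32 (the last 24 bits) are used for verification


def first24(feature: int) -> int:
    return feature >> (P - FIRST)


def last24(feature: int) -> int:
    return feature & LAST_MASK


def retrieve(query_features, gallery, threshold=2):
    """Return the id of the best-scoring gallery image, or None if none is recalled."""
    # Step 202: inverted index keyed by the first 24 bits of each feature.
    index = defaultdict(set)
    for image_id, features in gallery.items():
        for f in features:
            index[first24(f)].add(image_id)

    # Steps 204-206: recall every image sharing a 24-bit prefix with the query.
    recalled = set()
    for qf in query_features:
        recalled |= index.get(first24(qf), set())

    # Steps 207-209: score = number of feature points whose last 24 bits are
    # within the threshold of the last 24 bits of some query feature point.
    scores = {}
    for image_id in recalled:
        scores[image_id] = sum(
            1
            for f in gallery[image_id]
            if any(bin(last24(f) ^ last24(qf)).count("1") < threshold
                   for qf in query_features)
        )

    # Step 210: highest score wins; ties go to the smallest id (an assumption).
    if not scores:
        return None
    return max(sorted(scores), key=scores.get)


gallery = {
    "full_1": [0x12345678, 0xABCDEF03, 0x00000000],
    "full_2": [0x123456FF, 0x99999999],
    "full_3": [0x77777777],
}
print(retrieve([0x12345678, 0xABCDEF01], gallery))  # full_1
```

In this toy run `full_1` and `full_2` are both recalled through the shared prefix `0x123456`, but only `full_1` has feature points that pass verification, so it is ranked first.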
  • FIG. 3 is a schematic diagram of the composition and structure of an embodiment 300 of an image retrieval apparatus according to the present disclosure. As shown in FIG. 3 , it includes an acquisition module 301 , a screening module 302 and a verification module 303 .
  • the acquiring module 301 is used for acquiring local features of the image to be retrieved.
  • the screening module 302 is configured to screen out the images to be selected that match the image to be retrieved from the gallery according to the local features of the image to be retrieved and the local features of the images to be selected in the gallery.
  • the verification module 303 is configured to verify the selected images to be selected according to the images to be retrieved and the local features of the selected images to be selected, and obtain the images to be selected as the retrieval result.
  • the obtaining module 301 may extract feature points from the image to be retrieved, and obtain local features corresponding to the extracted feature points, such as P-bit local features, where P is a positive integer greater than one.
  • the acquisition module 301 may further perform feature point extraction on each candidate image in the gallery in advance, and separately acquire local features corresponding to the extracted feature points, such as local features of P bits.
  • the local features may be deep features obtained based on a convolutional neural network.
  • the acquisition module 301 may further establish an inverted index between each feature point and the image to be selected including the feature point for the gallery.
  • When screening out the images to be selected that match the image to be retrieved from the gallery according to the local features of the image to be retrieved and the local features of the images to be selected, the screening module 302 may perform the following processing for any feature point in the image to be retrieved: take the feature point as the feature point to be processed; according to the local feature corresponding to the feature point to be processed and the local features corresponding to the feature points in the inverted index, screen out from the feature points in the inverted index the feature points that are the same feature point as the feature point to be processed; and take the images to be selected corresponding to the screened-out feature points as the screened-out images to be selected that match the image to be retrieved.
  • the screening module 302 may, for any feature point in the inverted index, compare the content of the 1st to M-th bits in the local feature corresponding to the feature point with the content of the 1st to M-th bits in the local feature corresponding to the feature point to be processed, and determine whether the feature point and the feature point to be processed are the same feature point according to the comparison result, where M is a positive integer greater than one and less than P.
  • When the screening module 302 determines that the Hamming distance between the content of the 1st to M-th bits in the local feature corresponding to the feature point and the content of the 1st to M-th bits in the local feature corresponding to the feature point to be processed is smaller than the predetermined threshold, it can determine that the feature point and the feature point to be processed are the same feature point.
  • the verification module 303 can verify the selected images to be selected according to the images to be retrieved and the local features of the selected images to be selected, and obtain the images to be selected as the retrieval results.
  • the verification module 303 may perform the following processing for any screened-out image to be selected: according to the local features corresponding to the feature points in the image to be selected and the local features corresponding to the feature points in the image to be retrieved, screen out the feature points that meet a predetermined requirement from the feature points in the image to be selected, where meeting the predetermined requirement includes being the same feature point as a feature point in the image to be retrieved; and take the number of screened-out feature points as the score of the image to be selected.
  • the verification module 303 can sort the screened-out images to be selected in descending order of score, and use the images in the top Q positions after sorting as the retrieval result, where Q is a positive integer smaller than the number of screened-out images to be selected.
  • the verification module 303 may, for any feature point in the image to be selected, compare the content of the N-th to P-th bits in the local feature corresponding to the feature point with the content of the N-th to P-th bits in the local feature corresponding to each feature point in the image to be retrieved, and determine whether the feature point meets the predetermined requirement according to the comparison results, where N is a positive integer greater than one and less than P.
  • When the verification module 303 determines that the Hamming distance between the content of the N-th to P-th bits in the local feature corresponding to the feature point and the content of the N-th to P-th bits in the local feature corresponding to any feature point in the image to be retrieved is smaller than the predetermined threshold, it can determine that the feature point meets the predetermined requirement.
  • the image retrieval can be realized based on the local features of the images, which has good applicability to various scenarios, thereby improving the retrieval effect, that is, the accuracy of the retrieval results.
  • a two-level retrieval mechanism based on local feature screening and verification is adopted, which further improves the accuracy of retrieval results.
  • solutions described in the present disclosure can be applied to the field of artificial intelligence, especially to the fields of computer vision and deep learning, and can be applied to image retrieval scenarios.
  • AI hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing and other technologies; AI software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology and other major directions.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 4 shows a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure.
  • Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the device 400 includes a computing unit 401 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 402 or loaded from a storage unit 408 into a random access memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the device 400 can also be stored.
  • the computing unit 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An input/output (I/O) interface 405 is also connected to bus 404.
  • Various components in the device 400 are connected to the I/O interface 405, including: an input unit 406, such as a keyboard, mouse, etc.; an output unit 407, such as various types of displays, speakers, etc.; a storage unit 408, such as a magnetic disk, an optical disk, etc.; and a communication unit 409, such as a network card, a modem, a wireless communication transceiver, and the like.
  • the communication unit 409 allows the device 400 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • Computing unit 401 may be various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various specialized artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc.
  • the computing unit 401 performs the various methods and processes described above, such as the methods described in this disclosure.
  • the methods described in this disclosure may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 408.
  • Part or all of the computer program may be loaded and/or installed on the device 400 via the ROM 402 and/or the communication unit 409.
  • When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the methods described in this disclosure may be performed.
  • Alternatively, the computing unit 401 may be configured by any other suitable means (e.g., by means of firmware) to perform the methods described in this disclosure.
  • Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor.
  • The processor, which may be a special-purpose or general-purpose programmable processor, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
  • The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user may interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
  • a computer system can include clients and servers. Clients and servers are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • The server can be a cloud server, also known as a cloud computing server or cloud host. It is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and virtual private servers (VPS).
  • The server can also be a server of a distributed system, or a server combined with a blockchain. Cloud computing refers to a technical system in which elastically scalable pools of shared physical or virtual resources are accessed through a network.
  • Resources can include servers, operating systems, networks, software, applications, storage devices, and the like, and can be deployed and managed in an on-demand, self-service manner.
  • Cloud computing technology can provide efficient and powerful data processing capabilities for technical applications such as artificial intelligence and blockchain, and for model training.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

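An image retrieval method, apparatus, electronic device, and computer-readable storage medium, relating to artificial intelligence fields such as computer vision and deep learning. The method includes: acquiring local features of an image to be retrieved (101); screening out, from an image library, candidate images that match the image to be retrieved according to the local features of the image to be retrieved and the local features of each candidate image in the image library (102); and verifying the screened-out candidate images according to the local features of the image to be retrieved and of the screened-out candidate images, to obtain the candidate image serving as the retrieval result (103). The method can improve the accuracy of retrieval results.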

Description

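Image retrieval method, apparatus, electronic device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 202110468934.8, filed on April 28, 2021 and entitled "Image retrieval method, apparatus, electronic device, and computer-readable storage medium".
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular to an image retrieval method, apparatus, electronic device, and computer-readable storage medium in fields such as computer vision and deep learning.
Background
At present, image retrieval technology has been applied to many aspects of daily life, such as product retrieval and landmark retrieval.
The following implementation may be adopted: global feature extraction is performed on each image in the image library (for ease of description, referred to as a candidate image), where a global feature usually refers to a single descriptor extracted for the entire image; global feature extraction is likewise performed on the image to be retrieved, the global feature of the image to be retrieved is compared with the global feature of each candidate image, and the retrieval result is determined according to the comparison results.
The above approach is simple and convenient to implement, but it has certain limitations. For example, when the image to be retrieved is a part of an image while the candidate images in the image library are complete images, retrieval performed in the above manner usually yields poor results.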
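Summary
The present disclosure provides an image retrieval method, apparatus, electronic device, and computer-readable storage medium.
An image retrieval method, including:
acquiring local features of an image to be retrieved;
screening out, from an image library, candidate images that match the image to be retrieved according to the local features of the image to be retrieved and the local features of each candidate image in the image library;
verifying the screened-out candidate images according to the local features of the image to be retrieved and of the screened-out candidate images, to obtain the candidate image serving as the retrieval result.
An image retrieval apparatus, including: an acquisition module, a screening module, and a verification module;
the acquisition module is configured to acquire local features of an image to be retrieved;
the screening module is configured to screen out, from an image library, candidate images that match the image to be retrieved according to the local features of the image to be retrieved and the local features of each candidate image in the image library;
the verification module is configured to verify the screened-out candidate images according to the local features of the image to be retrieved and of the screened-out candidate images, to obtain the candidate image serving as the retrieval result.
An electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described above.
A non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the method described above.
A computer program product, including a computer program that, when executed by a processor, implements the method described above.
An embodiment of the above disclosure has the following advantages or beneficial effects: image retrieval can be implemented based on local features of images, which adapts well to various scenarios and thereby improves the retrieval effect, i.e., the accuracy of retrieval results; moreover, a two-stage retrieval mechanism of screening and verification based on local features is adopted, which further improves the accuracy of retrieval results.
It should be understood that the content described in this section is not intended to identify key or important features of embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.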
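Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the solution and do not constitute a limitation of the present disclosure. In the drawings:
FIG. 1 is a flowchart of a first embodiment of the image retrieval method of the present disclosure;
FIG. 2 is a flowchart of a second embodiment of the image retrieval method of the present disclosure;
FIG. 3 is a schematic structural diagram of an embodiment 300 of the image retrieval apparatus of the present disclosure;
FIG. 4 shows a schematic block diagram of an example electronic device 400 that can be used to implement embodiments of the present disclosure.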
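Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, which should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
FIG. 1 is a flowchart of the first embodiment of the image retrieval method of the present disclosure. As shown in FIG. 1, it includes the following specific implementation.
In step 101, local features of the image to be retrieved are acquired.
In step 102, candidate images that match the image to be retrieved are screened out from the image library according to the local features of the image to be retrieved and the local features of each candidate image in the image library.
In step 103, the screened-out candidate images are verified according to the local features of the image to be retrieved and of the screened-out candidate images, to obtain the candidate image serving as the retrieval result.
It can be seen that, in the solution of the above method embodiment, image retrieval can be implemented based on local features of images, which adapts well to various scenarios and thereby improves the retrieval effect, i.e., the accuracy of retrieval results; moreover, a two-stage retrieval mechanism of screening and verification based on local features is adopted, which further improves the accuracy of retrieval results.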
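For each candidate image in the image library, its local features may be acquired in advance and stored in a certain format. For example, for each candidate image, feature point extraction may be performed on the candidate image, and the local feature corresponding to each extracted feature point may be acquired, such as a P-bit local feature, where P is a positive integer greater than one whose specific value may be determined according to actual needs, for example 32.
How feature points are extracted from an image is not limited; for example, any existing feature point extraction method may be used. For two different images, the numbers of feature points extracted from them may be the same or different.
For each extracted feature point, its corresponding local feature, i.e., its corresponding descriptor, may be acquired. The descriptor usually has a fixed dimension. The local features in the method of the present disclosure may be quantized features, such as the P-bit local features mentioned above.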
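As a rough illustration of what a P-bit quantized local feature might look like, the following sketch packs a real-valued descriptor into an integer by keeping one sign bit per dimension. The disclosure does not specify a quantization scheme, so sign binarization, the function name `binarize`, and the convention that bit 1 is the most significant bit are all assumptions made here for illustration.

```python
def binarize(descriptor):
    """Pack a descriptor into an integer with one bit per dimension.

    Assumption: a dimension maps to 1 when its value is positive, else 0,
    and the first dimension becomes bit 1 (the most significant bit).
    """
    code = 0
    for value in descriptor:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


# A 4-dimensional descriptor yields a 4-bit code; a 32-dimensional one
# would yield a 32-bit feature like the P = 32 example in the text.
code = binarize([0.5, -1.0, 2.0, 0.0])
```

Storing each feature as a plain integer keeps the later bitwise comparisons (shift, mask, XOR) cheap.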
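How the local feature corresponding to each feature point is acquired is likewise not limited. For example, the local feature may be a deep feature acquired based on a convolutional neural network (CNN), i.e., the local feature may be acquired by means of a pre-trained convolutional neural network model.
From a chronological perspective, features can be divided into traditional features and deep features. Traditional features include the scale-invariant feature transform (SIFT) and the like, while deep features are mainly high-dimensional features acquired based on a CNN. Compared with traditional features, deep features contain more semantic information; accordingly, using deep features as the local features in the method of the present disclosure can improve the accuracy of subsequent retrieval results.
The image to be retrieved may be processed in the same manner, i.e., feature point extraction may be performed on the image to be retrieved, and the local feature corresponding to each extracted feature point, such as a P-bit local feature, may be acquired.
For the image library, an inverted index from each feature point to the candidate images that include the feature point may also be established, i.e., a correspondence between each feature point and the candidate images that include it may be established, so that the candidate images including a given feature point can be located through that feature point.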
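The inverted index described above can be sketched as a mapping from a local feature to the set of candidate images that contain it. This is a minimal sketch under assumptions: local features are represented as integers, the image library is a plain dictionary, and the names `build_inverted_index` and `library` are hypothetical rather than taken from the disclosure.

```python
from collections import defaultdict


def build_inverted_index(library):
    """Map each local feature to the ids of the candidate images containing it.

    library: dict of image id -> list of local features (ints); an assumed layout.
    """
    index = defaultdict(set)
    for image_id, codes in library.items():
        for code in codes:
            index[code].add(image_id)
    return index


library = {
    "img1": [0xA1B2C3D4, 0x11223344],
    "img2": [0xA1B2C3D4],
}
index = build_inverted_index(library)
```

Given a feature, `index[feature]` then returns every candidate image that includes it without scanning the whole library.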
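Accordingly, in combination with the established inverted index, candidate images that match the image to be retrieved may be screened out from the image library according to the local features of the image to be retrieved and of the candidate images.
Specifically, the following processing may be performed for any feature point in the image to be retrieved: taking the feature point as the feature point to be processed; screening out, from the feature points in the inverted index, the feature points that are the same feature point as the feature point to be processed, according to the local feature corresponding to the feature point to be processed and the local features corresponding to the feature points in the inverted index; and taking the candidate images corresponding to the screened-out feature points as candidate images that match the image to be retrieved.
For any feature point in the inverted index, bits 1 to M of the local feature corresponding to that feature point may be compared with bits 1 to M of the local feature corresponding to the feature point to be processed, and whether the feature point and the feature point to be processed are the same feature point is determined according to the comparison result. M is a positive integer greater than one and less than P, and its specific value may be determined according to actual needs.
For example, if the Hamming distance between bits 1 to M of the local feature corresponding to the feature point and bits 1 to M of the local feature corresponding to the feature point to be processed is less than a predetermined threshold, the feature point and the feature point to be processed may be determined to be the same feature point.
The specific value of the threshold may be determined according to actual needs. For example, if the threshold is 1, then a Hamming distance less than the threshold means that bits 1 to M of the local feature corresponding to the feature point are identical to bits 1 to M of the local feature corresponding to the feature point to be processed; if the threshold is 2, it means that at most one bit may differ between the two.
Preferably, the threshold may be 1, i.e., if bits 1 to M of the local feature corresponding to the feature point are identical to bits 1 to M of the local feature corresponding to the feature point to be processed, the feature point and the feature point to be processed may be determined to be the same feature point.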
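The comparison of bits 1 to M under a Hamming-distance threshold can be sketched as follows. The sketch assumes that each local feature is stored as a P-bit integer and that bit 1 is the most significant bit; the function names are hypothetical.

```python
def first_bits(code, m, p=32):
    """Return bits 1..m of a p-bit code (bit 1 assumed most significant)."""
    return code >> (p - m)


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def same_feature_point(code_a, code_b, m, threshold=1, p=32):
    """True when the Hamming distance over bits 1..m is less than threshold."""
    return hamming(first_bits(code_a, m, p), first_bits(code_b, m, p)) < threshold


# With threshold 1 the first 24 bits must be identical;
# with threshold 2 one differing bit is tolerated.
exact = same_feature_point(0xABCDEF12, 0xABCDEF99, m=24)
strict = same_feature_point(0xABCDEF12, 0xABCDEE12, m=24)
loose = same_feature_point(0xABCDEF12, 0xABCDEE12, m=24, threshold=2)
```

Because the test is "less than", a threshold of 1 reduces to an equality check on the M-bit prefix, which is the preferred setting described above.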
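The above process can be illustrated with the following example:
Assume that the image to be retrieved includes 3 feature points, namely feature point a, feature point b, and feature point c, and assume that the inverted index is built over 50 feature points (in practice there may be far more; this number is for illustration only), namely feature point 1 to feature point 50.
For feature point a, the feature points that are the same feature point as feature point a may be screened out from feature point 1 to feature point 50. For example, for feature point 1, bits 1 to M of the local feature corresponding to feature point 1 may be compared with bits 1 to M of the local feature corresponding to feature point a. If they are identical, feature point 1 and feature point a may be determined to be the same feature point; otherwise, they are different feature points. In other words, the feature-to-feature comparison may use an XOR operation, and two feature points are regarded as the same feature point only when bits 1 to M of their corresponding local features are completely identical (not a single bit differs). The feature points other than feature point 1 may be processed in the same manner. Assuming that feature point 5 and feature point a are the same feature point, the candidate images corresponding to feature point 5 may be taken as screened-out candidate images that match the image to be retrieved; assume that 3 candidate images are screened out this time.
In the same manner as for feature point a, for feature point b, the feature points that are the same feature point as feature point b may be screened out from feature point 1 to feature point 50. Assuming that feature point 10 and feature point b are the same feature point, the candidate images corresponding to feature point 10 may be taken as screened-out candidate images that match the image to be retrieved; assume that 2 candidate images are screened out this time.
In the same manner as for feature point a, for feature point c, the feature points that are the same feature point as feature point c may be screened out from feature point 1 to feature point 50. Assuming that feature point 20 and feature point c are the same feature point, the candidate images corresponding to feature point 20 may be taken as screened-out candidate images that match the image to be retrieved; assume that 4 candidate images are screened out this time.
Through the above processing, a total of 9 (3+2+4) candidate images that match the image to be retrieved can be screened out from the image library for the image to be retrieved.
It can be seen that, in the above processing, each feature point in the image to be retrieved may be used to recall candidate images, which improves recall coverage; moreover, candidate images may be recalled by means of the inverted index, which improves recall efficiency; furthermore, candidate images may be recalled by comparing only part of the content of the local features, which increases the comparison speed and further improves recall efficiency.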
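The recall step walked through above can be sketched as a loop over the feature points of the image to be retrieved, each compared against every feature point in the inverted index. This is an illustrative sketch only: the index layout (integer feature to set of image ids), the most-significant-bit-first convention, and the name `recall_candidates` are assumptions.

```python
def recall_candidates(query_codes, index, m, threshold=1, p=32):
    """Collect candidate images whose indexed features match any query feature.

    Two features match when the Hamming distance over bits 1..m
    (bit 1 assumed most significant) is less than threshold.
    """
    shift = p - m
    recalled = set()
    for q in query_codes:
        for code, image_ids in index.items():
            if bin((q >> shift) ^ (code >> shift)).count("1") < threshold:
                recalled |= image_ids
    return recalled


index = {
    0xAAAAAA01: {"img1", "img2"},
    0xBBBBBB02: {"img3"},
    0xCCCCCC03: {"img4"},
}
recalled = recall_candidates([0xAAAAAAFF, 0xBBBBBB00], index, m=24)
```

The result is the union of the images recalled by each query feature point, so an image recalled by several feature points appears only once in the set.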
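After candidate images that match the image to be retrieved are screened out from the image library, the screened-out candidate images may be verified according to the local features of the image to be retrieved and of the screened-out candidate images, to obtain the candidate image serving as the retrieval result.
Specifically, the following processing may be performed for any screened-out candidate image: screening out, from the feature points in the candidate image, the feature points that meet a predetermined requirement, according to the local features corresponding to the feature points in the candidate image and the local features corresponding to the feature points in the image to be retrieved, where meeting the predetermined requirement includes being the same feature point as one feature point in the image to be retrieved; and taking the number of screened-out feature points as the score of the candidate image.
Further, the screened-out candidate images may be sorted in descending order of score, and the candidate images in the top Q positions after sorting are taken as the retrieval result, where Q is a positive integer less than the number of screened-out candidate images. The specific value of Q may be determined according to actual needs; for example, it may be 1 or greater than one.
When screening out, for any screened-out candidate image, the feature points that meet the predetermined requirement from the feature points in that candidate image, bits N to P of the local feature corresponding to any feature point in the candidate image may be compared with bits N to P of the local feature corresponding to each feature point in the image to be retrieved, and whether the feature point meets the predetermined requirement is determined according to the comparison results. N is a positive integer greater than one and less than P, and its specific value may be determined according to actual needs.
For example, if the Hamming distance between bits N to P of the local feature corresponding to the feature point and bits N to P of the local feature corresponding to any feature point in the image to be retrieved is less than a predetermined threshold, the feature point may be determined to meet the predetermined requirement. Likewise, the specific value of the threshold may be determined according to actual needs.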
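The verification test on bits N to P can be sketched in the same style as the screening test. The sketch assumes P-bit integer features with bit 1 as the most significant bit, so bits N to P are the lowest P-N+1 bits; `last_bits` and `meets_requirement` are hypothetical names.

```python
def last_bits(code, n, p=32):
    """Return bits n..p of a p-bit code (bit p assumed least significant)."""
    width = p - n + 1
    return code & ((1 << width) - 1)


def meets_requirement(candidate_code, query_codes, n, threshold, p=32):
    """True when bits n..p of the candidate feature are within the Hamming
    threshold of bits n..p of at least one feature of the image to be retrieved."""
    tail = last_bits(candidate_code, n, p)
    return any(
        bin(tail ^ last_bits(q, n, p)).count("1") < threshold
        for q in query_codes
    )


query_codes = [0xFFABCDEE, 0x00000000]
tolerant = meets_requirement(0x00ABCDEF, query_codes, n=9, threshold=2)
exact_only = meets_requirement(0x00ABCDEF, query_codes, n=9, threshold=1)
```

Here the two 24-bit tails differ in exactly one bit, so the feature point passes with a threshold of 2 and fails with a threshold of 1.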
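The above process can be illustrated with the following example:
Assume that the image to be retrieved includes 3 feature points, namely feature point a, feature point b, and feature point c, and assume that 9 candidate images that match the image to be retrieved are screened out from the image library, namely candidate image 1 to candidate image 9.
Take candidate image 1 as an example, and assume that it also includes 3 feature points, namely feature point 1, feature point 2, and feature point 3. For feature point 1, three Hamming distances may be calculated: between bits N to P of the local feature corresponding to feature point 1 and bits N to P of the local feature corresponding to feature point a, between bits N to P of the local feature corresponding to feature point 1 and bits N to P of the local feature corresponding to feature point b, and between bits N to P of the local feature corresponding to feature point 1 and bits N to P of the local feature corresponding to feature point c. Assuming that the Hamming distance with respect to feature point c is less than the predetermined threshold, feature point 1 may be regarded as meeting the predetermined requirement, i.e., feature point 1 and feature point c are the same feature point. Feature point 2 and feature point 3 may be processed in the same manner. Assume that feature point 2 also meets the predetermined requirement, being the same feature point as feature point a, whereas feature point 3 does not. Then the number of feature points meeting the predetermined requirement that are screened out from candidate image 1 is 2, and accordingly the score of candidate image 1 may be 2.
In the same manner, the scores of candidate image 2 to candidate image 9 may be obtained, candidate image 1 to candidate image 9 may be sorted in descending order of score, and the candidate image ranked first after sorting may be taken as the desired retrieval result.
In the above processing, the screened-out candidate images are further verified based on local features, which further improves the accuracy of retrieval results; moreover, the verification may be implemented by comparing only part of the content of the local features, which increases the comparison speed and further improves retrieval efficiency.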
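The scoring and ranking described above can be sketched as follows: each screened-out candidate image is scored by the number of its feature points that match some feature point of the image to be retrieved on bits N to P, and the top Q images are returned. The data layout, the names `score_image` and `top_q`, and the tie-breaking behavior (insertion order, via Python's stable sort) are assumptions for illustration.

```python
def score_image(candidate_codes, query_codes, n, threshold, p=32):
    """Count the candidate's feature points that match any query feature
    point on bits n..p (the lowest p-n+1 bits under the assumed layout)."""
    mask = (1 << (p - n + 1)) - 1
    score = 0
    for c in candidate_codes:
        if any(bin((c ^ q) & mask).count("1") < threshold for q in query_codes):
            score += 1
    return score


def top_q(candidates, query_codes, n, threshold, q, p=32):
    """Rank candidate images by score, highest first, and keep the top q."""
    scored = [
        (score_image(codes, query_codes, n, threshold, p), image_id)
        for image_id, codes in candidates.items()
    ]
    scored.sort(key=lambda item: -item[0])  # stable: ties keep insertion order
    return [image_id for _, image_id in scored[:q]]


query_codes = [0x00000001, 0x00000002, 0x00000004]
candidates = {
    "img1": [0xFF000001, 0xFF000002, 0x00FFFFFF],  # two of three points match
    "img2": [0x00000004],                          # one point matches
}
best = top_q(candidates, query_codes, n=9, threshold=1, q=1)
```

As in the worked example, a candidate image with three feature points of which two meet the requirement receives a score of 2.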
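FIG. 2 is a flowchart of the second embodiment of the image retrieval method of the present disclosure. As shown in FIG. 2, it includes the following specific implementation.
In step 201, feature point extraction is performed on each candidate image in the image library, and a 32-bit local feature corresponding to each extracted feature point is acquired.
The local feature may be a deep feature acquired based on a convolutional neural network.
In step 202, for the image library, an inverted index from each feature point to the candidate images that include the feature point is established.
In step 203, feature point extraction is performed on the image to be retrieved, and a 32-bit local feature corresponding to each extracted feature point is acquired.
In step 204, the processing shown in steps 205 and 206 is performed for each feature point in the image to be retrieved.
In step 205, the feature point is taken as the feature point to be processed; for each feature point in the inverted index, the first 24 bits (i.e., bits 1 to 24) of the local feature corresponding to that feature point are compared with the first 24 bits of the local feature corresponding to the feature point to be processed; if the first 24 bits of the local feature corresponding to a feature point are identical to the first 24 bits of the local feature corresponding to the feature point to be processed, that feature point is taken as a screened-out feature point that is the same feature point as the feature point to be processed.
In step 206, the candidate images corresponding to the screened-out feature points are taken as screened-out candidate images that match the image to be retrieved.
In step 207, the processing shown in steps 208 and 209 is performed for each screened-out candidate image.
In step 208, for each feature point in the candidate image, the last 24 bits of the local feature corresponding to that feature point are compared with the last 24 bits of the local feature corresponding to each feature point in the image to be retrieved; if the Hamming distance between the last 24 bits of the local feature corresponding to the feature point and the last 24 bits of the local feature corresponding to any feature point in the image to be retrieved is less than a predetermined threshold, the feature point is determined to meet the predetermined requirement.
In step 209, the number of feature points that meet the predetermined requirement is taken as the score of the candidate image.
In step 210, the screened-out candidate images are sorted in descending order of score, and the candidate image ranked first after sorting is taken as the retrieval result.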
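Steps 201 to 210 can be put together in a compact sketch, assuming feature extraction has already produced 32-bit integer features. Two choices here are assumptions rather than part of the disclosure: the inverted index is keyed by the 24-bit prefix, which gives the same result as comparing against every indexed feature point when an exact match is required, and the verification threshold `verify_threshold=3` is an arbitrary illustrative value. All names are hypothetical.

```python
from collections import defaultdict

P, M, N = 32, 24, 9  # 32-bit features; first 24 bits = 1..24, last 24 = 9..32


def build_index(library):
    """Step 202: map the first M bits of each feature to candidate image ids."""
    index = defaultdict(set)
    for image_id, codes in library.items():
        for code in codes:
            index[code >> (P - M)].add(image_id)
    return index


def retrieve(query_codes, library, verify_threshold=3):
    """Steps 204-210: recall by exact prefix match, then verify and rank."""
    index = build_index(library)
    recalled = set()
    for q in query_codes:  # steps 205-206
        recalled |= index.get(q >> (P - M), set())
    mask = (1 << (P - N + 1)) - 1  # selects the last 24 bits
    best_id, best_score = None, -1
    for image_id in sorted(recalled):  # steps 207-209
        score = sum(
            1
            for c in library[image_id]
            if any(bin((c ^ q) & mask).count("1") < verify_threshold
                   for q in query_codes)
        )
        if score > best_score:  # step 210 with Q = 1
            best_id, best_score = image_id, score
    return best_id, best_score


library = {
    "full": [0x12345678, 0x9ABCDEF0, 0x0F0F0F0F],
    "other": [0x12345600, 0xFFFFFFFF],
}
result = retrieve([0x12345678, 0x9ABCDEF0], library)
```

In this toy data both images are recalled through the shared prefix `0x123456`, but only `"full"` survives verification with a non-zero score, which shows how the second stage filters out coarse prefix matches.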
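It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of action combinations, but those skilled in the art should know that the present disclosure is not limited by the described order of actions, because according to the present disclosure some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present disclosure. In addition, for parts not detailed in one embodiment, reference may be made to the relevant descriptions in other embodiments.
The above is a description of the method embodiments; the solution of the present disclosure is further described below through apparatus embodiments.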
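FIG. 3 is a schematic structural diagram of an embodiment 300 of the image retrieval apparatus of the present disclosure. As shown in FIG. 3, the apparatus includes an acquisition module 301, a screening module 302, and a verification module 303.
The acquisition module 301 is configured to acquire local features of the image to be retrieved.
The screening module 302 is configured to screen out, from the image library, candidate images that match the image to be retrieved according to the local features of the image to be retrieved and the local features of each candidate image in the image library.
The verification module 303 is configured to verify the screened-out candidate images according to the local features of the image to be retrieved and of the screened-out candidate images, to obtain the candidate image serving as the retrieval result.
The acquisition module 301 may perform feature point extraction on the image to be retrieved and acquire the local feature corresponding to each extracted feature point, such as a P-bit local feature, where P is a positive integer greater than one.
The acquisition module 301 may also perform feature point extraction on each candidate image in the image library in advance and acquire the local feature corresponding to each extracted feature point, such as a P-bit local feature.
The local feature may be a deep feature acquired based on a convolutional neural network.
Further, the acquisition module 301 may also establish, for the image library, an inverted index from each feature point to the candidate images that include the feature point.
Accordingly, when screening out candidate images that match the image to be retrieved from the image library according to the local features of the image to be retrieved and the local features of each candidate image in the image library, the screening module 302 may perform the following processing for any feature point in the image to be retrieved: taking the feature point as the feature point to be processed; screening out, from the feature points in the inverted index, the feature points that are the same feature point as the feature point to be processed, according to the local feature corresponding to the feature point to be processed and the local features corresponding to the feature points in the inverted index; and taking the candidate images corresponding to the screened-out feature points as screened-out candidate images that match the image to be retrieved.
For any feature point in the inverted index, the screening module 302 may compare bits 1 to M of the local feature corresponding to that feature point with bits 1 to M of the local feature corresponding to the feature point to be processed, and determine according to the comparison result whether the feature point and the feature point to be processed are the same feature point, where M is a positive integer greater than one and less than P.
For example, when determining that the Hamming distance between bits 1 to M of the local feature corresponding to the feature point and bits 1 to M of the local feature corresponding to the feature point to be processed is less than a predetermined threshold, the screening module 302 may determine that the feature point and the feature point to be processed are the same feature point.
Further, the verification module 303 may verify the screened-out candidate images according to the local features of the image to be retrieved and of the screened-out candidate images, to obtain the candidate image serving as the retrieval result.
Specifically, the verification module 303 may perform the following processing for any screened-out candidate image: screening out, from the feature points in the candidate image, the feature points that meet a predetermined requirement, according to the local features corresponding to the feature points in the candidate image and the local features corresponding to the feature points in the image to be retrieved, where meeting the predetermined requirement includes being the same feature point as one feature point in the image to be retrieved; and taking the number of screened-out feature points as the score of the candidate image.
Accordingly, the verification module 303 may sort the screened-out candidate images in descending order of score and take the candidate images in the top Q positions after sorting as the retrieval result, where Q is a positive integer less than the number of screened-out candidate images.
For any feature point in a candidate image, the verification module 303 may compare bits N to P of the local feature corresponding to that feature point with bits N to P of the local feature corresponding to each feature point in the image to be retrieved, and determine according to the comparison results whether the feature point meets the predetermined requirement, where N is a positive integer greater than one and less than P.
For example, when determining that the Hamming distance between bits N to P of the local feature corresponding to the feature point and bits N to P of the local feature corresponding to any feature point in the image to be retrieved is less than a predetermined threshold, the verification module 303 may determine that the feature point meets the predetermined requirement.
For the specific workflow of the apparatus embodiment shown in FIG. 3, refer to the relevant descriptions in the foregoing method embodiments; details are not repeated here.
In summary, with the solution of the apparatus embodiment of the present disclosure, image retrieval can be implemented based on local features of images, which adapts well to various scenarios and thereby improves the retrieval effect, i.e., the accuracy of retrieval results; moreover, a two-stage retrieval mechanism of screening and verification based on local features is adopted, which further improves the accuracy of retrieval results.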
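The solution of the present disclosure can be applied to the field of artificial intelligence, in particular to fields such as computer vision and deep learning, and can be applied in image retrieval scenarios.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and other major directions.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.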
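FIG. 4 shows a schematic block diagram of an example electronic device 400 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 4, the device 400 includes a computing unit 401, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 402 or a computer program loaded from a storage unit 408 into a random access memory (RAM) 403. The RAM 403 can also store various programs and data required for the operation of the device 400. The computing unit 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Multiple components in the device 400 are connected to the I/O interface 405, including: an input unit 406, such as a keyboard or a mouse; an output unit 407, such as various types of displays and speakers; a storage unit 408, such as a magnetic disk or an optical disk; and a communication unit 409, such as a network card, a modem, or a wireless communication transceiver. The communication unit 409 allows the device 400 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 401 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the various methods and processing described above, such as the methods described in the present disclosure. For example, in some embodiments, the methods described in the present disclosure may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the methods described in the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured by any other suitable means (e.g., by means of firmware) to perform the methods described in the present disclosure.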
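Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and may receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and transmit data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.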
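To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display apparatus (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing apparatus (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of apparatuses can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which the user may interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and usually interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the defects of difficult management and weak business scalability found in traditional physical hosts and virtual private servers (VPS). The server may also be a server of a distributed system, or a server combined with a blockchain. Cloud computing refers to a technical system in which an elastically scalable pool of shared physical or virtual resources is accessed through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and can be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capabilities for technical applications such as artificial intelligence and blockchain, and for model training.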
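It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above specific implementations do not constitute a limitation on the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall be included within the scope of protection of the present disclosure.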

Claims (21)

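  1. An image retrieval method, including:
    acquiring local features of an image to be retrieved;
    screening out, from an image library, candidate images that match the image to be retrieved according to the local features of the image to be retrieved and the local features of each candidate image in the image library;
    verifying the screened-out candidate images according to the local features of the image to be retrieved and of the screened-out candidate images, to obtain the candidate image serving as the retrieval result.
  2. The method according to claim 1, wherein
    acquiring the local features of the image to be retrieved includes: performing feature point extraction on the image to be retrieved, and acquiring the local feature corresponding to each extracted feature point;
    the method further includes: performing feature point extraction on each candidate image in the image library in advance, and acquiring the local feature corresponding to each extracted feature point.
  3. The method according to claim 2, wherein
    the local features include: deep features acquired based on a convolutional neural network.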
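  4. The method according to claim 2 or 3, further including: establishing, for the image library, an inverted index from each feature point to the candidate images that include the feature point;
    wherein screening out, from the image library, the candidate images that match the image to be retrieved includes: performing the following processing for any feature point in the image to be retrieved:
    taking the feature point as the feature point to be processed;
    screening out, from the feature points in the inverted index, the feature points that are the same feature point as the feature point to be processed, according to the local feature corresponding to the feature point to be processed and the local features corresponding to the feature points in the inverted index;
    taking the candidate images corresponding to the screened-out feature points as candidate images that match the image to be retrieved.
  5. The method according to claim 4, wherein
    the local features include: P-bit local features, where P is a positive integer greater than one;
    screening out, from the feature points in the inverted index, the feature points that are the same feature point as the feature point to be processed includes:
    for any feature point in the inverted index, comparing bits 1 to M of the local feature corresponding to the feature point with bits 1 to M of the local feature corresponding to the feature point to be processed, and determining according to the comparison result whether the feature point and the feature point to be processed are the same feature point, where M is a positive integer greater than one and less than P.
  6. The method according to claim 5, wherein determining according to the comparison result whether the feature point and the feature point to be processed are the same feature point includes:
    if the Hamming distance between bits 1 to M of the local feature corresponding to the feature point and bits 1 to M of the local feature corresponding to the feature point to be processed is less than a predetermined threshold, determining that the feature point and the feature point to be processed are the same feature point.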
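  7. The method according to claim 2 or 3, wherein
    verifying the screened-out candidate images to obtain the candidate image serving as the retrieval result includes:
    performing the following processing for any screened-out candidate image: screening out, from the feature points in the candidate image, the feature points that meet a predetermined requirement, according to the local features corresponding to the feature points in the candidate image and the local features corresponding to the feature points in the image to be retrieved, where meeting the predetermined requirement includes being the same feature point as one feature point in the image to be retrieved; and taking the number of screened-out feature points as the score of the candidate image;
    sorting the screened-out candidate images in descending order of the score, and taking the candidate images in the top Q positions after sorting as the retrieval result, where Q is a positive integer less than the number of screened-out candidate images.
  8. The method according to claim 7, wherein
    the local features include: P-bit local features, where P is a positive integer greater than one;
    screening out, from the feature points in the candidate image, the feature points that meet the predetermined requirement includes:
    for any feature point in the candidate image, comparing bits N to P of the local feature corresponding to the feature point with bits N to P of the local feature corresponding to each feature point in the image to be retrieved, and determining according to the comparison results whether the feature point meets the predetermined requirement, where N is a positive integer greater than one and less than P.
  9. The method according to claim 8, wherein determining according to the comparison results whether the feature point meets the predetermined requirement includes:
    if the Hamming distance between bits N to P of the local feature corresponding to the feature point and bits N to P of the local feature corresponding to any feature point in the image to be retrieved is less than a predetermined threshold, determining that the feature point meets the predetermined requirement.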
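  10. An image retrieval apparatus, including: an acquisition module, a screening module, and a verification module;
    the acquisition module is configured to acquire local features of an image to be retrieved;
    the screening module is configured to screen out, from an image library, candidate images that match the image to be retrieved according to the local features of the image to be retrieved and the local features of each candidate image in the image library;
    the verification module is configured to verify the screened-out candidate images according to the local features of the image to be retrieved and of the screened-out candidate images, to obtain the candidate image serving as the retrieval result.
  11. The apparatus according to claim 10, wherein
    the acquisition module performs feature point extraction on the image to be retrieved and acquires the local feature corresponding to each extracted feature point;
    the acquisition module is further configured to perform feature point extraction on each candidate image in the image library in advance and acquire the local feature corresponding to each extracted feature point.
  12. The apparatus according to claim 11, wherein
    the local features include: deep features acquired based on a convolutional neural network.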
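  13. The apparatus according to claim 11 or 12, wherein
    the acquisition module is further configured to establish, for the image library, an inverted index from each feature point to the candidate images that include the feature point;
    the screening module performs the following processing for any feature point in the image to be retrieved: taking the feature point as the feature point to be processed; screening out, from the feature points in the inverted index, the feature points that are the same feature point as the feature point to be processed, according to the local feature corresponding to the feature point to be processed and the local features corresponding to the feature points in the inverted index; and taking the candidate images corresponding to the screened-out feature points as candidate images that match the image to be retrieved.
  14. The apparatus according to claim 13, wherein
    the local features include: P-bit local features, where P is a positive integer greater than one;
    for any feature point in the inverted index, the screening module compares bits 1 to M of the local feature corresponding to the feature point with bits 1 to M of the local feature corresponding to the feature point to be processed, and determines according to the comparison result whether the feature point and the feature point to be processed are the same feature point, where M is a positive integer greater than one and less than P.
  15. The apparatus according to claim 14, wherein
    when determining that the Hamming distance between bits 1 to M of the local feature corresponding to the feature point and bits 1 to M of the local feature corresponding to the feature point to be processed is less than a predetermined threshold, the screening module determines that the feature point and the feature point to be processed are the same feature point.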
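  16. The apparatus according to claim 11 or 12, wherein
    the verification module performs the following processing for any screened-out candidate image: screening out, from the feature points in the candidate image, the feature points that meet a predetermined requirement, according to the local features corresponding to the feature points in the candidate image and the local features corresponding to the feature points in the image to be retrieved, where meeting the predetermined requirement includes being the same feature point as one feature point in the image to be retrieved; and taking the number of screened-out feature points as the score of the candidate image; and sorts the screened-out candidate images in descending order of the score and takes the candidate images in the top Q positions after sorting as the retrieval result, where Q is a positive integer less than the number of screened-out candidate images.
  17. The apparatus according to claim 16, wherein
    the local features include: P-bit local features, where P is a positive integer greater than one;
    for any feature point in the candidate image, the verification module compares bits N to P of the local feature corresponding to the feature point with bits N to P of the local feature corresponding to each feature point in the image to be retrieved, and determines according to the comparison results whether the feature point meets the predetermined requirement, where N is a positive integer greater than one and less than P.
  18. The apparatus according to claim 17, wherein
    when determining that the Hamming distance between bits N to P of the local feature corresponding to the feature point and bits N to P of the local feature corresponding to any feature point in the image to be retrieved is less than a predetermined threshold, the verification module determines that the feature point meets the predetermined requirement.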
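  19. An electronic device, including:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-9.
  20. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-9.
  21. A computer program product, including a computer program that, when executed by a processor, implements the method according to any one of claims 1-9.
PCT/CN2022/074951 2021-04-28 2022-01-29 Image retrieval method, apparatus, electronic device, and computer-readable storage medium WO2022227760A1 (zh)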

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110468934.8 2021-04-28
CN202110468934.8A CN113204665B (zh) 2021-04-28 2021-04-28 Image retrieval method, apparatus, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022227760A1 true WO2022227760A1 (zh) 2022-11-03

Family

ID=77029739

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/074951 WO2022227760A1 (zh) 2021-04-28 2022-01-29 Image retrieval method, apparatus, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113204665B (zh)
WO (1) WO2022227760A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
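CN113204665B (zh) * 2021-04-28 2023-09-22 北京百度网讯科技有限公司 Image retrieval method, apparatus, electronic device, and computer-readable storage medium
CN117591688A (zh) * 2023-10-09 2024-02-23 北京市燃气集团有限责任公司 Inspection image filtering method and filtering apparatus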

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011002965A (ja) * 2009-06-17 2011-01-06 Canon Inc 画像検索方法および装置
CN104199842A (zh) * 2014-08-07 2014-12-10 同济大学 一种基于局部特征邻域信息的相似图片检索方法
CN109670068A (zh) * 2018-08-02 2019-04-23 国科易讯(北京)科技有限公司 一种多级图像检索方法
CN110019910A (zh) * 2017-12-29 2019-07-16 上海全土豆文化传播有限公司 图像检索方法及装置
CN111242152A (zh) * 2018-11-29 2020-06-05 北京易讯理想科技有限公司 基于目标提取的图像检索方法
CN113204665A (zh) * 2021-04-28 2021-08-03 北京百度网讯科技有限公司 图像检索方法、装置、电子设备及计算机可读存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714077B (zh) * 2012-09-29 2017-10-20 日电(中国)有限公司 物体检索的方法、检索校验的方法及装置
CN106445939B (zh) * 2015-08-06 2019-12-13 阿里巴巴集团控股有限公司 图像检索、获取图像信息及图像识别方法、装置及系统
CN111783805B (zh) * 2019-04-04 2024-08-23 京东方科技集团股份有限公司 图像检索方法及装置、电子设备、可读存储介质
CN111522986B (zh) * 2020-04-23 2023-10-10 北京百度网讯科技有限公司 图像检索方法、装置、设备和介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011002965A (ja) * 2009-06-17 2011-01-06 Canon Inc 画像検索方法および装置
CN104199842A (zh) * 2014-08-07 2014-12-10 同济大学 一种基于局部特征邻域信息的相似图片检索方法
CN110019910A (zh) * 2017-12-29 2019-07-16 上海全土豆文化传播有限公司 图像检索方法及装置
CN109670068A (zh) * 2018-08-02 2019-04-23 国科易讯(北京)科技有限公司 一种多级图像检索方法
CN111242152A (zh) * 2018-11-29 2020-06-05 北京易讯理想科技有限公司 基于目标提取的图像检索方法
CN113204665A (zh) * 2021-04-28 2021-08-03 北京百度网讯科技有限公司 图像检索方法、装置、电子设备及计算机可读存储介质

Also Published As

Publication number Publication date
CN113204665B (zh) 2023-09-22
CN113204665A (zh) 2021-08-03

Similar Documents

Publication Publication Date Title
US11544459B2 (en) Method and apparatus for determining feature words and server
EP3913499A1 (en) Method and apparatus for processing dataset, electronic device and storage medium
US20210312139A1 (en) Method and apparatus of generating semantic feature, method and apparatus of training model, electronic device, and storage medium
US20220318275A1 (en) Search method, electronic device and storage medium
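WO2022227760A1 (zh) Image retrieval method, apparatus, electronic device, and computer-readable storage medium
CN114861889B (zh) Deep learning model training method, target object detection method, and apparatus
CN115359383A (zh) Cross-modal feature extraction, retrieval, and model training methods, apparatus, and medium
US20230114673A1 (en) Method for recognizing token, electronic device and storage medium
JP7357114B2 (ja) Training method for liveness detection model, apparatus, electronic device, and storage medium
US20220198358A1 (en) Method for generating user interest profile, electronic device and storage medium
US20220027766A1 (en) Method for industry text increment and electronic device
CN113408280A (zh) Negative example construction method, apparatus, device, and storage medium
CN112925912A (zh) Text processing method, synonymous text recall method, and apparatus
JP2024003750A (ja) Language model training method, apparatus, electronic device, and storage medium
CN114444514B (zh) Semantic matching model training method, semantic matching method, and related apparatus
CN114201607B (zh) Information processing method and apparatus
US20220129634A1 (en) Method and apparatus for constructing event library, electronic device and computer readable medium
CN112818167B (zh) Entity retrieval method, apparatus, electronic device, and computer-readable storage medium
CN116166814A (zh) Event detection method, apparatus, device, and storage medium
CN114818736A (zh) Text processing method, entity linking method for short text, apparatus, and storage medium
CN114117007A (zh) Method, apparatus, device, and storage medium for retrieving entities
CN112784600A (zh) Information ranking method, apparatus, electronic device, and storage medium
EP4123999A2 (en) Method and apparatus for pushing a resource, and storage medium
CN113268987B (zh) Entity name recognition method, apparatus, electronic device, and storage medium
CN113535958B (zh) Production lead aggregation method, apparatus, and system, electronic device, and medium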

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22794241

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22794241

Country of ref document: EP

Kind code of ref document: A1