CN111767421A - Method, device, electronic equipment and computer readable medium for retrieving image

Info

Publication number
CN111767421A
Authority
CN
China
Prior art keywords
vector
image
similarity
feature
sparse matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010616337.0A
Other languages
Chinese (zh)
Inventor
王睿
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010616337.0A
Publication of CN111767421A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

Embodiments of the present disclosure disclose a method, an apparatus, an electronic device, and a computer-readable medium for retrieving images. One embodiment of the method comprises: performing feature extraction and quantization on a target image to obtain a feature binarization vector; determining the similarity between the feature binarization vector and a preset sparse matrix to obtain a similarity vector, wherein the sparse matrix is generated based on a registered image set; selecting, from the similarity values included in the similarity vector, a first number of similarity values that meet a predetermined condition; and acquiring a first number of registered images corresponding to the first number of similarity values as a retrieval result. Based on the feature vector of the target image and the sparse matrix corresponding to the registered image set, the process selects the registered images that meet the predetermined condition from the registered image set as the retrieval result. The method effectively handles content retrieval in complex scenes involving rotation, scale change, brightness change, and region-of-interest change.

Description

Method, device, electronic equipment and computer readable medium for retrieving image
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method, an apparatus, an electronic device, and a computer-readable medium for retrieving an image.
Background
Currently, image retrieval algorithms that search for images by image can help users quickly find content that interests them. However, existing image retrieval algorithms have difficulty handling scenes that involve rotation, brightness change, or a region of interest that occupies only a small part of the image.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a method, apparatus, electronic device and computer readable medium for retrieving an image to solve some or all of the technical problems mentioned in the above background.
In a first aspect, some embodiments of the present disclosure provide a method for retrieving an image, the method comprising: performing feature extraction and quantization on a target image to obtain a feature binarization vector; determining the similarity between the feature binarization vector and a preset sparse matrix to obtain a similarity vector, wherein the sparse matrix is generated based on a registered image set; selecting, from the similarity values included in the similarity vector, a first number of similarity values that meet a predetermined condition; and acquiring a first number of registered images corresponding to the first number of similarity values as a retrieval result.
In a second aspect, some embodiments of the present disclosure provide an apparatus for retrieving an image, the apparatus comprising: a feature extraction unit configured to perform feature extraction on an image to be retrieved to obtain a feature binarization vector; a determining unit configured to determine the similarity between the feature binarization vector and a preset sparse matrix to obtain a similarity vector, wherein the sparse matrix is generated based on a registered image set; a selecting unit configured to select, from the similarity values included in the similarity vector, a first number of similarity values that meet a predetermined condition; and an acquisition unit configured to acquire a first number of registered images corresponding to the first number of similarity values as a retrieval result.
In a third aspect, some embodiments of the present disclosure provide an electronic device, including: one or more processors; storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
One of the above embodiments of the present disclosure has the following advantageous effects. Feature extraction and quantization are performed on the target image to obtain a feature binarization vector. The similarity between the feature binarization vector and a preset sparse matrix is then determined to obtain a similarity vector. Through these steps, the similarity vector between the feature vector corresponding to the main features of the target image and the preset sparse matrix can be determined, which provides a basis for matching the qualifying images in the registered image set. Next, a first number of similarity values meeting a predetermined condition are selected from the similarity values included in the similarity vector. Finally, a first number of registered images corresponding to the first number of similarity values are acquired as a retrieval result. Based on the feature vector of the target image and the sparse matrix corresponding to the registered image set, the process selects the registered images that meet the predetermined condition from the registered image set as the retrieval result. The limitations of convolutional neural network feature extraction are avoided. The method effectively handles content retrieval in complex scenes involving rotation, scale change, brightness change, and region-of-interest change.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
FIG. 1 is a schematic illustration of one application scenario of a method for retrieving images according to some embodiments of the present disclosure;
FIG. 2 is a flow diagram of some embodiments of a method for retrieving an image according to the present disclosure;
FIG. 3 is a flow diagram of further embodiments of methods for retrieving an image according to the present disclosure;
FIG. 4 is an apparatus schematic of some embodiments of an apparatus for retrieving images according to the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings. The embodiments in the present disclosure and the features of those embodiments may be combined with each other where no conflict arises.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram 100 of one application scenario of a method for retrieving an image according to some embodiments of the present disclosure.
In the application scenario of fig. 1, the computing device 101 may first receive a target image 102 that needs to be retrieved. The target image may be an image of any format, such as bmp, jpg, png, tif, or gif. The computing device 101 may then perform feature extraction and quantization on the target image 102; in this process, the feature binarization vector 103 may be obtained using a SIFT algorithm. A dot product operation is performed on the feature binarization vector 103 and a preset sparse matrix 104 to generate a similarity vector 105. The preset sparse matrix may be generated by extracting a key point set and a descriptor set from each registered image in the registered image set. The similarity vector 105 includes a plurality of similarity values. Thereafter, a first number of similarity values meeting a predetermined condition may be selected from the similarity values included in the similarity vector 105. The predetermined condition is, for example, that the similarity value is greater than a preset threshold. Then, a first number of registered images corresponding to the first number of similarity values are acquired to generate a search result 106. Here, the first number of registered images are images in the registered image set. Finally, the search result 106 may optionally be subjected to a homography transformation and constrained based on geometric position verification to obtain a more accurate ranking result 107.
The computing device 101 may be hardware or software. When the computing device 101 is hardware, it may be implemented as a distributed cluster composed of a plurality of servers or terminal devices, or as a single server or a single terminal device. When the computing device is embodied as software, it may be installed in the hardware devices enumerated above. It may be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.
With continued reference to fig. 2, a flow 200 of some embodiments of a method for retrieving an image according to the present disclosure is shown. The image retrieval method comprises the following steps:
Step 201, performing feature extraction and quantization on the target image to obtain a feature binarization vector.
In some embodiments, an execution subject of the image retrieval method (e.g., the computing device 101 shown in fig. 1) may perform feature extraction on the target image through a SIFT algorithm to obtain image features. The target image may be an image to be retrieved that is input by a user. Quantization may be an operation that maps the extracted image features to a finite number of discrete values.
Optionally, SIFT features are obtained by applying conventional SIFT feature extraction to the target image. In a preset compression dictionary, a KD-tree (k-dimensional tree) may be used to find the cluster center closest to each SIFT feature, and the feature is quantized to that center to obtain the feature binarization vector. For example, if a feature is closest to the 3rd cluster center, a 5000-dimensional all-zero vector may be initialized and its 3rd position set to 1.
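By way of illustration only, the quantization described above can be sketched as follows. The sketch assumes toy two-dimensional descriptors and a four-entry dictionary in place of 128-dimensional SIFT descriptors and a 5000-entry dictionary, and it swaps the KD-tree lookup for a brute-force nearest-center search; the names `nearest_center` and `quantize` are hypothetical and do not come from the disclosure.

```python
import math


def nearest_center(descriptor, centers):
    """Return the index of the cluster center closest to `descriptor` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(descriptor, centers[i]))


def quantize(descriptors, centers):
    """Build a binary vector with one slot per cluster center.

    A slot is set to 1 when at least one descriptor is closest to that center.
    """
    vector = [0] * len(centers)
    for descriptor in descriptors:
        vector[nearest_center(descriptor, centers)] = 1
    return vector


# Toy stand-ins (assumptions): 4 centers instead of 5000, 2-D points instead of SIFT.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
descriptors = [(0.5, 9.0), (9.0, 9.5)]
binary_vector = quantize(descriptors, centers)
```

Note that the "3rd position" in the text is counted from one, whereas the list indices in this sketch start at zero.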
Step 202, performing a dot product operation on the feature binarization vector and a preset sparse matrix to obtain a similarity vector.
In some embodiments, a dot product operation may be performed on the feature binarization vector obtained in step 201 and a preset sparse matrix. The preset sparse matrix may be generated by performing feature extraction on the registered image set, and various feature extraction methods may be used. The dimension of the sparse matrix may be predetermined. In the sparse matrix, the number of elements with a value of 0 is usually much larger than the number of non-zero elements, and the non-zero elements are distributed irregularly. The dot product, also called the scalar product, is an operation in linear algebra. The vector obtained by the dot product operation can be used as the similarity vector.
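As a hedged sketch of the dot product step, the sparse matrix can be held as one dictionary per registered image that stores only the non-zero entries. This storage layout, the function name `similarity_vector`, and the weights below are assumptions chosen for illustration; the disclosure does not prescribe a storage format.

```python
def similarity_vector(query, sparse_rows):
    """Dot product of a dense query vector with every sparse row.

    Each row maps a cluster-center index to its non-zero weight, so only
    the stored entries are visited.
    """
    return [sum(weight * query[j] for j, weight in row.items()) for row in sparse_rows]


# One row per registered image (illustrative weights, not from the disclosure).
sparse_rows = [
    {0: 0.5, 2: 1.5},
    {1: 2.0},
    {2: 0.25, 3: 0.75},
]
query = [0, 0, 1, 1]  # feature binarization vector of the target image
sims = similarity_vector(query, sparse_rows)
```

Each entry of `sims` is the similarity between the target image and one registered image, in the same order as the rows.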
In some optional implementations of some embodiments, the sparse matrix may be generated by:
Firstly, performing feature extraction on each registered image in the registered image set to obtain a key point set and a descriptor set.
The execution subject performs feature extraction on each registered image in the registered image set to obtain a key point set and a descriptor set. A key point may be a two-dimensional coordinate point that marks the position of main information or a main feature of the image. Key points may be determined automatically by an existing algorithm or marked manually. A descriptor, also called a feature descriptor, represents the local appearance around a feature point and is used for subsequent matching. Take a face image placed on two-dimensional coordinate axes as an example: the nose is a main feature of the face, so the coordinate point of a nostril may serve as a key point, and the descriptor is the local appearance near that key point.
Secondly, quantizing the descriptor set of each registered image on a preset compression dictionary to obtain a sparse binary vector corresponding to the registered image.
Here, the dimension of the sparse binary vector is the second number. The compression dictionary may be prepared as follows. The execution subject may obtain an image set through web crawling or web collection and perform feature extraction on each image in the set to obtain an image feature set. The image feature set may be clustered with the k-means clustering algorithm to obtain a second number of cluster centers, and these cluster centers may be compressed by dimensionality reduction with PCA (Principal Component Analysis) to obtain the compression dictionary. The descriptor set of each registered image is then quantized on the preset compression dictionary to obtain the sparse binary vector corresponding to that registered image. For example, if a feature is closest to the 3rd cluster center, a 5000-dimensional all-zero vector may be initialized and its 3rd position set to 1.
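The clustering part of the dictionary construction can be sketched with a plain Lloyd's-algorithm k-means, shown here as a minimal illustration. Seeding the centers with the first k points is an assumption made so that the example is deterministic, the PCA compression step is deliberately omitted, and the function name `kmeans` and the toy features are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. The first k points seed the centers (an assumption)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                mean = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                new_centers.append(mean)
            else:
                new_centers.append(centers[i])  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers


# Toy 2-D features standing in for descriptors extracted from a crawled image set.
features = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 8.0), (2.0, 0.0), (8.0, 10.0)]
dictionary = kmeans(features, k=2)
```

In this sketch `k` plays the role of the "second number" of cluster centers; a production system would more likely rely on an optimized clustering library.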
Thirdly, counting the frequency distribution of each descriptor in the descriptor set of each registered image, and determining the tf-idf (term frequency-inverse document frequency) vector of each registered image.
In some optional implementations of some embodiments, the execution subject may determine the tf-idf vector of each registered image in various ways, and the frequency distribution of each descriptor may be the probability with which that descriptor occurs.
Fourthly, combining the tf-idf vectors to obtain the sparse matrix.
In some optional implementations of some embodiments, the execution subject may count the frequency of occurrence of each cluster center in the compression dictionary and then determine a retrieval weight for each cluster center according to the inverse document frequency (idf) principle. The retrieval weight is a coefficient that expresses how important each cluster center is during retrieval. It is determined by the number of times the cluster center occurs in the image set: in general, the more often a cluster center appears in the image set, the lower its weight coefficient, and vice versa.
In some optional implementations of some embodiments, each cluster center has a retrieval weight that is determined according to the occurrence frequency of the cluster center. In practice, the occurrence frequency of a cluster center may be determined statistically, and the resulting probability may be used to express the retrieval weight, which reflects the importance of the cluster center to a certain extent.
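The weighting described above can be illustrated with the textbook tf-idf formula, tf = count / total and idf = log(N / df). The disclosure does not state an exact formula, so this choice, the function name `build_tfidf_rows`, and the toy input are assumptions. Each registered image is given as the list of cluster-center indices of its quantized descriptors, and the output rows together form a sparse matrix.

```python
import math
from collections import Counter


def build_tfidf_rows(image_words):
    """Return (rows, idf) where rows[i] maps a cluster-center index to its tf-idf weight.

    image_words[i] lists the cluster-center index of each descriptor in registered image i.
    """
    n_images = len(image_words)
    # Document frequency: the number of images in which each cluster center occurs.
    df = Counter(w for words in image_words for w in set(words))
    idf = {w: math.log(n_images / count) for w, count in df.items()}
    rows = []
    for words in image_words:
        tf = Counter(words)
        total = len(words)
        # Keep only non-zero weights so that each row stays sparse.
        rows.append({w: (c / total) * idf[w] for w, c in tf.items() if idf[w] > 0})
    return rows, idf


image_words = [[0, 0, 2], [0, 1], [0, 2, 3, 3]]
rows, idf = build_tfidf_rows(image_words)
```

Cluster center 0 occurs in every image of this toy set, so its idf is zero and it drops out of every row, which matches the statement that frequently occurring centers receive lower weights.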
Optionally, for each descriptor, the nearest cluster center in the preset dictionary is queried through a KD-tree to obtain the quantized binary vector. A KD-tree is a data structure that partitions a k-dimensional data space and is mainly used to search for key data in multi-dimensional space. It is a special case of a binary space partitioning tree.
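A minimal KD-tree with nearest-neighbor search can be sketched as follows, purely as an illustration of the data structure named above. The nested-tuple node layout and the function names `build_kdtree` and `nearest` are assumptions, and the five two-dimensional centers are toy data.

```python
import math


def build_kdtree(items, depth=0):
    """Build a KD-tree from (index, point) pairs, splitting on alternating axes.

    Each node is a tuple (item, axis, left_subtree, right_subtree); an empty subtree is None.
    """
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda item: item[1][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (distance, index) of the stored point closest to `target`."""
    if node is None:
        return best
    (index, point), axis, left, right = node
    dist = math.dist(point, target)
    if best is None or dist < best[0]:
        best = (dist, index)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # The far side can only hold a closer point if the splitting plane
    # lies nearer to the target than the best match found so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
tree = build_kdtree(list(enumerate(centers)))
distance, center_index = nearest(tree, (6.0, 4.5))
```

The pruning test on the splitting plane is what lets the search skip whole subtrees instead of comparing the query against every cluster center.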
Step 203, selecting a first number of similarity values meeting a predetermined condition from the similarity values included in the similarity vector.
In some embodiments, the execution subject may select, from the similarity vector obtained in step 202, a first number of similarity values that meet a predetermined condition. The predetermined condition may be that the similarity value is greater than a preset threshold. In practice, the similarity computation yields many similarity values, and the k largest values may be taken.
Step 204, acquiring a first number of registered images corresponding to the first number of similarity values as a retrieval result.
In some embodiments, after the execution subject selects the first number of similarity values that meet the predetermined condition, the first number of registered images corresponding to those similarity values are acquired as the retrieval result.
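Steps 203 and 204 can be sketched together as a thresholded top-k selection followed by an index lookup. Combining the threshold with the top-k rule is one possible reading of the predetermined condition, and the function name `select_top_k` and the file names are hypothetical.

```python
import heapq


def select_top_k(similarities, k, threshold=0.0):
    """Return up to k (index, value) pairs whose value exceeds `threshold`, largest first."""
    candidates = [(i, s) for i, s in enumerate(similarities) if s > threshold]
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


sims = [1.5, 0.0, 1.0, 0.2, 2.5]
registered = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]  # same order as `sims`

top = select_top_k(sims, k=3, threshold=0.1)
result = [registered[i] for i, _ in top]
```

Here `k` corresponds to the "first number", and `result` lists the registered images in descending order of similarity.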
One of the above embodiments of the present disclosure has the following beneficial effects. Feature extraction and quantization are performed on the target image to obtain a feature binarization vector. The similarity between the feature binarization vector and a preset sparse matrix is then determined to obtain a similarity vector. Through these steps, the similarity vector between the feature vector corresponding to the main features of the target image and the preset sparse matrix can be determined, which provides a basis for matching the qualifying images in the registered image set. Next, a first number of similarity values meeting a predetermined condition are selected from the similarity values included in the similarity vector. Finally, a first number of registered images corresponding to the first number of similarity values are acquired as a retrieval result. Based on the feature vector of the target image and the sparse matrix corresponding to the registered image set, the process selects the registered images that meet the predetermined condition from the registered image set as the retrieval result. The limitations of convolutional neural network feature extraction are avoided. The method effectively handles content retrieval in complex scenes involving rotation, scale change, brightness change, and region-of-interest change.
With further reference to fig. 3, a flow 300 of further embodiments of an image retrieval method is shown. The flow 300 of the image retrieval method includes the following steps:
Step 301, performing feature extraction and quantization on the target image to obtain a feature binarization vector.
Step 302, performing a dot product operation on the feature binarization vector and a preset sparse matrix to obtain a similarity vector.
Step 303, selecting a first number of similarity values that meet a predetermined condition from the similarity values included in the similarity vector.
Step 304, acquiring a first number of registered images corresponding to the first number of similarity values as a retrieval result.
In some embodiments, for the specific implementation and technical effects of steps 301 to 304, reference may be made to steps 201 to 204 in the embodiments corresponding to fig. 2, which are not described herein again.
Step 305, constraining the first number of registered images based on geometric position verification to obtain a constraint result, and ranking the constraint result by similarity to generate a retrieval result sequence.
In some embodiments, the execution subject may obtain the top k registered images with the highest similarity based on steps 301 to 304. A homography transformation is then applied to the top k registered images to obtain a registered image sequence. The homography transformation may refer to an operation that constrains a matched image pair by the geometric position relationship of its feature points.
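One way to picture the geometric constraint is to count how many matched key points agree with a homography and to re-rank the candidates by that count. The sketch below assumes that a 3x3 homography for each candidate has already been estimated elsewhere (for example by a RANSAC-style fit, which is not shown), and that re-ranking by inlier count is an acceptable stand-in for the similarity ranking; all function names, the pixel tolerance, and the data are hypothetical.

```python
def apply_homography(H, point):
    """Project a 2-D point with a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def count_inliers(H, matches, tolerance=3.0):
    """Count (query_point, registered_point) pairs whose reprojection error is within `tolerance`."""
    inliers = 0
    for src, dst in matches:
        px, py = apply_homography(H, src)
        if ((px - dst[0]) ** 2 + (py - dst[1]) ** 2) ** 0.5 <= tolerance:
            inliers += 1
    return inliers


def rerank(candidates):
    """candidates: list of (image_id, H, matches). Returns image ids, most inliers first."""
    scored = [(count_inliers(H, matches), image_id) for image_id, H, matches in candidates]
    return [image_id for _, image_id in sorted(scored, key=lambda s: -s[0])]


H_shift = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]  # pure translation
H_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matches_a = [((0.0, 0.0), (5.0, -2.0)), ((10.0, 10.0), (15.0, 8.0)), ((3.0, 4.0), (40.0, 40.0))]
matches_b = [((1.0, 1.0), (1.0, 1.0)), ((2.0, 5.0), (2.0, 5.0)), ((7.0, 3.0), (7.0, 3.0))]
ranking = rerank([("a.jpg", H_shift, matches_a), ("b.jpg", H_identity, matches_b)])
```

A candidate whose matches are geometrically consistent keeps most of its key-point pairs as inliers, while a candidate that matched only by accident loses them and falls in the ranking.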
As can be seen from fig. 3, compared to the description of some embodiments corresponding to fig. 2, the flow 300 of the method for retrieving images in some embodiments corresponding to fig. 3 embodies the steps of constraining and similarity ranking the first number of registered images as the retrieval result. Thus, the schemes described in these examples can result in similarity sequences with higher confidence.
With further reference to fig. 4, as an implementation of the methods illustrated in the above figures, the present disclosure provides some embodiments of an apparatus for retrieving an image, which may be particularly applicable in various electronic devices.
As shown in fig. 4, an apparatus 400 for retrieving an image of some embodiments includes: a feature extraction unit 401, a determining unit 402, a selecting unit 403, and an acquisition unit 404. The feature extraction unit 401 is configured to perform feature extraction on an image to be retrieved to obtain a feature binarization vector; the determining unit 402 is configured to determine the similarity between the feature binarization vector and a preset sparse matrix to obtain a similarity vector, wherein the sparse matrix is generated based on a registered image set; the selecting unit 403 is configured to select a first number of similarity values that meet a predetermined condition from the similarity values included in the similarity vector; the acquisition unit 404 is configured to acquire, as a retrieval result, a first number of registered images corresponding to the first number of similarity values.
In an alternative implementation of some embodiments, the apparatus for retrieving images further comprises a constraint and similarity ranking unit (not shown) configured to constrain the first number of registered images and rank them by similarity to generate a registered image sequence.
In an alternative implementation of some embodiments, the sparse matrix is generated by: extracting a key point set and a descriptor set from each registered image in the registered image set; generating a sparse binary vector corresponding to each registered image based on a preset compression dictionary and the descriptor set of each registered image, wherein the dimension of the sparse binary vector is the second number; generating a frequency vector of each registered image according to the frequency distribution of each descriptor in the descriptor set of that registered image; and generating the sparse matrix based on the frequency vectors.
In an alternative implementation of some embodiments, the frequency vector is a term frequency-inverse document frequency (tf-idf) vector.
In an alternative implementation of some embodiments, the compression dictionary is generated by: performing feature extraction on each image in the randomly acquired image set to generate an image feature set; clustering the obtained image feature set to obtain a second number of clustering centers; and processing the second number of clustering centers by using a compression method to obtain a compression dictionary.
In an alternative implementation manner of some embodiments, the cluster center has a retrieval weight, and the retrieval weight is determined according to the occurrence frequency of the cluster center.
It will be understood that the elements described in the apparatus 400 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 400 and the units included therein, and will not be described herein again.
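The correspondence between the units of the apparatus 400 and the method steps can be pictured with a small composition sketch. The class name `RetrievalApparatus`, the use of plain callables for the units, and the stub behaviors are all assumptions made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetrievalApparatus:
    """Hypothetical wiring of units 401 to 404; each unit is any callable."""
    extract: Callable[[object], List[int]]          # feature extraction unit 401
    determine: Callable[[List[int]], List[float]]   # determining unit 402
    select: Callable[[List[float]], List[int]]      # selecting unit 403
    acquire: Callable[[List[int]], List[str]]       # acquisition unit 404

    def retrieve(self, image):
        vector = self.extract(image)
        similarities = self.determine(vector)
        indices = self.select(similarities)
        return self.acquire(indices)


registered = ["a.jpg", "b.jpg", "c.jpg"]
matrix = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 2.0]]  # toy dense stand-in

apparatus = RetrievalApparatus(
    extract=lambda image: [0, 1, 1],  # stub for SIFT extraction plus quantization
    determine=lambda q: [sum(a * b for a, b in zip(q, row)) for row in matrix],
    select=lambda sims: sorted(range(len(sims)), key=lambda i: -sims[i])[:2],
    acquire=lambda idx: [registered[i] for i in idx],
)
results = apparatus.retrieve("query.jpg")
```

Keeping each unit behind a callable means an individual stage, such as the selecting unit, can be replaced without touching the others.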
Referring now to FIG. 5, a block diagram of an electronic device (e.g., the computing device of FIG. 1) 500 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device in some embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: perform feature extraction and quantization on a target image to obtain a feature binarization vector; determine the similarity between the feature binarization vector and a preset sparse matrix to obtain a similarity vector, wherein the sparse matrix is generated based on a registered image set; select a first number of similarity values that meet a predetermined condition from the similarity values included in the similarity vector; and acquire a first number of registered images corresponding to the first number of similarity values as a retrieval result.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes a feature extraction unit, a determination unit, a selection unit, and an acquisition unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the feature extraction unit may also be described as a "unit that performs feature extraction and quantization on a target image".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
According to one or more embodiments of the present disclosure, there is provided a method for retrieving an image, including: performing feature extraction and quantization on a target image to obtain a feature binarization vector; determining the similarity between the feature binarization vector and a preset sparse matrix to obtain a similarity vector, wherein the sparse matrix is generated based on a registered image set; selecting a first number of similarity values that meet a predetermined condition from the similarity values included in the similarity vector; and acquiring a first number of registered images corresponding to the first number of similarity values as a retrieval result.
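The retrieval steps above can be illustrated with a minimal Python sketch. It rests on several assumptions that the disclosure does not state: the sparse matrix is stored column-wise as a mapping from dictionary index to per-image weights, the feature binarization vector is represented by the set of its non-zero indices, the similarity is a dot product, and the "predetermined condition" is taken to be "the k largest values". The names `build_columns`, `similarity_vector`, and `top_k` are hypothetical.

```python
from collections import defaultdict
import heapq


def build_columns(rows):
    """Store the registered-image matrix column-wise.

    rows[i] is a dict {dictionary index: weight} for registered image i.
    The result maps each dictionary index to {image id: weight}.
    """
    columns = defaultdict(dict)
    for image_id, row in enumerate(rows):
        for word, weight in row.items():
            columns[word][image_id] = weight
    return columns


def similarity_vector(query_words, columns, num_images):
    """Dot product of a binary query (given as its set of active indices)
    with every registered image; only non-zero entries are visited."""
    scores = [0.0] * num_images
    for word in query_words:
        for image_id, weight in columns.get(word, {}).items():
            scores[image_id] += weight
    return scores


def top_k(scores, k):
    """Assumed predetermined condition: keep the k highest similarities."""
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


# Three registered images described by sparse weight rows (made-up values).
rows = [{0: 1.0, 3: 0.5}, {1: 2.0}, {0: 0.5, 2: 1.0}]
columns = build_columns(rows)
scores = similarity_vector({0, 1}, columns, len(rows))
result_ids = top_k(scores, 2)
```

Because only the columns selected by the query's active indices are touched, the cost of one query in this sketch grows with the number of non-zero entries visited rather than with the full size of the matrix.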
According to one or more embodiments of the present disclosure, the method further includes: constraining the first number of registered images based on geometric position verification to obtain a constrained result; and ranking the constrained result by similarity to generate a retrieval result sequence.
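The disclosure does not say how the geometric position is verified. Purely as an illustration, the sketch below uses a translation-only consistency check: it takes the median displacement between matched keypoints and counts the matches that agree with it. This is a stand-in chosen for brevity, not the method of the disclosure. The function names, the tolerance, and the inlier threshold are all hypothetical.

```python
from statistics import median


def count_inliers(matches, tolerance=5.0):
    """matches: list of ((xq, yq), (xr, yr)) keypoint pairs between the
    query image and one registered image. A match is an inlier if its
    displacement is within `tolerance` of the median displacement."""
    if not matches:
        return 0
    dxs = [r[0] - q[0] for q, r in matches]
    dys = [r[1] - q[1] for q, r in matches]
    mdx, mdy = median(dxs), median(dys)
    return sum(
        1
        for dx, dy in zip(dxs, dys)
        if abs(dx - mdx) <= tolerance and abs(dy - mdy) <= tolerance
    )


def verify_and_rank(candidates, min_inliers=3):
    """candidates: list of (image_id, similarity, matches).
    Drop candidates that fail the geometric check, then sort the rest
    by similarity in descending order."""
    kept = [
        (image_id, similarity)
        for image_id, similarity, matches in candidates
        if count_inliers(matches) >= min_inliers
    ]
    return sorted(kept, key=lambda c: c[1], reverse=True)


# Made-up matches: one geometrically consistent set, one scattered set.
consistent = [((0, 0), (10, 10)), ((5, 5), (15, 15)), ((2, 8), (12, 18))]
scattered = [((0, 0), (10, 10)), ((5, 5), (100, -40)), ((2, 8), (-60, 70))]
ranked = verify_and_rank(
    [("a", 0.4, consistent), ("b", 0.9, scattered), ("c", 0.7, consistent)]
)
```

In this example the candidate with the highest raw similarity is discarded because its matches do not agree on a common displacement, which is the effect the constraint step is meant to have.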
According to one or more embodiments of the present disclosure, a set of keypoints and a set of descriptors are extracted for each registered image in a set of registered images; generating a sparse binary vector corresponding to each registered image based on a preset compression dictionary and a descriptor set of each registered image, wherein the dimensionality of the sparse binary vector is a second number of dimensions; generating a frequency vector of the registered image according to frequency distribution of each descriptor in the descriptor set of each registered image; and generating the sparse matrix based on the frequency vector.
According to one or more embodiments of the present disclosure, the frequency vector is a term frequency-inverse document frequency (tf-idf) vector.
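As a rough illustration of how the rows of the sparse matrix might be built, the following sketch assigns each descriptor to its nearest dictionary entry and weights the resulting counts with the common formulation tf × log(N / df). The disclosure names tf-idf but gives no formula, so this weighting, the squared Euclidean distance, and the names `nearest_word`, `quantize`, and `tfidf_rows` are assumptions.

```python
import math
from collections import Counter


def nearest_word(descriptor, dictionary):
    """Index of the dictionary entry closest to the descriptor
    (squared Euclidean distance, an assumed metric)."""
    return min(
        range(len(dictionary)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, dictionary[i])),
    )


def quantize(descriptors, dictionary):
    """Count how often each dictionary entry is hit by one image's
    descriptors. The keys of the result are the non-zero positions of
    that image's sparse binary vector."""
    return Counter(nearest_word(d, dictionary) for d in descriptors)


def tfidf_rows(descriptor_sets, dictionary):
    """One sparse row {dictionary index: tf-idf weight} per registered image."""
    counts = [quantize(ds, dictionary) for ds in descriptor_sets]
    n = len(counts)
    # Document frequency: number of images in which each entry occurs.
    df = Counter(word for c in counts for word in c)
    rows = []
    for c in counts:
        total = sum(c.values())
        rows.append({w: (k / total) * math.log(n / df[w]) for w, k in c.items()})
    return rows


# A toy 3-entry dictionary and two registered images (made-up values).
dictionary = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
descriptor_sets = [
    [(0.1, 0.2), (9.0, 11.0)],
    [(0.5, 0.0), (19.0, 1.0), (21.0, -1.0)],
]
rows = tfidf_rows(descriptor_sets, dictionary)
```

With this weighting, an entry that occurs in every registered image receives a weight of zero, so it contributes nothing to the similarity.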
According to one or more embodiments of the present disclosure, feature extraction is performed on each image in a randomly acquired image set, and an image feature set is generated; clustering the obtained image feature set to obtain a second number of clustering centers; and processing the second number of clustering centers by using a compression method to obtain a compression dictionary.
According to one or more embodiments of the present disclosure, the frequency of occurrence of all cluster centers in a compression dictionary may be counted and then a retrieval weight tf-idf vector may be determined for each cluster center using the idf principle.
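The clustering and weighting steps can be sketched as follows. The disclosure does not name the clustering algorithm or the compression method, so this example assumes plain k-means with a deterministic initialization, omits the compression step entirely, and assumes the weight log(N / df) for each cluster center. All function names are hypothetical.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iterations=10):
    """Plain k-means. The first k points serve as initial centers so the
    sketch is deterministic; a real system would likely initialize
    differently."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def idf_weights(image_features, centers):
    """Assumed weight per center: log(N / number of images containing it)."""
    n = len(image_features)
    weights = []
    for i in range(len(centers)):
        df = sum(
            1
            for feats in image_features
            if any(nearest(f, centers) == i for f in feats)
        )
        weights.append(math.log(n / df) if df else 0.0)
    return weights


# Features of three images (made-up 2-D values for readability).
image_features = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(10.0, 10.0), (0.0, 1.0)],
    [(11.0, 10.0)],
]
points = [f for feats in image_features for f in feats]
centers = kmeans(points, k=2)
weights = idf_weights(image_features, centers)
```

Here `k` plays the role of the "second number" of cluster centers, which also fixes the dimensionality of the sparse binary vectors built from the dictionary.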
According to one or more embodiments of the present disclosure, there is provided an apparatus for retrieving an image, including: a feature extraction unit configured to extract features of an image to be retrieved to obtain a feature binarization vector; a determining unit configured to determine the similarity between the feature binarization vector and a preset sparse matrix to obtain a similarity vector, wherein the sparse matrix is generated based on a registered image set; a selecting unit configured to select a first number of similarity values that meet a predetermined condition from among the similarity values included in the similarity vector; and an acquisition unit configured to acquire a first number of registered images corresponding to the first number of similarity values as a retrieval result.
According to one or more embodiments of the present disclosure, the apparatus for retrieving images further includes a constraint and similarity ranking unit (not shown) configured to perform constraint and similarity ranking on the first number of registered images, generating a sequence of registered images.
According to one or more embodiments of the present disclosure, the sparse matrix is generated by: extracting a keypoint set and a descriptor set from each registered image in the registered image set; generating a sparse binary vector corresponding to each registered image based on a preset compression dictionary and the descriptor set of each registered image, wherein the dimensionality of the sparse binary vector is a second number of dimensions; generating a frequency vector of the registered image according to the frequency distribution of each descriptor in the descriptor set of each registered image; and generating the sparse matrix based on the frequency vector.
According to one or more embodiments of the present disclosure, the frequency vector is a term frequency-inverse document frequency (tf-idf) vector.
According to one or more embodiments of the present disclosure, the compression dictionary is generated by: performing feature extraction on each image in the randomly acquired image set to generate an image feature set; clustering the obtained image feature set to obtain a second number of clustering centers; and processing the second number of clustering centers by using a compression method to obtain a compression dictionary.
According to one or more embodiments of the present disclosure, the cluster center has a search weight, and the search weight is determined according to an appearance frequency of the cluster center.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement any of the above-described methods.
According to one or more embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, wherein the program realizes any of the above-mentioned methods when executed by a processor.
The foregoing description is merely illustrative of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept as defined above. For example, a technical solution may be formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (9)

1. A method for retrieving an image, comprising:
carrying out feature extraction and quantization on the target image to obtain a feature binarization vector;
determining the similarity between the feature binarization vector and a preset sparse matrix to obtain a similarity vector, wherein the sparse matrix is generated based on a registered image set;
selecting a first number of similarity values that meet a predetermined condition from the similarity values included in the similarity vector;
acquiring a first number of registration images corresponding to the first number of similarity values as a retrieval result.
2. The method of claim 1, wherein the method further comprises:
based on the verification of the geometric position, constraining the first number of registered images to obtain a constrained result;
and carrying out similarity sequencing on the constraint results to generate a retrieval result sequence.
3. The method of claim 1, wherein the sparse matrix is generated by:
extracting a keypoint set and a descriptor set from each registered image in the registered image set;
generating a sparse binary vector corresponding to each registered image based on a preset compression dictionary and the descriptor set of each registered image, wherein the dimensionality of the sparse binary vector is a second number of dimensions;
generating a frequency vector of each registration image through frequency distribution of each descriptor in the descriptor set of each registration image;
generating the sparse matrix based on the frequency vector.
4. The method of claim 3, wherein the frequency vector is a term frequency-inverse document frequency (tf-idf) vector.
5. The method of claim 3, wherein the compression dictionary is generated by:
performing feature extraction on each image in the acquired image set to generate an image feature set;
clustering the obtained image feature set to obtain a second number of clustering centers;
and processing the second number of clustering centers by using a compression method to obtain a compression dictionary.
6. The method of claim 4, wherein the cluster centers have retrieval weights that are determined according to frequency of occurrence of the cluster centers.
7. An apparatus for retrieving an image, comprising:
a feature extraction unit configured to extract features of an image to be retrieved to obtain a feature binarization vector;
a determining unit configured to determine similarity of the feature binarization vector and a preset sparse matrix, resulting in a similarity vector, wherein the sparse matrix is generated based on a registered image set;
a selection unit configured to select a first number of similarity values that meet a predetermined condition from among the similarity values included in the similarity vector;
an acquisition unit configured to acquire a first number of registered images corresponding to the first number of similarity values as a retrieval result.
8. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
9. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202010616337.0A 2020-06-30 2020-06-30 Method, device, electronic equipment and computer readable medium for retrieving image Pending CN111767421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010616337.0A CN111767421A (en) 2020-06-30 2020-06-30 Method, device, electronic equipment and computer readable medium for retrieving image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010616337.0A CN111767421A (en) 2020-06-30 2020-06-30 Method, device, electronic equipment and computer readable medium for retrieving image

Publications (1)

Publication Number Publication Date
CN111767421A true CN111767421A (en) 2020-10-13

Family

ID=72724380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010616337.0A Pending CN111767421A (en) 2020-06-30 2020-06-30 Method, device, electronic equipment and computer readable medium for retrieving image

Country Status (1)

Country Link
CN (1) CN111767421A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464002A (en) * 2020-12-02 2021-03-09 北京粉笔蓝天科技有限公司 Method, apparatus, storage medium, and device for graph reasoning topic image retrieval
CN113012226A (en) * 2021-03-22 2021-06-22 浙江商汤科技开发有限公司 Camera pose estimation method and device, electronic equipment and computer storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102844766A (en) * 2011-04-20 2012-12-26 中国科学院自动化研究所 Human eyes images based multi-feature fusion identification method
CN104239898A (en) * 2014-09-05 2014-12-24 浙江捷尚视觉科技股份有限公司 Method for carrying out fast vehicle comparison and vehicle type recognition at tollgate
CN106934401A (en) * 2017-03-07 2017-07-07 上海师范大学 A kind of image classification method based on improvement bag of words
CN108509925A (en) * 2018-04-08 2018-09-07 东北大学 A kind of pedestrian's recognition methods again of view-based access control model bag of words
CN108829701A (en) * 2018-04-25 2018-11-16 鹰霆(天津)科技有限公司 A kind of 3D model retrieval method based on sketch

Similar Documents

Publication Publication Date Title
CN109492772B (en) Method and device for generating information
US20100088342A1 (en) Incremental feature indexing for scalable location recognition
CN107766492B (en) Image searching method and device
CN108536753B (en) Method for determining repeated information and related device
CN110222775B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN109934142B (en) Method and apparatus for generating feature vectors of video
CN113449070A (en) Multimodal data retrieval method, device, medium and electronic equipment
CN111767421A (en) Method, device, electronic equipment and computer readable medium for retrieving image
WO2023020214A1 (en) Retrieval model training method and apparatus, retrieval method and apparatus, device and medium
CN111062431A (en) Image clustering method, image clustering device, electronic device, and storage medium
CN114494709A (en) Feature extraction model generation method, image feature extraction method and device
CN111915689B (en) Method, apparatus, electronic device, and computer-readable medium for generating an objective function
WO2021012691A1 (en) Method and device for image retrieval
CN112990176A (en) Writing quality evaluation method and device and electronic equipment
CN110765304A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111666449B (en) Video retrieval method, apparatus, electronic device, and computer-readable medium
CN115359400A (en) Video identification method, device, medium and electronic equipment
CN110413603B (en) Method and device for determining repeated data, electronic equipment and computer storage medium
CN111949819A (en) Method and device for pushing video
CN111913912A (en) File processing method, file matching device, electronic equipment and medium
CN114625876B (en) Method for generating author characteristic model, method and device for processing author information
CN111898658B (en) Image classification method and device and electronic equipment
CN114399058B (en) Model updating method, related device, equipment and storage medium
CN113283115B (en) Image model generation method and device and electronic equipment
CN117708630A (en) Data clustering method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.