CN111914921A - Similarity image retrieval method and system based on multi-feature fusion - Google Patents

Similarity image retrieval method and system based on multi-feature fusion


Publication number
CN111914921A
Authority
CN
China
Prior art keywords
image
similarity
feature
feature fusion
operator
Prior art date
Legal status
Pending
Application number
CN202010725870.0A
Other languages
Chinese (zh)
Inventor
朱智林
杜俊强
张弦
夏广培
吴昊
Current Assignee
Shandong Technology and Business University
Original Assignee
Shandong Technology and Business University
Priority date
Filing date
Publication date
Application filed by Shandong Technology and Business University filed Critical Shandong Technology and Business University
Priority to CN202010725870.0A priority Critical patent/CN111914921A/en
Publication of CN111914921A publication Critical patent/CN111914921A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention provides a similarity image retrieval method and system based on multi-feature fusion, relating to the field of image recognition. The method comprises the following steps: extracting global features of an image through a GIST feature operator; extracting local features of the image through a SIFT feature operator; calculating the similarity between different images; and judging whether the similarity is greater than a first preset threshold: if so, the retrieved image is judged to be a similar image; if not, the retrieved image is deleted. By combining the SIFT and GIST feature operators, the local and global features of an image can be extracted more fully and fused more completely. In addition, the invention also provides a similarity image retrieval system based on multi-feature fusion, comprising a first extraction module, a second extraction module, a calculation module and a judgment module.

Description

Similarity image retrieval method and system based on multi-feature fusion
Technical Field
The invention relates to the field of image recognition, in particular to a similarity image retrieval method and system based on multi-feature fusion.
Background
With the wide application of digital media technology, images have become an indispensable part of daily life, with very wide use in fields such as education, culture and the life sciences. Given a particular image, being able to find similar images among a vast collection is of great practical value.
Traditional image retrieval methods often depend heavily on training samples, so classical methods cannot be directly applied to the retrieval of a single similar image. Specifically, they have the following defects:
1. the global features and the local features of an image cannot be fully extracted, nor effectively fused;
2. the similarity between different images cannot be effectively measured.
Disclosure of Invention
The invention aims to provide a similarity image retrieval method based on multi-feature fusion which, by combining SIFT and GIST feature operators, can more fully extract the local and global features of an image and fuse them more completely.
Another object of the present invention is to provide a similarity image retrieval system based on multi-feature fusion, capable of running the above similarity image retrieval method.
The embodiment of the invention is realized by the following steps:
in a first aspect, an embodiment of the present application provides a similarity image retrieval method based on multi-feature fusion, comprising the following steps: extracting global features of an image through a GIST feature operator; extracting local features of the image through a SIFT feature operator; calculating the similarity between different images; and judging whether the similarity is greater than a first preset threshold: if so, the retrieved image is judged to be a similar image; if not, the retrieved image is deleted.
In some embodiments of the present invention, the method further comprises, after the local feature extraction of the image by the SIFT feature operator, obtaining a histogram characterizing the image through a BoW model.
In some embodiments of the present invention, the above local feature extraction of the image by the SIFT feature operator comprises extracting a visual vocabulary vector from the image by the SIFT feature operator.
In some embodiments of the present invention, the method further comprises merging the visual vocabularies with similar word senses by K-means to construct a word list containing K vocabularies.
In some embodiments of the present invention, the method further comprises counting the number of times each word appears in the image, and characterizing the image as a K-dimensional numerical vector.
In some embodiments of the present invention, calculating the similarity between different images comprises calculating the global similarity of different images by Euclidean distance on the basis of the features extracted by the GIST feature operator.
In some embodiments of the present invention, the method further comprises calculating the local similarity of different images by Bhattacharyya distance on the basis of the characterization histogram.
In some embodiments of the present invention, the above further includes combining the local similarity and the global similarity of the image as a basis for image similarity retrieval.
In a second aspect, an embodiment of the present application provides a similarity image retrieval system based on multi-feature fusion, comprising: a first extraction module, configured to extract global features of an image through a GIST feature operator; a second extraction module, configured to extract local features of the image through a SIFT feature operator; a calculation module, configured to calculate the similarity between different images; and a judgment module, configured to judge whether the similarity is greater than a first preset threshold and, if so, judge the retrieved image to be a similar image, otherwise delete the retrieved image.
In some embodiments of the invention, the system further comprises at least one memory for storing computer instructions and at least one processor in communication with the memory, wherein the at least one processor, when executing the computer instructions, causes the system to implement the first extraction module, the second extraction module, the calculation module and the judgment module.
Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:
by combining the SIFT and GIST feature operators, the local and global features of an image can be extracted more fully and fused more completely, so that a good retrieval result is still obtained even when the rotation angle, image brightness or shooting viewpoint changes; by using adaptive weights, the auxiliary task and the main task share part of the feature representation, so that training the auxiliary task also learns a good feature representation that accelerates learning of the main task, and different methods can then be weighed in the similarity calculation process.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should therefore not be regarded as limiting its scope; those skilled in the art can derive other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic diagram illustrating steps of a similarity image retrieval method based on multi-feature fusion according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating detailed steps of a similarity image retrieval method based on multi-feature fusion according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a similarity image retrieval system module based on multi-feature fusion according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the individual features of the embodiments can be combined with one another without conflict.
Example 1
Referring to fig. 1, fig. 1 is a schematic diagram illustrating steps of a similarity image retrieval method based on multi-feature fusion according to an embodiment of the present invention, which includes the following steps:
step S100, carrying out global feature extraction on the image through a GIST feature operator;
specifically, scene-level identification is performed through GIST operator down-sampling, and global feature extraction is performed on the image.
In some embodiments, the GIST descriptor captures the overall information of an image and is suited to scene classification and analysis. For example, the GIST operator can classify a high-resolution remote-sensing image in a manner similar to down-sampling; colloquially, this is recognition at the scene level. Taking a local patch of any Google map, the image can be divided into squares of 300 × 300 pixels, which are then distinguished by colour: orange-yellow for dense residential areas, green for water bodies, and so on.
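The global feature extraction of step S100 can be illustrated with a minimal GIST-style sketch. This is not the patent's implementation: it assumes a small real-valued Gabor filter bank (the 21-pixel kernel size, the two scales and the 4 × 4 grid are arbitrary choices), whose filter energy is averaged within each grid cell to form the global descriptor.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam):
    """Real-valued Gabor kernel: an oriented cosine carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates to orientation theta
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / lam)

def gist_descriptor(gray, n_orients=4, scales=((3.0, 8.0), (6.0, 16.0)), grid=4):
    """GIST-style global descriptor: filter energy averaged over a grid x grid layout."""
    h, w = gray.shape
    feats = []
    for sigma, lam in scales:
        for k in range(n_orients):
            kern = gabor_kernel(21, sigma, np.pi * k / n_orients, lam)
            # circular convolution via FFT, zero-padding the kernel to the image size
            resp = np.abs(np.fft.irfft2(np.fft.rfft2(gray) * np.fft.rfft2(kern, s=gray.shape),
                                        s=gray.shape))
            hs, ws = h // grid, w // grid
            for i in range(grid):           # average filter energy inside each grid cell
                for j in range(grid):
                    feats.append(resp[i * hs:(i + 1) * hs, j * ws:(j + 1) * ws].mean())
    return np.asarray(feats)
```

With two scales, four orientations and a 4 × 4 grid, the descriptor has 2 × 4 × 16 = 128 dimensions.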
Step S110, extracting local features of the image through an SIFT feature operator;
specifically, the feature points in one image and descriptors related to scale and orientation are obtained through SIFT to obtain features, image feature point matching is carried out to extract local features of the image, and one region is described to enable the region to have high distinguishability.
In some embodiments, SIFT searches for key points, i.e. feature points, across different scale spaces and calculates their orientations. SIFT can handle rotation, scaling and translation (RST) of the target, image affine/projective transformation (viewpoint change), illumination changes, target occlusion, cluttered scenes, noise and the like; that is, it mitigates the influence of factors such as the state of the target itself, the scene environment and the imaging characteristics of the device on image registration and target recognition/tracking.
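The scale-space keypoint search described above can be sketched with the difference-of-Gaussians (DoG) stage only. This is a deliberately simplified illustration: it keeps spatial extrema per DoG layer and applies a contrast threshold, whereas full SIFT also compares against the adjacent scales, refines locations, rejects edge responses and computes orientation descriptors. The sigma values are assumed, not taken from the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_keypoints(gray, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.01):
    """Candidate keypoints as spatial extrema of difference-of-Gaussians layers."""
    blurred = [gaussian_filter(gray.astype(float), s) for s in sigmas]
    dogs = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    mag = np.abs(dogs)
    # 3x3 spatial maximum within each DoG layer (size 1 along the scale axis)
    maxed = maximum_filter(mag, size=(1, 3, 3))
    points = []
    for k in range(dogs.shape[0]):
        ys, xs = np.nonzero((mag[k] == maxed[k]) & (mag[k] > thresh))
        points.extend((int(x), int(y), k) for x, y in zip(xs, ys))
    return points
```

A blob-like structure produces a strong DoG response at its centre, which is how such a detector localizes distinctive regions.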
Step S120, calculating the similarity between different images;
in some embodiments, cosine similarity calculation may be adopted to represent the pictures as a vector, and the similarity between two pictures is characterized by calculating the cosine distance between the vectors; the similarity of the pictures can also be calculated by adopting a Hash algorithm, including aHash, pHash and dHash. Perceptual hashing does not compute Hash values in a strict way, but rather computes Hash values in a more relative way, since "similar" or not is a relative decision. The value hash algorithm, the difference hash algorithm and the perception hash algorithm are all characterized in that the smaller the value is, the higher the similarity is, the value is 0-64, namely, the different hash values of 64 bits in the Hamming distance are. The values of the three histograms and the single-channel histogram are 0-1, and the larger the value is, the higher the similarity is; the similarity of the pictures can be calculated by adopting a histogram, the pictures can be equally divided according to the global distribution condition of the colors, and then the similarity of the pictures can be calculated.
Step S130, judging whether the similarity is greater than a first preset threshold value;
specifically, it is determined whether the similarity is greater than a first preset threshold, and if the similarity is greater than the first preset threshold, the process proceeds to step S140, and if the similarity is less than or equal to the first preset threshold, the process proceeds to step S150.
In some embodiments, the first preset threshold may vary with the picture. For example, suppose a picture has n pixels, of which n1 have a gray value below a candidate threshold and n2 have a gray value at or above it (n1 + n2 = n); let w1 and w2 be the proportions of the two classes of pixels, let μ1 and σ1 be the mean and variance of all pixels below the threshold, and let μ2 and σ2 be the mean and variance of all pixels at or above it. The histogram of the image is computed, candidate thresholds are taken from small to large and substituted into the between-class variance formula, and the value yielding the minimum intra-class variance (equivalently, the maximum inter-class variance) is the final threshold, which may be, for example, 0.4, 0.5 or 0.6.
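The selection procedure described above is essentially Otsu's method. A histogram-based sketch follows; the variable names mirror the w1/w2 and μ1/μ2 of the text, and the synthetic bimodal histogram in the usage is an assumed example, not data from the patent.

```python
import numpy as np

def otsu_threshold(gray_levels, counts):
    """Otsu's method: pick the threshold maximising between-class variance."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    best_t, best_var = gray_levels[0], -1.0
    for i in range(1, len(gray_levels)):
        w1 = counts[:i].sum() / total          # proportion of pixels below the candidate
        w2 = 1.0 - w1
        if w1 == 0.0 or w2 == 0.0:
            continue
        mu1 = (gray_levels[:i] * counts[:i]).sum() / counts[:i].sum()
        mu2 = (gray_levels[i:] * counts[i:]).sum() / counts[i:].sum()
        var_between = w1 * w2 * (mu1 - mu2) ** 2   # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, gray_levels[i]
    return best_t
```

On a histogram with two spikes at gray levels 50 and 200, the selected threshold falls between the modes.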
Step S140, the retrieved image is a similar image;
specifically, the retrieved pictures are determined to be similar images.
Step S150, the retrieved image is deleted.
Example 2
Referring to fig. 2, fig. 2 is a detailed step diagram of a similarity image retrieval method based on multi-feature fusion according to an embodiment of the present invention, which includes the following steps:
step S200, carrying out global feature extraction on the image through a GIST feature operator;
specifically, scene-level identification is performed through GIST operator down-sampling, and global feature extraction is performed on the image. Reference may be made to the description of step S100 in fig. 1, which is not repeated herein.
Step S210, extracting visual vocabulary vectors from the image through an SIFT feature operator;
specifically, feature extraction is performed through SIFT to extract key points, or feature points, corner points, and visual vocabulary vectors from the picture.
In some implementations, feature extraction includes the location (coordinates), scale and orientation of the keypoints. If scale information is not extracted, the algorithm cannot describe feature points according to scale, and thus has no scale invariance; if orientation information is not extracted, the algorithm cannot describe feature points according to orientation, and thus has no rotation invariance. Once this keypoint information is available, the keypoints are described, so that matching relationships between keypoints can be judged from the different descriptions of different keypoints.
Step S220, merging visual vocabularies with similar word senses through K-means to construct a word list containing K vocabularies;
specifically, the K-means is a clustering algorithm based on euclidean distance, and it is considered that the closer the distance between two targets is, the greater the similarity is, visual vocabularies with similar word senses are merged to construct a word list containing K vocabularies.
In some embodiments, Euclidean-distance K-means assumes that the data of each cluster have the same prior probability and follow a spherical distribution, which is uncommon in real data. For non-convex data distributions, a kernel function can be introduced; this variant, the kernel K-means algorithm, is one of the kernel clustering methods. The main idea of kernel clustering is to map data points from the input space to a high-dimensional feature space through a nonlinear mapping and to cluster in that new space. The nonlinear mapping increases the probability that the data points become linearly separable, so that where a classical clustering algorithm fails, introducing a kernel function can still achieve an accurate clustering result.
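Step S220 (clustering descriptors into a K-word vocabulary) can be sketched with plain Lloyd iterations. Initialising the centres from the first k descriptors is a simplification for illustration; practical systems use k-means++ or similar seeding.

```python
import numpy as np

def kmeans_vocab(descriptors, k, iters=20):
    """Plain Lloyd iterations over local descriptors; the resulting centres
    are the K 'visual words' of the vocabulary."""
    descriptors = np.asarray(descriptors, dtype=float)
    centers = descriptors[:k].copy()            # simplistic init (k-means++ in practice)
    labels = np.zeros(len(descriptors), dtype=int)
    for _ in range(iters):
        # squared Euclidean distance of every descriptor to every centre
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if np.any(labels == j):             # keep an empty cluster's old centre
                centers[j] = descriptors[labels == j].mean(0)
    return centers, labels
```

On two well-separated groups of 2-D points, the two clusters recover the groups exactly.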
Step S230, counting the occurrence frequency of each word in the image, and representing the image as a K-dimensional numerical vector;
in some embodiments, the clustering effect is not only based on the number and dimensions of the data, but also on the more reasonable processing and filtering of the data to be clustered during the data expansion stage, and the significance of the data itself. The K-means is the most widely used method, the related model assignment is not needed, and a good clustering result is obtained at the same time, so that the method of the Wanjin oil is probably not the optimal choice when the specific problem is processed, usually, several different methods are simultaneously used for comparing the effect and the interpretation degree during clustering analysis, and a hierarchical clustering and mixed model can also be adopted.
Step S240, obtaining a representation image histogram through a BoW model;
specifically, a histogram Of the representation image is obtained by using a Bag Of Word model and a method for image retrieval.
In some embodiments, Bag Of Features is similar to the Bag Of Word principle, but is applicable to the retrieval Of images, which can be divided into the following steps:
feature extraction: the method comprises the steps of (1) regarding an image as a set consisting of various image blocks, and obtaining key image features of the image through feature extraction;
learning "visual dictionary" (visual vocabularies): after the features of a plurality of images are obtained, the features are not classified, and some feature points are extremely similar, so that the step classifies the feature points extracted by the user through a K-means clustering algorithm. Clustering is a key operation in learning a visual dictionary. And the clustered cluster centers are called visual words. While the set of visual word components is referred to as a visual dictionary/codebook;
quantifying the input feature set: and mapping the input feature set into the codebook obtained in the last step. Calculating the distance from the input features to the visual words, then mapping the distance to the visual words with the closest distance, and counting;
the input image is converted into a frequency histogram of visual words (visual words): this step is performed by extracting features of the image and then converting the extracted feature points into a frequency histogram according to the previous step.
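The quantification and histogram steps above can be condensed into one function: each descriptor is assigned to its nearest visual word and the counts are normalised into the frequency histogram that represents the image. This is a minimal sketch, not the patent's implementation.

```python
import numpy as np

def bow_histogram(descriptors, vocab):
    """Assign each descriptor to its nearest visual word (Euclidean) and
    return the normalised frequency histogram representing the image."""
    descriptors = np.asarray(descriptors, dtype=float)
    vocab = np.asarray(vocab, dtype=float)
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)                                     # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()                                 # frequency histogram
```

With a 2-word codebook and four descriptors, three near the first word and one near the second, the histogram is [0.75, 0.25].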
Step S250, calculating the global similarity of different images by Euclidean distance on the basis of the features extracted by the GIST feature operator;
In some embodiments, the Euclidean distance measures the actual distance between two points in n-dimensional space; the greater the distance between individuals, the greater the difference between them.
Step S260, calculating the local similarity of different images by Mahalanobis distance on the basis of the characterization histogram;
Specifically, the Mahalanobis distance corrects the Euclidean distance for inconsistent scales and correlated dimensions, and is used to calculate the local similarity of different images.
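The Mahalanobis correction described above can be sketched as follows: the covariance estimated from a reference sample rescales and decorrelates the dimensions before the distance is taken, so that it reduces to the Euclidean distance when the covariance is the identity. Applying it to image histograms as in step S260 is the text's usage; the sample here is synthetic.

```python
import numpy as np

def mahalanobis(x, y, sample):
    """Mahalanobis distance between x and y under the covariance estimated
    from `sample` (rows are observations)."""
    cov = np.cov(np.asarray(sample), rowvar=False)   # dimension scales and correlations
    inv_cov = np.linalg.inv(cov)
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ inv_cov @ diff))
```

For a sample with near-identity covariance, the distance between (0, 0) and (1, 0) is close to the Euclidean value 1.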
Step S270, combining the local similarity and the global similarity of the image as the basis of image similarity retrieval;
specifically, multi-feature depth extraction is carried out on a reference image and one or more images in a database to obtain a reference representation, similarity among different images is calculated by utilizing a similarity calculation criterion on the basis of feature extraction, and then the local similarity and the global similarity of the images are combined to be used as the basis of image similarity retrieval.
Step S280, judging whether the similarity is greater than a first preset threshold value;
specifically, it is determined whether the similarity is greater than a first preset threshold, if the similarity is greater than the first preset threshold, the step S290 is performed, and if the similarity is less than or equal to the first preset threshold, the step S300 is performed.
In some embodiments, the first preset threshold may vary with the picture. For example, suppose a picture has n pixels, of which n1 have a gray value below a candidate threshold and n2 have a gray value at or above it (n1 + n2 = n); let w1 and w2 be the proportions of the two classes of pixels, let μ1 and σ1 be the mean and variance of all pixels below the threshold, and let μ2 and σ2 be the mean and variance of all pixels at or above it. The histogram of the image is computed, candidate thresholds are taken from small to large and substituted into the between-class variance formula, and the value yielding the minimum intra-class variance (equivalently, the maximum inter-class variance) is the final threshold, which may be, for example, 0.4, 0.5 or 0.6.
Step S290, the retrieved image is a similar image;
specifically, the retrieved pictures are determined to be similar images.
Step S300, the retrieved image is deleted.
Example 3
Referring to fig. 3, fig. 3 is a schematic diagram of a similarity image retrieval system module based on multi-feature fusion according to an embodiment of the present invention.
A similarity image retrieval system based on multi-feature fusion comprises: the first extraction module is used for carrying out global feature extraction on the image through a GIST feature operator; the second extraction module is used for extracting local features of the image through an SIFT feature operator; the calculating module is used for calculating the similarity between different images; and the judging module is used for judging whether the similarity is greater than a first preset threshold value, if so, judging that the searched image is a similar image, and if not, deleting the searched image.
Also included are a memory, a processor, and a communication interface, which are electrically connected, directly or indirectly, to each other to enable transmission or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory may be used to store software programs and modules, and the processor may execute various functional applications and data processing by executing the software programs and modules stored in the memory. The communication interface may be used for communicating signaling or data with other node devices.
The memory may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor may be an integrated circuit chip having signal processing capability. It may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP) and the like; or a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
It will be appreciated that the structures shown in figs. 1, 2 and 3 are merely schematic; the system may include more or fewer components than illustrated, or have a configuration different from that shown. The components shown in fig. 3 may be implemented in hardware, software, or a combination of the two.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In summary, the similarity image retrieval method and system based on multi-feature fusion provided by the embodiments of the application can, by combining the SIFT and GIST feature operators, extract the local and global features of an image more fully and fuse them more completely; by using adaptive weights, different methods are weighed in the similarity calculation process.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. A similarity image retrieval method based on multi-feature fusion is characterized by comprising the following steps:
extracting global features of the image through a GIST feature operator;
extracting local features of the image through a SIFT feature operator;
calculating the similarity between different images;
and judging whether the similarity is greater than a first preset threshold; if so, determining that the retrieved image is a similar image, and if not, discarding the retrieved image.
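Read as a pipeline, claim 1 ends with a threshold test on the computed similarity. A minimal sketch of that decision step, with feature extraction abstracted away (the function name, score values, and threshold below are illustrative, not taken from the patent):

```python
# Hypothetical sketch of the claim-1 decision step: a candidate image
# whose similarity to the query exceeds a preset threshold is kept as
# a "similar image"; all others are discarded.

def filter_similar(scores, threshold=0.5):
    """scores maps candidate image ids to similarity values in [0, 1]."""
    return {img: s for img, s in scores.items() if s > threshold}

hits = filter_similar({"a.png": 0.82, "b.png": 0.31, "c.png": 0.77})
# hits keeps "a.png" and "c.png"; "b.png" falls below the threshold
```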
2. The similarity image retrieval method based on multi-feature fusion according to claim 1, further comprising, after extracting the local features of the image through the SIFT feature operator:
obtaining a histogram representing the image through a BoW (Bag-of-Words) model.
3. The similarity image retrieval method based on multi-feature fusion according to claim 1, wherein extracting the local features of the image through the SIFT feature operator comprises:
extracting visual vocabulary vectors from the image through the SIFT feature operator.
4. The similarity image retrieval method based on multi-feature fusion according to claim 3, further comprising:
merging visual vocabulary words with similar meanings through K-means clustering to construct a vocabulary containing K words.
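Once K-means has produced K cluster centers (the vocabulary), each SIFT descriptor can be mapped to its nearest center. A minimal nearest-center assignment sketch (the two-dimensional descriptor and center values below are made up for illustration; real SIFT descriptors are 128-dimensional):

```python
# Illustrative only: map a descriptor to the index of the closest
# visual word (cluster center), using squared Euclidean distance.

def assign_to_vocabulary(descriptor, centers):
    """Return the vocabulary index of the nearest cluster center."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centers)), key=lambda i: dist2(descriptor, centers[i]))

word = assign_to_vocabulary([0.1, 0.2], [[0.0, 0.0], [1.0, 1.0]])  # -> 0
```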
5. The similarity image retrieval method based on multi-feature fusion according to claim 3, further comprising:
counting the number of times each word occurs in the image, and representing the image as a K-dimensional numerical vector.
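The counting step of claim 5 can be sketched directly: tally how often each visual word occurs and emit a K-dimensional vector (word indices below are hypothetical):

```python
# Sketch of claim 5: count visual-word occurrences to build the
# K-dimensional BoW representation of an image.

def bow_histogram(word_ids, k):
    """word_ids: vocabulary indices assigned to the image's descriptors."""
    hist = [0] * k
    for w in word_ids:
        hist[w] += 1
    return hist

vec = bow_histogram([0, 2, 2, 1], k=4)  # -> [1, 1, 2, 0]
```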
6. The similarity image retrieval method based on multi-feature fusion according to claim 1, wherein calculating the similarity between different images comprises:
calculating the global similarity of different images through the Euclidean distance, based on the features extracted by the GIST feature operator.
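The global-similarity step of claim 6 reduces to a Euclidean distance between GIST vectors. A minimal sketch follows; the distance-to-similarity mapping `1 / (1 + d)` is a common convention, not one fixed by the claim:

```python
import math

# Sketch of claim 6: global similarity from the Euclidean distance
# between two GIST feature vectors.

def euclidean_distance(g1, g2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))

def global_similarity(g1, g2):
    # One common mapping from distance to a similarity in (0, 1].
    return 1.0 / (1.0 + euclidean_distance(g1, g2))

d = euclidean_distance([0.0, 0.0], [3.0, 4.0])  # -> 5.0
```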
7. The similarity image retrieval method based on multi-feature fusion according to claim 6, further comprising:
calculating the local similarity of different images through the Bhattacharyya distance, based on the feature histograms.
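For claim 7, the Bhattacharyya distance compares two normalized histograms via their overlap coefficient; identical histograms give distance 0. A minimal sketch (it assumes both histograms are normalized and overlap, so the logarithm is defined):

```python
import math

# Sketch of claim 7: Bhattacharyya distance between two normalized
# feature histograms. Smaller distance means more similar histograms.

def bhattacharyya_distance(h1, h2):
    bc = sum(math.sqrt(p * q) for p, q in zip(h1, h2))  # overlap coefficient
    return -math.log(bc)

d = bhattacharyya_distance([0.5, 0.5], [0.5, 0.5])  # 0.0 for identical inputs
```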
8. The similarity image retrieval method based on multi-feature fusion according to claim 6, further comprising:
combining the local similarity and the global similarity of the images as the basis for image similarity retrieval.
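Claim 8, together with the adaptive weighting mentioned in the description, amounts to a weighted combination of the local (SIFT/BoW) and global (GIST) similarities. A sketch follows; the linear form and the weight value are illustrative assumptions, since the patent does not fix a fusion formula here:

```python
# Hypothetical sketch of claim 8: fuse local and global similarity
# with a weight that could be adapted per query.

def fused_similarity(local_sim, global_sim, w_local=0.5):
    """Weighted combination of the two similarity scores."""
    return w_local * local_sim + (1.0 - w_local) * global_sim

s = fused_similarity(0.8, 0.4, w_local=0.75)  # close to 0.7
```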
9. A similarity image retrieval system based on multi-feature fusion is characterized by comprising:
the first extraction module is used for carrying out global feature extraction on the image through a GIST feature operator;
the second extraction module is used for extracting local features of the image through an SIFT feature operator;
the calculating module is used for calculating the similarity between different images;
and the judging module is used for judging whether the similarity is greater than a first preset threshold; if so, determining that the retrieved image is a similar image, and if not, discarding the retrieved image.
10. The similarity image retrieval system based on multi-feature fusion according to claim 9, further comprising:
at least one memory for storing computer instructions;
at least one processor in communication with the memory, wherein when the at least one processor executes the computer instructions, the system implements the first extraction module, the second extraction module, the calculating module, and the judging module.
CN202010725870.0A 2020-07-24 2020-07-24 Similarity image retrieval method and system based on multi-feature fusion Pending CN111914921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010725870.0A CN111914921A (en) 2020-07-24 2020-07-24 Similarity image retrieval method and system based on multi-feature fusion

Publications (1)

Publication Number Publication Date
CN111914921A true CN111914921A (en) 2020-11-10

Family

ID=73280266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010725870.0A Pending CN111914921A (en) 2020-07-24 2020-07-24 Similarity image retrieval method and system based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN111914921A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596277A (en) * 2018-05-10 2018-09-28 腾讯科技(深圳)有限公司 A kind of testing vehicle register identification method, apparatus and storage medium
CN110765286A (en) * 2019-09-09 2020-02-07 卓尔智联(武汉)研究院有限公司 Cross-media retrieval method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUE, CHAO et al.: "GIST global and SIFT local feature fusion remote sensing image retrieval method based on multi-kernel learning", Journal of Qingdao University (Natural Science Edition), vol. 33, no. 1, pages 5-11 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158745A (en) * 2021-02-02 2021-07-23 北京惠朗时代科技有限公司 Disorder code document picture identification method and system based on multi-feature operator
CN113158745B (en) * 2021-02-02 2024-04-02 北京惠朗时代科技有限公司 Multi-feature operator-based messy code document picture identification method and system
CN112906696A (en) * 2021-05-06 2021-06-04 北京惠朗时代科技有限公司 English image region identification method and device
CN113191277A (en) * 2021-05-06 2021-07-30 北京惠朗时代科技有限公司 Table image region identification method and system based on entropy check
CN113191277B (en) * 2021-05-06 2023-12-19 北京惠朗时代科技有限公司 Table image area identification method and system based on entropy verification
CN115357742A (en) * 2022-08-02 2022-11-18 广州市玄武无线科技股份有限公司 Store image duplicate checking method, system, terminal device and storage medium
CN116061187A (en) * 2023-03-07 2023-05-05 睿尔曼智能科技(江苏)有限公司 Method for identifying, positioning and grabbing goods on goods shelves by composite robot
CN116061187B (en) * 2023-03-07 2023-06-16 睿尔曼智能科技(江苏)有限公司 Method for identifying, positioning and grabbing goods on goods shelves by composite robot
CN117354471A (en) * 2023-12-05 2024-01-05 深圳市微浦技术有限公司 Multi-camera collaborative monitoring method, device, equipment and storage medium
CN117354471B (en) * 2023-12-05 2024-04-09 深圳市微浦技术有限公司 Multi-camera collaborative monitoring method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111914921A (en) Similarity image retrieval method and system based on multi-feature fusion
US10949702B2 (en) System and a method for semantic level image retrieval
KR100353798B1 (en) Method for extracting shape descriptor of image object and content-based image retrieval system and method using it
CN112614187B (en) Loop detection method, loop detection device, terminal equipment and readable storage medium
KR20130142191A (en) Robust feature matching for visual search
EP2450808A2 (en) Semantic visual search engine
US8027978B2 (en) Image search method, apparatus, and program
CN110532413B (en) Information retrieval method and device based on picture matching and computer equipment
Bibi et al. Query-by-visual-search: multimodal framework for content-based image retrieval
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113298146A (en) Image matching method, device, equipment and medium based on feature detection
CN111932582A (en) Target tracking method and device in video image
WO2019100348A1 (en) Image retrieval method and device, and image library generation method and device
Wang et al. Superpixel-level target discrimination for high-resolution SAR images in complex scenes
CN112270204A (en) Target identification method and device, storage medium and electronic equipment
CN112527971A (en) Method and system for searching similar articles
Wu et al. A vision-based indoor positioning method with high accuracy and efficiency based on self-optimized-ordered visual vocabulary
CN110704667A (en) Semantic information-based rapid similarity graph detection algorithm
CN111666902B (en) Training method of pedestrian feature extraction model, pedestrian recognition method and related device
US11790635B2 (en) Learning device, search device, learning method, search method, learning program, and search program
CN113657180A (en) Vehicle identification method, server and computer readable storage medium
CN112766139A (en) Target identification method and device, storage medium and electronic equipment
Konlambigue et al. Performance evaluation of state-of-the-art filtering criteria applied to sift features
Chen et al. Trademark image retrieval system based on SIFT algorithm
Zou et al. Spherical object recognition based on the number of contour edges extracted by fitting and convex hull processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination