CN110968721A - Method and system for searching infringement of mass images and computer readable storage medium thereof - Google Patents


Info

Publication number
CN110968721A
Authority
CN
China
Prior art keywords
image
infringement
visual
bag
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911189003.3A
Other languages
Chinese (zh)
Inventor
朱向军
吴敏
刘锋
吴冠勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI GUANYONG INFORMATION TECHNOLOGY CO LTD
Original Assignee
SHANGHAI GUANYONG INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI GUANYONG INFORMATION TECHNOLOGY CO LTD
Priority to CN201911189003.3A
Publication of CN110968721A


Classifications

    • G06F16/583 — Retrieval of still image data characterised by metadata automatically derived from the content
    • G06F16/51 — Indexing; data structures therefor; storage structures (still image data)
    • G06F16/55 — Clustering; classification (still image data)
    • G06F16/5866 — Retrieval characterised by manually generated metadata, e.g. tags, keywords, comments
    • G06F18/23213 — Non-hierarchical clustering techniques with a fixed number of clusters, e.g. K-means clustering
    • G06V10/757 — Matching configurations of points or features


Abstract

The invention provides a method and a system for infringement retrieval over massive images, and a computer-readable storage medium therefor. The method comprises the following steps. S1, generating a bag-of-words model: extract SIFT feature points from template images, cluster them into visual words, and build a bag-of-words model. S2, making a training set: compute the inverse document weight of each visual word, locate the SIFT feature points that meet a preset threshold, and crop the template images accordingly to obtain original training data. S3, training the neural network: use the original training data of step S2 to train a CNN with a method combining metric learning and hash learning, producing binary features. S4, retrieval and judgment: build an inverted index system from the bag-of-words model of step S1, traverse the entries corresponding to the visual words of the image to be retrieved, compute the Hamming distance between binary features, judge matches against a preset threshold, and give an infringement coefficient from the accumulated matches. The retrieval speed for infringing images is thereby improved while high accuracy is maintained.

Description

Method and system for searching infringement of mass images and computer readable storage medium thereof
Technical Field
The invention relates to the field of computer vision, in particular to an image infringement retrieval method and system based on SIFT and local binary features and a computer-readable storage medium thereof.
Background
Hand-crafted local features are important for image retrieval, and were the mainstream approach before global feature expressions represented by deep learning appeared. Combining local features with a bag-of-words model improves retrieval speed and accuracy. When the image collection is small, the bag-of-words model contains few visual words, and methods that aggregate local features into a global feature, such as VLAD, are generally adopted; when the image collection is large, there are many visual words, and an inverted index system is generally adopted, with direct matching of visual words as the retrieval basis.
For infringement retrieval, global features perform poorly; one main reason is that some infringement types, such as cropping and splicing, greatly affect the global feature. The current mainstream method therefore screens infringing images by exact matching of local features, and adopts geometric verification to filter out mismatches.
For example, patent No. CN201710267385.1 provides an image retrieval system mainly comprising: query image sampling, first local feature extraction in an image library, an anti-misjudgment module, second local feature extraction, a safety control module, image retrieval, and safe image display. By applying keywords and marks, that invention divides the database into several sub-databases in advance and searches only the highly correlated ones, reducing the calculation amount and improving speed. When images are represented via the bag of visual words, weighted representation and a first visual similarity are provided to reduce time overhead; when images are represented via feature combinations, the spatial inclusion relations among local features are exploited, and related local features are combined to strengthen the image's visual expressiveness. The feature combination not only has good scale and rotation invariance, but can also naturally use the relative position information among feature elements to perform local geometric verification and eliminate possible mismatches.
However, this prior art is limited by high computational complexity: geometric verification is only suitable for small-scale data and cannot meet the requirement of accurate retrieval over large-scale mass data.
Disclosure of Invention
The invention mainly aims to provide an infringement retrieval method and system for mass images and a computer readable storage medium thereof so as to improve the accuracy of infringement image retrieval identification.
In order to achieve the object, according to an aspect of the present invention, there is provided a method for piracy search of a large number of images, comprising the steps of:
s1 generates a bag of words model: extracting SIFT feature points of the template image, clustering to obtain visual vocabularies, and establishing a bag-of-words model;
s2, making a training set: calculating the inverse document weight of each visual vocabulary, positioning SIFT feature points which accord with a preset threshold value, and obtaining original training data by correspondingly cutting a template image;
s3 training the neural network: training the CNN network by adopting the original training data of the step S2 according to a comprehensive metric learning and Hash learning method to generate binary characteristics;
S4 search and judgment: using the bag-of-words model of step S1, constructing an inverted index system, traversing the entries corresponding to the visual words in the image to be retrieved, calculating the Hamming distance between binary features, judging whether they match according to a preset threshold value, and giving an infringement coefficient according to the accumulated matches.
In a possible preferred embodiment, in step S1, the step of extracting SIFT feature points of the template image includes: and carrying out aspect ratio-preserving scaling processing on the template image to control the size so as to limit the extraction number of SIFT feature points.
In a possible preferred embodiment, in step S1, the step of obtaining visual vocabulary through clustering process includes: and (3) integrating the extracted SIFT feature points into a feature set, obtaining clustering centers by using an AKM clustering algorithm, and establishing a bag-of-words model by taking each clustering center as a visual vocabulary.
In a possible preferred embodiment, in step S2, the step of calculating the inverse document weight of the visual vocabulary includes: for a bag-of-words model containing K visual words {c_1, c_2, ..., c_K}, computing for each word

    w_i = log(N / N_i)

where N is the total number of pictures in the library and N_i is the number of pictures in which the visual word c_i appears, and selecting the words with the smallest inverse document weights.
In a possible preferred embodiment, in step S2, the method further includes an infringing-data generation step: transformation processing is applied to the image blocks cut out at the located SIFT feature points.
In a possible preferred embodiment, wherein in step S3, the metric learning step includes: using the triplet loss as the loss function, so that output features of image blocks of the same category are as close as possible, and output features of different categories are as far apart as possible.
In a possible preferred embodiment, wherein in step S3, the hash learning step includes: the feature f(x) obtained from an image block x by metric learning is used, and the features obtained from all training image blocks of each category are averaged and binarized to serve as the target output of hash learning. That is, for image blocks {x_1, x_2, ..., x_M} of the same category, the target binary feature of that category is:

    f̄ = (1/M) · Σ_{i=1}^{M} f(x_i)

    y_j = 1 if f̄_j > 0.5, otherwise y_j = 0 (per component j, assuming sigmoid-range features)
in a possible preferred embodiment, in step S4, the step of building the inverted index system includes: the image in the image library is coded by utilizing a bag-of-words model, SIFT features and binary features of the image are extracted, and after corresponding visual words are obtained according to clustering, the visual words are correspondingly stored with the image codes and the binary features.
In order to achieve the object, according to another aspect of the present invention, there is provided a mass image infringement retrieval system for executing the mass image infringement retrieval method, including:
the first data processing module: extracting SIFT feature points of the template image, clustering to obtain visual word vocabularies, and establishing a bag-of-words model;
the second data processing module: in data connection with the first processing module to acquire the visual vocabulary, calculate the corresponding inverse document weights, locate the SIFT feature points that meet a preset threshold, and obtain original training data by correspondingly cutting the template images;
A third data processing module: the system is connected with a second processing module in a data mode, acquires the original training data, trains a CNN network according to a comprehensive metric learning and Hash learning method, and generates binary characteristics;
a fourth data processing module: the system is in data connection with the first processing module and the third processing module, a bag-of-words model is obtained to construct an inverted index system, entries corresponding to visual words in an image to be retrieved are traversed, Hamming distances among binary features are calculated, whether matching is carried out or not is judged according to a preset threshold value, and an infringement coefficient is given according to accumulated matching.
In order to achieve the object, according to another aspect of the present invention, there is also provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, the processor performs the above-mentioned mass image infringement retrieval method.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention can obtain abundant training data from only a few training images, and the training is unsupervised, so the training set is very easy to produce.
2. the binary local features generated by the invention are convenient to store, and the Hamming distance is fast to compute, which speeds up mismatch screening and hence retrieval while maintaining high accuracy.
3. the method is highly adaptable: by enriching the production of infringing image-block samples, it can simulate the infringement types likely to appear in practical applications, improving applicability.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a logic architecture diagram of the mass image infringement retrieval method and system of the present invention;
FIG. 2 is a logic architecture diagram of step 3 in the infringement retrieval method for massive images according to the present invention;
FIG. 3 is a schematic flow chart of the method and system for piracy retrieval of massive images according to the present invention;
fig. 4 is a logic step diagram of the method for piracy search of massive images according to the present invention.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict, and all of them are included in the disclosure and protection scope of the present invention. Meanwhile, in order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not a whole embodiment. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of the present invention.
It should also be noted that the terms "first," "second," "S1," "S2," and the like in the description and claims of the invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The massive image infringement retrieval method and system are mainly based on SIFT and local binary feature techniques, and achieve fast filtering of mismatches by using image-block-based CNN (Convolutional Neural Network) binary features. In addition, the method selects suitable training image blocks, designs richer infringement samples, and trains a targeted CNN combining metric learning and hash learning, so that the CNN features have both strong infringement-discriminating ability and a binary character.
Therefore, the problem of mismatching caused by visual vocabulary matching can be accurately and quickly filtered by binary local features obtained by manufacturing special sample training CNN, so that the matching accuracy and processing speed of the infringement image block are improved, and the retrieval accuracy and judgment speed of the infringement image are further improved.
(I)
Specifically, as shown in fig. 1 to 4, the method for piracy search of massive images mainly includes the following steps:
s1 generates a bag of words model: extracting SIFT feature points of the template image, clustering to obtain visual vocabularies, and establishing a bag-of-words model;
s2, making a training set: calculating the inverse document weight of each visual vocabulary, positioning SIFT feature points which accord with a preset threshold value, and obtaining original training data by correspondingly cutting a template image;
s3 training the neural network: training the CNN network by adopting the original training data of the step S2 according to a comprehensive metric learning and Hash learning method to generate binary characteristics;
S4 search and judgment: constructing an inverted index system by using the bag-of-words model, traversing the entries corresponding to visual words in the image to be retrieved, calculating the Hamming distance between binary features, judging whether they match according to a preset threshold value, and giving an infringement coefficient according to cumulative matching.
-S1 step
Specifically, in step S1, the image library stores template images for infringement comparison. The bag-of-words construction first scales each template image while keeping its aspect ratio, in order to control its size and thereby limit the number of SIFT (Scale-Invariant Feature Transform) feature points extracted. It is worth mentioning that the upper limit on the number of feature points is not fixed: it is adjusted as the computing power of data processing devices develops, so as to achieve the best effect under the best computing conditions currently attainable. Those skilled in the art should therefore understand that this limit may be raised as computing power improves, and it is not restricted here.
And then, integrating the extracted SIFT feature points into a feature set, and obtaining clustering centers by utilizing approximate K-means clustering (AKM algorithm), wherein each clustering center is regarded as a visual vocabulary, so that a bag-of-words model is established.
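The clustering step above can be sketched as follows: a minimal plain-NumPy k-means over synthetic 128-dimensional descriptors standing in for real SIFT output (which would in practice come from a library such as OpenCV). The patent uses approximate k-means (AKM) for scale, so this is a conceptual sketch, not the claimed implementation; all names and sizes here are illustrative.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Toy k-means: each cluster centre becomes one visual word."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # assign every descriptor to its nearest centre
        dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centre to the mean of its members
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres

def quantize(descriptors, centres):
    """Map each descriptor to the id of its nearest visual word."""
    dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

# synthetic stand-in for SIFT descriptors (SIFT descriptors are 128-dimensional)
descs = np.random.default_rng(1).normal(size=(500, 128)).astype(np.float32)
vocab = build_vocabulary(descs, k=32)
words = quantize(descs, vocab)
```

AKM replaces the exhaustive nearest-centre search with an approximate nearest-neighbour structure (e.g. randomized k-d trees), which is what makes vocabularies of many thousands of words tractable.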
-S2 step
Specifically, in step S2, the training set creating step includes clipping the template image to obtain image blocks corresponding to the feature points, and calculating the inverse document weight of each visual vocabulary according to the bag-of-words model obtained in step S1.
For example, for a model containing K visual words {c_1, c_2, ..., c_K}, the inverse document weight of a word describes how frequently it appears (the smaller the weight, the higher the frequency) and is defined as:

    w_i = log(N / N_i)

where N represents the total number of pictures in the library and N_i the number of pictures in which the visual word c_i appears. A number of words with the smallest inverse document weights are then selected; the number of words to select grows with the number of clusters.
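The inverse document weighting can be sketched as follows: a hedged NumPy illustration on toy data, where the per-image word sets and the vocabulary size of 4 are invented for the example.

```python
import numpy as np

def inverse_document_weights(word_sets, vocab_size):
    """w_i = log(N / N_i): N images in total, N_i images containing word i."""
    n_images = len(word_sets)
    counts = np.zeros(vocab_size)
    for words in word_sets:
        for w in set(words):
            counts[w] += 1
    counts = np.maximum(counts, 1)  # guard against words seen in no image
    return np.log(n_images / counts)

# three toy "images", each described by the set of visual words it contains
weights = inverse_document_weights([{0, 1}, {0, 2}, {0, 1, 2}], vocab_size=4)
# word 0 appears in every image, so it gets the smallest weight: log(3/3) = 0
```

Selecting the words with the smallest weights, as the patent does, picks exactly the most frequent visual words, whose corresponding image blocks are the most broadly useful training material.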
Then, the SIFT feature points corresponding to these words are located, and the template image is cropped into blocks around them to obtain the original training data.
In addition, in order to increase the number of training samples, in a preferred embodiment step S2 further includes an infringing-data generation step, in which the image blocks cut out at the located SIFT feature points undergo transformation processing such as cropping and warping. Specifically, each image block is regarded as one category, and infringing-data generation is performed for each category, including rotation, compression, added noise, colour adjustment and other transformations that simulate the patterns each image block may take when used in an infringing way, yielding more training samples for each category.
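A rough sketch of this augmentation idea, assuming grayscale patches held as NumPy arrays; the specific transforms and parameters here are illustrative stand-ins, not the patent's exact processing.

```python
import numpy as np

def make_infringement_variants(patch, rng):
    """Return simulated infringing variants of one patch (treated as one class)."""
    variants = [patch]
    variants.append(np.rot90(patch))                    # rotation
    variants.append(patch[::2, ::2])                    # crude downscale, standing in for compression
    noisy = patch + rng.normal(0.0, 10.0, patch.shape)  # additive noise
    variants.append(np.clip(noisy, 0, 255))
    variants.append(np.clip(patch * 1.2, 0, 255))       # brightness/colour shift
    return variants

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32)).astype(float)  # toy grayscale patch
samples = make_infringement_variants(patch, rng)
```

Because each patch plus its variants forms one class, no manual labels are needed — which is what makes the training set "unsupervised" and easy to produce, as the advantages section claims.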
-S3 step
Specifically, in step S3, the network training step adopts an alternate training and weight sharing strategy, as shown in fig. 2, the alternate training is divided into metric learning and hash learning, the same CNN network is used for both, the metric learning is used as an initial stage, and the network parameters obtained after iteration for a certain number of steps are used to initialize the hash learning and perform the alternate training.
The metric learning preferably adopts the triplet loss as the loss function, so that output features of image blocks of the same category are as close as possible, and output features of different categories are as far apart as possible.
For example, for a triplet input (a, p, n), where all three are image blocks, a and p are of the same class, and a and n are of different classes, the triplet loss is defined as:
loss=max(d(a,p)-d(a,n)+m,0)
where d(a, p) represents the Euclidean distance between the output features f(a) and f(p), and m is the margin.
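The triplet loss just defined can be checked numerically; a minimal NumPy sketch with invented 2-dimensional "features" and margin m = 0.2.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, m=0.2):
    """loss = max(d(a, p) - d(a, n) + m, 0) with Euclidean distances."""
    d_ap = np.linalg.norm(f_a - f_p)
    d_an = np.linalg.norm(f_a - f_n)
    return max(d_ap - d_an + m, 0.0)

a = np.array([0.0, 0.0])   # anchor feature
p = np.array([0.1, 0.0])   # same class, already close to the anchor
n = np.array([1.0, 0.0])   # different class, already far from the anchor
```

With p close and n far, the loss is already zero, so training applies no gradient; swapping the roles of p and n produces a large positive loss, pushing such a configuration apart.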
The hash learning works as follows: the feature f(x) obtained from an image block x by metric learning is used, and the features obtained from all training image blocks of a certain category are averaged and binarized to serve as the target output of hash learning. For image blocks {x_1, x_2, ..., x_M} of the same category, the target binary feature of that category is:

    f̄ = (1/M) · Σ_{i=1}^{M} f(x_i)

    y_j = 1 if f̄_j > 0.5, otherwise y_j = 0 (per component j, assuming sigmoid-range features)
the Hash learning adopts binary cross entropy as a loss function, takes an activation value (sigmoid function) output by a network as a predicted value, and takes obtained y as a target value to train the network.
-S4 step
Specifically, in step S4, the retrieval step includes: coding each image in the image library with the bag-of-words model, extracting its SIFT features and binary local features, and obtaining the corresponding visual words by clustering.

An inverted index system is then constructed in which each word stores the corresponding image codes and binary features, expressed for example as {(id_1, F_1), (id_2, F_2), ..., (id_j, F_j), ...}. In the searching step, the entry corresponding to each visual word of the query image is traversed, the Hamming distance of the binary features is calculated, and a suitable threshold is set to judge whether they match; an infringement coefficient is given according to the accumulated matches.

For example: a query image is input, m SIFT features are extracted and quantized to the corresponding visual words {c_1, c_2, ..., c_m}, and the image block at each SIFT feature point is input to the CNN network to obtain its binary feature F_i. Under the visual word of each local feature, features meeting the infringement match are found by comparing their Hamming distance with the threshold, which varies with the binary feature length and can be set in practical applications.
The accumulated matching times are used as the infringement index of the images in the library, so that whether the images infringe or not can be accurately judged, and the accuracy rate and the processing speed of the retrieval of the infringement images are effectively improved.
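The inverted index and Hamming matching of step S4 can be sketched as follows; a toy NumPy illustration in which the image ids, word ids, and 4-bit features are all invented for the example.

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two binary feature vectors."""
    return int(np.count_nonzero(a != b))

def build_index(library):
    """library: list of (image_id, [(word_id, binary_feature), ...]).
    Each visual word maps to the (image_id, feature) pairs filed under it."""
    index = {}
    for img_id, feats in library:
        for word, f in feats:
            index.setdefault(word, []).append((img_id, f))
    return index

def search(index, query_feats, threshold):
    """Count per-image matches; the accumulated count is the infringement score."""
    scores = {}
    for word, f in query_feats:
        for img_id, g in index.get(word, []):
            if hamming(f, g) <= threshold:
                scores[img_id] = scores.get(img_id, 0) + 1
    return scores

f1 = np.array([1, 0, 1, 1], dtype=np.uint8)
f2 = np.array([1, 0, 0, 1], dtype=np.uint8)
idx = build_index([("img_a", [(3, f1)]), ("img_b", [(3, f2)])])
scores = search(idx, [(3, f1)], threshold=0)  # only the exact-feature image matches
```

Only features filed under the same visual word are ever compared, and the Hamming test then discards the false matches that word quantization alone lets through — the role the patent assigns to the binary CNN features.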
(II)
Referring to fig. 1 and fig. 3, another aspect of the present invention further provides a mass image infringement retrieval system for executing the mass image infringement retrieval method of embodiment 1. Specifically, the system includes a first data processing module, a second data processing module, a third data processing module and a fourth data processing module; it should be noted that the first to fourth data processing modules may be integrated together or set up separately.
Specifically, the system for infringing and retrieving the massive images comprises:
the first data processing module: extracting SIFT feature points of the template image, clustering to obtain visual word vocabularies, and establishing a bag-of-words model;
the second data processing module: in data connection with the first processing module to acquire the visual vocabulary, calculate the corresponding inverse document weights, locate the SIFT feature points that meet a preset threshold, and obtain original training data by correspondingly cutting the template images;
A third data processing module: the system is connected with a second processing module in a data mode, acquires the original training data, trains a CNN network according to a comprehensive metric learning and Hash learning method, and generates binary characteristics;
a fourth data processing module: the system is in data connection with the first processing module and the third processing module, a bag-of-words model is obtained to construct an inverted index system, entries corresponding to visual words in an image to be retrieved are traversed, Hamming distances among binary features are calculated, whether matching is carried out or not is judged according to a preset threshold value, and an infringement coefficient is given according to accumulated matching.
In a preferred embodiment, the data processing procedure of the first data processing module includes: scaling the template images while keeping their aspect ratio, in order to control their size and thereby limit the number of SIFT (Scale-Invariant Feature Transform) feature points extracted. It is worth mentioning that the upper limit on the number of feature points is not fixed: it is adjusted as the computing power of data processing devices develops, so as to achieve the best effect under the best computing conditions currently attainable.
And then, integrating the extracted SIFT feature points into a feature set, and obtaining clustering centers by utilizing approximate K-means clustering (AKM algorithm), wherein each clustering center is regarded as a visual vocabulary, so that a bag-of-words model is established.
In a preferred embodiment, the data processing procedure of the second data processing module includes: and cutting the template image to obtain image blocks corresponding to the characteristic points, and calculating the inverse document weight of each visual vocabulary according to the bag-of-words model obtained from the first data processing module.
For example, for a model containing K visual words {c_1, c_2, ..., c_K}, the inverse document weight of a word describes how frequently that word appears (the smaller the weight, the more frequent the word) and is defined as:

w_i = log(N / N_i)

where N is the total number of pictures in the library and N_i is the number of pictures in which the visual word c_i appears. The words with the smallest inverse document weights are then selected; the number of words to select grows with the number of clusters (for example, in this embodiment a number proportional to K may be selected).
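The inverse document weighting and smallest-weight selection can be sketched as below (the helper names are illustrative, not from the patent):

```python
import math

def inverse_document_weights(word_image_counts, total_images):
    """Compute w_i = log(N / N_i) for each visual word, where N is the
    number of images in the library and N_i the number of images in
    which word c_i appears. Frequent words get small weights; a word
    that never appears gets weight +inf."""
    return [math.log(total_images / n) if n else float("inf")
            for n in word_image_counts]

def smallest_k(weights, k):
    """Indices of the k words with the smallest inverse document
    weights, i.e. the most frequently occurring words."""
    return sorted(range(len(weights)), key=lambda i: weights[i])[:k]
```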
The SIFT feature points corresponding to these words are then located, and the template image is cropped into blocks at those feature points to obtain the original training data.
In addition, to increase the number of training samples, a preferred embodiment further includes an infringement data generation step that applies transformations such as cropping and warping to the image blocks cut at the located SIFT feature points. Specifically, each image block is treated as a category, and infringement data generation is performed per category: rotation, compression, noise addition, color adjustment and similar transformations simulate the forms each image block may take in infringing use, yielding more training samples for each category.
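A minimal sketch of two of the listed transformations — rotation and noise addition — on a grayscale image stored as nested lists. The noise amplitude is an illustrative choice, not a value from the patent:

```python
import random

def rotate90(img):
    """Rotate a grayscale image (list of pixel rows) 90 degrees
    clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]

def add_noise(img, amplitude=10, seed=0):
    """Add uniform integer noise to each pixel, clamped to [0, 255].
    A fixed seed keeps the augmentation reproducible."""
    rng = random.Random(seed)
    return [[max(0, min(255, p + rng.randint(-amplitude, amplitude)))
             for p in row] for row in img]
```

A full pipeline would also cover compression artifacts and color adjustment, typically via an image library rather than raw pixel lists.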
In a preferred embodiment, the data processing procedure of the third data processing module includes: training the CNN with an alternating-training, weight-sharing strategy. The alternation is between metric learning and hash learning, which share the same CNN; metric learning serves as the initial stage, and the network parameters obtained after a certain number of iterations initialize the hash learning, after which the two are trained alternately.
The metric learning preferably adopts the triplet loss as its loss function, so that the output features of image blocks of the same category are as close as possible and those of different categories are as far apart as possible.
For example, for a triplet input (a, p, n), where all three are image blocks, a and p belong to the same category and a and n to different categories, the triplet loss is defined as:

loss = max(d(a, p) - d(a, n) + m, 0)

where d(a, p) is the Euclidean distance between the output features f(a) and f(p), and m is the margin.
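The triplet loss just defined can be computed directly on embedded feature vectors. The margin value m = 0.2 is an illustrative default, not specified in the patent:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(fa, fp, fn, margin=0.2):
    """loss = max(d(a, p) - d(a, n) + m, 0), where fa, fp, fn are the
    network's output features for anchor, positive, and negative."""
    return max(euclidean(fa, fp) - euclidean(fa, fn) + margin, 0.0)
```

The loss is zero once the negative is farther than the positive by at least the margin, which is exactly the "same class close, different class far" behavior the text describes.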
The hash learning works as follows: an image block x passes through the metric-learned network to obtain the feature f(x), and the features of all training image blocks of one category are averaged and binarized to form the target output of hash learning. That is, let {x_1, x_2, ..., x_M} be the image blocks of one category; the target binary feature y of that category is:

f̄ = (1/M) Σ_{i=1..M} f(x_i)

y_j = 1 if f̄_j ≥ τ, and y_j = 0 otherwise,

where τ is the binarization threshold (for sigmoid-range features, 0.5 is the natural choice).
The hash learning adopts binary cross-entropy as its loss function, taking the sigmoid activation output by the network as the predicted value and the target code y obtained above as the target value for training the network.
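The target-code construction and the binary cross-entropy loss can be sketched as below. The 0.5 binarization threshold for sigmoid-range features is an assumption, since the original formula image is not reproduced:

```python
import math

def hash_target(features, threshold=0.5):
    """Average the (sigmoid-range) features of one class elementwise
    and binarize at a threshold to obtain the class's target code y."""
    m = len(features)
    mean = [sum(f[j] for f in features) / m
            for j in range(len(features[0]))]
    return [1 if v >= threshold else 0 for v in mean]

def binary_cross_entropy(pred, target, eps=1e-12):
    """Mean BCE between sigmoid outputs and the binary target code."""
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # clamp away from 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(pred)
```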
In a preferred embodiment, the data processing procedure of the fourth data processing module includes: obtaining, through data connections to the first and third processing modules, the bag-of-words model to construct an inverted index system; traversing the entries corresponding to the visual words of the image to be retrieved; calculating the Hamming distance between binary features; judging matches against a preset threshold; and assigning an infringement coefficient according to the accumulated matches.
The inverted index system is constructed so that each visual word stores the corresponding image identifiers and binary features, expressed for example as {(id_1, F_1), (id_2, F_2), ..., (id_j, F_j), ...}. In the retrieval step, the entry for each visual word of the image to be retrieved is traversed, the Hamming distance of the binary features is calculated, a suitable threshold determines whether they match, and an infringement coefficient is assigned according to the accumulated matches.
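The posting-list structure just described can be sketched as a word-to-postings mapping (names are illustrative):

```python
from collections import defaultdict

def build_inverted_index(images):
    """images: iterable of (image_id, [(word_id, binary_feature), ...]).
    Each visual word maps to its list of (image_id, feature) postings,
    mirroring the {(id_1, F_1), (id_2, F_2), ...} entries above."""
    index = defaultdict(list)
    for image_id, local_features in images:
        for word_id, feature in local_features:
            index[word_id].append((image_id, feature))
    return index
```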
For example: an image to be checked is input and m SIFT features are extracted; quantization yields the corresponding visual words {c_1, c_2, ..., c_m}, and the image block corresponding to each SIFT feature point is fed into the CNN to obtain its binary feature F_i. Each local feature then searches, under its corresponding visual word, for library features that satisfy the infringement match by comparing their Hamming distance against the threshold; the threshold varies with the length of the binary feature and is set accordingly in practical applications.
The accumulated number of matches serves as the infringement index of each library image, so whether an image infringes can be judged accurately, effectively improving both the accuracy and the speed of infringement image retrieval.
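The retrieval step above — Hamming matching under each visual word, with accumulated votes per library image serving as the infringement coefficient — can be sketched as below. The fixed threshold in the example is illustrative; the patent scales it with the binary feature length:

```python
def hamming(f1, f2):
    """Hamming distance between two equal-length binary codes,
    given as sequences of 0/1 values."""
    return sum(a != b for a, b in zip(f1, f2))

def accumulate_matches(query_locals, index, threshold):
    """query_locals: [(word_id, binary_feature), ...] for the query.
    Every library feature filed under the same visual word whose
    Hamming distance is within the threshold counts as one match;
    per-image totals act as infringement coefficients."""
    votes = {}
    for word_id, feature in query_locals:
        for image_id, lib_feature in index.get(word_id, []):
            if hamming(feature, lib_feature) <= threshold:
                votes[image_id] = votes.get(image_id, 0) + 1
    return votes
```

Scanning only the postings under the query's own visual words is what keeps the search sublinear in the library size.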
(III)
Another aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, perform the massive-image infringement retrieval method of embodiment 1.
In summary, the massive-image infringement retrieval method and system and the computer-readable storage medium use the visual words of the bag-of-words model and their inverse document weights to select suitable image blocks for network training, and the features produced by the trained network filter out mismatches accurately and quickly. Moreover, a large amount of training data can be generated from relatively few images, and the training set can be enriched according to the infringement types met in practice, so the model fits practical requirements well and has high practical value.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof, and any modification, equivalent replacement, or improvement made within the spirit and principle of the invention should be included in the protection scope of the invention.
It will be appreciated by those skilled in the art that, in addition to implementing the system, apparatus and various modules thereof provided by the present invention in the form of pure computer readable program code, the same procedures may be implemented entirely by logically programming method steps such that the system, apparatus and various modules thereof provided by the present invention are implemented in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
In addition, all or part of the steps of the method according to the above embodiments may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a single chip, a chip, or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In summary, the various implementations of the embodiments of the present invention may be combined arbitrarily, and such combinations shall likewise be regarded as content disclosed by the embodiments of the present invention, provided they do not depart from the concepts of the embodiments.

Claims (10)

1. A method for piracy retrieval of massive images, comprising the following steps:
S1, generating a bag-of-words model: extracting SIFT feature points of the template image, obtaining visual words by clustering, and establishing a bag-of-words model;
S2, making a training set: calculating the inverse document weight of each visual word, locating the SIFT feature points that meet a preset threshold, and obtaining original training data by correspondingly cropping the template image;
S3, training the neural network: training a CNN with the original training data of step S2 by a method combining metric learning and hash learning, to generate binary features;
S4, retrieval and judgment: constructing an inverted index system using the bag-of-words model of step S1, traversing the entries corresponding to the visual words of the image to be retrieved, calculating the Hamming distance between binary features, judging matches according to a preset threshold, and assigning an infringement coefficient according to the accumulated matches.
2. The mass image infringement retrieval method according to claim 1, wherein in step S1, the step of extracting SIFT feature points of the template image includes: and carrying out aspect ratio keeping scaling processing on the template image to control the size to limit the extraction number of SIFT feature points.
3. The method for infringing retrieval on massive images as claimed in claim 1, wherein in step S1, the step of obtaining visual vocabulary through clustering process comprises: and (3) integrating the extracted SIFT feature points into a feature set, obtaining clustering centers by using an AKM clustering algorithm, and establishing a bag-of-words model by taking each clustering center as a visual vocabulary.
4. The mass image infringement retrieval method according to claim 1, wherein in step S2, the step of calculating the inverse document weight of the visual vocabulary includes: for a bag-of-words model containing K visual words {c_1, c_2, ..., c_K}, calculating for each word:

w_i = log(N / N_i)

where N is the total number of pictures in the library and N_i is the number of pictures containing the visual word c_i, and selecting the words with the smallest inverse document weights.
5. The method for infringing retrieval of a huge amount of images as claimed in claim 1, wherein in step S2, further comprising an infringing data generation processing step: and carrying out exception processing on the image blocks cut out according to the positioned SIFT feature points.
6. The piracy retrieval method for massive images as claimed in claim 5, wherein in step S3, the metric learning step comprises: using the triplet loss as the loss function, so that the output features of image blocks of the same category are as close as possible and the output features of different categories are as far apart as possible.
7. The method for piracy retrieval of massive images as claimed in claim 6, wherein in step S3, the hash learning step comprises: obtaining the feature f(x) of an image block x through metric learning, and averaging and binarizing the features of all training image blocks of each category as the target output of hash learning; that is, with {x_1, x_2, ..., x_M} being the image blocks of one category, the target binary feature y of that category is:

f̄ = (1/M) Σ_{i=1..M} f(x_i)

y_j = 1 if f̄_j ≥ τ, and y_j = 0 otherwise.
8. The method for infringing retrieval of massive images according to claim 1, wherein in step S4, the step of constructing an inverted index system includes: encoding the images in the image library using the bag-of-words model, extracting the SIFT features and binary features of each image, and, after obtaining the corresponding visual words through clustering, storing each visual word together with the corresponding image codes and binary features.
9. A mass image infringement retrieval system for performing the mass image infringement retrieval method according to any one of claims 1 to 8, comprising:
the first data processing module: extracting SIFT feature points of the template image, obtaining visual words by clustering, and establishing a bag-of-words model;
the second data processing module: in data connection with the first processing module to obtain the visual words, calculating the corresponding inverse document weights, locating the SIFT feature points that meet a preset threshold, and obtaining original training data by correspondingly cropping the template image;
A third data processing module: the system is in data connection with a second processing module, acquires the original training data, trains a CNN network according to a comprehensive metric learning and Hash learning method, and generates binary characteristics;
a fourth data processing module: the system is in data connection with the first processing module and the third processing module, a bag-of-words model is obtained to construct an inverted index system, entries corresponding to visual words in an image to be retrieved are traversed, Hamming distances among binary features are calculated, whether matching is carried out or not is judged according to a preset threshold value, and an infringement coefficient is given according to accumulated matching.
10. A computer-readable storage medium storing computer-executable instructions that, when executed by a processor, perform a method for piracy retrieval of images as claimed in any one of claims 1 to 8.
CN201911189003.3A 2019-11-28 2019-11-28 Method and system for searching infringement of mass images and computer readable storage medium thereof Pending CN110968721A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911189003.3A CN110968721A (en) 2019-11-28 2019-11-28 Method and system for searching infringement of mass images and computer readable storage medium thereof

Publications (1)

Publication Number Publication Date
CN110968721A true CN110968721A (en) 2020-04-07

Family

ID=70032020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911189003.3A Pending CN110968721A (en) 2019-11-28 2019-11-28 Method and system for searching infringement of mass images and computer readable storage medium thereof

Country Status (1)

Country Link
CN (1) CN110968721A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552829A (en) * 2020-05-07 2020-08-18 北京海益同展信息科技有限公司 Method and apparatus for analyzing image material
CN112417381A (en) * 2020-12-11 2021-02-26 中国搜索信息科技股份有限公司 Method and device for rapidly positioning infringement image applied to image copyright protection
CN117474903A (en) * 2023-12-26 2024-01-30 浪潮电子信息产业股份有限公司 Image infringement detection method, device, equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399870A (en) * 2013-07-08 2013-11-20 华中科技大学 Visual word bag feature weighting method and system based on classification drive
CN104199922A (en) * 2014-09-01 2014-12-10 中国科学院自动化研究所 Large-scale image library retrieval method based on local similarity hash algorithm
CN105469096A (en) * 2015-11-18 2016-04-06 南京大学 Feature bag image retrieval method based on Hash binary code
CN106776856A (en) * 2016-11-29 2017-05-31 江南大学 A kind of vehicle image search method of Fusion of Color feature and words tree
CN108959567A (en) * 2018-07-04 2018-12-07 武汉大学 It is suitable for the safe retrieving method of large-scale image under a kind of cloud environment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
袁明汶 (YUAN MINGWEN): "Research Progress of Hash Retrieval Technology Based on Deep Learning", Telecommunications Science, no. 10, pages 104 - 114 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552829A (en) * 2020-05-07 2020-08-18 北京海益同展信息科技有限公司 Method and apparatus for analyzing image material
CN111552829B (en) * 2020-05-07 2023-06-27 京东科技信息技术有限公司 Method and apparatus for analyzing image material
CN112417381A (en) * 2020-12-11 2021-02-26 中国搜索信息科技股份有限公司 Method and device for rapidly positioning infringement image applied to image copyright protection
CN117474903A (en) * 2023-12-26 2024-01-30 浪潮电子信息产业股份有限公司 Image infringement detection method, device, equipment and readable storage medium
CN117474903B (en) * 2023-12-26 2024-03-22 浪潮电子信息产业股份有限公司 Image infringement detection method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
CN106328147B (en) Speech recognition method and device
CN102549603B (en) Relevance-based image selection
CN110083729B (en) Image searching method and system
CN107392147A (en) A kind of image sentence conversion method based on improved production confrontation network
CN110968721A (en) Method and system for searching infringement of mass images and computer readable storage medium thereof
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN112163122A (en) Method and device for determining label of target video, computing equipment and storage medium
CN110019794A (en) Classification method, device, storage medium and the electronic device of textual resources
CN112487139A (en) Text-based automatic question setting method and device and computer equipment
CN105184260A (en) Image characteristic extraction method, pedestrian detection method and device
CN112800292A (en) Cross-modal retrieval method based on modal specificity and shared feature learning
CN110188195A (en) A kind of text intension recognizing method, device and equipment based on deep learning
CN116049412B (en) Text classification method, model training method, device and electronic equipment
Lee et al. Large scale video representation learning via relational graph clustering
Li et al. Meta learning for task-driven video summarization
Lin et al. Robust fisher codes for large scale image retrieval
Chen et al. Efficient activity detection in untrimmed video with max-subgraph search
CN116186328A (en) Video text cross-modal retrieval method based on pre-clustering guidance
Sun et al. Learning deep semantic attributes for user video summarization
CN116935170B (en) Processing method and device of video processing model, computer equipment and storage medium
CN108090117B (en) A kind of image search method and device, electronic equipment
Cai et al. Heterogeneous semantic level features fusion for action recognition
CN117315249A (en) Image segmentation model training and segmentation method, system, equipment and medium
Pourian et al. Pixnet: A localized feature representation for classification and visual search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination