CN110968721A - Method and system for searching infringement of mass images and computer readable storage medium thereof - Google Patents


Info

Publication number
CN110968721A
Authority
CN
China
Prior art keywords
image
infringement
visual
bag
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911189003.3A
Other languages
Chinese (zh)
Inventor
朱向军
吴敏
刘锋
吴冠勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI GUANYONG INFORMATION TECHNOLOGY CO LTD
Original Assignee
SHANGHAI GUANYONG INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI GUANYONG INFORMATION TECHNOLOGY CO LTD
Priority to CN201911189003.3A
Publication of CN110968721A


Classifications

    • G06F16/583 — Retrieval of still image data characterised by metadata automatically derived from the content
    • G06F16/51 — Indexing; data structures therefor; storage structures (still image data)
    • G06F16/55 — Clustering; classification (still image data)
    • G06F16/5866 — Retrieval characterised by manually generated metadata, e.g. tags, keywords, comments
    • G06F18/23213 — Non-hierarchical clustering techniques with a fixed number of clusters, e.g. K-means clustering
    • G06V10/757 — Matching configurations of points or features


Abstract

The invention provides a method and a system for infringement retrieval over massive images, and a computer-readable storage medium therefor. The method comprises the following steps. S1, generating a bag-of-words model: extract SIFT feature points from template images, cluster them into visual words, and build a bag-of-words model. S2, making a training set: compute the inverse document weight of each visual word, locate the SIFT feature points that meet a preset threshold, and crop the template images accordingly to obtain original training data. S3, training the neural network: use the original training data of step S2 to train a CNN with a method combining metric learning and hash learning, producing binary features. S4, retrieval and judgment: build an inverted index system from the bag-of-words model of step S1, traverse the entries corresponding to the visual words of the image to be retrieved, compute the Hamming distance between binary features, judge matches against a preset threshold, and give an infringement coefficient from the accumulated matches. The retrieval speed for infringing images is thereby improved while high accuracy is maintained.

Description

Method and system for searching infringement of mass images and computer readable storage medium thereof
Technical Field
The invention relates to the field of computer vision, in particular to an image infringement retrieval method and system based on SIFT and local binary features and a computer-readable storage medium thereof.
Background
Hand-crafted local features are important for image retrieval, and were the mainstream approach before global feature expressions represented by deep learning appeared. Combining local features with a bag-of-words model improves retrieval speed and accuracy. When the image collection is small, the bag-of-words model contains few visual words, and methods that aggregate local features into a global feature, such as VLAD, are generally adopted; when the image collection is large, there are many visual words, and an inverted index system is generally adopted, with direct matching of visual words as the retrieval basis.
For infringement retrieval, global features perform poorly; one main reason is that some infringement types, such as cropping and splicing, greatly affect the global feature. The current mainstream method therefore screens infringing images by exact matching of local features, and adopts geometric verification to filter out mismatches.
For example, patent No. CN201710267385.1 provides an image retrieval system mainly comprising: query image sampling, first local feature extraction in an image library, an anti-misjudgment module, second local feature extraction, a safety control module, image retrieval, and safe image display. By applying keywords and marks, that invention divides the database into several sub-databases in advance and searches only the highly correlated ones, reducing the calculation amount and improving speed. When images are represented via the bag of visual words, weighted representation and a first visual similarity are provided to reduce time overhead; when images are represented via feature combinations, the spatial inclusion relations among local features are exploited, and related local features are combined to strengthen the image's visual expressiveness. The feature combination not only has good scale and rotation invariance, but can also naturally use the relative position information among feature elements to perform local geometric verification and eliminate possible mismatches.
However, this prior art is limited by high computational complexity: geometric verification is only suitable for small-scale data and cannot meet the requirement of accurate retrieval over large-scale mass data.
Disclosure of Invention
The invention mainly aims to provide an infringement retrieval method and system for mass images and a computer readable storage medium thereof so as to improve the accuracy of infringement image retrieval identification.
In order to achieve the object, according to an aspect of the present invention, there is provided a method for piracy search of a large number of images, comprising the steps of:
s1 generates a bag of words model: extracting SIFT feature points of the template image, clustering to obtain visual vocabularies, and establishing a bag-of-words model;
s2, making a training set: calculating the inverse document weight of each visual vocabulary, positioning SIFT feature points which accord with a preset threshold value, and obtaining original training data by correspondingly cutting a template image;
s3 training the neural network: training the CNN network by adopting the original training data of the step S2 according to a comprehensive metric learning and Hash learning method to generate binary characteristics;
S4 search and judgment: using the bag-of-words model of step S1, constructing an inverted index system, traversing the entries corresponding to the visual words in the image to be retrieved, calculating the Hamming distance between binary features, judging whether they match according to a preset threshold value, and giving an infringement coefficient according to the accumulated matches.
In a possible preferred embodiment, in step S1, the step of extracting SIFT feature points of the template image includes: and carrying out aspect ratio-preserving scaling processing on the template image to control the size so as to limit the extraction number of SIFT feature points.
In a possible preferred embodiment, in step S1, the step of obtaining visual vocabulary through clustering process includes: and (3) integrating the extracted SIFT feature points into a feature set, obtaining clustering centers by using an AKM clustering algorithm, and establishing a bag-of-words model by taking each clustering center as a visual vocabulary.
In a possible preferred embodiment, in step S2, the step of calculating the inverse document weight of the visual vocabulary includes: for a bag-of-words model containing K visual words {c_1, c_2, ..., c_K}, computing for each word

    w_i = log(N / N_i)

where N is the total number of pictures in the library and N_i is the number of pictures in which the visual word c_i appears, and selecting the words with the smallest inverse document weights.
In a possible preferred embodiment, in step S2, the method further includes an infringing-data generation step: transformation processing is applied to the image blocks cut out at the located SIFT feature points.
In a possible preferred embodiment, wherein in step S3, the metric learning step includes: using the triplet loss as the loss function, so that output features of image blocks of the same category are as close as possible, and output features of different categories are as far apart as possible.
In a possible preferred embodiment, wherein in step S3, the hash learning step includes: the feature f(x) obtained from an image block x by metric learning is used, and the features obtained from all training image blocks of each category are averaged and binarized to serve as the target output of hash learning. That is, for image blocks {x_1, x_2, ..., x_M} of the same category, the target binary feature of that category is:

    f̄ = (1/M) · Σ_{i=1}^{M} f(x_i)

    y_j = 1 if f̄_j > 0.5, otherwise y_j = 0 (per component j, assuming sigmoid-range features)
in a possible preferred embodiment, in step S4, the step of building the inverted index system includes: the image in the image library is coded by utilizing a bag-of-words model, SIFT features and binary features of the image are extracted, and after corresponding visual words are obtained according to clustering, the visual words are correspondingly stored with the image codes and the binary features.
In order to achieve the object, according to another aspect of the present invention, there is provided a mass image infringement retrieval system for executing the mass image infringement retrieval method, including:
the first data processing module: extracting SIFT feature points of the template image, clustering to obtain visual word vocabularies, and establishing a bag-of-words model;
the second data processing module: in data connection with the first processing module to acquire the visual vocabulary, calculate the corresponding inverse document weights, locate the SIFT feature points that meet a preset threshold, and obtain original training data by correspondingly cutting the template images;
A third data processing module: the system is connected with a second processing module in a data mode, acquires the original training data, trains a CNN network according to a comprehensive metric learning and Hash learning method, and generates binary characteristics;
a fourth data processing module: the system is in data connection with the first processing module and the third processing module, a bag-of-words model is obtained to construct an inverted index system, entries corresponding to visual words in an image to be retrieved are traversed, Hamming distances among binary features are calculated, whether matching is carried out or not is judged according to a preset threshold value, and an infringement coefficient is given according to accumulated matching.
In order to achieve the object, according to another aspect of the present invention, there is also provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, the processor performs the above-mentioned mass image infringement retrieval method.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention can obtain abundant training data from only a few training images, and the training is unsupervised, so the training set is very easy to produce.
2. the binary local features generated by the invention are convenient to store, and the Hamming distance is fast to compute, which speeds up mismatch screening and hence retrieval while maintaining high accuracy.
3. the method is highly adaptable: by enriching the production of infringing image-block samples, it can simulate the infringement types likely to appear in practical applications, improving applicability.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a logic architecture diagram of the mass image infringement retrieval method and system of the present invention;
FIG. 2 is a logic architecture diagram of step 3 in the infringement retrieval method for massive images according to the present invention;
FIG. 3 is a schematic flow chart of the method and system for piracy retrieval of massive images according to the present invention;
fig. 4 is a logic step diagram of the method for piracy search of massive images according to the present invention.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict, and all of them are included in the disclosure and protection scope of the present invention. Meanwhile, in order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not a whole embodiment. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of the present invention.
It should also be noted that the terms "first," "second," "S1," "S2," and the like in the description and claims of the invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The massive image infringement retrieval method and system are mainly based on SIFT and local binary feature techniques, and achieve fast filtering of mismatches by using image-block-based CNN (Convolutional Neural Network) binary features. In addition, the method selects suitable training image blocks, designs richer infringement samples, and trains a targeted CNN combining metric learning and hash learning, so that the CNN features have both strong infringement-discriminating ability and a binary character.
Therefore, the problem of mismatching caused by visual vocabulary matching can be accurately and quickly filtered by binary local features obtained by manufacturing special sample training CNN, so that the matching accuracy and processing speed of the infringement image block are improved, and the retrieval accuracy and judgment speed of the infringement image are further improved.
(I)
Specifically, as shown in fig. 1 to 4, the method for piracy search of massive images mainly includes the following steps:
s1 generates a bag of words model: extracting SIFT feature points of the template image, clustering to obtain visual vocabularies, and establishing a bag-of-words model;
s2, making a training set: calculating the inverse document weight of each visual vocabulary, positioning SIFT feature points which accord with a preset threshold value, and obtaining original training data by correspondingly cutting a template image;
s3 training the neural network: training the CNN network by adopting the original training data of the step S2 according to a comprehensive metric learning and Hash learning method to generate binary characteristics;
S4 search and judgment: constructing an inverted index system by using the bag-of-words model, traversing the entries corresponding to visual words in the image to be retrieved, calculating the Hamming distance between binary features, judging whether they match according to a preset threshold value, and giving an infringement coefficient according to cumulative matching.
-S1 step
Specifically, in step S1, the image library stores template images for infringement comparison. The bag-of-words construction first scales each template image while keeping its aspect ratio, in order to control its size and thereby limit the number of SIFT (Scale-Invariant Feature Transform) feature points extracted. It is worth mentioning that the upper limit on the number of feature points is not fixed: it is adjusted as the computing power of data processing devices develops, so as to achieve the best effect under the best computing conditions currently attainable. Those skilled in the art should therefore understand that this limit may be raised as computing power improves, and it is not restricted here.
And then, integrating the extracted SIFT feature points into a feature set, and obtaining clustering centers by utilizing approximate K-means clustering (AKM algorithm), wherein each clustering center is regarded as a visual vocabulary, so that a bag-of-words model is established.
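The clustering step above can be sketched as follows: a minimal plain-NumPy k-means over synthetic 128-dimensional descriptors standing in for real SIFT output (which would in practice come from a library such as OpenCV). The patent uses approximate k-means (AKM) for scale, so this is a conceptual sketch, not the claimed implementation; all names and sizes here are illustrative.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Toy k-means: each cluster centre becomes one visual word."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # assign every descriptor to its nearest centre
        dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centre to the mean of its members
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres

def quantize(descriptors, centres):
    """Map each descriptor to the id of its nearest visual word."""
    dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

# synthetic stand-in for SIFT descriptors (SIFT descriptors are 128-dimensional)
descs = np.random.default_rng(1).normal(size=(500, 128)).astype(np.float32)
vocab = build_vocabulary(descs, k=32)
words = quantize(descs, vocab)
```

AKM replaces the exhaustive nearest-centre search with an approximate nearest-neighbour structure (e.g. randomized k-d trees), which is what makes vocabularies of many thousands of words tractable.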
-S2 step
Specifically, in step S2, the training set creating step includes clipping the template image to obtain image blocks corresponding to the feature points, and calculating the inverse document weight of each visual vocabulary according to the bag-of-words model obtained in step S1.
For example, for a model containing K visual words {c_1, c_2, ..., c_K}, the inverse document weight of a word describes how frequently it appears (the smaller the weight, the higher the frequency) and is defined as:

    w_i = log(N / N_i)

where N represents the total number of pictures in the library and N_i the number of pictures in which the visual word c_i appears. A number of words with the smallest inverse document weights are then selected; the number of words to select grows with the number of clusters.
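The inverse document weighting can be sketched as follows: a hedged NumPy illustration on toy data, where the per-image word sets and the vocabulary size of 4 are invented for the example.

```python
import numpy as np

def inverse_document_weights(word_sets, vocab_size):
    """w_i = log(N / N_i): N images in total, N_i images containing word i."""
    n_images = len(word_sets)
    counts = np.zeros(vocab_size)
    for words in word_sets:
        for w in set(words):
            counts[w] += 1
    counts = np.maximum(counts, 1)  # guard against words seen in no image
    return np.log(n_images / counts)

# three toy "images", each described by the set of visual words it contains
weights = inverse_document_weights([{0, 1}, {0, 2}, {0, 1, 2}], vocab_size=4)
# word 0 appears in every image, so it gets the smallest weight: log(3/3) = 0
```

Selecting the words with the smallest weights, as the patent does, picks exactly the most frequent visual words, whose corresponding image blocks are the most broadly useful training material.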
Then, the SIFT feature points corresponding to these words are located, and the template image is cropped into blocks around them to obtain the original training data.
In addition, in order to increase the number of training samples, in a preferred embodiment step S2 further includes an infringing-data generation step, in which the image blocks cut out at the located SIFT feature points undergo transformation processing such as cropping and warping. Specifically, each image block is regarded as one category, and infringing-data generation is performed for each category, including rotation, compression, added noise, colour adjustment and other transformations that simulate the patterns each image block may take when used in an infringing way, yielding more training samples for each category.
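A rough sketch of this augmentation idea, assuming grayscale patches held as NumPy arrays; the specific transforms and parameters here are illustrative stand-ins, not the patent's exact processing.

```python
import numpy as np

def make_infringement_variants(patch, rng):
    """Return simulated infringing variants of one patch (treated as one class)."""
    variants = [patch]
    variants.append(np.rot90(patch))                    # rotation
    variants.append(patch[::2, ::2])                    # crude downscale, standing in for compression
    noisy = patch + rng.normal(0.0, 10.0, patch.shape)  # additive noise
    variants.append(np.clip(noisy, 0, 255))
    variants.append(np.clip(patch * 1.2, 0, 255))       # brightness/colour shift
    return variants

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32)).astype(float)  # toy grayscale patch
samples = make_infringement_variants(patch, rng)
```

Because each patch plus its variants forms one class, no manual labels are needed — which is what makes the training set "unsupervised" and easy to produce, as the advantages section claims.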
-S3 step
Specifically, in step S3, the network training step adopts an alternate training and weight sharing strategy, as shown in fig. 2, the alternate training is divided into metric learning and hash learning, the same CNN network is used for both, the metric learning is used as an initial stage, and the network parameters obtained after iteration for a certain number of steps are used to initialize the hash learning and perform the alternate training.
The metric learning preferably adopts the triplet loss as the loss function, so that output features of image blocks of the same category are as close as possible, and output features of different categories are as far apart as possible.
For example, for a triplet input (a, p, n), where all three are image blocks, a and p are of the same class, and a and n are of different classes, the triplet loss is defined as:
loss=max(d(a,p)-d(a,n)+m,0)
where d(a, p) represents the Euclidean distance between the output features f(a) and f(p), and m is the margin.
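The triplet loss just defined can be checked numerically; a minimal NumPy sketch with invented 2-dimensional "features" and margin m = 0.2.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, m=0.2):
    """loss = max(d(a, p) - d(a, n) + m, 0) with Euclidean distances."""
    d_ap = np.linalg.norm(f_a - f_p)
    d_an = np.linalg.norm(f_a - f_n)
    return max(d_ap - d_an + m, 0.0)

a = np.array([0.0, 0.0])   # anchor feature
p = np.array([0.1, 0.0])   # same class, already close to the anchor
n = np.array([1.0, 0.0])   # different class, already far from the anchor
```

With p close and n far, the loss is already zero, so training applies no gradient; swapping the roles of p and n produces a large positive loss, pushing such a configuration apart.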
The hash learning works as follows: the feature f(x) obtained from an image block x by metric learning is used, and the features obtained from all training image blocks of a certain category are averaged and binarized to serve as the target output of hash learning. For image blocks {x_1, x_2, ..., x_M} of the same category, the target binary feature of that category is:

    f̄ = (1/M) · Σ_{i=1}^{M} f(x_i)

    y_j = 1 if f̄_j > 0.5, otherwise y_j = 0 (per component j, assuming sigmoid-range features)
the Hash learning adopts binary cross entropy as a loss function, takes an activation value (sigmoid function) output by a network as a predicted value, and takes obtained y as a target value to train the network.
-S4 step
Specifically, in step S4, the retrieval step includes: coding each image in the image library with the bag-of-words model, extracting its SIFT features and binary local features, and obtaining the corresponding visual words by clustering.

An inverted index system is then constructed in which each word stores the corresponding image codes and binary features, expressed for example as {(id_1, F_1), (id_2, F_2), ..., (id_j, F_j), ...}. In the searching step, the entry corresponding to each visual word of the query image is traversed, the Hamming distance of the binary features is calculated, and a suitable threshold is set to judge whether they match; an infringement coefficient is given according to the accumulated matches.

For example: a query image is input, m SIFT features are extracted and quantized to the corresponding visual words {c_1, c_2, ..., c_m}, and the image block at each SIFT feature point is input to the CNN network to obtain its binary feature F_i. Under the visual word of each local feature, features meeting the infringement match are found by comparing their Hamming distance with the threshold, which varies with the binary feature length and can be set in practical applications.
The accumulated matching times are used as the infringement index of the images in the library, so that whether the images infringe or not can be accurately judged, and the accuracy rate and the processing speed of the retrieval of the infringement images are effectively improved.
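The inverted index and Hamming matching of step S4 can be sketched as follows; a toy NumPy illustration in which the image ids, word ids, and 4-bit features are all invented for the example.

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two binary feature vectors."""
    return int(np.count_nonzero(a != b))

def build_index(library):
    """library: list of (image_id, [(word_id, binary_feature), ...]).
    Each visual word maps to the (image_id, feature) pairs filed under it."""
    index = {}
    for img_id, feats in library:
        for word, f in feats:
            index.setdefault(word, []).append((img_id, f))
    return index

def search(index, query_feats, threshold):
    """Count per-image matches; the accumulated count is the infringement score."""
    scores = {}
    for word, f in query_feats:
        for img_id, g in index.get(word, []):
            if hamming(f, g) <= threshold:
                scores[img_id] = scores.get(img_id, 0) + 1
    return scores

f1 = np.array([1, 0, 1, 1], dtype=np.uint8)
f2 = np.array([1, 0, 0, 1], dtype=np.uint8)
idx = build_index([("img_a", [(3, f1)]), ("img_b", [(3, f2)])])
scores = search(idx, [(3, f1)], threshold=0)  # only the exact-feature image matches
```

Only features filed under the same visual word are ever compared, and the Hamming test then discards the false matches that word quantization alone lets through — the role the patent assigns to the binary CNN features.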
(II)
Referring to fig. 1 and fig. 3, another aspect of the present invention further provides a mass image infringement retrieval system for executing the mass image infringement retrieval method of embodiment 1. Specifically, the system includes a first data processing module, a second data processing module, a third data processing module and a fourth data processing module; it should be noted that the first to fourth data processing modules may be integrated together or set up separately.
Specifically, the system for infringing and retrieving the massive images comprises:
the first data processing module: extracting SIFT feature points of the template image, clustering to obtain visual word vocabularies, and establishing a bag-of-words model;
the second data processing module: in data connection with the first processing module to acquire the visual vocabulary, calculate the corresponding inverse document weights, locate the SIFT feature points that meet a preset threshold, and obtain original training data by correspondingly cutting the template images;
A third data processing module: the system is connected with a second processing module in a data mode, acquires the original training data, trains a CNN network according to a comprehensive metric learning and Hash learning method, and generates binary characteristics;
a fourth data processing module: the system is in data connection with the first processing module and the third processing module, a bag-of-words model is obtained to construct an inverted index system, entries corresponding to visual words in an image to be retrieved are traversed, Hamming distances among binary features are calculated, whether matching is carried out or not is judged according to a preset threshold value, and an infringement coefficient is given according to accumulated matching.
In a preferred embodiment, the data processing procedure of the first data processing module includes: scaling the template images while keeping their aspect ratio, in order to control their size and thereby limit the number of SIFT (Scale-Invariant Feature Transform) feature points extracted. It is worth mentioning that the upper limit on the number of feature points is not fixed: it is adjusted as the computing power of data processing devices develops, so as to achieve the best effect under the best computing conditions currently attainable.
And then, integrating the extracted SIFT feature points into a feature set, and obtaining clustering centers by utilizing approximate K-means clustering (AKM algorithm), wherein each clustering center is regarded as a visual vocabulary, so that a bag-of-words model is established.
In a preferred embodiment, the data processing procedure of the second data processing module includes: and cutting the template image to obtain image blocks corresponding to the characteristic points, and calculating the inverse document weight of each visual vocabulary according to the bag-of-words model obtained from the first data processing module.
For example, for a model containing K visual words {c_1, c_2, ..., c_K}, the inverse document weight of a word describes how frequently that word appears (the smaller the weight, the more frequent the word) and is defined as:

w_i = log(N / N_i)

where N is the total number of pictures in the library and N_i is the number of pictures in which the visual word c_i appears. The words with the smallest inverse document weights are then selected; the number of words to select grows with the number of clusters (for example, in this embodiment a number proportional to K may be selected).
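The inverse document weighting and smallest-weight selection can be sketched as below (the helper names are illustrative, not from the patent):

```python
import math

def inverse_document_weights(word_image_counts, total_images):
    """Compute w_i = log(N / N_i) for each visual word, where N is the
    number of images in the library and N_i the number of images in
    which word c_i appears. Frequent words get small weights; a word
    that never appears gets weight +inf."""
    return [math.log(total_images / n) if n else float("inf")
            for n in word_image_counts]

def smallest_k(weights, k):
    """Indices of the k words with the smallest inverse document
    weights, i.e. the most frequently occurring words."""
    return sorted(range(len(weights)), key=lambda i: weights[i])[:k]
```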
The SIFT feature points corresponding to these words are then located, and the template image is cropped into blocks at those feature points to obtain the original training data.
In addition, to increase the number of training samples, a preferred embodiment further includes an infringement data generation step that applies transformations such as cropping and warping to the image blocks cut at the located SIFT feature points. Specifically, each image block is treated as a category, and infringement data generation is performed per category: rotation, compression, noise addition, color adjustment and similar transformations simulate the forms each image block may take in infringing use, yielding more training samples for each category.
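A minimal sketch of two of the listed transformations — rotation and noise addition — on a grayscale image stored as nested lists. The noise amplitude is an illustrative choice, not a value from the patent:

```python
import random

def rotate90(img):
    """Rotate a grayscale image (list of pixel rows) 90 degrees
    clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]

def add_noise(img, amplitude=10, seed=0):
    """Add uniform integer noise to each pixel, clamped to [0, 255].
    A fixed seed keeps the augmentation reproducible."""
    rng = random.Random(seed)
    return [[max(0, min(255, p + rng.randint(-amplitude, amplitude)))
             for p in row] for row in img]
```

A full pipeline would also cover compression artifacts and color adjustment, typically via an image library rather than raw pixel lists.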
In a preferred embodiment, the data processing procedure of the third data processing module includes: training the CNN with an alternating-training, weight-sharing strategy. The alternation is between metric learning and hash learning, which share the same CNN; metric learning serves as the initial stage, and the network parameters obtained after a certain number of iterations initialize the hash learning, after which the two are trained alternately.
The metric learning preferably adopts the triplet loss as its loss function, so that the output features of image blocks of the same category are as close as possible and those of different categories are as far apart as possible.
For example, for a triplet input (a, p, n), where all three are image blocks, a and p belong to the same category and a and n to different categories, the triplet loss is defined as:

loss = max(d(a, p) - d(a, n) + m, 0)

where d(a, p) is the Euclidean distance between the output features f(a) and f(p), and m is the margin.
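The triplet loss just defined can be computed directly on embedded feature vectors. The margin value m = 0.2 is an illustrative default, not specified in the patent:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(fa, fp, fn, margin=0.2):
    """loss = max(d(a, p) - d(a, n) + m, 0), where fa, fp, fn are the
    network's output features for anchor, positive, and negative."""
    return max(euclidean(fa, fp) - euclidean(fa, fn) + margin, 0.0)
```

The loss is zero once the negative is farther than the positive by at least the margin, which is exactly the "same class close, different class far" behavior the text describes.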
The hash learning works as follows: an image block x passes through the metric-learned network to obtain the feature f(x), and the features of all training image blocks of one category are averaged and binarized to form the target output of hash learning. That is, let {x_1, x_2, ..., x_M} be the image blocks of one category; the target binary feature y of that category is:

f̄ = (1/M) Σ_{i=1..M} f(x_i)

y_j = 1 if f̄_j ≥ τ, and y_j = 0 otherwise,

where τ is the binarization threshold (for sigmoid-range features, 0.5 is the natural choice).
The hash learning adopts binary cross-entropy as its loss function, taking the sigmoid activation output by the network as the predicted value and the target code y obtained above as the target value for training the network.
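The target-code construction and the binary cross-entropy loss can be sketched as below. The 0.5 binarization threshold for sigmoid-range features is an assumption, since the original formula image is not reproduced:

```python
import math

def hash_target(features, threshold=0.5):
    """Average the (sigmoid-range) features of one class elementwise
    and binarize at a threshold to obtain the class's target code y."""
    m = len(features)
    mean = [sum(f[j] for f in features) / m
            for j in range(len(features[0]))]
    return [1 if v >= threshold else 0 for v in mean]

def binary_cross_entropy(pred, target, eps=1e-12):
    """Mean BCE between sigmoid outputs and the binary target code."""
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # clamp away from 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(pred)
```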
In a preferred embodiment, the data processing procedure of the fourth data processing module includes: obtaining, through data connections to the first and third processing modules, the bag-of-words model to construct an inverted index system; traversing the entries corresponding to the visual words of the image to be retrieved; calculating the Hamming distance between binary features; judging matches against a preset threshold; and assigning an infringement coefficient according to the accumulated matches.
The inverted index system is constructed so that each visual word stores the corresponding image identifiers and binary features, expressed for example as {(id_1, F_1), (id_2, F_2), ..., (id_j, F_j), ...}. In the retrieval step, the entry for each visual word of the image to be retrieved is traversed, the Hamming distance of the binary features is calculated, a suitable threshold determines whether they match, and an infringement coefficient is assigned according to the accumulated matches.
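The posting-list structure just described can be sketched as a word-to-postings mapping (names are illustrative):

```python
from collections import defaultdict

def build_inverted_index(images):
    """images: iterable of (image_id, [(word_id, binary_feature), ...]).
    Each visual word maps to its list of (image_id, feature) postings,
    mirroring the {(id_1, F_1), (id_2, F_2), ...} entries above."""
    index = defaultdict(list)
    for image_id, local_features in images:
        for word_id, feature in local_features:
            index[word_id].append((image_id, feature))
    return index
```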
For example: an image to be checked is input and m SIFT features are extracted; quantization yields the corresponding visual words {c_1, c_2, ..., c_m}, and the image block corresponding to each SIFT feature point is fed into the CNN to obtain its binary feature F_i. Each local feature then searches, under its corresponding visual word, for library features that satisfy the infringement match by comparing their Hamming distance against the threshold; the threshold varies with the length of the binary feature and is set accordingly in practical applications.
The accumulated number of matches serves as the infringement index of each library image, so whether an image infringes can be judged accurately, effectively improving both the accuracy and the speed of infringement image retrieval.
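The retrieval step above — Hamming matching under each visual word, with accumulated votes per library image serving as the infringement coefficient — can be sketched as below. The fixed threshold in the example is illustrative; the patent scales it with the binary feature length:

```python
def hamming(f1, f2):
    """Hamming distance between two equal-length binary codes,
    given as sequences of 0/1 values."""
    return sum(a != b for a, b in zip(f1, f2))

def accumulate_matches(query_locals, index, threshold):
    """query_locals: [(word_id, binary_feature), ...] for the query.
    Every library feature filed under the same visual word whose
    Hamming distance is within the threshold counts as one match;
    per-image totals act as infringement coefficients."""
    votes = {}
    for word_id, feature in query_locals:
        for image_id, lib_feature in index.get(word_id, []):
            if hamming(feature, lib_feature) <= threshold:
                votes[image_id] = votes.get(image_id, 0) + 1
    return votes
```

Scanning only the postings under the query's own visual words is what keeps the search sublinear in the library size.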
(III)
Another aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, perform the massive-image infringement retrieval method of embodiment 1.
In summary, the massive-image infringement retrieval method and system and the computer-readable storage medium use the visual words of the bag-of-words model and their inverse document weights to select suitable image blocks for network training, and the features produced by the trained network filter out mismatches accurately and quickly. Moreover, a large amount of training data can be generated from relatively few images, and the training set can be enriched according to the infringement types met in practice, so the model fits practical requirements well and has high practical value.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof, and any modification, equivalent replacement, or improvement made within the spirit and principle of the invention should be included in the protection scope of the invention.
It will be appreciated by those skilled in the art that, in addition to implementing the system, apparatus and various modules thereof provided by the present invention in the form of pure computer readable program code, the same procedures may be implemented entirely by logically programming method steps such that the system, apparatus and various modules thereof provided by the present invention are implemented in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
In addition, all or part of the steps of the method according to the above embodiments may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a single chip, a chip, or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In summary, the various implementations of the embodiments of the present invention may be combined arbitrarily, and such combinations shall likewise be regarded as content disclosed by the embodiments of the present invention, provided they do not depart from the concepts of the embodiments.

Claims (10)

1. A method for piracy retrieval of massive images, comprising the following steps:
S1, generating a bag-of-words model: extracting SIFT feature points of the template image, obtaining visual words by clustering, and establishing a bag-of-words model;
S2, making a training set: calculating the inverse document weight of each visual word, locating the SIFT feature points that meet a preset threshold, and obtaining original training data by correspondingly cropping the template image;
S3, training the neural network: training a CNN with the original training data of step S2 by a method combining metric learning and hash learning, to generate binary features;
S4, retrieval and judgment: constructing an inverted index system using the bag-of-words model of step S1, traversing the entries corresponding to the visual words of the image to be retrieved, calculating the Hamming distance between binary features, judging matches according to a preset threshold, and assigning an infringement coefficient according to the accumulated matches.
2. The mass image infringement retrieval method according to claim 1, wherein in step S1, the step of extracting SIFT feature points of the template image includes: and carrying out aspect ratio keeping scaling processing on the template image to control the size to limit the extraction number of SIFT feature points.
3. The method for infringing retrieval on massive images as claimed in claim 1, wherein in step S1, the step of obtaining visual vocabulary through clustering process comprises: and (3) integrating the extracted SIFT feature points into a feature set, obtaining clustering centers by using an AKM clustering algorithm, and establishing a bag-of-words model by taking each clustering center as a visual vocabulary.
4. The mass image infringement retrieval method according to claim 1, wherein in step S2, the step of calculating the inverse document weight of the visual vocabulary includes: for a bag-of-words model containing K visual words {c_1, c_2, ..., c_K}, calculating for each word:

w_i = log(N / N_i)

where N is the total number of pictures in the library and N_i is the number of pictures containing the visual word c_i, and selecting the words with the smallest inverse document weights.
5. The method for infringing retrieval of a huge amount of images as claimed in claim 1, wherein in step S2, further comprising an infringing data generation processing step: and carrying out exception processing on the image blocks cut out according to the positioned SIFT feature points.
6. The piracy retrieval method for massive images as claimed in claim 5, wherein in step S3, the metric learning step comprises: using the triplet loss as the loss function, so that the output features of image blocks of the same category are as close as possible and the output features of different categories are as far apart as possible.
7. The method for piracy retrieval of massive images as claimed in claim 6, wherein in step S3, the hash learning step comprises: obtaining the feature f(x) of an image block x through metric learning, and averaging and binarizing the features of all training image blocks of each category as the target output of hash learning; that is, with {x_1, x_2, ..., x_M} being the image blocks of one category, the target binary feature y of that category is:

f̄ = (1/M) Σ_{i=1..M} f(x_i)

y_j = 1 if f̄_j ≥ τ, and y_j = 0 otherwise.
8. The method for infringing retrieval of massive images according to claim 1, wherein in step S4, the step of constructing an inverted index system includes: encoding the images in the image library using the bag-of-words model, extracting the SIFT features and binary features of each image, and, after obtaining the corresponding visual words through clustering, storing each visual word together with the corresponding image codes and binary features.
9. A mass image infringement retrieval system for performing the mass image infringement retrieval method according to any one of claims 1 to 8, comprising:
the first data processing module: extracting SIFT feature points of the template image, obtaining visual words by clustering, and establishing a bag-of-words model;
the second data processing module: in data connection with the first processing module to obtain the visual words, calculating the corresponding inverse document weights, locating the SIFT feature points that meet a preset threshold, and obtaining original training data by correspondingly cropping the template image;
A third data processing module: the system is in data connection with a second processing module, acquires the original training data, trains a CNN network according to a comprehensive metric learning and Hash learning method, and generates binary characteristics;
a fourth data processing module: the system is in data connection with the first processing module and the third processing module, a bag-of-words model is obtained to construct an inverted index system, entries corresponding to visual words in an image to be retrieved are traversed, Hamming distances among binary features are calculated, whether matching is carried out or not is judged according to a preset threshold value, and an infringement coefficient is given according to accumulated matching.
10. A computer-readable storage medium storing computer-executable instructions that, when executed by a processor, perform a method for piracy retrieval of images as claimed in any one of claims 1 to 8.
CN201911189003.3A 2019-11-28 2019-11-28 Method and system for searching infringement of mass images and computer readable storage medium thereof Pending CN110968721A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911189003.3A CN110968721A (en) 2019-11-28 2019-11-28 Method and system for searching infringement of mass images and computer readable storage medium thereof

Publications (1)

Publication Number Publication Date
CN110968721A true CN110968721A (en) 2020-04-07

Family

ID=70032020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911189003.3A Pending CN110968721A (en) 2019-11-28 2019-11-28 Method and system for searching infringement of mass images and computer readable storage medium thereof

Country Status (1)

Country Link
CN (1) CN110968721A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552829A (en) * 2020-05-07 2020-08-18 北京海益同展信息科技有限公司 Method and apparatus for analyzing image material
CN112417381A (en) * 2020-12-11 2021-02-26 中国搜索信息科技股份有限公司 Method and device for rapidly positioning infringement image applied to image copyright protection
CN117474903A (en) * 2023-12-26 2024-01-30 浪潮电子信息产业股份有限公司 Image infringement detection method, device, equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399870A (en) * 2013-07-08 2013-11-20 华中科技大学 Visual word bag feature weighting method and system based on classification drive
CN104199922A (en) * 2014-09-01 2014-12-10 中国科学院自动化研究所 Large-scale image library retrieval method based on local similarity hash algorithm
CN105469096A (en) * 2015-11-18 2016-04-06 南京大学 Feature bag image retrieval method based on Hash binary code
CN106776856A (en) * 2016-11-29 2017-05-31 江南大学 A kind of vehicle image search method of Fusion of Color feature and words tree
CN108959567A (en) * 2018-07-04 2018-12-07 武汉大学 It is suitable for the safe retrieving method of large-scale image under a kind of cloud environment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
袁明汶 (YUAN MINGWEN): "Research Progress of Hash Retrieval Technology Based on Deep Learning", Telecommunications Science, no. 10, pages 104 - 114 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552829A (en) * 2020-05-07 2020-08-18 北京海益同展信息科技有限公司 Method and apparatus for analyzing image material
CN111552829B (en) * 2020-05-07 2023-06-27 京东科技信息技术有限公司 Method and apparatus for analyzing image material
CN112417381A (en) * 2020-12-11 2021-02-26 中国搜索信息科技股份有限公司 Method and device for rapidly positioning infringement image applied to image copyright protection
CN117474903A (en) * 2023-12-26 2024-01-30 浪潮电子信息产业股份有限公司 Image infringement detection method, device, equipment and readable storage medium
CN117474903B (en) * 2023-12-26 2024-03-22 浪潮电子信息产业股份有限公司 Image infringement detection method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
CN106328147B (en) Speech recognition method and device
CN102549603B (en) Relevance-based image selection
CN110083729B (en) Image searching method and system
CN107392147A (en) A kind of image sentence conversion method based on improved production confrontation network
CN110968721A (en) Method and system for searching infringement of mass images and computer readable storage medium thereof
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN112163122A (en) Method and device for determining label of target video, computing equipment and storage medium
CN110019794A (en) Classification method, device, storage medium and the electronic device of textual resources
CN112487139A (en) Text-based automatic question setting method and device and computer equipment
CN105184260A (en) Image characteristic extraction method, pedestrian detection method and device
CN112800292A (en) Cross-modal retrieval method based on modal specificity and shared feature learning
CN110188195A (en) A kind of text intension recognizing method, device and equipment based on deep learning
CN116049412B (en) Text classification method, model training method, device and electronic equipment
Lee et al. Large scale video representation learning via relational graph clustering
Li et al. Meta learning for task-driven video summarization
Lin et al. Robust fisher codes for large scale image retrieval
Chen et al. Efficient activity detection in untrimmed video with max-subgraph search
CN116186328A (en) Video text cross-modal retrieval method based on pre-clustering guidance
Sun et al. Learning deep semantic attributes for user video summarization
CN116935170B (en) Processing method and device of video processing model, computer equipment and storage medium
CN108090117B (en) A kind of image search method and device, electronic equipment
Cai et al. Heterogeneous semantic level features fusion for action recognition
CN117315249A (en) Image segmentation model training and segmentation method, system, equipment and medium
Pourian et al. Pixnet: A localized feature representation for classification and visual search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination