Background
With the rapid growth of internet social networks, the ways users present themselves have become increasingly rich, ranging from the earliest simple nicknames and signatures to today's voice signatures, profile pictures, photo album displays, friend circles, and so forth. At the same time, online social interaction has grown more complex: for example, the information a user displays does not allow us to judge whether the user's photos are genuine. This gives malicious users an opening to exploit. They often build polished photo albums and friend circles to disguise themselves while participating in the social interactions of user groups, subjecting ordinary users to excessive spam promotion and worthless chat messages. Risk-control interception is therefore particularly important in internet social applications, taking forms such as chat-content interception, user-information interception, and detection of violating avatars. In the social field, illegal users mostly reuse the same or similar materials to build multiple "good user" personas, for example using identical or cropped avatars and photos to populate their own friend circles. Efficiently retrieving such illegal users from massive numbers of pictures has thus become especially important: manual labeling and interception would incur enormous workload and enterprise cost, and risk-control similar-image detection has become an important research topic in information retrieval and computer vision.
Feature learning based on deep convolutional neural networks has achieved wide success in fields such as image classification and object detection, and has become a new research focus. A convolutional neural network can automatically learn image features from large amounts of image data, capturing representations ranging from low-level simple features to high-level abstract features, and offers stronger discriminative and generalization ability than traditional hand-crafted features. Chinese patent application CN112685580A discloses a distributed detection system, method, device, processor, and storage medium for social-network avatar comparison based on deep learning. The system comprises an avatar acquisition and storage module, which collects avatar pictures and basic information of specific users in a social network from the internet and stores the corresponding information; an avatar similarity training module, connected to the acquisition and storage module, which extracts deep-learning feature vectors from the collected avatar pictures, converts the high-dimensional vectors to low-dimensional ones with a locality-sensitive hashing algorithm, and builds a distributed feature-vector index library of avatar pictures; and an avatar real-time search module, connected to the similarity training module, which computes the feature value of an avatar received from the acquisition module, finds the most similar avatar pictures in the distributed feature-vector index library with an approximate-nearest-neighbor algorithm, and combines all calculation results to obtain the overall set of similar avatar pictures and the corresponding social-network user IDs.
In that scheme, locality-sensitive hashing is applied after the deep-learning feature extraction, ultimately producing binary hash codes of the avatar pictures; this loses picture information and hurts the accuracy of avatar comparison. The scheme also performs neighbor-similarity calculation with a generic approximate-nearest-neighbor (ANN) implementation, which scales poorly when processing massive numbers of high-dimensional vectors and carries a high development cost.
Disclosure of Invention
The invention provides a similar image detection method and system based on deep learning, aiming to solve the prior art's problems of low accuracy, limited computational capacity, and high development cost in image comparison.
To solve the above technical problems, the similar image detection method provided by the invention comprises the following steps:
S1: training a similar image recognition model, wherein the model converts an input image into a corresponding embedding vector;
S2: constructing a violation image library, and converting it into an embedding vector library with the trained similar image recognition model to obtain the image interception library corresponding to the violation image library;
S3: inputting the image interception library into a Faiss vector library for training to obtain a vector retrieval file;
S4: converting an image uploaded by a user during service operation into a user embedding vector;
S5: inputting the user embedding vector into the Faiss vector library to calculate and retrieve image similarity;
S6: obtaining, from the Faiss vector library's similarity calculation, the similar images in the image interception library corresponding to the user-uploaded image, and returning the IDs and similarity scores of those similar images.
Preferably, the method further comprises step S7: comparing the similarity score with a set threshold, and triggering a warning when the similarity score exceeds the threshold.
Preferably, the training method in step S1 specifically includes:
S101: collecting images in the service to generate an image set, combining each image with every other image to form image pairs, and traversing the image set until all combinations are produced, obtaining an image combination set;
S102: manually labeling each image pair in the combination set according to whether its two images are similar;
S103: extracting face feature information from an image in the combination set, the face feature information being a first embedding vector converted from the face features in the image;
S104: preprocessing the image of step S103 to obtain a second embedding vector of the preprocessed image, and concatenating the first and second embedding vectors to obtain a fused embedding vector;
S105: repeating steps S103 to S104 until all images in the combination set have been traversed, all the obtained fused embedding vectors forming a fused embedding vector set;
S106: inputting the fused embedding vector set into the similar image recognition model for training, obtaining the trained model.
Preferably, the annotation label values are 0 and 1, indicating whether the two images of each pair are similar.
Preferably, the preprocessing includes image cropping, image scaling, and embedding-vector feature extraction.
Preferably, in step S5 the image similarity is calculated as the neighbor similarity, namely the L2 distance between the user embedding vector and the vectors in the Faiss vector library.
Preferably, the similarity score is the cosine of the angle between the two compared vectors.
Preferably, the user embedding vector is a concatenation of two vectors: one is the face feature information extracted from the user-uploaded image, and the other is the embedding vector output when the user-uploaded image is passed through the trained similar image recognition model.
Correspondingly, the invention also provides a similar image detection system based on deep learning, comprising:
an image acquisition module, which collects images uploaded by users during business operation, converts them into embedding vectors, and inputs the embedding vectors into the Faiss vector library for similarity calculation;
an image preprocessing module, which preprocesses an input image, the preprocessing including image cropping, image scaling, and embedding-vector feature extraction;
a training module, which receives the embedding-vector features output by the image preprocessing module and trains the similar image recognition model;
the similar image recognition model, which converts the violation image library into an embedding vector library, yielding the image interception library corresponding to the violation image library, and converts images uploaded by users during service operation into user embedding vectors;
the Faiss vector library, which trains on the image interception library to obtain a vector retrieval file, and which also calculates and retrieves image similarity for an input user embedding vector and returns the calculation and retrieval results.
Preferably, the system further comprises an HDFS distributed file system, which stores the image interception library and the vector retrieval file produced by the training module, and periodically rebuilds and synchronizes it with the vector retrieval file kept locally by the Faiss vector library.
Compared with the prior art, the invention has the following technical effects:
1. The similar image detection method converts each image, through deep learning, into a high-dimensional vector representation, greatly broadening image recognition coverage and extracting rich image information. Massive similar-vector retrieval is then served by the Faiss toolkit, so the engineering effort of building the image retrieval system is small, development cost is low, the complexity of high-dimensional similarity search is greatly reduced, and efficient similarity computation on user images is achieved.
2. The method fuses the face feature information in an image with the whole-image vector and uses the fused vector as the image representation for similarity calculation, thereby strengthening the face information; the vectors in the vector retrieval file are likewise fusions of per-image face information and whole-image vectors. This effectively improves the accuracy of image similarity calculation.
3. The system stores the vector retrieval file on HDFS, a distributed storage system able to hold large-scale data with high scalability. Keeping the file on HDFS makes it convenient to process large-scale image data and adapt to ever-growing data volumes, and lets different nodes and tasks share and access the same data, promoting cooperation and parallel processing.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions are described fully below with reference to the specific embodiments of the application and the accompanying drawings.
Example 1
As shown in fig. 1, a similar image detection method based on deep learning includes the following steps:
S1: training a similar image recognition model, wherein the model converts an input image into a corresponding embedding vector. Specifically, the similar image recognition model of this embodiment adopts a VAE (Variational Autoencoder). The VAE encodes input data into a vector representation in a latent space: the encoder network maps the input image to a mean vector and a standard-deviation vector in the latent space. A latent variable vector is then obtained by sampling from a Gaussian distribution parameterized by that mean vector and standard-deviation vector.
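The latent-space sampling described above is usually implemented with the reparameterization trick. The following is a minimal NumPy sketch; the mean and standard-deviation values are made-up illustrations, not the output of any real encoder:

```python
import numpy as np

def sample_latent(mu, sigma, seed=None):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).

    mu and sigma are the mean and standard-deviation vectors produced by
    the VAE encoder for one input image (hypothetical values below).
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

# Illustrative 4-dimensional latent space (made-up numbers).
mu = np.array([0.5, -1.0, 0.0, 2.0])
sigma = np.array([0.1, 0.2, 0.05, 0.3])
z = sample_latent(mu, sigma, seed=0)
```

Sampling this way keeps the draw differentiable with respect to the encoder outputs, which is what lets the VAE train end to end.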
Specifically, the VAE comprises two main parts: an encoder and a decoder. The encoder network converts the input image, through a series of convolution and pooling layers, into the mean and standard-deviation vectors that parameterize the distribution in latent space; the decoder network receives vectors sampled from the latent space and decodes them into a generated image, the decoder consisting of a series of deconvolution and upsampling layers.
As shown in fig. 2, the training method in step S1 specifically includes:
S101: images are collected in the service to generate an image set; each image is combined with every other image to form image pairs, traversing the set until all combinations are produced, yielding an image combination set. Specifically, if the service images are image 1, image 2, image 3, …, image N, pairwise combination produces pairs such as {image 1, image 2}, {image 1, image 3}, …, {image 1, image N}, {image 2, image 3}, …, {image N-1, image N}; the N images thus yield N·(N-1)/2 image pairs in total, where N is the number of images in the image set.
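The pairing step above amounts to enumerating all unordered pairs of distinct images, which Python's standard library does directly; a small sketch with hypothetical image names:

```python
from itertools import combinations

def build_image_pairs(images):
    """Enumerate every unordered pair of distinct images: N*(N-1)/2 pairs."""
    return list(combinations(images, 2))

# Hypothetical image set of N = 4 images.
images = ["image 1", "image 2", "image 3", "image 4"]
pairs = build_image_pairs(images)
# 4 images yield 4*3/2 = 6 pairs.
```

Each pair would then be handed to the manual labeling step (S102) described below.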
As shown in fig. 3, the image set comprises N pictures: picture 1, picture 2, …, picture N. Two images are drawn from the set to form an image pair; for example, image A and image B in the figure are extracted from the dataset and combined into one pair.
Further, image parameters within the set, such as image size and pixel dimensions, need not be uniform.
S102: each image pair is manually annotated according to whether its two images are similar. Specifically, the label values are 0 and 1. In the training process shown in fig. 3, manual annotation marks whether image A and image B are similar, and all image pairs are labeled in this way. In some embodiments, label 0 means the two images of a pair are similar and label 1 means they are dissimilar; in other embodiments, the convention is reversed.
S103: face feature information is extracted from an image in the combination set, the face feature information being a first embedding vector converted from the face features in the image. During extraction, for an image containing no person or no detectable face features, the face feature information is filled with a set value; for example, for a landscape image the face-feature vector is filled with an all-zero vector.
S104: the image of step S103 is preprocessed to obtain a second embedding vector of the preprocessed image, and the first and second embedding vectors are concatenated to obtain a fused embedding vector.
The preprocessing in this step includes image cropping, image scaling, and embedding-vector feature extraction. For cropping, this embodiment may use OpenCV with a set crop region, defined by the crop's starting coordinates {x, y} and its width and height {w, h}. For scaling, the embodiment may likewise use OpenCV with a set scaling ratio.
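As an illustration of the crop and scale preprocessing, the sketch below operates on a NumPy array directly. Since OpenCV images are NumPy arrays, cropping is plain slicing (which is how cv2 crops as well); the nearest-neighbor resize here is a self-contained stand-in for what `cv2.resize` would do:

```python
import numpy as np

def crop(img, x, y, w, h):
    """Crop a region starting at (x, y) with width w and height h.
    Images are indexed as [row, col], so the crop is array slicing."""
    return img[y:y + h, x:x + w]

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor scaling; a stand-in for cv2.resize in this sketch."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

img = np.arange(100).reshape(10, 10)    # toy 10x10 "image"
patch = crop(img, x=2, y=3, w=4, h=5)   # 5 rows, 4 cols
small = resize_nearest(patch, 2, 2)
```

In production one would call `cv2.resize` with an interpolation flag instead of the hand-rolled resize, but the array semantics are the same.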
S105: steps S103 to S104 are repeated until all images in the combination set have been traversed, and all the obtained fused embedding vectors form a fused embedding vector set.
S106: the fused embedding vector set is input into the similar image recognition model for training. Training is driven toward the following objective: for image pairs manually labeled as similar, the model outputs a high similarity; for pairs labeled as dissimilar, it outputs a low similarity. The result is a similar image recognition model capable of similarity recognition.
S2: a violation image library is constructed and converted, with the trained similar image recognition model, into an embedding vector library, yielding the image interception library corresponding to the violation image library.
S3: the image interception library is input into the Faiss vector library for training, producing a vector retrieval file.
S4: an image uploaded by a user during service operation is converted into a user embedding vector. Specifically, face features are extracted from the uploaded image, the image is converted into an embedding vector by the trained model service, and that vector is fused with the extracted face feature information, giving a high-dimensional embedding representation of the image that serves as the user embedding vector.
The user embedding vector is thus a concatenation of two vectors: one is the face feature information extracted from the uploaded image, and the other is the embedding vector output when the uploaded image passes through the trained similar image recognition model.
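The concatenation just described can be sketched as follows; the vector dimensions (128 for face features, 64 for the model embedding) are assumptions for illustration, as is the all-zero fallback for images without a detectable face:

```python
import numpy as np

FACE_DIM = 128  # assumed dimensionality of the face-feature vector

def fuse_embeddings(face_vec, image_vec):
    """Concatenate the face-feature vector with the whole-image embedding.
    For images with no detectable face, the face part is filled with an
    all-zero vector of the same dimension, as described in the text."""
    if face_vec is None:
        face_vec = np.zeros(FACE_DIM)
    return np.concatenate([face_vec, image_vec])

image_vec = np.ones(64)                    # hypothetical model output
fused = fuse_embeddings(None, image_vec)   # e.g. a landscape photo: no face
```

The fused vector is what gets indexed (for interception-library images) or used as the query (for user uploads).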
S5: the user embedding vector is input into the Faiss vector library to calculate and retrieve image similarity. When image similarity is calculated, the neighbor similarity used is the L2 distance (also called Euclidean distance) between the user embedding vector and each vector in the Faiss vector library, computed as follows:

d(x, y) = sqrt( Σ_{i=1}^{n} (x_i − y_i)² )

where n is the dimension of the vectors, x is the user embedding vector, y is a vector in the Faiss vector library against which it is compared, and d(x, y) is the L2 distance between them; the smaller the distance, the more similar the two vectors.
S6: from the Faiss vector library's similarity calculation, the similar images in the risk-control image interception library corresponding to the user-uploaded image are obtained, and the IDs and similarity scores of those similar images are returned. The back end or front end then displays the IDs, similarity scores, and the similar images themselves on a web page or in the system, for staff to audit and verify. The similarity score is the cosine of the angle between the two compared vectors.
A complete detection flow is shown in fig. 4. The image set comprising image 1, image 2, …, image N is used to train the similar image recognition model. The constructed violation image library is input into the trained model and converted into an embedding vector library serving as the image interception library. The image interception library is input into the Faiss vector library for training, producing a vector retrieval file used for similarity calculation against user-uploaded images. An image uploaded during service operation is converted by the trained model into a user embedding vector, which is input into the Faiss vector library for similarity calculation, and the calculation result is output from the Faiss vector library.
In other embodiments of the invention, the method further comprises step S7: the similarity score is compared with a set threshold, and when the score exceeds the threshold, a warning is triggered. Specifically, the similarity score is characterized by the vector cosine value and judged against a threshold; for example, when the similarity score of two images exceeds 0.8, the two images may be considered similar, and an enterprise can flexibly adjust the threshold for different business scenarios.
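The threshold check of step S7 reduces to a simple comparison against the cosine similarity score; a sketch using the example threshold of 0.8 from the text:

```python
import numpy as np

THRESHOLD = 0.8  # example value from the text; tunable per business scenario

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def should_alert(score, threshold=THRESHOLD):
    """Trigger a warning when the similarity score exceeds the threshold."""
    return score > threshold

score = cosine_similarity([1.0, 0.0], [1.0, 0.2])  # nearly parallel vectors
```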
Example 2
A deep learning based similar image detection system comprising:
The image acquisition module collects images uploaded by users during business operation, converts them into embedding vectors, and inputs the embedding vectors into the Faiss vector library for similarity calculation.
The image preprocessing module preprocesses an input image; the preprocessing includes image cropping, image scaling, and embedding-vector feature extraction.
The training module receives the embedding-vector features output by the image preprocessing module and trains the similar image recognition model.
The similar image recognition model converts the violation image library into an embedding vector library, yielding the image interception library corresponding to the violation image library, and converts images uploaded by users during service operation into user embedding vectors.
The Faiss vector library trains on the image interception library to obtain a vector retrieval file, and also calculates and retrieves image similarity for an input user embedding vector, returning the calculation and retrieval results.
In other embodiments of the present invention, the similar image detection system further comprises an HDFS (Hadoop Distributed File System) for storing the image interception library and the vector retrieval file produced by the training module, and for periodically rebuilding and synchronizing it with the vector retrieval file kept locally by the Faiss vector library.
HDFS is a distributed storage system capable of holding large-scale data with high scalability. Storing the vector retrieval file on HDFS makes it convenient to process large-scale image data and adapt to ever-growing data volumes, and lets different nodes and tasks share and access the same data, promoting cooperation and parallel processing.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and improvements without departing from the inventive concept, and such modifications and improvements fall within the scope of the present invention.