WO2021136027A1 - Similar image detection method and apparatus, device and storage medium - Google Patents

Similar image detection method and apparatus, device and storage medium

Info

Publication number
WO2021136027A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
sample
data set
image
model
Prior art date
Application number
PCT/CN2020/138510
Other languages
French (fr)
Chinese (zh)
Inventor
孙莹莹
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2021136027A1 publication Critical patent/WO2021136027A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Definitions

  • the embodiments of the present application relate to the field of image processing, and in particular, to a similar image detection method, device, device, and storage medium.
  • Similar image detection is a basic problem in computer vision, which aims to compare the similarity between images and judge whether the images are similar based on the similarity between the images. Similar image detection can be applied to different task scenarios. For example, similar image detection technology can be used to detect similar images from the mobile phone's album, and then delete some of the similar images from the album to save mobile phone memory.
  • a hash algorithm can be used to detect similar images.
  • the two images to be detected can be hashed by a hash algorithm to obtain the hash value of the two images, and then the Hamming distance between the two images based on the hash value can be calculated. If the Hamming distance is less than the threshold, it is determined that the two images are similar images. If the Hamming distance based on the hash value is greater than or equal to the threshold, it is determined that the two images are not similar images.
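  • For illustration only, the hash-based approach described above might be sketched as follows, using an average hash as one possible hash algorithm (the related art does not fix a particular hash function); the file names and the threshold value are placeholders.
```python
import numpy as np
from PIL import Image

def average_hash(path, size=8):
    # One possible hash: shrink, convert to grayscale, threshold each pixel at the mean.
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()

def hamming_distance(h1, h2):
    # Number of positions in which the two hash values differ.
    return int(np.count_nonzero(h1 != h2))

THRESHOLD = 10  # illustrative; the related art does not specify a value
distance = hamming_distance(average_hash("a.jpg"), average_hash("b.jpg"))
print("similar" if distance < THRESHOLD else "not similar")
```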
  • the embodiments of the present application provide a similar image detection method, device, equipment, and storage medium.
  • the technical solution is as follows:
  • an embodiment of the present application provides a similar image detection method, and the method includes:
  • acquiring multiple images to be detected; using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images; using the feature vectors of the multiple images as the input of a support vector machine SVM model, and measuring the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images; and performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • a similar image detection device includes:
  • the first acquisition module is used to acquire multiple images to be detected
  • the first extraction module is configured to use the multiple images as the input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
  • the measurement module is used to take the feature vectors of the multiple images as the input of the SVM model, and measure the similarity of the feature vectors of the multiple images through the SVM model to obtain the similarity measurement values of the multiple images;
  • the detection module is configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • in another aspect, an electronic device is provided. The electronic device includes a processor and a memory; the memory stores at least one instruction, and the at least one instruction is used to be executed by the processor to implement the above-mentioned similar image detection method.
  • a computer-readable storage medium stores at least one instruction, and the at least one instruction is used to be executed by a processor to implement the above-mentioned similar image detection method.
  • a computer program product stores at least one instruction, and the at least one instruction is used to be executed by a processor to implement the above-mentioned similar image detection method.
  • Fig. 1 is a flowchart of a model training method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application
  • FIG. 3 is a flowchart of a similar image detection process provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of another similar image detection method provided by an embodiment of the present application.
  • Fig. 6 is a block diagram of a similar image detection device provided by an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the "plurality” mentioned herein means two or more.
  • “And/or” describes the association relationship of the associated objects, indicating that there can be three types of relationships, for example, A and/or B, which can mean: A alone exists, A and B exist at the same time, and B exists alone.
  • the character “/” generally indicates that the associated objects before and after are in an "or” relationship.
  • the similar image detection method provided by the embodiments of the present application is applied in the field of computer vision, and is specifically applied in a scene where similarity detection is performed on images, so as to detect similar images from multiple images. Similar image detection is a very important basic problem in the field of computer vision. The quality of many tasks depends on the quality of the similarity measure.
  • the similar image detection method provided in the embodiment of this application can be extended to the album application.
  • For example, the similar image detection method provided in the embodiment of this application detects similar images in the album and then recommends the similar images to the user for deletion and cleanup, to help users better manage their albums.
  • the similar image detection method can also be applied to other scenarios, which is not limited in the embodiment of the present application.
  • the similar image detection method provided by the embodiments of the application can be applied to a similar image detection device.
  • the similar image detection device can be an electronic device such as a terminal or a server.
  • the terminal can be a mobile phone, a tablet computer, or a computer, and the server can be a background server of an application.
  • the embodiment of the present application provides a similar image detection method, and the method includes:
  • acquiring multiple images to be detected; using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images; using the feature vectors of the multiple images as the input of a support vector machine SVM model, and measuring the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images; and performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • the CNN model is a visual geometry group network VGGNet model
  • the VGGNet model is obtained by training based on a sample data set, and the sample data set includes sample images of multiple categories and category label information for each sample image.
  • the measuring the similarity of the feature vectors of the multiple images by the SVM model includes:
  • through the classification function of the SVM model, the classification function value of each image is determined, and the classification function value of each image is used as the similarity metric value of each image;
  • wherein the classification function is used to characterize the classification hyperplane for classifying images, and the classification function value of each image is used to characterize the distance between each image and the classification hyperplane.
  • the performing similar image detection on the multiple images based on the similarity metric values of the multiple images includes:
  • at least two images in the multiple images whose similarity metric values are within the same metric value range are determined as similar images, and the similar images are determined to be images of the same category.
  • the method further includes:
  • a first sample data set is acquired, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
  • the first sample images of the multiple categories are used as the input of the CNN model, and feature extraction is performed on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
  • according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, the SVM model to be trained is trained to obtain the SVM model.
  • the training the SVM model to be trained based on the feature vector of the first sample images of the multiple categories and the category label information of each first sample image to obtain the SVM model includes:
  • according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, an objective function is solved to obtain the classification function of the SVM model; wherein the objective function is used to indicate that the interval between first sample images of different categories among the first sample images of the multiple categories is maximized.
  • the method further includes:
  • the first sample data set is preprocessed to obtain a preprocessed first sample data set; the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data enhancement processing.
  • the method before the using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model, the method further includes:
  • the CNN model to be trained is pre-trained according to a second sample data set to obtain an initialized CNN model, the second sample data set including second sample images of multiple categories and category label information of each second sample image; the initialized CNN model is then trained according to a third sample data set to obtain the CNN model.
  • the third sample data set includes third sample images of multiple categories and category label information of each third sample image.
  • the third sample data set includes a first sub-sample data set and a second sub-sample data set; the first sub-sample data set includes M third sample images, and the M third sample images belong to S categories; the second sub-sample data set includes N third sample images, and the N third sample images belong to T categories; the M third sample images are different from the N third sample images; and M, S, N, and T are positive integers.
  • the similar image detection method provided in the embodiments of this application is a similar image detection method based on deep learning.
  • the detection process requires a CNN (Convolutional Neural Network) model for feature extraction and an SVM (Support Vector Machine) model for similarity measurement. To facilitate understanding, the model training methods of the CNN model and the SVM model are introduced first.
  • CNN: Convolutional Neural Network
  • SVM: Support Vector Machine
  • Fig. 1 is a flowchart of a model training method provided by an embodiment of the present application. The method is applied to a similar image detection device.
  • the device may be an electronic device such as a terminal or a server. As shown in Fig. 1, the method includes the following steps:
  • Step 101 Obtain a second sample data set and a third sample data set.
  • the second sample data set includes multiple types of second sample images and the category label information of each second sample image.
  • the third sample data set includes third sample images of multiple categories and the category label information of each third sample image.
  • the second sample data set and the third sample data set are pre-acquired sample data sets that can meet the requirements of model training.
  • the category tag information is used to indicate the category of the corresponding image.
  • the second sample images of multiple categories and the third sample images of multiple categories have different categories.
  • the second sample data set and the third sample data set are network image data sets.
  • the second sample data set is the ImageNet (image network) data set
  • the third sample data set is the trademark image data set.
  • the ImageNet data set contains more than 14 million images in total, covering more than 20,000 categories.
  • the ImageNet data set is currently a commonly used data set, on which research work such as image classification, target localization, and target detection can be carried out.
  • the trademark image data set includes various types of trademark images.
  • the third sample data set includes two different sub-sample data sets, for example, the third sample data set includes a first sub-sample data set and a second sub-sample data set.
  • the first subsample data set includes M third sample images, and M third sample images belong to S categories
  • the second subsample data set includes N third sample images
  • N third sample images belong to T categories
  • M third sample images are different from N third sample images.
  • M and N are both positive integers
  • S and T are also positive integers.
  • the third sample data set is a trademark image data set
  • the first sub-sample data set may be the Logo-405 data set
  • the second sub-sample data set may be the FlickrLogo-32 data set.
  • the Logo-405 data set comes from Internet crawlers and includes 32,218 images, covering 405 categories of trademark images including major luxury brands; the number of images in each category ranges from tens to more than one hundred, and the size of each trademark image is about 300×500.
  • the FlickrLogo-32 data set comes from data publicly available on the Internet.
  • the data set contains a total of 32 types of trademark images including trademarks of major Internet companies, with a total of 8,240 images.
  • preprocessing may be performed on the images in the two sample data sets.
  • the second sample data set and the third sample data set can be preprocessed in the following manner:
  • for example, the images in the two sample data sets may be format-adjusted, deduplicated, and renamed.
  • data enhancement processing may also be performed on these two sample data sets.
  • the data enhancement processing includes at least one of scaling, noise addition, rotation, and normalization processing.
  • the images in any sample data set may be scaled proportionally to ensure that the scales of the images in the sample data set are uniform. Since the scales of actual images are often different, and the scales must be uniform in the training set, the scaling method can be used to adjust the scales of the images in the sample data set to be uniform, so that the model can learn and recognize these images.
  • noise can be randomly added to the images in any sample data set. Actual images often contain a lot of noise, while the sample images in the sample data set are usually relatively clean; if training is carried out directly, the robustness of the model to noise will be poor, and even during image detection, if the image to be detected contains noise of only a few pixels, the model may produce recognition errors. Therefore, in order to make the model more robust to noise, the embodiment of the present application may randomly add noise to the training images.
  • after the second sample data set and the third sample data set are acquired, an image can be selected from either sample data set, the selected image can be rotated, and the rotated image can be added to the sample data set to increase the data volume of the sample data set.
  • the selection method used may be a random selection method or other preset selection methods, which is not limited in the embodiment of the present application.
  • the part of the rotated image that falls outside the display area can be cropped to maintain a uniform image scale.
  • the sample images in any sample data set may be normalized to remove redundant information.
  • the normalization processing includes: normalizing the pixel value of the sample image from [0, 255] to [0, 1] to remove redundant information contained in the sample data to be trained and further shorten the training time.
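  • As a rough illustration only, the preprocessing and data enhancement steps described above (uniform scaling, random noise, rotation within the original frame, and normalization of pixel values from [0, 255] to [0, 1]) could be sketched as follows; the target size, noise level, and use of NumPy/Pillow are assumptions, not part of the embodiment.
```python
import numpy as np
from PIL import Image

TARGET_SIZE = (224, 224)  # assumed uniform scale; the embodiment does not fix a size

def preprocess(path, rotate_deg=0.0, noise_std=0.0):
    img = Image.open(path).convert("RGB").resize(TARGET_SIZE)  # proportional scaling to a uniform size
    if rotate_deg:
        img = img.rotate(rotate_deg, expand=False)             # rotate; content leaving the frame is cropped
    x = np.asarray(img, dtype=np.float32)
    if noise_std:
        x = x + np.random.normal(0.0, noise_std, x.shape)      # randomly added noise
    return np.clip(x, 0.0, 255.0) / 255.0                      # normalize [0, 255] -> [0, 1]
```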
  • a fourth sample data set and a fifth sample data set can also be obtained.
  • Both the fourth sample data set and the fifth sample data set include sample images of multiple categories and the category label information of each sample image.
  • the sample images in the fourth sample data set and the fifth sample data set are different.
  • the fourth sample data set is the ImageNet data set
  • the fifth sample data set is the trademark image data set.
  • preprocess the sample images in the fourth sample data set, divide the preprocessed fourth sample data set into a training set and a test set, and use the training set as the second sample data set.
  • preprocess the sample images in the fifth sample data set, divide the preprocessed fifth sample data set into a training set and a test set, and use the training set as the third sample data set.
  • the training set is used to train the CNN model
  • the test set is used to test the CNN model.
  • the preprocessing includes at least one of format adjustment, deduplication, and data enhancement processing
  • the data enhancement processing includes at least one of scaling, noise addition, rotation, and normalization processing.
  • the preprocessed fourth sample data set can be divided into a training set and a test set proportionally
  • the preprocessed fifth sample data set can be divided into a training set and a test set proportionally
  • the ratio of training set to test set can be 8:2.
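  • A minimal sketch of the 8:2 split into a training set and a test set; shuffling and the fixed seed are assumptions made here for reproducibility.
```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    # samples: list of (image_path, category_label) pairs from a preprocessed sample data set.
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]   # training set, test set
```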
  • Step 102 Perform pre-training on the CNN model to be trained according to the second sample data set to obtain the initialized CNN model.
  • the CNN model may be a VGGNet (Visual Geometry Group Network) model.
  • VGGNet Visual Geometry Group Network
  • the VGGNet model is a deep convolutional neural network model proposed by the Visual Geometry Group of Oxford University, which can reduce the Top-5 error rate to 7.3%.
  • the VGGNet model mainly has the following characteristics: 1) all convolutional layers in the entire network structure use 3×3 convolution kernels; 2) in the network structure, two 3×3 convolutional layers replace one traditional 5×5 convolutional layer, and three 3×3 convolutional layers replace one traditional 7×7 convolutional layer, which increases the nonlinear expression ability of the network; 3) multiple 3×3 convolution kernels have fewer parameters than a single larger convolution kernel, which reduces the overall number of network parameters.
  • the VGGNet model may be a 19-layer VGGNet-19 model or a 16-layer VGGNet-16 model.
  • FIG. 2 is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application. As shown in FIG. 2, the VGGNet-16 model includes 13 convolutional layers and 3 fully connected layers.
  • all convolutional layers in the network structure use convolution kernels of the same size, 3×3, which is the smallest window that can capture information from the up, down, left, right, and center directions; each 3×3 convolutional layer uses one pixel of padding to ensure that the input and output sizes remain the same after convolution.
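  • The VGGNet-16 topology described above (13 convolutional layers with 3×3 kernels and one-pixel padding, followed by 3 fully connected layers) could, for example, be defined with the Keras API of TensorFlow, which the embodiments mention as one possible framework; the input size and class count below are placeholders.
```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_vgg16(num_classes, input_shape=(224, 224, 3)):
    # 13 conv layers (3x3 kernels, "same" padding = one-pixel fill) + 3 fully connected layers.
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for v in cfg:
        if v == "M":
            model.add(layers.MaxPooling2D(pool_size=2, strides=2))
        else:
            model.add(layers.Conv2D(v, 3, padding="same", activation="relu"))
    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```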
  • an initialized CNN model that can classify images in the second sample data set can be obtained.
  • the VGGNet model is obtained by training based on a sample data set, and the sample data set includes sample images of multiple categories and category label information of each sample image.
  • the VGGNet model to be trained is pre-trained according to the second sample data set to obtain the initial VGGNet model, and then the initial VGGNet model is trained according to the third sample data set to obtain the VGGNet model.
  • Step 103 Train the initialized CNN model according to the third sample data set to obtain the CNN model.
  • the CNN model is used to perform feature extraction on any image to obtain the feature vector of the image.
  • for example, the VGG-16 model pre-trained on the ImageNet data set can be used as the initialization model, the initialization model can be trained on the trademark image data set to obtain the trained VGG-16 model, and the trained VGG-16 model can then be used as the feature extractor.
  • the training set can be input to the CNN model for training, and iterate a preset number of times.
  • the preset number of times can be preset, for example, the preset number of times can be 80 times, 90 times, 100 times, and so on.
  • the gradient descent algorithm can be used to optimize the objective function during each iterative calculation process, so that the model reaches convergence.
  • the gradient descent algorithm may be an Adam (Adaptive Moment Estimation) gradient descent algorithm.
  • the Adam gradient descent algorithm is an efficient calculation method that can improve the convergence speed of the gradient descent.
  • the batch sample size batch_size of the Adam gradient descent algorithm can be set in advance.
  • batch_size can be set to 32.
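  • Continuing the sketch above, pre-training and fine-tuning with the Adam optimizer and batch_size 32 might look as follows; the data set objects, epoch counts, and class counts are placeholders, and the weight-copying step is one assumed way of reusing the pre-trained backbone.
```python
import tensorflow as tf

# train_ds / finetune_ds are assumed tf.data.Dataset objects yielding (image, label) pairs.
pretrain = build_vgg16(num_classes=1000)        # pre-training on the larger data set (ImageNet-scale)
pretrain.compile(optimizer=tf.keras.optimizers.Adam(),
                 loss="sparse_categorical_crossentropy", metrics=["accuracy"])
pretrain.fit(train_ds.batch(32), epochs=100)    # preset iteration count, e.g. 100

# Keep the learned layers and attach a new classification head for the trademark categories.
finetune = build_vgg16(num_classes=405)         # placeholder class count, e.g. the Logo-405 categories
for src, dst in zip(pretrain.layers[:-1], finetune.layers[:-1]):
    dst.set_weights(src.get_weights())
finetune.compile(optimizer=tf.keras.optimizers.Adam(),
                 loss="sparse_categorical_crossentropy", metrics=["accuracy"])
finetune.fit(finetune_ds.batch(32), epochs=80)
```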
  • Step 104 Obtain a first sample data set, the first sample data set including multiple types of first sample images and category label information of each first sample image.
  • the first sample data set may be the above-mentioned first sub-sample data set and/or the second sub-sample data set, or may be a sample data set other than the above-mentioned first sub-sample data set and second sub-sample data set.
  • the first sample data set is the aforementioned third sample data set, for example, the first sample data set is a trademark image data set.
  • the first sample data set is the Logo-405 data set and the FlickrLogo-32 data set.
  • Step 105 Use the first sample image of the multiple categories as the input of the CNN model, and perform feature extraction on the first sample image of the multiple categories through the CNN model to obtain the first sample of the multiple categories The feature vector of the image.
  • the feature vector is used to characterize the features of the sample image in various dimensions.
  • the feature vector may be a feature vector with a dimension of 4096.
  • the CNN model can be used to extract features from the trademark data sets Logo-405 and FlickrLogo-32 respectively, to obtain the feature vector of each image, and finally to obtain a deep representation of each trademark image extracted through the VGG-16 model.
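  • One assumed way to use the trained model as a feature extractor is to read the output of the second fully connected layer, which has 4096 dimensions in the sketch above; the layer index follows that sketch and is not prescribed by the embodiment.
```python
import numpy as np
import tensorflow as tf

# Take the 4096-dimensional output of the penultimate fully connected layer as the feature vector.
feature_extractor = tf.keras.Model(inputs=finetune.inputs,
                                   outputs=finetune.layers[-2].output)

def extract_features(images):
    # images: float array of shape (n, 224, 224, 3), already preprocessed to [0, 1].
    return np.asarray(feature_extractor.predict(images, batch_size=32))
```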
  • the CNN model is used for feature extraction.
  • the CNN model can automatically learn the features of the image during the training process, without manual design and human intervention in feature learning, which improves the efficiency and accuracy of feature extraction.
  • the first sample data set is preprocessed to obtain the preprocessed first sample data set.
  • the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data enhancement processing.
  • Step 106 According to the feature vector of the first sample images of the multiple categories and the category label information of each first sample image, train the SVM model to be trained to obtain the SVM model.
  • the operation of training the SVM model to be trained includes: solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function is used to indicate that the interval between first sample images of different categories among the first sample images of the multiple categories is maximized.
  • the basic idea of the support vector machine algorithm is to first transform the input sample space into a high-dimensional space, and then find the optimal classification hyperplane in this new high-dimensional feature space to maximize the interval between sample points of different categories.
  • the classification hyperplane is the maximum separation hyperplane.
  • the objective function of the SVM model to be trained can be: $\min_{w,b} \frac{1}{2}\lVert w\rVert^{2}$
  • where $w$ is the normal vector, which determines the direction of the classification hyperplane
  • and $b$ is the displacement, which determines the distance between the classification hyperplane and the origin.
  • the constraints of the SVM model to be trained are: $y_{i}(w^{\mathrm{T}}x_{i}+b)\ge 1$, $i=1,2,\ldots,n$, where $x_{i}$ is the feature vector of the $i$-th first sample image and $y_{i}$ is its category label.
  • the Lagrange multiplier $\alpha$ is introduced into the objective function, and the objective function obtained is: $L(w,b,\alpha)=\frac{1}{2}\lVert w\rVert^{2}-\sum_{i=1}^{n}\alpha_{i}\left(y_{i}(w^{\mathrm{T}}x_{i}+b)-1\right)$
  • the SVM model to be trained may also be a kernel function-based SVM model.
  • the SVM model to be trained may also use the kernel function to solve the classification function.
  • the kernel function is also a special similarity measure function, and different kernel functions represent different similarity measures.
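  • A minimal sketch of fitting the SVM on the extracted feature vectors; scikit-learn's SVC is used here only as one possible implementation (the embodiment names no library), and the linear kernel is one of the kernel choices discussed above.
```python
from sklearn.svm import SVC

# X_train: (n_samples, 4096) CNN feature vectors; y_train: category labels from the sample data set.
svm = SVC(kernel="linear")   # a kernel-function-based SVM could use kernel="rbf" instead
svm.fit(X_train, y_train)
```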
  • after training the CNN model and the SVM model, the test set can also be used as the input of the CNN model to extract the feature vector of each image in the test set, and the feature vectors of the images in the test set can then be used as the input of the SVM model for testing, in order to verify the accuracy of the models. If the CNN model and the SVM model pass the verification, the training is complete; if the verification fails, training of the CNN model and the SVM model continues according to the training set until the trained models pass the verification.
  • FIG. 3 is a flow chart of a similar image detection process provided by an embodiment of the present application.
  • a sample data set can be constructed first, then the sample images in the sample data set can be preprocessed, and the preprocessed sample data set can then be divided into a training set and a test set proportionally.
  • the training set is used to train the CNN model and the SVM model
  • the test set is used to verify the accuracy of the trained CNN model and the SVM model.
  • an image similarity feature extraction model based on VGG-16 (namely the CNN model) is constructed, and the SVM model is used to classify the feature vectors extracted by the CNN model to complete the similar image detection.
  • the embodiment of the present application may use TensorFlow (an open source software library) to define a convolutional neural network model.
  • TensorFlow an open source software library
  • users can easily deploy computing tasks to multiple platforms and devices.
  • the multiple platforms include CPUs (Central Processing Units), GPUs (Graphics Processing Units), and TPUs (Tensor Processing Units).
  • the multiple devices may include desktop devices, server clusters, mobile devices, edge devices, and the like.
  • desktop devices include desktop computers and the like
  • server clusters include one or more servers
  • mobile devices include mobile phones, tablets, smart wearable devices, and the like
  • edge devices refer to switches, routers, routing switches, IADs (Integrated Access Devices), and various MAN (Metropolitan Area Network)/WAN (Wide Area Network) devices installed on edge networks, and the like.
  • FIG. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application. The method is applied to a similar image detection device. As shown in FIG. 4, the method includes the following steps:
  • Step 401 Acquire multiple images to be detected.
  • the multiple images to be detected can be uploaded by the user, can be obtained from the storage space of the device, can be sent by other devices, or can be obtained from the network.
  • the method of obtaining is not limited.
  • the images stored in the album of the terminal may be acquired, and the images stored in the album may be used as multiple images to be detected.
  • Step 402 Use the multiple images as the input of the CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images.
  • the CNN model is a VGGNet model.
  • the CNN model is a VGGNet-16 model or a VGGNet-19 model.
  • the multiple images can also be preprocessed, and then the preprocessed multiple images can be used as the input of the CNN model, through the CNN model Perform feature extraction.
  • the preprocessing includes at least one of format adjustment, deduplication, and data enhancement.
  • Step 403 Use the feature vectors of the multiple images as the input of the SVM model, and measure the similarity of the feature vectors of the multiple images through the SVM model to obtain the similarity metric values of the multiple images.
  • the classification function value of each image can be determined through the classification function of the SVM model, and the classification function value of each image can be used as the similarity measurement value of each image.
  • the classification function is used to characterize the classification hyperplane for classifying images
  • the classification function value of each image is used to characterize the distance between each image and the classification hyperplane. The closer the classification function values of any two images in the multiple images are, the closer the distance between the two images is, and the higher the similarity between the two images is.
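  • In the scikit-learn sketch above, the classification function values (signed distances to the classification hyperplane) could be read as follows; decision_function is that library's name for this quantity and is used here only as an assumed implementation.
```python
# features: (n_images, 4096) feature vectors of the images to be detected.
scores = svm.decision_function(features)   # distances of each image to the classification hyperplane(s)
# Images whose classification function values fall close together are treated as more similar.
```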
  • the quality of image feature selection also has a very important impact on the result of the similarity measurement; therefore, both the quality of image feature extraction and the choice of measurement algorithm have a certain impact on the result of image similarity detection.
  • by combining the feature vectors of the images with a traditional pattern recognition algorithm, the similarity between image features can be measured to determine whether the images are similar and complete the detection task; deep learning can be used to extract both the shallow and high-level features in the images, effectively use the information in the images, and improve the accuracy of detection.
  • Step 404 Perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • At least two images in the plurality of images whose similarity metric values are within the same metric value range are determined to be similar images, and the similar images are determined to be images of the same category.
  • the image features of multiple images can be used as the input of the SVM model, and the multiple images can be classified according to the image features of the multiple images through the SVM model.
  • the images classified into one category are similar images.
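  • Continuing the same sketch, grouping images that the SVM assigns to the same category and reporting each group as a set of similar images might look as follows; the dictionary-based grouping is an assumed implementation detail.
```python
from collections import defaultdict

def group_similar(image_paths, features, svm):
    # Images predicted into the same category are reported together as similar images.
    groups = defaultdict(list)
    for path, label in zip(image_paths, svm.predict(features)):
        groups[int(label)].append(path)
    return {label: paths for label, paths in groups.items() if len(paths) >= 2}
```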
  • the images of the same category and the category tags may be stored correspondingly.
  • Figure 5 is a flowchart of another similar image detection method provided by an embodiment of the present application.
  • the image to be detected can be obtained first and then preprocessed; feature extraction is then performed on the preprocessed image through the CNN model, the similarity of the extracted feature vectors is measured through the SVM model, and similar image detection is performed based on the similarity measurement result.
  • by performing feature extraction through the CNN model, the shallow and deep features in the images can be effectively extracted and the information in the images can be effectively used, which improves the detection accuracy; the extracted feature vectors are then input into the SVM model, which measures the similarity of the feature vectors of the images, so that whether the images are similar can be judged based on the similarity between their feature vectors, further improving the detection accuracy, reducing the amount of calculation, and improving the detection efficiency.
  • the embodiment of the present application introduces deep learning methods into similar image recognition and detection, and builds a convolutional neural network to mine and learn the information in sample images and obtain deep representations of different sample images; a traditional pattern recognition algorithm then measures the similarity between the features, which solves the problem of recognizing and detecting a large number of similar pictures.
  • the embodiment of this application transforms the similar image recognition and detection problem into an image classification problem, uses a deep learning model as a feature extractor, and combines a traditional pattern recognition algorithm to measure the similarity between features, thereby avoiding the inaccuracy caused by calculating a hash mean; combining deep learning methods with traditional pattern recognition methods can not only make effective use of the convolutional neural network's capabilities of transfer learning and automatic learning of image features, but also combine the deep representations with the traditional pattern recognition algorithm, further improving the accuracy of similar image recognition and detection.
  • FIG. 6 is a block diagram of a similar image detection device provided by an embodiment of the present application. As shown in FIG. 6, the device includes a first acquisition module 601, a first extraction module 602, a measurement module 603, and a detection module 604.
  • the first acquisition module 601 is used to acquire multiple images to be detected
  • the first extraction module 602 is configured to use the multiple images as the input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
  • the measurement module 603 is configured to use the feature vectors of the multiple images as the input of the SVM model, and measure the similarity of the feature vectors of the multiple images through the SVM model to obtain the similarity metric values of the multiple images;
  • the detection module 604 is configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • the CNN model is a visual geometry group network VGGNet model
  • the VGGNet model is obtained by training based on a sample data set, and the sample data set includes sample images of multiple categories and category label information for each sample image.
  • the metric module 603 is used to:
  • through the classification function of the SVM model, the classification function value of each image is determined, and the classification function value of each image is used as the similarity metric value of each image;
  • wherein the classification function is used to characterize the classification hyperplane for classifying images, and the classification function value of each image is used to characterize the distance between each image and the classification hyperplane.
  • the detection module 604 is used to:
  • at least two images in the multiple images whose similarity metric values are within the same metric value range are determined as similar images, and the similar images are determined to be images of the same category.
  • the device further includes:
  • a second acquisition module (not shown in the figure), configured to acquire a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
  • a second extraction module (not shown in the figure), configured to use the first sample images of the multiple categories as the input of the CNN model, and perform feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
  • a first training module (not shown in the figure), configured to train the SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
  • the first training module is used to:
  • according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, an objective function is solved to obtain the classification function of the SVM model; wherein the objective function is used to indicate that the interval between first sample images of different categories among the first sample images of the multiple categories is maximized.
  • the device further includes:
  • a preprocessing module (not shown in the figure), configured to preprocess the first sample data set to obtain a preprocessed first sample data set;
  • the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data enhancement processing.
  • the device further includes:
  • the second training module is used for pre-training the CNN model to be trained according to the second sample data set to obtain the initialized CNN model.
  • the second sample data set includes second sample images of multiple categories and category label information of each second sample image;
  • the third sample data set includes a first sub-sample data set and a second sub-sample data set; the first sub-sample data set includes M third sample images, and the M third sample images belong to S categories; the second sub-sample data set includes N third sample images, and the N third sample images belong to T categories; the M third sample images are different from the N third sample images; and M, S, N, and T are positive integers.
  • by performing feature extraction through the CNN model, the shallow and deep features in the images can be effectively extracted and the information in the images can be effectively used, which improves the detection accuracy; the extracted feature vectors are then input into the SVM model.
  • the SVM model measures the similarity of the feature vectors of the images, so that whether the images are similar can be judged based on the similarity between their feature vectors, further improving the detection accuracy, reducing the amount of calculation, and improving the detection efficiency.
  • when the similar image detection device provided in the above embodiment performs similar image detection, the division into the above functional modules is used only as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the similar image detection device provided in the foregoing embodiment belongs to the same concept as the similar image detection method embodiment, and the specific implementation process is detailed in the method embodiment, and will not be repeated here.
  • FIG. 7 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
  • the electronic device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, or a computer.
  • the electronic device may vary greatly due to different configurations or performance, and may include one or more processors 701 and one or more memories 702, where at least one instruction is stored in the memory 702, and the at least one instruction is loaded and executed by the processor 701 to implement the similar image detection methods provided by the foregoing method embodiments.
  • the electronic device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and the electronic device may also include other components for implementing device functions, which will not be repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to realize similar image detection as described in each of the above embodiments. method.
  • the embodiments of the present application also provide a computer program product that stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the similar image detection method described in each of the above embodiments.
  • the functions described in the embodiments of the present application may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.
  • the computer-readable medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates the transfer of a computer program from one place to another.
  • the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application relate to the field of image processing. Disclosed are a similar image detection method and apparatus, a device and a storage medium. The method comprises: using multiple images to be detected as the input of a CNN model, and performing feature extraction on the multiple images by means of the CNN model to obtain feature vectors of the multiple images; using the feature vectors of the multiple images as the input of an SVM model, and performing similarity measurement on the feature vectors of the multiple images by means of the SVM model to obtain similarity measurement values of the multiple images; and performing similar image detection on the multiple images on the basis of the similarity measurement values of the multiple images. According to the present application, shallow and deep features in images can be effectively extracted, and whether the images are similar can be determined according to the similarity between the feature vectors of the images, thereby improving the accuracy of image similarity detection.

Description

Similar image detection method, device, equipment and storage medium
This application claims priority to the Chinese patent application No. 201911390241.0, filed on December 30, 2019 and entitled "Similar image detection method, device, equipment and storage medium", the entire content of which is incorporated into this application by reference.
Technical field
The embodiments of the present application relate to the field of image processing, and in particular, to a similar image detection method, device, equipment, and storage medium.
Background
Similar image detection is a basic problem in computer vision, which aims to compare the similarity between images and judge whether images are similar based on the similarity between them. Similar image detection can be applied to different task scenarios. For example, similar image detection technology can be used to detect similar images in a mobile phone's album, and some of the similar images can then be deleted from the album to save memory.
In the related art, a hash algorithm can be used to detect similar images. Specifically, the two images to be detected can be hashed by a hash algorithm to obtain the hash values of the two images, and the Hamming distance between the two images based on the hash values can then be calculated. If the Hamming distance is less than a threshold, the two images are determined to be similar images; if the Hamming distance based on the hash values is greater than or equal to the threshold, the two images are determined not to be similar images.
Summary of the invention
The embodiments of the present application provide a similar image detection method, device, equipment, and storage medium. The technical solution is as follows:
In one aspect, an embodiment of the present application provides a similar image detection method, and the method includes:
acquiring multiple images to be detected;
using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
using the feature vectors of the multiple images as the input of a support vector machine SVM model, and measuring the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
In another aspect, a similar image detection device is provided, and the device includes:
a first acquisition module, configured to acquire multiple images to be detected;
a first extraction module, configured to use the multiple images as the input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
a measurement module, configured to use the feature vectors of the multiple images as the input of an SVM model, and measure the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
a detection module, configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
In another aspect, an electronic device is provided. The electronic device includes a processor and a memory; the memory stores at least one instruction, and the at least one instruction is used to be executed by the processor to implement the above similar image detection method.
In another aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the at least one instruction is used to be executed by a processor to implement the above similar image detection method.
In another aspect, a computer program product is also provided. The computer program product stores at least one instruction, and the at least one instruction is used to be executed by a processor to implement the above similar image detection method.
Description of the drawings
Fig. 1 is a flowchart of a model training method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application;
Fig. 3 is a flowchart of a similar image detection process provided by an embodiment of the present application;
Fig. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application;
Fig. 5 is a flowchart of another similar image detection method provided by an embodiment of the present application;
Fig. 6 is a block diagram of a similar image detection device provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the implementation manners of the present application are described in further detail below in conjunction with the accompanying drawings.
The "plurality" mentioned herein means two or more. "And/or" describes the association relationship of the associated objects and indicates that three types of relationships can exist; for example, A and/or B can mean: A alone exists, A and B exist at the same time, or B alone exists. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Before the similar image detection method provided by the embodiments of the application is described in detail, the application scenarios of the embodiments of the application are introduced first.
The similar image detection method provided by the embodiments of the present application is applied in the field of computer vision, and is specifically applied in scenes where similarity detection is performed on images, so as to detect similar images from multiple images. Similar image detection is a very important basic problem in the field of computer vision, and the quality of many task results depends on the quality of the similarity measure.
As an example, the similar image detection method provided in the embodiments of this application can be extended to an album application. For example, the similar image detection method provided in the embodiments of this application detects similar images in the album and then recommends the similar images to the user for deletion and cleanup, to help the user better manage the album. Of course, the similar image detection method can also be applied to other scenarios, which is not limited in the embodiments of the present application.
Next, the implementation environment involved in the embodiments of the present application is introduced.
The similar image detection method provided by the embodiments of the application can be applied to a similar image detection device. The similar image detection device can be an electronic device such as a terminal or a server; the terminal can be a mobile phone, a tablet computer, or a computer, and the server can be a background server of an application.
The embodiment of the present application provides a similar image detection method, and the method includes:
acquiring multiple images to be detected;
using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
using the feature vectors of the multiple images as the input of a support vector machine SVM model, and measuring the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
Optionally, the CNN model is a visual geometry group network VGGNet model, and the VGGNet model is obtained by training based on a sample data set, the sample data set including sample images of multiple categories and category label information for each sample image.
Optionally, measuring the similarity of the feature vectors of the multiple images through the SVM model includes:
determining, through the classification function of the SVM model, the classification function value of each image, and using the classification function value of each image as the similarity metric value of each image; wherein the classification function is used to characterize the classification hyperplane for classifying images, and the classification function value of each image is used to characterize the distance between each image and the classification hyperplane.
Optionally, performing similar image detection on the multiple images based on the similarity metric values of the multiple images includes:
determining at least two images in the multiple images whose similarity metric values are within the same metric value range as similar images;
determining the similar images to be images of the same category.
Optionally, before the performing similarity measurement on the feature vectors of the multiple images through the SVM model, the method further includes:
acquiring a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
taking the first sample images of the multiple categories as input of the CNN model, and performing feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
Optionally, the training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model, includes:
solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function indicates that the margin between first sample images of different categories among the first sample images of the multiple categories is maximized.
Optionally, after the acquiring a first sample data set, the method further includes:
preprocessing the first sample data set to obtain a preprocessed first sample data set;
wherein the preprocessed first sample data set is used as input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data augmentation.
Optionally, before the taking the multiple images as input of a convolutional neural network (CNN) model and performing feature extraction on the multiple images through the CNN model, the method further includes:
pre-training a CNN model to be trained according to a second sample data set to obtain an initialized CNN model, the second sample data set including second sample images of multiple categories and category label information of each second sample image;
training the initialized CNN model according to a third sample data set to obtain the CNN model, the third sample data set including third sample images of multiple categories and category label information of each third sample image.
Optionally, the third sample data set includes a first sub-sample data set and a second sub-sample data set, the first sub-sample data set includes M third sample images belonging to S categories, the second sub-sample data set includes N third sample images belonging to T categories, the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
The similar image detection method provided in the embodiments of the present application is a deep-learning-based similar image detection method. The detection process uses a CNN (Convolutional Neural Network) model for feature extraction and an SVM (Support Vector Machine) model for similarity measurement. For ease of understanding, the model training methods of the CNN model and the SVM model are introduced first.
FIG. 1 is a flowchart of a model training method provided by an embodiment of the present application. The method is applied to a similar image detection apparatus, which may be an electronic device such as a terminal or a server. As shown in FIG. 1, the method includes the following steps:
Step 101: acquire a second sample data set and a third sample data set, the second sample data set including second sample images of multiple categories and category label information of each second sample image, and the third sample data set including third sample images of multiple categories and category label information of each third sample image.
The second sample data set and the third sample data set are pre-acquired sample data sets that meet the requirements of model training. The category label information indicates the category of the corresponding image. As an example, the categories of the second sample images and the categories of the third sample images are different.
As an example, the second sample data set and the third sample data set are network image data sets. For example, the second sample data set is the ImageNet data set, and the third sample data set is a trademark image data set. The ImageNet data set contains more than 14 million images in more than 20,000 categories; it is a commonly used data set on which research work such as image classification, object localization, and object detection can be carried out. The trademark image data set includes trademark images of multiple categories.
As an example, the third sample data set includes two different sub-sample data sets, for example, a first sub-sample data set and a second sub-sample data set. The first sub-sample data set includes M third sample images belonging to S categories, the second sub-sample data set includes N third sample images belonging to T categories, and the M third sample images are different from the N third sample images, where M, N, S, and T are all positive integers.
As an example, if the third sample data set is a trademark image data set, the first sub-sample data set may be the Logo-405 data set, and the second sub-sample data set may be the FlickrLogo-32 data set. The Logo-405 data set was collected by web crawling and includes 32,218 images covering 405 categories of trademark images, including major luxury brands; the number of images per category ranges from a few dozen to more than one hundred, and each trademark image is approximately 300*500 in size. The FlickrLogo-32 data set comes from publicly available online data and contains 32 categories of trademark images, including trademarks of major Internet companies, with a total of 8,240 images.
In a possible implementation, the images in the acquired second sample data set and third sample data set may be preprocessed. Exemplarily, the second sample data set and the third sample data set may be preprocessed in the following manners:
As an example, the images in the acquired second sample data set and third sample data set may be subjected to format adjustment, deduplication, and renaming.
As an example, data augmentation may also be performed on the acquired second sample data set and third sample data set, where the data augmentation includes at least one of scaling, noise addition, rotation, and normalization.
For example, the images in either sample data set may be scaled proportionally to ensure that the images in the sample data set have a uniform scale. Since the scales of real images often differ while the training set requires a uniform scale, proportional scaling adjusts the images in the sample data set to a uniform scale so that the model can learn from and recognize these images.
For another example, noise may be randomly added to the images in either sample data set. Real images often contain a lot of noise, whereas the sample images in the sample data set are usually relatively clean. If training is performed on them directly, the model will be poorly robust to noise; during image detection, even a few noisy pixels on the image to be detected could cause the model to make recognition errors. Therefore, to make the model more robust to noise, the embodiments of the present application may randomly add noise to the training images.
For another example, images may be selected from either sample data set, rotated, and added back into the sample data set to increase the amount of data. The selection method may be random selection or another preset selection method, which is not limited in the embodiments of the present application. In addition, after the selected image is rotated, the part of the image that falls outside the display area may be cropped to keep the image scale uniform.
For another example, the sample images in either sample data set may be normalized to remove redundant information. As an example, the normalization includes: normalizing the pixel values of the sample images from [0, 255] to [0, 1] to remove redundant information contained in the training sample data and further shorten the training time.
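The following is a minimal sketch of the preprocessing and augmentation steps described above (scaling, noise addition, rotation, and normalization), written with TensorFlow, which the embodiments mention later; the target size, noise level, and file path handling are illustrative assumptions rather than values specified by the present application.
```python
import tensorflow as tf

def preprocess(path, target_size=(224, 224), noise_stddev=0.02, rotate=False):
    """Load one sample image and apply scaling, optional rotation,
    random noise, and [0, 1] normalization (illustrative values)."""
    raw = tf.io.read_file(path)
    img = tf.image.decode_jpeg(raw, channels=3)
    img = tf.image.resize(img, target_size)          # scale to a uniform size
    img = tf.cast(img, tf.float32) / 255.0           # normalize [0, 255] -> [0, 1]
    if rotate:
        img = tf.image.rot90(img)                    # simple rotation-based augmentation
    img = img + tf.random.normal(tf.shape(img), stddev=noise_stddev)  # add random noise
    return tf.clip_by_value(img, 0.0, 1.0)
```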
As an example, a fourth sample data set and a fifth sample data set may also be acquired. Both the fourth sample data set and the fifth sample data set include sample images of multiple categories and category label information of each sample image, and the sample images in the fourth sample data set and the fifth sample data set are different. For example, the fourth sample data set is the ImageNet data set, and the fifth sample data set is a trademark image data set.
Then, the sample images in the fourth sample data set are preprocessed, the preprocessed fourth sample data set is divided into a training set and a test set, and the training set is used as the second sample data set. Likewise, the sample images in the fifth sample data set are preprocessed, the preprocessed fifth sample data set is divided into a training set and a test set, and the training set is used as the third sample data set.
The training set is used to train the CNN model, and the test set is used to test the CNN model. The preprocessing includes at least one of format adjustment, deduplication, and data augmentation, and the data augmentation includes at least one of scaling, noise addition, rotation, and normalization.
As an example, the preprocessed fourth sample data set may be divided into a training set and a test set in proportion, and the preprocessed fifth sample data set may likewise be divided into a training set and a test set in proportion. In a possible implementation, the ratio of the training set to the test set may be 8:2.
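As a hedged sketch of the proportional 8:2 split, assuming the image paths and labels are held in parallel Python lists, scikit-learn's train_test_split is one convenient way to do it; the present application does not prescribe a particular library, and the file names and labels below are hypothetical.
```python
from sklearn.model_selection import train_test_split

# Hypothetical parallel lists of image file paths and category labels.
image_paths = ["logo_0001.jpg", "logo_0002.jpg", "logo_0003.jpg", "logo_0004.jpg", "logo_0005.jpg"]
labels = [0, 0, 1, 1, 1]

# 8:2 split into a training set and a test set.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, random_state=0)
```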
Step 102: pre-train the CNN model to be trained according to the second sample data set to obtain an initialized CNN model.
As an example, the CNN model may be a VGGNet (Visual Geometry Group Network) model. The VGGNet model is a deep convolutional neural network model proposed by the Visual Geometry Group of the University of Oxford, and it can reduce the Top-5 error rate to 7.3%.
The VGGNet model mainly has the following characteristics: 1) all convolutional layers in the entire network use 3*3 convolution kernels; 2) in the network structure, two 3*3 convolutional layers replace one traditional 5*5 convolutional layer, and three 3*3 convolutional layers replace one traditional 7*7 convolutional layer, which increases the nonlinear expression capability of the network; 3) multiple 3*3 convolution kernels have fewer parameters than a single large convolution kernel, which reduces the number of parameters of the overall network.
As an example, the VGGNet model may be the 19-layer VGGNet-19 model or the 16-layer VGGNet-16 model. Please refer to FIG. 2, which is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application. As shown in FIG. 2, the VGGNet-16 model includes 13 convolutional layers and 3 fully connected layers.
As an example, apart from using more layers, all convolutional layers in the VGGNet-16 network structure have convolution kernels of the same size, 3*3, which is the smallest window that can capture the information of the top, bottom, left, right, and center. Each 3*3 convolutional layer uses a padding of one pixel to ensure that the input and output sizes remain the same after convolution.
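A brief illustrative sketch, not taken from the present application, of how the 3*3 convolution with one pixel of padding described above can be expressed in Keras: padding="same" preserves the spatial size, and only the pooling layer halves it.
```python
import tensorflow as tf

# One VGG-style block: 3x3 kernels with "same" padding keep the 224x224 spatial size.
block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu",
                           input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),       # halves the spatial size to 112x112
])
print(block.output_shape)  # (None, 112, 112, 64)
```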
By pre-training the CNN model to be trained according to the second sample data set, an initialized CNN model capable of classifying the images in the second sample data set can be obtained.
The VGGNet model is obtained by training on sample data sets, the sample data sets including sample images of multiple categories and category label information of each sample image. For example, a VGGNet model to be trained is first pre-trained according to the second sample data set to obtain an initialized VGGNet model, and the initialized VGGNet model is then trained according to the third sample data set to obtain the VGGNet model.
As an example, the VGGNet-16 model may first be trained on the ImageNet data set to obtain a deep learning classification model pre-trained on natural images, and the trained deep learning classification model is used as the initialization model for the subsequent training of the trademark image classification model.
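As one possible realization of this step (the present application does not specify a framework), an ImageNet-pretrained VGG-16 can be loaded directly from Keras Applications and used as the initialization model; keeping the fully connected layers makes the fc1/fc2 layers available for feature extraction later.
```python
import tensorflow as tf

# VGG-16 pre-trained on ImageNet, kept with its fully connected layers
# (include_top=True) so that the fc1/fc2 layers are available later.
init_model = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
init_model.summary()
```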
Step 103: train the initialized CNN model according to the third sample data set to obtain the CNN model.
The CNN model is used to perform feature extraction on any image to obtain the feature vector of the image.
After the CNN model to be trained is pre-trained according to the second sample data set to obtain the initialized CNN model, the initialized CNN model can be further trained according to the third sample data set to obtain the CNN model, and the CNN model is used as the feature extraction model for extracting features from images.
As an example, the VGG-16 model pre-trained on the ImageNet data set is used as the initialization model, the initialization model is trained on the trademark images to obtain a trained VGG-16 model, and the trained VGG-16 model is then used as the feature extractor.
As an example, during training, the training set may be input into the CNN model and iterated a preset number of times. The preset number may be set in advance, for example, 80, 90, or 100.
As an example, a gradient descent algorithm may be used to optimize the objective function in each iteration so that the model converges. As an example, the gradient descent algorithm may be the Adam (Adaptive Moment Estimation) gradient descent algorithm, which is an efficient computation method that can speed up the convergence of gradient descent. For example, the batch size batch_size of the Adam gradient descent algorithm may be preset; for example, batch_size may be set to 32.
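The sketch below puts the above together under stated assumptions: the ImageNet-initialized VGG-16 gets a new classification head for the trademark categories and is fine-tuned with Adam and a batch size of 32. The number of output classes, the dataset object, and the epoch count are illustrative assumptions, not values fixed by the present application.
```python
import tensorflow as tf

NUM_CLASSES = 405          # e.g. the Logo-405 categories (assumption)
base = tf.keras.applications.VGG16(weights="imagenet", include_top=True)

# Reuse everything up to fc2 and attach a new softmax head for trademark classes.
features = base.get_layer("fc2").output
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(features)
model = tf.keras.Model(inputs=base.input, outputs=outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds is assumed to yield (image, label) pairs of 224x224x3 images.
# model.fit(train_ds.batch(32), epochs=100)
```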
Step 104: acquire a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image.
The first sample data set may be the aforementioned first sub-sample data set and/or second sub-sample data set, or may be a sample data set other than the aforementioned first sub-sample data set and second sub-sample data set.
As an example, the first sample data set is the aforementioned third sample data set; for example, the first sample data set is a trademark image data set. For example, the first sample data set is the Logo-405 data set and the FlickrLogo-32 data set.
Step 105: take the first sample images of the multiple categories as input of the CNN model, and perform feature extraction on the first sample images of the multiple categories through the CNN model to obtain the feature vectors of the first sample images of the multiple categories.
The feature vector characterizes the features of a sample image in each dimension. As an example, the feature vector may be a feature vector with a dimension of 4096.
As an example, features may be extracted from the trademark data sets Logo-405 and FlickrLogo-32 through the CNN model to obtain the feature vector of each image, and finally the deep representation of each trademark image after feature extraction by the VGG-16 model is obtained.
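A minimal sketch of using VGG-16 as a feature extractor, assuming the Keras VGG-16 layer naming: the output of the second fully connected layer (named "fc2", 4096-dimensional) is taken as the deep representation of an image. The image loading and preprocessing details are assumptions.
```python
import numpy as np
import tensorflow as tf

vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
# 4096-dimensional output of the second fully connected layer as the image representation.
extractor = tf.keras.Model(inputs=vgg.input, outputs=vgg.get_layer("fc2").output)

def extract_feature(image):
    """image: float array of shape (224, 224, 3), already preprocessed."""
    batch = np.expand_dims(image, axis=0)
    return extractor.predict(batch)[0]              # shape (4096,)
```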
In the embodiments of the present application, the CNN model is used for feature extraction. The CNN model can automatically learn image features during training, without manual feature design or human intervention in feature learning, which improves the efficiency and accuracy of feature extraction.
In a possible implementation, after the electronic device acquires the first sample data set, the first sample data set is preprocessed to obtain a preprocessed first sample data set. The preprocessed first sample data set is used as input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data augmentation. The preprocessed first sample data set is taken as input of the CNN model, and feature extraction is performed on the preprocessed first sample data set through the CNN model to obtain the feature vectors of the preprocessed first sample data set.
Step 106: train the SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
As an example, the operation of training the SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image includes: solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function indicates that the margin between first sample images of different categories among the first sample images of the multiple categories is maximized.
The basic idea of the support vector machine algorithm is to first transform the input sample space into a high-dimensional space and then find, in this new high-dimensional feature space, the optimal classification hyperplane that maximizes the margin between sample points of different categories. This classification hyperplane is the maximum-margin hyperplane.
As an example, suppose the training set of the SVM model to be trained is $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}$, with $x_i\in\mathbb{R}^d$ and $y_i\in\{-1,1\}$, where $x$ denotes the feature vector of a first sample image and $y$ denotes its category label information. The objective function of the SVM model to be trained may be:
$$\min_{\omega,b}\ \frac{1}{2}\lVert\omega\rVert^{2} \qquad (1)$$
where $\omega$ is the normal vector, which determines the direction of the classification hyperplane, and $b$ is the offset, which determines the distance between the classification hyperplane and the origin.
As an example, the constraint of the SVM model to be trained is:
$$\text{s.t.}\quad y_i\left(\omega^{T}x_i+b\right)\geq 1,\quad i=1,\ldots,n \qquad (2)$$
Introducing the Lagrange multipliers $\alpha$ into the objective function gives:
$$L(\omega,b,\alpha)=\frac{1}{2}\lVert\omega\rVert^{2}-\sum_{i=1}^{n}\alpha_i\left[y_i\left(\omega^{T}x_i+b\right)-1\right] \qquad (3)$$
Solving the objective function, that is, differentiating the objective function and solving for the optimal classification surface, yields the classification function:
$$f(x)=\operatorname{sgn}\!\left(\sum_{i=1}^{n}\alpha_i y_i x_i^{T}x+b\right) \qquad (4)$$
Assuming the data is linearly separable, a hyperplane can be found that completely separates the data of different categories. As an example, the SVM model to be trained may also be an SVM model based on a kernel function; for linearly inseparable problems, the SVM model to be trained may solve the classification function with the help of a kernel function. The kernel function is itself a special similarity measure function, and different kernel functions represent different similarity measures.
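The following sketch is one possible realization with scikit-learn (not prescribed by the present application): an SVM is fitted on CNN feature vectors; with a linear kernel the learned hyperplane parameters ω and b are exposed directly, and a kernel such as RBF can be substituted for linearly inseparable data. The data and the regularization constant are placeholders.
```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4096))             # placeholder 4096-dim CNN feature vectors
labels = rng.integers(0, 2, size=200)               # category label information (binary here)

# Linear case: solving the max-margin objective yields the hyperplane parameters.
linear_svm = svm.SVC(kernel="linear", C=1.0).fit(features, labels)
omega, b = linear_svm.coef_, linear_svm.intercept_  # normal vector and offset of the hyperplane

# Linearly inseparable case: a kernel function (e.g. RBF) can be used instead.
kernel_svm = svm.SVC(kernel="rbf", C=1.0).fit(features, labels)
```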
After the training of the CNN model and the SVM model is completed, the test set may also be taken as input of the CNN model to extract the feature vector of each image in the test set, and the feature vector of each image in the test set is then taken as input of the SVM model for testing, to verify the accuracy of the models. If the CNN model and the SVM model pass the verification, the training is completed; if they do not pass the verification, the CNN model and the SVM model continue to be trained on the training set until the trained models pass the verification.
Please refer to FIG. 3, which is a flowchart of a similar image detection process provided by an embodiment of the present application. As shown in FIG. 3, a sample data set may first be constructed, the sample images in the sample data set are then preprocessed, and the preprocessed sample data set is divided into a training set and a test set in proportion; the training set is used to train the CNN model and the SVM model, and the test set is used to verify the accuracy of the trained CNN model and SVM model. Then, based on the training set and the test set, an image similarity feature extraction model based on VGG-16, namely the CNN model, is constructed, and the SVM model is used to classify the feature vectors extracted by the CNN model, completing image similarity detection.
As an example, the embodiments of the present application may use TensorFlow (an open-source software library) to define the convolutional neural network model. With TensorFlow's flexible architecture, users can easily deploy computation to multiple platforms and devices. The platforms include CPU (Central Processing Unit), GPU (Graphics Processing Unit), and TPU (Tensor Processing Unit), and the devices may include desktop devices, server clusters, mobile devices, and edge devices. Exemplarily, desktop devices include desktop computers; a server cluster includes one or more servers; mobile devices include mobile phones, tablet computers, and smart wearable devices; edge devices refer to switches, routers, routing switches, IADs (Integrated Access Devices), and various MAN (Metropolitan Area Network)/WAN (Wide Area Network) devices installed on edge networks.
After the training of the CNN model and the SVM model is completed, similar image detection can be performed based on the trained CNN model and SVM model. Next, the similar image detection process of the embodiments of the present application is described in detail.
FIG. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application. The method is applied to a similar image detection apparatus. As shown in FIG. 4, the method includes the following steps:
Step 401: acquire multiple images to be detected.
The multiple images to be detected may be uploaded by a user, obtained from the storage space of the apparatus, sent by another device, or obtained from a network; the manner of acquiring the multiple images to be detected is not limited in the embodiments of the present application.
As an example, the images stored in the album of a terminal may be acquired and used as the multiple images to be detected.
Step 402: take the multiple images as input of the CNN model, and perform feature extraction on the multiple images through the CNN model to obtain the feature vectors of the multiple images.
As an example, the CNN model is a VGGNet model, for example, the VGGNet-16 model or the VGGNet-19 model.
As an example, before the multiple images are taken as input of the CNN model for feature extraction, the multiple images may first be preprocessed, and the preprocessed multiple images are then taken as input of the CNN model for feature extraction through the CNN model. The preprocessing includes at least one of format adjustment, deduplication, and data augmentation.
Step 403: take the feature vectors of the multiple images as input of the SVM model, and perform similarity measurement on the feature vectors of the multiple images through the SVM model to obtain the similarity metric values of the multiple images.
As an example, the classification function value of each image may be determined through the classification function of the SVM model, and the classification function value of each image is taken as the similarity metric value of the image.
The classification function characterizes the classification hyperplane used to classify images, and the classification function value of each image characterizes the distance between the image and the classification hyperplane. The closer the classification function values of any two of the multiple images are, the closer the distance between the two images and the higher the similarity between them.
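As a hedged illustration of this step with scikit-learn (mirroring the earlier training sketch), decision_function returns each image's signed distance to the classification hyperplane, which is used here as its similarity metric value. A binary setting is shown so that each image gets a single scalar value; all data and parameters are illustrative placeholders.
```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 4096))       # placeholder CNN feature vectors
train_labels = rng.integers(0, 2, size=200)         # binary labels give a single hyperplane
query_features = rng.normal(size=(10, 4096))        # feature vectors of the images to detect

classifier = svm.SVC(kernel="rbf", C=1.0)
classifier.fit(train_features, train_labels)

# Signed distance of each image to the classification hyperplane,
# taken here as the similarity metric value of the image.
similarity_values = classifier.decision_function(query_features)
```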
It should be noted that, in the similarity measurement process, the quality of the selected image features has a very important influence on the similarity measurement result; therefore, both the quality of image feature extraction and the choice of the measurement algorithm affect the image similarity detection result. In the embodiments of the present application, the feature vectors of images are combined with a traditional pattern recognition algorithm, and whether images are similar is judged by measuring the similarity between image features, completing the detection task. By using deep learning to extract the shallow and high-level features in images, the information in the images is used effectively and the detection accuracy is improved.
Step 404: perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
As an example, at least two of the multiple images whose similarity metric values fall within the same metric value range are determined as similar images, and the similar images are determined as images of the same category.
That is, the image features of the multiple images may be taken as input of the SVM model, and the multiple images are classified according to their image features through the SVM model; the images classified into the same class are similar images.
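A small sketch of this grouping step, assuming each image already has a scalar similarity metric value: images whose values fall into the same range (bin) are reported as similar images. The bin width is an illustrative parameter, not a value given by the present application.
```python
from collections import defaultdict

def group_similar(similarity_values, bin_width=0.5):
    """Group image indices whose similarity metric values fall within the same range."""
    groups = defaultdict(list)
    for idx, value in enumerate(similarity_values):
        groups[int(value // bin_width)].append(idx)
    # Only ranges containing at least two images are reported as similar images.
    return [members for members in groups.values() if len(members) >= 2]

print(group_similar([0.1, 0.2, 1.7, 1.9, 3.4]))     # [[0, 1], [2, 3]]
```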
As an example, after the similar images are determined as images of the same category, the images of the same category may be stored in correspondence with their category labels.
Please refer to FIG. 5, which is a flowchart of another similar image detection method provided by an embodiment of the present application. As shown in FIG. 5, the images to be detected may first be acquired and preprocessed, feature extraction is then performed on the preprocessed images through the CNN model, similarity measurement is performed on the extracted feature vectors through the SVM model, and similar image detection is performed based on the similarity measurement results.
In the embodiments of the present application, for multiple images to be detected, the multiple images are first input into the CNN model, and feature extraction is performed through the CNN model, which can effectively extract the shallow and deep features in the images, making effective use of the information in the images and improving detection accuracy. The extracted feature vectors are then input into the SVM model, which performs similarity measurement on the feature vector of each image; whether images are similar can be judged according to the similarity between their feature vectors, which further improves detection accuracy while reducing the amount of computation and improving detection efficiency.
In addition, the embodiments of the present application introduce deep learning into image similarity recognition and detection: a convolutional neural network is constructed to mine and learn the information in sample images and to obtain deep representations of different sample images, and a traditional pattern recognition algorithm then measures the similarity between features, which solves the problem of recognizing and detecting large numbers of similar images. Furthermore, the embodiments of the present application transform the image similarity recognition and detection problem into an image classification problem, use the deep learning model as a feature extractor, and combine it with a traditional pattern recognition algorithm to measure the similarity between features, thereby avoiding the inaccuracy introduced by computing hash means. Combining deep learning with traditional pattern recognition not only makes effective use of the transfer learning and automatic feature learning capabilities of deep convolutional neural networks, but also combines deep representations with traditional pattern recognition algorithms, further improving the accuracy of similar image recognition and detection.
FIG. 6 is a block diagram of a similar image detection apparatus provided by an embodiment of the present application. As shown in FIG. 6, the apparatus includes a first acquisition module 601, a first extraction module 602, a measurement module 603, and a detection module 604.
The first acquisition module 601 is configured to acquire multiple images to be detected;
the first extraction module 602 is configured to take the multiple images as input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
the measurement module 603 is configured to take the feature vectors of the multiple images as input of an SVM model, and perform similarity measurement on the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
the detection module 604 is configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
Optionally, the CNN model is a Visual Geometry Group network (VGGNet) model, and the VGGNet model is obtained by training on a sample data set, the sample data set including sample images of multiple categories and category label information of each sample image.
Optionally, the measurement module 603 is configured to:
determine a classification function value of each image through the classification function of the SVM model, and take the classification function value of each image as the similarity metric value of the image; wherein the classification function characterizes the classification hyperplane used to classify images, and the classification function value of each image characterizes the distance between the image and the classification hyperplane.
Optionally, the detection module 604 is configured to:
determine at least two of the multiple images whose similarity metric values fall within the same metric value range as similar images;
determine the similar images as images of the same category.
Optionally, the apparatus further includes:
a second acquisition module (not shown in the figure), configured to acquire a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
a second extraction module (not shown in the figure), configured to take the first sample images of the multiple categories as input of the CNN model, and perform feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
a first training module (not shown in the figure), configured to train an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
Optionally, the first training module is configured to:
solve an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function indicates that the margin between first sample images of different categories among the first sample images of the multiple categories is maximized.
Optionally, the apparatus further includes:
a preprocessing module (not shown in the figure), configured to preprocess the first sample data set to obtain a preprocessed first sample data set;
wherein the preprocessed first sample data set is used as input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data augmentation.
Optionally, the apparatus further includes:
a second training module, configured to pre-train a CNN model to be trained according to a second sample data set to obtain an initialized CNN model, the second sample data set including second sample images of multiple categories and category label information of each second sample image;
a third training module, configured to train the initialized CNN model according to a third sample data set to obtain the CNN model, the third sample data set including third sample images of multiple categories and category label information of each third sample image.
Optionally, the third sample data set includes a first sub-sample data set and a second sub-sample data set, the first sub-sample data set includes M third sample images belonging to S categories, the second sub-sample data set includes N third sample images belonging to T categories, the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
In the embodiments of the present application, for multiple images to be detected, the multiple images are first input into the CNN model, and feature extraction is performed through the CNN model, which can effectively extract the shallow and deep features in the images, making effective use of the information in the images and improving detection accuracy. The extracted feature vectors are then input into the SVM model, which performs similarity measurement on the feature vector of each image; whether images are similar can be judged according to the similarity between their feature vectors, which further improves detection accuracy while reducing the amount of computation and improving detection efficiency.
It should be noted that, when the similar image detection apparatus provided in the above embodiments performs similar image detection, the division into the above functional modules is used only as an example for description. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the similar image detection apparatus provided in the above embodiments belongs to the same concept as the similar image detection method embodiments; its specific implementation process is detailed in the method embodiments and is not repeated here.
FIG. 7 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application. The electronic device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, or a computer. The electronic device may differ greatly depending on configuration or performance, and may include one or more processors 701 and one or more memories 702, where the memory 702 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 701 to implement the similar image detection methods provided by the foregoing method embodiments. Of course, the electronic device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may also include other components for implementing device functions, which are not described in detail here.
An embodiment of the present application further provides a computer-readable storage medium storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the similar image detection method described in each of the above embodiments.
An embodiment of the present application further provides a computer program product storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the similar image detection method described in each of the above embodiments.
Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the embodiments of the present application may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The above are only optional embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (20)

  1. A similar image detection method, characterized in that the method comprises:
    acquiring multiple images to be detected;
    taking the multiple images as input of a convolutional neural network (CNN) model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
    taking the feature vectors of the multiple images as input of a support vector machine (SVM) model, and performing similarity measurement on the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
    performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
  2. The method according to claim 1, characterized in that the CNN model is a Visual Geometry Group network (VGGNet) model, and the VGGNet model is obtained by training on a sample data set, the sample data set including sample images of multiple categories and category label information of each sample image.
  3. The method according to claim 1, characterized in that the performing similarity measurement on the feature vectors of the multiple images through the SVM model comprises:
    determining a classification function value of each image through the classification function of the SVM model, and taking the classification function value of each image as the similarity metric value of the image; wherein the classification function characterizes the classification hyperplane used to classify images, and the classification function value of each image characterizes the distance between the image and the classification hyperplane.
  4. The method according to claim 1, characterized in that the performing similar image detection on the multiple images based on the similarity metric values of the multiple images comprises:
    determining at least two of the multiple images whose similarity metric values fall within the same metric value range as similar images;
    determining the similar images as images of the same category.
  5. The method according to any one of claims 1-4, characterized in that, before the performing similarity measurement on the feature vectors of the multiple images through the SVM model, the method further comprises:
    acquiring a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
    taking the first sample images of the multiple categories as input of the CNN model, and performing feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
    training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
  6. The method according to claim 5, characterized in that the training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model, comprises:
    solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function indicates that the margin between first sample images of different categories among the first sample images of the multiple categories is maximized.
  7. The method according to claim 5, characterized in that, after the acquiring a first sample data set, the method further comprises:
    preprocessing the first sample data set to obtain a preprocessed first sample data set;
    wherein the preprocessed first sample data set is used as input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data augmentation.
  8. The method according to any one of claims 1-4, characterized in that, before the taking the multiple images as input of a convolutional neural network (CNN) model and performing feature extraction on the multiple images through the CNN model, the method further comprises:
    pre-training a CNN model to be trained according to a second sample data set to obtain an initialized CNN model, the second sample data set including second sample images of multiple categories and category label information of each second sample image;
    training the initialized CNN model according to a third sample data set to obtain the CNN model, the third sample data set including third sample images of multiple categories and category label information of each third sample image.
  9. The method according to claim 8, characterized in that the third sample data set includes a first sub-sample data set and a second sub-sample data set, the first sub-sample data set includes M third sample images belonging to S categories, the second sub-sample data set includes N third sample images belonging to T categories, the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
  10. A similar image detection apparatus, wherein the apparatus comprises:
    a first acquisition module, configured to acquire multiple images to be detected;
    a first extraction module, configured to use the multiple images as the input of a CNN model and to perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
    a measurement module, configured to use the feature vectors of the multiple images as the input of an SVM model and to perform similarity measurement on the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
    a detection module, configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
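The four modules of claim 10 map naturally onto a small wrapper around a feature extractor and a trained SVM. The class below is only an illustration of that structure, not the claimed apparatus; cnn_model, svm_model and preprocess are assumed to be objects such as those sketched after claims 5 and 8.

    # Illustrative module layout mirroring claim 10; not the actual apparatus.
    from pathlib import Path
    from PIL import Image
    import torch

    class SimilarImageDetector:
        def __init__(self, cnn_model, svm_model, preprocess, device="cpu"):
            self.cnn = cnn_model.eval().to(device)   # first extraction module's CNN
            self.svm = svm_model                     # measurement module's SVM
            self.preprocess = preprocess             # tensor-producing transform
            self.device = device

        def acquire(self, folder):
            """First acquisition module: load the images to be detected."""
            return [Image.open(p).convert("RGB") for p in Path(folder).glob("*.jpg")]

        def extract(self, images):
            """First extraction module: CNN feature vectors for each image."""
            batch = torch.stack([self.preprocess(im) for im in images]).to(self.device)
            with torch.no_grad():
                return self.cnn(batch).cpu().numpy()

        def measure(self, feature_vectors):
            """Measurement module: similarity metric values from the SVM."""
            return self.svm.decision_function(feature_vectors)

        def detect(self, folder):
            """Detection module: pair each image with its similarity metric value."""
            images = self.acquire(folder)
            scores = self.measure(self.extract(images))
            return list(zip(images, scores))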
  11. The apparatus according to claim 10, wherein the CNN model is a Visual Geometry Group network (VGGNet) model, and the VGGNet model is obtained by training based on a sample data set, the sample data set comprising sample images of multiple categories and category label information of each sample image.
  12. The apparatus according to claim 10, wherein the performing similarity measurement on the feature vectors of the multiple images through the SVM model comprises:
    determining a classification function value of each image through a classification function of the SVM model, and using the classification function value of each image as the similarity metric value of the image; wherein the classification function is used to characterize a classification hyperplane for classifying images, and the classification function value of each image is used to characterize the distance between the image and the classification hyperplane.
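Claim 12 takes the value of the SVM classification function, that is, the signed quantity w·x + b whose magnitude reflects how far a feature vector lies from the classification hyperplane, as the similarity metric value. In scikit-learn this quantity is returned by decision_function; the snippet below verifies the correspondence on synthetic data and assumes a binary LinearSVC for simplicity.

    # Illustration: the SVM classification function value as a similarity metric.
    # A binary LinearSVC on synthetic features is assumed here for simplicity.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 4096)), rng.normal(3, 1, (50, 4096))])
    y = np.array([0] * 50 + [1] * 50)

    svm = LinearSVC(max_iter=10000).fit(X, y)

    feats = X[:3]                                  # feature vectors of 3 images
    values = svm.decision_function(feats)          # classification function values
    manual = feats @ svm.coef_.ravel() + svm.intercept_[0]   # w·x + b

    # decision_function returns w·x + b, whose magnitude is proportional to the
    # (unnormalised) distance between each image and the classification hyperplane.
    assert np.allclose(values, manual)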
  13. The apparatus according to claim 10, wherein the performing similar image detection on the multiple images based on the similarity metric values of the multiple images comprises:
    determining at least two images, among the multiple images, whose similarity metric values fall within the same metric value range as similar images;
    determining the similar images as images of the same category.
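Claim 13 treats images whose similarity metric values fall within the same metric value range as similar images of one category. A minimal bucketing sketch follows; the bin width of 0.5 is an arbitrary assumption.

    # Illustrative grouping of images by similarity metric value ranges.
    from collections import defaultdict

    def group_by_metric_range(metric_values, bin_width=0.5):
        """Images whose metric values land in the same bin are treated as
        similar images, i.e. images of the same category (per claim 13)."""
        groups = defaultdict(list)
        for image_index, value in enumerate(metric_values):
            bucket = int(value // bin_width)      # metric value range identifier
            groups[bucket].append(image_index)
        # Only ranges containing at least two images yield similar images.
        return {b: idxs for b, idxs in groups.items() if len(idxs) >= 2}

    # Example: indices 0 and 1 fall in the same range and are reported as similar.
    print(group_by_metric_range([0.6, 0.9, 3.4, -2.1]))   # -> {1: [0, 1]}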
  14. The apparatus according to any one of claims 10 to 13, wherein before the performing similarity measurement on the feature vectors of the multiple images through the SVM model, the apparatus is further configured to:
    acquire a first sample data set, the first sample data set comprising first sample images of multiple categories and category label information of each first sample image;
    use the first sample images of the multiple categories as the input of the CNN model, and perform feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
    train an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
  15. The apparatus according to claim 14, wherein the training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image to obtain the SVM model comprises:
    solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain a classification function of the SVM model; wherein the objective function is used to indicate that the margin between first sample images of different categories, among the first sample images of the multiple categories, is maximized.
  16. The apparatus according to claim 14, wherein after the acquiring the first sample data set, the apparatus is further configured to:
    preprocess the first sample data set to obtain a preprocessed first sample data set;
    wherein the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing comprises at least one of the following: format adjustment, deduplication, and data augmentation.
  17. The apparatus according to any one of claims 10 to 13, wherein before the using the multiple images as an input of a convolutional neural network (CNN) model and performing feature extraction on the multiple images through the CNN model, the apparatus is further configured to:
    pre-train a CNN model to be trained according to a second sample data set to obtain an initialized CNN model, the second sample data set comprising second sample images of multiple categories and category label information of each second sample image;
    train the initialized CNN model according to a third sample data set to obtain the CNN model, the third sample data set comprising third sample images of multiple categories and category label information of each third sample image.
  18. The apparatus according to claim 17, wherein the third sample data set comprises a first sub-sample data set and a second sub-sample data set, the first sub-sample data set comprises M third sample images belonging to S categories, the second sub-sample data set comprises N third sample images belonging to T categories, the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
  19. An electronic device, wherein the electronic device comprises a processor and a memory; the memory stores at least one instruction, and the at least one instruction is configured to be executed by the processor to implement the similar image detection method according to any one of claims 1 to 9.
  20. A computer-readable storage medium, wherein the storage medium stores at least one instruction, and the at least one instruction is configured to be executed by a processor to implement the similar image detection method according to any one of claims 1 to 9.
PCT/CN2020/138510 2019-12-30 2020-12-23 Similar image detection method and apparatus, device and storage medium WO2021136027A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911390241.0 2019-12-30
CN201911390241.0A CN111222548A (en) 2019-12-30 2019-12-30 Similar image detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021136027A1 (en)

Family

ID=70827972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/138510 WO2021136027A1 (en) 2019-12-30 2020-12-23 Similar image detection method and apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN111222548A (en)
WO (1) WO2021136027A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222548A (en) * 2019-12-30 2020-06-02 Oppo广东移动通信有限公司 Similar image detection method, device, equipment and storage medium
CN111870279B (en) * 2020-07-31 2022-01-28 西安电子科技大学 Method, system and application for segmenting left ventricular myocardium of ultrasonic image
CN112749765A (en) * 2021-02-01 2021-05-04 深圳无域科技技术有限公司 Picture scene classification method, system, device and computer readable medium
CN113297411B (en) * 2021-07-26 2021-11-09 深圳市信润富联数字科技有限公司 Method, device and equipment for measuring similarity of wheel-shaped atlas and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467564B (en) * 2010-11-12 2013-06-05 中国科学院烟台海岸带研究所 Remote sensing image retrieval method based on improved support vector machine relevance feedback
CN109165682B (en) * 2018-08-10 2020-06-16 中国地质大学(武汉) Remote sensing image scene classification method integrating depth features and saliency features
CN110298376B (en) * 2019-05-16 2022-07-01 西安电子科技大学 Bank bill image classification method based on improved B-CNN

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190183429A1 (en) * 2016-03-24 2019-06-20 The Regents Of The University Of California Deep-learning-based cancer classification using a hierarchical classification framework
CN107563319A (en) * 2017-08-24 2018-01-09 西安交通大学 Face similarity measurement computational methods between a kind of parent-offspring based on image
CN109359551A (en) * 2018-09-21 2019-02-19 深圳市璇玑实验室有限公司 A kind of nude picture detection method and system based on machine learning
CN109784366A (en) * 2018-12-07 2019-05-21 北京飞搜科技有限公司 The fine grit classification method, apparatus and electronic equipment of target object
CN111222548A (en) * 2019-12-30 2020-06-02 Oppo广东移动通信有限公司 Similar image detection method, device, equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780284A (en) * 2021-09-17 2021-12-10 焦点科技股份有限公司 Logo detection method based on target detection and metric learning
CN113780284B (en) * 2021-09-17 2024-04-19 焦点科技股份有限公司 Logo detection method based on target detection and metric learning
CN114445661A (en) * 2022-01-24 2022-05-06 电子科技大学 Embedded image identification method based on edge calculation
CN114445661B (en) * 2022-01-24 2023-08-18 电子科技大学 Embedded image recognition method based on edge calculation
CN114998192A (en) * 2022-04-19 2022-09-02 深圳格芯集成电路装备有限公司 Defect detection method, device and equipment based on deep learning and storage medium
CN114998956A (en) * 2022-05-07 2022-09-02 北京科技大学 Small sample image data expansion method and device based on intra-class difference

Also Published As

Publication number Publication date
CN111222548A (en) 2020-06-02

Similar Documents

Publication Publication Date Title
WO2021136027A1 (en) Similar image detection method and apparatus, device and storage medium
WO2021203863A1 (en) Artificial intelligence-based object detection method and apparatus, device, and storage medium
WO2019128646A1 (en) Face detection method, method and device for training parameters of convolutional neural network, and medium
WO2020155518A1 (en) Object detection method and device, computer device and storage medium
US8792722B2 (en) Hand gesture detection
US8750573B2 (en) Hand gesture detection
WO2022033095A1 (en) Text region positioning method and apparatus
WO2017096753A1 (en) Facial key point tracking method, terminal, and nonvolatile computer readable storage medium
WO2016150240A1 (en) Identity authentication method and apparatus
EP4099217A1 (en) Image processing model training method and apparatus, device, and storage medium
US11875512B2 (en) Attributionally robust training for weakly supervised localization and segmentation
WO2021238548A1 (en) Region recognition method, apparatus and device, and readable storage medium
CN105550641B (en) Age estimation method and system based on multi-scale linear differential texture features
US11830233B2 (en) Systems and methods for stamp detection and classification
US10423817B2 (en) Latent fingerprint ridge flow map improvement
US20240153240A1 (en) Image processing method, apparatus, computing device, and medium
WO2017214970A1 (en) Building convolutional neural network
CN113221918B (en) Target detection method, training method and device of target detection model
CN111914908A (en) Image recognition model training method, image recognition method and related equipment
CN111598149B (en) Loop detection method based on attention mechanism
WO2023123923A1 (en) Human body weight identification method, human body weight identification device, computer device, and medium
WO2022127333A1 (en) Training method and apparatus for image segmentation model, image segmentation method and apparatus, and device
CN111444807A (en) Target detection method, device, electronic equipment and computer readable medium
CN106709490B (en) Character recognition method and device
US9104450B2 (en) Graphical user interface component classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20909458

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20909458

Country of ref document: EP

Kind code of ref document: A1