CN111222548A - Similar image detection method, device, equipment and storage medium - Google Patents


Info

Publication number: CN111222548A
Application number: CN201911390241.0A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 孙莹莹
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Prior art keywords: images, sample, model, image, data set
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201911390241.0A
Publication of CN111222548A
Related PCT application: PCT/CN2020/138510 (WO2021136027A1)

Classifications

    • G06F18/22 Pattern recognition: matching criteria, e.g. proximity measures
    • G06F18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/08 Neural networks: learning methods
    • G06V10/40 Extraction of image or video features


Abstract

The embodiment of the present application discloses a method, apparatus, device, and storage medium for detecting similar images, belonging to the field of image processing. The method comprises the following steps: taking a plurality of images to be detected as the input of a CNN model and performing feature extraction on the plurality of images through the CNN model to obtain feature vectors of the plurality of images; taking the feature vectors of the plurality of images as the input of an SVM model and performing similarity measurement on those feature vectors through the SVM model to obtain similarity metric values of the plurality of images; and performing similar image detection on the plurality of images based on the similarity metric values. The method can effectively extract both the shallow and deep features of an image, judge whether images are similar according to the similarity between their feature vectors, and thereby improve the accuracy of image similarity detection.

Description

Similar image detection method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the field of image processing, in particular to a method, a device, equipment and a storage medium for detecting similar images.
Background
Similar image detection is a fundamental problem in computer vision. It aims to compare the similarity between images and judge whether the images are similar according to that similarity. Similar image detection can be applied in different task scenarios; for example, similar images can be detected in a mobile phone album using a similar image detection technique, and some of those similar images can then be deleted from the album, saving the phone's storage space.
In the related art, a hash algorithm may be used for similar image detection. Specifically, the two images to be detected are hashed by a hash algorithm to obtain their hash values, and the Hamming distance between the two hash values is then calculated. If the Hamming distance is smaller than a threshold, the two images are determined to be similar images; if the Hamming distance is greater than or equal to the threshold, the two images are determined not to be similar images.
However, when a hash algorithm is used to detect similar images, the images must be compressed by the hash algorithm, which can cause serious loss of image content information during compression. Moreover, when judging whether images are similar based on the Hamming distance between hash values, a short hash value may fail to distinguish image characteristics. These factors can lead to low accuracy in image similarity detection.
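For illustration only (this is not the patent's implementation, and the 8 x 8 hash size is an assumption), the hash-based related art described above can be sketched in NumPy with an average hash; the downscaling step is where image content information is lost:

```python
import numpy as np

def average_hash(image, hash_size=8):
    """Downscale to hash_size x hash_size and threshold at the mean pixel value."""
    h, w = image.shape
    # Crude nearest-neighbour downscale (a real implementation would resample properly)
    rows = np.arange(hash_size) * h // hash_size
    cols = np.arange(hash_size) * w // hash_size
    small = image[np.ix_(rows, cols)].astype(float)
    return (small > small.mean()).flatten()  # 64 boolean "bits"

def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(hash_a != hash_b))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))
noisy = np.clip(img + rng.integers(-5, 6, size=(64, 64)), 0, 255)

# A small Hamming distance suggests similarity; the threshold is chosen per task
d = hamming_distance(average_hash(img), average_hash(noisy))
print("Hamming distance:", d)
```

Because a 64 x 64 image is reduced to 64 bits here, most content information is discarded, which is exactly the weakness the embodiments of the present application aim to address.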
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for detecting similar images. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a method for detecting similar images, where the method includes:
acquiring a plurality of images to be detected;
taking the multiple images as the input of a Convolutional Neural Network (CNN) model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
taking the feature vectors of the plurality of images as input of an SVM (support vector machine) model, and performing similarity measurement on the feature vectors of the plurality of images through the SVM model to obtain similarity measurement values of the plurality of images;
and performing similar image detection on the plurality of images based on the similarity metric values of the plurality of images.
In another aspect, a similar image detecting apparatus is provided, the apparatus including:
the first acquisition module is used for acquiring a plurality of images to be detected;
the first extraction module is used for taking the multiple images as input of a CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
the measurement module is used for taking the feature vectors of the images as the input of an SVM model, and carrying out similarity measurement on the feature vectors of the images through the SVM model to obtain similarity measurement values of the images;
and the detection module is used for carrying out similar image detection on the plurality of images based on the similarity metric values of the plurality of images.
In another aspect, an electronic device is provided, the electronic device comprising a processor and a memory; the memory stores at least one instruction for execution by the processor to implement the similar image detection method described above.
In another aspect, a computer-readable storage medium is provided, which stores at least one instruction for execution by a processor to implement the similar image detection method described above.
In another aspect, a computer program product is provided, which stores at least one instruction for execution by a processor to implement the similar image detection method described above.
The technical scheme provided by the application can at least bring the following beneficial effects:
in the embodiment of the present application, the plurality of images to be detected are input into the CNN model and features are extracted through the CNN model. This effectively extracts both the shallow and deep features of the images and makes effective use of the information they contain, improving detection accuracy. The extracted feature vectors are then input into the SVM model, which performs similarity measurement on the feature vector of each image, so that whether images are similar can be judged according to the similarity between their feature vectors. This further improves detection accuracy while also reducing the amount of computation and improving detection efficiency.
Drawings
FIG. 1 is a flow chart of a model training method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a network structure of a VGG Net-16 model provided by an embodiment of the present application;
FIG. 3 is a flowchart of a similar image detection process provided in an embodiment of the present application;
fig. 4 is a flowchart of a similar image detection method provided in an embodiment of the present application;
FIG. 5 is a flowchart of another similar image detection method provided in the embodiments of the present application;
fig. 6 is a block diagram of a similar image detection apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference herein to "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Before describing the similar image detection method provided in the embodiment of the present application in detail, an application scenario of the embodiment of the present application is introduced.
The similar image detection method provided by the embodiment of the present application is applied to the field of computer vision, in particular to scenarios where the similarity of images is measured in order to detect similar images among multiple images. Similar image detection is a very important basic problem in the field of computer vision, and the results of many downstream tasks depend on the quality of the similarity measurement.
As an example, the similar image detection method provided by the embodiment of the present application may be applied to an album application: similar images are detected in the album using the method, and those similar images are then recommended to the user for deletion and cleanup, helping the user better manage the album. Of course, the similar image detection method may also be applied to other scenes, which is not limited in the embodiments of the present application.
Next, an implementation environment related to the embodiments of the present application will be described.
The similar image detection method provided by the embodiment of the present application can be applied in a similar image detection apparatus. The apparatus can be an electronic device such as a terminal or a server; the terminal can be a mobile phone, a tablet computer, or a computer, and the server can be the background server of an application.
The similar image detection method provided in the embodiment of the present application is based on deep learning: a CNN (Convolutional Neural Network) model is used for feature extraction during detection, and an SVM (Support Vector Machine) model is used for similarity measurement. For ease of understanding, the training methods for the CNN model and the SVM model are introduced first.
Fig. 1 is a flowchart of a model training method provided in an embodiment of the present application, where the method is applied to a similar image detection apparatus, which may be an electronic device such as a terminal or a server, as shown in fig. 1, and the method includes the following steps:
step 101: and acquiring a second sample data set and a third sample data set, wherein the second sample data set comprises second sample images of various categories and category label information of each second sample image, and the third sample data set comprises third sample images of various categories and category label information of each third sample image.
And the second sample data set and the third sample data set are sample data sets which are acquired in advance and can meet the requirement of model training. The category label information is used to indicate the category of the corresponding image. As one example, the categories of the second sample images of the plurality of categories and the third sample images of the plurality of categories are different.
As one example, the second and third sample data sets are network image data sets. For example, the second sample data set is the ImageNet (image network) data set, and the third sample data set is a trademark image data set. The ImageNet data set contains more than 14 million images in over 20,000 categories; it is a commonly used data set at present and supports research work such as image classification, object localization, and object detection. The trademark image data set includes trademark images of various categories.
As an example, the third sample data set comprises two different subsample data sets, e.g. the third sample data set comprises a first subsample data set and a second subsample data set. The first subsample data set comprises M third sample images, the M third sample images belong to S categories, the second subsample data set comprises N third sample images, the N third sample images belong to T categories, and the M third sample images are different from the N third sample images. Wherein M and N are both positive integers, and S and T are also positive integers.
As an example, if the third sample data set is a trademark image data set, the first sub-sample data set may be the Logo-405 data set and the second sub-sample data set may be the FlickrLogo-32 data set. The Logo-405 data set was collected by a web crawler and comprises 32218 images covering 405 trademarks, including those of major luxury brands; the number of images per trademark ranges from dozens to about one hundred, and each trademark image is about 300 x 500 pixels. The FlickrLogo-32 data set is derived from data published on the web and contains 32 brand classes, including the logos of major Internet companies, for a total of 8240 images.
As an example, for the acquired second and third sample data sets, formatting and de-duplication may also be performed on the images in the two sample data sets.
As an example, for the acquired second sample data set and third sample data set, data enhancement processing may also be performed on the two sample data sets. Wherein the data enhancement processing includes at least one of scaling, adding noise, rotating, and normalizing processing.
For example, for the acquired second sample data set and third sample data set, the images in either sample data set may be scaled to ensure that the image dimensions within the data set are uniform. Because real images come in different scales while a training set must have a uniform scale, scaling can be used to adjust the images in a sample data set to a common size so that the model can learn from and recognize them.
For another example, noise may be randomly added to images in either sample data set. Real images often contain considerable noise, while the sample images in a data set are usually relatively clean. If training is performed directly on clean images, the model's robustness to noise is poor; during detection, the model may even misidentify an image to be detected that contains noisy pixels. To make the model more robust to noise, noise can therefore be randomly added to the training images.
For another example, for the acquired second sample data set and the third sample data set, some images may be selected from any sample data set, and then the selected images are rotated, and the rotated images are added to the sample data set, so as to increase the data volume of the sample data set. The selection method may be a random selection method or other preset selection methods, which is not limited in the embodiment of the present application. In addition, after the selected image is rotated, the image out of the display area may be cropped to keep the image scale uniform.
For another example, for the acquired second sample data set and the third sample data set, normalization processing may be performed on the sample image in any sample data set to remove redundant information. As an example, the normalization process includes: and normalizing the pixel value of the sample image from [0, 255] to [0, 1] to remove redundant information contained in the sample data to be trained, thereby further shortening the training time.
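The enhancement steps listed above (scaling, adding noise, rotation, and normalization) can be sketched with NumPy; the target scale of 224 and the noise level are illustrative assumptions, not values fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(42)

def rescale(image, size):
    """Nearest-neighbour rescale so all sample images share a uniform scale."""
    h, w = image.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[np.ix_(rows, cols)]

def add_noise(image, sigma=8.0):
    """Randomly add Gaussian noise to make the model more robust to noise."""
    noisy = image.astype(float) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255)

def rotate90(image):
    """Rotate a selected image by 90 degrees to augment the data set."""
    return np.rot90(image)

def normalize(image):
    """Normalize pixel values from [0, 255] to [0, 1] to remove redundancy."""
    return image.astype(float) / 255.0

sample = rng.integers(0, 256, size=(300, 500))  # stand-in for a ~300 x 500 trademark image
augmented = normalize(add_noise(rescale(sample, 224)))
print(augmented.shape)
```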
As an example, a fourth sample data set and a fifth sample data set may also be obtained, where the fourth sample data set and the fifth sample data set each include sample images of multiple categories and a category label of each sample image, and the sample images in the fourth sample data set and the fifth sample data set are different. For example, the fourth sample data set is an ImageNet data set, and the fifth sample data set is a trademark image data set.
Then, preprocessing the sample images in the fourth sample data set, dividing the preprocessed fourth sample data set into a training set and a test set, and taking the training set as a second sample data set. And preprocessing the sample images in the fifth sample data set, dividing the preprocessed fifth sample data set into a training set and a test set, and taking the training set as a third sample data set.
The training set is used for training the CNN model, and the testing set is used for testing the CNN model. The pre-processing includes at least one of formatting, de-duplication, and data enhancement processing including at least one of scaling, adding noise, rotating, and normalizing processing.
As an example, the preprocessed fourth sample data set may be proportionally divided into a training set and a test set, and the preprocessed fifth sample data set may be proportionally divided into a training set and a test set.
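A proportional division might be sketched as follows; the 8:2 ratio and the shuffling are assumptions, since the text above does not fix a ratio:

```python
import numpy as np

def split_dataset(samples, labels, train_ratio=0.8, seed=0):
    """Shuffle a sample data set and split it proportionally into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    cut = int(len(samples) * train_ratio)
    return (samples[idx[:cut]], labels[idx[:cut]],
            samples[idx[cut:]], labels[idx[cut:]])

# 100 dummy samples with 5 class labels
X = np.arange(100).reshape(100, 1)
y = np.arange(100) % 5
X_train, y_train, X_test, y_test = split_dataset(X, y)
print(len(X_train), len(X_test))  # → 80 20
```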
Step 102: and pre-training the CNN model to be trained according to the second sample data set to obtain an initialized CNN model.
As an example, the CNN model may be a VGGNet (Visual Geometry Group Network) model. The VGGNet model is a deep convolutional neural network model proposed by the Visual Geometry Group at the University of Oxford, and it can reduce the Top-5 error rate to 7.3%.
The VGGNet model mainly has the following characteristics: 1) all convolutional layers in the network structure use 3 x 3 convolution kernels; 2) two stacked 3 x 3 convolutional layers replace a traditional 5 x 5 convolutional layer, and three stacked 3 x 3 convolutional layers replace a traditional 7 x 7 convolutional layer, which increases the nonlinear expressive capability of the network; 3) stacked 3 x 3 convolution kernels have fewer parameters than a single large-size convolution kernel, reducing the parameters of the overall network.
As an example, the VGGNet model may be the 19-layer VGGNet-19 model or the 16-layer VGGNet-16 model. Referring to fig. 2, fig. 2 is a schematic diagram of the network structure of the VGGNet-16 model according to an embodiment of the present disclosure; as shown in fig. 2, the VGGNet-16 model includes 13 convolutional layers and 3 fully-connected layers.
As an example, apart from its larger number of layers, a distinguishing feature of the VGGNet-16 model is that all convolutional layers in the network structure use convolution kernels of the same size, 3 x 3, which is the smallest window able to capture information above, below, to the left, to the right, and at the center. Each 3 x 3 convolutional layer uses one pixel of padding, ensuring that the input and output sizes after convolution are consistent.
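The parameter saving from stacking 3 x 3 kernels (characteristic 3 above) can be checked with simple arithmetic; the channel count of 64 is only an illustrative assumption, and biases are ignored:

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Weight count of one convolutional layer, biases ignored."""
    return kernel_size * kernel_size * in_channels * out_channels

C = 64  # illustrative channel count
two_3x3 = 2 * conv_weights(3, C, C)    # same receptive field as one 5 x 5
one_5x5 = conv_weights(5, C, C)
three_3x3 = 3 * conv_weights(3, C, C)  # same receptive field as one 7 x 7
one_7x7 = conv_weights(7, C, C)
print(two_3x3, one_5x5)    # → 73728 102400
print(three_3x3, one_7x7)  # → 110592 200704
```

For C input and output channels, two 3 x 3 layers need 18C² weights versus 25C² for one 5 x 5, and three 3 x 3 layers need 27C² versus 49C² for one 7 x 7.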
By pre-training the CNN model to be trained according to the second sample data set, an initialized CNN model capable of classifying images in the second sample data set can be obtained.
The VGGNet model is obtained by training based on a sample data set, wherein the sample data set comprises sample images of various categories and category label information of each sample image. For example, the VGGNet model to be trained is pre-trained according to the second sample data set to obtain an initialized VGGNet model, and then the initialized VGGNet model is trained according to the third sample data set to obtain the VGGNet model.
As an example, the VGGNet-16 model may be trained on the ImageNet data set to obtain a deep learning classification model pre-trained on natural images, and this pre-trained model is then used as the initialization model for training the trademark image classification model in the next step.
Step 103: and training the initialized CNN model according to the third sample data set to obtain the CNN model.
The CNN model is used for extracting the features of any image to obtain the feature vector of the image.
After the CNN model to be trained is pre-trained according to the second sample data set to obtain an initialized CNN model, the initialized CNN model may be further trained according to a third sample data set to obtain a CNN model, and the CNN model is used as a feature extraction model for feature extraction of an image.
As an example, the VGG-16 model pre-trained on the ImageNet data set is used as the initialization model and trained on the trademark images, and the trained VGG-16 model is then used as the feature extractor.
As an example, during training, the training set may be input into the CNN model and iterated for a preset number of times. The preset number may be set in advance; for example, it may be 80, 90, or 100.
As an example, a gradient descent algorithm may be used to optimize the objective function during each iteration so that the model converges. The gradient descent algorithm may be the Adam (Adaptive Moment Estimation) algorithm, an efficient method that can increase the convergence speed of gradient descent. The batch sample size of the Adam algorithm may be set in advance; for example, the batch size may be set to 32.
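A minimal sketch of the Adam update on a toy quadratic objective, using the commonly published defaults for beta1, beta2, and epsilon (which the text above does not specify):

```python
import numpy as np

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: gradient descent with bias-corrected first/second moment estimates."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first moment (moving average of gradients)
    v = np.zeros_like(x)  # second moment (moving average of squared gradients)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the warm-up phase
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = ||x - 3||^2, whose gradient is 2(x - 3); the minimum is at x = 3
x_opt = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=[0.0, 10.0])
print(x_opt)
```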
Step 104: a first sample dataset is acquired that includes a plurality of categories of first sample images and category label information for each of the first sample images.
The first sample data set may be the second sample data set and/or the third sample data set, or it may be another sample data set different from both.
As an example, the first sample data set is the third sample data set described above; for example, the first sample data set is a trademark image data set. Illustratively, the first sample data set consists of the Logo-405 data set and the FlickrLogo-32 data set.
Step 105: and taking the first sample images of the multiple types as input of the CNN model, and performing feature extraction on the first sample images of the multiple types through the CNN model to obtain feature vectors of the first sample images of the multiple types.
The feature vector is used for characterizing features of the sample image in all dimensions. As one example, the feature vector may be a feature vector of dimension 4096.
As an example, the features of the trademark data sets Logo-405 and FlickrLogo-32 can each be extracted through the CNN model to obtain a feature vector for every image, finally yielding a deep representation of each trademark image after feature extraction by the VGG-16 model.
In the embodiment of the application, the CNN model is used for feature extraction, the CNN model can automatically learn the features of the image in the training process, manual design and artificial intervention for feature learning are not needed, and the feature extraction efficiency and accuracy are improved.
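The full VGG-16 extractor is too large to reproduce here, but the following NumPy sketch (the single layer, the hand-picked kernel, and all sizes are illustrative assumptions) shows the mechanics of turning an image into a fixed-length feature vector via convolution, ReLU, and pooling, which is what the CNN model learns to do automatically and at scale:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D 'valid' convolution (cross-correlation, as in CNN frameworks)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2."""
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    return x[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2).max(axis=(1, 3))

rng = np.random.default_rng(1)
image = rng.random((8, 8))  # tiny grayscale stand-in for an input image
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])  # one hand-picked 3 x 3 kernel

feature_map = np.maximum(conv2d_valid(image, vertical_edge), 0.0)  # conv + ReLU
feature_vector = max_pool_2x2(feature_map).flatten()  # fixed-length representation
print(feature_vector.shape)  # → (9,)
```

In the real model, many learned kernels across many layers produce the 4096-dimensional vector mentioned above rather than this 9-dimensional toy.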
Step 106: and training the SVM model to be trained according to the feature vectors of the first sample images of various types and the class mark information of each first sample image to obtain the SVM model.
As an example, the operation of training the SVM model to be trained according to the feature vectors of the plurality of types of first sample images and the class label information of each first sample image includes: solving the objective function according to the feature vectors of the first sample images of various categories and the category label information of each first sample image to obtain a classification function of the SVM model; wherein the objective function is used to indicate that an interval between different classes of first sample images among the multiple classes of first sample images is maximum.
The basic idea of the support vector machine algorithm is to first transform the input sample space into a high-dimensional space and then search the new high-dimensional feature space for an optimal classification hyperplane that maximizes the interval between sample points of different classes; this classification hyperplane is the maximum-interval hyperplane.
As an example, assume that the training set of the SVM model to be trained is D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, with x_i ∈ R^d and y_i ∈ {-1, 1}, where x represents the feature vector of a first sample image and y represents the class label information of the first sample image. The objective function of the SVM model to be trained may be:

min_{ω,b} (1/2)‖ω‖²    (1)

where ω is the normal vector, which determines the direction of the classification hyperplane, and b is the displacement, which determines the distance between the classification hyperplane and the origin.

As an example, the constraint condition of the SVM model to be trained is:

s.t. y_i(ω^T x_i + b) ≥ 1, i = 1, ..., n    (2)

Introducing Lagrange multipliers α_i ≥ 0 into the objective function, the resulting objective function is:

L(ω, b, α) = (1/2)‖ω‖² − Σ_{i=1}^{n} α_i [y_i(ω^T x_i + b) − 1]    (3)

Solving the objective function, that is, differentiating it and solving for the optimal classification surface, yields the classification function:

f(x) = sgn( Σ_{i=1}^{n} α_i y_i x_i^T x + b )    (4)
assuming that the data is linearly separable, a hyperplane can be found to completely separate the different classes of data. As an example, the SVM model to be trained may also be an SVM model based on a Kernel Function, and for a linear immiscibility problem, the SVM model to be trained may also solve a classification Function by means of a Kernel Function (Kernel Function). The kernel function is also a special similarity measure function, and different kernel functions represent different similarity measures.
After the CNN model and the SVM model are trained, the test set can be used as the input of the CNN model, the feature vector of each image in the test set is extracted, the feature vector of each image in the test set is used as the input of the SVM model for testing, and the accuracy of the model is verified. And if the CNN model and the SVM model pass the verification, finishing the training, and if the CNN model and the SVM model do not pass the verification, continuing to train the CNN model and the SVM model according to the training set until the trained models can pass the verification.
Referring to fig. 3, fig. 3 is a flow chart of a similar image detection process provided in an embodiment of the present application, and as shown in fig. 3, a sample data set may be first constructed, then sample images in the sample data set are preprocessed, and then the preprocessed sample data set is proportionally divided into a training set and a test set, where the training set is used for training a CNN model and an SVM model, and the test set is used for verifying the accuracy of the trained CNN model and SVM model. Then, constructing a VGG-16-based image similarity feature extraction model, namely a CNN model, based on the training set and the test set, and classifying feature vectors extracted by the CNN model by using an SVM model to complete image similarity detection.
As an example, embodiments of the present application may define the convolutional neural network model using TensorFlow (an open-source software library). With TensorFlow's flexible architecture, users can easily deploy computing jobs to a variety of platforms and devices. The platforms include the Central Processing Unit (CPU), Graphics Processing Unit (GPU), and Tensor Processing Unit (TPU), and the devices may include desktop machines, server clusters, mobile devices, edge devices, and so on.
After the training of the CNN model and the SVM model is completed, similar image detection can be performed according to the trained CNN model and SVM model. Next, a similar image detection process of the embodiment of the present application will be described in detail.
Fig. 4 is a flowchart of a similar image detection method provided in an embodiment of the present application, where the method is applied to a similar image detection apparatus, and as shown in fig. 4, the method includes the following steps:
step 401: and acquiring a plurality of images to be detected.
In the embodiments of the present application, the multiple images to be detected may be uploaded by a user, acquired from a storage space of the apparatus, sent by another device, or obtained from a network.
As an example, images stored in an album of the terminal may be acquired and regarded as the multiple images to be detected.
Step 402: and taking the multiple images as input of a CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images.
As one example, the CNN model is a VGGNet model. For example, the CNN model is VGGNet-16 or VGGNet-19.
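While VGGNet itself stacks many 3x3 convolution and pooling layers, the basic operations such a model applies can be illustrated with a minimal hand-rolled numpy sketch (the kernel, image size, and single-layer structure are purely illustrative and unrelated to the actual VGGNet weights):

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D convolution (really cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2(feature_map):
    """2x2 max pooling with stride 2 (odd trailing rows/columns are dropped)."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.random((8, 8))              # a tiny grayscale "image"
edge_kernel = np.array([[1.0, -1.0],    # a hand-picked edge-like filter
                        [1.0, -1.0]])

feature_map = np.maximum(conv2d_valid(image, edge_kernel), 0.0)  # conv + ReLU
feature_vector = max_pool2(feature_map).ravel()                  # pool + flatten

print(feature_vector.shape)
```

A real VGGNet repeats this conv/ReLU/pool pattern with learned multi-channel filters and ends in fully connected layers, but the flattened output plays the same role as the feature vector fed to the SVM here.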
As an example, before performing feature extraction using the multiple images as input of the CNN model, the multiple images may be preprocessed, and then the preprocessed multiple images may be used as input of the CNN model to perform feature extraction using the CNN model. Wherein the pre-processing includes at least one of format adjustment, deduplication, and data enhancement.
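Such preprocessing might be sketched as follows (nearest-neighbor resizing stands in for the format adjustment and byte-identity for the de-duplication criterion; the target size of 4 is a toy stand-in for a real input size such as 224):

```python
import numpy as np

TARGET = 4  # illustrative target side length (VGG-style pipelines commonly use 224)


def resize_nearest(img, size):
    """Nearest-neighbor resize of a 2-D array to size x size (format adjustment)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]


def preprocess(images):
    seen, out = set(), []
    for img in images:
        small = resize_nearest(img, TARGET).astype(np.float64)
        small = small / 255.0          # normalize pixel values to [0, 1]
        key = small.tobytes()          # de-duplicate exact repeats
        if key not in seen:
            seen.add(key)
            out.append(small)
    return out


rng = np.random.default_rng(1)
a = rng.integers(0, 256, (8, 6))
batch = preprocess([a, a, rng.integers(0, 256, (10, 10))])
print(len(batch))  # the duplicate of `a` collapses to one entry
```

Data enhancement (augmentation) would add transformed copies, such as flips or crops, at this same stage rather than removing entries.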
Step 403: and taking the feature vectors of the multiple images as the input of an SVM model, and carrying out similarity measurement on the feature vectors of the multiple images through the SVM model to obtain similarity measurement values of the multiple images.
As an example, a classification function value of each image may be determined by a classification function of the SVM model, and the classification function value of each image may be used as a similarity metric value of each image.
The classification function is used for representing a classification hyperplane for classifying the images, and the classification function value of each image is used for representing the distance between the image and the classification hyperplane. The closer the classification function values of any two of the multiple images are, the smaller the distance between the two images and the higher the similarity between them.
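As a toy numeric illustration of this distance interpretation (the hyperplane parameters `w` and `b` are made up for illustration, not learned from data):

```python
import numpy as np

# Hypothetical learned hyperplane parameters (w, b) of a linear SVM.
w = np.array([0.6, -0.8])
b = 0.1


def decision_value(feature):
    """Signed distance of a feature vector to the hyperplane w.x + b = 0
    (dividing by ||w|| makes the value an actual Euclidean distance)."""
    return (w @ feature + b) / np.linalg.norm(w)


f1 = decision_value(np.array([1.0, 0.2]))
f2 = decision_value(np.array([1.1, 0.25]))
f3 = decision_value(np.array([-2.0, 1.5]))

# Images whose decision values are close lie near each other relative to the
# hyperplane and are therefore treated as more similar.
print(abs(f1 - f2) < abs(f1 - f3))
```

Here the first two feature vectors produce nearly equal classification function values, so the corresponding images would be judged more similar to each other than to the third.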
It should be noted that, in similarity measurement, the quality of the selected image features has a very important effect on the measurement result; therefore, both the quality of image feature extraction and the choice of measurement algorithm influence the result of similar image detection. In the embodiments of the present application, the feature vectors of the images are combined with a traditional pattern recognition algorithm: whether images are similar is judged by measuring the similarity between image features to complete the detection task, while deep learning is used to extract both the shallow and the high-level features in the images, so that the information in the images is effectively utilized and the detection accuracy is improved.
Step 404: and carrying out similar image detection on the plurality of images based on the similarity metric values of the plurality of images.
As an example, at least two images of the plurality of images whose similarity metric values are within the same metric value range are determined as similar images, and the similar images are determined as images of the same category.
That is, the image features of the plurality of images may be input as an SVM model, and the plurality of images may be classified according to the image features of the plurality of images by the SVM model. Wherein the images classified into one category are similar images.
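The range-based grouping described above can be sketched as follows (the fixed bucket width standing in for "the same metric value range" is an illustrative assumption; the present application does not fix how the ranges are chosen):

```python
from collections import defaultdict

BUCKET_WIDTH = 0.5  # illustrative width of one "metric value range"


def group_by_metric(metrics):
    """Group image indices whose similarity metric values fall in the same range."""
    groups = defaultdict(list)
    for idx, value in enumerate(metrics):
        groups[int(value // BUCKET_WIDTH)].append(idx)
    # Each bucket with at least two members is a set of mutually similar
    # images, i.e. images of the same category.
    return [members for members in groups.values() if len(members) >= 2]


metrics = [0.54, 0.56, -2.3, 0.61]
print(group_by_metric(metrics))
```

Images 0, 1, and 3 share a metric value range and would be stored under one category label; image 2 falls in a range of its own and forms no similar pair.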
As an example, after the similar images are determined as the images of the same category, the images of the same category and the category labels may be respectively stored in correspondence.
Referring to fig. 5, fig. 5 is a flowchart of another similar image detection method provided in an embodiment of the present application. As shown in fig. 5, the images to be detected may first be obtained, the images to be detected are then preprocessed, feature extraction is performed on the preprocessed images by the CNN model, similarity measurement is performed on the extracted feature vectors by the SVM model, and similar image detection is performed based on the similarity measurement result.
In the embodiments of the present application, for multiple images to be detected, the multiple images are input into the CNN model and feature extraction is performed by the CNN model, so that shallow and deep features in the images can be effectively extracted, the information in the images can be effectively utilized, and the detection accuracy is improved. The extracted feature vectors are then input into the SVM model, and similarity measurement is performed on the feature vector of each image by the SVM model, so that whether images are similar can be judged according to the similarity between their feature vectors, which further improves the detection accuracy while reducing the amount of calculation and improving the detection efficiency.
In addition, deep learning is introduced into image similarity identification and detection: the information in sample images is mined and learned by constructing a convolutional neural network, different sample images are deeply characterized, and the similarity between features is then measured by a traditional pattern recognition algorithm, thereby solving the problem of identifying and detecting a large number of similar images. Furthermore, the image similarity identification and detection problem is converted into an image classification problem, the deep learning model is used as a feature extractor, and the similarity between features is measured in combination with a traditional pattern recognition algorithm, which avoids the inaccuracy caused by calculating a hash mean. By combining the deep learning method with traditional pattern recognition, the transfer learning and automatic feature learning capabilities of the deep convolutional neural network can be effectively utilized, and combining deep characterization with a traditional pattern recognition algorithm further improves the accuracy of similar image identification and detection.
Fig. 6 is a block diagram of a similar image detection apparatus provided in an embodiment of the present application, and as shown in fig. 6, the apparatus includes a first obtaining module 601, a first extracting module 602, a metric module 603, and a detection module 604.
The first obtaining module 601 is configured to obtain a plurality of images to be detected;
a first extraction module 602, configured to use the multiple images as input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
a measuring module 603, configured to use the feature vectors of the multiple images as an input of an SVM model, and perform similarity measurement on the feature vectors of the multiple images through the SVM model to obtain similarity measurement values of the multiple images;
a detecting module 604, configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
Optionally, the CNN model is a VGGNet model, and the VGGNet model is obtained by training based on a sample data set, where the sample data set includes multiple types of sample images and category label information of each sample image.
Optionally, the metric module 603 is configured to:
determining a classification function value of each image through a classification function of the SVM model, and taking the classification function value of each image as a similarity metric value of each image; the classification function is used for representing a classification hyperplane for classifying the images, and the classification function value of each image is used for representing the distance between each image and the classification hyperplane.
Optionally, the detecting module 604 is configured to:
determining at least two images with similarity metric values in the same metric value range in the multiple images as similar images;
and determining the similar images as the images of the same category.
Optionally, the apparatus further comprises:
a second obtaining module, configured to obtain a first sample data set, where the first sample data set includes first sample images of multiple categories and category label information of each first sample image;
the second extraction module is used for taking the first sample images of the multiple categories as the input of the CNN model, and performing feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
and the first training module is used for training the SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image to obtain the SVM model.
Optionally, the first training module is configured to:
solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image to obtain a classification function of the SVM model; wherein the objective function is used to indicate that an interval between different classes of first sample images among the multiple classes of first sample images is maximum.
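In standard notation, the objective function that maximizes the interval (margin) between the classes is the usual soft-margin SVM formulation (conventional textbook notation with slack variables and penalty parameter C; the present application does not spell out the formula):

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\qquad \text{s.t.}\quad y_{i}\left(w^{\top}x_{i}+b\right) \ge 1-\xi_{i},\quad \xi_{i}\ge 0,\quad i=1,\dots,n,
```

where each x_i is the feature vector of a first sample image and y_i its category label. Minimizing the squared norm of w maximizes the margin 2/||w||, and solving the problem yields the classification function f(x) = w^T x + b whose value serves as the similarity metric value above.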
Optionally, the apparatus further comprises:
the second training module is used for pre-training the CNN model to be trained according to a second sample data set to obtain an initialized CNN model, wherein the second sample data set comprises second sample images of various types and the type marking information of each second sample image;
and the third training module is used for training the initialized CNN model according to a third sample data set to obtain the CNN model, wherein the third sample data set comprises third sample images of various types and the type mark information of each third sample image.
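The pre-train-then-fine-tune idea can be sketched in miniature as follows (a toy logistic-regression "head" stands in for the CNN; the pre-trained weights, data, learning rate, and step count are illustrative assumptions, not values from the present application):

```python
import numpy as np

rng = np.random.default_rng(2)

# "Pre-trained" weights, standing in for a model trained on the second sample
# data set (the initialized CNN model).
pretrained_w = np.array([0.5, -0.5])

# Fine-tune on a toy third sample data set: features X with binary labels y.
X = rng.normal(size=(20, 2))
y = (X[:, 0] > X[:, 1]).astype(float)

w = pretrained_w.copy()   # initialize from the pre-trained model, then adapt
lr = 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))        # sigmoid predictions
    w -= lr * X.T @ (p - y) / len(y)          # gradient step on the log loss

accuracy = np.mean(((X @ w) > 0) == (y == 1))
print(accuracy)
```

Starting from pre-trained weights rather than a random initialization is what lets the second stage converge with the comparatively small third sample data set, which is the point of the two-stage training described above.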
Optionally, the third sample data set comprises a first subsample data set comprising M third sample images and belonging to S classes and a second subsample data set comprising N third sample images and belonging to T classes, the M third sample images being different from the N third sample images.
In the embodiments of the present application, for multiple images to be detected, the multiple images are input into the CNN model and feature extraction is performed by the CNN model, so that shallow and deep features in the images can be effectively extracted, the information in the images can be effectively utilized, and the detection accuracy is improved. The extracted feature vectors are then input into the SVM model, and similarity measurement is performed on the feature vector of each image by the SVM model, so that whether images are similar can be judged according to the similarity between their feature vectors, which further improves the detection accuracy while reducing the amount of calculation and improving the detection efficiency.
It should be noted that: in the similar image detection apparatus provided in the above embodiment, when performing similar image detection, only the division of the above functional modules is taken as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the above described functions. In addition, the similar image detection apparatus provided in the above embodiment and the similar image detection method embodiment belong to the same concept, and specific implementation processes thereof are described in the method embodiment and are not described herein again.
Fig. 7 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present disclosure. The electronic device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, or a computer. Electronic devices may vary considerably in configuration and performance, and the electronic device may include one or more processors 701 and one or more memories 702, where the memory 702 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 701 to implement the similar image detection method provided in each of the above method embodiments. Of course, the electronic device may further include components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may further include other components for implementing the functions of the device, which are not described herein again.
The embodiment of the present application further provides a computer-readable storage medium, where at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the similar image detection method according to the above embodiments.
The embodiment of the present application further provides a computer program product, where at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the similar image detection method according to the above embodiments.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. A method for detecting similar images, the method comprising:
acquiring a plurality of images to be detected;
taking the multiple images as the input of a Convolutional Neural Network (CNN) model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
taking the feature vectors of the plurality of images as input of an SVM (support vector machine) model, and performing similarity measurement on the feature vectors of the plurality of images through the SVM model to obtain similarity measurement values of the plurality of images;
and performing similar image detection on the plurality of images based on the similarity metric values of the plurality of images.
2. The method of claim 1, wherein the CNN model is a visual geometry group network (VGGNet) model, and the VGGNet model is trained based on a sample data set, the sample data set comprising multiple classes of sample images and class label information for each sample image.
3. The method of claim 1, wherein the performing similarity measurements on the feature vectors of the plurality of images by the SVM model comprises:
determining a classification function value of each image through a classification function of the SVM model, and taking the classification function value of each image as a similarity metric value of each image; the classification function is used for representing a classification hyperplane for classifying the images, and the classification function value of each image is used for representing the distance between each image and the classification hyperplane.
4. The method of claim 1, wherein the performing similar image detection on the plurality of images based on the similarity metric values of the plurality of images comprises:
determining at least two images with similarity metric values in the same metric value range in the multiple images as similar images;
and determining the similar images as the images of the same category.
5. The method according to any one of claims 1-4, wherein before performing the similarity measurement on the feature vectors of the plurality of images through the SVM model, the method further comprises:
obtaining a first sample data set, wherein the first sample data set comprises a plurality of types of first sample images and the type mark information of each first sample image;
taking the first sample images of the multiple categories as input of the CNN model, and performing feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
and training an SVM model to be trained according to the feature vectors of the first sample images of the various categories and the category label information of each first sample image to obtain the SVM model.
6. The method of claim 5, wherein the training an SVM model to be trained according to the feature vectors of the first sample images of the plurality of classes and the class label information of each first sample image to obtain the SVM model comprises:
solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image to obtain a classification function of the SVM model; wherein the objective function is used to indicate that an interval between different classes of first sample images among the multiple classes of first sample images is maximum.
7. The method according to any one of claims 1-4, wherein before the step of inputting the plurality of images into a Convolutional Neural Network (CNN) model and performing feature extraction on the plurality of images through the CNN model, the method further comprises:
pre-training a CNN model to be trained according to a second sample data set to obtain an initialized CNN model, wherein the second sample data set comprises second sample images of various categories and category label information of each second sample image;
and training the initialized CNN model according to a third sample data set to obtain the CNN model, wherein the third sample data set comprises third sample images of various types and the type mark information of each third sample image.
8. The method of claim 7, wherein the third sample data set comprises a first subsample data set and a second subsample data set, the first subsample data set comprises M third sample images, and the M third sample images belong to S categories, the second subsample data set comprises N third sample images, and the N third sample images belong to T categories, the M third sample images being different from the N third sample images.
9. A similar image detecting apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring a plurality of images to be detected;
the first extraction module is used for taking the multiple images as input of a CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
the measurement module is used for taking the feature vectors of the images as the input of an SVM model, and carrying out similarity measurement on the feature vectors of the images through the SVM model to obtain similarity measurement values of the images;
and the detection module is used for carrying out similar image detection on the plurality of images based on the similarity metric values of the plurality of images.
10. An electronic device, comprising a processor and a memory; the memory stores at least one instruction for execution by the processor to implement a similar image detection method as claimed in any one of claims 1 to 8.
11. A computer-readable storage medium having stored thereon at least one instruction for execution by a processor to implement a similar image detection method as claimed in any one of claims 1 to 8.
CN201911390241.0A 2019-12-30 2019-12-30 Similar image detection method, device, equipment and storage medium Pending CN111222548A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911390241.0A CN111222548A (en) 2019-12-30 2019-12-30 Similar image detection method, device, equipment and storage medium
PCT/CN2020/138510 WO2021136027A1 (en) 2019-12-30 2020-12-23 Similar image detection method and apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911390241.0A CN111222548A (en) 2019-12-30 2019-12-30 Similar image detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111222548A true CN111222548A (en) 2020-06-02

Family

ID=70827972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911390241.0A Pending CN111222548A (en) 2019-12-30 2019-12-30 Similar image detection method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111222548A (en)
WO (1) WO2021136027A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111870279A (en) * 2020-07-31 2020-11-03 西安电子科技大学 Method, system and application for segmenting left ventricular myocardium of ultrasonic image
CN112749765A (en) * 2021-02-01 2021-05-04 深圳无域科技技术有限公司 Picture scene classification method, system, device and computer readable medium
WO2021136027A1 (en) * 2019-12-30 2021-07-08 Oppo广东移动通信有限公司 Similar image detection method and apparatus, device and storage medium
CN113297411A (en) * 2021-07-26 2021-08-24 深圳市信润富联数字科技有限公司 Method, device and equipment for measuring similarity of wheel-shaped atlas and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780284B (en) * 2021-09-17 2024-04-19 焦点科技股份有限公司 Logo detection method based on target detection and metric learning
CN114445661B (en) * 2022-01-24 2023-08-18 电子科技大学 Embedded image recognition method based on edge calculation
CN114998192B (en) * 2022-04-19 2023-05-30 深圳格芯集成电路装备有限公司 Defect detection method, device, equipment and storage medium based on deep learning
CN114998956B (en) * 2022-05-07 2023-03-28 北京科技大学 Small sample image data expansion method and device based on intra-class difference

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467564A (en) * 2010-11-12 2012-05-23 中国科学院烟台海岸带研究所 Remote sensing image retrieval method based on improved support vector machine relevance feedback
CN107563319A (en) * 2017-08-24 2018-01-09 西安交通大学 Face similarity measurement computational methods between a kind of parent-offspring based on image
CN109165682A (en) * 2018-08-10 2019-01-08 中国地质大学(武汉) A kind of remote sensing images scene classification method merging depth characteristic and significant characteristics
CN109359551A (en) * 2018-09-21 2019-02-19 深圳市璇玑实验室有限公司 A kind of nude picture detection method and system based on machine learning
CN110298376A (en) * 2019-05-16 2019-10-01 西安电子科技大学 A kind of bank money image classification method based on improvement B-CNN

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3432784B1 (en) * 2016-03-24 2020-09-23 The Regents of The University of California Deep-learning-based cancer classification using a hierarchical classification framework
CN109784366A (en) * 2018-12-07 2019-05-21 北京飞搜科技有限公司 The fine grit classification method, apparatus and electronic equipment of target object
CN111222548A (en) * 2019-12-30 2020-06-02 Oppo广东移动通信有限公司 Similar image detection method, device, equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467564A (en) * 2010-11-12 2012-05-23 中国科学院烟台海岸带研究所 Remote sensing image retrieval method based on improved support vector machine relevance feedback
CN107563319A (en) * 2017-08-24 2018-01-09 西安交通大学 Face similarity measurement computational methods between a kind of parent-offspring based on image
CN109165682A (en) * 2018-08-10 2019-01-08 中国地质大学(武汉) A kind of remote sensing images scene classification method merging depth characteristic and significant characteristics
CN109359551A (en) * 2018-09-21 2019-02-19 深圳市璇玑实验室有限公司 A kind of nude picture detection method and system based on machine learning
CN110298376A (en) * 2019-05-16 2019-10-01 西安电子科技大学 A kind of bank money image classification method based on improvement B-CNN

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
林达华 等: "人工智能启蒙", 30 September 2019, 北京:商务印书馆, pages: 71 - 73 *
王曼;彭国华;叶正麟;赵丛;王树勋;: "基于优势集聚类和支持向量机的图像检索", 模式识别与人工智能, no. 05 *
赵理君 等: "综合视觉特征度量与 SVM 的遥感图像检索方法", 中国科学院研究生院学报, pages 347 - 351 *
赵理君;唐家奎;于新菊;王春磊;张成雯;: "综合视觉特征度量与SVM的遥感图像检索方法", 中国科学院研究生院学报, no. 03 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021136027A1 (en) * 2019-12-30 2021-07-08 Oppo广东移动通信有限公司 Similar image detection method and apparatus, device and storage medium
CN111870279A (en) * 2020-07-31 2020-11-03 西安电子科技大学 Method, system and application for segmenting left ventricular myocardium of ultrasonic image
CN111870279B (en) * 2020-07-31 2022-01-28 西安电子科技大学 Method, system and application for segmenting left ventricular myocardium of ultrasonic image
CN112749765A (en) * 2021-02-01 2021-05-04 深圳无域科技技术有限公司 Picture scene classification method, system, device and computer readable medium
CN113297411A (en) * 2021-07-26 2021-08-24 深圳市信润富联数字科技有限公司 Method, device and equipment for measuring similarity of wheel-shaped atlas and storage medium

Also Published As

Publication number Publication date
WO2021136027A1 (en) 2021-07-08

Similar Documents

Publication Publication Date Title
CN111222548A (en) Similar image detection method, device, equipment and storage medium
US10936911B2 (en) Logo detection
US9036905B2 (en) Training classifiers for deblurring images
CN107784288B (en) Iterative positioning type face detection method based on deep neural network
US20130294685A1 (en) Material recognition from an image
CN109522908A (en) Image significance detection method based on area label fusion
CN109343920B (en) Image processing method and device, equipment and storage medium thereof
US9025889B2 (en) Method, apparatus and computer program product for providing pattern detection with unknown noise levels
CN111914908B (en) Image recognition model training method, image recognition method and related equipment
US9129152B2 (en) Exemplar-based feature weighting
CN105550641B (en) Age estimation method and system based on multi-scale linear differential texture features
CN113095333B (en) Unsupervised feature point detection method and unsupervised feature point detection device
CN110705489B (en) Training method and device for target recognition network, computer equipment and storage medium
CN113705297A (en) Training method and device for detection model, computer equipment and storage medium
CN109685805B (en) Image segmentation method and device
Limper et al. Mesh Saliency Analysis via Local Curvature Entropy.
CN111444807A (en) Target detection method, device, electronic equipment and computer readable medium
CN109740674A (en) A kind of image processing method, device, equipment and storage medium
CN109697240A (en) A kind of image search method and device based on feature
CN106709490B (en) Character recognition method and device
CN114419525A (en) Harmful video detection method and system
CN113763313A (en) Text image quality detection method, device, medium and electronic equipment
CN113688785A (en) Multi-supervision-based face recognition method and device, computer equipment and storage medium
CN114519729A (en) Image registration quality evaluation model training method and device and computer equipment
CN111079775B (en) Real-time tracking method for combined regional constraint learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination