WO2021136027A1 - Similar image detection method and apparatus, device and storage medium - Google Patents

Similar image detection method and apparatus, device and storage medium

Info

Publication number
WO2021136027A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
sample
data set
image
model
Prior art date
Application number
PCT/CN2020/138510
Other languages
French (fr)
Chinese (zh)
Inventor
孙莹莹
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2021136027A1 publication Critical patent/WO2021136027A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Definitions

  • the embodiments of the present application relate to the field of image processing, and in particular, to a similar image detection method, device, device, and storage medium.
  • Similar image detection is a basic problem in computer vision, which aims to compare the similarity between images and judge whether the images are similar based on the similarity between the images. Similar image detection can be applied to different task scenarios. For example, similar image detection technology can be used to detect similar images from the mobile phone's album, and then delete some of the similar images from the album to save mobile phone memory.
  • a hash algorithm can be used to detect similar images.
  • the two images to be detected can be hashed by a hash algorithm to obtain the hash value of the two images, and then the Hamming distance between the two images based on the hash value can be calculated. If the Hamming distance is less than the threshold, it is determined that the two images are similar images. If the Hamming distance based on the hash value is greater than or equal to the threshold, it is determined that the two images are not similar images.
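  • For illustration only, the hash-based approach described above might be sketched as follows, using an average hash as one possible hash algorithm (the related art does not fix a particular hash function); the file names and the threshold value are placeholders.
```python
import numpy as np
from PIL import Image

def average_hash(path, size=8):
    # One possible hash: shrink, convert to grayscale, threshold each pixel at the mean.
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()

def hamming_distance(h1, h2):
    # Number of positions in which the two hash values differ.
    return int(np.count_nonzero(h1 != h2))

THRESHOLD = 10  # illustrative; the related art does not specify a value
distance = hamming_distance(average_hash("a.jpg"), average_hash("b.jpg"))
print("similar" if distance < THRESHOLD else "not similar")
```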
  • the embodiments of the present application provide a similar image detection method, device, equipment, and storage medium.
  • the technical solution is as follows:
  • an embodiment of the present application provides a similar image detection method, and the method includes:
  • acquiring multiple images to be detected; using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images; using the feature vectors of the multiple images as the input of a support vector machine SVM model, and measuring the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images; and performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • a similar image detection device includes:
  • the first acquisition module is used to acquire multiple images to be detected
  • the first extraction module is configured to use the multiple images as the input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
  • the measurement module is used to take the feature vectors of the multiple images as the input of the SVM model, and measure the similarity of the feature vectors of the multiple images through the SVM model to obtain the similarity measurement values of the multiple images;
  • the detection module is configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • in another aspect, an electronic device is provided. The electronic device includes a processor and a memory; the memory stores at least one instruction, and the at least one instruction is used to be executed by the processor to implement the above-mentioned similar image detection method.
  • a computer-readable storage medium stores at least one instruction, and the at least one instruction is used to be executed by a processor to implement the above-mentioned similar image detection method.
  • a computer program product stores at least one instruction, and the at least one instruction is used to be executed by a processor to implement the above-mentioned similar image detection method.
  • Fig. 1 is a flowchart of a model training method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application
  • FIG. 3 is a flowchart of a similar image detection process provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of another similar image detection method provided by an embodiment of the present application.
  • Fig. 6 is a block diagram of a similar image detection device provided by an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the "plurality” mentioned herein means two or more.
  • “And/or” describes the association relationship of the associated objects, indicating that there can be three types of relationships, for example, A and/or B, which can mean: A alone exists, A and B exist at the same time, and B exists alone.
  • the character “/” generally indicates that the associated objects before and after are in an "or” relationship.
  • the similar image detection method provided by the embodiments of the present application is applied in the field of computer vision, and is specifically applied in a scene where similarity detection is performed on images, so as to detect similar images from multiple images. Similar image detection is a very important basic problem in the field of computer vision. The quality of many tasks depends on the quality of the similarity measure.
  • the similar image detection method provided in the embodiment of this application can be extended to the album application.
  • For example, the similar image detection method provided in the embodiment of this application detects similar images in the album and then recommends the similar images to the user for deletion and cleanup, to help users better manage their albums.
  • the similar image detection method can also be applied to other scenarios, which is not limited in the embodiment of the present application.
  • the similar image detection method provided by the embodiments of the application can be applied to a similar image detection device.
  • the similar image detection device can be an electronic device such as a terminal or a server.
  • the terminal can be a mobile phone, a tablet computer, or a computer, and the server can be a background server of an application.
  • the embodiment of the present application provides a similar image detection method, and the method includes:
  • acquiring multiple images to be detected; using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images; using the feature vectors of the multiple images as the input of a support vector machine SVM model, and measuring the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images; and performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • the CNN model is a visual geometry group network VGGNet model
  • the VGGNet model is obtained by training based on a sample data set, and the sample data set includes sample images of multiple categories and category label information for each sample image.
  • the measuring the similarity of the feature vectors of the multiple images by the SVM model includes:
  • through the classification function of the SVM model, the classification function value of each image is determined, and the classification function value of each image is used as the similarity metric value of each image;
  • wherein the classification function is used to characterize the classification hyperplane for classifying images, and the classification function value of each image is used to characterize the distance between each image and the classification hyperplane.
  • the performing similar image detection on the multiple images based on the similarity metric values of the multiple images includes:
  • at least two images in the multiple images whose similarity metric values are within the same metric value range are determined as similar images, and the similar images are determined to be images of the same category.
  • the method further includes:
  • a first sample data set is acquired, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
  • the first sample images of the multiple categories are used as the input of the CNN model, and feature extraction is performed on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
  • according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, the SVM model to be trained is trained to obtain the SVM model.
  • the training the SVM model to be trained based on the feature vector of the first sample images of the multiple categories and the category label information of each first sample image to obtain the SVM model includes:
  • according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, an objective function is solved to obtain the classification function of the SVM model; wherein the objective function is used to indicate that the interval between first sample images of different categories among the first sample images of the multiple categories is maximized.
  • the method further includes:
  • the first sample data set is preprocessed to obtain a preprocessed first sample data set; the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data enhancement processing.
  • the method before the using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model, the method further includes:
  • the CNN model to be trained is pre-trained according to a second sample data set to obtain an initialized CNN model, the second sample data set including second sample images of multiple categories and category label information of each second sample image; the initialized CNN model is then trained according to a third sample data set to obtain the CNN model.
  • the third sample data set includes third sample images of multiple categories and category label information of each third sample image.
  • the third sample data set includes a first sub-sample data set and a second sub-sample data set; the first sub-sample data set includes M third sample images, and the M third sample images belong to S categories; the second sub-sample data set includes N third sample images, and the N third sample images belong to T categories; the M third sample images are different from the N third sample images; and M, S, N, and T are positive integers.
  • the similar image detection method provided in the embodiments of this application is a similar image detection method based on deep learning.
  • the detection process requires a CNN (Convolutional Neural Network) model for feature extraction and an SVM (Support Vector Machine) model for similarity measurement. To facilitate understanding, the model training methods of the CNN model and the SVM model are introduced first.
  • CNN: Convolutional Neural Network
  • SVM: Support Vector Machine
  • Fig. 1 is a flowchart of a model training method provided by an embodiment of the present application. The method is applied to a similar image detection device.
  • the device may be an electronic device such as a terminal or a server. As shown in Fig. 1, the method includes the following steps:
  • Step 101 Obtain a second sample data set and a third sample data set.
  • the second sample data set includes multiple types of second sample images and the category label information of each second sample image.
  • the third sample data set includes third sample images of multiple categories and the category label information of each third sample image.
  • the second sample data set and the third sample data set are pre-acquired sample data sets that can meet the requirements of model training.
  • the category tag information is used to indicate the category of the corresponding image.
  • the second sample images of multiple categories and the third sample images of multiple categories have different categories.
  • the second sample data set and the third sample data set are network image data sets.
  • the second sample data set is the ImageNet (image network) data set
  • the third sample data set is the trademark image data set.
  • the ImageNet data set contains more than 14 million images in total, covering more than 20,000 categories.
  • the ImageNet data set is currently a commonly used data set, on which research work such as image classification, target localization, and target detection can be carried out.
  • the trademark image data set includes various types of trademark images.
  • the third sample data set includes two different sub-sample data sets, for example, the third sample data set includes a first sub-sample data set and a second sub-sample data set.
  • the first subsample data set includes M third sample images, and M third sample images belong to S categories
  • the second subsample data set includes N third sample images
  • N third sample images belong to T categories
  • M third sample images are different from N third sample images.
  • M and N are both positive integers
  • S and T are also positive integers.
  • the third sample data set is a trademark image data set
  • the first sub-sample data set may be the Logo-405 data set
  • the second sub-sample data set may be the FlickrLogo-32 data set.
  • the Logo-405 data set comes from Internet crawlers and includes 32,218 images, covering 405 categories of trademark images including major luxury brands; the number of images in each category ranges from tens to more than one hundred, and the size of each trademark image is about 300×500.
  • the FlickrLogo-32 data set comes from data publicly available on the Internet.
  • the data set contains a total of 32 types of trademark images including trademarks of major Internet companies, with a total of 8,240 images.
  • preprocessing may be performed on the images in the two sample data sets.
  • the second sample data set and the third sample data set can be preprocessed in the following manner:
  • for example, the images in the two sample data sets may be format-adjusted, deduplicated, and renamed.
  • data enhancement processing may also be performed on these two sample data sets.
  • the data enhancement processing includes at least one of scaling, noise addition, rotation, and normalization processing.
  • the images in any sample data set may be scaled proportionally to ensure that the scales of the images in the sample data set are uniform. Since the scales of actual images are often different, and the scales must be uniform in the training set, the scaling method can be used to adjust the scales of the images in the sample data set to be uniform, so that the model can learn and recognize these images.
  • noise can be randomly added to the images in any sample data set. Actual images often contain a lot of noise, while the sample images in the sample data set are usually relatively clean; if training is carried out directly, the robustness of the model to noise will be poor, and even during image detection, if the image to be detected contains noise of only a few pixels, the model may produce recognition errors. Therefore, in order to make the model more robust to noise, the embodiment of the present application may randomly add noise to the training images.
  • after the second sample data set and the third sample data set are acquired, an image can be selected from either sample data set, the selected image can be rotated, and the rotated image can be added to the sample data set to increase the data volume of the sample data set.
  • the selection method used may be a random selection method or other preset selection methods, which is not limited in the embodiment of the present application.
  • the part of the rotated image that falls outside the display area can be cropped to maintain a uniform image scale.
  • the sample images in any sample data set may be normalized to remove redundant information.
  • the normalization processing includes: normalizing the pixel value of the sample image from [0, 255] to [0, 1] to remove redundant information contained in the sample data to be trained and further shorten the training time.
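  • As a rough illustration only, the preprocessing and data enhancement steps described above (uniform scaling, random noise, rotation within the original frame, and normalization of pixel values from [0, 255] to [0, 1]) could be sketched as follows; the target size, noise level, and use of NumPy/Pillow are assumptions, not part of the embodiment.
```python
import numpy as np
from PIL import Image

TARGET_SIZE = (224, 224)  # assumed uniform scale; the embodiment does not fix a size

def preprocess(path, rotate_deg=0.0, noise_std=0.0):
    img = Image.open(path).convert("RGB").resize(TARGET_SIZE)  # proportional scaling to a uniform size
    if rotate_deg:
        img = img.rotate(rotate_deg, expand=False)             # rotate; content leaving the frame is cropped
    x = np.asarray(img, dtype=np.float32)
    if noise_std:
        x = x + np.random.normal(0.0, noise_std, x.shape)      # randomly added noise
    return np.clip(x, 0.0, 255.0) / 255.0                      # normalize [0, 255] -> [0, 1]
```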
  • a fourth sample data set and a fifth sample data set can also be obtained.
  • Both the fourth sample data set and the fifth sample data set include sample images of multiple categories and the category label information of each sample image.
  • the sample images in the fourth sample data set and the fifth sample data set are different.
  • the fourth sample data set is the ImageNet data set
  • the fifth sample data set is the trademark image data set.
  • preprocess the sample images in the fourth sample data set, divide the preprocessed fourth sample data set into a training set and a test set, and use the training set as the second sample data set.
  • preprocess the sample images in the fifth sample data set, divide the preprocessed fifth sample data set into a training set and a test set, and use the training set as the third sample data set.
  • the training set is used to train the CNN model
  • the test set is used to test the CNN model.
  • the preprocessing includes at least one of format adjustment, deduplication, and data enhancement processing
  • the data enhancement processing includes at least one of scaling, noise addition, rotation, and normalization processing.
  • the preprocessed fourth sample data set can be divided into a training set and a test set proportionally
  • the preprocessed fifth sample data set can be divided into a training set and a test set proportionally
  • the ratio of training set to test set can be 8:2.
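  • A minimal sketch of the 8:2 split into a training set and a test set; shuffling and the fixed seed are assumptions made here for reproducibility.
```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    # samples: list of (image_path, category_label) pairs from a preprocessed sample data set.
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]   # training set, test set
```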
  • Step 102 Perform pre-training on the CNN model to be trained according to the second sample data set to obtain the initialized CNN model.
  • the CNN model may be a VGGNet (Visual Geometry Group Network) model.
  • VGGNet Visual Geometry Group Network
  • the VGGNet model is a deep convolutional neural network model proposed by the Visual Geometry Group of Oxford University, which can reduce the Top-5 error rate to 7.3%.
  • the VGGNet model mainly has the following characteristics: 1) all convolutional layers in the entire network structure use 3×3 convolution kernels; 2) in the network structure, two 3×3 convolutional layers replace one traditional 5×5 convolutional layer, and three 3×3 convolutional layers replace one traditional 7×7 convolutional layer, which increases the nonlinear expression ability of the network; 3) multiple 3×3 convolution kernels have fewer parameters than a single larger convolution kernel, which reduces the overall number of network parameters.
  • the VGGNet model may be a 19-layer VGGNet-19 model or a 16-layer VGGNet-16 model.
  • FIG. 2 is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application. As shown in FIG. 2, the VGGNet-16 model includes 13 convolutional layers and 3 fully connected layers.
  • all convolutional layers in the network structure use convolution kernels of the same size, 3×3, which is the smallest window that can capture information from the up, down, left, right, and center directions; each 3×3 convolutional layer uses one pixel of padding to ensure that the input and output sizes remain the same after convolution.
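  • The VGGNet-16 topology described above (13 convolutional layers with 3×3 kernels and one-pixel padding, followed by 3 fully connected layers) could, for example, be defined with the Keras API of TensorFlow, which the embodiments mention as one possible framework; the input size and class count below are placeholders.
```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_vgg16(num_classes, input_shape=(224, 224, 3)):
    # 13 conv layers (3x3 kernels, "same" padding = one-pixel fill) + 3 fully connected layers.
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
           512, 512, 512, "M", 512, 512, 512, "M"]
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for v in cfg:
        if v == "M":
            model.add(layers.MaxPooling2D(pool_size=2, strides=2))
        else:
            model.add(layers.Conv2D(v, 3, padding="same", activation="relu"))
    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```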
  • an initialized CNN model that can classify images in the second sample data set can be obtained.
  • the VGGNet model is obtained by training based on a sample data set, and the sample data set includes sample images of multiple categories and category label information of each sample image.
  • the VGGNet model to be trained is pre-trained according to the second sample data set to obtain the initial VGGNet model, and then the initial VGGNet model is trained according to the third sample data set to obtain the VGGNet model.
  • Step 103 Train the initialized CNN model according to the third sample data set to obtain the CNN model.
  • the CNN model is used to perform feature extraction on any image to obtain the feature vector of the image.
  • for example, the VGG-16 model pre-trained on the ImageNet data set can be used as the initialization model, the initialization model can be trained on the trademark image data set to obtain the trained VGG-16 model, and the trained VGG-16 model can then be used as the feature extractor.
  • the training set can be input to the CNN model for training, and iterate a preset number of times.
  • the preset number of times can be preset, for example, the preset number of times can be 80 times, 90 times, 100 times, and so on.
  • the gradient descent algorithm can be used to optimize the objective function during each iterative calculation process, so that the model reaches convergence.
  • the gradient descent algorithm may be an Adam (Adaptive Moment Estimation) gradient descent algorithm.
  • the Adam gradient descent algorithm is an efficient calculation method that can improve the convergence speed of the gradient descent.
  • the batch sample size batch_size of the Adam gradient descent algorithm can be set in advance.
  • batch_size can be set to 32.
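  • Continuing the sketch above, pre-training and fine-tuning with the Adam optimizer and batch_size 32 might look as follows; the data set objects, epoch counts, and class counts are placeholders, and the weight-copying step is one assumed way of reusing the pre-trained backbone.
```python
import tensorflow as tf

# train_ds / finetune_ds are assumed tf.data.Dataset objects yielding (image, label) pairs.
pretrain = build_vgg16(num_classes=1000)        # pre-training on the larger data set (ImageNet-scale)
pretrain.compile(optimizer=tf.keras.optimizers.Adam(),
                 loss="sparse_categorical_crossentropy", metrics=["accuracy"])
pretrain.fit(train_ds.batch(32), epochs=100)    # preset iteration count, e.g. 100

# Keep the learned layers and attach a new classification head for the trademark categories.
finetune = build_vgg16(num_classes=405)         # placeholder class count, e.g. the Logo-405 categories
for src, dst in zip(pretrain.layers[:-1], finetune.layers[:-1]):
    dst.set_weights(src.get_weights())
finetune.compile(optimizer=tf.keras.optimizers.Adam(),
                 loss="sparse_categorical_crossentropy", metrics=["accuracy"])
finetune.fit(finetune_ds.batch(32), epochs=80)
```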
  • Step 104 Obtain a first sample data set, the first sample data set including multiple types of first sample images and category label information of each first sample image.
  • the first sample data set may be the above-mentioned first sub-sample data set and/or the second sub-sample data set, or may be a sample data set other than the above-mentioned first sub-sample data set and second sub-sample data set.
  • the first sample data set is the aforementioned third sample data set, for example, the first sample data set is a trademark image data set.
  • the first sample data set is the Logo-405 data set and the FlickrLogo-32 data set.
  • Step 105 Use the first sample image of the multiple categories as the input of the CNN model, and perform feature extraction on the first sample image of the multiple categories through the CNN model to obtain the first sample of the multiple categories The feature vector of the image.
  • the feature vector is used to characterize the features of the sample image in various dimensions.
  • the feature vector may be a feature vector with a dimension of 4096.
  • the CNN model can be used to extract features from the trademark data sets Logo-405 and FlickrLogo-32 respectively, to obtain the feature vector of each image, and finally to obtain a deep representation of each trademark image extracted through the VGG-16 model.
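  • One assumed way to use the trained model as a feature extractor is to read the output of the second fully connected layer, which has 4096 dimensions in the sketch above; the layer index follows that sketch and is not prescribed by the embodiment.
```python
import numpy as np
import tensorflow as tf

# Take the 4096-dimensional output of the penultimate fully connected layer as the feature vector.
feature_extractor = tf.keras.Model(inputs=finetune.inputs,
                                   outputs=finetune.layers[-2].output)

def extract_features(images):
    # images: float array of shape (n, 224, 224, 3), already preprocessed to [0, 1].
    return np.asarray(feature_extractor.predict(images, batch_size=32))
```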
  • the CNN model is used for feature extraction.
  • the CNN model can automatically learn the features of the image during the training process, without manual design and human intervention in feature learning, which improves the efficiency and accuracy of feature extraction.
  • the first sample data set is preprocessed to obtain the preprocessed first sample data set.
  • the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data enhancement processing.
  • Step 106 According to the feature vector of the first sample images of the multiple categories and the category label information of each first sample image, train the SVM model to be trained to obtain the SVM model.
  • the operation of training the SVM model to be trained includes: solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function is used to indicate that the interval between first sample images of different categories among the first sample images of the multiple categories is maximized.
  • the basic idea of the support vector machine algorithm is to first transform the input sample space into a high-dimensional space, and then find the optimal classification hyperplane in this new high-dimensional feature space to maximize the interval between sample points of different categories.
  • the classification hyperplane is the maximum separation hyperplane.
  • the objective function of the SVM model to be trained can be: $\min_{w,b} \frac{1}{2}\lVert w\rVert^{2}$
  • where $w$ is the normal vector, which determines the direction of the classification hyperplane
  • and $b$ is the displacement, which determines the distance between the classification hyperplane and the origin.
  • the constraints of the SVM model to be trained are: $y_{i}(w^{\mathrm{T}}x_{i}+b)\ge 1$, $i=1,2,\ldots,n$, where $x_{i}$ is the feature vector of the $i$-th first sample image and $y_{i}$ is its category label.
  • the Lagrange multiplier $\alpha$ is introduced into the objective function, and the objective function obtained is: $L(w,b,\alpha)=\frac{1}{2}\lVert w\rVert^{2}-\sum_{i=1}^{n}\alpha_{i}\left(y_{i}(w^{\mathrm{T}}x_{i}+b)-1\right)$
  • the SVM model to be trained may also be a kernel function-based SVM model.
  • the SVM model to be trained may also use the kernel function to solve the classification function.
  • the kernel function is also a special similarity measure function, and different kernel functions represent different similarity measures.
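  • A minimal sketch of fitting the SVM on the extracted feature vectors; scikit-learn's SVC is used here only as one possible implementation (the embodiment names no library), and the linear kernel is one of the kernel choices discussed above.
```python
from sklearn.svm import SVC

# X_train: (n_samples, 4096) CNN feature vectors; y_train: category labels from the sample data set.
svm = SVC(kernel="linear")   # a kernel-function-based SVM could use kernel="rbf" instead
svm.fit(X_train, y_train)
```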
  • after training the CNN model and the SVM model, the test set can also be used as the input of the CNN model to extract the feature vector of each image in the test set, and the feature vectors of the images in the test set can then be used as the input of the SVM model for testing, in order to verify the accuracy of the models. If the CNN model and the SVM model pass the verification, the training is complete; if the verification fails, training of the CNN model and the SVM model continues according to the training set until the trained models pass the verification.
  • FIG. 3 is a flow chart of a similar image detection process provided by an embodiment of the present application.
  • a sample data set can be constructed first, then the sample images in the sample data set can be preprocessed, and the preprocessed sample data set can then be divided into a training set and a test set proportionally.
  • the training set is used to train the CNN model and the SVM model
  • the test set is used to verify the accuracy of the trained CNN model and the SVM model.
  • an image similarity feature extraction model based on VGG-16 (namely the CNN model) is constructed, and the SVM model is used to classify the feature vectors extracted by the CNN model to complete the similar image detection.
  • the embodiment of the present application may use TensorFlow (an open source software library) to define a convolutional neural network model.
  • TensorFlow an open source software library
  • users can easily deploy computing tasks to multiple platforms and devices.
  • the multiple platforms include CPUs (Central Processing Units), GPUs (Graphics Processing Units), and TPUs (Tensor Processing Units).
  • the multiple devices may include desktop devices, server clusters, mobile devices, edge devices, and the like.
  • desktop devices include desktop computers and the like
  • server clusters include one or more servers
  • mobile devices include mobile phones, tablets, smart wearable devices, and the like
  • edge devices refer to switches, routers, routing switches, IADs (Integrated Access Devices), and various MAN (Metropolitan Area Network)/WAN (Wide Area Network) devices installed on edge networks, and the like.
  • FIG. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application. The method is applied to a similar image detection device. As shown in FIG. 4, the method includes the following steps:
  • Step 401 Acquire multiple images to be detected.
  • the multiple images to be detected can be uploaded by the user, can be obtained from the storage space of the device, can be sent by other devices, or can be obtained from the network.
  • the method of obtaining is not limited.
  • the images stored in the album of the terminal may be acquired, and the images stored in the album may be used as multiple images to be detected.
  • Step 402 Use the multiple images as the input of the CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images.
  • the CNN model is a VGGNet model.
  • the CNN model is a VGGNet-16 model or a VGGNet-19 model.
  • the multiple images can also be preprocessed, and then the preprocessed multiple images can be used as the input of the CNN model, through the CNN model Perform feature extraction.
  • the preprocessing includes at least one of format adjustment, deduplication, and data enhancement.
  • Step 403 Use the feature vectors of the multiple images as the input of the SVM model, and measure the similarity of the feature vectors of the multiple images through the SVM model to obtain the similarity metric values of the multiple images.
  • the classification function value of each image can be determined through the classification function of the SVM model, and the classification function value of each image can be used as the similarity measurement value of each image.
  • the classification function is used to characterize the classification hyperplane for classifying images
  • the classification function value of each image is used to characterize the distance between each image and the classification hyperplane. The closer the classification function values of any two images in the multiple images are, the closer the distance between the two images is, and the higher the similarity between the two images is.
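  • In the scikit-learn sketch above, the classification function values (signed distances to the classification hyperplane) could be read as follows; decision_function is that library's name for this quantity and is used here only as an assumed implementation.
```python
# features: (n_images, 4096) feature vectors of the images to be detected.
scores = svm.decision_function(features)   # distances of each image to the classification hyperplane(s)
# Images whose classification function values fall close together are treated as more similar.
```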
  • the quality of image feature selection also has a very important impact on the result of the similarity measurement; therefore, both the quality of image feature extraction and the choice of measurement algorithm have a certain impact on the result of image similarity detection.
  • by combining the feature vectors of the images with a traditional pattern recognition algorithm, the similarity between image features can be measured to determine whether the images are similar and complete the detection task; deep learning can be used to extract both the shallow and high-level features in the images, effectively use the information in the images, and improve the accuracy of detection.
  • Step 404 Perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • At least two images in the plurality of images whose similarity metric values are within the same metric value range are determined to be similar images, and the similar images are determined to be images of the same category.
  • the image features of multiple images can be used as the input of the SVM model, and the multiple images can be classified according to the image features of the multiple images through the SVM model.
  • the images classified into one category are similar images.
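  • Continuing the same sketch, grouping images that the SVM assigns to the same category and reporting each group as a set of similar images might look as follows; the dictionary-based grouping is an assumed implementation detail.
```python
from collections import defaultdict

def group_similar(image_paths, features, svm):
    # Images predicted into the same category are reported together as similar images.
    groups = defaultdict(list)
    for path, label in zip(image_paths, svm.predict(features)):
        groups[int(label)].append(path)
    return {label: paths for label, paths in groups.items() if len(paths) >= 2}
```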
  • the images of the same category and the category tags may be stored correspondingly.
  • Figure 5 is a flowchart of another similar image detection method provided by an embodiment of the present application.
  • the image to be detected can be obtained first and then preprocessed; feature extraction is then performed on the preprocessed image through the CNN model, the similarity of the extracted feature vectors is measured through the SVM model, and similar image detection is performed based on the similarity measurement result.
  • by performing feature extraction through the CNN model, the shallow and deep features in the images can be effectively extracted and the information in the images can be effectively used, which improves the detection accuracy; the extracted feature vectors are then input into the SVM model, which measures the similarity of the feature vectors of the images, so that whether the images are similar can be judged based on the similarity between their feature vectors, further improving the detection accuracy, reducing the amount of calculation, and improving the detection efficiency.
  • the embodiment of the present application introduces deep learning methods into similar image recognition and detection, and builds a convolutional neural network to mine and learn the information in sample images and obtain deep representations of different sample images; a traditional pattern recognition algorithm then measures the similarity between the features, which solves the problem of recognizing and detecting a large number of similar pictures.
  • the embodiment of this application transforms the similar image recognition and detection problem into an image classification problem, uses a deep learning model as a feature extractor, and combines a traditional pattern recognition algorithm to measure the similarity between features, thereby avoiding the inaccuracy caused by calculating a hash mean; combining deep learning methods with traditional pattern recognition methods can not only make effective use of the convolutional neural network's capabilities of transfer learning and automatic learning of image features, but also combine the deep representations with the traditional pattern recognition algorithm, further improving the accuracy of similar image recognition and detection.
  • FIG. 6 is a block diagram of a similar image detection device provided by an embodiment of the present application. As shown in FIG. 6, the device includes a first acquisition module 601, a first extraction module 602, a measurement module 603, and a detection module 604.
  • the first acquisition module 601 is used to acquire multiple images to be detected
  • the first extraction module 602 is configured to use the multiple images as the input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
  • the measurement module 603 is configured to use the feature vectors of the multiple images as the input of the SVM model, and measure the similarity of the feature vectors of the multiple images through the SVM model to obtain the similarity metric values of the multiple images;
  • the detection module 604 is configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • the CNN model is a visual geometry group network VGGNet model
  • the VGGNet model is obtained by training based on a sample data set, and the sample data set includes sample images of multiple categories and category label information for each sample image.
  • the metric module 603 is used to:
  • through the classification function of the SVM model, the classification function value of each image is determined, and the classification function value of each image is used as the similarity metric value of each image;
  • wherein the classification function is used to characterize the classification hyperplane for classifying images, and the classification function value of each image is used to characterize the distance between each image and the classification hyperplane.
  • the detection module 604 is used to:
  • at least two images in the multiple images whose similarity metric values are within the same metric value range are determined as similar images, and the similar images are determined to be images of the same category.
  • the device further includes:
  • a second acquisition module (not shown in the figure), configured to acquire a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
  • a second extraction module (not shown in the figure), configured to use the first sample images of the multiple categories as the input of the CNN model, and perform feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
  • a first training module (not shown in the figure), configured to train the SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
  • the first training module is used to:
  • according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, an objective function is solved to obtain the classification function of the SVM model; wherein the objective function is used to indicate that the interval between first sample images of different categories among the first sample images of the multiple categories is maximized.
  • the device further includes:
  • a preprocessing module (not shown in the figure), configured to preprocess the first sample data set to obtain a preprocessed first sample data set;
  • the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data enhancement processing.
  • the device further includes:
  • the second training module is used for pre-training the CNN model to be trained according to the second sample data set to obtain the initialized CNN model.
  • the second sample data set includes second sample images of multiple categories and category label information of each second sample image;
  • the third sample data set includes a first sub-sample data set and a second sub-sample data set; the first sub-sample data set includes M third sample images, and the M third sample images belong to S categories; the second sub-sample data set includes N third sample images, and the N third sample images belong to T categories; the M third sample images are different from the N third sample images; and M, S, N, and T are positive integers.
  • by performing feature extraction through the CNN model, the shallow and deep features in the images can be effectively extracted and the information in the images can be effectively used, which improves the detection accuracy; the extracted feature vectors are then input into the SVM model.
  • the SVM model measures the similarity of the feature vectors of the images, so that whether the images are similar can be judged based on the similarity between their feature vectors, further improving the detection accuracy, reducing the amount of calculation, and improving the detection efficiency.
  • when the similar image detection device provided in the above embodiment performs similar image detection, the division into the above functional modules is used only as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the similar image detection device provided in the foregoing embodiment belongs to the same concept as the similar image detection method embodiment, and the specific implementation process is detailed in the method embodiment, and will not be repeated here.
  • FIG. 7 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
  • the electronic device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, or a computer.
  • the electronic device may vary greatly due to different configurations or performance, and may include one or more processors 701 and one or more memories 702, where at least one instruction is stored in the memory 702, and the at least one instruction is loaded and executed by the processor 701 to implement the similar image detection methods provided by the foregoing method embodiments.
  • the electronic device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and the electronic device may also include other components for implementing device functions, which will not be repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to realize similar image detection as described in each of the above embodiments. method.
  • the embodiments of the present application also provide a computer program product that stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the similar image detection method described in each of the above embodiments.
  • the functions described in the embodiments of the present application may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.
  • the computer-readable medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates the transfer of a computer program from one place to another.
  • the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application relate to the field of image processing. Disclosed are a similar image detection method and apparatus, a device and a storage medium. The method comprises: using multiple images to be detected as the input of a CNN model, and performing feature extraction on the multiple images by means of the CNN model to obtain feature vectors of the multiple images; using the feature vectors of the multiple images as the input of an SVM model, and performing similarity measurement on the feature vectors of the multiple images by means of the SVM model to obtain similarity measurement values of the multiple images; and performing similar image detection on the multiple images on the basis of the similarity measurement values of the multiple images. According to the present application, shallow and deep features in images can be effectively extracted, and whether the images are similar can be determined according to the similarity between the feature vectors of the images, thereby improving the accuracy of image similarity detection.

Description

Similar image detection method, device, equipment and storage medium
This application claims priority to the Chinese patent application No. 201911390241.0, filed on December 30, 2019 and entitled "Similar image detection method, device, equipment and storage medium", the entire content of which is incorporated into this application by reference.
Technical field
The embodiments of the present application relate to the field of image processing, and in particular, to a similar image detection method, device, equipment, and storage medium.
Background
Similar image detection is a basic problem in computer vision, which aims to compare the similarity between images and judge whether images are similar based on the similarity between them. Similar image detection can be applied to different task scenarios. For example, similar image detection technology can be used to detect similar images in a mobile phone's album, and some of the similar images can then be deleted from the album to save memory.
In the related art, a hash algorithm can be used to detect similar images. Specifically, the two images to be detected can be hashed by a hash algorithm to obtain the hash values of the two images, and the Hamming distance between the two images based on the hash values can then be calculated. If the Hamming distance is less than a threshold, the two images are determined to be similar images; if the Hamming distance based on the hash values is greater than or equal to the threshold, the two images are determined not to be similar images.
Summary of the invention
The embodiments of the present application provide a similar image detection method, device, equipment, and storage medium. The technical solution is as follows:
In one aspect, an embodiment of the present application provides a similar image detection method, and the method includes:
acquiring multiple images to be detected;
using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
using the feature vectors of the multiple images as the input of a support vector machine SVM model, and measuring the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
In another aspect, a similar image detection device is provided, and the device includes:
a first acquisition module, configured to acquire multiple images to be detected;
a first extraction module, configured to use the multiple images as the input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
a measurement module, configured to use the feature vectors of the multiple images as the input of an SVM model, and measure the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
a detection module, configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
In another aspect, an electronic device is provided. The electronic device includes a processor and a memory; the memory stores at least one instruction, and the at least one instruction is used to be executed by the processor to implement the above similar image detection method.
In another aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the at least one instruction is used to be executed by a processor to implement the above similar image detection method.
In another aspect, a computer program product is also provided. The computer program product stores at least one instruction, and the at least one instruction is used to be executed by a processor to implement the above similar image detection method.
Description of the drawings
Fig. 1 is a flowchart of a model training method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application;
Fig. 3 is a flowchart of a similar image detection process provided by an embodiment of the present application;
Fig. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application;
Fig. 5 is a flowchart of another similar image detection method provided by an embodiment of the present application;
Fig. 6 is a block diagram of a similar image detection device provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the implementation manners of the present application are described in further detail below in conjunction with the accompanying drawings.
The "plurality" mentioned herein means two or more. "And/or" describes the association relationship of the associated objects and indicates that three types of relationships can exist; for example, A and/or B can mean: A alone exists, A and B exist at the same time, or B alone exists. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Before the similar image detection method provided by the embodiments of the application is described in detail, the application scenarios of the embodiments of the application are introduced first.
The similar image detection method provided by the embodiments of the present application is applied in the field of computer vision, and is specifically applied in scenes where similarity detection is performed on images, so as to detect similar images from multiple images. Similar image detection is a very important basic problem in the field of computer vision, and the quality of many task results depends on the quality of the similarity measure.
As an example, the similar image detection method provided in the embodiments of this application can be extended to an album application. For example, the similar image detection method provided in the embodiments of this application detects similar images in the album and then recommends the similar images to the user for deletion and cleanup, to help the user better manage the album. Of course, the similar image detection method can also be applied to other scenarios, which is not limited in the embodiments of the present application.
Next, the implementation environment involved in the embodiments of the present application is introduced.
The similar image detection method provided by the embodiments of the application can be applied to a similar image detection device. The similar image detection device can be an electronic device such as a terminal or a server; the terminal can be a mobile phone, a tablet computer, or a computer, and the server can be a background server of an application.
The embodiment of the present application provides a similar image detection method, and the method includes:
acquiring multiple images to be detected;
using the multiple images as the input of a convolutional neural network CNN model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
using the feature vectors of the multiple images as the input of a support vector machine SVM model, and measuring the similarity of the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
Optionally, the CNN model is a visual geometry group network VGGNet model, and the VGGNet model is obtained by training based on a sample data set, the sample data set including sample images of multiple categories and category label information for each sample image.
Optionally, measuring the similarity of the feature vectors of the multiple images through the SVM model includes:
determining, through the classification function of the SVM model, the classification function value of each image, and using the classification function value of each image as the similarity metric value of each image; wherein the classification function is used to characterize the classification hyperplane for classifying images, and the classification function value of each image is used to characterize the distance between each image and the classification hyperplane.
Optionally, performing similar image detection on the multiple images based on the similarity metric values of the multiple images includes:
determining at least two images in the multiple images whose similarity metric values are within the same metric value range as similar images;
determining the similar images to be images of the same category.
Optionally, before the performing similarity measurement on the feature vectors of the multiple images through the SVM model, the method further includes:
acquiring a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
taking the first sample images of the multiple categories as input of the CNN model, and performing feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
Optionally, the training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model, includes:
solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function indicates that the margin between first sample images of different categories among the first sample images of the multiple categories is maximized.
Optionally, after the acquiring a first sample data set, the method further includes:
preprocessing the first sample data set to obtain a preprocessed first sample data set;
wherein the preprocessed first sample data set is used as input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data augmentation.
Optionally, before the taking the multiple images as input of a convolutional neural network (CNN) model and performing feature extraction on the multiple images through the CNN model, the method further includes:
pre-training a CNN model to be trained according to a second sample data set to obtain an initialized CNN model, the second sample data set including second sample images of multiple categories and category label information of each second sample image;
training the initialized CNN model according to a third sample data set to obtain the CNN model, the third sample data set including third sample images of multiple categories and category label information of each third sample image.
Optionally, the third sample data set includes a first sub-sample data set and a second sub-sample data set, the first sub-sample data set includes M third sample images belonging to S categories, the second sub-sample data set includes N third sample images belonging to T categories, the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
The similar image detection method provided in the embodiments of the present application is a deep-learning-based similar image detection method. The detection process uses a CNN (Convolutional Neural Network) model for feature extraction and an SVM (Support Vector Machine) model for similarity measurement. For ease of understanding, the model training methods of the CNN model and the SVM model are introduced first.
FIG. 1 is a flowchart of a model training method provided by an embodiment of the present application. The method is applied to a similar image detection apparatus, which may be an electronic device such as a terminal or a server. As shown in FIG. 1, the method includes the following steps:
Step 101: acquire a second sample data set and a third sample data set, the second sample data set including second sample images of multiple categories and category label information of each second sample image, and the third sample data set including third sample images of multiple categories and category label information of each third sample image.
The second sample data set and the third sample data set are pre-acquired sample data sets that meet the requirements of model training. The category label information indicates the category of the corresponding image. As an example, the categories of the second sample images and the categories of the third sample images are different.
As an example, the second sample data set and the third sample data set are network image data sets. For example, the second sample data set is the ImageNet data set, and the third sample data set is a trademark image data set. The ImageNet data set contains more than 14 million images in more than 20,000 categories; it is a commonly used data set on which research work such as image classification, object localization, and object detection can be carried out. The trademark image data set includes trademark images of multiple categories.
As an example, the third sample data set includes two different sub-sample data sets, for example, a first sub-sample data set and a second sub-sample data set. The first sub-sample data set includes M third sample images belonging to S categories, the second sub-sample data set includes N third sample images belonging to T categories, and the M third sample images are different from the N third sample images, where M, N, S, and T are all positive integers.
As an example, if the third sample data set is a trademark image data set, the first sub-sample data set may be the Logo-405 data set, and the second sub-sample data set may be the FlickrLogo-32 data set. The Logo-405 data set was collected by web crawling and includes 32,218 images covering 405 categories of trademark images, including major luxury brands; the number of images per category ranges from a few dozen to more than one hundred, and each trademark image is approximately 300*500 in size. The FlickrLogo-32 data set comes from publicly available online data and contains 32 categories of trademark images, including trademarks of major Internet companies, with a total of 8,240 images.
In a possible implementation, the images in the acquired second sample data set and third sample data set may be preprocessed. Exemplarily, the second sample data set and the third sample data set may be preprocessed in the following manners:
As an example, the images in the acquired second sample data set and third sample data set may be subjected to format adjustment, deduplication, and renaming.
As an example, data augmentation may also be performed on the acquired second sample data set and third sample data set, where the data augmentation includes at least one of scaling, noise addition, rotation, and normalization.
For example, the images in either sample data set may be scaled proportionally to ensure that the images in the sample data set have a uniform scale. Since the scales of real images often differ while the training set requires a uniform scale, proportional scaling adjusts the images in the sample data set to a uniform scale so that the model can learn from and recognize these images.
For another example, noise may be randomly added to the images in either sample data set. Real images often contain a lot of noise, whereas the sample images in the sample data set are usually relatively clean. If training is performed on them directly, the model will be poorly robust to noise; during image detection, even a few noisy pixels on the image to be detected could cause the model to make recognition errors. Therefore, to make the model more robust to noise, the embodiments of the present application may randomly add noise to the training images.
For another example, images may be selected from either sample data set, rotated, and added back into the sample data set to increase the amount of data. The selection method may be random selection or another preset selection method, which is not limited in the embodiments of the present application. In addition, after the selected image is rotated, the part of the image that falls outside the display area may be cropped to keep the image scale uniform.
For another example, the sample images in either sample data set may be normalized to remove redundant information. As an example, the normalization includes: normalizing the pixel values of the sample images from [0, 255] to [0, 1] to remove redundant information contained in the training sample data and further shorten the training time.
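The following is a minimal sketch of the preprocessing and augmentation steps described above (scaling, noise addition, rotation, and normalization), written with TensorFlow, which the embodiments mention later; the target size, noise level, and file path handling are illustrative assumptions rather than values specified by the present application.
```python
import tensorflow as tf

def preprocess(path, target_size=(224, 224), noise_stddev=0.02, rotate=False):
    """Load one sample image and apply scaling, optional rotation,
    random noise, and [0, 1] normalization (illustrative values)."""
    raw = tf.io.read_file(path)
    img = tf.image.decode_jpeg(raw, channels=3)
    img = tf.image.resize(img, target_size)          # scale to a uniform size
    img = tf.cast(img, tf.float32) / 255.0           # normalize [0, 255] -> [0, 1]
    if rotate:
        img = tf.image.rot90(img)                    # simple rotation-based augmentation
    img = img + tf.random.normal(tf.shape(img), stddev=noise_stddev)  # add random noise
    return tf.clip_by_value(img, 0.0, 1.0)
```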
As an example, a fourth sample data set and a fifth sample data set may also be acquired. Both the fourth sample data set and the fifth sample data set include sample images of multiple categories and category label information of each sample image, and the sample images in the fourth sample data set and the fifth sample data set are different. For example, the fourth sample data set is the ImageNet data set, and the fifth sample data set is a trademark image data set.
Then, the sample images in the fourth sample data set are preprocessed, the preprocessed fourth sample data set is divided into a training set and a test set, and the training set is used as the second sample data set. Likewise, the sample images in the fifth sample data set are preprocessed, the preprocessed fifth sample data set is divided into a training set and a test set, and the training set is used as the third sample data set.
The training set is used to train the CNN model, and the test set is used to test the CNN model. The preprocessing includes at least one of format adjustment, deduplication, and data augmentation, and the data augmentation includes at least one of scaling, noise addition, rotation, and normalization.
As an example, the preprocessed fourth sample data set may be divided into a training set and a test set in proportion, and the preprocessed fifth sample data set may likewise be divided into a training set and a test set in proportion. In a possible implementation, the ratio of the training set to the test set may be 8:2.
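As a hedged sketch of the proportional 8:2 split, assuming the image paths and labels are held in parallel Python lists, scikit-learn's train_test_split is one convenient way to do it; the present application does not prescribe a particular library, and the file names and labels below are hypothetical.
```python
from sklearn.model_selection import train_test_split

# Hypothetical parallel lists of image file paths and category labels.
image_paths = ["logo_0001.jpg", "logo_0002.jpg", "logo_0003.jpg", "logo_0004.jpg", "logo_0005.jpg"]
labels = [0, 0, 1, 1, 1]

# 8:2 split into a training set and a test set.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, random_state=0)
```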
Step 102: pre-train the CNN model to be trained according to the second sample data set to obtain an initialized CNN model.
As an example, the CNN model may be a VGGNet (Visual Geometry Group Network) model. The VGGNet model is a deep convolutional neural network model proposed by the Visual Geometry Group of the University of Oxford, and it can reduce the Top-5 error rate to 7.3%.
The VGGNet model mainly has the following characteristics: 1) all convolutional layers in the entire network use 3*3 convolution kernels; 2) in the network structure, two 3*3 convolutional layers replace one traditional 5*5 convolutional layer, and three 3*3 convolutional layers replace one traditional 7*7 convolutional layer, which increases the nonlinear expression capability of the network; 3) multiple 3*3 convolution kernels have fewer parameters than a single large convolution kernel, which reduces the number of parameters of the overall network.
As an example, the VGGNet model may be the 19-layer VGGNet-19 model or the 16-layer VGGNet-16 model. Please refer to FIG. 2, which is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application. As shown in FIG. 2, the VGGNet-16 model includes 13 convolutional layers and 3 fully connected layers.
As an example, apart from using more layers, all convolutional layers in the VGGNet-16 network structure have convolution kernels of the same size, 3*3, which is the smallest window that can capture the information of the top, bottom, left, right, and center. Each 3*3 convolutional layer uses a padding of one pixel to ensure that the input and output sizes remain the same after convolution.
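A brief illustrative sketch, not taken from the present application, of how the 3*3 convolution with one pixel of padding described above can be expressed in Keras: padding="same" preserves the spatial size, and only the pooling layer halves it.
```python
import tensorflow as tf

# One VGG-style block: 3x3 kernels with "same" padding keep the 224x224 spatial size.
block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu",
                           input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),       # halves the spatial size to 112x112
])
print(block.output_shape)  # (None, 112, 112, 64)
```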
By pre-training the CNN model to be trained according to the second sample data set, an initialized CNN model capable of classifying the images in the second sample data set can be obtained.
The VGGNet model is obtained by training on sample data sets, the sample data sets including sample images of multiple categories and category label information of each sample image. For example, a VGGNet model to be trained is first pre-trained according to the second sample data set to obtain an initialized VGGNet model, and the initialized VGGNet model is then trained according to the third sample data set to obtain the VGGNet model.
As an example, the VGGNet-16 model may first be trained on the ImageNet data set to obtain a deep learning classification model pre-trained on natural images, and the trained deep learning classification model is used as the initialization model for the subsequent training of the trademark image classification model.
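As one possible realization of this step (the present application does not specify a framework), an ImageNet-pretrained VGG-16 can be loaded directly from Keras Applications and used as the initialization model; keeping the fully connected layers makes the fc1/fc2 layers available for feature extraction later.
```python
import tensorflow as tf

# VGG-16 pre-trained on ImageNet, kept with its fully connected layers
# (include_top=True) so that the fc1/fc2 layers are available later.
init_model = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
init_model.summary()
```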
Step 103: train the initialized CNN model according to the third sample data set to obtain the CNN model.
The CNN model is used to perform feature extraction on any image to obtain the feature vector of the image.
After the CNN model to be trained is pre-trained according to the second sample data set to obtain the initialized CNN model, the initialized CNN model can be further trained according to the third sample data set to obtain the CNN model, and the CNN model is used as the feature extraction model for extracting features from images.
As an example, the VGG-16 model pre-trained on the ImageNet data set is used as the initialization model, the initialization model is trained on the trademark images to obtain a trained VGG-16 model, and the trained VGG-16 model is then used as the feature extractor.
As an example, during training, the training set may be input into the CNN model and iterated a preset number of times. The preset number may be set in advance, for example, 80, 90, or 100.
As an example, a gradient descent algorithm may be used to optimize the objective function in each iteration so that the model converges. As an example, the gradient descent algorithm may be the Adam (Adaptive Moment Estimation) gradient descent algorithm, which is an efficient computation method that can speed up the convergence of gradient descent. For example, the batch size batch_size of the Adam gradient descent algorithm may be preset; for example, batch_size may be set to 32.
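The sketch below puts the above together under stated assumptions: the ImageNet-initialized VGG-16 gets a new classification head for the trademark categories and is fine-tuned with Adam and a batch size of 32. The number of output classes, the dataset object, and the epoch count are illustrative assumptions, not values fixed by the present application.
```python
import tensorflow as tf

NUM_CLASSES = 405          # e.g. the Logo-405 categories (assumption)
base = tf.keras.applications.VGG16(weights="imagenet", include_top=True)

# Reuse everything up to fc2 and attach a new softmax head for trademark classes.
features = base.get_layer("fc2").output
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(features)
model = tf.keras.Model(inputs=base.input, outputs=outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds is assumed to yield (image, label) pairs of 224x224x3 images.
# model.fit(train_ds.batch(32), epochs=100)
```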
Step 104: acquire a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image.
The first sample data set may be the aforementioned first sub-sample data set and/or second sub-sample data set, or may be a sample data set other than the aforementioned first sub-sample data set and second sub-sample data set.
As an example, the first sample data set is the aforementioned third sample data set; for example, the first sample data set is a trademark image data set. For example, the first sample data set is the Logo-405 data set and the FlickrLogo-32 data set.
Step 105: take the first sample images of the multiple categories as input of the CNN model, and perform feature extraction on the first sample images of the multiple categories through the CNN model to obtain the feature vectors of the first sample images of the multiple categories.
The feature vector characterizes the features of a sample image in each dimension. As an example, the feature vector may be a feature vector with a dimension of 4096.
As an example, features may be extracted from the trademark data sets Logo-405 and FlickrLogo-32 through the CNN model to obtain the feature vector of each image, and finally the deep representation of each trademark image after feature extraction by the VGG-16 model is obtained.
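A minimal sketch of using VGG-16 as a feature extractor, assuming the Keras VGG-16 layer naming: the output of the second fully connected layer (named "fc2", 4096-dimensional) is taken as the deep representation of an image. The image loading and preprocessing details are assumptions.
```python
import numpy as np
import tensorflow as tf

vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=True)
# 4096-dimensional output of the second fully connected layer as the image representation.
extractor = tf.keras.Model(inputs=vgg.input, outputs=vgg.get_layer("fc2").output)

def extract_feature(image):
    """image: float array of shape (224, 224, 3), already preprocessed."""
    batch = np.expand_dims(image, axis=0)
    return extractor.predict(batch)[0]              # shape (4096,)
```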
In the embodiments of the present application, the CNN model is used for feature extraction. The CNN model can automatically learn image features during training, without manual feature design or human intervention in feature learning, which improves the efficiency and accuracy of feature extraction.
In a possible implementation, after the electronic device acquires the first sample data set, the first sample data set is preprocessed to obtain a preprocessed first sample data set. The preprocessed first sample data set is used as input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data augmentation. The preprocessed first sample data set is taken as input of the CNN model, and feature extraction is performed on the preprocessed first sample data set through the CNN model to obtain the feature vectors of the preprocessed first sample data set.
Step 106: train the SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
As an example, the operation of training the SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image includes: solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function indicates that the margin between first sample images of different categories among the first sample images of the multiple categories is maximized.
The basic idea of the support vector machine algorithm is to first transform the input sample space into a high-dimensional space and then find, in this new high-dimensional feature space, the optimal classification hyperplane that maximizes the margin between sample points of different categories. This classification hyperplane is the maximum-margin hyperplane.
As an example, suppose the training set of the SVM model to be trained is $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}$, with $x_i\in\mathbb{R}^d$ and $y_i\in\{-1,1\}$, where $x$ denotes the feature vector of a first sample image and $y$ denotes its category label information. The objective function of the SVM model to be trained may be:
$$\min_{\omega,b}\ \frac{1}{2}\lVert\omega\rVert^{2} \qquad (1)$$
where $\omega$ is the normal vector, which determines the direction of the classification hyperplane, and $b$ is the offset, which determines the distance between the classification hyperplane and the origin.
As an example, the constraint of the SVM model to be trained is:
$$\text{s.t.}\quad y_i\left(\omega^{T}x_i+b\right)\geq 1,\quad i=1,\ldots,n \qquad (2)$$
Introducing the Lagrange multipliers $\alpha$ into the objective function gives:
$$L(\omega,b,\alpha)=\frac{1}{2}\lVert\omega\rVert^{2}-\sum_{i=1}^{n}\alpha_i\left[y_i\left(\omega^{T}x_i+b\right)-1\right] \qquad (3)$$
Solving the objective function, that is, differentiating the objective function and solving for the optimal classification surface, yields the classification function:
$$f(x)=\operatorname{sgn}\!\left(\sum_{i=1}^{n}\alpha_i y_i x_i^{T}x+b\right) \qquad (4)$$
Assuming the data is linearly separable, a hyperplane can be found that completely separates the data of different categories. As an example, the SVM model to be trained may also be an SVM model based on a kernel function; for linearly inseparable problems, the SVM model to be trained may solve the classification function with the help of a kernel function. The kernel function is itself a special similarity measure function, and different kernel functions represent different similarity measures.
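The following sketch is one possible realization with scikit-learn (not prescribed by the present application): an SVM is fitted on CNN feature vectors; with a linear kernel the learned hyperplane parameters ω and b are exposed directly, and a kernel such as RBF can be substituted for linearly inseparable data. The data and the regularization constant are placeholders.
```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4096))             # placeholder 4096-dim CNN feature vectors
labels = rng.integers(0, 2, size=200)               # category label information (binary here)

# Linear case: solving the max-margin objective yields the hyperplane parameters.
linear_svm = svm.SVC(kernel="linear", C=1.0).fit(features, labels)
omega, b = linear_svm.coef_, linear_svm.intercept_  # normal vector and offset of the hyperplane

# Linearly inseparable case: a kernel function (e.g. RBF) can be used instead.
kernel_svm = svm.SVC(kernel="rbf", C=1.0).fit(features, labels)
```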
After the training of the CNN model and the SVM model is completed, the test set may also be taken as input of the CNN model to extract the feature vector of each image in the test set, and the feature vector of each image in the test set is then taken as input of the SVM model for testing, to verify the accuracy of the models. If the CNN model and the SVM model pass the verification, the training is completed; if they do not pass the verification, the CNN model and the SVM model continue to be trained on the training set until the trained models pass the verification.
Please refer to FIG. 3, which is a flowchart of a similar image detection process provided by an embodiment of the present application. As shown in FIG. 3, a sample data set may first be constructed, the sample images in the sample data set are then preprocessed, and the preprocessed sample data set is divided into a training set and a test set in proportion; the training set is used to train the CNN model and the SVM model, and the test set is used to verify the accuracy of the trained CNN model and SVM model. Then, based on the training set and the test set, an image similarity feature extraction model based on VGG-16, namely the CNN model, is constructed, and the SVM model is used to classify the feature vectors extracted by the CNN model, completing image similarity detection.
As an example, the embodiments of the present application may use TensorFlow (an open-source software library) to define the convolutional neural network model. With TensorFlow's flexible architecture, users can easily deploy computation to multiple platforms and devices. The platforms include CPU (Central Processing Unit), GPU (Graphics Processing Unit), and TPU (Tensor Processing Unit), and the devices may include desktop devices, server clusters, mobile devices, and edge devices. Exemplarily, desktop devices include desktop computers; a server cluster includes one or more servers; mobile devices include mobile phones, tablet computers, and smart wearable devices; edge devices refer to switches, routers, routing switches, IADs (Integrated Access Devices), and various MAN (Metropolitan Area Network)/WAN (Wide Area Network) devices installed on edge networks.
After the training of the CNN model and the SVM model is completed, similar image detection can be performed based on the trained CNN model and SVM model. Next, the similar image detection process of the embodiments of the present application is described in detail.
FIG. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application. The method is applied to a similar image detection apparatus. As shown in FIG. 4, the method includes the following steps:
Step 401: acquire multiple images to be detected.
The multiple images to be detected may be uploaded by a user, obtained from the storage space of the apparatus, sent by another device, or obtained from a network; the manner of acquiring the multiple images to be detected is not limited in the embodiments of the present application.
As an example, the images stored in the album of a terminal may be acquired and used as the multiple images to be detected.
Step 402: take the multiple images as input of the CNN model, and perform feature extraction on the multiple images through the CNN model to obtain the feature vectors of the multiple images.
As an example, the CNN model is a VGGNet model, for example, the VGGNet-16 model or the VGGNet-19 model.
As an example, before the multiple images are taken as input of the CNN model for feature extraction, the multiple images may first be preprocessed, and the preprocessed multiple images are then taken as input of the CNN model for feature extraction through the CNN model. The preprocessing includes at least one of format adjustment, deduplication, and data augmentation.
Step 403: take the feature vectors of the multiple images as input of the SVM model, and perform similarity measurement on the feature vectors of the multiple images through the SVM model to obtain the similarity metric values of the multiple images.
As an example, the classification function value of each image may be determined through the classification function of the SVM model, and the classification function value of each image is taken as the similarity metric value of the image.
The classification function characterizes the classification hyperplane used to classify images, and the classification function value of each image characterizes the distance between the image and the classification hyperplane. The closer the classification function values of any two of the multiple images are, the closer the distance between the two images and the higher the similarity between them.
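As a hedged illustration of this step with scikit-learn (mirroring the earlier training sketch), decision_function returns each image's signed distance to the classification hyperplane, which is used here as its similarity metric value. A binary setting is shown so that each image gets a single scalar value; all data and parameters are illustrative placeholders.
```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 4096))       # placeholder CNN feature vectors
train_labels = rng.integers(0, 2, size=200)         # binary labels give a single hyperplane
query_features = rng.normal(size=(10, 4096))        # feature vectors of the images to detect

classifier = svm.SVC(kernel="rbf", C=1.0)
classifier.fit(train_features, train_labels)

# Signed distance of each image to the classification hyperplane,
# taken here as the similarity metric value of the image.
similarity_values = classifier.decision_function(query_features)
```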
It should be noted that, in the similarity measurement process, the quality of the selected image features has a very important influence on the similarity measurement result; therefore, both the quality of image feature extraction and the choice of the measurement algorithm affect the image similarity detection result. In the embodiments of the present application, the feature vectors of images are combined with a traditional pattern recognition algorithm, and whether images are similar is judged by measuring the similarity between image features, completing the detection task. By using deep learning to extract the shallow and high-level features in images, the information in the images is used effectively and the detection accuracy is improved.
Step 404: perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
As an example, at least two of the multiple images whose similarity metric values fall within the same metric value range are determined as similar images, and the similar images are determined as images of the same category.
That is, the image features of the multiple images may be taken as input of the SVM model, and the multiple images are classified according to their image features through the SVM model; the images classified into the same class are similar images.
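A small sketch of this grouping step, assuming each image already has a scalar similarity metric value: images whose values fall into the same range (bin) are reported as similar images. The bin width is an illustrative parameter, not a value given by the present application.
```python
from collections import defaultdict

def group_similar(similarity_values, bin_width=0.5):
    """Group image indices whose similarity metric values fall within the same range."""
    groups = defaultdict(list)
    for idx, value in enumerate(similarity_values):
        groups[int(value // bin_width)].append(idx)
    # Only ranges containing at least two images are reported as similar images.
    return [members for members in groups.values() if len(members) >= 2]

print(group_similar([0.1, 0.2, 1.7, 1.9, 3.4]))     # [[0, 1], [2, 3]]
```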
As an example, after the similar images are determined as images of the same category, the images of the same category may be stored in correspondence with their category labels.
Please refer to FIG. 5, which is a flowchart of another similar image detection method provided by an embodiment of the present application. As shown in FIG. 5, the images to be detected may first be acquired and preprocessed, feature extraction is then performed on the preprocessed images through the CNN model, similarity measurement is performed on the extracted feature vectors through the SVM model, and similar image detection is performed based on the similarity measurement results.
In the embodiments of the present application, for multiple images to be detected, the multiple images are first input into the CNN model, and feature extraction is performed through the CNN model, which can effectively extract the shallow and deep features in the images, making effective use of the information in the images and improving detection accuracy. The extracted feature vectors are then input into the SVM model, which performs similarity measurement on the feature vector of each image; whether images are similar can be judged according to the similarity between their feature vectors, which further improves detection accuracy while reducing the amount of computation and improving detection efficiency.
In addition, the embodiments of the present application introduce deep learning into image similarity recognition and detection: a convolutional neural network is constructed to mine and learn the information in sample images and to obtain deep representations of different sample images, and a traditional pattern recognition algorithm then measures the similarity between features, which solves the problem of recognizing and detecting large numbers of similar images. Furthermore, the embodiments of the present application transform the image similarity recognition and detection problem into an image classification problem, use the deep learning model as a feature extractor, and combine it with a traditional pattern recognition algorithm to measure the similarity between features, thereby avoiding the inaccuracy introduced by computing hash means. Combining deep learning with traditional pattern recognition not only makes effective use of the transfer learning and automatic feature learning capabilities of deep convolutional neural networks, but also combines deep representations with traditional pattern recognition algorithms, further improving the accuracy of similar image recognition and detection.
FIG. 6 is a block diagram of a similar image detection apparatus provided by an embodiment of the present application. As shown in FIG. 6, the apparatus includes a first acquisition module 601, a first extraction module 602, a measurement module 603, and a detection module 604.
The first acquisition module 601 is configured to acquire multiple images to be detected;
the first extraction module 602 is configured to take the multiple images as input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
the measurement module 603 is configured to take the feature vectors of the multiple images as input of an SVM model, and perform similarity measurement on the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
the detection module 604 is configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
Optionally, the CNN model is a Visual Geometry Group network (VGGNet) model, and the VGGNet model is obtained by training on a sample data set, the sample data set including sample images of multiple categories and category label information of each sample image.
Optionally, the measurement module 603 is configured to:
determine a classification function value of each image through the classification function of the SVM model, and take the classification function value of each image as the similarity metric value of the image; wherein the classification function characterizes the classification hyperplane used to classify images, and the classification function value of each image characterizes the distance between the image and the classification hyperplane.
Optionally, the detection module 604 is configured to:
determine at least two of the multiple images whose similarity metric values fall within the same metric value range as similar images;
determine the similar images as images of the same category.
Optionally, the apparatus further includes:
a second acquisition module (not shown in the figure), configured to acquire a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
a second extraction module (not shown in the figure), configured to take the first sample images of the multiple categories as input of the CNN model, and perform feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
a first training module (not shown in the figure), configured to train an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
Optionally, the first training module is configured to:
solve an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function indicates that the margin between first sample images of different categories among the first sample images of the multiple categories is maximized.
Optionally, the apparatus further includes:
a preprocessing module (not shown in the figure), configured to preprocess the first sample data set to obtain a preprocessed first sample data set;
wherein the preprocessed first sample data set is used as input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data augmentation.
Optionally, the apparatus further includes:
a second training module, configured to pre-train a CNN model to be trained according to a second sample data set to obtain an initialized CNN model, the second sample data set including second sample images of multiple categories and category label information of each second sample image;
a third training module, configured to train the initialized CNN model according to a third sample data set to obtain the CNN model, the third sample data set including third sample images of multiple categories and category label information of each third sample image.
Optionally, the third sample data set includes a first sub-sample data set and a second sub-sample data set, the first sub-sample data set includes M third sample images belonging to S categories, the second sub-sample data set includes N third sample images belonging to T categories, the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
In the embodiments of the present application, for multiple images to be detected, the multiple images are first input into the CNN model, and feature extraction is performed through the CNN model, which can effectively extract the shallow and deep features in the images, making effective use of the information in the images and improving detection accuracy. The extracted feature vectors are then input into the SVM model, which performs similarity measurement on the feature vector of each image; whether images are similar can be judged according to the similarity between their feature vectors, which further improves detection accuracy while reducing the amount of computation and improving detection efficiency.
It should be noted that, when the similar image detection apparatus provided in the above embodiments performs similar image detection, the division into the above functional modules is used only as an example for description. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the similar image detection apparatus provided in the above embodiments belongs to the same concept as the similar image detection method embodiments; its specific implementation process is detailed in the method embodiments and is not repeated here.
FIG. 7 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application. The electronic device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, or a computer. The electronic device may differ greatly depending on configuration or performance, and may include one or more processors 701 and one or more memories 702, where the memory 702 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 701 to implement the similar image detection methods provided by the foregoing method embodiments. Of course, the electronic device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may also include other components for implementing device functions, which are not described in detail here.
An embodiment of the present application further provides a computer-readable storage medium storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the similar image detection method described in each of the above embodiments.
An embodiment of the present application further provides a computer program product storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the similar image detection method described in each of the above embodiments.
Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the embodiments of the present application may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The above are only optional embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (20)

  1. A similar image detection method, characterized in that the method comprises:
    acquiring multiple images to be detected;
    taking the multiple images as input of a convolutional neural network (CNN) model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
    taking the feature vectors of the multiple images as input of a support vector machine (SVM) model, and performing similarity measurement on the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
    performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
  2. The method according to claim 1, characterized in that the CNN model is a Visual Geometry Group network (VGGNet) model, and the VGGNet model is obtained by training on a sample data set, the sample data set including sample images of multiple categories and category label information of each sample image.
  3. The method according to claim 1, characterized in that the performing similarity measurement on the feature vectors of the multiple images through the SVM model comprises:
    determining a classification function value of each image through the classification function of the SVM model, and taking the classification function value of each image as the similarity metric value of the image; wherein the classification function characterizes the classification hyperplane used to classify images, and the classification function value of each image characterizes the distance between the image and the classification hyperplane.
  4. The method according to claim 1, characterized in that the performing similar image detection on the multiple images based on the similarity metric values of the multiple images comprises:
    determining at least two of the multiple images whose similarity metric values fall within the same metric value range as similar images;
    determining the similar images as images of the same category.
  5. The method according to any one of claims 1-4, characterized in that, before the performing similarity measurement on the feature vectors of the multiple images through the SVM model, the method further comprises:
    acquiring a first sample data set, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
    taking the first sample images of the multiple categories as input of the CNN model, and performing feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
    training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
  6. The method according to claim 5, characterized in that the training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model, comprises:
    solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model; wherein the objective function indicates that the margin between first sample images of different categories among the first sample images of the multiple categories is maximized.
  7. The method according to claim 5, characterized in that, after the acquiring a first sample data set, the method further comprises:
    preprocessing the first sample data set to obtain a preprocessed first sample data set;
    wherein the preprocessed first sample data set is used as input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data augmentation.
  8. The method according to any one of claims 1-4, characterized in that, before the taking the multiple images as input of a convolutional neural network (CNN) model and performing feature extraction on the multiple images through the CNN model, the method further comprises:
    pre-training a CNN model to be trained according to a second sample data set to obtain an initialized CNN model, the second sample data set including second sample images of multiple categories and category label information of each second sample image;
    training the initialized CNN model according to a third sample data set to obtain the CNN model, the third sample data set including third sample images of multiple categories and category label information of each third sample image.
  9. The method according to claim 8, characterized in that the third sample data set includes a first sub-sample data set and a second sub-sample data set, the first sub-sample data set includes M third sample images belonging to S categories, the second sub-sample data set includes N third sample images belonging to T categories, the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
  10. A similar image detection apparatus, wherein the apparatus comprises:
    a first acquisition module, configured to acquire multiple images to be detected;
    a first extraction module, configured to use the multiple images as the input of a CNN model and to perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
    a measurement module, configured to use the feature vectors of the multiple images as the input of an SVM model and to perform similarity measurement on the feature vectors of the multiple images through the SVM model to obtain similarity metric values of the multiple images;
    a detection module, configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
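The four modules of claim 10 map naturally onto a small wrapper around a feature extractor and a trained SVM. The class below is only an illustration of that structure, not the claimed apparatus; cnn_model, svm_model and preprocess are assumed to be objects such as those sketched after claims 5 and 8.

    # Illustrative module layout mirroring claim 10; not the actual apparatus.
    from pathlib import Path
    from PIL import Image
    import torch

    class SimilarImageDetector:
        def __init__(self, cnn_model, svm_model, preprocess, device="cpu"):
            self.cnn = cnn_model.eval().to(device)   # first extraction module's CNN
            self.svm = svm_model                     # measurement module's SVM
            self.preprocess = preprocess             # tensor-producing transform
            self.device = device

        def acquire(self, folder):
            """First acquisition module: load the images to be detected."""
            return [Image.open(p).convert("RGB") for p in Path(folder).glob("*.jpg")]

        def extract(self, images):
            """First extraction module: CNN feature vectors for each image."""
            batch = torch.stack([self.preprocess(im) for im in images]).to(self.device)
            with torch.no_grad():
                return self.cnn(batch).cpu().numpy()

        def measure(self, feature_vectors):
            """Measurement module: similarity metric values from the SVM."""
            return self.svm.decision_function(feature_vectors)

        def detect(self, folder):
            """Detection module: pair each image with its similarity metric value."""
            images = self.acquire(folder)
            scores = self.measure(self.extract(images))
            return list(zip(images, scores))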
  11. The apparatus according to claim 10, wherein the CNN model is a Visual Geometry Group network (VGGNet) model, and the VGGNet model is obtained by training based on a sample data set, the sample data set comprising sample images of multiple categories and category label information of each sample image.
  12. The apparatus according to claim 10, wherein the performing similarity measurement on the feature vectors of the multiple images through the SVM model comprises:
    determining a classification function value of each image through a classification function of the SVM model, and using the classification function value of each image as the similarity metric value of the image; wherein the classification function is used to characterize a classification hyperplane for classifying images, and the classification function value of each image is used to characterize the distance between the image and the classification hyperplane.
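Claim 12 takes the value of the SVM classification function, that is, the signed quantity w·x + b whose magnitude reflects how far a feature vector lies from the classification hyperplane, as the similarity metric value. In scikit-learn this quantity is returned by decision_function; the snippet below verifies the correspondence on synthetic data and assumes a binary LinearSVC for simplicity.

    # Illustration: the SVM classification function value as a similarity metric.
    # A binary LinearSVC on synthetic features is assumed here for simplicity.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 4096)), rng.normal(3, 1, (50, 4096))])
    y = np.array([0] * 50 + [1] * 50)

    svm = LinearSVC(max_iter=10000).fit(X, y)

    feats = X[:3]                                  # feature vectors of 3 images
    values = svm.decision_function(feats)          # classification function values
    manual = feats @ svm.coef_.ravel() + svm.intercept_[0]   # w·x + b

    # decision_function returns w·x + b, whose magnitude is proportional to the
    # (unnormalised) distance between each image and the classification hyperplane.
    assert np.allclose(values, manual)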
  13. The apparatus according to claim 10, wherein the performing similar image detection on the multiple images based on the similarity metric values of the multiple images comprises:
    determining at least two images, among the multiple images, whose similarity metric values fall within the same metric value range as similar images;
    determining the similar images as images of the same category.
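Claim 13 treats images whose similarity metric values fall within the same metric value range as similar images of one category. A minimal bucketing sketch follows; the bin width of 0.5 is an arbitrary assumption.

    # Illustrative grouping of images by similarity metric value ranges.
    from collections import defaultdict

    def group_by_metric_range(metric_values, bin_width=0.5):
        """Images whose metric values land in the same bin are treated as
        similar images, i.e. images of the same category (per claim 13)."""
        groups = defaultdict(list)
        for image_index, value in enumerate(metric_values):
            bucket = int(value // bin_width)      # metric value range identifier
            groups[bucket].append(image_index)
        # Only ranges containing at least two images yield similar images.
        return {b: idxs for b, idxs in groups.items() if len(idxs) >= 2}

    # Example: indices 0 and 1 fall in the same range and are reported as similar.
    print(group_by_metric_range([0.6, 0.9, 3.4, -2.1]))   # -> {1: [0, 1]}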
  14. The apparatus according to any one of claims 10 to 13, wherein before the performing similarity measurement on the feature vectors of the multiple images through the SVM model, the apparatus is further configured to:
    acquire a first sample data set, the first sample data set comprising first sample images of multiple categories and category label information of each first sample image;
    use the first sample images of the multiple categories as the input of the CNN model, and perform feature extraction on the first sample images of the multiple categories through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
    train an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
  15. The apparatus according to claim 14, wherein the training an SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image to obtain the SVM model comprises:
    solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain a classification function of the SVM model; wherein the objective function is used to indicate that the margin between first sample images of different categories, among the first sample images of the multiple categories, is maximized.
  16. The apparatus according to claim 14, wherein after the acquiring the first sample data set, the apparatus is further configured to:
    preprocess the first sample data set to obtain a preprocessed first sample data set;
    wherein the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing comprises at least one of the following: format adjustment, deduplication, and data augmentation.
  17. The apparatus according to any one of claims 10 to 13, wherein before the using the multiple images as an input of a convolutional neural network (CNN) model and performing feature extraction on the multiple images through the CNN model, the apparatus is further configured to:
    pre-train a CNN model to be trained according to a second sample data set to obtain an initialized CNN model, the second sample data set comprising second sample images of multiple categories and category label information of each second sample image;
    train the initialized CNN model according to a third sample data set to obtain the CNN model, the third sample data set comprising third sample images of multiple categories and category label information of each third sample image.
  18. The apparatus according to claim 17, wherein the third sample data set comprises a first sub-sample data set and a second sub-sample data set, the first sub-sample data set comprises M third sample images belonging to S categories, the second sub-sample data set comprises N third sample images belonging to T categories, the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
  19. An electronic device, wherein the electronic device comprises a processor and a memory; the memory stores at least one instruction, and the at least one instruction is configured to be executed by the processor to implement the similar image detection method according to any one of claims 1 to 9.
  20. A computer-readable storage medium, wherein the storage medium stores at least one instruction, and the at least one instruction is configured to be executed by a processor to implement the similar image detection method according to any one of claims 1 to 9.
PCT/CN2020/138510 2019-12-30 2020-12-23 Similar image detection method and apparatus, device and storage medium WO2021136027A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911390241.0 2019-12-30
CN201911390241.0A CN111222548A (en) 2019-12-30 2019-12-30 Similar image detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021136027A1 (en)

Family

ID=70827972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/138510 WO2021136027A1 (en) 2019-12-30 2020-12-23 Similar image detection method and apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN111222548A (en)
WO (1) WO2021136027A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222548A (en) * 2019-12-30 2020-06-02 Oppo广东移动通信有限公司 Similar image detection method, device, equipment and storage medium
CN111870279B (en) * 2020-07-31 2022-01-28 西安电子科技大学 Method, system and application for segmenting left ventricular myocardium of ultrasonic image
CN112749765A (en) * 2021-02-01 2021-05-04 深圳无域科技技术有限公司 Picture scene classification method, system, device and computer readable medium
CN113297411B (en) * 2021-07-26 2021-11-09 深圳市信润富联数字科技有限公司 Method, device and equipment for measuring similarity of wheel-shaped atlas and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467564B (en) * 2010-11-12 2013-06-05 中国科学院烟台海岸带研究所 Remote sensing image retrieval method based on improved support vector machine relevance feedback
CN109165682B (en) * 2018-08-10 2020-06-16 中国地质大学(武汉) Remote sensing image scene classification method integrating depth features and saliency features
CN110298376B (en) * 2019-05-16 2022-07-01 西安电子科技大学 Bank bill image classification method based on improved B-CNN

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190183429A1 (en) * 2016-03-24 2019-06-20 The Regents Of The University Of California Deep-learning-based cancer classification using a hierarchical classification framework
CN107563319A (en) * 2017-08-24 2018-01-09 西安交通大学 Face similarity measurement computational methods between a kind of parent-offspring based on image
CN109359551A (en) * 2018-09-21 2019-02-19 深圳市璇玑实验室有限公司 A kind of nude picture detection method and system based on machine learning
CN109784366A (en) * 2018-12-07 2019-05-21 北京飞搜科技有限公司 The fine grit classification method, apparatus and electronic equipment of target object
CN111222548A (en) * 2019-12-30 2020-06-02 Oppo广东移动通信有限公司 Similar image detection method, device, equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780284A (en) * 2021-09-17 2021-12-10 焦点科技股份有限公司 Logo detection method based on target detection and metric learning
CN113780284B (en) * 2021-09-17 2024-04-19 焦点科技股份有限公司 Logo detection method based on target detection and metric learning
CN114445661A (en) * 2022-01-24 2022-05-06 电子科技大学 Embedded image identification method based on edge calculation
CN114445661B (en) * 2022-01-24 2023-08-18 电子科技大学 Embedded image recognition method based on edge calculation
CN114998192A (en) * 2022-04-19 2022-09-02 深圳格芯集成电路装备有限公司 Defect detection method, device and equipment based on deep learning and storage medium
CN114998956A (en) * 2022-05-07 2022-09-02 北京科技大学 Small sample image data expansion method and device based on intra-class difference

Also Published As

Publication number Publication date
CN111222548A (en) 2020-06-02

Similar Documents

Publication Publication Date Title
WO2021136027A1 (en) Similar image detection method and apparatus, device and storage medium
WO2021203863A1 (en) Artificial intelligence-based object detection method and apparatus, device, and storage medium
WO2019128646A1 (en) Face detection method, method and device for training parameters of convolutional neural network, and medium
WO2020155518A1 (en) Object detection method and device, computer device and storage medium
US8792722B2 (en) Hand gesture detection
US8750573B2 (en) Hand gesture detection
WO2022033095A1 (en) Text region positioning method and apparatus
WO2017096753A1 (en) Facial key point tracking method, terminal, and nonvolatile computer readable storage medium
WO2016150240A1 (en) Identity authentication method and apparatus
EP4099217A1 (en) Image processing model training method and apparatus, device, and storage medium
US11875512B2 (en) Attributionally robust training for weakly supervised localization and segmentation
WO2021238548A1 (en) Region recognition method, apparatus and device, and readable storage medium
CN105550641B (en) Age estimation method and system based on multi-scale linear differential texture features
US11830233B2 (en) Systems and methods for stamp detection and classification
US10423817B2 (en) Latent fingerprint ridge flow map improvement
US20240153240A1 (en) Image processing method, apparatus, computing device, and medium
WO2017214970A1 (en) Building convolutional neural network
CN113221918B (en) Target detection method, training method and device of target detection model
CN111914908A (en) Image recognition model training method, image recognition method and related equipment
CN111598149B (en) Loop detection method based on attention mechanism
WO2023123923A1 (en) Human body weight identification method, human body weight identification device, computer device, and medium
WO2022127333A1 (en) Training method and apparatus for image segmentation model, image segmentation method and apparatus, and device
CN111444807A (en) Target detection method, device, electronic equipment and computer readable medium
CN106709490B (en) Character recognition method and device
US9104450B2 (en) Graphical user interface component classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20909458

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20909458

Country of ref document: EP

Kind code of ref document: A1