WO2021136027A1 - Similar image detection method and apparatus, device, and storage medium - Google Patents

Info

Publication number
WO2021136027A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
sample
data set
image
model
Application number
PCT/CN2020/138510
Inventor
孙莹莹
Original Assignee
Oppo广东移动通信有限公司 (Guangdong Oppo Mobile Telecommunications Corp., Ltd.)
Application filed by Oppo广东移动通信有限公司
Publication of WO2021136027A1

Classifications

    • G06F18/22 Matching criteria, e.g. proximity measures (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F18/00 Pattern recognition > G06F18/20 Analysing)
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on the proximity to a decision surface, e.g. support vector machines (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/24 Classification techniques > G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches)
    • G06N3/045 Combinations of networks (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks)
    • G06V10/40 Extraction of image or video features (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING > G06V10/00 Arrangements for image or video recognition or understanding)

Definitions

  • the embodiments of the present application relate to the field of image processing, and in particular, to a similar image detection method, apparatus, device, and storage medium.
  • Similar image detection is a basic problem in computer vision; it aims to compare the similarity between images and judge whether the images are similar based on that similarity. Similar image detection can be applied to different task scenarios. For example, it can be used to detect similar images in a mobile phone's album, and some of the similar images can then be deleted from the album to save storage space on the phone.
  • a hash algorithm can be used to detect similar images.
  • the two images to be detected can be hashed by a hash algorithm to obtain a hash value for each image, and the Hamming distance between the two hash values is then calculated. If the Hamming distance is less than a threshold, the two images are determined to be similar images; if it is greater than or equal to the threshold, the two images are determined not to be similar images.
  • the embodiments of the present application provide a similar image detection method, apparatus, device, and storage medium.
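The hash-based baseline described above can be sketched as follows. This is a minimal illustration of the related-art approach, not the method of the present application: it assumes an average hash (aHash) computed over an 8*8 grayscale thumbnail given as nested lists, and a hypothetical threshold of 10 bits; the source names neither the hash algorithm nor the threshold.

```python
def average_hash(gray):
    """Return a 64-bit integer hash of an 8*8 grayscale thumbnail (8 rows of 8 ints)."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        # Each bit records whether this pixel is brighter than the thumbnail's mean.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(h1, h2):
    """Number of bit positions in which two hashes differ."""
    return bin(h1 ^ h2).count("1")


def is_similar(gray_a, gray_b, threshold=10):
    """Similar if the Hamming distance is below the (hypothetical) threshold."""
    return hamming_distance(average_hash(gray_a), average_hash(gray_b)) < threshold


# Usage with synthetic 8*8 thumbnails (hypothetical data).
img = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]   # gradient 0..252
brighter = [[min(255, p + 3) for p in row] for row in img]      # same pattern, brighter
inverted = [[255 - p for p in row] for row in img]              # opposite pattern
print(is_similar(img, brighter), is_similar(img, inverted))     # prints: True False
```

The brightened copy keeps every bit of the hash (distance 0), while the inverted copy flips all 64 bits, so only the first pair falls below the threshold.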
  • the technical solution is as follows:
  • an embodiment of the present application provides a similar image detection method, and the method includes:
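  • acquiring multiple images to be detected;
  • using the multiple images as the input of a convolutional neural network (CNN) model, and performing feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
  • using the feature vectors of the multiple images as the input of a support vector machine (SVM) model, and measuring the similarity of the feature vectors through the SVM model to obtain similarity metric values of the multiple images;
  • performing similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • in another aspect, a similar image detection apparatus is provided, including:
  • the first acquisition module is configured to acquire multiple images to be detected;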
  • the first extraction module is configured to use the multiple images as the input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
  • the measurement module is configured to take the feature vectors of the multiple images as the input of the SVM model, and to measure the similarity of the feature vectors through the SVM model to obtain the similarity metric values of the multiple images;
  • the detection module is configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • in another aspect, an electronic device is provided, including a processor and a memory; the memory stores at least one instruction, which is executed by the processor to implement the above-mentioned similar image detection method.
  • in another aspect, a computer-readable storage medium is provided, storing at least one instruction, which is executed by a processor to implement the above-mentioned similar image detection method.
  • in another aspect, a computer program product is provided, storing at least one instruction, which is executed by a processor to implement the above-mentioned similar image detection method.
  • Fig. 1 is a flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a similar image detection process provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of another similar image detection method provided by an embodiment of the present application.
  • Fig. 6 is a block diagram of a similar image detection device provided by an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • "plurality" as used herein means two or more.
  • "And/or" describes the association relationship of the associated objects, indicating that there can be three types of relationships; for example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone.
  • the character "/" generally indicates an "or" relationship between the associated objects before and after it.
  • the similar image detection method provided by the embodiments of the present application is applied in the field of computer vision, and is specifically applied in a scene where similarity detection is performed on images, so as to detect similar images from multiple images. Similar image detection is a very important basic problem in the field of computer vision. The quality of many tasks depends on the quality of the similarity measure.
  • the similar image detection method provided in the embodiments of this application can be applied to an album application.
  • for example, the method detects similar images in the album and then recommends that the user delete some of them to clean up the album, helping the user manage the album better.
  • the similar image detection method can also be applied to other scenarios, which is not limited in the embodiment of the present application.
  • the similar image detection method provided by the embodiments of the application can be applied to a similar image detection device.
  • the similar image detection device can be an electronic device such as a terminal or a server.
  • the terminal can be a mobile phone, a tablet, or a computer.
  • the embodiment of the present application provides a similar image detection method, and the method includes:
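  • acquiring multiple images to be detected; extracting feature vectors of the multiple images through a CNN model; measuring the similarity of the feature vectors through an SVM model to obtain similarity metric values; and performing similar image detection on the multiple images based on the similarity metric values.
  • the CNN model is a Visual Geometry Group network (VGGNet) model.
  • the VGGNet model is obtained by training on a sample data set, the sample data set including sample images of multiple categories and category label information of each sample image.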
  • measuring the similarity of the feature vectors of the multiple images through the SVM model includes:
  • the classification function value of each image is determined through the classification function of the SVM model and used as the similarity metric value of that image, where the classification function characterizes the classification hyperplane used to classify images,
  • and the classification function value of each image characterizes the distance between that image and the classification hyperplane.
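  • performing similar image detection on the multiple images based on the similarity metric values of the multiple images includes:
  • at least two images among the multiple images whose similarity metric values fall within the same metric value range are determined to be similar images, and the similar images are determined to be images of the same category.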
  • the method further includes:
  • a first sample data set is obtained, the first sample data set including first sample images of multiple categories and category label information of each first sample image;
  • the first sample images of the multiple categories are used as the input of the CNN model, and feature extraction is performed on them through the CNN model to obtain feature vectors of the first sample images of the multiple categories;
  • based on the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, the SVM model to be trained is trained to obtain the SVM model.
  • training the SVM model to be trained based on the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image to obtain the SVM model includes:
  • an objective function is solved according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model, where the objective function is used to maximize the margin between first sample images of different categories among the first sample images of the multiple categories.
  • the method further includes:
  • the first sample data set is preprocessed, and the preprocessed first sample data set is used as the input of the CNN model, where the preprocessing includes at least one of the following: format adjustment, deduplication, and data enhancement processing.
  • before the multiple images are used as the input of the convolutional neural network (CNN) model and feature extraction is performed on them through the CNN model, the method further includes:
  • a CNN model to be trained is pre-trained according to a second sample data set to obtain an initialized CNN model, and the initialized CNN model is trained according to a third sample data set to obtain the CNN model.
  • the third sample data set includes third sample images of multiple categories and category label information of each third sample image.
  • the third sample data set includes a first sub-sample data set and a second sub-sample data set; the first sub-sample data set includes M third sample images, which belong to S categories,
  • the second sub-sample data set includes N third sample images, which belong to T categories,
  • the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
  • the similar image detection method provided in the embodiments of this application is a similar image detection method based on deep learning.
  • the detection process uses a CNN (Convolutional Neural Network) model for feature extraction and an SVM (Support Vector Machine) model for similarity measurement. To facilitate understanding, the training methods of the CNN model and the SVM model are introduced first.
  • Fig. 1 is a flowchart of a model training method provided by an embodiment of the present application. The method is applied to a similar image detection device.
  • the device may be an electronic device such as a terminal or a server. As shown in Fig. 1, the method includes the following steps:
  • Step 101: Obtain a second sample data set and a third sample data set.
  • the second sample data set includes second sample images of multiple categories and the category label information of each second sample image.
  • the third sample data set includes third sample images of multiple categories and the category label information of each third sample image.
  • the second sample data set and the third sample data set are pre-acquired sample data sets that can meet the requirements of model training.
  • the category tag information is used to indicate the category of the corresponding image.
  • the second sample images of multiple categories and the third sample images of multiple categories have different categories.
  • the second sample data set and the third sample data set are network image data sets.
  • the second sample data set is the ImageNet (image network) data set
  • the third sample data set is the trademark image data set.
  • the ImageNet data set contains more than 14 million images in more than 20,000 categories.
  • the ImageNet data set is currently widely used; research such as image classification, object localization, and object detection can be carried out on it.
  • the trademark image data set includes various types of trademark images.
  • the third sample data set includes two different sub-sample data sets, for example, the third sample data set includes a first sub-sample data set and a second sub-sample data set.
  • the first sub-sample data set includes M third sample images, which belong to S categories.
  • the second sub-sample data set includes N third sample images, which belong to T categories.
  • the M third sample images are different from the N third sample images.
  • M, N, S, and T are all positive integers.
  • the third sample data set is a trademark image data set
  • the first sub-sample data set may be the Logo-405 data set
  • the second sub-sample data set may be the FlickrLogo-32 data set.
  • the Logo-405 data set was collected by crawling the Internet and contains 32,218 images.
  • it covers 405 trademark categories, including major luxury brands.
  • each category contains from several dozen to more than 100 images, and each trademark image is about 300*500 pixels.
  • the FlickrLogo-32 data set comes from data publicly available on the Internet.
  • the data set contains a total of 32 types of trademark images including trademarks of major Internet companies, with a total of 8,240 images.
  • preprocessing may be performed on the images in the two sample data sets.
  • the second sample data set and the third sample data set can be preprocessed in the following manner:
  • the images in the two sample data sets may also be formatted, deduplicated, and renamed.
  • data enhancement processing may also be performed on these two sample data sets.
  • the data enhancement processing includes at least one of scaling, noise addition, rotation, and normalization processing.
  • the images in any sample data set may be scaled proportionally to ensure that the scales of the images in the sample data set are uniform. Since the scales of actual images are often different, and the scales must be uniform in the training set, the scaling method can be used to adjust the scales of the images in the sample data set to be uniform, so that the model can learn and recognize these images.
  • noise can be randomly added to the images in any sample data set. Actual images often contain a lot of noise, whereas the sample images in a sample data set are usually relatively clean. If the model is trained directly on clean images, it will be poorly robust to noise, and during detection, noise on even a few pixels of an image to be detected may cause recognition errors. Therefore, to make the model more robust to noise, the embodiments of the present application may randomly add noise to the training images.
  • for the acquired second and third sample data sets, an image may be selected from either sample data set and rotated, and the rotated image is added to that sample data set to increase its data volume.
  • the selection method used may be a random selection method or another preset selection method, which is not limited in the embodiments of the present application.
  • the part of a rotated image that falls outside the display area can be cropped to keep the image scale uniform.
  • the sample images in any sample data set may be normalized to remove redundant information.
  • the normalization processing includes: normalizing the pixel value of the sample image from [0, 255] to [0, 1] to remove redundant information contained in the sample data to be trained and further shorten the training time.
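The four data enhancement operations described above (scaling, noise addition, rotation with cropping, and normalization) can be sketched with NumPy as follows. This is a minimal sketch under assumptions not stated in the source: nearest-neighbour resizing to a fixed size, zero-mean Gaussian noise with a hypothetical standard deviation of 5 grey levels, nearest-neighbour rotation about the image centre on a same-size canvas (so content rotated outside the frame is cropped and uncovered corners are filled with 0), and division by 255 for normalization.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Scale an H*W (or H*W*C) image to out_h*out_w with nearest-neighbour sampling."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def add_gaussian_noise(img, sigma=5.0, rng=None):
    """Add zero-mean Gaussian noise, round, and clip back to the 8-bit range [0, 255]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def rotate_keep_size(img, angle_deg):
    """Rotate by angle_deg about the centre on a same-size canvas; out-of-frame content is cropped."""
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: find the source pixel that lands on each output pixel.
    src_x = np.rint(np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)  # uncovered corners stay 0
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def normalize(img):
    """Map 8-bit pixel values from [0, 255] to [0, 1]."""
    return img.astype(np.float32) / 255.0


# Usage on a synthetic 300*500 RGB image (hypothetical data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 500, 3), dtype=np.uint8)
augmented = normalize(rotate_keep_size(add_gaussian_noise(resize_nearest(image, 224, 224), rng=rng), 15))
print(augmented.shape, augmented.dtype)  # prints: (224, 224, 3) float32
```

The rotation keeps the canvas size instead of enlarging it, which is one simple way to realize the cropping described above; the 224*224 target size is likewise only an illustrative choice.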
  • a fourth sample data set and a fifth sample data set can also be obtained.
  • Both the fourth sample data set and the fifth sample data set include sample images of multiple categories and the category label information of each sample image.
  • the sample images in the fourth sample data set and the fifth sample data set are different.
  • the fourth sample data set is the ImageNet data set
  • the fifth sample data set is the trademark image data set.
  • the sample images in the fourth sample data set are preprocessed, the preprocessed fourth sample data set is divided into a training set and a test set, and the training set is used as the second sample data set.
  • the sample images in the fifth sample data set are preprocessed, the preprocessed fifth sample data set is divided into a training set and a test set, and the training set is used as the third sample data set.
  • the training set is used to train the CNN model
  • the test set is used to test the CNN model.
  • the preprocessing includes at least one of format adjustment, deduplication, and data enhancement processing
  • the data enhancement processing includes at least one of scaling, noise addition, rotation, and normalization processing.
  • the preprocessed fourth sample data set can be divided into a training set and a test set proportionally
  • the preprocessed fifth sample data set can be divided into a training set and a test set proportionally
  • the ratio of training set to test set can be 8:2.
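A proportional 8:2 split can be sketched as follows. The per-category (stratified) split, the fixed random seed, and the file names are assumptions for illustration; the source only states that each data set is divided proportionally, for example at a ratio of 8:2.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_ratio=0.8, seed=0):
    """Split (image_path, label) pairs into training and test sets category by category,
    so that every category keeps roughly the same train:test ratio."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        cut = round(len(group) * train_ratio)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Usage: 100 hypothetical images spread evenly over 5 trademark categories.
samples = [(f"img_{i:03d}.jpg", i % 5) for i in range(100)]
train_set, test_set = stratified_split(samples)
print(len(train_set), len(test_set))  # prints: 80 20
```

Splitting per category keeps small categories represented in both sets, which matters when some categories contain only a few dozen images.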
  • Step 102: Pre-train the CNN model to be trained according to the second sample data set to obtain the initialized CNN model.
  • the CNN model may be a VGGNet (Visual Geometry Group Network) model.
  • the VGGNet model is a deep convolutional neural network model proposed by the Visual Geometry Group at the University of Oxford; it reduced the top-5 error rate on ImageNet classification to 7.3%.
  • the VGGNet model mainly has the following characteristics: 1) all convolutional layers in the entire network structure use 3*3 convolution kernels; 2) two stacked 3*3 convolutional layers replace a traditional 5*5 convolutional layer, and three stacked 3*3 convolutional layers replace a traditional 7*7 convolutional layer, which increases the nonlinear expressive power of the network; 3) several 3*3 convolution kernels have fewer parameters than one larger convolution kernel with the same receptive field, which reduces the overall number of network parameters.
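The parameter saving in point 3) can be checked with a back-of-the-envelope count. The sketch below assumes every layer has C input and C output channels and stride 1, and ignores bias terms; C = 256 is a hypothetical value.

```python
def conv_weights(kernel, channels):
    """Weight count of one kernel*kernel convolution with `channels` in and out (no bias)."""
    return kernel * kernel * channels * channels


def receptive_field(num_layers, kernel=3):
    """Receptive field of a stack of stride-1 convolutions."""
    return num_layers * (kernel - 1) + 1


C = 256
for n_layers, big_kernel in [(2, 5), (3, 7)]:
    stacked = n_layers * conv_weights(3, C)
    single = conv_weights(big_kernel, C)
    print(f"{n_layers} x 3*3 (field {receptive_field(n_layers)}): {stacked:,} weights "
          f"vs one {big_kernel}*{big_kernel}: {single:,} weights")
```

In general, the stacked layers need 18C^2 versus 25C^2 and 27C^2 versus 49C^2 weights, about 28% and 45% fewer, while covering the same 5*5 and 7*7 receptive fields.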
  • the VGGNet model may be a 19-layer VGGNet-19 model or a 16-layer VGGNet-16 model.
  • FIG. 2 is a schematic diagram of the network structure of a VGGNet-16 model provided by an embodiment of the present application. As shown in FIG. 2, the VGGNet-16 model includes 13 convolutional layers and 3 fully connected layers.
  • all convolutional layers in the network structure have the same size convolution kernel.
  • the convolution kernel size is 3*3, the smallest window that can capture up/down, left/right, and center information,
  • and each 3*3 convolutional layer is padded with one pixel so that the spatial size of its output is the same as that of its input.
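The layer structure described above can be traced with a short shape-and-parameter walk-through. This sketch assumes the standard ImageNet configuration of VGG-16 (224*224*3 input, 2*2 max pooling with stride 2 after each block, and a 1000-way output layer); FIG. 2 itself is not reproduced here, and a trademark classifier would use a different number of output classes.

```python
# VGG-16 ("configuration D"): integers are 3*3 conv output channels, "M" is 2*2 max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_output_size(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution: (size + 2*padding - kernel) // stride + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def vgg16_summary(input_size=224, in_channels=3, num_classes=1000):
    size, channels = input_size, in_channels
    conv_layers, params = 0, 0
    for item in VGG16_CFG:
        if item == "M":
            size //= 2  # max pooling halves height and width
            continue
        # One-pixel padding keeps the spatial size unchanged for a 3*3 kernel.
        assert conv_output_size(size) == size
        params += 3 * 3 * channels * item + item  # weights + biases
        channels = item
        conv_layers += 1
    # Three fully connected layers: flattened map -> 4096 -> 4096 -> num_classes.
    fc_dims = [size * size * channels, 4096, 4096, num_classes]
    for d_in, d_out in zip(fc_dims[:-1], fc_dims[1:]):
        params += d_in * d_out + d_out
    return {"conv_layers": conv_layers, "fc_layers": len(fc_dims) - 1,
            "final_feature_map": (size, size, channels),
            "fc_feature_dim": fc_dims[1], "params": params}


print(vgg16_summary())
```

This gives 13 convolutional plus 3 fully connected layers, a 7*7*512 final feature map, 4096-dimensional fully connected features, and 138,357,544 parameters in total.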
  • an initialized CNN model that can classify images in the second sample data set can be obtained.
  • the VGGNet model is obtained by training based on a sample data set, and the sample data set includes sample images of multiple categories and category label information of each sample image.
  • the VGGNet model to be trained is pre-trained according to the second sample data set to obtain the initial VGGNet model, and then the initial VGGNet model is trained according to the third sample data set to obtain the VGGNet model.
  • Step 103: Train the initialized CNN model according to the third sample data set to obtain the CNN model.
  • the CNN model is used to perform feature extraction on any image to obtain the feature vector of the image.
  • for example, the VGG-16 model pre-trained on the ImageNet data set is used as the initialization model; the initialization model is then trained on the trademark images to obtain a trained VGG-16 model, which is used as the feature extractor.
  • the training set can be input to the CNN model for training, and iterate a preset number of times.
  • the preset number of times can be preset, for example, the preset number of times can be 80 times, 90 times, 100 times, and so on.
  • the gradient descent algorithm can be used to optimize the objective function during each iterative calculation process, so that the model reaches convergence.
  • the gradient descent algorithm may be an Adam (Adaptive Moment Estimation) gradient descent algorithm.
  • the Adam gradient descent algorithm is an efficient calculation method that can improve the convergence speed of the gradient descent.
  • the batch sample size batch_size of the Adam gradient descent algorithm can be set in advance.
  • batch_size can be set to 32.
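The iterative Adam optimization with a batch size of 32 can be illustrated on a toy problem. This minimal NumPy sketch substitutes a linear least-squares objective for the CNN's training loss, interprets the preset number of iterations as 100 passes (epochs) over the training set, and uses the usual Adam defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8 with a hypothetical learning rate of 0.01; apart from batch_size = 32 and the example iteration count, none of these values come from the source.

```python
import numpy as np


def adam_minibatch_regression(X, y, epochs=100, batch_size=32, lr=0.01,
                              beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Fit weights w that minimise 0.5 * mean((X @ w - y) ** 2) with mini-batch Adam."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    m = np.zeros(d)  # running mean of gradients (first moment)
    v = np.zeros(d)  # running mean of squared gradients (second moment)
    t = 0
    for _ in range(epochs):                  # preset number of passes over the training set
        order = rng.permutation(n)           # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)     # bias-corrected moments
            v_hat = v / (1 - beta2 ** t)
            w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


# Usage on a noise-free synthetic problem (hypothetical data).
rng = np.random.default_rng(1)
X = rng.normal(size=(320, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true
w_fit = adam_minibatch_regression(X, y)
print(np.round(w_fit, 2))
```

The per-coordinate scaling by the square root of the second moment is what lets Adam take similar-sized steps for parameters whose gradients differ greatly in magnitude.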
  • Step 104: Obtain a first sample data set, the first sample data set including first sample images of multiple categories and the category label information of each first sample image.
  • the first sample data set may be the above-mentioned first sub-sample data set and/or second sub-sample data set, or may be a sample data set other than the first and second sub-sample data sets.
  • the first sample data set is the aforementioned third sample data set, for example, the first sample data set is a trademark image data set.
  • the first sample data set is the Logo-405 data set and the FlickrLogo-32 data set.
  • Step 105: Use the first sample images of the multiple categories as the input of the CNN model, and perform feature extraction on them through the CNN model to obtain the feature vectors of the first sample images of the multiple categories.
  • the feature vector is used to characterize the features of the sample image in various dimensions.
  • the feature vector may be a feature vector with a dimension of 4096.
  • the CNN model can be used to extract features from the trademark data sets Logo-405 and FlickrLogo-32 respectively to obtain the feature vector of each image, that is, the deep representation of each trademark image extracted by the VGG-16 model.
  • the CNN model is used for feature extraction.
  • the CNN model can automatically learn the features of the image during the training process, without manual design and human intervention in feature learning, which improves the efficiency and accuracy of feature extraction.
  • the first sample data set is preprocessed to obtain the preprocessed first sample data set.
  • the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data enhancement processing.
  • Step 106: Train the SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
  • the operation of training the SVM model to be trained includes: solving an objective function according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the classification function of the SVM model, where the objective function is used to maximize the margin between first sample images of different categories among the first sample images of the multiple categories.
  • the basic idea of the support vector machine algorithm is to first transform the input sample space into a high-dimensional space, and then find the optimal classification hyperplane in this new high-dimensional feature space to maximize the interval between sample points of different categories.
  • the classification hyperplane is the maximum-margin hyperplane.
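  • the objective function of the SVM model to be trained can be:
    $$\min_{\boldsymbol{w},\,b}\ \frac{1}{2}\lVert\boldsymbol{w}\rVert^{2}$$
  • where $\boldsymbol{w}$ is the normal vector, which determines the direction of the classification hyperplane,
  • and $b$ is the displacement, which determines the distance between the classification hyperplane and the origin.
  • the constraints of the SVM model to be trained are:
    $$y_i\left(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b\right) \ge 1,\quad i = 1, 2, \ldots, m$$
    where $\boldsymbol{x}_i$ is the feature vector of the $i$-th first sample image, $y_i \in \{+1, -1\}$ is its category label, and $m$ is the number of first sample images.
  • the Lagrange multipliers $\alpha_i \ge 0$ are introduced into the objective function, and the resulting objective function is:
    $$L(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\lVert\boldsymbol{w}\rVert^{2} + \sum_{i=1}^{m}\alpha_i\left(1 - y_i\left(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b\right)\right)$$
    and solving it yields the classification function $f(\boldsymbol{x}) = \boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} + b$.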
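A linear SVM classification function of this form can be sketched as follows. Instead of solving the Lagrangian above, this sketch swaps in a simpler primal solver, Pegasos-style stochastic sub-gradient descent on the soft-margin hinge loss, with a hypothetical regularization strength lam = 0.01 and 2000 iterations; for simplicity the displacement b is folded into the weight vector (and is therefore also regularized). It handles two categories labelled +1 and -1, and the feature vectors are 2-dimensional synthetic points rather than 4096-dimensional CNN features.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, iters=2000, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the soft-margin SVM primal.

    X: (n, d) feature vectors; y: labels in {+1, -1}.
    Returns (w, b) of the classification function f(x) = w @ x + b.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])    # constant 1 appended so the last weight plays b
    wa = np.zeros(d + 1)
    for t in range(1, iters + 1):
        i = rng.integers(n)                  # pick one training sample at random
        eta = 1.0 / (lam * t)                # decreasing step size
        if y[i] * (wa @ Xa[i]) < 1:          # margin violated: hinge loss is active
            wa = (1 - eta * lam) * wa + eta * y[i] * Xa[i]
        else:                                # only the regulariser contributes
            wa = (1 - eta * lam) * wa
    return wa[:-1], wa[-1]


def decision_function(X, w, b):
    """Classification function values f(x) = w @ x + b."""
    return X @ w + b


# Usage: two synthetic, well-separated categories (hypothetical 2-D features).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, size=(50, 2)), rng.normal(-2.0, 0.5, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])
w, b = train_linear_svm(X, y)
accuracy = np.mean(np.sign(decision_function(X, w, b)) == y)
print(w, b, accuracy)
```

More than two categories would typically be handled by training one such classifier per category (one-vs-rest); the source does not say which multi-class scheme is used.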
  • the SVM model to be trained may also be a kernel function-based SVM model.
  • the SVM model to be trained may also use the kernel function to solve the classification function.
  • the kernel function is also a special similarity measure function, and different kernel functions represent different similarity measures.
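The remark that different kernel functions represent different similarity measures can be illustrated with two common kernels. The linear kernel and the Gaussian (RBF) kernel below, the value gamma = 0.5, and the three example vectors are illustrative choices; the source does not say which kernel is used.

```python
import numpy as np


def linear_kernel(x, z):
    """Inner product: grows with vector length as well as with alignment."""
    return float(np.dot(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: 1.0 for identical vectors, decaying towards 0 with distance."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


x_ref = np.array([1.0, 0.0])
x_far = np.array([2.0, 0.0])
x_near = np.array([0.9, 0.1])
print("linear:", linear_kernel(x_ref, x_far), linear_kernel(x_ref, x_near))
print("rbf:   ", rbf_kernel(x_ref, x_far), rbf_kernel(x_ref, x_near))
```

The linear kernel rates x_far as closer to x_ref (2.0 vs 0.9), while the RBF kernel rates x_near as closer (about 0.99 vs 0.61), so the choice of kernel changes which feature vectors count as similar.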
  • after the CNN model and the SVM model are trained, the test set can also be used as the input of the CNN model to extract the feature vector of each image in the test set, and these feature vectors are then used as the input of the SVM model to verify the accuracy of the models. If the CNN model and the SVM model pass the verification, the training is complete; if the verification fails, the CNN model and the SVM model continue to be trained on the training set until the trained models pass the verification.
  • FIG. 3 is a flow chart of a similar image detection process provided by an embodiment of the present application.
  • a sample data set can be constructed first, the sample images in the sample data set are then preprocessed, and the preprocessed sample data set is divided proportionally into a training set and a test set.
  • the training set is used to train the CNN model and the SVM model
  • the test set is used to verify the accuracy of the trained CNN model and the SVM model.
  • the VGG-16-based image similarity feature extraction model, namely the CNN model, is used to extract the feature vectors of the images.
  • the SVM model is used to classify the feature vectors extracted by the CNN model to complete the image similarity detection.
  • the embodiment of the present application may use TensorFlow (an open source software library) to define a convolutional neural network model.
  • with TensorFlow, users can easily deploy computing tasks to multiple platforms and devices.
  • the multiple platforms include CPUs (Central Processing Units), GPUs (Graphics Processing Units), and TPUs (Tensor Processing Units).
  • the multiple devices may include desktop devices, server clusters, mobile devices, edge devices, etc.
  • desktop devices include desktop computers, etc.
  • server clusters include one or more servers
  • mobile devices include mobile phones, tablets, smart wearable devices, etc.
  • edge devices refer to switches, routers, routing switches, IADs (Integrated Access Devices), and various MAN (Metropolitan Area Network)/WAN (Wide Area Network) devices installed on edge networks.
  • FIG. 4 is a flowchart of a similar image detection method provided by an embodiment of the present application. The method is applied to a similar image detection device. As shown in FIG. 4, the method includes the following steps:
  • Step 401: Acquire multiple images to be detected.
  • the multiple images to be detected can be uploaded by the user, can be obtained from the storage space of the device, can be sent by other devices, or can be obtained from the network.
  • the acquisition method is not limited in the embodiments of the present application.
  • the images stored in the album of the terminal may be acquired, and the images stored in the album may be used as multiple images to be detected.
  • Step 402: Use the multiple images as the input of the CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images.
  • the CNN model is a VGGNet model.
  • the CNN model is a VGGNet-16 model or a VGGNet-19 model.
  • the multiple images can also be preprocessed first, and the preprocessed images are then used as the input of the CNN model for feature extraction.
  • the preprocessing includes at least one of format adjustment, deduplication, and data enhancement.
  • Step 403: Use the feature vectors of the multiple images as the input of the SVM model, and measure the similarity of the feature vectors of the multiple images through the SVM model to obtain the similarity metric values of the multiple images.
  • the classification function value of each image can be determined through the classification function of the SVM model, and the classification function value of each image can be used as the similarity measurement value of each image.
  • the classification function is used to characterize the classification hyperplane for classifying images
  • the classification function value of each image is used to characterize the distance between each image and the classification hyperplane. The closer the classification function values of any two images in the multiple images are, the closer the distance between the two images is, and the higher the similarity between the two images is.
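Using the classification function value as the similarity metric can be sketched as follows, assuming a linear SVM with classification function f(x) = w^T x + b. The values of w, b, and the three 2-dimensional feature vectors are hypothetical; real feature vectors would be the CNN outputs (for example, 4096-dimensional).

```python
import numpy as np


def classification_values(features, w, b):
    """Classification function value f(x) = w . x + b for each row of `features`."""
    return features @ w + b


def distance_to_hyperplane(features, w, b):
    """Geometric distance |f(x)| / ||w|| of each feature vector from the hyperplane."""
    return np.abs(classification_values(features, w, b)) / np.linalg.norm(w)


w, b = np.array([3.0, 4.0]), -5.0           # hypothetical trained parameters
features = np.array([[1.0, 0.5],            # image A
                     [1.1, 0.5],            # image B
                     [-1.0, -1.0]])         # image C
f = classification_values(features, w, b)
print(f)                                    # A and B have close values, C does not
print(distance_to_hyperplane(features, w, b))
```

Images A and B have close function values (0 and about 0.3) and are therefore treated as more similar to each other than either is to image C (-12).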
  • the quality of the selected image features also strongly affects the result of the similarity measurement; therefore, both the quality of image feature extraction and the choice of measurement algorithm affect the results of image similarity detection.
  • by combining the feature vectors of the images with a traditional pattern recognition algorithm, the similarity between image features can be measured to determine whether images are similar, completing the detection task; meanwhile, deep learning extracts both the shallow and the high-level features of the images, making effective use of the information in the images and improving detection accuracy.
  • Step 404: Perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • At least two images in the plurality of images whose similarity metric values are within the same metric value range are determined to be similar images, and the similar images are determined to be images of the same category.
  • the image features of multiple images can be used as the input of the SVM model, and the multiple images can be classified according to the image features of the multiple images through the SVM model.
  • the images classified into one category are similar images.
  • the images of the same category and the category tags may be stored correspondingly.
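Step 404 groups images whose metric values fall within the same range but does not say how the ranges are defined. The sketch below assumes fixed-width, half-open ranges; `group_similar` and its `bin_width` parameter are hypothetical and not taken from the patent.

```python
import math
from collections import defaultdict
from typing import Dict, List


def group_similar(scores: Dict[str, float], bin_width: float) -> List[List[str]]:
    """Group images whose similarity metric values fall in the same range.

    Ranges are [k * bin_width, (k + 1) * bin_width) for integer k. Only ranges
    holding at least two images are returned, since a similar group needs a pair.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    buckets: Dict[int, List[str]] = defaultdict(list)
    for name, score in scores.items():
        buckets[math.floor(score / bin_width)].append(name)
    # Sort by range index so the output order is deterministic.
    return [sorted(names) for _, names in sorted(buckets.items()) if len(names) >= 2]


if __name__ == "__main__":
    scores = {"a.jpg": 0.12, "b.jpg": 0.18, "c.jpg": 1.35, "d.jpg": 1.31, "e.jpg": -2.0}
    print(group_similar(scores, 0.5))  # prints [['a.jpg', 'b.jpg'], ['c.jpg', 'd.jpg']]
```

Fixed ranges are simple but can split two nearly equal values across a boundary (0.49 and 0.51 with a width of 0.5); sorting the values and grouping by the gap between neighbours is one alternative.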
  • Figure 5 is a flowchart of another similar image detection method provided by an embodiment of the present application.
  • As shown in Figure 5, the images to be detected are first obtained and preprocessed; the CNN model then performs feature extraction on the preprocessed images; the SVM model performs similarity measurement on the extracted feature vectors; and similar image detection is performed based on the similarity measurement results.
  • The CNN model effectively extracts the shallow and deep features in the images, so the information in the images is used effectively and detection accuracy improves. The extracted feature vectors are then input into the SVM model.
  • The SVM model measures the similarity of the feature vector of each image, so whether images are similar can be judged from the similarity between their feature vectors. This further improves detection accuracy, reduces the amount of computation, and improves detection efficiency.
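The Figure 5 flow (obtain, preprocess, extract features, measure similarity, detect) can be sketched as a pipeline of injected stages. This is a minimal sketch, not the patent's implementation: `SimilarImagePipeline` is hypothetical, and the toy lambdas in the demo merely stand in for the trained CNN and SVM models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Image = Sequence[float]  # hypothetical: an image already decoded to a flat pixel list
Feature = List[float]


@dataclass
class SimilarImagePipeline:
    """Wires together the stages of the Figure 5 flow; every stage is injected."""
    preprocess: Callable[[Image], Image]                   # e.g. resizing, normalization
    extract: Callable[[Image], Feature]                    # stands in for the CNN model
    measure: Callable[[Feature], float]                    # stands in for the SVM model
    detect: Callable[[Dict[str, float]], List[List[str]]]  # groups similar images

    def run(self, images: Dict[str, Image]) -> List[List[str]]:
        prepared = {name: self.preprocess(img) for name, img in images.items()}
        features = {name: self.extract(img) for name, img in prepared.items()}
        scores = {name: self.measure(feat) for name, feat in features.items()}
        return self.detect(scores)


if __name__ == "__main__":
    # Toy stand-ins: mean pixel value as the "feature", a fixed linear map as the
    # "classification function", and a sign test as the grouping rule.
    pipeline = SimilarImagePipeline(
        preprocess=lambda img: [p / 255.0 for p in img],
        extract=lambda img: [sum(img) / len(img)],
        measure=lambda f: 2.0 * f[0] - 1.0,
        detect=lambda s: [sorted(n for n, v in s.items() if v < 0),
                          sorted(n for n, v in s.items() if v >= 0)],
    )
    print(pipeline.run({"a": [10, 20], "b": [15, 15], "c": [240, 230]}))
    # prints [['a', 'b'], ['c']]
```

Injecting each stage keeps the flow independent of any particular CNN or SVM library, and the four callables map naturally onto the acquisition, extraction, measurement, and detection modules of the device described below.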
  • The embodiments of the present application introduce deep learning into image similarity recognition and detection: a convolutional neural network is built to mine and learn the information in the sample images and produce deep representations of different sample images, and a traditional pattern recognition algorithm then measures the similarity between the features, which solves the problem of recognizing and detecting large numbers of similar pictures.
  • The embodiments of this application transform the image similarity recognition and detection problem into an image classification problem, using a deep learning model as the feature extractor and a traditional pattern recognition algorithm to measure the similarity between features, which avoids the inaccuracy caused by computing mean hash values. Combining deep learning with traditional pattern recognition exploits both the transfer-learning ability of convolutional neural networks and their ability to learn image features automatically, and pairing deep representations with a traditional pattern recognition algorithm further improves the accuracy of similar image recognition and detection.
  • FIG. 6 is a block diagram of a similar image detection device provided by an embodiment of the present application. As shown in FIG. 6, the device includes a first acquisition module 601, a first extraction module 602, a measurement module 603, and a detection module 604.
  • the first acquisition module 601 is configured to acquire multiple images to be detected;
  • the first extraction module 602 is configured to use the multiple images as the input of a CNN model, and perform feature extraction on the multiple images through the CNN model to obtain feature vectors of the multiple images;
  • the measurement module 603 is configured to use the feature vectors of the multiple images as the input of the SVM model, and measure the similarity of the feature vectors through the SVM model to obtain the similarity metric values of the multiple images;
  • the detection module 604 is configured to perform similar image detection on the multiple images based on the similarity metric values of the multiple images.
  • the CNN model is a Visual Geometry Group network (VGGNet) model;
  • the VGGNet model is obtained by training on a sample data set, the sample data set including sample images of multiple categories and category label information for each sample image.
  • the measurement module 603 is configured to:
  • determine the classification function value of each image through the classification function of the SVM model, and use the classification function value of each image as the similarity metric value of that image, where the classification function characterizes the classification hyperplane used to classify the images, and the classification function value of each image characterizes the distance between that image and the classification hyperplane.
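  • the detection module 604 is configured to:
  • determine at least two of the multiple images whose similarity metric values fall within the same metric value range to be similar images, and determine the similar images to be images of the same category.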
  • the device further includes:
  • a second acquisition module (not shown in the figure), configured to acquire a first sample data set, the first sample data set including first sample images of multiple categories and category label information for each first sample image;
  • a second extraction module (not shown in the figure), configured to use the first sample images of the multiple categories as the input of the CNN model and perform feature extraction on them through the CNN model to obtain the feature vectors of the first sample images of the multiple categories;
  • a first training module (not shown in the figure), configured to train the SVM model to be trained according to the feature vectors of the first sample images of the multiple categories and the category label information of each first sample image, to obtain the SVM model.
  • the first training module is configured to:
  • solve the objective function to obtain the classification function of the SVM model, where the objective function expresses that the margin between first sample images of different categories, among the first sample images of the multiple categories, is maximized.
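The text states only that the objective maximizes the margin between first sample images of different categories. One standard way to solve a linear soft-margin SVM is stochastic subgradient descent in the style of Pegasos, and the sketch below swaps in that solver for the binary, linear-kernel case. The solver, the kernel, and the `lam` and `epochs` defaults are assumptions; with many categories one would train, for example, one such classifier per category (one-vs-rest), which is not shown.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).

    Labels must be -1 or +1. The bias is folded into w by appending a constant
    1.0 feature, so it is lightly regularized too. lam and epochs are
    hypothetical defaults, not values from the patent.
    """
    if any(label not in (-1, 1) for label in y):
        raise ValueError("labels must be -1 or +1")
    data = [list(x) + [1.0] for x in X]  # augmented inputs carry the bias term
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(data, y):  # fixed order keeps runs reproducible
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer subgradient step
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w[:-1], w[-1]


def decision_function(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Classification function value w . x + b of the trained model."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


if __name__ == "__main__":
    X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
    y = [1, 1, -1, -1]
    w, b = train_linear_svm(X, y)
    print([1 if decision_function(w, b, x) > 0 else -1 for x in X])  # prints [1, 1, -1, -1]
```

The regularizer term keeps ‖w‖ small, which is equivalent to widening the margin 2/‖w‖, while the hinge term penalizes samples that fall inside it.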
  • the device further includes:
  • a preprocessing module (not shown in the figure), configured to preprocess the first sample data set to obtain a preprocessed first sample data set;
  • the preprocessed first sample data set is used as the input of the CNN model, and the preprocessing includes at least one of the following: format adjustment, deduplication, and data augmentation.
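Format adjustment, deduplication, and data augmentation are named but not specified. The sketch below assumes grayscale images stored as nested Python lists, nearest-neighbour resizing to a fixed size, removal of pixel-identical duplicates, and a horizontal flip as the augmentation; all of these choices, and the function names, are hypothetical. A production pipeline would typically use an imaging library and might also catch near-duplicates.

```python
from typing import List, Tuple

Gray = List[List[int]]  # hypothetical: grayscale image as rows of pixel values


def resize_nearest(img: Gray, out_h: int, out_w: int) -> Gray:
    """Format adjustment: nearest-neighbour resize to a fixed input size."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def hflip(img: Gray) -> Gray:
    """Data augmentation: mirror the image left to right."""
    return [row[::-1] for row in img]


def preprocess(samples: List[Tuple[Gray, str]],
               size: Tuple[int, int]) -> List[Tuple[Gray, str]]:
    """Resize every sample, drop exact duplicates, and add one flipped copy per image."""
    seen = set()
    out: List[Tuple[Gray, str]] = []
    for img, label in samples:
        fixed = resize_nearest(img, *size)
        key = tuple(tuple(row) for row in fixed)
        if key in seen:  # deduplication: identical pixels after resizing
            continue
        seen.add(key)
        out.append((fixed, label))
        flipped = hflip(fixed)
        if flipped != fixed:  # skip symmetric images whose flip adds nothing
            out.append((flipped, label))
    return out


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 5], [5, 5]]
    print(len(preprocess([(a, "cat"), (a, "cat"), (b, "dog")], (2, 2))))  # prints 3
```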
  • the device further includes:
  • a second training module, configured to pre-train the CNN model to be trained according to a second sample data set to obtain an initialized CNN model;
  • the second sample data set includes second sample images of multiple categories and category label information for each second sample image;
  • the third sample data set includes a first sub-sample data set and a second sub-sample data set; the first sub-sample data set includes M third sample images belonging to S categories, and the second sub-sample data set includes N third sample images belonging to T categories;
  • the M third sample images are different from the N third sample images, and M, S, N, and T are positive integers.
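The constraints on the third sample data set (M images in S categories, N images in T categories, no image shared between the two subsets) can be checked mechanically. The sketch below assumes each sample is an `(image_id, category_label)` pair; `describe_split` and the ID scheme are hypothetical, and the sketch does not address how the patent uses the two subsets.

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, str]  # hypothetical: (image_id, category_label)


def describe_split(first: Iterable[Sample], second: Iterable[Sample]) -> Dict[str, int]:
    """Return M, S, N, T for two sub-sample data sets; raise if they share images."""
    first_list, second_list = list(first), list(second)
    ids_first = {image_id for image_id, _ in first_list}
    ids_second = {image_id for image_id, _ in second_list}
    if len(ids_first) != len(first_list) or len(ids_second) != len(second_list):
        raise ValueError("an image id appears twice within one subset")
    shared = ids_first & ids_second
    if shared:  # the M and N third sample images must be different
        raise ValueError(f"subsets share images: {sorted(shared)}")
    counts = {
        "M": len(ids_first),
        "S": len({label for _, label in first_list}),
        "N": len(ids_second),
        "T": len({label for _, label in second_list}),
    }
    if min(counts.values()) < 1:  # all four must be positive integers
        raise ValueError("M, S, N and T must all be positive")
    return counts


if __name__ == "__main__":
    print(describe_split([("i1", "cat"), ("i2", "dog"), ("i3", "cat")], [("i4", "bird")]))
    # prints {'M': 3, 'S': 2, 'N': 1, 'T': 1}
```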
  • When the similar image detection device provided in the above embodiment performs similar image detection, the division into the functional modules above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • the similar image detection device provided in the foregoing embodiment belongs to the same concept as the similar image detection method embodiment, and the specific implementation process is detailed in the method embodiment, and will not be repeated here.
  • FIG. 7 is a schematic structural diagram of an electronic device 700 provided by an embodiment of the present application.
  • the electronic device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, or a computer.
  • The electronic device may vary considerably depending on its configuration or performance, and may include one or more processors 701 and one or more memories 702, where the memory 702 stores at least one instruction that is loaded and executed by the processor 701 to implement the similar image detection methods provided by the foregoing method embodiments.
  • The electronic device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and may include other components for implementing device functions, which are not described here.
  • An embodiment of the present application also provides a computer-readable storage medium storing at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the similar image detection method described in each of the above embodiments.
  • the embodiments of the present application also provide a computer program product that stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the similar image detection method described in each of the above embodiments.
  • The functions described in the embodiments of the present application may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium.
  • the computer-readable medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates the transfer of a computer program from one place to another.
  • the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Some embodiments of the present invention relate to the technical field of image processing. Disclosed are a similar image detection method and apparatus, a device, and a storage medium. The method comprises: using multiple images to be detected as the input of a CNN model, and performing feature extraction on the multiple images by means of the CNN model to obtain feature vectors of the multiple images; using the feature vectors of the multiple images as the input of an SVM model, and performing similarity measurement on the feature vectors of the multiple images by means of the SVM model to obtain similarity measurement values of the multiple images; and performing similar image detection on the multiple images on the basis of the similarity measurement values of the multiple images. According to the present invention, shallow and deep features of images can be effectively extracted, and whether images are similar can be determined according to the similarity between the feature vectors of the images, thereby improving the accuracy of image similarity detection.
PCT/CN2020/138510 2019-12-30 2020-12-23 Similar image detection method and apparatus, device, and storage medium WO2021136027A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911390241.0A CN111222548A (zh) 2019-12-30 2019-12-30 Similar image detection method, apparatus, device and storage medium
CN201911390241.0 2019-12-30

Publications (1)

Publication Number Publication Date
WO2021136027A1 (fr) 2021-07-08

Family

ID=70827972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/138510 WO2021136027A1 (fr) 2019-12-30 2020-12-23 Similar image detection method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN111222548A (fr)
WO (1) WO2021136027A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
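CN111222548A (zh) * 2019-12-30 2020-06-02 Oppo广东移动通信有限公司 Similar image detection method, apparatus, device and storage medium
CN111870279B (zh) * 2020-07-31 2022-01-28 西安电子科技大学 Method, system and application for segmenting the left ventricular myocardium in ultrasound images
CN112749765A (zh) * 2021-02-01 2021-05-04 深圳无域科技技术有限公司 Picture scene classification method, system, device and computer-readable medium
CN113297411B (zh) * 2021-07-26 2021-11-09 深圳市信润富联数字科技有限公司 Method, apparatus, device and storage medium for measuring the similarity of wheel-shaped maps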

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107563319A (zh) * 2017-08-24 2018-01-09 西安交通大学 Image-based method for computing facial similarity between parents and children
CN109359551A (zh) * 2018-09-21 2019-02-19 深圳市璇玑实验室有限公司 Machine-learning-based sensitive image recognition method and system
CN109784366A (zh) * 2018-12-07 2019-05-21 北京飞搜科技有限公司 Fine-grained classification method and apparatus for target objects, and electronic device
US20190183429A1 (en) * 2016-03-24 2019-06-20 The Regents Of The University Of California Deep-learning-based cancer classification using a hierarchical classification framework
CN111222548A (zh) * 2019-12-30 2020-06-02 Oppo广东移动通信有限公司 Similar image detection method, apparatus, device and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
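CN102467564B (zh) * 2010-11-12 2013-06-05 中国科学院烟台海岸带研究所 Remote sensing image retrieval method based on improved support vector machine relevance feedback
CN109165682B (zh) * 2018-08-10 2020-06-16 中国地质大学(武汉) Remote sensing image scene classification method fusing deep features and saliency features
CN110298376B (zh) * 2019-05-16 2022-07-01 西安电子科技大学 Bank bill image classification method based on improved B-CNN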

Cited By (6)

Publication number Priority date Publication date Assignee Title
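CN113780284A (zh) * 2021-09-17 2021-12-10 焦点科技股份有限公司 Logo detection method based on object detection and metric learning
CN113780284B (zh) * 2021-09-17 2024-04-19 焦点科技股份有限公司 Logo detection method based on object detection and metric learning
CN114445661A (zh) * 2022-01-24 2022-05-06 电子科技大学 Embedded image recognition method based on edge computing
CN114445661B (zh) * 2022-01-24 2023-08-18 电子科技大学 Embedded image recognition method based on edge computing
CN114998192A (zh) * 2022-04-19 2022-09-02 深圳格芯集成电路装备有限公司 Deep-learning-based defect detection method, apparatus, device and storage medium
CN114998956A (zh) * 2022-05-07 2022-09-02 北京科技大学 Small-sample image data augmentation method and apparatus based on intra-class differences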

Also Published As

Publication number Publication date
CN111222548A (zh) 2020-06-02

Similar Documents

Publication Publication Date Title
WO2021136027A1 (fr) Similar image detection method and apparatus, device, and storage medium
WO2021203863A1 (fr) Artificial-intelligence-based object detection method and apparatus, device, and storage medium
WO2020199468A1 (fr) Image classification method and device, and computer-readable storage medium
WO2019128646A1 (fr) Face detection method, method and device for training parameters of a convolutional neural network, and medium
WO2020155518A1 (fr) Object detection method and device, computing device, and storage medium
US8792722B2 (en) Hand gesture detection
WO2022033095A1 (fr) Text region positioning method and apparatus
WO2017096753A1 (fr) Facial key point tracking method, terminal, and non-volatile computer-readable storage medium
WO2016150240A1 (fr) Identity authentication method and apparatus
US20120027263A1 (en) Hand gesture detection
EP4099217A1 (fr) Image processing model training method and apparatus, device, and storage medium
US11875512B2 (en) Attributionally robust training for weakly supervised localization and segmentation
WO2021238548A1 (fr) Region recognition method, apparatus and device, and readable storage medium
CN105550641B (zh) Age estimation method and system based on multi-scale linear difference texture features
US11830233B2 (en) Systems and methods for stamp detection and classification
US10423817B2 (en) Latent fingerprint ridge flow map improvement
US20240153240A1 (en) Image processing method, apparatus, computing device, and medium
WO2017214970A1 (fr) Construction of a convolutional neural network
CN113221918B (zh) Object detection method, and training method and apparatus for an object detection model
CN111914908A (zh) Image recognition model training method, image recognition method, and related device
CN111444807A (zh) Object detection method and apparatus, electronic device, and computer-readable medium
CN111598149B (zh) Loop closure detection method based on an attention mechanism
WO2023123923A1 (fr) Human body weight identification method, human body weight identification device, computing device, and medium
CN106709490B (zh) Character recognition method and apparatus
US9104450B2 (en) Graphical user interface component classification

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20909458

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 20909458

Country of ref document: EP

Kind code of ref document: A1