CN110942081A - Image processing method and device, electronic equipment and readable storage medium - Google Patents

Image processing method and device, electronic equipment and readable storage medium

Info

Publication number
CN110942081A
Authority
CN
China
Prior art keywords
image set
vector
image
images
class
Legal status
Granted
Application number
CN201811120898.0A
Other languages
Chinese (zh)
Other versions
CN110942081B (en)
Inventor
张修宝
王艳
沈海峰
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201811120898.0A
Publication of CN110942081A
Application granted
Publication of CN110942081B
Current status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures

Abstract

Embodiments of the invention provide an image processing method and device, an electronic device and a readable storage medium, belonging to the field of image processing. The method comprises the following steps: acquiring a first-class image set; extracting a feature vector of each first-class image; performing pairwise similarity calculation on the feature vectors extracted from the first-class image set to obtain a deviation value between every two feature vectors; and cleaning the first-class image set according to the deviation values between the feature vectors. By obtaining the feature vector of each first-class image, performing pairwise similarity calculation based on those feature vectors to obtain the deviation value between every two feature vectors, and cleaning the first-class image set according to those deviation values, the scheme can remove images in the first-class image set that do not meet the similarity requirement, avoiding interference from such image data and achieving a good image cleaning effect.

Description

Image processing method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a readable storage medium.
Background
With the development of science and technology, deep learning neural networks have greatly advanced image recognition technology and given it very wide application. However, neural networks depend heavily on data: a reliable data set with a sufficient amount of data is a prerequisite for a neural network to perform well. To train a neural network, a large amount of sample data must first be obtained. Most currently public image data sets consist of pictures collected from the Internet, and their quality is uneven: some pictures may be very blurry, with indistinct features, and some may be mislabeled. This is very unfavorable for training, leads to unsatisfactory training results, and ultimately causes inaccurate recognition results when images are recognized. At present, images are mostly cleaned manually, but with a huge amount of data this is obviously time-consuming and labor-intensive, and the cleaning result is still not ideal.
Disclosure of Invention
The embodiment of the invention aims to provide an image processing method, an image processing device, electronic equipment and a readable storage medium.
In a first aspect, an embodiment of the present invention provides an image processing method, where the method includes:
acquiring a first-class image set, wherein the first-class image set comprises a plurality of first-class images;
extracting a feature vector of each image of the first type;
performing pairwise similarity calculation on the feature vectors extracted from the first-class image set to obtain a deviation value between every two feature vectors; and
cleaning the first-class image set according to the deviation values between the feature vectors.
Further, the cleaning the first class image set according to the deviation value between the feature vectors comprises:
for each feature vector, counting the number of feature vectors whose deviation value from it is smaller than or equal to a first threshold;
determining the feature vector with the largest count as a first reference feature vector of the first-class image set;
and cleaning the first class image set according to the first reference feature vector.
Further, the cleaning the first class image set according to the first reference feature vector comprises:
and determining the feature vectors of all the images of the first type of image set, wherein the deviation value of the feature vectors of all the images of the first type of image set and the first reference feature vector is greater than the feature vector of the first threshold, and removing the images corresponding to the determined feature vectors from the first type of image set.
Further, the cleaning the first class image set according to the first reference feature vector comprises:
determining, among the feature vectors of all images in the first-class image set, the feature vectors that are neither directly similar nor indirectly similar to the first reference feature vector, and removing the images corresponding to the determined feature vectors from the first-class image set;
wherein being directly similar to the first reference feature vector means that the deviation value from the first reference feature vector is less than or equal to the first threshold, and being indirectly similar to the first reference feature vector means that the deviation value from a feature vector that is directly or indirectly similar to the first reference feature vector is less than or equal to the first threshold.
Further, the cleaning the first class image set according to the deviation value between the feature vectors comprises:
if the deviation value between two feature vectors is smaller than or equal to a second threshold, removing the image corresponding to one of the two feature vectors from the first-class image set, wherein the second threshold is smaller than the first threshold.
Further, after the cleaning of the first class image set according to the deviation value between the feature vectors, the method further includes:
performing similarity calculation on the first reference feature vector and a second reference feature vector of a second-class image set to obtain a deviation value between the first reference feature vector and the second reference feature vector;
and if the deviation value between the first reference feature vector and the second reference feature vector is smaller than a third threshold, merging the first-class image set and the second-class image set into the same image set.
Further, after the cleaning of the first class image set according to the deviation value between the feature vectors, the method further includes:
calculating a first vector mean of the feature vectors of the first-class image set and a second vector mean of the feature vectors of an acquired second-class image set, and a deviation value between the first vector mean and the second vector mean;
and if the deviation value between the first vector mean and the second vector mean is smaller than a fourth threshold, merging the first-class image set and the second-class image set into the same image set.
Further, performing pairwise similarity calculation on all the feature vectors extracted from the first type of image set to obtain a deviation value between every two feature vectors, including:
and calculating the Euclidean distance or the cosine value of the included angle of every two characteristic vectors to obtain the Euclidean distance or the cosine value of the included angle between every two characteristic vectors, wherein the Euclidean distance or the cosine value of the included angle is the deviation value.
Further, before a first type image set is acquired, the first type image set comprising a plurality of first type images, the method further comprises:
acquiring a plurality of images belonging to a first type of image set;
selecting the images with human faces in the multiple images as multiple original images;
and carrying out standardization processing on the human face areas in the plurality of original images to obtain the plurality of first-class images.
Further, the normalizing the face regions in the plurality of original images includes:
if the original image comprises a plurality of face images, extracting the face with the largest area in the original image, and carrying out standardization processing on the face with the largest area.
In a second aspect, an embodiment of the present invention provides an image processing apparatus, including:
the image acquisition module is used for acquiring a first-class image set, and the first-class image set comprises a plurality of first-class images;
the characteristic vector extraction module is used for extracting a characteristic vector of each first type of image;
the similarity calculation module is used for calculating the similarity of every two feature vectors extracted from the first type of image set to obtain a deviation value between every two feature vectors;
and the cleaning module is used for cleaning the first type of image set according to the deviation value between the feature vectors.
Further, the cleaning module is specifically configured to count, for each feature vector, the number of feature vectors whose deviation value is less than or equal to a first threshold; determining the feature vector with the largest number as a first reference feature vector of the first-class image set; and cleaning the first class image set according to the first reference feature vector.
Further, the cleaning module is further configured to determine a feature vector, of the feature vectors of all the images of the first-class image set, of which a deviation value from the first reference feature vector is greater than the first threshold, and remove an image corresponding to the determined feature vector from the first-class image set.
Further, the cleaning module is further configured to determine, from feature vectors of all images of the first-class image set, a feature vector that is neither directly similar to the first reference feature vector nor indirectly similar to the first reference feature vector, and eliminate an image corresponding to the determined feature vector from the first-class image set;
wherein being directly similar to the first reference feature vector means that the deviation value from the first reference feature vector is less than or equal to the first threshold, and being indirectly similar to the first reference feature vector means that the deviation value from a feature vector that is directly or indirectly similar to the first reference feature vector is less than or equal to the first threshold.
Further, the cleaning module is further configured to remove an image corresponding to one of the two feature vectors from the first class of image set if the deviation value of the two feature vectors is smaller than or equal to a second threshold, where the second threshold is smaller than the first threshold.
Further, the apparatus further comprises:
the first inter-class cleaning module is used for calculating the similarity between the first reference characteristic vector and a second reference characteristic vector of a second class image set to obtain a deviation value between the first reference characteristic vector and the second reference characteristic vector; and if the deviation value of the first reference characteristic vector and the second reference characteristic vector is smaller than a third threshold value, merging the first image set and the second image set into the same image set.
Further, the apparatus further comprises:
the second inter-class cleaning module is used for calculating a first vector mean of the feature vectors of the first-class image set and a second vector mean of the feature vectors of the acquired second-class image set, and a deviation value between the first vector mean and the second vector mean; and if the deviation value between the first vector mean and the second vector mean is smaller than a fourth threshold, merging the first-class image set and the second-class image set into the same image set.
Further, the similarity calculation module is specifically configured to calculate a euclidean distance or an included angle cosine value of every two feature vectors, and obtain the euclidean distance or the included angle cosine value between every two feature vectors, where the euclidean distance or the included angle cosine value is the deviation value.
Further, the apparatus further comprises:
the image processing module is used for acquiring a plurality of images belonging to a first type of image set; selecting the images with human faces in the multiple images as multiple original images; and carrying out standardization processing on the human face areas in the plurality of original images to obtain the plurality of first-class images.
Further, the image processing module is further configured to, if the original image includes a plurality of face images, extract a face with a largest area in the original image, and perform normalization processing on the face with the largest area.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the steps in the method as provided in the first aspect are executed.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps in the method provided in the first aspect.
Embodiments of the invention provide an image processing method, an image processing device, an electronic device and a readable storage medium. The method first acquires a first-class image set, extracts a feature vector of each first-class image, and then performs pairwise similarity calculation on all feature vectors extracted from the first-class image set to obtain a deviation value between every two feature vectors; the first-class image set is then cleaned according to the deviation values between the feature vectors. By obtaining the feature vector of each first-class image, performing pairwise similarity calculation based on those feature vectors to obtain the deviation value between every two feature vectors, and cleaning the first-class image set according to those deviation values, the scheme can remove images in the first-class image set that do not meet the similarity requirement, avoiding interference from such image data and achieving a good image cleaning effect.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating sub-steps of step S130 in an image processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a step of screening images between two image sets according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating another step of filtering images between two image sets according to an embodiment of the present invention;
fig. 5 is a block diagram of an image processing apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention, where the method includes the following steps:
step S110: a first category image set is acquired, the first category image set including a plurality of first category images.
Step S120: and extracting a feature vector of each image of the first type.
In the process of image recognition, in order to accurately recognize images, some clear and effective images need to be obtained first, and then the images are input into a neural network model to be trained, so that the recognition result is more accurate when the images are recognized through the neural network model.
In a specific embodiment, in order to obtain sample data for training the neural network model, a plurality of images belonging to the first type of image set need to be obtained first, where the plurality of images may be face images, body contour images, color feature images, or the like.
The feature vector extracted from the image may be a feature vector obtained by extracting features from the face data of each first type of image, and the feature vector represents features of a face in the first type of image.
Of course, the feature vector may also be a vector obtained by extracting other features in the image, such as a color feature vector, a texture feature vector, a body contour feature vector, and the like.
The first-class image set may be obtained by classifying images after a large number of images are acquired: for example, the images of one user are grouped into one class image set, so that each class of images represents one user and a plurality of class image sets are obtained; the plurality of images in the first-class image set are therefore images of one user. However, when the images of a certain user are selected, images of other users may be mixed into that class image set, the images of one user may be split across two class image sets, different image classes may be grouped into the same class, or other unsatisfactory images, such as blurry images, may be mixed in; the images therefore need to be screened.
As an optional implementation, since image recognition is usually performed by face recognition, in order to recognize images accurately later, a plurality of images belonging to a first-class image set are obtained, the images containing human faces are selected from them as a plurality of original images, and the face regions in the plurality of original images are then normalized to obtain the plurality of first-class images.
First, face images can be screened from the first-class image set, that is, images that are not faces, such as landscape images, are removed from the plurality of images. However, the proportion occupied by the face may differ between the selected original images: for example, the face may occupy a very large area in one original image and a very small area in another. To avoid unsatisfactory training results caused by inconsistent samples, the sizes of the face regions in the plurality of original images are standardized. If an original image contains several faces, the face with the largest area is extracted and standardized. In other words, the images are normalized so that a plurality of first-class images with the same form are obtained. The standardization process may be as follows:
First, the required face size in the image is determined, for example 100 x 100; a facial feature template is then determined according to the face size and the positional relationship of the facial features; the detected facial features are then transformed onto the determined facial feature template through a transformation matrix.
In this way, the size of the face region in each normalized original image is made uniform, that is, the size of the face region in each of the plurality of first-class images is the same.
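A minimal sketch of this normalization step, assuming OpenCV is available and assuming hypothetical template landmark positions (the patent text does not specify them):

```python
import cv2
import numpy as np

# Hypothetical 100 x 100 template: assumed positions of the left eye,
# right eye and nose tip in the normalized face (not given in the patent).
TEMPLATE_POINTS = np.float32([[30, 35], [70, 35], [50, 62]])
TEMPLATE_SIZE = (100, 100)

def normalize_face(image, landmarks):
    """Warp one detected face onto the fixed template.

    image     -- original image as an H x W x 3 numpy array
    landmarks -- detected (left eye, right eye, nose tip) coordinates,
                 float32 array of shape (3, 2), from any landmark detector
    """
    # Transformation matrix mapping the detected facial features onto the template
    matrix = cv2.getAffineTransform(np.float32(landmarks), TEMPLATE_POINTS)
    # Every normalized face then has the same size and rough pose
    return cv2.warpAffine(image, matrix, TEMPLATE_SIZE)
```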
It should be noted that, for image recognition, images with other recognition characteristics, for example images with particular colors or body contours, may also be screened from the plurality of images. The embodiments of the present invention are described using images containing human faces, but images with other characteristics can also be cleaned by the scheme provided by the embodiments of the present invention; screening images by characteristics other than human faces and then performing the subsequent processing therefore also falls within the scope of the present invention.
Optionally, if the obtained plurality of first-class images are images containing human faces, the feature vector of each first-class image may be extracted as follows: each first-class image is input into a face recognition model, which may be any of various existing models, such as a SphereFace model, an ArcFace model, and the like; within the face recognition model, feature extraction is performed on the face data of each first-class image to obtain a feature vector, and the feature vector represents the features of the face in that first-class image.
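As a hedged illustration of this extraction step, the sketch below assumes a model object exposing an `embed(image)` method that returns a 1-D embedding; the actual interface of a SphereFace- or ArcFace-style network depends on the framework used:

```python
import numpy as np

def extract_feature_vectors(first_class_images, face_model):
    """Collect one feature vector per normalized first-class image.

    `face_model.embed(image)` is an assumed interface standing in for any
    pretrained face recognition model; each call returns the embedding
    that represents the face in that image.
    """
    return [np.asarray(face_model.embed(img), dtype=np.float32)
            for img in first_class_images]
```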
Step S130: and carrying out pairwise similarity calculation on all the feature vectors extracted from the first type of image set to obtain a deviation value between every two feature vectors.
In this embodiment, after the feature vectors of each first-type image in all the first-type images are obtained, the feature vectors are subjected to pairwise similarity calculation to obtain a deviation value between each two feature vectors, and the specific manner may be as follows: and calculating the Euclidean distance or the cosine value of the included angle of every two characteristic vectors to obtain the Euclidean distance or the cosine value of the included angle between every two characteristic vectors, wherein the Euclidean distance or the cosine value of the included angle is the deviation value.
The similarity calculation by computing the Euclidean distance between every two feature vectors proceeds as follows: for example, for two first-class images whose feature vectors are A = (1,2,3,4,5) and B = (2,3,5,6,7), the Euclidean distance between feature vector A and feature vector B is

d(A, B) = sqrt((1-2)^2 + (2-3)^2 + (3-5)^2 + (4-6)^2 + (5-7)^2) = sqrt(14) ≈ 3.74
For the other feature vectors, the Euclidean distance between every two feature vectors can be calculated in the same way; the subsequent similarity comparison is then performed using the Euclidean distance between two feature vectors, which can be used as the deviation value of the two feature vectors. For example, the deviation value between feature vector A and feature vector B is 3.74.
The similarity calculation by computing the included-angle cosine of every two feature vectors proceeds as follows: for example, for the same two first-class images with feature vectors A = (1,2,3,4,5) and B = (2,3,5,6,7), the cosine of the included angle between feature vector A and feature vector B is

cos(A, B) = (A · B) / (|A| |B|) = 82 / (sqrt(55) * sqrt(123)) ≈ 0.997

The included-angle cosine between every two feature vectors can be obtained by this calculation, and the subsequent similarity comparison is then performed using the included-angle cosines, which can be used as the deviation value between two feature vectors. For example, the deviation value between feature vector A and feature vector B is 0.997.
The smaller the Euclidean distance or included-angle cosine between two feature vectors, the higher the similarity between the two first-class images corresponding to those feature vectors.
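A minimal sketch of the pairwise deviation calculation in step S130, reproducing the worked example above (the function names and data layout are illustrative, not the patent's implementation):

```python
import numpy as np

def euclidean_deviation(a, b):
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def cosine_value(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pairwise_deviations(vectors, metric=euclidean_deviation):
    """Deviation value for every unordered pair of feature vectors."""
    return {(i, j): metric(vectors[i], vectors[j])
            for i in range(len(vectors))
            for j in range(i + 1, len(vectors))}

# The worked example from the description:
A, B = (1, 2, 3, 4, 5), (2, 3, 5, 6, 7)
print(round(euclidean_deviation(A, B), 2))  # 3.74
print(round(cosine_value(A, B), 3))         # 0.997
```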
Step S140: and cleaning the first class image set according to the deviation value between the feature vectors.
The above Euclidean distance or included-angle cosine may be used as the deviation value between two feature vectors. Since a smaller Euclidean distance or included-angle cosine between two feature vectors means that the two corresponding first-class images are more similar, the first-class image set can be cleaned according to the deviation value between every two feature vectors, for example by removing images with low similarity.
Whether two images are similar may be judged as follows: if the Euclidean distance between the two feature vectors corresponding to the two images is smaller than or equal to a first preset threshold, or the included-angle cosine between the two feature vectors is smaller than or equal to a second preset threshold, the two images are similar; if the Euclidean distance is greater than the first preset threshold, or the included-angle cosine is greater than the second preset threshold, the two images are dissimilar.
In addition, as an optional implementation, referring to fig. 2, the first-class image set may also be cleaned according to the deviation values between the feature vectors as follows. Step S131: for each feature vector, count the number of feature vectors whose deviation value from it is less than or equal to the first threshold. Step S132: determine the feature vector with the largest count as the first reference feature vector of the first-class image set. Step S133: clean the first-class image set according to the first reference feature vector.
The feature vectors are counted according to the calculated Euclidean distance or included-angle cosine between every two feature vectors. For example, if the similarity calculation uses the Euclidean distance, then for each feature vector the number of feature vectors whose Euclidean distance to it is less than or equal to the first threshold is counted. Suppose there are currently four feature vectors A, B, C and D. For feature vector A, the Euclidean distances between A and B, between A and C, and between A and D are calculated; if these are 2, 3 and 4 respectively, and the first threshold is 5, the number of feature vectors whose Euclidean distance to A is less than 5 is 3. The Euclidean distances between feature vector B and the other feature vectors are obtained in the same way, and the number of feature vectors whose Euclidean distance to B is less than or equal to the first threshold is counted, say 2. The same statistics are gathered for feature vectors C and D; if the counts are 1 for C and 0 for D, feature vector A, which has the largest count, is determined to be the first reference feature vector.
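One possible sketch of steps S131 and S132: for every feature vector, count how many other vectors lie within the first threshold, then take the vector with the largest count as the first reference feature vector (Euclidean distance is assumed as the deviation value; the names are illustrative):

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def find_reference_index(vectors, first_threshold, metric=euclidean):
    """Index of the first reference feature vector: the vector whose
    deviation value to other vectors is <= first_threshold most often."""
    counts = [sum(1 for j, vj in enumerate(vectors)
                  if i != j and metric(vi, vj) <= first_threshold)
              for i, vi in enumerate(vectors)]
    return max(range(len(vectors)), key=counts.__getitem__)
```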
Of course, the similarity calculation may also be performed using the included-angle cosine, and the first reference feature vector is then obtained in the same manner as described above; for brevity this is not repeated here.
For example, when the similarity calculation uses the Euclidean distance, the first threshold may be set to 5; when it uses the included-angle cosine, the first threshold may be 0.5. The specific value of the first threshold can be set as required.
As an embodiment, the first-class image set may be cleaned according to the first reference feature vector by determining, among the feature vectors of all images in the first-class image set, the feature vectors whose deviation value from the first reference feature vector is greater than the first threshold, and removing the images corresponding to the determined feature vectors from the first-class image set.
It can be understood that, taking the first reference feature vector A obtained above as an example, if the Euclidean distance between a feature vector E and the first reference feature vector A is greater than the first threshold, for example 6 when the first threshold is 5, this indicates that the similarity between the image corresponding to E and the image corresponding to A is not high, and the image corresponding to E may be removed from the first-class image set.
Of course, a feature vector whose deviation value from the first reference feature vector is greater than the first threshold can also be understood as follows: the included-angle cosine between a certain feature vector and the first reference feature vector is calculated, and if that cosine is greater than the first threshold, the deviation value between that feature vector and the first reference feature vector is greater than the first threshold.
It can be understood that, since the first reference feature vector is the feature vector with the largest count, the image corresponding to it is the image most similar to the other images in the first-class image set; images with high similarity to it can be retained, and images with low similarity to it can be removed.
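A sketch of this cleaning rule under the same assumptions (Euclidean distance as the deviation value; names are illustrative):

```python
import numpy as np

def clean_by_reference(images, vectors, ref_index, first_threshold):
    """Keep only images whose feature vector deviates from the first
    reference feature vector by at most the first threshold."""
    ref = np.asarray(vectors[ref_index], float)
    kept = [(img, vec) for img, vec in zip(images, vectors)
            if float(np.linalg.norm(np.asarray(vec, float) - ref)) <= first_threshold]
    return kept  # list of (image, feature vector) pairs that survive cleaning
```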
As an embodiment, the first-class image set may also be cleaned according to the first reference feature vector by determining, among the feature vectors of all images in the first-class image set, the feature vectors that are neither directly similar nor indirectly similar to the first reference feature vector, and removing the images corresponding to the determined feature vectors from the first-class image set.
Being directly similar to the first reference feature vector means that the deviation value from the first reference feature vector is less than or equal to the first threshold; being indirectly similar to the first reference feature vector means that the deviation value from a feature vector that is directly or indirectly similar to the first reference feature vector is less than or equal to the first threshold.
It can be understood with an example. Suppose the first reference feature vector is A, and the feature vectors similar to A are B and D, that is, the deviation values of B and D from A are smaller than the first threshold; the feature vector similar to B is C; the feature vector similar to C is B; the feature vector similar to D is B; and the feature vector similar to E is F. Then the feature vectors directly similar to A are B and D; because B is similar to C, A is indirectly similar to C; and E is neither directly nor indirectly similar to A. Therefore, the two images corresponding to feature vectors E and F can be removed from the first-class image set.
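The direct/indirect similarity rule amounts to keeping the connected component of the reference vector in a graph whose edges link vectors with deviation value at most the first threshold. A breadth-first sketch, again assuming Euclidean distance as the deviation value:

```python
from collections import deque
import numpy as np

def similar_component(vectors, ref_index, first_threshold):
    """Indices of feature vectors directly or indirectly similar to the
    first reference feature vector; images outside this set are removed."""
    dist = lambda a, b: float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))
    keep, queue = {ref_index}, deque([ref_index])
    while queue:
        i = queue.popleft()
        for j in range(len(vectors)):
            # Direct similarity to an already-kept vector makes j indirectly
            # similar to the reference, so it joins the kept set.
            if j not in keep and dist(vectors[i], vectors[j]) <= first_threshold:
                keep.add(j)
                queue.append(j)
    return keep
```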
In addition, in order to screen out duplicate images, if, during the similarity calculation on two feature vectors, the deviation value between them is smaller than or equal to a second threshold, the image corresponding to one of the two feature vectors is removed from the first-class image set.
For example, if the Euclidean distance between feature vector A and feature vector B is 1, that is, the deviation value between them is 1, and the second threshold is 2, then the deviation value between feature vector A and feature vector B is less than or equal to the second threshold, which means the image corresponding to A and the image corresponding to B are likely to be duplicates; either the image corresponding to A or the image corresponding to B may then be removed from the first-class image set.
It should be noted that if two images are duplicates, their similarity is necessarily high, so the second threshold can be set smaller than the first threshold. For example, if two images are duplicates, the Euclidean distance between their feature vectors is small, say 0.2; if this is less than or equal to the second threshold of 0.3, the deviation value of the two feature vectors is smaller than the second threshold. If the two images are merely similar, the Euclidean distance between their feature vectors is relatively large, say 3; if this is less than or equal to the first threshold of 5, the deviation value of the two feature vectors is less than or equal to the first threshold of 5. In this case the second threshold is smaller than the first threshold.
Of course, whether the deviation value between the two feature vectors is less than or equal to the second threshold value may also be determined by calculating the cosine value of the included angle between the two feature vectors, and for simplicity of description, redundant description is not repeated here.
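A sketch of this de-duplication pass, with the second threshold assumed smaller than the first and Euclidean distance assumed as the deviation value:

```python
import numpy as np

def duplicate_indices(vectors, second_threshold):
    """Indices to discard: for every pair of feature vectors whose deviation
    value is <= second_threshold, the later image is treated as a duplicate."""
    dist = lambda a, b: float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))
    removed = set()
    for i in range(len(vectors)):
        if i in removed:
            continue
        for j in range(i + 1, len(vectors)):
            if j not in removed and dist(vectors[i], vectors[j]) <= second_threshold:
                removed.add(j)  # keep image i, drop its near-duplicate j
    return removed
```

Images whose index appears in the returned set would be removed from the first-class image set, leaving one representative per group of near-duplicates.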
As another embodiment, if the images of a certain user have been classified into different class image sets, for example into two class image sets, then in order to screen that user's images it can also be judged whether the images in the two class image sets are similar, that is, whether they are images of the same user. For the second-class image set, a second reference feature vector can be calculated in the same way as described above, which is not repeated here.
Referring to fig. 3, the images in the two class image sets may be screened as follows. Step S150a: perform similarity calculation on the first reference feature vector and the second reference feature vector of the second-class image set to obtain a deviation value between them. Step S160a: if the deviation value between the first reference feature vector and the second reference feature vector is smaller than a third threshold, merge the first-class image set and the second-class image set into the same image set.
For example, if the first reference feature vector is A1 and the second reference feature vector is A2, the Euclidean distance or included-angle cosine between A1 and A2 may be calculated for the similarity calculation; if the Euclidean distance between A1 and A2 is smaller than the third threshold, the images in the two class image sets are highly similar and belong to the same user, and the two sets are merged.
In addition, as an embodiment, referring to fig. 4, the images in the two class image sets may also be screened as follows. Step S150b: calculate a first vector mean of the feature vectors of the first-class image set, a second vector mean of the feature vectors of the acquired second-class image set, and a deviation value between the first vector mean and the second vector mean. Step S160b: if the deviation value between the first vector mean and the second vector mean is smaller than a fourth threshold, merge the first-class image set and the second-class image set into the same image set.
For example, if the feature vectors in the first-class image set are A = (1,2,3), B = (3,5,6) and C = (7,8,9), the first vector mean of the three feature vectors, calculated as (A + B + C)/3 = ((1+3+7)/3, (2+5+8)/3, (3+6+9)/3), is approximately (3.7, 5, 6). If the feature vectors in the second-class image set are D = (2,4,8) and E = (5,9,11), the second vector mean calculated in the same way is (3.5, 6.5, 9.5). If the fourth threshold is 4 and the deviation value (for example, the Euclidean distance) between the first vector mean and the second vector mean is about 3.8, the deviation value is smaller than the fourth threshold, the first-class image set is similar to the second-class image set, and the two class image sets are merged.
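A minimal sketch of the mean-based merging decision, reproducing the numbers of the example above (the Euclidean distance is assumed as the deviation value between the two means):

```python
import numpy as np

def maybe_merge_by_mean(set_a_vectors, set_b_vectors, fourth_threshold):
    """Merge decision for two class image sets based on their vector means.

    Returns True when the deviation value (here the Euclidean distance)
    between the two per-set mean vectors is below the fourth threshold."""
    mean_a = np.mean(np.asarray(set_a_vectors, float), axis=0)
    mean_b = np.mean(np.asarray(set_b_vectors, float), axis=0)
    return float(np.linalg.norm(mean_a - mean_b)) < fourth_threshold

# Worked example from the description:
first_set  = [(1, 2, 3), (3, 5, 6), (7, 8, 9)]   # mean is about (3.7, 5, 6)
second_set = [(2, 4, 8), (5, 9, 11)]             # mean is (3.5, 6.5, 9.5)
print(maybe_merge_by_mean(first_set, second_set, fourth_threshold=4))  # True
```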
As an alternative embodiment, for image cleaning, the two class image sets may first each be cleaned within the set, and inter-class cleaning may then be performed between the image sets, that is, whether the two class image sets are similar is judged, and the two sets are merged when their similarity is high. After merging, the merged image set may be cleaned within the set again, using cleaning methods such as removing duplicate images in the set and screening out dissimilar images.
As another alternative, for image cleaning, inter-class cleaning may be performed on the two class image sets first, followed by intra-class cleaning: when the two class image sets are judged to be highly similar they are merged, and intra-class cleaning is then performed on the merged set. For example, two class image sets, such as a first-class image set and a second-class image set, are obtained first; a first vector mean of the feature vectors of the first-class image set and a second vector mean of the feature vectors of the second-class image set are calculated; the deviation value between the first vector mean and the second vector mean is then calculated; and if this deviation value is smaller than a preset threshold, the first-class image set and the second-class image set are merged into the same image set, on which intra-class cleaning can then be performed, using the cleaning methods described above, such as removing duplicate images in the set and screening out dissimilar images. In this scheme, inter-class cleaning is performed first and intra-class cleaning afterwards, so that images that do not meet the similarity requirement can be removed, interference from such image data is avoided, and the cleaning workload can be reduced.
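Sketched at a high level, the two orderings differ only in when the intra-class pass runs; the callables below are hypothetical placeholders standing in for the intra-class cleaning and merge test described above:

```python
def clean_corpus(class_sets, intra_class_clean, should_merge):
    """One possible ordering: intra-class cleaning first, then inter-class
    merging, then another intra-class pass over any merged set.

    class_sets        -- list of (images, vectors) pairs, one per class
    intra_class_clean -- hypothetical callable: (images, vectors) -> (images, vectors)
    should_merge      -- hypothetical callable: (vectors_a, vectors_b) -> bool
    """
    # First pass: clean every class image set on its own.
    sets = [intra_class_clean(imgs, vecs) for imgs, vecs in class_sets]

    # Inter-class pass: merge sets judged to contain the same class of images,
    # and clean each merged set again.
    merged = []
    for imgs, vecs in sets:
        for k, (m_imgs, m_vecs) in enumerate(merged):
            if should_merge(vecs, m_vecs):
                merged[k] = intra_class_clean(m_imgs + imgs, m_vecs + vecs)
                break
        else:
            merged.append((imgs, vecs))
    return merged
```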
Referring to fig. 5, fig. 5 is a block diagram of an image processing apparatus 200 according to an embodiment of the present invention, the apparatus including:
an image obtaining module 210, configured to obtain a first category image set, where the first category image set includes a plurality of first category images;
a feature vector extraction module 220, configured to extract a feature vector of each of the first type images;
the similarity calculation module 230 is configured to perform pairwise similarity calculation on all the feature vectors extracted from the first type image set, so as to obtain a deviation value between every two feature vectors;
and a cleaning module 240, configured to clean the first class image set according to the deviation value between the feature vectors.
Further, the cleaning module 240 is specifically configured to count, for each feature vector, the number of feature vectors whose deviation value is less than or equal to a first threshold; determining the feature vector with the largest number as a first reference feature vector of the first-class image set; and cleaning the first class image set according to the first reference feature vector.
Further, the cleaning module 240 is further configured to determine a feature vector, of the feature vectors of all the images of the first type of image set, of which a deviation value from the first reference feature vector is greater than the first threshold, and remove an image corresponding to the determined feature vector from the first type of image set.
Further, the cleaning module 240 is further configured to determine, from feature vectors of all images in the first-class image set, a feature vector that is neither directly similar to the first reference feature vector nor indirectly similar to the first reference feature vector, and eliminate an image corresponding to the determined feature vector from the first-class image set;
wherein being directly similar to the first reference feature vector means that the deviation value from the first reference feature vector is less than or equal to the first threshold, and being indirectly similar to the first reference feature vector means that the deviation value from a feature vector that is directly or indirectly similar to the first reference feature vector is less than or equal to the first threshold.
Further, the cleaning module 240 is further configured to remove an image corresponding to one of two feature vectors from the first-class image set if the deviation value between the two feature vectors is smaller than or equal to a second threshold, where the second threshold is smaller than the first threshold.
Further, the apparatus further comprises:
the first inter-class cleaning module is used for calculating the similarity between the first reference characteristic vector and a second reference characteristic vector of a second class image set to obtain a deviation value between the first reference characteristic vector and the second reference characteristic vector; and if the deviation value of the first reference characteristic vector and the second reference characteristic vector is smaller than a third threshold value, merging the first image set and the second image set into the same image set.
Further, the apparatus further comprises:
the second inter-class cleaning module is used for calculating a first vector mean of the feature vectors of the first-class image set and a second vector mean of the feature vectors of the acquired second-class image set, and a deviation value between the first vector mean and the second vector mean; and if the deviation value between the first vector mean and the second vector mean is smaller than a fourth threshold, merging the first-class image set and the second-class image set into the same image set.
Further, the similarity calculation module 230 is specifically configured to calculate an euclidean distance or an included angle cosine value of every two feature vectors, and obtain the euclidean distance or the included angle cosine value between every two feature vectors, where the euclidean distance or the included angle cosine value is the deviation value.
Further, the apparatus further comprises:
the image processing module is used for acquiring a plurality of images belonging to a first type of image set; selecting the images with human faces in the multiple images as multiple original images; and carrying out standardization processing on the human face areas in the plurality of original images to obtain the plurality of first-class images.
Further, the image processing module is further configured to, if the original image includes a plurality of face images, extract a face with a largest area in the original image, and perform normalization processing on the face with the largest area.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, where the electronic device may include: at least one processor 110, such as a CPU, at least one communication interface 120, at least one memory 130, and at least one communication bus 140. Wherein the communication bus 140 is used for realizing direct connection communication of these components. The communication interface 120 of the device in the embodiment of the present application is used for performing signaling or data communication with other node devices. The memory 130 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). Memory 130 may optionally be at least one memory device located remotely from the aforementioned processor. The memory 130 stores computer readable instructions, which when executed by the processor 110, cause the electronic device to perform the method processes described above with reference to fig. 1.
An embodiment of the present application provides a readable storage medium storing a computer program which, when executed by a processor, performs the method processes performed by the electronic device in the method embodiment shown in fig. 1.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
In summary, the embodiments of the present invention provide an image processing method, an image processing apparatus, an electronic device, and a readable storage medium. The method first acquires a first-class image set comprising a plurality of first-class images, extracts a feature vector of each first-class image, and then performs pairwise similarity calculation on all feature vectors extracted from the first-class image set to obtain a deviation value between every two feature vectors; the first-class image set is then cleaned according to the deviation values between the feature vectors. By obtaining the feature vector of each first-class image, performing pairwise similarity calculation based on those feature vectors, and cleaning the first-class image set according to the resulting deviation values, the scheme can remove images in the first-class image set that do not meet the similarity requirement, avoiding interference from such image data and achieving a good image cleaning effect.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (22)

1. An image processing method, characterized in that the method comprises:
acquiring a first-class image set, wherein the first-class image set comprises a plurality of first-class images;
extracting a feature vector of each image of the first type;
performing pairwise similarity calculation on the feature vectors extracted from the first-class image set to obtain a deviation value between every two feature vectors; and
cleaning the first-class image set according to the deviation values between the feature vectors.
2. The method of claim 1, wherein cleaning the first class of image set according to the variance values between the feature vectors comprises:
for each feature vector, counting the number of feature vectors whose deviation value from it is smaller than or equal to a first threshold;
determining the feature vector with the largest count as a first reference feature vector of the first-class image set;
and cleaning the first class image set according to the first reference feature vector.
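A minimal sketch of the reference-vector selection in claim 2, reusing the deviation matrix `dev` from the sketch above; the threshold value used in the usage comment is an illustrative placeholder:

```python
import numpy as np

def select_reference(dev: np.ndarray, first_threshold: float) -> int:
    """Index of the feature vector with the most neighbours within the first threshold."""
    within = dev <= first_threshold
    np.fill_diagonal(within, False)  # a vector is not counted against itself
    return int(np.argmax(within.sum(axis=1)))

# Usage (threshold chosen arbitrarily for illustration):
# ref_idx = select_reference(dev, first_threshold=1.1)
```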
3. The method of claim 2, wherein cleaning the first-class image set according to the first reference feature vector comprises:
determining, from the feature vectors of all the images in the first-class image set, the feature vectors whose deviation value from the first reference feature vector is greater than the first threshold, and removing the images corresponding to the determined feature vectors from the first-class image set.
4. The method of claim 2, wherein cleaning the first-class image set according to the first reference feature vector comprises:
determining, from the feature vectors of all the images in the first-class image set, the feature vectors that are neither directly similar nor indirectly similar to the first reference feature vector, and removing the images corresponding to the determined feature vectors from the first-class image set;
wherein being directly similar to the first reference feature vector means that the deviation value from the first reference feature vector is smaller than or equal to the first threshold; being indirectly similar to the first reference feature vector means that the deviation value from a feature vector that is directly or indirectly similar to the first reference feature vector is smaller than or equal to the first threshold.
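The "directly or indirectly similar" relation of claim 4 amounts to the connected component of the reference vector in the graph whose edges link feature vectors with deviation no greater than the first threshold; a breadth-first sketch, assuming the `dev` matrix from the earlier sketch:

```python
from collections import deque
import numpy as np

def connected_to_reference(dev: np.ndarray, ref: int, first_threshold: float) -> set:
    """Indices of feature vectors directly or indirectly similar to the reference vector."""
    keep, queue = {ref}, deque([ref])
    while queue:
        i = queue.popleft()
        for j in range(dev.shape[0]):
            if j not in keep and dev[i, j] <= first_threshold:
                keep.add(j)
                queue.append(j)
    return keep  # images outside this set are removed from the first-class image set
```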
5. The method of any one of claims 2 to 4, wherein cleaning the first-class image set according to the deviation values between the feature vectors further comprises:
if the deviation value between two feature vectors is smaller than or equal to a second threshold, removing the image corresponding to one of the two feature vectors from the first-class image set, wherein the second threshold is smaller than the first threshold.
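A sketch of the near-duplicate removal in claim 5; which of the two images is kept is left to the implementation, and the greedy keep-the-first choice below is only one possibility:

```python
import numpy as np

def drop_near_duplicates(dev: np.ndarray, second_threshold: float) -> set:
    """Indices of images removed because a nearly identical image is already kept."""
    removed = set()
    n = dev.shape[0]
    for i in range(n):
        if i in removed:
            continue
        for j in range(i + 1, n):
            if j not in removed and dev[i, j] <= second_threshold:
                removed.add(j)  # keep image i, drop its near-duplicate j
    return removed
```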
6. The method according to any one of claims 2 to 4, wherein after cleaning the first-class image set according to the deviation values between the feature vectors, the method further comprises:
performing similarity calculation on the first reference feature vector and a second reference feature vector of a second-class image set to obtain a deviation value between the first reference feature vector and the second reference feature vector; and
if the deviation value between the first reference feature vector and the second reference feature vector is smaller than a third threshold, merging the first-class image set and the second-class image set into the same image set.
7. The method according to any one of claims 1 to 4, wherein after cleaning the first-class image set according to the deviation values between the feature vectors, the method further comprises:
calculating a first vector mean of the feature vectors of the first-class image set, acquiring a second vector mean of the feature vectors of a second-class image set, and calculating a deviation value between the first vector mean and the second vector mean; and
if the deviation value between the first vector mean and the second vector mean is smaller than a fourth threshold, merging the first-class image set and the second-class image set into the same image set.
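Claims 6 and 7 merge two classes when their reference feature vectors, or their feature-vector means, lie within a threshold; a sketch with illustrative function names and Euclidean distance assumed as the deviation value:

```python
import numpy as np

def should_merge_by_reference(ref_a: np.ndarray, ref_b: np.ndarray, third_threshold: float) -> bool:
    """Claim 6: compare the two classes' reference feature vectors."""
    return float(np.linalg.norm(ref_a - ref_b)) < third_threshold

def should_merge_by_mean(feats_a: np.ndarray, feats_b: np.ndarray, fourth_threshold: float) -> bool:
    """Claim 7: compare the mean feature vectors of the two classes."""
    return float(np.linalg.norm(feats_a.mean(axis=0) - feats_b.mean(axis=0))) < fourth_threshold
```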
8. The method according to any one of claims 1 to 4, wherein performing pairwise similarity calculation on all the feature vectors extracted from the first-class image set to obtain a deviation value between every two feature vectors comprises:
calculating the Euclidean distance or the included-angle cosine value between every two feature vectors, wherein the deviation value is the Euclidean distance or the included-angle cosine value.
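The two deviation measures named in claim 8, written out; note that the comparison direction differs (a smaller Euclidean distance but a larger cosine value indicates higher similarity), a detail the claim leaves to the chosen thresholding:

```python
import numpy as np

def euclidean_deviation(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

def cosine_of_included_angle(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the included angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```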
9. The method of any one of claims 1 to 4, wherein before acquiring the first-class image set comprising a plurality of first-class images, the method further comprises:
acquiring a plurality of images belonging to the first-class image set;
selecting the images containing human faces from the plurality of images as a plurality of original images; and
normalizing the face regions in the plurality of original images to obtain the plurality of first-class images.
10. The method according to claim 9, wherein normalizing the face regions in the plurality of original images comprises:
if an original image comprises a plurality of faces, extracting the face with the largest area in the original image, and normalizing the face with the largest area.
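A sketch of the preprocessing in claims 9 and 10, assuming OpenCV's bundled Haar cascade as the face detector and a fixed 112x112 crop as the normalization; the patent names neither, so both are illustrative choices:

```python
import cv2

def largest_face_crop(image_bgr, size=(112, 112)):
    """Return the normalized largest-face crop, or None if no face is detected."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # an image without a face is not selected as an original image
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # face with the largest area
    return cv2.resize(image_bgr[y:y + h, x:x + w], size)
```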
11. An image processing apparatus, characterized in that the apparatus comprises:
the image acquisition module is configured to acquire a first-class image set, wherein the first-class image set comprises a plurality of first-class images;
the feature vector extraction module is configured to extract a feature vector of each first-class image;
the similarity calculation module is configured to calculate the similarity of every two feature vectors extracted from the first-class image set to obtain a deviation value between every two feature vectors; and
the cleaning module is configured to clean the first-class image set according to the deviation values between the feature vectors.
12. The apparatus of claim 11, wherein the cleaning module is specifically configured to: for each feature vector, count the number of feature vectors whose deviation value from the feature vector is smaller than or equal to a first threshold; determine the feature vector with the largest count as a first reference feature vector of the first-class image set; and clean the first-class image set according to the first reference feature vector.
13. The apparatus of claim 12, wherein the cleaning module is further configured to determine, from the feature vectors of all the images in the first-class image set, the feature vectors whose deviation value from the first reference feature vector is greater than the first threshold, and remove the images corresponding to the determined feature vectors from the first-class image set.
14. The apparatus of claim 12, wherein the cleaning module is further configured to determine, from the feature vectors of all the images in the first-class image set, the feature vectors that are neither directly similar nor indirectly similar to the first reference feature vector, and remove the images corresponding to the determined feature vectors from the first-class image set;
wherein being directly similar to the first reference feature vector means that the deviation value from the first reference feature vector is smaller than or equal to the first threshold; being indirectly similar to the first reference feature vector means that the deviation value from a feature vector that is directly or indirectly similar to the first reference feature vector is smaller than or equal to the first threshold.
15. The apparatus according to any one of claims 12 to 14, wherein the cleaning module is further configured to, if the deviation value between two feature vectors is smaller than or equal to a second threshold, remove the image corresponding to one of the two feature vectors from the first-class image set, wherein the second threshold is smaller than the first threshold.
16. The apparatus according to any one of claims 12-14, further comprising:
the first inter-class cleaning module is configured to perform similarity calculation on the first reference feature vector and a second reference feature vector of a second-class image set to obtain a deviation value between the first reference feature vector and the second reference feature vector; and, if the deviation value between the first reference feature vector and the second reference feature vector is smaller than a third threshold, merge the first-class image set and the second-class image set into the same image set.
17. The apparatus according to any one of claims 11-14, further comprising:
the second inter-class cleaning module is configured to calculate a first vector mean of the feature vectors of the first-class image set, acquire a second vector mean of the feature vectors of a second-class image set, and calculate a deviation value between the first vector mean and the second vector mean; and, if the deviation value between the first vector mean and the second vector mean is smaller than a fourth threshold, merge the first-class image set and the second-class image set into the same image set.
18. The apparatus according to any one of claims 11 to 14, wherein the similarity calculation module is specifically configured to calculate the Euclidean distance or the included-angle cosine value between every two feature vectors, wherein the deviation value is the Euclidean distance or the included-angle cosine value.
19. The apparatus according to any one of claims 11-14, further comprising:
the image processing module is configured to acquire a plurality of images belonging to the first-class image set; select the images containing human faces from the plurality of images as a plurality of original images; and normalize the face regions in the plurality of original images to obtain the plurality of first-class images.
20. The apparatus of claim 19, wherein the image processing module is further configured to, if an original image comprises a plurality of faces, extract the face with the largest area in the original image and normalize the face with the largest area.
21. An electronic device comprising a processor and a memory, said memory storing computer readable instructions which, when executed by said processor, perform the steps of the method of any of claims 1-10.
22. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
CN201811120898.0A 2018-09-25 2018-09-25 Image processing method, device, electronic equipment and readable storage medium Active CN110942081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811120898.0A CN110942081B (en) 2018-09-25 2018-09-25 Image processing method, device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110942081A true CN110942081A (en) 2020-03-31
CN110942081B CN110942081B (en) 2023-08-18

Family

ID=69904488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811120898.0A Active CN110942081B (en) 2018-09-25 2018-09-25 Image processing method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110942081B (en)

Patent Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101408929A (en) * 2007-10-10 2009-04-15 三星电子株式会社 Multiple-formwork human face registering method and apparatus for human face recognition system
US20120158807A1 (en) * 2010-12-21 2012-06-21 Jeffrey Woody Matching data based on numeric difference
CN103345645A (en) * 2013-06-27 2013-10-09 复旦大学 Commodity image category forecasting method based on online shopping platform
CN103546312A (en) * 2013-08-27 2014-01-29 中国航天科工集团第二研究院七〇六所 Massive multi-source isomerism log correlation analyzing method
CN103810663A (en) * 2013-11-18 2014-05-21 北京航天金盾科技有限公司 Demographic data cleaning method based on face recognition
US20160188999A1 (en) * 2014-12-30 2016-06-30 Xiaomi Inc. Method and device for displaying images
CN104537252A (en) * 2015-01-05 2015-04-22 深圳市腾讯计算机系统有限公司 User state single-classification model training method and device
US20160232195A1 (en) * 2015-02-05 2016-08-11 Quantum Corporation Mobile Device Agent For Personal Deduplication
CN107430776A (en) * 2015-04-28 2017-12-01 欧姆龙株式会社 Template construct device and template construct method
CN105426485A (en) * 2015-11-20 2016-03-23 小米科技有限责任公司 Image combination method and device, intelligent terminal and server
CN106776662A (en) * 2015-11-25 2017-05-31 腾讯科技(深圳)有限公司 A kind of taxonomic revision method and apparatus of photo
CN105488527A (en) * 2015-11-27 2016-04-13 小米科技有限责任公司 Image classification method and apparatus
US20170154208A1 (en) * 2015-11-27 2017-06-01 Xiaomi Inc. Image classification method and device
WO2017162083A1 (en) * 2016-03-25 2017-09-28 阿里巴巴集团控股有限公司 Data cleaning method and apparatus
CN107480685A (en) * 2016-06-08 2017-12-15 国家计算机网络与信息安全管理中心 A kind of distributed power iteration clustering method and device based on GraphX
CN107871107A (en) * 2016-09-26 2018-04-03 北京眼神科技有限公司 Face authentication method and device
CN106649610A (en) * 2016-11-29 2017-05-10 北京智能管家科技有限公司 Image labeling method and apparatus
CN106709449A (en) * 2016-12-22 2017-05-24 深圳市深网视界科技有限公司 Pedestrian re-recognition method and system based on deep learning and reinforcement learning
CN107368812A (en) * 2017-07-21 2017-11-21 成都恒高科技有限公司 Facial recognition data cleaning method based on maximal connected subgraphs
CN107480203A (en) * 2017-07-23 2017-12-15 北京中科火眼科技有限公司 It is a kind of to be directed to identical and similar pictures duplicate removal view data cleaning method
CN107463705A (en) * 2017-08-17 2017-12-12 陕西优百信息技术有限公司 A kind of data cleaning method
CN107944020A (en) * 2017-12-11 2018-04-20 深圳云天励飞技术有限公司 Facial image lookup method and device, computer installation and storage medium
CN108319938A (en) * 2017-12-31 2018-07-24 奥瞳系统科技有限公司 High quality training data preparation system for high-performance face identification system
CN108229419A (en) * 2018-01-22 2018-06-29 百度在线网络技术(北京)有限公司 For clustering the method and apparatus of image
CN108536753A (en) * 2018-03-13 2018-09-14 腾讯科技(深圳)有限公司 The determination method and relevant apparatus of duplicate message

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
FEI ZHANG et al.: "LayerMover: Fast virtual machine migration over WAN with three-layer image structure", Future Generation Computer Systems, vol. 83, pages 37-49 *
MARIOS HADJIELEFTHERIOU et al.: "Fast Indexes and Algorithms for Set Similarity Selection Queries", 2008 IEEE 24th International Conference on Data Engineering, 25 April 2008 (2008-04-25), pages 267-276, XP031245984 *
MING CHEN et al.: "A duplicate image deduplication approach via Haar wavelet technology", 2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems, pages 624-628 *
R. KRISHNAMOORTHY et al.: "A new approach for data cleaning process", International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014), pages 1-5 *
SHENGYONG DING et al.: "Automatically Building Face Datasets of New Domains from Weakly Labeled Data with Pretrained Models", arXiv, 24 November 2016 (2016-11-24), pages 1-7 *
XIA Yangyang et al.: "Research on data cleaning behind face recognition" (人脸识别背后的数据清理问题研究), CAAI Transactions on Intelligent Systems (智能系统学报), vol. 12, no. 5, 21 October 2017 (2017-10-21), page 2 *
LI Danping et al.: "A client-side fuzzy image deduplication method supporting ownership authentication" (一种支持所有权认证的客户端图像模糊去重方法), Chinese Journal of Computers (计算机学报), vol. 41, no. 6, pages 1047-1063 *
LI Yang et al.: "Research on entity similarity calculation in knowledge graphs" (知识图谱中实体相似度计算研究), Journal of Chinese Information Processing (中文信息学报), no. 01, 31 January 2017 (2017-01-31), pages 140-146 *
CHENG Haiying et al.: "A video summarization method based on HOG-LBP features and an SVM classifier" (基于HOG-LBP特征和SVM分类器的视频摘要方法), Journal of Sichuan University of Science & Engineering (Natural Science Edition) (四川理工学院学报(自然科学版)), vol. 31, no. 4, pages 43-48 *
ZHAO Xing: "Research on clustering-based data cleaning" (基于聚类的数据清洗研究), China Masters' Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑), vol. 2018, no. 2, pages 138-1213 *
LU Xiaobo et al.: "Simulation of a color similarity feature extraction method for rotating and moving images" (旋转移动图像颜色相似度特征提取方法仿真), Computer Simulation (计算机仿真), no. 02, 28 February 2017 (2017-02-28), pages 1304-308 *
HUANG Fang: "Research on near-duplicate image redundancy elimination" (近重复图像去冗技术研究), China Masters' Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑), vol. 2018, no. 3, pages 138-1920 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183648A (en) * 2020-09-30 2021-01-05 深兰人工智能(深圳)有限公司 Automatic screening method and device for fine classification training data set

Also Published As

Publication number Publication date
CN110942081B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN108171104B (en) Character detection method and device
CN107067006B (en) Verification code identification method and system serving for data acquisition
CN110472082B (en) Data processing method, data processing device, storage medium and electronic equipment
CN104751108A (en) Face image recognition device and face image recognition method
EP1964028A1 (en) Method for automatic detection and classification of objects and patterns in low resolution environments
CN109426831B (en) Image similarity matching and model training method and device and computer equipment
CN111046886A (en) Automatic identification method, device and equipment for number plate and computer readable storage medium
CN108986125B (en) Object edge extraction method and device and electronic equipment
CN111445459A (en) Image defect detection method and system based on depth twin network
CN107844737B (en) Iris image detection method and device
CN108154132A (en) A kind of identity card text extraction method, system and equipment and storage medium
CN113723157B (en) Crop disease identification method and device, electronic equipment and storage medium
CN110458792A (en) Method and device for evaluating quality of face image
CN113723309A (en) Identity recognition method, identity recognition device, equipment and storage medium
CN108921006B (en) Method for establishing handwritten signature image authenticity identification model and authenticity identification method
US20170309040A1 (en) Method and device for positioning human eyes
CN110942081B (en) Image processing method, device, electronic equipment and readable storage medium
CN112906696A (en) English image region identification method and device
CN112488137A (en) Sample acquisition method and device, electronic equipment and machine-readable storage medium
CN107798282B (en) Method and device for detecting human face of living body
CN111931229B (en) Data identification method, device and storage medium
CN111382703B (en) Finger vein recognition method based on secondary screening and score fusion
CN110276260B (en) Commodity detection method based on depth camera
Gopalan et al. Statistical modeling for the detection, localization and extraction of text from heterogeneous textual images using combined feature scheme
CN111126245A (en) Digital image dot matrix positioning method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant