CN111401158A - Difficult sample discovery method and device and computer equipment - Google Patents

Difficult sample discovery method and device and computer equipment

Info

Publication number
CN111401158A
CN111401158A
Authority
CN
China
Prior art keywords
sample image
model
key point
sample
difficult
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010138382.XA
Other languages
Chinese (zh)
Other versions
CN111401158B (en)
Inventor
蔡中印
陆进
陈斌
宋晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010138382.XA priority Critical patent/CN111401158B/en
Publication of CN111401158A publication Critical patent/CN111401158A/en
Priority to PCT/CN2020/118113 priority patent/WO2021174820A1/en
Application granted granted Critical
Publication of CN111401158B publication Critical patent/CN111401158B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/178 Human faces, e.g. facial parts, sketches or expressions; estimating age from face image; using age information for improving recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a difficult sample discovery method, which comprises the following steps: acquiring a first sample set; identifying each attribute-unlabeled sample image based on a preset face attribute model to obtain the attribute of each unlabeled sample image; selecting the sample images meeting a preset condition according to the attributes of the sample images; ranking the image quality of the sample images meeting the preset condition based on a preset quality ranking model, and outputting a quality ranking result; labeling key point positions of a first sample image ranked at a preset position in the quality ranking result based on a first key point marking model and a second key point marking model respectively, to obtain a labeled second sample image and a labeled third sample image; calculating the unitized pixel deviation of the key points in the labeled second sample image and the labeled third sample image; and if the unitized pixel deviation is greater than or equal to a preset value, taking the first sample image as a difficult sample image. The embodiment of the invention can improve the efficiency of discovering difficult samples.

Description

Difficult sample discovery method and device and computer equipment
Technical Field
The embodiment of the invention relates to the technical field of image processing, and in particular to a difficult sample discovery method, a difficult sample discovery apparatus and a computer device.
Background
Most current work on face key points focuses on better neural network structures (such as HRNet), training methods (such as unsupervised training), loss functions (such as the wing loss) and the like, and both the training set and the test set use labeled public data sets. However, public data sets are few in number and their coverage does not fully contain actual service scenes, so in practice a key point labeling set for the actual service scene needs to be established. A model trained only on open-source data sets can adapt to most simple scenes in the actual service scene, but adapts poorly to conditions such as large angles, strong backlight, blur and occlusion. Therefore, in order for a key point labeling model to accurately label images in complex scenes, a large number of sample images shot in complex scenes need to be used to train the key point model. However, in the prior art, determining whether a sample image belongs to the sample images shot in complex scenes (difficult samples) is usually done manually, so the determination efficiency is low and a significant amount of labor cost is required.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a difficult sample discovery method, apparatus, computer device and computer-readable storage medium, so as to solve the problems that discovering difficult samples is inefficient and requires a large amount of manpower.
In order to achieve the above object, an embodiment of the present invention provides a method for discovering a hard sample, including:
obtaining a first sample set, wherein the first sample set comprises a plurality of sample images with unlabeled attributes;
identifying each sample image with the unmarked attribute based on a preset face attribute model to obtain the attribute of each unmarked sample image, wherein the face attribute model is used for marking the attribute of the sample image;
selecting sample images meeting preset conditions according to the attributes of the sample images;
ranking the image quality of the sample images meeting the preset conditions based on a preset quality ranking model, and outputting a quality ranking result, wherein the quality ranking model is a trained model for identifying the image quality;
performing key point position labeling on a first sample image ranked at a preset position in the quality ranking result based on a first key point marking model to obtain a labeled second sample image, and performing key point position labeling on the first sample image ranked at the preset position in the quality ranking result based on a second key point marking model to obtain a labeled third sample image, wherein the first key point marking model and the second key point marking model are used for locating key points in images;
calculating the unitized pixel deviation of the key points in the second sample image and the third sample image after the labeling; and
and if the unitized pixel deviation is larger than or equal to a preset value, taking the first sample image as a difficult sample image.
Optionally, the sample image is a face image, and the attributes of the sample image include at least one of a deflection angle, a blur degree, an expression, a backlight intensity, occlusion, glasses, a mask, a hat, bangs, and an age;
the preset condition is any one of the following conditions:
the deflection angle is greater than a first preset value, the blur degree is greater than a second preset value, the backlight intensity is greater than a third preset value, the expression is a preset expression, sunglasses are worn, a mask is worn, a hat is worn, bangs are present, or the age is greater than a fourth preset value or less than a fifth preset value.
Optionally, the hard sample discovery method further comprises:
labeling the difficult sample image through a third key point labeling model to obtain a first difficult sample image containing key points, and labeling the difficult sample image through a fourth key point labeling model to obtain a second difficult sample image containing key points;
inputting the first difficult sample image and the second difficult sample image into a face detection frame model to obtain a third difficult sample image and a fourth difficult sample image containing a face frame;
and publishing the third difficult sample image and the fourth difficult sample image to an annotation website so that an annotator classifies the third difficult sample image and the fourth difficult sample image, wherein the classification categories comprise three categories: the prediction result of the third key point marking model is accurate while the prediction result of the fourth key point marking model is inaccurate; the prediction results of both the third key point marking model and the fourth key point marking model are inaccurate; and the third key point marking model, the fourth key point marking model and the face frame are all inaccurate.
Optionally, the hard sample discovery method further comprises:
and receiving the classification result, and taking the difficult sample image corresponding to the classification result as a training sample image of the fourth key point marking model when the classification result is that the prediction result of the third key point marking model is accurate and the prediction result of the fourth key point marking model is not accurate.
Optionally, the hard sample discovery method further comprises:
receiving a classification result, and issuing a difficult sample image corresponding to the classification result to a marking website when the classification result is that the prediction results of the third key point marking model and the fourth key point marking model are not correct, so that a marking person can correct key points in the marked difficult sample image;
and receiving the difficult sample image corrected by the annotating personnel, and taking the difficult sample image corrected by the annotating personnel as the training sample images of the third key point marking model and the fourth key point marking model.
Optionally, the hard sample discovery method further comprises:
receiving a classification result, and when the classification result indicates that the third key point marking model, the fourth key point marking model and the face frame are all inaccurate, publishing the difficult sample image corresponding to the classification result to an annotation website so that an annotator corrects the face detection frame in the labeled difficult sample image;
and receiving the difficult sample image corrected by the annotating personnel, and taking the difficult sample image corrected by the annotating personnel as a training sample image of the face detection frame model.
Optionally, the calculating of the unitized pixel deviation of the key points in the labeled second sample image and the labeled third sample image is calculating the unitized pixel deviations of the left eye, the right eye and the mouth center in the labeled second sample image and the labeled third sample image, and includes:
calculating a first unitized pixel deviation for the left eye in the second and third sample images after labeling;
calculating a second unitized pixel deviation for the right eye in the second and third sample images after labeling;
calculating a third unitized pixel deviation of the mouth center in the labeled second sample image and the third sample image;
an average value of the first unitized pixel deviation, the second unitized pixel deviation, and the third unitized pixel deviation is set as the unitized pixel deviation.
In order to achieve the above object, an embodiment of the present invention further provides a difficult sample finding device, including:
the acquisition module is used for acquiring a first sample set, wherein the first sample set comprises a plurality of sample images with unlabeled attributes;
the identification module is used for identifying each sample image with the unmarked attribute based on a preset face attribute model to obtain the attribute of each unmarked sample image, wherein the face attribute model is used for marking the attribute of the sample image;
the selecting module is used for selecting sample images meeting preset conditions according to the attributes of the sample images;
the sorting module is used for sorting the image quality of the sample images meeting the preset conditions based on a preset quality sorting model and outputting a quality sorting result, wherein the quality sorting model is a trained model for identifying the image quality;
the labeling module is used for performing key point position labeling on the first sample image ranked at a preset position in the quality ranking result based on a first key point marking model to obtain a labeled second sample image, and performing key point position labeling on the first sample image ranked at the preset position in the quality ranking result based on a second key point marking model to obtain a labeled third sample image, wherein the first key point marking model and the second key point marking model are used for locating key points in images;
a calculating module, configured to calculate a unitized pixel deviation of the labeled keypoints in the second sample image and the third sample image; and
and the determining module is used for taking the first sample image as a difficult sample image if the unitized pixel deviation is greater than or equal to a preset value.
To achieve the above object, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the hard sample discovery method as described above when executing the computer program.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, where the computer program is executable by at least one processor to cause the at least one processor to execute the steps of the hard sample discovery method as described above.
According to the difficult sample discovery method, apparatus, computer device and computer-readable storage medium provided by the embodiments of the invention, a first sample set is obtained, wherein the first sample set comprises a plurality of sample images with unlabeled attributes; each attribute-unlabeled sample image is identified based on a preset face attribute model to obtain the attribute of each unlabeled sample image, wherein the face attribute model is used for labeling the attributes of the sample images; sample images meeting a preset condition are selected according to the attributes of the sample images; the image quality of the sample images meeting the preset condition is ranked based on a preset quality ranking model, and a quality ranking result is output; key point positions of a first sample image ranked at a preset position in the quality ranking result are labeled based on a first key point marking model to obtain a labeled second sample image, and key point positions of the first sample image ranked at the preset position in the quality ranking result are labeled based on a second key point marking model to obtain a labeled third sample image, wherein the first key point marking model and the second key point marking model are used for locating key points in images; the unitized pixel deviation of the key points in the labeled second sample image and the labeled third sample image is calculated; and if the unitized pixel deviation is greater than or equal to a preset value, the first sample image is taken as a difficult sample image. By combining the face attribute model and the quality ranking model, the embodiments of the invention can automatically select the first sample image from a large number of sample images, and then further analyze the first sample image through the first key point model and the second key point model, thereby judging whether the first sample image is a difficult sample image. Since the sample images do not need to be judged manually, the discovery efficiency of difficult samples can be improved and the labor cost can be reduced.
Drawings
FIG. 1 is a flow chart illustrating the steps of one embodiment of the method for finding a difficult sample according to the present invention.
Fig. 2 is a detailed flowchart of calculating the unitized pixel deviations of the left eye, the right eye and the mouth center in the labeled second sample image and the labeled third sample image according to an embodiment of the present invention.
FIG. 3 is a flow chart illustrating steps of another embodiment of a method for finding a difficult sample according to the present invention.
Fig. 4 is a schematic diagram of program modules of a difficult sample finding apparatus according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The advantages of the invention are further illustrated in the following description of specific embodiments in conjunction with the accompanying drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon", "when", or "in response to a determination", depending on the context.
In the description of the present invention, it should be understood that the numerical references before the steps do not identify the order of performing the steps, but merely serve to facilitate the description of the present invention and to distinguish each step, and thus should not be construed as limiting the present invention.
Referring to fig. 1, a flowchart of a difficult sample discovery method according to a first embodiment of the invention is shown. It is to be understood that the flowcharts in the embodiments of the present method are not intended to limit the order in which the steps are performed. The following description exemplarily takes a difficult sample discovery apparatus (hereinafter referred to as the "discovery apparatus") as the execution subject; the discovery apparatus may be applied to a computer device having a data transmission function, such as a mobile phone, a tablet personal computer, a laptop computer or a server. The method comprises the following specific steps:
step S10, obtaining a first sample set, where the first sample set includes a plurality of sample images with unlabeled attributes.
Specifically, the first sample set includes face sample images of various attributes, for example, face images that can be photographed at various angles, face images of various degrees of sharpness, face images under various lighting conditions, face images of various expressions, face images with glasses and without glasses, face images with a mask and without a mask, face images with bang and without bang, face images with a hat and without a hat, face images of various ages, and the like. The sample image without the labeled attribute refers to a sample image without labeling the image by means of manual labeling and the like.
Step S11, identifying each sample image with the unlabeled attribute based on a preset face attribute model to obtain the attribute of each unlabeled sample image, where the face attribute model is used to label the attribute of the sample image.
Specifically, the face attribute model can identify the attributes of a sample image and label the identified results in the sample image. The sample image is preferably a face image, and its attributes may include the deflection angle of the sample image (the left and right deflection angles of the face), the blur degree of the sample image, the expression in the sample image, the backlight intensity of the sample image, whether the face in the sample image is occluded, whether the person in the sample image wears glasses, a mask or a hat, whether the person in the sample image has bangs, the age of the person in the sample image, and the like. The face attribute model can be obtained by training a deep neural network, such as a convolutional neural network, with a large number of training sample images in advance, but is not limited thereto.
In one embodiment, the face attribute model may be composed of a plurality of independent models, for example, a deflection angle recognition model, a blur degree recognition model, a backlight intensity recognition model, an expression recognition model, and the like, and the sample image passes through the deflection angle recognition model, the blur degree recognition model, the backlight intensity recognition model, the expression recognition model, and the like in sequence, so that the various attributes of the sample image are recognized. In another embodiment, the face attribute model may also be a cascade of a plurality of models, such as a deflection angle recognition model, a blur degree recognition model, a backlight intensity recognition model and an expression recognition model, and the various attributes of the sample image are recognized by passing the sample image through the cascaded model.
In this embodiment, by inputting each unlabeled sample image into the face attribute model, the attribute of each unlabeled sample image can be identified, so as to obtain the attribute of each unlabeled sample image.
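The following minimal sketch illustrates step S11 under the assumption that each attribute recognizer is a separately trained model exposing a predict() method; the FaceAttributes fields, the model names and the interface are illustrative, not taken from the patent.

from dataclasses import dataclass

@dataclass
class FaceAttributes:
    yaw_angle: float = 0.0       # left/right deflection angle of the face, in degrees
    blur: float = 0.0            # blur degree, e.g. in [0, 1]
    backlight: float = 0.0       # backlight intensity, e.g. in [0, 1]
    expression: str = "neutral"
    wears_sunglasses: bool = False
    wears_mask: bool = False
    wears_hat: bool = False
    has_bangs: bool = False
    age: int = 0

def annotate_attributes(image, recognizers) -> FaceAttributes:
    # Run each independent attribute recognizer on the image and merge the
    # results into a single attribute record for the unlabeled sample image.
    attrs = FaceAttributes()
    for attribute_name, model in recognizers.items():
        setattr(attrs, attribute_name, model.predict(image))
    return attrs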
Step S12, selecting sample images meeting the preset conditions according to the attributes of the sample images.
Specifically, the preset condition is any one of the following conditions: the deflection angle is greater than a first preset value, the blur degree is greater than a second preset value, the backlight intensity is greater than a third preset value, the expression is a preset expression, sunglasses are worn, a mask is worn, a hat is worn, bangs are present, or the age is greater than a fourth preset value or less than a fifth preset value, wherein the first preset value, the second preset value, the third preset value, the fourth preset value and the fifth preset value may be preset by a user or set by default by the system, which is not limited in this embodiment.
In this embodiment, after the attributes of each sample image are identified by the face attribute model, it may be determined whether the attributes of each sample image satisfy any one of the above conditions, and if one of the conditions is satisfied, it indicates that the sample image is a sample image that meets a preset condition.
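The filter in step S12 can then be sketched as follows; a sample is kept if any one of the preset conditions holds. The threshold values are placeholders rather than values from the patent, and FaceAttributes is reused from the previous sketch.

YAW_THRESHOLD = 45.0           # first preset value (deflection angle)
BLUR_THRESHOLD = 0.6           # second preset value (blur degree)
BACKLIGHT_THRESHOLD = 0.7      # third preset value (backlight intensity)
AGE_UPPER, AGE_LOWER = 70, 10  # fourth / fifth preset values
TARGET_EXPRESSIONS = {"laughing", "eyes_closed"}  # illustrative preset expressions

def meets_preset_condition(a: FaceAttributes) -> bool:
    # The sample meets the preset condition if ANY of the listed conditions holds.
    return (
        abs(a.yaw_angle) > YAW_THRESHOLD
        or a.blur > BLUR_THRESHOLD
        or a.backlight > BACKLIGHT_THRESHOLD
        or a.expression in TARGET_EXPRESSIONS
        or a.wears_sunglasses or a.wears_mask or a.wears_hat or a.has_bangs
        or a.age > AGE_UPPER or a.age < AGE_LOWER
    )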
And step S13, ranking the image quality of the sample images meeting the preset conditions based on a preset quality ranking model, and outputting a quality ranking result, wherein the quality ranking model is a trained model for identifying the image quality.
In particular, the quality ranking model is used to score the quality of the images. The quality ranking model can be obtained by training a deep neural network model in advance through a large number of sample images containing high-definition base images and snapshot images of different characters, wherein the high-definition base images and the snapshot images contain scores of users for the high-definition base images and the snapshot images. After the training of the quality ranking model is completed, the images input thereto may be scored by the quality ranking model.
In this embodiment, the score values of the sample images meeting the preset conditions can be obtained by inputting the sample images meeting the preset conditions into the quality ranking model. After the score values of the sample images meeting the preset conditions are obtained, the score values of the sample images meeting the preset conditions can be sorted, and after the sorting is completed, a quality sorting result can be output. In this embodiment, when performing sorting, sorting may be performed according to the score value from large to small, or sorting may be performed according to the score value from small to large, which is not limited in this embodiment.
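A minimal sketch of step S13 follows, assuming the trained quality ranking model exposes a score() method (an assumed interface, not named in the patent):

def rank_by_quality(candidate_images, quality_model, ascending=True):
    # Score every sample image that meets the preset condition and sort by score;
    # the sorted list of (score, image) pairs is the quality ranking result.
    scored = [(quality_model.score(img), img) for img in candidate_images]
    scored.sort(key=lambda item: item[0], reverse=not ascending)
    return scored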
Step S14, performing key point position labeling on the first sample images sorted at the preset positions in the quality sorting result based on the first key point labeling model to obtain a labeled second sample image, and performing key point position labeling on the first sample images sorted at the preset positions in the quality sorting result based on the second key point labeling model to obtain a labeled third sample image, wherein the first key point labeling model and the second key point labeling model are used for performing key point positioning on the images.
Specifically, the preset position is a preset range of ranking positions. For example, the preset range may be the top 50 positions when the score values are sorted from small to large; that is, the first sample image is a sample image ranked in the top 50 among the sample images meeting the preset condition.
In this embodiment, the first and second key point marking models are obtained with different network structures or different training standards. For example, the first key point marking model is trained with a VGG network structure using a 72-point landmark (key point) standard, and the second key point marking model is trained with a ResNet network structure using a 106-point landmark standard. It is understood that the network structures and landmark standards of the first and second key point marking models are only exemplary and are not limited in this embodiment. For example, the first key point marking model may also adopt a ResNet network structure, or be trained with a 68-point landmark standard; the second key point marking model may also adopt a VGG network, or be trained with a 150-point landmark standard.
After the first sample image is input into the first key point marking model, the key point coordinates of each input image can be predicted through the first key point marking model, and therefore the labeled second sample image is obtained. Wherein the first sample image comprises at least one image.
After the first sample image is input into the second key point marking model, the key point coordinates of each input image can be predicted through the second key point marking model, and therefore the labeled third sample image is obtained.
It should be noted that the key point coordinates are the labels of the first sample image by the first key point labeling model and the second key point labeling model, that is, the key points of the sample image are located in a coordinate point manner.
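Step S14 can be sketched as follows, assuming the preset position range is the top 50 of the ranking and that each key point model returns a dictionary of named landmarks such as {"left_eye": (x, y), "right_eye": (x, y), "mouth_center": (x, y)}; predict_keypoints() is an assumed interface, not an API named in the patent.

TOP_N = 50  # illustrative preset position range

def label_with_both_models(ranking_result, first_model, second_model):
    # Take the first sample images at the preset positions of the quality ranking
    # and label them with both key point marking models.
    first_samples = [img for _score, img in ranking_result[:TOP_N]]
    second_samples = [first_model.predict_keypoints(img) for img in first_samples]   # labeled second sample images
    third_samples = [second_model.predict_keypoints(img) for img in first_samples]   # labeled third sample images
    return first_samples, second_samples, third_samples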
Step S15, calculating the unit pixel deviation of the key points in the labeled second sample image and the third sample image.
Specifically, the second sample image and the third sample image labeled by the first keypoint model and the second keypoint model both include a plurality of keypoints, preferably the keypoints of the left eye, the right eye, and the center of the mouth.
The coordinate values of the key points (taking the left eye, the right eye and the mouth center as examples) of each first sample image are predicted through the first key point marking model to obtain a second sample image; after the coordinate values of the left eye, the right eye and the mouth center of each first sample image are also predicted through the second key point marking model to obtain a third sample image, the unitized pixel deviation can be calculated according to the coordinate values of the landmark predicted values (left eye, right eye and mouth center) in the second sample image and the third sample image. Specifically, the unitized pixel deviation can be calculated by the following formula:
(Formula image in the original publication; the unitized pixel deviation is computed from Δx, Δy, w and h as defined below.)
wherein Δ x is an x-axis difference between two corresponding points of the second sample image and the third sample image, Δ y is a y-axis difference between two corresponding points of the second sample image and the third sample image, w is a width of the face frame in the first sample image, and h is a height of the face frame in the first sample image.
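One plausible reading of this formula, sketched below, takes the per-point unitized pixel deviation as the Euclidean norm of the coordinate differences normalized by the face-frame width and height; the exact expression in the patent figure is not reproduced here, so this is an assumption.

import math

def unitized_deviation(point_a, point_b, face_w, face_h):
    # point_a / point_b: corresponding key point coordinates in the labeled
    # second and third sample images; face_w / face_h: face-frame width and height.
    dx = point_a[0] - point_b[0]
    dy = point_a[1] - point_b[1]
    return math.sqrt((dx / face_w) ** 2 + (dy / face_h) ** 2)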
In one embodiment, referring to fig. 2, the calculating the unit pixel deviations of the keypoints in the labeled second sample image and the labeled third sample image is calculating the unit pixel deviations of the left-eye, right-eye and mouth centers in the labeled second sample image and the labeled third sample image, and includes:
step S20, calculating a first unitized pixel deviation of the left eye in the second and third sample images after the labeling;
step S21, calculating a second unitized pixel deviation of the right eye in the labeled second sample image and the third sample image;
step S22, calculating a third unitized pixel deviation of the center of mouth in the labeled second sample image and the third sample image;
in step S23, the average value of the first, second, and third unitized pixel deviations is set as the unitized pixel deviation.
Specifically, after the first unitized pixel deviation, the second unitized pixel deviation, and the third unitized pixel deviation of three points at the center of the left eye, the right eye, and the mouth are obtained, respectively, the average value of the unitized pixel deviations of the three points may be used as the final unitized pixel deviation value.
In another embodiment of the present invention, the maximum value among the calculated first, second and third unitized pixel deviations may be used as the final unitized pixel deviation value, or the median of the calculated first, second and third unitized pixel deviations may be used as the final unitized pixel deviation value.
In step S16, if the unitized pixel deviation is greater than or equal to a preset value, the first sample image is regarded as a difficult sample image.
Specifically, the preset value is a preset standard unit pixel deviation value, and the value can be set and modified according to actual conditions.
When the calculated deviation value of the unitized pixel is greater than or equal to the preset value, the first sample image can be judged as a difficult sample image, and when the calculated deviation value of the unitized pixel is less than the preset value, the first sample image can be judged as not a difficult sample image.
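Steps S20-S23 and S16 can be sketched together as follows, reusing unitized_deviation() from the sketch above; the preset value 0.05 is illustrative only, and the mean can be swapped for the maximum or the median as described above.

import statistics

PRESET_DEVIATION = 0.05  # illustrative preset value

def is_hard_sample(keypoints_a, keypoints_b, face_w, face_h, reduce="mean"):
    # keypoints_a / keypoints_b: named landmarks from the second and third sample images.
    deviations = [
        unitized_deviation(keypoints_a[name], keypoints_b[name], face_w, face_h)
        for name in ("left_eye", "right_eye", "mouth_center")
    ]
    aggregate = {"mean": statistics.mean, "max": max, "median": statistics.median}[reduce]
    return aggregate(deviations) >= PRESET_DEVIATION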
It should be noted that the hard sample image in the present embodiment refers to a face image captured in a complex scene, for example, a face image captured in strong light, a blocked face image, and the like.
The difficult sample discovery method provided by the embodiment of the invention combines the face attribute model and the quality ranking model to automatically select the first sample image from a large number of sample images, and then further analyzes the first sample image through the first key point model and the second key point model, thereby judging whether the first sample image is a difficult sample image. Since the sample images do not need to be judged manually, the discovery efficiency of difficult samples can be improved and the labor cost can be reduced.
Fig. 3 is a schematic flow chart showing steps of another embodiment of the method for discovering a difficult sample according to the present invention. The embodiment of the present invention is based on the above-described embodiments. In this embodiment, the execution order of the steps in the flowchart shown in fig. 3 may be changed and some steps may be omitted according to different requirements. Hereinafter, the hard sample finding device (hereinafter, referred to as "finding device") is also exemplarily described as an execution subject. The method comprises the following specific steps:
and step S30, labeling the difficult sample image through a third key point labeling model to obtain a first difficult sample image containing key points, and labeling the difficult sample image through a fourth key point labeling model to obtain a second difficult sample image containing key points.
Specifically, the third key point marking model is a model dedicated to data cleaning: it has a deeper and larger network structure than the model actually deployed online, runs more slowly, and marks face key points with higher precision. The fourth key point marking model is the small model actually deployed for marking face key points. The networks of the third and fourth key point marking models may be trained using a ResNet network structure or a VGG network structure.
The difficult sample image is labeled by the third key point marking model and the fourth key point marking model to obtain the correspondingly labeled first difficult sample image and second difficult sample image, which contain the coordinates of the left eye, the right eye and the mouth center.
Step S31, inputting the first hard sample image and the second hard sample image into a face detection frame model to obtain a third hard sample image and a fourth hard sample image including a face frame.
Specifically, the face detection frame model is a model for marking a face frame, and the face detection frame model is a prior art and is not described in this embodiment.
By inputting the first difficult sample image and the second difficult sample image into the face detection frame model, a third difficult sample image and a fourth difficult sample image including a face frame can be output, wherein the third difficult sample image is correspondingly output when the first difficult sample image is input into the face detection frame model, and the fourth difficult sample image is correspondingly output when the second difficult sample image is input into the face detection frame model.
Step S32, publishing the third difficult sample image and the fourth difficult sample image to an annotation website, so that an annotator classifies the third difficult sample image and the fourth difficult sample image, wherein the classification categories comprise three categories: the prediction result of the third key point marking model is accurate while the prediction result of the fourth key point marking model is inaccurate; the prediction results of both the third key point marking model and the fourth key point marking model are inaccurate; and the third key point marking model, the fourth key point marking model and the face frame are all inaccurate.
Specifically, since the classification results given by different annotators may differ, a plurality of annotators may classify the third difficult sample image and the fourth difficult sample image, and the classification result given by the most annotators is taken as the classification result of the third difficult sample image and the fourth difficult sample image.
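A minimal sketch of this consensus rule follows; the category strings are illustrative labels for the three classes defined in step S32, not identifiers from the patent.

from collections import Counter

CATEGORIES = (
    "third_accurate_fourth_inaccurate",
    "third_and_fourth_inaccurate",
    "third_fourth_and_face_frame_inaccurate",
)

def consensus_label(annotator_labels):
    # annotator_labels: one category string per annotator for the same sample pair;
    # the label chosen by the most annotators is kept.
    label, _count = Counter(annotator_labels).most_common(1)[0]
    return label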
And step S33, receiving the classification result, and taking the hard sample image corresponding to the classification result as the training sample image of the fourth key point marking model when the classification result is that the prediction result of the third key point marking model is accurate and the prediction result of the fourth key point marking model is not accurate.
Specifically, when the classification result is that the prediction result of the third keypoint mark model is accurate and the prediction result of the fourth keypoint mark model is not accurate, the hard sample image corresponding to the classification result may be used as the training sample image of the fourth keypoint mark model, and then the hard sample image may be used to perform iterative training on the fourth keypoint mark model again, so as to improve the labeling accuracy of the fourth keypoint mark model.
And step S34, receiving the classification result, and issuing the difficult sample image corresponding to the classification result to a marking website when the classification result is that the prediction results of the third key point marking model and the fourth key point marking model are not correct, so that the marking personnel can correct the key points in the marked difficult sample image.
Step S35, receiving the difficult sample image corrected by the annotating staff, and using the difficult sample image corrected by the annotating staff as the training sample image of the third and fourth keypoint mark models.
Specifically, when the classification result indicates that the prediction results of both the third key point marking model and the fourth key point marking model are inaccurate, the difficult sample image corresponding to the classification result can be published to an annotation website so that an annotator can manually correct the difficult sample image. After the annotator completes the correction, the corrected landmark (key point) result can be drawn on the face image again, so that other annotators can judge whether the correction is accurate, thereby improving the correction accuracy.
After the final correction is completed, the corrected difficult sample image is used as a training sample picture of a third key point marking model and a fourth key point marking model, so that iterative training can be performed on the third key point marking model and the fourth key point marking model again by using the corrected difficult sample image, and the marking accuracy of the third key point marking model and the fourth key point marking model is improved.
And step S36, receiving the classification result, and when the classification result is that the third key point marking model, the fourth key point marking model and the face frame are not accurate, issuing the hard sample image corresponding to the classification result to a marking website so that the marking personnel can correct the face detection frame in the marked hard sample image.
And step S37, receiving the difficult sample image corrected by the annotating personnel, and taking the difficult sample image corrected by the annotating personnel as the training sample image of the face detection frame model.
Specifically, when the classification result is that the third key point marking model, the fourth key point marking model and the face frame are not correct, the difficult sample picture corresponding to the classification result can be published to a marking website, so that a marker can manually correct the face frame in the difficult sample image. After the face frame is corrected, the difficult sample image corrected by the annotating personnel is used as a training sample image of the face detection frame model, so that the face detection frame model can be subjected to repeated iterative training by using the corrected difficult sample image, and the detection accuracy of the face detection model is improved.
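The routing in steps S33-S37 can be sketched as follows, reusing the category strings from the previous sketch; the pool names are illustrative. Depending on the consensus label, the difficult sample either becomes extra training data for the deployed fourth key point marking model, is queued for key point correction (and the corrected copies later retrain the third and fourth key point marking models), or is queued for face-frame correction (and the corrected copies later retrain the face detection frame model).

def route_hard_sample(image, label, pools):
    # pools: dict of lists that accumulate images for retraining or correction.
    if label == "third_accurate_fourth_inaccurate":
        pools["fourth_model_training"].append(image)
    elif label == "third_and_fourth_inaccurate":
        pools["keypoint_correction_queue"].append(image)
    elif label == "third_fourth_and_face_frame_inaccurate":
        pools["face_frame_correction_queue"].append(image)
    return pools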
According to the method for discovering the difficult sample, provided by the embodiment of the invention, the third difficult sample image and the fourth difficult sample image are classified by the annotating personnel, so that whether the third key point marking model and the fourth key point marking model are accurate in predicting the key points in the sample images can be judged, and when the prediction result is not correct, the corresponding difficult sample image is taken as the training sample image to re-iteratively train the corresponding model, so that the accuracy of the model is improved.
Referring to fig. 4, a schematic diagram of program modules of a difficult sample finding apparatus 400 (hereinafter referred to as the "finding apparatus" 400) according to an embodiment of the invention is shown. The finding apparatus 400 may be applied to a computer device, which may be a mobile phone, a tablet personal computer, a laptop computer, a server, or another device having a data transmission function. In this embodiment, the finding apparatus 400 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the present invention and the above-described difficult sample discovery method. The program module referred to in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is more suitable than the program itself for describing the execution process of the difficult sample discovery method in the storage medium. The following description specifically introduces the functions of the program modules of this embodiment:
an obtaining module 401, configured to obtain a first sample set, where the first sample set includes a plurality of sample images with unlabeled attributes.
Specifically, the first sample set includes face sample images of various attributes, for example, face images that can be photographed at various angles, face images of various degrees of sharpness, face images under various lighting conditions, face images of various expressions, face images with glasses and without glasses, face images with a mask and without a mask, face images with bang and without bang, face images with a hat and without a hat, face images of various ages, and the like. The sample image without the labeled attribute refers to a sample image without labeling the image by means of manual labeling and the like.
An identifying module 402, configured to identify each sample image with an unmarked attribute based on a preset face attribute model, to obtain an attribute of each unmarked sample image, where the face attribute model is used to label the attribute of the sample image.
Specifically, the face attribute model can identify the attributes of a sample image and label the identified results in the sample image. The sample image is preferably a face image, and its attributes may include the deflection angle of the sample image (the left and right deflection angles of the face), the blur degree of the sample image, the expression in the sample image, the backlight intensity of the sample image, whether the face in the sample image is occluded, whether the person in the sample image wears glasses, a mask or a hat, whether the person in the sample image has bangs, the age of the person in the sample image, and the like. The face attribute model can be obtained by training a deep neural network, such as a convolutional neural network, with a large number of training sample images in advance, but is not limited thereto.
In one embodiment, the face attribute model may be composed of a plurality of independent models, for example, a deflection angle recognition model, a blur degree recognition model, a backlight intensity recognition model, an expression recognition model, and the like, and the sample image passes through the deflection angle recognition model, the blur degree recognition model, the backlight intensity recognition model, the expression recognition model, and the like in sequence, so that the various attributes of the sample image are recognized. In another embodiment, the face attribute model may also be a cascade of a plurality of models, such as a deflection angle recognition model, a blur degree recognition model, a backlight intensity recognition model and an expression recognition model, and the various attributes of the sample image are recognized by passing the sample image through the cascaded model.
In this embodiment, by inputting each unlabeled sample image into the face attribute model, the attribute of each unlabeled sample image can be identified, so as to obtain the attribute of each unlabeled sample image.
A selecting module 403, configured to select sample images meeting preset conditions according to attributes of the sample images.
Specifically, the preset condition is any one of the following conditions: the deflection angle is greater than a first preset value, the blur degree is greater than a second preset value, the backlight intensity is greater than a third preset value, the expression is a preset expression, sunglasses are worn, a mask is worn, a hat is worn, bangs are present, or the age is greater than a fourth preset value or less than a fifth preset value, wherein the first preset value, the second preset value, the third preset value, the fourth preset value and the fifth preset value may be preset by a user or set by default by the system, which is not limited in this embodiment.
In this embodiment, after the attributes of each sample image are identified by the face attribute model, it may be determined whether the attributes of each sample image satisfy any one of the above conditions, and if one of the conditions is satisfied, it indicates that the sample image is a sample image that meets a preset condition.
A sorting module 404, configured to sort the image quality of the sample images meeting the preset condition based on a preset quality sorting model, and output a quality sorting result, where the quality sorting model is a trained model for identifying image quality.
In particular, the quality ranking model is used to score the quality of the images. The quality ranking model can be obtained by training a deep neural network model in advance through a large number of sample images containing high-definition base images and snapshot images of different characters, wherein the high-definition base images and the snapshot images contain scores of users for the high-definition base images and the snapshot images. After the training of the quality ranking model is completed, the images input thereto may be scored by the quality ranking model.
In this embodiment, the score values of the sample images meeting the preset conditions can be obtained by inputting the sample images meeting the preset conditions into the quality ranking model. After the score values of the sample images meeting the preset conditions are obtained, the score values of the sample images meeting the preset conditions can be sorted, and after the sorting is completed, a quality sorting result can be output. In this embodiment, when performing sorting, sorting may be performed according to the score value from large to small, or sorting may be performed according to the score value from small to large, which is not limited in this embodiment.
A labeling module 405, configured to perform, based on a first keypoint mark model, keypoint position labeling on the first sample image ranked at a preset position in the quality ranking result to obtain a labeled second sample image, and perform, based on a second keypoint mark model, keypoint position labeling on the first sample image ranked at the preset position in the quality ranking result to obtain a labeled third sample image, where the first keypoint mark model and the second keypoint mark model are used to perform keypoint positioning on the images.
Specifically, the preset position is a preset range of ranking positions. For example, the preset range may be the top 50 positions when the score values are sorted from small to large; that is, the first sample image is a sample image ranked in the top 50 among the sample images meeting the preset condition.
In this embodiment, the first and second key point marking models are obtained with different network structures or different training standards. For example, the first key point marking model is trained with a VGG network structure using a 72-point landmark (key point) standard, and the second key point marking model is trained with a ResNet network structure using a 106-point landmark standard. It is understood that the network structures and landmark standards of the first and second key point marking models are only exemplary and are not limited in this embodiment. For example, the first key point marking model may also adopt a ResNet network structure, or be trained with a 68-point landmark standard; the second key point marking model may also adopt a VGG network, or be trained with a 150-point landmark standard.
After the first sample image is input into the first key point marking model, the key point coordinates of each input image can be predicted through the first key point marking model, and therefore the labeled second sample image is obtained. Wherein the first sample image comprises at least one image.
After the first sample image is input into the second key point marking model, the key point coordinates of each input image can be predicted through the second key point marking model, and therefore the labeled third sample image is obtained.
It should be noted that the key point coordinates are the labels of the first sample image by the first key point labeling model and the second key point labeling model, that is, the key points of the sample image are located in a coordinate point manner.
A calculating module 406, configured to calculate a unitized pixel deviation of the labeled keypoints in the second sample image and the third sample image.
Specifically, the second sample image and the third sample image labeled by the first keypoint model and the second keypoint model both include a plurality of keypoints, preferably the keypoints of the left eye, the right eye, and the center of the mouth.
The coordinate values of the key points (taking the left eye, the right eye and the mouth center as examples) of each first sample image are predicted through the first key point marking model to obtain a second sample image; after the coordinate values of the left eye, the right eye and the mouth center of each first sample image are also predicted through the second key point marking model to obtain a third sample image, the unitized pixel deviation can be calculated according to the coordinate values of the landmark predicted values (left eye, right eye and mouth center) in the second sample image and the third sample image. Specifically, the unitized pixel deviation can be calculated by the following formula:
(Formula image in the original publication; the unitized pixel deviation is computed from Δx, Δy, w and h as defined below.)
wherein Δx is the x-axis difference between two corresponding points of the second sample image and the third sample image, Δy is the y-axis difference between the two corresponding points, w is the width of the face frame in the first sample image, and h is the height of the face frame in the first sample image.
In an embodiment, the calculating module 406 is further configured to calculate a first unitized pixel deviation of the left eye in the labeled second sample image and the third sample image; calculate a second unitized pixel deviation of the right eye in the labeled second sample image and the third sample image; calculate a third unitized pixel deviation of the mouth center in the labeled second sample image and the third sample image; and take the average value of the first unitized pixel deviation, the second unitized pixel deviation and the third unitized pixel deviation as the unitized pixel deviation.
Specifically, after the first unitized pixel deviation, the second unitized pixel deviation, and the third unitized pixel deviation of three points at the center of the left eye, the right eye, and the mouth are obtained, respectively, the average value of the unitized pixel deviations of the three points may be used as the final unitized pixel deviation value.
In another embodiment of the present invention, the maximum value among the calculated first, second, and third unitized pixel deviations may be taken as the final unitized pixel deviation value, or the median of the three may be taken as the final unitized pixel deviation value.
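A minimal sketch of this aggregation step follows; the mode parameter is a hypothetical convenience for switching between the averaged, maximum, and median variants described above.

```python
import statistics

def aggregate_deviations(left_eye_dev, right_eye_dev, mouth_dev, mode="mean"):
    """Combine the first, second, and third unitized pixel deviations into the
    final unitized pixel deviation value."""
    deviations = [left_eye_dev, right_eye_dev, mouth_dev]
    if mode == "mean":
        return sum(deviations) / len(deviations)
    if mode == "max":
        return max(deviations)
    if mode == "median":
        return statistics.median(deviations)
    raise ValueError(f"unknown aggregation mode: {mode}")
```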
A module 407, configured to take the first sample image as a difficult sample image if the unitized pixel deviation is greater than or equal to a preset value.
Specifically, the preset value is a preset standard unitized pixel deviation value, and this value can be set and modified according to actual conditions.
When the calculated unitized pixel deviation is greater than or equal to the preset value, the first sample image is judged to be a difficult sample image; when the calculated unitized pixel deviation is less than the preset value, the first sample image is judged not to be a difficult sample image.
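Putting the pieces together, a minimal decision sketch could look as follows; it reuses the unitized_pixel_deviation helper sketched above, and the preset value of 0.05 is purely illustrative rather than taken from the patent.

```python
def is_difficult_sample(second_points, third_points, face_w, face_h, preset_value=0.05):
    """Flag the first sample image as a difficult sample image when the averaged
    unitized pixel deviation reaches the preset value."""
    deviations = [
        unitized_pixel_deviation(second_points[name], third_points[name], face_w, face_h)
        for name in ("left_eye", "right_eye", "mouth_center")
    ]
    return sum(deviations) / len(deviations) >= preset_value
```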
It should be noted that the difficult sample image in this embodiment refers to a face image captured in a complex scene, for example, a face image captured in strong light or an occluded face image.
The difficult sample discovery method provided by the embodiment of the present invention can automatically select the first sample image from a large number of sample images by combining the face attribute model and the quality ranking model, and then further analyze the first sample image through the first key point marking model and the second key point marking model, thereby judging whether the first sample image is a difficult sample image. Since the embodiment of the present invention does not require sample images to be judged manually, the discovery efficiency of difficult samples can be improved and labor costs can be reduced.
In another embodiment of the present invention, the discovery apparatus 400 further includes:
and the difficult sample labeling module is used for labeling the difficult sample image through a third key point labeling model to obtain a first difficult sample image containing key points, and labeling the difficult sample image through a fourth key point labeling model to obtain a second difficult sample image containing key points.
Specifically, the third key point marking model is a larger and deeper network dedicated to data cleaning; it is slower than the model actually deployed online but labels face key points with higher precision. The fourth key point marking model is the small model that is actually deployed for labeling face key points. The third and fourth key point marking models may both be trained with a resnet network structure or a vgg network structure.
The difficult sample image is labeled by the third key point marking model and the fourth key point marking model to obtain the correspondingly labeled first difficult sample image and second difficult sample image. The labeled images contain the coordinates of the left eye, the right eye, and the mouth center.
And the input module is used for inputting the first difficult sample image and the second difficult sample image into a face detection frame model so as to obtain a third difficult sample image and a fourth difficult sample image containing a face frame.
Specifically, the face detection frame model is a model for marking a face frame; since the face detection frame model is prior art, it is not described in detail in this embodiment.
By inputting the first difficult sample image and the second difficult sample image into the face detection frame model, a third difficult sample image and a fourth difficult sample image containing a face frame can be output; the third difficult sample image is output when the first difficult sample image is input into the face detection frame model, and the fourth difficult sample image is output when the second difficult sample image is input into the face detection frame model.
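The patent treats the face detection frame model as prior art and does not specify a particular detector; purely as an illustration, an off-the-shelf detector such as OpenCV's Haar cascade can play this role and draw the face frame on a difficult sample image. The file name sample.jpg is a placeholder.

```python
import cv2

# Load a stock frontal-face detector shipped with OpenCV (illustrative choice only).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("sample.jpg")  # a first or second difficult sample image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)  # draw the face frame
cv2.imwrite("sample_with_face_frame.jpg", image)
```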
And the publishing module is used for publishing the third difficult sample image and the fourth difficult sample image to an annotation website so that annotators can classify the third difficult sample image and the fourth difficult sample image, wherein the classification categories comprise three categories: the prediction result of the third key point marking model is accurate and the prediction result of the fourth key point marking model is inaccurate; the prediction results of both the third key point marking model and the fourth key point marking model are inaccurate; and the third key point marking model, the fourth key point marking model, and the face frame are all inaccurate.
Specifically, since the classification results of different annotators may differ, a plurality of annotators can classify the third difficult sample image and the fourth difficult sample image, and the classification result on which the most annotators agree is then selected as the classification result of the third difficult sample image and the fourth difficult sample image.
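A brief sketch of this majority-vote idea, assuming each annotator's decision is recorded as one of three category strings (the category names are illustrative, not taken from the patent):

```python
from collections import Counter

CATEGORIES = (
    "third_accurate_fourth_inaccurate",     # third model accurate, fourth model inaccurate
    "third_and_fourth_inaccurate",          # both key point marking models inaccurate
    "keypoints_and_face_frame_inaccurate",  # both models and the face frame inaccurate
)

def final_classification(votes):
    """Return the category chosen by the largest number of annotators."""
    category, _count = Counter(votes).most_common(1)[0]
    return category

# final_classification(["third_and_fourth_inaccurate",
#                       "third_and_fourth_inaccurate",
#                       "third_accurate_fourth_inaccurate"])
# -> "third_and_fourth_inaccurate"
```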
And the receiving module is used for receiving the classification result, and taking the hard sample image corresponding to the classification result as a training sample image of the fourth key point marking model when the classification result is that the prediction result of the third key point marking model is accurate and the prediction result of the fourth key point marking model is not accurate.
Specifically, when the classification result is that the prediction result of the third key point marking model is accurate and the prediction result of the fourth key point marking model is inaccurate, the difficult sample image corresponding to the classification result may be used as a training sample image of the fourth key point marking model, and the fourth key point marking model may then be iteratively trained again with this difficult sample image, so as to improve its labeling accuracy.
In an embodiment, the receiving module is further configured to receive the classification result; when the classification result is that the prediction results of both the third key point marking model and the fourth key point marking model are inaccurate, publish the difficult sample image corresponding to the classification result to the annotation website so that an annotator can correct the key points in the labeled difficult sample image; and receive the difficult sample image corrected by the annotator, and take the corrected difficult sample image as a training sample image of the third key point marking model and the fourth key point marking model.
Specifically, when the classification result is that the prediction results of both the third key point marking model and the fourth key point marking model are inaccurate, the difficult sample image corresponding to the classification result can be published to the annotation website so that an annotator can manually correct it. After the annotator completes the correction, the corrected landmark (key point) result can be drawn on the face image again, so that other annotators can judge whether the correction is accurate, thereby improving the correction accuracy.
After the final correction is completed, the corrected difficult sample image is used as a training sample image of the third key point marking model and the fourth key point marking model, so that the corrected difficult sample image can be used to iteratively train these two models again, thereby improving their labeling accuracy.
In an embodiment, the receiving module is further configured to receive the classification result; when the classification result is that the third key point marking model, the fourth key point marking model, and the face frame are all inaccurate, publish the difficult sample image corresponding to the classification result to the annotation website so that an annotator can correct the face detection frame in the labeled difficult sample image; and receive the difficult sample image corrected by the annotator, and take the corrected difficult sample image as a training sample image of the face detection frame model.
Specifically, when the classification result is that the third key point marking model, the fourth key point marking model, and the face frame are all inaccurate, the difficult sample image corresponding to the classification result can be published to the annotation website so that an annotator can manually correct the face frame in the difficult sample image. After the face frame is corrected, the difficult sample image corrected by the annotator is used as a training sample image of the face detection frame model, so that the face detection frame model can be iteratively trained again with the corrected difficult sample image, thereby improving the detection accuracy of the face detection frame model.
According to the difficult sample discovery method provided by the embodiment of the present invention, the annotators classify the third difficult sample image and the fourth difficult sample image, so that it can be judged whether the third key point marking model and the fourth key point marking model predict the key points in the sample images accurately; when a prediction result is inaccurate, the corresponding difficult sample image is used as a training sample image to iteratively retrain the corresponding model, thereby improving the accuracy of that model.
Fig. 5 is a schematic diagram of a hardware architecture of a computer device 500 according to an embodiment of the present invention. In the present embodiment, the computer device 500 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance. As shown, the computer device 500 includes, but is not limited to, at least a memory 501, a processor 502, and a network interface 503 communicatively coupled to one another via a system bus. Wherein:
In this embodiment, the memory 501 includes at least one type of computer-readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 501 may be an internal storage unit of the computer device 500, such as a hard disk or memory of the computer device 500. In other embodiments, the memory 501 may also be an external storage device of the computer device 500, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the computer device 500. Of course, the memory 501 may also include both the internal storage unit and the external storage device of the computer device 500. In this embodiment, the memory 501 is generally used for storing the operating system and the various application software installed in the computer device 500, such as the program code of the hard sample finding apparatus 400. Further, the memory 501 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 502 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 502 generally operates to control the overall operation of the computer device 500. In this embodiment, the processor 502 is configured to run the program code stored in the memory 501 or process data, for example, run the hard sample finding apparatus 400, so as to implement the hard sample finding method in the above embodiments.
The network interface 503 may include a wireless network interface or a wired network interface, and the network interface 503 is generally used for establishing a communication connection between the computer device 500 and other electronic devices. For example, the network interface 503 is used to connect the computer device 500 to an external terminal through a network and to establish a data transmission channel and a communication connection between the computer device 500 and the external terminal. The network may be a wireless or wired network such as an Intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like.
It is noted that fig. 5 only shows the computer device 500 with components 501-503, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the hard sample finding apparatus 400 stored in the memory 501 may be further divided into one or more program modules, and the one or more program modules are stored in the memory 501 and executed by one or more processors (the processor 502 in this embodiment) to implement the hard sample discovery method of the present invention.
The present embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application mall, or the like, on which a computer program is stored that implements the corresponding functions when executed by a processor. The computer-readable storage medium of the present embodiment is used for storing the hard sample finding apparatus 400, which, when executed by a processor, implements the hard sample discovery method of the present invention.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for hard sample discovery, comprising:
obtaining a first sample set, wherein the first sample set comprises a plurality of sample images with unlabeled attributes;
identifying each sample image with the unmarked attribute based on a preset face attribute model to obtain the attribute of each unmarked sample image, wherein the face attribute model is used for marking the attribute of the sample image;
selecting sample images meeting preset conditions according to the attributes of the sample images;
ranking the image quality of the sample images meeting the preset conditions based on a preset quality ranking model, and outputting a quality ranking result, wherein the quality ranking model is a trained model for identifying the image quality;
performing key point position labeling on a first sample image sequenced at a preset position in the quality sequencing result based on a first key point marking model to obtain a labeled second sample image, and performing key point position labeling on the first sample image sequenced at the preset position in the quality sequencing result based on a second key point marking model to obtain a labeled third sample image, wherein the first key point marking model and the second key point marking model are used for performing key point positioning on the images;
calculating the unitized pixel deviation of the key points in the second sample image and the third sample image after the labeling; and
and if the unitized pixel deviation is larger than or equal to a preset value, taking the first sample image as a difficult sample image.
2. The difficult sample finding method according to claim 1, wherein the sample image is a face image, and the attributes of the sample image include at least one of a deflection angle, a degree of blur, an expression, a backlight intensity, occlusion, glasses, a mask, a hat, bangs, and an age;
the preset condition is any one of the following conditions:
the deflection angle is larger than a first preset value, the degree of blur is larger than a second preset value, the backlight is larger than a third preset value, the expression is a preset expression, sunglasses are worn, a mask is worn, a hat is worn, bangs are present, and the age is larger than a fourth preset value or smaller than a fifth preset value.
3. The difficult sample finding method according to claim 2, further comprising:
labeling the difficult sample image through a third key point labeling model to obtain a first difficult sample image containing key points, and labeling the difficult sample image through a fourth key point labeling model to obtain a second difficult sample image containing key points;
inputting the first difficult sample image and the second difficult sample image into a face detection frame model to obtain a third difficult sample image and a fourth difficult sample image containing a face frame;
and publishing the third difficult sample image and the fourth difficult sample image to an annotation website so as to enable an annotation worker to classify the third difficult sample image and the fourth difficult sample image, wherein the classification categories comprise three categories: the prediction result of the third key point marking model is accurate and the prediction result of the fourth key point marking model is inaccurate; the prediction results of both the third key point marking model and the fourth key point marking model are inaccurate; and the third key point marking model, the fourth key point marking model, and the face frame are all inaccurate.
4. The difficult sample finding method according to claim 3, further comprising:
and receiving the classification result, and taking the difficult sample image corresponding to the classification result as a training sample image of the fourth key point marking model when the classification result is that the prediction result of the third key point marking model is accurate and the prediction result of the fourth key point marking model is not accurate.
5. The difficult sample finding method according to claim 3, further comprising:
receiving a classification result, and issuing a difficult sample image corresponding to the classification result to a marking website when the classification result is that the prediction results of the third key point marking model and the fourth key point marking model are not correct, so that a marking person can correct key points in the marked difficult sample image;
and receiving the difficult sample image corrected by the annotating personnel, and taking the difficult sample image corrected by the annotating personnel as the training sample images of the third key point marking model and the fourth key point marking model.
6. The difficult sample finding method according to claim 3, further comprising:
receiving a classification result, and issuing the difficult sample image corresponding to the classification result to a marking website when the classification result is that the third key point marking model, the fourth key point marking model and the face frame are all inaccurate, so that a marking person corrects the face detection frame in the marked difficult sample image;
and receiving the difficult sample image corrected by the annotating personnel, and taking the difficult sample image corrected by the annotating personnel as a training sample image of the face detection frame model.
7. The method of any one of claims 1 to 6, wherein calculating the unitized pixel deviation of the key points in the labeled second sample image and third sample image is calculating the unitized pixel deviation of the left eye, the right eye, and the mouth center in the labeled second sample image and third sample image, and comprises:
calculating a first unitized pixel deviation for the left eye in the second and third sample images after labeling;
calculating a second unitized pixel deviation for the right eye in the second and third sample images after labeling;
calculating a third unitized pixel deviation of the noted mouth center in the second sample image and the third sample image;
an average value of the first unitized pixel deviation, the second unitized pixel deviation, and the third unitized pixel deviation is set as the unitized pixel deviation.
8. A difficult sample finding device, comprising:
the acquisition module is used for acquiring a first sample set, wherein the first sample set comprises a plurality of sample images with unlabeled attributes;
the identification module is used for identifying each sample image with the unmarked attribute based on a preset face attribute model to obtain the attribute of each unmarked sample image, wherein the face attribute model is used for marking the attribute of the sample image;
the selecting module is used for selecting sample images meeting preset conditions according to the attributes of the sample images;
the sorting module is used for sorting the image quality of the sample images meeting the preset conditions based on a preset quality sorting model and outputting a quality sorting result, wherein the quality sorting model is a trained model for identifying the image quality;
the labeling module is used for performing key point position labeling on the first sample images sequenced at the preset positions in the quality sequencing results based on a first key point labeling model to obtain labeled second sample images, and performing key point position labeling on the first sample images sequenced at the preset positions in the quality sequencing results based on a second key point labeling model to obtain labeled third sample images, wherein the first key point labeling model and the second key point labeling model are used for performing key point positioning on the images;
a calculating module, configured to calculate a unitized pixel deviation of the labeled keypoints in the second sample image and the third sample image; and
and the module is used for taking the first sample image as a difficult sample image if the unitized pixel deviation is greater than or equal to a preset value.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the hard sample discovery method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, having stored therein a computer program executable by at least one processor to cause the at least one processor to perform the steps of the hard sample discovery method of any one of claims 1-7.
CN202010138382.XA 2020-03-03 2020-03-03 Difficult sample discovery method and device and computer equipment Active CN111401158B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010138382.XA CN111401158B (en) 2020-03-03 2020-03-03 Difficult sample discovery method and device and computer equipment
PCT/CN2020/118113 WO2021174820A1 (en) 2020-03-03 2020-09-27 Discovery method and apparatus for difficult sample, and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010138382.XA CN111401158B (en) 2020-03-03 2020-03-03 Difficult sample discovery method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN111401158A true CN111401158A (en) 2020-07-10
CN111401158B CN111401158B (en) 2023-09-01

Family

ID=71432167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010138382.XA Active CN111401158B (en) 2020-03-03 2020-03-03 Difficult sample discovery method and device and computer equipment

Country Status (2)

Country Link
CN (1) CN111401158B (en)
WO (1) WO2021174820A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021174820A1 (en) * 2020-03-03 2021-09-10 平安科技(深圳)有限公司 Discovery method and apparatus for difficult sample, and computer device
CN116416666A (en) * 2023-04-17 2023-07-11 北京数美时代科技有限公司 Face recognition method, system and storage medium based on distributed distillation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133220A (en) * 2016-11-30 2018-06-08 北京市商汤科技开发有限公司 Model training, crucial point location and image processing method, system and electronic equipment
CN109558864A (en) * 2019-01-16 2019-04-02 苏州科达科技股份有限公司 Face critical point detection method, apparatus and storage medium
WO2019109526A1 (en) * 2017-12-06 2019-06-13 平安科技(深圳)有限公司 Method and device for age recognition of face image, storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608450B (en) * 2016-03-01 2018-11-27 天津中科智能识别产业技术研究院有限公司 Heterogeneous face identification method based on depth convolutional neural networks
US10332312B2 (en) * 2016-12-25 2019-06-25 Facebook, Inc. Shape prediction model compression for face alignment
CN109635838B (en) * 2018-11-12 2023-07-11 平安科技(深圳)有限公司 Face sample picture labeling method and device, computer equipment and storage medium
CN110135263A (en) * 2019-04-16 2019-08-16 深圳壹账通智能科技有限公司 Portrait attribute model construction method, device, computer equipment and storage medium
CN110110611A (en) * 2019-04-16 2019-08-09 深圳壹账通智能科技有限公司 Portrait attribute model construction method, device, computer equipment and storage medium
CN111401158B (en) * 2020-03-03 2023-09-01 平安科技(深圳)有限公司 Difficult sample discovery method and device and computer equipment


Also Published As

Publication number Publication date
WO2021174820A1 (en) 2021-09-10
CN111401158B (en) 2023-09-01


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40032304

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant