CN112633065A - Face detection method, system, storage medium and terminal based on data enhancement - Google Patents

Face detection method, system, storage medium and terminal based on data enhancement

Info

Publication number
CN112633065A
CN112633065A (application CN202011303542.8A)
Authority
CN
China
Prior art keywords
scale
face
image
data
anchor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011303542.8A
Other languages
Chinese (zh)
Inventor
赵磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Terminus Technology Group Co Ltd
Original Assignee
Terminus Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Terminus Technology Group Co Ltd filed Critical Terminus Technology Group Co Ltd
Priority to CN202011303542.8A priority Critical patent/CN112633065A/en
Publication of CN112633065A publication Critical patent/CN112633065A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Abstract

The invention discloses a data enhancement-based face detection method, system, storage medium and terminal, wherein the method comprises the following steps: acquiring a target image to be detected from a pre-divided test set; inputting the target image to be detected into a pre-trained face detection model, wherein the face detection model is generated by training on data-enhanced training samples; and outputting the detected faces, wherein each face carries a category-labeled bounding box. By enhancing the collected training samples, face data of different scales are distributed more evenly; after the PyramidBox network in the face detection model is trained on the enhanced training data, the detection performance of the model improves substantially, further improving the accuracy of face detection in images.

Description

Face detection method, system, storage medium and terminal based on data enhancement
Technical Field
The invention relates to the technical field of deep learning of computers, in particular to a face detection method, a face detection system, a storage medium and a terminal based on data enhancement.
Background
In deep-learning-based object detection tasks, and especially in the face detection tasks widely deployed in real scenes, blurred faces and small faces are difficult to detect and pose many technical challenges, because such pictures have low resolution, are blurry, and contain heavy background noise.
Existing face detection methods mainly detect faces with the traditional image pyramid and multi-scale sliding windows. Other approaches include: data augmentation methods, which improve detection performance by increasing the number and variety of face samples; feature-fusion methods, which fuse high-level and low-level multi-scale features; methods based on data-anchor sampling and matching strategies; and methods that exploit context information. Among the data-anchor-sampling and matching-strategy methods, the PyramidBox network proposes supervised learning of small, blurred and partially occluded faces based on anchor context information, together with a scale-aware Data-Anchor-Sampling (DAS) training strategy that changes the distribution of large and small faces in the training samples. Although DAS is clearly effective for small-scale face detection, its sampling scheme leaves face data of different scales unevenly distributed, which degrades the detection performance of the face detection model and, in turn, the accuracy of face detection in images.
Disclosure of Invention
The embodiment of the application provides a face detection method, a face detection system, a storage medium and a terminal based on data enhancement. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides a method for detecting a face based on data enhancement, where the method includes:
acquiring a target image to be detected from a pre-divided test set;
inputting the target image to be detected into a pre-trained face detection model; wherein the face detection model is generated based on training data sample training after data enhancement;
outputting the detected multiple faces; wherein the plurality of faces carry bounding boxes labeled by categories.
Optionally, after the outputting of the detected multiple faces, the method further includes:
and displaying the detected faces.
Optionally, before the obtaining the target image to be detected from the pre-divided test set, the method further includes:
collecting a plurality of image data samples with human faces;
dividing the image data sample with the face into a training set, a verification set and a test set;
performing data enhancement processing on the image data samples in the training set to generate training data after data anchor sampling;
creating a face detection model through a face detection algorithm (PyramidBox);
inputting the training data after the data anchor sampling into the created human face detection model for training, and outputting a model loss value;
and when the model loss value reaches a preset minimum threshold value, generating a pre-trained face detection model.
Optionally, the performing data enhancement processing on the image data samples in the training set to generate training data after data anchor sampling includes:
selecting any image sample from the training set and determining the image sample as an image sample to be enhanced;
selecting a face bounding box of any scale from the image sample to be enhanced according to predefined anchor scales of different sizes and/or different aspect ratios;
scaling the selected face bounding box of any scale, and acquiring a 640×640 sub-region image of the target face from the scaled face bounding box;
and continuing to select image samples from the remaining image samples in the training set as image samples to be enhanced, until all the image samples in the training set have 640×640 sub-region images, thereby generating the training data after data anchor sampling.
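The sampling step above can be sketched minimally. This assumes images are represented only by their (height, width); the helper name and the simplified representation are illustrative, not the patent's implementation:

```python
def sample_sub_image(image_hw, face_scale, target_scale, crop=640):
    """Resize the whole image by s* = target_scale / face_scale, then report
    the size of the 640x640 sub-region (clipped to the resized bounds)."""
    factor = target_scale / face_scale
    h, w = image_hw
    resized = (round(h * factor), round(w * factor))
    # The sub-region cannot exceed the resized image itself.
    crop_hw = (min(crop, resized[0]), min(crop, resized[1]))
    return resized, crop_hw
```

For a 1000×1500 image whose sampled 140 px face is rescaled toward a 32 px target, the whole image shrinks to roughly 229×343, so the crop is limited to the full resized image.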
Optionally, the scaling the selected face bounding box of any scale includes:
aiming at the different scale sizes of the faces to be detected by the 6 different detection network layers in the PyramidBox detection network, obtaining an anchor scale set {16, 32, 64, 128, 256, 512};
traversing the anchor scale set with a traversal algorithm to find the face anchor scale closest to the face bounding box;
selecting an index of any scaling scale from a pre-designed target scaling scale set according to the face anchor scale, and acquiring the scale at that index from the anchor scale set;
calculating the scale required by the target face corresponding to the selected face bounding box of any scale according to the scale at the acquired index; the scale required by the target face is calculated as:

s_target = random(s_{i_target}/2, 2 · s_{i_target})

where s_{i_target} is the scale at the acquired index i_target in the anchor scale set;
calculating a scale scaling parameter based on the scale required by the target face and the scale corresponding to the face bounding box with any selected scale;
and scaling the selected human face bounding box with any scale according to the scale scaling parameter.
Optionally, the scaling parameter is calculated as s* = s_target / s_face, where s_target is the scale required by the target face and s_face is the scale corresponding to the selected face bounding box of any scale.
In a second aspect, an embodiment of the present application provides a face detection system based on data enhancement, where the face detection system includes:
the image acquisition module is used for acquiring a target image to be detected from a pre-divided test set;
the image input module is used for inputting the target image to be detected into a pre-trained face detection model; the face detection model is generated based on training data samples after data enhancement;
the image output module is used for outputting a plurality of detected human faces; and the plurality of faces are provided with bounding boxes marked by classes.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In the embodiment of the application, the data enhancement-based face detection system first acquires a target image to be detected from a pre-divided test set, then inputs the target image into a pre-trained face detection model, where the face detection model is generated by training on data-enhanced training samples, and finally outputs the detected faces, each carrying a category-labeled bounding box. By enhancing the collected training samples, face data of different scales are distributed more evenly; after the PyramidBox network in the face detection model is trained on the enhanced training data, its detection performance improves substantially, further improving the accuracy of face detection in images.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flowchart of a method for detecting a face based on data enhancement according to an embodiment of the present application;
FIG. 2 is a flowchart of a data enhancement method for training data according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a method for training a face detection model based on data enhancement according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a system of a face detection system based on data enhancement according to an embodiment of the present application;
fig. 5 is a schematic diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of systems and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
As discussed in the background, the existing DAS sampling method, while clearly effective for small-scale face detection, leaves face data of different scales unevenly distributed, which degrades the detection performance of the face detection model and, in turn, the accuracy of face detection in images. Therefore, the present application provides a face detection method, system, storage medium and terminal based on data enhancement to solve the problems existing in the related art.
In the technical scheme provided by the application, the collected training samples are data-enhanced so that face data of different scales are distributed more evenly; after the PyramidBox network in the face detection model is trained on the enhanced training data, the detection performance of the model improves substantially, further improving the accuracy of face detection in images. The following exemplary embodiments describe this in detail.
The following describes in detail a face detection method based on data enhancement according to an embodiment of the present application with reference to fig. 1 to 3. The method may be implemented by a computer program running on a data enhancement-based face detection system built on the von Neumann architecture. The computer program may be integrated into an application, or may run as a separate tool-type application.
Referring to fig. 1, a schematic flow chart of a face detection method based on data enhancement is provided in an embodiment of the present application. As shown in fig. 1, the method of the embodiment of the present application may include the following steps:
s101, acquiring a target image to be detected from a pre-divided test set;
Here, the verification set is sample image data used to verify whether the model reaches the preset minimum loss value, and the test set is used to test whether the detection precision of the model reaches the expected effect. The target image to be detected is an image frame in the test set.
Generally, an existing face detection method detects based on a data-anchor-sampling and matching strategy: at detection time, a face detection model is used to detect a plurality of faces (for example, a plurality of small face images) in the target image to be detected. The face detection model is created and trained through a PyramidBox network; the PyramidBox neural network proposes supervised learning of small, blurred and partially occluded faces based on anchor context information, and at the same time proposes a scale-aware Data-Anchor-Sampling (DAS) training strategy to change the distribution of large and small faces in the training samples and to increase small-face samples by means of data enhancement.
Although the DAS method is clearly effective at augmenting small-scale faces, its sampling scheme leaves face data of different scales unevenly distributed, which degrades the detection performance of the face detection model and, in turn, the accuracy of face detection in images. The present application therefore improves the existing DAS sampling method so that, after the training data are enhanced, face data of different scales are distributed more evenly and the detection performance of the trained PyramidBox network is greatly improved.
In a possible implementation manner, when the data enhancement-based face detection system performs detection, a target image to be detected is first obtained from a pre-divided test set, where the target image includes a large number of faces (e.g., a large number of small face images).
S102, inputting the target image to be detected into a pre-trained face detection model; the face detection model is generated based on training data samples after data enhancement;
the pre-trained face detection model is a mathematical model for detecting a plurality of faces in an image frame.
Generally, to train the face detection model, a number of image data samples with faces are first collected and divided into a training set, a verification set and a test set. Data enhancement is then applied to the image samples in the training set to generate training data after data anchor sampling. A face detection model is created through the face detection algorithm (PyramidBox), the data-anchor-sampled training data are input into it for training, and a model loss value is output; when the model loss value reaches the preset minimum threshold, the pre-trained face detection model is generated.
Furthermore, when performing data enhancement on the image samples in the training set to generate the training data after data anchor sampling: first, any image sample is selected from the training set and determined as the image sample to be enhanced; then a face bounding box of any scale is selected from it according to predefined anchor scales of different sizes and/or different aspect ratios; the selected face bounding box is scaled, and a 640×640 sub-region image of the target face is acquired from the scaled bounding box; finally, another image sample is selected from the remaining samples in the training set, until all image samples in the training set have 640×640 sub-region images, generating the training data after data anchor sampling.
Further, when acquiring the priority of each image sample in the training set, the acquisition time indicated by each image sample in the training set is firstly acquired, and then the priority of each image in the training set is determined based on the acquisition time.
Further, when scaling the selected face bounding box of any scale, the different scale sizes of the faces to be detected by the 6 different detection network layers in the PyramidBox detection network are first used to obtain the anchor scale set {16, 32, 64, 128, 256, 512}, whose values correspond to face-related region data in the face bounding box. A traversal over the anchor scale set then finds the face anchor scale closest to the face bounding box. Next, an index of a scaling scale is selected from the pre-designed target scaling scale set according to the face anchor scale, the scale at that index is acquired from the anchor scale set, and the scale required by the target face corresponding to the selected face bounding box is calculated as

s_target = random(s_{i_target}/2, 2 · s_{i_target})

where s_{i_target} is the scale at the acquired index in the anchor scale set. A scale scaling parameter is then calculated from the scale required by the target face and the scale corresponding to the selected face bounding box, and finally the selected face bounding box is scaled according to that scaling parameter.
Further, the scaling parameter is calculated as s* = s_target / s_face, where s_target is the scale required by the target face and s_face is the scale corresponding to the selected face bounding box of any scale.
In a possible implementation manner, after the face detection model is created and the face detection model training is finished, the target image to be detected acquired in step S101 is input into the face detection model of which the training is finished, so that a plurality of faces in the target image to be detected are detected.
S103, outputting a plurality of detected human faces; wherein the plurality of faces carry bounding boxes labeled by categories.
Generally, as shown in fig. 2, a specific flowchart of the data enhancement method for training data according to an embodiment of the present application: face data are first collected and divided into a training set, a verification set and a test set. An image is then randomly selected from the training set, a face scale is randomly selected from that image, and the face anchor scale closest to it is selected from all anchor scales. A scale index is then randomly selected from the pre-generated target scaling scale set, and a target-scale face is selected according to that index. The face scaling scale is calculated from the initially selected face scale and the target face scale, the original image is adjusted into a scaled image according to that scaling scale, and finally a sub-image containing a face is randomly selected from the scaled image. The above process is repeated to obtain a sufficient number of training samples for training the detection network.
For ease of understanding, the data enhancement process is illustrated here: 1. A face ground-truth box (gt-box) of scale S_face is randomly selected from the image. 2. The anchor scale S1 that best matches the gt-box is found among the anchor scales (16, 32, 64, 128, 256, 512). 3. A scale S2 is randomly selected from the targets (16, 32, 64, S1 × 2). 4. The original image containing S_face is resized by the factor S2/S_face. 5. A 640×640 sub-image is cropped from the resized picture for training.
Such as: the first step is to randomly select a face, and assume the scale of the face to be 140 pix; the second step is to find the predefined anchor scale, 128pix, that best matches it; the third step is to randomly select a target dimension, such as 32pix, from {16, 32,64,128,256 }; fourthly, making the original map img1 containing 140pix human faces into resize of scale 32/140-0.2285 to obtain img 2; the sixth step is to crop a 640x 640 sub-image containing the face from the crop in img 2.
The DAS operation changes the data distribution in two ways: 1. it increases the proportion of small-scale faces; 2. it generates small-scale faces from larger-scale faces, increasing the diversity of small faces.
In a possible implementation manner, after a plurality of faces are detected based on step S102, the detected faces are output and displayed.
In the embodiment of the application, the data enhancement-based face detection system first acquires a target image to be detected from a pre-divided test set, then inputs the target image into a pre-trained face detection model, where the face detection model is generated by training on data-enhanced training samples, and finally outputs the detected faces, each carrying a category-labeled bounding box. By enhancing the collected training samples, face data of different scales are distributed more evenly; after the PyramidBox network in the face detection model is trained on the enhanced training data, its detection performance improves substantially, further improving the accuracy of face detection in images.
Fig. 3 is a schematic flow chart of a model training method based on a data-enhanced face detection model according to an embodiment of the present application. The model training method based on the data enhanced face detection model can comprise the following steps:
s201, collecting a plurality of image data samples with human faces;
s202, dividing an image data sample with a human face into a training set, a verification set and a test set;
s203, selecting any image sample from the training set and determining the image sample as an image sample to be enhanced;
s204, selecting a human face bounding box with any dimension from the image samples to be enhanced according to anchor (anchor) dimensions with different predefined sizes and/or different aspect ratios;
in one possible implementation, a scale S is randomly selected from a training imagefaceThe face bounding box of (1), where the predefined anchor dimensions in the PyramidBox are: si=24+iWherein i is 0, 1.
S205, scaling the selected face bounding box of any scale, and acquiring a 640×640 sub-region image of the target face from the scaled face bounding box;
In one possible implementation, when scaling, the face anchor scale closest to s_face is first found among all anchor scales. The calculation formula is:

i_anchor = argmin_i |s_i − s_face|,  i ∈ {0, 1, ..., 5}

where i_anchor represents the index of the anchor scale that best matches the selected face bounding box.
Then, in order to keep the randomly selected scales balanced, an index i_target of one scale is randomly selected from a target scaling scale set that depends on the matched anchor index. The calculation formulas are:

i_target = random(set_0), where set_0 = {i_0, i_1}
i_target = random(set_1), where set_1 = {i_0, i_1, i_2}
i_target = random(set_2), where set_2 = {i_0, i_1, i_2, i_3}
i_target = random(set_3), where set_3 = {i_2, i_3, i_4, i_5}
i_target = random(set_4), where set_4 = {i_3, i_4, i_5}
i_target = random(set_5), where set_5 = {i_4, i_5}

Here set_0, set_1, set_2, set_3, set_4 and set_5 denote the target scaling scale set used when the anchor scale index i_anchor is 0, 1, 2, 3, 4 and 5, respectively.
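The interval-balanced index selection can be sketched as follows; the dictionary simply transcribes the six sets above, and the helper name is illustrative:

```python
import random

# Target scaling-scale index sets, keyed by the matched anchor index i_anchor.
TARGET_INDEX_SETS = {
    0: [0, 1],
    1: [0, 1, 2],
    2: [0, 1, 2, 3],
    3: [2, 3, 4, 5],
    4: [3, 4, 5],
    5: [4, 5],
}

def pick_target_index(i_anchor, rng=random):
    """i_target = random(set_{i_anchor})."""
    return rng.choice(TARGET_INDEX_SETS[i_anchor])
```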
And, in order to maintain scale continuity within each interval, the target face scale becomes:

s_target = random(s_{i_target}/2, 2 · s_{i_target})

where s_{i_target} is the anchor scale corresponding to the scale index i_target.
Finally, the scaling required to readjust the face scale s_face is:

s* = s_target / s_face

The original training image is scaled by s* into the scaled image, and a 640×640 sub-region image containing a face is randomly selected from it to obtain the training data after data anchor sampling.
S206, continuing to determine image samples to be enhanced from the training set in descending order of priority, and generating the training data after data anchor sampling when all image samples in the training set have 640×640 sub-region images;
S207, creating a face detection model through a face detection algorithm (PyramidBox);
s208, inputting the training data after data anchor sampling into the created human face detection model for training, and outputting a model loss value;
and S209, when the model loss value reaches a preset minimum threshold value, generating a pre-trained face detection model.
In the embodiment of the application, the data enhancement-based face detection system first acquires a target image to be detected from a pre-divided test set, then inputs the target image into a pre-trained face detection model, where the face detection model is generated by training on data-enhanced training samples, and finally outputs the detected faces, each carrying a category-labeled bounding box. By enhancing the collected training samples, face data of different scales are distributed more evenly; after the PyramidBox network in the face detection model is trained on the enhanced training data, its detection performance improves substantially, further improving the accuracy of face detection in images.
The following are system embodiments of the present invention, which may be used to perform the method embodiments of the present invention. For details not disclosed in the system embodiments, reference is made to the method embodiments of the present invention.
Referring to fig. 4, a schematic structural diagram of a data enhancement-based human face detection system according to an exemplary embodiment of the present invention is shown. The data enhancement-based face detection system can be implemented by software, hardware or a combination of the two to form all or part of an intelligent robot. The system 1 comprises an image acquisition module 10, an image input module 20, and an image output module 30.
The image acquisition module 10 is used for acquiring a target image to be detected from a pre-divided test set;
the image input module 20 is configured to input the target image to be detected into a pre-trained face detection model; wherein the face detection model is generated based on training data sample training after data enhancement;
an image output module 30 for outputting the detected plurality of faces; wherein the plurality of faces carry bounding boxes labeled by categories.
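The three modules above can be sketched as a thin pipeline. Here `model` is any callable returning (bounding box, class label) pairs and stands in for the pre-trained detector; the class and method names are illustrative, not from the patent:

```python
class FaceDetectionSystem:
    """Minimal sketch of the three modules of the system in fig. 4."""

    def __init__(self, model):
        self.model = model

    def acquire(self, test_set):
        # Image acquisition module 10: take a target image from the test set.
        return test_set[0]

    def detect(self, image):
        # Image input module 20: feed the image to the pre-trained model.
        return self.model(image)

    def output(self, detections):
        # Image output module 30: emit faces with category-labeled bounding boxes.
        return [{"bbox": box, "label": label} for box, label in detections]
```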
It should be noted that when the data-enhancement-based face detection system provided by the above embodiment executes the data-enhancement-based face detection method, the division into the above functional modules is merely an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the system embodiment and the method embodiments provided above belong to the same concept; details of the implementation process are found in the method embodiments and are not repeated here.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiment of the application, the data-enhancement-based face detection system first acquires a target image to be detected from a pre-divided test set, then inputs the target image into a pre-trained face detection model, where the face detection model is generated by training on training data samples after data enhancement, and finally outputs the detected faces, each carrying a bounding box labeled by category. By adopting the embodiment of the application, the collected training samples are data-enhanced so that face data of different sizes are more uniformly distributed; after the PyramidBox network in the face detection model is trained on the enhanced training data, the detection performance of the model is greatly improved, further improving the detection accuracy of faces in images.
The present invention also provides a computer readable medium, on which program instructions are stored, and the program instructions, when executed by a processor, implement the method for detecting a human face based on data enhancement provided by the above-mentioned method embodiments. The present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for face detection based on data enhancement of the above-described method embodiments.
Please refer to fig. 5, which provides a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in fig. 5, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002.
Wherein a communication bus 1002 is used to enable connective communication between these components.
The user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may also include a standard wired interface and a standard wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Processor 1001 may include one or more processing cores. The processor 1001 connects various components throughout terminal 1000 using various interfaces and lines, and performs various functions and processes data of terminal 1000 by running or executing instructions, programs, code sets, or instruction sets stored in memory 1005 and invoking data stored in memory 1005. Alternatively, the processor 1001 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 1001 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU renders and draws the content to be displayed by the display screen; and the modem handles wireless communications. It is understood that the modem may not be integrated into the processor 1001 but may instead be implemented by a separate chip.
The memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a stored-program area and a stored-data area, where the stored-program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the stored-data area may store the data referred to in the above method embodiments. Optionally, the memory 1005 may be at least one storage device located remotely from the processor 1001. As shown in fig. 5, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a data-enhancement-based face detection application.
In the terminal 1000 shown in fig. 5, the user interface 1003 is mainly used for providing an input interface for a user, and acquiring data input by the user; and the processor 1001 may be configured to invoke the data enhancement based face detection application stored in the memory 1005, and specifically perform the following operations:
acquiring a target image to be detected from a pre-divided test set;
inputting the target image to be detected into a pre-trained face detection model; wherein the face detection model is generated based on training data sample training after data enhancement;
outputting the detected multiple faces; wherein the plurality of faces carry bounding boxes labeled by categories.
In one embodiment, after outputting the detected plurality of faces, the processor 1001 further performs the following operations:
and displaying the detected faces.
In one embodiment, the processor 1001, before performing the acquisition of the target image to be detected from the pre-partitioned test set, further performs the following operations:
collecting a plurality of image data samples with human faces;
dividing the image data sample with the face into a training set, a verification set and a test set;
performing data enhancement processing on the image data samples in the training set to generate training data after data anchor sampling;
creating a face detection model through the face detection algorithm PyramidBox;
inputting the training data after data anchor sampling into the created face detection model for training, and outputting a model loss value;
and when the model loss value reaches a preset minimum threshold value, generating a pre-trained face detection model.
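The division into training, verification, and test sets in the steps above can be sketched as a shuffled split. The 8:1:1 ratio and the function name are illustrative assumptions — the text only states that the samples are divided into the three sets:

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and split the collected face image samples into
    training, verification, and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]          # work on a copy; leave the input intact
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (shuffled[:n_train],                       # training set
            shuffled[n_train:n_train + n_val],        # verification set
            shuffled[n_train + n_val:])               # test set
```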
In one embodiment, when performing the data enhancement processing on the image data samples in the training set to generate the training data after data anchor sampling, the processor 1001 specifically performs the following operations:
selecting any image sample from the training set and determining the image sample as an image sample to be enhanced;
selecting a face bounding box of any scale from the image sample to be enhanced according to predefined anchor scales of different sizes and/or different aspect ratios;
scaling the selected face bounding box of any scale, and acquiring a 640×640 sub-region image of the target face from the scaled face bounding box;
and continuing to select image samples from the remaining image samples in the training set as image samples to be enhanced until all the image samples in the training set have 640×640 sub-region images, and generating the training data after data anchor sampling.
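The 640×640 sub-region extraction in the steps above can be sketched as a random crop constrained to contain the face box. This assumes the scaled face fits inside the crop and the image is at least 640×640; the helper name and signature are illustrative:

```python
import random

def random_face_crop(image_w, image_h, face_box, crop=640):
    """Pick a random crop×crop window that fully contains the face box.

    face_box = (x1, y1, x2, y2) in the already-rescaled image.
    """
    x1, y1, x2, y2 = face_box
    # Valid window origins: keep the whole face inside the crop window
    # and the crop window inside the image.
    ox_lo, ox_hi = max(0, x2 - crop), min(x1, image_w - crop)
    oy_lo, oy_hi = max(0, y2 - crop), min(y1, image_h - crop)
    ox = random.randint(int(ox_lo), int(ox_hi))
    oy = random.randint(int(oy_lo), int(oy_hi))
    return ox, oy, ox + crop, oy + crop
```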
In one embodiment, when the processor 1001 performs scaling on the selected face bounding box of any scale, the following operation is specifically performed:
aiming at the different sizes of faces to be detected by the 6 different detection network layers in the PyramidBox detection network, obtaining the Anchor scale set {16, 32, 64, 128, 256, 512};
traversing the Anchor scale set to find the face Anchor scale closest to the face bounding box;
selecting an index of any scaling scale from a pre-designed target scaling scale set according to the face Anchor scale, and acquiring the scale at the selected index from the Anchor scale set;
calculating the scale required by the target face corresponding to the selected face bounding box of any scale according to the scale at the obtained index; the calculation formula of the scale required by the target face is:
s_target = random(s_{i_target} / 2, 2 × s_{i_target})
where s_{i_target} is the scale at the acquired index in the Anchor scale set;
calculating a scale scaling parameter based on the scale required by the target face and the scale corresponding to the face bounding box with any selected scale;
and scaling the selected face bounding box of any scale according to the scale scaling parameter.
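The whole scale computation can be sketched as below. The jitter range [s/2, 2s] and the choice of target index from the indices up to one past the nearest anchor follow the data-anchor-sampling of the PyramidBox paper; the patent's own formula images are not reproduced in the text, so treat both as assumptions:

```python
import random

ANCHOR_SCALES = [16, 32, 64, 128, 256, 512]  # one scale per detection layer

def data_anchor_scale(s_face, rng=None):
    """Compute the rescale factor s* for one face of scale s_face.

    Steps mirror the text: find the anchor scale nearest to the face,
    pick a target index at random, jitter that anchor scale to get
    s_target, and return s* = s_target / s_face.
    """
    rng = rng or random.Random(0)
    # Nearest anchor scale to the face bounding box.
    i_near = min(range(len(ANCHOR_SCALES)),
                 key=lambda i: abs(ANCHOR_SCALES[i] - s_face))
    # Random target index, capped at one past the nearest index.
    i_target = rng.randint(0, min(i_near + 1, len(ANCHOR_SCALES) - 1))
    s_anchor = ANCHOR_SCALES[i_target]
    # Jitter the chosen anchor scale (assumed range: [s/2, 2s]).
    s_target = rng.uniform(s_anchor / 2, 2 * s_anchor)
    return s_target / s_face
```

For a 100-pixel face, the nearest anchor is 128, so the target anchor is drawn from {16, 32, 64, 128, 256} and the resulting factor usually shrinks the face toward the smaller detection layers.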
In one embodiment, when determining the priority of each image sample in the training set, the processor 1001 specifically performs the following operations:
acquiring the acquisition time indicated by each image sample in the training set;
determining a priority of each image in the training set based on the acquisition time.
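The two steps above reduce to a sort by timestamp. Mapping earlier acquisition to higher priority is an assumption here — the text only says the priority is determined from the acquisition time; the field name is illustrative:

```python
def order_by_priority(samples):
    """Order image samples for enhancement by acquisition time,
    earliest first. Each sample is a dict with an 'acquired_at' timestamp."""
    return sorted(samples, key=lambda s: s["acquired_at"])
```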
In the embodiment of the application, the data-enhancement-based face detection system first acquires a target image to be detected from a pre-divided test set, then inputs the target image into a pre-trained face detection model, where the face detection model is generated by training on training data samples after data enhancement, and finally outputs the detected faces, each carrying a bounding box labeled by category. By adopting the embodiment of the application, the collected training samples are data-enhanced so that face data of different sizes are more uniformly distributed; after the PyramidBox network in the face detection model is trained on the enhanced training data, the detection performance of the model is greatly improved, further improving the detection accuracy of faces in images.
It will be understood by those skilled in the art that all or part of the processes of the above method embodiments can be implemented by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory, or a random access memory.
The above disclosure is only a preferred embodiment of the present application and is not intended to limit its scope; all equivalent variations and modifications made to the present application remain within its scope.

Claims (9)

1. A face detection method based on data enhancement is characterized by comprising the following steps:
acquiring a target image to be detected from a pre-divided test set;
inputting the target image to be detected into a pre-trained face detection model; the face detection model is generated based on training data samples after data enhancement;
outputting the detected multiple faces; wherein the plurality of faces carry bounding boxes labeled by categories.
2. The method of claim 1, wherein after outputting the detected plurality of faces, further comprising:
and displaying the detected faces.
3. The method according to claim 1, wherein before the obtaining the target image to be detected from the pre-divided test set, the method further comprises:
collecting a plurality of image data samples with human faces;
dividing the image data sample with the face into a training set, a verification set and a test set;
performing data enhancement processing on the image data samples in the training set to generate training data after data anchor sampling;
creating a face detection model through the face detection algorithm PyramidBox;
inputting the training data after data anchor sampling into the created face detection model for training, and outputting a model loss value;
and when the model loss value reaches a preset minimum threshold value, generating a pre-trained face detection model.
4. The method of claim 3, wherein the performing data enhancement processing on the image data samples in the training set to generate data anchor sampled training data comprises:
selecting an image sample from the training set and determining the image sample as an image sample to be enhanced;
selecting a face bounding box of any scale from the image sample to be enhanced according to predefined anchor scales of different sizes and/or different aspect ratios;
scaling the selected face bounding box of any scale, and acquiring a 640×640 sub-region image of the target face from the scaled face bounding box;
and continuing to select image samples from the remaining image samples in the training set as image samples to be enhanced until all the image samples in the training set have 640×640 sub-region images, and generating the training data after data anchor sampling.
5. The method of claim 4, wherein scaling the selected face bounding box of any scale comprises:
aiming at the different scales of faces to be detected by the 6 different detection network layers in the PyramidBox detection network, obtaining the Anchor scale set {16, 32, 64, 128, 256, 512};
traversing the Anchor scale set to find the face Anchor scale closest to the face bounding box;
selecting an index of any scaling scale from a pre-designed target scaling scale set according to the face Anchor scale, and acquiring the scale at the selected index from the Anchor scale set;
calculating the scale required by the target face corresponding to the selected face bounding box of any scale according to the scale at the obtained index; the calculation formula of the scale required by the target face is:
s_target = random(s_{i_target} / 2, 2 × s_{i_target})
where s_{i_target} is the scale at the acquired index in the Anchor scale set;
calculating a scale scaling parameter based on the scale required by the target face and the scale corresponding to the face bounding box with any selected scale;
and scaling the selected face bounding box of any scale according to the scale scaling parameter.
6. The method of claim 5, wherein the scale scaling parameter is calculated as s* = s_target / s_face; where s_target is the scale required by the target face, and s_face is the scale corresponding to the selected face bounding box of any scale.
7. A system for face detection based on data enhancement, the system comprising:
the image acquisition module is used for acquiring a target image to be detected from a pre-divided test set;
the image input module is used for inputting the target image to be detected into a pre-trained face detection model; the face detection model is generated based on training data samples after data enhancement;
the image output module is used for outputting a plurality of detected human faces; wherein the plurality of faces carry bounding boxes labeled by categories.
8. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1 to 6.
9. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1 to 6.
CN202011303542.8A 2020-11-19 2020-11-19 Face detection method, system, storage medium and terminal based on data enhancement Pending CN112633065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011303542.8A CN112633065A (en) 2020-11-19 2020-11-19 Face detection method, system, storage medium and terminal based on data enhancement


Publications (1)

Publication Number Publication Date
CN112633065A true CN112633065A (en) 2021-04-09

Family

ID=75303619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011303542.8A Pending CN112633065A (en) 2020-11-19 2020-11-19 Face detection method, system, storage medium and terminal based on data enhancement

Country Status (1)

Country Link
CN (1) CN112633065A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298913A (en) * 2021-06-07 2021-08-24 Oppo广东移动通信有限公司 Data enhancement method and device, electronic equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222657A (en) * 2019-06-11 2019-09-10 中国科学院自动化研究所 Single step human-face detector optimization system, method, apparatus
CN110363137A (en) * 2019-07-12 2019-10-22 创新奇智(广州)科技有限公司 Face datection Optimized model, method, system and its electronic equipment
CN110598638A (en) * 2019-09-12 2019-12-20 Oppo广东移动通信有限公司 Model training method, face gender prediction method, device and storage medium
CN111553227A (en) * 2020-04-21 2020-08-18 东南大学 Lightweight face detection method based on task guidance
CN111898410A (en) * 2020-06-11 2020-11-06 东南大学 Face detection method based on context reasoning under unconstrained scene
CN111914665A (en) * 2020-07-07 2020-11-10 泰康保险集团股份有限公司 Face shielding detection method, device, equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU TANG et al.: "PyramidBox: A Context-assisted Single Shot Face Detector", arXiv:1803.07737v2 [cs.CV], pages 1-21 *


Similar Documents

Publication Publication Date Title
CN112434721B (en) Image classification method, system, storage medium and terminal based on small sample learning
CN108664981B (en) Salient image extraction method and device
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN109671020B (en) Image processing method, device, electronic equipment and computer storage medium
CN111104962A (en) Semantic segmentation method and device for image, electronic equipment and readable storage medium
JP2023520846A (en) Image processing method, image processing apparatus, computer program and computer equipment based on artificial intelligence
CN109583509B (en) Data generation method and device and electronic equipment
CN112488999B (en) Small target detection method, small target detection system, storage medium and terminal
CN108875931B (en) Neural network training and image processing method, device and system
CN111914843B (en) Character detection method, system, equipment and storage medium
WO2022227770A1 (en) Method for training target object detection model, target object detection method, and device
CN112149694B (en) Image processing method, system, storage medium and terminal based on convolutional neural network pooling module
CN114066718A (en) Image style migration method and device, storage medium and terminal
CN112633077A (en) Face detection method, system, storage medium and terminal based on intra-layer multi-scale feature enhancement
CN115330940A (en) Three-dimensional reconstruction method, device, equipment and medium
CN114926734A (en) Solid waste detection device and method based on feature aggregation and attention fusion
CN112633065A (en) Face detection method, system, storage medium and terminal based on data enhancement
CN112633085A (en) Human face detection method, system, storage medium and terminal based on attention guide mechanism
CN110796115A (en) Image detection method and device, electronic equipment and readable storage medium
CN114820755B (en) Depth map estimation method and system
CN114529689B (en) Ceramic cup defect sample amplification method and system based on antagonistic neural network
TWI803243B (en) Method for expanding images, computer device and storage medium
CN114511702A (en) Remote sensing image segmentation method and system based on multi-scale weighted attention
CN113222843B (en) Image restoration method and related equipment thereof
CN113807354B (en) Image semantic segmentation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination