CN111652051B - Face detection model generation method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111652051B
CN111652051B
Authority
CN
China
Prior art keywords
tinydsod
network model
improved
face detection
face
Prior art date
Legal status
Active
Application number
CN202010315193.5A
Other languages
Chinese (zh)
Other versions
CN111652051A (en)
Inventor
王祥雪
林焕凯
贺迪龙
刘双广
Current Assignee
Gosuncn Technology Group Co Ltd
Original Assignee
Gosuncn Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Gosuncn Technology Group Co Ltd filed Critical Gosuncn Technology Group Co Ltd
Priority to CN202010315193.5A
Publication of CN111652051A
Application granted
Publication of CN111652051B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face detection model generation method applicable to an improved TinyDSOD network model, comprising the following steps: initializing parameters of the improved TinyDSOD network model; combining complete face images and incomplete face images of a plurality of different users, and inputting the combined images as a training set into the improved TinyDSOD network model so as to train it, wherein an incomplete face image is a face image of a user partially blocked by an obstruction; and, after training is completed, outputting the improved TinyDSOD network model as the face detection model. The invention also discloses a face detection model generating apparatus, a face detection model generating device and a computer readable storage medium. With the embodiments of the invention, the generated face detection model can detect incomplete faces while improving face detection efficiency and accuracy.

Description

Face detection model generation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of face detection technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a face detection model.
Background
In some application scenarios, such as hospitals and research institutions, some staff must wear masks for long periods while working. When a face is blocked by a mask, most facial features are lost, and existing face detection technologies usually require the whole face to be visible for normal detection, so staff must remove their masks to be recognized. When foot traffic is heavy, this slows face detection considerably and degrades the user experience. Detecting incomplete faces is therefore a problem in urgent need of a solution.
Disclosure of Invention
The embodiment of the invention aims to provide a method, a device, equipment and a storage medium for generating a face detection model, wherein the generated face detection model can detect incomplete faces and improve face detection efficiency and accuracy.
In order to achieve the above object, an embodiment of the present invention provides a face detection model generation method, which is applicable to an improved TinyDSOD network model, including:
initializing parameters of the improved TinyDSOD network model;
combining complete face images and incomplete face images of a plurality of different users, and inputting the combined images as a training set into the improved TinyDSOD network model so as to train it; an incomplete face image is a face image of a user partially blocked by an obstruction;
and outputting the improved TinyDSOD network model as a face detection model after training is completed.
Compared with the prior art, the face detection model generation method disclosed in the embodiments of the invention first initializes parameters of the improved TinyDSOD network model; next, complete face images and incomplete face images of a plurality of different users are combined and input as a training set into the improved TinyDSOD network model for training, where merging the two categories reduces model complexity while improving the detection rate; finally, after training is completed, the improved TinyDSOD network model is output as the face detection model. With the method disclosed by the embodiments of the invention, the generated face detection model can detect incomplete faces while improving face detection efficiency and accuracy.
As an improvement of the scheme, the improved TinyDSOD network model is obtained by replacing the three-layer convolution-and-pooling structure of the Stem module of the original TinyDSOD network model with a two-path structure; the two-path structure comprises one path that performs convolution and inversion operations and one path that performs a pooling operation.
As an improvement of the scheme, the improved TinyDSOD network model is obtained by replacing the DDB-b structure of the original TinyDSOD network model with a DDB-b-plus structure; the DDB-b-plus structure comprises one path that performs no operation and two other paths that perform convolution operations, wherein one of the two convolution paths is configured with a corresponding expansion coefficient (dilation rate).
As an improvement of the scheme, the improved TinyDSOD network model is obtained by replacing the depth separable convolution after the upsample layer of the original TinyDSOD network model with 3 parallel depth separable convolutions; wherein each of the parallel depth separable convolutions is configured with its corresponding expansion coefficient.
As an improvement of the scheme, the number of feature maps in the improved TinyDSOD network model is 4, and the aspect ratio of the anchor boxes on each feature map is 1.
In order to achieve the above object, an embodiment of the present invention further provides a face detection model generating device, which is applicable to an improved TinyDSOD network model, including:
the initialization parameter module is used for initializing parameters of the improved TinyDSOD network model;
the training module is used for combining complete face images and incomplete face images of a plurality of different users and inputting the combined images as a training set into the improved TinyDSOD network model so as to train it; an incomplete face image is a face image of a user partially blocked by an obstruction;
and the model output module is used for outputting the improved TinyDSOD network model as a face detection model after training is completed.
Compared with the prior art, in the face detection model generating apparatus disclosed by the embodiment of the invention, the initialization parameter module first initializes the parameters of the improved TinyDSOD network model; next, the training module combines the complete face images and incomplete face images of a plurality of different users and inputs the combined images as a training set into the improved TinyDSOD network model for training, where merging the two categories reduces model complexity while improving the detection rate; finally, after training is completed, the model output module outputs the improved TinyDSOD network model as the face detection model. The apparatus disclosed by the embodiment of the invention can detect incomplete faces while improving face detection efficiency and accuracy.
As an improvement of the scheme, the improved TinyDSOD network model is obtained by replacing the three-layer convolution-and-pooling structure of the Stem module of the original TinyDSOD network model with a two-path structure; the two-path structure comprises one path that performs convolution and inversion operations and one path that performs a pooling operation;
the improved TinyDSOD network model is obtained by replacing the DDB-b structure of the original TinyDSOD network model with a DDB-b-plus structure; the DDB-b-plus structure comprises one path that performs no operation and two other paths that perform convolution operations, wherein one of the two convolution paths is configured with a corresponding expansion coefficient (dilation rate).
As an improvement of the scheme, the improved TinyDSOD network model is obtained by replacing the depth separable convolution after the upsample layer of the original TinyDSOD network model with 3 parallel depth separable convolutions, wherein each of the parallel depth separable convolutions is configured with its corresponding expansion coefficient;
the number of feature maps in the improved TinyDSOD network model is 4, and the aspect ratio of the anchor boxes on each feature map is 1.
To achieve the above object, an embodiment of the present invention further provides a face detection model generating device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement the face detection model generating method according to any one of the above embodiments.
In order to achieve the above object, an embodiment of the present invention further provides a computer readable storage medium that stores a computer program, wherein when the computer program runs, the device on which the computer readable storage medium is located is controlled to execute the face detection model generating method according to any one of the above embodiments.
Drawings
FIG. 1 is a flowchart of a face detection model generation method provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a Stem module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a DDB-b structure in an original TinyDSOD network model provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a DDB-b-plus structure in an improved TinyDSOD network model provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the structure of 3 parallel, depth-separable convolutions in an improved TinyDSOD network model provided by an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a face detection model generating device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a face detection model generating device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the protection scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a face detection model generating method according to an embodiment of the present invention; the face detection model generation method comprises the following steps:
s1, initializing parameters of the improved TinyDSOD network model;
s2, combining complete face images and incomplete face images of a plurality of different users, and inputting the combined complete face images and the incomplete face images serving as training sets into the improved TinyDSOD network model so as to train the improved TinyDSOD network model;
and S3, outputting the improved TinyDSOD network model as a face detection model after training is completed.
It is worth noting that the face detection model generation method provided by the embodiment of the invention is applicable to the improved TinyDSOD network model, and can be implemented by a face detection/recognition device. An incomplete face image is a face image of a user partially blocked by an obstruction, for example, a face image of a user wearing a mask.
The original TinyDSOD network model is based on the ideas of a backbone network and a feature pyramid: the DenseBlock from DenseNet serves as the basic component of the backbone network, and the convolution operations inside the DenseBlock are replaced with depthwise separable convolutions, which preserves the feature extraction capability of the network while increasing detection speed. In addition, the original TinyDSOD network model introduces feature fusion: high-level features are fused upwards with adjacent low-level features, which improves the detection of small targets. The backbone structure of the original TinyDSOD is shown in Table 1.
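The parameter saving from replacing a standard convolution with a depthwise separable convolution can be illustrated with a small calculation. This is an illustrative sketch rather than code from the patent; the channel counts are assumed values chosen for demonstration.

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a standard k x k convolution (biases ignored)."""
    return in_ch * out_ch * k * k

def separable_params(in_ch, out_ch, k):
    """Weight count of a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    return in_ch * k * k + in_ch * out_ch

# Assumed example: a 3x3 layer with 128 input and 128 output channels.
standard = conv_params(128, 128, 3)        # 128 * 128 * 9 = 147456 weights
separable = separable_params(128, 128, 3)  # 1152 + 16384 = 17536 weights
print(standard, separable, round(standard / separable, 1))
```

For these assumed channel counts the separable form needs roughly an eighth of the weights, which is the source of the speed-up the text describes.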
TABLE 1 original TinyDSOD network structure
(Table 1 is presented as an image in the original publication.)
Specifically, the original TinyDSOD network model is improved in advance.
The three-layer convolution and pooling structure of the Stem module in the original TinyDSOD network model is replaced by a two-path structure; the two-path structure comprises one path that performs convolution and inversion operations and one path that performs a pooling operation.
For example, referring to fig. 2, the three-layer convolution and pooling structure of the original TinyDSOD is changed into two paths: one path performs two convolutions plus an added inverse operation, and the other path directly performs a pooling operation. This design allows the Stem module to preserve feature diversity as much as possible while downsampling.
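For the two paths to be merged, both must produce the same downsampled spatial size, which can be checked with simple output-size arithmetic. The kernel, stride, and padding values below are assumptions for illustration (a stride-2 3x3 convolution with padding 1 and a stride-2 2x2 pooling), not values stated in the patent; 320 is used because the patent trains at a 320 x 320 input size.

```python
def conv_out(size, k=3, stride=2, pad=1):
    """Spatial output size of a convolution: floor((size + 2*pad - k) / stride) + 1."""
    return (size + 2 * pad - k) // stride + 1

def pool_out(size, k=2, stride=2):
    """Spatial output size of a pooling layer (no padding)."""
    return (size - k) // stride + 1

s = 320                      # assumed input resolution
conv_path = conv_out(s)      # convolution path of the Stem module
pool_path = pool_out(s)      # pooling path of the Stem module
print(conv_path, pool_path)  # both paths downsample to the same size
```

Because both paths halve the resolution, their outputs can be combined after downsampling.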
The DDB-b structure in the original TinyDSOD network model is replaced by a DDB-b-plus structure; the DDB-b-plus structure comprises a structure for performing no operation and two other structures for performing convolution operation, wherein one of the two other structures for performing convolution operation is configured with a corresponding expansion coefficient.
For example, referring to fig. 3, fig. 3 is a schematic diagram of the DDB-b structure in the original TinyDSOD network model. The DDB-b is changed into the DDB-b-plus structure shown in fig. 4: the structure grows from two paths to three, and an expansion coefficient (dilation rate) is introduced on the added third path, which enlarges the receptive field of the convolution computation.
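The receptive-field enlargement from the expansion (dilation) coefficient follows a standard formula: a k x k kernel with dilation d covers an effective extent of k + (k - 1)(d - 1) positions. A minimal sketch (the kernel size and dilation values are illustrative assumptions, since the patent does not give the exact coefficients):

```python
def effective_kernel(k, dilation):
    """Effective extent of a dilated kernel: k + (k - 1) * (dilation - 1)."""
    return k + (k - 1) * (dilation - 1)

# A 3x3 kernel at increasing dilation rates widens its receptive field
# without adding any parameters.
for d in (1, 2, 4):
    print(d, effective_kernel(3, d))
```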
The depth separable convolution after the upsample layer in the original TinyDSOD network model is replaced with 3 parallel depth separable convolutions, each configured with its own expansion coefficient. Illustratively, as shown in fig. 5, a different expansion coefficient is configured in each depth separable convolution, which further enlarges the receptive field of the convolution computation and reduces feature loss.
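For the three parallel branches to be combined, their outputs must share the same spatial size; for a 3 x 3 kernel this holds when each branch's padding equals its dilation rate. The sketch below checks this arithmetic; kernel size 3, stride 1, the 40 x 40 feature size, and the dilation rates (1, 2, 4) are assumed values for illustration, not figures from the patent.

```python
def dilated_out(size, k=3, dilation=1, stride=1):
    """Output size of a dilated convolution with padding = dilation * (k - 1) / 2."""
    pad = dilation * (k - 1) // 2
    eff = k + (k - 1) * (dilation - 1)  # effective kernel extent
    return (size + 2 * pad - eff) // stride + 1

# Three parallel branches with different dilation rates all keep the
# assumed 40 x 40 spatial size, so their outputs can be merged.
sizes = [dilated_out(40, dilation=d) for d in (1, 2, 4)]
print(sizes)
```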
The number of feature maps in the improved TinyDSOD network model is 4, and the aspect ratio of the anchor boxes on each feature map is 1. Specifically, the sizes of the complete faces and incomplete faces in the training set were counted, the feature maps of the original TinyDSOD were reduced to 4, and, because a face is approximately square, the aspect ratio of the preset anchor boxes on the 4 feature map layers is fixed at 1; the specific sizes are shown in Table 2.
TABLE 2 Anchor Box size parameters

Feature layer name               Anchor Box sizes
First_out_norm_mbox_priorbox     16, 24, 32
Second_out_norm_mbox_priorbox    48, 64
Third_out_norm_mbox_priorbox     96, 128
Fourth_norm_mbox_priorbox        192, 224, 256
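The square anchor boxes implied by Table 2 (aspect ratio fixed at 1) can be sketched as follows. The layer names and box sizes come from Table 2; the centre coordinates and the (xmin, ymin, xmax, ymax) corner format are illustrative assumptions, not details given in the patent.

```python
# Sizes taken from Table 2; keys are the feature layer names.
ANCHOR_SIZES = {
    "First_out_norm_mbox_priorbox": [16, 24, 32],
    "Second_out_norm_mbox_priorbox": [48, 64],
    "Third_out_norm_mbox_priorbox": [96, 128],
    "Fourth_norm_mbox_priorbox": [192, 224, 256],
}

def anchors_at(cx, cy, sizes):
    """Square anchor boxes (aspect ratio 1) centred at (cx, cy), as (xmin, ymin, xmax, ymax)."""
    return [(cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2) for s in sizes]

# Assumed example: anchors for the first layer at an arbitrary centre point.
boxes = anchors_at(160, 160, ANCHOR_SIZES["First_out_norm_mbox_priorbox"])
print(boxes)
```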
In the images captured by a patrol robot, a human face is a small target, so this scheme also improves the data enhancement module in the original TinyDSOD network model. The original data enhancement module expands the training set by randomly cropping, flipping, and similar operations on the original data, where the area overlap ratios between a randomly cropped image and the GroundTruth annotation box are 0.1, 0.3, 0.5, 0.7, 0.9 and 1.0. Taking 0.1 as an example, as long as the selected image region contains 10% of a GroundTruth box, the region can be cropped to expand the training set; however, such cropping produces a large number of incomplete-target images in the training set. To make the cropped images contain complete GroundTruth boxes, these overlap parameters are all set to 1.0; that is, a region image is cropped into the training set only when the selected region completely covers some GroundTruth box.
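The stricter cropping rule above (overlap parameter 1.0) can be sketched as a simple acceptance test. This is an illustrative sketch, not code from the patent; boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def contains(crop, box):
    """True if the ground-truth box lies entirely inside the crop region."""
    return (crop[0] <= box[0] and crop[1] <= box[1]
            and crop[2] >= box[2] and crop[3] >= box[3])

def accept_crop(crop, gt_boxes):
    """Overlap parameter 1.0: keep a crop only if some GroundTruth box is fully covered."""
    return any(contains(crop, b) for b in gt_boxes)

# A crop fully containing a face box is accepted; one that cuts the face is not.
print(accept_crop((0, 0, 100, 100), [(10, 10, 50, 50)]))
print(accept_crop((0, 0, 100, 100), [(90, 90, 150, 150)]))
```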
Furthermore, the embodiment of the invention also prunes the improved TinyDSOD network model to meet the lightweight application requirements of patrol robots. The number of repeated modules in each DenseBlock is changed from the original 4, 6 and 6 to 2, which greatly reduces the model parameters; because the receptive field has been enlarged, the feature extraction capability is not lost, and the lightweight requirement is met.
The embodiment of the invention adopts the loss function shown in formula (1), which realizes position regression and target classification simultaneously; the loss function L is the sum of the classification confidence loss and the position loss.
L(z, c, l, g) = (1/N) * (L_conf(z, c) + α * L_loc(z, l, g))    formula (1)
Where N is the number of DefaultBoxes matched with a GroundTruth (actual object) box; L_conf(z, c) is the classification confidence loss and L_loc(z, l, g) is the position loss of the DefaultBox; z is the matching result between the DefaultBoxes and the reference object boxes of different categories; c is the confidence of the predicted object box; l is the position information of the predicted object box; g is the position information of the GroundTruth annotation box; α is a parameter balancing the confidence loss and the position loss, and is usually set to 1, i.e., the two losses have the same weight.
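The scalar combination in formula (1) can be sketched directly. The component losses L_conf and L_loc are assumed to be computed elsewhere, and the zero-loss convention for N = 0 is an assumption following common SSD-style implementations, not a detail stated in the patent.

```python
def detection_loss(l_conf, l_loc, n_matched, alpha=1.0):
    """Formula (1): L = (1/N) * (L_conf + alpha * L_loc), with N matched DefaultBoxes."""
    if n_matched == 0:
        # Assumed convention: no matched boxes contribute no loss.
        return 0.0
    return (l_conf + alpha * l_loc) / n_matched

# With alpha = 1 (the usual setting per the text), both losses weigh equally.
print(detection_loss(2.0, 1.0, 2))
```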
Specifically, in step S1, the parameters of the improved TinyDSOD network model are randomly initialized. In step S2, the complete face images and incomplete face images of a plurality of different users are combined and input as a training set into the improved TinyDSOD network model so as to train it. In step S3, after training is completed, the improved TinyDSOD network model is output as the face detection model. During training, the operating system is Ubuntu 16.04, the GPU is a GTX1080, the training framework is caffe-ssd, the training mode is end2end, the initial learning rate is 0.1, max_iter is 100000, the learning rate policy is multistep (20000, 40000, 60000, 80000), momentum is 0.9, weight decay is 0.0005, the optimizer is SGD, and the image input size is 320 x 320.
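The multistep learning-rate policy listed above can be sketched as follows. This is a hedged illustration of a Caffe-style multistep schedule; the decay factor gamma = 0.1 is an assumed value, since the patent lists the step boundaries but not the decay factor.

```python
def multistep_lr(base_lr, iteration, steps=(20000, 40000, 60000, 80000), gamma=0.1):
    """Caffe-style 'multistep' schedule: multiply the rate by gamma at each step boundary."""
    drops = sum(1 for s in steps if iteration >= s)
    return base_lr * gamma ** drops

# With base_lr = 0.1 (from the text) the rate drops at 20k, 40k, 60k, 80k iterations.
for it in (0, 20000, 50000, 90000):
    print(it, multistep_lr(0.1, it))
```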
Further, after the face detection model is obtained through training, the evaluation indexes used in the embodiment of the invention include accuracy and recall: accuracy is the proportion of correct detections among all detected targets, and recall is the proportion of correct detections in the total detection count, where the total detection count includes the correct detections, the missed detections and the false detections, as shown in formula (2) and formula (3).
Accuracy = correct detections / (correct detections + false detections)    formula (2);
Recall = correct detections / (correct detections + missed detections + false detections)    formula (3).
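The two evaluation formulas can be written directly as functions. This sketch follows formulas (2) and (3) exactly as given in the text; note that the recall denominator as stated includes the false detections as well as the missed ones.

```python
def accuracy(correct, false):
    """Formula (2): correct detections over all detected targets."""
    return correct / (correct + false)

def recall(correct, missed, false):
    """Formula (3) as given in the text: denominator includes missed and false detections."""
    return correct / (correct + missed + false)

# Assumed example counts for illustration.
print(accuracy(90, 10), recall(90, 5, 5))
```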
Based on a deep neural network, the embodiment of the invention provides a lightweight incomplete face detection method built on TinyDSOD. Compared with the original TinyDSOD model, both the detection rate and the accuracy on the incomplete-face data set are greatly improved in a GTX1080 test environment; detailed results are shown in Table 3. Compared with the original TinyDSOD, the model size is also greatly reduced and detection is faster, meeting the lightweight front-end application requirements of the patrol robot.
TABLE 3 comparison of results of the modified TinyDSOD of the invention with the original TinyDSOD
(Table 3 is presented as an image in the original publication.)
Compared with the prior art, the face detection model generation method disclosed in the embodiments of the invention first initializes parameters of the improved TinyDSOD network model; next, complete face images and incomplete face images of a plurality of different users are combined and input as a training set into the improved TinyDSOD network model for training, where merging the two categories reduces model complexity while improving the detection rate; finally, after training is completed, the improved TinyDSOD network model is output as the face detection model. With the method disclosed by the embodiments of the invention, the generated face detection model can detect incomplete faces while improving face detection efficiency and accuracy.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a face detection model generating apparatus 10 according to an embodiment of the present invention; the face detection model generation apparatus 10 includes:
an initialization parameter module 11, configured to initialize parameters of the modified TinyDSOD network model;
the training module 12 is configured to combine complete face images and incomplete face images of a plurality of different users and input the combined images as a training set into the improved TinyDSOD network model so as to train it; an incomplete face image is a face image of a user partially blocked by an obstruction, for example, a face image of a user wearing a mask;
and the model output module 13 is configured to output the improved TinyDSOD network model as the face detection model after training is completed.
The three-layer convolution and pooling structure of the Stem module in the original TinyDSOD network model is replaced by a two-path structure; the two-path structure comprises one path that performs convolution and inversion operations and one path that performs a pooling operation.
The DDB-b structure in the original TinyDSOD network model is replaced by a DDB-b-plus structure; the DDB-b-plus structure comprises a structure for performing no operation and two other structures for performing convolution operation, wherein one of the two other structures for performing convolution operation is configured with a corresponding expansion coefficient.
The depth separable convolution after the upsample layer in the original TinyDSOD network model is replaced by 3 parallel depth separable convolutions; wherein each of the parallel depth separable convolutions is configured with its corresponding coefficient of expansion.
The number of feature maps in the improved TinyDSOD network model is 4, and the aspect ratio of the anchor boxes on each feature map is 1.
The working process of the face detection model generating device 10 is referred to the working process of the face detection model generating method in the above embodiment, and will not be described herein.
Compared with the prior art, in the face detection model generating apparatus 10 disclosed by the embodiment of the invention, the initialization parameter module 11 first initializes the parameters of the improved TinyDSOD network model; next, the training module 12 combines the complete face images and incomplete face images of a plurality of different users and inputs the combined images as a training set into the improved TinyDSOD network model for training, where merging the two categories reduces model complexity while improving the detection rate; finally, after training is completed, the model output module 13 outputs the improved TinyDSOD network model as the face detection model. The apparatus disclosed by the embodiment of the invention can detect incomplete faces while improving face detection efficiency and accuracy.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a face detection model generating apparatus 20 according to an embodiment of the present invention. The face detection model generation device 20 of this embodiment includes: a processor 21, a memory 22 and a computer program stored in said memory 22 and executable on said processor 21. The processor 21 implements the steps in the above-described face detection model generation method embodiment, such as steps S1 to S3 shown in fig. 1, when executing the computer program. Alternatively, the processor 21 may implement the functions of the modules/units in the above-described device embodiments when executing the computer program, for example, the initialization parameter module 11.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 22 and executed by the processor 21 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program in the face detection model generating device 20. For example, the computer program may be divided into an initialization parameter module 11, a training module 12 and a model output module 13, and specific functions of each module refer to specific working procedures of the face detection model generating apparatus 10 described in the foregoing embodiments, which are not described herein.
The face detection model generating device 20 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The face detection model generation device 20 may include, but is not limited to, a processor 21, a memory 22. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the face detection model generating device 20, and does not constitute a limitation of the face detection model generating device 20, and may include more or less components than illustrated, or may combine certain components, or different components, e.g., the face detection model generating device 20 may further include an input-output device, a network access device, a bus, etc.
The processor 21 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor 21 may be any conventional processor or the like, and the processor 21 is a control center of the face detection model generating apparatus 20, and connects the respective parts of the entire face detection model generating apparatus 20 using various interfaces and lines.
The memory 22 may be used to store the computer program and/or module, and the processor 21 may implement various functions of the face detection model generating device 20 by running or executing the computer program and/or module stored in the memory 22 and invoking data stored in the memory 22. The memory 22 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function (such as a sound playing function, an image playing function, etc.); the storage data area may store data created according to the use of the device (such as audio data, a phonebook, etc.). In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.
Wherein the modules/units integrated by the face detection model generating device 20 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. Based on such understanding, the present invention may implement all or part of the flow of the above method embodiments by instructing related hardware through a computer program, and the computer program may be stored in a computer readable storage medium; when executed by the processor 21, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately adjusted according to the requirements of legislation and patent practice in each jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that the above-described apparatus embodiments are merely illustrative; the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the invention, the connection relation between the modules indicates that the modules have communication connections with one another, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are also intended to fall within the scope of the invention.

Claims (6)

1. A face detection model generation method, characterized in that it is applicable to an improved TinyDSOD network model, the method comprising the following steps:
initializing parameters of the improved TinyDSOD network model;
combining complete face images and incomplete face images of a plurality of different users, and inputting the combined images as a training set into the improved TinyDSOD network model so as to train the improved TinyDSOD network model; wherein an incomplete face image is a face image of a user blocked by an obstruction;
outputting the improved TinyDSOD network model as a face detection model after training is completed;
the improved TinyDSOD network model is obtained by replacing the three-layer convolution and pooling structure of the Stem module of the original TinyDSOD network model with a two-path structure; the two-path structure comprises one path that performs a convolution operation and an inversion operation, and another path that performs a pooling operation; the improved TinyDSOD network model is further obtained by replacing the DDB-b structure of the original TinyDSOD network model with a DDB-b-plus structure; the DDB-b-plus structure comprises one branch that performs no operation and two other branches that perform convolution operations, wherein one of the two convolution branches is configured with a corresponding expansion coefficient; the improved TinyDSOD network model is further obtained by replacing the depth separable convolution behind the upsample layer of the original TinyDSOD network model with 3 parallel depth separable convolutions, wherein each of the parallel depth separable convolutions is configured with its corresponding expansion coefficient.
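As a rough illustration of the claimed two-path Stem replacement, the sketch below fuses a strided-convolution path with a pooling path. The "inversion" operation is interpreted here as a sign-flip of the convolution output, and the mean-filter convolution, the toy tensor shapes, and the fusion by stacking are all simplifying assumptions; the patent does not specify these details:

```python
import numpy as np

def conv3x3(x, stride=2):
    # crude strided 3x3 mean filter standing in for a learned convolution
    h, w = x.shape
    pad = np.pad(x, 1)
    return np.array([[pad[i:i + 3, j:j + 3].mean()
                      for j in range(0, w, stride)]
                     for i in range(0, h, stride)])

def max_pool2(x):
    # plain 2x2 max pooling
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def two_path_stem(x):
    # path 1: convolution followed by an "inversion" (sign-flip, here)
    conv_path = -conv3x3(x)
    # path 2: pooling
    pool_path = max_pool2(x)
    # fuse the two paths (stacked along a new channel axis in this sketch)
    return np.stack([conv_path, pool_path])

x = np.random.rand(8, 8).astype(np.float32)
y = two_path_stem(x)
print(y.shape)  # (2, 4, 4): two paths, each downsampled by 2
```

Both paths downsample by the same factor, so their outputs can be fused directly; a single pooled Stem layer would discard the learned-feature path entirely.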
2. The face detection model generation method according to claim 1, wherein 4 feature maps are provided in the improved TinyDSOD network model, and the aspect ratio on each feature map is 1.
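The square-anchor layout of claim 2 (aspect ratio 1 on each of 4 feature maps) can be sketched as a default-box generator in the SSD style. The feature-map sizes and scales below are illustrative guesses, not values taken from the patent:

```python
def square_priors(feature_map_sizes=(38, 19, 10, 5),
                  scales=(0.1, 0.3, 0.5, 0.7)):
    """Generate one square (aspect ratio 1) default box per cell on 4 feature maps."""
    boxes = []
    for size, scale in zip(feature_map_sizes, scales):
        step = 1.0 / size  # cell size in normalized image coordinates
        for i in range(size):
            for j in range(size):
                cx, cy = (j + 0.5) * step, (i + 0.5) * step
                boxes.append((cx, cy, scale, scale))  # width == height
    return boxes

priors = square_priors()
print(len(priors))  # 38*38 + 19*19 + 10*10 + 5*5 = 1930 square boxes
```

Restricting every feature map to a single square box per cell keeps the prior set small, which fits the claim's emphasis on a lightweight detector for faces (a roughly square object class).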
3. A face detection model generation device, characterized in that it is applicable to an improved TinyDSOD network model, the device comprising:
the initialization parameter module is used for initializing parameters of the improved TinyDSOD network model;
the training module is used for combining complete face images and incomplete face images of a plurality of different users, and inputting the combined images as a training set into the improved TinyDSOD network model so as to train the improved TinyDSOD network model; wherein an incomplete face image is a face image of a user blocked by an obstruction;
the model output module is used for outputting the improved TinyDSOD network model as a face detection model after training is completed;
the improved TinyDSOD network model is obtained by replacing the three-layer convolution and pooling structure of the Stem module of the original TinyDSOD network model with a two-path structure; the two-path structure comprises one path that performs a convolution operation and an inversion operation, and another path that performs a pooling operation; the improved TinyDSOD network model is further obtained by replacing the DDB-b structure of the original TinyDSOD network model with a DDB-b-plus structure; the DDB-b-plus structure comprises one branch that performs no operation and two other branches that perform convolution operations, wherein one of the two convolution branches is configured with a corresponding expansion coefficient; the improved TinyDSOD network model is further obtained by replacing the depth separable convolution behind the upsample layer of the original TinyDSOD network model with 3 parallel depth separable convolutions, wherein each of the parallel depth separable convolutions is configured with its corresponding expansion coefficient.
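The training-set construction performed by the training module (pairing each complete face image with an occluded variant) can be sketched as follows. The zero-block occlusion, the 32x32 toy images, and the helper names are hypothetical illustrations, not the patent's actual data pipeline:

```python
def occlude(face, top, left, h, w):
    """Simulate a blocked face by zeroing a rectangular block of pixels."""
    out = [row[:] for row in face]
    for i in range(top, top + h):
        for j in range(left, left + w):
            out[i][j] = 0
    return out

def build_training_set(faces, occlusion=(16, 0, 16, 32)):
    """Combine complete and occluded variants of each face into one training set."""
    samples = []
    for face in faces:
        samples.append((face, "complete"))
        # default occlusion covers the lower half, e.g. a mask-like obstruction
        samples.append((occlude(face, *occlusion), "occluded"))
    return samples

# three toy 32x32 "face images", every pixel set to 1
faces = [[[1] * 32 for _ in range(32)] for _ in range(3)]
train = build_training_set(faces)
print(len(train))  # 6: one complete and one occluded sample per face
```

Mixing both variants in one training set is what lets a single detector handle unobstructed and partially blocked faces.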
4. The face detection model generation device according to claim 3, wherein 4 feature maps are provided in the improved TinyDSOD network model, and the aspect ratio on each feature map is 1.
5. A face detection model generation apparatus, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the face detection model generation method according to any one of claims 1 to 2 when executing the computer program.
6. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the face detection model generation method according to any one of claims 1 to 2.
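The claimed replacement after the upsample layer — 3 parallel depth separable convolutions, each with its own expansion (dilation) coefficient — can be sketched for a single channel as below. The 3x3 kernel, the dilation rates (1, 2, 3), and the omission of the pointwise 1x1 step of a full depth-separable convolution are all assumptions made for illustration:

```python
import numpy as np

def dilated_dw_conv(x, kernel, dilation):
    """Single-channel depthwise 3x3 convolution with a given dilation rate."""
    r = dilation
    pad = np.pad(x, r)  # zero-pad so the output keeps the input size
    h, w = x.shape
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            # sample a 3x3 grid with spacing r around pixel (i, j)
            patch = pad[i:i + 2 * r + 1:r, j:j + 2 * r + 1:r]
            out[i, j] = (patch * kernel).sum()
    return out

def parallel_branches(x, kernel, dilations=(1, 2, 3)):
    """Three parallel depthwise convolutions, each with its own dilation."""
    return np.stack([dilated_dw_conv(x, kernel, d) for d in dilations])

x = np.random.rand(6, 6).astype(np.float32)
k = np.full((3, 3), 1.0 / 9.0)  # uniform averaging kernel for the sketch
y = parallel_branches(x, k)
print(y.shape)  # (3, 6, 6): one output map per dilation rate
```

Running the three dilations in parallel lets the detection head see three receptive-field sizes at once from the same upsampled feature map, at the cost of a single extra stack/merge.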
CN202010315193.5A 2020-04-21 2020-04-21 Face detection model generation method, device, equipment and storage medium Active CN111652051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010315193.5A CN111652051B (en) 2020-04-21 2020-04-21 Face detection model generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010315193.5A CN111652051B (en) 2020-04-21 2020-04-21 Face detection model generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111652051A CN111652051A (en) 2020-09-11
CN111652051B true CN111652051B (en) 2023-06-16

Family

ID=72346554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010315193.5A Active CN111652051B (en) 2020-04-21 2020-04-21 Face detection model generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111652051B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555339A (en) * 2018-05-31 2019-12-10 北京嘀嘀无限科技发展有限公司 target detection method, system, device and storage medium
CN110781784A (en) * 2019-10-18 2020-02-11 高新兴科技集团股份有限公司 Face recognition method, device and equipment based on double-path attention mechanism
CN110826519B (en) * 2019-11-14 2023-08-18 深圳华付技术股份有限公司 Face shielding detection method and device, computer equipment and storage medium
CN110909690B (en) * 2019-11-26 2023-03-31 电子科技大学 Method for detecting occluded face image based on region generation
CN110991421B (en) * 2019-12-24 2023-08-25 高新兴科技集团股份有限公司 Bayonet snap image vehicle detection method, computer storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111652051A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN110084173B (en) Human head detection method and device
Goceri Analysis of deep networks with residual blocks and different activation functions: classification of skin diseases
CN109740534B (en) Image processing method, device and processing equipment
CN108416327B (en) Target detection method and device, computer equipment and readable storage medium
JP2022534337A (en) Video target tracking method and apparatus, computer apparatus, program
CN109543549B (en) Image data processing method and device for multi-person posture estimation, mobile terminal equipment and server
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
CN111164601A (en) Emotion recognition method, intelligent device and computer readable storage medium
CN108197532A (en) The method, apparatus and computer installation of recognition of face
CN110781784A (en) Face recognition method, device and equipment based on double-path attention mechanism
CN109754359B (en) Pooling processing method and system applied to convolutional neural network
CN113326930B (en) Data processing method, neural network training method, related device and equipment
US20220198836A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
US20210042501A1 (en) Method and device for processing point cloud data, electronic device and storage medium
WO2023202285A1 (en) Image processing method and apparatus, computer device, and storage medium
CN112927209A (en) CNN-based significance detection system and method
CN114925320B (en) Data processing method and related device
CN117501245A (en) Neural network model training method and device, and data processing method and device
US20220036106A1 (en) Method and apparatus for data calculation in neural network model, and image processing method and apparatus
Li et al. Findnet: Can you find me? boundary-and-texture enhancement network for camouflaged object detection
US20220044104A1 (en) Method and apparatus for forward computation of neural network, and computer-readable storage medium
CN111382839B (en) Method and device for pruning neural network
CN111652051B (en) Face detection model generation method, device, equipment and storage medium
KR20220039313A (en) Method and apparatus for processing neural network operation
KR102393761B1 (en) Method and system of learning artificial neural network model for image processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant