CN113673470A - Face detection model training method, electronic device and computer-readable storage medium - Google Patents

Face detection model training method, electronic device and computer-readable storage medium

Info

Publication number
CN113673470A
Authority
CN
China
Prior art keywords
face
image
training
sample image
detection model
Prior art date
Legal status
Pending
Application number
CN202111005294.3A
Other languages
Chinese (zh)
Inventor
奉万森
芦爱余
李志文
任高生
Current Assignee
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202111005294.3A
Publication of CN113673470A
Status: Pending


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches


Abstract

The embodiments of the present application provide a face detection model training method, an electronic device and a computer-readable storage medium, relating to the technical field of image processing. During training of the face detection model, occlusion enhancement is applied to face images online, by way of random enhancement, to form occluded face sample images, which are then used to train the face detection model. On the one hand, compared with offline occlusion enhancement, online occlusion enhancement generates occluded face sample images that are both diverse and fast to produce; on the other hand, training the face detection model with such diverse occluded face sample images improves the model's face detection accuracy in scenes where the face is blocked by an obstruction.

Description

Face detection model training method, electronic device and computer-readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a face detection model training method, an electronic device, and a computer-readable storage medium.
Background
Face technology has a wide range of applications: access control systems based on face detection, tracking and recognition; face beautification based on facial keypoint detection; digital humans and special effects based on 3D facial keypoints; and so on. All of these applications rely on models trained on face samples carrying structured information. Such samples are easily affected by obstructions (such as masks, glasses or human hands), which degrades the training effect of the model and, in turn, the accuracy of face recognition or detection performed with the trained model. For artificial intelligence approaches (e.g., deep learning schemes), faces occluded by obstructions are likewise a major cause of recognition or detection failures. Moreover, because collecting samples of occluded faces is difficult, few such samples are available for training. How to generate face samples automatically, quickly and well during training, so as to improve the accuracy of the face detection model in scenes where the face is blocked by an obstruction, is therefore a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
In order to overcome at least the above-mentioned deficiencies in the prior art, the present application is directed to a face detection model training method, an electronic device and a computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a face detection model training method, where the method includes:
obtaining an image of a shelter;
randomly enhancing the unshielded face sample images in the training process to obtain a plurality of randomly enhanced face sample images, wherein the random enhancement comprises changing the rotation angle of the face and/or the position of the face in the face sample images;
determining the adding position of the shelter image in each enhanced human face sample image based on a plurality of randomly enhanced human face sample images, and generating a sheltered human face sample image;
and training the face detection model by taking the input non-shielded face sample image and the generated shielded face sample image as training samples to obtain the trained face detection model.
In this scheme, an obstruction image is first obtained, and the unoccluded face sample images are randomly enhanced to obtain a plurality of randomly enhanced face sample images; then the adding position of the obstruction image in each enhanced face sample image is determined, and occluded face sample images are generated; finally, the face detection model is trained with both the input unoccluded face sample images and the generated occluded face sample images as training samples, yielding a trained face detection model. During training, occlusion enhancement is applied to face images online, by way of random enhancement, to form occluded face sample images, which are then used to train the face detection model. On the one hand, compared with offline occlusion enhancement, online occlusion enhancement generates occluded face sample images that are diverse and fast to produce; on the other hand, training with such diverse occluded face sample images improves the model's face detection accuracy in scenes where the face is blocked by an obstruction.
In a possible implementation manner, the step of determining, based on a plurality of face sample images after random enhancement, an addition position of the obstruction image in each of the enhanced face sample images, and generating a face sample image with an obstruction includes:
adjusting the size of the shelter image according to the size of the face in the unoccluded face sample image;
calculating a rotation angle of a face in the enhanced face sample image and/or a position of the face in the enhanced face sample image, and rotating the size-adjusted obstruction image according to the rotation angle and/or adjusting the size-adjusted obstruction image according to the position of the face in the enhanced face sample image;
and based on different types of the shielding objects in the shielding object image, adding the shielding object image into the enhanced human face sample image according to a preset shielding position to obtain the human face sample image with shielding.
In a possible implementation manner, when the type of the obstruction is a fixed-position obstruction, the step of adding the obstruction image to the enhanced face sample image according to a preset occlusion position, based on the different types of obstructions in the obstruction image, to obtain an occluded face sample image includes:
acquiring an adding reference key point which is pre-configured in the enhanced face sample image and used for adding the obstruction image;
and adding the shielding object image into the enhanced human face sample image according to the adding reference key point to obtain the human face sample image with shielding.
In a possible implementation manner, when the type of the blocking object is a blocking object with an unfixed position, the step of adding the blocking object image to the enhanced face sample image according to a preset blocking position based on the different types of the blocking objects in the blocking object image to obtain the face sample image with the blocking object includes:
acquiring an addition reference key point sequence of the shielding object, which is configured in advance in the enhanced face sample image, wherein the addition reference key point sequence comprises a plurality of different addition reference key points;
and adding the obstruction image into the enhanced face sample image according to each different adding reference key point in the adding reference key point sequence to obtain the face sample image of which the obstruction image is positioned at different adding reference key points.
In a possible implementation manner, the step of calculating a rotation angle of a face in the enhanced face sample image includes:
acquiring the external canthus coordinate of the left eye and the external canthus coordinate of the right eye in the enhanced human face sample image;
and calculating to obtain the face rotation angle based on the external canthus coordinates of the left eye and the external canthus coordinates of the right eye.
In a possible implementation manner, the calculation formula for calculating the face rotation angle based on the external canthus coordinate of the left eye and the external canthus coordinate of the right eye is as follows:
θ = arctan((reyeY - leyeY) / (reyeX - leyeX))
where θ is the face rotation angle, leyeX and leyeY are the abscissa and ordinate of the external canthus of the left eye, and reyeX and reyeY are the abscissa and ordinate of the external canthus of the right eye.
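A minimal sketch of this roll-angle computation (the function and example values are illustrative, not from the patent):

```python
import math

def face_rotation_angle(leyeX, leyeY, reyeX, reyeY):
    """Face roll angle from the outer eye-corner coordinates.

    Equivalent to the arctan formula above; atan2 also handles the
    degenerate case reyeX == leyeX without a division by zero.
    """
    return math.atan2(reyeY - leyeY, reyeX - leyeX)

# Example: the right external canthus sits 12 px higher than the left one.
theta = face_rotation_angle(120.0, 210.0, 260.0, 198.0)
print(math.degrees(theta))  # ~ -4.9 degrees, a slight tilt
```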
In a possible implementation manner, the face detection model includes a feature extraction network, and the step of training the face detection model by using the input unoccluded face sample image and the generated occluded face sample image as training samples to obtain a trained face detection model includes:
sequentially acquiring each training sample, inputting the training sample into the feature extraction network, and extracting feature information of key position points of the human face from the training sample;
detecting whether a face key position point is located in an area where a shelter in the training sample is located;
if the face key position points are not located in the area where the shielding object in the training sample is located, calculating a first feature difference between a first feature vector consisting of feature information of all face key position points in the training sample and a second feature vector consisting of feature information of all known face key position points in the training sample, and obtaining a loss function value based on the first feature difference and a first weight parameter;
if the face key position point is located in the area where the shielding object in the training sample is located, calculating a second feature difference between a third feature vector formed by feature information of the face key position point which is not shielded by the area where the shielding object is located in the training sample and a fourth feature vector formed by feature information of a known face key position point which is not shielded by the area where the shielding object is located in the training sample, and obtaining a loss function value based on the second feature difference and a second weight parameter, wherein the first weight parameter is larger than the second weight parameter;
and iteratively updating the network parameters of the feature extraction network according to the loss function value until a training convergence condition is met, and obtaining a trained face detection model.
In a possible implementation manner, if a face key position point is located in the region where the obstruction is located in the training sample, the step of calculating a second feature difference between a third feature vector composed of feature information of the face key position points in the training sample that are not covered by the region where the obstruction is located and a fourth feature vector composed of feature information of the known face key position points in the training sample that are not covered by that region includes:
determining a face key point mask matrix, wherein each matrix element in the face key point mask matrix represents a weight coefficient of a corresponding face key position point, when the face key position point is located in an area where a shelter in the training sample is located, the weight coefficient corresponding to the face key position point is 0, otherwise, the weight coefficient corresponding to the face key position point is 1;
multiplying the first eigenvector by the mask matrix of the face key point to obtain a third eigenvector;
multiplying the second eigenvector by the mask matrix of the face key point to obtain a fourth eigenvector;
and calculating the second feature difference based on the third feature vector and the fourth feature vector.
In a second aspect, an embodiment of the present application further provides a face detection model training device, where the face detection model training device includes:
the acquisition module is used for acquiring an obstruction image;
the enhancement module is used for randomly enhancing the unshielded face sample images in the training process to obtain a plurality of randomly enhanced face sample images, wherein the random enhancement comprises changing the rotation angle of the face and/or the position of the face in the face sample images;
the generating module is used for determining the adding position of the obstruction image in each enhanced face sample image based on a plurality of randomly enhanced face sample images to generate an obstructed face sample image;
and the training module is used for training the face detection model by taking the input non-shielded face sample image and the generated shielded face sample image as training samples to obtain the trained face detection model.
In a third aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes a processor and a computer-readable storage medium, where the processor and the computer-readable storage medium are connected through a bus system, the computer-readable storage medium is used to store a program, an instruction, or a code, and the processor is used to execute the program, the instruction, or the code in the computer-readable storage medium, so as to implement the face detection model training method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed, the electronic device is caused to perform the face detection model training method in the first aspect or any one of the possible implementation manners of the first aspect.
Based on any one of the above aspects, the face detection model training method, the electronic device, and the computer-readable storage medium provided in the embodiments of the present application perform online occlusion enhancement on a face image in a random enhancement manner during training of a face detection model, so as to form an occluded face sample image, and then use the occluded face sample image for training of the face detection model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a face detection model training method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an obstruction provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an occluded face sample image according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a sub-step of step S12 in FIG. 1;
FIG. 5 is a flowchart illustrating a sub-step of step S14 in FIG. 1;
fig. 6 is a schematic diagram of functional modules of a face detection model training apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will now be described in detail with reference to the drawings, and the specific operations in the method embodiments may also be applied to the apparatus embodiments or the system embodiments.
In order to solve the technical problems mentioned in the background art, fig. 1 is a schematic flow chart of a face detection model training method provided in an embodiment of the present application, which may be executed by an electronic device with image processing capability, and the following describes the face detection model training method in detail with reference to fig. 1.
In step S11, a mask image is acquired.
In the embodiment of the present application, please refer to fig. 2, which illustrates a schematic diagram of an obstruction. The obstruction may include, but is not limited to, a mask, glasses, a human hand, and the like; depending on whether its position on the face is fixed, an obstruction can be classified as a fixed-position obstruction (e.g., a mask or glasses) or a non-fixed-position obstruction (e.g., a human hand).
The obstruction image may carry an alpha mask, that is, all image regions other than the region where the obstruction itself lies are transparent, so that when the obstruction image is later added to a face image, only the obstruction region covers the corresponding area of the face image.
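A minimal sketch of this masked overlay, assuming the obstruction is stored as a 4-channel RGBA image (names and the in-bounds assumption are illustrative):

```python
import numpy as np

def paste_obstruction(face_img, obstruction_rgba, top_left):
    """Alpha-composite an obstruction onto a face sample image.

    face_img:         H x W x 3 uint8 face image (modified in place).
    obstruction_rgba: h x w x 4 uint8 image whose alpha channel is 0
                      everywhere outside the obstruction region.
    top_left:         (x, y) paste position, assumed fully inside the
                      face image (boundary clipping omitted for brevity).
    """
    x, y = top_left
    h, w = obstruction_rgba.shape[:2]
    roi = face_img[y:y + h, x:x + w].astype(np.float32)
    rgb = obstruction_rgba[:, :, :3].astype(np.float32)
    alpha = obstruction_rgba[:, :, 3:4].astype(np.float32) / 255.0
    # Transparent pixels (alpha == 0) leave the face untouched.
    face_img[y:y + h, x:x + w] = (alpha * rgb + (1 - alpha) * roi).astype(np.uint8)
    return face_img
```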
And step S12, randomly enhancing the unoccluded face sample images in the training process to obtain a plurality of randomly enhanced face sample images.
In the embodiment of the present application, the random enhancement may include changing the rotation angle of the face and/or the position of the face in the face sample image. For example, only the rotation angle of the face may be changed (for example, the face is rotated by an arbitrary angle), or only the position of the face in the face sample image may be changed (for example, the face originally located in the central position region of the face sample image is adjusted to the upper left corner region of the face sample image), or both the rotation angle of the face and the position of the face in the face sample image may be changed.
Through this step, a large number (for example, 1,000) of randomly enhanced face sample images can be obtained from a single unoccluded face sample image; these are used in the subsequent steps to simulate faces occluded by an obstruction.
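A minimal sketch of such random enhancement with OpenCV (the angle and offset ranges are illustrative choices); the sampled angle and offset are returned because sub-step S122 below reuses them to adjust the obstruction image accordingly:

```python
import cv2
import numpy as np

def random_enhance(face_img, rng=None):
    """Randomly rotate the face and/or shift its position in the image."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = face_img.shape[:2]
    angle = rng.uniform(-45.0, 45.0)   # rotation angle in degrees
    tx = rng.uniform(-0.2, 0.2) * w    # horizontal shift in pixels
    ty = rng.uniform(-0.2, 0.2) * h    # vertical shift in pixels
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    M[0, 2] += tx
    M[1, 2] += ty
    enhanced = cv2.warpAffine(face_img, M, (w, h))
    return enhanced, angle, (tx, ty)

# e.g. 1,000 enhanced samples from one unoccluded face image:
# samples = [random_enhance(face_img) for _ in range(1000)]
```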
Step S13, determining the adding position of the obstruction image in each enhanced face sample image based on the plurality of face sample images after random enhancement, and generating a face sample image with obstruction.
In this step, the adding position of the obstruction in each enhanced face sample image can be determined based on the type of the obstruction (a fixed-position or non-fixed-position obstruction), and the obstruction image is added at that position in each enhanced face sample image to obtain a plurality of occluded face sample images, thereby realizing occlusion enhancement of the face. Referring to fig. 3, which illustrates occluded face sample images: one or several obstructions may be added to a face sample image in the embodiment of the present application. For example, in the first row of fig. 3 only glasses are added as the obstruction, in the second row only a human hand, in the third row only a mask, and in the fourth row at least two of a mask, a human hand and glasses are added.
And step S14, training the face detection model by taking the input non-shielded face sample image and the generated shielded face sample image as training samples to obtain the trained face detection model.
In the embodiment of the present application, the face detection model is trained with both unoccluded and occluded face sample images, which enhances the model's face recognition capability in scenes where an obstruction blocks the face.
Based on the above scheme, occlusion enhancement is applied to face images online, by way of random enhancement, during training of the face detection model to form occluded face sample images, which are then used to train the model. On the one hand, compared with offline occlusion enhancement, online occlusion enhancement generates occluded face sample images that are diverse and fast to produce; on the other hand, training with such diverse occluded face sample images improves the model's face detection accuracy in scenes where the face is blocked by an obstruction. For example, when the scheme is applied to a live-streaming scene, special effects can be added based on the detected face, which effectively prevents face effects (such as a beautification effect) from drifting away from the face and ensures that they are displayed correctly in the live stream.
In a possible implementation manner of the embodiment of the present application, please refer to fig. 4, step S12 can be implemented in the following manner.
And a substep S121, adjusting the size of the obstruction image according to the size of the face in the unoccluded face sample image.
Since the unoccluded face sample image is prepared in advance, the pixel size of the face in it is known; to keep the relative sizes of the obstruction and the face realistic, the obstruction image is resized (reduced or enlarged) based on the size of the face in the unoccluded face sample image.
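A small sketch of this resizing, assuming the obstruction should span a fixed fraction of the face width (the ratio is an illustrative parameter):

```python
import cv2

def resize_obstruction(obstruction_rgba, face_width, rel_width=0.9):
    """Scale the obstruction so its width matches the face size.

    rel_width is illustrative: a mask might span roughly 90% of the
    face width, while glasses would use a smaller ratio.
    """
    h, w = obstruction_rgba.shape[:2]
    scale = rel_width * face_width / w
    return cv2.resize(obstruction_rgba, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_LINEAR)
```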
And a substep S122, calculating a rotation angle of the face in the enhanced face sample image and/or a position of the face in the enhanced face sample image, and rotating the size-adjusted obstruction image according to the rotation angle and/or adjusting the size-adjusted obstruction image according to the position of the face in the enhanced face sample image.
In this sub-step, to keep the occlusion-enhanced data realistic, the generated data needs to approach real data as closely as possible; for this reason, the obstruction image must be adjusted according to how the face was transformed between the original and the randomly enhanced face sample image. Specifically, if the face was rotated during random enhancement, the obstruction image is rotated by the corresponding angle; if the position of the face was adjusted, the position of the obstruction image is adjusted correspondingly; and if the face was both rotated and repositioned, the obstruction image must be rotated and repositioned at the same time.
And a substep S123 of adding the shielding object image into the enhanced face sample image according to a preset shielding position based on different types of shielding objects in the shielding object image to obtain the shielded face sample image.
In the embodiment of the present application, sub-step S123 obtains the occluded face sample image in different ways according to the type of the obstruction.
In one implementation of the embodiment of the present application, when the type of the shade is a fixed-position shade, the sub-step S123 may be implemented as follows.
Firstly, acquiring an adding reference key point which is configured in advance in the enhanced human face sample image and used for adding an obstruction image.
In detail, taking glasses as the obstruction, the adding reference keypoints can be the center points of the left and right eyes; taking a mask as the obstruction, the adding reference keypoints can be the positions of the left and right mouth corners.
And then, adding the shielding object image into the enhanced human face sample image according to the adding reference key point to obtain the human face sample image with shielding.
In the embodiment of the present application, a corresponding reference anchor point can be marked on the obstruction in advance; after the anchor point on the obstruction in the obstruction image is aligned with the adding reference keypoint in the enhanced face sample image, the obstruction image and the enhanced face sample image are superimposed to obtain the occluded face sample image.
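A minimal sketch of this alignment, reusing paste_obstruction from the earlier sketch (the anchor semantics are illustrative):

```python
def add_fixed_obstruction(face_img, obstruction_rgba, anchor, reference_kp):
    """Overlay a fixed-position obstruction on an enhanced face image.

    anchor:       (x, y) point pre-marked on the obstruction image,
                  e.g. the midpoint between the lens centers of glasses.
    reference_kp: (x, y) adding reference keypoint in the enhanced face
                  sample image, e.g. the midpoint between the eye centers.
    """
    ax, ay = anchor
    kx, ky = reference_kp
    top_left = (int(round(kx - ax)), int(round(ky - ay)))
    return paste_obstruction(face_img, obstruction_rgba, top_left)
```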
In an implementation manner of the embodiment of the present application, when the type of the shade is a shade with an unfixed position, the substep S123 may be implemented as follows.
Firstly, an adding reference key point sequence of a preset shelter in an enhanced human face sample image is obtained.
Specifically, the adding reference keypoint sequence may include a plurality of different adding reference keypoints. Taking a human hand as the obstruction, adding reference keypoints can be set, based on historical experience, for the different positions on the face where a hand may appear, and the keypoints corresponding to these different occlusion positions together form the adding reference keypoint sequence.
And then, adding the obstruction image into the enhanced face sample image according to each different adding reference key point in the adding reference key point sequence to obtain the face sample image of which the obstruction image is positioned at different adding reference key points.
The adding process in this step is the same as for a fixed-position obstruction and is not repeated here. The difference is that in this embodiment, multiple occluded face sample images with different occlusion positions can be obtained in one pass, as sketched below.
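Continuing the sketch above, a non-fixed-position obstruction simply loops over the keypoint sequence (add_fixed_obstruction is the illustrative helper defined earlier):

```python
def add_unfixed_obstruction(face_img, obstruction_rgba, anchor, keypoint_sequence):
    """One enhanced face image yields one occluded variant per reference
    keypoint in the pre-configured sequence (e.g. candidate hand positions)."""
    return [add_fixed_obstruction(face_img.copy(), obstruction_rgba, anchor, kp)
            for kp in keypoint_sequence]
```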
Further, the way of calculating the rotation angle of the face in the enhanced face sample image in the sub-step S122 may be as follows.
Firstly, acquiring the external canthus coordinate of a left eye and the external canthus coordinate of a right eye in an enhanced human face sample image;
and then, calculating to obtain the face rotation angle based on the external canthus coordinates of the left eye and the right eye.
Specifically, the formula for calculating the face rotation angle may be as follows.
θ = arctan((reyeY - leyeY) / (reyeX - leyeX))
where θ is the face rotation angle, leyeX and leyeY are the abscissa and ordinate of the external canthus of the left eye, and reyeX and reyeY are the abscissa and ordinate of the external canthus of the right eye.
To prevent the face detection model from focusing its learning on the obstruction rather than on the structural information of the face itself when training on the unoccluded and occluded face sample images, in the embodiment of the present application, referring to fig. 5, step S14 can be implemented through the following sub-steps.
And a substep S141 of sequentially obtaining each training sample, inputting the training sample into a feature extraction network, and extracting feature information of the key position points of the human face from the training sample.
The feature information of the face key position points may be the coordinate information of the face keypoints in the training sample, and the face key position points represent the structural information of the face; for example, they may include the nose tip, the left and right mouth corners, and the external and internal canthus of the left and right eyes.
And a substep S142, detecting whether the face key position point is located in the area where the shelter in the training sample is located.
In the embodiment of the present application, pattern recognition can be used to detect whether all specified face key position points can be found in the training sample. If some key position points (for example, the left and right mouth corner points) are not detected, those points are determined to be located in the region where the obstruction is located, and the process proceeds to sub-step S144. If all specified face key position points are detected, no key position point lies in the region where the obstruction is located, and the process proceeds to sub-step S143.
And a substep S143, calculating a first feature difference between a first feature vector composed of feature information of all face key position points in the training sample and a second feature vector composed of feature information of all known face key position points in the training sample, and obtaining a loss function value based on the first feature difference and the first weight parameter.
The first feature vector may be composed of the detected coordinates of all face key position points, for example [nose tip, left mouth corner, right mouth corner, left external canthus, left internal canthus, right external canthus, right internal canthus], and the second feature vector may be composed of the known (ground-truth) coordinates of the same points. The first feature difference may be obtained by comparing the first feature vector with the second feature vector; in one possible implementation of the embodiment of the present application, the first feature difference may be the Euclidean distance between the two vectors. After the first feature difference is calculated, the loss function value may be obtained by multiplying the first feature difference by the first weight parameter.
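A minimal sketch of this unoccluded-sample loss term (the weight value is an illustrative placeholder):

```python
import numpy as np

def unoccluded_loss(pred_kps, gt_kps, w1=1.0):
    """First feature difference: Euclidean distance between the predicted
    and ground-truth keypoint vectors, scaled by the first weight w1.

    pred_kps, gt_kps: (N, 2) arrays of keypoint coordinates.
    """
    first_vec = np.asarray(pred_kps, dtype=np.float32).ravel()
    second_vec = np.asarray(gt_kps, dtype=np.float32).ravel()
    return w1 * np.linalg.norm(first_vec - second_vec)
```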
And a substep S144, calculating a second feature difference between a third feature vector composed of feature information of the face key position points which are not shielded by the region where the shielding object is located in the training sample and a fourth feature vector composed of feature information of the known face key position points which are not shielded by the region where the shielding object is located in the training sample, and obtaining a loss function value based on the second feature difference and a second weight parameter, wherein the first weight parameter is greater than the second weight parameter.
In one possible implementation of the embodiment of the present application, the substep S144 may be implemented as follows.
Firstly, a mask matrix of the key points of the human face is determined.
Specifically, each matrix element in the face key point mask matrix represents a weight coefficient of a corresponding face key position point, and when the face key position point is located in an area where a barrier in the training sample is located, the weight coefficient corresponding to the face key position point is 0, otherwise, the weight coefficient corresponding to the face key position point is 1.
For example, with the keypoint order [nose tip, left mouth corner, right mouth corner, left external canthus, left internal canthus, right external canthus, right internal canthus], if the left and right mouth corner points are located in the region where the obstruction in the training sample is located, the face keypoint mask matrix may be represented as [1 0 0 1 1 1 1].
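In code, the mask matrix is simply the detection flags cast to weight coefficients (keypoint order as above; values illustrative):

```python
import numpy as np

# Order: [nose tip, left mouth corner, right mouth corner,
#         left external canthus, left internal canthus,
#         right external canthus, right internal canthus]
detected = np.array([True, False, False, True, True, True, True])
mask_matrix = detected.astype(np.float32)  # -> [1. 0. 0. 1. 1. 1. 1.]
```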
And then, multiplying the first eigenvector by the mask matrix of the key points of the human face to obtain a third eigenvector.
Again taking the case where the left and right mouth corner points are located in the region where the obstruction is located, the third feature vector would be [nose tip coordinate, 0, 0, left external canthus coordinate, left internal canthus coordinate, right external canthus coordinate, right internal canthus coordinate].
And then, multiplying the second eigenvector by the mask matrix of the key points of the human face to obtain a fourth eigenvector.
In this embodiment, the fourth feature vector is calculated in the same or a similar way as the third feature vector and is not described again here.
And finally, calculating to obtain a second feature difference based on the third feature vector and the fourth feature vector.
In one possible implementation of the embodiment of the present application, the second feature difference may be the Euclidean distance between the third feature vector and the fourth feature vector. After the second feature difference is calculated, the loss function value may be obtained by multiplying the second feature difference by the second weight parameter, where the first weight parameter is greater than the second weight parameter.
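A sketch of this occluded-sample branch, combining the mask multiplication of the sub-steps above (w2 < w1; both values illustrative):

```python
import numpy as np

def occluded_loss(pred_kps, gt_kps, mask_matrix, w2=0.5):
    """Second feature difference: keypoints inside the obstruction region
    are zeroed out by the mask before taking the Euclidean distance,
    which is then scaled by the second weight parameter w2.

    pred_kps, gt_kps: (N, 2) arrays of keypoint coordinates.
    mask_matrix:      length-N array of 0/1 weight coefficients.
    """
    mask = np.asarray(mask_matrix, dtype=np.float32)[:, None]  # broadcast over x, y
    third_vec = (np.asarray(pred_kps, np.float32) * mask).ravel()
    fourth_vec = (np.asarray(gt_kps, np.float32) * mask).ravel()
    return w2 * np.linalg.norm(third_vec - fourth_vec)
```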
And a substep S145, iteratively updating the network parameters of the feature extraction network according to the loss function values until a training convergence condition is met, and obtaining a trained face detection model.
In this embodiment, the training convergence condition may be any one of the following two conditions:
firstly, the loss function value is smaller than a preset loss function threshold;
secondly, the number of network parameter iterations reaches the preset number of iterations.
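Putting the pieces together, a toy end-to-end sketch of sub-steps S141 to S145 in PyTorch; the network, data and every hyperparameter below are placeholder assumptions, not the patent's actual configuration:

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the real feature extraction network and data.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 7 * 2))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
W1, W2 = 1.0, 0.5                    # first weight parameter > second
LOSS_THRESHOLD, MAX_ITERS = 1e-3, 10_000

for step in range(MAX_ITERS):
    image = torch.rand(1, 1, 64, 64)             # placeholder training sample
    gt_kps = torch.rand(1, 7, 2)                 # known keypoint coordinates
    visible = torch.randint(0, 2, (7,)).bool()   # False = occluded keypoint

    pred_kps = model(image).view(1, 7, 2)
    if bool(visible.all()):                      # no keypoint under an obstruction
        loss = W1 * torch.norm(pred_kps - gt_kps)
    else:                                        # mask out occluded keypoints
        mask = visible.float().view(1, 7, 1)
        loss = W2 * torch.norm(pred_kps * mask - gt_kps * mask)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                             # iterative parameter update
    if loss.item() < LOSS_THRESHOLD:             # first convergence condition
        break                                    # otherwise stop at MAX_ITERS
```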
In the above process, the first weight parameter (used when no face key position point is occluded by an obstruction in the training sample) is larger than the second weight parameter (used when some key position points are occluded). The face detection model therefore places more emphasis on learning from unoccluded face samples and learns occluded faces on the basis of well-learned facial structure information; in other words, adjusting the weight parameters makes the unoccluded face samples the focus of learning. In addition, masking out the occluded face key position points lets the model learn and identify only from the key position points that can actually be determined, avoiding the reverse gradients that would disturb the normal feature learning process if the model were forced to fit key position points it cannot observe.
Referring to fig. 6, fig. 6 is a schematic diagram of the functional modules of a face detection model training apparatus 200 according to an embodiment of the present disclosure. In this embodiment, the face detection model training apparatus 200 may be divided into functional modules according to the method embodiment executed by the electronic device; that is, the following functional modules may be used to execute the above method embodiments. The face detection model training apparatus 200 may include an obtaining module 210, an enhancing module 220, a generating module 230 and a training module 240; the functions of these modules are described in detail below.
An obtaining module 210, configured to obtain an obstruction image.
In the embodiment of the present application, the obstruction may include, but is not limited to, a mask, glasses, a human hand, and the like; depending on whether its position on the face is fixed, an obstruction can be classified as a fixed-position obstruction (e.g., a mask or glasses) or a non-fixed-position obstruction (e.g., a human hand).
The obstruction image may carry an alpha mask, that is, all image regions other than the region where the obstruction itself lies are transparent, so that when the obstruction image is later added to a face image, only the obstruction region covers the corresponding area of the face image.
The obtaining module 210 may be configured to perform the step S11 described above, and as for a detailed implementation of the obtaining module 210, reference may be made to the detailed description of the step S11 described above.
The enhancing module 220 is configured to randomly enhance an unobstructed face sample image in a training process to obtain a plurality of randomly enhanced face sample images, where the random enhancement includes changing a face rotation angle and/or a position of a face in the face sample image.
In the embodiment of the present application, the random enhancement may include changing the rotation angle of the face and/or the position of the face in the face sample image. For example, only the rotation angle of the face (for example, rotating the face by an arbitrary angle) may be changed, only the position of the face in the face sample image may be changed (for example, the face originally located in the central position region of the face sample image is adjusted to the upper left corner region of the face sample image), and the rotation angle of the face and the position of the face in the face sample image may also be changed at the same time.
By this method, a large number (for example, 1,000) of randomly enhanced face sample images, used to simulate faces occluded by an obstruction, can be obtained from a single unoccluded face sample image.
The enhancement module 220 may be configured to perform the step S12, and the detailed implementation of the enhancement module 220 may refer to the detailed description of the step S12.
A generating module 230, configured to determine, based on the plurality of face sample images after random enhancement, an adding position of the obstruction image in each of the enhanced face sample images, and generate a face sample image with an obstruction.
The generating module 230 may determine an adding position of an obstruction in each enhanced face sample image based on the type of the obstruction (e.g., a fixed position obstruction or a non-fixed position obstruction), and add the obstruction image to the adding position in each enhanced face sample image to obtain a plurality of face sample images with the obstruction, thereby implementing the obstruction enhancement on the face.
The generating module 230 may be configured to perform the step S13 described above, and as for the detailed implementation of the generating module 230, reference may be made to the detailed description of the step S13 described above.
And the training module 240 is configured to train the face detection model by using the input non-occluded face sample image and the generated occluded face sample image as training samples, so as to obtain a trained face detection model.
The training module 240 trains the face detection model by using the non-blocked face sample image and the blocked face sample image, so as to enhance the face recognition capability of the face detection model in the scene that the blocking object blocks the face.
The training module 240 may be configured to perform the step S14, and the detailed implementation of the training module 240 may refer to the detailed description of the step S14.
It should be noted that the division of modules in the above apparatus or system is only a logical division; in actual implementation, the modules may be wholly or partially integrated into one physical entity or physically separated. The modules may all be implemented in the form of software (e.g., open-source software) invoked by a processor, or entirely in hardware, or partly as software invoked by a processor and partly as hardware. For example, the training module 240 may be stored in a memory of the apparatus or system in the form of program code, which a certain processor of the apparatus or system calls to execute the functions of the training module 240; the other modules are implemented similarly and are not described again here. The modules may be integrated together in whole or in part, or implemented independently. The processor mentioned here may be an integrated circuit with signal processing capability; in implementation, each step or module of the above technical solutions may be completed by an integrated logic circuit in the processor or by the processor executing a software program.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a hardware structure of an electronic device 10 for implementing the face detection model training method according to the embodiment of the present disclosure. As shown in fig. 7, electronic device 10 may include a processor 11, a computer-readable storage medium 12, and a bus 13.
In a specific implementation process, at least one processor 11 executes computer-executable instructions stored in the computer-readable storage medium 12 (for example, the obtaining module 210, the enhancing module 220, the generating module 230, and the training module 240 included in the face detection model training apparatus 200 shown in fig. 6), so that the processor 11 may execute the face detection model training method according to the above method embodiment, where the processor 11 and the computer-readable storage medium 12 may be connected through the bus 13.
For a specific implementation process of the processor 11, reference may be made to the above-mentioned method embodiments executed by the electronic device 10, and implementation principles and technical effects thereof are similar, and details of this embodiment are not described herein again.
The computer-readable storage medium 12 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The computer-readable storage medium 12 is used to store programs or data.
The bus 13 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
Fig. 7 shows only one possible structure of the electronic device 10 provided in the embodiment of the present application; in other embodiments, the electronic device 10 may further include more components, such as a communication unit, through which the electronic device 10 can transmit the synthesized binocular video to other communication devices.
In addition, an embodiment of the present application further provides a readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the face detection model training method described above is implemented.
In summary, the embodiments of the present application provide a face detection model training method, an electronic device and a computer-readable storage medium. First, an obstruction image is obtained, and the unoccluded face sample images are randomly enhanced to obtain a plurality of randomly enhanced face sample images; then the adding position of the obstruction image in each enhanced face sample image is determined, and occluded face sample images are generated; finally, the face detection model is trained with both the input unoccluded face sample images and the generated occluded face sample images as training samples, yielding a trained face detection model. Occlusion enhancement is applied to face images online, by way of random enhancement, during training of the face detection model; the resulting occluded face sample images are then used to train the model. On the one hand, compared with offline occlusion enhancement, online occlusion enhancement generates occluded face sample images that are diverse and fast to produce; on the other hand, training with such diverse occluded face sample images improves the model's detection accuracy in scenes where the face is blocked by an obstruction. When the scheme is applied to a live-streaming scene, special effects can be added based on the detected face, which effectively prevents face effects (such as a beautification effect) from drifting away from the face and ensures that they are displayed correctly in the live stream.
Additionally, the order in which the elements and sequences are processed, the use of alphanumeric characters, or the use of other designations in this specification is not intended to limit the order of the processes and methods in this specification, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, an embodiment may be characterized by less than all of the features of a single embodiment disclosed above.
The embodiments described above are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the application, but merely represents selected embodiments. Accordingly, the protection scope of the present application shall be subject to the protection scope of the claims. Moreover, all other embodiments obtainable by a person skilled in the art without inventive effort, based on the embodiments of the present application, shall fall within the protection scope of the present application.

Claims (10)

1. A face detection model training method is characterized by comprising the following steps:
obtaining an image of a shelter;
randomly enhancing the unshielded face sample images in the training process to obtain a plurality of randomly enhanced face sample images, wherein the random enhancement comprises changing the rotation angle of the face and/or the position of the face in the face sample images;
determining the adding position of the shelter image in each enhanced human face sample image based on a plurality of randomly enhanced human face sample images, and generating a sheltered human face sample image;
and training the face detection model by taking the input non-shielded face sample image and the generated shielded face sample image as training samples to obtain the trained face detection model.
2. The training method of the face detection model according to claim 1, wherein the step of determining the adding position of the obstruction image in each enhanced face sample image based on a plurality of face sample images after random enhancement to generate an occluded face sample image comprises:
adjusting the size of the shelter image according to the size of the face in the unoccluded face sample image;
calculating a rotation angle of a face in the enhanced face sample image and/or a position of the face in the enhanced face sample image, and rotating the size-adjusted obstruction image according to the rotation angle and/or adjusting the size-adjusted obstruction image according to the position of the face in the enhanced face sample image;
and based on different types of the shielding objects in the shielding object image, adding the shielding object image into the enhanced human face sample image according to a preset shielding position to obtain the human face sample image with shielding.
3. The training method of the face detection model according to claim 2, wherein when the type of the blocking object is a blocking object with a fixed position, the step of adding the blocking object image to the enhanced face sample image according to a preset blocking position based on the different types of the blocking objects in the blocking object image to obtain the blocked face sample image comprises:
acquiring an adding reference key point which is pre-configured in the enhanced face sample image and used for adding the obstruction image;
and adding the shielding object image into the enhanced human face sample image according to the adding reference key point to obtain the human face sample image with shielding.
4. The training method of the face detection model according to claim 2, wherein when the type of the blocking object is a blocking object with an unfixed position, the step of adding the blocking object image to the enhanced face sample image according to a preset blocking position based on the different type of the blocking object in the blocking object image to obtain the blocked face sample image comprises:
acquiring an addition reference key point sequence of the shielding object, which is configured in advance in the enhanced face sample image, wherein the addition reference key point sequence comprises a plurality of different addition reference key points;
and adding the obstruction image into the enhanced face sample image according to each different adding reference key point in the adding reference key point sequence to obtain the face sample image of which the obstruction image is positioned at different adding reference key points.
5. The training method of the face detection model according to claim 2, wherein the step of calculating the rotation angle of the face in the enhanced face sample image comprises:
acquiring the external canthus coordinate of the left eye and the external canthus coordinate of the right eye in the enhanced human face sample image;
and calculating to obtain the face rotation angle based on the external canthus coordinates of the left eye and the external canthus coordinates of the right eye.
6. The training method of the face detection model according to claim 5, wherein the calculation formula for calculating the face rotation angle based on the external canthus coordinates of the left eye and the right eye is as follows:
θ = arctan((reyeY - leyeY) / (reyeX - leyeX))
where θ is the face rotation angle, leyeX and leyeY are the abscissa and ordinate of the external canthus of the left eye, and reyeX and reyeY are the abscissa and ordinate of the external canthus of the right eye.
7. The training method of the face detection model according to any one of claims 1 to 6, wherein the face detection model comprises a feature extraction network, and the step of training the face detection model by using the input non-occluded face sample image and the generated occluded face sample image as training samples to obtain the trained face detection model comprises:
sequentially acquiring each training sample, inputting the training sample into the feature extraction network, and extracting feature information of key position points of the human face from the training sample;
detecting whether a face key position point is located in an area where a shelter in the training sample is located;
if the face key position points are not located in the area where the shielding object in the training sample is located, calculating a first feature difference between a first feature vector consisting of feature information of all face key position points in the training sample and a second feature vector consisting of feature information of all known face key position points in the training sample, and obtaining a loss function value based on the first feature difference and a first weight parameter;
if the face key position point is located in the area where the shielding object in the training sample is located, calculating a second feature difference between a third feature vector formed by feature information of the face key position point which is not shielded by the area where the shielding object is located in the training sample and a fourth feature vector formed by feature information of a known face key position point which is not shielded by the area where the shielding object is located in the training sample, and obtaining a loss function value based on the second feature difference and a second weight parameter, wherein the first weight parameter is larger than the second weight parameter;
and iteratively updating the network parameters of the feature extraction network according to the loss function value until a training convergence condition is met, and obtaining a trained face detection model.
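A hedged sketch of claim 7's branching loss. The squared-L2 distance and the concrete weight values are assumptions; the claim requires only that the first weight parameter exceed the second:

```python
import numpy as np

W_FIRST = 1.0   # first weight parameter (assumed value)
W_SECOND = 0.5  # second weight parameter (assumed value); only W_FIRST > W_SECOND is required

def keypoint_loss(pred: np.ndarray, known: np.ndarray,
                  occluded: np.ndarray) -> float:
    """pred, known: (K, D) feature vectors of predicted and known face keypoints.
    occluded: (K,) booleans, True where a keypoint lies in the occluder region."""
    if not occluded.any():
        # First branch: nothing occluded -> first feature difference, larger weight.
        return W_FIRST * float(((pred - known) ** 2).sum())
    # Second branch: compare only the visible keypoints -> smaller weight.
    keep = ~occluded
    return W_SECOND * float(((pred[keep] - known[keep]) ** 2).sum())
```

Weighting unoccluded samples more heavily keeps the model from being dominated by the noisier supervision signal of the occluded samples.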
8. The face detection model training method according to claim 7, wherein, if a face keypoint is located in the region occupied by the occluder in the training sample, the step of calculating the second feature difference between the third feature vector, composed of the feature information of the face keypoints in the training sample not covered by the occluder region, and the fourth feature vector, composed of the feature information of the known face keypoints in the training sample not covered by the occluder region, comprises:
determining a face keypoint mask matrix, wherein each matrix element is the weight coefficient of the corresponding face keypoint: 0 if that keypoint is located in the region occupied by the occluder in the training sample, and 1 otherwise;
multiplying the first feature vector by the face keypoint mask matrix to obtain the third feature vector;
multiplying the second feature vector by the face keypoint mask matrix to obtain the fourth feature vector;
and calculating the second feature difference based on the third feature vector and the fourth feature vector.
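Claim 8 realises the second branch with an elementwise mask multiply. In this sketch the "mask matrix" is a per-keypoint 0/1 vector broadcast over the feature dimension, which is an assumption about its exact shape:

```python
import numpy as np

def masked_difference(first: np.ndarray, second: np.ndarray,
                      occluded: np.ndarray) -> float:
    """first, second: (K, D) first and second feature vectors.
    occluded: (K,) booleans, True where the keypoint is inside the occluder region."""
    # Weight coefficient per keypoint: 0 if occluded, 1 otherwise (claim 8).
    mask = (~occluded).astype(first.dtype)[:, None]
    third = first * mask    # third feature vector
    fourth = second * mask  # fourth feature vector
    # Second feature difference; squared L2 is an assumed choice of distance.
    return float(((third - fourth) ** 2).sum())
```

Zeroing both vectors at the occluded keypoints makes their contribution to the difference exactly zero, so the loss never penalises predictions at positions the occluder hides.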
9. An electronic device, comprising a processor and a computer-readable storage medium connected through a bus system, wherein the computer-readable storage medium stores a program, instructions, or code, and the processor executes the program, instructions, or code in the computer-readable storage medium to implement the face detection model training method according to any one of claims 1 to 8.
10. A computer-readable storage medium having stored therein instructions that, when executed, cause an electronic device to perform the face detection model training method of any one of claims 1-8.
CN202111005294.3A 2021-08-30 2021-08-30 Face detection model training method, electronic device and computer-readable storage medium Pending CN113673470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111005294.3A CN113673470A (en) 2021-08-30 2021-08-30 Face detection model training method, electronic device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111005294.3A CN113673470A (en) 2021-08-30 2021-08-30 Face detection model training method, electronic device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN113673470A 2021-11-19

Family

ID=78547424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111005294.3A Pending CN113673470A (en) 2021-08-30 2021-08-30 Face detection model training method, electronic device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113673470A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963183A (en) * 2021-12-22 2022-01-21 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN113963183B (en) * 2021-12-22 2022-05-31 合肥的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN114519378A (en) * 2021-12-24 2022-05-20 浙江大华技术股份有限公司 Training method of feature extraction unit, face recognition method and device
CN114549921A (en) * 2021-12-30 2022-05-27 浙江大华技术股份有限公司 Object recognition method, electronic device, and computer-readable storage medium
CN114612994A (en) * 2022-03-23 2022-06-10 深圳伯德睿捷健康科技有限公司 Method and device for training wrinkle detection model and method and device for detecting wrinkles
CN114972093A (en) * 2022-05-26 2022-08-30 平安科技(深圳)有限公司 Image enhancement method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113673470A (en) Face detection model training method, electronic device and computer-readable storage medium
CN112419170B (en) Training method of shielding detection model and beautifying processing method of face image
CN104994281B (en) A kind of method and terminal of face distortion correction
CN108664782A (en) Face verification method and apparatus
CN110378837B (en) Target detection method and device based on fish-eye camera and storage medium
CN107980138A (en) A kind of false-alarm obstacle detection method and device
CN109087261B (en) Face correction method based on unlimited acquisition scene
CN109165589A (en) Vehicle based on deep learning recognition methods and device again
CN105007413B (en) A kind of filming control method and user terminal
CN110147708B (en) Image data processing method and related device
CN110874827B (en) Turbulent image restoration method and device, terminal equipment and computer readable medium
CN111914748B (en) Face recognition method, device, electronic equipment and computer readable storage medium
CN112417955B (en) Method and device for processing tour inspection video stream
CN111814603B (en) Face recognition method, medium and electronic equipment
CN110686676A (en) Robot repositioning method and device and robot
CN112802081A (en) Depth detection method and device, electronic equipment and storage medium
CN113706373A (en) Model reconstruction method and related device, electronic equipment and storage medium
CN114170290A (en) Image processing method and related equipment
CN116051439A (en) Method, equipment and storage medium for removing rainbow-like glare of under-screen RGB image by utilizing infrared image
WO2022199395A1 (en) Facial liveness detection method, terminal device and computer-readable storage medium
EP3281174B1 (en) System and method for identifying an object in an image
CN115191004A (en) Information processing method, information processing system, and information processing apparatus
CN108446595A (en) A kind of space-location method, device, system and storage medium
CN115082992A (en) Face living body detection method and device, electronic equipment and readable storage medium
CN113077396A (en) Straight line segment detection method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination