CN112560705A - Face detection method and device and electronic equipment - Google Patents

Face detection method and device and electronic equipment

Info

Publication number
CN112560705A
CN112560705A
Authority
CN
China
Prior art keywords
face
image
preset
module
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011504477.5A
Other languages
Chinese (zh)
Inventor
李健
高大帅
武卫东
陈明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd
Priority to CN202011504477.5A
Publication of CN112560705A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a face detection method and device and electronic equipment. The face detection method comprises the following steps: inputting a face image to be detected into a pre-trained face model, wherein the face model comprises a face detection network module, a key point detection network module and a pose angle detection network module; acquiring a face pose angle, face key points and a face target frame output by the face model; and displaying the face pose angle, the face key points and the face target frame in the current interface. The face detection method provided by the invention can improve the accuracy of the face detection result.

Description

Face detection method and device and electronic equipment
Technical Field
The present invention relates to the field of face detection technologies, and in particular, to a face detection method and apparatus, and an electronic device.
Background
The two most common scenes in OCR (Optical Character Recognition) are character recognition in scanned documents and character recognition in photographed natural scenes. Scanned document characters and photographed characters differ greatly in factors such as font and background, and in general the difficulty of recognizing characters in natural scenes is far greater than that of recognizing scanned document characters. To cope with the huge font and background differences, the recognition rate of the character line recognition model is currently ensured mainly by increasing the amount of training sample data and enlarging the recognition model. However, when the amount of training sample data reaches a certain order of magnitude, the incremental gain in character recognition rate brought by simply enlarging the character line recognition model becomes smaller and smaller, while enlarging the model seriously degrades its performance.
Therefore, the existing character line recognition model cannot achieve both a high recognition rate and good performance.
In the era of interconnection between people, services built around people must first identify who a person is. For this reason, many cards and certificates have been invented as the basis for identity verification, but such verification essentially recognizes the object rather than the person: it ignores people's most essential needs, and while it solves old problems it brings new ones. Following the trend of the era, face recognition technology can be fully utilized, the sharing and opening of face big data can be emphasized, and a portrait library and a face checkpoint system can be built. Face recognition technology can be widely applied to intelligent policing and smart city construction, providing intelligent face services for the whole society.
Face recognition is a biometric technique for identifying a person based on facial feature information. It can be widely applied in many important fields that require natural comparison and verification of personal identity, such as public security, finance, airports, subways and border ports, in specific scenarios including face-scan payment, face-scan attendance, face-scan gate passage, visitor registration, face-scan access control and face-scan cash withdrawal.
Face detection is an indispensable step before face recognition, and its result directly affects the accuracy of face recognition. Existing face detection schemes mainly target scenes in which the frontal face of the person to be identified faces the detection system; when the face to be identified has a large degree of deflection relative to the detection system or is in a side-face state, the detection result given by the face detection system is inaccurate, or no detection result can be given at all.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide a face detection method and apparatus, an electronic device, and a storage medium that overcome the above problems or at least partially solve them.
In a first aspect, an embodiment of the present invention discloses a face detection method, where the method includes: inputting a face image to be detected into a pre-trained face model, wherein the face model comprises a face detection network module, a key point detection network module and a pose angle detection network module;
acquiring a face pose angle, a face key point and a face target frame output by the face model;
and displaying the face pose angle, the face key points and the face target frame in the current interface.
Optionally, the step of displaying the face pose angle, the face key points, and the face target frame in the current interface includes:
correcting the face target frame according to the face pose angle and the face key point to obtain a corrected face target frame;
and displaying the face image to be detected in the current interface, and marking the corrected face target frame in the face image to be detected.
Optionally, before the step of inputting the face image to be detected into the pre-trained face model, the method further includes:
and adjusting the face image to be detected into a color image with a first preset size.
Optionally, before the step of inputting the face image to be detected into the pre-trained face model, the method further includes:
acquiring a first preset number of sample images, wherein the sample images are face images, the number of the sample images with face angles larger than a preset angle is larger than a second preset number, and the first preset number is larger than the second preset number;
adjusting each sample image into a color image with a second preset size;
adding a face pose angle label and a face key point label to each adjusted sample image;
for each adjusted sample image, adjusting the sample image to a first preset size;
transforming the sample images adjusted to the first preset size according to a preset rule to obtain a plurality of transformed sample images;
and training a preset face model based on each sample image adjusted to be in a first preset size and a plurality of transformed sample images corresponding to each sample image.
Optionally, the step of transforming the sample image adjusted to the first preset size according to a preset rule to obtain a plurality of transformed sample images includes:
randomly shielding a partial area in the sample image adjusted to be in the first preset size to generate a plurality of transformed sample images, wherein one transformed sample image is generated by random shielding each time; and/or
adjusting preset parameters of the sample image adjusted to be in a first preset size, and generating a plurality of transformed sample images, wherein each preset parameter adjustment generates one transformed sample image, and the preset parameters include: at least one of chroma and brightness;
and carrying out fuzzy processing on the sample image adjusted to the first preset size to generate a plurality of transformed sample images.
In a second aspect, an embodiment of the present invention discloses a face detection apparatus, where the apparatus includes:
the input module is used for inputting the face image to be detected into a pre-trained face model, wherein the face model comprises a face detection network module, a key point detection network module and a pose angle detection network module;
the first acquisition module is used for acquiring a face pose angle, a face key point and a face target frame output by the face model;
and the display module is used for displaying the face pose angle, the face key point and the face target frame in the current interface.
Optionally, the display module comprises:
the correction submodule is used for correcting the face target frame according to the face pose angle and the face key point to obtain a corrected face target frame;
and the display sub-module is used for displaying the face image to be detected in the current interface and marking the corrected face target frame in the face image to be detected.
Optionally, the apparatus further comprises:
the first adjusting module is used for adjusting the face image to be detected into a color image with a first preset size before the input module inputs the face image to be detected into a pre-trained face model.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring a first preset number of sample images before the input module inputs the face images to be detected into a pre-trained face model, wherein the sample images are face images, the number of the sample images with the face angles larger than a preset angle is larger than a second preset number, and the first preset number is larger than the second preset number;
the second adjusting module is used for adjusting each sample image into a color image with a second preset size;
the label adding module is used for adding a face pose angle label and a face key point label to each sample image after adjustment;
a third adjusting module, configured to adjust the sample image to a first preset size for each adjusted sample image;
the image transformation module is used for transforming the sample images adjusted to the first preset size according to a preset rule to obtain a plurality of transformed sample images;
and the model training module is used for training a preset human face model based on each sample image adjusted to be in a first preset size and the plurality of transformed sample images corresponding to each sample image.
Optionally, the image transformation module comprises:
the first transformation submodule is used for randomly shielding a partial area in the sample image adjusted to be in the first preset size to generate a plurality of transformed sample images, wherein one transformed sample image is generated by random shielding each time; and/or
a second transformation submodule, configured to adjust a preset parameter of the sample image adjusted to a first preset size, and generate a plurality of transformed sample images, where a transformed sample image is generated by adjusting the preset parameter each time, and the preset parameter includes: at least one of chroma and brightness; and/or
and the third transformation submodule is used for carrying out fuzzy processing on the sample image adjusted to the first preset size to generate a plurality of transformed sample images.
In a third aspect, an embodiment of the present invention discloses an electronic device, including: one or more processors; and one or more machine-readable media having instructions stored thereon; the instructions, when executed by the one or more processors, cause the processors to perform a face detection method as any one of the above.
In a fourth aspect, an embodiment of the present invention discloses a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the face detection method as described in any one of the above.
According to the face detection scheme provided by the embodiment of the invention, the face image to be detected is input into a pre-trained face model, and because the face model comprises a face detection network module, a key point detection network module and a pose angle detection network module, detection information in three dimensions, namely the face pose angle, the face key points and the face target frame, is obtained after the face image to be detected is detected, which improves the accuracy of the face detection result.
Drawings
FIG. 1 is a flow chart of steps of a face detection method according to an embodiment of the present invention;
FIG. 2 is a flow chart of steps of a further method of face detection according to an embodiment of the present invention;
fig. 3 is a block diagram of a face detection apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to fig. 1, a flowchart illustrating steps of a face detection method according to an embodiment of the present invention is shown.
The face detection method of the embodiment of the invention can comprise the following steps:
step 101: and inputting the face image to be detected into a pre-trained face model.
Wherein the face model includes a face detection network module, a key point detection network module and a pose angle detection network module.
The face detection network module is based on an existing model and adopts RFBNet for face detection. RFBNet is a general anchor-based object detection network that can directly output the position of the detection frame of a target, the target class and the confidence score of the target. Using RFBNet for object detection achieves good accuracy while keeping detection fast. The face detection network mainly introduces an RFB (Receptive Field Block) into the SSD (Single Shot MultiBox Detector) network. The starting point of the RFB is to strengthen the feature extraction capability of the network by simulating the receptive field of human vision; structurally, the RFB borrows the idea of the existing Inception module and mainly adds dilated convolution layers on top of it, thereby effectively enlarging the receptive field. Because the face detection network is built on the SSD network, both the detection speed and the precision of the detection results are stable. Here, RFBNet is a general object detection network; an anchor is a target detection anchor frame preset in the object detection network; and the SSD is a single-stage, multi-scale prediction detector, a general-purpose deep learning object detection network.
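As an illustration of the RFB idea just described, the sketch below shows a simplified multi-branch block in which some branches use dilated convolutions to enlarge the receptive field. This is only a minimal tf.keras example; the branch layout, filter counts and dilation rates are assumptions for illustration and are not taken from the patent.

```python
from tensorflow.keras import layers

def rfb_block(x, filters=128):
    """Simplified RFB-style block: parallel branches with different kernel
    sizes and dilation rates, concatenated and fused with a shortcut."""
    # Branch 1: 1x1 reduction followed by a plain 3x3 convolution
    b1 = layers.Conv2D(filters // 4, 1, padding="same", activation="relu")(x)
    b1 = layers.Conv2D(filters // 4, 3, padding="same", activation="relu")(b1)
    # Branch 2: 1x1 reduction followed by a dilated 3x3 convolution
    b2 = layers.Conv2D(filters // 4, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(filters // 4, 3, padding="same", dilation_rate=3,
                       activation="relu")(b2)
    # Branch 3: larger kernel followed by a more strongly dilated convolution
    b3 = layers.Conv2D(filters // 4, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(filters // 4, 5, padding="same", activation="relu")(b3)
    b3 = layers.Conv2D(filters // 4, 3, padding="same", dilation_rate=5,
                       activation="relu")(b3)
    # Concatenate branches, fuse with a 1x1 convolution, add a shortcut
    merged = layers.Concatenate()([b1, b2, b3])
    merged = layers.Conv2D(filters, 1, padding="same")(merged)
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    return layers.ReLU()(layers.Add()([merged, shortcut]))

# Example: apply the block to a 160x160 feature map with 64 channels
feature_map = layers.Input(shape=(160, 160, 64))
enlarged = rfb_block(feature_map, filters=128)
```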
The invention optimizes and modifies the original face detection network structure. On the basis of the original face detection network module, i.e. the backbone network, two new network branches are added as auxiliary supervision networks: one branch is a fully connected layer whose output is the 68 key points (136 position coordinate values), i.e. the key point detection network module; the other branch is a fully connected layer whose output is the 3 face pose angles, i.e. the pose angle detection network module. The two network branches are trained together with the backbone network and play an auxiliary role in face detection.
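A minimal sketch of this modified structure is shown below: a placeholder backbone stands in for the RFBNet/SSD detection network, one fully connected branch outputs the 68 key points (136 coordinates), and the other outputs the 3 face pose angles. The backbone, detection head and layer sizes are placeholders assumed for illustration, not the patent's actual network.

```python
from tensorflow.keras import layers, Model

def build_face_model(input_size=640):
    inputs = layers.Input(shape=(input_size, input_size, 3))

    # Placeholder backbone (stands in for the RFBNet/SSD feature extractor)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)

    # Placeholder detection head: per-location box offsets plus a face score
    detection = layers.Conv2D(4 + 1, 3, padding="same", name="face_boxes")(x)

    # Auxiliary branch 1: fully connected layer regressing 68 key points
    # (136 coordinate values)
    pooled = layers.GlobalAveragePooling2D()(x)
    keypoints = layers.Dense(136, name="face_keypoints")(pooled)

    # Auxiliary branch 2: fully connected layer regressing the 3 pose angles
    pose = layers.Dense(3, name="face_pose_angles")(pooled)

    return Model(inputs, [detection, keypoints, pose])
```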
In the embodiment of the invention, the original face detection network architecture is optimized: auxiliary networks are used to obtain the face pose angle and the face key point information, and these auxiliary networks supervise the main detection network during training. This reduces the adverse effect of the face pose angle, improves the robustness of the face detection model, makes effective use of the angle and key point information in large-angle natural scenes, and thereby improves the face detection rate.
In an optional embodiment, before the face image to be detected is input into the pre-trained face model, the face image to be detected is adjusted to a color image with a first preset size.
The first preset size may be set by a person skilled in the art according to actual requirements, and is not specifically limited in the embodiment of the present application. The first preset size needs to be matched with the size of a training sample of the face model.
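Purely as an illustration, this pre-processing step might look like the following sketch, assuming OpenCV is used and taking 640 × 640 as the first preset size from the training example given later in the text.

```python
# Sketch of the pre-processing step: resize the image to the first preset
# size and make sure it is a 3-channel color image. The 640 x 640 value is
# the example size used later in the description, not a fixed requirement.
import cv2
import numpy as np

FIRST_PRESET_SIZE = (640, 640)  # example value from the training description

def prepare_image(path: str) -> np.ndarray:
    image = cv2.imread(path, cv2.IMREAD_COLOR)  # force a 3-channel color image
    if image is None:
        raise FileNotFoundError(path)
    return cv2.resize(image, FIRST_PRESET_SIZE)
```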
Step 102: and acquiring a face pose angle, a face key point and a face target frame output by the face model.
Face pose angle: the three rotation angles of the head in three-dimensional space, including but not limited to the three directions of head shaking (yaw), head nodding (pitch) and head tilting (roll);
the face key points may represent key point positions of a specific part on the face, such as the corners of the mouth, the nose tip, the corners of the eyes, and the like. The setting of the face key points can be set by a human body in the field according to actual requirements, and is not particularly limited in the embodiment of the application.
In the actual implementation process, after a face image to be detected is input into a face detection model, a face detection network module, a key point detection network module and a pose angle detection network module respectively predict a face to be detected, the face detection network module outputs a face target frame, the key point detection network module outputs face key point information, and the pose angle detection network module outputs face pose angle information. Three modules in the face detection model are executed independently in parallel without mutual interference.
Step 103: and displaying the face pose angle, the face key points and the face target frame in the current interface.
The displayed face pose angle, the face key points and the face target frame are convenient for a subsequent related program to accurately identify the face.
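One possible way of rendering the three outputs in an interface is sketched below with OpenCV; the assumed output formats (box as x1, y1, x2, y2; key points as (x, y) pairs; pose angles as yaw, pitch, roll in degrees) are illustrative choices, not requirements of the patent.

```python
# Sketch of displaying the three outputs in the current interface.
import cv2

def show_detection(image, box, keypoints, pose_angles):
    x1, y1, x2, y2 = [int(v) for v in box]
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)  # face target frame
    for (kx, ky) in keypoints:
        cv2.circle(image, (int(kx), int(ky)), 2, (0, 0, 255), -1)  # key points
    yaw, pitch, roll = pose_angles
    cv2.putText(image, f"yaw={yaw:.1f} pitch={pitch:.1f} roll={roll:.1f}",
                (x1, max(0, y1 - 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                (255, 255, 255), 1)
    cv2.imshow("face detection", image)
    cv2.waitKey(0)
```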
An optional face detection process is shown in fig. 2, and the optional face detection process includes the following steps:
step 201: and inputting the face image to be detected into a pre-trained face model.
Step 202: and acquiring a face pose angle, a face key point and a face target frame output by the face model.
Step 203: and correcting the face target frame according to the face pose angle and the face key points to obtain a corrected face target frame.
Step 204: and displaying the face image to be detected in the current interface, and marking a corrected face target frame in the face image to be detected.
In this optional embodiment, when displaying the face pose angle, the face key points and the face target frame in the current interface, the face target frame may first be corrected according to the face pose angle and the face key points to obtain a corrected face target frame; the face image to be detected is then displayed in the current interface with the corrected face target frame marked in it. In this embodiment the face detection system itself corrects the face target frame, so no subsequent system needs to modify it; the face pose angle and the face key points therefore do not need to be transmitted between systems, which saves the time and resources consumed by data transmission.
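The patent does not spell out how the correction is performed; the sketch below only illustrates one possible heuristic, namely merging the detector box with the bounding box of the key points and padding it by a margin that grows with the yaw angle. Both the heuristic and its parameters are assumptions, not the patent's method.

```python
# Assumed correction heuristic: merge the detector box with the key point
# bounding box and pad it more the further the face is turned (yaw angle).
import numpy as np

def correct_target_frame(box, keypoints, pose_angles, image_size):
    yaw, pitch, roll = pose_angles
    pts = np.asarray(keypoints, dtype=np.float32)  # shape (68, 2)
    kx1, ky1 = pts.min(axis=0)
    kx2, ky2 = pts.max(axis=0)
    # Merge the detector box with the key point bounding box
    x1, y1, x2, y2 = box
    x1, y1 = min(x1, kx1), min(y1, ky1)
    x2, y2 = max(x2, kx2), max(y2, ky2)
    # Pad proportionally to how far the face is turned (assumed values)
    pad = 0.05 + 0.15 * min(abs(yaw), 90.0) / 90.0
    w, h = x2 - x1, y2 - y1
    x1, y1 = x1 - pad * w, y1 - pad * h
    x2, y2 = x2 + pad * w, y2 + pad * h
    width, height = image_size
    return (max(0.0, x1), max(0.0, y1),
            min(float(width), x2), min(float(height), y2))
```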
The face detection method provided by the embodiment of the invention inputs the face image to be detected into a pre-trained face model, and because the face model comprises a face detection network module, a key point detection network module and a pose angle detection network module, detection information in three dimensions, namely the face pose angle, the face key points and the face target frame, is obtained after the face image to be detected is detected, which improves the accuracy of the face detection result.

In an optional embodiment, before the step of inputting the face image to be detected into the pre-trained face model, the method further includes a face model training process. The face model training process includes the following steps:
the method comprises the following steps: acquiring a first preset number of sample images;
the sample images are face images, the number of sample images with face angles larger than the preset angle is larger than a second preset number, and the first preset number is larger than the second preset number; a face angle larger than the preset angle is subsequently referred to as a large angle.
For example, 50,000 natural-scene pictures containing human faces are prepared, of which 40,000 are face images with large face angles.
Step two: adjusting each sample image into a color image with a second preset size;
The sample images are adjusted to a uniform second preset size, which in effect normalizes each sample image. The second preset size may be set to, for example, 640 × 640 pixels with 3 color channels.
Step three: adding a face pose angle label and a face key point label to each sample image after adjustment;
for each sample image, if the sample image does not have a face pose angle and a face key point label, the sample image can be calculated by using the existing related open source software such as OpenFace and the like, and the face pose angle label and the face key point label are added into the sample image based on the calculation result.
Step four: adjusting the sample image to a first preset size for each adjusted sample image;
the first preset size can be set by a person skilled in the art according to practical requirements, and is not particularly limited in the embodiments of the present application. For example: the first predetermined size may be set to 640 pixels by 640 pixels.
Step five: transforming the sample images adjusted to the first preset size according to a preset rule to obtain a plurality of transformed sample images;
this step is a transformation extension of the sample image.
Step six: and training a preset face model based on each sample image adjusted to be in a first preset size and the plurality of transformed sample images corresponding to each sample image.
In the actual implementation process, the TensorFlow framework can be adopted for model training, with the Adam optimizer, an initial learning rate of 0.01, and the learning rate decayed to one tenth of its value every 1,000 rounds.
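A hedged sketch of this training configuration in tf.keras is shown below, interpreting "rounds" as optimizer steps. The loss functions are placeholders, since the patent does not name the actual loss terms; the model passed in could be, for instance, the multi-branch sketch given earlier.

```python
import tensorflow as tf

def make_optimizer():
    # Adam with initial learning rate 0.01, decayed to one tenth every
    # 1,000 steps (interpreting the patent's "rounds" as optimizer steps).
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=0.01,
        decay_steps=1000,
        decay_rate=0.1,
        staircase=True,
    )
    return tf.keras.optimizers.Adam(learning_rate=lr_schedule)

def compile_face_model(model: tf.keras.Model) -> tf.keras.Model:
    # The losses below are placeholders; the patent does not specify them.
    model.compile(
        optimizer=make_optimizer(),
        loss={
            "face_boxes": "mse",        # placeholder detection loss
            "face_keypoints": "mse",    # placeholder key point regression loss
            "face_pose_angles": "mse",  # placeholder pose angle regression loss
        },
    )
    return model
```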
In an optional embodiment, the step of transforming the sample image adjusted to the first preset size according to a preset rule to obtain a plurality of transformed sample images includes:
randomly shielding a partial area in the sample image adjusted to the first preset size to generate a plurality of transformed sample images, wherein one transformed sample image is generated by random shielding each time; and/or adjusting preset parameters of the sample image adjusted to the first preset size to generate a plurality of transformed sample images, wherein each time the preset parameters are adjusted to generate one transformed sample image, the preset parameters include: at least one of chroma and brightness; and/or carrying out fuzzy processing on the sample image adjusted to the first preset size to generate a plurality of transformed sample images.
This way of transforming sample images can quickly produce more reliable sample images; compared with manually screening and labeling sample images one by one, it saves human resources and improves the efficiency of collecting samples.
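The three transformations described above might be implemented as in the following sketch; the occlusion sizes, hue and brightness ranges and the blur kernel are illustrative assumptions, not values from the patent.

```python
# Sketch of the three sample transformations: random occlusion, chroma and
# brightness adjustment, and blurring. All parameter ranges are assumptions.
import cv2
import numpy as np

rng = np.random.default_rng()

def random_occlude(image: np.ndarray) -> np.ndarray:
    out = image.copy()
    h, w = out.shape[:2]
    oh, ow = rng.integers(h // 8, h // 4), rng.integers(w // 8, w // 4)
    y, x = rng.integers(0, h - oh), rng.integers(0, w - ow)
    out[y:y + oh, x:x + ow] = 0  # black out a random rectangle
    return out

def adjust_hue_brightness(image: np.ndarray) -> np.ndarray:
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV).astype(np.int16)
    hsv[..., 0] = (hsv[..., 0] + rng.integers(-10, 11)) % 180            # chroma shift
    hsv[..., 2] = np.clip(hsv[..., 2] + rng.integers(-40, 41), 0, 255)   # brightness shift
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

def blur(image: np.ndarray) -> np.ndarray:
    return cv2.GaussianBlur(image, (5, 5), 0)

def expand_sample(image: np.ndarray, n_per_transform: int = 3) -> list:
    samples = []
    for _ in range(n_per_transform):
        samples.append(random_occlude(image))
        samples.append(adjust_hue_brightness(image))
    samples.append(blur(image))
    return samples
```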
Referring to fig. 3, a block diagram of a face detection apparatus according to an embodiment of the present invention is shown.
The face detection device of the embodiment of the invention can comprise the following modules:
an input module 301, configured to input a face image to be detected into a pre-trained face model, where the face model comprises a face detection network module, a key point detection network module and a pose angle detection network module;
a first obtaining module 302, configured to obtain a face pose angle, a face key point, and a face target frame output by the face model;
and the display module 303 is configured to display the face pose angle, the face key point, and the face target frame in the current interface.
Optionally, the display module comprises:
the correction submodule is used for correcting the face target frame according to the face pose angle and the face key point to obtain a corrected face target frame;
and the display sub-module is used for displaying the face image to be detected in the current interface and marking the corrected face target frame in the face image to be detected.
Optionally, the apparatus further comprises:
the first adjusting module is used for adjusting the face image to be detected into a color image with a first preset size before the input module inputs the face image to be detected into a pre-trained face model.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring a first preset number of sample images before the input module inputs the face images to be detected into a pre-trained face model, wherein the sample images are face images, the number of the sample images with the face angles larger than a preset angle is larger than a second preset number, and the first preset number is larger than the second preset number;
the second adjusting module is used for adjusting each sample image into a color image with a second preset size;
the label adding module is used for adding a face pose angle label and a face key point label to each sample image after adjustment;
a third adjusting module, configured to adjust the sample image to a first preset size for each adjusted sample image;
the image transformation module is used for transforming the sample images adjusted to the first preset size according to a preset rule to obtain a plurality of transformed sample images;
and the model training module is used for training a preset human face model based on each sample image adjusted to be in a first preset size and the plurality of transformed sample images corresponding to each sample image.
Optionally, the image transformation module comprises:
the first transformation submodule is used for randomly shielding a partial area in the sample image adjusted to be in the first preset size to generate a plurality of transformed sample images, wherein one transformed sample image is generated by random shielding each time; and/or
a second transformation submodule, configured to adjust a preset parameter of the sample image adjusted to a first preset size, and generate a plurality of transformed sample images, where a transformed sample image is generated by adjusting the preset parameter each time, and the preset parameter includes: at least one of chroma and brightness; and/or
and the third transformation submodule is used for carrying out fuzzy processing on the sample image adjusted to the first preset size to generate a plurality of transformed sample images.
The face detection device provided by the embodiment of the invention inputs the face image to be detected into a pre-trained face model, and because the face model comprises a face detection network module, a key point detection network module and a pose angle detection network module, detection information in three dimensions, namely the face pose angle, the face key points and the face target frame, is obtained after the face image to be detected is detected, which improves the accuracy of the face detection result.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
In an embodiment of the invention, an electronic device is also provided. The electronic device may include one or more processors and one or more machine-readable media having instructions, such as an application program, stored thereon. The instructions, when executed by the one or more processors, cause the processors to perform the above-described face detection method.
In an embodiment of the present invention, there is also provided a non-transitory computer-readable storage medium having a computer program stored thereon, the program being executable by a processor of an electronic device to perform the above-mentioned face detection method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The above detailed description is made on a face detection method and apparatus, an electronic device and a storage medium provided by the present invention, and a specific example is applied in the present document to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A face detection method, comprising:
inputting a face image to be detected into a pre-trained face model, wherein the face model comprises a face detection network module, a key point detection network module and a pose angle detection network module;
acquiring a face pose angle, a face key point and a face target frame output by the face model;
and displaying the face pose angle, the face key points and the face target frame in the current interface.
2. The method of claim 1, wherein the step of displaying the face pose angles, face key points, and face target boxes in the current interface comprises:
correcting the face target frame according to the face pose angle and the face key point to obtain a corrected face target frame;
and displaying the face image to be detected in the current interface, and marking the corrected face target frame in the face image to be detected.
3. The method according to claim 1, wherein before the step of inputting the face image to be detected into a pre-trained face model, the method further comprises:
and adjusting the face image to be detected into a color image with a first preset size.
4. The method according to claim 1, wherein before the step of inputting the face image to be detected into a pre-trained face model, the method further comprises:
acquiring a first preset number of sample images, wherein the sample images are face images, the number of the sample images with face angles larger than a preset angle is larger than a second preset number, and the first preset number is larger than the second preset number;
adjusting each sample image into a color image with a second preset size;
adding a face pose angle label and a face key point label to each adjusted sample image;
for each adjusted sample image, adjusting the sample image to a first preset size;
transforming the sample images adjusted to the first preset size according to a preset rule to obtain a plurality of transformed sample images;
and training a preset face model based on each sample image adjusted to be in a first preset size and a plurality of transformed sample images corresponding to each sample image.
5. The method according to claim 4, wherein the step of transforming the sample image adjusted to the first preset size according to a preset rule to obtain a plurality of transformed sample images comprises:
randomly shielding a partial area in the sample image adjusted to be in the first preset size to generate a plurality of transformed sample images, wherein one transformed sample image is generated by random shielding each time; and/or
adjusting preset parameters of the sample image adjusted to be in a first preset size, and generating a plurality of transformed sample images, wherein each preset parameter adjustment generates one transformed sample image, and the preset parameters include: at least one of chroma and brightness; and/or
and carrying out fuzzy processing on the sample image adjusted to the first preset size to generate a plurality of transformed sample images.
6. An apparatus for face detection, the apparatus comprising:
the input module is used for inputting the face image to be detected into a pre-trained face model, wherein the face model comprises a face detection network module, a key point detection network module and a pose angle detection network module;
the first acquisition module is used for acquiring a face pose angle, a face key point and a face target frame output by the face model;
and the display module is used for displaying the face pose angle, the face key point and the face target frame in the current interface.
7. The apparatus of claim 6, wherein the display module comprises:
the correction submodule is used for correcting the face target frame according to the face pose angle and the face key point to obtain a corrected face target frame;
and the display sub-module is used for displaying the face image to be detected in the current interface and marking the corrected face target frame in the face image to be detected.
8. The apparatus of claim 6, further comprising:
the first adjusting module is used for adjusting the face image to be detected into a color image with a first preset size before the input module inputs the face image to be detected into a pre-trained face model.
9. The apparatus of claim 6, further comprising:
the second acquisition module is used for acquiring a first preset number of sample images before the input module inputs the face images to be detected into a pre-trained face model, wherein the sample images are face images, the number of the sample images with the face angles larger than a preset angle is larger than a second preset number, and the first preset number is larger than the second preset number;
the second adjusting module is used for adjusting each sample image into a color image with a second preset size;
the label adding module is used for adding a face pose angle label and a face key point label to each sample image after adjustment;
a third adjusting module, configured to adjust the sample image to a first preset size for each adjusted sample image;
the image transformation module is used for transforming the sample images adjusted to the first preset size according to a preset rule to obtain a plurality of transformed sample images;
and the model training module is used for training a preset human face model based on each sample image adjusted to be in a first preset size and the plurality of transformed sample images corresponding to each sample image.
10. An electronic device, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon;
the instructions, when executed by the one or more processors, cause the processors to perform a face detection method as claimed in any one of claims 1 to 5.
CN202011504477.5A 2020-12-17 2020-12-17 Face detection method and device and electronic equipment Pending CN112560705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011504477.5A CN112560705A (en) 2020-12-17 2020-12-17 Face detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011504477.5A CN112560705A (en) 2020-12-17 2020-12-17 Face detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112560705A true CN112560705A (en) 2021-03-26

Family

ID=75063631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011504477.5A Pending CN112560705A (en) 2020-12-17 2020-12-17 Face detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112560705A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076955A (en) * 2021-04-14 2021-07-06 上海云从企业发展有限公司 Target detection method, system, computer equipment and machine readable medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605965A (en) * 2013-11-25 2014-02-26 苏州大学 Multi-pose face recognition method and device
US20150110349A1 (en) * 2013-10-22 2015-04-23 Samsung Electronics Co., Ltd. Face tracking apparatuses and methods
CN108304800A (en) * 2018-01-30 2018-07-20 厦门启尚科技有限公司 A kind of method of Face datection and face alignment
CN108985257A (en) * 2018-08-03 2018-12-11 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109034095A (en) * 2018-08-10 2018-12-18 杭州登虹科技有限公司 A kind of face alignment detection method, apparatus and storage medium
CN109858439A (en) * 2019-01-30 2019-06-07 北京华捷艾米科技有限公司 A kind of biopsy method and device based on face
CN109902603A (en) * 2019-02-18 2019-06-18 苏州清研微视电子科技有限公司 Driver identity identification authentication method and system based on infrared image
CN111160197A (en) * 2019-12-23 2020-05-15 爱驰汽车有限公司 Face detection method and device, electronic equipment and storage medium
CN111160269A (en) * 2019-12-30 2020-05-15 广东工业大学 Face key point detection method and device
US20200193591A1 (en) * 2018-12-17 2020-06-18 Bodygram, Inc. Methods and systems for generating 3d datasets to train deep learning networks for measurements estimation
WO2020199906A1 (en) * 2019-03-29 2020-10-08 广州市百果园信息技术有限公司 Facial keypoint detection method, apparatus and device, and storage medium
CN111797264A (en) * 2019-04-09 2020-10-20 北京京东尚科信息技术有限公司 Image augmentation and neural network training method, device, equipment and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150110349A1 (en) * 2013-10-22 2015-04-23 Samsung Electronics Co., Ltd. Face tracking apparatuses and methods
CN103605965A (en) * 2013-11-25 2014-02-26 苏州大学 Multi-pose face recognition method and device
CN108304800A (en) * 2018-01-30 2018-07-20 厦门启尚科技有限公司 A kind of method of Face datection and face alignment
CN108985257A (en) * 2018-08-03 2018-12-11 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109034095A (en) * 2018-08-10 2018-12-18 杭州登虹科技有限公司 A kind of face alignment detection method, apparatus and storage medium
US20200193591A1 (en) * 2018-12-17 2020-06-18 Bodygram, Inc. Methods and systems for generating 3d datasets to train deep learning networks for measurements estimation
CN109858439A (en) * 2019-01-30 2019-06-07 北京华捷艾米科技有限公司 A kind of biopsy method and device based on face
CN109902603A (en) * 2019-02-18 2019-06-18 苏州清研微视电子科技有限公司 Driver identity identification authentication method and system based on infrared image
WO2020199906A1 (en) * 2019-03-29 2020-10-08 广州市百果园信息技术有限公司 Facial keypoint detection method, apparatus and device, and storage medium
CN111797264A (en) * 2019-04-09 2020-10-20 北京京东尚科信息技术有限公司 Image augmentation and neural network training method, device, equipment and storage medium
CN111160197A (en) * 2019-12-23 2020-05-15 爱驰汽车有限公司 Face detection method and device, electronic equipment and storage medium
CN111160269A (en) * 2019-12-30 2020-05-15 广东工业大学 Face key point detection method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076955A (en) * 2021-04-14 2021-07-06 上海云从企业发展有限公司 Target detection method, system, computer equipment and machine readable medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination