CN113158706A - Face snapshot method, device, medium and electronic equipment

Face snapshot method, device, medium and electronic equipment

Info

Publication number
CN113158706A
CN113158706A (application CN202010014960.9A)
Authority
CN
China
Prior art keywords
face
sequence
determining
snapshot
target
Prior art date
Legal status
Granted
Application number
CN202010014960.9A
Other languages
Chinese (zh)
Other versions
CN113158706B (en)
Inventor
江璐
李朋
Current Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN202010014960.9A
Publication of CN113158706A
Application granted
Publication of CN113158706B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a face snapshot method, apparatus, medium, and electronic device, wherein the face snapshot method includes: determining a first face-based tracking result and a second head-based tracking result according to the detected face regions and head regions in an image sequence; determining a face sequence of a target to be captured according to the first tracking result and the second tracking result; and determining a face snapshot of the target to be captured based on the face sequence. The technical solution provided by the present disclosure helps reduce the repeated-capture rate of face snapshots and improve the accuracy of face capture.

Description

Face snapshot method, device, medium and electronic equipment
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a face capturing method, a face capturing apparatus, a storage medium, and an electronic device.
Background
Face snapshot is applied in fields such as security monitoring and intelligent business, and has great application value. The requirements for face capture typically include: every person whose frontal face appears in an image sequence is captured, and each such person is captured only once.
How to accurately realize face snapshot is a technical problem of great concern.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides a face snapshot method, a face snapshot device, a storage medium and electronic equipment.
According to an aspect of the embodiments of the present disclosure, there is provided a face capture method, including: determining a first face-based tracking result and a second head-based tracking result according to the detected face regions and head regions in an image sequence; determining a face sequence of a target to be captured according to the first tracking result and the second tracking result; and determining a face snapshot of the target to be captured based on the face sequence.
According to still another aspect of the embodiments of the present disclosure, there is provided a face capturing apparatus including: a detection module for determining a first face-based tracking result and a second head-based tracking result from the detected face region and head region in the image sequence; the face sequence determining module is used for determining a face sequence of the target to be snapshot according to the first tracking result and the second tracking result determined by the detection module; and the snapshot image determining module is used for determining the face snapshot image of the target to be snapshot based on the face sequence determined by the face sequence determining module.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for implementing the above method.
According to still another aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instructions from the memory and executing the instructions to realize the method.
According to the face snapshot method and apparatus provided by the embodiments of the present disclosure, the face sequence of the target to be captured is determined using both the first face-based tracking result and the second head-based tracking result, which avoids the phenomenon of an incomplete face sequence and yields a more accurate face sequence of the target to be captured. Obtaining the face snapshot of the target from this more accurate face sequence avoids repeatedly taking face snapshots of the same target, and allows a more suitable face snapshot to be selected from the face sequence. The technical solution provided by the present disclosure is therefore beneficial for reducing the repeated-capture rate of face snapshots and improving the accuracy of face capture.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a schematic diagram of a scenario in which the present disclosure is applicable;
FIG. 2 is a flow chart of one embodiment of a face snapshot method of the present disclosure;
FIG. 3 is a flow diagram of one embodiment of the present disclosure for determining a face sequence of a target to be snapshotted in conjunction with a first tracking result and a second tracking result;
FIG. 4 is a flow diagram of one embodiment of determining a face snapshot of a target to be captured according to the present disclosure;
FIG. 5 is a flow diagram of one embodiment of the present disclosure for determining face quality for valid faces;
FIG. 6 is a flow chart of another embodiment of a face snapshot method of the present disclosure;
FIG. 7 is a schematic structural diagram of one embodiment of a facial capture device of the present disclosure;
fig. 8 is a block diagram of an electronic device provided in an exemplary embodiment of the present application.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those skilled in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and imply neither any particular technical meaning nor any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the present disclosure may be implemented in electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with an electronic device, such as a terminal device, computer system, or server, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the disclosure
In the process of implementing the present disclosure, the inventors found that face snapshot generally requires target tracking, with a frontal face selected from the tracked face sequence of a target to be captured and used as the face snapshot of that target. In the process of tracking the target to be captured, if the target performs actions such as turning its head, the face sequence of the target is often incomplete. For example, two or more face sequences obtained by tracking may actually be face sequences of the same target to be captured, yet be treated as the face sequences of two or more targets. For another example, a target that reappears in the video frames after turning around may be discarded because its reappearance is too short. An incomplete face sequence of the target to be captured not only affects the repeated-capture rate of face snapshots, but also affects the quality of the selected face snapshot image, thereby affecting the accuracy of face capture.
Exemplary overview
The face snapshot technology provided by the disclosure can be suitable for the fields of security monitoring, intelligent commerce and the like. One example is shown in figure 1.
In fig. 1, at least one image pickup apparatus 100 (only one is schematically shown in fig. 1) and a device 101 are installed in a place requiring security monitoring, such as a bank, an airport, or a stock exchange, and the image pickup apparatus 100 and the device 101 are connected to each other so that information can be exchanged between them. The device 101 may be a server, a desktop computer, or the like.
The image pickup apparatus 100 can be in a recording state all day. It may transmit the captured video to the device 101 in real time, with the device 101 performing face snapshot processing on the video; alternatively, the image pickup apparatus 100 may perform face snapshot processing locally and transmit the resulting face snapshots of the targets to be captured to the device 101.
For example, assuming that the user 102, the user 103, and the user 104 are simultaneously active in a security-monitored location, the camera 100 captures a video including the three users and transmits it to the device 101 in real time. The device 101 performs face snapshot processing on the video, thereby obtaining a face sequence for each of the user 102, the user 103, and the user 104. From each user's face sequence, the device 101 selects a frontal and clear facial image as that user's face snapshot. Thereafter, the device 101 may match the face snapshots of the user 102, the user 103, and the user 104 against the user records in a base library. When a face snapshot is successfully matched with a user record in the base library, the user is identified, and it may further be determined whether to add the face snapshot to the corresponding user record or to replace the face image in the record with the snapshot. When a face snapshot fails to match any user record in the base library, a user record may be created for that user.
Exemplary method
FIG. 2 is a flow chart of one embodiment of a face snapshot method of the present disclosure. The method shown in fig. 2 mainly comprises: s200, S201, and S202. The following describes each step.
S200, according to the detected face region and the head region in the image sequence, determining a first face-based tracking result and a second head-based tracking result.
An image sequence in this disclosure may refer to a piece of video formed of a plurality of consecutive video frames. For example, the image sequence may be a video captured in real time by an imaging device or a history video. In the case where the image sequence is a video captured by the camera in real time, the number of images in the image sequence in the present disclosure may be changed in real time, for example, during the process of capturing the video by the camera, a video frame obtained by current capturing is successively added as an image to the image sequence.
The present disclosure may employ a face detection algorithm (e.g., a multi-layered perceptron model-based face detection algorithm) and a head detection algorithm (e.g., a multi-layered perceptron model-based head detection algorithm) to perform face detection and head detection on an image sequence, such that all face regions and all head regions in each image in the image sequence may be obtained.
The face area in the present disclosure may refer to an area of a face picture containing one target to be captured in an image. The head region in the present disclosure may refer to a region of the image that contains a head picture of an object to be captured. The present disclosure may perform Multi-target Tracking processing on each face region in each image according to an MOT (Multi-Object Tracking) algorithm, to obtain a first Tracking result based on a face. Similarly, the present disclosure may perform multi-target tracking processing on each head region in each image by using an MOT algorithm, to obtain a second tracking result based on the head. The object to be snapshotted in the present disclosure generally refers to a person.
The first face-based tracking result in the present disclosure may refer to distinguishing, by target, all face regions in each image of the image sequence, so as to determine the face regions of each target to be captured across a plurality of consecutive images. The present disclosure may assign a face identifier to each face region through the MOT algorithm; different face identifiers can be regarded as different targets to be captured, and the same face identifier as the same target. The face regions with the same face identifier across a plurality of consecutive images form a face sequence, which is the face sequence of one target to be captured. That is, for any target to be captured, its face regions in the consecutive images form one face sequence, all face regions in that sequence share the same face identifier, and this face identifier can be regarded as the identifier of the target. The face sequences obtained in this step may be regarded as the initial face sequences of the targets to be captured; together, all of them constitute the first face-based tracking result obtained by the present disclosure.
The second head-based tracking result in the present disclosure may refer to distinguishing, by target, all head regions in each image of the image sequence, so as to determine the head regions of each target to be captured across a plurality of consecutive images. The present disclosure may assign a head identifier to each head region through the MOT algorithm; different head identifiers can be regarded as different targets to be captured, and the same head identifier as the same target. The head regions with the same head identifier across a plurality of consecutive images form a head sequence, which is the head sequence of one target to be captured. That is, for any target to be captured, its head regions in the consecutive images form one head sequence, all head regions in that sequence share the same head identifier, and this head identifier can be regarded as the identifier of the target. The head sequences of all targets obtained in this step constitute the second head-based tracking result obtained by the present disclosure.
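As an illustration of S200, the sketch below groups per-frame detections into per-target sequences by track identifier. This is a minimal Python sketch assuming a generic detector and MOT tracker; the callables detect and mot_update are hypothetical placeholders, not components named in this disclosure.

```python
from collections import defaultdict

def build_sequences(image_sequence, detect, mot_update):
    """Group per-frame detections into per-target sequences by track id."""
    sequences = defaultdict(list)        # track id -> list of (frame index, box)
    for frame_index, image in enumerate(image_sequence):
        boxes = detect(image)            # face detector or head detector
        for track_id, box in mot_update(frame_index, boxes):
            sequences[track_id].append((frame_index, box))
    return dict(sequences)

# First (face-based) and second (head-based) tracking results:
# face_sequences = build_sequences(images, detect_faces, face_tracker.update)
# head_sequences = build_sequences(images, detect_heads, head_tracker.update)
```

The same helper yields both the initial face sequences and the head sequences, since both tracking results are built in the same way from their respective detections.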
S201, determining a face sequence of the target to be captured according to the first tracking result and the second tracking result.
The present disclosure needs to consider the head-based second tracking result when determining the face sequence of the target to be captured. That is, after obtaining the initial face sequences in S200, the present disclosure does not directly use each initial face sequence as the face sequence of a target to be captured; instead, it determines the face sequences in combination with the second tracking result, so that initial face sequences that actually belong to one target may be merged into a single face sequence. In this way, the number of face sequences obtained by the present disclosure is more likely to coincide with the number of actual targets in the image sequence, and the phenomenon that the number of face sequences exceeds the number of actual targets is avoided as much as possible.
S202, determining a face snapshot image of the target to be snapshot based on the face sequence.
For any target to be captured, the face snapshot of that target may be a face region selected from its face sequence, and is usually a face region that contains the frontal face of the target and has a clear picture. The present disclosure may screen the face sequence of the target to be captured according to a preset selection strategy for face snapshots, and use the selected face region as the face snapshot of the target to be captured.
Determining the face sequence of the target to be captured using the first face-based tracking result and the second head-based tracking result avoids the phenomenon of an incomplete face sequence, so that a more accurate face sequence of the target to be captured is obtained. Obtaining the face snapshot of the target from this more accurate face sequence avoids repeatedly taking face snapshots of the same target, and allows a more suitable face snapshot to be selected from the complete face sequence. The technical solution provided by the present disclosure is therefore beneficial for reducing the repeated-capture rate of face snapshots and improving the accuracy of face capture. The repeated-capture rate in the present disclosure may refer to an index measuring how often targets are repeatedly captured, that is, the ratio of the number of repeatedly captured targets to the number of correctly captured targets.
In an alternative example, one implementation of the present disclosure to determine a face sequence of a target to be captured in conjunction with a first tracking result and a second tracking result is shown in fig. 3.
In fig. 3, S300 corrects the first tracking result based on the second tracking result.
Optionally, all the head regions included in any head sequence in the present disclosure have the same head identifier, and the identifier of the head regions in a head sequence may be regarded as the identifier of that head sequence. The present disclosure may form at least one head sequence according to the head identifiers of all head regions, for example, all head regions having the same head identifier form one head sequence. Then, the head sequence corresponding to each face sequence is determined. If one head sequence corresponds to a plurality of face sequences, the present disclosure needs to merge the face sequences corresponding to that head sequence, so that they become a single face sequence of one target to be captured. The merging of multiple face sequences may be implemented by modifying their face identifiers so that they share the same face identifier. By merging the face sequences corresponding to one head sequence, the present disclosure can avoid the phenomenon that the face tracking track of a target to be captured breaks, thereby ensuring the accuracy of the first tracking result.
Optionally, the present disclosure may determine whether the identifier of the head region and the identifier of the face region correspond to the same target to be captured, that is, determine the head sequence corresponding to one face sequence, according to the overlapping condition of the head region and the face region in one image of the image sequences. The method and the device can also determine whether the identification of the head area and the identification of the face area correspond to the same target to be snapped according to the overlapping condition of the head area and the face area in a plurality of images in the image sequence.
For example, for a head region with identifier i and a face region with identifier j in the n-th image of the image sequence, if the ratio of the intersection area of the two regions to their union area is not less than a predetermined ratio, the face sequence formed by the face regions with identifier j may be considered to correspond to the head sequence with identifier i; that is, the head sequence with identifier i and the face sequence with identifier j belong to the same target to be captured.
For another example, for a head region with identifier i and a face region with identifier j appearing in a plurality of images of the image sequence, the ratio of the intersection area of the two regions to their union area may be calculated in each of those images, and the mean of these ratios may be computed. If the mean is not less than a predetermined ratio, the face sequence with identifier j may be considered to correspond to the head sequence with identifier i; that is, the head sequence with identifier i and the face sequence with identifier j belong to the same target to be captured.
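The overlap test and the merging of S300 can be sketched as follows; a minimal sketch assuming boxes in (x1, y1, x2, y2) corner format and the track-id-to-sequence dictionaries of the earlier sketch. The 0.5 threshold is an illustrative assumption standing in for the predetermined ratio.

```python
def iou(a, b):
    """Intersection area over union area of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def merge_by_head(face_sequences, head_sequences, threshold=0.5):
    """Merge all face sequences associated with the same head sequence.

    Both arguments map track id -> list of (frame index, box). Face
    sequences matched to no head sequence are omitted here for brevity.
    """
    merged = {}
    for h_id, h_seq in head_sequences.items():
        h_boxes = dict(h_seq)                    # frame index -> head box
        matched = []
        for f_id, f_seq in face_sequences.items():
            overlaps = [iou(h_boxes[t], box) for t, box in f_seq if t in h_boxes]
            # The mean overlap over the shared frames decides the association.
            if overlaps and sum(overlaps) / len(overlaps) >= threshold:
                matched.append(f_id)
        if matched:                              # one merged face sequence per head track
            merged[h_id] = sorted(sum((face_sequences[f] for f in matched), []))
    return merged
```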
S301, determining a face sequence of the target to be captured according to the corrected first tracking result.
Optionally, in the case where face sequences were merged, the present disclosure may use the merged face sequence as the face sequence of the target to be captured. In the case where no face sequence merging occurred, the present disclosure may treat the initial face sequence as the face sequence of the target to be captured.
During shooting by the camera device, the head pose of the target to be captured usually changes continuously. For example, the target may frequently turn its head while talking with another person, or look around in multiple directions while viewing the surrounding environment. When the head of the target is twisted or lowered so much that its face cannot be captured by the camera device, the face tracking processing of that target may be terminated, forming one face sequence; when the face of the target is captured by the camera device again, multi-target tracking processing is often performed on the target anew, forming another face sequence. In that case, the first face-based tracking result obtained at the later time and the one obtained at the earlier time are not taken as the first tracking result of the same target, but as the first tracking results of two targets. That is to say, the face tracking track of one target breaks, so that first face-based tracking results that originally belong to the same target are treated as those of different targets. However, because the head tracking track of the target is possibly not broken while the camera device cannot capture its face, that is, the head tracking track may remain continuous, correcting the first tracking result with the second tracking result obtained by head tracking can combine multiple initial face sequences into one face sequence and make the broken face tracking track continuous, thereby ensuring the completeness of the face sequence of the target to be captured.
In an alternative example, the present disclosure determines an example of a face snapshot of a target to be captured based on a face sequence as shown in fig. 4.
In fig. 4, S400, a valid face meeting a first preset condition in the face sequence of the target to be captured is determined.
Optionally, the first preset condition in the present disclosure may be set according to an actual requirement. For example, the first preset condition may be a condition set for the position of the face detection frame, the size of the face detection frame, whether the face is a frontal face, and the occlusion condition of the face.
Optionally, the first preset condition in the present disclosure may include at least one of the following: the face detection frame falls within a preset region of the image; the width of the face detection frame reaches a first preset value; the height of the face detection frame reaches a second preset value; the area of the face detection frame reaches a third preset value; the face in the face detection frame is a frontal face; and the face occlusion score in the face detection frame is smaller than a fourth preset value.
The face detection frame falling within the preset region of the image may mean that the face detection frame does not belong to an edge region of the image. For example, the face detection frame does not overlap the edge region. For another example, the ratio of the area of the face detection frame lying in the edge region to the total area of the face detection frame does not exceed a certain ratio.
The face in the face detection frame being a frontal face may mean that the pitch value (i.e., pitch angle) and the yaw value (i.e., yaw angle) of the face each fall within a corresponding angle range.
The face occlusion score in the face detection frame is used to measure how much the face is occluded, and may be embodied through the visibility of the face keypoints. For example, the face occlusion score may be the number of visible face keypoints, i.e., the number of visible face keypoints should reach a certain number. Each face keypoint typically has multiple confidences, and a face keypoint may be considered visible if the highest of all its confidences is not less than a predetermined confidence. As another example, the face occlusion score may be the mean confidence of the visible face keypoints. The present disclosure does not limit the concrete representation of the face occlusion score.
For any face region in any face sequence, if the face region meets the first preset condition of the present disclosure, the face region may be considered a valid face.
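A hedged sketch of such a validity test is given below; every threshold (the edge margin, the minimum sizes, the angle ranges, and the visible-keypoint count) is an illustrative assumption standing in for the preset values described above.

```python
def is_valid_face(box, pitch, yaw, visible_keypoints, img_w, img_h,
                  margin=10, min_w=40, min_h=40,
                  max_pitch=30.0, max_yaw=40.0, min_visible=5):
    """Check the first preset condition for one face region."""
    x1, y1, x2, y2 = box
    inside = (x1 >= margin and y1 >= margin and
              x2 <= img_w - margin and y2 <= img_h - margin)   # not in the edge region
    big_enough = (x2 - x1) >= min_w and (y2 - y1) >= min_h     # width/height thresholds
    frontal = abs(pitch) <= max_pitch and abs(yaw) <= max_yaw  # frontal-face test
    unoccluded = len(visible_keypoints) >= min_visible         # occlusion test
    return inside and big_enough and frontal and unoccluded
```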
S401, determining the face quality of the effective face.
Alternatively, the face quality in the present disclosure may be an index set for the standard of the face snapshot. The present disclosure may determine the face quality of the valid face by scoring aspects of the valid face. For example, the present disclosure may determine the face quality of the valid face by scoring the face key points, the face pose, and the face detection box, respectively, and based on the respective scoring results. One specific example of the present disclosure to determine the face quality of a valid face is as described below with respect to fig. 5.
S402, determining a face snapshot image of the target to be snapshot according to the face quality of the effective face.
Optionally, the present disclosure may determine whether the face sequence of a target to be captured satisfies a second preset condition. If the second preset condition is satisfied, the valid face with the highest face quality is selected from the valid faces, and it is determined whether that highest face quality satisfies a preset face quality requirement; if so, the valid face with the highest face quality is taken as the face snapshot of the target to be captured. If the preset face quality requirement is not met, the subsequent processing operation can be determined according to the actual situation. For example, the face sequence of the target may be discarded, or the method may continue to wait for new face regions to be added to the face sequence.
Optionally, the second preset condition in the present disclosure may include: whether the length of the face sequence of the target to be snapped reaches a predetermined sequence length and whether the disappearance duration of the face sequence of the target to be snapped in the image sequence reaches a predetermined duration.
For example, the present disclosure may first determine the length of the face sequence of the target to be captured, and determine whether the length has reached the predetermined sequence length, and if the length has reached the predetermined sequence length as a result of the determination, determine that the face sequence of the target to be captured satisfies the second preset condition.
As another example, the present disclosure may first determine a disappearance time of the face sequence of the target to be captured in the image sequence, determine whether the disappearance time has reached a predetermined time, and determine that the face sequence of the target to be captured satisfies the second preset condition if the disappearance time has reached the predetermined time.
As another example, the present disclosure may first determine a length of the face sequence of the target to be captured and a disappearance duration of the face sequence of the target to be captured in the image sequence, and determine whether the length of the face sequence of the target to be captured has reached a predetermined sequence length and whether the disappearance duration of the face sequence of the target to be captured in the image sequence has reached the predetermined duration, and if the determination result is that the length of the face sequence of the target to be captured has reached the predetermined sequence length and the disappearance duration of the face sequence of the target to be captured in the image sequence has reached the predetermined duration, determine that the face sequence of the target to be captured satisfies the second preset condition.
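Combining the second preset condition with the quality check of S402, the selection logic can be sketched as follows. The FaceSequence fields (faces, valid_faces, gone_seconds) and all threshold values are illustrative assumptions.

```python
def try_emit_snapshot(seq, min_len=25, max_gone_s=2.0, min_quality=0.6):
    """Return the best valid face once a sequence qualifies, else None.

    seq is assumed to expose: faces (all tracked face regions), valid_faces
    (a list of (quality, face) pairs), and gone_seconds (the time since the
    target was last seen in the image sequence).
    """
    ready = len(seq.faces) >= min_len or seq.gone_seconds >= max_gone_s
    if not ready or not seq.valid_faces:
        return None                              # keep waiting for more frames
    quality, face = max(seq.valid_faces, key=lambda qf: qf[0])
    # Below the quality requirement the sequence may be discarded or kept
    # open for further face regions, as described above.
    return face if quality >= min_quality else None
```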
By setting the second preset condition for the face sequence, the present disclosure can avoid unnecessary operations of acquiring face snapshots. In particular, setting a predetermined sequence length for the length of the face sequence prevents the face snapshot of a target that stays in the image sequence for a long time from being obtained too late. It also guards against targets that merely flash through the image sequence due to factors such as tracking errors: such briefly appearing targets are not face-captured, which helps avoid obtaining face snapshots from invalid face sequences. Setting a predetermined duration for the disappearance duration of the face sequence allows two or more face sequences to be merged into one complete face sequence when the face tracking track breaks. In this way, the face snapshot of the target to be captured is obtained from a valid and complete face sequence.
In fig. 5, in S500, the face key point detection and the face pose detection are performed on the image sequence, so as to obtain the face key point and the face pose of each target to be captured in the image sequence.
Optionally, the present disclosure may perform face keypoint detection and face pose detection on all face regions in all images in the image sequence, respectively, so as to obtain a face keypoint and a face pose corresponding to each face region. The facial pose may include a pitch value (i.e., pitch angle) and a yaw value (i.e., yaw angle) of the face. Of course, the face pose may also include the roll value (i.e., roll angle) of the face.
Optionally, the present disclosure may provide all face regions in each image to a convolutional neural network, and perform feature extraction, keypoint prediction, and other processing on the input face region through the convolutional neural network.
Optionally, the present disclosure may provide all face regions in each image to a convolutional neural network, and perform feature extraction, angle prediction, and other processing on the input face region through the convolutional neural network, and the present disclosure may obtain a pitch value (i.e., a pitch angle), a yaw value (i.e., a yaw angle), and a roll value (i.e., a roll angle) corresponding to each face region according to an output of the convolutional neural network.
Optionally, the convolutional neural network for predicting the key point of the face and the convolutional neural network for predicting the angle may be two independent convolutional neural networks, or may be a convolutional neural network having a main network for extracting image features and two network branches, where one network branch is used for predicting the key point of the face according to the image features extracted by the main network, and the other network branch is used for predicting a pitch value (i.e., a pitch angle), a yaw value (i.e., a yaw angle), and a roll value (i.e., a roll angle) according to the image features extracted by the main network. The present disclosure does not limit the specific implementation of obtaining facial keypoints and facial poses for each facial region in an image sequence.
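As one way to realize the shared-trunk variant described above, the following PyTorch sketch pairs a single feature extractor with a keypoint branch and a pose branch. The layer sizes, the five-keypoint layout, and the per-keypoint confidence output are illustrative assumptions, not values fixed by this disclosure.

```python
import torch
import torch.nn as nn

class KeypointPoseNet(nn.Module):
    def __init__(self, num_keypoints=5):
        super().__init__()
        self.trunk = nn.Sequential(              # shared feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.keypoint_head = nn.Linear(32, num_keypoints * 3)  # x, y, confidence
        self.pose_head = nn.Linear(32, 3)                      # pitch, yaw, roll

    def forward(self, x):
        features = self.trunk(x)
        return self.keypoint_head(features), self.pose_head(features)

# Example: one batch of 8 cropped face regions at 64x64 resolution.
keypoints, pose = KeypointPoseNet()(torch.randn(8, 3, 64, 64))
```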
S501, determining the face quality of the effective face according to the obtained face key points, the face posture and the face detection frame.
Optionally, for any effective face of any target to be captured, the present disclosure may determine the face quality of the effective face by using the face key points, the face pose, and the size of the face detection box of the effective face.
Optionally, the present disclosure may determine the non-occlusion quality of a valid face according to its face keypoints. That is, the present disclosure may score the non-occlusion quality of the valid face according to its face keypoints. For example, the present disclosure may average the confidences of all visible keypoints of the valid face and take the calculated average as the score of its non-occlusion quality. As another example, the present disclosure may calculate a weighted average of the confidences of all visible keypoints and take that weighted average as the score of the non-occlusion quality. The weight of each keypoint can be determined according to its importance; for example, the eye keypoints may be weighted slightly more, while the face contour keypoints may be weighted slightly less.
Optionally, the present disclosure may determine the frontal face quality of a valid face based on its face pose. That is, the present disclosure may score the frontal face quality of the valid face based on its pitch value (i.e., pitch angle) and yaw value (i.e., yaw angle). The frontal face quality of the valid face may also be referred to as the face orientation quality. The present disclosure may fit the face orientation quality of the valid face using an ellipse formula. For example, the ellipse formula may be designed based on the following four assumptions:
assuming that when yaw is 0 degrees and pitch is 0 degrees, the frontal face quality score is 1;
assuming that when yaw is 40 degrees and pitch is 0 degrees, the frontal face quality score is 0.5;
assuming that when yaw is 0 degrees and pitch is 30 degrees, the frontal face quality score is 0.5;
assuming that when yaw is 40 degrees and pitch is 30 degrees, the frontal face quality score is 0.
Fitting the ellipse formula to these four points yields the following formula (1), which may be used to obtain the frontal face quality score of the valid face:
score = 1 - (yaw × yaw / 1600 + pitch × pitch / 900) × 0.5    formula (1)
Optionally, the present disclosure may determine the size quality of a valid face according to the size of its face detection box. That is, the present disclosure may score the size quality of the valid face according to the size of its face detection box, using a piecewise scoring function. For example, when the minimum of the width and height of the face detection box is greater than a larger threshold, the size quality score is 1. When the minimum of the width and height equals an inflection-point threshold, the size quality score is 0.5. When the minimum of the width and height is smaller than a smaller threshold, the size quality score is 0. When the minimum of the width and height lies between the inflection-point threshold and the larger threshold, the size quality scores are linearly distributed between 0.5 and 1. When the minimum of the width and height lies between the smaller threshold and the inflection-point threshold, the size quality scores are linearly distributed between 0 and 0.5.
Optionally, the present disclosure may perform a weighted calculation over the non-occlusion quality, the frontal face quality, and the size quality of the valid face, each with its own weight, and use the result as the face quality of the valid face. The three weights are usually not the same, and their sum may be 1. For example, the weights of the non-occlusion quality and the frontal face quality may be greater than the weight of the size quality.
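Putting the three scores together, the combined face quality of S501 might be sketched as follows. The frontal score implements formula (1) above; the size breakpoints (24, 48, and 96 pixels) and the weights (0.4, 0.4, 0.2) are illustrative assumptions.

```python
def frontal_quality(yaw, pitch):
    """Formula (1): 1 at (0, 0), 0.5 at (40, 0) and (0, 30), 0 at (40, 30)."""
    return max(0.0, 1.0 - (yaw * yaw / 1600.0 + pitch * pitch / 900.0) * 0.5)

def size_quality(w, h, small=24, knee=48, large=96):
    """Piecewise-linear score on the smaller side of the face detection box."""
    m = min(w, h)
    if m <= small:
        return 0.0
    if m <= knee:                                # 0 at small, 0.5 at the knee
        return 0.5 * (m - small) / (knee - small)
    if m <= large:                               # 0.5 at the knee, 1 at large
        return 0.5 + 0.5 * (m - knee) / (large - knee)
    return 1.0

def face_quality(visible_confidences, yaw, pitch, w, h, weights=(0.4, 0.4, 0.2)):
    """Weighted combination of non-occlusion, frontal, and size qualities."""
    occlusion_free = (sum(visible_confidences) / len(visible_confidences)
                      if visible_confidences else 0.0)
    w_occ, w_front, w_size = weights             # assumed to sum to 1
    return (w_occ * occlusion_free
            + w_front * frontal_quality(yaw, pitch)
            + w_size * size_quality(w, h))
```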
Determining the face quality of a valid face using the face keypoints, the face pose, and the face detection frame makes it convenient to determine the face quality of each valid face accurately. Determining the non-occlusion quality from the face keypoints, the frontal face quality from the face pose, and the size quality from the size of the face detection frame, and then combining these qualities with their corresponding weights, provides a feasible and objective way to measure the face quality of a valid face, which helps ensure that the finally obtained face snapshot of the target to be captured is the best face image.
Fig. 6 is a flow chart of another embodiment of a face snapshot method of the present disclosure. The method as shown in fig. 6 mainly comprises the steps of: S600-S611. The following describes each step.
S600, the camera device generates video frames in real time through video recording.
Optionally, each video frame generated by the camera device is taken as an image in the image sequence.
S601, carrying out face detection on the currently generated video frame in real time to obtain all face detection frames in the currently generated video frame.
Optionally, the present disclosure may provide a video frame currently generated by the image pickup apparatus as an input to the first neural network, and obtain the center point coordinates and the width and height of all the face detection boxes in the currently generated video frame according to the output of the first neural network.
S602, performing head detection on the currently generated video frame in real time to obtain all head detection frames in the currently generated video frame.
Optionally, the present disclosure may provide a video frame currently generated by the camera device as an input to the second neural network, and obtain the center point coordinates and the width and height of all the head detection boxes in the currently generated video frame according to the output of the second neural network.
S603, according to each face detection frame in the currently generated video frame, each face area is intercepted from the currently generated video frame.
Optionally, the present disclosure may intercept each face region from the currently generated video frame according to the coordinates, width, and height of the center point of each face detection box in the currently generated video frame.
S604, according to the head detection frames in the currently generated video frame, all the head areas are intercepted from the currently generated video frame.
Optionally, the present disclosure may intercept each head region from the currently generated video frame according to the coordinates, width, and height of the center point of each head detection box in the currently generated video frame.
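Since the detectors of S601 and S602 report each box as a center point plus width and height, while the cropping of S603 and S604 needs pixel corners, a small conversion helper applies; a trivial sketch:

```python
def center_to_corners(cx, cy, w, h):
    """Convert a (center x, center y, width, height) box to (x1, y1, x2, y2)."""
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)

# x1, y1, x2, y2 = center_to_corners(cx, cy, w, h)
# region = frame[int(y1):int(y2), int(x1):int(x2)]   # e.g. with a NumPy image
```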
S605, performing multi-target tracking processing on each face region in the currently generated video frame.
Optionally, the present disclosure sets a face identifier for each face region, so that the corresponding face region is added to the currently existing face sequence, or a new face sequence is created for the corresponding face region. Each face region in a face sequence has the same face identification and different face sequences have different face identifications.
S606, face key point detection and face pose detection are performed on each face region, and face key points, pitch values, and yaw values of each face region are obtained.
S607, performing multi-target tracking processing on each head region in the currently generated video frame.
Optionally, the present disclosure sets a header identifier for each header region, so that the corresponding header region is added to the currently existing header sequence, or a new header sequence is created for the corresponding header region. Each header region in a header sequence has the same header identification and different header sequences have different header identifications.
S608, merging the current face sequences by using the current head sequences.
S609, determining the valid faces and the face quality of each valid face according to the face detection frame, the face keypoints, the pitch value, and the yaw value of each face region in the currently generated video frame.
S610, for each current face sequence, determine whether the length of the face sequence (e.g., the number of face images it contains) reaches a predetermined sequence length, whether the disappearance duration of the face sequence reaches a predetermined duration, and whether the score of the best face quality among the valid faces in the face sequence reaches a predetermined score (i.e., whether the best face quality meets the preset face quality requirement). If the length of the face sequence reaches the predetermined sequence length and the score of the best face quality reaches the predetermined score, go to S611. If the disappearance duration of the face sequence reaches the predetermined duration, go to S611. If the length of the face sequence does not reach the predetermined sequence length, or the length reaches the predetermined sequence length but the score of the best face quality does not reach the predetermined score, and the disappearance duration does not reach the predetermined duration, return to S610.
S611, outputting the face region with the best face quality among the valid faces in the face sequence.
Optionally, in S610, if the disappearance duration of a face sequence reaches the predetermined duration but the score of the best face quality of the valid faces in that sequence does not reach the predetermined score, the present disclosure may discard the face sequence.
Exemplary devices
Fig. 7 is a schematic structural diagram of an embodiment of a face capture device of the present disclosure. The device of this embodiment can be used to implement the corresponding method embodiments of the present disclosure. The apparatus shown in fig. 7 includes: a detection module 700, a face sequence determining module 701, and a snapshot determining module 702. Optionally, the apparatus of the present disclosure may further include: a keypoint and pose detection module 703.
The detection module 700 is configured to determine a first face-based tracking result and a second head-based tracking result from a detected face region and a detected head region in the sequence of images.
The face sequence determining module 701 is configured to determine a face sequence of the target to be snapshotted according to the first tracking result and the second tracking result determined by the detecting module 700.
Optionally, the module 701 for determining a face sequence may modify the first tracking result obtained by the detection module 700 according to the second tracking result obtained by the detection module 700. For example, the determine face sequence module 701 may combine all face sequences formed by head regions with the same head identifier in the image sequence into one face sequence. Then, the determine face sequence module 701 may determine a face sequence of the target to be captured according to the corrected first tracking result.
The snapshot determining module 702 is configured to determine a face snapshot of the target to be captured based on the face sequence determined by the face sequence determining module 701.
The key point and posture detection module 703 is configured to perform face key point detection and face posture detection on the image sequence, respectively, to obtain a face key point and a face posture of each target to be captured in the image sequence.
Optionally, the snapshot determining module 702 may include: a valid face determining sub-module 7021, a face quality determining sub-module 7022, and a snapshot determining sub-module 7023. The valid face determining sub-module 7021 is configured to determine the valid faces in the face sequence of the target to be captured that meet the first preset condition. The face quality determining sub-module 7022 is used to determine the face quality of the valid faces. For example, the face quality determining sub-module 7022 may determine the face quality of a valid face based on its face keypoints, face pose, and face detection frame. In a more specific example, the face quality determining sub-module 7022 determines the non-occlusion quality of the valid face according to its face keypoints, the frontal face quality according to its face pose, and the size quality according to its face detection frame; it may then determine the face quality of the valid face based on the non-occlusion quality and its weight, the frontal face quality and its weight, and the size quality and its weight. The snapshot determining sub-module 7023 is configured to determine the face snapshot of the target to be captured according to the face quality of the valid faces.
Optionally, the determining snapshot map sub-module 7023 may select an effective face with the highest face quality from the effective faces when the face sequence of the target to be snapshot satisfies a second preset condition, and use the effective face with the highest face quality as the face snapshot map of the target to be snapshot when the highest face quality satisfies a preset face quality requirement. In an example, the determine-snap-shot submodule 7023 may determine a length of a face sequence of the target to be snapped, and if the length reaches a predetermined sequence length, the determine-snap-shot submodule 7023 determines that the face sequence of the target to be snapped meets a second preset condition. In another example, the snapshot determining submodule 7023 may determine a disappearing time length of the face sequence in the image sequence, and if the disappearing time length reaches a predetermined time length, the snapshot determining submodule 7023 determines that the face sequence of the target to be snapshot satisfies a second preset condition.
Exemplary electronic device
An electronic device according to an embodiment of the present disclosure is described below with reference to FIG. 8. FIG. 8 shows a block diagram of an electronic device in accordance with an embodiment of the present disclosure. As shown in FIG. 8, the electronic device 81 includes one or more processors 811 and a memory 812.
The processor 811 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 81 to perform desired functions.
The memory 812 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Nonvolatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 811 to implement the face snapshot methods of the various embodiments of the present disclosure described above and/or other desired functionality. Various contents such as an input signal, a signal component, and a noise component may also be stored in the computer-readable storage medium.
In one example, the electronic device 81 may further include an input device 813, an output device 814, and the like, which are interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 813 may include, for example, a keyboard, a mouse, and the like. The output device 814 may output various information to the outside and may include, for example, a display, speakers, a printer, and a communication network and the remote output devices connected thereto.
Of course, for simplicity, only some of the components of the electronic device 81 that are relevant to the present disclosure are shown in FIG. 8; components such as buses and input/output interfaces are omitted. In addition, the electronic device 81 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the face snapshot method according to the various embodiments of the present disclosure described above in the "Exemplary methods" section of this specification.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages, such as Java or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps of the face snapshot method according to the various embodiments of the present disclosure described above in the "Exemplary methods" section of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. It should be noted, however, that the advantages, effects, and the like mentioned in the present disclosure are merely examples, not limitations, and should not be considered essential to the various embodiments of the present disclosure. Furthermore, the specific details disclosed above are provided only for the purposes of illustration and ease of understanding, and are not limiting; the present disclosure is not limited to those specific details.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the apparatus embodiments substantially correspond to the method embodiments, their description is relatively brief; for relevant details, reference may be made to the corresponding parts of the description of the method embodiments.
The block diagrams of devices, apparatuses, and systems referred to in the present disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The words "such as" are used herein to mean, and are used interchangeably with, the phrase "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (11)

1. A face snapshot method, comprising:
determining a first face-based tracking result and a second head-based tracking result from face regions and head regions detected in an image sequence;
determining a face sequence of a target to be captured according to the first tracking result and the second tracking result; and
determining a face snapshot of the target to be captured based on the face sequence.
2. The method of claim 1, wherein the determining a face sequence of the target to be captured according to the first tracking result and the second tracking result comprises:
correcting the first tracking result according to the second tracking result; and
determining the face sequence of the target to be captured according to the corrected first tracking result.
3. The method of claim 2, wherein the correcting the first tracking result according to the second tracking result comprises:
determining all face sequences corresponding to a head sequence formed by head regions with the same head identifier in the image sequence; and
merging all of the face sequences into one face sequence.
4. The method of any one of claims 1 to 3, wherein the determining a face snapshot of the target to be captured based on the face sequence comprises:
determining, in the face sequence of the target to be captured, a valid face that meets a first preset condition;
determining a face quality of the valid face; and
determining the face snapshot of the target to be captured according to the face quality of the valid face.
5. The method of claim 4, further comprising:
performing face key point detection and face pose detection on the image sequence, respectively, to obtain face key points and face poses of the targets to be captured in the image sequence;
wherein the determining a face quality of the valid face comprises:
determining the face quality of the valid face according to the face key points, the face pose, and a face detection box.
6. The method of claim 5, wherein the determining the face quality of the valid face according to the face key points, the face pose, and the face detection box comprises:
determining a non-occlusion quality of the valid face according to the face key points;
determining a frontal face quality of the valid face according to the face pose;
determining a size quality of the valid face according to the face detection box; and
determining the face quality of the valid face according to the non-occlusion quality and its weight, the frontal face quality and its weight, and the size quality and its weight.
7. The method of any one of claims 4 to 6, wherein the determining the face snapshot of the target to be captured according to the face quality of the valid face comprises:
selecting, when the face sequence of the target to be captured satisfies a second preset condition, the valid face with the highest face quality from the valid faces; and
using, when the highest face quality meets a preset face quality requirement, the valid face with the highest face quality as the face snapshot of the target to be captured.
8. The method of claim 7, further comprising:
determining a length of the face sequence of the target to be captured, and determining that the face sequence of the target to be captured satisfies the second preset condition if the length reaches a predetermined sequence length; or
determining a duration for which the face sequence has disappeared from the image sequence, and determining that the face sequence of the target to be captured satisfies the second preset condition if the duration reaches a predetermined duration.
9. A face snapshot apparatus, comprising:
a detection module configured to determine a first face-based tracking result and a second head-based tracking result from face regions and head regions detected in an image sequence;
a face sequence determining module configured to determine a face sequence of a target to be captured according to the first tracking result and the second tracking result determined by the detection module; and
a snapshot determining module configured to determine a face snapshot of the target to be captured based on the face sequence determined by the face sequence determining module.
10. A computer-readable storage medium, wherein the storage medium stores a computer program for performing the method of any one of claims 1 to 8.
11. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1 to 8.
CN202010014960.9A 2020-01-07 2020-01-07 Face snapshot method, device, medium and electronic equipment Active CN113158706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010014960.9A CN113158706B (en) 2020-01-07 2020-01-07 Face snapshot method, device, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113158706A true CN113158706A (en) 2021-07-23
CN113158706B CN113158706B (en) 2024-08-13

Family

ID=76881476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010014960.9A Active CN113158706B (en) 2020-01-07 2020-01-07 Face snapshot method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113158706B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023168949A1 (en) * 2022-03-11 2023-09-14 上海商汤智能科技有限公司 Structured target detection method and apparatus, device, storage medium, and program product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011166305A (en) * 2010-02-05 2011-08-25 Sony Corp Image processing apparatus and imaging apparatus
CN108509896A (en) * 2018-03-28 2018-09-07 腾讯科技(深圳)有限公司 A kind of trace tracking method, device and storage medium
CN110276277A (en) * 2019-06-03 2019-09-24 罗普特科技集团股份有限公司 Method and apparatus for detecting facial image


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant