CN114692759A - Video face classification method, device, equipment and storage medium

Video face classification method, device, equipment and storage medium

Info

Publication number
CN114692759A
Authority
CN
China
Prior art keywords
face
preset
detected
face image
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210332659.1A
Other languages
Chinese (zh)
Inventor
王博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wondershare Software Co Ltd
Original Assignee
Shenzhen Wondershare Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wondershare Software Co Ltd filed Critical Shenzhen Wondershare Software Co Ltd
Priority to CN202210332659.1A priority Critical patent/CN114692759A/en
Publication of CN114692759A publication Critical patent/CN114692759A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a video face classification method, device, equipment and storage medium. The application belongs to the technical field of video editing and comprises the following steps: selecting the first-ordered video frame to be detected from an extracted frame set corresponding to an input video as the current video frame to be detected, and performing face image detection on it; if a face image exists, calculating a preset number of face key points for each face bounding box through a preset face key point positioning algorithm, and mapping the key point coordinates back to the current video frame to be detected, so as to detect the face pose and sharpness of the face image; if the detection passes, correcting and cropping the face image, inputting the result into a preset neural network to extract face features, and obtaining the face position information of the current video frame to be detected through further processing; and looping until detection is complete to obtain the face position information of the input video. The method and device can improve both the efficiency and the accuracy of video face classification.

Description

Video face classification method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of video editing, in particular to a video face classification method, a video face classification device, video face classification equipment and a storage medium.
Background
Most existing face classification technologies perform unsupervised automatic clustering with a traditional clustering algorithm (such as K-means, threshold clustering, mean shift, DBSCAN or rank-order). However, because video information is abundant, even a short video corresponds to a large number of video frames, so the amount of data to be computed rises sharply, processing takes longer, and face classification efficiency is low. Moreover, a traditional clustering algorithm may change face serial numbers during clustering, which reduces face classification accuracy.
Disclosure of Invention
The embodiment of the invention provides a video face classification method, a video face classification device, video face classification equipment and a storage medium, and aims to solve the problem that the existing video face classification efficiency and accuracy are low.
In a first aspect, an embodiment of the present invention provides a video face classification method, which includes:
selecting a first to-be-detected video frame in sequence from an extracted frame set corresponding to an input video as a current to-be-detected video frame, and detecting a face image of the current to-be-detected video frame, wherein the extracted frame set comprises a plurality of to-be-detected video frames;
if a face image exists in the current video frame to be detected, generating at least one face bounding box according to the face image, calculating a preset number of face key points for each face bounding box through a preset face key point positioning algorithm, and mapping the coordinates of the preset number of face key points back to the current video frame to be detected to obtain a preset number of target face key points;
respectively performing face pose and sharpness detection on the face image according to the preset number of target face key points and the face bounding box to obtain a face image quality detection result;
if the face image quality detection result is that the detection is passed, correcting and cutting the face image according to the preset number of target face key points to obtain a face cutting image;
inputting the face cropping image into a preset neural network to extract face features, classifying the face features through a face feature classification method according to a preset flag to obtain face information, and storing the face serial number in the face information, the frame number corresponding to the current video frame to be detected, and the upper-left and lower-right corner coordinates corresponding to the face bounding box, so as to obtain the face position information corresponding to the current video frame to be detected;
and selecting the next video frame to be detected from the extracted frame set as the current video frame to be detected, and returning to the step of detecting the face image of the current video frame to be detected until the detection of all the video frames to be detected in the extracted frame set is completed, so as to obtain the face position information corresponding to the input video.
Further, face pose detection is performed on the face image through a face pose detection method according to the preset number of target face key points to obtain a first face image quality detection result; and if the first face image quality detection result is passing the detection, sharpness detection is performed on the face image through a sharpness detection method according to the face bounding box to obtain a second face image quality detection result, which is taken as the face image quality detection result.
Further, selecting a subset of target face key points from the preset number of target face key points, and calculating a pitch attitude angle, a yaw attitude angle and a roll attitude angle from the selected key points; and if the pitch, yaw and roll attitude angles respectively fall within the preset pitch, yaw and roll attitude angle threshold intervals, setting the first face image quality detection result as passing the detection.
Further, inputting the face bounding box region into a Laplacian operator to obtain a sharpness value corresponding to the face image, and comparing the sharpness value with a preset sharpness value to obtain the second face image quality detection result.
Further, normalizing the size of the face image to a preset size; calculating a plane rotation correction angle, a scaling coefficient and a rotation center according to the preset number of target face key points; and correcting the face image through affine transformation according to the preset size, the plane rotation correction angle, the scaling coefficient and the rotation center, and cutting the corrected face image to obtain a face cutting image.
Further, selecting a left pupil center key point, a right pupil center key point, a left mouth corner key point and a right mouth corner key point from the preset number of target face key points; calculating a central point between the central key point of the left pupil and the central key point of the right pupil to obtain a central point of a connecting line of two eyes; calculating a central point between the left mouth corner key point and the right mouth corner key point to obtain a central point of a connecting line of the two mouth corners; calculating a plane rotation correction angle and a scaling coefficient according to the connecting center point of the two eyes and the connecting center point of the two mouth angles; and taking the central points of the left pupil center key point, the right pupil center key point, the left mouth angle key point and the right mouth angle key point as rotation centers.
Further, if the preset flag is a preset flag value, constructing face information related to the face features, wherein the face information comprises a face serial number, a face number and a face feature center; if the preset mark is not a preset mark value, constructing a Kd-tree according to the face feature center, inputting the face feature into the Kd-tree to search so as to find the face feature center closest to the face feature as a target face feature center, and taking the distance between the target face feature center and the face feature as a target distance; if the target distance is smaller than a preset distance threshold value, the face feature is endowed with the face serial number corresponding to the target face feature center, and the face feature center corresponding to the face serial number is updated; and if the target distance is greater than a preset distance threshold value, newly adding a face serial number, and constructing new face information according to the newly added face serial number and the face features.
In a second aspect, an embodiment of the present invention further provides a video face classification apparatus, which includes:
a first detection unit, used for selecting the first-ordered video frame to be detected from the extracted frame set corresponding to the input video as the current video frame to be detected and performing face image detection on it, wherein the extracted frame set comprises a plurality of video frames to be detected;
a calculating unit, configured to generate at least one face bounding box according to the face image if the face image exists in the current video frame to be detected, calculate a preset number of face key points for each face bounding box through a preset face key point positioning algorithm, and map coordinates of the preset number of face key points back to the current video frame to be detected to obtain a preset number of target face key points;
a second detection unit, used for respectively performing face pose and sharpness detection on the face image according to the preset number of target face key points and the face bounding box to obtain a face image quality detection result;
a rectification clipping unit, used for correcting and cropping the face image according to the preset number of target face key points to obtain a face cropping image if the face image quality detection result is passing the detection;
a classification saving unit, configured to input the face cropping image into a preset neural network to extract face features, classify the face features according to a preset flag through a face feature classification method to obtain face information, and store the face serial number in the face information, the frame number corresponding to the current video frame to be detected, and the upper-left and lower-right corner coordinates corresponding to the face bounding box, so as to obtain the face position information corresponding to the current video frame to be detected;
and the return execution unit is used for selecting the next video frame to be detected from the extracted frame set as the current video frame to be detected, and returning to the step of detecting the face image of the current video frame to be detected until the detection of all the video frames to be detected in the extracted frame set is finished so as to obtain the face position information corresponding to the input video.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the above method when executing the computer program.
In a fourth aspect, the present invention also provides a computer-readable storage medium, which stores a computer program, and the computer program can implement the above method when being executed by a processor.
The embodiment of the invention provides a video face classification method, device, equipment and storage medium. The method comprises the following steps: selecting the first-ordered video frame to be detected from an extracted frame set corresponding to an input video as the current video frame to be detected, and performing face image detection on it; if a face image exists in the current video frame to be detected, generating at least one face bounding box from the face image, calculating a preset number of face key points for each face bounding box through a preset face key point positioning algorithm, and mapping the key point coordinates back to the current video frame to be detected to obtain a preset number of target face key points; detecting the face pose and sharpness of the face image respectively according to the preset number of target face key points and the face bounding box; if the detection passes, correcting and cropping the face image to obtain a face cropping image; inputting the face cropping image into a preset neural network to extract face features, classifying the face features through a face feature classification method according to a preset flag to obtain face information, and storing the face serial number in the face information, the frame number corresponding to the current video frame to be detected, and the upper-left and lower-right corner coordinates corresponding to the face bounding box to obtain the face position information corresponding to the current video frame to be detected; and executing the loop until detection is complete to obtain the face position information corresponding to the input video.
According to the technical scheme of the embodiment of the invention, firstly, the human face posture and the definition of a human face image are detected to obtain a human face image quality detection result; if the detection is passed, the face image is corrected and cut to obtain a face cut image, and the face image with poor face posture and definition can be screened out, so that the face classification efficiency is improved; the face cutting image is input into a preset neural network to extract face features, the face features are classified according to preset marks to obtain face information, then related information is stored to obtain face position information, the change of a face sequence number in the classification process is avoided, and the accuracy of video face classification can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a video face classification method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a video face classification apparatus according to an embodiment of the present invention; and
fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Referring to fig. 1, fig. 1 is a schematic flow chart of a video face classification method according to an embodiment of the present invention. The video face classification method can be applied to terminals, such as intelligent terminal equipment like smart phones, portable computers, desktop computers and notebook computers, and can be realized through software installed on the terminals, such as application programs named as video editing software, so that not only can face images with poor face posture and definition be screened out, but also the change of face serial numbers in the classification process can be avoided, and the efficiency and accuracy of video face classification can be improved. The video face classification method is explained in detail below. As shown in fig. 1, the method comprises the following steps S100-S150.
S100, selecting a first to-be-detected video frame in sequence from an extracted frame set corresponding to an input video as a current to-be-detected video frame, and detecting a face image of the current to-be-detected video frame, wherein the extracted frame set comprises a plurality of to-be-detected video frames.
In the embodiment of the present invention, before classifying video faces, the input video is first decoded to obtain a video frame set, and frames are extracted from the video frame set at a preset interval to obtain an extracted frame set, where the extracted frame set includes a plurality of video frames to be detected and the preset interval may be set according to the actual situation, for example 2 frames. The first-ordered video frame to be detected is then selected from the extracted frame set as the current video frame to be detected, and face image detection is performed on it. It should be noted that, in the embodiment of the present invention, face image detection is performed on the current video frame to be detected through the RetinaFace algorithm; if a face image exists in the current video frame to be detected, the face image is processed, and understandably, if no face image exists, the current video frame to be detected is skipped and the next video frame to be detected in the extracted frame set is detected.
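The frame-extraction step above can be sketched as follows. This is an illustrative sketch, not from the patent; it assumes "an interval of 2 frames" means two frames are skipped between kept frames, consistent with the later worked example in which frame 4 follows frame 1.

```python
def extract_frame_indices(total_frames, interval=2):
    """Sample frame indices from a video, keeping one frame and then
    skipping `interval` frames (assumed reading of 'preset interval frame').
    In 1-based numbering with interval=2 this keeps frames 1, 4, 7, ...;
    the returned indices are 0-based."""
    return list(range(0, total_frames, interval + 1))
```

For a 10-frame clip with the example interval of 2, this keeps frames 1, 4, 7 and 10 (0-based indices 0, 3, 6, 9).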
And S110, if a face image exists in the current video frame to be detected, generating at least one face surrounding block diagram according to the face image, calculating a preset number of face key points by a preset face key point positioning algorithm aiming at each face surrounding block diagram, and mapping the coordinates of the preset number of face key points back to the current video frame to be detected to obtain a preset number of target face key points.
In the embodiment of the invention, face image detection is performed on the current video frame to be detected through the RetinaFace algorithm. If a face image exists in the current video frame to be detected, indicating that at least one face is present, at least one face bounding box is generated from the face image, where each face bounding box comprises an upper-left corner coordinate and a lower-right corner coordinate. After the face bounding boxes are generated, a preset number of face key points is calculated for each face bounding box through a preset face key point positioning algorithm, here the PFLD face key point detection algorithm, and the coordinates of the preset number of face key points are mapped back to the current video frame to be detected to obtain the preset number of target face key points. It should be noted that, in the embodiment of the present invention, the preset number of face key points and the preset number of target face key points are both 106 face key points.
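The mapping of key point coordinates back to the full frame can be sketched as below. The crop-local pixel convention for the key point detector's output is an assumption, since the patent does not specify it (PFLD implementations often emit coordinates normalized to the crop instead, which would require multiplying by the crop size first).

```python
import numpy as np

def map_keypoints_to_frame(keypoints, box):
    """Map key points predicted inside a face crop back to full-frame
    coordinates by adding the bounding-box offset.
    `keypoints`: (N, 2) array of (x, y) in crop-local pixels (assumption).
    `box`: face bounding box (x1, y1, x2, y2) in frame coordinates."""
    x1, y1, _, _ = box
    return np.asarray(keypoints, dtype=float) + np.array([x1, y1], dtype=float)
```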
And S120, respectively carrying out face posture and definition detection on the face image according to the preset number of target face key points and the face surrounding block diagram to obtain a face image quality detection result.
In the embodiment of the invention, face pose detection is performed on the face image through a face pose detection method according to the preset number of target face key points to obtain a first face image quality detection result. The face pose detection method first selects a subset of target face key points from the preset number of target face key points and calculates a pitch attitude angle, a yaw attitude angle and a roll attitude angle from them; the first face image quality detection result is then determined according to the pitch, yaw and roll attitude angles and the preset pitch, yaw and roll attitude angle threshold intervals. Specifically, the target face key points with serial numbers 34, 39, 51, 47, 67, 56, 66, 80, 58, 64, 85, 91, 94 and 17 are selected from the 106 target face key points, and the pitch, yaw and roll attitude angles are calculated from these 14 target face key points. If the pitch, yaw and roll attitude angles fall within the preset pitch, yaw and roll attitude angle threshold intervals respectively, the face pose is good and the first face image quality detection result is set to passing the detection; otherwise, the face pose does not meet the requirement, the first face image quality detection result is set to failing the detection, and the face image is discarded. It should be noted that, in the embodiment of the present invention, the threshold intervals of the preset pitch, yaw and roll attitude angles are (-15, +15), (-10, +10) and (-30, +30) degrees, respectively.
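The pose gate described above can be sketched directly from the stated threshold intervals; the function and constant names are illustrative, and the angle computation itself (from the 14 key points) is omitted.

```python
# Threshold intervals taken from the embodiment above, in degrees.
PITCH_RANGE = (-15.0, 15.0)
YAW_RANGE = (-10.0, 10.0)
ROLL_RANGE = (-30.0, 30.0)

def pose_passes(pitch, yaw, roll):
    """First quality gate: the face passes only if every attitude angle
    lies inside its preset threshold interval."""
    return (PITCH_RANGE[0] < pitch < PITCH_RANGE[1]
            and YAW_RANGE[0] < yaw < YAW_RANGE[1]
            and ROLL_RANGE[0] < roll < ROLL_RANGE[1])
```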
Further, when the first face image quality detection result is passing the detection, sharpness detection is performed on the face image through a sharpness detection method according to the face bounding box to obtain a second face image quality detection result, and the second face image quality detection result is taken as the face image quality detection result. In the embodiment of the invention, the sharpness detection method first inputs the face bounding box region into a Laplacian operator to obtain a sharpness value corresponding to the face image, and then compares the sharpness value with a preset sharpness value, which is 200. Specifically, if the sharpness value is greater than the preset sharpness value, the face quality is good and the second face image quality detection result is set to passing the detection; otherwise, if the sharpness value is not greater than the preset sharpness value, the face quality is poor, the second face image quality detection result is set to failing the detection, and the face image is discarded. It should be noted that, in the embodiment of the present invention, screening out face images with poor pose and sharpness can improve the accuracy of face classification.
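A minimal sketch of the sharpness gate, assuming the common variance-of-Laplacian reading of "inputting the face bounding box region into a Laplacian operator"; the patent does not spell out how the Laplacian response is reduced to a single sharpness value. The numpy-only convolution below stands in for e.g. `cv2.Laplacian(gray, cv2.CV_64F).var()`.

```python
import numpy as np

# 3x3 discrete Laplacian kernel commonly used for sharpness estimation
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)

def laplacian_variance(gray):
    """Variance of the Laplacian response over a grayscale face crop."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):          # valid cross-correlation with the 3x3 kernel
        for j in range(3):
            out += LAPLACIAN[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return float(out.var())

def sharp_enough(gray, threshold=200.0):
    """Second quality gate: pass only if the sharpness value exceeds the
    preset sharpness value (200 in the embodiment above)."""
    return laplacian_variance(gray) > threshold
```

A uniform crop has zero Laplacian variance and fails the gate; a high-contrast crop passes easily.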
And S130, if the detection result of the quality of the face image is that the detection is passed, correcting and cutting the face image according to the preset number of target face key points to obtain a face cutting image.
In the embodiment of the invention, if the face image quality detection result is passing the detection, indicating that the face image has good pose and sharpness, the face image is corrected and cropped according to the preset number of target face key points to obtain a face cropping image. Specifically, the size of the face image is normalized to a preset Size, where the preset Size is (112, 96); a plane rotation correction Angle, a scaling coefficient Scale and a rotation Center are calculated according to the preset number of target face key points. Specifically, the left pupil center key point 105, the right pupil center key point 106, the left mouth corner key point 85 and the right mouth corner key point 91 are selected from the preset number of target face key points, where 105, 106, 85 and 91 are key point serial numbers; the center point between the left pupil center key point 105 and the right pupil center key point 106 is calculated to obtain the center point of the line connecting the two eyes; the center point between the left mouth corner key point 85 and the right mouth corner key point 91 is calculated to obtain the center point of the line connecting the two mouth corners; the plane rotation correction Angle and scaling coefficient Scale are calculated from these two center points, where the scaling coefficient Scale is chosen so that, after an undistorted transformation of the face image, the distance between the center point of the eye line and the center point of the mouth-corner line is 40 pixels; and the mean of the left pupil center key point 105, the right pupil center key point 106, the left mouth corner key point 85 and the right mouth corner key point 91 is taken as the rotation Center. The face image is then corrected through an affine transformation according to the preset Size, the plane rotation correction Angle, the scaling coefficient Scale and the rotation Center, and the corrected face image is cropped to obtain the face cropping image.
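The computation of the correction parameters can be sketched as below. The sign convention of the angle is an assumption, since the patent only names the quantities; the 40-pixel eye-to-mouth distance is the value stated above.

```python
import numpy as np

def alignment_params(l_eye, r_eye, l_mouth, r_mouth, target_dist=40.0):
    """Compute the plane rotation correction Angle (degrees), scaling
    coefficient Scale and rotation Center from the four key points."""
    pts = np.array([l_eye, r_eye, l_mouth, r_mouth], dtype=float)
    eye_c = (pts[0] + pts[1]) / 2.0    # center of the line joining the eyes
    mouth_c = (pts[2] + pts[3]) / 2.0  # center of the line joining the mouth corners
    d = mouth_c - eye_c
    # In-plane angle that makes the eye-to-mouth axis vertical
    # (sign convention is an assumption).
    angle = float(np.degrees(np.arctan2(d[0], d[1])))
    # Scale so that the eye-center-to-mouth-center distance becomes
    # `target_dist` pixels (40 in the embodiment above).
    scale = target_dist / float(np.linalg.norm(d))
    center = pts.mean(axis=0)          # rotation Center: mean of the four key points
    return angle, scale, center
```

These three values would then feed an affine warp (e.g. a rotation-plus-scale matrix about `center`) followed by a crop to the preset (112, 96) size.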
S140, inputting the face cropping image into a preset neural network to extract face features, classifying the face features through a face feature classification method according to a preset flag to obtain face information, and storing the face serial number in the face information, the frame number corresponding to the current video frame to be detected, and the upper-left and lower-right corner coordinates corresponding to the face bounding box, so as to obtain the face position information corresponding to the current video frame to be detected.
In the embodiment of the invention, the face cropping picture is input into the preset neural network to extract face features, wherein the preset neural network is a convolutional neural network and each face feature is a 128-dimensional feature vector. The face features are then classified through the face feature classification method according to the preset mark to obtain face information. Specifically, if the preset mark is the preset mark value Null, which indicates that a valid face has been detected for the first time, face information {ID_n: {num, fea128_center}} related to the face feature is constructed, where the face information comprises the face serial number ID_n, the face count num, and the face feature center fea128_center. If the preset mark is not the preset mark value Null, indicating that valid faces already exist, a Kd-tree is constructed from the existing face feature centers, the face feature is input into the Kd-tree to search for the face feature center closest to it as the target face feature center, and the distance between the target face feature center and the face feature is taken as the target distance. If the target distance is smaller than a preset distance threshold, indicating that the face feature and the target face feature center correspond to the same face image, the face serial number corresponding to the target face feature center is assigned to the face feature, and the face feature center corresponding to that face serial number is updated; in practical application, the face feature center is updated by the formula fea128_center = (fea128_center × num + fea128_new)/(num + 1), where fea128_new is the face feature. If the target distance is larger than the preset distance threshold, indicating that the face feature and the target face feature center do not correspond to the same face image, a new face serial number ID_{n+1} is added, and new face information {ID_{n+1}: {num, fea128_center}} is constructed from the new face serial number and the face feature. The face serial number in the face information, the frame number corresponding to the current video frame to be detected, and the upper left-corner and lower right-corner coordinates of the face bounding box are then stored to obtain the face position information corresponding to the video frame to be detected, for example {ID_n, f_N, (x1, y1), (x2, y2)}, where ID_n is the face serial number, f_N is the frame number, and (x1, y1) and (x2, y2) are the upper left-corner and lower right-corner coordinates respectively. It should be noted that, in the embodiment of the present invention, the search for the nearest face feature center is performed through the Kd-tree, so the search speed is high.
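As a minimal, non-authoritative Python sketch of the classification flow described above: the first valid face bootstraps the face information, later features are matched against the existing face feature centers through a Kd-tree nearest-neighbour search, and a matched center is updated with the running-mean formula fea128_center = (fea128_center × num + fea128_new)/(num + 1). The class name, the distance threshold value, and the use of SciPy's cKDTree as the Kd-tree are illustrative assumptions, not part of the patent:

```python
import numpy as np
from scipy.spatial import cKDTree  # stands in for the Kd-tree of the embodiment


class FaceClassifier:
    """Assign each 128-dim face feature a face serial number (0, 1, 2, ...)."""

    def __init__(self, dist_threshold=1.0):
        self.dist_threshold = dist_threshold  # preset distance threshold (illustrative)
        self.centers = []                     # fea128_center per face serial number
        self.counts = []                      # num: features merged into each center

    def classify(self, fea128_new):
        fea128_new = np.asarray(fea128_new, dtype=np.float64)
        if not self.centers:                  # "preset mark is Null": first valid face
            self.centers.append(fea128_new)
            self.counts.append(1)
            return 0
        # Build the Kd-tree over the existing face feature centers and search it.
        tree = cKDTree(np.vstack(self.centers))
        target_distance, idx = tree.query(fea128_new)
        if target_distance < self.dist_threshold:
            # Same face: running-mean update of the matched face feature center.
            num = self.counts[idx]
            self.centers[idx] = (self.centers[idx] * num + fea128_new) / (num + 1)
            self.counts[idx] = num + 1
            return idx
        # Distance above threshold: new face serial number and new face information.
        self.centers.append(fea128_new)
        self.counts.append(1)
        return len(self.centers) - 1
```

A per-frame record such as {ID_n, f_N, (x1, y1), (x2, y2)} can then be emitted by pairing the returned serial number with the frame number and the bounding-box corner coordinates. Rebuilding the tree on every query mirrors the wording "construct a Kd-tree according to the existing face feature center"; a production system would batch or cache this step.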
S150, selecting the next video frame to be detected from the extracted frame set as the current video frame to be detected, and returning to the step of detecting the face image of the current video frame to be detected until the detection of all the video frames to be detected in the extracted frame set is completed, so as to obtain the face position information corresponding to the input video.
In the embodiment of the present invention, after the face position information corresponding to the current video frame to be detected is obtained, the next video frame to be detected is selected from the extracted frame set as the current video frame to be detected. In practical applications, if the preset interval is 2 frames, the 4th video frame is selected as the next current video frame to be detected. The step of performing face image detection on the current video frame to be detected is then repeated until the detection of all the video frames to be detected in the extracted frame set is completed, so as to obtain the face position information corresponding to the input video.
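The frame-sampling rule above (detect one frame, skip the preset interval, detect the next) can be sketched as follows; the function name and the 1-based frame numbering are illustrative:

```python
def frames_to_detect(total_frames, interval_frames=2):
    """Return the 1-based frame numbers to detect; with interval_frames=2
    the sequence is 1, 4, 7, ..., matching the example in the text."""
    return list(range(1, total_frames + 1, interval_frames + 1))
```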
Fig. 2 is a schematic block diagram of a video face classification apparatus 200 according to an embodiment of the present invention. As shown in Fig. 2, the present invention further provides a video face classification apparatus 200 corresponding to the above video face classification method. The video face classification apparatus 200 includes units for performing the above-described video face classification method, and may be configured in a terminal. Specifically, referring to Fig. 2, the video face classification apparatus 200 includes a first detection unit 201, a calculation unit 202, a second detection unit 203, a correction clipping unit 204, a classification saving unit 205, and a return execution unit 206.
The first detection unit 201 is configured to select a first to-be-detected video frame in a sorted order from an extracted frame set corresponding to an input video as a current to-be-detected video frame, and perform face image detection on the current to-be-detected video frame, where the extracted frame set includes a plurality of to-be-detected video frames; the calculating unit 202 is configured to generate at least one face surrounding block diagram according to the face image if the face image exists in the current video frame to be detected, calculate a preset number of face key points for each face surrounding block diagram through a preset face key point positioning algorithm, and map coordinates of the preset number of face key points back to the current video frame to be detected to obtain a preset number of target face key points; the second detection unit 203 is configured to perform face pose and sharpness detection on the face image according to the preset number of target face key points and the face bounding box, so as to obtain a face image quality detection result;
the correction clipping unit 204 is configured to, if the detection result of the quality of the face image is that the detection is passed, correct and cut the face image according to the preset number of target face key points to obtain a face cut image; the classification storage unit 205 is configured to input the face cropping map into a preset neural network to extract a face feature, classify the face feature according to a preset flag by a face feature classification method to obtain face information, and store a face sequence number in the face information, a frame number corresponding to the current video frame to be detected, and an upper left corner coordinate and a lower right corner coordinate corresponding to the face bounding box to obtain face position information corresponding to the current video frame to be detected; the return execution unit 206 is configured to select a next video frame to be detected from the extracted frame set as a current video frame to be detected, and return to the step of performing face image detection on the current video frame to be detected until all video frames to be detected in the extracted frame set are detected, so as to obtain face position information corresponding to the input video.
In some embodiments, such as this embodiment, the second detecting unit 203 includes a first detecting subunit 2031 and a second detecting subunit 2032.
The first detecting subunit 2031 is configured to perform face pose detection on the face image by using a face pose detection method according to the preset number of target face key points to obtain a first face image quality detection result; the second detecting subunit 2032 is configured to, if the first face image quality detection result is a detection pass, perform sharpness detection on the face image by a sharpness detection method according to the face bounding box to obtain a second face image quality detection result, and use the second face image quality detection result as a face image quality detection result.
In some embodiments, such as this embodiment, the first detecting subunit 2031 includes a first calculating subunit 20311 and a setting unit 20312.
The first calculating subunit 20311 is configured to select a preset number of target face key points from the preset number of target face key points, and calculate a pitch attitude angle, a yaw attitude angle, and a roll attitude angle according to the preset number of target face key points; the setting unit 20312 is configured to set the first human face image quality detection result as a detection pass if the pitch attitude angle, the yaw attitude angle, and the roll attitude angle are within threshold intervals of a preset pitch attitude angle, a preset yaw attitude angle, and a preset roll attitude angle, respectively.
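The first quality check above reduces to an interval test on the three attitude angles. A hedged sketch follows, where the threshold intervals are illustrative placeholders for the preset pitch, yaw, and roll threshold intervals:

```python
def pose_check(pitch, yaw, roll,
               pitch_range=(-20.0, 20.0),   # preset pitch threshold interval (illustrative)
               yaw_range=(-30.0, 30.0),     # preset yaw threshold interval (illustrative)
               roll_range=(-25.0, 25.0)):   # preset roll threshold interval (illustrative)
    """First face image quality detection result: True (pass) only if every
    attitude angle lies inside its preset threshold interval."""
    checks = ((pitch, pitch_range), (yaw, yaw_range), (roll, roll_range))
    return all(lo <= angle <= hi for angle, (lo, hi) in checks)
```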
In some embodiments, such as this embodiment, the second detecting sub-unit 2032 comprises a second calculating sub-unit 20321 and a comparing unit 20322.
The second calculating subunit 20321 is configured to input the face bounding box into a Laplacian operator to obtain a sharpness value corresponding to the face image; the comparing unit 20322 is configured to compare the sharpness value with a preset sharpness value to obtain a second face image quality detection result.
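The sharpness check above compares the variance of the Laplacian response over the grayscale face crop with a preset sharpness value. With OpenCV this is typically `cv2.Laplacian(img, cv2.CV_64F).var()`; the sketch below reimplements the 4-neighbour Laplacian with NumPy so it stays self-contained (wrap-around borders and the threshold value are illustrative simplifications):

```python
import numpy as np


def sharpness_value(gray):
    """Variance of the Laplacian response; larger means a sharper image."""
    g = np.asarray(gray, dtype=np.float64)
    # 4-neighbour Laplacian kernel applied via shifted copies (borders wrap).
    lap = (-4.0 * g
           + np.roll(g, 1, axis=0) + np.roll(g, -1, axis=0)
           + np.roll(g, 1, axis=1) + np.roll(g, -1, axis=1))
    return float(lap.var())


def sharpness_check(gray, preset_sharpness=100.0):
    """Second face image quality detection result (threshold is illustrative)."""
    return sharpness_value(gray) >= preset_sharpness
```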
In some embodiments, for example, in this embodiment, the correction clipping unit 204 includes a normalization unit 2041, a selection unit 2042, a third calculation subunit 2043, a fourth calculation subunit 2044, a fifth calculation subunit 2045, an acting unit 2046, and a correction clipping subunit 2047.
The normalization unit 2041 is configured to normalize the size of the face image to a preset size; the selecting unit 2042 is configured to select a left pupil center key point, a right pupil center key point, a left mouth corner key point, and a right mouth corner key point from the preset number of target face key points; the third calculating subunit 2043 is configured to calculate a central point between the central key point of the left pupil and the central key point of the right pupil to obtain a central point of a two-eye connection line; the fourth calculating subunit 2044 is configured to calculate a central point between the left mouth corner key point and the right mouth corner key point to obtain a central point of a connecting line between the two mouth corners; the fifth calculating subunit 2045 is configured to calculate a plane rotation correction angle and a scaling coefficient according to the two-eye connecting line central point and the two-mouth angle connecting line central point; the acting unit 2046 is configured to use center points of the left pupil center key point, the right pupil center key point, the left mouth corner key point, and the right mouth corner key point as rotation centers; the correction cutting subunit 2047 is configured to correct the face image through affine transformation according to the preset size, the plane rotation correction angle, the scaling factor, and the rotation center, and cut the corrected face image to obtain a face cut image.
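The correction parameters above can be derived with a little planar geometry: the midpoints of the pupils and of the mouth corners define the face's vertical axis, whose deviation from the image vertical gives the plane rotation correction angle; the ratio of a target eye-to-mouth distance to the measured one gives the scaling coefficient; and the mean of the four key points is the rotation center. In the sketch below the 48-pixel target distance is an illustrative assumption, and `rotation_matrix` mirrors the 2×3 matrix layout of `cv2.getRotationMatrix2D`:

```python
import math
import numpy as np


def correction_params(left_pupil, right_pupil, left_mouth, right_mouth,
                      target_eye_mouth_dist=48.0):
    """Return (plane rotation correction angle in degrees, scaling coefficient,
    rotation center) from the four face key points."""
    eye_center = (np.asarray(left_pupil, float) + np.asarray(right_pupil, float)) / 2.0
    mouth_center = (np.asarray(left_mouth, float) + np.asarray(right_mouth, float)) / 2.0
    dx, dy = mouth_center - eye_center
    angle = math.degrees(math.atan2(dx, dy))    # 0 when the eye-mouth axis is vertical
    scale = target_eye_mouth_dist / math.hypot(dx, dy)
    center = (eye_center + mouth_center) / 2.0  # mean of the four key points
    return angle, scale, center


def rotation_matrix(center, angle_deg, scale):
    """2x3 affine matrix (same layout as cv2.getRotationMatrix2D)."""
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    cx, cy = float(center[0]), float(center[1])
    return np.array([[a, b, (1.0 - a) * cx - b * cy],
                     [-b, a, b * cx + (1.0 - a) * cy]])
```

The corrected face crop is then produced by applying this matrix with an affine warp (for example `cv2.warpAffine`) at the preset output size.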
In some embodiments, such as this embodiment, the classification saving unit 205 includes a first constructing unit 2051, a searching unit 2052, an updating unit 2053, and a second constructing unit 2054.
The first construction unit 2051 is configured to construct face information related to the face features if the preset flag is a preset flag value, where the face information includes a face serial number, a face count, and a face feature center; the searching unit 2052 is configured to, if the preset flag is not a preset flag value, construct a Kd-tree according to the face feature center, input the face feature into the Kd-tree to perform a search so as to find the face feature center closest to the face feature as a target face feature center, and use a distance between the target face feature center and the face feature as a target distance; the updating unit 2053 is configured to assign the face sequence number corresponding to the target face feature center to the face feature and update the face feature center corresponding to the face sequence number if the target distance is smaller than a preset distance threshold; the second construction unit 2054 is configured to, if the target distance is greater than a preset distance threshold, newly add a face sequence number, and construct new face information according to the newly added face sequence number and the face feature.
The specific implementation manner of the video face classification device 200 according to the embodiment of the present invention corresponds to the above-mentioned video face classification method, and is not described herein again.
The video face classification apparatus may be implemented in the form of a computer program which can be run on a computer device as shown in fig. 3.
Referring to fig. 3, fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 300 is a terminal, and the terminal may be an electronic device with a communication function, such as a smart phone, a desktop computer, a laptop computer, or a tablet computer.
Referring to fig. 3, the computer device 300 includes a processor 302, a memory, which may include a storage medium 303 and an internal memory 304, and a network interface 305 connected by a system bus 301.
The storage medium 303 may store an operating system 3031 and computer programs 3032. The computer program 3032, when executed, causes the processor 302 to perform a method for video face classification.
The processor 302 is used to provide computing and control capabilities to support the operation of the overall computer device 300.
The internal memory 304 provides an environment for the running of the computer program 3032 in the storage medium 303, and when the computer program 3032 is executed by the processor 302, the processor 302 can be caused to execute a video face classification method.
The network interface 305 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 3 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation of the computer apparatus 300 to which the present application is applied, and that a particular computer apparatus 300 may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 302 is configured to run a computer program 3032 stored in the memory to implement the following steps: selecting a first to-be-detected video frame in sequence from an extracted frame set corresponding to an input video as a current to-be-detected video frame, and detecting a face image of the current to-be-detected video frame, wherein the extracted frame set comprises a plurality of to-be-detected video frames; if the current video frame to be detected has a face image, generating at least one face surrounding block diagram according to the face image, calculating a preset number of face key points by a preset face key point positioning algorithm aiming at each face surrounding block diagram, and mapping the coordinates of the preset number of face key points back to the current video frame to be detected to obtain a preset number of target face key points; respectively carrying out face posture and definition detection on the face image according to the preset number of target face key points and the face surrounding block diagram to obtain a face image quality detection result; if the face image quality detection result is that the detection is passed, correcting and cutting the face image according to the preset number of target face key points to obtain a face cutting image; inputting the face cropping picture into a preset neural network to extract face features, classifying the face features by a face feature classification method according to preset marks to obtain face information, and storing a face serial number, a frame number corresponding to the current video frame to be detected and upper left-corner coordinates and lower right-corner coordinates corresponding to the face bounding box picture in the face information to obtain face position information corresponding to the current video frame to be detected; and selecting the next video frame to be detected from the extracted frame set as the current video frame to be
detected, and returning to the step of detecting the face image of the current video frame to be detected until the detection of all the video frames to be detected in the extracted frame set is completed, so as to obtain the face position information corresponding to the input video.
In some embodiments, for example, in this embodiment, when the processor 302 implements the step of performing face pose and sharpness detection on the face image according to the preset number of target face key points and the face bounding box respectively to obtain a face image quality detection result, the following steps are specifically implemented: carrying out face gesture detection on the face image by a face gesture detection method according to the preset number of target face key points to obtain a first face image quality detection result; and if the first face image quality detection result is that the detection is passed, performing definition detection on the face image by a definition detection method according to the face surrounding block diagram to obtain a second face image quality detection result, and taking the second face image quality detection result as a face image quality detection result.
In some embodiments, for example, in this embodiment, when the step of performing face pose detection on the face image by using a face pose detection method according to the preset number of target face key points to obtain a first face image quality detection result is implemented by the processor 302, the following steps are specifically implemented: selecting a preset number of target face key points from the preset number of target face key points, and calculating a pitching attitude angle, a yawing attitude angle and a rolling attitude angle according to the preset number of target face key points; and if the pitching attitude angle, the yawing attitude angle and the rolling attitude angle are respectively in threshold value intervals of a preset pitching attitude angle, a preset yawing attitude angle and a preset rolling attitude angle, setting a first human face image quality detection result as a passing detection result.
In some embodiments, for example, in this embodiment, when the step of performing sharpness detection on the face image by using a sharpness detection method according to the face bounding box to obtain a second face image quality detection result is implemented by the processor 302, the following steps are specifically implemented: inputting the human face surrounding block diagram into a Laplacian operator to obtain a definition value corresponding to the human face image; and comparing the definition value with a preset definition value to obtain a second human face image quality detection result.
In some embodiments, for example, in this embodiment, when the step of correcting and cropping the face image according to the preset number of target face key points to obtain the face cropping map is implemented by the processor 302, the following steps are specifically implemented: normalizing the size of the face image to a preset size; calculating a plane rotation correction angle, a scaling coefficient and a rotation center according to the preset number of target face key points; and correcting the face image through affine transformation according to the preset size, the plane rotation correction angle, the scaling coefficient and the rotation center, and cutting the corrected face image to obtain a face cutting image.
In some embodiments, for example, in this embodiment, when the processor 302 implements the step of calculating the plane rotation correction angle, the scaling factor and the rotation center according to the preset number of target face key points, the following steps are specifically implemented: selecting a left pupil center key point, a right pupil center key point, a left mouth corner key point and a right mouth corner key point from the preset number of target face key points; calculating a central point between the central key point of the left pupil and the central key point of the right pupil to obtain a central point of a connecting line of two eyes; calculating a central point between the left mouth corner key point and the right mouth corner key point to obtain a central point of a connecting line of the two mouth corners; calculating a plane rotation correction angle and a scaling coefficient according to the connecting center point of the two eyes and the connecting center point of the two mouth angles; and taking the central points of the left pupil center key point, the right pupil center key point, the left mouth angle key point and the right mouth angle key point as rotation centers.
In some embodiments, for example, in this embodiment, when the processor 302 implements the step of classifying the face features according to the preset flag by using the face feature classification method to obtain the face information, the following steps are implemented: if the preset mark is a preset mark value, constructing face information related to the face features, wherein the face information comprises a face serial number, a face count, and a face feature center; if the preset mark is not a preset mark value, constructing a Kd-tree according to the face feature center, inputting the face feature into the Kd-tree to search so as to find the face feature center closest to the face feature as a target face feature center, and taking the distance between the target face feature center and the face feature as a target distance; if the target distance is smaller than a preset distance threshold, giving the face feature the face serial number corresponding to the target face feature center, and updating the face feature center corresponding to the face serial number; and if the target distance is greater than a preset distance threshold value, newly adding a face serial number, and constructing new face information according to the newly added face serial number and the face features.
It should be understood that, in the embodiment of the present Application, the Processor 302 may be a Central Processing Unit (CPU), and the Processor 302 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program. The computer program is executed by at least one processor in the computer system to implement the flow steps of the embodiments of the video face classification method described above.
The storage medium may be a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium capable of storing a computer program.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partly contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, while the invention has been described with respect to the specific embodiments, it will be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A video face classification method is characterized by comprising the following steps:
selecting a first to-be-detected video frame in sequence from an extracted frame set corresponding to an input video as a current to-be-detected video frame, and detecting a face image of the current to-be-detected video frame, wherein the extracted frame set comprises a plurality of to-be-detected video frames;
if the current video frame to be detected has a face image, generating at least one face surrounding block diagram according to the face image, calculating a preset number of face key points by a preset face key point positioning algorithm aiming at each face surrounding block diagram, and mapping the coordinates of the preset number of face key points back to the current video frame to be detected to obtain a preset number of target face key points;
respectively carrying out face posture and definition detection on the face image according to the preset number of target face key points and the face surrounding block diagram to obtain a face image quality detection result;
if the face image quality detection result is that the detection is passed, correcting and cutting the face image according to the preset number of target face key points to obtain a face cutting image;
inputting the face cropping picture into a preset neural network to extract face features, classifying the face features by a face feature classification method according to preset marks to obtain face information, and storing a face serial number, a frame number corresponding to the current video frame to be detected and upper left-corner coordinates and lower right-corner coordinates corresponding to the face bounding box picture in the face information to obtain face position information corresponding to the current video frame to be detected;
and selecting the next video frame to be detected from the extracted frame set as the current video frame to be detected, and returning to the step of detecting the face image of the current video frame to be detected until the detection of all the video frames to be detected in the extracted frame set is completed, so as to obtain the face position information corresponding to the input video.
2. The video face classification method according to claim 1, wherein the performing face pose and sharpness detection on the face image according to the preset number of target face key points and the face bounding box respectively to obtain a face image quality detection result comprises:
carrying out face gesture detection on the face image through a face gesture detection method according to the preset number of target face key points to obtain a first face image quality detection result;
and if the first face image quality detection result is that the detection is passed, performing definition detection on the face image by a definition detection method according to the face surrounding block diagram to obtain a second face image quality detection result, and taking the second face image quality detection result as a face image quality detection result.
3. The video face classification method according to claim 2, wherein the performing face pose detection on the face image according to the preset number of target face key points by a face pose detection method to obtain a first face image quality detection result comprises:
selecting a preset number of target face key points from the preset number of target face key points, and calculating a pitching attitude angle, a yawing attitude angle and a rolling attitude angle according to the preset number of target face key points;
and if the pitching attitude angle, the yawing attitude angle and the rolling attitude angle are respectively in threshold value intervals of a preset pitching attitude angle, a preset yawing attitude angle and a preset rolling attitude angle, setting a first human face image quality detection result as a passing detection result.
4. The video face classification method according to claim 2, wherein the performing sharpness detection on the face image according to the face bounding box by sharpness detection method to obtain a second face image quality detection result comprises:
inputting the human face surrounding block diagram into a Laplacian operator to obtain a definition value corresponding to the human face image;
and comparing the definition value with a preset definition value to obtain a second human face image quality detection result.
5. The video face classification method according to claim 1, wherein the correcting and cropping the face image according to the preset number of target face key points to obtain a face cropping image comprises:
normalizing the size of the face image to a preset size;
calculating a plane rotation correction angle, a scaling coefficient and a rotation center according to the preset number of target face key points;
and correcting the face image through affine transformation according to the preset size, the plane rotation correction angle, the scaling coefficient and the rotation center, and cutting the corrected face image to obtain a face cropping image.
6. The video face classification method according to claim 5, wherein the calculating a plane rotation correction angle, a scaling coefficient and a rotation center according to the preset number of target face key points comprises:
selecting a left pupil center key point, a right pupil center key point, a left mouth corner key point and a right mouth corner key point from the preset number of target face key points;
calculating a central point between the central key point of the left pupil and the central key point of the right pupil to obtain a central point of a connecting line of two eyes;
calculating a central point between the left mouth corner key point and the right mouth corner key point to obtain a central point of a connecting line of the two mouth corners;
calculating a plane rotation correction angle and a scaling coefficient according to the central point of the connecting line of the two eyes and the central point of the connecting line of the two mouth angles;
and taking the central points of the left pupil center key point, the right pupil center key point, the left mouth angle key point and the right mouth angle key point as rotation centers.
7. The video face classification method according to claim 1, wherein the classifying of the face features according to the preset flag by a face feature classification method to obtain face information comprises:
if the preset flag equals a preset flag value, constructing face information related to the face features, wherein the face information comprises a face serial number, a face count and a face feature center;
if the preset flag does not equal the preset flag value, constructing a Kd-tree from the face feature centers, querying the Kd-tree with the face feature to find the face feature center closest to it as a target face feature center, and taking the distance between the target face feature center and the face feature as a target distance;
if the target distance is smaller than a preset distance threshold, assigning to the face feature the face serial number corresponding to the target face feature center, and updating the face feature center corresponding to that face serial number;
and if the target distance is not smaller than the preset distance threshold, adding a new face serial number, and constructing new face information according to the new face serial number and the face features.
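The incremental grouping logic of claim 7 can be illustrated with the sketch below. One deliberate simplification: the claim queries a Kd-tree over the feature centers, whereas this example uses a brute-force nearest search so that it needs only NumPy (in practice something like `scipy.spatial.cKDTree` would serve). The class and attribute names are assumptions.

```python
import numpy as np

class FaceClassifier:
    """Assign each incoming face feature to an existing face serial
    number if it lies within dist_threshold of that face's running
    feature center, otherwise open a new serial number."""

    def __init__(self, dist_threshold=1.0):
        self.dist_threshold = dist_threshold
        self.centers = []   # running mean feature per face serial number
        self.counts = []    # how many features each center averages

    def classify(self, feature):
        feature = np.asarray(feature, dtype=float)
        if self.centers:
            # Brute-force stand-in for the Kd-tree nearest-center query.
            dists = [np.linalg.norm(feature - c) for c in self.centers]
            idx = int(np.argmin(dists))
            if dists[idx] < self.dist_threshold:
                # Existing face: reuse its serial number and fold the
                # new feature into the running-mean center.
                self.counts[idx] += 1
                self.centers[idx] += (feature - self.centers[idx]) / self.counts[idx]
                return idx
        # First feature, or too far from every center: new serial number.
        self.centers.append(feature.copy())
        self.counts.append(1)
        return len(self.centers) - 1
```

The running-mean update is one common way to realize "updating the face feature center"; the claim itself does not prescribe the update rule.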
8. A video face classification apparatus, comprising:
a first detection unit, configured to select, in sequence, a first video frame to be detected from an extracted frame set corresponding to an input video as a current video frame to be detected, and to detect a face image in the current video frame to be detected, wherein the extracted frame set comprises a plurality of video frames to be detected;
a calculating unit, configured to generate at least one face bounding box according to the face image if a face image exists in the current video frame to be detected, calculate a preset number of face key points for each face bounding box through a preset face key point positioning algorithm, and map the coordinates of the preset number of face key points back to the current video frame to be detected to obtain a preset number of target face key points;
a second detection unit, configured to perform face pose detection and sharpness detection on the face image according to the preset number of target face key points and the face bounding box, respectively, to obtain a face image quality detection result;
a correction and cropping unit, configured to judge whether the face image quality detection result indicates a pass and, if so, to correct and crop the face image according to the preset number of target face key points to obtain a face cropping map;
a classification storage unit, configured to input the face cropping map into a preset neural network to extract face features, classify the face features according to a preset flag by a face feature classification method to obtain face information, and store the face serial number in the face information, the frame number corresponding to the current video frame to be detected, and the upper-left and lower-right coordinates of the face bounding box, to obtain face position information corresponding to the current video frame to be detected;
and a return execution unit, configured to select the next video frame to be detected from the extracted frame set as the current video frame to be detected, and return to the step of detecting a face image in the current video frame to be detected, until all video frames to be detected in the extracted frame set have been detected, to obtain face position information corresponding to the input video.
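The cooperation of the units in claim 8 amounts to a per-frame driver loop, which can be sketched as below. The function names, the callable parameters standing in for the detection and neural-network stages, and the dictionary layout of the position records are all assumptions made for the illustration.

```python
def classify_video_faces(frames, detect_faces, extract_feature, classifier):
    """Hypothetical driver loop mirroring the claimed units: iterate
    the extracted frame set, detect faces, extract a feature per face,
    classify it, and record (serial number, frame number, box) tuples.

    detect_faces(frame) -> iterable of (box, face_crop) pairs
    extract_feature(face_crop) -> feature vector
    classifier.classify(feature) -> face serial number
    """
    positions = []
    for frame_no, frame in enumerate(frames):
        for box, face_crop in detect_faces(frame):
            feature = extract_feature(face_crop)
            serial = classifier.classify(feature)
            positions.append({
                "face_id": serial,   # face serial number
                "frame": frame_no,   # frame number in the extracted set
                "box": box,          # (x1, y1, x2, y2) bounding box
            })
    return positions
```

Quality filtering and key-point correction would sit between detection and feature extraction; they are omitted here to keep the control flow visible.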
9. A computer device, characterized in that the computer device comprises a memory on which a computer program is stored and a processor which, when executing the computer program, implements the video face classification method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the video face classification method according to any one of claims 1 to 7.
CN202210332659.1A 2022-03-30 2022-03-30 Video face classification method, device, equipment and storage medium Pending CN114692759A (en)

Publications (1)

Publication Number Publication Date
CN114692759A (en)

Family

ID=82141721



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination