CN108229432A - Face calibration method and device - Google Patents
- Publication number: CN108229432A (application CN201810096476.8A)
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/164—Detection; Localisation; Normalisation using holistic features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The embodiment of the present application provides a face calibration method and device, wherein the method includes: processing a face picture according to a first neural network model to determine a batch of face regions; processing the batch of face regions according to a second neural network model to filter out non-face regions from the batch of face regions; processing the batch of face regions with non-faces filtered according to a third neural network model to determine the unique face region at time T; and tracking the unique face region at time T according to a face tracking model to determine the unique face region at time T+1. Because multiple neural network models apply multi-layer convolution operations, face feature points are extracted efficiently and accurately, greatly improving the robustness of face calibration; further, when a feedback mechanism is introduced, calibration efficiency and stability are improved.
Description
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a face calibration method and device.
Background
Neural networks have made great breakthroughs in the field of image recognition, driving the rapid development of face calibration as an image application. They give face calibration higher stability when coping with changes in pose, illumination and expression, and have promoted its wide application in more and more fields such as entertainment and security.
Face calibration is mainly divided into two stages: face detection and face characterization. In the face detection stage, given any picture, it is judged whether one or more faces exist in the picture, and the position area of each face is returned. Research on face detection focused mainly on template matching, subspace methods and the like in the early stage, and later on data-driven methods such as statistical model methods and neural network learning methods. Most typically, Viola and Jones (VJ for short) obtained a face detector with very good real-time performance through a cascade classifier trained with Haar-like features and AdaBoost. But in real complex environments, with variable face sizes, diverse poses, harsh illumination conditions, low resolution and so on, the classic VJ face detector often performs poorly. Recently, more and more face detection algorithms based on convolutional neural networks (CNNs) have emerged, showing stronger robustness and higher detection accuracy, such as FacenessNet, DCNN, etc.
Face characterization mainly aligns the face and extracts its features, locating the positions of key areas such as the eyebrows, eyes, mouth, nose and face contour; this is also called face key point detection. Common face alignments at present are 5-point alignment and 68-point alignment. Face alignment can be applied to facial feature positioning, expression recognition, face caricature generation, augmented reality, face swapping and the like. Methods for detecting face key points fall into three types: 1. traditional methods based on ASM (Active Shape Model) and AAM (Active Appearance Model); 2. methods based on cascaded shape regression; 3. methods based on deep learning. Although the traditional models are simple, easy to understand and easy to apply, they depend strongly on the model and have poor robustness. Therefore, most people use deep-learning-based methods to detect face key points.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a method and an apparatus for face calibration, which are used to overcome or alleviate the above-mentioned drawbacks in the prior art.
The embodiment of the application provides a face region calibration method, which comprises the following steps:
processing the face pictures according to the first neural network model to determine a batch of face regions;
processing the batch of face regions according to a second neural network model to filter out non-face regions from the batch of face regions;
processing the batch of face regions with the non-faces filtered according to a third neural network model to determine a unique face region at the time T;
and tracking the unique face region at the moment T according to the face tracking model to determine the unique face region at the moment T + 1.
Optionally, in any embodiment of the present application, the method further includes:
acquiring an acquired original face picture, and carrying out scaling processing on the original face picture to obtain image pyramids with different sizes;
and taking the image pyramids with different sizes as the input of the first neural network model, so that the first neural network model processes the face pictures to determine a batch of face regions.
Optionally, in any embodiment of the present application, acquiring an acquired original face picture includes: and acquiring an original face picture acquired by an image acquisition unit arranged on the electronic terminal through a development interface of the electronic terminal.
Optionally, in any embodiment of the present application, processing the face pictures according to the first neural network model to determine a batch of face regions includes: and processing the face pictures in sequence according to different convolution layers and convolution kernels configured in the first neural network model to determine a batch of face regions.
Optionally, in any embodiment of the present application, successively processing the face pictures according to different convolution layers and convolution kernels configured in the first neural network model to determine a batch of face regions includes: processing the face picture according to different convolution layers and convolution kernels configured in the first neural network model to respectively obtain a plurality of candidate face area frames; and determining a batch of face regions according to the overlapping of the candidate face region frames and the set overlapping threshold value.
Optionally, in any embodiment of the present application, processing the batch of face regions according to a second neural network model to filter out non-face regions from the batch of face regions includes: and processing the face regions in sequence according to different convolution layers and convolution kernels configured in the second neural network model so as to filter out non-face regions from the face regions.
Optionally, in any embodiment of the present application, successively processing the batch of face regions according to different convolution layers and convolution kernels configured in the second neural network model to filter out non-face regions from the batch of face regions includes: processing the face regions in the batch in sequence according to different convolution layers and convolution kernels configured in a second neural network model to respectively obtain a plurality of candidate face region frames; and filtering out non-face regions from the batch of face regions according to the overlapping of the candidate face region frames and a set overlapping threshold value.
Optionally, in any embodiment of the present application, processing the batch of face regions with non-faces filtered according to a third neural network model to determine a unique face region at time T includes: and processing the batch of face regions with the non-faces filtered according to different convolution layers and convolution kernels configured in the third neural network model to determine the unique face region and the position of a face key point at the time T.
Optionally, in any embodiment of the present application, tracking the unique face region at time T according to a face tracking model to determine the unique face region at time T +1 includes: and tracking the unique face region at the time T according to a position filter and a scale filter in the face tracking model to determine the unique face region at the time T + 1.
Optionally, in any embodiment of the present application, the method further includes: and judging whether the unique face area tracking is successful or not according to the unique face area at the moment T and the unique face area at the moment T + 1.
Optionally, in any embodiment of the present application, determining whether the unique face region tracking is successful according to the unique face region at the time T and the unique face region at the time T +1 includes: if the overlap of the face region frames of the unique face region at the time T and the unique face region at the time T +1 is equal to or greater than a set overlap threshold, judging that the unique face region is successfully tracked; or if the overlap of the face region frames of the unique face region at the time T and the unique face region at the time T +1 is smaller than the set overlap threshold, determining that the unique face region tracking fails.
Optionally, in any embodiment of the present application, if it is determined that the unique face region tracking is successful, the unique face region at the time T +1 is used as an input of the third neural network model, so as to process the batch of face regions with non-faces filtered, so as to determine the unique face region at the time T + 2.
Optionally, in any embodiment of the present application, if it is determined that the unique face region tracking fails, the step of processing the face pictures according to the first neural network model is skipped to determine a batch of face regions again.
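A minimal sketch of this tracking feedback test, under the assumptions that face regions are (x1, y1, x2, y2) boxes and that the translated criterion is read as "the overlap meets or exceeds the threshold" (the helper name and default threshold are hypothetical, not from the application):

```python
def tracking_succeeded(box_t, box_t1, threshold=0.5):
    """Compare the unique face regions at time T and T+1 by their overlap (IOU)."""
    # Intersection rectangle of the two boxes (empty if they do not overlap).
    ix1, iy1 = max(box_t[0], box_t1[0]), max(box_t[1], box_t1[1])
    ix2, iy2 = min(box_t[2], box_t1[2]), min(box_t[3], box_t1[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
             + (box_t1[2] - box_t1[0]) * (box_t1[3] - box_t1[1]) - inter)
    # Success when the overlap meets the set threshold; otherwise re-detection
    # from the first neural network model would be triggered.
    return inter / union >= threshold
```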
The embodiment of the present application further provides a face region calibration apparatus, which includes:
the first program unit is used for processing the face pictures according to the first neural network model so as to determine a batch of face regions;
the second program unit is used for processing the batch of face regions according to a second neural network model so as to filter out non-face regions from the batch of face regions;
a third program unit, configured to process the batch of face regions with non-faces filtered according to a third neural network model, so as to determine a unique face region at time T;
and the fourth program unit is used for tracking the unique face area at the time T according to the face tracking model so as to determine the unique face area at the time T + 1.
Optionally, in any embodiment of the present application, the method further includes:
the conversion unit is used for acquiring an acquired original face picture and carrying out zooming processing on the original face picture to obtain image pyramids with different sizes;
and the input unit is used for taking the image pyramids with different sizes as the input of the first neural network model, so that the first neural network model processes the face pictures to determine a batch of face regions.
Optionally, in any embodiment of the present application, the first program unit is further configured to process the face pictures sequentially according to different convolution layers and convolution kernels configured in the first neural network model, so as to determine a batch of face regions.
Optionally, in any embodiment of the present application, the first program unit is further configured to process the face picture successively according to different convolution layers and convolution kernels configured in the first neural network model, so as to obtain a plurality of candidate face region frames respectively; and determining a batch of face regions according to the overlapping of the candidate face region frames and the set overlapping threshold value.
Optionally, in any embodiment of the present application, the second program unit is further configured to process the batch of face regions in sequence according to different convolution layers and convolution kernels configured in the second neural network model, so as to filter out non-face regions from the batch of face regions.
Optionally, in any embodiment of the present application, the second program unit is further configured to process the batch of face regions successively according to different convolution layers and convolution kernels configured in the second neural network model, so as to obtain a plurality of candidate face region frames respectively; and filtering out non-face regions from the batch of face regions according to the overlapping of the candidate face region frames and a set overlapping threshold value.
Optionally, in any embodiment of the present application, the third program unit is further configured to sequentially process the batch of face regions with non-faces filtered according to different convolution layers and convolution kernels configured in the third neural network model, so as to determine a unique face region and a face key point position at time T.
Optionally, in any embodiment of the present application, the fourth program unit is further configured to track the unique face region at time T according to a position filter and a scale filter in the face tracking model, so as to determine the unique face region at time T + 1.
Optionally, in any embodiment of the present application, the method further includes: and the feedback unit is used for judging whether the unique face area tracking is successful according to the unique face area at the moment T and the unique face area at the moment T + 1.
Optionally, in any embodiment of the present application, the feedback unit is further configured to determine that the unique face region is successfully tracked if the overlap between the face region frames of the unique face region at the time T and the unique face region at the time T +1 is equal to or greater than a set overlap threshold; or, if the overlap of the face region frames of the unique face region at the time T and the unique face region at the time T +1 is smaller than the set overlap threshold, to determine that the unique face region tracking fails.
Optionally, in any embodiment of the application, if it is determined that the unique face region tracking is successful, the feedback unit is further configured to use the unique face region at the time T +1 as an input of the third neural network model, so as to process the batch of face regions with non-faces filtered, so as to determine the unique face region at the time T + 2.
The embodiment of the present application further provides an electronic device, which includes the face region calibration apparatus in any one of the above embodiments.
In the embodiment of the application, the face pictures are processed according to the first neural network model to determine a batch of face regions; the batch of face regions is processed according to a second neural network model to filter out non-face regions; the batch of face regions with non-faces filtered is processed according to a third neural network model to determine the unique face region at time T; and the unique face region at time T is tracked according to the face tracking model to determine the unique face region at time T+1. Because the multiple neural network models apply multi-layer convolution operations, face feature points are extracted efficiently and accurately, which greatly improves the robustness of face calibration and makes it insensitive to complex environments. Meanwhile, compared with the most advanced neural network algorithms at present, prediction efficiency is improved by 300% without affecting accuracy. Further, when a feedback mechanism is introduced, calibration efficiency and stability are improved.
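The four-stage flow summarized above (detect a batch of regions, filter non-faces, refine to the unique region, then track, with re-detection on tracking failure) can be sketched as a driver loop. The `net1`/`net2`/`net3`/`tracker`/`make_pyramid` callables are hypothetical stand-ins for the models described in the application, not defined by it:

```python
def calibrate_stream(frames, net1, net2, net3, tracker, make_pyramid):
    """Yield the unique face region per frame, falling back to full detection
    whenever the tracker reports failure (returns None)."""
    face = None
    for frame in frames:
        if face is not None:
            face = tracker(frame, face)              # track the region from T to T+1
        if face is None:                             # first frame, or tracking failed
            candidates = net1(make_pyramid(frame))   # batch of candidate face regions
            faces = net2(candidates)                 # non-face regions filtered out
            face = net3(faces)                       # unique region (with keypoints)
        yield face
```

The feedback is in the fallback branch: a failed track on frame T+1 restarts the cascade from the first model, while a successful track feeds the T+1 region forward.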
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
fig. 1 is a schematic flow chart of a human face region calibration method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a human face region calibration device in the second embodiment of the present application;
fig. 3 is a schematic structural diagram of a face area calibration device in the third embodiment of the present application;
fig. 4 is a schematic structural diagram of a human face area calibration device in the fourth embodiment of the present application.
Detailed Description
It is not necessary for any particular embodiment of the invention to achieve all of the above advantages at the same time.
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of the protection of the embodiments in the present application.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Fig. 1 is a schematic flow chart of a human face region calibration method according to an embodiment of the present application; as shown in fig. 1, it includes:
s101, processing the face pictures according to a first neural network model to determine a batch of face regions;
optionally, in an embodiment, step S101 further includes: acquiring an acquired original face picture, and carrying out scaling processing on the original face picture to obtain image pyramids with different sizes; and taking the image pyramids with different sizes as the input of the first neural network model, so that the first neural network model processes the face pictures to determine a batch of face regions.
Taking an image acquisition device such as a camera in a mobile terminal as an example, permission to use the camera can be obtained through a corresponding development interface provided by the mobile terminal, and each frame shot by the camera is extracted. The image captured by the camera is typically an RGB three-channel image, which can be converted to a single-channel grayscale image using conventional conversion algorithms or conversion tools. Converting the camera image to grayscale greatly reduces irrelevant information in the image. Then, the grayscale image is resized (scaled) to generate a plurality of sub-image frames with different resolutions; that is, the grayscale image is resized into an image pyramid of different sizes and resolutions. For example, a 480 × 480 image is successively resized to 144 × 144, 43 × 43 and 13 × 13 using a scale factor of 0.3, yielding an image pyramid with four sizes and resolutions. An image pyramid is a multi-scale representation of an image, a structure that interprets an image at multiple resolutions: a series of images of progressively lower resolution, derived from the same original image and arranged in a pyramid shape. It is obtained by down-sampling; the higher the level, the smaller the image and the lower the resolution.
In practical use, however, the processing is not limited to this: the image may be processed in other manners to generate a plurality of sub-image frames with different resolutions, and the grayscale conversion may be omitted entirely, applying image processing such as scaling directly to generate the sub-image frames.
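A minimal sketch of the grayscale conversion and pyramid construction described above, assuming square NumPy image arrays; the helper names, the luminance weights and the nearest-neighbour resampling are illustrative choices, not specified by the application:

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an RGB frame to a single channel with standard luminance weights."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def build_pyramid(gray, scale=0.3, min_side=12):
    """Repeatedly shrink a square grayscale image, as in the
    480 -> 144 -> 43 -> 13 example (scale factor 0.3)."""
    levels = [gray]
    size = gray.shape[0]
    while round(size * scale) >= min_side:
        size = round(size * scale)
        prev = levels[-1]
        # Nearest-neighbour sampling: pick one source row/column per output pixel.
        idx = np.arange(size) * prev.shape[0] // size
        levels.append(prev[np.ix_(idx, idx)])
    return levels
```

With a 480 × 480 input this yields the four pyramid levels of the example above.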
Specifically, in the embodiment, the acquiring of the acquired original face picture in step S101 may specifically include: and acquiring an original face picture acquired by an image acquisition unit arranged on the electronic terminal through a development interface of the electronic terminal.
Specifically, in the embodiment, when the face pictures are processed according to the first neural network model in step S101 to determine a batch of face regions, the face pictures may be processed sequentially according to different convolution layers and convolution kernels configured in the first neural network model to determine a batch of face regions. In specific implementation, the face image may be processed sequentially according to different convolution layers and convolution kernels configured in the first neural network model to obtain a plurality of candidate face region frames respectively; and determining a batch of face regions according to the overlapping of the candidate face region frames and the set overlapping threshold value.
Illustratively, the first neural network model is a four-layer convolutional neural network model. The first layer is an input layer, and the obtained image pyramid can be used as its input; the second layer is a 3 × 3 convolution layer with a 5 × 10 convolution kernel, which extracts image features; the third layer is a 3 × 3 convolution layer with a 3 × 16 convolution kernel, which extracts image features again based on the extraction result of the second layer; the last layer is the output regression layer with a 1 × 12 convolution, which finally outputs a batch of face regions reflecting the following results: 1. whether each region is a face; 2. the face region frame position. It should be noted that, in practical applications, the structure is not limited to the above four-layer convolutional neural network model, and those skilled in the art may also adopt model structures with more layers according to actual needs.
If the first neural network model is a four-layer convolutional neural network model, each layer of the network outputs candidate face region frames. By calculating the overlap between the candidate face region frames and selecting those whose overlap is smaller than a set overlap threshold, the batch of face regions is obtained.
Specifically, the overlap of any two candidate face region frames can be calculated by the following formula. Let R_A denote the area of candidate face region box A and R_B the area of candidate face region box B; the overlap IOU (Intersection-over-Union) of the two candidate face region boxes is:

IOU = area(R_A ∩ R_B) / area(R_A ∪ R_B)

All region boxes with IOU < threshold (the set overlap threshold) are selected (i.e., the batch of face regions) and serve as input to the second neural network model in step S102.
S102, processing the batch of face regions according to a second neural network model to filter out non-face regions from the batch of face regions;
specifically, in this embodiment, when the batch of face regions are processed according to the second neural network model in S102 to filter out non-face regions from the batch of face regions, the batch of face regions may be processed in sequence according to different convolution layers and convolution kernels configured in the second neural network model to filter out the non-face regions from the batch of face regions.
Further, successively processing the batch of face regions according to different convolution layers and convolution kernels configured in the second neural network model in S102 to filter out non-face regions from the batch of face regions, including: processing the face regions in the batch in sequence according to different convolution layers and convolution kernels configured in a second neural network model to respectively obtain a plurality of candidate face region frames; and filtering out non-face regions from the batch of face regions according to the overlapping of the candidate face region frames and a set overlapping threshold value.
Illustratively, the second neural network model is a five-layer convolutional neural network model. The first layer is an input layer, which takes the generated batch of face regions as input; the second layer is a 3 × 3 convolution layer with an 11 × 28 convolution kernel, which extracts features of the face region images; the third layer is a 3 × 3 convolution layer with a 4 × 48 convolution kernel, which extracts face region features again based on the extraction result of the second layer; the fourth layer is a 2 × 2 convolution layer with a 3 × 64 convolution kernel, which performs feature extraction again based on the extraction result of the third layer; the fifth layer is a fully connected layer with 128 units, which finally outputs the batch of face regions with non-faces filtered, reflecting the following results: 1. whether each region is a face; 2. the face region frame position. It should be noted that, in practical applications, the structure is not limited to the above five-layer convolutional neural network model, and those skilled in the art may also adopt other lightweight model structures with different numbers of layers according to actual requirements.
Further, each layer of the network of the second neural network model outputs candidate face region frames, and non-face regions are filtered out from the batch of face regions by calculating the overlap between the candidate face region frames and screening out candidate face region frames whose overlap is smaller than the set overlap threshold.
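Overlap-based screening of candidate frames is commonly realized as intersection-over-union (IoU) non-maximum suppression; a minimal sketch, assuming boxes in (x1, y1, x2, y2) form and an illustrative threshold of 0.5:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def filter_candidates(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring frame,
    drop candidates whose IoU with an already kept frame exceeds thresh."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]
```

Two heavily overlapping candidates collapse to the higher-scoring one, while distant candidates survive independently.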
S103, processing the batch of face regions with the non-faces filtered according to a third neural network model to determine a unique face region at the moment T;
in this embodiment, when the batch of face regions with non-faces filtered are processed according to the third neural network model in step S103 to determine the unique face region at time T, the batch of face regions with non-faces filtered may be sequentially processed according to different convolution layers and convolution kernels configured in the third neural network model to determine the unique face region and the face key point position at time T.
Illustratively, the third neural network model is a six-layer convolutional neural network model: the first layer is an input layer, and the output of the second neural network model in step S102 is resized to 48 × 48 pictures as input. The second layer is a 3 × 3 convolution layer with a 23 × 32 convolution kernel, which performs feature extraction on the 48 × 48 pictures. The third layer is a 3 × 3 convolution layer with a 10 × 64 convolution kernel, which performs feature extraction based on the output of the second layer. The fourth layer is a 2 × 2 convolution layer with a 4 × 64 convolution kernel, which performs feature extraction based on the output of the third layer. The fifth layer is a 2 × 2 convolution layer with a 3 × 128 convolution kernel, which performs feature extraction based on the output of the fourth layer. The sixth layer is a fully connected layer that finally outputs the unique face region at time T, reflecting the following results: 1. whether the region is a human face; 2. the position of the face region frame; 3. the calibration region and key point positions of the final face frame.
S104, tracking the unique face area at the moment T according to the face tracking model to determine the unique face area at the moment T + 1;
in this embodiment, when the unique face region at the time T is tracked according to the face tracking model in step S104 to determine the unique face region at the time T +1, the unique face region at the time T is specifically tracked according to a position filter and a scale filter in the face tracking model to determine the unique face region at the time T + 1.
In this embodiment, the two filters are a position filter and a scale filter: the former locates the face in the current image frame, and the latter estimates the face scale in the current image frame. The two filters are relatively independent, so different feature types and feature calculation modes can be selected for training and testing. When tracking the target in a new image frame, the two-dimensional position filter is first used to determine the new candidate position of the target; the one-dimensional scale filter then takes the current center position of the target as the center point and evaluates candidate frames at different scales to find the best matching scale. The frame rate can reach 100+ fps with accuracy above 0.8, which fully meets the requirements of face calibration on a mobile terminal.
In step S104, according to the calibration result of the third neural network model to the face, the position and the scale of the face in the image frame to be processed are obtained; and determining the position and the scale of the human face in an image frame after the image frame to be processed according to the position and the scale and a preset position model and a scale model. Optionally, after determining the position and the scale of the human face in an image frame after the image frame to be processed, a preset position model and a preset scale model may be updated according to the determined position and scale.
The image frame after the image frame to be processed may be an image frame next to the current image frame, or may be an image frame after several frames apart.
Illustratively, the input (input) of the face tracking model includes: 1) an image i (t) at time t; 2) the face position P (t-1) and the scale S (t-1) of the previous frame; 3) the position models A _ trans (t-1), B _ trans (t-1) and the scale models A _ scale (t-1), B _ scale (t-1) of the previous frame. The output (output) comprises: 1) the face estimation position P (t) and the estimation scale S (t) of the current frame; 2) updated position models A _ trans (t), B _ trans (t) and scale models A _ scale (t), B _ scale (t).
Wherein the position model and the scale model can be determined by:
For a certain desired output image g, the relation between an input image f and a filter h can be expressed as the following formula (2):

$$g = f \star h \tag{2}$$

where $\star$ denotes cross-correlation.
According to the convolution theorem, the Fourier transform of a cross-correlation equals the pointwise product of the individual Fourier transforms; applying the Fourier transform to formula (2) gives the following formula (3):

$$\mathcal{F}(g) = \mathcal{F}(f) \odot \overline{\mathcal{F}(h)} \tag{3}$$

where $\mathcal{F}(\cdot)$ denotes the Fourier transform and $\overline{\mathcal{F}(h)}$ denotes the complex conjugate of $\mathcal{F}(h)$.
Writing $G$ for $\mathcal{F}(g)$, $F$ for $\mathcal{F}(f)$, and $\bar{H}$ for $\overline{\mathcal{F}(h)}$, formula (3) simplifies to the following formula (4):

$$G = F \odot \bar{H} \tag{4}$$
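Formula (4) can be verified numerically: the circular cross-correlation of f and h, taken to the Fourier domain, equals the product of F and the complex conjugate of H (shown here on a 1-D signal with numpy; the patent applies the same identity in 2-D):

```python
import numpy as np

# Check formula (4): the Fourier transform of the circular
# cross-correlation g[k] = sum_n f[(n + k) mod N] * h[n] equals
# F(f) times the complex conjugate of F(h).
rng = np.random.default_rng(0)
f = rng.standard_normal(8)
h = rng.standard_normal(8)

g_direct = np.array([np.dot(np.roll(f, -k), h) for k in range(8)])
g_fourier = np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(h))).real

print(np.allclose(g_direct, g_fourier))  # True
```

This identity is what lets the tracker train and evaluate its filters with a handful of FFTs instead of explicit sliding-window correlation.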
A linear least-squares error function is set as the following formula (5):

$$\varepsilon = \left\| \sum_{l=1}^{d} \bar{H}^{l} \odot F^{l} - G \right\|^{2} + \lambda \sum_{l=1}^{d} \left\| H^{l} \right\|^{2} \tag{5}$$

where $\varepsilon$ represents the error; $l = 1, \dots, d$, with $d$ the dimension of the feature vector of image $F$; $H^{l}$ is the filter for the $l$-th feature dimension; $F^{l}$ is the $l$-th feature channel of $F$; $\|\cdot\|$ denotes the Euclidean distance and $\|\cdot\|^{2}$ its square; the second term is a regularization term that reduces the over-fitting problem in the optimization; and $\lambda$ is the weight parameter of the regularization term.
Minimizing the error function of formula (5) yields the final solved filter as the following formula (6):

$$H^{l} = \frac{\bar{G} \odot F^{l}}{\sum_{k=1}^{d} \bar{F}^{k} \odot F^{k} + \lambda} \tag{6}$$

where $\bar{G}$ represents the complex conjugate of $G$, $\bar{F}^{k}$ represents the complex conjugate of $F^{k}$, and $k = 1, \dots, d$, with $d$ the dimension of the feature vector of image $F$.
The filter at a certain time $t$ is then computed with the running averages of the following formula (7):

$$H_{t}^{l} = \frac{A_{t}^{l}}{B_{t} + \lambda} \tag{7}$$

where

$$A_{t}^{l} = (1 - \eta)\, A_{t-1}^{l} + \eta\, \bar{G}_{t} \odot F_{t}^{l}, \qquad B_{t} = (1 - \eta)\, B_{t-1} + \eta \sum_{k=1}^{d} \bar{F}_{t}^{k} \odot F_{t}^{k}$$

and $t = 1, \dots, N$, with $N$ the number of image frames; $\eta$ is a training parameter that may be interpreted as a learning rate.
In performing position tracking, G, F, position models a _ trans () and B _ trans (), which are position-dependent, are acquired based on a position filter; in scale tracking, scale-dependent G, F, scale models A _ scale () and B _ scale () are obtained based on the scale filter.
The process of performing position estimation is as follows: a) on the current image frame I(t), sampling at twice the target size around the face position P(t-1) and scale S(t-1) of the previous image frame to obtain a sample Z_trans; b) calculating the position response from the position models A_trans(t-1) and B_trans(t-1) of the previous image frame according to:

$$y_{\mathrm{trans}} = \mathcal{F}^{-1}\left\{ \frac{\sum_{l=1}^{d} \bar{A}_{\mathrm{trans}}^{\,l} \odot Z_{\mathrm{trans}}^{\,l}}{B_{\mathrm{trans}} + \lambda} \right\}$$

c) obtaining the face position P(t) = argmax(y_trans).

Here $y_{\mathrm{trans}}$ represents the position filter response value, $\mathcal{F}^{-1}$ denotes the inverse discrete Fourier transform, $Z_{\mathrm{trans}}^{\,l}$ is the $l$-th feature channel of the sample extracted from the $t$-th frame, $l = 1, \dots, d$ with $d$ the dimension of the feature vector of the image, and $\lambda$ represents the weight parameter.
The procedure for scale estimation is as follows: a) extracting face samples Z_scale at different scales; b) calculating y_scale in the same manner as above, and obtaining the face scale S(t) = argmax(y_scale).
The process of performing the model update is as follows: a) extracting training samples f_trans and f_scale from the current image frame I(t), extracting the corresponding HOG features and grayscale features, and constructing a Gaussian response function of the corresponding scale; b) updating the position models A_trans(t-1), B_trans(t-1) and the scale models A_scale(t-1), B_scale(t-1) to obtain A_trans(t), B_trans(t), A_scale(t) and B_scale(t).
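The filter of formula (6), its running update of formula (7), and the response computation used for position and scale estimation can be sketched in 1-D numpy (a simplified single-channel version; the patent's tracker uses multi-channel HOG and gray features in 2-D, and the λ and η values here are illustrative):

```python
import numpy as np

LAMBDA, ETA = 0.01, 0.025  # regularization weight and learning rate (assumed)

def train(f, g):
    """Formula (6): per-frame numerator A and denominator B of the filter."""
    F, G = np.fft.fft(f), np.fft.fft(g)
    return np.conj(G) * F, np.conj(F) * F

def update(A, B, f, g, eta=ETA):
    """Formula (7): running average of numerator and denominator."""
    A_t, B_t = train(f, g)
    return (1 - eta) * A + eta * A_t, (1 - eta) * B + eta * B_t

def respond(A, B, z):
    """Response on a new sample z; the peak index gives the new target
    position (position filter) or the best matching scale (scale filter)."""
    y = np.fft.ifft(np.conj(A) * np.fft.fft(z) / (B + LAMBDA)).real
    return int(np.argmax(y))

# Toy example: target appearance f and desired Gaussian response g,
# both centred at index 16 of a 32-sample window.
x = np.arange(32)
f = np.exp(-0.5 * ((x - 16) / 3.0) ** 2)
g = np.exp(-0.5 * ((x - 16) / 2.0) ** 2)
A, B = train(f, g)
print(respond(A, B, np.roll(f, 5)))  # target shifted by 5 -> peak at 21
```

Because training, update, and response all run in the Fourier domain, each step costs only a few FFTs, which is what makes the 100+ fps frame rate mentioned above plausible.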
S105, judging whether the tracking is successful;
in this embodiment, in step S105, it may be specifically determined whether the unique face region tracking is successful according to the unique face region at the time T and the unique face region at the time T + 1.
In step S105, specifically, determining whether the unique face region tracking is successful according to the unique face region at the time T and the unique face region at the time T +1 includes: if the overlap of the face region frames of the unique face region at the time T and the unique face region at the time T +1 is equal to a set overlap threshold value, judging that the unique face region is successfully tracked; or if the overlap of the face region frames of the unique face region at the time T and the unique face region at the time T +1 is smaller than or larger than a set overlap threshold, determining that the unique face region tracking fails.
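A common reading of this test is that tracking succeeds when the overlap of the two face region frames reaches the set threshold; a minimal sketch under that assumption, with boxes in (x1, y1, x2, y2) form and an illustrative threshold of 0.6:

```python
def box_iou(a, b):
    """Overlap (intersection-over-union) of two face region frames
    given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def tracking_succeeded(face_t, face_t1, threshold=0.6):
    """S105: compare the unique face region at time T with the region at
    time T+1; the threshold value is an assumed example."""
    return box_iou(face_t, face_t1) >= threshold
```

A nearly identical frame at T+1 passes the test; a frame that has drifted far from the T region fails it and triggers re-detection.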
S106, if the tracking is successful, judging whether to continue to calibrate;
optionally, if it is determined that the unique face region tracking is successful, taking the unique face region at the time T +1 as an input of the third neural network model, so as to process the batch of face regions with non-faces filtered, so as to determine the unique face region at the time T + 2.
If the tracking fails, the process returns to step S101; that is, if the unique face region tracking is judged to have failed, the method jumps back to the step of processing the face pictures according to the first neural network model, so as to determine a batch of face regions again.
The specific way of judging whether to continue calibration can be determined by a set calibration flag or a set condition for continuing calibration, such as the number of continuous calibration.
S107A, if the calibration is continued, acquiring the unique face area at the moment of determining T +1, and jumping to the step S103;
If the tracking is successful, the tracked target region is cropped directly from the unique face region at time T + 1 and used as the input of the third neural network model, jumping to step S103, so that a more accurate face region position and key point calibration positions can be obtained. Because the most time-consuming step S101 is skipped here, the prediction time is improved by a factor of about 3, greatly accelerating prediction efficiency. Moreover, since the input of the third neural network model is the successfully tracked unique face region, that is, an accurately tracked target frame, the finally output face calibration position and key point positions are more stable.
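The feedback loop of steps S101 through S107 can be sketched as control flow; all model calls below are stubs, and the function names are illustrative rather than taken from the patent:

```python
def calibrate_stream(frames, detect, refine, track, tracking_ok):
    """Feedback loop of steps S101-S107: full detection (S101-S102) runs
    only when there is no tracked region; otherwise the tracked region is
    fed straight back into the third-stage network, skipping S101."""
    region, results = None, []
    for frame in frames:
        if region is None:
            region = detect(frame)          # S101-S102: cascade detection
        face = refine(frame, region)        # S103: third neural network
        results.append(face)
        tracked = track(frame, face)        # S104: correlation tracking
        # S105-S107: keep the tracked region on success, else re-detect.
        region = tracked if tracking_ok(face, tracked) else None
    return results
```

Passing a `tracking_ok` that always fails degenerates to running full detection on every frame, which is exactly the cost the feedback mechanism avoids.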
S107B, if the calibration is not continued, the process ends.
It should be noted that other embodiments may include only steps S101 to S104; steps S105 to S107 are further optimizations or further applied technical solutions.
Fig. 2 is a schematic structural diagram of a human face region calibration device in the second embodiment of the present application; as shown in fig. 2, it includes:
the first program unit is used for processing the face pictures according to the first neural network model so as to determine a batch of face regions;
the second program unit is used for processing the batch of face regions according to a second neural network model so as to filter out non-face regions from the batch of face regions;
a third program unit, configured to process the batch of face regions with non-faces filtered according to a third neural network model, so as to determine a unique face region at time T;
and the fourth program unit is used for tracking the unique face area at the time T according to the face tracking model so as to determine the unique face area at the time T + 1.
Specifically, in this embodiment, the first program unit is further configured to process the face pictures in sequence according to different convolution layers and convolution kernels configured in the first neural network model, so as to determine a batch of face regions.
Specifically, in this embodiment, the first program unit is further configured to process the face picture successively according to different convolution layers and convolution kernels configured in the first neural network model, so as to obtain a plurality of candidate face region frames respectively; and determining a batch of face regions according to the overlapping of the candidate face region frames and the set overlapping threshold value.
Specifically, in this embodiment, the second program unit is further configured to process the batch of face regions in sequence according to different convolution layers and convolution kernels configured in the second neural network model, so as to filter out non-face regions from the batch of face regions.
Specifically, in this embodiment, the second program unit is further configured to sequentially process the batch of face regions according to different convolution layers and convolution kernels configured in the second neural network model, so as to obtain a plurality of candidate face region frames respectively; and filtering out non-face regions from the batch of face regions according to the overlapping of the candidate face region frames and a set overlapping threshold value.
Specifically, in this embodiment, the third program unit is further configured to sequentially process the batch of face regions with non-faces filtered according to different convolution layers and convolution kernels configured in the third neural network model, so as to determine a unique face region and a face key point position at time T.
Specifically, in this embodiment, the fourth program unit is further configured to track the unique face region at time T according to a position filter and a scale filter in the face tracking model, so as to determine the unique face region at time T + 1.
Fig. 3 is a schematic structural diagram of a face area calibration device in the third embodiment of the present application; as shown in fig. 3, it may include, in addition to the first program unit, the second program unit, the third program unit and the fourth program unit in fig. 2:
the conversion unit is used for acquiring an acquired original face picture and carrying out zooming processing on the original face picture to obtain image pyramids with different sizes;
and the input unit is used for taking the image pyramids with different sizes as the input of the first neural network model, so that the first neural network model processes the face pictures to determine a batch of face regions.
In a specific implementation, the conversion unit and the input unit may be used as a substructure of the first program unit, or may be a structure independent of the first program unit.
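The image pyramid produced by the conversion unit can be sketched as repeated scaling by a fixed factor down to a minimum size; the factor 0.709 and the 12-pixel floor below are assumptions commonly used with cascades of this kind, not values stated in the patent:

```python
def pyramid_sizes(width, height, factor=0.709, min_size=12):
    """Sizes of the scaled copies of the original face picture that are
    fed to the first neural network model (factor and floor assumed)."""
    sizes = []
    w, h = float(width), float(height)
    while min(w, h) >= min_size:
        sizes.append((int(w), int(h)))
        w, h = w * factor, h * factor
    return sizes

print(pyramid_sizes(100, 100)[:3])  # [(100, 100), (70, 70), (50, 50)]
```

Each scale lets the fixed-size first-stage network detect faces of a different absolute size in the original picture.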
Fig. 4 is a schematic structural diagram of a face area calibration device in the fourth embodiment of the present application; as shown in fig. 4, it includes, in addition to the first program unit, the second program unit, the third program unit, the fourth program unit, the conversion unit, and the input unit in fig. 2: and the feedback unit is used for judging whether the unique face area tracking is successful according to the unique face area at the moment T and the unique face area at the moment T + 1.
In specific implementation, the feedback unit is further configured to determine that the unique face area is successfully tracked if the overlap between the face area frames of the unique face area at the time T and the face area frames of the unique face area at the time T +1 is equal to a set overlap threshold; or if the overlap of the face region frames of the unique face region at the time T and the unique face region at the time T +1 is smaller than or larger than a set overlap threshold, determining that the unique face region tracking fails.
In specific implementation, if it is determined that the unique face region tracking is successful, the feedback unit is further configured to use the unique face region at the time T +1 as an input of the third neural network model, so as to process the batch of face regions with non-faces filtered, so as to determine the unique face region at the time T + 2.
It should be noted that the terms first, second, third and fourth are merely labels and do not limit the number of units; for those skilled in the art, the program modules may be multiplexed or shared, so the number of program modules may be fewer than four.
In addition, the program modules are not necessarily located at the same physical location, but may be based on a distributed architecture, such as being partially located on a front-end mobile terminal and partially located on a back-end server.
The embodiment of the present application further provides an electronic device, which includes the face region calibration apparatus in any one of the above embodiments. The electronic equipment can be a PC or a mobile terminal. The technical scheme of the embodiment of the application can be applied to scenes such as expression recognition, generation of human face cartoon, reality enhancement, face changing and the like.
In the embodiments of the present application, the face pictures are processed according to the first neural network model to determine a batch of face regions; the batch of face regions is processed according to the second neural network model to filter out non-face regions; the batch of face regions with non-faces filtered is processed according to the third neural network model to determine the unique face region at time T; and the unique face region at time T is tracked according to the face tracking model to determine the unique face region at time T + 1. Because the multiple neural network models perform multi-layer convolution operations, face feature points are extracted efficiently and accurately, greatly improving the robustness of face calibration. In addition, the method is not affected by complex environments. Meanwhile, compared with the most advanced current neural network algorithms, prediction efficiency is improved by 300% without affecting accuracy. Further, when the feedback mechanism is introduced, calibration efficiency and stability are improved.
The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions and/or portions thereof that contribute to the prior art may be embodied in the form of a software product that can be stored on a computer-readable storage medium including any mechanism for storing or transmitting information in a form readable by a computer (e.g., a computer). For example, a machine-readable medium includes Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media, optical storage media, flash memory storage media, electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others, and the computer software product includes instructions for causing a computing device (which may be a personal computer, server, or network device, etc.) to perform the methods described in the various embodiments or portions of the embodiments.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the embodiments of the present application, and are not limited thereto; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus (device), or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Claims (25)
1. A face region calibration method is characterized by comprising the following steps:
processing the face pictures according to the first neural network model to determine a batch of face regions;
processing the batch of face regions according to a second neural network model to filter out non-face regions from the batch of face regions;
processing the batch of face regions with the non-faces filtered according to a third neural network model to determine a unique face region at the moment T, wherein T is greater than 0;
and tracking the unique face region at the moment T according to the face tracking model to determine the unique face region at the moment T + 1.
2. The method of claim 1, further comprising:
acquiring an acquired original face picture, and carrying out scaling processing on the original face picture to obtain image pyramids with different sizes;
and taking the image pyramids with different sizes as the input of the first neural network model, so that the first neural network model processes the face pictures to determine a batch of face regions.
3. The method of claim 2, wherein obtaining the captured original face picture comprises: and acquiring an original face picture acquired by an image acquisition unit arranged on the electronic terminal through a development interface of the electronic terminal.
4. The method of claim 1, wherein processing the face pictures according to the first neural network model to determine a set of face regions comprises: and processing the face pictures in sequence according to different convolution layers and convolution kernels configured in the first neural network model to determine a batch of face regions.
5. The method of claim 4, wherein successively processing the face pictures according to different convolutional layers and convolutional kernels configured in the first neural network model to determine a batch of face regions comprises: processing the face picture according to different convolution layers and convolution kernels configured in the first neural network model to respectively obtain a plurality of candidate face area frames; and determining a batch of face regions according to the overlapping of the candidate face region frames and the set overlapping threshold value.
6. The method of claim 1, wherein processing the collection of face regions according to a second neural network model to filter out non-face regions from the collection of face regions comprises: and processing the face regions in sequence according to different convolution layers and convolution kernels configured in the second neural network model so as to filter out non-face regions from the face regions.
7. The method of claim 6, wherein successively processing the plurality of face regions according to different convolutional layers and convolutional kernels configured in a second neural network model to filter out non-face regions from the plurality of face regions comprises: processing the face regions in the batch in sequence according to different convolution layers and convolution kernels configured in a second neural network model to respectively obtain a plurality of candidate face region frames; and filtering out non-face regions from the batch of face regions according to the overlapping of the candidate face region frames and a set overlapping threshold value.
8. The method of claim 1, wherein processing the batch of face regions with non-faces filtered according to a third neural network model to determine a unique face region at time T comprises: and processing the batch of face regions with the non-faces filtered according to different convolution layers and convolution kernels configured in the third neural network model to determine the unique face region and the position of a face key point at the time T.
9. The method of claim 1, wherein tracking the unique face region at time T according to a face tracking model to determine the unique face region at time T +1 comprises: and tracking the unique face region at the time T according to a position filter and a scale filter in the face tracking model to determine the unique face region at the time T + 1.
10. The method of claim 1, further comprising: and judging whether the unique face area tracking is successful or not according to the unique face area at the moment T and the unique face area at the moment T + 1.
11. The method of claim 10, wherein determining whether the unique face region tracking is successful according to the unique face region at the time T and the unique face region at the time T +1 comprises: if the overlap of the face region frames of the unique face region at the time T and the unique face region at the time T +1 is equal to a set overlap threshold value, judging that the unique face region is successfully tracked; or if the overlap of the face region frames of the unique face region at the time T and the unique face region at the time T +1 is smaller than or larger than a set overlap threshold, determining that the unique face region tracking fails.
12. The method of claim 11, wherein if the unique face region tracking is determined to be successful, taking a unique face region at time T +1 as an input of the third neural network model to process the batch of face regions with non-faces filtered to determine a unique face region at time T + 2.
13. The method as claimed in claim 11, wherein if the unique face region tracking is determined to fail, the step of processing the face pictures according to the first neural network model is skipped to determine a batch of face regions again.
14. A face region calibration device is characterized by comprising:
the first program unit is used for processing the face pictures according to the first neural network model so as to determine a batch of face regions;
the second program unit is used for processing the batch of face regions according to a second neural network model so as to filter out non-face regions from the batch of face regions;
a third program unit, configured to process the batch of face regions with non-faces filtered according to a third neural network model, so as to determine a unique face region at time T;
and the fourth program unit is used for tracking the unique face area at the time T according to the face tracking model so as to determine the unique face area at the time T + 1.
15. The apparatus of claim 14, further comprising:
the conversion unit is used for acquiring an acquired original face picture and carrying out zooming processing on the original face picture to obtain image pyramids with different sizes;
and the input unit is used for taking the image pyramids with different sizes as the input of the first neural network model, so that the first neural network model processes the face pictures to determine a batch of face regions.
16. The apparatus according to claim 14, wherein the first program unit is further configured to process the face pictures sequentially according to different convolutional layers and convolutional kernels configured in the first neural network model to determine a batch of face regions.
17. The apparatus according to claim 16, wherein the first program unit is further configured to process the face picture in sequence according to different convolution layers and convolution kernels configured in the first neural network model to obtain a plurality of candidate face region frames, respectively; and determining a batch of face regions according to the overlapping of the candidate face region frames and the set overlapping threshold value.
18. The apparatus of claim 14, wherein the second program unit is further configured to process the batch of face regions sequentially according to different convolutional layers and convolutional kernels configured in a second neural network model, so as to filter out non-face regions from the batch of face regions.
19. The apparatus according to claim 18, wherein the second program unit is further configured to process the batch of face regions in sequence according to different convolutional layers and convolutional kernels configured in a second neural network model to obtain a plurality of candidate face region frames, respectively; and filtering out non-face regions from the batch of face regions according to the overlapping of the candidate face region frames and a set overlapping threshold value.
20. The apparatus according to claim 14, wherein the third program unit is further configured to process the batch of face regions from which non-face regions have been filtered, sequentially through the different convolution layers and convolution kernels configured in a third neural network model, so as to determine the unique face region and the face key point positions at time T.
21. The apparatus according to claim 14, wherein the fourth program unit is further configured to track the unique face region at time T according to a position filter and a scale filter in the face tracking model, so as to determine the unique face region at time T+1.
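Claim 21's position filter and scale filter echo correlation-filter trackers such as DSST, where one filter's response map localizes the target and a second filter scores a handful of candidate scales. The sketch below shows only the final selection step and assumes the response maps have already been computed elsewhere; the function name, box format, and defaults are illustrative, not from the patent.

```python
import numpy as np

def track_step(position_response, scale_responses, prev_box, scales):
    """One tracking update in the spirit of claim 21.
    position_response: 2D response map of the position filter, centred
      on the previous target location; its peak offset gives the shift.
    scale_responses: one correlation score per tested scale factor.
    prev_box: (cx, cy, w, h) of the unique face region at time T.
    Returns the box at time T+1."""
    # Translation: offset of the response peak from the map centre.
    py, px = np.unravel_index(np.argmax(position_response),
                              position_response.shape)
    dy = py - position_response.shape[0] // 2
    dx = px - position_response.shape[1] // 2
    # Scale: the candidate with the strongest scale-filter response.
    s = scales[int(np.argmax(scale_responses))]
    cx, cy, w, h = prev_box
    return (cx + dx, cy + dy, w * s, h * s)
```

Separating position from scale is the design choice that makes such trackers fast: the expensive 2D search only handles translation, while scale is a cheap 1D search over a few candidates.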
22. The apparatus of claim 14, further comprising: a feedback unit configured to judge, according to the unique face region at time T and the unique face region at time T+1, whether tracking of the unique face region succeeded.
23. The apparatus according to claim 22, wherein the feedback unit is further configured to determine that tracking of the unique face region succeeds if the overlap between the face region frames of the unique face region at time T and the unique face region at time T+1 equals a set overlap threshold, or to determine that tracking of the unique face region fails if that overlap is smaller or larger than the set overlap threshold.
24. The apparatus of claim 23, wherein, if it is determined that tracking of the unique face region succeeded, the feedback unit is further configured to use the unique face region at time T+1 as an input of the third neural network model, so that the batch of face regions from which non-face regions have been filtered is processed to determine the unique face region at time T+2.
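Claims 22–24 close the loop by comparing the tracked frame against the detected one. Reading the overlap test as intersection-over-union at or above a threshold (the translated claim's literal "equal to" reads like a translation artifact, so the `>=` here is an assumption), a sketch:

```python
def tracking_succeeded(box_t, box_t1, overlap_threshold=0.5):
    """Feedback check in the spirit of claims 22-23: compare the unique
    face region at time T with the tracked region at time T+1 via their
    intersection-over-union. Boxes are (x1, y1, x2, y2); the at-or-above
    -threshold success condition is an assumption, not the claim text."""
    ix1, iy1 = max(box_t[0], box_t1[0]), max(box_t[1], box_t1[1])
    ix2, iy2 = min(box_t[2], box_t1[2]), min(box_t[3], box_t1[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
             + (box_t1[2] - box_t1[0]) * (box_t1[3] - box_t1[1]) - inter)
    return inter / union >= overlap_threshold
```

On success, the T+1 region is fed back into the third neural network model (claim 24); on failure, detection would restart from the full cascade.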
25. An electronic device, characterized by comprising the face region calibration apparatus of any one of claims 14 to 24.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810096476.8A CN108229432A (en) | 2018-01-31 | 2018-01-31 | Face calibration method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108229432A | 2018-06-29 |
Family
ID=62670331
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810096476.8A Pending CN108229432A (en) | 2018-01-31 | 2018-01-31 | Face calibration method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108229432A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130286218A1 (en) * | 2012-04-27 | 2013-10-31 | Canon Kabushiki Kaisha | Image recognition device that recognizes specific object area, method of controlling the device, and storage medium, as well as image pickup apparatus, and display device |
CN106650699A (en) * | 2016-12-30 | 2017-05-10 | 中国科学院深圳先进技术研究院 | CNN-based face detection method and device |
CN107578423A (en) * | 2017-09-15 | 2018-01-12 | 杭州电子科技大学 | The correlation filtering robust tracking method of multiple features hierarchical fusion |
CN107609497A (en) * | 2017-08-31 | 2018-01-19 | 武汉世纪金桥安全技术有限公司 | The real-time video face identification method and system of view-based access control model tracking technique |
CN107644430A (en) * | 2017-07-27 | 2018-01-30 | 孙战里 | Target following based on self-adaptive features fusion |
- 2018-01-31: Application CN201810096476.8A filed in China, published as CN108229432A (legal status: Pending)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472247A (en) * | 2018-11-16 | 2019-03-15 | 西安电子科技大学 | Face identification method based on the non-formula of deep learning |
CN109472247B (en) * | 2018-11-16 | 2021-11-30 | 西安电子科技大学 | Face recognition method based on deep learning non-fit type |
CN109635749A (en) * | 2018-12-14 | 2019-04-16 | 网易(杭州)网络有限公司 | Image processing method and device based on video flowing |
CN109858435A (en) * | 2019-01-29 | 2019-06-07 | 四川大学 | A kind of lesser panda individual discrimination method based on face image |
CN111488774A (en) * | 2019-01-29 | 2020-08-04 | 北京搜狗科技发展有限公司 | Image processing method and device for image processing |
CN109858435B (en) * | 2019-01-29 | 2020-12-01 | 四川大学 | Small panda individual identification method based on face image |
CN110046602A (en) * | 2019-04-24 | 2019-07-23 | 李守斌 | Deep learning method for detecting human face based on classification |
Similar Documents
Publication | Title |
---|---|
US11908244B2 | Human posture detection utilizing posture reference maps |
JP7490141B2 | Image detection method, model training method, image detection apparatus, training apparatus, device, and program |
JP7386545B2 | Method for identifying objects in images and mobile device for implementing the method |
CN111160533B | Neural network acceleration method based on cross-resolution knowledge distillation |
CN111160269A | Face key point detection method and device |
CN108229432A | Face calibration method and device |
CN109409222A | A multi-view facial expression recognition method based on a mobile terminal |
CN111199230B | Method, device, electronic equipment and computer readable storage medium for target detection |
Tian et al. | Ear recognition based on deep convolutional network |
CN110069985B | Image-based target point position detection method and device and electronic equipment |
CN112381061B | Facial expression recognition method and system |
CN112818764A | Low-resolution image facial expression recognition method based on feature reconstruction model |
CN110245621B | Face recognition device, image processing method, feature extraction model, and storage medium |
Zhao et al. | Applying contrast-limited adaptive histogram equalization and integral projection for facial feature enhancement and detection |
Kishore et al. | Selfie sign language recognition with convolutional neural networks |
CN112861718A | Lightweight feature fusion crowd counting method and system |
CN108876776B | Classification model generation method, fundus image classification method and device |
CN110826534B | Face key point detection method and system based on local principal component analysis |
CN109508640A | Crowd emotion analysis method and device and storage medium |
CN112329663A | Micro-expression time detection method and device based on face image sequence |
Oliveira et al. | A comparison between end-to-end approaches and feature extraction based approaches for sign language recognition |
Zhang et al. | A simple and effective static gesture recognition method based on attention mechanism |
CN118314618A | Eye movement tracking method, device, equipment and storage medium integrating iris segmentation |
Rasel et al. | An efficient framework for hand gesture recognition based on histogram of oriented gradients and support vector machine |
Marjusalinah et al. | Classification of finger spelling American sign language using convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
2020-05-22 | TA01 | Transfer of patent application right | Address after: Room 508, Floor 5, Building 4, No. 699 Wangshang Road, Changhe Street, Binjiang District, Hangzhou, Zhejiang 310051. Applicant after: Alibaba (China) Co.,Ltd. Address before: Floor 14, Tower B, Pingyun Plaza, No. 163 Pingyun Road West, Huangpu Avenue, Tianhe District, Guangzhou, Guangdong 510627. Applicant before: Guangzhou Dongjing Computer Technology Co.,Ltd. |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 2018-06-29 |