CN114267068B - Face recognition method based on continuous frame information, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114267068B
CN114267068B (application CN202111602174.1A; published as CN114267068A)
Authority
CN
China
Prior art keywords
face
space
time
depth
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111602174.1A
Other languages
Chinese (zh)
Other versions
CN114267068A (en)
Inventor
朱海涛 (Zhu Haitao)
寇鸿斌 (Kou Hongbin)
陈智超 (Chen Zhichao)
户磊 (Hu Lei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Dilusense Technology Co Ltd
Original Assignee
Hefei Dilusense Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Dilusense Technology Co Ltd filed Critical Hefei Dilusense Technology Co Ltd
Priority to CN202111602174.1A
Publication of CN114267068A
Application granted
Publication of CN114267068B
Legal status: Active
Anticipated expiration

Abstract

The embodiment of the invention relates to the field of face recognition, and discloses a face recognition method based on continuous frame information, an electronic device and a storage medium. Continuous frame depth images of a face to be recognized are acquired while the head is shaken in the horizontal direction; the space-time coordinates of various face key points in the images are labeled to form space-time coordinate sequences; each space-time coordinate sequence is divided into several sequence segments, taking the turning points of the horizontal spatial coordinate over time as boundaries; curve fitting is performed on the space-time coordinates to obtain a space-time curve for each sequence segment; for the space-time curve of a face key point on the vertical center line of the face, the time corresponding to the maximum of the depth-direction spatial coordinate is taken as the frontal-face moment; and the identity of the face to be recognized is determined based on the depth values of the various face key points of the face to be recognized at the frontal-face moment and the depth values of the corresponding face key points of registered faces, so that the recognition accuracy for large-angle faces can be improved.

Description

Face recognition method based on continuous frame information, electronic equipment and storage medium
Technical Field
The present invention relates to the field of face recognition, and in particular, to a face recognition method based on continuous frame information, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence technology, face recognition has become an important mode of human-computer interaction and is widely applied in fields such as security monitoring, intelligent payment, social media and medical treatment. However, existing face recognition systems perform poorly on large-angle faces: the features of a frontal face and of a strongly rotated face differ substantially, and it cannot be guaranteed that the image to be recognized shows a strictly frontal face, so the matching results are unsatisfactory and recognition accuracy drops.
Therefore, a face recognition method that remains accurate for large-angle faces is needed.
Disclosure of Invention
The embodiment of the invention aims to provide a face recognition method based on continuous frame information, an electronic device and a storage medium, which can improve the recognition precision of a large-angle face.
In order to solve the above technical problem, an embodiment of the present invention provides a face recognition method based on continuous frame information, including:
acquiring continuous frame depth images of a face to be recognized while the head is shaken in the horizontal direction;
labeling the space-time coordinates of various face key points in the continuous frame depth images, and forming a space-time coordinate sequence for each face key point;
for each space-time coordinate sequence, dividing the sequence into several sequence segments, taking the turning points of the horizontal spatial coordinate over time as boundaries;
performing curve fitting on the space-time coordinates in each sequence segment to obtain a space-time curve of the sequence segment;
for the space-time curve of any face key point on the vertical center line of the face, taking the time point corresponding to the maximum of the depth-direction spatial coordinate as the frontal-face moment, and determining the depth values of the various face key points at the frontal-face moment from the depth-direction spatial coordinates at that moment in each space-time curve;
and determining the identity of the face to be recognized based on the depth values of the various face key points of the face to be recognized at the frontal-face moment and the depth values of the corresponding face key points of a registered face.
An embodiment of the present invention also provides an electronic device, including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of face recognition based on continuous frame information as described above.
Embodiments of the present invention also provide a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method for face recognition based on continuous frame information as described above.
Compared with the prior art, the embodiments of the invention acquire continuous frame depth images while the face to be recognized is shaken in the horizontal direction; label the space-time coordinates of various face key points in the continuous frame depth images and form a space-time coordinate sequence for each face key point; divide each space-time coordinate sequence into several sequence segments, taking the turning points of the horizontal spatial coordinate over time as boundaries; perform curve fitting on the space-time coordinates in each sequence segment to obtain the space-time curve of the segment; for the space-time curve of any face key point on the vertical center line of the face, take the time point corresponding to the maximum of the depth-direction spatial coordinate as the frontal-face moment and determine the depth values of the various face key points at that moment from the depth-direction spatial coordinates in each space-time curve; and determine the identity of the face to be recognized based on the depth values of the various face key points of the face to be recognized at the frontal-face moment and the depth values of the corresponding face key points of a registered face. In this scheme, space-time curves of face key points are fitted from consecutive depth images shot while the head is shaken, the time point corresponding to the maximum of the depth-direction spatial coordinate is determined from the curves as the frontal-face moment, and the depth information of the face key points at the frontal-face moment is used for face recognition, thereby improving the recognition accuracy for large-angle faces.
Drawings
FIG. 1 is a first detailed flowchart of a face recognition method based on continuous frame information according to an embodiment of the present invention;
FIG. 2 is a second detailed flowchart of a face recognition method based on continuous frame information according to an embodiment of the present invention;
FIG. 3 is a third detailed flowchart of a face recognition method based on continuous frame information according to an embodiment of the present invention;
FIG. 4 is a fourth detailed flowchart of a face recognition method based on continuous frame information according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for training a feature extraction model according to an embodiment of the invention;
FIG. 6 is a schematic diagram of the network structure of a feature extraction model according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to help the reader better understand the present application; the technical solutions claimed in the present application can, however, be implemented without these technical details, and with various changes and modifications based on the following embodiments.
An embodiment of the present invention relates to a face recognition method based on continuous frame information, and as shown in fig. 1, the face recognition method based on continuous frame information provided in this embodiment includes the following steps.
Step 101: acquiring continuous frame depth images of the face to be recognized while the head is shaken in the horizontal direction.
Specifically, a depth camera may be used to continuously capture face depth images over multiple consecutive frames while the head is shaken in the horizontal direction. During actual shooting, to ensure the accuracy of subsequent image processing, the pose of the face should change only in its azimuth angle (the angle that changes as the head is shaken horizontally), while the pitch angle and roll angle remain essentially unchanged. For example, before the depth images are collected, the person may be reminded to face the camera squarely with eyes looking straight ahead, and then to shake the head from left to right (or repeatedly back and forth) as slowly and uniformly as possible. During the head shake, the depth camera shoots continuously, producing a time-ordered depth image sequence {G_t}, where t is the acquisition time corresponding to depth image G_t.
Step 102: labeling the space-time coordinates of various face key points in the continuous frame depth images, and forming a space-time coordinate sequence for each face key point.
Specifically, in this embodiment, a large number of key points are labeled at the positions of the facial features (such as the left eye corner, right eye corner, nose tip, left mouth corner, right mouth corner, and so on); key points with the same name in the consecutive frame depth images are attributed, through image detection, to the same category of face key point, and key points with different names to different categories. For the face key points of each category, the space-time coordinates of the key point are labeled in the continuous frame depth images. Each space-time coordinate has four dimensions: one time dimension (the time coordinate t) and three spatial dimensions (the spatial coordinates (x, y, z)), where x, y and z are, in order, the spatial coordinates in the horizontal, vertical and depth directions. In each frame of depth image, the time coordinate of any face key point is the acquisition time t of that frame, and the spatial coordinates x, y and z are, in order, the horizontal, vertical and depth coordinates of the key point in that frame.
After the space-time coordinates of the various preset face key points in the continuous frame depth images are labeled, the space-time coordinates of the different categories of face key points can be collected separately, and the space-time coordinates of each category sorted in time order to obtain a space-time coordinate sequence for each category of face key point.
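By way of a non-limiting illustration, the space-time coordinate sequences of step 102 might be organized as follows in Python. The detect_keypoints function is a hypothetical stand-in for whatever labeling or detection routine supplies named key points per frame; it is not specified by the patent.

```python
import numpy as np

def build_sequences(depth_frames, timestamps, detect_keypoints):
    """Collect (x, y, z, t) space-time coordinates per key-point category.

    depth_frames: list of HxW depth images (consecutive frames)
    timestamps:   acquisition time t of each frame
    detect_keypoints: hypothetical detector returning {name: (x_px, y_px)}
                      for the same named key points in every frame
    """
    sequences = {}  # key-point name -> list of [x, y, z, t]
    for frame, t in zip(depth_frames, timestamps):
        for name, (x, y) in detect_keypoints(frame).items():
            z = float(frame[int(y), int(x)])        # depth value at the key point
            sequences.setdefault(name, []).append([x, y, z, t])
    # sort each sequence by time to obtain the space-time coordinate sequence
    return {k: np.array(sorted(v, key=lambda c: c[3])) for k, v in sequences.items()}
```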
Step 103: for each space-time coordinate sequence, dividing the sequence into several sequence segments, taking the turning points of the horizontal spatial coordinate over time as boundaries.
Specifically, each space-time coordinate sequence is generated while the head is shaken in the horizontal direction; as the head shakes back and forth, the horizontal coordinate value of each key point repeatedly changes from small to large and then from large to small. Based on this, the turning points of the horizontal spatial coordinate over time are determined for each space-time coordinate sequence, i.e., the points where the coordinate stops increasing and starts decreasing, and the points where it stops decreasing and starts increasing. The space-time coordinate sequence is then divided into a plurality of sequence segments with these turning points as boundaries.
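A minimal sketch of this segmentation, assuming clean, noise-free coordinate data (real depth data would typically be smoothed first). The drop_turning_points flag anticipates the refinement of sub-steps 1031 and 1032 described later.

```python
import numpy as np

def split_at_turning_points(seq, drop_turning_points=True):
    """Split a space-time sequence (rows [x, y, z, t], time-ordered)
    at turning points of the horizontal coordinate x over time."""
    x = seq[:, 0]
    dx = np.sign(np.diff(x))
    # indices where the monotonic trend of x reverses (turning points)
    turns = [i + 1 for i in range(len(dx) - 1)
             if dx[i] != 0 and dx[i + 1] != 0 and dx[i] != dx[i + 1]]
    segments, start = [], 0
    for j in turns:
        # either discard the turning sample or keep it as a shared boundary
        seg = seq[start:j] if drop_turning_points else seq[start:j + 1]
        if len(seg) >= 2:
            segments.append(seg)
        start = j + 1 if drop_turning_points else j
    if len(seq) - start >= 2:
        segments.append(seq[start:])
    return segments
```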
Step 104: performing curve fitting on the space-time coordinates in each sequence segment to obtain a space-time curve of the sequence segment.
Specifically, curve fitting can be performed on the space-time coordinates in each sequence segment, and the resulting curve taken as the space-time curve of the sequence segment.
In one example, least-squares polynomial curve fitting may be applied to the space-time coordinates in each sequence segment to obtain the space-time curve of the segment. The specific fitting procedure is as follows.
A polynomial curve is fitted by the least squares method; the fitted values of each space-time coordinate (x, y, z, t) on the X, Y and Z axes are expressed as:

x̂(t) = a_0 + a_1·t + a_2·t² + … + a_k·t^k
ŷ(t) = b_0 + b_1·t + b_2·t² + … + b_k·t^k …………(1)
ẑ(t) = c_0 + c_1·t + c_2·t² + … + c_k·t^k

Taking the X axis as an example, only the coefficient vector A = [a_0, a_1, a_2, a_3, …, a_k] needs to be obtained to get the curve of the x coordinate value over time; the coefficient vectors B and C for the Y and Z axes are analogous. Here k may take the empirical value 9.
The fitting criterion adopted in this scheme is the minimum sum of squared deviations: the fitted curve is considered acceptable when the sum of squared differences between the fitted values and the actual values is minimal, namely:

min Σ_{i=1..m} δ_xi², min Σ_{i=1..m} δ_yi², min Σ_{i=1..m} δ_zi² …………(2)

where x̂(t), ŷ(t), ẑ(t) are, in turn, the fitted curves (functions of t) on the coordinate axes X, Y and Z, and δ_xi = x̂(t_i) − x_i, δ_yi = ŷ(t_i) − y_i, δ_zi = ẑ(t_i) − z_i are the differences between the fitted values at the i-th time coordinate t_i and the actual values x_i, y_i, z_i; m is the number of space-time coordinate points used for fitting.
Setting the partial derivatives of the squared-deviation sums with respect to the coefficients to zero yields the normal equations:

S·A = U_x …………(3)
S·B = U_y …………(4)
S·C = U_z …………(5)

where S is the (k+1)×(k+1) matrix with entries S_pq = Σ_{i=1..m} t_i^(p+q) (p, q = 0, 1, …, k), and U_x, U_y, U_z are the vectors with entries (U_x)_p = Σ_{i=1..m} t_i^p·x_i, (U_y)_p = Σ_{i=1..m} t_i^p·y_i, (U_z)_p = Σ_{i=1..m} t_i^p·z_i. With these formulas constructed, the calculation continues as follows.
For the space-time coordinates in any sequence segment, the time value of each non-first coordinate is replaced by its difference from the time value of the first coordinate, and the time coordinate of the first coordinate is set to 0. The updated space-time coordinates [x, y, z, t] in the segment are then used to fit the trajectory curves of the spatial coordinate values (x, y, z) with respect to time, i.e. the spatial fitted curves: the differences between each spatial coordinate (x, y, z) in the segment and the corresponding point on the fitted curve at the same moment are formed, and the coefficient vectors A, B and C are obtained by least-squares minimization of the sum of squared differences.
To fit, one only needs to substitute the actual values [x, y, z, t] within each sequence segment into formulas (3), (4) and (5) and solve for the corresponding coefficients (when substituting, the time coordinate values t must first be converted as described above), after which the three time-dependent curves x(t), y(t) and z(t) of the spatial coordinates are obtained for each sequence segment.
It should be noted that after the curves are fitted, the time coordinate t in each curve must be restored to the time values of the corresponding face key points in the continuous frame depth images: the curve is translated as a whole along the time axis, so that its first time coordinate is restored to the time coordinate of the first space-time coordinate in the sequence, and every other time value in the curve has the time coordinate of the first space-time coordinate added to its current value. The curve with updated time coordinates is the final space-time curve of the sequence segment.
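The following sketch performs the per-segment fitting of step 104 with numpy's least-squares polyfit, which solves the same normal equations as formulas (3)-(5); k = 9 follows the empirical value given above, and the time re-zeroing and restoration mirror the two preceding paragraphs.

```python
import numpy as np

def fit_segment(segment, k=9):
    """Fit degree-k least-squares polynomials x(t), y(t), z(t) for one segment.

    segment: array of rows [x, y, z, t]; time is re-zeroed before fitting and
    the offset t0 is returned so the curves can be shifted back afterwards.
    """
    t0 = segment[0, 3]
    t = segment[:, 3] - t0                  # re-zero the time coordinate
    k = min(k, len(segment) - 1)            # avoid an underdetermined fit
    coeffs = {axis: np.polyfit(t, segment[:, i], k)   # least-squares solution
              for i, axis in enumerate("xyz")}
    return coeffs, t0

def eval_curve(coeffs, t0, axis, t):
    """Evaluate a fitted curve at absolute time t (undoing the time shift)."""
    return np.polyval(coeffs[axis], t - t0)
```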
Step 105: for the space-time curve of any face key point on the vertical center line of the face, taking the time point corresponding to the maximum of the depth-direction spatial coordinate as the frontal-face moment, and determining the depth values of the various face key points at the frontal-face moment from the depth-direction spatial coordinates at that moment in each space-time curve.
Specifically, during the head shake, when the face is in the frontal pose, the points on the vertical center line of the face (for example, points at the middle of the forehead, the nose bridge, the nose tip, and so on) are, at their corresponding spatial positions, closest to the camera compared with other poses, that is, the depth is the largest. Based on this property, any face key point on the vertical center line of the face (such as a point at the middle of the forehead, the nose bridge or the nose tip) can be selected, the space-time coordinate corresponding to the maximum of the depth-direction spatial coordinate extracted from the space-time curve(s) of that key point, and the time point of that space-time coordinate taken as the frontal-face moment. Since the various categories of face key points move synchronously, when a space-time coordinate of a key point on the vertical center line corresponds to the frontal-face moment, the space-time coordinates of the other categories of key points at the same time point also correspond to the frontal state. The depth-direction spatial coordinates can therefore be extracted from all space-time curves at the frontal-face moment, and from them the depth values of the various face key points at the frontal-face moment determined.
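An illustrative sketch of this step, assuming the fitted curves of a center-line key point (e.g. the nose tip) are available from the previous step. It takes the single best extremum, whereas the refined variant described later averages candidate depth values over several curves.

```python
import numpy as np

def frontal_moment(nose_segments, dense=200):
    """Find the frontal-face moment from the fitted curves of a key point on
    the vertical center line of the face (e.g. the nose tip).

    nose_segments: list of (coeffs, t0, t_min, t_max) per fitted segment.
    Returns the time at which the depth-direction coordinate z(t) is maximal.
    """
    best_t, best_z = None, -np.inf
    for coeffs, t0, t_min, t_max in nose_segments:
        ts = np.linspace(t_min, t_max, dense)
        zs = np.polyval(coeffs["z"], ts - t0)
        i = int(np.argmax(zs))              # maximum of the depth coordinate
        if zs[i] > best_z:
            best_t, best_z = ts[i], zs[i]
    return best_t

def depths_at(t_star, curves_by_keypoint):
    """Sample every key point's z-curve at the frontal moment t_star
    (for brevity, assumes the segment covering t_star is already selected)."""
    return {name: float(np.polyval(c["z"], t_star - t0))
            for name, (c, t0) in curves_by_keypoint.items()}
```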
Step 106: determining the identity of the face to be recognized based on the depth values of the various face key points of the face to be recognized at the frontal-face moment and the depth values of the corresponding face key points of a registered face.
Specifically, after the depth values of the various face key points of the face to be recognized at the frontal-face moment are obtained, they can be compared for similarity with the depth values of the same categories of face key points of a registered face, so as to determine the identity of the face to be recognized.
In the similarity comparison, the depth values of the same categories of face key points can be compared directly, or features can first be extracted from the depth values of the various face key points and the extracted feature values compared for similarity. This embodiment does not limit the process or method of the similarity comparison.
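Since the embodiment leaves the comparison method open, the following sketch shows one plausible direct comparison using cosine similarity over the key-point depth vectors; the threshold value is an assumption for illustration only.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def identify(probe_depths, registry, keypoint_names, threshold=0.9):
    """Match the probe's frontal-moment depth values against registered faces.

    probe_depths / registry values: {key-point name: depth value}
    registry: {identity: depth dict}; threshold is an assumed example value.
    """
    probe = [probe_depths[k] for k in keypoint_names]
    best_id, best_sim = None, -1.0
    for identity, depths in registry.items():
        sim = cosine_similarity(probe, [depths[k] for k in keypoint_names])
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)
```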
Compared with the related art, this embodiment acquires continuous frame depth images while the face to be recognized is shaken in the horizontal direction; labels the space-time coordinates of various face key points in the continuous frame depth images and forms a space-time coordinate sequence for each face key point; divides each space-time coordinate sequence into several sequence segments, taking the turning points of the horizontal spatial coordinate over time as boundaries; performs curve fitting on the space-time coordinates in each sequence segment to obtain the space-time curve of the segment; for the space-time curve of any face key point on the vertical center line of the face, takes the time point corresponding to the maximum of the depth-direction spatial coordinate as the frontal-face moment and determines the depth values of the various face key points at that moment from the depth-direction spatial coordinates in each space-time curve; and determines the identity of the face to be recognized based on the depth values of the various face key points of the face to be recognized at the frontal-face moment and the depth values of the corresponding face key points of a registered face. In this scheme, space-time curves of face key points are fitted from consecutive depth images shot while the head is shaken, the frontal-face moment is determined from the curves as the time point where the depth-direction spatial coordinate is maximal, and the depth information of the face key points at that moment is used for face recognition, thereby improving the recognition accuracy for large-angle faces.
Another embodiment of the present invention relates to a face recognition method based on continuous frame information. It is an improvement of the method shown in fig. 1, the improvements being: refining the division of the sequence segments, the determination of the depth values of the various face key points at the frontal-face moment, and the determination of the identity of the face to be recognized.
As shown in fig. 2, the above step 103 may include the following sub-steps.
Substep 1031: for each space-time coordinate sequence, removing the space-time coordinates corresponding to the turning points of the horizontal spatial coordinate over time.
Specifically, a turning point of the horizontal spatial coordinate may belong either to the trend in front of it (e.g., increasing) or to the trend behind it (e.g., decreasing). For the sake of data accuracy, the space-time coordinates corresponding to such turning points may therefore be discarded from each space-time coordinate sequence and not used in any subsequently generated sequence segment.
Substep 1032: dividing the remaining space-time coordinate sequence into a plurality of sequence segments, with the turning points as boundaries.
For each space-time coordinate sequence, after the space-time coordinates corresponding to the turning points of the horizontal spatial coordinate are removed, the remaining sequence is still divided into a plurality of sequence segments with the removed turning points as boundaries, each segment still containing a plurality of space-time coordinates.
As shown in fig. 3, the above step 105 may include the following substeps.
Substep 1051: for the several space-time curves of any face key point on the vertical center line of the face, taking the time point corresponding to the maximum of the depth-direction spatial coordinate in each curve as a frontal-face moment.
Specifically, since the space-time coordinate sequence of each face key point is divided into several sequence segments and each segment corresponds to one space-time curve, each face key point corresponds to several space-time curves. When, for a face key point on the vertical center line of the face, the time point corresponding to the maximum of the depth-direction spatial coordinate is taken as a frontal-face moment, one such moment can be determined for each space-time curve of that key point. The number of frontal-face moments determined in this way equals the number of space-time curves.
Substep 1052: determining the depth value of each face key point at the frontal-face moment from the several candidate depth values corresponding to that key point at the frontal-face moments.
Specifically, after the several frontal-face moments are obtained, each of them can be looked up in all the space-time curves of all categories of face key points, and the depth-direction spatial coordinate at that moment extracted as a candidate depth value. In this way several candidate depth values are determined for each face key point, and a final depth value of each key point at the frontal-face moment is determined from them; for example, the average of the candidate depth values of each category of face key point may be taken as the final depth value of that category at the frontal-face moment.
As shown in fig. 4, the step 106 may include the following substeps.
Substep 1061: inputting the depth values of the various face key points of the face to be recognized at the frontal-face moment into a pre-trained feature extraction model to obtain recognition features.
Specifically, a feature extraction model may be trained in advance such that, when the depth values of the face key points of the same face are input, the corresponding output features have high similarity, and when the depth values of the face key points of different faces are input, the corresponding output features have low similarity. Through this feature extraction model, the feature values (also called feature vectors) corresponding to the depth values of the various face key points of the face to be recognized in the frontal state are extracted. In this embodiment, the feature value corresponding to the face to be recognized is called the recognition feature.
To conveniently and uniformly process the input depth data of different faces, when this substep is executed, the depth values of the various face key points of the face to be recognized at the frontal-face moment are normalized, the normalized depth values are input into the pre-trained feature extraction model, and the corresponding recognition features are output. For example, the depth values of the various face key points of the face to be recognized are normalized to [0, 1].
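The patent states only that depth values are normalized to [0, 1]; min-max scaling, assumed in this sketch, is one straightforward way to do so.

```python
import numpy as np

def normalize_depths(depth_values):
    """Min-max normalize key-point depth values to [0, 1].

    Min-max scaling is an assumption; the patent does not fix the method."""
    v = np.asarray(depth_values, dtype=float)
    rng = v.max() - v.min()
    return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)
```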
Substep 1062: comparing the recognition features with the registered features obtained by passing the depth values of the various face key points of a registered face through the feature extraction model, and determining the identity of the face to be recognized from the comparison result.
The depth values of the face key points of the registered face may be passed through the feature extraction model in advance to obtain the corresponding feature values; in this embodiment the feature values corresponding to the registered face are called the registered features.
Specifically, after the (normalized) depth values of the various face key points of the face to be recognized at the frontal-face moment are input into the feature extraction model to obtain the recognition features, the recognition features can be compared for similarity with the registered features of the registered face, and the identity of the face to be recognized determined from the comparison result. For example, when the similarity value is greater than a preset threshold, the face to be recognized and the compared registered face are determined to be the same person; when the similarity value is smaller than the preset threshold, they are determined to be different persons.
In one example, as shown in fig. 5, the training process of the feature extraction model includes the following steps:
step 201: the method comprises the steps of obtaining depth values of various face key points in a plurality of face depth images containing different front faces.
In particular, a depth camera may be used to capture face depth images of different frontal faces. And marking a plurality of face key points from the face depth image, and determining the depth values of the face key points.
In order to enable subsequent model training to have higher generalization performance, more than 200 persons of depth images need to be acquired to construct an image sample and a label thereof. The person to be collected needs to sit up as far as possible to collect the face depth image in the face-facing posture.
Step 202: taking the depth values of the face key points in two face depth images of the same person as an anchor sample and a positive sample respectively, and the depth values of the face key points in a face depth image of a different person as a negative sample, to form one group of training samples.
Specifically, when selecting training samples, the depth values of the face key points in two face depth images of the same person can be used as the anchor sample (a) and the positive sample (p) respectively, and the depth values of the face key points in one face depth image of a person other than that person as the negative sample (n), forming one group of training samples.
To conveniently and uniformly process the input depth data of different faces, in this step the depth values of the various face key points in each sample are normalized, and the normalized depth values are used as the final training samples. For example, the depth values of the face key points in each sample are normalized to [0, 1].
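A short sketch of the triplet construction of step 202; the per-person sample store and its layout are assumptions for illustration.

```python
import random

def sample_triplet(samples_by_person):
    """Form one (anchor, positive, negative) training triplet of normalized
    key-point depth vectors, as described in step 202.

    samples_by_person: {person id: [depth vector, ...]} with at least two
    samples for the person chosen as anchor."""
    same, other = random.sample(list(samples_by_person), 2)
    anchor, positive = random.sample(samples_by_person[same], 2)
    negative = random.choice(samples_by_person[other])
    return anchor, positive, negative
```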
Step 203: taking the normalized training samples as input and the feature values of the samples in the training samples as output, constructing the feature extraction model and training it.
The loss function used to train the feature extraction model is constructed from a first loss between the feature values corresponding to the anchor and positive samples and a second loss between the feature values corresponding to the anchor and negative samples.
Specifically, as shown in fig. 6, the network structure of the feature extraction model comprises, in order from input to output: a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a first fully connected layer and a second fully connected layer. The inputs of the feature extraction model are the anchor, positive and negative samples described above, and the outputs are the feature values corresponding to these samples; for example, the feature values corresponding to the anchor, positive and negative samples are denoted P_a, P_p, P_n.
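A sketch of this network in PyTorch; the channel counts, kernel sizes, number of key points and output feature dimension are all assumptions, since the patent specifies only the layer order.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the network in fig. 6: two conv+pool stages followed by two
    fully connected layers, operating on a vector of key-point depth values."""
    def __init__(self, num_keypoints=68, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1),   # first convolution layer
            nn.ReLU(),
            nn.MaxPool1d(2),                              # first pooling layer
            nn.Conv1d(16, 32, kernel_size=3, padding=1),  # second convolution layer
            nn.ReLU(),
            nn.MaxPool1d(2),                              # second pooling layer
            nn.Flatten(),
            nn.Linear(32 * (num_keypoints // 4), 64),     # first fully connected layer
            nn.ReLU(),
            nn.Linear(64, feat_dim),                      # second fully connected layer
        )

    def forward(self, depths):                 # depths: (batch, num_keypoints)
        return self.net(depths.unsqueeze(1))   # add the channel dimension
```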
The loss function is constructed according to the following formulas:

loss_same = … …………(6) [the first loss, computed from the anchor and positive feature values P_ai, P_pi; the exact expression appears only as an image in the source]
loss_diff = … …………(7) [the second loss, computed from the anchor and negative feature values P_ai, P_ni; the exact expression appears only as an image in the source]
loss = loss_same + loss_diff …………(8)

wherein loss is the loss value of the loss function, loss_same is the first loss, loss_diff is the second loss, P_ai, P_pi and P_ni are, in turn, the feature values corresponding to the anchor sample a, the positive sample p and the negative sample n in the i-th group of training samples, and m is the number of training sample groups.
Specifically, when the training loss is calculated using formulas (6), (7) and (8), the parameters W_E of the feature extraction model E are optimized according to conventional deep-learning network optimization methods, namely:

Ŵ_E = arg min_{W_E} loss …………(9)
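Because the exact expressions of formulas (6) and (7) survive only as images, the sketch below substitutes plausible stand-ins: a mean squared anchor-positive distance for loss_same and a hinge on the anchor-negative distance (with an assumed margin) for loss_diff; formula (8) is implemented as stated.

```python
import torch

def triplet_style_loss(P_a, P_p, P_n, margin=1.0):
    """Loss in the spirit of formulas (6)-(8); the concrete forms of (6) and
    (7) are assumptions, as the patent's expressions are not reproducible.

    P_a, P_p, P_n: (m, feat_dim) feature batches for anchors/positives/negatives.
    """
    loss_same = ((P_a - P_p) ** 2).sum(dim=1).mean()        # (6), assumed form
    loss_diff = torch.clamp(                                # (7), assumed form
        margin - ((P_a - P_n) ** 2).sum(dim=1), min=0).mean()
    return loss_same + loss_diff                            # (8)
```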
compared with the related art, the embodiment removes the space-time coordinate corresponding to the turning point of the space coordinate along the horizontal direction changing with time by aiming at each space-time coordinate sequence; and dividing the remaining space-time coordinate sequence into a plurality of sequence segments by taking the turning point as a boundary, thereby ensuring the accuracy of the space-time coordinate in the generated sequence segments.
Respectively taking a time point corresponding to the maximum value of the spatial coordinates in the depth direction as the face correction moment by aiming at a plurality of space-time curves of any one face key point on the face vertical central line; and determining the depth value of each face key point at the front face time according to a plurality of candidate depth values corresponding to each face key point at the front face time, thereby ensuring the preparation of the obtained depth value of each face key point at the front face time.
Inputting the depth values of various face key points of the face to be recognized at the face setting moment into a pre-trained feature extraction model to obtain recognition features; the identification features are compared with the registered features obtained by the feature extraction model of the depth values of various face key points corresponding to the registered face, and the identity of the face to be identified is determined according to the comparison result, so that the identification method for identifying the identity of the face to be identified based on the feature extraction model is realized.
Another embodiment of the invention relates to an electronic device, as shown in FIG. 7, comprising at least one processor 302; and a memory 301 communicatively coupled to the at least one processor 302; wherein the memory 301 stores instructions executable by the at least one processor 302, the instructions being executable by the at least one processor 302 to enable the at least one processor 302 to perform any of the method embodiments described above.
The memory 301 and the processor 302 are connected by a bus, which may comprise any number of interconnected buses and bridges linking one or more processors 302 and the memory 301 together. The bus may also connect various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 302 is transmitted over a wireless medium through an antenna, which also receives data and passes it to the processor 302.
The processor 302 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 301 may be used to store data used by processor 302 in performing operations.
Another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes any of the above-described method embodiments when executed by a processor.
That is, as will be understood by those skilled in the art, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions to cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. A face recognition method based on continuous frame information is characterized by comprising the following steps:
acquiring continuous frame depth images of a face to be recognized while the head is shaken in the horizontal direction;
labeling the space-time coordinates of various face key points in the continuous frame depth images, and forming a space-time coordinate sequence for each face key point;
for each space-time coordinate sequence, dividing the sequence into several sequence segments, taking the turning points of the horizontal spatial coordinate over time as boundaries;
performing curve fitting on the space-time coordinates in each sequence segment to obtain a space-time curve of the sequence segment;
for the space-time curve of any face key point on the vertical center line of the face, taking the time point corresponding to the maximum of the depth-direction spatial coordinate as the frontal-face moment, and determining the depth values of the various face key points at the frontal-face moment from the depth-direction spatial coordinates at that moment in each space-time curve;
and determining the identity of the face to be recognized based on the depth values of the various face key points of the face to be recognized at the frontal-face moment and the depth values of the corresponding face key points of a registered face.
2. The method according to claim 1, wherein the dividing, for each space-time coordinate sequence, of the sequence into several sequence segments with the turning points of the horizontal spatial coordinate over time as boundaries comprises:
for each space-time coordinate sequence, removing the space-time coordinates corresponding to the turning points of the horizontal spatial coordinate over time;
and dividing the remaining space-time coordinate sequence into a plurality of sequence segments with the turning points as boundaries.
3. The method according to claim 1, wherein said performing curve fitting on the space-time coordinates in each sequence segment to obtain a space-time curve of the sequence segment comprises:
performing least-squares polynomial curve fitting on the space-time coordinates in each sequence segment to obtain the space-time curve of the sequence segment.
4. The method according to claim 1, wherein the taking, for the space-time curve of any face key point on the vertical center line of the face, of the time point corresponding to the maximum of the depth-direction spatial coordinate as the frontal-face moment, and the determining of the depth values of the various face key points at the frontal-face moment from the depth-direction spatial coordinates at that moment in each space-time curve, comprise:
for the several space-time curves of any face key point on the vertical center line of the face, taking the time point corresponding to the maximum of the depth-direction spatial coordinate in each curve as a frontal-face moment;
and taking the depth-direction spatial coordinate at each frontal-face moment in each space-time curve as a candidate depth value, and determining the depth value of each category of face key point at the frontal-face moment from the several candidate depth values corresponding to that category.
5. The method according to claim 1, wherein the determining of the identity of the face to be recognized based on the depth values of the various face key points of the face to be recognized at the frontal-face moment and the depth values of the corresponding face key points of a registered face comprises:
inputting the depth values of the various face key points of the face to be recognized at the frontal-face moment into a pre-trained feature extraction model to obtain recognition features;
and comparing the recognition features with the registered features obtained by passing the depth values of the corresponding face key points of the registered face through the feature extraction model, and determining the identity of the face to be recognized from the comparison result.
6. The method according to claim 5, wherein the inputting of the depth values of the various face key points of the face to be recognized at the frontal-face moment into a pre-trained feature extraction model to obtain recognition features comprises:
normalizing the depth values of the various face key points of the face to be recognized at the frontal-face moment, and inputting the normalized depth values into the pre-trained feature extraction model to obtain the recognition features.
7. The method of claim 6, wherein the training process of the feature extraction model comprises:
acquiring the depth values of the face key points in a number of face depth images containing the frontal faces of different persons;
taking the depth values of the face key points in two face depth images of the same person as an anchor sample and a positive sample respectively, and the depth values of the face key points in a face depth image of a different person as a negative sample, to form one group of training samples;
taking the normalized training samples as input and the feature values of the samples in the training samples as output, constructing the feature extraction model and training it;
wherein the loss function used for training the feature extraction model is constructed based on a first loss between the feature values corresponding to the anchor sample and the positive sample, and a second loss between the feature values corresponding to the anchor sample and the negative sample.
8. The method of claim 7, wherein the constructing of the loss function comprises:
constructing the loss function according to the following formula:
loss_same = … …………(6) [the first loss, computed from the anchor and positive feature values P_ai, P_pi; the exact expression appears only as an image in the source]
loss_diff = … …………(7) [the second loss, computed from the anchor and negative feature values P_ai, P_ni; the exact expression appears only as an image in the source]
loss = loss_same + loss_diff
wherein loss is the loss value of the loss function, loss_same is the first loss, loss_diff is the second loss, P_ai, P_pi and P_ni are, in turn, the feature values corresponding to the anchor sample a, the positive sample p and the negative sample n in the i-th group of training samples, and m is the number of training sample groups.
9. An electronic device, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of face recognition based on continuous frame information according to any one of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, wherein the computer program is configured to implement the method for face recognition based on continuous frame information according to any one of claims 1 to 8 when executed by a processor.
CN202111602174.1A 2021-12-24 2021-12-24 Face recognition method based on continuous frame information, electronic equipment and storage medium Active CN114267068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111602174.1A CN114267068B (en) 2021-12-24 2021-12-24 Face recognition method based on continuous frame information, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN114267068A (en) 2022-04-01
CN114267068B (en) 2022-11-01

Family

ID=80829894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111602174.1A Active CN114267068B (en) 2021-12-24 2021-12-24 Face recognition method based on continuous frame information, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114267068B (en)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100682889B1 (en) * 2003-08-29 2007-02-15 Samsung Electronics Co., Ltd. Method and Apparatus for image-based photorealistic 3D face modeling
CN112101127A (en) * 2020-08-21 2020-12-18 深圳数联天下智能科技有限公司 Face shape recognition method and device, computing equipment and computer storage medium
CN112488053B (en) * 2020-12-17 2023-10-13 深圳市优必选科技股份有限公司 Face recognition method, device, robot and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203400A (en) * 2016-07-29 2016-12-07 广州国信达计算机网络通讯有限公司 A kind of face identification method and device
CN106446773A (en) * 2016-08-22 2017-02-22 南通大学 Automatic robust three-dimensional face detection method
CN108319939A (en) * 2018-04-04 2018-07-24 天目爱视(北京)科技有限公司 A kind of 3D four-dimension head face data discrimination apparatus
CN109214324A (en) * 2018-08-27 2019-01-15 曜科智能科技(上海)有限公司 Most face image output method and output system based on polyphaser array
CN109271923A (en) * 2018-09-14 2019-01-25 曜科智能科技(上海)有限公司 Human face posture detection method, system, electric terminal and storage medium
CN109859305A (en) * 2018-12-13 2019-06-07 中科天网(广东)科技有限公司 Three-dimensional face modeling, recognition methods and device based on multi-angle two-dimension human face
CN109934195A (en) * 2019-03-21 2019-06-25 东北大学 A kind of anti-spoofing three-dimensional face identification method based on information fusion
WO2021218568A1 (en) * 2020-04-28 2021-11-04 上海肇观电子科技有限公司 Image depth determination method, living body recognition method, circuit, device, and medium
CN111723691A (en) * 2020-06-03 2020-09-29 北京的卢深视科技有限公司 Three-dimensional face recognition method and device, electronic equipment and storage medium
CN111652974A (en) * 2020-06-15 2020-09-11 腾讯科技(深圳)有限公司 Method, device and equipment for constructing three-dimensional face model and storage medium
CN112329726A (en) * 2020-11-27 2021-02-05 合肥的卢深视科技有限公司 Face recognition method and device
CN113822256A (en) * 2021-11-24 2021-12-21 北京的卢深视科技有限公司 Face recognition method, electronic device and storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"3D and 2D face recognition using integral projection curves based depth and intensity images";Ammar Chouchane等;《Intelligent Systems Technologies and Applications》;20151006;第50-69页 *
"A robust analysis, detection and recognition of facial features in 2.5D images";Parama Bagchi等;《Multimed Tools Appl》;20151009;第11059-11096页 *
"Nebula Feature: A Space-Time Feature for Posed and Spontaneous 4D Facial Behavior Analysis";Michael Reale等;《2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)》;20130715;第1-8页 *
"基于几何特征向量的三维人脸识别研究";欧阳江帆;《中国优秀硕士学位论文全文数据库 信息科技辑》;20091215;I138-874 *
"基于深度数据的人脸旋转角度估计及三维人脸识别的研究";胡珍珍;《中国优秀硕士学位论文全文数据库 信息科技辑》;20110915;I138-1140 *
"多模态3D视觉身份识别技术简析";陈智超等;《中国安防》;20201130;第52-58页 *

Also Published As

Publication number Publication date
CN114267068A (en) 2022-04-01

Similar Documents

Publication Publication Date Title
US11842487B2 (en) Detection model training method and apparatus, computer device and storage medium
CN109902546B (en) Face recognition method, face recognition device and computer readable medium
US9436895B1 (en) Method for determining similarity of objects represented in images
WO2018196396A1 (en) Person re-identification method based on consistency constraint feature learning
CN110705478A (en) Face tracking method, device, equipment and storage medium
CN111310731A (en) Video recommendation method, device and equipment based on artificial intelligence and storage medium
CN109766840A (en) Facial expression recognizing method, device, terminal and storage medium
CN111598038B (en) Facial feature point detection method, device, equipment and storage medium
CN112364827B (en) Face recognition method, device, computer equipment and storage medium
CN110765882B (en) Video tag determination method, device, server and storage medium
CN111695462A (en) Face recognition method, face recognition device, storage medium and server
CN113449704B (en) Face recognition model training method and device, electronic equipment and storage medium
CN111046734A (en) Multi-modal fusion sight line estimation method based on expansion convolution
CN110543848B (en) Driver action recognition method and device based on three-dimensional convolutional neural network
CN110866469A (en) Human face facial features recognition method, device, equipment and medium
CN111652798A (en) Human face pose migration method and computer storage medium
CN113298158A (en) Data detection method, device, equipment and storage medium
CN113822256B (en) Face recognition method, electronic device and storage medium
CN112069887A (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN115984930A (en) Micro expression recognition method and device and micro expression recognition model training method
CN111666976A (en) Feature fusion method and device based on attribute information and storage medium
CN113963237A (en) Model training method, mask wearing state detection method, electronic device and storage medium
CN114267068B (en) Face recognition method based on continuous frame information, electronic equipment and storage medium
CN114267067B (en) Face recognition method based on continuous frame images, electronic equipment and storage medium
CN112613471A (en) Face living body detection method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220608

Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province

Applicant after: Hefei lushenshi Technology Co.,Ltd.

Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing

Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.

Applicant before: Hefei lushenshi Technology Co.,Ltd.

GR01 Patent grant