WO2020244174A1 - Face recognition method, apparatus, device, and computer-readable storage medium - Google Patents

Face recognition method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2020244174A1
WO2020244174A1 · PCT/CN2019/121347 · CN2019121347W
Authority
WO
WIPO (PCT)
Prior art keywords
face
feature group
feature
time series
spatial
Prior art date
Application number
PCT/CN2019/121347
Other languages
English (en)
French (fr)
Inventor
柳军领
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020244174A1 publication Critical patent/WO2020244174A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Definitions

  • the present invention relates to the field of computer vision technology, in particular to a face recognition method, device, equipment and computer readable storage medium.
  • Face recognition refers to a biometric recognition technology based on human facial feature information, which is widely used in many fields, such as community access control, company attendance, and judicial and criminal investigation. Practical applications show that a human face is a natural structural target with quite complex variation in its details, so the detection and recognition of such targets is a challenging subject. Specifically, the difficulty of recognition is reflected in: (1) the face itself has pattern variability due to differences in appearance, expression, posture, skin color, etc.; (2) the face has different characteristics due to the uncertainty of appendages such as bangs, glasses, and beards; (3) the size of the image, the direction of the light source, and the intensity of the light all affect the final appearance of the face. Therefore, frontal, upright, well-lit faces can be recognized relatively easily, whereas profile, skewed, or poorly lit faces generally cannot be recognized.
  • the realization process of face recognition can include: first, capture the image containing the face from the video stream to obtain a face image; second, extract the facial features from the face image; and then classify the extracted facial features to complete face recognition.
  • in the prior art, when facial features are extracted from a face image, a low feature extraction rate (which may manifest as the extracted facial features being limited or inaccurate, or the calculation process being complicated) easily brings the problem of low face recognition accuracy.
  • the embodiments of the present invention provide a face recognition method, device, equipment, and computer-readable storage medium, which can improve the accuracy of the face feature extraction process, so as to improve the accuracy of face recognition.
  • an embodiment of the present invention provides a face recognition method, which includes:
  • N is a positive integer greater than 1;
  • first face spatial feature group includes a face feature corresponding to each frame of face image
  • the extracting time series features from the first face spatial feature group to obtain the face time series feature group includes:
  • the first face spatial feature group is input into a preset recurrent neural network model to output a face time series feature group, where the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
  • spatial mapping is performed on the fused temporal feature to obtain a mapped face temporal feature group.
  • the dimension of the first face spatial feature group is M
  • the dimension of the first face spatial feature group is determined according to the FaceNet model
  • the first face spatial feature group is in the first space
  • the dimension of the face time series feature group is S
  • the dimension of the face time series feature group is determined according to the number of hidden layer neurons in the preset recurrent neural network model
  • the face time series feature group is in the second space
  • when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, performing spatial mapping on the fused time series feature to obtain the mapped face time series feature group includes:
  • a fully connected layer is added to the preset recurrent neural network model, so that the fusion time series feature is mapped to the first space, and a face time series feature with the same dimension as the first face space feature group is obtained.
  • the preset recurrent neural network model is a two-layer long short-term memory (LSTM) network model, and the network structure of each layer is the same.
  • the extracting time series features from the first face spatial feature group to obtain the face time series feature group further includes:
  • the first face spatial feature group is input into a preset recurrent neural network model to output a face time series feature group, where the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
  • Matching the target face corresponding to the time series feature group of the face in the face database includes:
  • if the degree of matching is less than the preset threshold, continue to calculate the degree of matching between the remaining second face time series features and the face image, until the degree of matching is greater than the preset threshold, and determine the target face corresponding to that second face time series feature.
  • the extracting N frames of face images of the same target face in the video stream includes:
  • the performing spatial feature extraction on the N frames of face images to obtain a first face spatial feature group includes:
  • the N frames of face images are input into the FaceNet model to extract spatial features in the N frames of face images.
  • the face time series feature group can be obtained. Since the face time series feature group can reflect the complementary information contained in multiple frames of face images, it can improve the accuracy of the face feature extraction process and thereby improve the accuracy of face recognition.
  • an embodiment of the present invention provides a face recognition device, which includes:
  • the image extraction unit is used to extract N frames of face images of the same target face in the video stream, wherein the N frames of face images have time series; N is a positive integer greater than 1;
  • the first feature extraction unit is configured to perform spatial feature extraction on the N frames of face images to obtain a first face spatial feature group, wherein the first face spatial feature group includes the face feature corresponding to each frame of face image;
  • the second feature extraction unit is configured to extract time series features from the first face spatial feature group to obtain a face time series feature group
  • the recognition unit is configured to match the target face corresponding to the face sequence feature group in the face database.
  • the second feature extraction unit includes a first time-series feature extraction unit, a fusion unit, and a first spatial mapping unit;
  • the first time series feature extraction unit is configured to input the first face spatial feature group into a preset recurrent neural network model to output a face time series feature group, wherein the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
  • the fusion unit is configured to perform fusion processing on the time series features in the face time series feature group to obtain fused time series features
  • the first spatial mapping unit is configured to perform spatial mapping on the fused time series feature when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, to obtain the mapped face time series feature group.
  • the dimension of the first face spatial feature group is M
  • the dimension of the first face spatial feature group is determined according to the FaceNet model
  • the first face spatial feature group is in the first space
  • the dimension of the face time series feature group is S
  • the dimension of the face time series feature group is determined according to the number of hidden layer neurons in the preset recurrent neural network model
  • the face sequence feature group is in the second space
  • the space mapping unit is specifically used for:
  • a fully connected layer is added to the preset recurrent neural network model, so that the fusion time series feature is mapped to the first space, and a face time series feature with the same dimension as the first face space feature group is obtained.
  • the preset recurrent neural network model is a two-layer long short-term memory (LSTM) network model, and the network structure of each layer is the same.
  • the second feature extraction unit further includes a second time series feature extraction unit, a determination unit, and a second spatial mapping unit; wherein,
  • the second time series feature extraction unit is configured to input the first face spatial feature group into a preset recurrent neural network model to output a face time series feature group, where the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
  • the determining unit is configured to determine a first face time series feature in the face time series feature group; wherein the first face time series feature is any face time series feature in the face time series feature group;
  • the second spatial mapping unit is configured to perform spatial mapping on the first face time series feature when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, to obtain the second face time series feature;
  • the identification unit includes: a matching degree determination unit and a processing unit;
  • the matching degree determining unit is configured to continue to calculate the matching degree between the remaining second face time series features and the face image when the matching degree is less than a preset threshold, until the matching degree is greater than the preset threshold, and to determine the target face corresponding to that second face time series feature.
  • the image extraction unit is specifically configured to:
  • the first feature extraction unit is specifically configured to:
  • the N frames of face images are input into the FaceNet model to extract spatial features in the N frames of face images.
  • an embodiment of the present invention provides a face recognition device, including a processor and a memory, the processor and the memory being connected to each other, wherein the memory is used to store a computer program that supports the face recognition device in executing the above method, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect described above.
  • an embodiment of the present invention provides a computer-readable storage medium, the computer storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method of the first aspect described above.
  • an embodiment of the present invention provides a computer program.
  • the computer program includes program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect.
  • by extracting time series information from the first face spatial feature group, a face time series feature group can be obtained, and by performing feature fusion on the face features included in the face time series feature group, the fused time series feature is obtained
  • the face time series feature can be obtained through spatial mapping. Since the face time series feature can reflect the multiple attributes of multiple frames of face images, and the face features are richer, the accuracy of the face feature extraction process can be improved, thereby improving the accuracy of face recognition.
  • FIG. 1 is a schematic diagram of the internal processing logic of an LSTM neural network model provided by an embodiment of the present application
  • Figure 2 is a schematic structural diagram of a cascaded LSTM neural network model provided by an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a face recognition system provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a face recognition method provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a two-layer LSTM model provided by an embodiment of the present application.
  • FIG. 6A is a schematic flowchart of a method for spatial mapping of face temporal features according to an embodiment of the present application
  • FIG. 6B is a schematic flowchart of another method for spatial mapping of face temporal features according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a face recognition device provided by an embodiment of the present application.
  • Fig. 8 is a schematic block diagram of a face recognition device according to another embodiment of the present application.
  • the LSTM model uses input gates, output gates, forget gates, and cell structures to control the learning and forgetting of historical information, so that the model is suitable for processing long sequence problems.
  • FIG. 1 is a schematic structural diagram of an LSTM provided by an embodiment of the present application. As shown in the figure, at time t, the memory cell of the LSTM model is represented as C_t, the output of the forget gate is represented as f_t, the output of the input gate is represented as i_t, and the output of the output gate is represented as O_t; the element values of all three gates lie in the interval [0,1].
  • the forget gate is to control whether to forget, that is, to control whether to forget the hidden cell state of the upper layer with a certain probability.
  • at time t, the input of the forget gate is the hidden state h_{t-1} of the previous sequence step and the data x_t of the current sequence step.
  • the output of the forgetting gate is obtained.
  • the activation function here can be sigmoid.
  • the processing logic of the forget gate can be expressed as the following mathematical expression (1):
  • f_t = σ(W_f h_{t-1} + U_f x_t + b_f)    (1)
  • where W_f, U_f, and b_f are the coefficients and bias of the linear relationship, and σ represents the sigmoid activation function.
  • the input gate is responsible for processing the input of the current sequence position and deciding what new information to put in the "cell state".
  • the input gate is composed of two parts: the first part, under the action of the sigmoid activation function, outputs i_t; the second part, under the action of the tanh activation function, outputs a_t; the results of the two parts are multiplied and then used to update the cell state.
  • the role of the input gate is to prepare for status updates.
  • the processing logic of the input gate can be expressed as the following mathematical expression (2):
  • i_t = σ(W_i h_{t-1} + U_i x_t + b_i), a_t = tanh(W_a h_{t-1} + U_a x_t + b_a)    (2)
  • where W_i, U_i, b_i, W_a, U_a, and b_a are the coefficients and biases of the linear relationships, and σ represents the sigmoid activation function.
  • the cell state C_t consists of two parts: the first part is the product of C_{t-1} and the forget gate output f_t, and the second part is the product of the input gate outputs i_t and a_t, which can be expressed as the following mathematical expression (3):
  • C_t = C_{t-1} * f_t + i_t * a_t    (3)
  • where * denotes the Hadamard product, i.e. element-wise multiplication.
  • the update of the hidden state h_t consists of two parts: the first part is O_t, which is obtained from the hidden state h_{t-1} of the previous sequence step, the data x_t of the current sequence step, and the sigmoid activation function; the second part is composed of the cell state C_t and the tanh activation function. The processing logic can be expressed as the following mathematical expression (4):
  • O_t = σ(W_O h_{t-1} + U_O x_t + b_O), h_t = O_t * tanh(C_t)    (4)
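To make expressions (1) to (4) concrete, the following is a minimal NumPy sketch of a single LSTM cell step. The weight initialisation and the 128-dimensional input / 512-unit hidden sizes are illustrative assumptions and are not fixed by this disclosure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One time step of an LSTM cell following expressions (1)-(4).

    x_t:    input at time t, shape (input_dim,)
    h_prev: hidden state h_{t-1}, shape (hidden_dim,)
    c_prev: cell state C_{t-1}, shape (hidden_dim,)
    p:      dict of weight matrices W_*, U_* and biases b_*
    """
    f_t = sigmoid(p["W_f"] @ h_prev + p["U_f"] @ x_t + p["b_f"])   # forget gate, eq. (1)
    i_t = sigmoid(p["W_i"] @ h_prev + p["U_i"] @ x_t + p["b_i"])   # input gate, eq. (2)
    a_t = np.tanh(p["W_a"] @ h_prev + p["U_a"] @ x_t + p["b_a"])   # candidate values, eq. (2)
    c_t = c_prev * f_t + i_t * a_t                                 # cell state update, eq. (3)
    o_t = sigmoid(p["W_o"] @ h_prev + p["U_o"] @ x_t + p["b_o"])   # output gate, eq. (4)
    h_t = o_t * np.tanh(c_t)                                       # hidden state, eq. (4)
    return h_t, c_t

# Illustrative sizes only: 128-dimensional input, 512 hidden units.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 128, 512
params = {}
for gate in ("f", "i", "a", "o"):
    params[f"W_{gate}"] = rng.standard_normal((hidden_dim, hidden_dim)) * 0.01
    params[f"U_{gate}"] = rng.standard_normal((hidden_dim, input_dim)) * 0.01
    params[f"b_{gate}"] = np.zeros(hidden_dim)
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, c = lstm_cell_step(rng.standard_normal(input_dim), h, c, params)
```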
  • the preset recurrent neural network model may include, but is not limited to, an LSTM neural network model, and may also include a convolutional neural network (Convolutional Neural Network, CNN).
  • the LSTM neural network model as an example.
  • the specific architecture of the model can be shown in Figure 2.
  • multiple cells are cascaded, for example, t cells as shown in Figure 2.
  • the model can extract the timing information contained in multiple frames of face images.
  • the implementation process of constructing a multi-task cascaded convolutional neural network model may include:
  • A1. Determine the sample data of the training set
  • A2. Design the specific structure of the multi-task cascaded convolutional neural network model; for example, the model contains three sub-networks.
  • the first sub-network of the cascade is a small convolutional neural network.
  • the second sub-network of the cascade is a medium convolutional neural network, and the third sub-network of the cascade is a large convolutional neural network.
  • A3. Within the multi-task cascaded convolutional neural network model, each cascaded sub-network uses multi-task learning, for example, simultaneously learning four tasks: "face classification", "bounding box regression", "face key point detection", and "face attribute analysis";
  • A4. Put all the images in the sample data of the training set into the multi-task cascaded convolutional neural network model for training, and obtain a trained multi-task cascaded convolutional neural network model.
  • the multiple images captured from the video stream (that is, the test set sample data) are input into the trained multi-task cascaded convolutional neural network model to determine whether a face is present and to determine face candidate frames.
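As an illustration of the multi-task learning named in A3 above, the following is a hedged PyTorch sketch of a combined training objective. The concrete loss functions and the task weights w_* are assumptions, since the disclosure only names the four tasks.

```python
import torch.nn.functional as F

def multitask_loss(outputs, targets, w_cls=1.0, w_box=0.5, w_lmk=0.5, w_attr=0.5):
    """outputs/targets: dicts with keys 'cls', 'box', 'landmarks', 'attributes'."""
    loss_cls  = F.cross_entropy(outputs["cls"], targets["cls"])           # face classification
    loss_box  = F.mse_loss(outputs["box"], targets["box"])                # bounding box regression
    loss_lmk  = F.mse_loss(outputs["landmarks"], targets["landmarks"])    # face key point detection
    loss_attr = F.mse_loss(outputs["attributes"], targets["attributes"])  # face attribute analysis
    return w_cls * loss_cls + w_box * loss_box + w_lmk * loss_lmk + w_attr * loss_attr
```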
  • the face recognition system 30 integrates a multi-task cascaded convolutional neural network model 300, a FaceNet model 301, a preset recurrent neural network model 302, a fully connected layer 303, and a face matching model 304.
  • the multi-task cascaded convolutional neural network model 300 is used to extract N frames of face images of the same target face in the video stream, where the N frames of face images have a time sequence, and N is a positive integer greater than 1;
  • the FaceNet model 301 is used to perform spatial feature extraction on the N frames of face images to obtain a first face spatial feature group, wherein the first face spatial feature group includes the face features corresponding to each frame of face image ;
  • the preset recurrent neural network model 302 is used to extract time series information from the first face spatial feature group to obtain a face time series feature group;
  • the fully-connected layer 303 is used to, when the dimension M of the first face spatial feature group is not equal to the dimension S of the face time series feature group (for example, M is less than S), perform spatial mapping on a first face time series feature to obtain a second face time series feature; wherein, the first face time series feature is any face time series feature in the face time series feature group;
  • the face matching model 304 is used to determine the degree of matching between the second face time series feature and the face images stored in the face database; if the degree of matching is less than a preset threshold, it continues to calculate the degree of matching between the remaining second face time series features and the face images until the matching degree is greater than the preset threshold, and determines the target face corresponding to that second face time series feature.
  • the preset recurrent neural network model 302 is also used to perform feature fusion processing on the face features included in the face time series feature group to obtain the fused time series feature.
  • the fully connected layer 303 is specifically used to:
  • spatial mapping is performed on the fused temporal feature to obtain a mapped face temporal feature group.
  • the face recognition model 304 is specifically used for:
  • Step S401 Extract N frames of face images of the same target face in the video stream, where the N frames of face images have time series; N is a positive integer greater than 1.
  • the device can extract video frames containing human faces in the video in chronological order from the original video, so as to obtain a video stream containing human face images.
  • the video stream includes face images corresponding to person A, person B, person C, and person D.
  • the device can intercept N frames of face images of the same target face (for example, person A) in a time sequence in the video stream.
  • N frames of face images are image frames containing the same target face determined by performing face detection and face tracking processing on each frame of image in the video stream. It can be understood that the N frames of face images captured in the video stream are related in the time dimension, that is, the N frames of face images have time series.
  • a trained multi-task cascaded convolutional neural network model can be used to perform face detection on the face images of the same target face in the video stream, and when the face images of the same target face are detected When, determine the face candidate frame of the face image, and then crop the face image according to the face candidate frame to remove the influence of the complex environment background on the recognition effect.
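A minimal sketch of step S401 is given below; `detect_face(frame)` is a hypothetical helper standing in for the trained multi-task cascaded convolutional neural network plus face tracking, assumed to return the candidate box (x, y, w, h) of the target face in a frame, or None.

```python
import cv2

def extract_face_frames(video_path, n_frames, detect_face):
    cap = cv2.VideoCapture(video_path)
    face_images = []
    while len(face_images) < n_frames:
        ok, frame = cap.read()
        if not ok:                        # end of the video stream
            break
        box = detect_face(frame)          # face detection + tracking (hypothetical helper)
        if box is None:
            continue
        x, y, w, h = box
        face_images.append(frame[y:y + h, x:x + w])   # crop to remove the complex background
    cap.release()
    return face_images                     # N frames of the same target face, in chronological order
```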
  • Step S402 Perform spatial feature extraction on the N frames of face images to obtain a first face spatial feature group, where the first face spatial feature group includes face features corresponding to each frame of face image.
  • the FaceNet model can be used to extract the face spatial features contained in each of the N frames of face images, and then N feature vectors corresponding to the N frames of face images can be generated. Specifically, these N feature vectors form the first face space feature group.
  • the first face spatial feature group extracted by the FaceNet model is a high-order feature with a dimension (Q) of 128. Since the FaceNet model can be used to obtain a multi-dimensional matrix of the face image, this multi-dimensional matrix can reflect more detailed characteristics of the face, thereby meeting the requirements for face recognition accuracy.
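A sketch of step S402, assuming a hypothetical `facenet_embed(face_image)` helper that returns the 128-dimensional FaceNet feature vector of one face image; stacking the N vectors yields the first face spatial feature group.

```python
import numpy as np

def extract_spatial_features(face_images, facenet_embed):
    # One 128-d feature vector per frame; together they form the
    # first face spatial feature group of shape (N, 128).
    return np.stack([facenet_embed(img) for img in face_images], axis=0)
```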
  • Step S403 Extract time series features from the first face spatial feature group to obtain a face time series feature group.
  • the number of face time series features included in the face time series feature group is N (that is, the number of face time series features is equal to the number of frames of the face image).
  • the device may input the first face space feature group into the preset recurrent neural network model to output the face time series feature group; wherein the face time series feature group includes the first face space feature The time sequence feature corresponding to each face feature in the group.
  • the preset recurrent neural network model may be an LSTM model.
  • the number of layers of the LSTM model is greater than or equal to 2, and the network structure of each layer is the same.
  • FIG. 5 is a schematic structural diagram of a two-layer LSTM model provided by an embodiment of the present application.
  • the output of the first-layer LSTM is used as the input of the second-layer LSTM.
  • t cells are cascaded in the first layer LSTM model, which are cell 1, cell 2, ..., cell t; in the second layer LSTM model, t cells are cascaded, These t cells are cell 1, cell 2, ..., cell t.
  • its input is x10
  • its output x20 is used as the input of cell 1 in the second layer LSTM model.
  • the accuracy in the process of facial feature extraction can be improved to improve the accuracy of face recognition.
  • N frames of face images are sequential.
  • the time step of the LSTM model is set to N (here, the time step is equal to the number of frames of face images), that is, the face features corresponding to the N frames of face images are used as the input of the LSTM model to extract the timing information, and the number of hidden layer neurons in the LSTM model is set to S (for example, S = 256 or S = 512).
  • after computation by the LSTM model, a face time series feature group carrying timing information can be obtained, where the length of the face time series feature group is N, and the dimension of each face feature in the face time series feature group is S, that is, the dimension of each face feature in the face time series feature group is equal to the number S of hidden layer neurons in the LSTM model.
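The temporal extraction described above can be sketched with PyTorch's built-in LSTM; the values N = 10, Q = 128 and S = 512 follow the examples in this disclosure, while the use of `nn.LSTM` itself is an implementation assumption.

```python
import torch
import torch.nn as nn

N, Q, S = 10, 128, 512                            # time step, spatial feature dim, hidden units
lstm = nn.LSTM(input_size=Q, hidden_size=S, num_layers=2, batch_first=True)  # two-layer LSTM

spatial_features = torch.randn(1, N, Q)           # first face spatial feature group (one sequence)
temporal_features, _ = lstm(spatial_features)     # face time series feature group
print(temporal_features.shape)                    # torch.Size([1, 10, 512])
```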
  • as mentioned above, the first face spatial feature group extracted by the FaceNet model is a high-order feature with a dimension (Q) of 128 (the first face spatial feature group is in the first space), while the dimension of each face feature in the face time series feature group is determined by the number S of hidden layer neurons in the LSTM model, and the face time series feature group is in the second space. This means that the dimension of each face feature in the first face spatial feature group and the dimension of each face feature in the face time series feature group may be either equal or unequal. The two situations are described in detail below:
  • in the first situation, the dimension of each face feature in the first face spatial feature group is 128 and the number S of hidden layer neurons in the LSTM model is 128 (that is, the dimension of each face feature in the face time series feature group is also 128); in this case, there is no need to add a fully connected layer after the LSTM model, and the target face corresponding to the face time series feature group is matched directly in the face database (see the subsequent step S404).
  • in the second situation, the dimension M of each face feature in the first face spatial feature group is 128, and the number S of hidden layer neurons in the LSTM model is not equal to 128 (that is, the dimension of each face feature in the face time series feature group is not equal to 128), for example, when M is less than S.
  • in this case, a fully connected layer is added after the LSTM model and its number of hidden layer neurons is set to 128, so that the face time series feature group in the second space is mapped to the first space to obtain the mapped face time series feature group.
  • the following two different implementations can be included:
  • the fused temporal features can be spatially mapped to obtain the mapped face temporal feature group, where the fused temporal features are obtained by fusing the temporal features in the face temporal feature group;
  • the first face sequence feature can be spatially mapped to obtain the mapped second face sequence feature.
  • here, the first face time series feature is any face time series feature in the face time series feature group.
  • the first implementation: spatially map the fused time series feature to obtain the mapped face time series feature group.
  • the mapped face sequence feature group can be obtained by performing the following steps (see FIG. 6A):
  • Step B1 Input the first face space feature group into a preset recurrent neural network model to output a face time series feature group, wherein the face time series feature group includes the first face space feature group The time sequence feature corresponding to each face feature;
  • Step B2 subject the temporal features in the face temporal feature group to fusion processing to obtain fused temporal features
  • the technical means for performing fusion processing on the time series features in the face time series feature group may include, but is not limited to, operations such as averaging and normalizing the time series features.
  • the number of time series features included in the face time series feature group is N, and when the time series features in the face time series feature group are fused, the number of fused time series features is one. It is understandable that the fusion of temporal features can better reflect the multiple attributes of multiple frames of face images, and face features are more abundant.
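Continuing the LSTM sketch above, step B2 can be implemented by averaging the N temporal features over the time dimension (normalisation is another option mentioned above); the shapes are illustrative.

```python
import torch
import torch.nn.functional as F

temporal_features = torch.randn(1, 10, 512)       # stand-in for the LSTM output (N = 10, S = 512)
fused_feature = temporal_features.mean(dim=1)     # fusion by averaging: one feature of shape (1, 512)
fused_feature = F.normalize(fused_feature, dim=1) # optional L2 normalisation of the fused feature
```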
  • Step B3 When the dimension of the first face spatial feature group is not equal to the dimension of the face temporal feature group, perform spatial mapping on the fused temporal feature to obtain the mapped face temporal feature group.
  • that the dimension M of the first face spatial feature group and the dimension S of the face temporal feature group are not equal may include: for example, M is less than S.
  • spatial mapping is performed on the fused time series feature to obtain the mapped face time series feature group, including:
  • a fully connected layer is added to the preset recurrent neural network model, so that the fusion time series feature is mapped to the first space, and a face time series feature with the same dimension as the first face space feature group is obtained.
  • for example, the preset recurrent neural network model is the LSTM model, the time step of the LSTM model is set to N = 10, the number of hidden layer neurons contained in the LSTM model is 512, and the first face spatial feature group extracted by the FaceNet model is a high-order feature with a dimension (Q) of 128.
  • when a fully connected layer is added after the LSTM model and its number of hidden layer neurons is set to 128, the 512-dimensional fused time series feature can be mapped to the first space to obtain a 128-dimensional face time series feature; the target face corresponding to this face time series feature is then matched in the face database.
  • the face time series feature after spatial mapping can better reflect the multiple attributes of multiple frames of face images, and the face features are richer, which can improve the accuracy of face feature extraction.
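A sketch of the fully connected mapping in this example: a linear layer with 128 output units projects the 512-dimensional fused temporal feature from the second space into the 128-dimensional first space; using `nn.Linear` for the added layer is an implementation assumption.

```python
import torch
import torch.nn as nn

fc = nn.Linear(512, 128)                 # fully connected layer with 128 hidden-layer neurons
fused_feature = torch.randn(1, 512)      # stand-in for the 512-dimensional fused time series feature
mapped_feature = fc(fused_feature)       # mapped face time series feature, shape (1, 128)
```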
  • the second implementation: spatially map the first face time series feature to obtain the mapped second face time series feature.
  • the mapped second face sequence feature can be obtained by performing the following steps (see FIG. 6B):
  • similarly, in this implementation, the preset recurrent neural network model is the LSTM model, the number of hidden layer neurons contained in the LSTM model is 512, and the first face spatial feature group is a high-order feature with a dimension (Q) of 128.
  • after the first face time series feature is spatially mapped, the target face corresponding to the second face time series feature is matched in the face database; for the specific implementation, please refer to the subsequent step S404.
  • Step S404 Match the target face corresponding to the time series feature of the face in the face database.
  • the face database stores the face images of multiple people.
  • for example, the database stores the face images of target face A, target face B, target face C, and target face D.
  • the face image of each person stored in the face database is a positive face image.
  • the feature of the face image of each person in the database can be extracted to obtain the registered feature vector.
  • the registered feature vector is a specific manifestation of the face image of the target face in the database. It is understandable that face images of different people have different registration feature vectors obtained by extraction.
  • the correspondence between the face image and the registered feature vector can be as shown in Table 1:
  • the recognition of the target face can be achieved by calculating the matching degree between the feature vector in the face time series feature group and the registered feature vector of the target face in the database. Specifically, the Euclidean distance between the feature vector in the face time series feature group and the registered feature vector is calculated; when the Euclidean distance between the two is less than a set threshold (for example, the threshold is 0.2), they are recognized as the same person; otherwise, they are recognized as different people. It should be noted that, in the embodiment of the present application, the smaller the Euclidean distance between the feature vector in the face time series feature group and the registered feature vector, the higher the matching degree.
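A NumPy sketch of the distance-based matching just described; the 0.2 threshold comes from the example above, and the comparison function itself is an illustrative assumption.

```python
import numpy as np

def is_same_person(query_feature, registered_feature, threshold=0.2):
    # Smaller Euclidean distance means a higher matching degree.
    distance = np.linalg.norm(np.asarray(query_feature) - np.asarray(registered_feature))
    return distance < threshold
```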
  • the recognition of the target face can be achieved by calculating the degree of matching between the fusion timing feature and the registered feature vector.
  • the recognition of the target face can be achieved by calculating the matching degree between the second face sequence feature and the registered feature vector.
  • if the degree of matching between the second face time series feature and the face images stored in the database is less than the preset threshold, the device continues to calculate the matching degree between the remaining second face time series features and the face images until the matching degree is greater than the preset threshold, thereby completing the recognition of the target face.
  • for example, the face time series feature group includes 10 face time series features, namely face time series feature 1, face time series feature 2, ..., face time series feature 10. The device determines that the matching degree between the spatially mapped face time series feature 1 and the registered feature vector is 0.6, which is less than the preset threshold 0.8; the device then continues to calculate the matching degree between the spatially mapped face time series feature 2 and the registered feature vector, which is 0.9 and greater than the preset threshold 0.8. In this case, the target face D can be recognized.
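The search loop of this example can be sketched as follows; `matching_degree(feature, registered_feature)` is a hypothetical helper in which larger values mean a better match (for example, a similarity derived from the Euclidean distance), and 0.8 is the threshold used above.

```python
def match_target(face_temporal_features, face_database, matching_degree, threshold=0.8):
    """face_database maps a person's identity to their registered feature vector."""
    for feature in face_temporal_features:                    # spatially mapped features 1..N, in order
        for person, registered_feature in face_database.items():
            if matching_degree(feature, registered_feature) > threshold:
                return person                                  # e.g. target face D in the example above
    return None                                                # no registered face exceeded the threshold
```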
  • the face time series feature group can be obtained. Since the face time series feature group can reflect the complementary information contained in multiple frames of face images, it can improve the accuracy of the face feature extraction process and thereby improve the accuracy of face recognition.
  • although the steps in the flowcharts of FIGS. 4, 6A, and 6B are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless specifically stated herein, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in FIG. 4, FIG. 6A, and FIG. 6B may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
  • an embodiment of this application also provides a face recognition device 70.
  • the face recognition device 70 may include:
  • the image extraction unit 701 is configured to extract N frames of face images of the same target face in the video stream, where the N frames of face images have time series; N is a positive integer greater than 1;
  • the first feature extraction unit 702 is configured to perform spatial feature extraction on the N frames of face images to obtain a first face spatial feature group, wherein the first face spatial feature group includes the face feature corresponding to each frame of face image;
  • the second feature extraction unit 703 is configured to extract time series features from the first face spatial feature group to obtain a face time series feature group;
  • the recognition unit 704 is configured to match the target face corresponding to the face sequence feature group in the face database.
  • the second feature extraction unit 703 includes a first time series feature extraction unit, a fusion unit, and a first spatial mapping unit; wherein,
  • the first time series feature extraction unit is configured to input the first face spatial feature group into a preset recurrent neural network model to output a face time series feature group, wherein the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
  • the fusion unit is configured to perform fusion processing on the time series features in the face time series feature group to obtain fused time series features
  • the first spatial mapping unit is configured to perform spatial mapping on the fused time series feature when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, to obtain the mapped face time series feature group.
  • the dimension of the first face space feature group is M
  • the dimension of the first face space feature group is determined according to the FaceNet model
  • the first face spatial feature group is in the first space
  • the dimension of the face time series feature group is S
  • the dimension of the face time series feature group is determined according to the number of hidden layer neurons in the preset recurrent neural network model
  • the face sequence feature group is in the second space
  • the space mapping unit is specifically used for:
  • a fully connected layer is added to the preset recurrent neural network model, so that the fusion time series feature is mapped to the first space, and a face time series feature with the same dimension as the first face space feature group is obtained.
  • the preset recurrent neural network model is a two-layer long short-term memory (LSTM) network model, and the network structure of each layer is the same.
  • the second feature extraction unit 703 further includes a second time-series feature extraction unit, a determination unit, and a second spatial mapping unit; wherein,
  • the second time series feature extraction unit is configured to input the first face spatial feature group into a preset recurrent neural network model to output a face time series feature group, where the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
  • the determining unit is configured to determine a first face time series feature in the face time series feature group; wherein the first face time series feature is any face time series feature in the face time series feature group;
  • the second spatial mapping unit is configured to perform spatial mapping on the first face time series feature when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, to obtain the second face time series feature;
  • the identification unit 704 includes: a matching degree determination unit and a processing unit;
  • the matching degree determining unit is configured to continue to calculate the matching degree between the remaining second face time series features and the face image when the matching degree is less than a preset threshold, until the matching degree is greater than the preset threshold, and to determine the target face corresponding to that second face time series feature.
  • the image extraction unit 701 is specifically configured to:
  • the first feature extraction unit 702 is specifically configured to:
  • the N frames of face images are input into the FaceNet model to extract spatial features in the N frames of face images.
  • the above device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of the units/modules in the foregoing embodiment is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the present invention also provides a face recognition device, which will be described in detail below with reference to the accompanying drawings:
  • FIG. 8 shows a schematic structural diagram of a face recognition device provided by an embodiment of the present invention.
  • the device 80 may include a processor 801, a memory 804, and a communication module 805.
  • the processor 801, the memory 804, and the communication module 805 may be connected to each other through a bus 806 to communicate with one another.
  • the memory 804 may be a high-speed random access memory (RAM) memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory 804 may also be at least one storage system located far away from the foregoing processor 801.
  • the memory 804 is used to store application program code, which may include an operating system, a network communication module, a user interface module, and a data processing program.
  • the communication module 805 is used to interact with external devices; the processor 801 is configured to call the program code and perform the following steps:
  • N is a positive integer greater than 1;
  • first face spatial feature group includes a face feature corresponding to each frame of face image
  • the processor 801 extracts time series features from the first face space feature group to obtain a face time series feature group, including:
  • the first face spatial feature group is input into a preset recurrent neural network model to output a face time series feature group, where the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
  • spatial mapping is performed on the fused temporal feature to obtain a mapped face temporal feature group.
  • the dimension of the first face space feature group is M, the dimension of the first face space feature group is determined according to the FaceNet model, and the first face space feature group is in the first space;
  • the dimension of the face time series feature group is S, and the dimension of the face time series feature group is determined according to the number of hidden layer neurons in the preset recurrent neural network model;
  • the face time series feature group is in the second space; when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, the processor 801 performing spatial mapping on the fused time series feature to obtain the mapped face time series feature group may include:
  • a fully connected layer is added to the preset recurrent neural network model, so that the fusion time series feature is mapped to the first space, and a face time series feature with the same dimension as the first face space feature group is obtained.
  • the preset recurrent neural network model is a two-layer long short-term memory (LSTM) network model, and the network structure of each layer is the same.
  • the processor 801 extracts time series features from the first face spatial feature group to obtain a face time series feature group, which may further include:
  • the first face spatial feature group is input into a preset recurrent neural network model to output a face time series feature group, where the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
  • the processor 801 matching the target face corresponding to the face sequence feature group in the face database may include:
  • if the degree of matching is less than the preset threshold, continue to calculate the degree of matching between the remaining second face time series features and the face image, until the degree of matching is greater than the preset threshold, and determine the target face corresponding to that second face time series feature.
  • the processor 801 extracts N frames of face images of the same target face in the video stream, which may include:
  • the processor 801 performs spatial feature extraction on the N frames of face images to obtain the first face spatial feature group, which may include:
  • the N frames of face images are input into the FaceNet model to extract spatial features in the N frames of face images.
  • the face recognition device 80 may be a terminal or a server.
  • its expression form may include a mobile phone, a tablet computer, a personal digital assistant (PDA), a mobile Internet device (Mobile Internet Device, MID), and other devices that can be used by users, which is not specifically limited in the embodiment of the present invention.
  • it should be understood that the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules and units is only a logical function division.
  • there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the units described as separate components may be physically separated or not physically separated.
  • the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or software functional unit, which is not limited in this application.
  • the embodiment of the present application also provides a readable storage medium on which a computer program is stored, and when the computer program is executed, the face recognition method shown in FIGS. 4, 6A, and 6B is implemented. If each component module of the above device is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in the computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, and the computer product is stored in a computer-readable storage medium.
  • the foregoing computer-readable storage medium may be the internal storage unit of the face recognition device described in the foregoing embodiment, such as a hard disk or a memory.
  • the aforementioned computer-readable storage medium may also be an external storage device of the aforementioned face recognition device, such as an equipped plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, flash memory card (Flash Card) etc.
  • the aforementioned computer-readable storage medium may also include both an internal storage unit of the aforementioned face recognition device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the face recognition device.
  • the aforementioned computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • the embodiment of the present application provides a face recognition method.
  • a face time series feature group can be obtained; because the face time series feature group can reflect the complementary information contained in multiple frames of face images, the accuracy of extracting facial features can be improved, thereby improving the accuracy of face recognition.
  • the computer program can be stored in a computer-readable storage medium, and when executed, it may include the procedures of the above-mentioned method embodiments.
  • the aforementioned storage media include: ROM, RAM, magnetic disk or optical disk and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A face recognition method, apparatus, device, and computer-readable storage medium. The method includes: extracting N frames of face images of the same target face from a video stream, wherein the N frames of face images have a time sequence and N is a positive integer greater than 1; performing spatial feature extraction on the N frames of face images to obtain a first face spatial feature group, wherein the first face spatial feature group includes the face feature corresponding to each frame of face image; extracting time series features from the first face spatial feature group to obtain a face time series feature group; and matching, in a face database, the target face corresponding to the face time series feature group. The method can improve the accuracy of the face feature extraction process and thus the precision of face recognition.

Description

Face recognition method, apparatus, device, and computer-readable storage medium
This application claims priority to the Chinese patent application No. 201910489828.0, entitled "Face recognition method, apparatus, device, and computer-readable storage medium", filed with the Chinese Patent Office on June 5, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of computer vision technology, and in particular to a face recognition method, apparatus, device, and computer-readable storage medium.
Background
Face recognition is a biometric identification technology that identifies a person based on facial feature information. It is widely used in many fields, such as residential access control, company attendance, and judicial and criminal investigation. Practical applications show that a human face is a natural, structured target with rather complex variation in its details, so detecting and recognizing such targets is a challenging subject. Specifically, the difficulty of recognition lies in the following: (1) the face itself exhibits pattern variability due to differences in appearance, expression, posture, skin color, and so on; (2) faces have different characteristics due to the uncertainty of accessories such as bangs, glasses, and beards; (3) the size of the image, the direction of the light source, and the intensity of the illumination all affect the final appearance of the face. Therefore, frontal, upright, well-lit faces can be recognized relatively easily, whereas profile, skewed, or poorly lit faces generally cannot be recognized.
At present, the implementation of face recognition may include: first, capturing an image containing a face from the video stream to obtain a face image; second, extracting facial features from the face image; and then classifying the extracted facial features to complete face recognition.
In the prior art, when facial features are extracted from a face image, a low feature extraction rate (which may manifest as, for example, the extracted facial features being limited or inaccurate, or the calculation process being complicated) easily leads to low face recognition accuracy.
Summary of the Invention
The embodiments of the present invention provide a face recognition method, apparatus, device, and computer-readable storage medium, which can improve the accuracy of the face feature extraction process and thus the precision of face recognition.
To achieve the above objective, in a first aspect, an embodiment of the present invention provides a face recognition method, which includes:
extracting N frames of face images of the same target face from a video stream, wherein the N frames of face images have a time sequence, and N is a positive integer greater than 1;
performing spatial feature extraction on the N frames of face images to obtain a first face spatial feature group, wherein the first face spatial feature group includes the face feature corresponding to each frame of face image;
extracting time series features from the first face spatial feature group to obtain a face time series feature group; and
matching, in a face database, the target face corresponding to the face time series feature group.
In one possible implementation, extracting time series features from the first face spatial feature group to obtain the face time series feature group includes:
inputting the first face spatial feature group into a preset recurrent neural network model to output a face time series feature group, wherein the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
fusing the time series features in the face time series feature group to obtain a fused time series feature; and
when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, performing spatial mapping on the fused time series feature to obtain a mapped face time series feature group.
In one possible implementation, the dimension of the first face spatial feature group is M, the dimension of the first face spatial feature group is determined according to the FaceNet model, and the first face spatial feature group is in a first space; the dimension of the face time series feature group is S, the dimension of the face time series feature group is determined according to the number of hidden layer neurons in the preset recurrent neural network model, and the face time series feature group is in a second space; and when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, performing spatial mapping on the fused time series feature to obtain the mapped face time series feature group includes:
adding a fully connected layer to the preset recurrent neural network model so that the fused time series feature is mapped to the first space, and obtaining a face time series feature with the same dimension as the first face spatial feature group.
In one possible implementation, the preset recurrent neural network model is a two-layer long short-term memory network model, and the network structure of each layer is the same.
In one possible implementation, extracting time series features from the first face spatial feature group to obtain the face time series feature group further includes:
inputting the first face spatial feature group into a preset recurrent neural network model to output a face time series feature group, wherein the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
determining a first face time series feature in the face time series feature group, wherein the first face time series feature is any face time series feature in the face time series feature group; and
when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, performing spatial mapping on the first face time series feature to obtain a second face time series feature;
and matching, in the face database, the target face corresponding to the face time series feature group includes:
determining the matching degree between the second face time series feature and the face images stored in the face database; and
if the matching degree is less than a preset threshold, continuing to calculate the matching degree between the remaining second face time series features and the face images until the matching degree is greater than the preset threshold, and determining the target face corresponding to that second face time series feature.
In one possible implementation, extracting N frames of face images of the same target face from the video stream includes:
extracting the N frames of face images of the same target face from the video stream by means of a trained multi-task cascaded convolutional neural network model.
In one possible implementation, performing spatial feature extraction on the N frames of face images to obtain the first face spatial feature group includes:
inputting the N frames of face images into a FaceNet model to extract the spatial features in the N frames of face images.
By implementing the embodiments of the present application, a face time series feature group can be obtained by extracting time series information from the first face spatial feature group. Since the face time series feature group can reflect the complementary information contained in multiple frames of face images, the accuracy of the face feature extraction process can be improved, thereby improving the precision of face recognition.
In a second aspect, an embodiment of the present invention provides a face recognition apparatus, which includes:
an image extraction unit, configured to extract N frames of face images of the same target face from a video stream, wherein the N frames of face images have a time sequence, and N is a positive integer greater than 1;
a first feature extraction unit, configured to perform spatial feature extraction on the N frames of face images to obtain a first face spatial feature group, wherein the first face spatial feature group includes the face feature corresponding to each frame of face image;
a second feature extraction unit, configured to extract time series features from the first face spatial feature group to obtain a face time series feature group; and
a recognition unit, configured to match, in a face database, the target face corresponding to the face time series feature group.
In one possible implementation, the second feature extraction unit includes a first time series feature extraction unit, a fusion unit, and a first spatial mapping unit, wherein:
the first time series feature extraction unit is configured to input the first face spatial feature group into a preset recurrent neural network model to output a face time series feature group, wherein the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
the fusion unit is configured to fuse the time series features in the face time series feature group to obtain a fused time series feature; and
the first spatial mapping unit is configured to, when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, perform spatial mapping on the fused time series feature to obtain a mapped face time series feature group.
In one possible implementation, the dimension of the first face spatial feature group is M, the dimension of the first face spatial feature group is determined according to the FaceNet model, and the first face spatial feature group is in a first space; the dimension of the face time series feature group is S, the dimension of the face time series feature group is determined according to the number of hidden layer neurons in the preset recurrent neural network model, and the face time series feature group is in a second space; and the spatial mapping unit is specifically configured to:
add a fully connected layer to the preset recurrent neural network model so that the fused time series feature is mapped to the first space, and obtain a face time series feature with the same dimension as the first face spatial feature group.
In one possible implementation, the preset recurrent neural network model is a two-layer long short-term memory network model, and the network structure of each layer is the same.
In one possible implementation, the second feature extraction unit further includes a second time series feature extraction unit, a determination unit, and a second spatial mapping unit, wherein:
the second time series feature extraction unit is configured to input the first face spatial feature group into a preset recurrent neural network model to output a face time series feature group, wherein the face time series feature group includes the time series feature corresponding to each face feature in the first face spatial feature group;
the determination unit is configured to determine a first face time series feature in the face time series feature group, wherein the first face time series feature is any face time series feature in the face time series feature group; and
the second spatial mapping unit is configured to, when the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, perform spatial mapping on the first face time series feature to obtain a second face time series feature;
the recognition unit includes a matching degree determination unit and a processing unit;
wherein the matching degree determination unit is configured to, when the matching degree is less than a preset threshold, continue to calculate the matching degree between the remaining second face time series features and the face images until the matching degree is greater than the preset threshold, and determine the target face corresponding to that second face time series feature.
In one possible implementation, the image extraction unit is specifically configured to:
extract the N frames of face images of the same target face from the video stream by means of a trained multi-task cascaded convolutional neural network model.
In one possible implementation, the first feature extraction unit is specifically configured to:
input the N frames of face images into a FaceNet model to extract the spatial features in the N frames of face images.
In a third aspect, an embodiment of the present invention provides a face recognition device, including a processor and a memory, the processor and the memory being connected to each other, wherein the memory is used to store a computer program that supports the face recognition device in executing the above method, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect.
By implementing the embodiments of the present application, a face time series feature group can be obtained by extracting time series information from the first face spatial feature group, and a fused time series feature is obtained by fusing the face features contained in the face time series feature group. When the dimension of the first face spatial feature group is not equal to the dimension of the face time series feature group, the face time series feature can be obtained through spatial mapping. Since the face time series feature can reflect the multiple attributes of multiple frames of face images, and the face features are richer, the accuracy of the face feature extraction process can be improved, thereby improving the precision of face recognition.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of the internal processing logic of an LSTM neural network model provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a cascaded LSTM neural network model provided by an embodiment of the present application;
FIG. 3 is a schematic architectural diagram of a face recognition system provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a face recognition method provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a two-layer LSTM model provided by an embodiment of the present application;
FIG. 6A is a schematic flowchart of a method for spatially mapping face time series features provided by an embodiment of the present application;
FIG. 6B is a schematic flowchart of another method for spatially mapping face time series features provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of a face recognition apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of a face recognition device provided by another embodiment of the present application.
Detailed Description
Some terms used in this application are explained below to facilitate understanding by those skilled in the art.
(1) Long short-term memory neural network (Long-Short Term Memory, LSTM)
In the embodiments of this application, the LSTM model uses an input gate, an output gate, a forget gate, and a cell structure to control the learning and forgetting of historical information, making the model suitable for processing long-sequence problems. FIG. 1 is a schematic structural diagram of an LSTM provided by an embodiment of this application. As shown in FIG. 1, at time t, the memory cell of the LSTM model is denoted C_t, the output of the forget gate is denoted f_t, the output of the input gate is denoted i_t, and the output of the output gate is denoted O_t; the element values of all three gates lie in the interval [0, 1].
Specifically, the forget gate controls forgetting, that is, it controls with a certain probability whether the hidden cell state of the previous layer is forgotten. At time t, the inputs of the forget gate are the hidden state h_{t-1} of the previous sequence step and the data x_t of the current sequence step; the output of the forget gate is obtained under the action of the activation function. Specifically, the activation function here can be the sigmoid function.
In practical applications, the processing logic of the forget gate can be expressed by the following mathematical expression (1):
f_t = σ(W_f h_{t-1} + U_f x_t + b_f)    (1)
where W_f, U_f, and b_f are the coefficients and bias of the linear relationship, and σ denotes the sigmoid activation function.
Specifically, the input gate handles the input at the current sequence position and decides what new information to store in the "cell state". As can be seen from FIG. 1, the input gate consists of two parts: the first part, under the sigmoid activation function, outputs i_t; the second part, under the tanh activation function, outputs a_t. The two results are multiplied and then used to update the cell state. In short, the role of the input gate is to prepare for the state update.
In practical applications, the processing logic of the input gate can be expressed by the following mathematical expression (2):
i_t = σ(W_i h_{t-1} + U_i x_t + b_i)
a_t = tanh(W_a h_{t-1} + U_a x_t + b_a)    (2)
where W_i, U_i, b_i, W_a, U_a, and b_a are the coefficients and biases of the linear relationships, and σ denotes the sigmoid activation function.
After the forget gate and the input gate, the deletion and addition of the transmitted information can be determined, that is, the "cell state" can be updated. As can be seen from FIG. 1, the cell state C_t consists of two parts: the first part is the product of C_{t-1} and the forget gate output f_t, and the second part is the product of the input gate outputs i_t and a_t, which can be expressed by the following mathematical expression (3):
C_t = C_{t-1} * f_t + i_t * a_t    (3)
where * denotes the Hadamard product.
Specifically, the Hadamard product here is an element-wise multiplication.
As can be seen from FIG. 1, the update of the hidden state h_t consists of two parts: the first part is O_t, which is obtained from the hidden state h_{t-1} of the previous sequence step, the data x_t of the current sequence step, and the sigmoid activation function; the second part is composed of the cell state C_t and the tanh activation function. The processing logic can be expressed by the following mathematical expression (4):
O_t = σ(W_O h_{t-1} + U_O x_t + b_O)
h_t = O_t * tanh(C_t)    (4)
In the embodiments of this application, the preset recurrent neural network model may include, but is not limited to, the LSTM neural network model, and may also include a convolutional neural network (Convolutional Neural Network, CNN). Taking the LSTM neural network model as an example, its specific architecture can be as shown in FIG. 2: in this LSTM neural network model, multiple cells are cascaded, for example the t cells shown in FIG. 2, and the model can extract the time series information contained in multiple frames of face images.
(2) Multi-task cascaded convolutional neural network model
In the embodiments of this application, the process of constructing the multi-task cascaded convolutional neural network model may include:
A1. Determining the training set sample data;
A2. Designing the specific structure of the multi-task cascaded convolutional neural network model; for example, the model contains three sub-networks, where the first cascaded sub-network is a small convolutional neural network, the second cascaded sub-network is a medium convolutional neural network, and the third cascaded sub-network is a large convolutional neural network;
A3. Within the multi-task cascaded convolutional neural network model, using multi-task learning for each cascaded sub-network, for example, simultaneously learning the four tasks of "face classification", "bounding box regression", "face key point detection", and "face attribute analysis";
A4. Feeding all images of the training set sample data into the multi-task cascaded convolutional neural network model for training, to obtain a trained multi-task cascaded convolutional neural network model.
Then, after the trained multi-task cascaded convolutional neural network model is obtained, the multiple images captured from the video stream (i.e. the test set sample data) are input into the trained multi-task cascaded convolutional neural network model to determine whether a face is present and to determine face candidate boxes.
为了便于更好的理解本申请实施例提供的一种人脸识别方法,下面结合图3所示的本申请实施例提供的一种人脸识别系统30来具体说明在实际应用中是如何实现人脸识别的,如图3所示,该人脸识别系统30集成了多任务级联卷积神经网络模型300、FaceNet模型301、预设的循环神经网络模型302、全连接层303以及人脸匹配模型304。
其中,多任务级联卷积神经网络模型300,用于提取视频流中同一目标人脸的N帧人脸图像,其中,所述N帧人脸图像具有时序性;N为大于1的正整数;
FaceNet模型301,用于对所述N帧人脸图像进行空间特征提取,得到第一人脸空间特征组,其中,所述第一人脸空间特征组包括每帧人脸图像对应的人脸特征;
预设的循环神经网络模型302,用于在所述第一人脸空间特征组中提取时序信息,得到人脸时序特征组;
所述全连接层303,用于当所述第一人脸空间特征组的维数M不等于所述人脸时序特征组的维数S(例如M小于S)时,对第一人脸时序特征进行空间映射,以得到第二人脸时序特征;其中,所述第一人脸时序特征为人脸时序特征组中的任意一个人脸时序特征;
人脸匹配模型304,用于确定所述第二人脸时序特征与所述人脸数据库中存储的人脸图像的匹配度;若所述匹配度小于预设阈值,则继续计算剩余的所述第二人脸时序特征与所述人脸图像的匹配度,直至所述匹配度大于预设阈值时,确定所述第二人脸时序特征对应的所述目标人脸。
在其中一种可能的实现方式中,所述预设的循环神经网络模型302,还用于对人脸时序特征组中包含的人脸特征进行特征融合处理,得到融合时序特征。在这种情况下,全连接层303具体用于:
当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述融合时序特征进行空间映射,以得到映射后的人脸时序特征组。
在这种情况下,人脸识别模型304具体用于:
在人脸数据库中匹配与所述人脸时序特征组对应的所述目标人脸。
基于图3所示的人脸识别系统的架构示意图,下面将结合图4所示的本申请实施例提供的一种人脸识别方法的流程示意图具体说明如何实现人脸识别,可以包括但不限于如下步骤:
步骤S401、提取视频流中同一目标人脸的N帧人脸图像,其中,所述N帧人脸图像具有时序性;N为大于1的正整数。
在本申请实施例中,设备可以在原始视频中按照时间顺序提取视频中包含人脸的视频帧,从而可以得到包含人脸图像的视频流。例如,视频流中包含人物A、人物B、人物C以及人物D各自对应的人脸图像。之后,设备可以在视频流中按照时间顺序截取同一目标人脸(例如,人物A)的N帧人脸图像。具体地,N帧人脸图像是通过对视频流中的各帧图像进行人脸检测和人脸跟踪处理所确定的包含同一目标人脸的图像帧。可以理解的是,在视频流中截取得到的N帧人脸图像在时间维度上具有关联性,也即:N帧人脸图像具有时序性。
在本申请实施例中,可以采用训练好的多任务级联卷积神经网络模型对视频流中的同一目标人脸的人脸图像进行人脸检测,在检测到同一目标人脸的人脸图像时,确定该人脸图像的人脸候选框,然后根据人脸候选框对人脸图像进行裁剪,以去除复杂环境背景对识别效果的影响。
步骤S402、对所述N帧人脸图像进行空间特征提取,得到第一人脸空间特征组,其中,所述第一人脸空间特征组包括每帧人脸图像对应的人脸特征。
在本申请实施例中，可以采用FaceNet模型提取N帧人脸图像中各自包含的人脸空间特征，继而可以生成N帧人脸图像各自对应的N个特征向量。具体地，这N个特征向量组成了第一人脸空间特征组。需要说明的是，通过FaceNet模型提取得到的第一人脸空间特征组为维数（Q）等于128的高阶特征。由于采用FaceNet模型可以获取到人脸图像的多维度矩阵，该多维度矩阵可以反映人脸的更多细节特点，从而可以满足人脸识别精度的需求。
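下面给出一段示意性的Python代码，演示对N帧人脸图像提取空间特征并组成第一人脸空间特征组的大致做法。需要说明的是，facenet_pytorch中的InceptionResnetV1输出为512维嵌入，与文中128维的FaceNet特征不同，末尾的线性降维层proj以及占位输入faces均为本文之外的假设，仅用于得到与文中一致的128维特征。

```python
import torch
import torch.nn as nn
from facenet_pytorch import InceptionResnetV1

embedder = InceptionResnetV1(pretrained='vggface2').eval()   # 预训练的人脸嵌入模型（512维，替代示意）
faces = torch.randn(10, 3, 160, 160)                         # 占位：实际应为上一示例中裁剪得到的N帧人脸图像

with torch.no_grad():
    spatial_feats = embedder(faces)                           # (N, 512) 人脸空间特征

proj = nn.Linear(512, 128)                                    # 假设性的降维层，得到文中128维的特征
first_spatial_group = proj(spatial_feats)                     # (N, 128)：第一人脸空间特征组
```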
步骤S403、在所述第一人脸空间特征组中提取时序特征,得到人脸时序特征组。
在本申请实施例中,人脸时序特征组中包含的人脸时序特征的数量为N(也即,人脸时序特征的数量与人脸图像的帧数相等)。
在本申请实施例中,设备可以将第一人脸空间特征组输入到预设的循环神经网络模型中,以输出人脸时序特征组;其中,人脸时序特征组包括第一人脸空间特征组的每个人脸特征对应的时序特征。
可选的,预设的循环神经网络模型可以为LSTM模型。
进一步可选的,LSTM模型的层数大于等于2,并且每层的网络结构均相同。参见图5,是本申请实施例提供的一种双层LSTM模型的结构示意图。在实际应用中,将第一人脸空间特征组中的人脸特征输入双层LSTM模型以提取第一人脸空间特征组中的时序信息时,第1层LSTM的输出作为第2层LSTM的输入。如图5所示,第1层LSTM模型中级联了t个细胞,这t个细胞分别为细胞1,细胞2,……、细胞t;第2层LSTM模型中级联了t个细胞,这t个细胞分别为细胞1,细胞2,……、细胞t。以第1层LSTM模型中的细胞1为例,其输入为x10,其输出x20作为第2层LSTM模型中的细胞1的输入。
这里,在LSTM模型的层数为大于等于2的情况下,可以提高人脸特征提取过程中的准确性,以提高人脸识别的精度。
如前所述,N帧人脸图像具有时序性。当采用LSTM模型提取第一人脸空间特征组中的时序信息时,将LSTM模型的时间步长设置为N(这里,时间步长等于人脸图像的帧数),即采用N帧人脸图像各自对应的人脸特征作为LSTM模型的输入进行时序信息的提取,进一步地,将LSTM模型中的隐含层神经元的数量设置为S(S为大于1的正整数),例如,S=256,S=512等等,本申请实施例不作具体限定。那么,在经过LSTM模型计算输出之后,可以得到一组带有时序信息的人脸时序特征组,其中,人脸时序特征组的长度为N,人脸时序特征组中的每个人脸特征的维数为S,即人脸时序特征组中的每个人脸特征的维数与LSTM模型中隐含层神经元的数量S相等。
需要说明的是，在实际应用中采用LSTM模型提取第一人脸空间特征组中的时序特征时，例如在N=10、S=512的情况下，通过这一实现方式，可以提高人脸特征提取过程中的准确度，以达到提高人脸识别精度的目的。
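作为一种可能的实现草图，下面用PyTorch的nn.LSTM演示按时间步长N=10、隐含层神经元数量S=512、双层结构提取人脸时序特征组的过程；其中spatial_group以随机张量占位，实际应为FaceNet输出的第一人脸空间特征组，变量名均为假设。

```python
import torch
import torch.nn as nn

N, M, S = 10, 128, 512                        # 时间步长N、空间特征维数M、隐含层神经元数量S
lstm = nn.LSTM(input_size=M, hidden_size=S, num_layers=2, batch_first=True)   # 双层LSTM，每层结构相同

spatial_group = torch.randn(1, N, M)          # 占位的第一人脸空间特征组（按时序排列）
temporal_group, _ = lstm(spatial_group)       # (1, N, S)：带有时序信息的人脸时序特征组
temporal_group = temporal_group.squeeze(0)    # (N, 512)：长度为N，每个人脸时序特征的维数为S
```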
如前所述，通过FaceNet模型提取得到的第一人脸空间特征组为维数（Q）等于128的高阶特征（其中，第一人脸空间特征处于第一空间），而人脸时序特征组中的每个人脸特征的维数由LSTM模型中隐含层神经元的数量S决定，且人脸时序特征组处于第二空间。那么，这也就意味着第一人脸空间特征组中每个人脸特征的维数与人脸时序特征组中每个人脸特征的维数之间，可能出现维数相等和维数不相等两种情形，以下对这两种情形进行具体阐述：
在第一种情形下，第一人脸空间特征组中的每个人脸特征的维数为128，LSTM模型中隐含层神经元的数量S=128（也即人脸时序特征组中的每个人脸特征的维数为128），此时，无需在LSTM模型后添加全连接层，这也意味着无需将处于第二空间的人脸时序特征组映射到第一空间。此时，在人脸数据库中匹配人脸时序特征组对应的目标人脸，其具体实现请参考后续步骤S404。
在第二种情形下，第一人脸空间特征组中的每个人脸特征的维数M为128，LSTM模型中隐含层神经元的数量S不等于128（也即人脸时序特征组中的每个人脸特征的维数不等于128），例如M小于S。此时，在LSTM模型后添加一个全连接层，并将该全连接层的神经元数量设置为128，以实现将处于第二空间的人脸时序特征组映射到第一空间，得到映射后的人脸时序特征组。在第二种情形下，可以包括以下两种不同的实现方式：
在一种可能的方式中,可以将融合时序特征进行空间映射,以得到映射后的人脸时序特征组,这里,融合时序特征为对人脸时序特征组中时序特征进行融合处理得到的;在另一种可能的方式中,可以将第一人脸时序特征进行空间映射,以得到映射后的第二人脸时序特征,这里,第一人脸时序特征为人脸时序特征组中的任意一个人脸时序特征。接下来对这两种实现方式进行具体阐述。
第一种实现方式:将融合时序特征进行空间映射,以得到映射后的人脸时序特征组。
具体实现中,可以通过执行如下步骤(参见图6A)得到映射后的人脸时序特征组:
步骤B1、将所述第一人脸空间特征组输入到预设的循环神经网络模型中，以输出人脸时序特征组，其中，所述人脸时序特征组包括第一人脸空间特征组中每个人脸特征对应的时序特征；
步骤B2、将所述人脸时序特征组中的时序特征经过融合处理得到融合时序特征;
在本申请实施例中,对人脸时序特征组中的时序特征进行融合处理的技术手段可以包括但不限于:对时序特征取平均、归一化等操作。
如前所述，人脸时序特征组中包含的时序特征的数量为N；对人脸时序特征组中的时序特征进行融合处理后，所得到的融合时序特征的数量为1个。可以理解的是，融合时序特征可以更好地反映多帧人脸图像的多重属性，且人脸特征更为丰富。
步骤B3、当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述融合时序特征进行空间映射,以得到映射后的人脸时序特征组。
在本申请实施例中,第一人脸空间特征组的维数M与人脸时序特征组的维数S不相等可以包括:例如,M小于S。
具体实现中,所述当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述融合时序特征进行空间映射,以得到映射后的人脸时序特征组,包括:
在预设的循环神经网络模型中添加全连接层,以使所述融合时序特征映射到所述第一空间,并得到与所述第一人脸空间特征组维数相同的人脸时序特征。
例如，预设的循环神经网络模型为LSTM模型，设置LSTM模型的时间步长N=10，LSTM模型中包含的隐含层神经元的数量为512，通过FaceNet模型提取得到的第一人脸空间特征组为维数（Q）等于128的高阶特征。当在LSTM模型后添加一个全连接层时，将当前网络结构中隐含层神经元的数量设置为128。在这种情况下，可以实现将512维的融合时序特征映射到第一空间，并得到128维的人脸时序特征。此时，在人脸数据库中匹配人脸时序特征对应的目标人脸，其具体实现请参考后续步骤S404。可以理解的是，在这一实现方式中，由于经过空间映射后的人脸时序特征可以更好地反映多帧人脸图像的多重属性，且人脸特征更为丰富，可以提高人脸特征提取过程中的准确性，以提高人脸识别的精度。
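下面给出与第一种实现方式对应的一段示意性PyTorch代码：先对人脸时序特征取平均并归一化得到融合时序特征，再通过神经元数量为128的全连接层将其映射到第一空间。其中temporal_group以随机张量占位，取平均与归一化只是融合处理的一种假设性选择。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

temporal_group = torch.randn(10, 512)        # 占位的人脸时序特征组（N=10，S=512）
fc = nn.Linear(512, 128)                     # 在LSTM后添加的全连接层，神经元数量设为128

fused = temporal_group.mean(dim=0)           # 融合处理：对N个时序特征取平均，得到1个512维融合时序特征
fused = F.normalize(fused, dim=0)            # 可选的归一化操作
mapped = fc(fused)                           # 空间映射：512维 -> 128维，得到映射到第一空间的人脸时序特征
```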
第二种实现方式:将第一人脸时序特征进行空间映射,以得到映射后的第二人脸时序特征。
具体实现中,可以通过执行如下步骤(参见图6B)得到映射后的第二人脸时序特征:
C1、将所述第一人脸空间特征组输入到预设的循环神经网络模型中，以输出人脸时序特征组，其中，所述人脸时序特征组包括第一人脸空间特征组中每个人脸特征对应的时序特征。
C2、在所述人脸时序特征组中确定第一人脸时序特征;其中,所述第一人脸时序特征为所述人脸时序特征组中的任意一个人脸时序特征。
C3、当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述第一人脸时序特征进行空间映射,以得到第二人脸时序特征。
例如，预设的循环神经网络模型为LSTM模型，设置LSTM模型的时间步长N=10，LSTM模型中包含的隐含层神经元的数量为512，通过FaceNet模型提取得到的第一人脸空间特征组为维数（Q）等于128的高阶特征。当在LSTM模型后添加一个全连接层时，将当前网络结构中隐含层神经元的数量设置为128。在这种情况下，可以实现将512维的第一人脸时序特征（这里，第一人脸时序特征为人脸时序特征组中的任意一个人脸时序特征）映射到第一空间，并得到128维的第二人脸时序特征。此时，在人脸数据库中匹配第二人脸时序特征对应的目标人脸，其具体实现请参考后续步骤S404。
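下面给出与第二种实现方式对应的一段示意性PyTorch代码：从人脸时序特征组中任取一个第一人脸时序特征，经同一个全连接层映射得到128维的第二人脸时序特征；变量名与占位数据均为假设。

```python
import torch
import torch.nn as nn

temporal_group = torch.randn(10, 512)        # 占位的人脸时序特征组
fc = nn.Linear(512, 128)                     # 与前一示例相同配置的全连接层

first_temporal = temporal_group[0]           # 第一人脸时序特征：人脸时序特征组中的任意一个
second_temporal = fc(first_temporal)         # 空间映射后得到128维的第二人脸时序特征
second_temporal_all = fc(temporal_group)     # 也可一次性映射全部N个时序特征，形状 (N, 128)，供后续逐个比对
```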
步骤S404、在人脸数据库中匹配与所述人脸时序特征对应的所述目标人脸。
在本申请实施例中，人脸数据库中存储有多个人物的人脸图像，例如，数据库中存储有目标人脸A、目标人脸B、目标人脸C以及目标人脸D各自对应的人脸图像。
可选的,人脸数据库中存储的每个人物的人脸图像为正脸图像。
在实际应用中,可以提取数据库中每个人物的人脸图像的特征,得到注册特征向量。这里,注册特征向量为目标人脸在数据库中的人脸图像的一种具体表现形式。可以理解的是,不同人物的人脸图像,提取得到的注册特征向量不同。例如,人脸图像与注册特征向量之间的对应关系可以如表1所示:
表1
人物 注册特征向量
目标人脸A 注册特征向量A
目标人脸B 注册特征向量B
在本申请实施例中，可以通过计算人脸时序特征组中的特征向量与目标人脸在数据库中的注册特征向量之间的匹配度来实现目标人脸的识别。具体地，计算人脸时序特征组中的特征向量与注册特征向量之间的欧式距离，若二者之间的欧式距离小于设定好的阈值（例如，该阈值为0.2），则识别为同一个人；若否，则识别为不同的人。需要说明的是，在本申请实施例中，人脸时序特征组中的特征向量与注册特征向量之间的欧式距离越小，表示匹配度越高。
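下面给出一段示意性的Python代码，按上述以欧式距离衡量匹配程度的思路实现同一人判定；其中阈值0.2沿用文中示例，match_same_person为假设的函数名。

```python
import torch

def match_same_person(feat, enrolled, dist_threshold=0.2):
    """计算特征向量与注册特征向量之间的欧式距离；距离小于阈值（如0.2）即识别为同一个人。"""
    dist = torch.norm(feat - enrolled, p=2).item()
    return dist < dist_threshold, dist
```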
如前所述,在上述第一种实现方式中,可以通过计算融合时序特征与注册特征向量之间的匹配度来实现目标人脸的识别。
在上述第二种实现方式中,可以通过计算第二人脸时序特征与注册特征向量之间的匹配度来实现目标人脸的识别。在这一实现方式中,考虑到当第二人脸时序特征与数据库中存储的人脸图像之间的匹配度小于预设阈值时,此时,继续计算剩余的第二人脸时序特征与人脸图像的匹配度,直至匹配度大于预设阈值,从而完成目标人脸的识别。
例如，人脸时序特征组中包括10个人脸时序特征，分别为：人脸时序特征1、人脸时序特征2、……、人脸时序特征10。其中，设备确定经过空间映射后的人脸时序特征1与注册特征向量（例如，目标人脸C）之间的匹配度为0.6，该匹配度小于预设阈值0.8；此时，设备继续计算经过空间映射后的人脸时序特征2与注册特征向量（例如，目标人脸D）之间的匹配度为0.9，该匹配度大于预设阈值0.8，此时可以识别出目标人脸D，无需再计算剩余的其他人脸时序特征与人脸图像的匹配度。
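下面给出一段示意性的Python代码，演示上述逐个比对、直至匹配度大于预设阈值即确定目标人脸的流程；其中将欧式距离换算为匹配度的方式1/(1+d)以及face_db的组织形式均为本文之外的假设。

```python
import torch

def identify(second_temporal_all, face_db, threshold=0.8):
    """逐个用映射后的第二人脸时序特征与人脸数据库中的注册特征向量比对；
    匹配度大于预设阈值即确定目标人脸，否则继续计算剩余时序特征的匹配度。"""
    for feat in second_temporal_all:                      # (N, 128) 中的每一个人脸时序特征
        for name, enrolled in face_db.items():            # 人脸数据库中的注册特征向量
            score = 1.0 / (1.0 + torch.norm(feat - enrolled, p=2).item())   # 假设性的匹配度度量
            if score > threshold:
                return name, score                        # 识别出目标人脸
    return None, 0.0

# 用法示意
face_db = {"目标人脸C": torch.randn(128), "目标人脸D": torch.randn(128)}
result = identify(torch.randn(10, 128), face_db)
```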
实施本申请实施例,通过在第一人脸空间特征组中提取时序信息,可以得到人脸时序特征组,由于人脸时序特征组可以反映多帧人脸图像中包含的互补信息,可以提高人脸特征提取过程中的准确性,以提高人脸识别的精度。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本披露并不受所描述的动作顺序的限制,因为依据本披露,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本披露所必须的。
进一步需要说明的是,虽然图4、图6A、图6B的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图4、图6A、图6B中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
基于本申请以上描述的方法,本申请实施例还提供了一种人脸识别装置70,如图7所示,所述人脸识别装置70可以包括:
图像提取单元701,用于提取视频流中同一目标人脸的N帧人脸图像,其中,所述N帧人脸图像具有时序性;N为大于1的正整数;
第一特征提取单元702，用于对所述N帧人脸图像进行空间特征提取，得到第一人脸空间特征组，其中，所述第一人脸空间特征组包括每帧人脸图像对应的人脸特征；
第二特征提取单元703,用于在所述第一人脸空间特征组中提取时序特征,得到人脸时序特征组;
识别单元704,用于在人脸数据库中匹配与所述人脸时序特征组对应的所述目标人脸。
在其中一个可能的实现方式中,所述第二特征提取单元703包括第一时序特征提取单元、融合单元以及第一空间映射单元;其中,
所述第一时序特征提取单元，用于将所述第一人脸空间特征组输入到预设的循环神经网络模型中，以输出人脸时序特征组，其中，所述人脸时序特征组包括第一人脸空间特征组中每个人脸特征对应的时序特征；
所述融合单元,用于将所述人脸时序特征组中的时序特征经过融合处理得到融合时序特征;
所述第一空间映射单元,用于当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述融合时序特征进行空间映射,以得到映射后的人脸时序特征组。
在其中一个可能的实现方式中，所述第一人脸空间特征组的维数为M，所述第一人脸空间特征组的维数为根据FaceNet模型确定的，所述第一人脸空间特征组处于第一空间；所述人脸时序特征组的维数为S，所述人脸时序特征组的维数为根据所述预设的循环神经网络模型中的隐含层神经元数量确定的；所述人脸时序特征组处于第二空间；所述第一空间映射单元，具体用于：
在预设的循环神经网络模型中添加全连接层,以使所述融合时序特征映射到所述第一空间,并得到与所述第一人脸空间特征组维数相同的人脸时序特征。
在其中一种可能的实现方式中,所述预设的循环神经网络模型为双层的长短期记忆网络模型,且每层的网络结构均相同。
所述第二特征提取单元703还包括第二时序特征提取单元、确定单元以及第二空间映射单元；其中，
所述第二时序特征提取单元，用于将所述第一人脸空间特征组输入到预设的循环神经网络模型中，以输出人脸时序特征组，其中，所述人脸时序特征组包括第一人脸空间特征组中每个人脸特征对应的时序特征；
所述确定单元,用于在所述人脸时序特征组中确定第一人脸时序特征;其中,所述第一人脸时序特征为所述人脸时序特征组中的任意一个人脸时序特征;
所述第二空间映射单元,用于当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述第一人脸时序特征进行空间映射,以得到第二人脸时序特征;
所述识别单元704包括:匹配度确定单元、处理单元;
其中，所述匹配度确定单元，用于确定所述第二人脸时序特征与所述人脸数据库中存储的人脸图像的匹配度；所述处理单元，用于在所述匹配度小于预设阈值时，继续计算剩余的所述第二人脸时序特征与所述人脸图像的匹配度，直至所述匹配度大于预设阈值时，确定所述第二人脸时序特征对应的所述目标人脸。
在其中一种可能的实现方式中,所述图像提取单元701,具体用于:
通过训练好的多任务级联卷积神经网络模型提取所述视频流中同一个目标人脸的N帧人脸图像。
在其中一种可能的实现方式中,所述第一特征提取单元702,具体用于:
将所述N帧人脸图像输入FaceNet模型中,以提取所述N帧人脸图像中的空间特征。
应该理解,上述的装置实施例仅是示意性的,本披露的装置还可通过其它的方式实现。例如,上述实施例中所述单元/模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。例如,多个单元、模块或组件可以结合,或者可以集成到另一个系统,或一些特征可以忽略或不执行。
为了便于更好地实施本发明实施例的上述方案,本发明还对应提供了一种人脸识别设备,下面结合附图来进行详细说明:
如图8所示为本发明实施例提供的人脸识别设备的结构示意图，设备80可以包括处理器801、存储器804和通信模块805，处理器801、存储器804和通信模块805可以通过总线806相互连接。存储器804可以是高速随机存取存储器（Random Access Memory，RAM），也可以是非易失性的存储器（non-volatile memory），例如至少一个磁盘存储器。存储器804可选地还可以是至少一个位于远离前述处理器801的存储系统。存储器804用于存储应用程序代码，可以包括操作系统、网络通信模块、用户接口模块以及数据处理程序；通信模块805用于与外部设备进行信息交互；处理器801被配置用于调用该程序代码，执行以下步骤：
提取视频流中同一目标人脸的N帧人脸图像,其中,所述N帧人脸图像具有时序性;N为大于1的正整数;
对所述N帧人脸图像进行空间特征提取,得到第一人脸空间特征组,其中,所述第一人脸空间特征组包括每帧人脸图像对应的人脸特征;
在所述第一人脸空间特征组中提取时序特征,得到人脸时序特征组;
在人脸数据库中匹配与所述人脸时序特征组对应的所述目标人脸。
其中,处理器801在所述第一人脸空间特征组中提取时序特征,得到人脸时序特征组,包括:
将所述第一人脸空间特征组输入到预设的循环神经网络模型中，以输出人脸时序特征组，其中，所述人脸时序特征组包括第一人脸空间特征组中每个人脸特征对应的时序特征；
将所述人脸时序特征组中的时序特征经过融合处理得到融合时序特征;
当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述融合时序特征进行空间映射,以得到映射后的人脸时序特征组。
其中，所述第一人脸空间特征组的维数为M，所述第一人脸空间特征组的维数为根据FaceNet模型确定的，所述第一人脸空间特征组处于第一空间；所述人脸时序特征组的维数为S，所述人脸时序特征组的维数为根据所述预设的循环神经网络模型中的隐含层神经元数量确定的；所述人脸时序特征组处于第二空间；处理器801在所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时，对所述融合时序特征进行空间映射，以得到映射后的人脸时序特征组，可以包括：
在预设的循环神经网络模型中添加全连接层,以使所述融合时序特征映射到所述第一空间,并得到与所述第一人脸空间特征组维数相同的人脸时序特征。
其中,所述预设的循环神经网络模型为双层的长短期记忆网络模型,且每层的网络结构均相同。
其中,处理器801在所述第一人脸空间特征组中提取时序特征,得到人脸时序特征组,还可以包括:
将所述第一人脸空间特征组输入到预设的循环神经网络模型中，以输出人脸时序特征组，其中，所述人脸时序特征组包括第一人脸空间特征组中每个人脸特征对应的时序特征；
在所述人脸时序特征组中确定第一人脸时序特征;其中,所述第一人脸时序特征为所述人脸时序特征组中的任意一个人脸时序特征;
当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述第一人脸时序特征进行空间映射,以得到第二人脸时序特征;
处理器801在人脸数据库中匹配与所述人脸时序特征组对应的所述目标人脸,可以包括:
确定所述第二人脸时序特征与所述人脸数据库中存储的人脸图像的匹配度;
若所述匹配度小于预设阈值,则继续计算剩余的所述第二人脸时序特征与所述人脸图像的匹配度,直至所述匹配度大于预设阈值时,确定所述第二人脸时序特征对应的所述目标人脸。
其中,处理器801提取视频流中同一目标人脸的N帧人脸图像,可以包括:
通过训练好的多任务级联卷积神经网络模型提取所述视频流中同一个目标人脸的N帧人脸图像。
其中,处理器801对所述N帧人脸图像进行空间特征提取,得到第一人脸空间特征组,可以包括:
将所述N帧人脸图像输入FaceNet模型中,以提取所述N帧人脸图像中的空间特征。
在具体实现中,人脸识别设备80可以为终端或者服务器,具体地,其表现形式可以包括移动手机、平板电脑、个人数字助理(Personal Digital Assistant,PDA)、移动互联网设备(Mobile Internet Device,MID)等各种用户可以使用的设备,本发明实施例不作具体限定。
应理解,本申请实施例提供的方法可以适用的应用场景只是作为一种示例,实际应用中并不限于此。
还应理解,本申请中涉及的第一、第二、第三以及各种数字编号仅仅为描述方便进行的区分,并不用来限制本申请的范围。
应理解,本申请中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本申请中字符“/”,一般表示前后关联对象是一种“或”的关系。
此外,在本申请的各个实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及方法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，仅以上述各功能模块的划分进行举例说明，实际应用中，可以根据需要将上述功能分配由不同的功能模块完成，即将装置的内部结构划分成不同的功能模块，以完成以上描述的全部或者部分功能。
在本申请所提供的实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块和单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。
所述作为分离部件说明的单元可以是物理上分开的,也可以不是物理上分开的,作为单元显示的部件可以是物理单元,也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本申请实施例方案的目的。
此外,在本申请各个实施例中所涉及的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现,本申请对此不作限定。
在本实施例中,本申请实施例还提供一种可读存储介质,其上存储有计算机程序,所述计算机程序被执行时实现上述图4、图6A、图6B所示的人脸识别方法。上述装置的各组成模块如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在所述计算机可读取存储介质中,基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机产品存储在计算机可读存储介质中。
上述计算机可读存储介质可以是前述实施例所述的人脸识别设备的内部存储单元,例如硬盘或内存。上述计算机可读存储介质也可以是上述人脸识别设备的外部存储设备,例如配备的插接式硬盘,智能存储卡(Smart Media Card, SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,上述计算机可读存储介质还可以既包括上述人脸识别设备的内部存储单元也包括外部存储设备。上述计算机可读存储介质用于存储上述计算机程序以及上述人脸识别设备所需的其他程序和数据。上述计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。
由上可见,本申请实施例提供一种人脸识别方法,通过在第一人脸空间特征组中提取时序信息,可以得到人脸时序特征组,由于人脸时序特征组可以反映多帧人脸图像中包含的互补信息,可以提高提取人脸特征的准确性,以提高人脸识别的精度。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,可通过计算机程序来指令相关的硬件来完成,该计算机的程序可存储于计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可存储程序代码的介质。

Claims (10)

  1. 一种人脸识别方法,其特征在于,包括:
    提取视频流中同一目标人脸的N帧人脸图像,其中,所述N帧人脸图像具有时序性;N为大于1的正整数;
    对所述N帧人脸图像进行空间特征提取,得到第一人脸空间特征组,其中,所述第一人脸空间特征组包括每帧人脸图像对应的人脸特征;
    在所述第一人脸空间特征组中提取时序特征,得到人脸时序特征组;
    在人脸数据库中匹配与所述人脸时序特征组对应的所述目标人脸。
  2. 根据权利要求1所述的方法,其特征在于,所述在所述第一人脸空间特征组中提取时序特征,得到人脸时序特征组,包括:
    将所述第一人脸空间特征组输入到预设的循环神经网络模型中，以输出人脸时序特征组，其中，所述人脸时序特征组包括第一人脸空间特征组中每个人脸特征对应的时序特征；
    将所述人脸时序特征组中的时序特征经过融合处理得到融合时序特征;
    当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述融合时序特征进行空间映射,以得到映射后的人脸时序特征组。
  3. 根据权利要求2所述的方法,其特征在于,所述第一人脸空间特征组的维数为M,所述第一人脸空间特征组的维数为根据FaceNet模型确定的,所述第一人脸空间特征组处于第一空间;所述人脸时序特征组的维数为S,所述人脸时序特征组的维数为根据所述预设的循环神经网络模型中的隐含层神经元数量确定的;所述人脸时序特征组处于第二空间;所述当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述融合时序特征进行空间映射,以得到映射后的人脸时序特征组,包括:
    在预设的循环神经网络模型中添加全连接层,以使所述融合时序特征映射到所述第一空间,并得到与所述第一人脸空间特征组维数相同的人脸时序特征。
  4. 根据权利要求2所述的方法，其特征在于，所述预设的循环神经网络模型为双层的长短期记忆网络模型，且每层的网络结构均相同。
  5. 根据权利要求2所述的方法,其特征在于,所述在所述第一人脸空间特征组中提取时序特征,得到人脸时序特征组,还包括:
    将所述第一人脸空间特征组输入到预设的循环神经网络模型中，以输出人脸时序特征组，其中，所述人脸时序特征组包括第一人脸空间特征组中每个人脸特征对应的时序特征；
    在所述人脸时序特征组中确定第一人脸时序特征;其中,所述第一人脸时序特征为所述人脸时序特征组中的任意一个人脸时序特征;
    当所述第一人脸空间特征组的维数与所述人脸时序特征组的维数不相等时,对所述第一人脸时序特征进行空间映射,以得到第二人脸时序特征;
    在人脸数据库中匹配与所述人脸时序特征组对应的所述目标人脸,包括:
    确定所述第二人脸时序特征与所述人脸数据库中存储的人脸图像的匹配度;
    若所述匹配度小于预设阈值,则继续计算剩余的所述第二人脸时序特征与所述人脸图像的匹配度,直至所述匹配度大于预设阈值时,确定所述第二人脸时序特征对应的所述目标人脸。
  6. 根据权利要求1所述的方法,其特征在于,所述提取视频流中同一目标人脸的N帧人脸图像,包括:
    通过训练好的多任务级联卷积神经网络模型提取所述视频流中同一个目标人脸的N帧人脸图像。
  7. 根据权利要求3所述的方法,其特征在于,所述对所述N帧人脸图像进行空间特征提取,得到第一人脸空间特征组,包括:
    将所述N帧人脸图像输入所述FaceNet模型中,以提取所述N帧人脸图像中的空间特征。
  8. 一种人脸识别装置,其特征在于,包括:
    图像提取单元,用于提取视频流中同一目标人脸的N帧人脸图像,其中, 所述N帧人脸图像具有时序性;N为大于1的正整数;
    第一特征提取单元,用于对所述N帧人脸图像进行空间特征提取,得到第一人脸空间特征组,其中,所述第一人脸空间特征组包括每帧人脸图像对应的人脸特征;
    第二特征提取单元,用于在所述第一人脸空间特征组中提取时序特征,得到人脸时序特征组;
    识别单元,用于在人脸数据库中匹配与所述人脸时序特征组对应的所述目标人脸。
  9. 一种人脸识别设备,其特征在于,包括处理器和存储器,所述处理器和存储器相互连接,其中,所述存储器用于存储计算机程序,所述计算机程序包括程序指令,所述处理器被配置用于调用所述程序指令,执行如权利要求1-7任一项所述的方法。
  10. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时使所述处理器执行如权利要求1-7任一项所述的方法。