WO2023041963A1 - Face identification methods and apparatuses - Google Patents

Face identification methods and apparatuses

Info

Publication number
WO2023041963A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
image
target
facial
face
Prior art date
Application number
PCT/IB2021/058720
Other languages
French (fr)
Inventor
Jiabin MA
Chunya LIU
Jinghuan Chen
Jinyi Wu
Original Assignee
Sensetime International Pte. Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sensetime International Pte. Ltd. filed Critical Sensetime International Pte. Ltd.
Priority to AU2021240278A priority Critical patent/AU2021240278A1/en
Priority to CN202180002767.6A priority patent/CN113785304A/en
Publication of WO2023041963A1 publication Critical patent/WO2023041963A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/993 Evaluation of the quality of the acquired pattern
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning

Definitions

  • the embodiments of the present disclosure relate to the field of image processing technology, and in particular, to a face identification method and apparatus.
  • Face identification is the most basic and important part of intelligent video analysis.
  • the target object in the video needs to be tracked for a long time, and when an identification request issued by the upper-layer application is received, face identification and identity determination are performed on the tracked target object.
  • the embodiments of the present disclosure provide at least a method and apparatus for face identification.
  • a face identification method including:
  • a face identification apparatus including:
  • a target facial image sequence determination module configured to determine a target facial image sequence for a target face, where the target facial image sequence includes multiple facial region images of the target face;
  • a target facial region image determination module configured to perform face identification on at least one facial region image in the target facial image sequence, and determine, based on a confidence level of a face identification result of the at least one facial region image, a target facial region image from the target facial image sequence;
  • an identity information determination module configured to determine, based on a face identification result of the target facial region image, identity information of the target face.
  • an electronic device including a memory and a processor, where the memory is configured to store computer instructions capable of being run on the processor, and the processor implements the face identification method according to the first aspect when executing the computer instructions.
  • a computer-readable storage medium storing a computer program, where steps of the face identification method according to the first aspect are implemented when the program is executed by a processor.
  • a target facial region image is determined. Further, identity information of the target face is determined.
  • the target facial image sequence works as a quality selection model. In a case that one of the facial region images fails to be identified, face identification can continue to be performed on other facial region images in the target facial image sequence, which significantly improves the identification success rate for facial region images, that is, the recall rate.
  • the method described in the present disclosure does not need high-cost data collection to obtain a large amount of data for training a face quality model that covers all scenarios of low-quality facial images. In this way, the cost of face identification is reduced.
  • FIG. 1 is a flowchart of a face identification method illustrated in an embodiment of the present disclosure.
  • FIG. 1A is a flowchart of a method for determining a target facial image sequence of a target face illustrated in an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of another face identification method illustrated in an embodiment of the present disclosure.
  • FIG. 3 is a block diagram of a face identification apparatus illustrated in an embodiment of the present disclosure.
  • FIG. 4 is a block diagram of another face identification apparatus illustrated in an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a hardware structure of an electronic device illustrated in embodiments of the present disclosure.
  • although the terms first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited to these terms. These terms are used only to distinguish information of the same type from each other.
  • first information may also be referred to as the second information without departing from the scope of the present disclosure, and similarly, the second information may also be referred to as the first information.
  • the word "if" as used herein may be interpreted as "when", "as", or "in response to determining".
  • the key step is how to select a high-quality facial image for face identification from a large number of facial image sequences in the video. If the quality of the to-be-identified facial image is too poor, the to-be-identified facial image cannot be matched with any identity information in the face database, resulting in an erroneous result that the person cannot be found; that is, the facial image cannot be recalled.
  • the traditional method may train a face quality model by collecting large amounts of data to complete the selection of a high-quality facial image from the facial image sequence, and then use the selected high-quality facial image to perform the face identification.
  • the environment around the target object is complex, and the target object can have many movements, such as turning the head, bowing the head, covering the face by a hand, looking in the mirror or wearing a mask, etc. It is difficult to cover all scenarios by data collection to train the face quality model.
  • the embodiments of the present disclosure provide a face identification method.
  • the method can reduce the impact of low-quality facial images without collecting a large amount of data, significantly improve the accuracy of face identification, and ensure the recall rate of facial images.
  • FIG. 1 is a flowchart of a face identification method shown in an embodiment of the present disclosure, and the method includes the following steps:
  • a target facial image sequence for a target face is determined, where the target facial image sequence includes multiple facial region images for the target face.
  • a facial region image can be an image involving the target face of the target object detected from the tracking video when the target object is being tracked, or it can be an image involving the target face of the target object obtained by shooting the target object.
  • the target face is a face of a designated or undesignated person to be identified.
  • the face identification method of this embodiment can be executed by a face identification apparatus, for example, by a terminal device, a server or other processing equipment, where the terminal device can be a user device, a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant, a handheld device, a computing device, an on-board device, a wearable device, etc.
  • the face identification method can be implemented by a processor invoking computer-readable instructions stored in a memory.
  • a specific method of determining the target facial image sequence is not limited.
  • the target facial image sequence can be selected from multiple facial image sequences for multiple different faces maintained by the device in advance, or also can be directly acquired from other devices.
  • determining the facial image sequence for the target face can be executed at any time.
  • the method in this embodiment may be executed when receiving a face identification request that includes information about the target face; or, the method can also be executed when receiving a tracking video or continuously shot images.
  • at step 104, face identification is performed on at least one facial region image in the target facial image sequence, and based on a confidence level of a face identification result of the at least one facial region image, a target facial region image is determined from the target facial image sequence.
  • one facial region image can be taken to compare with images of multiple faces respectively in a face database to get comparison results.
  • Each of the comparison results includes a confidence level.
  • the confidence level represents the probability that the face in the facial region image and a face in a compared image of the face database are of an identical object.
  • a comparison result with the highest confidence level is determined as the face identification result of the facial region image.
  • if the confidence level of the face identification result of the facial region image reaches a preset confidence level threshold, that is, a face matched with the facial region image is found in the face database and face identification succeeds, the facial region image is determined as the target facial region image. If the confidence level of the face identification result of the facial region image does not reach the preset confidence level threshold, that is, no face matched with the facial region image is found in the face database and face identification fails, another facial region image that has not been selected before is selected from the target facial image sequence for face identification, until face identification succeeds.
  • if face identification fails for all facial region images, a facial region image whose face identification result is closest to a successful identification result, that is, the facial region image with the highest confidence level among all face identification results, can be selected as the target facial region image; or face identification can be ended.
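The selection-with-fallback logic above can be sketched as follows. This is an illustrative outline, not the patent's implementation: `identify` is a hypothetical function assumed to return a (matched identity, confidence) pair for one facial region image.

```python
# Illustrative sketch: iterate over the facial region images, stop at the
# first identification whose confidence reaches the threshold, and otherwise
# fall back to the result closest to success (the highest confidence seen).

def select_target_image(sequence, identify, threshold=0.8):
    """`identify(image)` is assumed to return (matched_id, confidence)."""
    best_image, best_id, best_conf = None, None, -1.0
    for image in sequence:                 # e.g. ranked high -> low quality
        matched_id, conf = identify(image)
        if conf >= threshold:              # identification succeeds: stop here
            return image, matched_id, conf
        if conf > best_conf:               # remember closest-to-success result
            best_image, best_id, best_conf = image, matched_id, conf
    return best_image, best_id, best_conf  # fallback: maximum confidence
```

The early return is what lets a low-quality image fail harmlessly: the next image in the sequence simply gets its turn.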
  • the order of face identification over the facial region images of the facial image sequence can be arbitrary, such as randomly selecting images from the target facial image sequence for identification; or it can follow a preset order, such as sequentially selecting facial region images from the facial image sequence in descending order of image quality.
  • a specific face identification method is not limited.
  • the identification may be performed through a neural network, or be performed through other methods.
  • at step 106, based on a face identification result of the target facial region image, identity information of the target face is determined.
  • the face identification result of the target facial region image includes a facial image corresponding to the target facial region image in the face database, and the identity information associated with the facial image is determined as the identity information of the target face.
  • the identity information can include the ID number, name, registered account number, etc. pre-stored in the face database.
  • a target facial region image is determined. Further, identity information of the target face is determined. It is possible that a low-quality image is selected for identification from the multiple facial region images, but when one of the multiple facial region images cannot be identified, a next facial region image in the target facial image sequence can be used to continue the identification. It is equivalent to the target facial image sequence working as a quality selection model, which significantly improves the identification success rate for facial region images, that is, the recall rate.
  • the method does not need high-cost data collection to obtain a large amount of data for training a face quality model that covers all scenarios of low-quality facial images. In this way, the cost of face identification is reduced.
  • the method can be used to identify a target face in a video stream.
  • determining the target facial image sequence for the target face may include that multiple facial region images of the target face are extracted from multi-frame images of the video stream.
  • the video stream can be a recorded video or a real-time video.
  • the multi-frame images of the video stream involve the target face of a target object.
  • the video stream may be a tracking video obtained by tracking the target object’s face.
  • the video stream can be acquired in advance. Based on each frame of image in the acquired video stream, faces in the video stream are detected. For each of multiple faces detected in the video stream, the detected face is tracked in the video stream to determine facial region images for the face in multi-frame images of the video stream; based on the facial region images for the face in the multi-frame images of the video stream, a facial image sequence for the face is generated. Therefore, multiple facial image sequences for different faces presented in the video stream can be acquired in advance, which is beneficial for quickly determining the target facial image sequence for the target face from the facial image sequences for different faces in subsequent processes.
  • a face identifier of the face is generated; according to the face identifier, the facial image sequence composed of facial region images for the face can be quickly determined.
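A minimal sketch of this bookkeeping, assuming the tracker supplies a stable face identifier per detection (all names here are illustrative, not from the patent):

```python
from collections import defaultdict

# One facial image sequence per tracked face, keyed by the face identifier.
sequences = defaultdict(list)

def on_detection(face_id, frame_time, face_crop, quality_score):
    """Append a detected facial region image to the sequence for its face."""
    sequences[face_id].append((frame_time, face_crop, quality_score))

def sequence_for(face_id):
    """Quickly look up the facial image sequence for an identification request."""
    return sequences.get(face_id, [])
```

Keying by the face identifier is what makes the later lookup from a face identification request a constant-time operation.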
  • the target facial image sequence can be determined when receiving the face identification request.
  • the face identification request can be issued by upper-layer application.
  • a face identifier acquired from the face identification request is taken as the face identifier of the target face.
  • the target facial image sequence is determined from the multiple facial image sequences.
  • face tracking can be performed by Kalman filtering, or by a neural network.
  • step 102 of determining the target facial image sequence for the target face, as shown in FIG. 1A, may specifically include the following steps:
  • at step 1021, facial region images for the target face in multi-frame images of the video stream are determined as multiple candidate facial region images for the target face.
  • the candidate facial region images in the tracking video can be obtained by face detection.
  • One or more candidate facial region images can be detected in each frame of image of the tracking video, or for some frames of images, it is possible that no candidate facial region image is detected.
  • through face tracking, multiple candidate facial region images corresponding to the same face can be obtained, which means that multiple candidate facial region images for the target face are determined.
  • the multiple candidate facial region images can include a physical facial region image and one or more mirrored facial region images.
  • at step 1022, image qualities of the multiple candidate facial region images are determined.
  • quality of the candidate facial region image is evaluated by a pre-trained facial image quality evaluation model, to determine an evaluated result.
  • the candidate facial region image can be input into a pre-trained facial image quality evaluation model to obtain the evaluated result of image quality.
  • the image quality can be evaluated by integrating any of factors such as image intelligibility, brightness, clarity, facial symmetry and noise, and the evaluated result of image quality can be expressed by grade, score, etc. or other methods.
  • the facial image quality evaluation model used in this embodiment may be a model with a predetermined accuracy obtained by conventional training.
  • a binary classifier based on deep learning may be used to train the model on general face quality data. It is not necessary to spend a high cost to collect a large amount of data to train a high-precision face quality model.
  • at step 1023, the multiple candidate facial region images are ranked according to the image qualities to obtain a first sequence.
  • the multiple candidate facial region images can be ranked from high-quality to low-quality or from low-quality to high-quality to obtain a first sequence.
  • for example, the candidate facial region images are noted as sequence A, and sequence A is ranked in descending order of the image quality scores to obtain sequence B.
  • sequence B is used to represent the first sequence.
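As a brief illustration (the pairing of images with scores below is assumed sample data, not from the patent), ranking sequence A into sequence B is a descending sort on the quality score:

```python
# Sequence A: (image, quality_score) pairs in detection order (sample data).
seq_a = [("img1", 0.62), ("img2", 0.91), ("img3", 0.48)]

# Sequence B: the same images ranked in descending order of quality score.
seq_b = sorted(seq_a, key=lambda item: item[1], reverse=True)
# seq_b is [("img2", 0.91), ("img1", 0.62), ("img3", 0.48)]
```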
  • at step 1024, based on the first sequence, the target facial image sequence is determined.
  • the first sequence can be directly determined as the target facial image sequence; or a first sub-sequence of the first sequence may be determined as the target facial image sequence, where the first sub-sequence includes a preset number of candidate facial region images satisfying preset image quality requirements.
  • in this way, the image quality of the target facial image sequence can be further improved and the image quantity can be effectively controlled, which improves the accuracy of the identification result and the efficiency of identification for the target face.
  • a sparse process is performed on the first sequence to obtain a second sequence.
  • based on the second sequence, the target facial image sequence is determined.
  • an interval distance between any two adjacent candidate facial region images in the second sequence in the corresponding timing sequence of the video stream is greater than the preset interval threshold.
  • the following sparse process method can be used to remove some of the candidate facial region images in sequence B to obtain sequence C, so that the interval distance between any two adjacent candidate facial region images in sequence C is greater than the preset interval threshold:
  • the sequence B includes the following six candidate facial region images: B1, B2, B3, B4, B5 and B6.
  • the preset interval threshold is 0.05ms.
  • i is equal to 1
  • the interval between B1 and B2 is 0.02ms
  • the interval between B1 and B3 is 0.06ms
  • the interval between B1 and B4 is 0.15ms
  • the interval between B1 and B5 is 0.04ms
  • the interval between B1 and B6 is 0.10ms.
  • B2 and B5 can be deleted from sequence B.
  • the remaining elements in sequence B are B1, B3, B4, B6.
  • then, intervals between B3 and B4 and between B3 and B6 are calculated sequentially.
  • the interval between B3 and B4 is 0.09ms
  • the interval between B3 and B6 is 0.04ms.
  • B6 can be deleted from sequence B.
  • the remaining elements in sequence B are B1, B3, B4.
  • traversing for i is finished, and the elements in sequence C are B1, B3, B4.
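One way to realize this sparse process is a greedy pass over the quality-ranked sequence, keeping an element only when its timestamp is farther than the threshold from every element already kept. The sketch below reproduces the worked example; the absolute timestamps are assumptions chosen only to match the stated intervals.

```python
def time_sparse(seq, min_gap):
    """seq: (label, timestamp) pairs ranked by descending image quality.
    Keep an element only if its time distance to every already-kept element
    exceeds min_gap, so any two kept elements are more than min_gap apart."""
    kept = []
    for label, t in seq:
        if all(abs(t - kt) > min_gap for _, kt in kept):
            kept.append((label, t))
    return kept

# Timestamps (in ms) consistent with the intervals in the example above.
times = {"B1": 0.00, "B2": 0.02, "B3": 0.06, "B4": 0.15, "B5": 0.04, "B6": 0.10}
seq_b = [(name, times[name]) for name in ["B1", "B2", "B3", "B4", "B5", "B6"]]
seq_c = time_sparse(seq_b, min_gap=0.05)
# seq_c keeps B1, B3 and B4, as in the example.
```

Because the sequence is traversed in quality order, the highest-quality image always survives and each dropped image loses only to a better one nearby in time.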
  • the timing sequence can be represented by time labels.
  • the candidate facial region image is a part of the image frames of the video stream, and the time label of the candidate facial region image may be a time label of the corresponding image frame in the video stream.
  • a specific method of the time sequential sparse process is not limited. Any method, which can realize that an interval distance between any two adjacent candidate facial region images in the second sequence is greater than the preset interval threshold, can be adopted. For example, multiple time labels can be calculated according to the preset interval threshold, and the candidate facial region images in the first sequence within the difference range of the multiple time labels can be combined into a second sequence; for another example, elements in the first sequence can be traversed in order, and the adjacent candidate facial region images whose interval distance is less than the preset interval threshold are removed to obtain the second sequence.
  • the second sequence can be directly determined as the target facial image sequence, or the second sub-sequence can be extracted from the second sequence and then the second sub-sequence is determined as the target facial image sequence.
  • the second sub-sequence includes a preset number of candidate facial region images satisfying preset image quality requirements.
  • the image quality requirements can be requirements on the ranking of image quality. For example, when the second sequence includes the candidate facial region images ranked in descending order of image quality, the sub-sequence composed of a preset number of candidate facial region images from the head of the second sequence may be determined as the target facial image sequence; or, when the second sequence includes the candidate facial region images ranked in ascending order of image quality, the sub-sequence composed of a preset number of candidate facial region images from the tail of the second sequence may be determined as the target facial image sequence.
  • the preset number can be represented by K, which can be set by those skilled in the art according to actual needs. K candidate facial region images are taken from the head of sequence C as sequence D, which is the final target facial image sequence. In another example, when the ranking in the first sequence is ascending, K candidate facial region images can be selected from the tail of sequence C as sequence D.
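Extracting sequence D from sequence C then reduces to a slice from the head or the tail, depending on the ranking direction; a small sketch with assumed sample data:

```python
def take_top_k(seq_c, k, ascending=False):
    """Take the K highest-quality elements: from the head when seq_c is in
    descending quality order, from the tail when it is in ascending order."""
    return seq_c[-k:] if ascending else seq_c[:k]
```

For instance, with K = 3 and a descending-quality sequence, the first three elements form sequence D.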
  • the image quality requirement may also be a requirement on the level or score of the image quality; that is, the image quality is required to meet a preset level or a preset score.
  • This embodiment does not limit the image quality requirement.
  • a preset number of candidate facial region images are further filtered from the second sequence as the target facial image sequence, which can further simplify the target facial region images for the target face, so that the final target facial image sequence is discrete in timing and of high quality, which improves the efficiency of face identification.
  • face identification can be performed on the facial region images in order of image quality from high to low.
  • determining the target facial image sequence may be acquiring, from other devices, the target facial image sequence maintained by those devices.
  • Other devices may be terminal devices, servers, or other processing devices, and may execute steps 1021 to 1024 shown in FIG. 1A to obtain the target facial image sequence.
  • FIG. 2 provides a face identification method according to another embodiment of the present disclosure.
  • the method may include the following processing, where the steps that are the same as the procedures of the foregoing embodiment will not be described in detail.
  • at step 202, the target facial image sequence of the target face is determined, and image qualities of the multiple facial region images are determined.
  • the target facial image sequence includes multiple facial region images for the target face.
  • a pre-trained facial image quality evaluation model with a predetermined accuracy can be used to evaluate image qualities for facial region images to get evaluated results.
  • image qualities of the facial region images in the target facial image sequence can be evaluated to obtain the evaluated results.
  • image qualities of facial region images can be determined. For example, multiple candidate facial region images can be obtained from the multi-frame images of the video stream first, then image qualities of the candidate facial region images are evaluated to obtain evaluated results. Then, facial region images of the target facial image sequence are determined from the candidate facial region images according to the evaluated results.
  • at step 204, in the case that the target facial region image has not been determined, face identification is performed on a first facial region image that has not been identified and has the highest image quality in the target facial image sequence.
  • facial region images in the target facial image sequence can be identified successively according to the image quality order to obtain face identification results.
  • the facial region images in the target facial image sequence D can be taken as the first facial region image and sequentially input into the face identification model one by one, so as to extract face features from the facial region images and compare them with facial images in the face database based on the face features. Confidence levels can be used to represent the comparison results. For each facial region image in facial image sequence D, multiple confidence levels can be obtained when the image is compared against the face database, and the comparison result with the highest confidence level is determined as the face identification result of the facial region image.
  • at step 206, in response to determining that the confidence level of the face identification result of the first facial region image is greater than a preset threshold, the first facial region image is determined as the target facial region image.
  • the face identification succeeds.
  • the first facial region image is determined as the target facial region image, and face identification is no longer carried out for the other facial region images in the target facial image sequence.
  • at step 208, in response to determining that the confidence levels of the face identification results of all facial region images in the target facial image sequence are less than the preset threshold, a second facial region image, whose face identification result has the maximum confidence level, is determined as the target facial region image.
  • at step 210, based on a face identification result of the target facial region image, identity information of the target face is determined.
  • the identity information associated with the face identification result of the target facial region image in the face database is taken as the identity information of the target face. Therefore, the identity information of the target object is determined by face identification.
  • a facial region image with the highest image quality is identified first.
  • if the identification fails, a facial region image with the highest quality among the remaining images in the target facial image sequence continues to be used for identification.
  • the target facial image sequence works as a quality selection model, which significantly improves the identification success rate for facial region images and the recall rate for high-quality facial region images.
  • the method does not need high-cost data collection to obtain a large amount of data for training a face quality model that covers all scenarios of low-quality facial images. In this way, the cost of face identification is reduced.
  • the face identification method provided by this embodiment of the present disclosure can be applied to a game place environment.
  • seated players need to be tracked for a long time.
  • the face identification is performed on the tracked player to verify identity.
  • the game place environment is complex; for example, lighting changes greatly and players assume many postures. It is difficult to cover all problem scenarios by data collection to train a face quality model that selects a high-quality facial image for face identification.
  • a face detection model, a facial image quality evaluation model and a face identification model need to be trained in advance.
  • Face detection model: common face detection models such as RetinaNet, YOLOv3 or PCN can be used to complete training with general face data, or general face data and game-place-specific face data can be used to improve the accuracy of the model.
  • for the facial image quality evaluation model, a binary classifier based on deep learning may be used to complete training with general face quality data, or general face quality data and game-place-specific face data can be used to improve the accuracy of the model.
  • binary cross entropy loss can be used as a loss function during training.
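For reference, binary cross-entropy for a single sample can be written out directly; the sketch below is a plain-Python illustration of the loss, not the patent's training code.

```python
import math

def bce_loss(p, y):
    """Binary cross-entropy for one sample: p is the predicted probability
    that the facial image is high quality, y is the 0/1 ground-truth label."""
    eps = 1e-12                        # clamp to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```

The loss approaches zero as the predicted probability matches the label and grows without bound as the prediction confidently disagrees with it.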
  • for the face identification model, ResNet50 or SqueezeNet can be used to complete training with general face identification data.
  • Common face identification loss functions such as ArcFace can be used for training.
  • the tracking video captured in real time by the camera in the game place can be processed as follows to maintain facial image sequences with time dispersion and high quality.
  • a tracking video pre-captured by the camera in the game place can also be processed.
  • Face detecting: each frame image in the tracking video is input into the face detection model sequentially, and a face detection box in each frame image is obtained.
  • An image in the face detection box is a candidate facial region image.
  • Image quality evaluating: candidate facial region images are input into the facial image quality evaluation model frame by frame in real time, to obtain image quality scores.
  • Face tracking: through a fast face tracking method, such as a Kalman filter, the quality scores of all candidate facial region images corresponding to the same player's face can be obtained. These scores can be represented by a quality score sequence.
  • High-quality facial image sequence selecting: the sequence can be ranked in descending order of quality scores, and then a sparse process is performed on the timing sequence so that the interval distance between any two adjacent elements in the sequence is greater than the preset interval threshold. Then, K candidate facial region images are taken from the head of the sequence as the final facial image sequence; the final facial image sequence includes K facial region images ranked from high quality to low quality.
  • N facial image sequences also need to be maintained.
  • As the tracking video continues over time, the maintained facial image sequences are updated accordingly.
  • When a face identification request is received, face identification is performed on the player specified in the request.
  • the facial region images in the facial image sequence for the player are input into the face identification model one by one for feature extraction and identity searching. If the search confidence level of the current facial region image is greater than the preset threshold, that is, the search succeeds, the identity information in the face database associated with the search result having that confidence level is output. If the search confidence level of the current facial region image is less than the preset threshold, this confidence level is recorded and the next facial region image is input into the face identification model. If the current facial region image is already the last one in the facial image sequence, the identity information in the face database associated with the search result having the highest confidence level among the recorded confidence levels is output.
  • FIG. 3 is the block diagram of a face identification apparatus shown in this embodiment of the present disclosure.
  • the apparatus includes target facial image sequence determination module 31, target facial region image determination module 32, and identity information determination module 33.
  • the target facial image sequence determination module 31 is configured to determine a target facial image sequence for a target face, where the target facial image sequence comprises multiple facial region images for the target face.
  • the target facial region image determination module 32 is configured to perform face identification on at least one facial region image in the target facial image sequence, and determine, based on a confidence level of a face identification result of the at least one facial region image, a target facial region image from the target facial image sequence.
  • the identity information determination module 33 is configured to determine, based on a face identification result of the target facial region image, identity information of the target face.
  • the target facial image sequence includes the multiple facial region images for the target face extracted from multi-frame images in a video stream.
  • the apparatus also includes: facial image sequence generation module 30.
  • Facial image sequence generation module 30 is configured to: perform face detection based on each frame of image in an acquired video stream; for each of multiple faces detected in the video stream: track, in the video stream, the detected face to determine facial region images for the face in multi-frame images of the video stream; generate, based on the facial region images for the face in multi-frame images of the video stream, a facial image sequence for the face.
  • the target facial image sequence determination module 31 is configured to: determine that facial region images for the target face in multi-frame images of the video stream are multiple candidate facial region images for the target face; determine image qualities of the multiple candidate facial region images; rank, according to the image qualities, the multiple candidate facial region images to obtain a first sequence; determine the target facial image sequence based on the first sequence.
  • the target facial image sequence determination module 31 is configured to determine that a first sub-sequence of the first sequence is the target facial image sequence, the first sub-sequence comprising a preset number of candidate facial region images satisfying preset image quality requirements.
  • the target facial image sequence determination module 31 is configured to: perform, based on the corresponding timing sequence of the multiple candidate facial region images in the first sequence, a sparse process on the first sequence to obtain a second sequence, where an interval distance between any two adjacent candidate facial region images in the second sequence, in the corresponding timing sequence of the video stream, is greater than the preset interval threshold.
  • the target facial image sequence determination module 31 is further configured to determine that a second sub-sequence of the second sequence is the target facial image sequence, the second sub-sequence comprising a preset number of candidate facial region images satisfying preset image quality requirements.
  • the target facial image sequence determination module 31 is configured to: for each of the multiple candidate facial region images, perform quality evaluation for the candidate facial region image by a pre-trained facial image quality evaluation model, to determine an evaluated result of image quality of the candidate facial region image.
  • the facial image sequence generation module 30 is further configured to generate a facial identifier of the face.
  • the target facial image sequence determination module 31 is configured to: take an acquired facial identifier in a face identification request as a facial identifier of the target face; determine, based on the facial identifier of the target face, the target facial image sequence for the target face from facial image sequences for the multiple faces.
  • the target facial image sequence determination module 31 is further configured to determine image qualities of the multiple candidate facial region images.
  • the target facial region image determination module 32 is specifically configured to: in response to determining that a confidence level of the face identification result of a first facial region image is greater than a preset threshold, determine that the first facial region image is the target facial region image.
  • the target facial region image determination module is further configured to: in response to determining that the confidence levels of the face identification results of all facial region images in the target facial image sequence are less than a preset threshold, determine a second facial region image whose face identification result has the maximum confidence level, and take the second facial region image as the target facial region image.
  • an electronic device is also provided, as shown in FIG. 5.
  • the electronic device includes a memory 51 and a processor 52, where the memory is configured to store computer instructions capable of being run on the processor, and the processor 52 implements the face identification method of any one of embodiments of the present disclosure when executing the computer instructions.
  • a computer program product includes computer program code/instructions which, when executed by a processor, perform the method of any one of the embodiments of the present disclosure.
  • a computer-readable storage medium storing a computer program, where steps of the face identification method of any one of embodiments of the present disclosure are implemented when the program is executed by a processor.
  • since the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details.
  • the apparatus embodiments described above are merely illustrative, and the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules, that is, may be located in one place, or may be distributed to a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present description. A person of ordinary skill in the art would understand and implement without creative efforts.

Abstract

The embodiments of the present disclosure provide a face identification method and apparatus, where the method includes: determining a target facial image sequence for a target face, wherein the target facial image sequence comprises multiple facial region images for the target face; performing face identification on at least one facial region image in the target facial image sequence, and determining, based on a confidence level of a face identification result of the at least one facial region image, a target facial region image from the target facial image sequence; determining, based on a face identification result of the target facial region image, identity information of the target face. This method improves the identification success rate of the facial region image and reduces the cost of face identification.

Description

FACE IDENTIFICATION METHODS AND APPARATUSES
CROSS REFERENCE TO RELATED APPLICATION
This application claims priority to Singaporean Patent Application No. 10202110328W entitled “FACE IDENTIFICATION METHODS AND APPARATUSES” and filed on September 19, 2021, which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0001] The embodiments of the present disclosure relate to the field of image processing technology, and in particular, to a face identification method and apparatus.
BACKGROUND
[0002] Face identification is the most basic and important part of intelligent video analysis. The target object in the video needs to be tracked for a long time, and when an identification request issued by an upper-layer application is received, face identification and identity determination are performed on the tracked target object.
SUMMARY
[0003] In view of this, the embodiments of the present disclosure provide at least a method and apparatus for face identification.
[0004] In a first aspect, a face identification method is provided, and the method including:
[0005] determining a target facial image sequence for a target face, where the target facial image sequence includes multiple facial region images of the target face;
[0006] performing face identification on at least one facial region image in the target facial image sequence, and determining, based on a confidence level of a face identification result of the at least one facial region image, a target facial region image from the target facial image sequence;
[0007] determining, based on a face identification result of the target facial region image, identity information of the target face.
[0008] In a second aspect, a face identification apparatus is provided, the apparatus including:
[0009] a target facial image sequence determination module, configured to determine a target facial image sequence for a target face, where the target facial image sequence includes multiple facial region images of the target face;
[0010] a target facial region image determination module, configured to perform face identification on at least one facial region image in the target facial image sequence, and determine, based on a confidence level of a face identification result of the at least one facial region image, a target facial region image from the target facial image sequence;
[0011] an identity information determination module, configured to determine, based on a face identification result of the target facial region image, identity information of the target face.
[0012] In a third aspect, an electronic device is provided, including a memory and a processor, where the memory is configured to store computer instructions capable of being run on the processor, and the processor implements the face identification method according to the first aspect when executing the computer instructions.
[0013] In a fourth aspect, a computer-readable storage medium is provided, storing a computer program, where steps of the face identification method according to the first aspect are implemented when the program is executed by a processor.
[0014] In the face identification method provided by the technical solutions of the embodiments of the present disclosure, a target facial region image is determined by performing face identification on a facial region image in a target facial image sequence for a target face, and identity information of the target face is further determined. The target facial image sequence effectively works as a quality selection mechanism: in a case that one of the facial region images fails to be identified, face identification can continue to be performed on other facial region images in the target facial image sequence, which significantly improves the identification success rate for facial region images, that is, the recall rate. Compared with a method that only performs face identification on a high-quality facial region image selected by a face quality model, the method described in the present disclosure does not need to conduct high-cost data collection to obtain a large amount of data for training a face quality model that covers all scenarios of low-quality facial images. In this way, the cost of face identification is reduced.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] In order to more clearly describe the technical solutions in one or more embodiments of the present disclosure or related technologies, a brief description of the appended drawings involved in the embodiments or related technical descriptions is provided below. Obviously, the drawings in the following description are only concerned with one or more embodiments recorded in this present disclosure. For those of ordinary skill in the art, other drawings can be acquired according to these drawings without any creative labor.
[0016] FIG. 1 is a flowchart of a face identification method illustrated in an embodiment of the present disclosure.
[0017] FIG. 1A is a flowchart of a method for determining a target facial image sequence of a target face illustrated in an embodiment of the present disclosure;
[0018] FIG. 2 is a flowchart of another face identification method illustrated in an embodiment of the present disclosure;
[0019] FIG. 3 is a block diagram of a face identification apparatus illustrated in an embodiment of the present disclosure;
[0020] FIG. 4 is a block diagram of another face identification apparatus illustrated in an embodiment of the present disclosure;
[0021] FIG. 5 is a schematic diagram of a hardware structure of an electronic device illustrated in embodiments of the present disclosure.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0022] Examples will be described in detail herein, with the illustrations thereof represented in the drawings. When the following descriptions involve the drawings, like numerals in different drawings refer to like or similar elements unless otherwise indicated. Implementations described in the following explanatory embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
[0023] Terms used in the present disclosure are for the purpose of describing particular embodiments only, and are not intended to limit the present disclosure. Terms “a”, “the” and “said” in their singular forms in the description and the appended claims are also intended to include plurality, unless clearly indicated otherwise in the context. It should also be understood that the term “and/or” as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
[0024] It is to be understood that although different information may be described using terms such as first, second, third, etc. in the present disclosure, the information should not be limited to these terms. These terms are used only to distinguish the same type of information from each other. For example, the first information may also be referred to as the second information without departing from the scope of the present disclosure, and similarly, the second information may also be referred to as the first information. Depending on the context, the word “if” as used herein may be interpreted as “when”, “as”, or “in response to determining”.
[0025] For face identification in video tracking, the key step is how to select a high-quality facial image for face identification from a large number of facial image sequences in the video. If the quality of the to-be-identified facial image is too poor, the to-be-identified facial image cannot be matched with any identity information in the face database, so that an error result that the person cannot be found is obtained, that is, the facial image cannot be recalled.
[0026] The traditional method may train a face quality model by collecting large amounts of data to complete the selection of a high-quality facial image from the facial image sequence, and then use the selected high-quality facial image to perform the face identification. However, in practical applications, the environment around the target object is complex, and the target object can make many movements, such as turning the head, bowing the head, covering the face with a hand, looking in a mirror or wearing a mask, etc. It is difficult to cover all scenarios by data collection to train the face quality model.
[0027] During the whole video tracking process, there are sufficient high-quality facial images, in addition to low-quality facial images in some time periods caused by the object's movements, such as turning the head or covering the head with a hand, or caused by a mirrored face in a reflective surface, such as glass, a mirror, a ceramic tile, etc. The manner of selecting a high-quality facial image by the face quality model, specifically the manner of selecting a single facial image, may not guarantee that a high-quality facial image for face identification can be selected, that is, a facial image satisfying requirements such as the face in the image being clear, uncovered and not rotated. If there is no identification result for the selected image, a face identification failure will be declared directly, which cannot be remedied. Therefore, in order to improve the success rate of face identification, a large amount of data needs to be collected to train the face quality model. However, collecting a large amount of data is costly.
[0028] Therefore, the embodiments of the present disclosure provide a face identification method. The method can reduce the impact of low-quality facial image without collecting a large amount of data, significantly improve the accuracy of face identification, and ensure the recall rate of facial image.
[0029] As shown in FIG. 1, FIG. 1 is a flowchart of face identification method shown in an embodiment of the present disclosure, and the method includes the following steps:
[0030] In step 102, a target facial image sequence for a target face is determined, where the target facial image sequence includes multiple facial region images for the target face.
[0031] In this step, a facial region image can be an image involving the target face of the target object detected from the tracking video when the target object is being tracked, or it can be an image involving the target face of the target object obtained by shooting the target object. Here, the target face is a face of a designated or undesignated person to be identified.
[0032] The face identification method of this embodiment can be executed by a face identification apparatus, for example, can be executed by a terminal device, a servicer or other processing equipment, where the terminal device can be a user device, a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, personal digital processing, a handheld device, a computing device, an on-board device, a wearable device, etc. In some possible implementations, the face identification method can be implemented by a processor invoking computer-readable instructions stored in a memory.
[0033] In this embodiment, a specific method of determining the target facial image sequence is not limited. For example, the target facial image sequence can be selected from multiple facial image sequences for multiple different faces maintained by the device in advance, or also can be directly acquired from other devices.
[0034] In addition, in this step, determining the facial image sequence for the target face can be executed at any time. For example, the method in this embodiment may be executed when receiving a face identification request that includes information about the target face; or, the method can also be executed when receiving a tracking video or continuously shot images.
[0035] In step 104, face identification is performed on at least one facial region image in the target facial image sequence, and based on a confidence level of a face identification result of the at least one facial region image, a target facial region image is determined from the target facial image sequence.
[0036] For face identification of facial region image, one facial region image can be taken to compare with images of multiple faces respectively in a face database to get comparison results. Each of the comparison results includes a confidence level. The confidence level represents the probability that the face in the facial region image and a face in a compared image of the face database are of an identical object. A comparison result with the highest confidence level is determined as the face identification result of the facial region image.
[0037] If the confidence level of the face identification result of the facial region image reaches a preset confidence level threshold, that is, a face matched with the facial region image is found in the face database and the face identification succeeds, the facial region image is determined to be the target facial region image. If the confidence level of the face identification result of the facial region image does not reach the preset confidence level threshold, that is, no face matched with the facial region image is found in the face database and the face identification fails, another facial region image that has not been selected before in the target facial image sequence continues to be selected for face identification, until the face identification succeeds.
[0038] After face identification is performed for all facial region images in the target facial image sequence, if no face corresponding to any of the facial region images is found in the face database, a facial region image whose face identification result is closest to a successful identification result can be selected from all the face identification results as the target facial region image. That is, a facial region image with the highest confidence level of face identification result is taken as the target facial region image; alternatively, the face identification can be ended.
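The identification flow described in the two preceding paragraphs can be sketched in Python as follows; `search_face`, which wraps the comparison of one image against the face database, is a hypothetical helper, and its name and interface are illustrative assumptions, not part of the disclosure:

```python
def identify_from_sequence(sequence, search_face, threshold):
    """Perform face identification over a facial image sequence.

    `search_face` (hypothetical helper) compares one facial region image
    against the face database and returns (identity, confidence) for the
    best-matching entry. The first image whose confidence exceeds
    `threshold` determines the identity; if every search fails, the
    result with the highest recorded confidence is used instead.
    """
    best_identity, best_conf = None, -1.0
    for image in sequence:
        identity, conf = search_face(image)
        if conf > threshold:         # identification succeeds: stop early
            return identity, conf
        if conf > best_conf:         # record for the fallback case
            best_identity, best_conf = identity, conf
    return best_identity, best_conf  # best result among the failed searches
```

For instance, with a threshold of 0.8 and per-image confidences of 0.4, 0.9 and 0.3, the search stops at the second image; with a threshold of 0.95, no search succeeds and the second image's result (confidence 0.9) is returned as the fallback.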
[0039] In this step, the order in which face identification is performed on the facial region images of the facial image sequence can be arbitrary, such as randomly selecting an image from the target facial image sequence for identification each time; or a preset order can be followed, for example, facial region images are sequentially selected from the facial image sequence for face identification in order of image quality from high to low.
[0040] In the embodiment, a specific face identification method is not limited. For example, the identification may be performed through a neural network, or be performed through other methods.
[0041] In step 106, based on a face identification result of the target facial region image, identity information of the target face is determined.
[0042] The face identification result of the target facial region image includes a facial image corresponding to the target facial region image in the face database, and the identity information associated with the facial image is determined as the identity information of the target face. The identity information can include the ID number, name, registered account number, etc. pre-stored in the face database.
[0043] In the face identification method provided by this embodiment, a target facial region image is determined by performing face identification on multiple facial region images in a target facial image sequence, and identity information of the target face is further determined. It is possible that a low-quality image is selected for identification from the multiple facial region images, but when one of the multiple facial region images cannot be identified, a next facial region image in the target facial image sequence can be used to continue the identification. This is equivalent to the target facial image sequence working as a quality selection mechanism, which significantly improves the identification success rate for facial region images, that is, the recall rate. Compared with a method that only performs face identification on a high-quality facial region image selected by a face quality model, the method does not need to conduct high-cost data collection to obtain a large amount of data for training a face quality model that covers all scenarios of low-quality facial images. In this way, the cost of face identification is reduced.
[0044] In an embodiment, the method can be used to identify a target face in a video stream. In step 102, determining the target facial image sequence for the target face may include that multiple facial region images of the target face are extracted from multi-frame images of the video stream.
[0045] The video stream can be a recorded video or a real-time video. The multi-frame images of the video stream involve the target face of a target object. In this embodiment, the video stream may be a tracking video obtained by tracking the target object’s face.
[0046] In an example, before step 102, the video stream can be acquired in advance. Based on each frame of image in the acquired video stream, faces in the video stream are detected. For each of multiple faces detected in the video stream, the detected face is tracked in the video stream to determine facial region images for the face in multi-frame images of the video stream; based on the facial region images for the face in multi-frame images of the video stream, a facial image sequence for the face is generated. Therefore, multiple facial image sequences for different faces presented in the video stream can be acquired in advance, which is beneficial to quickly determining the target facial image sequence for the target face from the facial image sequences for the different faces in subsequent processes.
[0047] In this example, for each of the multiple faces detected in the video stream, after the facial region images for the face in multi-frame images of the video stream are determined, a face identifier of the face is generated; according to the face identifier, the facial image sequence composed of the facial region images for the face can be quickly determined. In this step, the target facial image sequence can be determined when a face identification request is received. The face identification request can be issued by an upper-layer application. When the face identification request message is received, the face identifier acquired from the face identification request is taken as the face identifier of the target face. According to the face identifier of the target face, the target facial image sequence is determined from the multiple facial image sequences.
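A minimal sketch of the identifier-keyed lookup described above, assuming in-memory maintenance of the sequences (all identifiers and values are illustrative):

```python
# facial image sequences maintained per detected face, keyed by the face
# identifier generated during tracking (identifiers and crops are made up)
sequences = {
    "face_001": ["frame_3_crop", "frame_9_crop"],
    "face_002": ["frame_5_crop"],
}

def sequence_for_request(request):
    # the face identifier carried in the identification request selects the
    # target facial image sequence directly, with no search over all faces
    return sequences.get(request["face_id"])
```

Keying the maintained sequences by identifier keeps the per-request lookup constant-time regardless of how many faces are being tracked.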
[0048] In this example, which face tracking method is used is not limited. For example, the face tracking can be performed by Kalman filtering, or can be performed by neural network.
[0049] In an example, in step 102, determining the target facial image sequence for the target face, as shown in FIG. 1A, which may specifically include the following steps:
[0050] In step 1021, facial region images for the target face in multi-frame images of the video stream are determined as multiple candidate facial region images for the target face.
[0051] For example, in a case that the video stream is a tracking video, the candidate facial region images in the tracking video can be obtained by face detection. One or more candidate facial region images can be detected in each frame of image of the tracking video, or, for some frames, it is possible that no candidate facial region image is detected. Through face tracking, multiple candidate facial region images corresponding to the same face can be obtained, which means that multiple candidate facial region images for the target face are determined. In the case that multiple candidate facial region images are detected in one frame of image, the multiple candidate facial region images can include a physical facial region image and one or more mirrored facial region images.
[0052] In step 1022, image qualities of the multiple candidate facial region images are determined.
[0053] For example, for each of the multiple candidate facial region images, quality of the candidate facial region image is evaluated by a pre-trained facial image quality evaluation model, to determine an evaluated result. For example, the candidate facial region image can be input into a pre-trained facial image quality evaluation model to obtain the evaluated result of image quality. The image quality can be evaluated by integrating any of factors such as image intelligibility, brightness, clarity, facial symmetry and noise, and the evaluated result of image quality can be expressed by a grade, a score, or other methods.
[0054] A sequence can be used to represent the evaluated results of image quality of the multiple candidate facial region images for the target face, such as sequence A = {a_n, n = 1, 2, 3, ..., N}, where a_n represents the evaluated result of image quality of the n-th candidate facial region image, and N represents the total number of frames of the candidate facial region images of the target face.
[0055] It should be noted that the facial image quality evaluation model used in this embodiment may be a model with a predetermined accuracy obtained by conventional training. For example, a binary classifier based on deep learning may be used to train the model with general face quality data. It is not necessary to spend a high cost collecting a large amount of data to train a high-precision face quality model.
[0056] In step 1023, according to the evaluated results of image qualities, the multiple candidate facial region images are ranked to obtain a first sequence.
[0057] For example, the multiple candidate facial region images can be ranked from high-quality to low-quality or from low-quality to high-quality to obtain a first sequence. Following the above example, sequence A is ranked according to a descending order of the image quality scores to obtain sequence B. In this example, sequence B is used to represent the first sequence.
[0058] In step 1024, based on the first sequence, the target facial image sequence is determined.
[0059] The first sequence can be directly determined as the target facial image sequence; or a first sub-sequence of the first sequence may be determined as the target facial image sequence, where the first sub-sequence includes a preset number of candidate facial region images satisfying preset image quality requirements. By further filtering images in the first sequence based on image quality, and selecting only a sub-sequence including a preset number of candidate facial region images as the target facial image sequence, the image quality of the target facial image sequence can be further improved and its image quantity can be effectively controlled, which improves the accuracy of the identification result and the efficiency of identification for the target face.
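Steps 1022 to 1024 can be sketched as follows; the score values, the quality floor and the preset number K are illustrative assumptions:

```python
# evaluated image-quality scores of the candidate facial region images
scores = {"c1": 0.62, "c2": 0.91, "c3": 0.45, "c4": 0.88, "c5": 0.71}

# first sequence: candidates ranked in descending order of image quality
first_sequence = sorted(scores, key=scores.get, reverse=True)

# first sub-sequence: at most K candidates satisfying the quality requirement
K, quality_floor = 3, 0.5
target_sequence = [c for c in first_sequence if scores[c] >= quality_floor][:K]
# target_sequence is ["c2", "c4", "c5"]
```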
[0060] In some embodiments, based on a corresponding timing sequence of the multiple candidate facial region images in the first sequence, a sparse process is performed on the first sequence to obtain a second sequence. According to the second sequence, the target facial image sequence is determined.
[0061] Where, an interval distance between any two adjacent candidate facial region images in the second sequence in the corresponding timing sequence of the video stream is greater than the preset interval threshold.
[0062] The following sparse process method can be used to remove some of the candidate facial region images in sequence B to obtain sequence C, so that the interval distance between any two adjacent candidate facial region images in sequence C is greater than the preset interval threshold (denoted as step):
[0063] a) Traverse values in sequence B, and note the serial number of the current candidate facial region image as i.
[0064] b) For each i, traverse values of sequence B starting from i, and note the serial number of the current candidate facial region image as j.
[0065] c) If the interval distance between B_i and B_j is less than the preset interval threshold step, then delete B_j.
[0066] d) When the traversal over i is finished, all remaining candidate facial region images in sequence B are assigned to sequence C.
[0067] e) End.
[0068] In an example, assume that sequence B includes the following six candidate facial region images: B1, B2, B3, B4, B5, B6, and the preset interval threshold is 0.05ms. When i is equal to 1, sequentially calculate the intervals between B1 and B2, B3, B4, B5, B6 respectively. The interval between B1 and B2 is 0.02ms, the interval between B1 and B3 is 0.06ms, the interval between B1 and B4 is 0.15ms, the interval between B1 and B5 is 0.04ms, and the interval between B1 and B6 is 0.10ms. In this case, B2 and B5 can be deleted from sequence B. Now the remaining elements in sequence B are B1, B3, B4, B6. When i is equal to 3, sequentially calculate the intervals between B3 and B4, B6 respectively. The interval between B3 and B4 is 0.09ms, and the interval between B3 and B6 is 0.04ms. In this case, B6 can be deleted from sequence B. Now the remaining elements in sequence B are B1, B3, B4. The traversal over i is then finished, and the elements in sequence C are B1, B3, B4.
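A minimal sketch (not from the patent itself) of steps a) to e) follows. The absolute time labels are hypothetical values reconstructed to match the worked example, with B1 placed at 0ms.

```python
def sparse_filter(times, step):
    """Return indices whose time labels are separated by at least `step`
    from every earlier kept anchor, following steps a) to e) above."""
    kept = list(range(len(times)))
    i = 0
    while i < len(kept):
        anchor = times[kept[i]]
        # step c): delete later elements closer than `step` to the anchor
        kept = kept[:i + 1] + [j for j in kept[i + 1:]
                               if abs(times[j] - anchor) >= step]
        i += 1
    return kept

times_B = [0.00, 0.02, 0.06, 0.15, 0.04, 0.10]  # hypothetical labels for B1..B6
C = [f"B{j + 1}" for j in sparse_filter(times_B, step=0.05)]
print(C)  # ['B1', 'B3', 'B4'], as in the worked example
```

Note that sequence B is ordered by quality rather than time, so the time labels need not be monotonic; only pairwise interval distances matter.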
[0069] In implementations, the timing sequence can be represented by time labels. A candidate facial region image is a part of an image frame of the video stream, and the time label of the candidate facial region image may be the time label of the corresponding image frame in the video stream.
[0070] For the target object in the tracking video, when the target object makes certain movements, such as turning the head, covering the head with a hand, or drinking water, an image capturing these movements in the tracking video will not be suitable for face identification. Generally, these movements last for a certain period of time. Sequentially continuous image frames in the tracking video are usually similar, and their image qualities are generally similar. Repeated face identification on similar images is not worthwhile, so the time sequential sparse process can be performed on collected candidate facial region images to obtain the second sequence, so as to avoid performing face identification on low-quality images many times and to improve the efficiency of face identification.
[0071] In this embodiment, the specific method of the time sequential sparse process is not limited. Any method that ensures the interval distance between any two adjacent candidate facial region images in the second sequence is greater than the preset interval threshold can be adopted. For example, multiple time labels can be calculated according to the preset interval threshold, and the candidate facial region images in the first sequence within the difference range of the multiple time labels can be combined into a second sequence; for another example, elements in the first sequence can be traversed in order, and adjacent candidate facial region images whose interval distance is less than the preset interval threshold are removed to obtain the second sequence.
[0072] After the second sequence is obtained, the second sequence can be directly determined as the target facial image sequence, or a second sub-sequence can be extracted from the second sequence and determined as the target facial image sequence, where the second sub-sequence includes a preset number of candidate facial region images satisfying preset image quality requirements.
[0073] The image quality requirements may concern the ranking of image quality. For example, when the second sequence includes the candidate facial region images ranked in descending order of image quality from high to low, a sub-sequence composed of a preset number of candidate facial region images from the head of the second sequence may be determined as the target facial image sequence; or, when the second sequence includes the candidate facial region images ranked in ascending order of image quality from low to high, a sub-sequence composed of a preset number of candidate facial region images from the tail of the second sequence may be determined as the target facial image sequence.
[0074] The preset number can be represented by K, which can be set by those skilled in the art according to actual needs. For example, K candidate facial region images from the head of sequence C are taken as sequence D, which is the final target facial image sequence. In another example, when the ranking in the first sequence is in ascending order, K candidate facial region images from the tail of sequence C can be selected as sequence D.
[0075] The image quality requirement may also concern the level or score of the image quality, that is, the image quality is required to meet a preset level or a preset score. This embodiment does not limit the image quality requirement. According to the face quality requirement, a preset number of candidate facial region images are further filtered from the second sequence as the target facial image sequence, which further simplifies the target facial region images for the target face, so that the final target facial image sequence is discrete in timing and high in quality, which improves the efficiency of face identification. After obtaining the target facial image sequence, face identification can be performed on the facial region images in order of image quality from high to low.
[0076] In another example, determining the target facial image sequence may be acquiring, from other devices, the target facial image sequence maintained by those devices. The other devices may be terminal devices, servers, or other processing devices, and may execute steps 1021 to 1024 shown in FIG. 1A to obtain the target facial image sequence.
[0077] FIG. 2 provides a face identification method according to another embodiment of the present disclosure. The method may include the following processing, where the steps that are the same as the procedures of the foregoing embodiment will not be described in detail.
[0078] In step 202, the target facial image sequence of the target face is determined, and image qualities of multiple facial region images are determined.
[0079] The target facial image sequence includes multiple facial region images for the target face.
[0080] A pre-trained facial image quality evaluation model with a predetermined accuracy can be used to evaluate image qualities for facial region images to get evaluated results.
[0081] For example, after determining the target facial image sequence, image qualities of the facial region images in the target facial image sequence can be evaluated to obtain the evaluated results.
[0082] Or, before determining the target facial image sequence, image qualities of facial region images can be determined. For example, multiple candidate facial region images can first be obtained from the multi-frame images of the video stream, then image qualities of the candidate facial region images are evaluated to obtain evaluated results. Then, facial region images of the target facial image sequence are determined from the candidate facial region images according to the evaluated results.
[0083] In step 204, in the case that the target facial region image has not been determined, face identification is performed on a first facial region image that has not been identified and has the highest image quality in the target facial image sequence.
[0084] In the case that the target facial region image has not been determined, facial region images in the target facial image sequence can be identified successively according to the image quality order to obtain face identification results.
[0085] For example, the facial region images in the target facial image sequence D can be taken as the first facial region images and sequentially input into the face identification model one by one, so as to extract face features in the facial region images and compare them with facial images in the face database based on the face features. Confidence levels can be used to represent the comparison results. For each facial region image in facial image sequence D, multiple confidence levels can be obtained when the image is compared against the face database, and the comparison result with the highest confidence level is determined as the face identification result of the facial region image.
[0086] In step 206, in response to determining that the confidence level of the face identification result of the first facial region image is greater than a preset threshold, the first facial region image is determined as the target facial region image.
[0087] When the confidence level of the face identification result of a first facial region image exceeds the preset threshold, the face identification succeeds. The first facial region image is determined as the target facial region image, and face identification is no longer carried out for the other facial region images in the target facial image sequence.
[0088] When the confidence level of the face identification result of the first facial region image does not exceed the preset threshold, and the first facial region image is not the last facial region image in the image sequence, face identification continues to be performed on a first facial region image that has not been identified and has the highest image quality in the target facial image sequence.
[0089] In step 208, in response to determining that the confidence levels of the face identification results of all facial region images in the target facial image sequence are less than a preset threshold, a second facial region image, whose face identification result has the maximum confidence level, is determined as the target facial region image.
[0090] When face identification has been performed on all facial region images in the target facial image sequence and the confidence levels of their face identification results are all less than the preset threshold, the second facial region image, whose face identification result has the maximum confidence level among all face identification results of the facial region images, is taken as the target facial region image.
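The loop of steps 204 to 208 can be sketched as follows. This is an illustrative assumption-laden sketch, not the patent's implementation: `identify` stands in for the face identification model, and the image names, identities, and confidence values are hypothetical.

```python
def pick_target_image(sequence, identify, threshold):
    """`sequence` is ordered from highest to lowest image quality;
    `identify(img)` returns an (identity, confidence) pair."""
    best_img, best_result, best_conf = None, None, -1.0
    for img in sequence:  # next unidentified image with highest quality
        result, conf = identify(img)
        if conf > threshold:        # step 206: identification succeeds
            return img, result
        if conf > best_conf:        # record for the fallback of step 208
            best_img, best_result, best_conf = img, result, conf
    return best_img, best_result    # step 208: maximum-confidence image

# Hypothetical identification results per facial region image:
results = {"d1": ("alice", 0.4), "d2": ("bob", 0.9), "d3": ("carol", 0.6)}
print(pick_target_image(["d1", "d2", "d3"], results.get, threshold=0.8))
```

With `threshold=0.8`, identification stops at "d2" and never touches "d3"; with `threshold=0.95`, all images fail and "d2" is still returned as the maximum-confidence fallback of step 208.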
[0091] In step 210, based on a face identification result of the target facial region image, identity information of the target face is determined.
[0092] The identity information associated with the face identification result of the target facial region image in the face database is taken as the identity information of the target face. Therefore, the identity information of the target object is determined by face identification.
[0093] In the face identification method provided by the embodiment, when performing face identification on multiple facial region images in a target facial image sequence, the facial region image with the highest image quality is identified first. When the identification fails, the facial region image with the highest quality among the remaining images in the target facial image sequence is used for identification. The target facial image sequence thus effectively works as a quality selection model, which significantly improves the identification success rate for facial region images and the recall rate for high-quality facial region images. Compared with a method that only performs face identification on a high-quality facial region image selected by a face quality model, this method does not need high-cost data collection to obtain a large amount of data to train the face quality model to cover all scenarios involving low-quality facial images. In this way, the cost of face identification is reduced.
[0094] In an embodiment, the face identification method provided by this embodiment of the present disclosure can be applied to a game place environment. In a scene of an intelligent play place, seated players need to be tracked for a long time, and when an identification request issued by the upper-layer application is received, face identification is performed on the tracked player to verify identity. However, the game place environment is complex: for example, lighting changes greatly and players assume many postures. It is difficult to cover all problem scenarios through data collection to train a face quality model to select a high-quality facial image for face identification.
[0095] The following is an illustration of the application of a face identification method in the game place provided by this embodiment of the present disclosure.
[0096] First of all, a face detection model, a facial image quality evaluation model and a face identification model need to be trained in advance.
[0097] For the face detection model, common face detection models such as RetinaNet, YOLOv3, or PCN can be used to complete training with general face data, or general face data and game-place-specific face data can be used to improve the accuracy of the model.
[0098] For the facial image quality evaluation model, a binary classifier based on deep learning may be used to complete training with general face quality data, or general face data and game-place-specific face data can be used to improve the accuracy of the model. For example, binary cross entropy loss can be used as a loss function during training.
[0099] For the face identification model, common neural networks such as ResNet50 or SqueezeNet can be used to complete training with general face identification data. Common face identification loss functions such as ArcFace can be used for training.
[0100] Based on the models prepared above, the tracking video captured in real time by the camera in the game place can be processed as follows to maintain facial image sequences with time dispersion and high quality. In other examples, a tracking video pre-captured by the camera in the game place can also be processed.
[0101] Face detecting: Each frame image in the tracking video is input into the face detection model sequentially, and a face detection box in each frame image is obtained. An image in the face detection box is a candidate facial region image.
[0102] Image quality evaluating: Candidate facial region images are input into the facial image quality evaluation model frame by frame in real time, to obtain image quality scores.
[0103] Face tracking: Through a fast face tracking method, such as Kalman filter, the quality scores of all candidate facial region images corresponding to the same player's face can be obtained. These scores can be represented by a quality score sequence.
[0104] High-quality facial image sequence selecting: the sequence can be ranked according to a descending order of quality scores, then a sparse process is performed on the timing sequence, to make the interval distance between any two adjacent elements in the sequence greater than the preset interval threshold. Then, K candidate facial region images are taken from the head of the sequence as the final facial image sequence; the final facial image sequence includes K facial region images ranked from high quality to low quality.
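The whole selection pipeline of paragraph [0104] — rank by quality, sparse by time, take the first K — can be sketched end to end as below. This is an illustrative sketch only; the scores, time labels, and K are hypothetical.

```python
def select_sequence(images, scores, times, step, k):
    """Rank by quality (descending), sparse by time label, keep first K."""
    kept = sorted(range(len(images)), key=lambda n: scores[n], reverse=True)
    i = 0
    while i < len(kept):
        anchor = times[kept[i]]
        # drop later (lower-quality) elements within `step` of the anchor
        kept = kept[:i + 1] + [j for j in kept[i + 1:]
                               if abs(times[j] - anchor) >= step]
        i += 1
    return [images[j] for j in kept[:k]]

# Hypothetical per-frame quality scores and time labels (in ms):
D = select_sequence(["f1", "f2", "f3", "f4"],
                    scores=[0.9, 0.5, 0.8, 0.7],
                    times=[0.00, 0.01, 0.10, 0.12],
                    step=0.05, k=2)
print(D)  # ['f1', 'f3']
```

Here "f2" is dropped because it lies within 0.05ms of the higher-quality "f1", and "f4" because it lies within 0.05ms of "f3"; the two survivors are already in descending quality order, ready for the identification loop.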
[0105] It should be noted that if the number of players in the tracking video is N, N facial image sequences also need to be maintained. When the tracking video keeps going over time, the maintained facial image sequences will also be updated. When receiving a face identification request issued by the upper-layer application, face identification is performed on a specified player in the face identification request.
[0106] The facial region images in the facial image sequence for the player are input into the face identification model one by one for feature extraction and identity searching. If the searching confidence level of the current facial region image is greater than the preset threshold, that is, the searching is successful, the identity information in the face database associated with that searching result is output. If the searching confidence level of the current facial region image is less than the preset threshold, this searching confidence level is recorded and the next facial region image is input into the face identification model. If the current facial region image is already the last one in the facial image sequence, the identity information in the face database associated with the searching result having the highest recorded searching confidence level is output.
[0107] In a scene of an intelligent play place, the identity of a player captured by a camera needs to be identified. The solution herein can be used to achieve this goal with relatively high identification accuracy. Based on the recall rates of high-quality facial region images and of players' faces, subsequent work such as game table monitoring and user analysis can be better performed.
[0108] Those skilled in the art can understand that, in the above detailed description of the methods, the writing order of the steps does not imply a strict execution order that limits the implementation process; the specific execution order of each step should be determined according to its function and possible internal logic.
[0109] As shown in FIG. 3, FIG. 3 is the block diagram of a face identification apparatus shown in this embodiment of the present disclosure. The apparatus includes target facial image sequence determination module 31, target facial region image determination module 32, and identity information determination module 33.
[0110] The target facial image sequence determination module 31 is configured to determine a target facial image sequence for a target face, where the target facial image sequence comprises multiple facial region images for the target face.
[0111] The target facial region image determination module 32 is configured to perform face identification on at least one facial region image in the target facial image sequence, and determine, based on a confidence level of a face identification result of the at least one facial region image, a target facial region image from the target facial image sequence.
[0112] The identity information determination module 33 is configured to determine, based on a face identification result of the target facial region image, identity information of the target face.
[0113] In an example, the target facial image sequence includes the multiple facial region images for the target face extracted from multi-frame images in a video stream.
[0114] In an example, as shown in FIG. 4, based on foregoing embodiment of the apparatus, the apparatus also includes: facial image sequence generation module 30.
[0115] Facial image sequence generation module 30 is configured to: perform face detection based on each frame of image in an acquired video stream; for each of multiple faces detected in the video stream: track, in the video stream, the detected face to determine facial region images for the face in multi-frame images of the video stream; generate, based on the facial region images for the face in multi-frame images of the video stream, a facial image sequence for the face.
[0116] In an example, the target facial image sequence determination module 31 is configured to: determine that facial region images for the target face in multi-frame images of the video stream are multiple candidate facial region images for the target face; determine image qualities of the multiple candidate facial region images; rank, according to the image qualities, the multiple candidate facial region images to obtain a first sequence; determine the target facial image sequence based on the first sequence.
[0117] In an example, the target facial image sequence determination module 31 is configured to determine that a first sub-sequence of the first sequence is the target facial image sequence, the first sub-sequence comprising a preset number of candidate facial region images satisfying preset image quality requirements.
[0118] In an example, the target facial image sequence determination module 31 is configured to: perform, based on corresponding timing sequence of the multiple candidate facial region images in the first sequence, sparse process for the first sequence to obtain a second sequence, where an interval distance between any two adjacent candidate facial region images in the second sequence in the corresponding timing sequence of the video stream is greater than the preset interval threshold.
[0119] In an example, the target facial image sequence determination module 31 is further configured to determine that a second sub-sequence of the second sequence is the target facial image sequence, the second sub-sequence comprising a preset number of candidate facial region images satisfying preset image quality requirements.
[0120] In an example, when determining image qualities of the multiple candidate facial region images, the target facial image sequence determination module 31 is configured to: for each of the multiple candidate facial region images, perform quality evaluation for the candidate facial region image by a pre-trained facial image quality evaluation model, to determine an evaluated result of image quality of the candidate facial region image.
[0121] In an example, for the each of multiple faces detected in the video stream, after tracking, in the video stream, the detected face to determine the facial region images for the face in multi-frame images of the video stream, the facial image sequence generation module 30 is further configured to generate a facial identifier of the face. The target facial image sequence determination module 31 is configured to: take an acquired facial identifier in a face identification request as a facial identifier of the target face; determine, based on the facial identifier of the target face, the target facial image sequence for the target face from facial image sequences for the multiple faces.
[0122] In an example, the target facial image sequence determination module 31 is further configured to determine image qualities of the multiple candidate facial region images. The target facial region image determination module 32 is specifically configured to: in response to determining that a confidence level of the face identification result of a first facial region image is greater than a preset threshold, determine that the first facial region image is the target facial region image.
[0123] In an example, the target facial region image determination module is further configured to: in response to determining that the confidence levels of the face identification results of all facial region images in the target facial image sequence are less than a preset threshold, determine a second facial region image whose face identification result has the maximum confidence level, and take the second facial region image as the target facial region image.
[0124] The functions of each module in the above-mentioned apparatuses are detailed in the implementation of the corresponding steps in the above-mentioned methods, and are not described here again.
[0125] In the embodiments of the present disclosure, an electronic device is also provided, as shown in FIG. 5. The electronic device includes a memory 51 and a processor 52, where the memory is configured to store computer instructions capable of being run on the processor, and the processor 52 implements the face identification method of any one of embodiments of the present disclosure when executing the computer instructions.
[0126] In the embodiments of the present disclosure, a computer program product is also provided. The computer program product includes computer program codes/instructions which, when executed by a processor, perform the method of any one of the embodiments of the present disclosure.
[0127] In the embodiments of the present disclosure, a computer-readable storage medium is provided, storing a computer program, where steps of the face identification method of any one of embodiments of the present disclosure are implemented when the program is executed by a processor.
[0128] For the embodiments of the apparatus, since the apparatus substantially corresponds to the method embodiments, relevant information can be referred to the description of the method embodiments. The apparatus embodiments described above are merely illustrative, and the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules, that is, may be located in one place, or may be distributed to a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present description. A person of ordinary skill in the art would understand and implement without creative efforts.
[0129] Specific embodiments of the present description are described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims may be performed in an order different from that in the embodiments and may still achieve the desired results. Moreover, the processes depicted in the figures do not necessarily require the particular order or sequence shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
[0130] A person skilled in the art upon consideration of the specification and practice of the present disclosure disclosed herein will readily appreciate other implementations of the present disclosure. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure, and the variations, uses, and adaptations follow a general principle of the present disclosure and include common sense or common technical means in this technical field that are not disclosed in the present disclosure. The specification and the embodiments are considered as merely exemplary, and the real scope and spirit of the present disclosure are pointed out in the following claims.
[0131] It is to be understood that this specification is not limited to the precise construction already described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of this specification is limited only by the appended claims.
[0132] The foregoing descriptions are merely preferred embodiments of one or more embodiments of this specification, but are not intended to limit the one or more embodiments of this specification. Any modification, equivalent replacement, or improvement made within the spirit and principle of one or more embodiments of this specification shall fall within the protection scope of the one or more embodiments of this specification.

Claims

1. A face identification method, comprising: determining a target facial image sequence for a target face, wherein the target facial image sequence comprises multiple facial region images for the target face; performing face identification on at least one facial region image in the target facial image sequence; determining, based on a confidence level of a face identification result of the at least one facial region image, a target facial region image from the target facial image sequence; and determining, based on the face identification result of the target facial region image, identity information of the target face.
2. The method according to claim 1, wherein the target facial image sequence comprises the multiple facial region images for the target face extracted from multi-frame images in a video stream.
3. The method according to claim 1 or 2, further comprising: performing face detection based on each frame of image in an acquired video stream; and for each of multiple faces detected in the video stream: tracking, based on the video stream, the face to determine facial region images for the face in multi-frame images of the video stream; and generating, based on the facial region images for the face in multi-frame images of the video stream, a facial image sequence for the face.
4. The method according to claim 3, wherein determining the target facial image sequence for the target face comprises: determining the facial region images for the target face in multi-frame images of the video stream as multiple candidate facial region images for the target face; determining image qualities of the multiple candidate facial region images; ranking, according to the image qualities, the multiple candidate facial region images to obtain a first sequence; and determining the target facial image sequence based on the first sequence.
5. The method according to claim 4, wherein determining the target facial image sequence based on the first sequence comprises: determining a first sub-sequence of the first sequence as the target facial image sequence, the first sub- sequence comprising a preset number of candidate facial region images satisfied with a preset image quality requirement.
6. The method according to claim 4, wherein determining the target facial image sequence based on the first sequence comprises: performing, based on a corresponding timing sequence of the multiple candidate facial region images in the first sequence, sparse process for the first sequence to obtain a second sequence, wherein an interval distance between any two adjacent candidate facial region images in the second sequence is greater than a preset interval threshold; and determining the target facial image sequence based on the second sequence.
7. The method according to claim 6, wherein determining the target facial image sequence based on the second sequence comprises: determining a second sub- sequence of the second sequence as the target facial image sequence, the second sub-sequence comprising a preset number of candidate facial region images satisfied with a preset image quality requirement.
8. The method according to any one of claims 4 to 7, wherein determining image qualities of the multiple candidate facial region images comprises: for each of the multiple candidate facial region images, performing quality evaluation for the candidate facial region image by a pre-trained facial image quality evaluation model, to determine an evaluated result of image quality of the candidate facial region image.
9. The method according to any one of claims 3 to 8, wherein, for the each of multiple faces detected in the video stream, after tracking, based on the video stream, the face to determine the facial region images for the face in multi-frame images of the video stream, the method further comprises: generating a facial identifier of the face; and determining the target facial image sequence for the target face comprises: taking an acquired facial identifier in a face identification request as a facial identifier of the target face; and determining, based on the facial identifier of the target face, the target facial image sequence for the target face from facial image sequences for the multiple faces.
10. The method according to any one of claims 1 to 9, further comprising: determining image qualities of the multiple facial region images; and performing face identification on the at least one facial region image in the target facial image sequence, and determining, based on the confidence level of the face identification result of the at least one facial region image, the target facial region image from the target facial image sequence, comprises: in the case that the target facial region image is not determined, performing face identification on a first facial region image without face identification and with the highest image quality in the target facial image sequence; and in response to determining that a confidence level of the face identification result of the first facial region image is greater than a preset threshold, determining that the first facial region image is the target facial region image.
11. The method according to claim 10, wherein performing face identification on the at least one facial region image in the target facial image sequence, and determining, based on the confidence level of the face identification result of the at least one facial region image, the target facial region image from the target facial image sequence, further comprises: in response to determining that confidence levels of face identification results of all facial region images in the target facial image sequence are less than the preset threshold, determining a second facial region image with the maximum confidence level; and taking the second facial region image as the target facial region image.
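Claims 10 and 11 together describe an early-exit selection loop: identify images in descending order of image quality, stop at the first whose confidence exceeds the preset threshold, and fall back to the maximum-confidence image when none qualifies. A minimal sketch, assuming `identify` is some face identification function returning an (identity result, confidence) pair:

```python
def pick_target_image(sequence, identify, threshold):
    """Select the target facial region image per claims 10-11 (sketch).

    sequence:  facial region images, sorted by descending image quality.
    identify:  hypothetical function mapping an image to
               (identification result, confidence level).
    threshold: the preset confidence threshold.

    Returns (target image, its identification result): the first image
    whose confidence exceeds the threshold, or, if every confidence is
    below the threshold, the image with the maximum confidence.
    """
    best = None  # (confidence, image, result) seen so far
    for image in sequence:
        result, conf = identify(image)
        if conf > threshold:
            return image, result  # early exit: good enough, stop here
        if best is None or conf > best[0]:
            best = (conf, image, result)
    return best[1], best[2]  # fallback: maximum-confidence image
```

The early exit is the point of the claim: high-quality frames usually identify confidently, so most of the sequence never needs to be processed.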
12. A face identification apparatus, comprising: a target facial image sequence determination module, configured to determine a target facial image sequence for a target face, wherein the target facial image sequence comprises multiple facial region images for the target face; a target facial region image determination module, configured to perform face identification on at least one facial region image in the target facial image sequence, and determine, based on a confidence level of a face identification result of the at least one facial region image, a target facial region image from the target facial image sequence; and an identity information determination module, configured to determine, based on a face identification result of the target facial region image, identity information of the target face.
13. The apparatus according to claim 12, wherein the target facial image sequence comprises the multiple facial region images for the target face extracted from multi-frame images in a video stream.
14. The apparatus according to claim 12 or 13, wherein the apparatus further comprises a facial image sequence generation module, configured to: perform face detection based on each frame of image in an acquired video stream; and for each of multiple faces detected in the video stream: track, based on the video stream, the face to determine facial region images for the face in multi-frame images of the video stream; and generate, based on the facial region images for the face in multi-frame images of the video stream, a facial image sequence for the face.
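The claims leave the tracking method unspecified; one common realization of the per-face tracking in claim 14 is greedy IoU association, where each new detection is attached to the track whose latest bounding box overlaps it most, and unmatched detections start new tracks. Everything below (the IoU threshold, box format, and greedy matching) is an illustrative assumption, not the claimed method.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def track_faces(detections_per_frame, iou_threshold=0.5):
    """Sketch of per-face tracking via greedy IoU association.

    detections_per_frame: list (one entry per frame) of lists of face
                          bounding boxes detected in that frame.

    Returns {track_id: [(frame_index, box), ...]}: one facial image
    sequence per tracked face, as in claim 14.
    """
    tracks = {}
    next_id = 0
    for frame_idx, boxes in enumerate(detections_per_frame):
        for box in boxes:
            # Attach to the existing track with the best overlap, if any.
            best_id, best_iou = None, iou_threshold
            for tid, seq in tracks.items():
                score = iou(seq[-1][1], box)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:  # no overlap: a newly appearing face
                best_id = next_id
                next_id += 1
                tracks[best_id] = []
            tracks[best_id].append((frame_idx, box))
    return tracks
```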
15. The apparatus according to claim 14, wherein the target facial image sequence determination module is configured to: determine the facial region images for the target face in multi-frame images of the video stream as multiple candidate facial region images for the target face; determine image qualities of the multiple candidate facial region images; rank, according to the image qualities, the multiple candidate facial region images to obtain a first sequence; and determine the target facial image sequence based on the first sequence.
16. The apparatus according to claim 15, wherein the target facial image sequence determination module is further configured to: determine a first sub-sequence of the first sequence as the target facial image sequence, the first sub-sequence comprising a preset number of candidate facial region images satisfying a preset image quality requirement.
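The ranking and sub-sequence steps of claims 15 and 16 can be sketched directly: sort the candidate facial region images by image quality to obtain the first sequence, then keep up to a preset number of images that satisfy the quality requirement. The function names and the `quality >= min_quality` reading of "satisfying a preset image quality requirement" are assumptions.

```python
def first_sequence(candidates, quality_of):
    """Rank candidate facial region images by image quality, best
    first, to obtain the first sequence (claim 15)."""
    return sorted(candidates, key=quality_of, reverse=True)

def first_sub_sequence(ranked, quality_of, min_quality, preset_count):
    """Take from the first sequence up to preset_count images whose
    quality meets the assumed requirement quality >= min_quality
    (claim 16). Because `ranked` is already quality-ordered, the
    result is simply its leading qualified images."""
    return [img for img in ranked if quality_of(img) >= min_quality][:preset_count]
```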
17. The apparatus according to claim 15, wherein the target facial image sequence determination module is further configured to: perform, based on a timing sequence corresponding to the multiple candidate facial region images in the first sequence, sparse processing on the first sequence to obtain a second sequence, wherein an interval distance between any two adjacent candidate facial region images in the second sequence is greater than a preset interval threshold; and determine the target facial image sequence based on the second sequence.
18. The apparatus according to claim 17, wherein the target facial image sequence determination module is further configured to: determine a second sub-sequence of the second sequence as the target facial image sequence, the second sub-sequence comprising a preset number of candidate facial region images satisfying a preset image quality requirement.
19. An electronic device, comprising a memory and a processor, wherein the memory is configured to store computer instructions capable of being run on the processor, and the processor implements the method according to any one of claims 1 to 11 when executing the computer instructions.
20. A computer-readable storage medium, storing a computer program, wherein the method according to any one of claims 1 to 11 is performed when the program is executed by a processor.
21. A computer program, comprising computer-readable codes which, when executed in an electronic device, cause a processor in the electronic device to perform the method of any one of claims 1 to 11.
PCT/IB2021/058720 2021-09-20 2021-09-24 Face identification methods and apparatuses WO2023041963A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2021240278A AU2021240278A1 (en) 2021-09-20 2021-09-24 Face identification methods and apparatuses
CN202180002767.6A CN113785304A (en) 2021-09-20 2021-09-24 Face recognition method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202110328W 2021-09-20
SG10202110328W 2021-09-20

Publications (1)

Publication Number Publication Date
WO2023041963A1 true WO2023041963A1 (en) 2023-03-23

Family

ID=85601913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2021/058720 WO2023041963A1 (en) 2021-09-20 2021-09-24 Face identification methods and apparatuses

Country Status (1)

Country Link
WO (1) WO2023041963A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110129121A1 (en) * 2006-08-11 2011-06-02 Tessera Technologies Ireland Limited Real-time face tracking in a digital image acquisition device
US20170076146A1 (en) * 2015-09-11 2017-03-16 EyeVerify Inc. Fusing ocular-vascular with facial and/or sub-facial information for biometric systems
WO2019100608A1 (en) * 2017-11-21 2019-05-31 平安科技(深圳)有限公司 Video capturing device, face recognition method, system, and computer-readable storage medium
CN108288027B (en) * 2017-12-28 2020-10-27 新智数字科技有限公司 Image quality detection method, device and equipment
CN109544523B (en) * 2018-11-14 2021-01-01 北京智芯原动科技有限公司 Method and device for evaluating quality of face image based on multi-attribute face comparison
US20210166003A1 (en) * 2018-08-22 2021-06-03 Zhejiang Dahua Technology Co., Ltd. Systems and methods for selecting a best facial image of a target human face

Similar Documents

Publication Publication Date Title
Lin et al. Bsn: Boundary sensitive network for temporal action proposal generation
Li et al. Vlad3: Encoding dynamics of deep features for action recognition
CN111161311A (en) Visual multi-target tracking method and device based on deep learning
AU2021240278A1 (en) Face identification methods and apparatuses
CN109146921A (en) A kind of pedestrian target tracking based on deep learning
CN109685037B (en) Real-time action recognition method and device and electronic equipment
WO2020211242A1 (en) Behavior recognition-based method, apparatus and storage medium
CN111680543A (en) Action recognition method and device and electronic equipment
CN110796135A (en) Target positioning method and device, computer equipment and computer storage medium
Lin et al. Joint learning of local and global context for temporal action proposal generation
CN113762519A (en) Data cleaning method, device and equipment
CN111241928A (en) Face recognition base optimization method, system, equipment and readable storage medium
CN110852224B (en) Expression recognition method and related device
CN111291780A (en) Cross-domain network training and image recognition method
CN113627334A (en) Object behavior identification method and device
WO2023041963A1 (en) Face identification methods and apparatuses
CN111191587A (en) Pedestrian re-identification method and system
CN109492702B (en) Pedestrian re-identification method, system and device based on ranking measurement function
CN111062345A (en) Training method and device of vein recognition model and vein image recognition device
CN113591647B (en) Human motion recognition method, device, computer equipment and storage medium
CN115620089A (en) Object representation model training method, object representation method and device
CN114782997A (en) Pedestrian re-identification method and system based on multi-loss attention adaptive network
CN113628248A (en) Pedestrian residence time determining method and device and computer readable storage medium
CN113850160A (en) Method and device for counting repeated actions
WO2021247372A1 (en) Semi-supervised action-actor detection from tracking data in sport

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021240278

Country of ref document: AU

Date of ref document: 20210924

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21957405

Country of ref document: EP

Kind code of ref document: A1