US11315363B2 - Systems and methods for gait recognition via disentangled representation learning - Google Patents


Info

Publication number
US11315363B2
Authority
US
United States
Prior art keywords
gait
features
pose
feature
user
Prior art date
Legal status
Active
Application number
US17/155,350
Other versions
US20210224524A1 (en)
Inventor
Xiaoming Liu
Jian Wan
Kwaku Prakah-Asante
Mike Blommer
Ziyuan ZHANG
Luan Tran
Xi Yin
Yousef Atoum
Current Assignee
Ford Global Technologies LLC
Michigan State University MSU
Original Assignee
Ford Global Technologies LLC
Michigan State University MSU
Priority date
Filing date
Publication date
Application filed by Ford Global Technologies LLC and Michigan State University
Priority to US17/155,350
Publication of US20210224524A1
Assigned to BOARD OF TRUSTEES OF MICHIGAN STATE UNIVERSITY. Assignors: Zhang, Ziyuan; Tran, Luan; Atoum, Yousef; Liu, Xiaoming; Yin, Xi
Assigned to FORD GLOBAL TECHNOLOGIES LLC. Assignors: Wan, Jian; Prakah-Asante, Kwaku O.; Blommer, Mike
Application granted
Publication of US11315363B2
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06K9/623
    • G06K9/6256
    • G06K9/6268
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • G06V40/25Recognition of walking or running movements, e.g. gait recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2113Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation

Definitions

  • the present disclosure relates to gait recognition and, more specifically, to gait recognition implemented as an authentication method.
  • the authentication system comprises: a camera, a feature extractor, an aggregator, a classifier and a data store.
  • the camera is configured to capture two or more images of an unknown person walking.
  • the feature extractor is configured to receive the two or more images and, for each image in the two or more images, operates to extract a set of appearance features and a set of pose features, such that the appearance features are indicative of visual appearance of the unknown person and the pose features are indicative of pose of the unknown person.
  • the feature extractor is a neural network trained to disentangle the pose features from the appearance features.
  • the aggregator is configured to receive multiple sets of pose features from the feature extractor and generate a gait feature for the unknown person.
  • the data store is configured to store a plurality of gait features, where each gait feature in the plurality of gait features is associated with a known person.
  • the classifier is configured to receive the gait feature from the aggregator and operates to identify the unknown person by comparing the gait feature to the plurality of gait features stored in the data store.
  • the authentication system may further include a pre-processor interposed between the camera and the feature extractor which operates to remove background from each of the two or more images.
  • the neural network is further defined as a convolutional neural network.
  • the neural network may be trained using cross reconstruction loss. That is, the neural network is trained by comparing a given image from the two or more images with a reconstructed image, where the reconstructed image was reconstructed using a set of pose features from one image in the two or more images and appearance features from another image in the two or more images.
  • the aggregator is further defined as a long short-term memory, such that the classifier averages output from the long short-term memory over time.
  • the authentication system further includes a verification module and an actuator.
  • the verification module receives an identity for the unknown person from the classifier and actuates the actuator based on the identity of the unknown person.
  • FIG. 1 is an example user authentication system incorporated into a vehicle according to principles of the present disclosure.
  • FIG. 2 is an example functional flow diagram depicting example architecture of a gait identification (ID) system according to principles of the present disclosure.
  • FIG. 3 is an example functional block diagram of an example gait identification (ID) system according to principles of the present disclosure.
  • FIG. 4 is an example flowchart depicting example authentication when requesting a ride share vehicle according to principles of the present disclosure.
  • FIG. 5 is an example flowchart depicting example generation and storage of a registered user reference gait feature according to principles of the present disclosure.
  • Pose features of a user provide unique and secure information for user authentication without requesting biometrics during a potentially cumbersome explicit authentication request.
  • a registered user can be verified when walking towards a requested vehicle.
  • the requested vehicle may have cameras mounted at various points surrounding the vehicle and can identify individuals approaching the vehicle.
  • a gait identification system receives videos of the approaching individual and identifies or extracts two sets of features from the approaching individual: appearance and gait. Both features are represented as feature vectors by trained machine learning models.
  • the gait identification system removes appearance from the features extracted from a video of the approaching individual. Therefore, the resulting extracted features only include pose features.
  • the pose features of the approaching user can then be compared to a known or registered user database including pose features for each registered user.
  • the approaching user is authenticated and identified as the requesting user.
  • the requested vehicle may alert the driver that the approaching user is verified.
  • the authentication of the approaching user may prompt an actuator of the vehicle to unlock one of the doors of the vehicle or open one of the doors of the vehicle.
  • the walking pattern of individuals, or gait, is one of the most important biometric modalities, which allows it to be used as an authentication metric.
  • Most of the existing gait recognition methods take silhouettes or articulated body models as the pose features. These methods suffer from degraded recognition performance when handling confounding variables, such as clothing, objects being carried, and viewing angle.
  • the gait identification system of the present disclosure includes a novel autoencoder framework to explicitly disentangle pose (individual frames of the gait) and appearance features from RGB imagery and long short-term memory (LSTM)-based integration of pose features over time (as captured in a video) to produce a gait feature.
  • the gait identification system collects a Frontal-View Gait (FVG) dataset to focus on gait recognition from frontal-view walking, which is a challenging problem since frontal-view walking contains minimal gait cues compared to other views.
  • FVG does include other important variations, for example, walking speed, objects being carried, and clothing worn.
  • Biometrics measure a user's unique physical and behavioral characteristics to recognize the identity of the user.
  • Gait is a biometric modality, like face, fingerprint, and iris.
  • Gait recognition has the advantage that it can operate at a distance without user cooperation. Also, gait is difficult to camouflage. Therefore, authenticating a user's gait is not intrusive, does not require an explicit authentication request or explicit performance of an authentication measurement, and gait is difficult to forge. Due to these advantages, gait recognition is useful for many applications, such as person identification, criminal investigation, and healthcare.
  • the core of gait recognition lies in extracting gait-related features from the video frames of a walking person, where the prior approaches are categorized into two types: appearance-based and model-based methods.
  • the appearance-based methods, such as gait energy image (GEI) or gait entropy image (GEnI), are defined by extracting silhouette masks. Specifically, GEI uses an averaged silhouette image as the gait representation for a video. These methods are popular in the gait recognition community for their simplicity and effectiveness. However, they often suffer from sizeable intra-subject appearance changes due to covariates including clothing changes, objects being carried, viewing angle changes, and walking speed variations.
  • While GEI has a low computational cost and can handle low-resolution imagery, it can be sensitive to the above-mentioned covariates. In contrast, model-based methods first perform pose estimation: they fit articulated body models to images and extract kinematic features, such as 2D body joints or the articulated body skeleton, as the gait feature. While they are robust to some covariates such as clothing and speed, they require a relatively higher image resolution for reliable pose estimation and incur higher computational costs.
  • the gait ID system 108 receives video of users around the vehicle 104 from a camera 112-1 or a plurality of cameras 112-1, 112-2, . . . , 112-N, collectively 112, mounted on the vehicle 104.
  • the plurality of cameras 112 may be mounted at various points on the vehicle 104.
  • the user authentication system 100 may include the plurality of cameras 112 mounted on the vehicle 104, positioned to capture an entirety of the environment surrounding the vehicle 104 (additional cameras capturing the entire surrounding environment not shown). In this way, the plurality of cameras 112 can identify users approaching the vehicle 104 from a variety of directions.
  • the user authentication system 100 also includes an actuator 116.
  • the gait ID system 108, upon authenticating a video stream of an approaching user received from the plurality of cameras 112, instructs the actuator 116 to unlock a rear door of the vehicle 104.
  • the actuator 116 may open the rear door of the vehicle 104.
  • multiple actuators may be included for each door of the vehicle 104, and the gait ID system 108 may instruct the actuator to unlock or open the door nearest the approaching user.
  • authentication of the user may also (or alternatively) result in a notification to the driver's phone or mobile computing device that the user has been authenticated and which camera captured the user, indicating a direction or location of the authenticated user.
  • the gait ID system 108 includes a novel convolutional neural network (CNN)-based model to automatically learn disentangled gait and appearance features from a walking video of an approaching user in order to verify and/or register the user.
  • the CNN-based model relies on pose features of a walking user, as opposed to handcrafted GEI, or skeleton-based features. While many conventional gait databases study side-view imagery, the gait ID system 108 collects a new gait database where both gallery and probe are captured in frontal-views. While particular reference is made to convolutional neural networks, it is readily understood that other types of neural networks (e.g., residual neural networks) as well as other types of machine learning also fall within the scope of this disclosure.
  • the gait ID system 108 disentangles the gait feature from the visual appearance of the approaching user.
  • disentanglement is achieved by manually handcrafting the GEI or body skeleton, since neither has color information.
  • manual disentanglements may lose certain information or create redundant gait information.
  • GEI represents the average contours over time but not the dynamics of how body parts move.
  • certain body joints such as hands may have fixed positions and are redundant information to gait.
  • the gait ID system 108 automatically disentangles the pose features from appearance features and uses the extracted pose features to generate a gait feature for gait recognition.
  • the pose features are generated by extracting pose features from each frame of a captured video of an approaching user.
  • the disentanglement performed by the gait ID system 108 is realized by designing an autoencoder-based CNN with novel loss functions.
  • the encoder estimates two latent representations, (i) pose feature representation (that is, frame-based gait feature) and (ii) appearance feature representation, by employing two loss functions.
  • the two loss functions include (i) cross reconstruction loss and (ii) gait similarity loss.
  • the cross reconstruction loss enforces that the appearance feature of one frame, fused with the pose feature of another frame, can be decoded to the latter frame.
  • the gait similarity loss forces a sequence of pose features extracted from a video sequence of the same subject to be similar even under different conditions.
  • the pose features of a sequence are fed into a multi-layer LSTM with a designed incremental identity loss to generate the sequence-based gait feature; the cosine distance between two such gait features can then be used as the video-to-video similarity metric.
  • the FVG database may collect three frontal-view angles where the subject walks from left −45°, 0°, and right 45° off the optical axes of the camera 112-1 or the plurality of cameras 112. For each of the three angles, different variants are explicitly captured, including walking speed, clothing, carrying, cluttered background, etc. Such a robust FVG database results in a more accurate CNN model for disentangling pose and appearance features.
  • the user authentication system 100 implements the gait ID system 108 to learn gait information from raw RGB video frames, which contain richer information and thus offer a higher potential for extracting discriminative pose features.
  • the present CNN-based approach has the advantage of being able to leverage a large amount of training data and learn a more discriminative representation from data with multiple covariates, creating an averaged gait feature representation from pose features extracted from a plurality of video frames.
  • the present FVG database focuses on the frontal view, with three different near frontal-view angles towards the camera, and other variations including walking speed, carrying, clothing, cluttered background and time.
  • the present method has only one encoder to disentangle the appearance and gait information, as shown in FIG. 2, through the design of novel loss functions and without the need for adversarial training, which makes training more accessible.
  • FIG. 2 is a functional flow diagram depicting the architecture of a gait identification (ID) system 200 of the present disclosure.
  • the objective is to design an algorithm such that the pose features of videos 1 and 2 are the same, while those of videos 2 and 3 are different.
  • the long down coat can easily dominate the feature extraction, which would make videos 2 and 3 more similar than videos 1 and 2 in the latent space of pose features.
  • the core challenge, as well as the objective, of gait recognition is to extract pose features that are discriminative among subjects but invariant to different confounding factors, such as viewing angles, walking speeds, and appearance.
  • the approach of the gait ID system 200 is to achieve the gait feature representation via feature disentanglement by separating the gait feature from appearance information for a given walking video.
  • the input to the gait ID system 200 is video frames 204 , with background removed using a detection and segmentation method.
  • An encoder-decoder network 208 is used to disentangle the appearance and pose features for each video frame.
  • a multi-layer LSTM 212 explores the temporal dynamics of pose features and aggregates them into a sequence-based gait feature for identification purposes.
  • the gait ID system 200 learns to disentangle the gait feature from the visual appearance in an unsupervised manner. Since a video is composed of frames, disentanglement should be conducted on the frame level first. Because there is no dynamic information within a video frame, the gait ID system 200 disentangles the pose feature from the visual appearance for each frame. The dynamics of pose features over a sequence will contribute to the gait feature. In other words, the pose feature is the manifestation of video-based gait feature at a specific frame or point in time.
  • the encoder-decoder network 208 architecture is used with carefully designed loss functions to disentangle the pose feature from appearance feature.
  • the loss functions defined for learning the encoder E and the decoder D include the cross reconstruction loss and the gait similarity loss.
  • the reconstructed frame should be close to the original input I.
  • enforcing a self-reconstruction loss as in a typical autoencoder cannot ensure that the appearance feature f_a learns appearance information shared across the video while f_g represents the pose information in each frame. Therefore, the cross reconstruction loss uses the appearance feature f_a^t1 of one frame and the pose feature f_g^t2 of another frame to reconstruct the latter frame: L_xrecon = ||D(f_a^t1, f_g^t2) − I^t2||_2^2.
  • the cross reconstruction loss can also play the role of the self-reconstruction loss, making sure the two features are sufficiently representative to reconstruct video frames.
  • because the pose feature of a current frame can be paired with the appearance feature of any frame in the same video to reconstruct the same target (using the decoder of the encoder-decoder network 208), the loss enforces the appearance features to be similar across all frames.
  • the cross reconstruction loss prevents the appearance feature f_a from being over-representative, i.e., from containing pose variations that change between frames. However, appearance information may still leak into the pose feature f_g.
  • in the extreme case, f_a is a constant vector while f_g encodes all the information of a video frame.
  • a gait similarity module 216 receives multiple videos of the same subject. Extra videos can introduce changes in appearance. Consider two videos of the same subject with lengths n_1, n_2 captured in two different conditions c_1, c_2. Ideally, c_1 and c_2 should differ in the user's appearance, for example, a change of clothes. In an implementation, only one video per user may be accessible for registration and matching.
  • while the appearance changes, the gait information should be consistent between the two videos. Since it is almost impossible to enforce similarity on f_g between individual video frames, as that would require precise frame-level alignment, similarity between the two videos is enforced on averaged pose features using the gait similarity module 216: L_gait-sim = ||(1/n_1) Σ_{t=1..n_1} f_g^(t,c_1) − (1/n_2) Σ_{t=1..n_2} f_g^(t,c_2)||_2^2.
  • the current feature f_g only contains the walking pose of the person at a specific instant, which can share similarity with a specific instant of a very different person.
  • the gait ID system 200 is looking for discriminative characteristics in a user's walking pattern. Therefore, modeling its temporal change is critical. This is where temporal modeling architectures like the recurrent neural network or LSTM work best.
  • the gait ID system 200 includes a multi-layer LSTM 212 structure to explore spatial (e.g., the shape of a person) and mainly, temporal (e.g., how the trajectory of subjects' body parts changes over time) information on pose features extracted from the input video frames 204 by the encoder-decoder network 208 .
  • pose features extracted from one video sequence are fed into the three-layer LSTM 212.
  • the output of the LSTM 212 is connected to a classifier C (in this case, a linear classifier) to classify the user's identity.
  • let h_t be the output of the LSTM 212 at time step t, which is accumulated after feeding t pose features f_g into the LSTM 212:
  • h_t = LSTM(f_g^1, f_g^2, . . . , f_g^t)   (5)
  • the output h_t is greatly affected by its last input f_g^t.
  • the LSTM output h_t can vary across time steps.
  • the averaged LSTM output can be used as the gait feature for identification: f_gait^t = (1/t) Σ_{s=1..t} h_s.
  • the LSTM 212 is expected to learn that the longer the video sequence, the more walking information it processes, and the more confidently it can identify the subject. Instead of minimizing the loss only on the final time step, the intermediate output of every time step, weighted by w_t, is used: L_id-inc-avg = Σ_t w_t · L_id(C(f_gait^t), y), where L_id is the identity classification loss and y is the subject's identity label.
  • the encoder-decoder network 208 and the LSTM 212 of the gait ID system 200 are jointly trained. Updating the encoder E to optimize the incremental identity loss L_id-inc-avg also helps to further generate pose features that carry identity information and on which the LSTM 212 is able to explore temporal dynamics.
  • the output f_gait^t of the LSTM 212 is the gait feature of the video and is used as the identity feature representation for matching or verifying an approaching user.
  • the cosine similarity score is used as the distance metric between a known, registered gait feature and the present gait feature.
  • the gait ID system 200 receives video frames 204 with the person of interest segmented.
  • the foreground mask is obtained from the state-of-the-art instance segmentation, Mask R-CNN.
  • the soft mask returned by the network is kept, where each pixel value indicates the probability of that pixel belonging to a person. This avoids the difficulty of choosing a binarization threshold and prevents the loss of information due to mask estimation errors.
  • the encoder-decoder network 208 is a typical CNN. The encoder consists of four stride-2 convolution layers, each followed by Batch Normalization and Leaky ReLU activation. The decoder structure is an inverse of the encoder, built from transposed convolution, Batch Normalization, and Leaky ReLU layers.
  • the final layer has a Sigmoid activation to bring the values into the [0, 1] range, matching the range of the input.
  • the classification part is a stacked 3-layer LSTM 212, which has 256 hidden units in each cell. Since video lengths vary, a random crop of a 20-frame sequence is applied; all shorter videos are discarded. An illustrative sketch of this pre-processing is provided at the end of this section.
  • the gait ID system 300 includes an initial processing module 304 that receives a video as input.
  • the video may be obtained in real time by the camera 112-1 mounted on the vehicle 104 shown in FIG. 1.
  • an instruction may also be input into the gait ID system 300, after pose features are extracted, indicating whether the input video is to register a new user or to authenticate the presently recorded approaching user.
  • the initial processing module 304 is configured to prepare the received video for feature extraction.
  • the preparation includes cropping the video, parsing the video into individual frames, removing the background from each frame, etc.
  • each individual frame of the video is analyzed, and the pose and appearance features are separated so that only the pose features of each frame are combined to construct the gait feature of the approaching individual captured in the video.
  • the processed frames of the video are then forwarded to a feature identification module 308 .
  • the feature identification module 308 implements a trained machine learning model that has a similar architecture to the encoder-decoder network of FIG. 2 .
  • the feature identification module 308 separates, from each frame, pose and appearance features using the trained machine learning model, such as a CNN model.
  • the feature identification module 308 identifies the appearance feature and removes the appearance feature from each of the frames.
  • the feature identification module 308 may also be configured to enforce similarity between frames of the same individual across multiple videos.
  • the pose feature of each frame is forwarded to an aggregation module 312 .
  • the aggregation module 312 combines the pose features of each frame to generate a mean or averaged gait feature over time. Aggregating the pose feature of each frame is important for creating a gait feature of the walking approaching user from a plurality of pose features, since each pose feature captures the pose of the approaching user only at a specific instant.
  • the aggregation module 312 may implement an LSTM model that is trained to aggregate the individual pose features into an averaged gait feature.
  • the aggregation module 312 also receives an instruction from, for example, a computing device operated by the user or an operator of the vehicle and/or ride share service, to instruct whether the input video is being used to register a new user or authenticate a present approaching user.
  • in a ride share service, for example, if a user requests a vehicle through a ride share application, the user can choose to be authenticated based on gait. Alternatively, the ride share service can require such authentication. Then, if the gait ID system 300 implemented by the ride share service does not have any gait information for the user, the user may be registered by the requested vehicle. In such a situation, the operator of the vehicle may request that the user walk toward a camera mounted on the vehicle, and the operator instructs the gait ID system 300 that the video is intended to register the user. When first registering, alternative authentication may be used.
  • a single reference video of the user may be used to register the user, or a plurality of videos at different angles under different conditions may be captured and stored for the user over a period of time.
  • the user may be registered at a different point other than when first ordering a vehicle. Therefore, when a user is first being registered, the operator of the vehicle including the gait ID system 300 may instruct the system that the present video is being captured for registration purposes of the user requesting the vehicle. Otherwise, the gait ID system 300 may assume (or know based on the user ID) that the user is registered.
  • when the aggregation module 312 receives an instruction indicating the user is being registered, the aggregation module 312 directs the gait feature to be stored in a registered user gait database 316 corresponding to a user ID of the user. Then, when the user is being authenticated for a future ride request, the gait ID system 300 can access the gait feature of the user from the registered user gait database 316 according to the user ID to verify the user's identity.
  • the aggregation module 312 forwards the constructed present gait feature to a comparison module 320 .
  • the comparison module 320 obtains a stored gait feature of the approaching user from the registered user gait database 316 based on a user ID.
  • the registered user gait database 316 stores pose features with a corresponding user ID so that the stored pose features can be compared to the pose features of approaching users analyzed in real time.
  • the comparison module 320 compares the present gait feature to the stored gait feature by determining a distance value between the two features, for example, a cosine similarity score as a distance metric described previously. The difference between the two pose features is represented as a distance function. Then, the distance is forwarded to a verification module 324 which determines whether the distance is within a predetermined threshold. Then, the verification module 324 forwards an authentication instruction or an instruction that the approaching user is not authenticated to an instruction generation module 328 . The instruction generation module 328 sends the authentication instruction to an actuator control module 332 to actuate an actuator on the vehicle, operating to unlock and/or open a particular door of the vehicle when the user has been authenticated.
  • an instruction may optionally be sent to an alert generation module 336 .
  • the alert generation module 336 may generate and transmit an alert to the computing device operated by the vehicle owner and/or a mobile computing device operated by the approaching user indicating that the user is not authenticated.
  • the alert may be visual, audio, and/or haptic feedback.
  • Control begins in response to an authentication request.
  • the authentication request may be received each time a user approaches a vehicle that is expecting a user. For example, after a ride is requested and the vehicle reaches a pick up location, the plurality of cameras mounted on the vehicle may be activated to capture users surrounding the vehicle.
  • the gait ID system may instruct the camera with the best view of the approaching user to feed video captured of the approaching user to be authenticated.
  • the camera with the best view would be the camera facing the approaching user, the camera angle being parallel with the walking direction of the approaching user.
  • the requesting user may perform a particular motion to initiate authentication, such as a wave motion that the initial processing module described in FIG. 3 can identify as a prompt to begin authentication.
  • the user may indicate using their phone or computing device that the user sees the vehicle and is going to begin approaching, so the gait ID system receives videos for any users surrounding the vehicle and attempts to authenticate all viewed users until one of the approaching users is authenticated.
  • control proceeds to 404 to obtain video of an approaching user.
  • control is receiving video from multiple cameras of multiple individuals at the same time. Therefore, control may be attempting to authenticate various users at the same time.
  • control continues to 408 to prepare the obtained or received video for feature extraction. The preparation may include parsing of the video into multiple frames, removing background pixels, etc.
  • Control then continues to 412 to extract a pose feature vector from each frame of the video. The extraction involves disentangling the pose feature of the frame from the appearance feature of the frame using machine learning.
  • control proceeds to 416 to aggregate the pose feature of each frame to generate a gait feature representing the approaching user in the video.
  • the gait feature is a mean representation of the pose features of each frame over time.
  • control continues to 420 to obtain a stored gait feature from a database corresponding to the requesting user.
  • the requesting user is the user that requested the vehicle.
  • control determines a distance between the gait feature and the stored gait feature.
  • control determines whether the distance is greater than a predetermined threshold. If yes, control has determined that the distance between the gait feature and the stored gait feature is too distant, indicating that the approaching user cannot be authenticated as the requesting user. Therefore, control proceeds to 432 to identify the user as not the requesting user. Control may then optionally proceed to 436 to generate an alert. Then, control ends.
  • an alert may not be necessary and, instead, continuous authentication attempts are performed in response to capturing a user approaching the vehicle.
  • if control determines that the distance is less than the predetermined threshold, control proceeds to 440 to authenticate the approaching user as the requesting user. This is because the distance indicates that the gait feature and the stored gait feature of the requesting user are similar enough to verify the identity of the approaching user. Then, control proceeds to 444 to send an instruction to unlock the vehicle. In various implementations, control may instead send a verification to a computing device of the vehicle operator and indicate a direction or location of the authenticated user. Then, control ends.
  • Control begins in response to receiving a registration request.
  • a new user can register when first requesting the ride share service.
  • Registering involves allowing the capture of a frontal view video of the user walking toward a camera for gait feature extraction.
  • control obtains a video of the new user. Then, at 508 , control prepares the video for feature extraction. As mentioned previously, this preparation includes parsing the video into frames as well as removing background pixels from each frame. The preparation may further include cropping the video to only include a predetermined number of frames.
  • control extracts a pose feature vector from each frame of the video of the new user.
  • Control continues to 516 to aggregate the pose feature of each frame into a gait feature vector over time. Then, control proceeds to 520 to store the gait feature vector in the database as corresponding to the now registered user. Then, when authenticating an approaching user, the gait ID system can access the database of registered users. Then, control ends.
  • the techniques described herein may be implemented by one or more computer programs executed by one or more processors.
  • the computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium.
  • the computer programs may also include stored data.
  • Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.
  • the present disclosure also relates to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer.
  • a computer program may be stored in a tangible computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
  • the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
  • the term "module" or the term "controller" may be replaced with the term "circuit."
  • the term "module" may refer to, be part of, or include: an Application Specific Integrated Circuit (ASIC); a digital, analog, or mixed analog/digital discrete circuit; a digital, analog, or mixed analog/digital integrated circuit; a combinational logic circuit; a field programmable gate array (FPGA); a processor circuit (shared, dedicated, or group) that executes code; a memory circuit (shared, dedicated, or group) that stores code executed by the processor circuit; other suitable hardware components that provide the described functionality; or a combination of some or all of the above, such as in a system-on-chip. While various embodiments have been disclosed, other variations may be employed. All of the components and functions may be interchanged in various combinations. It is intended by the following claims to cover these and any other departures from the disclosed embodiments which fall within the true spirit of this invention.
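  • By way of illustration only, the following sketch shows one possible implementation of the pre-processing described above: soft-mask foreground segmentation using an off-the-shelf Mask R-CNN and the random 20-frame crop. The helper names, the torchvision model choice, and the handling of frames with no detected person are assumptions for illustration, not the claimed implementation.

```python
# Illustrative pre-processing sketch (assumptions, not the claimed implementation):
# keep the probabilistic person mask (no thresholding), then apply a random 20-frame crop.
import random
import torch
import torchvision

PERSON_LABEL = 1  # COCO class index for "person" in torchvision detection models

detector = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def soft_masked_frames(frames):
    """frames: list of float tensors (3, H, W) in [0, 1].
    Returns frames multiplied by the highest-scoring person soft mask."""
    with torch.no_grad():
        outputs = detector(frames)
    masked = []
    for frame, out in zip(frames, outputs):
        person_masks = [m for m, lbl in zip(out["masks"], out["labels"]) if lbl == PERSON_LABEL]
        if not person_masks:
            masked.append(torch.zeros_like(frame))  # assumption: suppress frames with no person
            continue
        soft_mask = person_masks[0][0]              # (H, W), per-pixel person probability
        masked.append(frame * soft_mask)            # keep the soft mask, no binarization
    return masked

def random_20_frame_crop(frames, length=20):
    """Random fixed-length crop; videos shorter than `length` are discarded (None)."""
    if len(frames) < length:
        return None
    start = random.randint(0, len(frames) - length)
    return torch.stack(frames[start:start + length])  # (20, 3, H, W)
```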

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)

Abstract

Gait, the walking pattern of individuals, is one of the most important biometric modalities. Most of the existing gait recognition methods take silhouettes or articulated body models as the gait features. These methods suffer from degraded recognition performance when handling confounding variables, such as clothing, carrying, and viewing angle. To remedy this issue, a novel AutoEncoder framework is presented to explicitly disentangle pose and appearance features from RGB imagery, and a long short-term memory integration of pose features over time produces the gait feature.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application No. 62/964,313, filed on Jan. 22, 2020. The entire disclosure of the above application is incorporated herein by reference.
FIELD
The present disclosure relates to gait recognition and, more specifically, to gait recognition implemented as an authentication method.
BACKGROUND
Automatically authenticating a user's identity prior to the user reaching a vehicle door for a ride-sharing service, for example, is of significant value for customer convenience and security. Although biometrics, such as facial features or fingerprints, are widely used to identify a person, gait recognition has the advantage that it can operate at a distance without user cooperation. Additionally, a user's gait is a soft biometric trait, which is relatively difficult to impersonate.
The background description provided here is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
SUMMARY
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.
An improved authentication system is presented. The authentication system comprises: a camera, a feature extractor, an aggregator, a classifier and a data store. The camera is configured to capture two or more images of an unknown person walking. The feature extractor is configured to receive the two or more images and, for each image in the two or more images, operates to extract a set of appearance features and a set of pose features, such that the appearance features are indicative of visual appearance of the unknown person and the pose features are indicative of pose of the unknown person. In one embodiment, the feature extractor is a neural network trained to disentangle the pose features from the appearance features. The aggregator is configured to receive multiple sets of pose features from the feature extractor and generate a gait feature for the unknown person. The data store is configured to store a plurality of gait features, where each gait feature in the plurality of gait features is associated with a known person. The classifier is configured to receive the gait feature from the aggregator and operates to identify the unknown person by comparing the gait feature to the plurality of gait features stored in the data store. The authentication system may further include a pre-processor interposed between the camera and the feature extractor which operates to remove background from each of the two or more images.
In one embodiment, the neural network is further defined as a convolutional neural network. The neural network may be trained using cross reconstruction loss. That is, the neural network is trained by comparing a given image from the two or more images with a reconstructed image, where the reconstructed image was reconstructed using a set of pose features from one image in the two or more images and appearance features from another image in the two or more images.
In some embodiments, the aggregator is further defined as a long short-term memory, such that the classifier averages output from the long short-term memory over time.
In other embodiments, the authentication system further includes a verification module and an actuator. The verification module receives an identity for the unknown person from the classifier and actuates the actuator based on the identity of the unknown person.
In another aspect, a computer-implemented method is presented for authenticating a person. The method includes: capturing a video of an unknown person walking; parsing the video into two or more image frames; for each image in the two or more images, disentangling a set of pose features from a set of appearance features, such that the appearance features are indicative of visual appearance of the unknown person and the pose features are indicative of pose of the unknown person; generating a gait feature for the unknown person from the multiple sets of pose features; and identifying the unknown person by comparing the gait feature for the unknown person to a plurality of gait features, where each gait feature in the plurality of gait features is associated with a known person.
Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure will become more fully understood from the detailed description and the accompanying drawings.
FIG. 1 is an example user authentication system incorporated into a vehicle according to principles of the present disclosure.
FIG. 2 is an example functional flow diagram depicting example architecture of a gait identification (ID) system according to principles of the present disclosure.
FIG. 3 is an example functional block diagram of an example gait identification (ID) system according to principles of the present disclosure.
FIG. 4 is an example flowchart depicting example authentication when requesting a ride share vehicle according to principles of the present disclosure.
FIG. 5 is an example flowchart depicting example generation and storage of a registered user reference gait feature according to principles of the present disclosure.
In the drawings, reference numbers may be reused to identify similar and/or identical elements.
DETAILED DESCRIPTION
Pose features of a user provide unique and secure information for user authentication without requesting biometrics during a potentially cumbersome explicit authentication request. For example, implemented in a ride share system, a registered user can be verified when walking towards a requested vehicle. The requested vehicle may have cameras mounted at various points surrounding the vehicle and can identify individuals approaching the vehicle. During authentication of an approaching individual, a gait identification system receives videos of the approaching individual and identifies or extracts two sets of features from the approaching individual: appearance and gait. Both features are represented as feature vectors by trained machine learning models.
To ensure that registered users can be properly verified independent of a present outfit the user is wearing, that is, authenticated only based on their pose features, the gait identification system removes appearance from the features extracted from a video of the approaching individual. Therefore, the resulting extracted features only include pose features. The pose features of the approaching user can then be compared to a known or registered user database including pose features for each registered user.
If the pose features of the approaching user match the pose features of the registered user requesting the ride (within a confidence threshold), then the approaching user is authenticated and identified as the requesting user. In response, the requested vehicle may alert the driver that the approaching user is verified. Additionally or alternatively, the authentication of the approaching user may prompt an actuator of the vehicle to unlock one of the doors of the vehicle or open one of the doors of the vehicle.
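As a concrete illustration of this flow only, the sketch below strings the steps together: per-frame pose features are extracted with a trained encoder, aggregated into a gait feature, and compared with the requesting user's enrolled gait feature using cosine similarity against a threshold. The function names, tensor shapes, and the threshold value are hypothetical assumptions, not identifiers or parameters from this disclosure.

```python
# Hedged sketch of the authentication flow described above; all names are hypothetical.
import torch
import torch.nn.functional as F

def authenticate_approaching_user(frames, encoder, aggregator, enrolled_gait, threshold=0.8):
    """frames: preprocessed video frames, shape (T, 3, H, W), background removed.
    encoder: trained network returning (appearance_feature, pose_feature) per frame.
    aggregator: trained LSTM-based module mapping a pose-feature sequence to a gait feature.
    enrolled_gait: stored gait feature of the registered (requesting) user.
    threshold: similarity threshold (value chosen for illustration only)."""
    with torch.no_grad():
        pose_features = []
        for frame in frames:
            _appearance, pose = encoder(frame.unsqueeze(0))   # appearance is discarded
            pose_features.append(pose)
        pose_seq = torch.cat(pose_features, dim=0)            # (T, D)
        gait = aggregator(pose_seq.unsqueeze(0)).squeeze(0)   # sequence-based gait feature
    similarity = F.cosine_similarity(gait, enrolled_gait, dim=0)
    return bool(similarity >= threshold)

# If this returns True, the system could notify the driver and/or trigger the door
# actuator; otherwise an alert could be generated, mirroring the flow of FIG. 4.
```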
The walking pattern of individuals, or gait, is one of the most important biometric modalities, which allows it to be used as an authentication metric. Most of the existing gait recognition methods take silhouettes or articulated body models as the pose features. These methods suffer from degraded recognition performance when handling confounding variables, such as clothing, objects being carried, and viewing angle. To remedy these issues, the gait identification system of the present disclosure includes a novel autoencoder framework to explicitly disentangle pose (individual frames of the gait) and appearance features from RGB imagery and long short-term memory (LSTM)-based integration of pose features over time (as captured in a video) to produce a gait feature.
The gait identification system collects a Frontal-View Gait (FVG) dataset to focus on gait recognition from frontal-view walking, which is a challenging problem since frontal-view walking contains minimal gait cues compared to other views. FVG does include other important variations, for example, walking speed, objects being carried, and clothing worn. With that, the gait identification system of the present disclosure demonstrates superior performance to the state of the art quantitatively and the ability of feature disentanglement qualitatively, and provides promising computational efficiency.
Biometrics measure a user's unique physical and behavioral characteristics to recognize the identity of the user. Gait is a biometric modality, like face, fingerprint, and iris. Gait recognition has the advantage that it can operate at a distance without user cooperation. Also, gait is difficult to camouflage. Therefore, authenticating a user's gait is not intrusive, does not require an explicit authentication request or explicit performance of an authentication measurement, and gait is difficult to forge. Due to these advantages, gait recognition is useful for many applications, such as person identification, criminal investigation, and healthcare.
The core of gait recognition lies in extracting gait-related features from the video frames of a walking person, where the prior approaches are categorized into two types: appearance-based and model-based methods. The appearance-based methods, such as gait energy image (GEI) or gait entropy image (GEnI), are defined by extracting silhouette masks. Specifically, GEI uses an averaged silhouette image as the gait representation for a video. These methods are popular in the gait recognition community for their simplicity and effectiveness. However, they often suffer from sizeable intra-subject appearance changes due to covariates including clothing changes, objects being carried, viewing angle changes, and walking speed variations.
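For background context only, a gait energy image of the kind referenced above is simply the per-pixel average of aligned binary silhouettes over a walking sequence. A minimal sketch, assuming the silhouettes are already extracted, centered, and size-normalized, is:

```python
# Minimal GEI sketch: average of aligned binary silhouette masks over a video.
import numpy as np

def gait_energy_image(silhouettes):
    """silhouettes: array of shape (T, H, W) with values in {0, 1},
    already centered and size-normalized. Returns the (H, W) GEI in [0, 1]."""
    silhouettes = np.asarray(silhouettes, dtype=np.float32)
    return silhouettes.mean(axis=0)
```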
While GEI has a low computational cost and can handle low-resolution imagery, it can be sensitive to the above-mentioned covariates. In contrast, model-based methods first perform pose estimation: they fit articulated body models to images and extract kinematic features, such as 2D body joints or the articulated body skeleton, as the gait feature. While they are robust to some covariates such as clothing and speed, they require a relatively higher image resolution for reliable pose estimation and incur higher computational costs.
Referring to FIG. 1, a user authentication system 100 is shown incorporated into a vehicle 104. The vehicle 104 may be a personal vehicle or registered as a ride share vehicle with a particular service. While the user authentication system 100 of the present disclosure is mainly described as being incorporated into a vehicle being used for ride share services, the user authentication system 100 may be used to authenticate an owner of a personal vehicle, an owner of a structure or building, and additional authentication systems granting access to particular, registered users. The vehicle 104 includes an operating system with a gait identification (ID) system 108. The gait ID system 108 identifies users that are approaching the vehicle 104 and authenticates the user's identity based on their gait. The gait ID system 108 receives video of users around the vehicle 104 from a camera 112-1 or a plurality of cameras 112-1, 112-2, . . . , 112-N, collectively 112, mounted on the vehicle 104. The plurality of cameras 112 may be mounted at various points on the vehicle 104.
The user authentication system 100 may include the plurality of cameras 112 mounted on the vehicle 104, positioned to capture an entirety of the environment surrounding the vehicle 104 (additional cameras capturing entire surrounding environment not shown). In this way, the plurality of cameras 112 can identify users approaching the vehicle 104 from a variety of directions.
The user authentication system 100 also includes an actuator 116. The gait ID system 108, upon authenticating a video stream of an approaching user received from the plurality of cameras 112, instructs the actuator 116 to unlock a rear door of the vehicle 104. In various implementations, the actuator 116 may open the rear door of the vehicle 104. Additionally or alternatively, multiple actuators may be included for each door of the vehicle 104, and the gait ID system 108 may instruct the actuator to unlock or open the door nearest the approaching user. As mentioned previously, authentication of the user may also (or alternatively) result in a notification to the driver's phone or mobile computing device that the user has been authenticated and which camera captured the user, indicating a direction or location of the authenticated user.
The gait ID system 108 includes a novel convolutional neural network (CNN)-based model to automatically learn disentangled gait and appearance features from a walking video of an approaching user in order to verify and/or register the user. The CNN-based model relies on pose features of a walking user, as opposed to handcrafted GEI or skeleton-based features. While many conventional gait databases study side-view imagery, the gait ID system 108 collects a new gait database where both gallery and probe are captured in frontal views. While particular reference is made to convolutional neural networks, it is readily understood that other types of neural networks (e.g., residual neural networks) as well as other types of machine learning also fall within the scope of this disclosure.
It is understandable that the challenge in designing a gait feature is the necessity of being invariant to the appearance variation due to clothing, viewing angle, carrying, etc. Therefore, the gait ID system 108 disentangles the gait feature from the visual appearance of the approaching user. For both appearance-based and model-based methods, disentanglement is achieved by manually handcrafting the GEI or body skeleton, since neither has color information. However, manual disentanglements may lose certain information or create redundant gait information. For example, GEI represents the average contours over time but not the dynamics of how body parts move. Similarly, for the body skeleton, when the approaching user is carrying an item, certain body joints such as hands may have fixed positions and are redundant information to gait.
To remedy the issues in handcrafted features, the gait ID system 108 automatically disentangles the pose features from appearance features and uses the extracted pose features to generate a gait feature for gait recognition. The pose features are generated by extracting pose features from each frame of a captured video of an approaching user. The disentanglement performed by the gait ID system 108 is realized by designing an autoencoder-based CNN with novel loss functions.
For each video frame, the encoder estimates two latent representations, (i) a pose feature representation (that is, a frame-based gait feature) and (ii) an appearance feature representation, by employing two loss functions. The two loss functions are (i) the cross reconstruction loss and (ii) the gait similarity loss. The cross reconstruction loss enforces that the appearance feature of one frame, fused with the pose feature of another frame, can be decoded to the latter frame. The gait similarity loss forces the sequences of pose features extracted from video sequences of the same subject to be similar even when captured under different conditions. Finally, the pose features of a sequence are fed into a multi-layer LSTM with a designed incremental identity loss to generate the sequence-based gait feature, and the cosine distance between two such gait features can be used as the video-to-video similarity metric.
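The following is an illustrative sketch, with assumed tensor shapes and hypothetical names, of how these losses could be written; it is not the claimed implementation and omits details such as batching and relative loss weights.

```python
# Illustrative loss sketches (shapes, names, and the weighting scheme are assumptions).
import torch
import torch.nn.functional as F

def cross_reconstruction_loss(decoder, f_a_t1, f_g_t2, frame_t2):
    """Appearance feature of frame t1 fused with pose feature of frame t2
    must decode back to frame t2."""
    reconstruction = decoder(f_a_t1, f_g_t2)
    return F.mse_loss(reconstruction, frame_t2)

def gait_similarity_loss(pose_seq_c1, pose_seq_c2):
    """Averaged pose features of two videos of the same subject, captured under
    different conditions c1 and c2 (possibly different lengths), should match."""
    return F.mse_loss(pose_seq_c1.mean(dim=0), pose_seq_c2.mean(dim=0))

def incremental_identity_loss(lstm_outputs, classifier, label):
    """Identity loss on the running average of the LSTM outputs at every time step,
    weighted so that later (more informed) steps count more.
    lstm_outputs: (T, D) tensor; label: LongTensor of shape (1,) with the subject index."""
    total, n = 0.0, lstm_outputs.size(0)
    for t in range(1, n + 1):
        gait_t = lstm_outputs[:t].mean(dim=0, keepdim=True)  # averaged output up to step t
        w_t = t / n                                          # assumed weighting scheme
        total = total + w_t * F.cross_entropy(classifier(gait_t), label)
    return total / n
```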
Furthermore, most prior work chose a walking video of a side view, which has the richest gait information, as the gallery sequence. However, other viewing angles, such as the frontal view, can be very common when pedestrians are walking toward or away from a camera, such as the plurality of cameras 112. Prior work focusing on frontal views is also often based on RGB-D videos, which provide depth information that RGB videos lack. Therefore, to encourage gait recognition from frontal-view RGB videos, which generally carry the least gait information, the CNN training videos are high-definition videos collected to construct the frontal-view gait (FVG) database with a wide range of variations. In various implementations, the FVG database may collect three frontal-view angles where the subject walks from left −45°, 0°, and right 45° off the optical axis of the camera 112-1 or the plurality of cameras 112. For each of the three angles, different variants are explicitly captured, including walking speed, clothing, carrying, cluttered background, etc. Such a robust FVG database results in a more accurate CNN model for disentangling pose and appearance features.
The user authentication system 100 implements the gait ID system 108 to learn gait information from raw RGB video frames, which contain richer information and thus offer a higher potential for extracting discriminative pose features. The present CNN-based approach has the advantage of being able to leverage a large amount of training data and learn a more discriminative representation from data with multiple covariates, creating an averaged gait feature representation from pose features extracted from a plurality of video frames. The present FVG database focuses on the frontal view, with three different near frontal-view angles towards the camera, and other variations including walking speed, carrying, clothing, cluttered background, and time.
The present method uses only one encoder to disentangle the appearance and gait information, as shown in FIG. 2, through the design of novel loss functions and without the need for adversarial training, which makes training more accessible. To disentangle the gait and appearance features from the RGB information, no gait or appearance labels are utilized, since the type of walking pattern or clothing cannot be defined as discrete classes.
FIG. 2 is a functional flow diagram depicting the architecture of a gait identification (ID) system 200 of the present disclosure. In an example, assume there are three videos, where videos 1 and 2 capture subject A wearing a t-shirt and a long down coat, respectively, and video 3 captures subject B wearing the same long down coat as in video 2. The objective is to design an algorithm for which the pose features of videos 1 and 2 are the same, while those of videos 2 and 3 are different. Clearly, this is a challenging objective, as the long down coat can easily dominate the feature extraction, which would make videos 2 and 3 more similar than videos 1 and 2 in the latent space of pose features. Indeed, the core challenge, as well as the objective, of gait recognition is to extract pose features that are discriminative among subjects but invariant to different confounding factors, such as viewing angles, walking speeds, and appearance.
The approach of the gait ID system 200 is to achieve the gait feature representation via feature disentanglement by separating the gait feature from appearance information for a given walking video. As shown in FIG. 2, the input to the gait ID system 200 is video frames 204, with background removed using a detection and segmentation method. An encoder-decoder network 208, with carefully designed loss functions, is used to disentangle the appearance and pose features for each video frame. Then, a multi-layer LSTM 212 explores the temporal dynamics of pose features and aggregates them into a sequence-based gait feature for identification purposes.
For the majority of gait recognition datasets, there is limited appearance variation within each subject. Hence, appearance could be a discriminative cue for identification during training, as many subjects can be easily distinguished by their clothes. Unfortunately, any network or feature extractor relying on appearance will not generalize well on the test set or in practice, due to potentially diverse clothing or appearance between two videos of the same subject. This limitation of training sets also prevents models from learning good feature extractors if relying solely on an identification objective.
Therefore, the gait ID system 200 learns to disentangle the gait feature from the visual appearance in an unsupervised manner. Since a video is composed of frames, disentanglement should be conducted on the frame level first. Because there is no dynamic information within a single video frame, the gait ID system 200 disentangles the pose feature from the visual appearance for each frame. The dynamics of the pose features over a sequence then contribute to the gait feature. In other words, the pose feature is the manifestation of the video-based gait feature at a specific frame or point in time.
To this end, the encoder-decoder network 208 architecture is used with carefully designed loss functions to disentangle the pose feature from the appearance feature. The encoder, ε, encodes a feature representation of each frame, I, and explicitly splits it into two parts, namely the appearance feature f_a and the pose feature f_g:
f_a, f_g = ε(I)  (1)
These two features are expected to fully describe the original input image, such that they can be decoded back to the original input through a decoder D:
Î = D(f_a, f_g)  (2)
The loss functions defined for learning the encoder ε and the decoder D include a cross reconstruction loss and a gait similarity loss. The reconstructed image Î should be close to the original input I. However, enforcing a self-reconstruction loss as in a typical autoencoder cannot ensure that f_a learns the appearance information shared across the video and that f_g represents the pose information in each frame. Therefore, the cross reconstruction loss uses the appearance feature f_a^{t1} of one frame and the pose feature f_g^{t2} of another frame to reconstruct the latter frame:
L_xrecon = ||D(f_a^{t1}, f_g^{t2}) − I^{t2}||_2^2  (3)
where I^t is the video frame at time step t.
The cross reconstruction loss plays the role of a self-reconstruction loss, ensuring that the two features are sufficiently representative to reconstruct video frames. In addition, because the pose feature of the current frame can be paired with the appearance feature of any frame in the same video to reconstruct the same target (using the decoder of the encoder-decoder network 208), it enforces the appearance features to be similar across all frames.
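By way of illustration, the cross reconstruction loss of equation (3) may be implemented as in the following non-limiting sketch, assuming a PyTorch-style encoder that returns an (appearance, pose) feature pair and a decoder that reconstructs a frame from such a pair; the function names and signatures are illustrative assumptions rather than a definitive implementation:

```python
import torch.nn.functional as F

def cross_reconstruction_loss(encoder, decoder, frame_t1, frame_t2):
    """Sketch of the cross reconstruction loss of equation (3).

    `encoder(frame)` is assumed to return (f_a, f_g); `decoder(f_a, f_g)`
    is assumed to return a reconstructed frame. Both are placeholders.
    """
    f_a_t1, _ = encoder(frame_t1)        # appearance feature from frame t1
    _, f_g_t2 = encoder(frame_t2)        # pose feature from frame t2
    recon_t2 = decoder(f_a_t1, f_g_t2)   # fuse and decode toward frame t2
    # squared L2 reconstruction error against the actual frame t2
    return F.mse_loss(recon_t2, frame_t2, reduction="sum")
```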
The cross reconstruction loss prevents the appearance feature f_a from being over-represented, i.e., from containing pose variation that changes between frames. However, appearance information may still leak into the pose feature f_g. In an extreme case, f_a is a constant vector while f_g encodes all the information of a video frame. In an example, to make f_g "cleaner," a gait similarity module 216 receives multiple videos of the same subject. Extra videos can introduce changes in appearance. Consider two videos of the same subject with lengths n_1 and n_2, captured under two different conditions c_1 and c_2. Ideally, c_1 and c_2 should differ in the user's appearance, for example, through a change of clothes. In an implementation, only one video per user may be accessible for registration and matching.
While the appearance changes, the gait information should be consistent between the two videos. Since it is nearly impossible to enforce similarity on f_g between individual video frames, as that would require precise frame-level alignment, similarity is instead enforced between the two videos' averaged pose features using the gait similarity module 216:
L_gait-sim = ||(1/n_1) Σ_{t=1}^{n_1} f_g^{(t,c_1)} − (1/n_2) Σ_{t=1}^{n_2} f_g^{(t,c_2)}||_2^2  (4)
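As a further illustration, the gait similarity loss of equation (4) may be sketched as follows, assuming each video has already been encoded into a matrix of per-frame pose features; the tensor shapes are illustrative assumptions:

```python
import torch

def gait_similarity_loss(pose_feats_c1, pose_feats_c2):
    """Sketch of the gait similarity loss of equation (4).

    pose_feats_c1: (n1, d) pose features from a video in condition c1.
    pose_feats_c2: (n2, d) pose features of the same subject in condition c2.
    """
    mean_c1 = pose_feats_c1.mean(dim=0)          # averaged pose feature, video 1
    mean_c2 = pose_feats_c2.mean(dim=0)          # averaged pose feature, video 2
    return torch.sum((mean_c1 - mean_c2) ** 2)   # squared L2 distance
```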
Even when appearance and pose information can be disentangled for each video frame, the current feature fg only contains the walking pose of the person in a specific instance, which can share similarity with another specific instance of a very different person. The gait ID system 200 is looking for discriminative characteristics in a user's walking pattern. Therefore, modeling its temporal change is critical. This is where temporal modeling architectures like the recurrent neural network or LSTM work best.
As mentioned previously, the gait ID system 200 includes a multi-layer LSTM 212 structure to explore spatial (e.g., the shape of a person) and, mainly, temporal (e.g., how the trajectory of the subject's body parts changes over time) information in the pose features extracted from the input video frames 204 by the encoder-decoder network 208. As shown in FIG. 2, pose features extracted from one video sequence are fed into the three-layer LSTM 212. The output of the LSTM 212 is connected to a classifier C, in this case a linear classifier, to classify the user's identity.
Let ht be the output of the LSTM 212 at time step t, which is accumulated after feeding t pose features fg into the LSTM 212:
h_t = LSTM(f_g^1, f_g^2, ..., f_g^t)  (5)
An option for identification is to add the classification loss on top of the LSTM output of the final time step:
L_id-single = −log(C_k(h_n))  (6)
which is the negative log likelihood that the classifier C correctly identifies the final output h_n as its identity label k.
By the nature of the LSTM, the output h_t is greatly affected by its most recent input f_g^t. Hence, the LSTM output h_t can vary across time steps. To obtain a gait feature that is robust to the stopping instant within a walking cycle, the averaged LSTM output can be used as the gait feature for identification:
f_gait^t = (1/t) Σ_{s=1}^{t} h_s  (7)
The identification loss can be rewritten as:
L_id-avg = −log(C_k(f_gait^n)) = −log(C_k((1/n) Σ_{s=1}^{n} h_s))  (8)
The LSTM 212 is expected to learn that the longer the video sequence, the more walking information it processes, and thus the more confidently it identifies the subject. Instead of minimizing the loss only at the final time step, the intermediate outputs of every time step, weighted by w_t, are used:
L_id-inc-avg = (1/n) Σ_{t=1}^{n} w_t (−log(C_k((1/t) Σ_{s=1}^{t} h_s)))  (9)
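By way of illustration, the aggregation of equations (5), (7), and (9) may be sketched as a multi-layer LSTM with a running average of its outputs and a weighted negative log-likelihood at every time step. This is a non-limiting sketch: the layer sizes, the linear classifier, and the weighting w_t = t/n are illustrative assumptions rather than values fixed by this disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaitAggregator(nn.Module):
    """Sketch of the three-layer LSTM aggregation with incremental identity loss."""

    def __init__(self, feat_dim=128, hidden=256, num_ids=100):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)
        self.classifier = nn.Linear(hidden, num_ids)   # linear classifier C

    def forward(self, pose_seq):
        # pose_seq: (batch, n, feat_dim) sequence of frame-level pose features
        h, _ = self.lstm(pose_seq)                     # (batch, n, hidden)
        # running average of LSTM outputs: f_gait^t = (1/t) * sum_{s<=t} h_s
        steps = torch.arange(1, h.size(1) + 1, device=h.device).view(1, -1, 1)
        f_gait = h.cumsum(dim=1) / steps               # (batch, n, hidden)
        return f_gait

    def incremental_id_loss(self, f_gait, labels):
        # labels: (batch,) integer identity labels
        batch, n, _ = f_gait.shape
        logits = self.classifier(f_gait)               # (batch, n, num_ids)
        # negative log-likelihood of the true identity at every time step
        nll = F.cross_entropy(
            logits.reshape(batch * n, -1),
            labels.repeat_interleave(n),
            reduction="none",
        ).view(batch, n)
        # assumed weighting w_t = t / n; mean over time supplies the 1/n factor
        w = torch.arange(1, n + 1, device=f_gait.device, dtype=f_gait.dtype) / n
        return (w * nll).mean()
```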
To this end, the overall training loss function is:
L = L_id-inc-avg + λ_r L_xrecon + λ_s L_gait-sim  (10)
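For illustration, the overall loss of equation (10) may be combined as in the following sketch, where the values of the weights λ_r and λ_s are hypothetical choices for the example rather than values given by this disclosure:

```python
def overall_loss(id_inc_avg, xrecon, gait_sim, lambda_r=0.1, lambda_s=0.005):
    """Sketch of the overall training loss of equation (10).

    The inputs are the three loss terms already computed as scalar tensors;
    the lambda weights are illustrative assumptions.
    """
    return id_inc_avg + lambda_r * xrecon + lambda_s * gait_sim
```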
The gait ID system 200, including the encoder-decoder network 208 and the LSTM 212, is trained jointly. Updating ε to optimize L_id-inc-avg also helps to generate pose features that carry identity information and on which the LSTM 212 is able to explore temporal dynamics. At test time, the output f_gait^t of the LSTM 212 is the gait feature of the video and is used as the identity feature representation for matching or verifying an approaching user. The cosine similarity score is used as the metric, serving as the distance measure between a known registered gait feature and the present gait feature.
The gait ID system 200 receives video frames 204 with the person of interest segmented. In an example embodiment, the foreground mask is obtained from a state-of-the-art instance segmentation network, Mask R-CNN. Instead of using a zero-one mask obtained by hard thresholding, the soft mask returned by the network is kept, where each pixel value indicates the probability of that pixel belonging to a person. This is partly due to the difficulty of choosing a threshold; it also prevents the loss of information caused by mask estimation errors.
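As a non-limiting example, the soft-mask segmentation described above may be sketched as follows, assuming torchvision's pretrained Mask R-CNN; selecting the highest-scoring person detection (COCO label 1) is an illustrative heuristic, and the exact weight-loading argument may vary by torchvision version:

```python
import torch
import torchvision

# pretrained Mask R-CNN for instance segmentation (assumption: torchvision model zoo)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()

def soft_masked_frame(frame):
    """frame: (3, H, W) float tensor in [0, 1]; returns the person-masked RGB frame."""
    with torch.no_grad():
        det = model([frame])[0]
    person = [i for i, lab in enumerate(det["labels"]) if lab.item() == 1]
    if not person:
        return frame                       # no person detected; pass through
    best = person[0]                       # detections are sorted by score
    soft_mask = det["masks"][best, 0]      # (H, W) per-pixel person probability
    return frame * soft_mask               # keep the soft mask, no hard threshold
```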
The input is obtained by pixel-wise multiplication between the mask and the RGB values, and is then resized. In the example embodiment, the encoder-decoder network 208 is a typical CNN. The encoder consists of four stride-2 convolution layers, each followed by Batch Normalization and Leaky ReLU activation. The decoder structure is an inverse of the encoder, built from transposed convolution, Batch Normalization, and Leaky ReLU layers.
The final layer has a Sigmoid activation to bring the output values into the [0, 1] range, matching the input. The classification part is a stacked three-layer LSTM 212, which has 256 hidden units in each of its cells. Since video lengths vary, a random crop of a 20-frame sequence is applied; all shorter videos are discarded.
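By way of illustration, the encoder-decoder 208 described above may be sketched as follows. Only the four stride-2 convolution/transposed-convolution stages with Batch Normalization, Leaky ReLU, and the final Sigmoid follow directly from the description; the channel widths, the 64×64 input resolution, and the appearance/pose feature dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, app_dim=128, pose_dim=64):
        super().__init__()
        chans = [3, 32, 64, 128, 256]
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                       nn.BatchNorm2d(cout), nn.LeakyReLU(0.2, inplace=True)]
        self.conv = nn.Sequential(*layers)
        self.fc = nn.Linear(256 * 4 * 4, app_dim + pose_dim)  # assumes 64x64 input
        self.app_dim = app_dim

    def forward(self, x):
        feat = self.fc(self.conv(x).flatten(1))
        f_a, f_g = feat[:, :self.app_dim], feat[:, self.app_dim:]
        return f_a, f_g                     # appearance and pose features

class Decoder(nn.Module):
    def __init__(self, app_dim=128, pose_dim=64):
        super().__init__()
        self.fc = nn.Linear(app_dim + pose_dim, 256 * 4 * 4)
        chans = [256, 128, 64, 32]
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                       nn.BatchNorm2d(cout), nn.LeakyReLU(0.2, inplace=True)]
        layers += [nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid()]
        self.deconv = nn.Sequential(*layers)

    def forward(self, f_a, f_g):
        x = self.fc(torch.cat([f_a, f_g], dim=1)).view(-1, 256, 4, 4)
        return self.deconv(x)               # reconstructed frame in [0, 1]
```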
Referring now to FIG. 3, a functional block diagram of an example gait identification (ID) system 300 is shown. The gait ID system 300 includes an initial processing module 304 that receives a video as input. The video may be obtained in real time by the camera 112-1 mounted on the vehicle 104 shown in FIG. 1. In various implementations, an instruction may also be input into the gait ID system 300, after pose features are extracted, indicating whether the input video is to register a new user or to authenticate the presently recorded approaching user.
The initial processing module 304 is configured to prepare the received video for feature extraction. As mentioned above, the preparation includes cropping the video, parsing the video into individual frames, removing the background from each frame, etc. As also mentioned previously, each individual frame of the video is analyzed, and the pose and appearance features are separated so that only the pose features of each frame are combined to construct the gait feature of the approaching individual captured in the video.
The processed frames of the video are then forwarded to a feature identification module 308. As described above, the feature identification module 308 implements a trained machine learning model that has an architecture similar to the encoder-decoder network of FIG. 2. The feature identification module 308 separates, for each frame, the pose and appearance features using the trained machine learning model, such as a CNN model. The feature identification module 308 identifies the appearance feature and removes the appearance feature from each of the frames. As described above with respect to the gait similarity loss, the feature identification module 308 may also be configured to enforce similarity between pose features of the same individual across multiple videos.
Then, the pose feature of each frame is forwarded to an aggregation module 312. The aggregation module 312 combines the pose features of each frame to generate a mean or averaged gait feature over time. Aggregating the pose feature of each frame is important because each pose feature captures the pose of the approaching user only at a specific instant, so a plurality of pose features is needed to create a gait feature representing the approaching user's walk. The aggregation module 312 may implement an LSTM model that is trained to aggregate individual pose features into a gait feature. The aggregation module 312 also receives an instruction from, for example, a computing device operated by the user or an operator of the vehicle and/or ride share service, indicating whether the input video is being used to register a new user or to authenticate a present approaching user.
In the example of a ride share service, if a user requests a vehicle through a ride share application, the user can choose to be authenticated based on gait. Alternatively, the ride share service can require such authentication. Then, if the gait ID system 300 implemented by the ride share service does not have any gait information for the user, the user may be registered by the requested vehicle. In such a situation, the operator of the vehicle may request that the user walk toward a camera mounted on the vehicle and the operator instructs the gait ID system 300 that the video is intended to register the user. When first registering, alternative authentication may be used.
In various implementations, a single reference video of the user may be used to register the user, or a plurality of videos at different angles and under different conditions may be captured and stored for the user over a period of time. Additionally or alternatively, the user may be registered at a point other than when first ordering a vehicle. Therefore, when a user is first being registered, the operator of the vehicle including the gait ID system 300 may instruct the system that the present video is being captured for registration purposes for the user requesting the vehicle. Otherwise, the gait ID system 300 may assume (or know based on the user ID) that the user is registered.
When the aggregation module 312 receives an instruction indicating the user is being registered, the aggregation module 312 directs the gait feature to be stored in a registered user gait database 316 corresponding to a user ID of the user. Then, when the user is being authenticated for a future ride request, the gait ID system 300 can access the gait feature of the user from the registered user gait database 316 according to the user ID to verify the user's identity.
Otherwise, if the approaching user is being authenticated as opposed to registered, the aggregation module 312 forwards the constructed present gait feature to a comparison module 320. The comparison module 320 obtains a stored gait feature of the approaching user from the registered user gait database 316 based on a user ID. As mentioned previously, the registered user gait database 316 stores gait features with corresponding user IDs in order to compare the stored gait features to the gait features computed in real time for approaching users.
The comparison module 320 compares the present gait feature to the stored gait feature by determining a distance value between the two features, for example, the cosine similarity score described previously as a distance metric. The difference between the two gait features is thus represented as a distance value. Then, the distance is forwarded to a verification module 324, which determines whether the distance is within a predetermined threshold. The verification module 324 then forwards an authentication instruction, or an instruction that the approaching user is not authenticated, to an instruction generation module 328. The instruction generation module 328 sends the authentication instruction to an actuator control module 332 to actuate an actuator on the vehicle, operating to unlock and/or open a particular door of the vehicle when the user has been authenticated.
Otherwise, if the instruction generation module 328 receives the instruction that the approaching user is not authenticated, then an instruction may optionally be sent to an alert generation module 336. The alert generation module 336 may generate and transmit an alert to the computing device operated by the vehicle owner and/or a mobile computing device operated by the approaching user indicating that the user is not authenticated. The alert may be visual, audio, and/or haptic feedback.
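For illustration, the comparison and verification performed by the comparison module 320 and the verification module 324 may be sketched as follows; the specific threshold value is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def verify_user(present_gait: torch.Tensor, stored_gait: torch.Tensor,
                threshold: float = 0.25) -> bool:
    """Sketch of the distance check between the present and stored gait features.

    The cosine-distance comparison against a predetermined threshold follows the
    description above; the threshold value itself is an assumption.
    """
    similarity = F.cosine_similarity(present_gait, stored_gait, dim=0)
    distance = 1.0 - similarity.item()     # cosine distance
    return distance <= threshold           # True -> authenticated
```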
Referring to FIG. 4, a flowchart depicting example authentication when a user requests a vehicle through a ride share service is shown. Control begins in response to an authentication request. In various implementations, the authentication request may be received each time a user approaches a vehicle that is expecting a user. For example, after a ride is requested and the vehicle reaches a pickup location, the plurality of cameras mounted on the vehicle may be activated to capture users surrounding the vehicle. Once the gait ID system receives a video including a user approaching the vehicle for a predetermined amount of time, for example, if a user has been walking toward the vehicle for 5 seconds, the gait ID system may instruct the camera with the best view of the approaching user to feed the captured video of the approaching user for authentication.
In various implementations, the camera with the best view would be the camera facing the approaching user, the camera angle being parallel with the walking direction of the approaching user. In other implementations, the requested user may perform a particular motion to initiate authentication, such as a wave motion that the initial processing module described in FIG. 3 can identify as a prompt to begin authentication. In additional implementations, the user may indicate using their phone or computing device that the user sees the vehicle and is going to begin approaching, so the gait ID system receives videos for any users surrounding the vehicle and attempts to authenticate all viewed users until one of the approaching users is authenticated.
Once control receives the authentication request, control proceeds to 404 to obtain video of an approaching user. In various implementations, control is receiving video from multiple cameras of multiple individuals at the same time. Therefore, control may be attempting to authenticate various users at the same time. Then, control continues to 408 to prepare the obtained or received video for feature extraction. The preparation may include parsing of the video into multiple frames, removing background pixels, etc. Control then continues to 412 to extract a pose feature vector from each frame of the video. The extraction involves disentangling the pose feature of the frame from the appearance feature of the frame using machine learning. Once extracted, control proceeds to 416 to aggregate the pose feature of each frame to generate a gait feature representing the approaching user in the video. The gait feature is a mean representation of the pose features of each frame over time.
Then, control continues to 420 to obtain a stored gait feature from a database corresponding to the requesting user. The requesting user is the user that requested the vehicle. At 424, control determines a distance between the gait feature and the stored gait feature. Then, at 428, control determines whether the distance is greater than a predetermined threshold. If yes, control has determined that the distance between the gait feature and the stored gait feature is too great, indicating that the approaching user cannot be authenticated as the requesting user. Therefore, control proceeds to 432 to identify the user as not the requesting user. Control may then optionally proceed to 436 to generate an alert. Then, control ends. In an implementation where the gait ID system is continuously identifying users in the vehicle's surrounding environment, an alert may not be necessary and, instead, continuous authentication attempts are performed in response to capturing a user approaching the vehicle.
Otherwise, if at 428 control determines that the distance is less than the predetermined threshold, control proceeds to 440 to authenticate the approaching user as the requesting user. This is because the distance indicates that the gait feature and the stored gait feature of the requesting user are similar enough to verify the identity of the approaching user. Then, control proceeds to 444 to send an instruction to unlock the vehicle. In various implementations, control may instead send a verification to a computing device of the vehicle operator and indicate a direction or location of the authenticated user. Then, control ends.
Referring to FIG. 5, a flowchart depicting example generation of a registered user reference gait feature is shown. Control begins in response to receiving a registration request. As described above, a new user can register when first requesting the ride share service. Registering involves allowing the capture of a frontal view video of the user walking toward a camera for gait feature extraction.
At 504, control obtains a video of the new user. Then, at 508, control prepares the video for feature extraction. As mentioned previously, this preparation includes parsing the video into frames as well as removing background pixels from each frame. The preparation may further include cropping the video to only include a predetermined number of frames.
At 512, control extracts a pose feature vector from each frame of the video of the new user. Control continues to 516 to aggregate the pose feature of each frame into a gait feature vector over time. Then, control proceeds to 520 to store the gait feature vector in the database as corresponding to the now registered user. Then, when authenticating an approaching user, the gait ID system can access the database of registered users. Then, control ends.
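As a simple illustration of the registration flow of FIG. 5, the registered user gait database may be sketched as a store keyed by user ID; the in-memory dictionary below is a placeholder standing in for the persistent database 316 described above:

```python
from typing import Dict
import torch

class RegisteredGaitDatabase:
    """Minimal sketch of a gait-feature store keyed by user ID."""

    def __init__(self) -> None:
        self._store: Dict[str, torch.Tensor] = {}

    def register(self, user_id: str, gait_feature: torch.Tensor) -> None:
        # store the reference gait feature computed during registration
        self._store[user_id] = gait_feature.detach().cpu()

    def lookup(self, user_id: str) -> torch.Tensor:
        # retrieve the stored reference gait feature for authentication
        return self._store[user_id]
```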
The techniques described herein may be implemented by one or more computer programs executed by one or more processors. The computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium. The computer programs may also include stored data. Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.
Some portions of the above description present the techniques described herein in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.
Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Certain aspects of the described techniques include process steps and instructions described herein in the form of an algorithm. It should be noted that the described process steps and instructions could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer. Such a computer program may be stored in a tangible computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and operations presented herein are not inherently related to any particular computer or other apparatus. Various systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the present disclosure is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure as described herein.
The term "module" or the term "controller" may be replaced with the term "circuit." The term "module" may refer to, be part of, or include: an Application Specific Integrated Circuit (ASIC); a digital, analog, or mixed analog/digital discrete circuit; a digital, analog, or mixed analog/digital integrated circuit; a combinational logic circuit; a field programmable gate array (FPGA); a processor circuit (shared, dedicated, or group) that executes code; a memory circuit (shared, dedicated, or group) that stores code executed by the processor circuit; other suitable hardware components that provide the described functionality; or a combination of some or all of the above, such as in a system-on-chip. While various embodiments have been disclosed, other variations may be employed. All of the components and functions may be interchanged in various combinations. It is intended by the following claims to cover these and any other departures from the disclosed embodiments which fall within the true spirit of this invention.
The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.

Claims (19)

What is claimed is:
1. An authentication system, comprising:
a camera configured to capture two or more images of an unknown person walking;
a feature extractor configured to receive the two or more images and, for each image in the two or more images, operates to extract a set of appearance features and a set of pose features, such that the appearance features are indicative of visual appearance of the unknown person and the pose features are indicative of pose of the unknown person, wherein the feature extractor is a neural network trained to disentangle the pose features from the appearance features;
an aggregator configured to receive multiple sets of pose features from the feature extractor and generate a gait feature for the unknown person;
a data store configured to store a plurality of gait features, where each gait feature in the plurality of gait features is associated with a known person; and
a classifier configured to receive the gait feature from the aggregator and operates to identify the unknown person by comparing the gait feature to the plurality of gait features stored in the data store.
2. The authentication system of claim 1 further comprises a pre-processor interposed between the camera and the feature extractor, and operates to remove background from each of the two or more images.
3. The authentication system of claim 1 wherein the neural network is further defined as a convolutional neural network.
4. The authentication system of claim 1 wherein the neural network is trained using cross reconstruction loss.
5. The authentication system of claim 1 wherein the neural network is trained by comparing a given image from the two or more images with a reconstructed image, where the reconstructed image was reconstructed using a set of pose features from one image in the two or more images and appearance features from another image in the two or more images.
6. The authentication system of claim 1 wherein the aggregator is further defined as a long short-term memory.
7. The authentication system of claim 6 wherein the classifier averages output from the long short-term memory over time.
8. The authentication system of claim 1 wherein the classifier compares the gait feature to the plurality of gait features by computing a cosine similarity score.
9. The authentication system of claim 1 further comprises a verification module and an actuator, wherein the verification module receives an identity for the unknown person from the classifier and actuates the actuator based on the identity of the unknown person.
10. A computer-implemented method for authenticating a person, comprising:
capturing, by a camera, a video of an unknown person walking;
parsing, by an image processor, the video into two or more image frames;
for each image in the two or more images, disentangling, by the image processor, a set of pose features from a set of appearance features, such that the appearance features are indicative of visual appearance of the unknown person and the pose features are indicative of pose of the unknown person;
generating, by the image processor, a gait feature for the unknown person from the multiple sets of pose features; and
identifying, by the image processor, the unknown person by comparing the gait feature for the unknown person to a plurality of gait features, where each gait feature in the plurality of gait features is associated with a known person.
11. The computer-implemented method of claim 10 further comprises removing background from each of the two or more images before the step of disentangling.
12. The computer-implemented method of claim 10 further comprises disentangling a set of pose features from a set of appearance features using a neural network.
13. The computer-implemented method of claim 12 further comprises training the neural network using cross reconstruction loss.
14. The computer-implemented method of claim 12 further comprises training the neural network by comparing a given image from the two or more images with a reconstructed image, where the reconstructed image was reconstructed using a set of pose features from one image in the two or more images and appearance features from another image in the two or more images.
15. The computer-implemented method of claim 12 further comprises receiving another video of the unknown person walking; from images comprising the another video, disentangling a second set of pose features; and training the neural network by enforcing similarity between the set of pose features and the second set of pose features.
16. The computer-implemented method of claim 10 wherein generating a gait feature for the unknown person from the multiple sets of pose features further comprises aggregating the multiple sets of pose features using a long short-term memory and averaging output from the long short-term memory over time.
17. The computer-implemented method of claim 16 further comprises training the long short-term memory using a loss function and the loss function is defined as negative log likelihood that a classifier correctly identifies output of the long short-term memory.
18. The computer-implemented method of claim 10 further comprises comparing the gait feature for the unknown person to a plurality of gait features by computing a cosine similarity score.
19. The computer-implemented method of claim 10 further comprises actuating an actuator based on the identity of the unknown person.
US17/155,350 2020-01-22 2021-01-22 Systems and methods for gait recognition via disentangled representation learning Active US11315363B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/155,350 US11315363B2 (en) 2020-01-22 2021-01-22 Systems and methods for gait recognition via disentangled representation learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062964313P 2020-01-22 2020-01-22
US17/155,350 US11315363B2 (en) 2020-01-22 2021-01-22 Systems and methods for gait recognition via disentangled representation learning

Publications (2)

Publication Number Publication Date
US20210224524A1 US20210224524A1 (en) 2021-07-22
US11315363B2 true US11315363B2 (en) 2022-04-26

Family

ID=76857141

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/155,350 Active US11315363B2 (en) 2020-01-22 2021-01-22 Systems and methods for gait recognition via disentangled representation learning

Country Status (1)

Country Link
US (1) US11315363B2 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11734955B2 (en) * 2017-09-18 2023-08-22 Board Of Trustees Of Michigan State University Disentangled representation learning generative adversarial network for pose-invariant face recognition
EP3938935B1 (en) * 2019-03-15 2023-03-29 Sony Group Corporation A concept for authenticating a user of a mobile device
US11163372B2 (en) * 2020-04-01 2021-11-02 Toyota Motor North America, Inc Transport gait and gesture interpretation
US11544969B2 (en) * 2021-04-27 2023-01-03 Zhejiang Gongshang University End-to-end multimodal gait recognition method based on deep learning
CN113869151B (en) * 2021-09-14 2024-09-24 武汉大学 Cross-view gait recognition method and system based on feature fusion
CN113887358B (en) * 2021-09-23 2024-05-31 南京信息工程大学 Gait recognition method based on partial learning decoupling characterization
CN113569828B (en) * 2021-09-27 2022-03-08 南昌嘉研科技有限公司 Human body posture recognition method, system, storage medium and equipment
CN114140873A (en) * 2021-11-09 2022-03-04 武汉众智数字技术有限公司 Gait recognition method based on convolutional neural network multi-level features
CN114612932A (en) * 2022-03-07 2022-06-10 银河水滴科技(北京)有限公司 Gait big data retrieval method and system and terminal equipment
CN117113205B (en) * 2023-08-29 2024-09-06 北京自动化控制设备研究所 Pedestrian unknown gait type identification method based on real-time inertial parameters

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7330566B2 (en) * 2003-05-15 2008-02-12 Microsoft Corporation Video-based gait recognition
US7613325B2 (en) * 2003-08-21 2009-11-03 Panasonic Corporation Human detection device and human detection method
US7397931B2 (en) * 2004-08-03 2008-07-08 Matsushita Electric Industrial Co., Ltd. Human identification apparatus and human searching/tracking apparatus
US7711146B2 (en) * 2006-03-09 2010-05-04 General Electric Company Method and system for performing image re-identification
US20130204545A1 (en) * 2009-12-17 2013-08-08 James C. Solinsky Systems and methods for sensing balanced-action for improving mammal work-track efficiency
US8460220B2 (en) * 2009-12-18 2013-06-11 General Electric Company System and method for monitoring the gait characteristics of a group of individuals
US20140270402A1 (en) * 2011-07-29 2014-09-18 University Of Ulster Gait recognition methods and systems
US10327671B2 (en) * 2014-02-17 2019-06-25 Hong Kong Baptist University Algorithms for gait measurement with 3-axes accelerometer/gyro in mobile devices
US9589365B2 (en) * 2014-02-27 2017-03-07 Ricoh Company, Ltd. Method and apparatus for expressing motion object
US20220044028A1 (en) * 2017-03-30 2022-02-10 Nec Corporation Information processing apparatus, control method, and program
US20200205697A1 (en) * 2018-12-30 2020-07-02 Altumview Systems Inc. Video-based fall risk assessment system
US20210346761A1 (en) * 2020-05-06 2021-11-11 Agile Human Performance, Inc. Automated gait evaluation for retraining of running form using machine learning and digital video data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ebenezer et al., "Gait Verification System Through Multiperson Signature Matching for Unobtrusive Biometric Authentication" (Year: 2019). *

Similar Documents

Publication Publication Date Title
US11315363B2 (en) Systems and methods for gait recognition via disentangled representation learning
Takemura et al. Multi-view large population gait dataset and its performance evaluation for cross-view gait recognition
US11657525B2 (en) Extracting information from images
US10565433B2 (en) Age invariant face recognition using convolutional neural networks and set distances
Alotaibi et al. Improved gait recognition based on specialized deep convolutional neural network
EP3287943B1 (en) Liveness test method and liveness test computing apparatus
Lee et al. Visual tracking and recognition using probabilistic appearance manifolds
US11941918B2 (en) Extracting information from images
Han et al. Face recognition with contrastive convolution
US20220327189A1 (en) Personalized biometric anti-spoofing protection using machine learning and enrollment data
El Khiyari et al. Age invariant face recognition using convolutional neural networks and set distances
US11961333B2 (en) Disentangled representations for gait recognition
Nguyen et al. Complex-valued iris recognition network
Ganapathi et al. Unconstrained ear detection using ensemble‐based convolutional neural network model
Jalali et al. Deformation invariant and contactless palmprint recognition using convolutional neural network
Choras Multimodal biometrics for person authentication
Sujanthi et al. Iris Liveness Detection using Deep Learning Networks
Holle et al. Local line binary pattern and Fuzzy K-NN for palm vein recognition
Zhu et al. LFN: Based on the convolutional neural network of gait recognition method
Zaghetto et al. Touchless multiview fingerprint quality assessment: rotational bad-positioning detection using artificial neural networks
Li et al. Robust visual tracking based on an effective appearance model
Parise Human identification by walk recognition using relevance vector machine
Jain et al. Face matching and retrieval: Applications in forensics
Zaidan et al. Ear Recognition System Based on CLAHE and Convolution Neural Network
Garud et al. Fingerprint and Palmprint Recognition using neighborhood operation and FAST features

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

AS Assignment

Owner name: BOARD OF TRUSTEES OF MICHIGAN STATE UNIVERSITY, MICHIGAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZHANG, ZIYUAN;LIU, XIAOMING;TRAN, LUAN;AND OTHERS;SIGNING DATES FROM 20210622 TO 20220324;REEL/FRAME:059389/0089

Owner name: FORD GLOBAL TECHNOLOGIES LLC, MICHIGAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WAN, JIAN;PRAKAH-ASANTE, KWAKU O.;BLOMMER, MIKE;SIGNING DATES FROM 20210304 TO 20210310;REEL/FRAME:059388/0992

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE