US11315363B2 - Systems and methods for gait recognition via disentangled representation learning
- Publication number: US11315363B2 (application US17/155,350)
- Authority: US (United States)
- Prior art keywords: gait, features, pose, feature, user
- Prior art date: 2020-01-22
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/40—Extraction of image or video features
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06K9/623; G06K9/6256; G06K9/6268
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82—Image or video recognition using neural networks
- G06V40/25—Recognition of walking or running movements, e.g. gait recognition
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
Definitions
- the present disclosure relates to gait recognition and, more specifically, to gait recognition implemented as an authentication method.
- the authentication system comprises: a camera, a feature extractor, an aggregator, a classifier and a data store.
- the camera is configured to capture two or more images of an unknown person walking.
- the feature extractor is configured to receive the two or more images and, for each image in the two or more images, operates to extract a set of appearance features and a set of pose features, such that the appearance features are indicative of visual appearance of the unknown person and the pose features are indicative of pose of the unknown person.
- the feature extractor is a neural network trained to disentangle the pose features from the appearance features.
- the aggregator is configured to receive multiple sets of pose features from the feature extractor and generate a gait feature for the unknown person.
- the data store is configured to store a plurality of gait features, where each gait feature in the plurality of gait features is associated with a known person.
- the classifier is configured to receive the gait feature from the aggregator and operates to identify the unknown person by comparing the gait feature to the plurality of gait features stored in the data store.
- the authentication system may further include a pre-processor interposed between the camera and the feature extractor which operates to remove background from each of the two or more images.
- the neural network is further defined as a convolutional neural network.
- the neural network may be trained using cross reconstruction loss. That is, the neural network is trained by comparing a given image from the two or more images with a reconstructed image, where the reconstructed image was reconstructed using a set of pose features from one image in the two or more images and appearance features from another image in the two or more images.
- the aggregator is further defined as a long short-term memory, such that the classifier averages output from the long short-term memory over time.
- the authentication system further includes a verification module and an actuator.
- the verification module receives an identity for the unknown person from the classifier and actuates the actuator based on the identity of the unknown person.
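As a sketch of how these claimed components might compose end to end, assuming PyTorch (the encoder and LSTM interfaces, tensor shapes, and the 0.5 threshold are illustrative assumptions, not values specified by the patent):

```python
import torch
import torch.nn.functional as F

def authenticate(frames, encoder, lstm, stored_gait, threshold=0.5):
    """frames: (T, C, H, W) tensor of background-removed images of the
    unknown person walking."""
    pose_feats = []
    for frame in frames:
        # Feature extractor: disentangle appearance (f_a) from pose (f_g).
        f_a, f_g = encoder(frame.unsqueeze(0))     # each (1, D)
        pose_feats.append(f_g)
    # Aggregator: LSTM over the pose-feature sequence (batch_first assumed),
    # averaged over time to form the sequence-based gait feature.
    out, _ = lstm(torch.stack(pose_feats, dim=1))  # (1, T, H)
    gait = out.mean(dim=1).squeeze(0)
    # Classifier: compare against the known person's stored gait feature.
    score = F.cosine_similarity(gait, stored_gait, dim=0)
    return score.item() >= threshold
```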
- FIG. 1 is an example user authentication system incorporated into a vehicle according to principles of the present disclosure.
- FIG. 2 is an example functional flow diagram depicting example architecture of a gait identification (ID) system according to principles of the present disclosure.
- FIG. 3 is an example functional block diagram of an example gait identification (ID) system according to principles of the present disclosure.
- FIG. 4 is an example flowchart depicting example authentication when requesting a ride share vehicle according to principles of the present disclosure.
- FIG. 5 is an example flowchart depicting example generation and storage of a registered user reference gait feature according to principles of the present disclosure.
- Pose features of a user provide unique and secure information for user authentication without requesting biometrics through a potentially cumbersome explicit authentication request.
- a registered user can be verified when walking towards a requested vehicle.
- the requested vehicle may have cameras mounted at various points surrounding the vehicle and can identify individuals approaching the vehicle.
- a gait identification system receives videos of the approaching individual and identifies or extracts two sets of features from the approaching individual: appearance and gait. Both features are represented as feature vectors by trained machine learning models.
- the gait identification system removes appearance from the features extracted from a video of the approaching individual. Therefore, the resulting extracted features only include pose features.
- the pose features of the approaching user can then be compared to a known or registered user database including pose features for each registered user.
- the approaching user is authenticated and identified as the requesting user.
- the requested vehicle may alert the driver that the approaching user is verified.
- the authentication of the approaching user may prompt an actuator of the vehicle to unlock one of the doors of the vehicle or open one of the doors of the vehicle.
- an individual's walking pattern, or gait, is one of the most important biometric modalities, which allows it to be used as an authentication metric.
- Most of the existing gait recognition methods take silhouettes or articulated body models as the pose features. These methods suffer from degraded recognition performance when handling confounding variables, such as clothing, objects being carried, and viewing angle.
- the gait identification system of the present disclosure includes a novel autoencoder framework to explicitly disentangle pose (individual frames of the gait) and appearance features from RGB imagery and long short-term memory (LSTM)-based integration of pose features over time (as captured in a video) to produce a gait feature.
- the gait identification system collects a Frontal-View Gait (FVG) dataset to focus on gait recognition from frontal-view walking, which is a challenging problem since frontal-view walking contains minimal gait cues compared to other views.
- FVG does include other important variations, for example, walking speed, objects being carried, and clothing worn.
- Biometrics measure a user's unique physical and behavioral characteristics to recognize the identity of the user.
- Gait is one of the biometrics modalities, such as face, fingerprint, and iris.
- Gait recognition has the advantage that it can operate at a distance without user cooperation. Also, gait is difficult to camouflage. Therefore, authenticating a user's gait is not intrusive, does not require an explicit authentication request or explicit performance of an authentication measurement, and gait is difficult to forge. Due to these advantages, gait recognition is useful for many applications, such as person identification, criminal investigation, and healthcare.
- the core of gait recognition lies in extracting gait-related features from the video frames of a walking person, where the prior approaches are categorized into two types: appearance-based and model-based methods.
- the appearance-based methods such as gait energy image (GEI) or gait entropy image (GEnI), are defined by extracting silhouette masks. Specifically, GEI uses an averaged silhouette image as the gait representation for a video. These methods are popular in the gait recognition community for their simplicity and effectiveness. However, they often suffer from sizeable intra-subject appearance changes due to covariates including clothing changes, objects being carried, viewing angle changes, and walking speed variations.
- While GEI has a low computational cost and can handle low-resolution imagery, it can be sensitive to the above-mentioned covariates. In contrast, model-based methods first perform pose estimation: they fit articulated body models to images and extract kinematic features, such as 2D body joints, as the gait feature. While they are robust to some covariates such as clothing and speed, they require a relatively higher image resolution for reliable pose estimation and incur higher computational costs.
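- for illustration, a GEI is simply the per-pixel temporal average of aligned binary silhouettes. A minimal NumPy sketch (the array shapes are assumptions, not from the patent):

```python
import numpy as np

def gait_energy_image(silhouettes: np.ndarray) -> np.ndarray:
    """silhouettes: (T, H, W) array of aligned binary masks in {0, 1}.
    Returns the (H, W) GEI: bright pixels are body regions that stay
    static over the gait cycle; gray pixels are the moving parts."""
    return silhouettes.astype(np.float32).mean(axis=0)
```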
- the gait ID system 108 receives video of users around the vehicle 104 from a camera 112-1 or a plurality of cameras 112-1, 112-2, . . . , 112-N, collectively 112, mounted on the vehicle 104.
- the plurality of cameras 112 may be mounted at various points on the vehicle 104 .
- the user authentication system 100 may include the plurality of cameras 112 mounted on the vehicle 104 , positioned to capture an entirety of the environment surrounding the vehicle 104 (additional cameras capturing entire surrounding environment not shown). In this way, the plurality of cameras 112 can identify users approaching the vehicle 104 from a variety of directions.
- the user authentication system 100 also includes an actuator 116 .
- upon authenticating a video stream of an approaching user received from the plurality of cameras 112, the gait ID system 108 instructs the actuator 116 to unlock a rear door of the vehicle 104.
- the actuator 116 may open the rear door of the vehicle 104 .
- multiple actuators may be included for each door of the vehicle 104 , and the gait ID system 108 may instruct the actuator to unlock or open the door nearest the approaching user.
- authentication of the user may also (or alternatively) result in a notification to the driver's phone or mobile computing device that the user has been authenticated and which camera captured the user, indicating a direction or location of the authenticated user.
- the gait ID system 108 includes a novel convolutional neural network (CNN)-based model to automatically learn the disentangled gait feature, or appearance feature, from a walking video of an approaching user to verify and/or register the user.
- the CNN-based model relies on pose features of a walking user, as opposed to handcrafted GEI, or skeleton-based features. While many conventional gait databases study side-view imagery, the gait ID system 108 collects a new gait database where both gallery and probe are captured in frontal-views. While particular reference is made to convolutional neural networks, it is readily understood that other types of neural networks (e.g., residual neural networks) as well as other types of machine learning also fall within the scope of this disclosure.
- the gait ID system 108 disentangles the gait feature from the visual appearance of the approaching user.
- conventionally, disentanglement is achieved by manually handcrafting the GEI or body skeleton, since neither has color information.
- manual disentanglements may lose certain information or create redundant gait information.
- GEI represents the average contours over time but not the dynamics of how body parts move.
- certain body joints such as hands may have fixed positions and are redundant information to gait.
- the gait ID system 108 automatically disentangles the pose features from appearance features and uses the extracted pose features to generate a gait feature for gait recognition.
- the pose features are generated by extracting pose features from each frame of a captured video of an approaching user.
- the disentanglement performed by the gait ID system 108 is realized by designing an autoencoder-based CNN with novel loss functions.
- the encoder estimates two latent representations, (i) pose feature representation (that is, frame-based gait feature) and (ii) appearance feature representation, by employing two loss functions.
- the two loss functions include (i) cross reconstruction loss and (ii) gait similarity loss.
- the cross reconstruction loss enforces that the appearance feature of one frame, fused with the pose feature of another frame, can be decoded to the latter frame.
- the gait similarity loss forces a sequence of pose features extracted from a video sequence of the same subject to be similar even under different conditions.
- the pose features of a sequence are fed into a multi-layer LSTM with a designed incremental identity loss to generate the sequence-based gait feature; the cosine distance between two such gait features can serve as the video-to-video similarity metric.
- the FVG database may collect three frontal-view angles, where the subject walks at -45° (left), 0°, and 45° (right) off the optical axis of the camera 112-1 or the plurality of cameras 112. For each of the three angles, different variants are explicitly captured, including walking speed, clothing, carrying, cluttered background, etc. Such a robust FVG database results in a more accurate CNN model for disentangling pose and appearance features.
- the user authentication system 100 implements the gait ID system 108 to learn gait information from raw RGB video frames, which contain richer information and thus a higher potential for extracting discriminative pose features.
- the present CNN-based approach has the advantage of being able to leverage a large amount of training data and learning more discriminative representation from data with multiple covariates to create an average gait feature representation from pose features extracted from a plurality of video frames.
- the present FVG database focuses on the frontal view, with three different near frontal-view angles towards the camera, and other variations including walking speed, carrying, clothing, cluttered background and time.
- the present method has only one encoder to disentangle the appearance and gait information, as shown in FIG. 2 , through the design of novel loss functions without the need for adversarial training.
- the present method does not require adversarial training, which makes training more accessible.
- FIG. 2 is a functional flow diagram depicting the architecture of a gait identification (ID) system 200 of the present disclosure.
- the objective is to design an algorithm for which the pose features of videos 1 and 2 are the same, while those of videos 2 and 3 are different.
- the long down coat can easily dominate the feature extraction, which would make videos 2 and 3 more similar than videos 1 and 2 in the latent space of pose features.
- the core challenge, as well as the objective, of gait recognition is to extract pose features that are discriminative among subjects but invariant to different confounding factors, such as viewing angles, walking speeds, and appearance.
- the approach of the gait ID system 200 is to achieve the gait feature representation via feature disentanglement by separating the gait feature from appearance information for a given walking video.
- the input to the gait ID system 200 is video frames 204 , with background removed using a detection and segmentation method.
- An encoder-decoder network 208 is used to disentangle the appearance and pose features for each video frame.
- a multi-layer LSTM 212 explores the temporal dynamics of pose features and aggregates them into a sequence-based gait feature for identification purposes.
- the gait ID system 200 learns to disentangle the gait feature from the visual appearance in an unsupervised manner. Since a video is composed of frames, disentanglement should be conducted on the frame level first. Because there is no dynamic information within a video frame, the gait ID system 200 disentangles the pose feature from the visual appearance for each frame. The dynamics of pose features over a sequence will contribute to the gait feature. In other words, the pose feature is the manifestation of video-based gait feature at a specific frame or point in time.
- the encoder-decoder network 208 architecture is used with carefully designed loss functions to disentangle the pose feature from appearance feature.
- the loss functions defined for learning the encoder ε and the decoder D include the cross reconstruction loss and the gait similarity loss.
- the reconstructed frame Î = D(f_a, f_g) should be close to the original input I.
- enforcing a self-reconstruction loss as in a typical autoencoder cannot ensure that the appearance feature f_a learns appearance information consistent across the video and that f_g represents pose information in each frame. Therefore, the cross reconstruction loss uses an appearance feature f_a^t1 of one frame and the pose feature f_g^t2 of another to reconstruct the latter frame:
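- a plausible reconstruction of the elided formula, following the notation of Eqs. (1)-(2) in the Description below, is L_xrecon = ||D(f_a^t1, f_g^t2) − I^t2||^2, averaged over frame pairs (t1, t2) drawn from the same video; this exact form is an assumption, not verbatim from the patent.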
- the cross reconstruction loss can play a role as the self-reconstruction loss to make sure the two features are sufficiently representative to reconstruct video frames.
- because a pose feature of the current frame can be paired with the appearance feature of any frame in the same video to reconstruct the same target (using the decoder of the encoder-decoder network 208), the loss enforces the appearance features to be similar across all frames.
- the cross reconstruction loss prevents the appearance feature f_a from being over-represented, i.e., containing pose variation that changes between frames. However, appearance information may still be leaked into the pose feature f_g.
- in a degenerate solution, f_a is a constant vector while f_g encodes all the information of a video frame.
- a gait similarity module 216 receives multiple videos of the same subject; the extra videos can introduce changes in appearance. Given two videos of the same subject with lengths n_1 and n_2, captured in two different conditions c_1 and c_2, ideally c_1 and c_2 should differ in the user's appearance, for example, a change of clothes. In an implementation, only one video per user may be accessible for registration and matching.
- while appearance changes, the gait information should be consistent between the two videos. Since it is almost impossible to enforce similarity on f_g between individual video frames, as that requires precise frame-level alignment, similarity between the two videos is instead enforced on time-averaged pose features using the gait similarity module 216:
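- a plausible reconstruction of the elided similarity term is L_gait-sim = ||(1/n_1) Σ_t f_g^(t,c1) − (1/n_2) Σ_t f_g^(t,c2)||^2, pushing together the time-averaged pose features of the two videos; this exact form is an assumption. A minimal PyTorch sketch of the two disentanglement losses, with all tensor shapes and helper names assumed:

```python
import torch.nn.functional as F

def cross_reconstruction_loss(decoder, f_a_t1, f_g_t2, frame_t2):
    """Appearance feature of frame t1, fused with the pose feature of
    frame t2, must decode back to frame t2."""
    reconstruction = decoder(f_a_t1, f_g_t2)
    return F.mse_loss(reconstruction, frame_t2)

def gait_similarity_loss(f_g_seq_c1, f_g_seq_c2):
    """f_g_seq_c1, f_g_seq_c2: (n_1, D) and (n_2, D) pose features of the
    same subject under two conditions; their temporal means should match."""
    return F.mse_loss(f_g_seq_c1.mean(dim=0), f_g_seq_c2.mean(dim=0))
```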
- the current feature f_g only contains the walking pose of the person at a specific instant, which can share similarity with another specific instant of a very different person.
- the gait ID system 200 is looking for discriminative characteristics in a user's walking pattern. Therefore, modeling its temporal change is critical. This is where temporal modeling architectures like the recurrent neural network or LSTM work best.
- the gait ID system 200 includes a multi-layer LSTM 212 structure to explore spatial (e.g., the shape of a person) and mainly, temporal (e.g., how the trajectory of subjects' body parts changes over time) information on pose features extracted from the input video frames 204 by the encoder-decoder network 208 .
- pose features extracted from one video sequence are fed into the three-layer LSTM 212.
- the output of the LSTM 212 is connected to a classifier C (in this case, a linear classifier) to classify the user's identity.
- let h_t be the output of the LSTM 212 at time step t, which is accumulated after feeding t pose features f_g into the LSTM 212:
- h_t = LSTM(f_g^1, f_g^2, . . . , f_g^t) (5)
- the output h_t is greatly affected by its last input f_g^t.
- the LSTM output h_t can vary across time steps.
- the averaged LSTM output can be used as the gait feature for identification (a plausible form of this average is reconstructed below).
- the LSTM 212 is expected to learn that the longer the video sequence, the more walking information it processes, and thus the more confidently it identifies the subject. Instead of minimizing the loss only on the final time step, all the intermediate outputs of every time step, weighted by w_t, are used:
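- plausible reconstructions of the two elided formulas, under that description, are f_gait^t = (1/t) Σ_{s=1..t} h_s for the time-averaged gait feature and L_id-inc-avg = Σ_{t=1..n} w_t · (−log C_k(f_gait^t)) for the incremental identity loss, with w_t increasing in t so that later, more confident time steps contribute more; the exact forms and the weighting scheme are assumptions.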
- the gait ID system 200, including the encoder-decoder network 208 and the LSTM 212, is trained jointly. Updating ε to optimize the incremental identity loss also helps to generate pose features that carry identity information and on which the LSTM 212 is able to explore temporal dynamics.
- the output f_gait^t of the LSTM 212 is the gait feature of the video and is used as the identity feature representation for matching or verifying an approaching user.
- the cosine similarity score is used as the distance metric between a known registered gait feature and the present gait feature.
- the gait ID system 200 receives video frames 204 with the person of interest segmented.
- the foreground mask is obtained from the state-of-the-art instance segmentation, Mask R-CNN.
- the soft mask returned by the network is kept, where each pixel value indicates the probability of that pixel being a person. This is partially due to the difficulty of choosing a binarization threshold; it also prevents the loss of information due to mask estimation errors.
- the encoder-decoder network 208 is a typical CNN. The encoder consists of four stride-2 convolution layers, each followed by Batch Normalization and Leaky ReLU activation. The decoder structure is an inverse of the encoder, built from transposed convolution, Batch Normalization, and Leaky ReLU layers.
- the final layer has a Sigmoid activation to bring the values into the [0, 1] range, matching the input.
- the classification part is a stacked 3-layer LSTM 212 with 256 hidden units in each cell. Since video lengths vary, a random 20-frame crop is applied; all shorter videos are discarded.
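- a minimal PyTorch sketch consistent with those specifics (four stride-2 convolutions with Batch Normalization and Leaky ReLU, a mirrored transposed-convolution decoder ending in a Sigmoid, and a stacked 3-layer LSTM with 256 hidden units). The channel widths, 64×64 input size, feature dimensions, and subject count are assumptions:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Four stride-2 conv layers, each followed by BatchNorm and LeakyReLU,
    producing an appearance feature f_a and a pose feature f_g."""
    def __init__(self, f_a_dim=128, f_g_dim=64):
        super().__init__()
        chans = [3, 64, 128, 256, 512]                 # widths assumed
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                       nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2)]
        self.conv = nn.Sequential(*layers)
        self.fc = nn.Linear(512 * 4 * 4, f_a_dim + f_g_dim)  # 64x64 input
        self.f_a_dim = f_a_dim

    def forward(self, x):
        z = self.fc(self.conv(x).flatten(1))
        return z[:, :self.f_a_dim], z[:, self.f_a_dim:]      # f_a, f_g

class Decoder(nn.Module):
    """Inverse of the encoder: transposed convs with BatchNorm and
    LeakyReLU, and a final Sigmoid mapping outputs into [0, 1]."""
    def __init__(self, f_a_dim=128, f_g_dim=64):
        super().__init__()
        self.fc = nn.Linear(f_a_dim + f_g_dim, 512 * 4 * 4)
        chans = [512, 256, 128, 64]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                       nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2)]
        layers += [nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
                   nn.Sigmoid()]
        self.deconv = nn.Sequential(*layers)

    def forward(self, f_a, f_g):
        z = self.fc(torch.cat([f_a, f_g], dim=1)).view(-1, 512, 4, 4)
        return self.deconv(z)

class GaitLSTM(nn.Module):
    """Stacked 3-layer LSTM (256 hidden units per cell) whose running
    average output feeds a linear identity classifier."""
    def __init__(self, f_g_dim=64, num_subjects=100):
        super().__init__()
        self.lstm = nn.LSTM(f_g_dim, 256, num_layers=3, batch_first=True)
        self.classifier = nn.Linear(256, num_subjects)

    def forward(self, f_g_seq):                        # (B, T, f_g_dim)
        h, _ = self.lstm(f_g_seq)                      # (B, T, 256)
        steps = torch.arange(1, h.size(1) + 1, device=h.device)
        f_gait = h.cumsum(dim=1) / steps.view(1, -1, 1)  # running mean
        return self.classifier(f_gait)                 # per-step logits
```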
- the gait ID system 300 includes an initial processing module 304 that receives a video as input.
- the video may be obtained in real time by the camera 112 - 1 mounted on the vehicle 104 shown in FIG. 1 .
- an instruction may also be input into the gait ID system 300 , after pose features are extracted, indicating whether the input video is to register a new user or to authenticate the presently recorded approaching user.
- the initial processing module 304 is configured to prepare the received video for feature extraction.
- the preparation includes cropping the video, parsing the video into individual frames, removing the background from each frame, etc.
- each individual frame of the video is analyzed, and the pose and appearance features are separated so that only the pose features of each frame are combined to construct the gait representation of the approaching individual captured in the video.
- the processed frames of the video are then forwarded to a feature identification module 308 .
- the feature identification module 308 implements a trained machine learning model that has a similar architecture to the encoder-decoder network of FIG. 2 .
- the feature identification module 308 separates, from each frame, pose and appearance features using the trained machine learning model, such as a CNN model.
- the feature identification module 308 identifies the appearance feature and removes the appearance feature from each of the frames.
- the feature identification module 308 may also be configured to enforce similarity between frames of the same individual across multiple videos.
- the pose feature of each frame is forwarded to an aggregation module 312 .
- the aggregation module 312 combines the pose features of each frame to generate a mean or averaged gait feature over time. Aggregating the per-frame pose features is important because each pose feature captures the pose of the approaching user only at a specific instant; a gait feature of the approaching user walking must therefore be created from a plurality of pose features.
- the aggregation module 312 may implement an LSTM model that is trained to aggregate the individual pose features into an averaged gait feature.
- the aggregation module 312 also receives an instruction from, for example, a computing device operated by the user or an operator of the vehicle and/or ride share service, to instruct whether the input video is being used to register a new user or authenticate a present approaching user.
- in a ride share service, if a user requests a vehicle through a ride share application, the user can choose to be authenticated based on gait. Alternatively, the ride share service can require such authentication. Then, if the gait ID system 300 implemented by the ride share service does not have any gait information for the user, the user may be registered by the requested vehicle. In such a situation, the operator of the vehicle may request that the user walk toward a camera mounted on the vehicle, and the operator instructs the gait ID system 300 that the video is intended to register the user. When first registering, alternative authentication may be used.
- a single reference video of the user may be used to register the user, or a plurality of videos at different angles under different conditions may be captured and stored for the user over a period of time.
- the user may be registered at a different point other than when first ordering a vehicle. Therefore, when a user is first being registered, the operator of the vehicle including the gait ID system 300 may instruct the system that the present video is being captured for registration purposes of the user requesting the vehicle. Otherwise, the gait ID system 300 may assume (or know based on the user ID) that the user is registered.
- when the aggregation module 312 receives an instruction indicating the user is being registered, the aggregation module 312 directs the gait feature to be stored in a registered user gait database 316 corresponding to a user ID of the user. Then, when the user is being authenticated for a future ride request, the gait ID system 300 can access the gait feature of the user from the registered user gait database 316 according to the user ID to verify the user's identity.
- the aggregation module 312 forwards the constructed present gait feature to a comparison module 320 .
- the comparison module 320 obtains a stored gait feature of the approaching user from the registered user gait database 316 based on a user ID.
- the registered user gait database 316 stores gait features with a corresponding user ID in order to compare the stored gait features to the gait features of approaching users analyzed in real time.
- the comparison module 320 compares the present gait feature to the stored gait feature by determining a distance value between the two features, for example, the cosine similarity score described previously as a distance metric. The distance is forwarded to a verification module 324, which determines whether the distance is within a predetermined threshold. The verification module 324 then forwards an authentication instruction, or an instruction that the approaching user is not authenticated, to an instruction generation module 328. The instruction generation module 328 sends the authentication instruction to an actuator control module 332 to actuate an actuator on the vehicle, operating to unlock and/or open a particular door of the vehicle when the user has been authenticated.
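- a minimal sketch of that comparison step, assuming the cosine similarity is converted to a distance and checked against a tunable threshold (the threshold value is an assumption):

```python
import torch
import torch.nn.functional as F

def verify(present_gait: torch.Tensor, stored_gait: torch.Tensor,
           threshold: float = 0.3) -> bool:
    """Authenticates the approaching user when the cosine distance between
    the present and stored gait features is within the threshold."""
    distance = 1.0 - F.cosine_similarity(present_gait, stored_gait, dim=0)
    return distance.item() <= threshold
```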
- an instruction may optionally be sent to an alert generation module 336 .
- the alert generation module 336 may generate and transmit an alert to the computing device operated by the vehicle owner and/or a mobile computing device operated by the approaching user indicating that the user is not authenticated.
- the alert may be visual, audio, and/or haptic feedback.
- Control begins in response to an authentication request.
- the authentication request may be received each time a user approaches a vehicle expecting a user. For example, after a ride is requested and the vehicle reaches a pick-up location, the plurality of cameras mounted on the vehicle may be initiated and begin capturing users surrounding the vehicle.
- the gait ID system may instruct the camera with the best view of the approaching user to feed captured video of the approaching user to be authenticated.
- the camera with the best view would be the camera facing the approaching user, the camera angle being parallel with the walking direction of the approaching user.
- the requested user may perform a particular motion to initiate authentication, such as a wave motion that the initial processing module described in FIG. 3 can identify as a prompt to begin authentication.
- the user may indicate using their phone or computing device that the user sees the vehicle and is going to begin approaching, so the gait ID system receives videos for any users surrounding the vehicle and attempts to authenticate all viewed users until one of the approaching users is authenticated.
- control proceeds to 404 to obtain video of an approaching user.
- control is receiving video from multiple cameras of multiple individuals at the same time. Therefore, control may be attempting to authenticate various users at the same time.
- control continues to 408 to prepare the obtained or received video for feature extraction. The preparation may include parsing of the video into multiple frames, removing background pixels, etc.
- Control then continues to 412 to extract a pose feature vector from each frame of the video. The extraction involves disentangling the pose feature of the frame from the appearance feature of the frame using machine learning.
- control proceeds to 416 to aggregate the pose feature of each frame to generate a gait feature representing the approaching user in the video.
- the gait feature is a mean representation of the pose features of each frame over time.
- control continues to 420 to obtain a stored gait feature from a database corresponding to the requesting user.
- the requesting user is the user that requested the vehicle.
- control determines a distance between the gait feature and the stored gait feature.
- control determines whether the distance is greater than a predetermined threshold. If yes, control has determined that the distance between the gait feature and the stored gait feature is too distant, indicating that the approaching user cannot be authenticated as the requesting user. Therefore, control proceeds to 432 to identify the user as not the requesting user. Control may then optionally proceed to 436 to generate an alert. Then, control ends.
- an alert may not be necessary and, instead, continuous authentication attempts are performed in response to capturing a user approaching the vehicle.
- otherwise, if control determines that the distance is within the predetermined threshold, control proceeds to 440 to authenticate the approaching user as the requesting user, because the distance indicates that the gait feature and the stored gait feature of the requesting user are similar enough to verify the identity of the approaching user. Then, control proceeds to 444 to send an instruction to unlock the vehicle. In various implementations, control may instead send a verification to a computing device of the vehicle operator and indicate a direction or location of the authenticated user. Then, control ends.
- Control begins in response to receiving a registration request.
- a new user can register when first requesting the ride share service.
- Registering involves allowing the capture of a frontal view video of the user walking toward a camera for gait feature extraction.
- control obtains a video of the new user. Then, at 508 , control prepares the video for feature extraction. As mentioned previously, this preparation includes parsing the video into frames as well as removing background pixels from each frame. The preparation may further include cropping the video to only include a predetermined number of frames.
- control extracts a pose feature vector from each frame of the video of the new user.
- Control continues to 516 to aggregate the pose feature of each frame into a gait feature vector over time. Then, control proceeds to 520 to store the gait feature vector in the database as corresponding to the now registered user. Then, when authenticating an approaching user, the gait ID system can access the database of registered users. Then, control ends.
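- as a toy illustration of the registration path, where an in-memory dictionary stands in for the registered user gait database (all names are assumptions):

```python
registered_gaits = {}  # stand-in for the registered user gait database

def register_user(user_id, gait_feature):
    """Stores the aggregated gait feature vector under the user's ID."""
    registered_gaits[user_id] = gait_feature

def get_reference_gait(user_id):
    """Fetches the stored reference gait feature for later authentication."""
    return registered_gaits.get(user_id)
```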
- the techniques described herein may be implemented by one or more computer programs executed by one or more processors.
- the computer programs include processor-executable instructions that are stored on a non-transitory tangible computer readable medium.
- the computer programs may also include stored data.
- Non-limiting examples of the non-transitory tangible computer readable medium are nonvolatile memory, magnetic storage, and optical storage.
- the present disclosure also relates to an apparatus for performing the operations herein.
- This apparatus may be specially constructed for the required purposes, or it may comprise a computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer.
- a computer program may be stored in a tangible computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
- the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
- the term "module" or the term "controller" may be replaced with the term "circuit."
- the term "module" may refer to, be part of, or include: an Application Specific Integrated Circuit (ASIC); a digital, analog, or mixed analog/digital discrete circuit; a digital, analog, or mixed analog/digital integrated circuit; a combinational logic circuit; a field programmable gate array (FPGA); a processor circuit (shared, dedicated, or group) that executes code; a memory circuit (shared, dedicated, or group) that stores code executed by the processor circuit; other suitable hardware components that provide the described functionality; or a combination of some or all of the above, such as in a system-on-chip. While various embodiments have been disclosed, other variations may be employed. All of the components and functions may be interchanged in various combinations. It is intended by the following claims to cover these and any other departures from the disclosed embodiments which fall within the true spirit of this invention.
Description
f_a, f_g = ε(I) (1)
Î = D(f_a, f_g) (2)
where I^t is the video frame at time step t.
h_t = LSTM(f_g^1, f_g^2, . . . , f_g^t) (5)
L_id-single = −log(C_k(h_n)) (6)
which is the negative log likelihood that the classifier C correctly identifies the final output h_n as its identity label k.
The identification loss can be rewritten as:
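A plausible reconstruction of the rewritten loss, expanding the incremental identity scheme described above (the (1/n) normalization and the weights w_t are assumptions):

L_id-inc-avg = (1/n) Σ_{t=1..n} w_t · (−log(C_k((1/t) Σ_{s=1..t} h_s)))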
Claims (19)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/155,350 US11315363B2 (en) | 2020-01-22 | 2021-01-22 | Systems and methods for gait recognition via disentangled representation learning |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062964313P | 2020-01-22 | 2020-01-22 | |
US17/155,350 US11315363B2 (en) | 2020-01-22 | 2021-01-22 | Systems and methods for gait recognition via disentangled representation learning |
Publications (2)
Publication Number | Publication Date |
---|---|
US20210224524A1 US20210224524A1 (en) | 2021-07-22 |
US11315363B2 true US11315363B2 (en) | 2022-04-26 |
Family
ID=76857141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/155,350 Active US11315363B2 (en) | 2020-01-22 | 2021-01-22 | Systems and methods for gait recognition via disentangled representation learning |
Country Status (1)
Country | Link |
---|---|
US (1) | US11315363B2 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11734955B2 (en) * | 2017-09-18 | 2023-08-22 | Board Of Trustees Of Michigan State University | Disentangled representation learning generative adversarial network for pose-invariant face recognition |
EP3938935B1 (en) * | 2019-03-15 | 2023-03-29 | Sony Group Corporation | A concept for authenticating a user of a mobile device |
US11163372B2 (en) * | 2020-04-01 | 2021-11-02 | Toyota Motor North America, Inc | Transport gait and gesture interpretation |
US11544969B2 (en) * | 2021-04-27 | 2023-01-03 | Zhejiang Gongshang University | End-to-end multimodal gait recognition method based on deep learning |
CN113869151B (en) * | 2021-09-14 | 2024-09-24 | 武汉大学 | Cross-view gait recognition method and system based on feature fusion |
CN113887358B (en) * | 2021-09-23 | 2024-05-31 | 南京信息工程大学 | Gait recognition method based on partial learning decoupling characterization |
CN113569828B (en) * | 2021-09-27 | 2022-03-08 | 南昌嘉研科技有限公司 | Human body posture recognition method, system, storage medium and equipment |
CN114140873A (en) * | 2021-11-09 | 2022-03-04 | 武汉众智数字技术有限公司 | Gait recognition method based on convolutional neural network multi-level features |
CN114612932A (en) * | 2022-03-07 | 2022-06-10 | 银河水滴科技(北京)有限公司 | Gait big data retrieval method and system and terminal equipment |
CN117113205B (en) * | 2023-08-29 | 2024-09-06 | 北京自动化控制设备研究所 | Pedestrian unknown gait type identification method based on real-time inertial parameters |
- 2021-01-22: US application US17/155,350 filed, granted as US11315363B2 (Active)
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7330566B2 (en) * | 2003-05-15 | 2008-02-12 | Microsoft Corporation | Video-based gait recognition |
US7613325B2 (en) * | 2003-08-21 | 2009-11-03 | Panasonic Corporation | Human detection device and human detection method |
US7397931B2 (en) * | 2004-08-03 | 2008-07-08 | Matsushita Electric Industrial Co., Ltd. | Human identification apparatus and human searching/tracking apparatus |
US7711146B2 (en) * | 2006-03-09 | 2010-05-04 | General Electric Company | Method and system for performing image re-identification |
US20130204545A1 (en) * | 2009-12-17 | 2013-08-08 | James C. Solinsky | Systems and methods for sensing balanced-action for improving mammal work-track efficiency |
US8460220B2 (en) * | 2009-12-18 | 2013-06-11 | General Electric Company | System and method for monitoring the gait characteristics of a group of individuals |
US20140270402A1 (en) * | 2011-07-29 | 2014-09-18 | University Of Ulster | Gait recognition methods and systems |
US10327671B2 (en) * | 2014-02-17 | 2019-06-25 | Hong Kong Baptist University | Algorithms for gait measurement with 3-axes accelerometer/gyro in mobile devices |
US9589365B2 (en) * | 2014-02-27 | 2017-03-07 | Ricoh Company, Ltd. | Method and apparatus for expressing motion object |
US20220044028A1 (en) * | 2017-03-30 | 2022-02-10 | Nec Corporation | Information processing apparatus, control method, and program |
US20200205697A1 (en) * | 2018-12-30 | 2020-07-02 | Altumview Systems Inc. | Video-based fall risk assessment system |
US20210346761A1 (en) * | 2020-05-06 | 2021-11-11 | Agile Human Performance, Inc. | Automated gait evaluation for retraining of running form using machine learning and digital video data |
Non-Patent Citations (1)
- Ebenezer et al., "Gait Verification System Through Multiperson Signature Matching for Unobtrusive Biometric Authentication" (Year: 2019). *
Also Published As
Publication number | Publication date |
---|---|
US20210224524A1 (en) | 2021-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11315363B2 (en) | Systems and methods for gait recognition via disentangled representation learning | |
Takemura et al. | Multi-view large population gait dataset and its performance evaluation for cross-view gait recognition | |
US11657525B2 (en) | Extracting information from images | |
US10565433B2 (en) | Age invariant face recognition using convolutional neural networks and set distances | |
Alotaibi et al. | Improved gait recognition based on specialized deep convolutional neural network | |
EP3287943B1 (en) | Liveness test method and liveness test computing apparatus | |
Lee et al. | Visual tracking and recognition using probabilistic appearance manifolds | |
US11941918B2 (en) | Extracting information from images | |
Han et al. | Face recognition with contrastive convolution | |
US20220327189A1 (en) | Personalized biometric anti-spoofing protection using machine learning and enrollment data | |
El Khiyari et al. | Age invariant face recognition using convolutional neural networks and set distances | |
US11961333B2 (en) | Disentangled representations for gait recognition | |
Nguyen et al. | Complex-valued iris recognition network | |
Ganapathi et al. | Unconstrained ear detection using ensemble‐based convolutional neural network model | |
Jalali et al. | Deformation invariant and contactless palmprint recognition using convolutional neural network | |
Choras | Multimodal biometrics for person authentication | |
Sujanthi et al. | Iris Liveness Detection using Deep Learning Networks | |
Holle et al. | Local line binary pattern and Fuzzy K-NN for palm vein recognition | |
Zhu et al. | LFN: Based on the convolutional neural network of gait recognition method | |
Zaghetto et al. | Touchless multiview fingerprint quality assessment: rotational bad-positioning detection using artificial neural networks | |
Li et al. | Robust visual tracking based on an effective appearance model | |
Parise | Human identification by walk recognition using relevance vector machine | |
Jain et al. | Face matching and retrieval: Applications in forensics | |
Zaidan et al. | Ear Recognition System Based on CLAHE and Convolution Neural Network | |
Garud et al. | Fingerprint and Palmprint Recognition using neighborhood operation and FAST features |
Legal Events
- FEPP (Fee payment procedure): ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
- STPP (Information on status: patent application and granting procedure in general): APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED
- STPP: DOCKETED NEW CASE - READY FOR EXAMINATION
- STPP: AWAITING TC RESP., ISSUE FEE NOT PAID
- STPP: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS
- AS (Assignment): Owner: BOARD OF TRUSTEES OF MICHIGAN STATE UNIVERSITY, MICHIGAN. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ZHANG, ZIYUAN; LIU, XIAOMING; TRAN, LUAN; AND OTHERS; SIGNING DATES FROM 20210622 TO 20220324; REEL/FRAME: 059389/0089. Owner: FORD GLOBAL TECHNOLOGIES LLC, MICHIGAN. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: WAN, JIAN; PRAKAH-ASANTE, KWAKU O.; BLOMMER, MIKE; SIGNING DATES FROM 20210304 TO 20210310; REEL/FRAME: 059388/0992
- STPP: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED
- STPP: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED
- STCF (Information on status: patent grant): PATENTED CASE