US20220230437A1 - System and method for real-time automatic video key frame analysis for critical motion in complex athletic movements - Google Patents

System and method for real-time automatic video key frame analysis for critical motion in complex athletic movements Download PDF

Info

Publication number
US20220230437A1
Authority
US
United States
Prior art keywords
video
video frames
frames
labeled
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/535,417
Inventor
Joe Chin
Todd Eaglin
Sam Pigott
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sparrow Vision Inc
Original Assignee
Sparrow Vision Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sparrow Vision Inc filed Critical Sparrow Vision Inc
Priority to US17/535,417
Publication of US20220230437A1
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 - Electrically-operated educational appliances
    • G09B5/06 - Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065 - Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 - Electrically-operated educational appliances
    • G09B5/08 - Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
    • G09B5/10 - Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations all student stations being capable of presenting the same information simultaneously


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

A system and method train a neural network to process video for the identification of critical points of motion in a movement. Embodiments include a process for training the neural network and a process for identifying the video frames including a critical point based on the criteria of the neural network. An exemplary embodiment includes a software application which may be accessible through a computing device to analyze videos on site.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application claims benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application having Ser. No. 63/083,286 filed Sep. 25, 2020, which is hereby incorporated by reference herein in its entirety.
  • FIELD
  • The subject disclosure relates to video analysis processes, and more particularly, to a system and method for real-time automatic video key frame analysis for critical motion in complex athletic movements.
  • BACKGROUND
  • Video analysis of bio-mechanical movement is becoming very popular for applications such as sports motion and physical therapy. However, identifying which frames in a video contain the most significant content can be a challenge when poring through hundreds of frames in a sequence, and trying to do so quickly is prone to significant error. Conventionally, a user would freeze-frame different points in a general section of frames to show, for example, a human subject where the body was deficient in a movement. With the proliferation of mobile computing devices, video can be analyzed remotely, and the analyzer can extract select frames to show the subject user the flaws in mechanics. However, single frames lack context, and seeing the action in real time or at a high frame rate is more desirable.
  • Some approaches have attempted to automate the process of identifying critical points in motion. Previous solutions may use, for example, pose estimation to analyze movement, which is a flawed technique and very computationally slow and inefficient. Pose estimation is prone to errors from lighting, background, and clothing type. Using poses as intermediate data (a feature vector) for other models injects human intelligence into the process, which has significant issues. Moreover, the processing power needed makes it nearly impossible to run pose-estimation analysis in real time at high frame rates on a mobile device.
  • SUMMARY
  • In one aspect of the disclosure, a method for identifying critical points of motion in a video is disclosed. The method includes using a feature extraction neural network trained to identify features among a plurality of images of movements. The features are associated with a critical point of movement. A video sequence is received including a plurality of video frames comprising a motion captured by a camera. The feature extraction neural network identifies individual frames from the plurality of video frames. The identified individual frames include a known critical point in the captured motion based on identified features associated with a critical point of movement. The identified frames which show the critical points in the captured motion are displayed.
  • It is understood that other configurations of the subject technology will become readily apparent to those skilled in the art from the following detailed description, wherein various configurations of the subject technology are shown and described by way of illustration. As will be realized, the subject technology is capable of other and different configurations and its several details are capable of modification in various other respects, all without departing from the scope of the subject technology. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flowchart of a process for generating an artificial intelligence module configured to analyze video for critical motion according to an illustrative embodiment of the subject technology.
  • FIG. 2 is a diagrammatic view of a system capturing a golf swing for use in critical motion analysis according to an illustrative embodiment of the subject technology.
  • FIG. 3 is a diagrammatic view of a system capturing a weightlifting movement for use in critical motion analysis according to an illustrative embodiment of the subject technology.
  • FIG. 4 is a diagrammatic view of a user viewing a software embodiment of the subject technology displaying critical motion analysis according to an illustrative embodiment of the subject technology.
  • FIG. 5 is a flowchart of a method of extracting critical motion video frames from a video according to an illustrative embodiment of the subject technology.
  • FIG. 6 is a flowchart of a method of automated identification of frames for video analysis of critical motion according to an illustrative embodiment of the subject technology.
  • FIG. 7 is a flowchart of a computer implemented method for converting manual labels into machine readable labels according to an illustrative embodiment of the subject technology.
  • FIG. 8 is a block diagram of an architecture for generating real-time automatic video key frame analysis for critical motion in complex athletic movements according to an illustrative embodiment of the subject technology.
  • FIG. 9 is a block diagram of a computing device for analyzing critical motion in complex athletic movements according to an illustrative embodiment of the subject technology.
  • DETAILED DESCRIPTION
  • The detailed description set forth below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which the subject technology may be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a thorough understanding of the subject technology. However, it will be apparent to those skilled in the art that the subject technology may be practiced without these specific details. Like or similar components are labeled with identical element numbers for ease of understanding.
  • In general, and referring to the Figures, illustrative embodiments of the subject technology provide processes which automate the identification of critical points of motion in a video that captures a sequence of movements. Aspects of the subject technology provide training for an artificial intelligence (A.I.) or machine learning module. The embodiments described below engineer training data in a way that builds a neural network configured to learn and identify the optimal features from a video sequence, an approach that is far superior to using, for example, pose estimation: it is computationally faster and significantly less prone to error. In another aspect, the automated process generates the display of video frames from a video that show the critical points of motion in a movement sequence.
  • In an illustrative embodiment, aspects of the subject technology may be applied to analysis of a bio-mechanical movement. For example, in sports, specific sequences of movements have the same general motion, but identifying the inefficient placement of one body part relative to another during the motion is a challenge. Generally, there are certain points in the motion that are more critical than others to generating a successful outcome from the movement. Finding the video frames that show these points in the movement when one subject's body differs from another's is susceptible to error when done manually or by other techniques such as pose estimation. Aspects of the subject technology improve the accuracy in identifying the critical points in videos between different subjects. Some embodiments include additional analysis once critical frames are found. Embodiments may include a finely tuned neural network that identifies body parts for a particular critical frame (accuracy being key for body part identification). Body parts may then be evaluated for positioning deviation from an ideal position. An ideal position may, in some applications, be associated with generating an optimal characteristic (for example, maximum power, successful contact, or successful object path after contact, among other characteristics depending on the movement). Other analysis, including, for example, finding objects (golf club, barbell, projectile), may also be performed on a specific critical frame, giving more insight into user/player performance. For example, the position of a golf club may be considered crucial to performance in some critical frames.
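  • As an illustration of the kind of downstream analysis described above, the following sketch compares joint angles in a critical frame against hypothetical ideal reference angles. The keypoint inputs, joint names, and ideal angles are assumptions introduced only for illustration; detecting the body parts themselves would be handled by the finely tuned neural network mentioned above, which is not shown here.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by 2-D keypoints a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def deviation_from_ideal(detected_joints, ideal_angles):
    """Report per-joint angular deviation from hypothetical ideal angles.
    `detected_joints` maps a joint name to its three keypoints (a, b, c)."""
    return {name: joint_angle(*pts) - ideal_angles[name]
            for name, pts in detected_joints.items()}
```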
  • FIG. 1 shows a method of training an A.I. component of the system according to an illustrative embodiment. The method may include initial steps using expert analysis to generate a training set which the A.I. component uses as a baseline to begin automatically identifying video frames showing critical points of motion when presented with a video sequence. For example, step 10 may include a data labeling process. In the data labeling process, experts may select for labeling the exact frame in a video where a critical motion takes place.
  • Step 20 may include automatic key frame labeling. In the keyframe labeling, the initial labeled data may be engineered using an automatic process that converts it from human labels to labels better understood by a machine. In some embodiments, image similarity scoring using, for example, a structural similarity index may be included prior to training. In some embodiments, using the labeled data, an automatic process may determine the optimal statistical range within which other images should be counted as keyframes for training. For example, the unlabeled video frames may be scored based on an image similarity algorithm on a [0, 1] scale, with 1 indicating that two frames are exactly the same. Other embodiments may determine the median of the distribution after removing outliers. Another embodiment may compute the mean and standard deviation and discard anything outside one standard deviation.
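  • As a minimal sketch of how such a statistical range could be computed (the patent does not specify an exact procedure, so the function name, the interquartile-range outlier removal, and the return convention below are illustrative assumptions), the following snippet takes precomputed similarity scores on a [0, 1] scale, where 1.0 means identical, and returns the range of scores to treat as keyframes:

```python
import numpy as np

def keyframe_score_range(similarity_scores, method="median"):
    """Estimate the statistical range of similarity scores (on a [0, 1] scale,
    1.0 meaning identical) within which an unlabeled frame is counted as a
    keyframe for training. Names and defaults are illustrative assumptions."""
    scores = np.asarray(similarity_scores, dtype=float)

    if method == "median":
        # Basic outlier removal via the interquartile range, then take the median.
        q1, q3 = np.percentile(scores, [25, 75])
        iqr = q3 - q1
        kept = scores[(scores >= q1 - 1.5 * iqr) & (scores <= q3 + 1.5 * iqr)]
        return float(np.median(kept)), 1.0  # accept scores from the median up to 1.0

    # Alternative embodiment: keep scores within one standard deviation of the mean.
    mean, std = scores.mean(), scores.std()
    return mean - std, mean + std
```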
  • Step 30 may include training feature extraction. In training for feature extraction, the engineered labels may then be used to train a feature extraction neural network. A custom-constructed neural network may be generated which is ideal for extracting high-level features capable of distinguishing critical points of motion in an image. Embodiments may integrate a unique, custom loss function which the model uses during training to evaluate itself and learn an optimal maximum. Through trial and error and research, the optimal amount of stochastic augmentation to apply to input images may be found. This step may alter images just enough to challenge the neural network to learn better. Too little and the network may not reach an optimal maximum; too much and the network may not learn anything.
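  • The specific augmentations and the custom loss function are not disclosed, so the following is only a hypothetical sketch of how a tunable amount of stochastic augmentation might be applied to input images before feature-extractor training. The torchvision transforms and the `strength` parameter are assumptions, not the patented method.

```python
import torchvision.transforms as T

def make_augmentation(strength=0.5):
    """Hypothetical stochastic augmentation pipeline for feature-extractor
    training. `strength` scales how aggressively images are perturbed: too
    little and the network may not reach an optimal maximum, too much and it
    may not learn anything."""
    return T.Compose([
        T.RandomResizedCrop(224, scale=(1.0 - 0.3 * strength, 1.0)),
        T.ColorJitter(brightness=0.4 * strength, contrast=0.4 * strength),
        T.RandomRotation(degrees=10 * strength),
        T.ToTensor(),
    ])
```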
  • Step 40 may include training a temporal network. The feature extraction network may be used to train the temporal model. In some embodiments, a custom-built neural network may be developed, ideal for understanding high-level features over time. These features may be extracted from the neural network in the previous step. In this step, the same loss function may be used to boost training.
  • The final output of the above process may be a neural network embodiment which may be executed/performed by computer hardware in real-time to analyze videos. The neural network takes in sequences of images, for example, video, and then identifies the best image from the set that matches a desired critical motion to analyze further. Once the neural network is trained, some embodiments may include a software embodiment which executes the processes of identifying the frames with critical points of motion for display to an end user.
  • FIGS. 2 and 3 show example applications using aspects of the subject technology. In these figures, two different sports movements are being performed. In FIG. 2, a subject 60 is performing a golf swing. In some embodiments, the subject 60's swing may be captured by a digital camera 50. The captured video may be transferred to another computing device which runs a software embodiment that identifies critical motion and displays one or more frames. FIG. 3 shows a subject 80 performing a weightlifting motion. Some embodiments may include the software loaded onto a mobile computing device 70 (for example, a smart phone, portable tablet, wearable smart device, etc.) which includes a camera and may include a digital display. The user may run the software embodiment on the captured video. The software may extract from the captured video a video frame with a critical point of motion for display on the computing device.
  • FIG. 4 shows an example use case where video captured previously (for example, by the camera 50 or mobile computing device 70) is loaded onto a computing device with a display. A plurality of frames 100 from a video sequence of a golf swing are shown simultaneously on the display in temporal order. The software may identify a frame 110 as showing a critical point in the motion of a golf swing. The identified frame 110 may be highlighted in some way so that a user 90 understands which frame requires attention. The user 90 may then examine the frame 110 for issues, if any, present in the swing motion.
  • FIG. 5 shows a process 500 of automated video frame processing according to an illustrative embodiment of the subject technology. The process 500 may include receiving 510 video sequencing data of a person's motion captured by a camera. Individual images from the video sequence may be processed 520 for feature extraction by a neural network. The extracted features are forwarded 530 to the temporal neural network. The output from the temporal neural network model may be converted 540 to timestamps from the original video sequence.
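  • A minimal sketch of process 500 is shown below, assuming a trained feature extractor and temporal model are available as callables. The function names, the per-frame criticality score, and the 0.5 threshold are illustrative assumptions rather than details taken from the disclosure.

```python
import numpy as np

def critical_frame_timestamps(frames, fps, feature_extractor, temporal_model, threshold=0.5):
    """Sketch of process 500: extract per-frame features, score the sequence with
    the temporal model, and convert flagged frame indices to timestamps (seconds)."""
    features = np.stack([feature_extractor(frame) for frame in frames])  # step 520
    scores = np.asarray(temporal_model(features))                        # step 530
    key_indices = np.flatnonzero(scores >= threshold)
    return [float(idx) / fps for idx in key_indices]                     # step 540
```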
  • FIG. 6 shows a process 600 for automated identification of frames for video analysis of critical motion according to an illustrative embodiment. Raw video sequence data may be retrieved 605 from a camera. In some embodiments, frames in the raw video data may be selected and/or labeled 610 with timestamps. The frames selected may be chosen based on showing a critical motion within a movement sequence inside the frame. The labels may be human determined. A processor converts 615 the human determined labels into machine readable labels. The processor may train 620 a neural network for feature extraction using the machine readable labels. The features used for training the neural network may be from the raw video frames and, in some embodiments, other video. Output data from the feature extractor neural network may be used to train 625 the temporal neural network.
  • In general, the feature extractor module converts a single two-dimensional image into a compressed representation optimal for temporal training. The temporal model uses that representation to learn the underlying sequence of events and when the critical frames occur. The temporal neural network is trained on the input feature vectors from the feature extractor neural network and then predicts the critical key frames using these feature vectors. Training is split into two steps in some embodiments because the computational power and memory required to construct a single monolithic neural network that goes from video images directly to critical frames may be too heavy for some computing systems. The subject approach is highly optimized: splitting training into two steps (feature extractor and temporal model) requires significantly less computational power and memory and is significantly faster to construct. The feature extractor neural network essentially transforms/maps an image to a latent (hidden space) representation. The latent space is a compressed representation that optimizes training for the temporal model. The temporal model uses the latent space representation to predict the critical frames. The temporal model also learns the underlying temporal patterns between critical frames, for example, the sequence of events and the time that passes between critical frames.
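  • The disclosure does not name specific architectures for the two stages, so the following PyTorch sketch uses a small convolutional encoder and an LSTM purely as placeholders. It only illustrates the split between a feature extractor that maps each image to a latent vector and a temporal model that scores the latent sequence for critical frames; the layer choices and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Maps a single 2-D image to a compressed latent vector (placeholder CNN)."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, x):                        # x: (batch, 3, H, W)
        return self.fc(self.conv(x).flatten(1))  # -> (batch, latent_dim)

class TemporalModel(nn.Module):
    """Scores a sequence of latent vectors for critical frames (placeholder LSTM)."""
    def __init__(self, latent_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.rnn = nn.LSTM(latent_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, latents):                  # latents: (batch, time, latent_dim)
        out, _ = self.rnn(latents)
        return torch.sigmoid(self.head(out)).squeeze(-1)  # (batch, time) scores
```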
  • FIG. 7 shows a computer implemented method 700 for converting human labels into machine readable labels according to an illustrative embodiment. The processor receives 710 human labeled video frames. For a unique timestamp, a video frame is extracted 710. The extracted video frame is labeled 720 and placed in a labeled category. Some or all of the remaining frames within the same timestamp are extracted and placed 730 into an unlabeled category. A temporal distance may be calculated 740 between all non-labeled frames in a video and the labeled critical key frames. Unlabeled frames are associated 750 with the labeled frame to which they have the shortest temporal distance. For each labeled frame and its unlabeled frames, the processor generates 760 an image comparison score based on similarity. The similarity score may be calculated using different algorithms. For example, one possible similarity algorithm may be based on determining a normalized root mean squared error. Another algorithm may be based on a structural similarity index. As will be understood, other similarity score bases may be used without departing from the scope of the disclosed embodiments. In some embodiments, scores may be aggregated and statistically analyzed 770 to find optimal ranges. For each labeled frame and its associated unlabeled frames, if an unlabeled frame's similarity score falls within a statistical range, the unlabeled frame may be labeled 780. Any remaining unlabeled frames may be placed 790 into a no-label category.
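  • A minimal sketch of steps 740 and 750 appears below, assuming frame timestamps in seconds: each unlabeled frame is grouped under the labeled keyframe nearest to it in time (for example, with labeled keyframes at 1.2 s and 3.4 s, an unlabeled frame at 1.5 s would be grouped under 1.2 s). The function name and the dictionary-of-lists return shape are illustrative assumptions.

```python
def associate_unlabeled_frames(labeled_times, unlabeled_times):
    """Attach each unlabeled frame to the labeled critical keyframe with the
    shortest temporal distance (steps 740-750). Timestamps are in seconds."""
    groups = {t: [] for t in labeled_times}
    for u in unlabeled_times:
        nearest = min(labeled_times, key=lambda t: abs(t - u))
        groups[nearest].append(u)
    return groups
```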
  • The statistical range for determining whether an unlabeled frame should be converted into a labeled frame may be based on a distribution curve of all the similarity values calculated for a specific keyframe. For example, a normalized root mean squared error may be used as a similarity algorithm. Using this algorithm, comparing the specific keyframe to itself would produce a value of 0.0 because the images are exactly equal. As a particular frame gets further away in time from the keyframe, the similarity value may go up, stopping at 1.0 because it is normalized from 0.0 to 1.0. The system may collect all these values for a specific keyframe across all the videos in the training dataset. Assuming, for example, there are 2,000 similarity metrics for a keyframe, a statistical analysis may include removing outliers. For simplicity, for a process that includes basic outlier removal, the median value of the list of similarity metrics may be determined. If the median value is, for example, 0.25, this value may be used to determine which of the unlabeled frames associated with a keyframe receive a label. With this similarity algorithm, the higher the value, the more dissimilar the frame is. So, any frame with a score greater than 0.25 (using the illustrative value above) is placed into the no-label category. Conversely, if the score is less than or equal to 0.25, it is labeled as a keyframe.
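  • The worked example above can be sketched in code as follows. The normalized root mean squared error shown here divides by the keyframe's intensity range, which is one common normalization and an assumption, since the patent does not specify the exact formula; the 0.25 threshold is the illustrative median from the text, not a fixed parameter of the method.

```python
import numpy as np

def nrmse(keyframe, candidate):
    """Normalized root mean squared error between two frames; 0.0 for identical
    images. Normalizing by the keyframe's intensity range is one common choice."""
    a = np.asarray(keyframe, dtype=float)
    b = np.asarray(candidate, dtype=float)
    rmse = np.sqrt(np.mean((a - b) ** 2))
    span = a.max() - a.min()
    return rmse / span if span else 0.0

def split_by_threshold(keyframe, candidates, threshold=0.25):
    """Frames at or below the (illustrative) median threshold are labeled as
    keyframes; the rest go into the no-label category."""
    labeled, no_label = [], []
    for frame in candidates:
        (labeled if nrmse(keyframe, frame) <= threshold else no_label).append(frame)
    return labeled, no_label
```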
  • During the development of one embodiment, domain experts labelled a start and end point for a critical time period in a video. This resulted in inconsistent data due to humans' inability to consistently perceive tiny visual changes in images across many video sequences. These aggregated inconsistencies can introduce too much randomness to be modeled statistically. As will be appreciated, the automated portions of the subject technology (for example, via computer recognition) are more adept at identifying tiny visual changes and can manage them consistently across a wide variety of video sequences. This produces a statistically stable distribution.
  • As will be appreciated, users may benefit from the various aspects of the subject technology when applied to, for example, analyzing the motion of a movement to identify flaws or inefficiencies in the movement (for example, hitches, improper alignment of a body part, etc.). The output is provided nearly instantaneously in a software application embodiment which displays a frame with a critical point of motion right after the video is taken. Users are able to use the application anywhere and may do so on site where they are performing a motion. For example, golfers may video their golf stroke during a round and, through the subject technology, see on a display why their shot went awry.
  • FIG. 8 illustrates an example architecture 800 for generating real-time automatic video key frame analysis for critical motion in complex athletic movements. Architecture 800 includes a network 806 that allows various computing devices 802(1) to 802(N) to communicate with each other, as well as other elements that are connected to the network 806, such as an image input data source 812, an image analytics service server 816, and the cloud 820.
  • The network 806 may be, without limitation, a local area network (“LAN”), a virtual private network (“VPN”), a cellular network, the Internet, or a combination thereof. For example, the network 806 may include a mobile network that is communicatively coupled to a private network, sometimes referred to as an intranet that provides various ancillary services, such as communication with various application stores, libraries, and the Internet. The network 806 allows the image critical points analytics engine 810, which is a software program running on the image analytics service server 816, to communicate with the image input data source 812, computing devices 802(1) to 802(N), and the cloud 820, to provide analysis of critical points in complex movements. In one embodiment, the data processing is performed at least in part on the cloud 820.
  • For purposes of later discussion, several user devices appear in the drawing, to represent some examples of the computing devices that may be the source of image data. Image data may be in the form of video sequence data files that may be communicated over the network 806 with the image critical points analytics engine 810 of the image analytics service server 816. Today, user devices typically take the form of portable handsets, smart-phones, tablet computers, personal digital assistants (PDAs), and smart watches, which generally include cameras integrated into their respective device packaging.
  • For example, a computing device (e.g., 802(N)) may send a request 103(N) to image critical points analytics engine 810 to analyze the features of video sequence data captured by the computing device 802(N), so that critical points in the motion captured are identified and analyzed for deviation from an ideal form/positioning.
  • While the image input data source 812 and image critical points analytics engine 810 are illustrated by way of example to be on different platforms, it will be understood that in various embodiments, the image input data source 812 and the image analytics service server 816 may be combined. In other embodiments, these computing platforms may be implemented by virtual computing devices in the form of virtual machines or software containers that are hosted in a cloud 820, thereby providing an elastic architecture for processing and storage.
  • As discussed above, functions relating to critical point motion analysis of the subject disclosure can be performed with the use of one or more computing devices connected for data communication via wireless or wired communication, as shown in FIG. 8. FIG. 9 is a functional block diagram illustration of a computer hardware platform that can communicate with various networked components, such as a training input data source, the cloud, etc. In particular, FIG. 9 illustrates a network or host computer platform 900, as may be used to implement a server, such as the image analytics service server 816 of FIG. 8.
  • The computer platform 900 may include a central processing unit (CPU) 904, a hard disk drive (HDD) 906, random access memory (RAM) and/or read only memory (ROM) 908, a keyboard 910, a mouse 912, a display 914, and a communication interface 916, which are connected to a system bus 902.
  • In one embodiment, the HDD 906 has capabilities that include storing a program that can execute various processes, such as the neural network engine 940, in a manner described herein. The neural network engine 940 may be part of the image analytics service server 816 of FIG. 8. Generally, the neural network engine 940 may be configured to analyze image data for critical points during a complex movement under the embodiments described above. The neural network engine 940 may have various modules configured to perform different functions. In some embodiments, the neural network engine 940 may include sub-modules, for example, a feature extraction module 942, an image temporal analyzer 944, a temporal distance calculator 946, an image similarity engine 948, an image label categorization module 950, and a machine learning module 956. The functions of these sub-modules have been discussed above in the various flowcharts of, for example, FIGS. 1, 5, 6, and 7.
  • In another embodiment, the computer platform 900 may represent an end user computer (for example, computing devices 802(1) to 802(N)) of FIG. 8. In this context, the various end user computer platforms 900 may include mixed configurations of hardware elements and software packages.
  • As will be appreciated by one skilled in the art, aspects of the disclosed invention may be embodied as a system, method or process, or computer program product. Accordingly, aspects of the disclosed invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Furthermore, aspects of the disclosed invention may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
  • Any combination of one or more computer readable media may be utilized. In the context of this disclosure, a computer readable storage medium may be any tangible or non-transitory medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • Aspects of the disclosed invention are described below with reference to block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. The previous description provides various examples of the subject technology, and the subject technology is not limited to these examples. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. Pronouns in the masculine (e.g., his) include the feminine and neuter gender (e.g., her and its) and vice versa. Headings and subheadings, if any, are used for convenience only and do not limit the invention.
  • A phrase such as an “aspect” does not imply that such aspect is essential to the subject technology or that such aspect applies to all configurations of the subject technology. A disclosure relating to an aspect may apply to all configurations, or one or more configurations. An aspect may provide one or more examples. A phrase such as an aspect may refer to one or more aspects and vice versa. A phrase such as an “embodiment” does not imply that such embodiment is essential to the subject technology or that such embodiment applies to all configurations of the subject technology. A disclosure relating to an embodiment may apply to all embodiments, or one or more embodiments. An embodiment may provide one or more examples. A phrase such as an embodiment may refer to one or more embodiments and vice versa. A phrase such as a “configuration” does not imply that such configuration is essential to the subject technology or that such configuration applies to all configurations of the subject technology. A disclosure relating to a configuration may apply to all configurations, or one or more configurations. A configuration may provide one or more examples. A phrase such as a configuration may refer to one or more configurations and vice versa.
  • The word “exemplary” is used herein to mean “serving as an example or illustration.” Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs.

Claims (17)

What is claimed is:
1. A method for identifying critical points of motion in a video, comprising:
using a feature extraction neural network trained to identify features among a plurality of images of movements, wherein the features are associated with a critical point of movement;
receiving a video sequence including a plurality of video frames comprising a motion captured by a camera;
identifying, by the feature extraction neural network, individual frames from the plurality of video frames, wherein the identified individual frames include a known critical point in the captured motion based on identified features associated with a critical point of movement; and
displaying the identified frames which show the critical points in the captured motion.
2. The method of claim 1, further comprising training a temporal neural network based on input from the feature extraction neural network.
3. The method of claim 2, further comprising converting an output of the temporal neural network into time stamps of the plurality of video frames.
4. A method of automated identification of frames for video analysis of critical motion, comprising:
receiving a raw sequence of video data;
receiving selections of the raw sequence of video data, wherein the selections include manually entered labels;
automatically converting the manually entered labels to machine readable labels; and
training a neural network feature extractor based on the machine readable labels to identify video frames that include critical points of motion.
5. The method of claim 4, further comprising training a temporal neural network based on an output of the neural network feature extractor.
6. The method of claim 4, further comprising:
extracting a selected video frame for each unique timestamp;
labeling the extracted video frames;
grouping remaining unlabeled video frames; and
calculating a temporal distance between labeled video frames and unlabeled video frames.
7. The method of claim 6, further comprising associating unlabeled video frames with a labeled video frame based on a shortest temporal distance from each unlabeled video frame to the labeled video frames.
8. The method of claim 7, further comprising generating a similarity score for each labeled video frame and its associated unlabeled video frames based on an image comparison of content in the labeled video frame and the labeled video frame's associated unlabeled video frames.
9. The method of claim 8, further comprising labeling one of the unlabeled video frames in the event the similarity score for the labeled video frame and the labeled video frame's associated unlabeled video frames falls within a threshold range of values.
10. The method of claim 9, further comprising using the one of the unlabeled video frames that has been labeled in training the neural network feature extractor to identify video frames that include critical points of motion.
11. A computer program product for automated identification of frames for video analysis of critical motion, the computer program product comprising:
one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising:
receiving a raw sequence of video data;
receiving selections of the raw sequence of video data, wherein the selections include manually entered labels;
automatically converting the manually entered labels to machine readable labels; and
training a neural network feature extractor based on the machine readable labels to identify video frames that include critical points of motion.
12. The computer program product of claim 11, wherein the program instructions further comprise training a temporal neural network based on an output of the neural network feature extractor.
13. The computer program product of claim 11, wherein the program instructions further comprise:
extracting a selected video frame for each unique timestamp;
labeling the extracted video frames;
grouping remaining unlabeled video frames; and
calculating a temporal distance between labeled video frames and unlabeled video frames.
14. The computer program product of claim 13, wherein the program instructions further comprise associating unlabeled video frames with a labeled video frame based on a shortest temporal distance from each unlabeled video frame to the labeled video frames.
15. The computer program product of claim 14, wherein the program instructions further comprise generating a similarity score for each labeled video frame and its associated unlabeled video frames based on an image comparison of content in the labeled video frame and the labeled video frame's associated unlabeled video frames.
16. The computer program product of claim 15, wherein the program instructions further comprise labeling one of the unlabeled video frames in the event the similarity score for the labeled video frame and the labeled video frame's associated unlabeled video frames falls within a threshold range of values.
17. The computer program product of claim 16, wherein the program instructions further comprise using the one of the unlabeled video frames that has been labeled in training the neural network feature extractor to identify video frames that include critical points of motion.
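By way of illustration only, the following minimal Python sketch (all function names and heuristics are hypothetical stand-ins, not the claimed neural networks) indicates one way the inference path of claims 1 through 3 could be organized: a per-frame feature-extraction stage, a temporal scoring stage over the extracted features, and conversion of high-scoring frames into time stamps.

```python
from typing import List, Tuple

import numpy as np


def extract_feature(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the feature extraction neural network: a per-frame embedding."""
    hist, _ = np.histogram(frame, bins=32, range=(0, 255))
    return hist.astype(float) / max(int(hist.sum()), 1)


def temporal_scores(embeddings: List[np.ndarray], window: int = 3) -> np.ndarray:
    """Stand-in for the temporal neural network: smoothed frame-to-frame change."""
    change = np.array([0.0] + [float(np.abs(a - b).sum())
                               for a, b in zip(embeddings[1:], embeddings[:-1])])
    kernel = np.ones(window) / window
    return np.convolve(change, kernel, mode="same")


def critical_timestamps(frames: List[np.ndarray], fps: float,
                        threshold: float = 0.5) -> List[Tuple[int, float]]:
    """Convert high-scoring frames into (frame index, time stamp) pairs for display."""
    scores = temporal_scores([extract_feature(f) for f in frames])
    return [(i, i / fps) for i, s in enumerate(scores) if s > threshold]
```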
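Similarly, the following sketch (again with hypothetical names, and a single scalar threshold standing in for the claimed threshold range of values) illustrates the label-propagation steps recited in claims 6 through 10 and 13 through 17: each unlabeled frame is associated with the temporally nearest labeled frame, a similarity score is generated for the pair, and the label is copied when the score is high enough, after which the newly labeled frames can be folded back into the training data for the neural network feature extractor.

```python
from typing import Dict, Tuple

import numpy as np


def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Toy image-comparison score in [0, 1], where 1.0 means identical frames."""
    diff = float(np.abs(a.astype(float) - b.astype(float)).mean()) / 255.0
    return 1.0 - diff


def propagate_labels(
    labeled: Dict[float, Tuple[np.ndarray, str]],   # timestamp -> (frame, label)
    unlabeled: Dict[float, np.ndarray],             # timestamp -> frame
    threshold: float = 0.9,
) -> Dict[float, str]:
    """Assign labels to unlabeled frames by temporal proximity plus visual similarity."""
    new_labels: Dict[float, str] = {}
    for t_u, frame_u in unlabeled.items():
        # Shortest temporal distance to any labeled frame (claims 7 and 14).
        t_l = min(labeled, key=lambda t: abs(t - t_u))
        frame_l, label = labeled[t_l]
        # Similarity score for the associated pair (claims 8 and 15); copy the
        # label when the score is high enough (simplified form of claims 9 and 16).
        if similarity(frame_l, frame_u) >= threshold:
            new_labels[t_u] = label
    # The newly labeled frames can then join the training data for the
    # neural network feature extractor (claims 10 and 17).
    return new_labels
```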