WO2024059156A1 - Systems and methods for marksmanship improvement through machine learning - Google Patents

Systems and methods for marksmanship improvement through machine learning

Info

Publication number
WO2024059156A1
Authority
WO
WIPO (PCT)
Prior art keywords
shooter
shot
target
score
data
Prior art date
Application number
PCT/US2023/032669
Other languages
French (fr)
Inventor
Rick Hangartner
Francisco Martin
Poul Petersen
Ken Baldwin
Jim Shur
Sergio DE SIMONE
Beatriz Garcia
Oscar Rovira
Pablo Gonzalez
Alvaro CLEMENTE
Candido ZURIAGA
Original Assignee
AccuShoot, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AccuShoot, Inc. filed Critical AccuShoot, Inc.
Publication of WO2024059156A1 publication Critical patent/WO2024059156A1/en


Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/003 Repetitive work cycles; Sequence of movements
    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F41 WEAPONS
    • F41A FUNCTIONAL FEATURES OR DETAILS COMMON TO BOTH SMALLARMS AND ORDNANCE, e.g. CANNONS; MOUNTINGS FOR SMALLARMS OR ORDNANCE
    • F41A33/00 Adaptations for training; Gun simulators
    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F41 WEAPONS
    • F41G WEAPON SIGHTS; AIMING
    • F41G3/00 Aiming or laying means
    • F41G3/26 Teaching or practice apparatus for gun-aiming or gun-laying
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00 Simulators for teaching or training purposes
    • G09B9/003 Simulators for teaching or training purposes for military purposes and tactics
    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F41 WEAPONS
    • F41G WEAPON SIGHTS; AIMING
    • F41G3/00 Aiming or laying means
    • F41G3/26 Teaching or practice apparatus for gun-aiming or gun-laying
    • F41G3/2616 Teaching or practice apparatus for gun-aiming or gun-laying using a light emitting device
    • F41G3/2694 Teaching or practice apparatus for gun-aiming or gun-laying using a light emitting device for simulating a target

Definitions

  • a system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions.
  • One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.
  • One general aspect includes a method for improving shooting performance. The method also includes receiving video data of a shooter; determining one or more body landmarks of the shooter, tracking the one or more body landmarks during a shot to generate shot motion data, determining a score of the shot, associating the shot motion data with the score, and generating recommendations for altering the motion data on a subsequent shot.
  • Implementations may include one or more of the following features.
  • the method where determining one or more body landmarks of the shooter may include generating a wire frame model by connecting the body landmarks.
  • Associating the shot motion data with the score may include executing a classification and regression tree machine learning model to identify a causal relationship between the shot motion data and the score.
  • the method may include determining, through image analysis of the video data of the shooter, a grip of the shooter.
  • the method may include analyzing the grip of the shooter and providing, on a display screen, grip recommendations to alter the grip.
  • Determining the score of the shot may include: receiving target video data; performing image analysis on the received target video data; determining a hit on the target; and determining a score of the hit.
  • Receiving the video data includes capturing video data by a mobile phone.
  • Determining one or more body landmarks includes determining 17 body landmarks. Tracking the one or more body landmarks includes generating a bounding box around each of the one or more body landmarks.
  • One general aspect includes a method for improving a causal consequence of body movement.
  • the method also includes receiving video data of a body motion; determining one or more body landmarks viewable in the video data of the body motion; tracking the one or more body landmarks during an action; generating, based at least in part on the tracking the one or more body landmarks, motion data; determining a score associated with the motion data; associating the motion data with the score; and generating recommendations for altering the motion data on a subsequent action.
  • Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
  • Implementations may include one or more of the following features.
  • the method where receiving the video data includes capturing the video data by a mobile computing device.
  • the method may include executing a machine learning model to correlate the motion data with the score.
  • the machine learning model is configured to determine the motion data that results in a reduced score.
  • the method may include predicting, by the machine learning model, a predicted score based on the motion data.
  • FIG.1A illustrates an image of a shooter captured by an image capture device, in accordance with some embodiments.
  • FIG.1B illustrates a wire frame model of the shooter from FIG.1A, in accordance with some embodiments.
  • FIG.2 illustrates motion data associated with a plurality of body landmarks, in accordance with some embodiments.
  • FIG.3 illustrates motion data associated with a plurality of body landmarks, and automatic shot detection based on the motion data, in accordance with some embodiments.
  • FIG.4 illustrates analysis of motion data associated with a plurality of body landmarks and identifying shooting errors based on the motion data, in accordance with some embodiments.
  • FIG.5 illustrates analysis of motion data associated with a plurality of body landmarks and identifying shooting errors based on the motion data, in accordance with some embodiments.
  • FIG.6 illustrates analysis of motion data associated with a plurality of body landmarks, in accordance with some embodiments.
  • FIG.7 illustrates analysis of motion data associated with a plurality of body landmarks and identifying a change in pose that leads to improvements in performance, in accordance with some embodiments.
  • FIG.8A illustrates an image of a shooter taken with an imaging device, in accordance with some embodiments.
  • FIG.8B illustrates a computer-generated wire frame model of the shooter of FIG. 8A with identified body landmarks, in accordance with some embodiments.
  • FIG.9 illustrates motion data associated with a plurality of body landmarks and identifying body motion leading to reduced performance, in accordance with some embodiments.
  • FIGs.10A and 10B illustrate diagnostic targets that identify shooter behaviors based on shot patterns, in accordance with some embodiments.
  • FIG.11 illustrates an image of a shooter’s hand and a computer-generated wire frame model associated with body landmarks that identify and analyze a shooter’s grip, in accordance with some embodiments.
  • FIG.12A is an illustration of a computer software program user interface that captures video images, tracks motion of body landmarks, and analyzes a shooter’s pose and motion data, in accordance with some embodiments.
  • FIG.12B is an illustration of a computer software program user interface that analyzes a shooter’s performance and automatically scores the performance, in accordance with some embodiments.
  • FIGs.13A and 13B illustrate a system for marksmanship digitizing and analyzing, in accordance with some embodiments.
  • FIG.13C illustrates a system for marksmanship digitizing and analyzing, in accordance with some embodiments.
  • FIG.14 illustrates a system for marksmanship digitizing and analyzing, in accordance with some embodiments.
  • FIG.15 illustrates the logic of a machine learning algorithm to quantify shot samples, in accordance with some embodiments.
  • FIG.16 illustrates a sample decision tree machine learning model that correlates motion data of body landmarks with shot performance, in accordance with some embodiments.
  • FIG.17 illustrates an annotated decision tree machine learning model that correlates motion data of body landmarks with shot performance, in accordance with some embodiments.
  • FIG.18 illustrates a pruned decision tree machine learning model that correlates motion data of body landmarks with shot performance, in accordance with some embodiments.
  • FIG.19A illustrates a system for capturing video data of a shooter and a target, in accordance with some embodiments.
  • FIGs.19B and 19C illustrate image data of a left-hand view and a right-hand view of a shooter captured by imaging devices, in accordance with some embodiments.
  • FIG.19D illustrates image data of an overhead view of a shooter captured by an imaging device, in accordance with some embodiments.
  • FIG.19E illustrates image data of a target captured by an imaging device, in accordance with some embodiments.
  • FIG.20 illustrates a sample user interface of a computer application configured to receive, analyze, and score marksmanship performance, in accordance with some embodiments.
  • FIG.21 illustrates a sample user interface of a computer application configured to receive, analyze, and score marksmanship performance showing shot placement, and automated scoring, in accordance with some embodiments.
  • FIG.22 illustrates a sample user interface of a computer application configured to receive, analyze, and score marksmanship performance allowing a selection of body landmarks and showing the motion data associated with the selected body landmarks, in accordance with some embodiments.
  • FIG.23 illustrates several of the key technologies enabled in embodiments described herein along with the features the key technologies facilitate.
  • FIG.24 illustrates a system that uses multiple-source collaboration and machine learning to detect shots fired, to analyze shooting position and provide recommendations for improvement, and to analyze grip position and provide recommendations for improvement, together with a sample user interface of a computer application configured to receive, analyze, and score marksmanship performance, in accordance with some embodiments.
  • FIG.25 is a process flow for capturing motion data, analyzing the motion data, and generating recommendations for improvement, in accordance with some embodiments.
  • FIG.26 is a process flow for correlating a firearm with an individual shooter, in accordance with some embodiments.
  • FIG.27 illustrates a sample system for determining an acoustical signature, in accordance with some embodiments.
  • FIG.28 illustrates a sample flow chart for pre-training a DNN, in accordance with some embodiments;
  • FIG.29 illustrates a sample flow chart for deriving spectrogram weighting, in accordance with some embodiments;
  • FIG.30 illustrates a sample flow chart for classifying and determining an acoustic signature.
  • FIG.31 illustrates a system configured for automatic scoring of a shooting target, in accordance with some embodiments.
  • FIG.32 illustrates a sample process flow for classifying and scoring a target, in accordance with some embodiments
  • FIG.33 illustrates a sample process flow for identifying and classifying a target, in accordance with some embodiments
  • FIG.34 illustrates a sample process flow for registering a target and determining scoring hits, in accordance with some embodiments
  • FIGs.35A, 35B, and 35C illustrate a method for initializing a target scoring system to identify a target, in accordance with some embodiments
  • FIG.36 illustrates a sample process flow for detecting impacts on a target, in accordance with some embodiments
  • FIG.37 illustrates a sample process flow for scoring impacts on a target, in accordance with some embodiments.
  • FIG.38 illustrates a sample user interface for automated target scoring in a software application, in accordance with some embodiments.
  • DETAILED DESCRIPTION: Body Landmark and Pose Estimation
  • the system includes a machine vision and machine learning system that can track and estimate the pose of a participant, and determine positive and negative factors impacting the quality of the participation, such as pose, motion, anticipation, recoil, grip, stance, among other things.
  • a computer vision system that can track several body landmarks simultaneously, and in some cases, associate the motion of one or more body landmarks with marksmanship accuracy.
  • a system may identify and track any number of body landmarks, such as 3, or 5, or 11, or 17, or 21, or 25, or 30, or more body landmarks.
  • the system can associate motion of the landmarks with rounds sent down range and scoring of individual rounds.
  • the detection of rounds sent down range may be determined by motion of one or more suitable markers, such as motion of the participant's hand or wrist (or other body marker) in response to recoil, a sound of the firearm, a pressure wave associated with the muzzle blast, a target hit, or some other marker.
  • the system may further monitor the participant from one, two, three, or more perspectives and analyze the movement of each body landmark, and may further monitor shot accuracy and correlate the body landmark motion with accuracy. Based on the accuracy, the system may further provide an analysis of the body motion that causes less than perfect accuracy and may further suggest ways to ameliorate the body motion to improve accuracy.
  • the motion capture may be performed by one or more cameras aimed generally at the participant, and one or more cameras aimed at a target.
  • one or more of the cameras are associated with a mobile computing device, such as, for example, a smartphone, a tablet, a laptop, a digital personal assistant, and a wearable device (e.g., watch, glasses, body cam, smart hat, etc.).
  • a wearable device may include a sensor, such as an accelerometer, a vibration sensor, a motion sensor, or other sensor to provide motion data to the system.
  • the system tracks body marker position over time and generates motion plots.
  • one or more cameras may capture one or more views of a shooter 100.
  • the camera may capture video data of the shooter as the shooter draws, takes aim, fires a shot, reloads, and/or adjusts position.
  • a computer system may receive the video data, analyze the video data, and create a model associated with the shooter, as in FIG.1B.
  • the computer system identifies body landmarks, and connects the body landmarks into a wire frame model 102 that tracks the pose and movements of the body landmarks of the shooter.
  • the body landmarks may include one or more of nose, left ear, left eye, left hip, left knee 103, right ear, right eye, right hip, left ankle, left elbow, left wrist, right knee 104, right ankle, right elbow 106, right wrist 108, left shoulder, and right shoulder 110.
  • a single camera may capture two-dimensional motion data associated with one or more of the body landmarks.
  • two or more cameras may be used to capture three-dimensional motion data of one or more of the body landmarks.
  • the body landmarks may be tracked over time, such as during a shooting string (e.g., a shooting session) and the motion of one or more of the body landmarks may be tracked during this time.
  • two-dimensional motion is tracked in x and y directions corresponding with side-to-side movement and vertical movement.
  • three-dimensional movement of the body landmarks is tracked in x, y, and z directions.
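  • The landmark tracking described above can be sketched briefly in code. The following is an illustrative example only, not the patent's implementation: `detect_landmarks` stands in for a hypothetical 17-keypoint pose-estimation call (for example, a MoveNet- or MediaPipe-style model), and the output is a per-landmark time series of 2D positions suitable for the motion plots discussed in the following bullets.

```python
# Minimal sketch (not the patent's implementation): accumulate per-frame 2D
# landmark positions into motion time series for later analysis.
# `detect_landmarks(frame)` is a hypothetical pose-estimation call returning a
# dict of {landmark_name: (x, y)} in normalized image coordinates.

from collections import defaultdict

LANDMARKS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def track_landmarks(frames, detect_landmarks):
    """Return {landmark: [(t, x, y), ...]} motion data for a sequence of frames."""
    motion = defaultdict(list)
    for t, frame in enumerate(frames):          # t is the frame index (time)
        points = detect_landmarks(frame)        # hypothetical detector
        for name in LANDMARKS:
            if name in points:
                x, y = points[name]
                motion[name].append((t, x, y))
    return motion
```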
  • a graph of selected body landmarks 200 is illustrated.
  • the body landmarks can be selectable by the user to focus on individual body landmarks, or combinations of body landmarks, for review.
  • the topmost line 202 represents the right wrist of a right-handed shooter in a horizontal direction
  • the third line down 203 represents the right wrist vertical direction over time.
  • the wrist moves up and down during the course of the motion capture.
  • the system may correlate the motion of the one or more body landmarks with events or stages, during the shooting string.
  • the system may correlate this position and movement with getting ready to begin.
  • the second stage 206 which shows the right wrist moving upwardly over a very short interval may be correlated with drawing a pistol from a holster.
  • the third stage 208 shows the right wrist remaining largely stable in the vertical plane; however, there are sharp peaks to the movement, 207a-207c which may be correlated with shots fired from the pistol.
  • in the fourth stage 210, the right wrist moves downwardly and then returns to the firing position.
  • the system may correlate this motion with a reloading operation.
  • the system is trained on training data, which may be supervised learning, to correlate similar motion with the various stages.
  • the shooter’s right wrist initially moves upwardly as the shooter takes aim, and then settles down onto the target, which is followed by peaks in the movement 213a-213c, which may be correlated with shots being fired down range.
  • the right wrist moves downwardly again to an initial position, which may be correlated with holstering the pistol.
  • while the example focused on the shooter's right wrist in a vertical direction, it should be apparent that any of the body landmarks, including groups of body landmarks, can be viewed and analyzed, and the motion or combinations of motion can be correlated with events or actions by the shooter.
  • a closeup view of the shooting stage 300 is depicted illustrating the sharp peaks in the right wrist movement 302 and right elbow movement 304 in the vertical direction.
  • the system can analyze the motion data and automatically determine when a shot has been fired.
  • the system can be configured to correlate the sharp peaks in vertical motion of the wrist and/or elbow with the shots fired. As shown in FIG.3, each of the arrows 306 may coincide with a shot being fired.
  • audio data may be correlated with the motion data to provide additional cues as to when a shot is fired. In some cases, the audio data may be combined and/or synched with the motion data to provide additional details about a shot being fired.
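  • As an illustration of the automatic shot detection described above, a minimal sketch might look for sharp spikes in the vertical motion of the shooting-hand wrist. The threshold values below are assumptions for illustration, not values from the patent, and audio cues could be layered on top as noted above.

```python
# Hedged sketch: detect candidate shots as sharp peaks in the vertical (y)
# motion of the shooting-hand wrist. Thresholds are illustrative only.

import numpy as np
from scipy.signal import find_peaks

def detect_shots_from_motion(wrist_y, fps, min_prominence=0.02, min_gap_s=0.25):
    """wrist_y: 1D array of the right-wrist vertical coordinate per frame.
    Returns frame indices of candidate shots (recoil-induced spikes)."""
    # Work on the frame-to-frame change so a sudden muzzle rise stands out.
    velocity = np.abs(np.diff(wrist_y, prepend=wrist_y[0]))
    peaks, _ = find_peaks(
        velocity,
        prominence=min_prominence,              # how sharp the spike must be
        distance=max(1, int(min_gap_s * fps)),  # minimum spacing between shots
    )
    return peaks
```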
  • the motion data may be used to infer additional information regarding the shooter, his habits, his posture, and other cues that may be brought to the attention of the shooter in an effort to improve the shooter’s accuracy.
  • motion data 400 is displayed for the right wrist 402 and right elbow 404 of a shooter.
  • the system may apply a trend curve 406 to the motion data that may represent normalized motion data.
  • the system may make inferences and/or determinations based on the motion data 400. For example, as shown in FIG.4, once a shooter places the firearm on target and attempts to hold the firearm steady, at 408, the shooter will lower the wrist and elbow, at 410, immediately followed by a shot fired 412.
  • the system may recognize this pattern and determine that the motion of lowering the wrist and/or elbow immediately before a shot is evidence of the shooter trying to anticipate the recoil of the firearm and attempting to brace against it. In many cases, anticipating the recoil dramatically reduces the accuracy of the shot since the shooter is moving the firearm off target in anticipation of the recoil that happens as the shot is fired. Similar examples of motion data that may reduce the shooter’s accuracy include flinch, pre-ignition push, trigger jerk, closing of the eyes, among others. [0069] In some cases, the system may provide information to the shooter regarding the recoil anticipation and provide information, which may include one or more drills or practice sessions, in an effort to improve the shooter’s motion data relating to recoil anticipation.
  • the system may identify a practice regimen that may include dry firing, skip loading (e.g., mixing live rounds in a magazine of dummy rounds), or other skill building drills.
  • FIG.4 also shows that the shooter is experiencing drift in the posture. For instance, before the first shot 412, the shooter's right wrist and right arm are positioned at a first vertical height 414, and before the second shot 416, the shooter's right wrist and right elbow are positioned at a second vertical height 418 higher than the first vertical height. This data shows that the shooter did not return to the same position from the first shot to the second shot, and as a consequence, the shooter's sight picture will be slightly different, which may reduce the accuracy of sequential shots.
  • motion data 500 associated with a right-handed shooter's right wrist 502 and right elbow 504 shows not only recoil anticipation, where the motion data shows a lowering of the body parts right before a shot, but also a trend line 506 evidencing that the shooter's wrist and elbow continually drift upwardly during a shooting string. The failure of the shooter to return to the same position in between shots can dramatically reduce the accuracy and precision of the shots fired within the string.
  • the system will recognize the drift in one or more of the body landmarks and may provide this information to the shooter. In some cases, the system will provide information on a display screen associated with a mobile computing device.
  • the system may be implemented on a mobile computing device associated with a shooter, and a display screen on the mobile computing device may provide information, instructions, or practice drills to the shooter to improve the drift and the accuracy issues resulting therefrom.
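  • A hedged sketch of how the drift and recoil-anticipation cues discussed above might be quantified: fit a linear trend to the wrist height sampled just before each detected shot (drift), and measure any dip in wrist height in the final fraction of a second before the shot (anticipation). The window size and sign convention are assumptions.

```python
# Illustrative sketch: drift and pre-shot dip from wrist-height motion data.
# Assumes wrist_y increases with wrist height and at least two detected shots.

import numpy as np

def analyze_string(wrist_y, shot_frames, fps, dip_window_s=0.2):
    """wrist_y: per-frame wrist height; shot_frames: frame indices of shots."""
    pre_shot_heights = np.array([wrist_y[max(f - 1, 0)] for f in shot_frames])
    # Drift: slope of the pre-shot height versus shot number.
    drift_per_shot = np.polyfit(np.arange(len(pre_shot_heights)), pre_shot_heights, 1)[0]

    # Recoil anticipation: drop in wrist height shortly before each shot.
    w = max(1, int(dip_window_s * fps))
    dips = []
    for f in shot_frames:
        window = wrist_y[max(0, f - w):f]
        dips.append(window[0] - window[-1] if len(window) > 1 else 0.0)
    return {"drift_per_shot": drift_per_shot, "pre_shot_dips": dips}
```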
  • the system may correlate body landmark motion with other correctable defects in the shooter's position or posture. For instance, with reference to FIGs.6 and 7, motion data 600 shows the motion of a plurality of body landmarks during a shooting string. The lowermost line graph depicts right ankle motion data 602 that evidences that the shooter changed a position of the right foot. Changing positions during a shooting string is likely to affect the sight picture, accuracy, precision, and other metrics associated with shooting.
  • the system may determine that the change in foot position was either positive or negative in terms of shot scoring and may provide recommendations to the shooter based on this change in posture.
  • the system may view the shooting accuracy and/or precision of the shots fired 604a – 604e both before and after the relocation of the foot and determine whether moving the foot had a positive or negative impact on the shooting performance.
  • the system will correlate a scoring of the target (e.g., shot accuracy and/or precision) with body landmarks and motion and can indicate to the shooter which positions of individual body landmarks affected their shooting performance, either for the better or for the worse.
  • the system can be configured, through machine learning, to associate certain poses, motions, and combinations with shooting performance.
  • the body landmark motion and combinations of motions may be associated with improved shooting performance while others may be associated with decreased shooting performance.
  • the system may normalize the motion data to generate normalized coordinates of the position of each body part during the entire session.
  • the score and its moving average may be represented by a signal, such as by displaying it in a user interface.
  • the motion data and/or the score may be stored in a data file that can be analyzed in either near-real time, or saved for later analysis.
  • the motion data may be associated with shot pattern data, such as the x and y coordinate of each shot and the shot coordinates may be associated in time with the motion data occurring at the time the shot was fired. Additionally, a score may be assigned to the shot and saved with the shot data.
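  • One possible layout for the per-shot records described above pairs the normalized landmark coordinates at the moment of the shot with the shot's target coordinates and score; the field names here are illustrative only.

```python
# Sketch of persisting per-shot records: each row pairs motion data at the time
# of the shot with the shot's target coordinates and score. Fields are examples.

import csv
from dataclasses import dataclass, asdict

@dataclass
class ShotRecord:
    session_id: str
    shot_time_s: float
    shot_x: float         # target-plane x coordinate of the hit
    shot_y: float         # target-plane y coordinate of the hit
    score: int            # scoring-ring value assigned to the hit
    right_wrist_x: float  # normalized landmark coordinates at the shot...
    right_wrist_y: float  # ...(remaining landmarks omitted for brevity)

def save_records(path, records):
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(asdict(records[0]).keys()))
        writer.writeheader()
        writer.writerows(asdict(r) for r in records)
```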
  • One or more machine learning approaches may be applied to the motion data and shot data to generate correlations between the motion and shot accuracy.
  • convolutional deep neural networks (CNNs)
  • Other deep-learning models that are oriented toward classification may also be used to correlate the motion data and shot data to identify patterns that lead to an increase in accuracy or a decrease in accuracy. Transformations that correlate arbitrary collections of attributes with other arbitrary collections of attributes might also be used.
  • FIG 8A illustrates a sample camera angle of a shooter 800 and FIG 8B illustrates a resulting wire frame model 802 of the shooter that allows the system to track motion of the shooter’s body including the selected body landmarks.
  • the wire frame model may include representations of each major joint and body artifact which may include one or more of a shooter’s nose or chin 804, right shoulder 806, right elbow 808, right wrist 810, hips 812, left femur 814, left knee 816, left lower leg 818, left ankle 820, right femur 822, right knee 824, right lower leg 826, and right ankle 828, among others.
  • the system may be trained on motion data from a variety of shooters and historical performance of those shooters correlated with the motion data.
  • FIG.9 illustrates x-axis motion data 900 associated with a shooter's head 902 and nose 904. As can be seen, during a shooting string, after each shot 906a – 906d, the shooter's head moved back, which may be correlated with a shooting performance.
  • the system may determine that the shooter's head moved after each shot, such as to look at the target, and that the head additionally drifted and did not return to the exact same spot during the shooting string, which caused the shooter to perform below a threshold value.
  • the system may be configured with logic that determines a likely cause and effect based upon either the shooter’s motions and/or the scoring of the target.
  • where a grouping of shots lands in the 1:30 position 1004, this may be indicative of the grip error known as heeling, in which the heel of the hand pushes the butt of the pistol to the left in anticipation of the shot, which forces the muzzle to the right.
  • where a grouping of shots lands in the 3:00 position 1006, this may be indicative of the thumb of the shooting hand applying too much pressure and pushing the side of the pistol to the right, which forces the muzzle to the right.
  • this may be indicative of a shooter tightening the grip while shooting (e.g., lobstering).
  • where a group of shots lands in the 9:00 position 1016, this may indicate too little finger on the trigger. This typically causes the shooter to squeeze the trigger at an angle during the final rearward movement of the trigger, which has a tendency to push the muzzle to the left.
  • where a group of shots lands in the 10:30 position 1018, this may indicate that the shooter is pushing in anticipation of the recoil with insufficient follow-through.
  • the system may be programmed to view target hits during a shooting string, and in combination with the body landmark motion, determine whether the shooter is guilty of recoil anticipation, trigger control errors, and/or grip errors. The system can make this determination for individual shooters and can recommend practice exercises and practice strings for addressing the specific shooting issues.
  • the system can access data on previous engagements (DOPE) associated with a shooter, which may include previous shooting sessions, records, scores, and analysis.
  • complete shooting sessions can be recorded and stored so that users can review them along with determinations from the machine learning system that identify and/or highlight deficiencies, mistakes, and changes in the shooter's posture or technique that improve or reduce the shooter's performance.
  • Synergies come from the synchronization of all the available sources of information as described herein.
  • the basic conventional ML model may map the target shot pattern P into a shooter behavior B, i.e., fML: P → B.
  • the behavior may be a four-dimensional (4D) phenomenon that includes three-dimensional (3D) motion data plus time.
  • the shot pattern P is a two-dimensional (2D) projection of an evolving 3D phenomenon (2D plus time).
  • This model may be founded on causal hypotheses.
  • An expert may look at a much larger collection of complex shooter behavior B* and map that into a larger collection of shot patterns P*, fXP: B* → P*.
  • the behavior B* may be a 4D phenomenon and the shot pattern P* a 2D phenomenon.
  • a relatively simple iterative model for expert learning may include predicting the shot pattern from observed shooter behavior: fXP: B* → P′.
  • the model may then assess the difference between the predicted shot pattern and the actual shot pattern: Δ(P′, P).
  • the model may then adapt the causally informed prediction model fXP based on the difference Δ(P′, P). This may be repeated for each n-shot drill.
  • This type of learning model treats the shooter behavior B ∈ B* as a 3D phenomenon and the shot pattern P ∈ P* as a 2D phenomenon. In some cases, the model is enhanced to treat B* as a 4D phenomenon, P* as a 3D phenomenon, or both.
  • a pose analysis method represents the 4D shooter behavior B* by a 3D projection B** (2D plus time).
  • the model can objectively identify good shot patterns (e.g., tightly grouped around the bullseye) from the rest of the "not good" shot patterns by executing a decision function fD: P → D, where D is a binary variable.
  • the model may also be configured to relate shooter behavior B* to how (and why) shot patterns P are "not good".
  • the sampled time signals for shooter behavior may be reduced, for a single drill, to a single line in a dataset.
  • a bounding box may be derived around the set of X-Y coordinates of each of the body landmarks in a drill.
  • the bounding boxes may be denoted as the shooter behaviors B.
  • the shooter behaviors B and shot patterns P are represented in polar coordinates as a radius and angle.
  • the shooter behaviors B (of which there are 17 in some examples) are 2D, having an (x, y) coordinate or an (r, θ) representation.
  • the shot pattern P is a causal consequence of shooter behavior B.
  • the method can treat the shot patterns P as proxies for the unobservable third dimension of the shooter behaviors B.
  • the method may treat the shooter behaviors B and shot patterns P as features, and scores S as the target, for an ML model fML: B × P → S.
  • the model can be analyzed for patterns (B, P, s) resulting in low and high scores. Implicit or explicit clustering techniques might be used for this.
  • models fML: B × P → S can be built for multiple drills for a specific shooter and for drills for multiple shooters.
  • the model quality can be assessed by comparing the actual score s ∈ S and predicted score s′ ∈ S. Some models could be sequentially updated from these comparisons.
  • low scoring drills (B, P, s) can be assessed against fML to determine which of the 17 components of the observable shooter behavior B, and the shot pattern P as the proxy for unobservable shooter behavior, are likely causes for the low score. This may be easiest for explainable models.
  • Unobservable behaviors may be described by the usual labels in diagnostic targets like those above.
  • a shooter diagnostic application can also differentiate low scores due to sighting misalignment, perhaps easiest to understand as shots tightly grouped around a centroid other than the target bullseye.
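  • An illustrative sketch of the decision function and the sighting-misalignment case described above: classify a shot pattern by its group tightness and by the offset of its centroid from the bullseye. The radius thresholds are placeholders, expressed in target-ring units.

```python
# Hedged sketch of a decision function over shot patterns: "good" (tight group
# on the bullseye), "misaligned_sights" (tight group around an offset centroid),
# or "dispersed" (likely behavioral causes such as grip or trigger errors).

import numpy as np

def classify_pattern(shots, tight_radius=1.0, center_tolerance=0.5):
    """shots: (N, 2) array of hit coordinates with the bullseye at (0, 0)."""
    shots = np.asarray(shots, dtype=float)
    centroid = shots.mean(axis=0)
    spread = np.linalg.norm(shots - centroid, axis=1).mean()   # group tightness
    offset = np.linalg.norm(centroid)                          # aim-point error

    if spread <= tight_radius and offset <= center_tolerance:
        return "good"
    if spread <= tight_radius:
        return "misaligned_sights"   # tight group, wrong centroid
    return "dispersed"
```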
  • the system determines that a reduced score has been obtained and generates recommendations for improving the score.
  • the term “reduced score” is used to mean a score that is less than a target score.
  • the target score may be a perfect score, a participant’s highest historical score, a participant’s average score, or some other metric.
  • a reduced score is any score lower than a “10.”
  • a score may be assigned that is less than a desired score, such as, for example, a penalty kick in soccer may be a binary outcome with a miss being a reduced score as compared to a goal being scored.
  • a golfer may, on average, hit their drive 275 yards.
  • a reduced score may result from a drive carrying 250 yards, which is below the golfer’s average drive distance, and the system may observe behavior and determine which behavior(s) resulted in the reduced score.
  • the system relies on artificial intelligence (AI) and/or machine learning (ML) for two determinations: detecting shooter movements and shot placement and analyzing shooter behaviors for how they result in shot placement. Having described detecting movements, there are multiple ways of analyzing how shooter behaviors result in shot placement. Firstly, an expert-based approach utilizes comparing an individual shooter’s behavior for each drill to an expert’s assessment of what should result in a good shot. Secondly, a data-based approach is conducted by building models from repetitions of the shooting behavior (“drill”) by a single shooter or multiple shooters. This may be considered an AI/ML-based discovery strategy, that determines what behaviors are correlated with good shots.
  • AI/ML may be used to automate and enhance expert-based analysis in the sense that if prototypical "ideal" behaviors are known a priori, models for these a priori known behaviors could be fitted to the data for a single shooter or multiple shooters.
  • a transformer ML model may mimic automated and enhanced expert-based analyses by combining a pre-trained data-based model for inferred relationships between language fragments (analogous to inferring relationships between shooter behaviors and shot placement) with additional stages of data-based adaptation.
  • the system determines key points for a shooter’s hands 1100.
  • Key points may coincide with each moveable joint of the wrist, hand, and fingers and their respective locations and position relative to each other.
  • the system may connect the key points into a wire frame model 1102 of the shooter’s hand, which allows precise monitoring of pose and motion.
  • FIG.11 illustrates a plurality of key points associated with a shooter’s hand which may be used to determine the grip that the shooter is using. For example, by referring to the key points of a shooter’s hand, the system may determine that the shooter is using a thumbs forward, thumb over, cup and saucer, wrist grip, trigger guard gamer, or another style grip. Different grips and pressure applied by the hands can impart motion to the pistol and the system may determine that a different, or modified, grip would result in better performance.
  • the system receives signals associated with the position and motion of the shooter and processes the signals to find mistakes in shooter stance and grip. In addition, by combining single frame analysis (e.g., finding insights from the relative positions between different body parts at a specific moment in time) with an analysis across time, it is possible to identify mistakes due to changes in the shooter's position. In addition, the system can focus on the position of the hands on the pistol. A hand tracking machine learning model can be used to track the position of each bone in each finger in a video frame. The system will receive signals associated with the position of each finger across time.
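  • The grip analysis above could, for example, be driven by simple signals derived from hand keypoints. The sketch below is speculative and not taken from the patent: the keypoint names and threshold are assumptions, and it merely counts fingertips that move toward the wrist between frames as a crude grip-tightening signal.

```python
# Speculative sketch: per-frame fingertip-to-wrist distances as a crude
# grip-tightening indicator (e.g., "lobstering"). Keypoint names are assumed.

import numpy as np

FINGERTIPS = ["thumb_tip", "index_tip", "middle_tip", "ring_tip", "pinky_tip"]

def grip_tightening_signal(hand_frames, drop_threshold=0.05):
    """hand_frames: list of dicts {keypoint_name: (x, y)} per video frame.
    Returns a per-frame count of fingertips that moved closer to the wrist."""
    signal = []
    prev = None
    for frame in hand_frames:
        wrist = np.array(frame["wrist"])
        dists = {tip: np.linalg.norm(np.array(frame[tip]) - wrist) for tip in FINGERTIPS}
        if prev is not None:
            tightened = sum(1 for tip in FINGERTIPS if prev[tip] - dists[tip] > drop_threshold)
            signal.append(tightened)
        else:
            signal.append(0)
        prev = dists
    return signal
```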
  • an application program may be executed on a mobile computing device and receive image data from an imaging sensor associated with the mobile computing device.
  • the methods and processes can be programmed into an application program, or a set of instructions, that can be executed on a computing device.
  • the application program can be executed on a mobile computing device and the audio/video capture, analysis, scoring, recommendations, training exercises, and other feedback can be performed with the mobile computing device.
  • the system receives video and/or audio from multiple video capture devices capturing different views of the same shooter. The system may use the multiple, different views of the shooter in the analysis and feedback to the shooter for improving performance.
  • the system running as an application on a mobile computing device 1200, captures video frames of a shooter 1202 , establishes body landmarks to track over time, creates a wire frame model of the shooter 1203, tracks shots fired 1204 and provides feedback to the shooter 1202.
  • the system may identify the stance of the shooter (a "Weaver" stance in the illustrated example), track the number of shots 1204, and provide feedback 1206 on each shot. For instance, a shot taken at 7 seconds after beginning the string shows a hit at center of mass.
  • the system identifies a reloading operation 1208 at 8 seconds that lasts 1.2 seconds, followed by a shot at 11 seconds showing that the shooter anticipated the recoil, and the shot was off center.
  • this functionality of the system may be viewed as a drill instructor that tracks the shots, provides feedback, and offers suggestions for improving posture, grip, stance, trigger pull, and the like, to improve shooter performance.
  • FIG.12B illustrates an additional screen 1210 that may be displayed by the system in a Spotter modality in which the mobile computing device may have its video capture device pointed at the target.
  • the mobile computing device may use an internal or external lens to get a better view of the target and the mobile computing device may be coupled to a spotting scope or other type of optical or electronic telephoto zoom lens in order to get a better image of the target.
  • the system may be toggled between the different modes, such as Spotter, Drill Instructor, DOPE, Locker (in which information about the different firearms owned by the shooter may be stored), and program settings.
  • the Spotter mode may show an additional screen 1210 with a view of the target upon which target hits may be visible.
  • the system may also display other information, such as the firearm 1212a and ammunition 1212b combination being fired, the distance to the target 1214, the type of target 1216, the time of the shooting string 1218, the number of shots fired, the number of target hits 1220, and the score 1222, among other things. It may also show an image of the target 1224 and may further highlight the hits 1226 on the image of the target 1224. It should be appreciated that the Spotter mode may show the information, including the time elapsed, hits, score, and number of shots in real time. In addition, the system may also store the data associated with a shooting string for later playback, review, and analysis.
  • a shooting range 1301 may be outfitted with one or more sensors, such as image sensors, projectors, and computing devices.
  • FIG 13A illustrates a shooting range 1301 looking down range from the perspective of the shooter 1302.
  • FIG 13B illustrates a shooting range 1301 from a side view
  • FIG 13C illustrates a shooting range 1301 from a top plan view.
  • One or more cameras 1304 may be mounted within the shooting range to capture video of the shooter which may be from multiple angles.
  • the cameras 1304 may be any suitable type of video capture device, including without limitation, CCD cameras, thermal cameras, dual thermal cameras 1305 (e.g., a capture device having both thermal and optical capture capabilities), among others.
  • the cameras 1304 may be mounted at any suitable location within the range, such as, for example, to the sides of the shooter 1304a, 1304b, facing the front of the shooter 1304c, overhead 1304d, among others.
  • the cameras 1304 may be mounted to a gantry system that provides a portable structure for supporting the one or more cameras 1304 to capture multiple angles of the shooter.
  • a projector 1306 may be provided to project an image onto a screen 1308.
  • the projector may project any target image onto the screen or onto a shooting wall, and a shooter can practice dry firing (e.g., without firing live ammunition) and the system can register hits on the projected image.
  • an image may show a target on the screen and a shooter may dry fire at the target.
  • the system can register the point of aim at the time of trigger pull and display the hit onto the projected target image.
  • the thermal imaging camera 1304 may detect the laser hit location and the system may be configured to display the hit on the projected target and the system may further register and score the hit. In this way, shooters can practice using a shooting simulator using their own firearm without the need to travel to a dedicated shooting range.
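  • A minimal sketch of registering a dry-fire laser "hit" as described above: threshold the camera frame for a bright spot and take its centroid. A real system would add background subtraction, temporal gating around the trigger pull, and a camera-to-target homography; those steps are omitted here.

```python
# Minimal sketch: locate the brightest spot (e.g., a laser dot) in a frame.

import cv2

def find_laser_dot(frame_bgr, min_brightness=240):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, min_brightness, 255, cv2.THRESH_BINARY)
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None                                          # no bright spot found
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])        # (x, y) pixel centroid
```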
  • the system may recommend dry firing practice to ameliorate bad shooting habits.
  • FIG.14 illustrates the system configured with multiple source synchronization.
  • a shooter 1402 is captured with multiple video capture devices 1404 from different angles.
  • the system obtains features from the shooter 1406, such as stance, pose, and motion.
  • the system may analyze the captured audio signal 1408 to complement video shot detection.
  • the system may analyze both video and audio to detect a shot fired by the shooter, such as by correlating a spike in an audio waveform with a sudden muzzle raise of the firearm.
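  • A hedged sketch of the audio/video fusion described above: find loud transients in the audio waveform and keep only the motion-derived shot candidates that have a nearby audio spike. The transient threshold and timing tolerance are illustrative assumptions.

```python
# Hedged sketch: confirm motion-derived shot candidates with audio transients.

import numpy as np

def fuse_shot_times(audio, sample_rate, motion_shot_times_s, max_offset_s=0.1):
    """audio: 1D float waveform; motion_shot_times_s: shot times (s) from video."""
    envelope = np.abs(audio)
    threshold = envelope.mean() + 6 * envelope.std()     # crude transient detector
    loud = np.flatnonzero(envelope > threshold) / sample_rate

    confirmed = []
    for t_motion in motion_shot_times_s:
        if loud.size and np.min(np.abs(loud - t_motion)) <= max_offset_s:
            confirmed.append(t_motion)
    return confirmed
```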
  • the system may be configured to detect shooting stages 1410, such as a first shooting string, reloading, second shooting string, including shots fired detection.
  • the system may additionally identify and determine shooting errors 1412 and identify drills and exercises to address the shooting errors.
  • the error detection may be iterated and further error corrections 1414 may be proposed.
  • the system may also incorporate shot detection 1416, as described herein, including correlation of shot detection with audio data.
  • the system may further capture images 1418, such as video images, of the target and register hits on the target, which may be correlated with motion data before and during the trigger pull. As described herein, the system may determine the shot location 1420, and ultimately determine a score 1422 for one or more of the shots fired. Some embodiments of the system thus provide an intelligent tutoring/adaptive training solution that turns the time-intensive and non-scalable live training environment into an automated and adaptive virtual scenario-based training solution.
  • the model is based on instructions that are executed by one or more processors that cause the processors to perform various acts. For example, the method may include collecting (as many as possible) pose analysis files, which may be stored as a comma separated value (CSV) file.
  • FIG.15 illustrates a shot bounding box 1500 being determined by the system.
  • a "drillparser.py” script may be configured to combine the pose analysis files. The system may then find the shots in the CSV files, such as for each of the 17 pose landmarks and the shots derive an oriented bounded box 1500 around the x-y coordinates for the 1.0 second time window up-to and including the shot, then used the angle ⁇ 1502 and dimensions l 1504 and w 1506 of the bounding box 1500 as factors in a decision tree models for the score.
  • the "allshots" approach processes data on a CSV file basis, so a bounding box 1500 is drawn around all the shots, as one row in the resulting dataset.
  • the "oneshot” and “oneshotd” approach may handle each shot individually, so the shot bounding box 1500 may be an infinitesimal box around single shot so each row in the dataset is one shot.
  • the "oneshotd" approach additionally orients the direction of bounding boxes around the 17 pose landmarks in time order, while the "oneshot" approach ignores time.
  • the system thus is able to derive an oriented bounding box around each shot, which may be oriented in the same direction as the boxes around the body landmarks.
  • These datasets are handled in BigML as follows: (1) uploaded to BigML as Source objects; (2) Source objects are converted to Dataset objects; (3) Dataset objects are split into Training (80%) / Test (20%) datasets; (4) Tree Models are built using selected factors from the Training datasets; (5) BatchPredictions are created using the appropriate Model and Test Datasets; and (6) Models are downloaded as JSON PML objects for the next steps.
  • the "modelannotator.py" script annotates the model PML files as needed for the explanation operation. In some cases, this annotation consists solely of adding a "targets" attribute to each node that is a list of all of the target (score) values reachable in the tree from that node.
  • the “itemexplainer.py” script uses a (virtual) pruned annotated model to “explain” which pose elements for a new drill result in prediction of an unsatisfactory score.
  • Factors derived from shot samples may include one or more of the following: shot_phi, the bounding box rotation (−π ≤ φ ≤ π); shot_l, the bounding box length; shot_w, the bounding box width; shot_lw, the bounding box area; shot_theta, the bounding box center rotation (−π ≤ θ ≤ π); and shot_r, the bounding box center radius.
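  • One way such bounding-box factors might be computed for a set of x-y samples (the shots, or one landmark's positions within the pre-shot window) is sketched below using the principal axis of the samples; the patent's exact conventions may differ.

```python
# Illustrative sketch: oriented-bounding-box factors (phi, l, w, lw, theta, r)
# from (x, y) samples. Assumes at least a few samples per window.

import numpy as np

def bounding_box_factors(xy):
    """xy: (N, 2) array of x-y samples for one landmark (or the shots)."""
    xy = np.asarray(xy, dtype=float)
    center = xy.mean(axis=0)
    centered = xy - center

    # Principal axes of the samples give the box orientation.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    major, minor = eigvecs[:, -1], eigvecs[:, 0]
    phi = np.arctan2(major[1], major[0])         # box rotation, -pi <= phi <= pi

    aligned = centered @ np.column_stack([major, minor])
    l = np.ptp(aligned[:, 0])                    # box length along the major axis
    w = np.ptp(aligned[:, 1])                    # box width along the minor axis

    r = np.linalg.norm(center)                   # box-center radius
    theta = np.arctan2(center[1], center[0])     # box-center rotation
    return {"phi": phi, "l": l, "w": w, "lw": l * w, "theta": theta, "r": r}
```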
  • the bounding box 1500 is drawn around the shot samples.
  • each body landmark can likewise be used to create a bounding box with length, width, area, and rotation and correlated to the shot factors.
  • the system is configured to use one or more cameras for machine vision of a shooter, determine one or more body landmarks (in some cases up to 17 or more body landmarks) and track the movement of each of these landmarks over time during a shooting drill.
  • the system can correlate the time-bound body landmark movement with a scored shot and determine if the shot is good or bad.
  • a good shot may be relative to the DOPE for the shooter, firearm, and ammunition combination. For example, where a single shot is closer to the aiming point than an average shot for the shooter, this may be categorized as a good shot.
  • the system can correlate shooter behaviors with good or bad shots. Furthermore, by analyzing the shooter behaviors (e.g., body landmark motion), the system can predict whether a shot is a good shot or a bad shot without even seeing the target results. For example, a good shot may be considered a shot within the 9 or 10 ring of a target, while a bad shot may be any shot outside the 9 ring. Depending on the expertise of the marksman, the definition of good shot and bad shot may be varied. As an example, for a very skilled shooter, anything outside the 10 ring may be considered a bad shot.
  • FIG.16 illustrates some embodiments of a decision tree model 1600 for correlating body landmark factors with shot samples.
  • a decision tree algorithm is a machine learning algorithm that uses a decision tree to make predictions. It follows a tree-like model of decisions and their possible consequences. In some cases, the algorithm works by recursively splitting the data into subsets based on the most significant feature at each node of the tree.
  • various features are extracted to represent the kinematic properties of body movements.
  • the decision tree depicted in FIG.16 serves as the core of the model in some embodiments. It is selected for its ability to elucidate complex, non-linear relationships within the data while maintaining interpretability. The tree's depth is optimized to minimize overfitting through techniques like cross-validation or a predetermined maximum depth. The choice of splitting criteria, whether based on Gini impurity or information gain, depends on the specific problem.
  • Pruning methods, such as enforcing a minimum number of samples per leaf, are applied to control excessive branching.
  • Training of the model may involve the recursive construction of the decision tree.
  • the tree has nodes that correlate with the body landmarks. For example, a left_ankle_phi node 1602 may branch into a right_wrist_lw node 1604 and a right_knee_phi node 1606, and these branches can show how the bounding box rotation of the left ankle motion data affects the resulting shot placement in combination with the bounding box area of the right wrist or a rotation of the bounding box associated with the right knee.
  • the decision tree can correlate a shot placement with a combination of features.
  • the training data is partitioned based on feature values, optimizing the chosen loss function, such as mean squared error or cross-entropy. Hyperparameters can be fine-tuned via cross-validation, and the model's performance is assessed using various metrics.
  • the decision tree model continuously updates as new body landmark data becomes available for each shot fired. For each input feature vector derived from the live landmark data, the model traverses the decision tree, reaching a leaf node. The label assigned to the leaf node (indicating a positive or negative outcome) is utilized as the model's prediction.
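  • An illustrative sketch of the decision-tree stage using scikit-learn rather than the BigML workflow mentioned above; the feature matrix, labels, and hyperparameters are assumptions.

```python
# Hedged sketch: fit a decision tree on per-shot factor rows (e.g., the
# bounding-box factors for each landmark), hold out a test split, and then
# classify new shots as they arrive.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def train_shot_model(X, y, max_depth=5, min_samples_leaf=5):
    """X: (n_shots, n_factors) feature matrix; y: per-shot labels (e.g., good/bad)."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model = DecisionTreeClassifier(
        max_depth=max_depth,                  # limit depth to reduce overfitting
        min_samples_leaf=min_samples_leaf,    # prune excessive branching
    )
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))
    return model

# Live use: predict whether the just-fired shot is likely good or bad from the
# landmark factors alone, before the target is scored:
#     label = model.predict(latest_factor_row.reshape(1, -1))
```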
  • FIG.17 illustrates an annotated model 1700 showing various values at several of the nodes.
  • the area of the bounding box for the right wrist 1702 indicates values of 0.47, 0.51, 0.57, 0.59, and 0.61. This indicates that during the shooting string in which shots were being fired, the shooter moved her right wrist within an area defined by the displayed values.
  • the system identifies that the left knee 1704 moved in a certain pattern, which can be correlated with either good shots or bad shots to look for a causality between the movement of the right wrist in combination with movement of the left knee.
  • Other features can be similarly annotated within the model.
  • FIG.18 illustrates a pruned model 1800 which allows a deeper dive into various features and their interrelation with one another.
  • for example, where the right_knee_phi 1802 (e.g., the rotation of the bounding box associated with movement of the right knee during a shooting string) is 0.8 or 0.82 and the right_shoulder_l bounding box length 1804 is 0.09, the second result node 1808 is associated with a bad shooting performance while the third node 1810 is associated with a good shooting performance.
  • These non-linear causal relationships can be determined by the machine learning model, such that the system, by executing one or more machine learning algorithms, can determine which motions lead to better shooting results or poorer shooting results.
  • isolation forests encode a dataset as a forest of trees such that the training instances at each leaf are identical. Node splits may be chosen randomly rather than to reduce the impurity of the child target values.
  • nodes are leaves when the training instances at the node have the same target value.
  • rare instances in the training dataset reach a leaf earlier than less rare instances.
  • new anomalous instances have short paths relative to tree depth for all trees in the forest, which makes identifying anomalies more efficient.
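  • A brief sketch of applying an isolation forest to per-drill factor rows, as discussed above; scikit-learn's implementation uses the short-path-length property internally, and the contamination value is an assumption.

```python
# Hedged sketch: flag anomalous drills with an isolation forest.

from sklearn.ensemble import IsolationForest

def flag_anomalous_drills(X, contamination=0.05):
    """X: (n_drills, n_factors) matrix of drill features. Returns a boolean mask."""
    forest = IsolationForest(contamination=contamination, random_state=0)
    labels = forest.fit_predict(X)        # -1 for anomalies, +1 for inliers
    return labels == -1
```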
  • a classification and regression tree (CART) is a predictive model which explains how an outcome variable can be predicted based on other values.
  • a CART-style decision tree is one where each fork is a split in a predictor variable and each end node contains a prediction for the outcome variable.
  • CART-style decision trees may be useful in predicting an outcome of the shot based upon the motion data of one or more body landmarks.
  • CART-style decision trees are multi-category classifiers (e.g., N>1), while isolation trees are single category classifiers (e.g., "anomalous or not").
  • FIG.19A illustrates a gantry 1900 system that may provide a portable mounting structure for accommodating one or more imaging devices, including one or more video cameras.
  • the structure includes one or more upright posts 1902 and one or more cross bars 1904.
  • the gantry structure 1900 may be placed around the shooter, and in some cases, down-range of the shooter.
  • the gantry 1900 may position the cross bar 1904 at a location that is about one foot (≈0.3 m) to 8 feet (≈2.4 m) above the shooter and between one foot (≈0.3 m) and 15 feet (≈4.5 m) in front of the shooter.
  • cameras are positioned on each upright and on the cross bar. Therefore, in some embodiments, two, three, or more cameras are positioned on the gantry with some of the cameras aimed at the shooter and one or more cameras may additionally be aimed down range at the target.
  • FIGS 19B, 19C, and 19D illustrate various views captured by the cameras mounted to the gantry 1900.
  • a first camera 1908 may be mounted to the cross bar 1904 or the upright 1902 and is positioned to capture a left-side view (FIG.19B) of the shooter.
  • the camera may be configured to capture the entire shooter’s body, or may be configured to capture the shooter’s upper body and head.
  • a second camera 1910 may be mounted on the cross bar 1904 or the upright 1902 and configured to capture a right-side view (FIG.19C) of the shooter.
  • the camera may be configured to capture the entire shooter’s body, or may be configured to capture the shooter’s upper body and head.
  • the first camera 1908 and the second camera 1910 utilize different fields of view, such that one of the cameras captures the entire body of the shooter, while the other one of the cameras only captures a portion of the shooter's body.
  • a third camera 1912 may be positioned on the cross bar 1904 and configured to capture an overhead view 1914 (FIG.19D) of the shooter. By positioning cameras to capture the shooter's motion from various angles, the system can correlate the video data from each camera and determine three-dimensional motion data of the selected body landmarks.
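  • Assuming calibrated, synchronized cameras, the three-dimensional motion data mentioned above could be recovered by triangulating matching 2D landmark positions from two views; the projection matrices and point arrays below are assumptions supplied by a prior calibration step.

```python
# Sketch (assuming calibrated cameras): triangulate 3D landmark positions from
# two synchronized views. P1 and P2 are 3x4 projection matrices from camera
# calibration; pts1 and pts2 are matching 2D landmark positions (2 x N, float).

import cv2

def triangulate_landmarks(P1, P2, pts1, pts2):
    homogeneous = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4 x N
    points_3d = (homogeneous[:3] / homogeneous[3]).T          # N x 3, in world units
    return points_3d
```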
  • a fourth camera may be located to capture video data of the target 1916 (FIG. 19E) and the target impacts 1918.
  • FIG.20 illustrates a computer program user interface 2000 that can be used with the system and methods described herein.
  • a computer program may be configured to receive the video data from the one or more cameras, synchronize the video data, determine when shots are fired, and register and score hits or misses on the target.
  • the user interface 2000 may include a start recording button 2002 that allows the shooter to start the video capture. In some cases, a shooting string may be timed, and the start recording button may additionally start a timer.
  • the user interface may further include a timer 2004 associated with the session.
  • the user interface may be presented on a mobile computing device associated with the user, or on a mobile computing device associated with a facility.
  • a gantry system may be set up at a facility and a computing device associated with the facility may be connected to the gantry system and configured to receive the video data and provide the performance feedback of the embodiments described herein.
  • the user interface may be provided on any suitable display, such as a television, a touch-screen display, a tablet screen, a smart phone screen, or any other visual computer interface.
  • the user interface 2000 may display indicia associated with a shooting string, such as, for example, a video of the shooting string and motion during the shooting string 2102.
  • the video may be displayed in a playback window and offers controls 2104 for playing, pausing, adjusting volume, and scrubbing through the video.
  • there may be controls for selecting different views 2106; that is, controls that allow the viewer to select video clips captured by different cameras during the shooting string and that may allow the viewer to watch different views of the shooting string, either individually or in a combined view, such as a side-by-side view.
  • the views may be synchronized so the viewer can see different views of the same event at the same time.
  • the user interface 2000 may additionally show the target 2110 and may identify hits 2112 that the system registered on the target.
  • the user interface 2000 may additionally display a score 2114 of the most recent shot along with an average score 2116 for the string.
  • FIG.22 illustrates an additional view of the user interface 2000 in which a user can specify a body landmark selection 2202.
  • the user interface 2000 provides a selection for signal setting 2204, which allows a user to specify details of Y coordinate motion or X coordinate motion.
  • the user interface 2000 in response to the user selection, may display the motion data 2206 associated with the selection. This type of review and analysis allows a shooter to very specifically view the motion of individual body landmarks during a shooting string, and in addition, can specifically view horizontal movement, vertical movement, or both for review.
  • FIG.23 illustrates some of the features and technologies employed by embodiments of the described systems and methods.
  • the disclosed system utilizes machine vision (e.g., computer vision) 2302 to track body landmarks of a participant (e.g., shooter), and also tracks changes to a target to provide automated target scoring 2304.
  • Systems and methods may also utilize audio processing 2306 to enable shot time detection 2308, which may also be combined with computer vision techniques.
  • the disclosed systems and methods may also utilize signal processing 2310 to provide for shooter’s position correction 2312, including pose, posture, motion, grip, trigger pull, and others.
  • FIG.24 illustrates a method 2400 according to embodiments described herein.
  • the system may receive video data and optionally audio data.
  • the system may be configured to detect shots 2402 and score through image processing on video data of a target.
  • the system may process video data to determine shots fired and pose analysis of a shooter 2404.
  • the system may analyze audio data for shot detection 2406.
  • the shot detection 2402 provides coordinates (e.g., x,y coordinates) of shots within the target 2408.
  • the pose analysis of the shooter 2404 provides coordinates (e.g., x,y coordinates) of body landmarks during the shooting process 2410.
  • the audio shot detection 2406 provides an exact time of a shot 2412.
  • the shooting analysis may include one or more machine learning algorithms that receive the shot and body data and, through machine learning, detect and predict cause and effect for the shot performance of the shooter with the firearm and ammunition combination, which may be referred to as conducting a shoot analysis 2414.
  • embodiments of the system may generate one or more of a session score 2416, shooting position recommendations 2418, shooting mistakes, and grip analysis 2420, among others.
  • FIG.25 illustrates a sample process 2500 flow for using machine vision and machine learning to track body landmarks of a participant and generate recommendations for improvement.
  • the systems and methods described herein can be used for any event, such as a sporting event, that benefits from repeatability and accurate body kinematics. Some such events include, in addition to shooting, archery, golf, bowling, darts, running, swimming, pole vaulting, football, baseball, basketball, hockey, and many other types of sports.
  • the system is configured to track body landmarks of a participant and determine ways to alter the body motion to improve performance.
  • the system receives video data of a participant. This may come from a single image capture device, or two image capture devices, or three or more image capture devices.
  • An image capture device may be any suitable imaging device that is configured to capture sequential images of a participant, and may include any consumer-grade or professional-grade video camera, including cameras regularly incorporated into mobile computing devices.
  • the system determines one or more body landmarks of the participant.
  • the body landmarks may be associated with any joint, body part, limb, or a location associated with a joint, limb, or body part.
  • the system generates a wireframe based on the one or more body landmarks, and may use less than all of the body landmarks in generating the wireframe model.
  • the body landmarks are tracked during performance of an activity to generate motion data.
  • body landmarks may be created for the hands, wrists, arms, head, shoulders, torso, waist, knees, ankles, and feet of a golfer which may be tracked during a swing of a golf club.
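  • As an illustration of how body-landmark tracking might be implemented, the following sketch uses the open-source MediaPipe Pose library to extract per-frame wrist coordinates from a video and accumulate them as motion data. The library choice, the wrist landmarks, and the track_wrist_motion helper are illustrative assumptions rather than a required implementation.

```python
# Hypothetical sketch: extract per-frame wrist coordinates as motion data.
# MediaPipe Pose and OpenCV are assumed available; any pose estimator that
# returns normalized (x, y) landmark coordinates could be used similarly.
import cv2
import mediapipe as mp

def track_wrist_motion(video_path):
    """Return a list of (frame_index, left_wrist_xy, right_wrist_xy) tuples."""
    mp_pose = mp.solutions.pose
    motion_data = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False) as pose:
        frame_idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                lm = results.pose_landmarks.landmark
                left = (lm[mp_pose.PoseLandmark.LEFT_WRIST].x,
                        lm[mp_pose.PoseLandmark.LEFT_WRIST].y)
                right = (lm[mp_pose.PoseLandmark.RIGHT_WRIST].x,
                         lm[mp_pose.PoseLandmark.RIGHT_WRIST].y)
                motion_data.append((frame_idx, left, right))
            frame_idx += 1
    cap.release()
    return motion_data
```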
  • a score is determined and associated with the performance. As described, in an activity involving a projectile, the score may be associated with the path or destination of the projectile. In golf, for example, the score may be based on the distance, direction, nearness to a target, or a metric associated with an average or a past performance of the participant. In short, any metric may be used to evaluate the quality of the outcome of the performance.
  • At block 2510, the score is associated with the performance.
  • the system generates recommendations for altering the motion on a subsequent performance to improve the outcome.
  • the recommendation may involve the hands, including a grip on a firearm, golf club, bat, stick, and the like.
  • the recommendations may also include a change in weight distribution or transfer.
  • the recommendations may include the motion of the hands, head, shoulders, body, legs, feet, or other body part.
  • the recommendations include suggestions for altering the motion of the one or more body landmarks in an effort to improve the score of the performance in a subsequent try.
  • a system that can quickly compute a signature for the acoustic sound of a shooter’s firearm, firing specific ammunition, in a given environment, using consumer-grade audio recording equipment (e.g., an iPhone, a tablet, a telephone, a video camera).
  • the system can receive an input audio or audio/visual file and compute the signature of the firearm based on the sound in the captured recording.
  • the recording may be captured by any suitable audio and/or video capture device, such as, without limitation, security cameras, traffic cameras, video cameras, television cameras, mobile device recorders such as smart phones or tablets, as well as other capture devices.
  • the capture devices are readily available consumer-grade recording devices.
  • the acoustic signature may further be used to determine the make, model, silencer, ammo type, ammo manufacturer, and other characteristics of the firearm blast based upon the acoustical signature from a discharged firearm.
  • This capability could be used to detect and separate the shooter’s shots from those of other shooters on a typical shooting range, which may be useful, such as for automatic scoring.
  • a version of this capability might be used to identify specific firearms and ammunition from sound recordings.
  • the described solutions operate in near real-time on a single, consumer recording device such as a mobile phone using only a modest amount of training data.
  • the terms “real-time” or “near real time” are broad terms and in the context of this disclosure, relate to receiving input data, processing the input data, and outputting the results of the data analysis with little to no perceived latency by a human.
  • a system as described herein that outputs analyzed data within less than one second is considered near-real time.
  • a system that operates in real-time or near-real time may limit the amount of computation for machine-learning, or at least training models, that the method can use to characterize a particular firearm firing appropriate ammunition.
  • Prior approaches describe a two-step approach to classification. For example, some prior approaches may use a relatively generic pre-trained DNN instance as an approximate classifier (predictor) and use an additional trainable step on the classifier output to improve the DNN prediction.
  • the DNN treats a finite length time segment of the time-varying spectrogram of the shot sound as an image and differentiates the pooled image for each firearm and ammunition pair from the pooled images of the other firearm and ammunition pairs.
  • the described system overlays a trainable weighting mask on an image created from a time segment of the time-varying spectrogram and a straightforward training method for adjusting the mask using just a few instances of shot sounds for the shooter’s firearm and ammunition as recorded in a particular environment with a particular device. Training only the weighting mask is significantly less computationally intensive than training the DNN itself. Consequently, according to some embodiments, the DNN may not be trained, or may be trained to a much lesser degree than prior methods, and the weighting mask receives the training. This approach has several benefits.
  • the adjusted mask can be viewed as the signature for the combination of firearm, ammunition, environment and recording device.
  • This signature is not used to search a catalog of signatures for that combination of factors, but to condition the spectrograms input to enhance the classification performance of the DNN for the firearm and ammunition in a particular environment using the given recording device.
  • the enhanced DNN classification can then be used to improve detection and differentiation of shooter’s shots from those of other shooters.
  • This approach has shown significant advantages in the intensity of the required computations, results in a much faster analysis, and can be used for numerous firearms in many different environments.
  • the weighting mask can be updated either in online fashion using stochastic gradient descent or in offline fashion using gradient descent.
  • the training data may be for a specific shooter using a specific firearm and ammunition combination in a given environment, which results in a much smaller data set than if data points were agglomerated from numerous shooters in different environments.
  • both algorithms may estimate the gradient of the function computed by the DNN in each iteration of the weighting matrix update. While this is not computationally prohibitive, it may turn out that just the sign of the gradient, or a roughly quantized version, is sufficient. That simplification depends in large part on whether the DNN computes a monotonically increasing or decreasing function in each input.
  • the disclosed system is capable of quickly fingerprinting a firearm and ammunition pair in a particular environment using a specific recording device.
  • the system and methods described herein can very quickly determine an acoustical signature of a firearm and ammunition combination in an environment. This allows the system to differentiate the analyzed acoustical signature from other firearm and ammunition combinations.
  • a scoring system will have a reduced number of false positives and missed shots because the system will be able to determine that a specific firearm and ammunition combination were utilized, which may be time wise matched with a hit on a target.
  • the system may be executed on a mobile computing device and used at a shooting range.
  • the shots fired by the shooter of interest will generally have an audio file that is dominated by shots fired by the shooter of interest, which may aid in determining whether a fired shot is from the shooter of interest.
  • in some cases, the recording device (e.g., a mobile computing device) has a microphone that is pointed down range, such as where the mobile computing device has a camera pointed at a target for automated target scoring, in which case the audio from shots fired by the shooter of interest will have a volume that is more difficult to distinguish from other shooters at the range.
  • described embodiments may use training of a machine learning algorithm, or training of a weighting mask, to quickly differentiate shots fired by the intended shooters from all other shooters at the range.
  • the classifier relies on feature extraction in the form of time- frequency spectrograms.
  • Mel-frequency cepstral coefficient (MFCC) vectors could be used as an alternative to raw power spectral density (PSD) vectors for the frequency representation of the spectrogram.
  • the MFCCs may be applied for firearm fingerprinting purposes.
  • MFCCs may be generated through executing a series of steps, including, without limitation: i) window the signal segment and compute the Fast Fourier Transform (FFT) of the signal segment, ii) combine linear FFT coefficients into the MEL frequency filter bank coefficients, iii) take the logs of those coefficients, and iv) compute the discrete cosine transform (DCT) of the log MEL filter bank coefficients.
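  • A minimal numerical sketch of steps i) through iv) is shown below, using NumPy and SciPy; the window function, sample rate, filter count, and coefficient count are assumptions chosen only for illustration.

```python
# Illustrative MFCC-style features for a windowed shot segment, following
# steps i)-iv) above. Parameter values are assumptions for the example.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular MEL filters mapping FFT bins to n_filters band energies."""
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bin_points = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bin_points[i - 1], bin_points[i], bin_points[i + 1]
        for b in range(left, center):
            bank[i - 1, b] = (b - left) / max(center - left, 1)
        for b in range(center, right):
            bank[i - 1, b] = (right - b) / max(right - center, 1)
    return bank

def mfcc_from_segment(segment, sample_rate=44100, n_filters=26, n_coeffs=13):
    # i) window the signal segment and compute its FFT (power spectrum)
    windowed = segment * np.hamming(len(segment))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    # ii) combine linear FFT coefficients into MEL filter bank coefficients
    bank = mel_filter_bank(n_filters, len(segment), sample_rate)
    mel_energies = bank @ spectrum
    # iii) take the logs of those coefficients
    log_mel = np.log(mel_energies + 1e-10)
    # iv) compute the DCT of the log MEL filter bank coefficients
    return dct(log_mel, type=2, norm='ortho')[:n_coeffs]
```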
  • the FFT of the signal segment generally results in a peak at the applied frequency along with other peaks, referred to as side lobes, which are typically on either side of the peak frequency.
  • the DCT, in some cases, expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. In some cases, fewer or more than the disclosed steps may be implemented to arrive at firearm fingerprints based on shot sounds. For instance, in some cases, only steps i), ii), and iii) described above may be used for shot sounds.
  • According to some embodiments, MFCC or PSD coefficients may be used as inputs to a DNN trained to classify time-frequency spectrograms into some number of independent categories.
  • the MFCC and/or PSD coefficients may be input into a classifier with two categories as described above, or some number of categories corresponding to (firearm, ammo) pairs, or some larger number of categories corresponding to tuples of k>2 attributes.
  • the DNN may place a new shot sound "in the neighborhood of" whatever class the maximal classifier output of all the classifier outputs represents. For example, a shot sound may be initially classified through a nearest neighbor approach, such as a k-NN algorithm. Subsequent analysis may further classify the shot sound.
  • Inner layers of the DNN may represent different sets of attributes of the input. These may likewise be used for classifying the shot sounds and training.
  • training the DNN essentially defines a surface with multiple local optima and the DNN can be thought of as directing a new input to the most appropriate local optimum.
  • following the pre-trained DNN with one or more trainable layers might be thought of as extracting a different set of more optimum attributes for shot sounds.
  • preceding the pre-trained DNN with a trainable layer, which may in some cases include weighting coefficients, might be thought of as adjusting the pre-trained DNN so that the shooter's shots of interest are the most positive examples of all shots that the pre-trained DNN places "in the neighborhood of" whatever class the maximal classifier output of all the classifier outputs represents.
  • the decision threshold may be adjusted to optimize the confusion matrix for the training dataset used to adjust the trainable input layer or another test dataset for some useful criteria.
  • the FFT of a windowed segment of the shot audio can be computed in O(n log n) time.
  • Computing MFCCs has the same time order but includes computing two O(n log n) operations.
  • the MFCCs may not be significantly better than FFTs and they may be omitted.
  • the MEL frequency log spectrum (omitting the discrete cosine transform (DCT) that yields the cepstrum) might be an improvement over raw FFTs with only O(n) extra computation cost.
  • the MEL frequency log spectrum is used rather than the DCT yielding the cepstrum.
  • in some embodiments, the pre-trained DNN is composed of layers of convolutional neural networks (CNNs).
  • the FFT takes advantage of regularities in the FFT kernel that generally won't exist in a composition of essentially arbitrary CNNs.
  • examples described herein have an FFT of the input time signal, and the pre-trained DNN consisting of CNNs may be implemented in the frequency domain.
  • the following DNN-based quickly-trainable category recognizer may be implemented to quickly determine an acoustic signature of a firearm and ammunition combination.
  • let F: ℝ^M → ℝ^K denote the analog transformation by a trained deep neural net from a real-valued M-dimensional input vector to a K-dimensional vector of class probabilities.
  • a final discrete output mapping σ: ℝ^K → {1, ..., K} selects the most probable of the K classes.
  • the system may be configured to expand the trained DNN into an enhanced binary classifier for class k. In some cases, the system may add a rapidly trainable input stage to the DNN implemented as the Hadamard product “∘” of a weighting matrix W and the input vector x.
  • the Hadamard product is a binary operation that takes in two matrices, such as a weighting matrix W and the input vector x, and returns a matrix of the multiplied corresponding elements.
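  • For clarity, a brief numerical illustration of the element-wise (Hadamard) product is shown below; the array values are arbitrary.

```python
import numpy as np

W = np.array([[1.0, 0.5], [0.25, 2.0]])   # weighting matrix
x = np.array([[4.0, 4.0], [4.0, 4.0]])    # input (e.g., a spectrogram patch)
masked_input = W * x                       # Hadamard product: element-wise multiply
# masked_input == [[4.0, 2.0], [1.0, 8.0]]
```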
  • the system can then follow the DNN class probability vector output with a selector function s(·; k): ℝ^K → ℝ that provides the single class probability of class k.
  • given a training set Z = {z_1, ..., z_M}, all instances of the same class k.
  • the system can be tuned to better recognize similar instances of class k through any of a number of suitable ways.
  • one method is online learning that is typically used for large training datasets.
  • W_{i+1} ← sat[ W_i + μ ( s(k) − s(F(W_i ∘ z_i); k) ) ( z_i ∘ ∇F(W_i ∘ z_i) ) ]
  • for i = 1, ..., M
  • where 0 < μ < 1 is an adaptation constant that weights the relative contribution of z_i to W.
  • a firearm discharge results in multiple acoustic events, such as, for example, the muzzle blast created by the expansion of gases within the chamber and exiting through the barrel, and the ballistic shockwave generated by the projectile, which, in most cases, is supersonic, but may also be subsonic in some cases.
  • FIG.26 illustrates a sample process for correlating a firearm with a particular shooter 2600, in accordance with some embodiments.
  • the range may have other shooters who are also at the range discharging firearms.
  • a busy range may have 10, 20, or 30 or more shooters who may all be shooting at the same time. It can be quite difficult for an acoustic system to register a shot fired by the shooter of interest through the cacophony of firearm discharges.
  • the system is configured to distinguish between the shooter of interest’s firearm and those of other shooters at the range.
  • the system receives video data of a shooter, which also includes audio data. This may be received, for example, through a multiple-camera system, dedicated microphones, or consumer-grade audio/video capture devices, such as a mobile computing device.
  • the system may determine body landmarks of the shooter.
  • the system may track the one or more body landmarks while the shooter fires a shot and generate motion data associated with the body landmarks during the shot.
  • the system correlates audio data with motion data to determine that a shot has been fired by the shooter of interest.
  • the system may receive audio data indicating a firearm discharge, which may be represented as a spike in an audio wave file. This may be correlated with motion data, such as the shooter’s wrist, that indicates that recoil from the firearm displaced the shooter’s hand, thus indicating that a shot has been fired.
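  • One possible way to perform such a correlation is sketched below: audio samples whose envelope exceeds a threshold are mapped to video frame indices and checked against the tracked wrist coordinate for a recoil-like displacement. The helper name and threshold values are assumptions for illustration only.

```python
import numpy as np

def correlate_shots_with_recoil(audio, audio_rate, wrist_y, frame_rate,
                                audio_threshold=0.6, recoil_threshold=0.02):
    """Return frame indices where an audio spike coincides with wrist displacement.

    audio: 1-D normalized waveform; wrist_y: per-frame vertical wrist coordinate.
    """
    shot_frames = []
    envelope = np.abs(audio)
    spike_samples = np.nonzero(envelope > audio_threshold)[0]
    for s in spike_samples:
        frame = int(s / audio_rate * frame_rate)
        if frame < 1 or frame + 2 >= len(wrist_y):
            continue
        # recoil shows up as an abrupt change in wrist position right after the spike
        displacement = abs(wrist_y[frame + 2] - wrist_y[frame - 1])
        if displacement > recoil_threshold:
            shot_frames.append(frame)
    # collapse runs of adjacent samples from the same muzzle blast into one shot
    return sorted(set(shot_frames))
```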
  • the system is trained to distinguish the shooter’s firearm from other firearms at the range. In this case, the system can be trained to differentiate shots fired from the shooter of interest and other shooters at the range.
  • the system can determine the score of the shot and associate the score with the motion data leading to the score.
  • the system may analyze the motion data in combination with the score and determine any mistakes that the shooter made, and provide suggestions to identify the mistake and/or how to address the mistake in the future. The system may also provide training exercises in order to allow the shooter to address the mistake and improve his shooting performance.
  • FIG.27 illustrates an example system 2700 that uses online learning. Audio 2702 is received and converted to an incremental spectrogram 2704.
  • the spectrogram 2704 is framed 2706, such as by a time window, and used to determine MFCCs, such as by determining the FFT of the windowed signal, combining linear FFT coefficients into the MEL frequency filter bank coefficient, determining the logs of the coefficients, and determining the DCT of the log MEL filter bank coefficients.
  • the determined MFCCs can be entered into a rapidly trainable input stage 2708 and then delivered to a DNN multi-class classifier 2710.
  • the selector 2734 determines the classifier output k with the highest class probability to classify a shot.
  • the rapidly trainable input stage 2708 may be referred to as a trainable weighting mask, or just mask.
  • the mask may be adjusted using instances of shot sounds for the shooter’s firearm and ammunition as recorded in a particular environment. In some cases, training only the weighting mask is significantly less computationally intensive than training the DNN.
  • the trained (e.g., adjusted) weighing mask may represent the signature for the combination of firearm, ammunition, environment, and recording device. This signature is not necessarily used to search a catalog of signatures, but rather, is used to condition the spectrograms input to enhance the classification performance of the DNN for the firearm and ammunition in a particular environment using the given recording device.
  • the training dataset for a specific shooter (e.g., a specific firearm and ammunition combination) is small, therefore the weighting mask can be updated either using online approaches, such as by using stochastic gradient descent, or offline such as by using gradient descent techniques.
  • Inner layers of the DNN classifier 2710 may represent different sets of attributes of the input.
  • the DNN is trained to define a surface with multiple local optima and the DNN can act to direct a new input to the most appropriate local optimum.
  • the enhanced DNN classification can then be used to improve detection and differentiation of shots from one firearm from those of other shooters.
  • the weighting mask W of the input stage 2708 to the pre-trained DNN classifier 2710 may be adapted by an online learning loop 2712 that optimizes the weighting mask W.
  • the online learning loop 2712 includes a copy 2714, 2716, and 2718 of the shot classifier input stage 2708, 2710, and 2734.
  • Block 2730 of the adaption loop computes the vector sign of the gradient of the DNN output with respect to spectrogram input for the current output spectrogram x(n) from the framing block 2706.
  • a raw incremental adjustment to the current weighting vector Wi(n) is then computed by the Hadamard multiplier 2732 as the product of the current shot spectrogram from the framer 2706 and the sign vector of the classifier gradient 2730.
  • the vector multiplier 2722 then scales the raw incremental adjustment to the weight vector Wi(n) by the adaptation constant μ.
  • the vector adder 2724 computes a preliminary updated weight by adding the current incremental adjustment to the current Wi(n).
  • this preliminary weight is transformed by the vector sat[ ] operation 2726 to an updated weight vector Wi+1(n).
  • the weight adaption loop 2712 just described is iterated, as symbolized by the delta operator 2728, until the weight vector converges, for example, to within a small tolerance.
  • the result Wi+1(n) is then selected as the weight vector W(n+1) for classifying the next shot.
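  • The sketch below mirrors the adaptation loop just described, under stated assumptions: dnn_class_prob stands in for the pre-trained DNN followed by the class-k selector, its gradient sign is estimated numerically for clarity, and sat[·] is taken to be clipping to [0, 1]. It is an illustrative sketch, not the only way the loop could be realized.

```python
import numpy as np

def sat(w, lo=0.0, hi=1.0):
    """Saturating nonlinearity, assumed here to be simple clipping."""
    return np.clip(w, lo, hi)

def sign_of_gradient(dnn_class_prob, x, k, eps=1e-3):
    """Numerically estimate sign of d(class-k probability)/d(input), element-wise.

    This finite-difference loop is written for clarity only; a practical system
    would use an analytic or batched gradient.
    """
    grad = np.zeros_like(x)
    flat, g = x.ravel(), grad.ravel()
    for i in range(flat.size):
        saved = flat[i]
        flat[i] = saved + eps
        up = dnn_class_prob(x, k)
        flat[i] = saved - eps
        down = dnn_class_prob(x, k)
        flat[i] = saved
        g[i] = np.sign(up - down)
    return grad

def update_mask_for_shot(W, x, k, dnn_class_prob, mu=0.1, max_iters=50, tol=1e-4):
    """One adaptation of the weighting mask W for a shot spectrogram x of class k."""
    for _ in range(max_iters):
        grad_sign = sign_of_gradient(dnn_class_prob, W * x, k)
        raw_increment = x * grad_sign          # Hadamard product of spectrogram and sign vector
        W_next = sat(W + mu * raw_increment)   # scale by mu, add, then saturate
        if np.max(np.abs(W_next - W)) < tol:   # iterate until the weights converge
            return W_next
        W = W_next
    return W
```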
  • Offline learning may be used, such as when the training dataset Z is small.
  • offline learning may require about the same amount of computations as online learning.
  • online learning has the advantage over offline learning that stochastic gradient descent doesn’t require accessing the entire training dataset Z in every iteration of the update function while offline learning does.
  • Offline learning offers the advantage of finding a local optimum while online learning may only approximate it.
  • the pre-trained network extracts many levels of increasingly abstract features, such as from images, and the additional trainable layers are used to focus on a problem of interest.
  • many of the systems and methods described herein function in a much different way that results in a system that is more efficient, much quicker, and more accurate.
  • the systems described herein in many cases, only add a single layer to the input of a pre-trained network. In some use cases, a camera is pointed at a target rather than focused on a shooter, and the camera and microphone pick up other shots without being able to natively determine that the discharge came from a shooter of interest.
  • the acoustical signature is generated quickly and, in many cases, is performed on a mobile device that includes a camera and a microphone (e.g., smartphone).
  • the mobile device may execute instructions (e.g., an application) that includes the components and systems described herein so that the classification and acoustical signature determination is performed on the mobile device.
  • the pre-trained network may be trained on a relatively encompassing universe of shots.
  • the trainable input layer may pre-distort the input data to achieve a highly probable recognition by the pre-trained network of the particular firearm with ammunition and environment. This may then increase the likelihood of detecting the shot of interest and rejecting all other shots.
  • a process 2800 begins 2802 and at block 2804 an audio file is opened, which may be a first audio file, or a next or subsequent audio file.
  • the audio file is used and the system, at step 2806, captures a block of samples framing a first and/or next shot. In other words, each shot is windowed in a time-bound sample.
  • a spectrogram associated with the samples is generated and labeled with attributes.
  • the system determines whether the most recent sample is associated with a last shot, and if not, the system returns to block 2806 to capture a block of samples associated with a subsequent shot. If so, the system proceeds to block 2812 and determines whether the labeled spectrogram created at block 2808 is the last file. If not, the system returns to block 2804 to open or capture a next audio file. If the system determines the most recent file is the last file, the system proceeds to block 2814 where the labelled spectrograms are aggregated. At block 2816, the DNN is trained on the labeled spectrograms.
  • a process 2900 begins 2902 and at block 2904 a set of shots is captured.
  • the shots may be captured by an audio and/or video recording device, or may include opening a file associated with one or more shots.
  • the system captures a block of samples framing a first and/or next shot. In other words, each shot is windowed in a time-bound sample.
  • a spectrogram associated with the samples is created and labeled with attributes.
  • the system determines whether the most recent spectrogram is associated with a last shot, and if not, the system returns to block 2906 to capture a block of samples associated with a subsequent shot. If so, the system proceeds to block 2912 and determines whether the labeled spectrogram created at block 2908 is the last set. If not, the system returns to block 2904 to capture a next set of shots. If the system determines the most recent file is the last file, the system proceeds to block 2914 where the labelled spectrograms are aggregated. At block 2916, the system updates W (weighting) until the value converges. The system stops at block 2918 with a spectrogram weighting.
  • a process for classifying a shot 3000 begins at block 3002, and at block 3004, the system captures a block of samples framing a shot. At block 3006, the system determines a spectrogram associated with the block of samples. At block 3008, the system determines a Hadamard product of the spectrogram and weight matrix. At block 3010, the Hadamard product of the spectrogram and weight matrix is applied to the DNN. At block 3012, the system makes a binary decision, such as the shot was either associated with the signature of the firearm in question or it was not. In some cases, the system is able to identify the type of firearm and the ammunition that was fired through the firearm.
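  • A condensed sketch of the classification path of process 3000 is shown below; compute_spectrogram and dnn are placeholder callables, and the decision threshold is an arbitrary illustrative value.

```python
def classify_shot(samples, W, dnn, compute_spectrogram, k, threshold=0.5):
    """Blocks 3004-3012: spectrogram, Hadamard mask, DNN, binary decision."""
    x = compute_spectrogram(samples)            # block 3006: spectrogram of the framed shot
    masked = W * x                              # block 3008: Hadamard product with weight matrix
    class_probs = dnn(masked)                   # block 3010: apply masked input to the DNN
    is_shooters_firearm = class_probs[k] >= threshold  # block 3012: binary decision
    return is_shooters_firearm, class_probs[k]
```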
  • the system may include one or more processors and one or more computer readable media that may store various modules, applications, programs, or other data.
  • the computer-readable media may include instructions that, when executed by the one or more processors, cause the processors to perform the operations described herein for the system.
  • the processor(s) may include a central processing unit (CPU), a graphical processing unit (GPU), both CPU and GPU, a microprocessor, a digital signal processor or other processing units or components known in the art.
  • the functionally described herein can be performed, at least in part, by one or more hardware logic components.
  • illustrative types of hardware logic components include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), complex programmable logic devices (CPLDs), etc.
  • each of the processor(s) may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.
  • the one or more control systems, computer controller and remote control may include one or more cores.
Automated Target Scoring
  • In some cases, the described system operates in near real-time on a single, consumer recording device such as a mobile phone using only a modest amount of training data.
  • the terms “real-time” or “near real time” are broad terms and in the context of this disclosure, relate to receiving input data, processing the input data, and outputting the results of the data analysis with little to no perceived latency by a human.
  • a system as described herein that outputs analyzed data within less than one second is considered near-real time.
  • a system that operates in real-time or near-real time may limit the amount of computation for machine-learning, or at least training models, that the method can use to characterize a particular target acquisition, classification, and scoring.
  • Prior approaches to automated target scoring have relied upon sound triangulation, light triangulation, and piezoelectric sensor triangulation. Sound triangulation has been attempted by using sound-chamber targets, which use the Mach wave of the projectile to determine its position as it passes through the target.
  • a sound triangulation automated scoring system operates by using microphones to measure the sound wave of the projectile as it passes through the target.
  • a light triangulation automated scoring system uses three or more lasers, such as infrared lasers. The three or more lasers are used to triangulate the position of the projectile as it passes through the target.
  • a piezoelectric sensor triangulation system relies on a series of piezoelectric sensors on a plate that sense vibrations caused by projectiles impacting a target.
  • FIG.31 illustrates a system 3100 configured to automatically identify a target, classify the target, determine impacts on the target, and score the shooting string.
  • the system 3100 may include computing resources 3102, which may be a mobile computing device associated with a participant at a shooting range and may include any one or more of a number of mobile computing devices, such as, for example, a smart phone, a tablet computer, a laptop computer, or other suitable computing device.
  • the computing resources 3102 typically include one or more processors 3104 and memory 3106 that store one or more modules 3108.
  • the modules 3108 may store instructions that, when executed, cause the one or more processors 3104 to perform various acts.
  • the computing resources 3102 may further include data storage, which may be remote storage, such as a remote server or a cloud-based storage system, or local storage, or a combination.
  • the data storage may store data on previous engagements (DOPE) which can allow for data tracking over time as well as comparative data between different shooters, different firearms, different ammunition, different targets, different environments, and the like.
  • the storage system may further allow historical trend analysis, which can be used to show shooter performance over time, including tracking improvements.
  • the data storage may also be analyzed to provide performance predictions, rankings, social features, among other benefits.
  • the system may incorporate one or more imaging sensors 3110, such as any suitable video camera.
  • the imaging sensor 3110 may be associated with the computing resources 3102.
  • the computing resources 3102 may be a smart phone with built in camera 3110.
  • the camera 3110 may be pointed to capture images of a target 3112.
  • the target may be located at any distance from the shooter and the camera 3110 may be aimed and/or zoomed to capture images of the target.
  • the camera may be coupled to a lens, such as a spotting scope or camera lens to allow the camera to get a closer view of the target through optical or digital zooming.
  • the computing resources 3102 may include instructions (e.g., modules 3108) that allow the computing device to initialize a target 3114, detect impacts on the target 3116, and score the impacts on the target 3118, among other things.
  • FIG.32 illustrates a decision tree 3200 configured to detect, identify, and classify a target.
  • the system is not made aware of the type of target before the system begins looking for a target.
  • a scoring system may be preprogrammed with the target that the shooter will be aiming at. This makes it easy for the system to understand the size and shape of the target, and the location and boundaries of each scoring ring or region.
  • the system is configured to automatically detect the target, without a priori data regarding the type of target, and determine the scoring rings and regions.
  • the system may have one or more video capture devices, which may be integrated into one or more mobile computing devices.
  • a mobile computing device may be one or more of a mobile phone, a smart phone, a tablet, a laptop, a personal digital assistant, smart glasses, a body cam, a wearable computing device, or some other computing device that a user may carry to a shooting range.
  • the mobile device may actuate a camera and capture one or more frames of a target 3202.
  • the computing device may have instructions that analyze the one or more frames, using any suitable image analysis algorithm, to identify a target in the one or more frames. If the target is detected and classified at block 3204, the target is registered with the system 3206 and the scoring rings and regions are determined. The system may capture additional image frames that contain the target and look for differences from one frame to the next that may correlate with impacts on the target.
  • Moving averages are a fundamental mathematical and statistical technique applied in image analysis and machine learning for various purposes, including noise reduction, feature extraction, and trend analysis. They involve the calculation of the average value of pixel intensities or other data points within a moving window or kernel across an image or dataset. Moving averages can be used to extract meaningful features from images. For example, by sliding a small window across an image and calculating the average pixel values within that window, important information can be highlighted. For example, in edge detection, the moving average can emphasize areas with abrupt changes in pixel intensity, helping to identify edges or boundaries of the target and the scoring regions. Edge detection can also be used to identify impacts on the target.
  • moving averages are used for time-series data analysis.
  • moving averages can be used to establish a baseline behavior for a system. Any data points that deviate significantly from this baseline may be flagged as anomalies or outliers. These anomalies may be further analyzed to determine impacts on the target.
  • sequential moving averages are generated, they may be combined as long-term moving averages.
  • the moving average images may be compared with the long-term moving averages to determine differences from one frame to a subsequent frame that indicate a change to the target, which is most likely associated with an impact on the target.
  • the impacts are selected and classified.
  • the system determines the boundaries of the scoring ring and determines the location of each impact and associates the location of each impact with a score for the impact.
  • the system determines whether the target has been detected. If not, then at block 3216, the system proceeds to detect the target. If the target has been detected, the system, at block 3218 classifies the target, such as by identifying the boundaries of the target, the boundaries of the scoring rings, and the value of the scoring rings.
  • FIG.33 further describes the initial steps that the system may take to identify and classify a target 3300 by analyzing one or more image frames.
  • Object detection is a computer vision technique that involves identifying and locating multiple objects within an image or video stream. Unlike image classification, which determines the presence of a single object class in an entire image, object detection provides a more granular understanding by not only recognizing objects but also specifying their positions through bounding boxes.
  • object detection algorithms typically output bounding boxes that enclose the detected objects. These bounding boxes consist of coordinates (x, y) for the object's top-left corner and dimensions (width and height) defining the object's spatial extent within the image.
  • the system applies object detection to one or more images of the target and searches for the target.
  • the object detection model is generic with respect to targets, which allows the system to detect any target, regardless of size or shape.
  • the system assumes it has located the target and defines the bounding box around the target.
  • finding the target in the same location in subsequent images comprises determining a moving average of the images to determine the target location, size, and shape.
  • the target is optionally classified.
  • the system may be configured to detect objects and classify each detected object into predefined classes or categories. This allows the system to distinguish between different object types, such as circular targets, ovoid targets, rectangular targets, silhouette targets, or otherwise.
  • a target classifier may be applied to the image within the bounding box. The system thus determines which reference target image to apply.
  • the system registers the target to the reference target image. In some cases, this involves applying a contrast adjustment to the image.
  • This may also involve iteratively modifying the initial bounding box, such as by adjusting its corners, then projecting the adjusted bounding box onto the reference target image.
  • the difference between the two may be applied as a score, and a hill-climbing technique may be applied to find the optimum corners, which can be correlated with the initial location of the target in the image.
  • a hill-climbing technique is an optimization algorithm used to find the local maximum (or minimum) of a given objective function. By iteratively making small steps in the direction that leads to a higher (or lower) value, the algorithm converges toward the best value and thus can be used to determine the boundaries of the target.
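  • An illustrative hill-climbing sketch over the four corner coordinates is shown below: each iteration perturbs one corner at a time, scores the candidate quadrilateral, and keeps perturbations that reduce the difference from the reference target. The project_and_diff scoring helper is a placeholder for the projection and mean-squared perceptual comparison described elsewhere in this disclosure.

```python
import numpy as np

def hill_climb_corners(corners, project_and_diff, step=2.0, max_iters=200):
    """Refine a 4x2 array of corner coordinates by greedy local search.

    project_and_diff(corners) is assumed to project the quadrilateral onto the
    reference target image and return the mean-squared (perceptual) difference.
    """
    best = np.asarray(corners, dtype=float)
    best_score = project_and_diff(best)
    for _ in range(max_iters):
        improved = False
        for c in range(4):                  # each corner
            for d in range(2):              # x then y
                for delta in (-step, step):
                    candidate = best.copy()
                    candidate[c, d] += delta
                    score = project_and_diff(candidate)
                    if score < best_score:  # keep moves that reduce the difference
                        best, best_score = candidate, score
                        improved = True
        if not improved:
            step /= 2.0                     # shrink step; stop once it is tiny
            if step < 0.25:
                break
    return best, best_score
```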
  • object detection is combined with semantic segmentation to provide pixel-level object masks.
  • FIG.34 describes the process 3400 of registering the target 3300 and determining impacts on the target.
  • the target may be reregistered, such as by performing a hill-climbing technique to search for a new set of best corners for the target.
  • the hill-climbing search uses a mean-squared distance in perceptual space technique.
  • mean-squared distance, also called mean-squared perceptual error, is a metric used to measure the similarity or dissimilarity between two data points that involve perceptual data, such as target corners.
  • relevant perceptual features are extracted – in this case, target corners, edges, scoring rings, etc.
  • the features can be visual descriptors which can be represented as a vector of perceptual features. These feature vectors capture the relevant information for each data point in a reduced and more informative form.
  • the mean-squared distance between two points (represented as their respective feature vectors) is generated by determining the squared differences between corresponding features and calculating the mean of these squared differences.
  • the system may apply a transformation matrix, which can be used to map the set of corners to an image.
  • the image upon which the coordinates are mapped has a dimension of 160 pixels, and in some cases is less than 160 pixels.
  • the moving average images are updated, and in some cases, long- term moving averages are those of about 10 seconds or longer, while short-term moving averages are those of about 0.1 second. In some cases, a video camera may capture upward of 30 frames per second. For the short-term moving averages, this equates to averaging about 3 frames to determine the short-term moving average.
  • the system determines the difference between moving averages.
  • the long-term moving average will be associated with a static target that hasn’t changed over 10 seconds or so, and it can be compared with the short-term moving average which reflects a change in the image. Therefore, the difference between the short-term moving average and the long-term moving average will highlight changes to the image, such as an impact on the target.
  • the system may convolve any difference images with a simple impact kernel, which in some cases may be a 5x5 uniform weight, square kernel, and look for the maximal block-wise locations in the difference image.
  • the kernel typically refers to a convolutional filter that can be used to process and modify pixel values, such as for feature extraction.
  • the square kernel may convolve (or move) across the image and at each position, the kernel’s values can be multiplied with the pixel values in the corresponding neighborhood and the results can be summed to produce a new pixel value in the output image.
  • the size of the kernel may be altered to adjust the extent of the neighborhood considered during convolution and may include any of a whole set of aperture kernels, having non-uniform weights, and may have any suitable size.
  • the convolution will return a set of potential impacts on the target.
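  • The sketch below illustrates, under simplifying assumptions, the moving-average differencing and 5x5 uniform-kernel convolution just described. At roughly 30 frames per second, a short-term smoothing factor near 1/3 approximates a 0.1 second average and a factor near 1/300 approximates a 10 second average; the thresholding here uses the statistics of the current response image rather than a long-term EMA of the maximum convolution value, purely to keep the example self-contained.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def update_ema(ema, frame, alpha):
    """Exponential moving average update for one grayscale frame (alpha in (0, 1])."""
    return frame.astype(float) if ema is None else (1.0 - alpha) * ema + alpha * frame

def find_candidate_impacts(short_ema, long_ema, n_sigma=4.0):
    """Difference the short- and long-term averages, convolve with a 5x5
    uniform-weight kernel, and flag locations well above the background level."""
    diff = np.abs(short_ema - long_ema)
    response = uniform_filter(diff, size=5)            # 5x5 uniform impact kernel
    threshold = response.mean() + n_sigma * response.std()
    ys, xs = np.nonzero(response > threshold)
    return list(zip(ys.tolist(), xs.tolist())), response
```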
  • the set of potential impacts may be further filtered, such as by using simple stats on a window surrounding the flagged difference. In some cases, the window is selected to be a 16x16 window with the difference in the middle of the window.
  • the system may also apply some business rules to the windowed difference, such as, for example, the system should not detect multiple impacts in exactly the same location.
  • the impacts on the target are determined. In some cases, this is accomplished by passing the filtered set of differences to an impact classifier for scoring. If the score of the difference is above a threshold value, the location is marked as an impact and another window may be placed around the impact. In some cases, a 10x10 window is placed around the impact location, and the long-term average is updated with the short-term average within that window for between 5 and 10 following frames. This ensures that the same impact is not detected again.
  • the difference is windowed by a first window, and if the difference exceeds a threshold score, the difference is windowed by a second window smaller than the first window.
  • the windowed difference associated with the short-term moving average may be added to the long-term moving average for at least 5 frames, or at least 6 frames, or at least 10 frames, or at least 12 frames, or at least 15 frames or more.
  • if the score of the difference is below a threshold, the difference is marked as a false impact and the system won’t need to evaluate and classify it again.
  • the system may receive audio data associated with a shot being fired and determine that a shot has been fired based on the audio data.
  • the audio data is correlated with the target images and the system can convolve the difference target image in response to the audio data indicating that a shot has been fired.
  • the system may not need to continuously convolve the difference images.
  • the system can determine, through audio data, that a shot has been fired and then update the short-term moving average and convolve the difference images to look for a shot.
  • the system is configured to discriminate shots fired by the user aiming at the target against other shooters at the shooting range. In this way, the system can know when the shooter of interest fires a shot even where other active shooters are present at the range.
  • the audio data may be used in the impact detection, such as by correlating the audio of a shot fired with an impact appearing on the target images.
  • FIGs.35A – 35C illustrate and describe initializing the scoring system by identifying and classifying a target.
  • the system can automatically determine the boundaries of the target, while in some embodiments, user input may define the boundaries of a target. For example, using a human to computer interface (e.g., touch screen, mouse, stylus, touch pad, or the like), a human may draw a boundary around the target to aid the system in identifying the target.
  • the system uses machine vision to identify the target and its boundaries.
  • FIG.35A illustrates an image 3500 captured by a camera associated with the system.
  • the image may include a target stand 3502, a target 3504, target securing clips 3506, and other features within the field of view.
  • the system may determine an initial bounding box 3508 around the identified target, such as by a trained target detection model.
  • a user may define the initial bounding box, such as by drawing on the computer display with a human to computer interface.
  • the human to computer interface may be any suitable interface and in some cases is a touch screen, a pen, a mouse, a trackball, or the like.
  • the initial bounding box may not accurately conform to the edges and corners of the target, especially in those cases where the bounding box is defined by the user.
  • the initial bounding box and target image may be referred to as an initialization frame.
  • the initialization frame may be converted to a Lab color space, which includes a lightness component, a green-red axis, and a blue-yellow axis, to provide perceptual uniformity.
  • the luminance channel is equalized via contrast limited adaptive histogram equalization (CLAHE).
  • FIG.35B illustrates a target in which the coordinates for the target are determined, as described above, and the coordinates in many cases imply a quadrilateral, which can be projected onto a reference target image 3510.
  • the reference target image 3510 may also be converted to Lab color space and the squared difference (in Lab space) may be generated between the projection and the target.
  • FIG 35C illustrates the best coordinates that have been determined, such as by the minimum difference over several random restarts of the hill-climbing algorithm.
  • the coordinates may then be used to apply the updated bounding box 3512. Therefore, even where the target image is skewed, such as when the viewing angle from the camera makes the target appear as a parallelogram rather than a rectangle, the initial bounding box can be modified to conform with the shape of the target as presented in the image captured by the camera.
  • the system may define the edges of the target through image analysis; however, in some cases, the edges of the target are irrelevant and it is only the scoring rings that are important. Therefore, in some cases, the system is configured to identify scoring rings and is not concerned with target boundaries. In addition, the system may not need to classify the target, but rather, only need identify scoring rings. For example, the system may determine, through one or more machine learning models, that the target represents a center bulls-eye target with sequential scoring rings. The system may assign score values with each ring, such as ten points for the bullseye, nine points for the next larger ring, and so on.
  • the system may identify a target with five bullseye sized circles spaced throughout the target and assign a value of ten points to each of these scoring rings.
  • One or more of the multiple bulls-eye sized rings may have radially spaced larger scoring rings that may be assigned lesser values than the bulls-eye sized ring.
  • the system may omit a step of classifying a target, and just focus on the size and location of the scoring rings.
  • FIG.36 illustrates and describes impact detection and scoring of the detected impacts.
  • a machine learning model may be executed to determine if the differences between image frames are likely to be projectile impacts on the target 3504.
  • the target 3504 may be re-registered, such as by applying a hill-climbing technique with possible corner coordinates as in the target initialization step.
  • differences are windowed 3602a, 3602b, 3602c, and the difference image 3612 between the current target and the long-term target average is generated.
  • the difference image may be convolved with an impact kernel (e.g., a windowed kernel that scans for image differences). Any point that is several standard deviations over the long-term exponential moving average (EMA) of the maximum convolution value is flagged as a possible impact 3604a, 3604b, 3604c.
  • FIG.37 illustrates and describes impact scoring on a target 3504. Different scoring zones may be determined by the system based on computer vision, by referencing a registered target from a previous shooting session, by retrieving a stored target model from a known target database, or in some other way.
  • the scoring zones 3702 on the target may be represented as the union area of one or more simple shapes (e.g., ellipses, rectangles, circles, triangles, etc.)
  • the coordinates of the detected impact 3704 may be normalized and converted to the axes implied by the reference image.
  • the impacts may be overlaid on the reference image, and the reference image can be used for coordinates of the impact.
  • the coordinates may be cartesian coordinates expressed in x,y values. In some cases, the coordinates may be radial coordinates that express the impact as an angle and distance, such as from the center of the target. The system can then determine whether the impact is wholly within a single scoring zone or intrudes on a scoring zone boundary, which allows the system to accurately score the impact.
  • the system may use simple geometry to determine whether any significant part of the given impact is within any of the simple shapes for each target zone.
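  • A simple geometric sketch of zone scoring is shown below, representing each scoring zone as a union of ellipses and testing only the impact center point (the described system may instead test whether any significant part of the impact overlaps a zone). The ring radii and point values are illustrative.

```python
def point_in_ellipse(x, y, cx, cy, rx, ry):
    """True if (x, y) lies inside an axis-aligned ellipse centered at (cx, cy)."""
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0

def score_impact(x, y, zones):
    """zones: list of (points, shapes) ordered from highest to lowest value,
    where shapes is a list of (cx, cy, rx, ry) ellipses whose union forms the zone.
    Returns the value of the highest-scoring zone containing the impact, else 0."""
    for points, shapes in zones:
        if any(point_in_ellipse(x, y, *s) for s in shapes):
            return points
    return 0

# Example: a bulls-eye target with concentric circular rings (radii in normalized
# reference-image units), evaluated from the highest-value zone outward.
bullseye = [
    (10, [(0.5, 0.5, 0.05, 0.05)]),
    (9,  [(0.5, 0.5, 0.10, 0.10)]),
    (8,  [(0.5, 0.5, 0.15, 0.15)]),
]
print(score_impact(0.5, 0.57, bullseye))   # lands inside the 9 ring -> prints 9
```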
  • the system uses the coordinates for the impacts in further analyses. For example, by generating and storing coordinates for a given shooting string, the grouping can be quantified, which can be used as a measure of improvement over time. Similarly, a shooter’s minute of angle (MOA), which is a measure of group size in inches and minutes of angle, measured from center to center or edge to edge, can be determined. A grouping may further be used to define pose, grip, or motion errors during the shooting string. The grouping may be quantified, including size of group, rotation of group, or other metric.
  • the scoring can be quantified in any suitable metric.
  • the scoring is point based, in which each zone of the target receives a number of points that are added or subtracted from an initial amount.
  • missing shots or extra shots fired are scored as negative values or higher values depending on the type of scoring.
  • timed scoring is used, in which the total time is reflected in the score, and misses may be penalized by an increase in the time.
  • group size is used to determine scoring, and extra shots or missed shots may penalize the group size.
  • other metrics and combinations of metrics may be determined by the system for scoring a particular shooting string.
  • the system may be configured to return, for each shot in a shooting string, a coordinate of the impact with the time at which the shot happened, in a way that a number of metrics combining the location and time can be used to rank the shooting string. For instance, the time between shots may be measured, or the shot after a buzzer or other start signal can be tracked and stored with an accuracy metric.
  • the system may draw a bounding box surrounding one or more of the impacts. When a shooting string is finished, the system may draw a bounding box that contains each of the shots within the group and determines a metric based on the bounding box to determine a score.
  • the system can be configured to determine a bounding box 3802, which may pass through the center of the outermost impacts, or along an edge of the impacts.
  • the system may determine any of a number of metrics, such as, without limitation, a group size 3804, an overall group width 3806, a group height 3808, bounding box rotation angle, MOA, elevation offset 3810, windage offset 3812, and may further determine the shooting distance 3814, which may either be manually input or determined based on detected flight time of the projectile.
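  • The following sketch computes several of the group metrics listed above from impact coordinates expressed in inches on the target face; the constant of roughly 1.047 inches per 100 yards for one minute of angle is standard, while the function and field names are illustrative.

```python
import numpy as np

def group_metrics(impacts_in, distance_yd):
    """impacts_in: (N, 2) array of impact coordinates in inches on the target face."""
    pts = np.asarray(impacts_in, dtype=float)
    width = pts[:, 0].max() - pts[:, 0].min()            # overall group width
    height = pts[:, 1].max() - pts[:, 1].min()           # group height
    # extreme spread: largest center-to-center distance between any two impacts
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    extreme_spread = d.max()
    # one MOA subtends roughly 1.047 inches at 100 yards
    moa = extreme_spread / (1.047 * distance_yd / 100.0)
    center = pts.mean(axis=0)                             # useful for offset estimates
    return {"width_in": width, "height_in": height,
            "extreme_spread_in": extreme_spread, "group_moa": moa,
            "group_center": center.tolist()}
```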
  • the system may be configured to register the sounds of the gunfire, the shockwave of the projectile or powder deflagration, motion of the firearm or shooter, or some other indicator that a shot has been fired.
  • the system may then detect when an impact on target happens and determine the flight time of the ammunition and based on the firearm, the ammunition, and/or the powder loading, determine a target distance.
  • This process can be done in near real time by a simple consumer grade mobile computing device.
  • the mobile computing device may utilize a zoom feature of a built-in image capture device.
  • an external zooming lens may be used to acquire image frames by the mobile computing device.
  • a mobile phone may be coupled to a spotting scope, which provides an optical zoom through the spotting scope to allow the mobile computing device to capture clearer images of a target that may be down range.
  • Some mobile computing devices may rely on digital zoom to capture one or more images of a target positioned down range.
  • the system may further determine and display the number of shots fired 3816 in the current shooting string and an average split time 3818 between shots, which may be helpful for timed shooting events.
  • the system may further show a score associated with each shot 3820 and a cumulative score 3822 of the shooting string.
  • Some embodiments further provide for an automated and automatic scoring system that can quickly identify a target, classify a target including identifying scoring rings of the target, and score impact hits on a target at a shooting range.
  • the system is stored and executed on consumer-grade mobile computing devices (e.g. an iPhone, a tablet, a telephone, a video camera).
  • the system includes a video camera device that is pointed at the target of interest, and the system is configured to identify the target, classify the target, determine shot impacts on the target, and score the impacts on the target.
  • the system is configured to prompt a shooter as to the shooting stage.
  • the system may be configured for utilization during a CMP high-power rifle competition, and the system may prompt a user that the present stage requires sending twenty shots downrange from an off-hand position during a 20-minute window.
  • the system is aware of how many shots to expect during a shooting stage (referred to as a string of fire), and may prompt a user with information associated with a current shooting stage, such as number of shots, timeframe, and shooting position.
  • a shooter may enter information associated with a shooting stage, such as, for example, a number of shots the system should expect, the firearm used, and the distance to the target, among others.
  • the system is manually started and stopped and only identifies shots on target during a time at which the system has been started.
  • the system utilizes a gantry type arrangement in which a plurality of recording devices may be used to capture audio and video of the participant and/or the target.
  • an imaging device can be pointed at the target, which may have a zoom lens, digital zoom feature or rely on an external lens, such as a camera mounted to a spotting scope for capturing images of a target located down range.
  • the system is configured to synchronize multiple sources, such as one or more video frames, and/or audio data from one or more audio capture devices.
  • the system is configured to synchronize multiple video frames and audio data from one or more audio/video capture devices.
  • the described system thus provides a comprehensive firearm training solution that allows shooters, even at busy ranges, to track body motion, identify errors in technique, correlate scores with the errors in technique, automatically score a target, and differentiate shots fired from the participant’s firearm from other firearms.
  • the system may utilize one or more machine learning models for synchronization, prediction, and verification, and may further be trained to analyze scoring data and associate the scoring data with the motion data to determine correlations between specific motion data (e.g., behaviors) and scoring trends. As an example, the system may determine that a shooter’s wrist pivots downwardly before shots that typically score outside and below the 10 ring and infer that the shooter is anticipating the firearm recoil before the shot.
  • the system may then provide feedback to the user with not only information related to the motion/score correlation but may also provide one or more exercises or drills to allow the user to recognize and address the behavior resulting in the reduced score.
  • the system may further be trained to distinguish the discharge of a firearm associated with the shooter of interest, even at a busy range with numerous shooters.
  • a similar process may be used with any motion data from any activity or sport, as described elsewhere herein.
  • the system may track body motion in two or three dimensions and from multiple angles.
  • the two- or three- dimensional body motion data may be correlated, synchronized, and analyzed to determine two- or three-dimensional motion data, which can be further correlated with a resulting score.
  • although embodiments of the described system are described in relation to a shooter firing a string of shots, it should be understood that the systems and methods described herein are applicable to capturing any type of body motion and to other sports where body motion may lead to performance metrics.
  • embodiments of the systems described herein may be used to track, critique, and improve body motion such as basketball free throw shooting, golf swings, figure skating elements, archery, soccer, baseball swing, or any other sport or motion where the movements of a set of observable body landmarks can be recorded in time and there is some observed causal consequence of the movement.
  • the system may include one or more processors and one or more computer readable media that may store various modules, applications, programs, or other data.
  • the computer-readable media may include instructions that, when executed by the one or more processors, cause the processors to perform the operations described herein for the system.
  • the processor(s) may include a central processing unit (CPU), a graphical processing unit (GPU), both CPU and GPU, a microprocessor, a digital signal processor or other processing units or components known in the art.
  • the functionality described herein can be performed, at least in part, by one or more hardware logic components.
  • illustrative types of hardware logic components include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), complex programmable logic devices (CPLDs), etc.
  • each of the processor(s) may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.
  • the one or more control systems, computer controller and remote control may include one or more cores.
  • Embodiments may be provided as a computer program product including a non-transitory machine-readable storage medium having stored thereon instructions (in compressed or uncompressed form) that may be used to program a computer (or other electronic device) to perform processes or methods described herein.
  • the computer-readable media may include volatile and/or nonvolatile memory, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • the machine-readable storage medium may include, but is not limited to, hard drives, floppy diskettes, optical disks, CD-ROMs, DVDs, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, flash memory, magnetic or optical cards, solid-state memory devices, or other types of media/machine-readable medium suitable for storing electronic instructions. Further, embodiments may also be provided as a computer program product including a transitory machine-readable signal (in compressed or uncompressed form). Examples of machine-readable signals, whether modulated using a carrier or not, include, but are not limited to, signals that a computer system or machine hosting or running a computer program can be configured to access, including signals downloaded through the Internet or other networks.
  • Conditional language such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain implementations could include, while other implementations do not include, certain features, elements, and/or operations. Thus, such conditional language generally is not intended to imply that features, elements, and/or operations are in any way required for one or more implementations or that one or more implementations necessarily include logic for deciding, with or without user input or prompting, whether these features, elements, and/or operations are included or are to be performed in any particular implementation.
  • illustrated data structures may store more or less information than is described, such as when other illustrated data structures instead lack or include such information respectively, or when the amount or types of information that is stored is altered.
  • the various methods and systems as illustrated in the figures and described herein represent example implementations. The methods and systems may be implemented in software, hardware, or a combination thereof in other implementations. Similarly, the order of any method may be changed and various elements may be added, reordered, combined, omitted, modified, etc., in other implementations.
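As a concrete illustration of the coordinate-based scoring described above, the following is a minimal sketch of mapping an impact coordinate to a point value. The ring radii, bullet diameter, and "best edge" rule shown here are assumptions for illustration only, not a specification of any particular target.

```python
import math

# Hypothetical ring layout: outer radius of each zone (inches) mapped to its point value.
# Real targets differ; these numbers are placeholders for illustration only.
RING_RADII = [(0.84, 10), (1.69, 9), (2.53, 8), (3.38, 7), (4.22, 6)]
BULLET_RADIUS = 0.1775  # half of a .355-inch (9 mm) bullet diameter

def score_impact(x: float, y: float) -> int:
    """Score one impact given Cartesian coordinates relative to the target center.

    The impact is credited to the highest-value zone that any part of the
    bullet hole touches (a common "best edge" convention).
    """
    r = math.hypot(x, y)            # radial distance of the hole center
    edge = r - BULLET_RADIUS        # innermost point of the hole
    for ring_radius, points in RING_RADII:
        if edge <= ring_radius:
            return points
    return 0                        # complete miss of all scoring zones

print(score_impact(0.9, 0.1))  # hole center just outside the 10 ring, but its edge cuts it -> 10
```

Radial coordinates work equally well: the radius is simply the distance computed above, so the same zone lookup applies.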
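The group metrics referenced in the bounding-box items above (group size, width, height, MOA, windage and elevation offsets) can be computed directly from stored impact coordinates. The sketch below is illustrative only; the function name, the inch-based units, and the assumption that the point of aim is at the origin are all hypothetical.

```python
import math
from typing import List, Tuple

def group_metrics(impacts: List[Tuple[float, float]], distance_yards: float) -> dict:
    """Compute simple group metrics from impact coordinates (inches on target)."""
    xs = [x for x, _ in impacts]
    ys = [y for _, y in impacts]
    # Axis-aligned bounding box of the group (center-to-center)
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # Extreme spread: largest center-to-center distance between any two impacts
    spread = max(
        math.dist(a, b) for i, a in enumerate(impacts) for b in impacts[i + 1:]
    )
    # One MOA subtends roughly 1.047 inches per 100 yards
    moa = spread / (1.047 * distance_yards / 100.0)
    # Windage/elevation offset of the group center from an assumed point of aim at (0, 0)
    center = (sum(xs) / len(xs), sum(ys) / len(ys))
    return {"width": width, "height": height, "extreme_spread": spread,
            "moa": moa, "windage_offset": center[0], "elevation_offset": center[1]}

print(group_metrics([(0.2, 0.5), (-0.4, 0.1), (0.1, -0.3)], distance_yards=25))
```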
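For the flight-time-based distance determination noted above, a first-order estimate can be made from the interval between the detected shot and the detected impact together with a nominal velocity for the firearm and ammunition combination. The sketch below deliberately ignores drag and sensor latency; the default velocity and the function name are assumptions for illustration.

```python
def estimate_distance(shot_time_s: float, impact_time_s: float,
                      muzzle_velocity_fps: float = 1150.0) -> float:
    """Rough target-distance estimate (in yards) from detected flight time.

    Assumes a constant projectile velocity taken from the ammunition profile and
    ignores drag and the return travel time of sound/light to the sensors, so it
    is only a first-order approximation.
    """
    flight_time = impact_time_s - shot_time_s
    if flight_time <= 0:
        raise ValueError("impact must be detected after the shot")
    distance_feet = muzzle_velocity_fps * flight_time
    return distance_feet / 3.0  # feet to yards

# A shot detected at t=2.000 s and an impact detected at t=2.065 s with a
# nominal 1150 fps load works out to roughly 25 yards.
print(round(estimate_distance(2.000, 2.065)))
```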

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Image Analysis (AREA)

Abstract

A body movement improvement system is configured to receive video data, track one or more body landmarks associated with the movement to determine motion of the body landmarks during an action, correlate the motion of the body landmarks with a score of the action, and use machine learning techniques to determine the motion that is detrimental to the score of the action. The system can also recommend drills for the participant to improve the score of the action.

Description

SYSTEMS AND METHODS FOR MARKSMANSHIP IMPROVEMENT THROUGH MACHINE LEARNING CROSS-REFERENCE TO RELATED APPLICATIONS [0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/406,245, filed September 13, 2022, entitled “SYSTEMS AND METHODS FOR MARKSMANSHIP IMPROVEMENT THROUGH MACHINE LEARNING,” and U.S. Provisional Patent Application No.63/406,208, filed September 13, 2022, entitled “SYSTEMS AND METHODS FOR AUTOMATED TARGET IDENTIFICATION, CLASSIFICATION, AND SCORING,” and U.S. Provisional Patent Application No. 63/406,241, filed September 13, 2022, entitled “SYSTEMS AND METHODS FOR MARKSMANSHIP DIGITIZING AND ANALYZING,” the contents of which are incorporated herein by reference in their entirety. BACKGROUND [0002] Target shooting is enjoyed by millions of people a year and according to many reports, the number of people who routinely target shoot has increased over the past ten years and is continuing to increase. In the US alone, it is estimated that over 52 million people routinely shoot at targets. There are many different types of recreational shooting activities, from simple plinking with handguns or rifles at paper or steel targets, to skilled long-range rifle shooting matches that require a high degree of discipline and skill, to fun and fast-paced shooting of pistols at popup or stationary targets or shotgun shooting at skeet, trap, sporting clays and more. Apart from recreational shooting, there is a growing number of target shooters that practice as part of their occupation, such as law enforcement, military, and security personnel. [0003] While many of the participants are happy with their current skill level, there is a growing number of marksmen who desire to improve their skills. However, in some cases, many of the participants do not know how to improve. The growing number of people at shooting ranges continues to increase and these participants, whether there for sport, recreation, personal defense, or public defense, have a desire to improve their skills. However, without personalized shooting instruction, or oftentimes even with personalized instruction, it can be difficult to quantify improvements, and any issues that affect accuracy. Moreover, many participants do not know how to get better. [0004] There is thus a need for a system and methods that can analyze a participant’s habits and provide feedback and recommendations for improvement. There is a further need for a system that is capable of providing the aforementioned benefits using consumer-grade equipment, and in near-real time. These and other benefits will become readily apparent from the disclosure that follows. SUMMARY [0005] A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a method for improving shooting performance. 
The method also includes receiving video data of a shooter; determining one or more body landmarks of the shooter, tracking the one or more body landmarks during a shot to generate shot motion data, determining a score of the shot, associating the shot motion data with the score, and generating recommendations for altering the motion data on a subsequent shot. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. [0006] Implementations may include one or more of the following features. The method where determining one or more body landmarks of the shooter may include generating a wire frame model by connecting the body landmarks. Associating the shot motion data with the score may include executing a classification and regression tree machine learning model to identify causal relationship between the shot motion data and the score. The method may include determining, through image analysis of the video data of the shooter, a grip of the shooter. The method may include analyzing the grip of the shooter and providing, on a display screen, grip recommendations to alter the grip. Determining the score of the shot may include: receiving target video data; performing image analysis on the received target video data; determining a hit on the target; and determining a score of the hit. Receiving the video data includes capturing video data by a mobile phone. Determining one or more body landmarks includes determining 17 body landmarks. Tracking the one or more body landmarks includes generating a bounding box around each of the one or more body landmarks. The method may include executing a machine learning model to correlate the shot motion data with the score. The machine learning model is configured to determine the motion data that results in an off-center target hit. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer- accessible medium. [0007] One general aspect includes a method for improving a causal consequence of body movement. The method also includes receiving video data of a body motion; determining one or more body landmarks viewable in the video data of the body motion; tracking the one or more body landmarks during an action; generating, based at least in part on the tracking the one or more body landmarks, motion data; determining a score associated with the motion data; associating the motion data with the score; and generating recommendations for altering the motion data on a subsequent action. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. [0008] Implementations may include one or more of the following features. The method where receiving the video data includes capturing the video data by a mobile computing device. The method may include executing a machine learning model to correlate the motion data with the score. The machine learning model is configured to determine the motion data that results in a reduced score. The method may include predicting, by the machine learning model, a predicted score based on the motion data. The method may include comparing the predicted score with the score. 
Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium. BRIEF DESCRIPTION OF THE DRAWINGS [0009] The accompanying drawings are part of the disclosure and are incorporated into the present specification. The drawings illustrate examples of embodiments of the disclosure and, in conjunction with the description and claims, serve to explain, at least in part, various principles, features, or aspects of the disclosure. Certain embodiments of the disclosure are described more fully below with reference to the accompanying drawings. However, various aspects of the disclosure may be implemented in many different forms and should not be construed as being limited to the implementations set forth herein. Like numbers refer to like, but not necessarily the same or identical, elements throughout. [0010] FIG.1A illustrates an image of a shooter captured by an image capture device, in accordance with some embodiments. [0011] FIG.1B illustrates a wire frame model of the shooter from FIG.1A, in accordance with some embodiments. [0012] FIG.2 illustrates motion data associated with a plurality of body landmarks, in accordance with some embodiments. [0013] FIG.3 illustrates motion data associated with a plurality of body landmarks, and automatic shot detection based on the motion data, in accordance with some embodiments. [0014] FIG.4 illustrates analysis of motion data associated with a plurality of body landmarks and identifying shooting errors based on the motion data, in accordance with some embodiments. [0015] FIG.5 illustrates analysis of motion data associated with a plurality of body landmarks and identifying shooting errors based on the motion data, in accordance with some embodiments. [0016] FIG.6 illustrates analysis of motion data associated with a plurality of body landmarks, in accordance with some embodiments. [0017] FIG.7 illustrates analysis of motion data associated with a plurality of body landmarks and identifying a change in pose that least to improvements in performance, in accordance with some embodiments. [0018] FIG.8A illustrates an image of a shooter taken with an imaging device, in accordance with some embodiments. [0019] FIG.8B illustrates a computer-generated wire frame model of the shooter of FIG. 8A with identified body landmarks, in accordance with some embodiments. [0020] FIG.9 illustrates motion data associated with a plurality of body landmarks and identifying body motion leading to reduced performance, in accordance with some embodiments. [0021] FIGs.10A and 10B illustrate diagnostic targets that identify shooter behaviors based on shot patterns, in accordance with some embodiments. [0022] FIG.11 illustrates an image of a shooter’s hand and a computer-generated wire frame model associated with body landmarks that identify and analyze a shooter’s grip, in accordance with some embodiments. [0023] FIG.12A is an illustration of a computer software program user interface that captures video images, tracks motion of body landmarks, and analyzes a shooter’s pose and motion data, in accordance with some embodiments. [0024] FIG.12B is an illustration of a computer software program user interface that analyzes a shooter’s performance and automatically scores the performance, in accordance with some embodiments. [0025] FIG313A and 13B illustrate a system for marksmanship digitizing and analyzing, in accordance with some embodiments. 
[0026] FIG.13C illustrates a system for marksmanship digitizing and analyzing, in accordance with some embodiments. [0027] FIG.14 illustrates a system for marksmanship digitizing and analyzing, in accordance with some embodiments. [0028] FIG.15 illustrates the logic of a machine learning algorithm to quantify shot samples, in accordance with some embodiments. [0029] FIG.16 illustrates a sample decision tree machine learning model that correlates motion data of body landmarks with shot performance, in accordance with some embodiments. [0030] FIG.17 illustrates an annotated decision tree machine learning model that correlates motion data of body landmarks with shot performance, in accordance with some embodiments. [0031] FIG.18 illustrates a pruned decision tree machine learning model that correlates motion data of body landmarks with shot performance, in accordance with some embodiments. [0032] FIG.19A illustrates a system for capturing video data of a shooter and a target, in accordance with some embodiments. [0033] FIGs.19B and 19C illustrate image data of a left-hand view and a right-hand view of a shooter captured by imaging devices, in accordance with some embodiments. [0034] FIG.19D illustrates image data of an overhead view of a shooter captured by an imaging device, in accordance with some embodiments. [0035] FIG.19E illustrates image data of a target captured by an imaging device, in accordance with some embodiments. [0036] FIG.20 illustrates a sample user interface of a computer application configured to receive, analyze, and score marksmanship performance, in accordance with some embodiments. [0037] FIG.21 illustrates a sample user interface of a computer application configured to receive, analyze, and score marksmanship performance showing shot placement, and automated scoring, in accordance with some embodiments. [0038] FIG.22 illustrates a sample user interface of a computer application configured to receive, analyze, and score marksmanship performance allowing a selection of body landmarks and showing the motion data associated with the selected body landmarks, in accordance with some embodiments. [0039] FIG.23 illustrates several of the key technologies enabled in embodiments described herein along with the features the key technologies facilitate. [0040] FIG.24 illustrates a system that uses multiple source collaboration and machine learning to detect shots fired, analyze shooting position and provide recommendations for improvement, and analyze grip position and provide recommendations for improvement, a sample user interface of a computer application configured to receive, analyze, and score marksmanship performance, in accordance with some embodiments. [0041] FIG.25 is a process flow for capturing motion data, analyzing the motion data, and generating recommendations for improvement, in accordance with some embodiments. [0042] FIG.26 is a process flow for correlating a firearm with an individual shooter, in accordance with some embodiments. [0043] FIG.27 illustrates a sample system for determining an acoustical signature, in accordance with some embodiments; [0044] FIG.28 illustrates a sample flow chart for pre-training a DNN, in accordance with some embodiments; [0045] FIG.29 illustrates a sample flow chart for deriving spectrogram weighting, in accordance with some embodiments; and [0046] FIG.30 illustrates a sample flow chart for classifying and determining an acoustic signature. 
[0047] FIG.31 illustrates a system configured for automatic scoring of a shooting target, in accordance with some embodiments. [0048] FIG.32 illustrates a sample process flow for classifying and scoring a target, in accordance with some embodiments; [0049] FIG.33 illustrates a sample process flow for identifying and classifying a target, in accordance with some embodiments; [0050] FIG.34 illustrates a sample process flow for registering a target and determining scoring hits, in accordance with some embodiments; [0051] FIGs.35A, 35B, and 35C illustrate a method for initializing a target scoring system to identify a target, in accordance with some embodiments; [0052] FIG.36 illustrates a sample process flow for detecting impacts on a target, in accordance with some embodiments; and [0053] FIG.37 illustrates a sample process flow for scoring impacts on a target, in accordance with some embodiments. [0054] FIG.38 illustrates a sample user interface for automated target scoring in a software application, in accordance with some embodiments. DETAILED DESCRIPTION [0055] Body Landmark and Pose Estimation [0056] According to some embodiments, a system is described that uses computer vision and machine learning to significantly reduce the manual and laborious nature of current training technologies, thus providing a continuously learning and adapting system. According to some embodiments, the system includes a machine vision and machine learning system that can track and estimate the pose of a participant, and determine positive and negative factors impacting the quality of the participation, such as pose, motion, anticipation, recoil, grip, stance, among other things. This may be performed, in large part, by a computer vision system that can track several body landmarks simultaneously, and in some cases, associate the motion of one or more body landmarks with marksmanship accuracy. As an example, a system may identify and track any number of body landmarks, such as 3, or 5, or 11, or 17, or 21, or 25, or 30, or more body landmarks. As the system tracks the position of the landmarks, which may be in two dimensions, or in three dimensions, the system can associate motion of the landmarks with rounds sent down range and scoring of individual rounds. The detection of rounds sent down range may be determined by motion one or more suitable markers, such as motion of the participants hand or wrist (or other body marker) in response to recoil, a sound of the firearm, a pressure wave associated with the muzzle blast, a target hit, or some other marker. [0057] The system may further monitor the participant from one, two, three, or more perspectives and analyze the movement of each body landmark, and may further monitor shot accuracy and correlate the body landmark motion with accuracy. Based on the accuracy, the system may further provide an analysis of the body motion that causes less than perfect accuracy and may further suggest ways to ameliorate the body motion to improve accuracy. [0058] The motion capture may be performed by one or more cameras aimed generally at the participant, and one or more cameras aimed at a target. In some cases, one or more of the cameras are associated with a mobile computing device, such as, for example, a smartphone, a tablet, a laptop, a digital personal assistant, and a wearable device (e.g., watch, glasses, body cam, smart hat, etc.). 
In some cases, a wearable device may include a sensor, such as an accelerometer, a vibration sensor, a motion sensor, or other sensor to provide motion data to the system. In some embodiments, the system tracks body marker position over time and generates motion plots. [0059] With reference to FIG.1A, one or more cameras may capture one or more views of a shooter 100. The camera may capture video data of the shooter as the shooter draws, takes aim, fires a shot, reloads, and/or adjusts position. A computer system may receive the video data, analyze the video data, and create a model associate with the shooter as in FIG 1B. In some cases, the computer system identifies body landmarks, and connects the body landmarks into a wire frame model 102 that tracks the pose and movements of the body landmarks of the shooter. In some instances, the body landmarks may include one or more of nose, left ear, left eye, left hip, left knee 103, right ear, right eye, right hip, left ankle, left elbow, left wrist, right knee 104, right ankle, right elbow 106, right wrist 108, left shoulder, and right shoulder 110. Of course, other body landmarks are possible, but for the sake of efficiency throughout this disclosure, we will focus on these seventeen body landmarks. In some embodiments, a single camera may capture two-dimensional motion data associated with one or more of the body landmarks. In some examples, two or more cameras may be used to capture three-dimensional motion data of one or more of the body landmarks. [0060] The body landmarks may be tracked over time, such as during a shooting string (e.g., a shooting session) and the motion of one or more of the body landmarks may be tracked during this time. In some cases, two-dimensional motion is tracked in x and y directions corresponding with side-to-side movement and vertical movement. In some cases, three-dimensional movement of the body landmarks is tracked in x, y, and z directions. [0061] With reference to FIG.2, a graph of selected body landmarks 200 is illustrated. The body landmarks can be user selectable by the user to focus on individual or combinations of body landmarks for review. For instance, the topmost line 202 represents the right wrist of a right-handed shooter in a horizontal direction, while the third line down 203 represents the right wrist vertical direction over time. As can be seen, the movement of the wrist moves up and down during the course of the motion capture. In some cases, the system may correlate the motion of the one or more body landmarks with events or stages, during the shooting string. [0062] For instance, during the first stage 204, the right wrist is relatively low, and the system may correlate this position and movement with getting ready to begin. The second stage 206 which shows the right wrist moving upwardly over a very short interval may be correlated with drawing a pistol from a holster. The third stage 208 shows the right wrist remaining largely stable in the vertical plane; however, there are sharp peaks to the movement, 207a-207c which may be correlated with shots fired from the pistol. [0063] During the fourth stage, 210, the right wrist moves downwardly and the returns to the firing position. The system may correlate this motion with a reloading operation. In some cases, the system is trained on training data, which may be supervised learning, to correlate similar motion with the various stages. 
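To illustrate how sharp vertical peaks in a landmark trace can be turned into automatic shot detection of the kind described above, the following is a minimal sketch rather than the system's implementation; the frame rate, thresholds, and the use of an off-the-shelf peak-finding routine are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_shots(wrist_y: np.ndarray, fps: float = 30.0,
                 min_rise: float = 0.04, min_gap_s: float = 0.25) -> np.ndarray:
    """Return frame indices of likely shots from a wrist vertical-position trace.

    wrist_y is the normalized vertical coordinate of the shooting-hand wrist,
    one sample per video frame. Recoil shows up as a brief, sharp upward spike,
    so candidate shots are prominent peaks in the frame-to-frame displacement.
    The thresholds are illustrative and would be tuned on labeled drills.
    """
    velocity = np.diff(wrist_y, prepend=wrist_y[0])       # per-frame displacement
    peaks, _ = find_peaks(velocity,
                          prominence=min_rise,            # reject slow aiming drift
                          distance=int(min_gap_s * fps))  # enforce a minimum split time
    return peaks

# Synthetic trace: steady aim with three recoil spikes
trace = np.zeros(300)
for frame in (80, 150, 220):
    trace[frame:frame + 3] += np.array([0.08, 0.05, 0.02])
print(detect_shots(trace))  # -> [ 80 150 220]
```

In practice the same peak picking could be run on audio energy as well, with the two detections cross-checked as described for audio/video synchronization.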
[0064] During the fifth stage, 212, the shooter’s right wrist initially moves upwardly as the shooter takes aim, and then settles down onto the target, which is followed by peaks in the movement 213a-213c, which may be correlated with shots being fired down range. [0065] In the sixth stage, 214, the right wrist moves downwardly again to an initial position, which may be correlated with holstering the pistol. [0066] While the example focused on the shooter’s right wrist in a vertical direction, it should be apparent that any of the body landmarks can be viewed, analyzed, and the motion or combinations of motion can be correlated with events or actions by the shooter, including groups of body landmarks. [0067] With reference to FIG.3, a closeup view of the shooting stage 300 is depicted illustrating the sharp peaks in the right wrist movement 302 and right elbow movement 304 in the vertical direction. The system can analyze the motion data and automatically determine when a shot has been fired. The system can be configured to correlate the sharp peaks in vertical motion of the wrist and/or elbow with the shots fired. As shown in FIG.3, each of the arrows 306 may coincide with a shot being fired. In addition, audio data may be correlated with the motion data to provide additional cues as to when a shot is fired. In some cases, the audio data may be combined and/or synched with the motion data to provide additional details about a shot being fired. The motion data may be used to infer additional information regarding the shooter, his habits, his posture, and other cues that may be brought to the attention of the shooter in an effort to improve the shooter’s accuracy. [0068] With reference to FIG 4, motion data 400 is displayed for the right wrist 402 and right elbow 404 of a shooter. In some cases, the system may apply a trend curve 406 to the motion data that may represent normalized motion data. In addition, the system may make inferences and/or determinations based on the motion data 400. For example, as shown in FIG.4, once a shooter places the firearm on target and attempts to hold the firearm steady, at 408, the shooter will lower the wrist and elbow, at 410, immediately followed by a shot fired 412. They system may recognize this pattern and determine that the motion of lowering the wrist and/or elbow immediately before a shot is evidence of the shooter trying to anticipate the recoil of the firearm and attempting to brace against it. In many cases, anticipating the recoil dramatically reduces the accuracy of the shot since the shooter is moving the firearm off target in anticipation of the recoil that happens as the shot is fired. Similar examples of motion data that may reduce the shooter’s accuracy include flinch, pre-ignition push, trigger jerk, closing of the eyes, among others. [0069] In some cases, the system may provide information to the shooter regarding the recoil anticipation and provide information, which may include one or more drills or practice sessions, in an effort to improve the shooter’s motion data relating to recoil anticipation. For example, the system may identify a practice regimen that may include dry firing, skip loading (e.g., mixing live rounds in a magazine of dummy rounds), or other skill building drills. [0070] FIG.4 also shows that the shooter is experiencing drift in the posture. 
For instance, before the first shot 412, the shooters right wrist and right arm are position at a first vertical height 414, and before the second shot 416, the shooters right wrist and right elbow are positioned at a second vertical height 418 higher than the first vertical height. This data shows that the shooter did not return to the same position from the first shot to the second shot, and as a consequence, the shooter’s sight picture will be slightly different which may reduce the accuracy of sequential shots. [0071] With reference to FIG.5, drift is further demonstrated by the motion data 500. Motion data associated with a right-handed shooter’s right wrist 502 and right elbow 504 shows not only recoil anticipation where the motion data shows a lowering of the body parts right before a shot, but also shows a trend line 506 that evidences that the shooter’s wrist and elbow continually drift upwardly during a shooting string. The failure of the shooter to return to the same position in between shots can dramatically reduce the accuracy and precision of the shots fired within the string. [0072] In some cases, the system will recognize the drift in one or more of the body landmarks and may provide this information to the shooter. In some cases, the system will provide information on a display screen associated with a mobile computing device. For example, the system may be implemented on a mobile computing device associated with a shooter, and a display screen on the mobile computing device may provide information, instructions, or practice drills to the shooter to improve the drift and the accuracy issues resulting therefrom. [0073] Similarly, the system may correlate body landmark motion with other correctable defects in the shooter’s position or posture. For instance, with references to FIGs.6 and 7, which show motion data 600 of body landmarks, motion data 600 shows the motion of a plurality of body landmarks during a shooting string. The lowermost line graph depicts right ankle motion data 602 that evidences that the shooter changed a position of the right foot. Changing positions during a shooting string is likely to affect the sight picture, accuracy, precision, and other metrics associated with shooting. The system may determine that the change in foot position was either positive or negative in terms of shot scoring and may provide recommendations to the shooter based on this change in posture. The system may view the shooting accuracy and/or precision of the shots fired 604a – 604e both before and after the relocation of the foot and determine whether moving the foot had a positive or negative impact on the shooting performance. [0074] In some cases, the system will correlate a scoring of the target (e.g., shot accuracy and/or precision) with body landmarks and motion and can indicate to the shooter which positions of individual body landmarks affected their shooting performance, either for the better or for the worse. [0075] The system can be configured, through machine learning, to associate certain poses, motions, and combinations with shooting performance. In some cases, the body landmark motion and combinations of motions may be associated with improved shooting performance while others may be associated with decreased shooting performance. [0076] The system may normalize the motion data to generate normalized coordinates of the position of each body part during all the session. 
The score and its moving average may be represented by a signal, such as by displaying it in a user interface. The motion data and/or the score may be stored in a data file that can be analyzed in either near-real time, or saved for later analysis. [0077] The motion data may be associated with shot pattern data, such as the x and y coordinate of each shot and the shot coordinates may be associated in time with the motion data occurring at the time the shot was fired. Additionally, a score may be assigned to the shot and saved with the shot data. [0078] One or more machine learning approaches may be applied to the motion data and shot data to generate correlations between the motion and shot accuracy. For example, in some embodiments, convolutional deep neural networks (CNN) may be designed to location features in a collection of motion and shot data. Other deep-learning models that are oriented toward classification may also be used to correlate the motion data and shot data to identify patterns that lead to an increase in accuracy or a decrease in accuracy. Transformations that correlate arbitrary collections of attributes with other arbitrary collections of attributes might also be used. [0079] FIG 8A illustrates a sample camera angle of a shooter 800 and FIG 8B illustrates a resulting wire frame model 802 of the shooter that allows the system to track motion of the shooter’s body including the selected body landmarks. As can be seen, the system is able to determine a shooter’s pose, stance, and motion throughout a shooting string. The wire frame model may include representations of each major joint and body artifact which may include one or more of a shooter’s nose or chin 804, right shoulder 806, right elbow 808, right wrist 810, hips 812, left femur 814, left knee 816, left lower leg 818, left ankle 820, right femur 822, right knee 824, right lower leg 826, and right ankle 828, among others. In some cases, the system may be trained on motion data from a variety of shooters and historical performance of those shooters correlated with the motion data. In this case, the system can determine the impact that specific poses and/or motions have on the shooting performance. In some cases, the system may rely on motion data from a single shooter to aid that shooter in making adjustments and/or further training exercises to improve the shooter’s performance. [0080] FIG.9 illustrates x-axis motion data 900 associated with a shooters head 902 and nose 904. As can be seen, during a shooting string, after each shot 906a – 906d, the shooter’s head moved back, which may be correlated with a shooting performance. That is, the system may determine that the shooter’s head moved after each shot, such as to look at the target, and may additional have drifted and not returned to the exact same spot during the shooting string which caused the shooter to perform below a threshold value. [0081] In some cases, the system may be configured with logic that determines a likely cause and effect based upon either the shooter’s motions and/or the scoring of the target. With references to FIGs.10A and 10B, which show a diagnostic target for a left-handed shooter, and a right-handed shooter 1000, respectively. The right and left-handed targets are mirror images of each other, so only one target will be described hereinbelow. [0082] Assuming that a shooter is able to hold the pistol on target, motion associated with the shooter may be responsible for off-center hits. 
Where the off-center hits are regularly grouped on one side of the target, there are some likely issues that may be causing the off-center hits. For example, if a grouping of shots lands in the 12:00 position, the system may determine that the shooter is breaking his wrists up 1002 (e.g., riding the recoil), which often happens in anticipation of a recoil. [0083] If a grouping of shots lands in the 1:30 position 1004, this may be indicative of the grip error known as heeling, in which the heel of the hand pushes the butt of the pistol to the left in anticipation of the shot, which forces the muzzle to the right. [0084] If a grouping of shots lands in the 3:00 position 1006, this may be indicative of the thumb of the shooting hand applying too much pressure, which pushes the side of the pistol to the right and forces the muzzle to the right. [0085] Where a group of shots lands in the 4:30 position 1008, this may be indicative of a shooter tightening the grip while shooting (e.g., lobstering). As the trigger is pulled it causes the front sight to dip, which pushes the shots low and to the right for a right-handed shooter. [0086] When a group of shots lands in the 6:00 position 1010, this may be indicative of breaking the wrist downwardly and pushing forward. This is often an unconscious effort to control the recoil and prevent the muzzle from lifting. [0087] Where a group of shots lands in the 7:00 position 1012, this may be indicative of jerking or slapping the trigger. This may be indicative of a shooter trying to pull the trigger the instant the sights align with the target. [0088] When a group of shots lands in the 8:00 position 1014, this may be indicative of the shooter tightening their grip during a shot. [0089] Where a group of shots lands in the 9:00 position 1016, this may indicate too little finger on the trigger. This typically causes the shooter to squeeze the trigger at an angle during the final rearward movement of the trigger, which has a tendency to push the muzzle to the left. [0090] Where a group of shots lands in the 10:30 position 1018, this may indicate that the shooter is pushing in anticipation of the recoil with insufficient follow through. [0091] The system may be programmed to view target hits during a shooting string, and in combination with the body landmark motion, determine whether the shooter is guilty of recoil anticipation, trigger control errors, and/or grip errors. The system can make this determination for individual shooters and can recommend practice exercises and practice strings for addressing the specific shooting issues. [0092] In some cases, the system can access data on previous engagements (DOPE) associated with a shooter, which may include previous shooting sessions, records, scores, and analysis. [0093] Along with real-time scoring, complete shooting sessions can be recorded and stored so that users can review them along with determinations from the machine learning system that identify and/or highlight deficiencies, mistakes, and changes in the shooter’s posture or technique that improve or reduce the shooter’s performance. [0094] Synergies come from the synchronization of all the available sources of information as described herein. [0095] With continued reference to FIGs. 10A and 10B, in which a diagnostic target is illustrated, the basic conventional ML model may map the target shot pattern ϕ ∈ Φ into a shooter behavior β ∈ B, fML: Φ → B. The behavior may be a four-dimensional (4D) phenomenon that includes three-dimensional (3D) motion data plus time. The shot pattern Φ is a two-dimensional (2D) projection of an evolving 3D phenomenon (2D plus time). This model may be founded on causal hypotheses. [0096] An expert may look at a much larger collection of complex shooter behavior B* ⊃ B and map that into a larger collection of shot patterns Φ*, fXP: B* → Φ*. The behavior β ∈ B* may be a 4D phenomenon and the shot pattern ϕ ∈ Φ* a 2D phenomenon. One may speculate this is a causal model in principle because the expert conforms the model according to his or her understanding of a shooter’s physiological kinematics and psychology and to the physics of shooting. [0097] A relatively simple iterative model for expert learning may include predicting the shot pattern from observed shooter behavior: fXP: B* → Φ. The model may then assess the difference between the predicted shot pattern and the actual shot pattern: δ(ϕ', ϕ). The model may then adapt the causally informed prediction model fXP based on the difference δ(ϕ', ϕ). This may be repeated for each n-shot drill. [0098] This type of learning model treats the shooter behavior β ∈ B* as a 3D phenomenon and the shot pattern ϕ ∈ Φ* as a 2D phenomenon. In some cases, the model is enhanced to treat B* as a 4D phenomenon, Φ* as a 3D phenomenon, or both. According to some embodiments, a pose analysis method represents the 4D shooter behavior B* by a 3D projection of B** (2D plus time). [0099] The model can objectively identify good shot patterns (e.g., tightly grouped around the bullseye) from the rest of the “not good” shot patterns by executing a decision function fD: Φ → D, where D is a binary variable. The model may also be configured to relate shooter behavior B* to how (why) shot patterns ϕ are “not good”. [0100] In some embodiments, the sampled time signals for shooter behavior may be reduced for a single drill to a single line in a dataset. A bounding box may be derived around the set of X-Y coordinates of each of the body landmarks in a drill. The bounding boxes may be denoted as the shooter behaviors B. Thus, there may be a shooter behavior for each one of the body landmarks during a time period. In some cases, the shooter behaviors B and shot patterns Φ are represented in polar coordinates as a radius and angle. The shooter behaviors B (of which, in some examples, there are 17) are 2D, having an (x,y) coordinate or an (r,θ) representation. Because the shot pattern ϕ ∈ Φ is a causal consequence of shooter behavior β ∈ B, the method can treat the shot patterns Φ as proxies for the unobservable third dimension of the shooter behaviors B. The method may treat the shooter behaviors β ∈ B and shot patterns ϕ ∈ Φ as features and scores s ∈ S as the target for an ML model fML: B × Φ → S. The model can be analyzed for patterns (β, ϕ, s) ∈ ΨL, ΨH resulting in low and high scores. Implicit or explicit clustering techniques might be used for this. In some cases, models fML: B × Φ → S can be built for multiple drills for a specific shooter and for drills for multiple shooters. In some cases, given a particular drill (β, ϕ, s) and model fML: B × Φ → S, the model quality can be assessed by comparing the actual score s ∈ S and predicted score s' ∈ S. Some models could be sequentially updated from these comparisons. [0102] In some embodiments, low-scoring drills (β, ϕ, s) can be assessed against Ψ to determine which of the 17 components of the observable shooter behavior β, and the shot pattern ϕ as the proxy for unobservable shooter behavior, are likely causes for the low score. This may be easiest for explainable models. Unobservable behaviors may be described by the usual labels in diagnostic targets like those above. In some instances, a shooter diagnostic application can also differentiate low scores due to sighting misalignment, perhaps easiest to understand as shots tightly grouped around a centroid other than the target bullseye. 
In some cases, the system determines that a reduced score has been obtained and generates recommendations for improving the score. As used herein, the term “reduced score” is used to mean a score that is less than a target score. The target score may be a perfect score, a participant’s highest historical score, a participant’s average score, or some other metric. For example, in a shooting event, the highest score for a given shot is oftentimes a “10.” In this case, a reduced score is any score lower than a “10.” Similarly, in other sports, a score may be assigned that is less than a desired score, such as, for example, a penalty kick in soccer may be a binary outcome with a miss being a reduced score as compared to a goal being scored. In reference to a golf shot, a golfer may, on average, hit their drive 275 yards. A reduced score may result from a drive carrying 250 yards, which is below the golfer’s average drive distance, and the system may observe behavior and determine which behavior(s) resulted in the reduced score. [0103] According to some embodiments, the system relies on artificial intelligence (AI) and/or machine learning (ML) for two determinations: detecting shooter movements and shot placement and analyzing shooter behaviors for how they result in shot placement. Having described detecting movements, there are multiple ways of analyzing how shooter behaviors result in shot placement. Firstly, an expert-based approach utilizes comparing an individual shooter’s behavior for each drill to an expert’s assessment of what should result in a good shot. Secondly, a data-based approach is conducted by building models from repetitions of the shooting behavior (“drill”) by a single shooter or multiple shooters. This may be considered an AI/ML-based discovery strategy, that determines what behaviors are correlated with good shots. [0104] In some cases, AI/ML may be used to automate and enhance expert-based analysis in the sense that if prototypical "ideal" behaviors are known a priori, models for these a priori known behaviors could be fitted to the data for a single shooter or multiple shooters. [0105] From that perspective, a transformer ML model may be mimic automated and enhanced expert-based analyses by combining a pre-trained data-based model for inferred relationships between language fragments (analogous to inferring relationships between shooter behaviors and shot placement) with additional stages of data-based adaption. [0106] With reference to FIG 11, the system determines key points for a shooter’s hands 1100. Key points may coincide with each moveable joint of the wrist, hand, and fingers and their respective locations and position relative to each other. The system may connect the key points into a wire frame model 1102 of the shooter’s hand, which allows precise monitoring of pose and motion. FIG.11 illustrates a plurality of key points associated with a shooter’s hand which may be used to determine the grip that the shooter is using. For example, by referring to the key points of a shooter’s hand, the system may determine that the shooter is using a thumbs forward, thumb over, cup and saucer, wrist grip, trigger guard gamer, or another style grip. Different grips and pressure applied by the hands can impart motion to the pistol and the system may determine that a different, or modified, grip would result in better performance. 
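As a rough illustration of how hand key points might be reduced to a grip classification of the kind described above, the sketch below applies simple geometric rules to 2D key points from a generic 21-point hand model. The landmark indexing, the muzzle-direction convention, and the rules themselves are assumptions; a deployed system would more likely learn these distinctions from labeled video.

```python
from typing import Sequence, Tuple

# Landmark indices follow a common 21-point hand model (0 = wrist, 4 = thumb tip,
# 8 = index fingertip); any hand-tracking model with that layout would work here.
WRIST, THUMB_TIP, INDEX_TIP = 0, 4, 8

def classify_grip(strong_hand: Sequence[Tuple[float, float]],
                  support_hand: Sequence[Tuple[float, float]],
                  muzzle_direction_x: float = 1.0) -> str:
    """Very rough grip heuristic from 2D hand key points (illustrative only).

    If both thumbs extend toward the muzzle, call it "thumbs forward"; if the
    support hand sits below the strong-hand wrist, call it "cup and saucer".
    """
    strong_thumb_forward = (strong_hand[THUMB_TIP][0] - strong_hand[WRIST][0]) * muzzle_direction_x > 0
    support_thumb_forward = (support_hand[THUMB_TIP][0] - support_hand[WRIST][0]) * muzzle_direction_x > 0
    support_below_strong = support_hand[WRIST][1] > strong_hand[WRIST][1]  # image y grows downward

    if strong_thumb_forward and support_thumb_forward:
        return "thumbs forward"
    if support_below_strong:
        return "cup and saucer"
    return "unclassified"
```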
[0107] The system receives signals associated with the position and motion of the shooter and processes the signals to find mistakes in shooter stance and grip. In addition, by combining single frame analysis (e.g., finding insights from the relative positions between different body parts at a specific moment in time, which an analysis across time), it is possible to identify mistakes due to changes in the shooter’s position. [0108] In addition, the system can focus on the position of the hands on the pistol. A hand tracking machine learning model can be used to track the position of each bone in each finger in a video frame. The system will receive signals associated with the position of each finger across time. [0109] With reference to FIGs 12A and 12B, an application program is shown that may be executed on a mobile computing device and receive image data from an imaging sensors associated with the mobile computing device. As with any of the embodiments described herein, the methods and processes can be programmed into an application program, or a set of instructions, that can be executed on a computing device. In some cases, the application program can be executed on a mobile computing device and the audio/video capture, analysis, scoring, recommendations, training exercises, and other feedback can be performed with the mobile computing device. In some cases, the system receives video and/or audio from multiple video capture devices capturing different views of the same shooter. The system may use the multiple, different views of the shooter in the analysis and feedback to the shooter for improving performance. [0110] As shown in FIG.12A, the system, running as an application on a mobile computing device 1200, captures video frames of a shooter 1202 , establishes body landmarks to track over time, creates a wire frame model of the shooter 1203, tracks shots fired 1204 and provides feedback to the shooter 1202. As illustrated, the system may identify the stance of the shooter, “Weaver” stance in the illustrated example, tracks the number of shots 1204, and provides feedback 1206 on each shot. For instance, a shot taken at 7 seconds after beginning the string shows a hit at center of mass. The system identifies a reloading operation 1208 at 8 seconds that lasts 1.2 seconds, followed by a shot at 11 seconds showing that the shooter anticipated the recoil, and the shot was off center. In some cases, this functionality of the system may be viewed as a drill instructor that tracks the shots, provides feedback, and offers suggestions for improving posture, grip, stance, trigger pull, and the like, to improve shooter performance. [0111] FIG.12B illustrates an additional screen 1210 that may be displayed by the system in a Spotter modality in which the mobile computing device may have its video capture device pointed at the target. In some cases, the mobile computing device may use an internal or external lens to get a better view of the target and the mobile computing device may be coupled to a spotting scope or other type of optical or electronic telephoto zoom lens in order to get a better image of the target. [0112] The system may be toggled between the different modes, such as Spotter, Drill Instructor, DOPE, Locker (in which information about the different firearms owned by the shooter may be stored), and program settings. The Spotter mode may show an additional screen 1210 with a view of the target upon which target hits may be visible. 
The system may also display other information, such as the firearm 1212a and ammunition 1212b combination being fired, the distance to the target 1214, the type of target 1216, the time of the shooting string 1218, the number of shots fired, the number of target hits 1220, and the score 1222, among other things. It may also show an image of the target 1224 and may further highlight the hits 1226 on the image of the target 1224. It should be appreciated that the Spotter mode may show the information, including the time elapsed, hits, score, and number of shots in real time. In addition, the system may also store the data associated with a shooting string for later playback, review, and analysis. For instance, the Drill Instructor mode may review the DOPE associated with a shooter and firearm and ammunition combination and provide a detailed analysis of mistakes the shooter makes with the particular firearm and offer exercise or suggestions to ameliorate the mistakes in order to improve shooter performance. [0113] With reference to FIGS 13A, 13B, and 13C, an example architecture for data capture 1300 is illustrated. According to some embodiments, a shooting range 1301 may be outfitted with one or more sensors, such as image sensors, projectors, and computing devices. FIG 13A illustrates a shooting range 1301 looking down range from the perspective of the shooter 1302. FIG 13B illustrates a shooting range 1301 from a side view, and FIG 13C illustrates a shooting range 1301 from a top plan view. [0114] One or more cameras 1304 may be mounted within the shooting range to capture video of the shooter which may be from multiple angles. The cameras 1304 may be any suitable type of video capture device, including without limitation, CCD cameras, thermal cameras, dual thermal cameras 1305 (e.g., a capture device having both thermal and optical capture capabilities), among others. The cameras 1304 may be mounted at any suitable location within the range, such as, for example, to the sides of the shooter 1304a, 1304b, facing the front of the shooter 1304c, overhead 1304d, among others. [0115] In some cases, the cameras 1304 may be mounted to a gantry system that provides a portable structure for supporting the one or cameras 1304 to capture multiple angles of the shooter. [0116] In some embodiments, a projector 1306 may be provided to project an image onto a screen 1308. In some cases, the projector may project any target image onto the screen or onto a shooting wall, and a shooter can practice dry firing (e.g., without firing live ammunition) and the system can register hits on the projected image. For example, an image may show a target on the screen and a shooter may dry fire at the target. The system can register the point of aim at the time of trigger pull and display the hit onto the projected target image. There are compatible devices that fire a laser through the barrel of the firearm when the trigger is pulled to indicate where a shot would have impacted the target. The thermal imaging camera 1304 may detect the laser hit location and the system may be configured to display the hit on the projected target and the system may further register and score the hit. In this way, shooters can practice using a shooting simulator using their own firearm without the need to travel to a dedicated shooting range. The system may recommend dry firing practice to ameliorate bad shooting habits. [0117] FIG.14 illustrates the system configured with multiple source synchronization. 
In some embodiments, a shooter 1402 is captured with multiple video capture devices 1404 from different angles. The system obtains features from the shooter 1406, such as stance, pose, and motion. The system may analyze the captured audio signal 1408 to complement video shot detection. That is, the system may analyze both video and audio to detect a shot fired by the shooter, such as by correlating a spike in an audio waveform with a sudden muzzle raise of the firearm. [0118] The system may be configured to detect shooting stages 1410, such as a first shooting string, reloading, and a second shooting string, including detection of shots fired. The system may additionally identify and determine shooting errors 1412 as well as drills and exercises to address the shooting errors. The error detection may be iterated and further error corrections 1414 may be proposed. The system may also incorporate shot detection 1416, as described herein, including correlation of shot detection with audio data. [0119] The system may further capture images 1418, such as video images, of the target and register hits on the target, which may be correlated with motion data before and during the trigger pull. As described herein, the system may determine the shot location 1420, and ultimately determine a score 1422 for one or more of the shots fired. Some embodiments of the system thus provide an intelligent tutoring/adaptive training solution that turns the time-intensive and non-scalable live training environment into an automated and adaptive virtual scenario-based training solution. [0120] According to some embodiments, the model is based on instructions that are executed by one or more processors that cause the processors to perform various acts. For example, the method may include collecting as many pose analysis files as possible, which may be stored as comma separated value (CSV) files. [0121] FIG.15 illustrates a shot bounding box 1500 being determined by the system. A "drillparser.py" script may be configured to combine the pose analysis files. The system may then find the shots in the CSV files and, for each of the 17 pose landmarks and each shot, derive an oriented bounding box 1500 around the x-y coordinates for the 1.0 second time window up to and including the shot, then use the angle ϕ 1502 and dimensions l 1504 and w 1506 of the bounding box 1500 as factors in a decision tree model for the score. The "allshots" approach processes data on a CSV file basis, so a bounding box 1500 is drawn around all the shots, as one row in the resulting dataset. The "oneshot" and "oneshotd" approaches may handle each shot individually, so the shot bounding box 1500 may be an infinitesimal box around a single shot, and each row in the dataset is one shot. The "oneshotd" approach additionally orients the direction of bounding boxes around the 17 pose landmarks in time order while the "oneshot" approach ignores time. [0122] The system is thus able to derive an oriented bounding box around each shot, which may be oriented in the direction of the box around the body landmarks. [0123] 3) These datasets are handled in BigML as: [0124] 1. Uploaded to BigML as Source objects. [0125] 2. Source objects are converted to Dataset objects. [0126] 3. Dataset objects are split into Training (80%)/Test (20%) datasets. [0127] 4. Tree Models are built using selected factors from the Training datasets. [0128] 5. BatchPredictions are created using the appropriate Model and Test Datasets. [0129] 6.
Models are downloaded as JSON PML objects for the next steps. [0130] 4) The “modelannotator.py” script annotates the model PML files as needed for the explanation operation. In some cases, this annotation consists solely of adding a “targets” attribute to each node that is a list of all of the target (score) values reachable in the tree from that node. [0131] 5) The “itemexplainer.py” script uses a (virtual) pruned annotated model to “explain” which pose elements for a new drill result in prediction of an unsatisfactory score. [0132] 6) Of course, instances for which the predicted score differs significantly from the actual score can be used to adjust the model by repeating the model training through machine learning. [0133] Factors derived from shot samples may include one or more of the following factors: shot_phi – bounding box rotation (-π ≤ ϕ ≤ π); shot_l – bounding box length; shot_w – bounding box width; shot_lw – bounding box area; shot_theta – bounding box center rotation (-π ≤ θ ≤ π); shot_r – bounding box center radius. The bounding box 1500 is drawn around the shot samples. [0134] A very similar approach can be used to derive factors from body landmark samples. As the body landmarks move over time, each body landmark can likewise be used to create a bounding box with length, width, area, and rotation and correlated to the shot factors. [0135] In some cases, the system is configured to use one or more cameras for machine vision of a shooter, determine one or more body landmarks (in some cases up to 17 or more body landmarks) and track the movement of each of these landmarks over time during a shooting drill. The system can correlate the time-bound body landmark movement with a scored shot and determine if the shot is good or bad. A good shot may be relative to the DOPE for the shooter, firearm, and ammunition combination. For example, where a single shot is closer to the aiming point than an average shot for the shooter, this may be categorized as a good shot. Furthermore, by looking at several shots over time, the system can correlate shooter behaviors with good or bad shots. Furthermore, by analyzing the shooter behaviors (e.g., body landmark motion), the system can predict whether a shot is a good shot or a bad shot without even seeing the target results. For example, a good shot may be considered a shot within the 9 or 10 ring of a target, while a bad shot may be any shot outside the 9 ring. Depending on the expertise of the marksman, the definition of good shot and bad shot may be varied. As an example, for a very skilled shooter, anything outside the 10 ring may be considered a bad shot. [0136] Finally, by correlating the shooter behavior with the target accuracy, the system is able to provide an output to the shooter on the behavior that causes reduced accuracy. Moreover, the system may recommend one or more specific drills to the shooter to address the behavior that causes reduced accuracy. [0137] FIG.16 illustrates some embodiments of a decision tree model 1600 for correlating body landmark factors with shot samples. A decision tree algorithm is a machine learning algorithm that uses a decision tree to make predictions. It follows a tree-like model of decisions and their possible consequences. In some cases, the algorithm works by recursively splitting the data into subsets based on the most significant feature at each node of the tree. [0138] From the body landmark data, various features are extracted to represent the kinematic properties of body movements.
These features encompass body landmark positions over time, a bounding box describing the movement of the body landmark during a time prior to and including the shot fired, relative distances between landmarks, and temporal derivatives, which capture dynamic aspects of motion. To facilitate model convergence, feature scaling is carried out. [0139] The decision tree, depicted in FIG 16, serves as the core of the model in some embodiments. It is selected for its ability to elucidate complex, non-linear relationships within the data while maintaining interpretability. The tree's depth is optimized to minimize overfitting through techniques like cross-validation or a predetermined maximum depth. The choice of splitting criteria, whether based on Gini impurity or information gain, depends on the specific problem. Pruning methods, such as enforcing a minimum number of samples per leaf, are applied to control excessive branching. [0140] Training of the model may involve the recursive construction of the decision tree. The tree has nodes that correlate with the body landmarks. For example, a left_ankle_phi node 1602 may branch into a right_wrist_lw node 1604 and a right_knee_phi node 1606, and these branches can show how the bounding box rotation of the left ankle motion data affects the resulting shot placement in combination with the bounding box area of the right wrist or a rotation of the bounding box associated with the right knee. Similarly, the decision tree can correlate a shot placement with a combination of features. At each node, the training data is partitioned based on feature values, optimizing the chosen loss function, such as mean squared error or cross-entropy. Hyperparameters can be fine-tuned via cross-validation, and the model's performance is assessed using various metrics. [0141] During real-time operation, the decision tree model continuously updates as new body landmark data becomes available for each shot fired. For each input feature vector derived from the live landmark data, the model traverses the decision tree, reaching a leaf node. The label assigned to the leaf node (indicating a positive or negative outcome) is utilized as the model's prediction. In the illustrated example, a right shoulder bounding box length between the values of 0.80 and 0.82 is consistent with a predicted result of the shooter’s performance, shown at leaf nodes 1608 and 1610. [0142] In the context of this model, positive outcomes signify successful execution of specific movements or achievement of desired shot placement. Conversely, negative outcomes denote incorrect movements, deviations from desired postures, or off-center shot placement. [0143] The model's performance is rigorously assessed through metrics including but not limited to accuracy, precision, recall, F1-score, and area under the receiver operating characteristic (ROC) curve. Confusion matrices are employed to quantify the model's proficiency in classifying positive and negative outcomes. [0144] FIG.17 illustrates an annotated model 1700 showing various values at several of the nodes. For example, during a shooting string, the area of the bounding box for the right wrist 1702 indicates values of 0.47, 0.51, 0.57, 0.59, and 0.61. This indicates that during the shooting string in which shots were being fired, the shooter moved her right wrist within an area defined by the displayed values.
Following the branch below the right wrist 1702, for values that were below the average of the values, the system identifies that the left knee 1704 moved in a certain pattern, which can be correlated with either good shots or bad shots to look for causality between the movement of the right wrist in combination with movement of the left knee. Other features can be similarly annotated within the model. The annotated features can be stored in a feature vector and correlated with each shot fired for analysis on how the different features combine to result in a particular shot performance. [0145] FIG.18 illustrates a pruned model 1800 which allows a deeper dive into various features and their interrelation with one another. For example, when reviewing the feature of the right_knee_phi 1802 (e.g., the rotation of the bounding box associated with movement of the right knee during a shooting string), we can see values of 0.8, 0.82, and 0.9. Given these values, we can see that the right_knee_phi value of 0.9 provides one result at a first result node 1806. Where the right_knee_phi 1802 is 0.8 or 0.82 and the right_shoulder_l bounding box length 1804 is 0.09, we see second results at a second result node 1808 and a third result node 1810. In some cases, the second result node 1808 is associated with a bad shooting performance while the third node 1810 is associated with a good shooting performance. These non-linear causal relationships can be determined by the machine learning model, such that the system, by executing one or more machine learning algorithms, can determine which motions lead to better or poorer shooting results. [0146] In some cases, isolation forests encode a dataset as a forest of trees such that the training instances at each leaf are identical. Node splits may be chosen randomly rather than to reduce impurity of the child target values. In some cases, nodes are leaves when the training instances at the node have the same target value. In some instances, rare instances in the training dataset reach a leaf earlier than less rare instances. According to some embodiments, new anomalous instances have short paths relative to tree depth for all trees in the forest, which makes identifying anomalies more efficient. [0147] In some cases, a classification and regression tree (CART) is a predictive model which explains how an outcome variable can be predicted based on other values. A CART-style decision tree is one where each fork is a split in a predictor variable and each end node contains a prediction for the outcome variable. It builds a binary tree to partition the feature space into segments that are homogeneous with respect to the target variable and is a recursive algorithm that makes binary splits on the input features based on specific criteria, creating a tree-like structure. CART-style decision trees may be useful in predicting an outcome of the shot based upon the motion data of one or more body landmarks. CART-style decision trees are multi-category classifiers (e.g., N>1), while isolation trees are single category classifiers (e.g., “anomalous or not”). In some cases, the CART-style decision trees can be reduced to isolation trees by designating M < N categories as “not-anomalous,” N-M as “anomalous” and pruning the branches that go to leaves with the N-M “anomalous” categories back to their root node. This may be analogous to training an isolation tree on just the M<N “not-anomalous” instances, which allows impurity reducing splits and impure “not-anomalous” leaves.
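As an illustration of how the bounding-box factors and tree model described above might be computed, the following is a minimal Python sketch. It assumes the per-shot landmark windows are already available as arrays, uses scikit-learn in place of the BigML workflow discussed earlier, and all function and variable names (obb_factors, shot_row, fit_score_model) are hypothetical, not part of the described system.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def obb_factors(xy):
    """xy: (N, 2) array of one landmark's x-y samples in the 1.0 s shot window."""
    center = xy.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov((xy - center).T))
    major = evecs[:, np.argmax(evals)]            # principal axis of the motion
    phi = np.arctan2(major[1], major[0])          # box rotation, -pi..pi
    axes = np.column_stack([major, [-major[1], major[0]]])
    proj = (xy - center) @ axes                   # samples in box coordinates
    l, w = np.ptp(proj[:, 0]), np.ptp(proj[:, 1])
    theta = np.arctan2(center[1], center[0])      # box-center rotation
    r = float(np.hypot(center[0], center[1]))     # box-center radius
    return {"phi": phi, "l": l, "w": w, "lw": l * w, "theta": theta, "r": r}

def shot_row(landmark_windows):
    """landmark_windows: dict of landmark name -> (N, 2) samples for one shot."""
    row = {}
    for name, xy in landmark_windows.items():
        for key, value in obb_factors(np.asarray(xy, dtype=float)).items():
            row[f"{name}_{key}"] = value          # e.g. right_wrist_lw, left_ankle_phi
    return row

def fit_score_model(shots, labels, max_depth=6):
    """shots: list of per-shot landmark windows; labels: e.g. 'good'/'bad' scores."""
    keys = sorted(shot_row(shots[0]))
    X = np.array([[shot_row(s)[k] for k in keys] for s in shots])
    return DecisionTreeClassifier(max_depth=max_depth, min_samples_leaf=5).fit(X, labels)
```

One row per shot keeps the sketch closest to the "oneshot" style described earlier; limiting depth and minimum leaf size mirrors the pruning considerations discussed for the decision tree model.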
[0148] FIG.19A illustrates a gantry system 1900 that may provide a portable mounting structure for accommodating one or more imaging devices, including one or more video cameras. In some cases, the structure includes one or more upright posts 1902 and one or more cross bars 1904. The gantry structure 1900 may be placed around the shooter, and in some cases, down-range of the shooter. For instance, the gantry 1900 may position the cross bar 1904 at a location that is about one foot (≈ 0.3 m) to 8 feet (≈ 2.4 m) above the shooter and between one foot (≈ 0.3 m) and 15 feet (≈ 4.5 m) in front of the shooter. In some examples, cameras are positioned on each upright and on the cross bar. Therefore, in some embodiments, two, three, or more cameras are positioned on the gantry with some of the cameras aimed at the shooter and one or more cameras may additionally be aimed down range at the target. [0149] FIGS 19B, 19C, and 19D illustrate various views captured by the cameras mounted to the gantry 1900. A first camera 1908 may be mounted to the cross bar 1904 or the upright 1902 and is positioned to capture a left-side view (FIG.19B) of the shooter. The camera may be configured to capture the entire shooter’s body, or may be configured to capture the shooter’s upper body and head. [0150] A second camera 1910 may be mounted on the cross bar 1904 or the upright 1902 and configured to capture a right-side view (FIG.19C) of the shooter. The camera may be configured to capture the entire shooter’s body, or may be configured to capture the shooter’s upper body and head. In some cases, the first camera 1908 and the second camera 1910 utilize different fields of view, such that one of the cameras captures the entire body of the shooter, while the other one of the cameras only captures a portion of the shooter’s body. [0151] A third camera 1912 may be positioned on the cross bar 1904 and configured to capture an overhead view 1914 (FIG.19D) of the shooter. By positioning cameras to capture the shooter’s motion from various angles, the system can correlate the video data from each camera and determine 3-dimensional motion data of the selected body landmarks. [0152] A fourth camera may be located to capture video data of the target 1916 (FIG. 19E) and the target impacts 1918. The video data from each of the cameras can be synchronized and analyzed to determine when a shot is fired and to correlate a shot being fired with a registered hit or a miss on the target. [0153] FIG.20 illustrates a computer program user interface 2000 that can be used with the system and methods described herein. For instance, a computer program may be configured to receive the video data from the one or more cameras, synchronize the video data, determine when shots are fired, and register and score hits or misses on the target. The user interface 2000 may include a start recording button 2002 that allows the shooter to start the video capture. In some cases, a shooting string may be timed, and the start recording button may additionally start a timer. The user interface may further include a timer 2004 associated with the session. The user interface may be presented on a mobile computing device associated with the user, or on a mobile computing device associated with a facility. For example, a gantry system may be set up at a facility and a computing device associated with the facility may be connected to the gantry system and configured to receive the video data and provide the performance feedback of the embodiments described herein.
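The synchronization of the several camera views around a detected shot time can be illustrated with a minimal sketch, assuming the cameras share a common clock and expose per-frame timestamps; the names used here are hypothetical and not part of the described system.

```python
import numpy as np

def nearest_frame(timestamps, t):
    """Index of the frame whose timestamp is closest to time t (seconds)."""
    return int(np.argmin(np.abs(np.asarray(timestamps, dtype=float) - t)))

def synchronized_views(camera_timestamps, shot_time):
    """camera_timestamps: dict of camera name -> array of frame timestamps."""
    return {cam: nearest_frame(ts, shot_time)
            for cam, ts in camera_timestamps.items()}

# Example with assumed 30 fps cameras that started at slightly different times:
# cams = {"left": 0.01 + np.arange(900) / 30.0,
#         "right": 0.04 + np.arange(900) / 30.0,
#         "target": 0.02 + np.arange(900) / 30.0}
# synchronized_views(cams, shot_time=7.0)  # frame index per camera at the shot
```

Pairing the target-camera frame selected this way with the shooter-camera frames is one simple way to correlate a fired shot with a registered hit or miss.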
[0154] The user interface may be provided on any suitable display, such as a television, a touch-screen display, a tablet screen, a smart phone screen, or any other visual computer interface. With additional reference to FIG.21, the user interface 2000 may display indicia associated with a shooting string, such as, for example, a video of the shooting string and motion during the shooting string 2102. The video may be displayed in a playback window that offers controls 2104 for playing, pausing, adjusting volume, and scrubbing through the video. In addition, there may be controls for selecting different views 2106, that is, controls that allow the viewer to select video clips captured by different cameras during the shooting string, and may allow the viewer to watch different views of the shooting string, either individually or in a combined view, such as a side-by-side view. The views may be synchronized so the viewer can see different views of the same event at the same time. [0155] The user interface 2000 may additionally show the target 2110 and may identify hits 2112 that the system registered on the target. The user interface 2000 may additionally display a score 2114 of the most recent shot along with an average score 2116 for the string. Of course, other information may be displayed, as desired by the user, which may include training tips, the motion data that led to an off-center shot, or other mistakes during the shooting string. [0156] FIG 22 illustrates an additional view of the user interface 2000 in which a user can specify a body landmark selection 2202. In addition, the user interface 2000 provides a selection for signal setting 2204, which allows a user to specify details of Y coordinate motion or X coordinate motion. The user interface 2000, in response to the user selection, may display the motion data 2206 associated with the selection. This type of review and analysis allows a shooter to very specifically view the motion of individual body landmarks during a shooting string, and in addition, can specifically view horizontal movement, vertical movement, or both for review. [0157] FIG.23 illustrates some of the features and technologies employed by embodiments of the described systems and methods. In many embodiments, the disclosed system utilizes machine vision (e.g., computer vision) 2302 to track body landmarks of a participant (e.g., shooter), and also tracks changes to a target to provide automated target scoring 2304. Systems and methods may also utilize audio processing 2306 to enable shot time detection 2308, which may also be combined with computer vision techniques. The disclosed systems and methods may also utilize signal processing 2310 to provide for shooter’s position correction 2312, including pose, posture, motion, grip, trigger pull, and others. Systems and methods described herein also apply machine learning 2314 in order to determine causality for off-center shots, which may include analyses on the shooter’s pose, grip 2316, trigger pull, stance, body landmark motion, among others. [0158] FIG.24 illustrates a method 2400 according to embodiments described herein. The system may receive video data and optionally audio data. The system may be configured to detect shots 2402 and score them through image processing on video data of a target. The system may process video data to determine shots fired and pose analysis of a shooter 2404. In addition, the system may analyze audio data for shot detection 2406.
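As a rough illustration of the audio shot detection 2406 just mentioned, the following sketch flags short-term energy spikes that stand well above the recent baseline; the thresholds and names are illustrative assumptions, not values taken from the described system.

```python
import numpy as np

def detect_shot_times(audio, sr, frame_ms=10, ratio=8.0, refractory_s=0.25):
    """Return approximate shot times (seconds) from a mono audio array."""
    audio = np.asarray(audio, dtype=float)
    frame = int(sr * frame_ms / 1000)
    n = len(audio) // frame
    energy = (audio[: n * frame].reshape(n, frame) ** 2).mean(axis=1)
    baseline = np.median(energy) + 1e-12          # typical (non-impulsive) level
    times, last = [], -np.inf
    for i, e in enumerate(energy):
        t = i * frame / sr
        if e > ratio * baseline and t - last >= refractory_s:
            times.append(t)                        # gunshot-like impulse detected
            last = t
    return times

# times = detect_shot_times(audio, sr=48000)
# Each detected time can then be cross-checked against video motion (e.g., muzzle rise).
```

The refractory interval simply prevents one blast from being counted several times; more selective discrimination between shooters is discussed later under the acoustical firearm signature.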
[0159] The shot detection 2402 provides coordinates (e.g., x,y coordinates) of shots within the target 2408. The pose analysis of the shooter 2404 provides coordinates (e.g., x,y coordinates) of body landmarks during the shooting process 2410. The audio shot detection 2406 provides an exact time of a shot 2412. The shooting analysis may include one or more machine learning algorithms that receive the shot and body data, and through machine learning algorithms, detects and predicts cause and effect for the shot performance of the shooter with the firearm and ammunition combination, which may be referred to as conducting a shoot analysis 2414. For example, embodiments of the system may generate one or more of a session score 2416, shooting position recommendations 2418, shooting mistakes, and grip analysis 2420, among others. [0160] FIG.25 illustrates a sample process 2500 flow for using machine vision and machine learning to track body landmarks of a participant and generate recommendations for improvement. As discussed herein, the systems and methods described herein can be used for any event, such as a sporting event, that benefits from repeatability and accurate body kinematics. Some such events include, in addition to shooting, archery, golf, bowling, darts, running, swimming, pole vaulting, football, baseball, basketball, hockey, and many other types of sports. In any event, the system is configured to track body landmarks of a participant and determine ways to alter the body motion to improve performance. [0161] At block 2502, the system receives video data of a participant. This may come from a single image capture device, or two image capture devices, or three or more image capture devices. An image capture device may be any suitable imaging device that is configured to capture sequential images of a participant, and may include any consumer-grade or professional-grade video camera, including cameras regularly incorporated into mobile computing devices. [0162] At block 2504, the system determines one or more body landmarks of the participant. The body landmarks may be associated with any joint, body part, limb, or a location associated with a joint, limb, or body part. In some cases, the system generates a wireframe based on the one or more body landmarks, and may use less than all of the body landmarks in generating the wireframe model. [0163] At block 2506, the body landmarks are tracked during performance of an activity to generate motion data. As a non-limiting example, body landmarks may be created for the hands, wrists, arms, head, shoulders, torso, waist, knees, ankles, and feet of a golfer which may be tracked during a swing of a golf club. [0164] At block 2508, a score is determined and associated with the performance. As described, in an activity involving a projectile, the score may be associated with the path or destination of the projectile. In golf, for example, the score may be based on the distance, direction, nearness to a target, or a metric associated with an average or a past performance of the participant. In short, any metric may be used to evaluate the quality of the outcome of the performance. [0165] At block 2510, the score is associated with the performance. That is, a connection is made between the performance and the determined score, which may be stored for later analysis and to determine trends in performance over time, or to compare one performer with another.
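A minimal sketch of blocks 2502 through 2510 might look like the following, assuming OpenCV is used for video decoding and a separate pose-estimation model is supplied by the caller; estimate_landmarks is a hypothetical placeholder rather than an API of the described system, and the score shown is an assumed value.

```python
import cv2          # assumed available for video decoding
import numpy as np

def collect_motion_data(video_path, estimate_landmarks):
    """estimate_landmarks(frame) -> (17, 2) array of landmark x-y positions."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    rows = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = cap.get(cv2.CAP_PROP_POS_FRAMES) / fps          # frame timestamp
        rows.append((t, estimate_landmarks(frame)))          # timestamped landmarks
    cap.release()
    times = np.array([t for t, _ in rows])
    landmarks = np.stack([lm for _, lm in rows])             # (frames, 17, 2)
    return times, landmarks

# times, motion = collect_motion_data("drill.mp4", estimate_landmarks=my_pose_model)
# performance_record = {"times": times, "motion": motion, "score": 8.7}  # assumed score
```

Once the timestamped landmark array and the score are stored together, the per-shot windows can be cut out of it and passed to the factor-extraction and tree-model sketches shown earlier.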
[0166] At block 2512, the system generates recommendations for altering the motion on a subsequent performance to improve the outcome. In some cases, the recommendation may involve the hands, including a grip on a firearm, golf club, bat, stick, and the like. The recommendations may also include a change in weight distribution or transfer. The recommendations may include the motion of the hands, head, shoulders, body, legs, feet, or other body part. In many cases, the recommendations include suggestions for altering the motion of the one or more body landmarks in an effort to improve the score of the performance in a subsequent try. [0167] Acoustical Firearm Signature [0168] According to some embodiments, a system is described that can quickly compute a signature for the acoustic sound of a shooter’s firearm, firing specific ammunition, in a given environment, using consumer-grade audio recording equipment (e.g., an iPhone, a tablet, a telephone, a video camera). In some cases, the system can receive an input audio or audio/visual file and compute the signature of the firearm based on the sound in the captured recording. The recording may be captured by any suitable audio and/or video capture device, such as, without limitation, security cameras, traffic cameras, video cameras, television cameras, mobile device recorders such as smart phones or tablets, as well as other capture devices. In some cases, the capture devices are readily available consumer-grade recording devices. The acoustic signature determination may further include determining the make, model, silencer, ammo type, ammo manufacturer, and other characteristics of the firearm blast based upon the acoustical signature from a discharged firearm. [0169] This capability could be used to detect and separate the shooter’s shots from those of other shooters on a typical shooting range, which may be useful, such as for automatic scoring. A version of this capability might be used to identify specific firearms and ammunition from sound recordings. [0170] In some cases, the described solutions operate in near real-time on a single, consumer recording device such as a mobile phone using only a modest amount of training data. As used herein, the terms “real-time” or “near real time” are broad terms and in the context of this disclosure, relate to receiving input data, processing the input data, and outputting the results of the data analysis with little to no perceived latency by a human. In other words, a system as described herein that outputs analyzed data within less than one second is considered near-real time. A system that operates in real-time or near-real time may limit the amount of computation for machine-learning, or at least training models, that the method can use to characterize a particular firearm firing appropriate ammunition. In addition, there are methods which don’t use ML techniques, by which we mean they don’t adapt (“train”) a model on a large number of samples. Some of these other methods may work well but may also be more computationally intensive or shift the computational load to shot-identification time, such as, for example, searching a large dictionary of signatures for a “match.” [0171] Prior approaches to utilizing artificial intelligence (AI) to analyze the sound of guns present a straightforward approach to using deep neural networks (DNNs) for classifying shot sounds by firearm and ammunition type.
In prior approaches, the DNN must be pre-trained on a large number of shot sounds of each firearm and appropriate ammunition type. These sounds may not have been captured on the same recording device nor in the same shooting conditions. That potentially increases the generalization capability of the trained DNN instance for categorizing firearm and ammunition type regardless of environment and recording device. However, it decreases its capability to identify the shot sound of a specific firearm and ammunition type in an arbitrary environment using an arbitrary recording device. [0172] Prior approaches describe a two-step approach to classification. For example, some prior approaches may use a pre-trained instance of a relatively generic pretrained DNN instance as an approximate classifier (predictor) and use an additional trainable step on the classifier output to improve the DNN prediction. In some cases, the DNN treats a finite length time segment of the time-varying spectrogram of the shot sound as an image and differentiates the pooled image for each firearm and ammunition pair from the pooled images of the other firearm and ammunition pairs. This approach has several drawbacks, including the use of a generic DNN instance that is not particularly adept at classifying sounds from different environments, or different ammunition. [0173] According to some embodiments, the described system overlays a trainable weighting mask on an image created from a time segment of the time-varying spectrogram and a straightforward training method for adjusting the mask using just a few instances of shot sounds for the shooter’s firearm and ammunition as recorded in a particular environment with a particular device. Training only the weighting mask is significantly less computationally intensive than training the DNN itself. Consequently, according to some embodiments, the DNN may not be trained, or may be trained to a much lesser degree than prior methods, and the weighting mask receives the training. This approach has several benefits. [0174] For example, in abstract terms, the adjusted mask can be viewed as the signature for the combination of firearm, ammunition, environment and recording device. This signature is not used to search a catalog of signatures for that combination of factors, but to condition the spectrograms input to enhance the classification performance of the DNN for the firearm and ammunition in a particular environment using the given recording device. The enhanced DNN classification can then be used to improve detection and differentiation of shooter’s shots from those of other shooters. [0175] This approach has shown significant advantages in the intensity of the required computations, results in a much faster analysis, and can be used for numerous firearms in many different environments. For instance, because the training data set for a specific shooter is small, the weighting mask can be updated either in online fashion using stochastic gradient descent or in offline fashion using gradient descent. In other words, the training data may be for a specific shooter using a specific firearm and ammunition combination in a given environment, which results in a much smaller data set than if data points were agglomerated from numerous shooters in different environments. [0176] In addition, both algorithms may estimate the gradient of the function computed by the DNN in each iteration of the weighting matrix update. 
While this is not computationally prohibitive, it may turn out that just the sign of the gradient or a roughly quantized version is sufficient. That simplification depends in most part on whether the DNN computes a monotonic increasing or decreasing function in each input. A significant body of literature exists investigating the properties of functions computed by DNNs. However, it’s not easy to find a short answer to this question. In some instances, the DNN computes such a function if all of the internal layers use a linear or affine activation function and the output layer uses a monotonic non-decreasing activation function. [0177] Accordingly, in some embodiments, the disclosed system is capable of quickly fingerprinting a firearm and ammunition pair in a particular environment using a specific recording device. In other words, the system and methods described herein can very quickly determine an acoustical signature of a firearm and ammunition combination in an environment. This allows the system to differentiate the analyzed acoustical signature from other firearm and ammunition combinations. This may be especially useful in a crowded shooting range where it is important to differentiate one shooter from another, such as, for example, in combination with a system that performs automatic target scoring. By being able to differentiate firearm and ammunition pairs, a scoring system will have a reduced number of false positives and missed shots because the system will be able to determine that a specific firearm and ammunition combination were utilized, which may be time-wise matched with a hit on a target. In some examples, the system may be executed on a mobile computing device and used at a shooting range. Where the mobile computing device has a microphone pointed in the general direction of the shooter of interest, the captured audio will generally be dominated by shots fired by the shooter of interest, which may aid in determining whether a fired shot is from the shooter of interest. In some cases, the recording device (e.g., mobile computing device) has a microphone that is pointed down range, such as where the mobile computing device has a camera pointed at a target, such as for automated target scoring, in which case the audio from shots fired by the shooter of interest will have a volume that is more difficult to distinguish from other shooters at the range. In these cases, described embodiments may use training of a machine learning algorithm, or training of a weighting mask, to quickly differentiate shots fired by the intended shooter from all other shooters at the range. [0178] In some cases, the classifier relies on feature extraction in the form of time-frequency spectrograms. Mel-frequency cepstral coefficient (MFCC) vectors could be used as an alternative to raw power spectral density (PSD) vectors for the frequency representation of the spectrogram. Initially, Mel frequency cepstrums (MFCs) were developed for speech processing and the way information is thought to be encoded in the speech waveforms. In some cases, the MFCs may be applied for firearm fingerprinting purposes. As used throughout this disclosure, a "firearm fingerprint" refers to an acoustical signature, or a specific sound or image of sounds that identifies a specific firearm and ammunition combination.
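A brief sketch of this spectrogram-based feature extraction, assuming the librosa library is available, is shown below; whether MFCC vectors or a log-MEL spectrogram is produced is left as a parameter, and the sample rate and dimensions are illustrative assumptions.

```python
import librosa

def shot_features(segment, sr, use_mfcc=True, n_mels=64, n_mfcc=13):
    """segment: mono audio samples framing one shot; sr: sample rate in Hz."""
    if use_mfcc:
        # window -> FFT -> MEL filter bank -> log -> DCT (the MFCC pipeline)
        return librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=n_mfcc)
    # log-MEL spectrogram: the same pipeline without the final DCT step
    mel = librosa.feature.melspectrogram(y=segment, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)

# feats = shot_features(shot_audio, sr=48000, use_mfcc=False)  # (n_mels, frames)
```

Either representation can serve as the "image" of the shot that is handed to the classifier described in the following paragraphs.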
[0179] In some cases, MFCCs may be generated through executing a series of steps, including, without limitation: i) window the signal segment and compute the Fast Fourier Transform (FFT) of the signal segment, ii) combine linear FFT coefficients into the MEL frequency filter bank coefficients, iii) take the logs of those coefficients, and iv) compute the discrete cosine transform (DCT) of the log MEL filter bank coefficients. The FFT of the signal segment generally results in a peak at the applied frequency along with other peaks, referred to as side lobes, which are typically on either side of the peak frequency. The DCT, in some cases, expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. In some cases, fewer or more than the disclosed steps may be implemented to arrive at firearm fingerprints based on shot sounds. For instance, in some cases, only steps i), ii), and iii) described above may be used for shot sounds. [0180] According to some embodiments, MFCC or PSD coefficients may be used as inputs to a DNN trained to classify time-frequency spectrograms into some number of independent categories. For instance, the MFCC and/or PSD coefficients may be mapped into two categories, or into some number of categories corresponding to (firearm, ammo) pairs, or into some larger number of categories corresponding to tuples of k>2 attributes. [0181] In some cases, the DNN may place a new shot sound "in the neighborhood of" whatever class the maximal classifier output of all the classifier outputs represents. For example, a shot sound may be initially classified through a nearest neighbor approach, such as a k-NN algorithm. Subsequent analysis may further classify the shot sound. [0182] Inner layers of the DNN may represent different sets of attributes of the input. These may likewise be used for classifying the shot sounds and training. [0183] From an optimization theory perspective, training the DNN essentially defines a surface with multiple local optima and the DNN can be thought of as directing a new input to the most appropriate local optimum. In some embodiments, following the pre-trained DNN with one or more trainable layers might be thought of as extracting a different set of more optimum attributes for shot sounds. [0184] In some cases, preceding the pre-trained DNN with a trainable layer, which may include weighting coefficients, might be thought of as adjusting the pre-trained DNN so that the shots of the shooter of interest are the most positive examples of all shots that the pre-trained DNN places "in the neighborhood of" whatever class the maximal classifier output of all the classifier outputs represents. In some cases, the decision threshold may be adjusted to optimize the confusion matrix for the training dataset used to adjust the trainable input layer or another test dataset for some useful criteria. [0185] According to some embodiments, the FFT of a windowed segment of the shot audio can be computed in O(n log n) time. Computing MFCCs has the same time order but includes computing two O(n log n) operations. In some cases, the MFCCs may not be significantly better than FFTs and they may be omitted. [0186] According to some embodiments, the MEL frequency log spectrum (omitting the discrete cosine transform (DCT) that yields the cepstrum) might be an improvement over raw FFTs with only O(n) extra computation cost.
Therefore, in some examples, the MEL frequency log spectrum is used rather than the DCT yielding the cepstrum. [0187] In general, there is no fast form for computing the function computed by a DNN (composed of layers of convolutional neural networks (CNNs)) like the FFT. The FFT takes advantage of regularities in the FFT kernel that generally won't exist in a composition of essentially arbitrary CNNs. However, examples described herein have an FFT of the input time signal, and the pre-trained DNN consisting of CNNs may be implemented in the frequency domain. [0188] As a non-limiting example, the following DNN-based quickly-trainable category recognizer may be implemented to quickly determine an acoustic signature of a firearm and ammunition combination. [0189] Let f: ℝ^M → ℝ^K denote the analog transformation by a trained deep neural net from a real-valued M-dimensional input vector to a K-dimensional vector of class probabilities. A final discrete output mapping Ψ: ℝ^K → ℤ_{≤K} selects the most probable of the K classes. [0190] The system may be configured to expand the trained DNN into an enhanced binary classifier for class k. In some cases, the system may add a rapidly trainable input stage to the DNN implemented as the Hadamard product “⊙” of a weighting matrix W and the input vector x. In some cases, the Hadamard product is a binary operation that takes in two matrices, such as a weighting matrix W and the input vector x, and returns a matrix of the multiplied corresponding elements. The system can then follow the DNN class probability vector output with a selector function: [0191] Γ: ℝ^K × ℤ_{≤K} → ℝ that provides the single class probability of class k. The enhanced DNN may implement the function: [0192] y(x) = Γ(f(W ⊙ x); k) [0193] Suppose we have a set of additional input vectors ζ = {Z_1, …, Z_M}, all instances of the same class k. The system can be tuned to better recognize similar instances of class k through any of a number of suitable ways. For example, one method is online learning that is typically used for large training datasets. For online learning, we initially set W = 1 and then update it sequentially for the vectors in ζ using stochastic gradient descent as: [0194] W_{i+1} = g[ W_i + α · (y(k) − Γ(f(W_i ⊙ Z_i); k)) · (∇x f(W_i ⊙ Z_i) ⊙ Z_i) ], i = 1, ..., M [0195] where 0 < α < 1 is an adaption constant that weights the relative contribution of ζ to W. The target value y(k) = 1.0 for every vector in ζ. The iteration is performed until it converges for each Z in ζ. Here g[ ... ] denotes an elementwise operation applied to the argument matrix. [0196] A firearm discharge results in multiple acoustic events, such as, for example, the muzzle blast created by the expansion of gases within the chamber and exiting through the barrel, and the ballistic shockwave generated by the projectile, which, in most cases, is supersonic, but may also be subsonic in some cases. The acoustic events are the result of variables that generate the firearm signature, and may include the firearm type, make, model, barrel length, ammunition type, powder quantity and identity, projectile weight, and projectile shape, among others. [0197] FIG.26 illustrates a sample process for correlating a firearm with a particular shooter 2600, in accordance with some embodiments. When a shooter visits a shooting range in order to practice, the range may have other shooters who are also at the range discharging firearms.
In some cases, a busy range may have 10, 20, or 30 or more shooters who may all be shooting at the same time. It can be quite difficult for an acoustic system to register a shot fired by the shooter of interest through the cacophony of firearm discharges. In some cases, the system is configured to distinguish between the shooter of interest’s firearm and those of other shooters at the range. [0198] For instance, at block 2602 the system receives video data of a shooter, which also includes audio data. This may be received, for example, through a multiple-camera system, dedicated microphones, or consumer-grade audio/video capture devices, such as a mobile computing device. [0199] At block 2604, the system may determine body landmarks of the shooter. [0200] At block 2606, the system may track the one or more body landmarks while the shooter fires a shot and generate motion data associated with the body landmarks during the shot. [0201] At block 2608, the system correlates audio data with motion data to determine that a shot has been fired by the shooter of interest. In some cases, the system may receive audio data indicating a firearm discharge, which may be represented as a spike in an audio wave file. This may be correlated with motion data, such as the shooter’s wrist, that indicates that recoil from the firearm displaced the shooter’s hand, thus indicating that a shot has been fired. In some cases, the system is trained to distinguish the shooter’s firearm from other firearms at the range. In this case, the system can be trained to differentiate shots fired from the shooter of interest and other shooters at the range. [0202] At block 2610, the system can determine the score of the shot and associate the score with the motion data leading to the score. [0203] At block 2612, the system may analyze the motion data in combination with the score and determine any mistakes that the shooter made, and provide suggestions to identify the mistake and/or how to address the mistake in the future. The system may also provide training exercises in order to allow the shooter to address the mistake and improve his shooting performance. [0204] With reference to FIG.27, which illustrates an example system 2700 that uses online learning, audio 2702 is received and is converted to an incremental spectrogram 2704. The spectrogram 2704 is framed 2706, such as by a time window, and used to determine MFCCs, such as by determining the FFT of the windowed signal, combining linear FFT coefficients into the MEL frequency filter bank coefficients, determining the logs of the coefficients, and determining the DCT of the log MEL filter bank coefficients. The determined MFCCs can be entered into a rapidly trainable input stage 2708 and then delivered to a DNN multi-class classifier 2710. The selector 2734 determines the classifier output k with the highest class probability to classify a shot. The rapidly trainable input stage 2708 may be referred to as a trainable weighting mask, or just mask. The mask may be adjusted using instances of shot sounds for the shooter’s firearm and ammunition as recorded in a particular environment. In some cases, training only the weighting mask is significantly less computationally intensive than training the DNN. The trained (e.g., adjusted) weighting mask may represent the signature for the combination of firearm, ammunition, environment, and recording device.
This signature is not necessarily used to search a catalog of signatures, but rather, is used to condition the spectrograms input to enhance the classification performance of the DNN for the firearm and ammunition in a particular environment using the given recording device. In some cases, the training dataset for a specific shooter (e.g., a specific firearm and ammunition combination) is small, therefore the weighting mask can be updated either using online approaches, such as by using stochastic gradient descent, or offline such as by using gradient descent techniques. [0205] Inner layers of the DNN classifier 2710 may represent different sets of attributes of the input. In some cases, the DNN is trained to define a surface with multiple local optima and the DNN can act to direct a new input to the most appropriate local optimum. The enhanced DNN classification can then be used to improve detection and differentiation of shots from one firearm from those of other shooters. [0206] The weighting mask W of the input stage 2708 to the pre-trained DNN classifier 2710 may be adapted by an online learning loop 2712 that optimizes the weighting mask W. [0207] The online learning loop 2712 includes a copy 2714, 2716, and 2718 of the shot classifier input stage 2708, 2710, and 2734. When a new shot n is detected, the subtractor 2720 in the online learning loop 2712 compares the result of the selected output k from the copy of shot classifier 2714, 2716, and 2718 to the expected classification y(k) = 1.0 to compute a classification error term. [0208] Block 2730 of the adaption loop computes the vector sign of the gradient of the DNN output with respect to the spectrogram input for the current output spectrogram x(n) from the framing block 2706. [0209] A raw incremental adjustment to the current weighting vector W_i(n) is then computed by the Hadamard multiplier 2732 as the product of the current shot spectrogram from the framer 2706 and the sign vector of the classifier gradient 2730. The vector multiplier 2722 then scales the raw incremental adjustment to the weight vector W_i(n) by an arbitrary value α. [0210] Finally, the vector adder 2724 computes a preliminary updated weight by adding the current incremental adjustment to the current W_i(n). This preliminary weight is transformed by the vector g[] operation 126 to an updated weight vector W_{i+1}(n). [0211] The weight adaption loop 2712 just described is iterated, as symbolized by the delta operator 2728, until the weight vector converges, for example |W_i(n) − W_{i+1}(n)| ≤ ε. The result W_{i+1}(n) is then selected as the weight vector W(n+1) for classifying the next shot. [0212] Offline learning may be used, such as when ζ is small. In some cases, offline learning may be initiated by setting W = 1 and updating it using gradient descent as: [0213] W_{j+1} = g[ W_j + α · (1/M) · ∑_{i=1..M} (y(k) − Γ(f(W_j ⊙ Z_i); k)) · (∇x f(W_j ⊙ Z_i) ⊙ Z_i) ]
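The online mask update can be sketched in a few lines of numpy under stated assumptions: dnn and grad_dnn are placeholders standing in for the pre-trained classifier and its input gradient for the target class, and the elementwise post-update operation g[·] is taken here to be a simple non-negativity clip, which may differ from the operation used in the described system.

```python
import numpy as np

def train_mask_online(dnn, grad_dnn, shots, k, alpha=0.1, tol=1e-4, max_iter=50):
    """shots: list of spectrogram vectors Z_i, all from the shooter's own firearm.

    dnn(x) -> length-K array of class probabilities; grad_dnn(x) -> d f_k / d x.
    """
    W = np.ones_like(shots[0])                   # start with an all-pass mask (W = 1)
    for _ in range(max_iter):
        W_prev = W.copy()
        for Z in shots:
            p_k = dnn(W * Z)[k]                  # Gamma(f(W ⊙ Z); k)
            err = 1.0 - p_k                      # target class probability is 1.0
            W = W + alpha * err * (grad_dnn(W * Z) * Z)   # SGD step on the mask
            W = np.clip(W, 0.0, None)            # assumed elementwise post-operation
        if np.abs(W - W_prev).max() <= tol:      # |W_i - W_{i+1}| small -> converged
            break
    return W

def classify_shot(dnn, W, Z, k, threshold=0.5):
    """True if the masked spectrogram is recognized as the shooter's own firearm."""
    return dnn(W * Z)[k] >= threshold
```

Because only the mask is adapted, a handful of the shooter's own shots is typically enough training data, which is what makes this stage fast relative to retraining the DNN itself.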
[0214] In some cases, offline learning may require about the same amount of computation as online learning. However, online learning has the advantage over offline learning that stochastic gradient descent doesn't require accessing the entire training dataset ζ in every iteration of the update function while offline learning does. Offline learning offers the advantage of finding a local optimum while online learning may only approximate it. [0215] While the gradient ∇x f evaluated at the current argument W_i ⊙ Z_i is not excessively costly to compute, it may be sufficient to pre-compute sgn(∇x f(1)) or sgn(∇x f(1 · β)), where 1 is the unit vector, if the DNN normalizes the input. In some cases, this may significantly simplify offline learning in this situation at the cost of only approximating a local optimum like online learning. [0216] While there are systems aimed at enhancing a DNN to classify shots, they do so by adding layers to the output of a pre-trained network to customize it to a particular task. The pre-trained network extracts many levels of increasingly abstract features, such as from images, and the additional trainable layers are used to focus on a problem of interest. In contrast, many of the systems and methods described herein function in a much different way that results in a system that is more efficient, much quicker, and more accurate. The systems described herein, in many cases, only add a single layer to the input of a pre-trained network. In some use cases, a camera is pointed at a target rather than focused on a shooter, and the camera and microphone pick up other shots without being able to natively determine that the discharge came from a shooter of interest. The acoustical signature is generated quickly and, in many cases, is performed on a mobile device that includes a camera and a microphone (e.g., smartphone). The mobile device may execute instructions (e.g., an application) that includes the components and systems described herein so that the classification and acoustical signature determination is performed on the mobile device. In some cases, the pre-trained network may be trained on a relatively encompassing universe of shots. The trainable input layer may pre-distort the input data to achieve a highly probable recognition by the pre-trained network of the particular firearm with ammunition and environment. This may then increase the likelihood of detecting the shot of interest and rejecting all other shots. [0217] According to some embodiments, the systems and methods described herein will converge to a useful weighting mask where the gradient of the function computed by the DNN is monotonic non-decreasing or monotonic non-increasing in each input. [0218] With reference to FIG.28, which illustrates pretraining a DNN, a process 2800 begins at block 2802 and at block 2804 an audio file is opened, which may be a first audio file, or a next or subsequent audio file. Using the audio file, the system, at block 2806, captures a block of samples framing a first and/or next shot. In other words, each shot is windowed in a time-bound sample. At block 2808, a spectrogram associated with the samples is generated and labeled with attributes. [0219] At block 2810, the system determines whether the most recent sample is associated with a last shot, and if not, the system returns to block 2806 to capture a block of samples associated with a subsequent shot.
If so, the system proceeds to block 2812 and determines whether the labeled spectrogram created at block 2808 is the last file. If not, the system returns to block 2804 to open or capture a next audio file. If the system determines the most recent file is the last file, the system proceeds to block 2814 where the labelled spectrograms are aggregated. At block 2816, the DNN is trained on the labeled spectrograms. The system stops at block 2818 with a trained DNN. [0220] With reference to FIG.29, which illustrates deriving spectrogram weighting, a process 2900 begins at block 2902 and at block 2904 a set of shots is captured. The shots may be captured by an audio and/or video recording device, or may include opening a file associated with one or more shots. At block 2906, the system captures a block of samples framing a first and/or next shot. In other words, each shot is windowed in a time-bound sample. At block 2908, a spectrogram associated with the samples is created and labeled with attributes. [0221] At block 2910, the system determines whether the most recent spectrogram is associated with a last shot, and if not, the system returns to block 2906 to capture a block of samples associated with a subsequent shot. If so, the system proceeds to block 2912 and determines whether the labeled spectrogram created at block 2908 is the last set. If not, the system returns to block 2904 to capture a next set of shots. If the system determines the most recent file is the last file, the system proceeds to block 2914 where the labelled spectrograms are aggregated. At block 2916, the system updates W (weighting) until the value converges. The system stops at block 2918 with a spectrogram weighting. [0222] With reference to FIG.30, a process for classifying a shot 3000 is illustrated. The process begins at block 3002, and at block 3004, the system captures a block of samples framing a shot. At block 3006, the system determines a spectrogram associated with the block of samples. At block 3008, the system determines a Hadamard product of the spectrogram and weight matrix. At block 3010, the Hadamard product of the spectrogram and weight matrix is applied to the DNN. At block 3012, the system makes a binary decision, such as the shot was either associated with the signature of the firearm in question or it was not. In some cases, the system is able to identify the type of firearm and the ammunition that was fired through the firearm. For instance, when receiving an audio sample, the system, without any prior knowledge of the firearm in question, can determine that the acoustic signature of the firearm in the audio sample corresponds with a 230 grain round-nose projectile fired from a Beretta .45ACP. At block 3014, the process stops. [0223] The system may include one or more processors and one or more computer readable media that may store various modules, applications, programs, or other data. The computer-readable media may include instructions that, when executed by the one or more processors, cause the processors to perform the operations described herein for the system. [0224] In some implementations, the processor(s) may include a central processing unit (CPU), a graphical processing unit (GPU), both CPU and GPU, a microprocessor, a digital signal processor or other processing units or components known in the art. Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components.
For example, and without limitation, illustrative types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), complex programmable logic devices (CPLDs), etc. Additionally, each of the processor(s) may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems. The one or more control systems, computer controller and remote control, may include one or more cores. [0225] Automated Target Scoring [0226] In some cases, the described system operates in near real-time on a single, consumer recording device such as a mobile phone using only a modest amount of training data. As used herein, the terms “real-time” or “near real time” are broad terms and in the context of this disclosure, relate to receiving input data, processing the input data, and outputting the results of the data analysis with little to no perceived latency by a human. In other words, a system as described herein that outputs analyzed data within less than one second is considered near-real time. A system that operates in real-time or near-real time may limit the amount of computation for machine-learning, or at least training models, that the method can use to characterize a particular target acquisition, classification, and scoring. [0227] Prior approaches to automated target scoring have relied upon sound triangulation, light triangulation, and piezoelectric sensor triangulation. Sound triangulation has been attempted by using sound-chamber targets, which use the Mach wave of the projectile to determine its position as it passes through the target. A sound triangulation automated scoring system operates by using microphones to measure the sound wave of the projectile as it passes through the target. The sound of the projectile passing through the target, captured by a multitude of audio sensors (e.g., microphones), can then be used to determine the location at which the projectile passed through the target. [0228] A light triangulation automated scoring system uses three or more lasers, such as infrared lasers. The three or more lasers are used to triangulate the position of the projectile as it passes through the target. A piezoelectric sensor triangulation system relies on a series of piezoelectric sensors on a plate that sense vibrations caused by projectiles impacting a target. [0229] FIG.31 illustrates a system 3100 configured to automatically identify a target, classify the target, determine impacts on the target, and score the shooting string. The system 3100 may include computing resources 3102, which may be a mobile computing device associated with a participant at a shooting range and may include any one or more of a number of mobile computing devices, such as, for example, a smart phone, a tablet computer, a laptop computer, or other suitable computing device. The computing resources 3102 typically include one or more processors 3104 and memory 3106 that store one or more modules 3108. The modules 3108 may store instructions that, when executed, cause the one or more processors 3104 to perform various acts. The computing resources 3102 may further include data storage, which may be remote storage, such as a remote server or a cloud-based storage system, or local storage, or a combination.
The data storage may store data on previous engagements (DOPE), which can allow for data tracking over time as well as comparative data between different shooters, different firearms, different ammunition, different targets, different environments, and the like. [0230] The storage system may further allow historical trend analysis, which can be used to show shooter performance over time, including tracking improvements. The data storage may also be analyzed to provide performance predictions, rankings, and social features, among other benefits. [0231] The system may incorporate one or more imaging sensors 3110, such as any suitable video camera. In some cases, the imaging sensor 3110 may be associated with the computing resources 3102. For instance, in some embodiments, the computing resources 3102 may be a smart phone with a built-in camera 3110. [0232] The camera 3110 may be pointed to capture images of a target 3112. The target may be located at any distance from the shooter, and the camera 3110 may be aimed and/or zoomed to capture images of the target. In some embodiments, the camera may be coupled to a lens, such as a spotting scope or camera lens, to allow the camera to get a closer view of the target through optical or digital zooming. [0233] The computing resources 3102 may include instructions (e.g., modules 3108) that allow the computing device to initialize a target 3114, detect impacts on the target 3116, and score the impacts on the target 3118, among other things. [0234] FIG.32 illustrates a decision tree 3200 configured to detect, identify, and classify a target. According to some embodiments, the system is not made aware of the type of target before the system begins looking for a target. For example, in some prior systems, a scoring system may be preprogrammed with the target that the shooter will be aiming at. This makes it easy for the system to understand the size and shape of the target, and the location and boundaries of each scoring ring or region. In the illustrated embodiments, the system is configured to automatically determine the type of target, without a priori data, and to determine the scoring rings and regions. For example, the system may have one or more video capture devices, which may be integrated into one or more mobile computing devices. As used herein, a mobile computing device may be one or more of a mobile phone, a smart phone, a tablet, a laptop, a personal digital assistant, smart glasses, a body cam, a wearable computing device, or some other computing device that a user may carry to a shooting range. [0235] The mobile device may actuate a camera and capture one or more frames of a target 3202. The computing device may have instructions that analyze the one or more frames, using any suitable image analysis algorithms, to identify a target in the one or more frames. If the target is detected and classified at block 3204, the target is registered with the system 3206 and the scoring rings and regions are determined. The system may capture additional image frames that contain the target and look for differences from one frame to the next that may correlate with impacts on the target. The frames may be compared and, at block 3208, moving averages may be generated. Moving averages are a fundamental mathematical and statistical technique applied in image analysis and machine learning for various purposes, including noise reduction, feature extraction, and trend analysis.
They involve the calculation of the average value of pixel intensities or other data points within a moving window or kernel across an image or dataset. Moving averages can be used to extract meaningful features from images. For example, by sliding a small window across an image and calculating the average pixel values within that window, important information can be highlighted. For example, in edge detection, the moving average can emphasize areas with abrupt changes in pixel intensity, helping to identify edges or boundaries of the target and the scoring regions. Edge detection can also be used to identify impacts on the target. [0236] In some examples, moving averages are used for time-series data analysis. As an example, for detecting anomalies, moving averages can be used to establish a baseline behavior for a system. Any data points that deviate significantly from this baseline may be flagged as anomalies or outliers. These anomalies may be further analyzed to determine impacts on the target. [0237] As sequential moving averages are generated, they may be combined as long-term moving averages. At block 3210, the moving average images may be compared with the long-term moving averages to determine differences from one frame to a subsequent frame that indicate a change to the target, which is most likely associated with an impact on the target. [0238] At block 3212, the impacts are selected and classified. For example, the system determines the boundaries of the scoring ring and determines the location of each impact and associates the location of each impact with a score for the impact. [0239] Returning to block 3204, where the target has not been previously detected and classified, such as where a shooter is initializing the system, or has replaced the target, the system, at block 3214 determines whether the target has been detected. If not, then at block 3216, the system proceeds to detect the target. If the target has been detected, the system, at block 3218 classifies the target, such as by identifying the boundaries of the target, the boundaries of the scoring rings, and the value of the scoring rings. [0240] Where the system has not detected the target, it may capture one or more additional image frames and analyze the one or more additional image frames to determine that a target is located within the field of view of the imaging device. Once a target is detected, the system can then classify the target to determine its size, and the relative position and size of the scoring rings or regions. [0241] FIG.33 further describes the initial steps that the system may take to identify and classify a target 3300 by analyzing one or more image frames. Object detection is a computer vision technique that involves identifying and locating multiple objects within an image or video stream. Unlike image classification, which determines the presence of a single object class in an entire image, object detection provides a more granular understanding by not only recognizing objects but also specifying their positions through bounding boxes. In some embodiments, object detection algorithms typically output bounding boxes that enclose the detected objects. These bounding boxes consist of coordinates (x, y) for the object's top-left corner and dimensions (width and height) defining the object's spatial extent within the image. [0242] At block 3302, the system applies object detection to one or more images of the target and searches for the target. 
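A minimal sketch of this detection step is shown below, with a hypothetical detect_target() function standing in for the trained, target-agnostic detector; the consecutive-frame stability check anticipates block 3304, which is described next, and the frame count and pixel tolerance are illustrative assumptions.

import numpy as np

def locate_target(frames, detect_target, n_consistent=3, tol=10.0):
    """Return a bounding box (x, y, w, h) once the detector reports the target
    in roughly the same place for n_consistent consecutive frames.

    `detect_target(frame)` is a hypothetical trained detector returning a
    bounding box or None; it does not refer to any particular library.
    """
    recent = []
    for frame in frames:
        box = detect_target(frame)
        if box is None:
            recent = []  # lost the target; start the stability count over
            continue
        recent.append(np.asarray(box, dtype=float))
        if len(recent) >= n_consistent:
            window = np.stack(recent[-n_consistent:])
            # Accept the detection if no box coordinate moved more than tol pixels.
            if np.max(np.ptp(window, axis=0)) < tol:
                return tuple(window.mean(axis=0))
    return None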
In some embodiments, the object detection model is generic with respect to targets, which allows the system to detect any target, regardless of size or shape. At block 3304, where a target is found in the same location in subsequent images (e.g., 2 or more images, 3 or more images, 4 or more images, etc.), the system assumes it has located the target and defines the bounding box around the target. In some cases, finding the target in the same location in subsequent images comprises determining a moving average of the images to determine the target location, size, and shape. [0243] At block 3306, the target is optionally classified. In addition to locating objects, the system may be configured to classify each detected object into predefined classes or categories. This allows the system to distinguish between different object types, such as circular targets, ovoid targets, rectangular targets, silhouette targets, or otherwise. [0244] A target classifier may be applied to the image within the bounding box. The system thus determines which reference target image to apply. [0245] At block 3308, the system registers the target to the reference target image. In some cases, this involves applying a contrast adjustment to the image. This may also involve iteratively modifying the initial bounding box, such as by adjusting its corners, then projecting the adjusted bounding box onto the reference target image. The difference between the two may be applied as a score, and a hill-climbing technique may be applied to find the optimum corners, which can be taken as the initial location of the target in the image. A hill-climbing technique is an optimization algorithm used to find the local maximum (or minimum) of a given objective function. By iteratively making small steps in the direction that leads to a better value, the algorithm converges on the highest (or lowest) value and thus can be used to determine the boundaries of the target. In some cases, object detection is combined with semantic segmentation to provide pixel-level object masks. This allows for a more precise understanding of object boundaries within an image, such as the target boundaries and the scoring ring boundaries. [0246] At block 3310, the system has initialized and registered the target and begins looking for impacts across subsequent moving averages. [0247] FIG.34 describes the process 3400 of registering the target 3300 and determining impacts on the target. At block 3402, the target may be reregistered, such as by performing a hill-climbing technique to search for a new set of best corners for the target. In some cases, the hill-climbing search uses a mean-squared distance in perceptual space technique. For example, mean-squared distance, also called mean-squared perceptual error, is a metric used to measure the similarity or dissimilarity between two data points involving perceptual data, such as target corners. In some cases, for each data point, relevant perceptual features are extracted – in this case, target corners, edges, scoring rings, etc. The features can be visual descriptors which can be represented as a vector of perceptual features. These feature vectors capture the relevant information for each data point in a reduced and more informative form. [0248] The mean-squared distance between two points (represented as their respective feature vectors) is generated by determining the squared differences between corresponding features and calculating the mean of these squared differences.
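In code, that computation is simply the mean of squared feature differences; the short sketch below assumes the two corner sets have already been reduced to equal-length feature vectors.

import numpy as np

def mean_squared_perceptual_distance(features_a, features_b):
    """Mean-squared distance between two perceptual feature vectors."""
    a = np.asarray(features_a, dtype=float)
    b = np.asarray(features_b, dtype=float)
    return float(np.mean((a - b) ** 2))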
The resulting mean-squared distance provides a quantitative measure of dissimilarity between the two data points in perceptual space. [0249] At block 3404, the system may apply a transformation matrix, which can be used to map the set of corners to an image. In some cases, the image upon which the coordinates are mapped has a dimension of 160 pixels, and in some cases is less than 160 pixels. [0250] At block 3406, the moving average images are updated, and in some cases, long-term moving averages are those of about 10 seconds or longer, while short-term moving averages are those of about 0.1 second. In some cases, a video camera may capture upward of 30 frames per second. For the short-term moving averages, this equates to averaging about 3 frames to determine the short-term moving average. [0251] At block 3408, the system determines the difference between moving averages. For example, the long-term moving average will be associated with a static target that hasn't changed over 10 seconds or so, and it can be compared with the short-term moving average, which reflects a change in the image. Therefore, the difference between the short-term moving average and the long-term moving average will highlight changes to the image, such as an impact on the target. The system may convolve any difference images with a simple impact kernel, which in some cases may be a 5x5 uniform-weight, square kernel, and look for the maximal block-wise locations in the difference image. The kernel typically refers to a convolutional filter that can be used to process and modify pixel values, such as for feature extraction. The square kernel may convolve (or move) across the image and, at each position, the kernel's values can be multiplied with the pixel values in the corresponding neighborhood and the results can be summed to produce a new pixel value in the output image. Of course, the size of the kernel may be altered to adjust the extent of the neighborhood considered during convolution, and the kernel may be any of a whole set of aperture kernels, may have non-uniform weights, and may have any suitable size. [0252] The convolution will return a set of potential impacts on the target. The set of potential impacts may be further filtered, such as by using simple statistics on a window surrounding the flagged difference. In some cases, the window is selected to be a 16x16 window with the difference in the middle of the window. Of course, other window sizes are entirely plausible, and the pixel values described herein are only illustrative of some embodiments. The system may also apply some business rules to the windowed difference, such as, for example, a rule that the system should not detect multiple impacts in exactly the same location. [0253] At block 3410, the impacts on the target are determined. In some cases, this is accomplished by passing the filtered set of differences to an impact classifier for scoring. If the score of the difference is above a threshold value, the location is marked as an impact and another window may be placed around the impact. In some cases, a 10x10 window is placed around the impact location, and within that window the long-term average is updated with the short-term average for between 5 and 10 following frames. This ensures that the same impact is not detected again. Accordingly, the difference is windowed by a first window, and if the difference exceeds a threshold score, the difference is windowed by a second window smaller than the first window.
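The sketch below ties blocks 3406 through 3410 together for a single new (grayscale) frame; the exponential-average weights approximating the 0.1 second and 10 second windows, the 5x5 uniform impact kernel, and the detection threshold are illustrative assumptions consistent with the values discussed above, and the candidate locations would still be passed to the impact classifier.

import numpy as np
from scipy.ndimage import uniform_filter

def update_and_detect(frame, short_avg, long_avg, fps=30.0,
                      short_window_s=0.1, long_window_s=10.0, threshold=25.0):
    """Update short/long-term moving averages and flag candidate impact pixels."""
    frame = frame.astype(np.float32)

    # Block 3406: exponential moving averages approximating 0.1 s / 10 s windows.
    a_short = 1.0 / (fps * short_window_s)   # roughly 1/3 per frame
    a_long = 1.0 / (fps * long_window_s)     # roughly 1/300 per frame
    short_avg = (1.0 - a_short) * short_avg + a_short * frame
    long_avg = (1.0 - a_long) * long_avg + a_long * frame

    # Block 3408: difference image convolved with a 5x5 uniform impact kernel.
    diff = np.abs(short_avg - long_avg)
    response = uniform_filter(diff, size=5)

    # Block 3410: peaks above threshold become candidate impacts for the classifier.
    ys, xs = np.where(response > threshold)
    candidates = list(zip(xs.tolist(), ys.tolist()))
    return short_avg, long_avg, candidates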
The windowed difference associated with the short-term moving average may be added to the long-term moving average for at least 5 frames, or at least 6 frames, or at least 10 frames, or at least 12 frames, or at least 15 frames or more. In some cases, where the score of the difference is below a threshold, the difference is marked as a false impact and the system won't need to evaluate and classify it over again. [0254] According to some embodiments, the system may receive audio data associated with a shot being fired and determine that a shot has been fired based on the audio data. In some cases, the audio data is correlated with the target images, and the system can convolve the difference target image in response to the audio data indicating that a shot has been fired. In some cases, the system may not need to continuously convolve the difference images. In this case, the system can determine, through audio data, that a shot has been fired and then update the short-term moving average and convolve the difference images to look for a shot. In some cases, the system is configured to discriminate shots fired by the user aiming at the target from shots fired by other shooters at the shooting range. In this way, the system can know when the shooter of interest fires a shot even where other active shooters are present at the range. [0255] In some cases, the audio data may be used in the impact detection, such as by correlating the audio of a shot fired with an impact appearing on the target images. [0256] FIGs. 35A-35C illustrate and describe initializing the scoring system by identifying and classifying a target. In some cases, the system can automatically determine the boundaries of the target, while in some embodiments, user input may define the boundaries of a target. For example, using a human-to-computer interface (e.g., touch screen, mouse, stylus, touch pad, or the like), a human may draw a boundary around the target to aid the system in identifying the target. However, in many embodiments, the system uses machine vision to identify the target and its boundaries. FIG.35A illustrates an image 3500 captured by a camera associated with the system. The image may include a target stand 3502, a target 3504, target securing clips 3506, and other features within the field of view. The system may determine an initial bounding box 3508 around the identified target, such as by a trained target detection model. In some embodiments, a user may define the initial bounding box, such as by drawing on the computer display with a human-to-computer interface. The human-to-computer interface may be any suitable interface and in some cases is a touch screen, a pen, a mouse, a trackball, or the like. The initial bounding box may not accurately conform to the edges and corners of the target, especially in those cases where the bounding box is defined by the user. The initial bounding box and target image may be referred to as an initialization frame. The initialization frame may be converted to a Lab color space, which includes a lightness component, a green-red axis, and a blue-yellow axis, to provide perceptual uniformity. In some cases, the luminance channel is equalized via contrast limited adaptive histogram equalization (CLAHE). [0257] FIG.35B illustrates a target in which the coordinates for the target are determined, as described above, and the coordinates in many cases imply a quadrilateral, which can be projected onto a reference target image 3510.
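Before continuing with the registration against the reference target image, a minimal OpenCV sketch of the Lab conversion and CLAHE equalization applied to the initialization frame is shown below; the clip limit and tile size are illustrative assumptions.

import cv2

def preprocess_initialization_frame(frame_bgr):
    """Convert a BGR frame to Lab and equalize the lightness channel with CLAHE."""
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    l_chan, a_chan, b_chan = cv2.split(lab)

    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l_eq = clahe.apply(l_chan)

    return cv2.merge((l_eq, a_chan, b_chan))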
The reference target image 3510 may also be converted to Lab color space, and the squared difference (in Lab space) may be generated between the projection and the target. A hill-climbing algorithm may be applied to the coordinates, where possible deltas are small changes to the coordinates and better solutions may be determined by the squared perceptual difference. [0258] FIG. 35C illustrates the best coordinates that have been determined, such as by the minimum difference over several random restarts of the hill-climbing algorithm. The coordinates may then be used to apply the updated bounding box 3512. Therefore, even where the target image is skewed, such as where the viewing angle of the camera appears to show the target as a parallelogram rather than a rectangle, the initial bounding box can be modified to conform with the shape of the target as presented in the image captured by the camera. [0259] In some embodiments, the system may define the edges of the target through image analysis; however, in some cases, the edges of the target are irrelevant and it is only the scoring rings that are important. Therefore, in some cases, the system is configured to identify scoring rings and is not concerned with target boundaries. In addition, the system may not need to classify the target, but rather need only identify scoring rings. For example, the system may determine, through one or more machine learning models, that the target represents a center bullseye target with sequential scoring rings. The system may assign score values to each ring, such as ten points for the bullseye, nine points for the next larger ring, and so on. Similarly, the system may identify a target with five bullseye-sized circles spaced throughout the target and assign a value of ten points to each of these scoring rings. One or more of the multiple bullseye-sized rings may have radially spaced larger scoring rings that may be assigned lesser values than the bullseye-sized ring. Thus, the system may omit a step of classifying a target, and just focus on the size and location of the scoring rings. [0260] FIG.36 illustrates and describes impact detection and scoring of the detected impacts. A machine learning model may be executed to determine if the differences between image frames are likely to be projectile impacts on the target 3504. The target 3504 may be re-registered, such as by applying a hill-climbing technique with possible corner coordinates as in the target initialization step. By comparing the short-term moving average difference with the long-term moving average, differences are windowed 3602a, 3602b, 3602c, and the difference image 3612 between the current target and the long-term target average is generated. [0261] The difference image may be convolved with an impact kernel (e.g., a windowed kernel that scans for image differences). Any point that is several standard deviations over the long-term exponential moving average (EMA) of the maximum convolution value is flagged as a possible impact 3604a, 3604b, 3604c. [0262] The possible impacts, 3604a-3604c, are fed into a machine learning model (e.g., classifier) 3606 that determines if the differences are likely to be actual impacts. If the differences are above a threshold value, the system marks the differences as actual impacts 3608. However, where the differences are below a threshold value, the system marks the differences as false impacts 3610. [0263] FIG.37 illustrates and describes impact scoring on a target 3504.
Different scoring zones may be determined by the system based on computer vision, by referencing a registered target from a previous shooting session, by retrieving a stored target model from a known target database, or in some other way. The scoring zones 3702 on the target may be represented as the union area of one or more simple shapes (e.g., ellipses, rectangles, circles, triangles, etc.). The coordinates of the detected impact 3704 may be normalized and converted to the axes implied by the reference image. In other words, the impacts may be overlaid on the reference image, and the reference image can be used for coordinates of the impact. The coordinates may be Cartesian coordinates expressed in x, y values. In some cases, the coordinates may be radial coordinates that express the impact as an angle and distance, such as from the center of the target. The system can then determine whether the impact is wholly within a single scoring zone or intrudes on a scoring zone boundary, which allows the system to accurately score the impact. The system may use simple geometry to determine whether any significant part of the given impact is within any of the simple shapes for each target zone. [0264] In some embodiments, the system uses the coordinates for the impacts in further analyses. For example, by generating and storing coordinates for a given shooting string, the grouping can be quantified, which can be used as a measure of improvement over time. Similarly, a shooter's minute of angle (MOA), which is a measure of group size, expressed in inches or minutes of angle and measured from center to center or from edge to edge, can be determined. A grouping may further be used to define pose, grip, or motion errors during the shooting string. The grouping may be quantified, including size of group, rotation of group, or other metric. [0265] The scoring can be quantified in any suitable metric. In some cases, the scoring is point based, in which each zone of the target receives a number of points that are added or subtracted from an initial amount. In some cases, missing shots or extra shots fired are scored as negative values or higher values depending on the type of scoring. In some cases, timed scoring is used, in which the total time is reflected in the score, and misses may be penalized by an increase in the time. In some cases, group size is used to determine scoring, and extra shots or missed shots may penalize the group size. Of course, other metrics and combinations of metrics may be determined by the system for scoring a particular shooting string. [0266] The system may be configured to return, for each shot in a shooting string, a coordinate of the impact along with the time at which the shot happened, such that a number of metrics combining location and time can be used to rank the shooting string. For instance, the time between shots may be measured, or the shot after a buzzer or other start signal can be tracked and stored with an accuracy metric. [0267] In some cases, as the impacts are identified, the system may draw a bounding box surrounding one or more of the impacts. When a shooting string is finished, the system may draw a bounding box that contains each of the shots within the group and determines a metric based on the bounding box to determine a score.
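To make the zone test and the group metrics concrete, the sketch below scores impacts against concentric circular rings and computes the extreme spread of the group along with its size in minutes of angle. The ring layout and data layout are illustrative assumptions, and the 1 MOA ≈ 1.047 inch at 100 yards conversion is the standard approximation; a production system would instead use the registered scoring-zone shapes described above.

import math

def score_impact(x, y, rings):
    """Score one impact given rings as (radius, points) pairs.

    Coordinates and radii share the same units (inches from the target
    center in the example below).
    """
    r = math.hypot(x, y)
    for radius, points in sorted(rings):   # check the smallest ring first
        if r <= radius:
            return points
    return 0                               # outside all rings: miss

def group_metrics(impacts, distance_yards):
    """Extreme spread (center to center) and group size in minutes of angle."""
    spread = max(
        math.hypot(x1 - x2, y1 - y2)
        for i, (x1, y1) in enumerate(impacts)
        for (x2, y2) in impacts[i + 1:]
    )
    inches_per_moa = 1.047 * (distance_yards / 100.0)  # ~1.047 in at 100 yd
    return spread, spread / inches_per_moa

# Example: three impacts (inches from center) on a ring target at 100 yards.
rings = [(1.0, 10), (2.0, 9), (3.0, 8)]
impacts = [(0.4, -0.2), (1.3, 0.5), (-0.6, 1.1)]
scores = [score_impact(x, y, rings) for x, y in impacts]
spread_inches, group_moa = group_metrics(impacts, distance_yards=100)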
[0268] As shown in FIG.38, which illustrates a user interface 3800 of a system that has been developed and is operational according to some of the embodiments described herein, the system can be configured to determine a bounding box 3802, which may pass through the center of the outermost impacts, or along an edge of the impacts. The system may determine any of a number of metrics, such as, without limitation, a group size 3804, an overall group width 3806, a group height 3808, bounding box rotation angle, MOA, elevation offset 3810, windage offset 3812, and may further determine the shooting distance 3814, which may either be manually input or determined based on detected flight time of the projectile. [0269] For example, the system may be configured to register the sounds of the gunfire, the shockwave of the projectile or powder deflagration, motion of the firearm or shooter, or some other indicator that a shot has been fired. The system may then detect when an impact on target happens and determine the flight time of the ammunition and based on the firearm, the ammunition, and/or the powder loading, determine a target distance. This process can be done in near real time by a simple consumer grade mobile computing device. In some cases, the mobile computing device may utilize a zoom feature of a built-in image capture device. In some cases, an external zooming lens may be used to acquire image frames by the mobile computing device. For example, a mobile phone may be coupled to a spotting scope, which provides an optical zoom through the spotting scope to allow the mobile computing device to capture clearer images of a target that may be down range. Some mobile computing devices may rely on digital zoom to capture one or more images of a target positioned down range. [0270] The system may further determine and display the number of shots fired 3816 in the current shooting string, an average split time 3818 for each shot, which may be helpful for timed shooting events. The system may further show a score associated with each shot 3820 and a cumulative score 3822 of the shooting string. [0271] Some embodiments further provide for an automated and automatic scoring system that can quickly identify a target, classify a target including identifying scoring rings of the target, and score impact hits on a target at a shooting range. In some cases, the system is stored and executed on consumer-grade mobile computing devices (e.g. an iPhone, a tablet, a telephone, a video camera). In some cases, the system includes a video camera device that is pointed at the target of interest, and the system is configured to identify the target, classify the target, determine shot impacts on the target, and score the impacts on the target. In some cases, the system is configured to prompt a shooter as to the shooting stage. For instance, the system may be configured for utilization during a CMP high-power rifle competition, and the system may prompt a user that the present stage requires sending twenty shots downrange from an off-hand position during a 20-minute window. In some cases, the system is aware of how many shots to expect during a shooting stage (referred to as a string of fire), and may prompt a user with information associated with a current shooting stage, such as number of shots, timeframe, and shooting position. 
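As a small illustration of the string statistics shown in FIG.38, the sketch below derives the shot count, average split time, and cumulative score from a list of timestamped, scored shots; the record layout is an illustrative assumption.

def string_statistics(shots):
    """shots: list of (timestamp_seconds, score) tuples in firing order."""
    times = [t for t, _ in shots]
    scores = [s for _, s in shots]

    splits = [later - earlier for earlier, later in zip(times, times[1:])]
    average_split = sum(splits) / len(splits) if splits else 0.0

    return {
        "shots_fired": len(shots),
        "average_split_s": average_split,
        "cumulative_score": sum(scores),
    }

# Example: a five-shot string with times in seconds and per-shot scores.
stats = string_statistics([(0.0, 9), (0.8, 10), (1.7, 8), (2.5, 10), (3.4, 9)])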
In some cases, a shooter may enter information associated with a shooting stage, such as, for example, a number of shots the system should expect, the firearm used, and the distance to the target, among others. In some cases, the system is manually started and stopped and only identifies shots on target during a time at which the system has been started. [0272] Comprehensive Training System [0273] As described above, in some embodiments, the system utilizes a gantry type arrangement in which a plurality of recording devices may be used to capture audio and video of the participant and/or the target. In some cases, an imaging device can be pointed at the target, which may have a zoom lens, digital zoom feature or rely on an external lens, such as a camera mounted to a spotting scope for capturing images of a target located down range. In some cases, the system is configured to synchronize multiple sources, such as one or more video frames, and/or audio data from one or more audio capture devices. In some cases, the system is configured to synchronize multiple video frames and audio data from one or more audio/video capture devices. [0274] The described system thus provides a comprehensive firearm training solution that allows shooters, even at busy ranges, to track body motion, identify errors in technique, correlate scores with the errors in technique, automatically score a target, and differentiate shots fired from the participant’s firearm from other firearms. [0275] The system may utilize one or more machine learning models for synchronization, prediction, verification, and may further be trained to analyze scoring data and associate the scoring data with the motion data to determine correlations between specific motion data (e.g., behaviors) and scoring trends. As an example, the system may correlate that a shooter’s wrist pivots downwardly before shots that typically score outside and below the 10 ring and determine that the shooter is anticipating the firearm recoil before the shot. The system may then provide feedback to the user with not only information related to the motion/score correlation but may also provide one or more exercises or drills to allow the user to recognize and address the behavior resulting in the reduced score. The system may further be trained to distinguish the discharge of a firearm associated with the shooter of interest, even at a busy range with numerous shooters. A similar process may be used with any motion data from any activity or sport, as described elsewhere herein. [0276] In some embodiments that utilize multiple video capture devices, the system may track body motion in two or three dimensions and from multiple angles. The two- or three- dimensional body motion data may be correlated, synchronized, and analyzed to determine two- or three-dimensional motion data, which can be further correlated with a resulting score. [0277] While embodiments of the described system are described in relation to a shooter firing a string of shots, it should be understood that the systems and methods described herein are applicable to capturing any type of body motion and applicable to other sports where body motion may lead to performance metrics. 
For example, embodiments of the systems described herein may be used to track, critique, and improve body motion such as basketball free throw shooting, golf swings, figure skating elements, archery, soccer, a baseball swing, or any other sport or motion where the movements of a set of observable body landmarks can be recorded in time and there is some observed causal consequence of the movement. [0278] The system may include one or more processors and one or more computer readable media that may store various modules, applications, programs, or other data. The computer-readable media may include instructions that, when executed by the one or more processors, cause the processors to perform the operations described herein for the system. [0279] In some implementations, the processor(s) may include a central processing unit (CPU), a graphical processing unit (GPU), both CPU and GPU, a microprocessor, a digital signal processor or other processing units or components known in the art. Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), complex programmable logic devices (CPLDs), etc. Additionally, each of the processor(s) may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems. The one or more control systems, computer controller and remote control, may include one or more cores. [0280] Embodiments may be provided as a computer program product including a non-transitory machine-readable storage medium having stored thereon instructions (in compressed or uncompressed form) that may be used to program a computer (or other electronic device) to perform processes or methods described herein. The computer-readable media may include volatile and/or nonvolatile memory, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data. The machine-readable storage medium may include, but is not limited to, hard drives, floppy diskettes, optical disks, CD-ROMs, DVDs, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, flash memory, magnetic or optical cards, solid-state memory devices, or other types of media/machine-readable medium suitable for storing electronic instructions. Further, embodiments may also be provided as a computer program product including a transitory machine-readable signal (in compressed or uncompressed form). Examples of machine-readable signals, whether modulated using a carrier or not, include, but are not limited to, signals that a computer system or machine hosting or running a computer program can be configured to access, including signals downloaded through the Internet or other networks. [0281] A person of ordinary skill in the art will recognize that any process or method disclosed herein can be modified in many ways. The process parameters and sequence of the steps described and/or illustrated herein are given by way of example only and can be varied as desired.
For example, while the steps illustrated and/or described herein may be shown or discussed in a particular order, these steps do not necessarily need to be performed in the order illustrated or discussed. [0282] The various exemplary methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein or comprise additional steps in addition to those disclosed. Further, a step of any method as disclosed herein can be combined with any one or more steps of any other method as disclosed herein. [0283] The disclosure sets forth example embodiments and, as such, is not intended to limit the scope of embodiments of the disclosure and the appended claims in any way. Embodiments have been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined to the extent that the specified functions and relationships thereof are appropriately performed. [0284] The foregoing description of specific embodiments will so fully reveal the general nature of embodiments of the disclosure that others can, by applying knowledge of those of ordinary skill in the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of embodiments of the disclosure. Therefore, such adaptation and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. The phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the specification is to be interpreted by persons of ordinary skill in the relevant art in light of the teachings and guidance presented herein. [0285] The breadth and scope of embodiments of the disclosure should not be limited by any of the above-described example embodiments, but should be defined only in accordance with the following claims and their equivalents. [0286] Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain implementations could include, while other implementations do not include, certain features, elements, and/or operations. Thus, such conditional language generally is not intended to imply that features, elements, and/or operations are in any way required for one or more implementations or that one or more implementations necessarily include logic for deciding, with or without user input or prompting, whether these features, elements, and/or operations are included or are to be performed in any particular implementation. [0287] Unless otherwise noted, the terms “connected to” and “coupled to” (and their derivatives), as used in the specification, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. In addition, the terms “a” or “an,” as used in the specification, are to be construed as meaning “at least one of.” Finally, for ease of use, the terms “including” and “having” (and their derivatives), as used in the description do not preclude additional components and are to be construed as open ended. 
[0288] The specification and annexed drawings disclose examples of systems, apparatus, devices, and techniques that may provide a system and method for determining acoustical signatures of discharged firearms. It is, of course, not possible to describe every conceivable combination of elements and/or methods for purposes of describing the various features of the disclosure, but those of ordinary skill in the art recognize that many further combinations and permutations of the disclosed features are possible. Accordingly, various modifications may be made to the disclosure without departing from the scope or spirit thereof. Further, other embodiments of the disclosure may be apparent from consideration of the specification and annexed drawings, and practice of disclosed embodiments as presented herein. Examples put forward in the specification and annexed drawings should be considered, in all respects, as illustrative and not restrictive. Although specific terms are employed herein, they are used in a generic and descriptive sense only, and not used for purposes of limitation. [0289] Those skilled in the art will appreciate that, in some implementations, the functionality provided by the processes and systems discussed above may be provided in alternative ways, such as being split among more software programs or routines or consolidated into fewer programs or routines. Similarly, in some implementations, illustrated processes and systems may provide more or less functionality than is described, such as when other illustrated processes instead lack or include such functionality respectively, or when the amount of functionality that is provided is altered. In addition, while various operations may be illustrated as being performed in a particular manner (e.g., in serial or in parallel) and/or in a particular order, those skilled in the art will appreciate that in other implementations the operations may be performed in other orders and in other manners. Those skilled in the art will also appreciate that the data structures discussed above may be structured in different manners, such as by having a single data structure split into multiple data structures or by having multiple data structures consolidated into a single data structure. Similarly, in some implementations, illustrated data structures may store more or less information than is described, such as when other illustrated data structures instead lack or include such information respectively, or when the amount or types of information that is stored is altered. The various methods and systems as illustrated in the figures and described herein represent example implementations. The methods and systems may be implemented in software, hardware, or a combination thereof in other implementations. Similarly, the order of any method may be changed and various elements may be added, reordered, combined, omitted, modified, etc., in other implementations. [0290] From the foregoing, it will be appreciated that, although specific implementations have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the appended claims and the elements recited therein. In addition, while certain aspects are presented below in certain claim forms, the inventors contemplate the various aspects in any available claim form. For example, while only some aspects may currently be recited as being embodied in a particular configuration, other aspects may likewise be so embodied. 
Various modifications and changes may be made as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended to embrace all such modifications and changes and, accordingly, the above description is to be regarded in an illustrative rather than a restrictive sense.

Claims

CLAIMS
What I claim is:
1. A method for improving shooting performance, comprising: receiving video data of a shooter; determining one or more body landmarks of the shooter; tracking the one or more body landmarks during a shot to generate shot motion data; determining a score of the shot; associating the shot motion data with the score; and generating recommendations for altering the motion data on a subsequent shot.
2. The method of claim 1, wherein determining one or more body landmarks of the shooter comprises generating a wire frame model by connecting the body landmarks.
3. The method of claim 1, wherein associating the shot motion data with the score comprises executing a classification and regression tree machine learning model to identify a causal relationship between the shot motion data and the score.
4. The method of claim 1, further comprising determining, through image analysis of the video data of the shooter, a grip of the shooter.
5. The method of claim 4, further comprising analyzing the grip of the shooter and providing, on a display screen, grip recommendations to alter the grip.
6. The method of claim 1, wherein determining the score of the shot comprises: receiving target video data; performing image analysis on the received target video data; determining a hit on the target; and determining a score of the hit.
7. The method of claim 1, wherein receiving the video data includes capturing video data by a mobile phone.
8. The method of claim 1, wherein determining one or more body landmarks includes determining 17 body landmarks.
9. The method of claim 1, wherein tracking the one or more body landmarks includes generating a bounding box around each of the one or more body landmarks.
10. The method of claim 1, further comprising executing a machine learning model to correlate the shot motion data with the score.
11. The method of claim 10, wherein the machine learning model is configured to determine the motion data that results in an off-center target hit.
12. A method for improving a causal consequence of body movement, comprising: receiving video data of a body motion; determining one or more body landmarks viewable in the video data of the body motion; tracking the one or more body landmarks during an action; generating, based at least in part on the tracking the one or more body landmarks, motion data; determining a score associated with the motion data; associating the motion data with the score; and generating recommendations for altering the motion data on a subsequent action.
13. The method of claim 12, wherein receiving the video data includes capturing the video data by a mobile computing device.
14. The method of claim 12, further comprising executing a machine learning model to correlate the motion data with the score.
15. The method of claim 14, wherein the machine learning model is configured to determine the motion data that results in a reduced score.
16. The method of claim 15, further comprising predicting, by the machine learning model, a predicted score based on the motion data.
17. The method of claim 16, further comprising comparing the predicted score with the score.
PCT/US2023/032669 2022-09-13 2023-09-13 Systems and methods for marksmanship improvement through machine learning WO2024059156A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202263406241P 2022-09-13 2022-09-13
US202263406245P 2022-09-13 2022-09-13
US202263406208P 2022-09-13 2022-09-13
US63/406,241 2022-09-13
US63/406,245 2022-09-13
US63/406,208 2022-09-13

Publications (1)

Publication Number Publication Date
WO2024059156A1 true WO2024059156A1 (en) 2024-03-21

Family

ID=88297014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/032669 WO2024059156A1 (en) 2022-09-13 2023-09-13 Systems and methods for marksmanship improvement through machine learning

Country Status (1)

Country Link
WO (1) WO2024059156A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200072578A1 (en) * 2018-09-03 2020-03-05 Rod Ghani Multiview display for hand positioning in weapon accuracy training
US10648781B1 (en) * 2017-02-02 2020-05-12 Arthur J. Behiel Systems and methods for automatically scoring shooting sports
US20210174700A1 (en) * 2018-12-11 2021-06-10 NEX Team Inc. Interactive training of body-eye coordination and reaction times using multiple mobile device cameras

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23786378

Country of ref document: EP

Kind code of ref document: A1