US11017576B2 - Reference model predictive tracking and rendering - Google Patents
Reference model predictive tracking and rendering
- Publication number
- US11017576B2 (application US16/425,623)
- Authority
- US
- United States
- Prior art keywords
- user
- frame
- points
- avatar
- reference model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30221—Sports video; Sports image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30241—Trajectory
Definitions
- Human brains are incredibly powerful associative learning machines. If two phenomena hit a person's sensory systems consistently together in time, the brain may create an associative memory to link those two phenomena. To the degree that those two phenomena themselves contain parsimonious information (internal structure), the brain may find correlated patterns within said phenomena and may encode deep structural relationships between those phenomena. This apparently happens automatically and effortlessly.
- FIG. 1 illustrates an overview data flow diagram for predictive tracking and rendering in accordance with some examples.
- FIG. 2 illustrates a data flow diagram for reference model predictive rendering in accordance with some examples.
- FIGS. 3A-3C illustrate smoothing processes and examples for post processing trajectories in accordance with some examples.
- FIGS. 4A-4B illustrate a flowchart and a diagram showing noise reduction and lateralization processes in accordance with some examples.
- FIG. 5 illustrates a diagram illustrating an example pruned nearest neighbors process in accordance with some examples.
- FIG. 6 illustrates a diagram illustrating a latency reducing predictive modeling process in accordance with some examples.
- FIG. 7 illustrates a predictive interpolation flowchart in accordance with some examples.
- FIG. 8 illustrates a flowchart showing a process for displaying an avatar or multiple avatars in accordance with some examples.
- FIG. 9 illustrates a flowchart showing a process for triggering capture of training data in accordance with some examples.
- FIG. 10 illustrates a block diagram of an example of a machine upon which any one or more of the processes discussed herein may perform in accordance with some embodiments.
- the present systems and methods use computational systems, networking, and display hardware to seamlessly allow users to see their motion in real-time. This affords the brain the extra visual information that may help a person converge to highly efficient high-quality technique (movement control) more rapidly than other processes.
- The systems and methods described herein enable additional exercises that can enhance a user's rate of improvement. These include automatically synchronizing an expert version of a motion to the precise speed and timing of a user's motion in real-time. The synchronization can also be set so that the expert slightly leads the user, providing online position targets for the user at all times during the motion.
- the expert version may include an expert model or an expert avatar, which are used interchangeably throughout this disclosure.
- a prediction is used to generate simultaneous experience of a situation and a representation of that same experience in a computerized medium.
- the low accuracy of this type of prediction due to the open-ended nature of what can happen in the real world is a limiting factor in some examples.
- the systems and methods described herein may predict human movement.
- Human movement is subject to the conservation of momentum. Therefore, on short time scales, velocities tend to be similar from one instant to the next.
- the specific motion that the user attempts to perform may be known in advance. These two bases taken together are used to place strong limits on the possible positions the user can move to a fraction of a second in the future.
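- The short-time-scale argument above can be illustrated with a constant-velocity extrapolation. This is only a minimal sketch under the assumption that velocity is unchanged over one sample interval; the function name `extrapolate` and the sample values are hypothetical and do not come from the disclosure.

```python
def extrapolate(prev, curr, dt_sample, dt_ahead):
    """Predict a position dt_ahead seconds after curr, assuming unchanged velocity."""
    velocity = [(c - p) / dt_sample for p, c in zip(prev, curr)]
    return [c + v * dt_ahead for c, v in zip(curr, velocity)]

# Hypothetical positions (meters) of one body segment, sampled 1/30 s apart.
predicted = extrapolate([0.0, 1.0, 0.5], [0.03, 1.0, 0.5], 1 / 30, 1 / 30)
```

- In the described system, knowledge of the intended motion would further constrain such a prediction; the sketch covers only the momentum-based part.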
- A global process flow diagram 100 appears in FIG. 1. This may represent a feedback loop between a user (e.g., a student) and an instructor, assuming the present system is in place.
- a process includes using trajectory inference from a position sensor, such as a depth sensor or a 3D position sensor (e.g., local positioning sensor).
- Derived skeletal constructions may include manipulations of depth sensor data and previously-established motion models, independently and with respect to each other, in order to enable efficient teaching of human movement skills.
- a motion model of a human movement can be a 5-dimensional object. These dimensions are the usual four dimensions of space-time (three for positioning things in space and one for time) plus an additional parameter that specifies which body segment the other four dimensions are specifying with spatial and time coordinates. These body segments can be listed and numbered 1 to n (where n depends on the specific human body model used in the motion model). Thus this dimension is a discretely varying dimension that can take n values.
- Movement data can be broken up into a time series of body positions.
- movement data can include a time series of positions for individual body segments.
- a time series of positions of a body segment may be referred to as a “trajectory”. Operations may be performed on these trajectories (making sure to keep a good account of the time parameter) by processing them so as to make them more accurate or to predict future movement before recombining them into full motion models, or output predictions into time-slice full-body positions.
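- The relationship between a motion model and its trajectories can be sketched as a simple regrouping of samples. The tuple layout `(time, segment_id, x, y, z)`, the segment numbering, and the function name are assumptions made for illustration only.

```python
from collections import defaultdict

# A motion model as samples of (time, segment_id, x, y, z); values are hypothetical.
samples = [
    (0.000, 1, 0.0, 1.0, 0.0),
    (0.000, 2, 0.1, 1.2, 0.0),
    (0.033, 1, 0.0, 1.1, 0.0),
    (0.033, 2, 0.1, 1.3, 0.0),
]

def split_into_trajectories(samples):
    """Group samples by body segment, keeping the time parameter with each position."""
    trajectories = defaultdict(list)
    for t, segment, x, y, z in sorted(samples):
        trajectories[segment].append((t, (x, y, z)))
    return dict(trajectories)

trajectories = split_into_trajectories(samples)
```

- Each value in `trajectories` is one body segment's time series, which can then be processed independently before recombination.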
- This trajectory inference process may be used to enable a parallelized and cross-constraining real-time body position sensing and virtual-representation-constructing system to:
- the process may utilize statistical inference on trajectories with a cross-constraining method to improve the smoothness and accuracy of position sensor (e.g., depth sensor) tracking in generating motion models when real-time delivery is not used.
- Example processes may include modeling motion in runtime. For example, it may be known in advance what action a user is trying to execute. This allows rational prediction of near-future motions from the continuity of the motion implicit in a time series of a motion-captured representation of the action. Knowing the action may allow for pruning of a search space for a nearest neighbor algorithm, which may serve as a basis for predictions (as well as enabling additional visual modeling opportunities within the system).
- Example processes may include modeling motion based on a full time series when minimizing time to representation delivery is not critical.
- Representational models of user motion may be generated in a non-real-time case to leverage extra processing time to ensure precision and accuracy. Then, in a real-time case, these models may be used in a prediction process to start finding near-future body positions, for example well in advance of the targeted time-to-display. These positions may then be handed off early to the pipelines that construct the representation, to eliminate latency (e.g., within a real-time system).
- Modeling inputs may include, while in runtime predicting: current depth-sensor-sourced locations of body segments, forward technique models from recent instants in the time series of body segment locations, current confidence intervals, reference model data for analogous instant in its time series, or the like.
- Modeling inputs may include, while in post processing (for example not prediction, but instead inference from what is treated as statistical data): instant-specific depth-sensor-sourced locations of body segments, time-series-and-body-segment-tagged confidence intervals on measured body segment positions, reference model data (e.g., data from previously established models which consisted of a user or expert attempting the same motion) for analogous instant in its time series, or the like.
- Forward models are constructed at previous instants in the time series using pre-existing reference model data specific to the instants at which they are constructed (backward models, if used, are similarly constructed from subsequent instants).
- using the reference model data includes first matching the user's progress through the technique to the most similar moments in the reference model technique through the matching algorithm.
- the global process flow diagram 100 includes a block 102 for preliminary instruction, modeling, repetitions, or process quality control. For example, an instructor leads the user through initial description, demonstration, repetitions and feedback, but with no 3D capture (and no review of user motion) until the user has sufficient similarity to a quality swing (e.g., as defined in the reference model).
- the global process flow diagram 100 may use the similarity to “recognize” the motion and produce good tracking.
- the global process flow diagram 100 includes a block 104 to capture a first user avatar.
- the global process flow diagram 100 includes a block 106 to review with instruction, for example, at various speeds and views, such as full speed, slow motion, frozen position, side by side with expert, overlay with expert, with motion graphics embellishments, with instructor description or demonstration of a next step, or the like.
- the global process flow diagram 100 includes a block 108 to send data to a user account. For example, after attempting to capture an avatar, the instructor may determine (or an automated system may determine) whether to upload or redo the capture until it is at a sufficient level of quality (the standard of quality here may include a subjective opinion of the instructor).
- the global process flow diagram 100 includes a block 110 to perform or record real-time exercises, for example using predictive extrapolation of current motion referencing a current user avatar to improve the accuracy of forward kinematics.
- the global process flow diagram 100 includes a block 112 to capture a temporary model and review.
- Block 112 may include a process similar to capturing a user avatar, but used for subsequent teaching and guided review with the instructor. Review at block 112 may be performed at different speeds or with different attributes, such as full speed, slow motion, frozen position, side by side reference model, overlay reference model, motion graphics embellishments, instructor description or demonstration of a next step, or the like.
- the global process flow diagram 100 includes a block 114 to capture a new user avatar.
- the new user avatar construction process may be based not on an expert model, but instead on the previous user avatar as the previous user avatar may better predict the user's current motion.
- Processes disclosed herein may refer to the use of a “reference model”.
- a reference model includes a previously obtained motion model of the same technique that the user is attempting in the process of creating the new user avatar. This may, for example, be either of an expert model or the previous user avatar for the same technique (if it exists).
- An expert model may be a motion model of the same technique performed by a world class expert.
- the feedback from block 114 to 108 may include a portion of a process that involves returning to an earlier step an indefinite number of times leading to a loop.
- The region of the global process flow diagram 100 exclusive of blocks 102 and 108 may rely on reference to a pre-existing motion model for the technique that the user is attempting, or on an algorithm that matches maximally similar points of time progression through the technique between the user and the pre-existing model, for example, even in cases where the relative rates of progress through the technique had been different at earlier points in the two models.
- This matching algorithm may run in real-time to assist in producing low-latency representations of body positions or in post processing to constrain accuracy. It may also be used in the review process to match up expert and user motions in side-by-side or overlay conditions.
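- One way to picture the matching algorithm is a nearest neighbor search over reference model frames, pruned to a short window ahead of the previous match. This is a sketch under stated assumptions: the function name, the forward-only window, and the squared-distance measure are illustrative choices, not details given in the disclosure.

```python
def match_frame(user_pose, reference, last_match, window=5):
    """Return the index of the reference frame closest to user_pose.

    Only frames from last_match to last_match + window are searched, on the
    assumption that progress through a known technique moves forward.
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for ja, jb in zip(a, b) for p, q in zip(ja, jb))

    stop = min(len(reference), last_match + window + 1)
    return min(range(last_match, stop), key=lambda i: sq_dist(user_pose, reference[i]))

# Hypothetical reference model: one joint moving along x in 0.1 m steps.
reference = [[(i * 0.1, 0.0, 0.0)] for i in range(10)]
matched = match_frame([(0.32, 0.0, 0.0)], reference, last_match=1)
```

- Because the match is found by similarity rather than elapsed time, the user and the reference model can progress through the technique at different rates.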
- FIG. 2 illustrates a data flow diagram 200 for reference model predictive rendering in accordance with some examples.
- the data flow diagram 200 may be used for management of latency, such as due to computational overhead of depth sensor skeletal inference.
- the data flow diagram 200 may be used to generate accurate motion models from noisy depth sensor data.
- the data flow diagram 200 may be used to leverage mechanisms for additional one-off features.
- An approach to generating accurate models in delayed processing may include treating the depth-sensor-derived skeletal inference of body position as statistical data about the underlying motion rather than a representation of that motion itself.
- a process for slicing up a time-series of body constructions in a motion model may use a full time series of positions for individual body segments, creating “trajectories” as discussed above.
- the data flow diagram 200 includes a block 202 to perform statistical inference on depth-sensor-inferred time-series body segment positions.
- The block 202 may output a connected path, which may be referred to as a “proto-trajectory,” which may be sent to block 204.
- the data flow diagram 200 includes a block 204 to perform smoothing. For example, smoothing may include interpolating additional not-directly-measured positions onto the proto-trajectory such that the implied continuous trajectory is unchanged.
- Block 204 may output smooth trajectories, such as to block 206 or block 210.
- the data flow diagram 200 includes a block 206 to assemble full-body constructions in a time-series. The assembly may include breaking apart trajectory time-series to group like-time body segment positions for the full body and assemble into coherent human body constructions.
- Block 206 may output time series body positions, such as to block 208.
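- The assembly step of block 206 can be sketched as regrouping per-segment trajectories by time stamp. The dictionary layout and segment names below are hypothetical, and the sketch assumes every segment shares the same time stamps.

```python
def assemble_frames(trajectories):
    """Regroup per-segment trajectories into time-ordered full-body positions."""
    frames = {}
    for segment, points in trajectories.items():
        for t, position in points:
            frames.setdefault(t, {})[segment] = position
    return [(t, frames[t]) for t in sorted(frames)]

# Hypothetical trajectories for two body segments at two instants.
body_positions = assemble_frames({
    "hand": [(0.0, (0, 0, 0)), (0.1, (1, 0, 0))],
    "elbow": [(0.0, (0, 1, 0)), (0.1, (1, 1, 0))],
})
```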
- the data flow diagram 200 includes a block 208 to update a reference model, such as by sending time series body positions to a user account for use as the updated reference model.
- Block 208 may output an updated reference model for future operations, such as to block 214, 220, or 210.
- the data flow diagram 200 includes a block 210 for trajectory alignment, including positioning of reference model trajectories into the same space as a “smooth trajectory” for example translationally or rotationally.
- Block 210 may output overlaid trajectory data points, such as to block 212.
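- The translational part of trajectory alignment can be sketched by moving the reference model trajectory so that its centroid coincides with that of the smooth trajectory. Centroid matching is an assumption chosen for illustration; the disclosure does not specify the alignment criterion, and rotational alignment is not shown.

```python
def centroid(points):
    """Average position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def align_by_translation(reference_traj, smooth_traj):
    """Shift reference_traj so its centroid coincides with that of smooth_traj."""
    ref_c, new_c = centroid(reference_traj), centroid(smooth_traj)
    shift = tuple(n - r for n, r in zip(new_c, ref_c))
    return [tuple(p[i] + shift[i] for i in range(3)) for p in reference_traj]

# Hypothetical two-point trajectories.
aligned = align_by_translation([(0, 0, 0), (2, 0, 0)], [(1, 1, 0), (3, 1, 0)])
```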
- The data flow diagram 200 includes a block 212 to perform denser statistical inference. For example, block 212 may output precise proto-trajectories, which are inferred statistically from a mixture of an aligned trajectory from the reference model and a trajectory from the depth sensor data, as feedback to block 204.
- a targeted sensor may include a Microsoft Kinect from Microsoft Corporation, of Redmond, Wash., which is a depth sensor enhanced with a human skeleton inference algorithm that converts depth data to skeletal positional data.
- depth sensors may include a general human skeletal position sensor. The processes described herein may use data generated by any human skeletal position sensors.
- the data flow diagram 200 includes a block 214 to match reference model frames to frames from depth sensor data in real-time.
- Block 214 may output analogous user and reference model positions, for example to block 216, 218, or 220.
- the data flow diagram 200 includes a block 216 that may be used for synching by displaying matched reference model positions, or leading, by displaying reference model positions that are ahead in time compared to the user.
- Block 216 may output visual information displayed to the user, for example on a user interface of a display.
- the data flow diagram 200 includes a block 218 for mode control, including automated control of whether or not the system is in latency reduction mode.
- Block 218 may output data for use in latency reduction when the user is executing a certain technique and not otherwise, which may be sent to block 220.
- the data flow diagram 200 includes a block 220 for latency reduction.
- Block 220 may include projecting the user's position into the future by applying calculus concepts to the reference model data and applying changes predicted therein to the user's current position, for example, as quantified in real-time by sensors.
- Block 220 may output projected positions of the user so the rendering pipeline can deliver them closer to when the user is actually there.
- the output of block 220 is finding a predicted user position for a future moment which is precisely when the display system may be generating a visual frame which contains that positional representation.
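- The projection performed in block 220 can be sketched with a finite difference: the displacement of the reference model over the next few frames is added to the user's currently sensed position. Using a finite difference for the “calculus concepts”, and the names `project_position` and `frames_ahead`, are assumptions for illustration.

```python
def project_position(user_now, reference, match_index, frames_ahead):
    """Add the reference model's upcoming displacement to the user's position."""
    future_index = min(match_index + frames_ahead, len(reference) - 1)
    delta = [f - c for f, c in zip(reference[future_index], reference[match_index])]
    return [u + d for u, d in zip(user_now, delta)]

# Hypothetical reference trajectory of one body segment and a sensed user position.
reference = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.3, 0.0, 0.0]]
projected = project_position([0.05, 1.0, 0.0], reference, match_index=1, frames_ahead=1)
```

- Here `match_index` stands for the reference model frame already matched to the user's current progress, and `frames_ahead` would be chosen so that the projected time coincides with the display time of the frame being rendered.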
- the data flow diagram 200 includes a block 222 for real-time interpolation.
- Block 222 may generate display frames that are not in synch with input frames coming from the sensors. For example, the Microsoft Kinect samples at 30 Hz while high-end VR display systems display at 90 fps. (When discussing sampling, the unit “Hz” may be used, which means “times per second”. When discussing display, the unit “fps” or “frames per second” may be used. These units may be used interchangeably in this disclosure.)
- block 222 may generate time targets which are in between the frame timing of the Kinect sensors.
- a process may include adjusting the time parameter that defines the target display time on the latency reduction module to produce frames at a higher rate than the sensors are operating.
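- The time targets between sensor frames can be sketched as follows. The function name is hypothetical, and the sketch assumes the display rate is an integer multiple of the sensor rate, as with the 30 Hz and 90 fps figures mentioned above.

```python
def display_time_targets(sensor_time, sensor_hz=30, display_fps=90):
    """Return the display-frame time targets within one sensor sampling interval."""
    frames_per_sample = display_fps // sensor_hz
    return [sensor_time + k / display_fps for k in range(frames_per_sample)]

# Three display frames are targeted for each sensor frame at 30 Hz and 90 fps.
targets = display_time_targets(0.0)
```

- Each target time could then be supplied as the time parameter of the latency reduction step, so that a predicted position is produced for every display frame.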
- The process from block 202 to 204 may be used to represent the path from depth sensor inference data to production of a reference model which is used in subsequent processing. In particular, this operation is the same whether on the first iteration of the system or on subsequent iterations.
- The process among blocks 204, 206, and 208 represents steps that are used on the first iteration of the production of a reference model or on subsequent iterations.
- The input to this part of the process (202 to 204) may be modified by the process among blocks 204, 210, 212, and 208, described below, for reference model productions after the first one (e.g., iterations).
- The process among blocks 204, 210, 212, and 208 may take the smoothed depth sensor skeletal inference data and the current iteration of the reference model as input, and process them together to produce more strongly statistically constrained trajectories for greater accuracy.
- The process among blocks 208, 214, 216, 218, 220, and 222 may leverage the existence of a previously produced reference model to produce visuals at run time for a user who may benefit from richer feedback about their motion or timely information about what that motion may be.
- The process among blocks 202, 204, 206, 208, 210, and 212 may be used when developing an accurate reference model in a delayed-processing mode (pre-processing).
- The process among blocks 208, 214, 216, 218, 220, and 222 may be used when creating visual representations for the user at runtime (real-time).
- FIG. 3A illustrates a flowchart showing a process 300 A for post processing trajectories.
- the process 300 A includes an operation 302 for performing statistical inference on position sensor data, such as from a depth sensor or a 3D position sensor (e.g., local positioning sensor).
- Position sensor inferences of human skeletal locations may be noisy measurements.
- Position sensor inferences may include statistical data points where point-like inferred locations for specific body segments surround the actual trajectory that the segment in question traversed.
- the process 300 A includes an operation 304 for separating body segment trajectories.
- only the series of 3-dimensional coordinate positions of a single body segment at a time are considered.
- the various series of the same type for all of the body segments that comprise a motion model are used.
- individual body segment trajectories may be processed one at a time. When this is done for all body segments then the whole motion model has been processed.
- the curve of the trajectory may be roughly the same in overall quality for all competent users.
- the scale of the curve may vary among users, or the start and stop points for discrete movements may vary between individuals.
- Process 300A may be used to first generate a rough trajectory from the depth sensor data. After that, process 300A may be used to execute calculations on the rough trajectory to modify and fit a pre-fabricated trajectory to the rough trajectory.
- The first step in constructing a rough trajectory is to break the time series of depth-sensor-inference-delivered body segment locations into sequential subsets containing three points each. These are “sequential clusters” of three points.
- the three points may make up a moving window. For example, taking the points in series, the first, second, and third points may be one such sequential cluster, the second, third, and fourth may be another, and so on.
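- The moving window described above can be sketched in a few lines. The function name and the string labels standing in for sensor readings are hypothetical.

```python
def sequential_clusters(points, size=3):
    """Return every run of `size` consecutive points as a moving window."""
    return [points[i:i + size] for i in range(len(points) - size + 1)]

# Hypothetical labels standing in for five consecutive sensor readings.
clusters = sequential_clusters(["p1", "p2", "p3", "p4", "p5"])
```

- Five readings yield three overlapping clusters, so each interior point appears in up to three clusters.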
- Each cluster may then have a line segment fitted to it.
- This may be a least squares regression on the three points. However, a regression may be overly computationally expensive and not of sufficient benefit, so a simpler method may be used.
- Fitting a line segment to the sequential cluster may instead involve first calculating a “center of mass” location (average position among the points) for the three points to serve as a center point. Then process 300A may be used to calculate the direction of the line segment between the two end points in the set of three depth sensor inferences and to create a line segment through the center point that has that direction. Calculating this direction may involve finding the direction of the vector which points from the first to the third point in the cluster.
- a final correction (which may be applied when calculating the average of the three points to determine the center point in the first place) is to hold this line segment toward the outside of the curve by weighting the central depth sensor reading point highest when calculating this “center of mass” average.
- the proper weighting may be something like 1,3,1 where those are ordered to match the time order of the cluster comprising three points.
- the final step in calculating the average may then be to divide not by the number of points, but instead by the sum of the weights (as in a standard weighted average). The more the line segment is to be held toward the outside of the curve, the larger the middle weight may be compared to the two outer ones.
- the precise optimal weighting may be ironed out in testing the software and adjusting.
- Line segments may be strictly centered on the established center point, and their total length may be half of the average length of the line segment connecting the first and second points and the line segment connecting the second and third points in the set of three depth sensor points. This length may make it easy to connect all such line segments together once all of them have been found for the trajectory. If vectors a, b, and c represent the depth sensor reading point locations, then this length is $(\lvert c - b \rvert + \lvert b - a \rvert)/4$, where $\lvert \cdot \rvert$ denotes the “absolute value” or magnitude of a vector.
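- The simplified fitting method for one cluster can be sketched as follows: a weighted center point, a direction taken from the first to the third point, and a length equal to one quarter of the summed distances between neighboring points. The function name is hypothetical, the 1, 3, 1 weights are the example values from the text, and the sketch assumes the first and third points are distinct.

```python
import math

def fit_segment(a, b, c, weights=(1, 3, 1)):
    """Fit a short line segment to three consecutive sensor points a, b, c."""
    total = sum(weights)
    # Weighted "center of mass", with the middle reading weighted highest.
    center = [(weights[0] * p + weights[1] * q + weights[2] * r) / total
              for p, q, r in zip(a, b, c)]
    # Direction of the vector pointing from the first to the third point.
    direction = [r - p for p, r in zip(a, c)]
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    # Segment length: (|c - b| + |b - a|) / 4.
    length = (math.dist(c, b) + math.dist(b, a)) / 4
    start = [m - u * length / 2 for m, u in zip(center, unit)]
    end = [m + u * length / 2 for m, u in zip(center, unit)]
    return start, end

# Hypothetical cluster of three readings forming a shallow arc.
start, end = fit_segment((0, 0, 0), (1, 1, 0), (2, 0, 0))
```

- With the middle weight of 3, the segment sits at a height of 0.6 rather than the unweighted 1/3, which holds it toward the outside of the curve.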
- Trajectories are not necessarily closed paths, which means they may have start points and end points that are not the same point. In fact, closed paths may occur only in special cases of human movement. In the case of closed paths, the method above may work for all points in the trajectory because they may all have neighbors in the time series. In examples where the trajectory is not a closed path, a different method is used for the ends, to leverage the points that are available. In this example, the end points may be connected to the end of the nearest line segments found in the above process to create the start and end line segments.
- After this set of line segments is established, they may not produce a continuous path (e.g., if the length of these segments is made longer, they may connect, in a special case).
- The line segments have been kept to a length such that the gaps between the ends of each can be filled just by connecting their nearest ends (e.g., usually nearest in space, and using the nearest in the time series results in the right connection) and such that the full series of line segments results in a fairly smooth trajectory relative to other options for the length of inferred line segments.
- the process 300 A includes an operation 306 for smoothing of trajectories.
- Operation 306 may be used to smooth the previously generated curve out. Despite the inference method being designed to output a smooth trajectory, more smoothness may be desired.
- For example, where the trajectory forms a square, bifurcating each line segment may turn the square into an octagon. In an example, it may turn it into a perfectly regular octagon. In an example, the central point of each side of a square is pushed out just the right distance such that the now eight points, if connected by line segments, result in a perfectly regular octagon with all sides and angles being equal.
- The process described below is based on a weighted average concept. That weighted average finds these points for all regular polygons using dynamic weighting, which is discussed below in the advanced smoothing section.
- The trajectory cluster process may be repeated here.
- the trajectory generated in the previous step may be used by breaking it into sequential clusters of four points each. This set of four points then may contain three line segments. These may be the central line segment (between the two points toward the center of the cluster) and its two neighbors.
- $V_n$ is the vector representation of a point on the trajectory.
- $EV_{NM}$ is the vector representation of the “extension point” of a line segment between two points in the trajectory.
- The two line segments that get extended are the two line segments that are neighbors of the central line segment.
- The central line segment may be bifurcated into two smaller line segments by adding a point in the middle that is not actually on the line segment. To find this point, points that extend the neighboring line segments by half in the direction of the central line segment may be used.
- $V_B$ and $V_C$ are sequentially the second and third points in the cluster, and $V_A$ and $V_D$ are the first and fourth points in the cluster.
- The operations defined by the equation are the vector analogs of numerical calculations using standard mathematical definitions thereof, and this carries through for vector operations for the present systems and processes unless otherwise specified.
- BP (referenced below) is the new “bifurcation point”, which is the new point this whole process is designed to find. It is designed to add a new point to the trajectory which is both approximately (very close) on the trajectory and near the center point of the line segment which is being bifurcated.
- Weighted average: $BP = (EV_{BA} + 3V_B + 3V_C + EV_{CD})/8$. BP is the “bifurcation point” and is the defining point by which process 300A may be used to break the old line segment into two new ones which are bowed out.
- The coefficients 1, 3, 3, 1 are the “weights” in the weighted average.
- Dynamic weighted average: instead of having a pre-set weight scheme for all of these calculations to find the new point, process 300A may use the naturally paired points (extension points for the two neighboring line segments being one pair and end points for the central line segment as another pair) and calculate the distances between the points in each pair. These distances may be the basis for the weighting given to that pair when the weighted average is calculated.
- The routine that defines that weighting is given in the advanced smoothing section.
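- The fixed-weight version of the bifurcation step, with weights 1, 3, 3, 1, can be sketched as below. The extension point formula is an assumption: it reads “extend by half” as moving past the shared point by half the neighboring segment's length, since the text describes but does not state the formula. The function names are hypothetical, and the dynamic weighting is not shown.

```python
def extension_point(near, far):
    """Extend the segment far -> near by half its length beyond near (assumed formula)."""
    return [n + (n - f) / 2 for n, f in zip(near, far)]

def bifurcation_point(v_a, v_b, v_c, v_d):
    """Weighted average (EV_BA + 3*V_B + 3*V_C + EV_CD) / 8 for one four-point cluster."""
    ev_ba = extension_point(v_b, v_a)
    ev_cd = extension_point(v_c, v_d)
    return [(ea + 3 * b + 3 * c + ed) / 8
            for ea, b, c, ed in zip(ev_ba, v_b, v_c, ev_cd)]

# Hypothetical cluster: three sides of a unit square, with B-C as the central segment.
bp = bifurcation_point((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
# bp == [0.5, -0.125, 0.0]
```

- Under these assumptions the new point lies 0.125 outside the midpoint of the central segment. A regular octagon would require a push of about 0.207, which is consistent with the text's statement that dynamic weighting is needed to reach a perfectly regular polygon.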
- The process 300A may be used to account for the line segments on the ends of the series.
- For these end segments, process 300A may use a different weighting scheme. Because only one neighbor may be extended, it is given a greater influence. The end point is also given a greater influence because there is less influence in that direction without its extended point.
- $EV_{BC}$ is the result of vector subtraction of the interior-most point of the three, c, from the middle point, b, with a being one of the end points of the full set of points in the trajectory.
- The process 300A includes an operation 308 for reassembly of trajectories and optional output (in the first iteration, this is where the algorithm stops because it has no reference model with which to perform the rest).
- the next step in updating the user avatar with new data is to make each trajectory of the previous avatar align with the newly formed trajectory from the new user data.
- This full process may be done independently for all trajectories and sub-trajectories before recombining trajectories into a full motion model.
- the first step is to rescale the trajectory from the previous avatar.
- Circles scale based on the radius.
- the curvature of a circle scales based on the reciprocal of the radius.
- the reciprocal of the curvature can be used as a scaling factor for a curved trajectory to make the two have the same scale characteristic overall (though not precisely in any local portion of the two trajectories).
- Curvature is a property of continuous curves and not of discrete series of connected line segments (as in the trajectories), so process 300 A may use the delta delta of those trajectories. This may include executing the “delta” operation twice. The first application of the delta operation involves subtracting vectors specifying the positions of neighboring points in the trajectory from one another, where the prior one is subtracted from the latter one. The second execution of the delta operation performs the analogous subtraction, but using the output from the previous delta operation.
- the result is many delta deltas for the full sequence of both trajectories. Scaling the previous user avatar's trajectory may be done with some form of an average over these.
- the local irregularities of the new user data may have their impact diminished by redefining the delta delta type calculations to involve first-step delta calculations (the delta that operates on position data directly as opposed to operating on the output delta data that comes from this first step) between more distant points as opposed to direct neighbors in the time-series.
- the exact time-series span to utilize may depend on the distance covered in the trajectory as larger paths may be less sensitive to depth sensor noise for this calculation.
- the calculation may be done on portions of a trajectory in the top x % of speed represented in the time series (calculated as the deltas between neighboring points in the time series). This may diminish the effect of depth sensor noise as well.
- the precise percentage to use may be a parameter that is tuned during testing of implementations.
- process 300 A may be used to scale the previous avatar. To do this, each of the vectors which represent points in the trajectory from the previous avatar is multiplied by the average delta delta from the user data and divided by the average delta delta from the previous avatar.
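A minimal sketch of the delta-delta scaling step is given below. It assumes that the "average delta delta" is the mean magnitude of the second differences and that both trajectories are sampled over analogous portions. The names `mean_delta_delta` and `scale_previous_avatar` are hypothetical.

```python
import numpy as np

def mean_delta_delta(points):
    """Mean magnitude of the second differences (delta applied twice) of a trajectory."""
    pts = np.asarray(points, dtype=float)
    # np.diff with n=2 subtracts the prior entry from the latter one, twice.
    delta_delta = np.diff(pts, n=2, axis=0)
    return float(np.linalg.norm(delta_delta, axis=1).mean())

def scale_previous_avatar(prev_traj, user_traj):
    """Multiply each previous-avatar point by (user average) / (previous-avatar average)."""
    prev = np.asarray(prev_traj, dtype=float)
    prev_avg = mean_delta_delta(prev)
    if prev_avg == 0.0:
        # A perfectly straight, evenly sampled trajectory has no usable delta delta.
        raise ValueError("previous avatar trajectory has zero average delta delta")
    return prev * (mean_delta_delta(user_traj) / prev_avg)
```

In this sketch a user trajectory that is an exact doubled copy of the previous avatar trajectory yields a scaling factor of 2.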
- analogous portions of the two trajectories are used for all calculations. This may include sampling between ranges of each trajectory which are analogous to one another.
- the entire motion model of the reference model may be scaled at once by calculating a global curvature ratio across all trajectories in both the reference model and the new user data by averaging the ratio for each body segment's trajectory. This may lead to some trajectories that are a poor match between the two.
- scaling may be achieved just by measuring the dimensions of the two users. This is another way to scale the entire motion model of the reference model all at once.
- Center of mass alignment may include an optional first approximation.
- the optional first approximation includes moving the starting point for the trajectory from the new user data to the location of the starting point of the previous user avatar's trajectory, translating the entire trajectory from the new user data along with its starting point.
- a second approximation may include minimizing the sum of the distances from the points on the trajectory from the new user data to the nearest points on the trajectory from the previous user avatar.
- the center of mass of each trajectory may be calculated.
- the vector that defines the location of the center of mass of the reference model trajectory may be subtracted from the vector that represents the center of mass of the new user data trajectory. Add this vector to all points in the reference model trajectory translating it over onto the space that contains the new user data trajectory.
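The center-of-mass translation described above can be sketched as follows, assuming that the center of mass of a trajectory is the unweighted mean of its points. The function name `align_centers_of_mass` is hypothetical.

```python
import numpy as np

def align_centers_of_mass(ref_traj, new_traj):
    """Translate the reference trajectory so its center of mass matches the new data's."""
    ref = np.asarray(ref_traj, dtype=float)
    new = np.asarray(new_traj, dtype=float)
    # Subtract the reference center of mass from the new-data center of mass ...
    shift = new.mean(axis=0) - ref.mean(axis=0)
    # ... and add the resulting vector to every point of the reference trajectory.
    return ref + shift
```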
- a third approximation may include breaking the trajectories into n sequential clusters labeling each with their order relative to each other in each trajectory's time series. Calculate the center of mass of each cluster. Calculate all the vectors that define the distance and direction from all clusters in the reference model trajectory to their analogous clusters in the new user data trajectory. Average these vectors. Add the averaged vector to the reference model trajectory to get better alignment.
- a fourth approximation may include repeating the method used in the third approximation but with m sequential clusters (such that m>n).
- the technique may continue with successively improved approximations using greater numbers of clusters for a certain number of iterations or, to be more efficient, do it until the magnitude of the averaged vector that may be added to the reference model trajectory drops below a certain threshold.
- it may be useful to have an initial n-value near 4 or 5, with each later iteration adding 2 or 3 to it. This way a minimal number of cluster centers of mass and averaged vectors are calculated up front, so that if the alignment converges quickly the system does not compute a large number of these unnecessarily.
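The successive cluster-based approximations can be sketched as a loop that stops once the averaged correction vector becomes small. This is an illustrative sketch only: the defaults (`n_start=4`, `step=2`, `max_iter=10`, `tol=1e-3`) and the function names are assumptions, and clusters are formed by splitting each time series into n nearly equal sequential chunks.

```python
import numpy as np

def cluster_centers(points, n):
    """Center of mass of each of n sequential clusters of a trajectory."""
    return np.array([chunk.mean(axis=0) for chunk in np.array_split(points, n)])

def refine_alignment(ref_traj, new_traj, n_start=4, step=2, max_iter=10, tol=1e-3):
    """Repeatedly shift the reference trajectory by the averaged cluster-to-cluster vector."""
    ref = np.asarray(ref_traj, dtype=float).copy()
    new = np.asarray(new_traj, dtype=float)
    n = n_start
    for _ in range(max_iter):
        n = min(n, len(ref), len(new))  # never ask for more clusters than points
        offsets = cluster_centers(new, n) - cluster_centers(ref, n)
        shift = offsets.mean(axis=0)
        ref += shift
        if np.linalg.norm(shift) < tol:  # correction is negligible, so stop early
            break
        n += step                        # next approximation uses more clusters
    return ref
```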
- Planar alignment may include triple dimension reduction analysis for co-planarization.
- Planar alignment may include using a plane that best fits each of the trajectories. Process 300 A may be used to orient these planes with one another such that the trajectories associated with these planes translate and rotate with them when those planes are made to orient together.
- the result is a good alignment of trajectories.
- these are trajectories in 3D space, so there is no requirement that any plane fit the data well, but this doesn't prevent a plane from existing which is a better fit than all other planes (it may minimize Euclidean distance from the points in the trajectory to said plane compared to all other planes).
- Process 300 A may be used to execute operations that may roughly align the planes as if they were found, rather than actually finding these planes for comparison.
- the approach for process 300 A may be to consider two coordinates at a time in calculating three components which together create a rotation within a plane consisting of those two coordinates.
- the word “components” is used with the description of this process as including one or more of:
- the combination of these three components creates a set of vectors that may rotate each trajectory's representation in the two coordinate plane on which they operate without affecting it in the direction defined by the omitted coordinate.
- one of the three coordinates may be dropped for all positions specified in both of the trajectories to create simplified trajectories in a “truncated parameter space” (e.g. (x,y,z) becomes (x,y), (x,z) or (y,z)). It may be the case that all three combinations are used to find “rotation vectors” in the full (x,y,z) space.
- Doing this may require adding the resulting (x,y), (x,z), and (y,z) vectors together in the following way: $(x_1, y_1, 0) + (x_2, 0, z_1) + (0, y_2, z_2)$ for each point, and then dividing the resulting three-parameter vector by 2. In another example, it may be the case that it is only done twice, say for $(x_1, y)$ and $(x_2, z)$, and then dividing the x result by two as in $((x_1 + x_2)/2, y, z)$.
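The recombination of the two-coordinate results into a single 3D vector can be written out directly. The sketch below covers both variants described above; the function names are hypothetical.

```python
import numpy as np

def combine_three_planes(xy, xz, yz):
    """(x1, y1, 0) + (x2, 0, z1) + (0, y2, z2), divided by 2."""
    x1, y1 = xy
    x2, z1 = xz
    y2, z2 = yz
    # Each coordinate appears in exactly two planes, hence the division by 2.
    return np.array([x1 + x2, y1 + y2, z1 + z2], dtype=float) / 2.0

def combine_two_planes(xy, xz):
    """((x1 + x2) / 2, y, z) when only the (x,y) and (x,z) planes are used."""
    x1, y = xy
    x2, z = xz
    return np.array([(x1 + x2) / 2.0, y, z], dtype=float)
```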
- the pertinent question then becomes how to find the two-parameter vectors used in either case to generate the 3D rotation vectors that may ultimately be added to the already center-of-mass-aligned reference model trajectory thus rotating it to an on-plane position with the new user data trajectory.
- the process 300 A may be used to first find the direction unit vector. To do this, process 300 A may be used to take the average of a specific set of vectors, which may be calculated based on finding distances between points in the two trajectories (the operation that defines this calculation is given below), and divide by the magnitude of the averaged vector. When this averaged vector is (0,0), process 300 A may include recalculating after throwing out one of the data points. Choosing which one to throw out may be arbitrary, since the aim is only to avoid the average vector being (0,0), and process 300 A may include selecting one, such as the first in the series. The critical thing is simply to avoid dividing by zero.
- process 300 A may be used to narrow the search space for the “nearest” reference model point in the new user data trajectory.
- the process 300 A may be used to seek a match for the points of the previous user avatar's trajectory in sequence starting with its first point in the time series. For example, starting by matching up the first point in the previous user avatar's trajectory with the first point and second point in the new user data trajectory and choosing the shorter vector may be an initial operation.
- process 300 A may include moving on to the second point in the reference model's trajectory, matching it up with the point from the new user data trajectory chosen previously as well as that point plus one time step and that point plus two time steps, again choosing the shortest vector.
- This same rule may be applied throughout the full time series of the reference model's trajectory.
- when the final point in the new user data trajectory's time series is reached and used twice, the analysis may stop there, and the next step may be run with the matches made and stored to calculate the delta vectors.
- the series of delta vectors generated by this process thus constitutes the set of delta vectors used to calculate the average vector.
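The narrowed nearest-point search and the averaged direction unit vector can be sketched as below. The sketch assumes each delta vector points from the reference point to its matched new-data point (the text does not fix the sign), and it drops the first delta when the average is zero, as suggested above. The names `match_sequential` and `direction_unit_vector` are hypothetical.

```python
import numpy as np

def match_sequential(ref_traj, new_traj):
    """Match reference points in order to nearby new-data points; return the delta vectors."""
    ref = np.asarray(ref_traj, dtype=float)
    new = np.asarray(new_traj, dtype=float)
    last = len(new) - 1
    deltas = []
    j = 0           # index of the new-data point chosen for the previous reference point
    window = 2      # the first reference point is compared with new[0] and new[1]
    last_uses = 0   # how many times the final new-data point has been chosen
    for p in ref:
        candidates = range(j, min(j + window, len(new)))
        j = min(candidates, key=lambda k: np.linalg.norm(new[k] - p))
        deltas.append(new[j] - p)  # assumed direction: reference point -> matched point
        window = 3                 # later points look at the previous match, +1 and +2
        if j == last:
            last_uses += 1
            if last_uses == 2:     # final point used twice: stop matching
                break
    return np.array(deltas)

def direction_unit_vector(deltas):
    """Average the deltas and normalize, dropping leading deltas while the average is zero."""
    deltas = np.asarray(deltas, dtype=float)
    while len(deltas) > 0:
        mean = deltas.mean(axis=0)
        norm = np.linalg.norm(mean)
        if norm > 0.0:
            return mean / norm
        deltas = deltas[1:]        # throw out the first data point and recalculate
    raise ValueError("all delta vectors average to zero")
```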
- process 300 A may be used to calculate their magnitude.
- the process 300 A may be used to map representations of each of these vectors onto a 2-D plane and to fit a line to the resulting data points.
- This data set may be called the “time-indexed data”.
- the data points here consist of two coordinates where the x-value is the time-value for the point in the reference model's trajectory (“sample time”) which each delta vector is associated with and the y-value is the magnitude of the delta vector
- process 300 A may be used to subtract as follows. If its (x,y) value is (m,n) then a vector of (m,0) may be subtracted away leaving a vector (0,n). This same vector may be subtracted from all of these data points resulting in a shifting of the whole data set leftward a distance of m. In this case, the x value represented the time index of the point in the trajectory, so shifting it left a distance of m results in a modified time index.
- Zero-point-distance-indexed data is created with the x-value being the distance from the zero point in the coordinate space of the original reference model trajectory (within this phase of the analysis, one of the three coordinates has been factored out and given values of “0”, so this distance is the Euclidean distance calculated from only the other two coordinate values) and the y-value still being the vector magnitude.
- the effect of “negative distance” may be created by having points on one side of the zero point having different sign (positive vs. negative) from points on the other side of the zero point.
- the zero-point-distance-indexed data is further modified in the following way.
- the x-value of this set (distance from the zero point) may be multiplied by the shifted x-value from the associated point in the previous data set divided by the absolute value of the shifted x-value from the same associated point in the previous data set.
- the effect is to modify the distance values for the points before the zero point (in the original time series) such that they become the negative of the distance to the zero point, while keeping those distance values the same for points after the zero point.
- This allows use of the negative direction of the direction unit vector for the points that use it (e.g., the ones close to the zero point) and the positive direction of the direction unit vector for the points after the zero point.
- process 300 A may be used to fit a line to this zero-point-distance-indexed data, where the line outputs the proper multiplicand for the direction unit vector based on the position in space, relative to the zero point, of the point in the reference model's 2-D trajectory.
- process 300 A may be used to add this final vector to its associated (original-time-index-matched) vector which represents each point in the reference model's full 3-D trajectory to get the new, aligned trajectory.
- Time apportionment may include choosing a series of points along the smoothed trajectory such that the trajectory can be displayed as if it was captured at an arbitrary rate of frames per second.
- frames per second is a global standard for a motion model, meaning all body segment trajectories must have the same frame rate in order to be reconstructed into a time series of body positions in the end.
- each body segment may have different frame rates that do not align (at least not in the majority of pairwise cases) and some global output frame rate may be targeted.
- interpolation methods may be used to find positions between the stored frame position values for all body segments where these interpolated positions line up with targeted frame times. This may properly define positions of all body segments at all the times needed for each frame.
- the process may be done for an arbitrary motion model given interpolation methods presently established in the art, and the output frame rate may match the frame rate at which the new user data was captured with the depth sensor.
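Resampling one body segment onto a common output frame rate can be sketched with per-coordinate linear interpolation. Linear interpolation is an assumption chosen for brevity; the text leaves the interpolation method open. The function name `resample_to_frame_rate` is hypothetical.

```python
import numpy as np

def resample_to_frame_rate(times, positions, fps):
    """Interpolate stored positions onto evenly spaced frame times at `fps` frames per second."""
    times = np.asarray(times, dtype=float)          # increasing sample times in seconds
    positions = np.asarray(positions, dtype=float)  # one row of coordinates per sample
    n_frames = int(np.floor((times[-1] - times[0]) * fps + 1e-9)) + 1
    frame_times = times[0] + np.arange(n_frames) / fps
    # np.interp works on one coordinate at a time, so interpolate each column separately.
    frames = np.column_stack(
        [np.interp(frame_times, times, positions[:, k]) for k in range(positions.shape[1])]
    )
    return frame_times, frames
```

Applying the same target `fps` to every body segment gives positions that line up on shared frame times, which is what reconstruction into full body positions requires.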
- This may be useful for taking motion models that have smoothed trajectories which adds an exponential number of points to the trajectory (the number of points roughly doubles each smoothing iteration) and reducing it to only on-frame points which lie on the trajectory.
- time apportionment may preserve the relative distances between points which is representative of the velocity at each part of the trajectory.
- the approach is to create a time series of the magnitudes of the delta vectors between adjacent points in the depth sensor data and divide each by the sum of all of these magnitudes. This gives the proportion of the total distance that was covered in the time between each pair of frames. Then each of these proportions is multiplied by the total distance of the smoothed trajectory received from all of the previous processing steps. Points are then chosen starting with the beginning of the time series and working toward the end, such that each is separated from the previous point by the distance calculated in the previous steps. These new points constitute the output time series of positions for the body segment in question from the time apportionment process.
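The time apportionment steps above can be sketched as follows. The sketch assumes that the assigned distances are measured along the smoothed polyline (arc length) and that points between stored vertices are found by linear interpolation. The function name `time_apportion` is hypothetical.

```python
import numpy as np

def time_apportion(raw_traj, smooth_traj):
    """Pick points on the smoothed trajectory that preserve the raw data's distance proportions."""
    raw = np.asarray(raw_traj, dtype=float)
    smooth = np.asarray(smooth_traj, dtype=float)
    # Proportion of the total raw distance covered between each pair of adjacent frames.
    raw_steps = np.linalg.norm(np.diff(raw, axis=0), axis=1)
    proportions = raw_steps / raw_steps.sum()
    # Cumulative arc length at every vertex of the smoothed trajectory.
    seg = np.linalg.norm(np.diff(smooth, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    # Target arc lengths: cumulative proportions times the total smoothed length.
    targets = np.concatenate([[0.0], np.cumsum(proportions)]) * arc[-1]
    return np.column_stack(
        [np.interp(targets, arc, smooth[:, k]) for k in range(smooth.shape[1])]
    )
```

The output has one point per original frame, so a smoothed trajectory with many inserted points is reduced back to on-frame points.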
- the analogous process can be used to create matching frames between the new user data and reference model trajectories of the same body segment.
- a more complex, but possibly useful, revision of this may include trying to quantify progress in the direction of the hypothetical tangent to the continuous representation of the trajectory at the center point between each pair of adjacent data points in the original depth sensor data, as opposed to the magnitude of the direct delta of the adjacent points.
- the first and last deltas may be calculated in the way described above since the tangent-based method may be unavailable.
- the interior deltas may be calculated as follows. First, the time series of deltas may be calculated. Then a time series of what we'll call retrograde 3-gap deltas may be calculated. These are calculated by subtracting a vector representing each point in the series (until the last three) from the vector representing the point three points forward in the time series for each. These 3-gap deltas may be representative of the general trajectory direction over that three-delta series in the trajectory. In an example, a larger portion of trajectory is less susceptible to misrepresentation of the direction than the basic delta if the measured data is noisy. The time series of the deltas and 3-gap deltas may have the same number of deltas if the first and last deltas are removed.
- the remaining time series in both may then be matched up in order.
- the 3-gap deltas are then divided by their magnitude to give a 3-gap delta direction unit vector.
- dot products may be calculated between the deltas and their paired 3-gap delta direction unit vectors which gives the distance in the direction of the 3-gap delta direction unit vector that the delta vector covered. We'll call the result “progress vectors”.
- the resulting time series of progress vectors, along with the first and last delta vectors that were calculated the simpler way, may have their magnitudes calculated and summed. Then the individual vector magnitudes are divided by the sum to get a time series of percentages of progress through the trajectory. This is then applied to the new trajectory to get the points for the new trajectory time series.
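The 3-gap delta variant can be sketched as below. The sketch pairs each interior delta with the 3-gap delta that spans it and its two neighbors, which is one reading of "matched up in order", and it treats the dot product as a scalar amount of progress. The function name `progress_proportions` is hypothetical.

```python
import numpy as np

def progress_proportions(points):
    """Proportion of trajectory progress per frame interval, using 3-gap delta directions."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 4:
        raise ValueError("at least four points are needed for a 3-gap delta")
    deltas = np.diff(pts, axis=0)   # n-1 ordinary deltas between neighbors
    gaps = pts[3:] - pts[:-3]       # n-3 deltas spanning three steps each
    # Assumes no 3-gap delta has zero length; a zero gap would divide by zero here.
    gap_units = gaps / np.linalg.norm(gaps, axis=1, keepdims=True)
    interior = deltas[1:-1]         # n-3 deltas once the first and last are removed
    # Dot product: distance each interior delta covers along its 3-gap direction.
    progress = np.einsum("ij,ij->i", interior, gap_units)
    magnitudes = np.concatenate(
        [[np.linalg.norm(deltas[0])], np.abs(progress), [np.linalg.norm(deltas[-1])]]
    )
    return magnitudes / magnitudes.sum()
```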
- the process 300 A includes an operation 310 for aligning previous reference model trajectories to smoothed trajectories.
- Operation 310 may include iterating operations 306 - 310 for denser inference (e.g., inference based on new data and reference model data resulting in a form of a combination). For example, operation 306 may be repeated to apply smoothing again.
- Operation 308 may be repeated to reassemble trajectories again.
- Denser statistical inference may include redoing the task of fitting a line segment to each time-series-sequential cluster of three points in the new depth sensor data specific to a certain body segment.
- One of these is to generate a line segment following the procedure already used (or, indeed stored from when it had been done the first time around, making this a look-up task).
- the second is to generate a line segment specific to the previous user avatar trajectory. As described below, the two are then averaged in order to generate a final line segment which may be fed into the rest of the process, which, as described above in the description of normal statistical inference of trajectories turns these line segments into a trajectory.
- the process 300 A may start by assigning a sequential cluster of three points (reminder, a sequential cluster is a number of sequential points) in the new user data trajectory such that there is a minimized difference in position between a central new depth sensor data point and the central point in a sequential cluster in the reference model trajectory. This may be done for each interior point in the time-series of the new depth sensor data. Both trajectories may utilize sequential clusters of three points for forthcoming steps in the process and the key is that these are matched up in space as defined above.
- These clusters are composed of a central point, which is the “interior” point itself (here, “interior” means interior to the full trajectory, while “central” means central to a sequential cluster within a trajectory, and each interior point has a sequential cluster of which it is the central point), the point temporally immediately before, and the point temporally immediately after within said trajectory.
- process 300 A may progressively check if a better neighboring assignment leads to reduced distance between the centers (average points) of the two clusters. If a reassignment to a neighbor reduces this distance for central points in a cluster of these interior points, then the reassignment may be made. Then process 300 A may check again and repeat the process until no local reassignment reduces this distance. At this point, the assignment is final and process 300 A may be used to progress forward one temporal step in each data set until all interior points in the new depth sensor data are exhausted.
- the process 300 A may use the same center point as used for creating the line segment specific to the new depth sensor data, but now run two analyses in parallel and average them.
- This average may have dynamic weighting. In lower velocity portions of the trajectory, the new depth sensor data may have a higher weight and in higher velocity portions they may have equal weight or the previous user avatar trajectory data may have higher weight.
- the two clusters of three points are used to position the central point of a line segment, the full series of which may be the initial scaffolding of the ultimate trajectory. Now the two clusters of three points may be used to set the direction of the line segment while preserving the position of the central point.
- the central points of matched clusters are not averaged and instead the central points from the new user data may be used as the center points for these line segments. For example, this may be more computationally efficient given limited gain from executing the full averaging method of the two types. Additionally, using the new depth sensor data may give more consistent spacing between these points in terms of how well it maps to the relative velocities seen in the user's motion.
- An example technique may include, for the new depth sensor data, this is done as in advance which is explained in “fitting a line segment to each sequential cluster” (under the first and second “description of general scheme for post processing modeling
- a first vector may be created, which may be treated as a line segment.
- There is a chronological order to the points in the sequential cluster: a 1st, a 2nd, and a 3rd.
- the delta vectors between all three pairs may be determined. This means there is a vector that is the 1st to 2nd vector, a 2nd to 3rd vector and a 1st to 3rd vector.
- Step one is to compare the 1st to 2nd vector's magnitude to the 2nd to 3rd vector's magnitude. This comparison determines which points we'll average to create a new “direction vector” which may then become a line segment. If the 1st to 2nd vector and 2nd to 3rd vectors are equal in magnitude, then the 1st to 3rd vector may become a direction vector. If the 1st to 2nd vector is shorter than the 2nd to 3rd vector then a weighted average of the 1st and 2nd points in the cluster may be the starting point of the direction vector and the 3rd point in the cluster may be the end point. If the 1st to 2nd vector is longer than the 2nd to 3rd vector then a weighted average of the 2nd and 3rd points in the cluster may be the ending point of the direction vector and the 1st point in the cluster may be the starting point.
- the weighting for the averaging may be as follows. In whichever case, the 1st or 3rd points may have a weight of 1.
- the 2nd point's weight may be given by the absolute value of the magnitude of the 1st to 2nd delta minus the magnitude of the 2nd to 3rd delta. This may all be divided by the magnitude of the 1st to 3rd delta. In this concept, the closer to equal the 1st to 2nd delta and the 2nd to 3rd delta are, the lower the weight of the 2nd point in determining the direction of the direction vector.
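The direction vector construction for a three-point sequential cluster can be sketched as follows. The sketch assumes a normalized weighted average, i.e. the sum of weighted points divided by the sum of the weights, which the text does not state explicitly. The function name `direction_vector` is hypothetical.

```python
import numpy as np

def direction_vector(p1, p2, p3):
    """Direction vector for a chronologically ordered cluster of three points."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    d12 = np.linalg.norm(p2 - p1)
    d23 = np.linalg.norm(p3 - p2)
    d13 = np.linalg.norm(p3 - p1)
    if d13 == 0.0:
        raise ValueError("1st and 3rd points coincide")
    # Weight of the 2nd point: | |1st->2nd| - |2nd->3rd| | / |1st->3rd|.
    w2 = abs(d12 - d23) / d13
    if d12 < d23:
        start = (p1 + w2 * p2) / (1.0 + w2)  # 1st point has weight 1, 2nd has weight w2
        end = p3
    elif d12 > d23:
        start = p1
        end = (p3 + w2 * p2) / (1.0 + w2)    # 3rd point has weight 1, 2nd has weight w2
    else:
        start, end = p1, p3                  # equal magnitudes: plain 1st to 3rd vector
    return end - start
```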
- This direction vector and the one from the new user data are then scaled to a quarter of the length of the average of the length of the two line segments (the one from the first to the second point in their cluster and the one from the second to the third point in their cluster) in the present sequential clusters for each. If they are in opposing directions to one another (testable by seeing if the cross product is positive or negative), one of the vectors may be multiplied by −1 in order to orient them together. The two vectors are then averaged, giving a composite vector.
- This composite vector is then attached to the established center point by moving the “tail” or starting point of the vector to that center point (in normal Cartesian representation, a vector's starting point is placed at the origin and its tip is placed at the point indexed by the vector's component values treated as Cartesian coordinates).
- This composite vector is then copied and reflected across this starting point.
- the end points of the two vectors (the vector and its reflected vector) where both are attached to the starting point are used as the end points of a line segment. This becomes the line segment assigned to the center point of this particular sequential cluster.
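Building the final line segment from the two direction vectors can be sketched as below. One substitution is made and should be read as an assumption: the sketch tests for opposing directions with the sign of the dot product, which is a common test in any dimension, rather than the cross-product test mentioned in the text. The two scaled vectors are averaged with equal weight, and the function names are hypothetical.

```python
import numpy as np

def scaled_direction(direction, cluster):
    """Rescale a direction vector to a quarter of the cluster's average segment length."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in cluster)
    target = 0.25 * 0.5 * (np.linalg.norm(p2 - p1) + np.linalg.norm(p3 - p2))
    direction = np.asarray(direction, dtype=float)
    return direction * (target / np.linalg.norm(direction))

def composite_segment(center, ref_dir, ref_cluster, new_dir, new_cluster):
    """Average the two scaled directions and reflect the result across the center point."""
    v_ref = scaled_direction(ref_dir, ref_cluster)
    v_new = scaled_direction(new_dir, new_cluster)
    if np.dot(v_ref, v_new) < 0.0:  # assumed test for opposing directions
        v_ref = -v_ref              # multiply by -1 to orient the vectors together
    composite = 0.5 * (v_ref + v_new)
    center = np.asarray(center, dtype=float)
    # The vector and its reflection across the center give the two end points.
    return center - composite, center + composite
```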
- This method of creating a line segment from a vector determined by a weighted average of the three points in a sequential cluster may work well regardless of whether the central point was determined only by the new user data or if both new user data and reference model data are used.
- zero, one, or both of the new user data and reference model data sequential clusters may have line segments fitted to them using the weighted average of the three points to give the 2nd point in the cluster some weight if it is not equidistant from the other two points as described above.
- An example technique may include, for both clusters, using a simple 1st to 3rd delta as a direction vector and then averaging these using the same dynamic weighting as was used to position the central point (which may be velocity dependent). Finally, the central point of this line segment may be positioned at the previously found center point for this new user data cluster, as opposed to adding in the extra work of averaging with a center point from the reference model trajectory's cluster.
- An example technique may include using the same system as a technique described above, but performing the technique twice: once for the previous user avatar trajectory cluster with the center point closest to the new depth sensor data cluster's center point, and again for the 2nd closest previous user avatar trajectory cluster (these two may be sequential). Then all three are averaged. Weighting may be assigned by how close the central point of the reference model trajectory cluster underlying each line segment is to the central point in the new depth sensor data cluster. Zero distance may give a weight of 1.
- the weight may be the distance from the new user data trajectory cluster's central point to the other reference model trajectory cluster's central point divided by the sum of the two distances from the reference model trajectory clusters' two central points to the new depth sensor data central point. This makes sure that the closest reference model trajectory cluster has the most influence with that influence scaling based on the relative closeness of the two clusters.
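The distance-based weights for the two nearest reference model clusters can be written as a short function. The equal split returned when both distances are zero is an assumption added only to avoid dividing by zero, and the function name `reference_weights` is hypothetical.

```python
def reference_weights(dist_first, dist_second):
    """Weights for two reference clusters from their distances to the new-data central point."""
    total = dist_first + dist_second
    if total == 0.0:
        return 0.5, 0.5  # assumed tie-break when both clusters coincide with the new point
    # Each cluster's weight is the OTHER cluster's distance divided by the sum,
    # so the closer cluster always receives the larger weight.
    return dist_second / total, dist_first / total
```

With this rule a reference cluster at zero distance receives a weight of 1, consistent with the statement above.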
- the dynamic average scheme uses velocity as the key input. High velocity portions of the trajectories are intended to give more weight to the new user trajectory. To generate weights, a function which may take as a configuration parameter the maximum distance between two sequential new depth sensor points in the given trajectory may assign the weights to meet these requirements.
- weights may be applied to the line segments generated by the reference model trajectory clusters based on which new user data cluster each was matched to.
- the new depth sensor data line segments may all receive the highest weight when averaging.
- process 300 A may just pass the line segment from the new depth sensor data through to the next step.
- Creating a first user avatar from expert model may include using an expert model (e.g., a quantified model of the technique taken from a highly trained expert in the motion) as a previous user avatar.
- the match from the expert model to the new depth sensor data is much worse than that match may be for a previous user avatar.
- This makes scaling, translating, and rotating methods more significant and may force multiple iterations of them.
- This may include piecewise scaling where specific portions of the reference model are scaled differently than others but where adjacent endpoints between these parts preserve their directional orientation relative to one another and have the distance between them scaled with a value similar to the scaling of the chronologically first part, the chronologically second part, or a mix of the two values.
- new depth sensor data may be used pairwise with one of the sets of new user data standing in as the reference model. Then the output of that full process is applied to one reference model data set. It may rely on 5 or more iterations of pairwise use of new depth sensor data before the final output is of sufficient quality to call it a first user avatar. The weight given to the new user data relative to the reference model may scale with the number of iterations of new-user-data-only inference in that case. In an example, multiple different new user data sets may be used to leverage statistical convergence to the mean. The same new user data may not be used multiple times, as the result may be the same motion each time.
- Creating a first user avatar from denser statistical data without a reference model may include using systems described above.
- each system may rely on a pre-existing motion model used to help smooth out the new depth sensor data due to the noise inherent in positioning human skeletons in space using skeletal inference from depth sensor readings.
- a new motion model may be generated.
- a truncated version of concept 1 may be used. After the smoothing process is done for the statistically inferred line segments derived from the new depth sensor data alone, this is sent to the reconstruction step and this is the first reference model. With this in hand, and another set of new depth sensor data, the full process may be applied to determine a “first user avatar”.
- the process may be iterated one or several more times to ensure sufficient quality before releasing what is called a “first user avatar”.
- the process 300 A includes an operation 312 to output the updated reference model.
- FIG. 3B illustrates example smoothed line segments.
- Operation 300 B 1 illustrates smoothing within individual trajectories. Within each trajectory, “sequential clusters” of four points may be operated on.
- the process may operate on all possible sequential clusters (with adjustments for the beginning and end of the time series of the trajectories to address those points differently) so as to find new points each of which is a new point in between two previously existing points thus creating a more densely defined trajectory.
- Operation 300 B 2 illustrates an extension of outer line segments. For example, this includes extending the outer line segments such that their outer-most endpoints remain in the same spot, they still pass through the points where their inner-most points were, but continue past that point a distance of half their original length.
- Operation 300 B 3 illustrates generating a weighted average.
- the four points in the diagram represented with large circles have their positions averaged in order to generate the new point.
- This new point is the endpoint of two new line segments which replace the central line segment in the sequential cluster.
- Operation 300 B 3 is used for finding the right weighting scheme for a weighted average of the four points. Examples of static weightings are described throughout the present disclosure, as well as a scheme for accurate dynamic weighting.
- Operation 300 B 4 illustrates identifying and storing new line segments in the new trajectory.
- the single line segment at the center of the cluster has been replaced by the two line segments which each include the new point.
- the old line segment may still be used to perform smoothing in other sequential clusters in which it would be included, so it may be stored.
- the operation 300 B 4 may work through trajectories one sequential cluster at a time before working through other trajectories. When all trajectories have been processed, that is one iteration of smoothing.
- FIG. 3C illustrates a flowchart showing a process 300 C for smoothing trajectories.
- the process 300 C includes an operation 320 to break motion data into body segment trajectories.
- the process 300 C includes an operation 322 to break trajectories into sequential clusters, for example of 4 points each.
- the process 300 C includes an operation 324 to find the extension points for each cluster.
- an extension point is halfway between a location generated by reflecting an endpoint of the cluster across its neighboring point toward a center line segment in the cluster and a location of that neighboring point itself.
- the process 300 C includes an operation 326 to use weighted average to define an extra point which is inserted into the trajectory after smoothing has operated on sequential clusters (e.g., all clusters).
- operation 326 outputs “bifurcation points” which are the new points that govern the two new line segments that may replace the old central line segment in the cluster.
- the process 300 C includes an operation 328 to generate bifurcation points at the beginning and end of the trajectory (e.g., using a modified weighted average).
- the process 300 C includes an operation 330 to apply weights. Operation 330 may be applied iteratively with operations 326 and 328 .
- the process 300 C includes an operation 332 to output a smoothed trajectory.
- the smoothed trajectory may include original points plus points in between the original points that preserve the implied curvature of the trajectory.
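- As a rough illustration of operations 322 through 332, the following minimal sketch inserts one bifurcation point into every line segment of a single trajectory. The function names, the static weights (3 for each neighboring point, 1 for each extension point), and the handling of the first and last segments (using only the one extension point that exists) are assumptions made for this example and are not taken from the disclosure.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def weighted_average(terms: Sequence[Tuple[float, Point]]) -> Point:
    """Weighted average of points; the divisor is the sum of the weights."""
    total = sum(w for w, _ in terms)
    return tuple(sum(w * p[i] for w, p in terms) / total for i in range(3))


def extension_point(outer: Point, inner: Point) -> Point:
    """Extend the segment outer->inner past `inner` by half its original length."""
    return tuple(inner[i] + 0.5 * (inner[i] - outer[i]) for i in range(3))


def smooth_once(traj: List[Point],
                w_neighbor: float = 3.0,   # hypothetical static weight
                w_ext: float = 1.0) -> List[Point]:
    """One smoothing iteration: each segment gains a new in-between point."""
    n = len(traj)
    if n < 2:
        return list(traj)
    out = [traj[0]]
    for k in range(n - 1):
        b, c = traj[k], traj[k + 1]          # central segment of the cluster
        terms = [(w_neighbor, b), (w_neighbor, c)]
        if k - 1 >= 0:                       # leading outer segment exists
            terms.append((w_ext, extension_point(traj[k - 1], b)))
        if k + 2 < n:                        # trailing outer segment exists
            terms.append((w_ext, extension_point(traj[k + 2], c)))
        out.append(weighted_average(terms))  # bifurcation point
        out.append(c)
    return out
```

- Every new point is computed from the original points, so the old segments remain available to neighboring clusters during the pass. For equally spaced collinear points the inserted points fall on the midpoints of the original segments.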
- FIGS. 4A-4B illustrate a flowchart and a diagram showing noise reduction and lateralization processes ( 400 A and 400 B 1 - 400 B 4 ) in accordance with some examples.
- This optional routine reduces noise in the depth sensor data by creating interdependency within each trajectory so each point in a trajectory helps to constrain the rest.
- Curvature varies smoothly in most natural motion at human scale if that motion is measured or “sampled” at a high enough rate.
- the process 400 A is conceptually similar to “smoothing” but is executed with some key differences.
- process 400 A may be used to move the central point in a cluster of 5.
- process 400 A may be used to extend the exterior line segments by their full length toward the middle.
- Weighting schemes can be tuned for purpose.
- Process 400 A may converge to the target over many iterations, so the original position of the central point may logically get the lion's share of the weight.
- a dynamic weighting system that uses the ratio of the distance between the extension points to the distance between the neighbors offers better tuning.
- the function is complicated and is elaborated on in the “Advanced Smoothing and Lateralization” section.
- the process 400 A includes an operation 402 to break motion data into body segment trajectories.
- the process 400 A includes an operation 404 to break trajectories into sequential clusters, for example of 5 points each.
- the process 400 A includes an operation 406 to find the extension points for each cluster.
- an extension point is halfway between a location generated by reflecting an endpoint of the cluster across its neighboring point toward a center line segment in the cluster and a location of that neighboring point itself.
- the process 400 A includes an operation 408 to perform a weighted average of the central point, its two neighbors, and the two extension points generated.
- the process 400 A includes an operation 410 to keep the endpoints of the trajectory in their previous location.
- the process 400 A includes an operation 414 to use a modified weighted average for the points that are neighbors of the endpoints.
- the process 400 A includes an operation 412 to apply weights. Operation 412 may be applied iteratively with operations 408 and 414 .
- the process 400 A includes an operation 416 to output a lateralized trajectory.
- process 400 A may be used to leverage the concept of smoothing which allowed new points in a trajectory to be generated in between the previously existing set of points where those new points roughly preserved the implicit curvature of the trajectory.
- the consideration of limiting computational complexity may impact the smoothing concept.
- the concept is applied in a way which leverages the interdependence of the points on a trajectory to bring outlying points back toward the implied trajectory.
- This “lateralization” is an optional step which can be applied to the positions of the raw depth sensor data, the representative trajectory generated when statistical inferred line segments are connected, or even later in the process (such as after smoothing) to further produce smooth representation of motion.
- process 400 A may employ the same sequential clusters that were used in fitting line segments to data as well as in smoothing.
- sequential clusters of 3 points may be used.
- smoothing clusters of 4 points may be used.
- clusters of 5 points may be used.
- the process moves the central point in the cluster of 5 points closer to where it may be expected to be based on the positions of the other 4 points in the cluster.
- This location expectation is further based on the implied curvature of the trajectory as defined by the 5 points assuming they are actually noisy data about an underlying trajectory that has less volatile curvature than a path actually through the 5 points may have.
- process 400 A may be used to extend the two exterior line segments in the cluster to two times their full magnitude with the extra length extending toward the central point.
- $V_n$ is the vector representation of a point on the trajectory and $EV_{nm}$ is the vector representation of the “extension point” of a line segment between two points in the trajectory (here those points are generic “n” and “m” but above are specified as “a” and “b” as well as “d” and “e”), where said extension point may become an input to a calculation that repositions the central point in the cluster.
- $V_b$ and $V_d$ are sequentially the second and fourth points in the cluster and $V_a$ and $V_e$ are the first and fifth points in the cluster.
- IP (referenced below) is the new “iterated point” which is the new point this whole process is designed to find. In this conception, IP is closer to the implied trajectory that the set of points is clustered around. When this is done for all interior points in the trajectory the set of newly generated iterated points may become the new representation of the trajectory.
- IP is the “iterated point” and may be the position that replaces the previous $V_c$ in the cluster once all iterated points for the trajectory, as it exists prior to this step, are found.
- 8,1,1,1,1 coefficients or “weights” in the weighted average
- Process 400 A may generate, for example, 12,1,1,1,1 on the extreme end of low risk/slow converging and 2,1,1,1,1 on the extreme end of high risk/fast converging.
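- The weighted average for interior points can be sketched as follows. This is a minimal illustration that assumes the 8,1,1,1,1 weights apply, in order, to the central point, its two neighbors, and the two extension points, that each extension point is the outer segment extended by its full length toward the middle, and that the divisor is the sum of the weights. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def full_extension(outer: Point, inner: Point) -> Point:
    """Extend segment outer->inner past `inner` by its full length."""
    return tuple(2.0 * inner[i] - outer[i] for i in range(3))


def iterated_point(a: Point, b: Point, c: Point, d: Point, e: Point,
                   w_center: float = 8.0) -> Point:
    """Weighted average (w_center,1,1,1,1) that repositions the central point c."""
    ev_ab = full_extension(a, b)   # extension of the leading outer segment
    ev_ed = full_extension(e, d)   # extension of the trailing outer segment
    divisor = w_center + 4.0       # sum of all the weights
    return tuple(
        (w_center * c[i] + b[i] + d[i] + ev_ab[i] + ev_ed[i]) / divisor
        for i in range(3)
    )


def lateralize_interior(traj: List[Point], w_center: float = 8.0) -> List[Point]:
    """One pass over all 5-point clusters; the two points at each end are untouched."""
    out = list(traj)
    for k in range(2, len(traj) - 2):
        # Always read from the trajectory as it existed before this pass.
        out[k] = iterated_point(*traj[k - 2:k + 3], w_center=w_center)
    return out
```

- Lowering `w_center` toward 2 moves an outlying point faster per pass, while raising it toward 12 converges more slowly, mirroring the risk trade-off described above.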
- Process 400 A may be used to account for the line segments on the ends of the series.
- the point being adjusted was the central point in a cluster of 5 points.
- this cluster moves along the trajectory as a “kernel” does in image processing (albeit as a one dimensional analog) so it takes all 5 point sequential cluster positions that are possible given the sequential set of points in the trajectory. It cannot take positions that put the two earliest or the two latest points in the trajectory in the center of a 5 point cluster. So a different scheme may be used for those cases. As noted above, this may not be a problem if trajectories were closed paths that had matching starting and end points, but, in general, trajectories are open paths that start in one spot and end in another.
- Process 400 A may leave the two points on the extreme ends of the trajectory in place.
- Assume the point to be moved is labeled $V_c$ even if either $V_a$ and $V_b$ or $V_d$ and $V_e$ do not exist. Then points that do exist that are chronologically prior to $V_c$ may be $V_a$ and $V_b$. Likewise, points that do exist that are chronologically after $V_c$ may be $V_d$ and $V_e$.
- $IP = \dfrac{EV_{dc} + 2^{n+1} V_c}{1 + 2^{n+1}}$
- n is the number of times the process has been iterated.
- the effect of this is to reduce the influence of $EV_{dc}$ each iteration so as to keep the end point from migrating to the location of $V_d$ over a large number of iterations. Without this, the impact of $EV_{dc}$ may be diminished with this weighting scheme for each iteration.
- $IP = \dfrac{EV_{ba} + 2^{n+1} V_c}{1 + 2^{n+1}}$
- process 400 A may use a different scheme.
- Assume the point to be moved is labeled $V_c$ even if either $V_a$ or $V_e$ does not exist.
- $IP = \dfrac{2\,EV_{de} + V_b + V_d + 8 V_c}{12}$
- $IP = \dfrac{2\,EV_{ba} + V_b + V_d + 8 V_c}{12}$
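- The two special-case formulas can be written as short helpers. This is a sketch only; the function and parameter names are hypothetical, `ev` stands for whichever extension point exists on the interior side of the point being moved, and `n` is the number of completed iterations.

```python
from typing import Tuple

Point = Tuple[float, float, float]


def end_point_update(ev: Point, v_c: Point, n: int) -> Point:
    """IP = (EV + 2**(n+1) * V_c) / (1 + 2**(n+1)) for a point at the end."""
    w = 2.0 ** (n + 1)  # weight on V_c doubles every iteration
    return tuple((ev[i] + w * v_c[i]) / (1.0 + w) for i in range(3))


def near_end_update(ev: Point, v_b: Point, v_d: Point, v_c: Point) -> Point:
    """IP = (2*EV + V_b + V_d + 8*V_c) / 12 for a neighbor of an endpoint."""
    return tuple(
        (2.0 * ev[i] + v_b[i] + v_d[i] + 8.0 * v_c[i]) / 12.0
        for i in range(3)
    )
```

- In `end_point_update` the extension point contributes one third of the result on the first iteration (n = 0) and one ninth by n = 2, which is how its influence shrinks as iterations accumulate.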
- the key feature for the system for moving the points on the ends of the trajectory is the controlling mechanism for the influence of the EV such that these points don't move in line with or on top of their two nearest neighbors.
- Key features for the next inward points include the double influence of the EV point generated from the extension of the interior most line segment in the cluster and the recalcitrance to movement of the central point created by giving it more weight than the other three points combined.
- lateralization and smoothing are both about allowing neighboring points influence in determining the position of a central point.
- in lateralization, the central point already exists, but is moved in the process.
- in smoothing, a new point is generated.
- the relationship between the distance between the closest neighboring points to this central point and the distance between the extension points generated in both processes are used to define the weight that the extension points have relative to the neighboring points in smoothing and relative to the central point and neighboring points in lateralization.
- This decoding involves finding the ratio of the magnitude of the delta between the neighboring points to the magnitude of the delta between the extension points. This ratio is applied to a function that then outputs the ratio of the neighboring points' weights to the extension points' weights in the weighted average.
- the sum of all the weights is the divisor.
- Special cases include (end points, 1st interior points, very low distance ratios).
- extension distance for the first interior points may be used due to the lack of the extra neighbor needed to generate an extension.
- process 400 A may be used to determine twice the distance from the interior extension point to the plane which bisects the endpoint and the second interior point. For example, the magnitude of the vector from the end point to the second interior point − (2*(cos(180° − the angle at the second interior point formed by the lines connecting the end point to the second interior point and second interior point to the third interior point)*the magnitude of the vector between the second interior point and the third interior point)).
- first interior point: the exterior-most interior point
- second interior point: the internal neighbor of the exterior-most interior point
- Neighbor points: neighbor points weight $= 3 + \dfrac{r - 3/2}{r}$, where $r$ = neighbor distance / extension distance
- the extension of the neighboring line segments is by half of the length of those line segments as opposed to by the full length of the line segments in lateralization.
- the sum of all the weights is the divisor.
- extension distance for the end line segments may be used due to the endpoint lacking the neighbor needed to generate an extension.
- process 400 A may be used to determine twice the distance from the interior extension point to the plane which bisects the endpoint and the first interior point. For example, the magnitude of the vector from the end point to the first interior point − (2*(cos(180° − the angle at the first interior point formed by the lines connecting the end point to the first interior point and first interior point to the second interior point)*half the magnitude of the vector between the first interior point and the second interior point)).
- first interior point and second interior point may be symmetrical with respect to the two ends of the trajectory.
- FIG. 4B illustrates a lateralization process for a trajectory.
- Operation 400 B 1 illustrates lateralization within individual trajectories.
- “sequential clusters” of five points may be operated on.
- the process may operate on all possible sequential clusters (with adjustments for the beginning and end of the time series of the trajectories to address those points differently) so as to find a new set of points. In an example, when all of the points in this new set are found, a new, updated, trajectory is formed.
- Operation 400 B 2 illustrates an extension of outer line segments.
- the length of the outer line segments in the cluster may be doubled such that the center point of each new segment is located where its interior-most point used to be (the endpoint it shared with its neighboring line segment).
- Operation 400 B 3 illustrates generating a weighted average.
- the five points in operation 400 B 3 represented with large circles have their positions averaged in order to generate the new point.
- the new point may replace the central point in the sequential cluster (and closest to the central area of the five points being averaged).
- Operation 400 B 3 is used for finding the right weighting scheme for a weighted average of the five points. Examples of static weightings are described throughout the present disclosure, as well as a scheme for accurate dynamic weighting.
- Operation 400 B 4 illustrates identifying and storing a new point in the new trajectory, for example, when the full process is completed.
- the operation 400 B 4 may proceed to the next sequential cluster for its processing until the trajectory is exhausted.
- Lateralization de-noises the trajectories in a motion model, or, if over-done, it also smoothes out characteristic features. Smoothing densifies reference model trajectories but locks in noise and other features. So the approach here is to lateralize just the right number of times before smoothing or interpolating. In an embodiment, iteration follows smoothing which follows lateralization.
- Smoothing adds data points, adding enough to cover all possible time targets that may be used during latency reduction or real-time interpolation.
- Some possible real-time interpolation time targets may not align well with ratios defined by powers of 1/2, such as input sampling at 30 Hz and outputting frames at 90 fps giving a 1/3 ratio.
- a weighted average of the values in the look up table, for example, for frames 1/4 and 1/2 of the way between two input frames, is used to find the approximate values for the frame 1/3 of the way between those two input frames. Smoothing and interpolation may stop when all targeted times for predictions have all relevant 1-retrograde deltas and RM-RPA vectors filled in for the time targets that can be anticipated.
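- A minimal sketch of such a weighted average is shown below. It assumes the weights are chosen linearly from the distance of the target time to the two bracketing time fractions, which the text does not specify; the function name and arguments are hypothetical.

```python
from typing import Tuple

Vector = Tuple[float, ...]


def interpolate_between(t_lo: float, v_lo: Vector,
                        t_hi: float, v_hi: Vector,
                        t: float) -> Vector:
    """Approximate the table value at time fraction t from two bracketing entries."""
    w_hi = (t - t_lo) / (t_hi - t_lo)  # share given to the later entry
    w_lo = 1.0 - w_hi                  # share given to the earlier entry
    return tuple(w_lo * a + w_hi * b for a, b in zip(v_lo, v_hi))
```

- For a target 1/3 of the way between two input frames, with table entries at 1/4 and 1/2, this gives weights of 2/3 on the 1/4 entry and 1/3 on the 1/2 entry.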
- the system may also measure a global representative curvature for a trajectory (delta delta). This may be an average and may also be local, but may be better if calculated among more distant line segments.
- Lateralization as described elsewhere in this detailed description may include bringing both the global representative change in local curvature and the global representative curvature down.
- the global curvature may change slower than global representative change in local curvature. It may change very minimally over the first, say, four iterations.
- the objective may include getting the global representative change in local curvature below a certain threshold, however, it may be damaging to get to that threshold if global curvature is dropping too much.
- the system may continue iterating lateralization until the global representative change in local curvature drops below an aggressive threshold (low-valued threshold) or the global representative curvature drops below a certain threshold (where that threshold may be a certain percentage of its original value).
- the system may check if global representative change in local curvature is below a conservative threshold (higher-valued threshold than the aggressive threshold), and if so, stop lateralization.
- the system may iterate until the global representative curvature divided by the global representative change in local curvature is above a certain threshold.
- This calculation and threshold may be designed to stop the process when a balance is achieved between reducing the noise measurement and keeping the overall curvature relatively high in cases where more idealized thresholds were unable to achieve the right amount of de-noising at very minimal damage to overall curvature.
- the system may move on to densification (smoothing and interpolation).
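- One possible reading of this stopping logic is sketched below. The two measures are assumptions for the example: the global representative curvature is taken as the mean magnitude of second differences (delta delta), and the global representative change in local curvature as the mean magnitude of third differences. The `step` argument is any callable that performs one lateralization pass; all names, the ordering of the checks, and the `max_iters` guard are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
Trajectory = List[Point]


def _diff(seq: Trajectory) -> Trajectory:
    return [tuple(q[i] - p[i] for i in range(len(p))) for p, q in zip(seq, seq[1:])]


def _mean_norm(vectors: Trajectory) -> float:
    return sum(math.sqrt(sum(x * x for x in v)) for v in vectors) / len(vectors)


def curvature(traj: Trajectory) -> float:
    """Assumed measure: mean magnitude of second differences."""
    return _mean_norm(_diff(_diff(traj)))


def curvature_change(traj: Trajectory) -> float:
    """Assumed measure: mean magnitude of third differences."""
    return _mean_norm(_diff(_diff(_diff(traj))))


def lateralize_until_balanced(traj: Trajectory,
                              step: Callable[[Trajectory], Trajectory],
                              aggressive: float,
                              conservative: float,
                              min_curv_fraction: float,
                              min_ratio: float,
                              max_iters: int = 50) -> Tuple[Trajectory, int]:
    c0 = curvature(traj)
    iters = 0
    # Phase 1: iterate until the noise measure is very low or curvature erodes.
    while iters < max_iters:
        if curvature_change(traj) < aggressive:
            return traj, iters
        if curvature(traj) < min_curv_fraction * c0:
            break
        traj = step(traj)
        iters += 1
    # Phase 2: accept the result if the looser noise threshold is already met.
    if curvature_change(traj) < conservative:
        return traj, iters
    # Phase 3: iterate until curvature is high relative to the noise measure.
    while iters < max_iters:
        change = curvature_change(traj)
        if change == 0.0 or curvature(traj) / change > min_ratio:
            break
        traj = step(traj)
        iters += 1
    return traj, iters
```

- The function returns the lateralized trajectory together with the number of passes applied, after which densification can begin. Trajectories need at least four points for the third differences to be defined.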
- FIG. 5 illustrates a diagram illustrating an example pruned nearest neighbors process 500 in accordance with some examples.
- the process 500 includes an operation 502 to calculate velocity direction for key body segments for frames of incoming motion data.
- the process 500 includes an operation 504 to compare position and velocity direction values to like data taken from frames of a previously acquired model of the same attempted motion.
- the results of operation 504 may be output to operation 506 or 512 below.
- the process 500 includes an operation 506 to prune the field.
- the process 500 includes an operation 508 to advance the field one step forward in the time series for the previously acquired model.
- the process 500 includes an operation 510 to move the field forward in time based on the velocity ratio between the incoming motion data and the previously acquired model.
- the process 500 includes an operation 512 to generate sets of values in the time series of the previously acquired motion model as candidates.
- the process 500 includes an operation 514 to output Nearest Neighbor match of frame from previously acquired model to current frame of incoming motion data.
- Operation 514 may include using the output of operation 510 or 512 .
- the pre-specification involves setting the core segment at a rigidly pre-defined location and finding a next point which is in a pre-defined direction from that point, and a third point in a pre-specified plane with the other two and such that their distances from each other are the right scaled distance (based on scaling factor that adjusted the reference model to the user size).
- the core point is translated to its pre-specified point first (for both the reference model and the user avatar) via a translation. All points in both models are moved with this translation so as to preserve the internal location relationships between all points in both models for each frame. Then the rotations that are applied to bring the other points to their specified locations are also applied so that the entire models are rotated, again in a way that preserves the internal location relationships between all points in both models for each frame.
- total Euclidian Distance between the body segments in the user's body representation from depth sensors and all like body segments in the reference model body constructions in the field of reference model body positions that are under consideration (this “field” may be the full set of body positions in the time series or may be a pruned set based on routines defined below).
- total in “total Euclidian distance” implies summing each individual one.
- weights may be applied in order to prioritize certain body segments. If a small set of body segments do most of the work, some weights may be zero, thus effectively eliminating some segments from the calculation (and making the system more efficient).
- This calculation may be a sum or a product of the body segment distances calc and the velocity directions calc and one or the other may be normalized by giving it a different weight from the other.
- Process 500 may apply a scheme that automatically reduces the scope of the possible reference model positions that are under consideration based on the logic of the “time series”. Each body position removed from consideration saves all computation involved in calculating the various Euclidian distances between the reference models parts plus the velocities of its parts and like quantities in the new user data.
- the pruned fields that the nearest neighbor calculations may operate on may be selected based on performing the operation on earlier candidates in both time series and considering the most recent nearest neighbor that had been chosen in that way. Then process 500 may advance forward in the time series one step for each model and search in that stepped forward field. Of course this leaves the question of where this may all start. It may start at the beginning of the time series for both models, working forward one time step in each model from the matched positions in the time series that were found in that first search. A trigger field (explained below) may modify this by giving a different basis for the first match; however, subsequent matches may proceed as described here in that case.
- the expected match for frame i+1 may be j+1.
- j itself, j+1, or j+2 may be the actual match.
- j−1, j, j+1, j+2, and j+3 may be considered. This is a range of 2 frames around the expected match j+1.
- process 500 may search a range of n frames around j+1 where n may be bigger if velocity differences are more volatile in the human movement technique in question.
- the description of the operation of the Nearest Neighbor algorithm above may be independent of this pruning. It can calculate a position and/or velocity direction match between new user data and a reference model with whichever type of field it is presented. Once this is calculated, the minimal value can be selected and the associated pairing of reference model frame to new user data frame may be defined as the match.
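- A position-only version of this pruned search might look like the following sketch. Frames are represented as dictionaries mapping a body segment name to a 3D position, which is a representation chosen for this example; the function names, the default field of two frames before and after the expected match, and the omission of the velocity direction term are likewise assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]
Frame = Dict[str, Point]


def frame_distance(user_frame: Frame, ref_frame: Frame,
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Total (summed) weighted Euclidean distance over like body segments."""
    total = 0.0
    for name, user_pos in user_frame.items():
        w = 1.0 if weights is None else weights.get(name, 1.0)
        if w == 0.0:
            continue  # zero weight removes the segment from the calculation
        total += w * math.dist(user_pos, ref_frame[name])
    return total


def pruned_nearest_neighbor(user_frame: Frame, reference: List[Frame],
                            last_match: int, before: int = 2, after: int = 2,
                            weights: Optional[Dict[str, float]] = None) -> int:
    """Search only a small field centered one step past the previous match."""
    expected = last_match + 1
    lo = max(0, expected - before)
    hi = min(len(reference) - 1, expected + after)
    return min(range(lo, hi + 1),
               key=lambda j: frame_distance(user_frame, reference[j], weights))
```

- With `last_match = j` the candidates are j−1 through j+3, so only five distance evaluations are needed per incoming frame regardless of the length of the reference time series.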
- Velocity projected pruning may include pruning of the field of candidates for a nearest neighbor identification algorithm, in its basic form, assuming roughly one-to-one alignment and scaling of velocity vectors.
- the system can be made more general.
- Process 500 may maintain the same field size, say of 5 frames. The way to think of this is that there is a central candidate that is the most likely nearest neighbor and then two before and two after that one that are less likely, but very possible. This field size can be larger, may be asymmetric (the most likely one not being in the center), or may even scale dynamically such that a larger field is used when there is larger user velocity. Those options are not described in detail here, but enabling those adjustments is an uncomplicated step beyond this discussion.
- the center of the field may be moved forward in the time series based on the component of the user's velocity in the direction of reference model's velocity.
- the magnitude of this component may be divided by the magnitude of the reference model's velocity.
- process 500 may multiply the result of the cosine with the magnitude of user velocity and divide by the magnitude of the reference model velocity. This may be simplified algebraically by dividing the dot product by the square of the magnitude of the reference model velocity. If it is expected that these two vectors may be well aligned, then the step of calculating the cosine is not needed and process 500 may instead simply work with the ratio of user velocity to reference model velocity.
- process 500 may multiply the result of the calculation by 1 frame (or one time step in the reference model's time series) and round to the nearest whole number to determine how many frames in advance to center the field of candidates for the nearest neighbor determination.
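- The simplification works because the component of the user velocity along the reference velocity is the dot product divided by the magnitude of the reference velocity, and dividing that by the same magnitude again gives the dot product over the squared magnitude. A minimal sketch follows; the function name and the fallback of one frame when the reference velocity is zero are assumptions.

```python
import math
from typing import Sequence


def frames_to_advance(user_velocity: Sequence[float],
                      ref_velocity: Sequence[float]) -> int:
    """Number of reference frames to move the center of the candidate field."""
    dot = sum(u * r for u, r in zip(user_velocity, ref_velocity))
    ref_sq = sum(r * r for r in ref_velocity)
    if ref_sq == 0.0:
        return 1  # assumption: fall back to the basic one-step advance
    # (|u| cos(theta)) / |r| simplifies to (u . r) / |r|^2
    ratio = dot / ref_sq
    return int(math.floor(ratio + 0.5))  # round to the nearest whole frame
```

- A user moving twice as fast as the reference model in the same direction advances the field center by two frames, while velocity perpendicular to the reference direction contributes nothing.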
- Pruned nearest neighbor enabled user-to-expert analogous position visualization may include the following.
- process 500 may be used to instruct the system to display that position of the reference model.
- the expert model is analogous to a reference model in that it is a pre-captured and stored motion model. In the case of an expert model, it may preferentially be taken from world class experts at the technique in question.
- Where the user is within the action may be determined using a nearest neighbor algorithm.
- the nearest neighbor algorithm may use a modified field specifically stored for making this determination which may be called a “trigger field”. Trigger fields are described below.
- the system may be directed to display that “frame” of the reference model time series.
- the pruning system may be modified such that the field of nearest neighbor candidates for the next user position may be centered based on the ratio of user velocity to reference model velocity (velocity projected pruning).
- process 500 may instruct the system to display a slightly time-advanced position of the reference model.
- Time advancement can be set to be a specific number of frames forward.
- the optimal amount of time forward to display is a perceptual question that may be resolved through user testing. It may be individual, exercise, or even state-of-mind dependent.
- a system which can show the time-aligned positions of a representation of a user and a pre-captured expert motion model can be adjusted to present a time-advanced position for the expert motion model thus displaying to the user the ideal position they may be moving toward.
- this can be done with the time-aligned or time-advanced motion model displayed.
- the representation of the user's motion is an optional addition.
- the nearest neighbor matching algorithm (“pruned nearest neighbor”) may be used to find matching-body-position frames between the user's measured body position and the positions in the time series of the expert model.
- process 500 may then leverage pre-calculated velocity calculations from the expert model time series and run-time velocity calculations (delta vectors from one instant in the time series to the next for like body segments) to generate a future-projected position for the expert model.
- Process 500 may be used to set up a “time projection factor” which is a time-difference into the future of the time-series of the movement that the system may use to select a frame from the expert model time series to display.
- This time projection factor is part of the user experience and its precise tuning may dictate its effectiveness as a training aid, so picking the perfect time projection factor may be a matter of testing. For this example, it may equal 100 ms.
- the frame that is 100 ms into the future (likely there may be one at that moment as most computerized capture systems operate at a multiple of 30 Hz which does have frames that differ by 100 ms) is a position that the expert's body found itself in 100 ms after the one that the nearest neighbor system identified as the most similar to the user's current position.
- the expert had a certain velocity which the user may not be matching.
- the present process 500 may multiply that 100 ms by the ratio of user velocity to the expert model velocity. To do this, a representative global average velocity may be used.
- a global average velocity may be an average velocity of all body segments.
- velocities from the analogous subset of body segments may be calculated on the fly before averaging. Once the averages are calculated, their absolute values can be taken to then create the ratio defined as the magnitude of the representative global average velocity of the user divided by the same for the expert model.
- the output number of milliseconds can be divided by the number of milliseconds per frame (and rounded to the nearest whole number) to give the number of frames forward (relative to the output from the pruned nearest neighbor result) in the expert model to display.
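- The scaling of the time projection factor can be sketched as follows, assuming per-segment velocity vectors are already available for the matched frames. The function names, the 30 frames-per-second default, and the choice to return 0 frames when the expert model is stationary are assumptions for this example.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, float, float]


def global_average_velocity(velocities: List[Vector]) -> Vector:
    """Average velocity vector over the chosen body segments."""
    n = len(velocities)
    return tuple(sum(v[i] for v in velocities) / n for i in range(3))


def frames_forward(user_velocities: List[Vector],
                   expert_velocities: List[Vector],
                   projection_ms: float = 100.0,
                   fps: float = 30.0) -> int:
    """Frames past the nearest neighbor match to display in the expert model."""
    user_speed = math.hypot(*global_average_velocity(user_velocities))
    expert_speed = math.hypot(*global_average_velocity(expert_velocities))
    if expert_speed == 0.0:
        return 0  # assumption: no projection when the expert is not moving
    scaled_ms = projection_ms * user_speed / expert_speed
    # ms divided by ms-per-frame (1000 / fps), rounded to the nearest frame
    return int(math.floor(scaled_ms * fps / 1000.0 + 0.5))
```

- At 30 frames per second a user matching the expert's speed sees the expert model three frames ahead, and a user moving twice as fast sees it six frames ahead.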
- matching up user body positions to reference model's positions may be done at a skeletal level. Neither depth sensors nor video measures skeletal position directly. In an example, some system may be in place to infer skeletal positions based on video imagery in the user video feed.
- This same system may, via its inference process, tag each frame of the expert model or reference model video with the skeletal position data. In principle, this may be done at runtime, but it may be most efficient to do this in advance and tag each frame of the expert model video with skeletal position information as meta-data.
- Extra care may be used to create similar camera angles in the two videos.
- This skeletal position information may constitute the field of options for the nearest neighbor algorithm (or pruned nearest neighbor) to computationally compare (via Euclidian-distance-based calculations) user body positions against.
- the reference model may not be a pre-fabricated video and instead the video may be generated at runtime via animation of an avatar using the time-series of expert model position data where the skeletal position of the expert model displayed is the one selected by the nearest neighbor algorithm.
- a “2D shadow” of that model can be calculated at runtime or immediately before user video is captured in a preparatory step which is a projection of skeletal positions into 2D by losing depth information along the axis of the camera angle.
- the system may be calibrated so it may “know” the angle that the camera is taking relative to the user and so it may apply the same angle in the calculations of the 2D shadow.
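- A simple way to picture the 2D shadow is an orthographic projection that discards the coordinate along the camera axis. The sketch below assumes such a projection and a known camera axis and up direction from calibration; the function names are hypothetical, and the up vector must not be parallel to the camera axis.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _unit(v: Sequence[float]) -> Vec3:
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)


def shadow_2d(points: List[Vec3], camera_axis: Vec3,
              up: Vec3 = (0.0, 1.0, 0.0)) -> List[Tuple[float, float]]:
    """Project 3D skeletal points to 2D by dropping depth along the camera axis."""
    axis = _unit(camera_axis)
    right = _unit(_cross(up, axis))   # horizontal image direction
    true_up = _cross(axis, right)     # vertical image direction
    return [(_dot(p, right), _dot(p, true_up)) for p in points]
```

- Two skeletal points that differ only in their distance along the camera axis map to the same 2D location, which is the loss of depth information described above.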
- trigger fields enable the present system to leverage a nearest neighbor calculation as a way to automatically determine when a user has started executing a technique and to determine the point where the user has finished the technique. This is useful for trimming during post processing and during computation-mode-determination in real-time data representation mode.
- the modes that are switched between may include:
- Raw DS mode: display of data directly from raw depth sensor skeletal inferences.
- Predictive modeling mode: display of data from the predictive system.
- an indication of tracking may be displayed to the user, and this may be in the form of showing them their latent body position representations coming in from depth sensors. This may give the user confidence in the system insofar as it shows that it is tracking and is ready to go even if latency reduction measures are not yet in play.
- Predictive modeling during this pre-technique phase may be unattainable because there may not be a precise motion model for it and thus, a reduced basis for predicting future body positions.
- Process 500 may be used to automatically switch between a standard track-and-display using depth sensor skeletal inferences mode and a predict-to-track-and-display mode for technique execution, using forward modeling.
- a threshold-controlled pruned nearest neighbor may be used to switch between these modes.
- a “trigger field” may be set up, which is a set of the reference model's time-series constructions to be matched to user body positions and kinematics using, for example, the same nearest neighbor Euclidian distance scheme discussed above.
- This trigger field may consist of reference model frames from the beginning of the time series of the reference model process.
- Finishing predictive tracking can be done either by an input from the instructor or user (screen, voice, gesture, or other) in the case of a cyclical skill (which may not have a natural finishing point) or by institution of a second trigger field which consists of final reference model frames located at the end of the reference model's time series.
- predictive tracking mode may be “on” when the conditions have been met to trip the beginning trigger field, but the conditions have not yet been met to trip the final trigger field.
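- The mode switching can be pictured as a two-state machine driven by the nearest neighbor distances to the two trigger fields. The sketch below is illustrative only: the class name, mode labels, and the use of a simple distance threshold for "tripping" a trigger field are assumptions.

```python
class ModeSwitcher:
    """Switch between raw depth sensor display and predictive tracking."""

    RAW = "raw_ds"
    PREDICTIVE = "predictive"

    def __init__(self, start_threshold: float, end_threshold: float) -> None:
        self.start_threshold = start_threshold
        self.end_threshold = end_threshold
        self.mode = self.RAW

    def update(self, start_distance: float, end_distance: float) -> str:
        """Feed the best-match distances to the beginning and final trigger fields."""
        if self.mode == self.RAW and start_distance < self.start_threshold:
            self.mode = self.PREDICTIVE   # beginning trigger field tripped
        elif self.mode == self.PREDICTIVE and end_distance < self.end_threshold:
            self.mode = self.RAW          # final trigger field tripped
        return self.mode
```

- Calling `update` once per incoming frame keeps predictive tracking on only between the two trigger events; for a cyclical skill the second transition could instead be driven by an instructor or user input.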
- forward modeling based on kinematics alone may constrain depth sensor inferences for more accurate tracking and even to achieve some prediction accuracy, but accuracy may be limited without a reference model to constrain the future curvature of trajectories.
- trimming of pre and post technique phases of recording during post processing modes may also benefit from trigger field type processing so as to automate the part of the analysis which determines what portion of the recording reflects the technique itself and which portions reflect preparation and “off-ramping”.
- FIG. 6 is a diagram illustrating a latency reducing predictive modeling process in accordance with some examples.
- Operation 600 A illustrates context matching. For example, a Pruned Nearest Neighbor process matches user position to a certain position in the time series of a reference model. This matching identifies the time value within the time series that is used. This time value points to the specific points within each of the trajectories of the reference model that localize the 1-retrograde delta and RM-RPA used to make predictions.
- Operation 600 B illustrates calculating vectors.
- 1-retrograde deltas and RM-RPA's are calculated.
- the thick arrow in the upper left diagram is the 1-retrograde delta.
- the 1-retrograde delta is scaled one frame forward (it is the thin arrow here).
- the difference between its new endpoint and the actual body segment position point one frame forward is the RM-RPA for predicting one frame forward.
- the RM-RPA is the thick arrow.
- the 1-retrograde delta is scaled two frames forward and the RM-RPA is the difference between the scaled 1-retrograde delta's endpoint and the body segment's position two frames forward.
- the thin arrow is the scaled 1-retrograde delta and the RM-RPA is the thick arrow.
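The two vectors of Operation 600 B can be computed from a reference model trajectory as follows. This is a sketch under assumptions: the trajectory is a list of coordinate tuples for one body segment sampled once per frame, and the helper names are hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(v, k):
    return tuple(k * x for x in v)

def retrograde_delta(traj, t):
    """1-retrograde delta: matched position minus the position one frame prior."""
    return sub(traj[t], traj[t - 1])

def rm_rpa(traj, t, frames_forward):
    """RM-RPA: actual future position minus the endpoint of the scaled delta."""
    linear_endpoint = add(traj[t], scale(retrograde_delta(traj, t), frames_forward))
    return sub(traj[t + frames_forward], linear_endpoint)
```

With a trajectory `[(0, 0), (1, 0), (2, 1), (3, 3)]` and a match at index 1, the 1-retrograde delta is `(1, 0)`, the RM-RPA one frame forward is `(0, 1)`, and the RM-RPA two frames forward is `(0, 3)`.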
- Operation 600 C illustrates generating predictions.
- Velocity scaling is used to modify the 1-retrograde delta and RM-RPA for a given future frame's target time. This may be in addition to scaling for number of frames forward.
- 1-retrograde deltas scale by the number of frames forward times the ratio of user velocity to reference model velocity. Their tail is then positioned at the current depth sensor measured position of the body segment.
- RM-RPAs scale based on the square of the ratio of user velocity to reference model velocity. The diagram on the left is predicting one frame forward and user velocity is modeled as being higher than reference model velocity at this portion of the technique. On the right, the projection is two frames forward.
- the term “frame” may include a slice of a full time series of body positions where all data points in said slice have the same time-parameter in the time series. This analogizes to the use of “frame” in video where all of the pixel values in any “frame” have the same time-parameter.
- a concept of a “delta” or “delta vector” may be used herein.
- a delta generally refers to a difference between two objects. A difference implies subtraction.
- a “delta” or “delta vector” may be generated by standard vector subtraction. In an example, this is done with a vector from earlier in the time series subtracted from a later one. In this conception, standard Cartesian coordinate positions specifications are viewed as vectors.
- Velocity projection scales with ratio of user to reference model velocity.
- Curvature projection scales with the ratio of user to reference model velocity squared.
- Body construction is delivered to animation or other interface representation pipeline.
- because the animation pipeline can start producing imagery that is associated with where the user is about to be before the user gets there, that imagery can be delivered nearly simultaneously with the user's arrival.
- Process 600 may output visuals at a particular frame rate, such as 30 fps.
- a form of extrapolation to output at a higher rate (normally termed “interpolation”) may be used. This is executable by a scaled version of the 1, 2, or 3 frame predictive calculations.
- Reference models may have pre-existing data for frames within their time series that are analogous to the frames coming from depth sensor skeletal tracking. These best-matches between reference model frames and depth sensor frames are calculated via positional comparisons seeking minimization of a pre-defined distance calculation between the two (the pruned nearest neighbor described above). As such, deltas between the reference models' positions in currently matched frames and their frames 1, 2, 3, or even more into the expected time-series-future may be available to us. These deltas may be called “predicting deltas”.
- Process 600 may scale these predicting deltas based on the ratio of the magnitude of user velocity to reference model velocity for a given body segment in the current frame of the user and the analogous frame of the reference model. This ratio is the “scaling factor”.
- This may be applied as an extension of the current position in a scaled fashion. It may be scaled by the predicted number of frames in advance and by the scaling factor already defined. This gives a first approximation of the 1, 2, or more frames in advance position prediction for the body segment in question.
- this approximation prediction is generated not only by scaling the 1-retrograde delta, but also by repositioning it compared to the time position it had when it was calculated from the reference model.
- the 1-retrograde delta specific to the user's current motion is found in the reference model. This is done using the pruned-nearest-neighbor-matched position as the point which defines the tip of the 1-retrograde delta and the position in the reference model one time step prior as its tail.
- This first approximation may be modified by a formula which accounts for the expected acceleration associated with the technique for this body segment at this moment in the technique.
- this RM-RPA delta may be calculated from the analogous moment in the reference model time series and then scaled based on a modified version of the scaling factor where the square of the scaling factor is used.
- This RM-RPA delta is calculated as the difference from the position of the appropriate 1-retrograde delta prediction to the actual position of the body segment at the number of frames forward in the time series that the prediction applies to. It calculates precisely how much and in what direction the first approximation prediction may have had to be moved to have been accurate for the reference model's time-series of actual positions.
- the RM-RPA scaling is again based on a ratio of user to reference model velocity magnitude (the scaling factor), but where the ratio is squared to ensure that variation of the ratio makes the prediction vary along a curved trajectory with minimal acceleration correction if the user's velocity is slow relative to the reference model's velocity and large acceleration correction if the user's velocity is high relative to the reference model's velocity.
- these segments may be largely traveling along curved arcs and higher velocities may lead to greater distances traveled along the curves and thus more deflection from linear motion.
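Putting the scaling rules together, the predicted position can be written as the current position plus the velocity projection plus the curvature correction. The following is a minimal sketch of that arithmetic; the function and parameter names are hypothetical, and the RM-RPA passed in is assumed to be the one calculated for the same number of frames forward.

```python
def predict(current, delta, rm_rpa, frames_forward, user_speed, ref_speed):
    """Predict a body segment position `frames_forward` frames ahead.

    current        -- current depth sensor measured position of the segment
    delta          -- 1-retrograde delta from the matched reference model frame
    rm_rpa         -- RM-RPA for the same number of frames forward
    user_speed     -- magnitude of user velocity for this segment
    ref_speed      -- magnitude of reference model velocity at the matched frame
    """
    s = user_speed / ref_speed  # the "scaling factor"
    return tuple(
        c + frames_forward * s * d + (s ** 2) * r
        for c, d, r in zip(current, delta, rm_rpa)
    )
```

When the user moves at exactly the reference model's speed, the scaling factor is 1 and the prediction reproduces the reference model's own displacement from the matched frame. A user moving twice as fast doubles the velocity projection and quadruples the curvature correction.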
- the system may use the two instants in the reference model that are closest to the user position as the basis for projecting the future user position. Each instant yields its own prediction, and the predictions may be averaged using a weighting based on how much closer the body segment being projected was to the position in the closest reference model frame than to the position in the next closest one. The result is the predicted location for the user at the targeted time. This allows better approximation for in-between positions of said body segment.
- the two nearest-neighbor-selected frames in the reference model are neighbors in the reference model time series. If this is true and since they are closest two resulting nearest neighbor choices, the user position may be approximately in between them. If the two are not neighbors, then this may revert to using only the basic single-nearest-neighbor projection method.
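The two-neighbor blending and its fallback can be sketched as below. The text does not fix the exact weighting formula, so this sketch assumes inverse-distance weights (the closer frame receives the larger weight); all names are hypothetical.

```python
def blend_predictions(idx_a, dist_a, pred_a, idx_b, dist_b, pred_b):
    """Blend predictions from the closest (a) and next closest (b) frames.

    idx_*  -- frame indices in the reference model time series
    dist_* -- distance from the user position to each matched frame
    pred_* -- prediction generated from each matched frame
    """
    total = dist_a + dist_b
    # Revert to the single-nearest-neighbor projection when the two frames
    # are not neighbors in the time series (or the weights are undefined).
    if abs(idx_a - idx_b) != 1 or total == 0:
        return pred_a
    w_a = dist_b / total  # assumed weighting: closer frame counts for more
    w_b = dist_a / total
    return tuple(w_a * x + w_b * y for x, y in zip(pred_a, pred_b))
```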
- the time parameter may be scaled. This can be done by multiplying the 1-retrograde delta by the number of frames in advance for prediction. Then the above calculations may be applied in a normal fashion, except the RM-RPA delta is now the difference between the position resultant from the scaled 1-retrograde delta and the actual position the scaled number of frames in advance. Interpolation may be used for this calculation if the projection is to a fractional number of frames in advance. But even with a fractional number of frames applied for the prediction calculations, the calculations may be applied in the way described above.
- predictions may be made for several time steps into the future simultaneously (1, 2, 3, etc. frame cycles or interpolation frames in between those, such as 1.5 or 2.333 frames into the future as described in the “predictive interpolation” description below) to ensure multiple predictions are delivered to the system (ones from 1, 2, 3, etc. frames cycles in the past) when the time comes to construct a final location.
- the prediction used may come from a different number of frame steps in the past. Faster execution may benefit from use of predictions from further in the past to eliminate perception of positional latency.
- the system may need to reconstruct body segment positions specified by these trajectories into full body constructions. This is true for either of real-time or delayed processing conditions.
- full body positions may be reconstructed from those segment positions following rules of “lateral modeling” which is to say the body fits together.
- lateral modeling amounts to a combination of an anthropometric model, which defines expected distances between joints, cross checked against expected angle ranges for multi-body-segment constructions.
- application of confidence intervals ensures that body segments which feature more certainty in their positional accuracy may get to serve as anchor points that won't be allowed to move as far as lower confidence segments. To make things fit together, ones with lower confidence may be tweaked more.
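One way to picture confidence-weighted fitting is a single bone-length correction between two joints. The sketch below is an assumption-laden illustration rather than the described lateral modeling process: it handles only one distance constraint, ignores angle ranges, assumes the two joints are not coincident, and splits the correction in inverse proportion to confidence.

```python
import math

def fit_bone(p, q, conf_p, conf_q, length):
    """Move joints p and q along their connecting line until they are
    `length` apart; the lower-confidence joint absorbs more of the change."""
    d = [b - a for a, b in zip(p, q)]
    current = math.sqrt(sum(x * x for x in d))  # assumed to be non-zero
    unit = [x / current for x in d]
    error = current - length                    # positive when too far apart
    share_p = conf_q / (conf_p + conf_q)        # high confidence -> small share
    share_q = conf_p / (conf_p + conf_q)
    new_p = tuple(a + u * error * share_p for a, u in zip(p, unit))
    new_q = tuple(b - u * error * share_q for b, u in zip(q, unit))
    return new_p, new_q
```

With joints at `(0, 0, 0)` and `(4, 0, 0)`, confidences 3 and 1, and an expected length of 2, the high-confidence joint moves 0.5 units and the low-confidence joint moves 1.5 units.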
- motion of body segments that are consistently positioned in space based on the dimensions of an actor may be applied to a visualization model used to create the animated imagery which may have different effective dimensions.
- adapting body segment position data that does not yet respect the animation model's dimensional constraints, without losing the character of the motion implied by that original body segment position data, may use motion capture animation processes.
- This type of reconstruction routine may be applied to processes described herein.
- confidence intervals on the positioning of different body segments may be derived as follows.
- an Instantaneous-2 Confidence Interval may be posited. This may be the depth sensor Confidence Interval (Instantaneous-1 Confidence Interval) divided by the sum of m times the magnitude of the reference model's velocity delta and n times the magnitude of the reference model's acceleration delta-delta (“n” is intended to be greater than “m”; both are intended to increase the influence of high velocity and high acceleration in reducing the confidence interval).
- any or all of these confidence intervals may be normalized by dividing all of a given type by the relevant maximum value of that type (across confidence intervals for body segments within an instant, across confidence intervals for all instants for a single body segment, or across all confidence intervals in the time series, but typically within depth sensor, instantaneous-1, instantaneous-2, or higher order versions such as the composite confidence interval below).
- when confidence intervals are included in weighted averages, normalizing them beforehand is redundant because the weighted average operation always includes a normalizing denominator.
- a composite confidence interval is calculated as the sum of the instantaneous-2 confidence intervals from the forward model(s) that apply to the current instant (thus, are derived from previous instants). If multiple forward models apply, it may be the composite confidence intervals that may be used in generating the final construction of the positions of all body segments for a given instant.
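The confidence interval arithmetic described above can be sketched as follows. The default values for `m` and `n` are placeholders chosen only so that `n` exceeds `m`; the text does not specify them, and the function names are hypothetical.

```python
def instantaneous_2(inst1, velocity_mag, acceleration_mag, m=1.0, n=2.0):
    """Instantaneous-1 interval divided by m*|velocity delta| + n*|acceleration
    delta-delta|. Undefined when both magnitudes are zero (division by zero)."""
    return inst1 / (m * velocity_mag + n * acceleration_mag)

def normalize(values):
    """Divide every confidence interval of one type by the maximum of that type."""
    peak = max(values)
    return [v / peak for v in values]

def composite(inst2_values):
    """Sum of the instantaneous-2 intervals from the forward models that apply."""
    return sum(inst2_values)
```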
- these composite confidence intervals may still be further modified by weightings such as those described in the series of approximations approach to reconstructing a body based on predicted positions of body segments.
- the plan for allowing users to execute techniques during real-time data representation (most pertinently, real-time visual, or real-time audio or “sonification of movement”) using predictive modeling to minimize latency may be as follows.
- Slow speed technique may be used at first and during slow speed technique, the prediction algorithm may be implemented one frame in advance. At slower speeds, users may be more consistent thus giving a good match to the reference model on which the predictions are based.
- Medium speed technique can be implemented during runtime predicting for real-time data representation as users get more consistent.
- the prediction algorithm may be implemented two frames in advance. At medium speeds, users with sufficient practice can be consistent enough to give a good match to the reference model on which the predictions are based.
- High speed technique can be implemented during runtime predicting for real-time data representation as users get very consistent.
- the prediction algorithm may be implemented three frames in advance.
- users have to be well practiced to be consistent enough to give a good match to the reference model on which the predictions are based. This match is extra important since the prediction is generated so far in advance.
- Projecting three frames forward is a less accurate way to construct the body position at that future point compared with two frames forward, one frame forward, or using raw depth sensor readings from the actual instant to be represented. In an example, lack of accuracy is more tolerated when users are executing techniques at high speed for two reasons.
- projecting multiple frames in advance becomes a high-upside, low-downside option for high speed technique. It allows delivery of low latency visual representation of body position and is accurate enough to avoid user discomfort with any differences between their body position and the represented body position largely due to the user's state of mind during execution of high speed technique.
- Velocity Deltas and Time-indexed Curvature Corrections.
- Velocity deltas refers to 1-retrograde deltas.
- Curvature Corrections refers to RM-RPA vectors. These may be calculated in the way described in the latency reduction and real-time interpolation sections of this detailed description.
- An example modification may include calculating them using the original sampled frame rate timing as opposed to the denser data created via smoothing. They may be calculated using all available trajectory points, but, in this case, they are not calculated using the nearest neighbors, which are closer together than the sensor sample frame rate due to densification processes, but instead using other points spaced one full input-frame-time distance (from the sensor sample rate) away.
- This timing used in the reference model may also be the expected sample timing from sensors during real-time predicting because the same system is being used in both cases.
- the following discussion may include a look-up table approach to latency reduction and real-time interpolation.
- the reference model may be rebuilt for animation or stored in two separate forms. One of these two different forms may be the look-up table for run time prediction and the other would be the animation model for visual display.
- the look-up table may contain calculated vectors which are indexed by a few parameters.
- the parameters serve to specify the location of the needed information and may be generated by the pruned nearest neighbor matching algorithm and mode specifications around how far ahead in time the system may be generating predictions.
- Mode specifications may be determined by how fast the user is attempting their technique or whether or not predictions from multiple past frames are converging to generate a composite prediction in the given embodiment.
- Look-up tables give computational results with fewer computer operation steps at runtime compared to executing complex computations at runtime. Reduction of computational overhead affords a later start time in generating predictions which may help with accuracy.
- the look-up table may include pre-fabricated values for time targets across a range of possible output framerates.
- Achieving this may include using smoothing and proximity weighted interpolations.
- Smoothing is described elsewhere in this detailed description, and proximity weighted interpolations are described here in more detail. Their purpose, in this case, is to generate points whose time-position relative to the two closest time-values in the series is not a sum of some powers of ½ frames (based on input frame time) away from those neighbors. Any times that are a sum of some powers of ½ frames from their nearest neighbors are eventually generated given enough smoothing iterations. All other points are not accessible via smoothing.
- proximity weighted interpolation may be used to generate points between the points that exist after a specified number of smoothing iterations. This may be done such that those points are at times that output framerates can be anticipated to target (given near-future display hardware specifications) and where those points are generated via an average of the two existing points with the closest time parameter to the target and weighted by how close in time each of these existing points is to the target point.
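Proximity weighted interpolation between two existing points can be sketched as a time-weighted average. The function name and argument layout below are hypothetical.

```python
def proximity_weighted(t_target, t0, p0, t1, p1):
    """Average of the two existing points nearest in time to `t_target`,
    weighted by how close each point's time is to the target time."""
    w1 = (t_target - t0) / (t1 - t0)  # weight of the later point
    w0 = 1.0 - w1                     # weight of the earlier point
    return tuple(w0 * a + w1 * b for a, b in zip(p0, p1))
```

A target one third of a frame after the earlier point, for example, is not reachable by repeated halving, but is obtained here by giving the earlier point a weight of 2/3 and the later point a weight of 1/3.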
- look-up tables may also be lateralized before the generation of all of the extra interior points that smoothing and/or interpolation would generate, for example using a balanced lateralization process. This may be used to balance how much they are executed. They may be iterated just enough to diminish the impact of noise, but not enough to ruin the inherent shape of the trajectory.
- Smoothing may have added body positions in between the original body positions.
- Each body position in the time series may have pre-calculated vectors associated to it.
- Each piece of data associated to each body position may be indexed by the body segment it is specific to, the time that body position is seen in the time series, and the targeted time for the prediction it enables if applicable (this is only applicable to RM-RPAs).
- FIG. 7 illustrates a predictive interpolation flowchart in accordance with some examples.
- Operation 700 A illustrates context matching.
- a Pruned Nearest Neighbor process matches user position to a certain position in the time series of a reference model. This matching identifies the time value within the time series that is used. This time value points to the specific points within each of the trajectories of the reference model that localize the 1-retrograde delta and RM-RPA used to make predictions.
- Operation 700 B illustrates calculating vectors.
- 1-retrograde deltas and RM-RPA's are calculated.
- the thick arrow in the upper left diagram is the 1-retrograde delta.
- the 1-retrograde delta is scaled one frame forward (it is the thin arrow here).
- the difference between its new endpoint and the actual body segment position point one frame forward is the RM-RPA for predicting one frame forward.
- the RM-RPA is the thick arrow.
- the 1-retrograde delta is scaled two frames forward and the RM-RPA is the difference between the scaled 1-retrograde delta's endpoint and the body segment's position two frames forward.
- the thin arrow is the scaled 1-retrograde delta and the RM-RPA is the thick arrow.
- Operation 700 C illustrates generating predictions of fractional numbers of input frames forward.
- the same mechanisms are used to generate predictions from 1-retrograde deltas and RM-RPAs, but where both are scaled by a non-whole-number-of-frames-forward factor. This works like the velocity scaling factor.
- the 1-retrograde delta scales by the fractional number of frames forward and again is positioned such that its tail is at the current location of the body segment.
- the RM-RPA scales by the square of the ratio of the fractional number of frames forward divided by that number rounded up to the nearest whole number. In the example case where velocity doesn't match perfectly, both scale factors are modified by multiplying by the ratio of the user velocity to reference model velocity. In the RM-RPA, this multiplication is done after rounding up the denominator, but before squaring.
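The two scale factors of Operation 700 C can be written out directly from the description. This sketch assumes the RM-RPA being scaled is the one stored for the rounded-up whole number of frames forward, which the text implies but does not state; the function name is hypothetical.

```python
import math

def fractional_scales(frames_forward, user_speed, ref_speed):
    """Return (delta_scale, rm_rpa_scale) for a fractional frame target."""
    ratio = user_speed / ref_speed
    # 1-retrograde delta: fractional frames forward times the velocity ratio.
    delta_scale = frames_forward * ratio
    # RM-RPA: divide by the rounded-up frame count, multiply by the velocity
    # ratio, and only then square.
    rm_rpa_scale = ((frames_forward / math.ceil(frames_forward)) * ratio) ** 2
    return delta_scale, rm_rpa_scale
```

For a target of 1.5 frames forward with matched velocities, the 1-retrograde delta is scaled by 1.5 and the two-frame RM-RPA is scaled by (1.5 / 2)² = 0.5625.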
- FIG. 7 illustrates a process that adjusts a runtime predicting system to predict to arbitrary time specifications into the future.
- the process of FIG. 7 produces higher frame rates than the sensors produce within the runtime predictive system.
- this interpolation process may be used to push the time targets forward into the future even further so as to display the user avatar to represent a position the user is moving toward that they may not yet have arrived at when this position is displayed to the user.
- the user may see a representation of their position before they get to that position. This may be done an arbitrary number of milliseconds forward from the display time. The precise number may be task and user dependent.
- the dynamics involved in finding the most useful forward time targets may be further refined through user testing of software after implementations are created.
- the method is to target fractional values of frames in advance of the delivery of the depth sensor inferences and to do so at regular intervals to allow a higher number of frames per second than the data is coming in.
- Fractional values for the time parameter of projection targets may be used (this may be parameterized in milliseconds, in which case it may just be the number of milliseconds which corresponds to the fractional frames value).
- An example depth sensor frame rate is 30 Hz (e.g., targeted depth sensors for the Microsoft Kinect are 30 Hz).
- each of these assumes that the system is in slow speed execution mode and thus predicting between 1 and 2 frames in advance.
- For medium speed execution between 2 and 3 frames in advance, add 33 ms to each figure below.
- In general, add 33 ms for each frame further in advance to predict.
- the next on-input-timing frame may be 33 ms after the previous on-input-timing frame (which is the timing of the depth sensor readings the prediction may be based on in the slow speed execution mode . . . add 33 ms for each frame added for higher speed execution).
- the first interpolated frame after that may be 50 ms after the depth sensor readings the prediction may be based on.
- the next on-input-timing frame may again be 33 ms after the previous on-input-timing frame (which is the timing of the depth sensor readings the prediction may be based on in the slow speed execution mode . . . add 33 ms for each frame added for higher speed execution).
- the first interpolated frame after that may be 44 ms after the depth sensor readings the prediction may be based on.
- the second interpolated frame after that may be 56 ms after the depth sensor readings the prediction may be based on.
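The timing figures above follow from dividing each 33 ms input frame period into equal output intervals. The sketch below reproduces them under the assumption that the first set of figures corresponds to doubling the 30 Hz input rate and the second to tripling it; the function and parameter names are hypothetical.

```python
def target_offsets_ms(input_hz, multiplier, frames_ahead=1):
    """Prediction targets, in whole milliseconds after the depth sensor
    reading, for one input frame period.

    multiplier   -- output frames generated per input frame
    frames_ahead -- whole frames in advance (1 for slow speed execution mode)
    """
    frame_ms = 1000.0 / input_hz
    return [round(frame_ms * (frames_ahead + k / multiplier))
            for k in range(multiplier)]
```

Here `target_offsets_ms(30, 2)` gives `[33, 50]` and `target_offsets_ms(30, 3)` gives `[33, 44, 56]`, matching the figures listed above.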
- interpolation in the reference model may be performed to find the point in the reference model trajectory that may have been the “actual” location at the predicted time.
- the 1-retrograde delta can simply be scaled with the time parameter and can be calculated the same way as usual.
- the RM-RPA uses a location in the reference model from which to calculate a distance.
- the RM-RPA has no point in between frames in the reference model that it can use to generate a distance calculation from. It is calculated as the distance from the scaled 1-Retrograde delta to the actual point in the reference model at the target time. Without a point in the reference model at that time the RM-RPA may not be able to be calculated. Instead a mix of the two nearest well defined RM-RPAs may be used.
- establishing a RM-RPA for 1.333 input frames forward may use a weighted average of the RM-RPA for one input frame forward and the one for two input frames forward.
- the weighting may be based on time proximity to each of those frames.
- the weighting for each RM-RPA may be 1 minus the absolute value of the difference between the target timing and the specific RM-RPA's timing (first RM-RPA or second RM-RPA), with both timings expressed in input frames.
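The weighted mix of the two nearest well defined RM-RPAs can be sketched as follows. It assumes the stored RM-RPAs are keyed by whole numbers of input frames forward; the dictionary layout and function name are hypothetical.

```python
def blended_rm_rpa(target_frames, rm_rpa_by_frames):
    """RM-RPA for a fractional frame target, mixed from the RM-RPAs stored
    for the whole frame counts on either side of it."""
    lower = int(target_frames)
    if target_frames == lower:
        return rm_rpa_by_frames[lower]         # whole-frame target: no mixing
    upper = lower + 1
    w_lower = 1 - abs(target_frames - lower)   # 1 minus the timing difference
    w_upper = 1 - abs(target_frames - upper)
    return tuple(w_lower * a + w_upper * b
                 for a, b in zip(rm_rpa_by_frames[lower], rm_rpa_by_frames[upper]))
```

Because the two stored timings are exactly one frame apart, the two weights always sum to 1. For a target of 1.333 input frames forward the weights are roughly 0.667 for the one-frame RM-RPA and 0.333 for the two-frame RM-RPA.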
- An alternative method may use a denser reference model supplied by using extra smoothing operations to generate the needed points between points to create a basis for the RM-RPA, but as this may be a more data intensive model, it may be preferred to use the method described above.
- Delta and delta-delta calculations are done with a time difference of one frame of 33 ms, so scaling them and calculations based on them rely on multiplying by the parameter that controls the time into the future being targeted and dividing by 33 ms.
- Interpolation in post-processing may be performed without this process. Once a 30 fps representation is processed in post processing, an interpolation method can be used. There is no time for this in runtime processing, so the predictive interpolation method is used.
- Runtime predicting may arise from regular delivery of body segment positional representations from sensors and operations performed on the quantifications of those representations.
- the first action is to compare data about the position and motion of the user's body coming in from sensors using the Pruned Nearest Neighbor algorithm described elsewhere in this disclosure.
- the match identified by the Pruned Nearest Neighbor directs the system to a certain location in the look up table to find the relevant 1-retrograde delta and RM-RPA vectors. More than one each of 1-retrograde deltas and RM-RPA vectors may be used if either or both of the following are true.
- Frame-target timing parameters may use more than one predicted output frame for each input frame from sensors (for example when output frame rate is higher than input frame rate). Multiple approximate predictions are calculated and then blended via weighted average to give the final prediction. Operations on parameterization may define which elements of the look-up table are sourced.
- parameterization determines which predicted time targets downstream from each incoming frame data are used to search the look-up table for the vectors needed to generate predictions which meet those targets.
- the look-up table may be time indexed. RM-RPA values, which are derived from a 1-retrograde delta from time t1 and calculated as the difference from a scaled version of that 1-retrograde delta to the actual reference model position of that body segment at a targeted time t2 in the future, may be indexed by (t1, t2).
- t1 is determined by the Pruned Nearest Neighbor matching and t2 is based on adding a certain time delta to t1, where the specific time delta is determined by parameterization.
- the 1-retrograde delta depends only on t1.
- RM-RPAs depend on t1 and t2 (another way to say this is that there may be only one 1-retrograde delta for each t1, but there may be more than one RM-RPA for each t1, where the different RM-RPAs are further specified by their t2 value).
- if the targeted t2 values are stored in the table, the system may proceed to the next step. If not, then an interpolation process may be done for each targeted t2 that was not stored in the table using predictions from the two closest t2 values that were stored in the table (and from the same t1). These may be averaged together to give the real prediction for the targeted t2. This would be known up front once parameterization is established, so both of these RM-RPAs would be sourced as needed. This average may be weighted by how close each of the t2 values that are in the table is to the targeted t2 value.
- if predictions from multiple t1 values in the past have the same ultimate t2 value (more than one separate prediction, from more than one different time in the past, applies to the same targeted time), then they would be averaged together to produce a final composite prediction.
- This average may be weighted by the size of the time delta between the t1 and t2 values involved in each separate prediction. If this method is used, each prediction may be stored in a buffer indexed by its final time, t2, so that the predictions are ready for averaging as soon as all the predictions for that time have been calculated.
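The fallback for a target time that is not stored in the table can be sketched as below. This is an illustration only: it assumes predictions have already been generated from the stored vectors and are held in a dictionary keyed by (t1, t2), that at least two t2 values exist for the given t1, and that proximity weighting is linear. The names are hypothetical.

```python
def prediction_for(predictions, t1, t2):
    """Return the prediction for (t1, t2), interpolating between the two
    closest stored target times for the same t1 when t2 is not stored."""
    if (t1, t2) in predictions:
        return predictions[(t1, t2)]
    # Stored target times for this t1, nearest to the requested t2 first.
    stored = sorted((k[1] for k in predictions if k[0] == t1),
                    key=lambda s: abs(s - t2))
    a, b = stored[0], stored[1]
    gap_a, gap_b = abs(t2 - a), abs(t2 - b)
    w_a = gap_b / (gap_a + gap_b)  # the closer stored time gets more weight
    w_b = gap_a / (gap_a + gap_b)
    return tuple(w_a * x + w_b * y
                 for x, y in zip(predictions[(t1, a)], predictions[(t1, b)]))
```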
- FIG. 8 illustrates a flowchart showing a process 800 for displaying an avatar in accordance with some examples.
- the process 800 includes an operation 802 to determine a user's current position and speed.
- the process 800 includes an operation 804 to generate a time projection of an avatar based on the speed and position of the user.
- the process 800 includes an operation 806 to render a user avatar for display.
- the process 800 includes an optional operation 808 to display the user avatar a few milliseconds (e.g., 1 ms to 500 ms) ahead of the user's current position or display the user avatar in real-time (e.g., just-in-time) to the user's position.
- the process 800 includes an optional operation 810 to generate an expert avatar (e.g., based on previously acquired data from an expert in a particular action, such as a golf swing, a slap shot, a free throw, etc.).
- the process 800 includes an optional operation 812 to display the expert and the user avatar concurrently. For example, with the expert avatar further from the user than the user avatar, both within a user's field of view. Operation 812 may include displaying the expert avatar and the user avatar just in time of the user's position or a few ms ahead of the user's position.
- the process 800 includes an optional operation 814 to display the expert ahead of the user avatar. For example, with the expert avatar a few ms ahead of the user's current position, with the user avatar displayed in real-time.
- Avatars presented in the process 800 may be presented in an augmented or virtual reality setting.
- FIG. 9 illustrates a flowchart showing a process 900 for triggering capture of training data in accordance with some examples.
- the process 900 includes an operation 902 to identify a trigger that initiates a training action (e.g., a golf swing, a slap shot, a free throw, etc.).
- the process 900 includes an operation 904 to determine a user's current position and speed.
- the process 900 includes an optional operation 906 to render a user avatar for display, for example based on the user's current position and speed.
- rendering may occur before operations 902 or 904, for example with the usual latency from the depth sensor to visual processing.
- the rendering may then be modified by latency reduction until a second trigger, after which it returns to the usual latency mode.
- the process 900 includes an operation 908 to identify a trigger that ends the training action.
- the process 900 includes an operation 910 to output tracking data (e.g., of the user or an instrument of the user, such as a golf club, a hockey stick, a ball, etc.) from the training action between the initiating trigger and the ending trigger.
- the output tracking data may include data from a few ms or seconds before the initiating trigger or data from a few ms or seconds after the ending trigger, in an example.
- in the post-processing use case, operation 910 applies the trigger concept to trim the motion automatically at the beginning and at the end of the process 900 when creating a new user avatar.
- a real-time use case may include using the trigger system to modify the latency that the user experiences.
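- The trigger-based trimming of operation 910 can be sketched as a filter over timestamped tracking samples. This is an illustrative sketch under stated assumptions: samples are `(timestamp_seconds, data)` pairs, and the names `trim_to_triggers`, `pre_pad_s`, and `post_pad_s` are hypothetical. The padding parameters model the optional data from shortly before the initiating trigger or shortly after the ending trigger.

```python
from typing import Any, List, Tuple

# Hypothetical sample layout: (timestamp in seconds, tracked data).
Sample = Tuple[float, Any]


def trim_to_triggers(
    samples: List[Sample],
    start_trigger_s: float,
    end_trigger_s: float,
    pre_pad_s: float = 0.0,
    post_pad_s: float = 0.0,
) -> List[Sample]:
    """Keep samples between the initiating and ending triggers.

    Optional padding keeps a short window before the initiating trigger
    and after the ending trigger.
    """
    if end_trigger_s < start_trigger_s:
        raise ValueError("ending trigger must not precede initiating trigger")
    if pre_pad_s < 0.0 or post_pad_s < 0.0:
        raise ValueError("padding must be non-negative")
    window_start = start_trigger_s - pre_pad_s
    window_end = end_trigger_s + post_pad_s
    # Inclusive bounds, so samples exactly at a trigger time are retained.
    return [s for s in samples if window_start <= s[0] <= window_end]


# Example with made-up tracking samples.
recorded = [(0.0, "a"), (0.5, "b"), (1.0, "c"), (1.5, "d"), (2.0, "e")]
swing_only = trim_to_triggers(recorded, 0.5, 1.5)
swing_padded = trim_to_triggers(recorded, 0.5, 1.5, pre_pad_s=0.5, post_pad_s=0.5)
```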
- FIG. 10 illustrates a block diagram of an example machine 1000 upon which any one or more of the processes discussed herein may be performed in accordance with some embodiments.
- the machine 1000 may operate as a standalone device or may be connected (e.g., networked) to other machines.
- the machine 1000 may operate in the capacity of a server machine, a client machine, or both in server-client network environments.
- the machine 1000 may act as a peer machine in a peer-to-peer (P2P) (or other distributed) network environment.
- the machine 1000 may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine.
- the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein, such as cloud computing, software as a service (SaaS), or other computer cluster configurations.
- the machine 1000 may include a hardware processor 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 1004, and a static memory 1006, some or all of which may communicate with each other via an interlink (e.g., bus) 1008.
- the machine 1000 may further include a display unit 1010, an alphanumeric input device 1012 (e.g., a keyboard), and a user interface (UI) navigation device 1014 (e.g., a mouse).
- the display unit 1010, input device 1012, and UI navigation device 1014 may be a touch screen display.
- the machine 1000 may additionally include a storage device (e.g., drive unit) 1016, a signal generation device 1018 (e.g., a speaker), a network interface device 1020, and one or more sensors 1021, such as a global positioning system (GPS) sensor, compass, accelerometer, or other sensor.
- the machine 1000 may include an output controller 1028, such as a serial (e.g., Universal Serial Bus (USB)), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc.) connection to communicate with or control one or more peripheral devices (e.g., a printer, card reader, etc.).
- the storage device 1016 may include a machine readable medium 1022 on which are stored one or more sets of data structures or instructions 1024 (e.g., software) embodying or utilized by any one or more of the processes or functions described herein.
- the instructions 1024 may also reside, completely or at least partially, within the main memory 1004, within the static memory 1006, or within the hardware processor 1002 during execution thereof by the machine 1000.
- one or any combination of the hardware processor 1002, the main memory 1004, the static memory 1006, or the storage device 1016 may constitute machine readable media.
- while the machine readable medium 1022 is illustrated as a single medium, the term “machine readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) configured to store the one or more instructions 1024.
- the term “machine readable medium” may include any medium that is capable of storing, encoding, or carrying instructions for execution by the machine 1000 and that cause the machine 1000 to perform any one or more of the processes of the present disclosure, or that is capable of storing, encoding or carrying data structures used by or associated with such instructions.
- Non-limiting machine readable medium examples may include solid-state memories, and optical and magnetic media.
- the instructions 1024 may further be transmitted or received over a communications network 1026 using a transmission medium via the network interface device 1020 utilizing any one of a number of transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.).
- Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the Internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone Service (POTS) networks, and wireless data networks (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, IEEE 802.16 family of standards known as WiMax®), IEEE 802.15.4 family of standards, and peer-to-peer (P2P) networks, among others.
- the network interface device 1020 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more antennas to connect to the communications network 1026.
- the network interface device 1020 may include a plurality of antennas to wirelessly communicate using at least one of single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) processes.
- the term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding or carrying instructions for execution by the machine 1000, and includes digital or analog communications signals or other intangible medium to facilitate communication of such software.
- Method examples described herein may be machine or computer-implemented at least in part. Some examples may include a computer-readable medium or machine-readable medium encoded with instructions operable to configure an electronic device to perform methods as described in the above examples.
- An implementation of such methods may include code, such as microcode, assembly language code, a higher-level language code, or the like. Such code may include computer readable instructions for performing various methods. The code may form portions of computer program products. Further, in an example, the code may be tangibly stored on one or more volatile, non-transitory, or non-volatile tangible computer-readable media, such as during execution or at other times.
- Examples of these tangible computer-readable media may include, but are not limited to, hard disks, removable magnetic disks, removable optical disks (e.g., compact disks and digital video disks), magnetic cassettes, memory cards or sticks, random access memories (RAMs), read only memories (ROMs), and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Processing Or Creating Images (AREA)
Abstract
Description
- A direction unit vector—this defines the direction that each point may move within the two-coordinate space (although some points may move in the negative of that direction)
- A “zero point”—a point in the time series where there may be no movement (where the direction unit vector may be scaled to zero)
- A scaling function—this outputs the distance in the direction of the direction unit vector that all points may move
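- The three elements listed above can be combined in a short sketch. It is illustrative only: it assumes 2D points, a per-point sign of +1 or -1 for points that move in the negative of the direction, and a scaling function of time that returns zero at the zero point. The names `displace_points`, `signs`, and `scaling_fn` are hypothetical and do not come from the disclosure.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def displace_points(
    points: Sequence[Point],
    direction: Point,
    signs: Sequence[int],
    scaling_fn: Callable[[float], float],
    t: float,
) -> List[Point]:
    """Move each point along a shared direction by a time-dependent distance."""
    if len(points) != len(signs):
        raise ValueError("points and signs must have the same length")
    norm = math.hypot(direction[0], direction[1])
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    # Normalize so the direction is a unit vector.
    ux, uy = direction[0] / norm, direction[1] / norm
    # Distance along the unit vector at time t; zero at the zero point.
    distance = scaling_fn(t)
    return [
        (x + s * distance * ux, y + s * distance * uy)
        for (x, y), s in zip(points, signs)
    ]


# Example: a made-up linear scaling function whose zero point is t = 2.0.
def linear_scale(t: float) -> float:
    return 5.0 * (t - 2.0)


start = [(0.0, 0.0), (1.0, 1.0)]
at_zero_point = displace_points(start, (3.0, 4.0), [1, -1], linear_scale, 2.0)
one_step_later = displace_points(start, (3.0, 4.0), [1, -1], linear_scale, 3.0)
```

- At the zero point the scaled displacement is zero, so every point stays where it started; away from it, the second point moves opposite to the first because its sign is -1.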
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/425,623 US11017576B2 (en) | 2018-05-30 | 2019-05-29 | Reference model predictive tracking and rendering |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201862678073P | 2018-05-30 | 2018-05-30 | |
| US16/425,623 US11017576B2 (en) | 2018-05-30 | 2019-05-29 | Reference model predictive tracking and rendering |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20200058148A1 (en) | 2020-02-20 |
| US11017576B2 (en) | 2021-05-25 |
Family
ID=69522991
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/425,623 Active US11017576B2 (en) | 2018-05-30 | 2019-05-29 | Reference model predictive tracking and rendering |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US11017576B2 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11341865B2 (en) | 2017-06-22 | 2022-05-24 | Visyn Inc. | Video practice systems and methods |
| TWI790152B (en) * | 2022-03-31 | 2023-01-11 | 博晶醫電股份有限公司 | Movement determination method, movement determination device and computer-readable storage medium |
Families Citing this family (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3205379A4 (en) * | 2014-10-10 | 2018-01-24 | Fujitsu Limited | Skill determination program, skill determination method, skill determination device, and server |
| JP6707111B2 (en) * | 2018-07-25 | 2020-06-10 | 株式会社バーチャルキャスト | Three-dimensional content distribution system, three-dimensional content distribution method, computer program |
| US10860845B2 (en) * | 2018-10-22 | 2020-12-08 | Robert Bosch Gmbh | Method and system for automatic repetitive step and cycle detection for manual assembly line operations |
| US10620713B1 (en) * | 2019-06-05 | 2020-04-14 | NEX Team Inc. | Methods and systems for touchless control with a mobile device |
| US10997766B1 (en) * | 2019-11-06 | 2021-05-04 | XRSpace CO., LTD. | Avatar motion generating method and head mounted display system |
| US12138543B1 (en) * | 2020-01-21 | 2024-11-12 | Electronic Arts Inc. | Enhanced animation generation based on generative control |
| CN112507953B (en) * | 2020-12-21 | 2022-10-14 | 重庆紫光华山智安科技有限公司 | Target searching and tracking method, device and equipment |
| US11763527B2 (en) * | 2020-12-31 | 2023-09-19 | Oberon Technologies, Inc. | Systems and methods for providing virtual reality environment-based training and certification |
| US11830121B1 (en) | 2021-01-26 | 2023-11-28 | Electronic Arts Inc. | Neural animation layering for synthesizing martial arts movements |
| US11562523B1 (en) | 2021-08-02 | 2023-01-24 | Electronic Arts Inc. | Enhanced animation generation based on motion matching using local bone phases |
| US12374014B2 (en) | 2021-12-07 | 2025-07-29 | Electronic Arts Inc. | Predicting facial expressions using character motion states |
| US12322015B2 (en) | 2021-12-14 | 2025-06-03 | Electronic Arts Inc. | Dynamic locomotion adaptation in runtime generated environments |
| CN114245177B (en) * | 2021-12-17 | 2024-01-23 | 智道网联科技(北京)有限公司 | Smooth display method and device of high-precision map, electronic equipment and storage medium |
| US12205214B2 (en) | 2022-02-23 | 2025-01-21 | Electronic Arts Inc. | Joint twist generation for animation |
| US12403400B2 (en) | 2022-03-31 | 2025-09-02 | Electronic Arts Inc. | Learning character motion alignment with periodic autoencoders |
| US12508508B2 (en) | 2023-09-21 | 2025-12-30 | Electronic Arts Inc. | Motion-inferred player characteristics |
Patent Citations (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030077556A1 (en) * | 1999-10-20 | 2003-04-24 | French Barry J. | Education system challenging a subject's physiologic and kinesthetic systems to synergistically enhance cognitive function |
| US20120206577A1 (en) | 2006-01-21 | 2012-08-16 | Guckenberger Elizabeth T | System, method, and computer software code for mimic training |
| US20170134639A1 (en) * | 2006-12-01 | 2017-05-11 | Lytro, Inc. | Video Refocusing |
| US20170069125A1 (en) * | 2009-03-20 | 2017-03-09 | Microsoft Technology Licensing, Llc | Chaining animations |
| US20160101321A1 (en) | 2010-11-05 | 2016-04-14 | Nike, Inc. | Method and System for Automated Personal Training |
| US20140193132A1 (en) | 2013-01-07 | 2014-07-10 | Samsung Electronics Co., Ltd. | Method and apparatus for controlling contents in electronic device |
| US20150089551A1 (en) | 2013-09-20 | 2015-03-26 | Echostar Technologies L. L. C. | Environmental adjustments to perceive true content |
| US20160179206A1 (en) | 2014-12-22 | 2016-06-23 | Tim LaForest | Wearable interactive display system |
| US20160267577A1 (en) * | 2015-03-11 | 2016-09-15 | Ventana 3D, Llc | Holographic interactive retail system |
| US20180047200A1 (en) | 2016-08-11 | 2018-02-15 | Jibjab Media Inc. | Combining user images and computer-generated illustrations to produce personalized animated digital avatars |
| US20190019321A1 (en) | 2017-07-13 | 2019-01-17 | Jeffrey THIELEN | Holographic multi avatar training system interface and sonification associative training |
| US10679396B2 (en) | 2017-07-13 | 2020-06-09 | Visyn Inc. | Holographic multi avatar training system interface and sonification associative training |
| US20200273229A1 (en) | 2017-07-13 | 2020-08-27 | Visyn Inc. | Holographic multi avatar training system interface and sonification associative training |
Non-Patent Citations (10)
| Title |
|---|
| "U.S. Appl. No. 15/931,144, Non Final Office Action dated Oct. 27, 2020", 20 pgs. |
| "U.S. Appl. No. 16/035,280, Advisory Action dated Mar. 23, 2020", 4 pgs. |
| "U.S. Appl. No. 16/035,280, Examiner Interview Summary dated Mar. 6, 2020", 3 pgs. |
| "U.S. Appl. No. 16/035,280, Final Office Action dated Jan. 13, 2020", 22 pgs. |
| "U.S. Appl. No. 16/035,280, Non Final Office Action dated Aug. 30, 2019", 20 pages. |
| "U.S. Appl. No. 16/035,280, Notice of Allowance dated Apr. 22, 2020", 9 pgs. |
| "U.S. Appl. No. 16/035,280, Response filed Apr. 13, 2020 to Advisory Action dated Mar. 23, 2020", 10 pgs. |
| "U.S. Appl. No. 16/035,280, Response filed Dec. 30, 2019 to Non Final Office Action dated Aug. 30, 2019", 8 pgs. |
| "U.S. Appl. No. 16/035,280, Response filed Mar. 13, 2020 to Final Office Action dated Jan. 13, 2020", 9 pgs. |
| U.S. Appl. No. 16/035,280, filed Jul. 13, 2018, Holographic Multi Avatar Training System Interface and Sonification Associative Training. |
Also Published As
| Publication number | Publication date |
|---|---|
| US20200058148A1 (en) | 2020-02-20 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US11017576B2 (en) | Reference model predictive tracking and rendering | |
| AU2014277220B2 (en) | Online modeling for real-time facial animation | |
| CN105981075B (en) | Efficient Facial Landmark Tracking Using Online Shape Regression Methods | |
| US8730246B2 (en) | Real-time goal space steering for data-driven character animation | |
| CN111540055A (en) | Three-dimensional model driving method, device, electronic device and storage medium | |
| US11763508B2 (en) | Disambiguation of poses | |
| Gan et al. | Omniavatar: Efficient audio-driven avatar video generation with adaptive body animation | |
| US11861777B2 (en) | Using a determined optimum pose sequence to generate a corresponding sequence of frames of animation of an animation character | |
| JP7564378B2 (en) | Robust Facial Animation from Video Using Neural Networks | |
| US10026210B2 (en) | Behavioral motion space blending for goal-oriented character animation | |
| US20250078377A1 (en) | Body tracking from monocular video | |
| CN119359870B (en) | Methods, apparatus, devices and storage media for generating multi-pose portrait animations | |
| EP4123588A1 (en) | Image processing device and moving-image data generation method | |
| CN117201754B (en) | Picture real-time switching method and related device based on software frame rate | |
| US20240135618A1 (en) | Generating artificial agents for realistic motion simulation using broadcast videos | |
| US20250005965A1 (en) | Extraction of human poses from video data for animation of computer models | |
| US20250069259A1 (en) | Real-time extraction of human poses from video for animation of avatars | |
| US20250218092A1 (en) | Linear blend skinning rig for animation | |
| US12555296B2 (en) | Adapting simulated character interactions to different morphologies and interaction scenarios | |
| US12505635B2 (en) | Determination and display of inverse kinematic poses of virtual characters in a virtual environment | |
| US20250363702A1 (en) | Composite avatar | |
| EP4648013A1 (en) | Adapting simulated character interactions to different morphologies and interaction scenarios | |
| US20250218120A1 (en) | Dynamic head generation for animation | |
| US20250218093A1 (en) | Dynamic head generation for animation | |
| Fang et al. | Dynamic 3D Human Body Shape and Pose Estimation from Multiple Views Based on DQN | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | AS | Assignment | Owner name: VISYN INC., MINNESOTA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BLAYLOCK, ANDREW JOHN;THIELEN, JEFFREY;SIGNING DATES FROM 20190604 TO 20190611;REEL/FRAME:051676/0806 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | FEPP | Fee payment procedure | Free format text: SURCHARGE FOR LATE PAYMENT, SMALL ENTITY (ORIGINAL EVENT CODE: M2554); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 4 |