WO2011042230A1 - Head pose estimation - Google Patents


Info

Publication number
WO2011042230A1
Authority
WO
WIPO (PCT)
Application number
PCT/EP2010/060431
Other languages
French (fr)
Inventor
Andreas Launila
Josephine Sullivan
Eric Hayman
Martin Brogren
Original Assignee
Svenska Tracab Ab
Application filed by Svenska Tracab Ab

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • the invention relates to 3D reconstruction and analysis of team sport games. More specifically, the invention relates to estimating the pose of a body part of a team sport player.
  • the head of the player is located in a video frame and the sub-frame including the head is subsequently analyzed. If head pose estimation is performed on low-resolution video footage, the method must perform well with sub-frames of sizes about 20x20 pixels or smaller.
  • Known methods for low-resolution head pose estimation are based on skin detection in combination with support vector machines (SVM), nearest neighbor models, neural networks, probabilistic models, tree-based models, and boosting.
  • Information from bottom-up head pose estimation may also be combined with top-down information, such as information about the orientation of the player's body.
  • it is an object of the present invention to provide a method for estimating the pose of a body part of a team sport player using a machine learning technique.
  • these and other objects are achieved by means of a method for estimating the pose of a body part of a team sport player defined in independent claim 1, by means of a computer program product according to independent claim 12, and by means of a system for estimating the head pose of a football player according to independent claims 13 and 15.
  • Embodiments of the invention are characterized by the dependent claims.
  • a method for estimating the pose of a body part of a team sport player uses a machine learning technique.
  • the method comprises the steps of extracting a set of features from tracking data and determining an estimate for the pose.
  • the set of features comprises at least one of a position of the player and a position of a ball.
  • the estimate for the pose is determined by applying a trained classifier to the set of features.
  • the classifier is associated with the machine learning technique.
  • a computer program product comprises a computer usable medium that has a computer readable program code embodied therein.
  • the computer readable program code is adapted to be executed to implement the method according to the first aspect of the invention.
  • a system for estimating the pose of a body part of a team sport player using a machine learning technique comprises a tracking unit, a feature extracting unit, and an estimation unit.
  • the tracking unit is configured for determining at least one of the position of the player, the position of a ball, and the positions of other players.
  • the feature extracting unit is configured for extracting a first set of features.
  • the first set of features is extracted from the positions.
  • the estimation unit is configured for determining an estimate for the pose.
  • the estimate is determined by applying a trained classifier to the first set of features.
  • the classifier is associated with the machine learning technique.
  • the system further comprises a video camera and a body part appearance unit.
  • the tracking unit uses video frames received from the video camera.
  • the body part appearance unit is configured for analyzing an appearance of the body part.
  • the appearance is derived from the video frames.
  • the feature extracting unit is configured for extracting a first set of features and, according to the embodiment, also a second set of features.
  • the first set of features is extracted from the positions.
  • the second set of features is extracted from the appearance.
  • the feature extracting unit according to the embodiment is further configured for combining the first set of features and the second set of features.
  • the estimation unit is further configured for determining an estimate for the pose.
  • the estimate is determined by applying a trained classifier to the combined set of features.
  • the classifier is associated with the machine learning technique.
  • the system comprises a video camera, a tracking unit, a body part appearance unit, a feature extracting unit, and an estimation unit.
  • the tracking unit is configured for determining at least one of the position of the player, the position of a ball, and the positions of other players.
  • the tracking unit uses video frames received from the video camera.
  • the body part appearance unit is configured for analyzing an appearance of the body part.
  • the appearance is derived from the video frames.
  • the feature extracting unit is configured for extracting a first set of features and a second set of features. The first set of features is extracted from the positions. The second set of features is extracted from the appearance.
  • the estimation unit is configured for determining a first estimate for the pose and for determining a second estimate for the pose.
  • the first estimate is determined by applying a trained first classifier to the first set of features.
  • the second estimate is determined by applying a trained second classifier to the second set of features.
  • the first classifier is associated with a first machine learning technique and the second classifier is associated with a second machine learning technique.
  • the estimation unit is further configured for combining the first estimate and the second estimate.
  • An embodiment of the invention may use any machine learning technique, e.g., support vector machines (SVM), nearest neighbor models, neural networks, probabilistic models, tree-based models, or boosting.
  • the body part may, e.g., be the head or the torso of the player.
  • the method may be applied to any team sport, e.g., football, handball, basketball, ice-hockey, or polo.
  • the present invention makes use of an understanding that the pose of a body part of a team sport player may be estimated using information pertaining to how the body part could be oriented. This is referred to as a top-down approach. If, e.g., the head pose of a player is to be estimated, information pertaining to where the player could be looking is utilized. In this case the head pose of the player is used as an approximation for the direction in which the player is looking. Such information may, e.g., comprise the position of the player with respect to the playing field. From this information one may estimate, using machine learning methods, in which direction the player is most likely to look. For instance, a player who is located close to the other team's goal is more likely to be looking towards that goal than in any other direction. The information may also comprise the position of a ball, or a puck, with respect to the player. In this case the player would be more likely to look in the direction of the ball than in any other direction.
  • the additional information may be derived from tracking data comprising at least the position of the player or the position of a ball.
  • the tracking data comprises both the position of the player and the position of a ball as a function of time.
  • Tracking data may be obtained by analyzing video footage, by utilizing GPS receivers which the players are equipped with, or by using transponders.
  • the method according to the first aspect of the invention is advantageous in that it is computationally light. Thus, it can be performed, if implemented on a computer, in real-time.
  • Estimated head poses may be utilized, e.g., for analyzing a game or for rendering 3D animations of the players.
  • a position is defined in an absolute manner with respect to a suitable coordinate system.
  • a coordinate system may, e.g., be defined with respect to the playing field.
  • a position can also be defined in a relative manner, e.g., with respect to a certain point of reference.
  • the ball has an absolute position with respect to the coordinate system. It is this absolute position which is extracted from tracking data.
  • the position of the ball with respect to the player, i.e., the relative position of the ball, is the position used in connection with machine learning techniques, since it is this relative position that determines in which direction the player is likely to look.
  • Such relative positions can be calculated from absolute positions if the reference position is known. Instead of using a Cartesian coordinate system one may describe a relative position by a direction, i.e., by an angle with respect to a direction of reference, and a distance.
  • the direction of reference may, e.g., be a line of symmetry of the playing field, a camera viewing direction, or any other direction.
  • the pose of a body part may be described by an angle defined with respect to a direction of reference.
  • a number of bins are used for classification, each bin covering a certain angular range such that the total of all bins covers the whole range, i.e., 360 degrees.
  • a velocity is assumed to be a vector quantity, i.e., specifying the speed and the direction of motion.
  • the pose of a body part of a team sport player, e.g., the head pose of a football player, is estimated by applying a trained classifier of a machine learning technique.
  • a classifier associated with a machine learning method may be trained in a supervised or in an unsupervised manner. If supervised training is employed, a labeled set of features is used for training.
  • although the invention is in some cases described with respect to head poses, corresponding embodiments for estimating the pose of other body parts, in particular the torso, may be constructed. Further, the invention is in some cases described with reference to football, but embodiments for other team sports may be constructed.
  • the set of features further comprises positions of other players. Taking into account the positions of other players with respect to the player is advantageous in that a more reliable estimate of the player's head pose can be obtained, since the player is likely to watch the actions of the other players.
  • the number of other players which are taken into account may be limited to players which are within a certain distance from the player or within a certain region of interest. Such a region of interest may, e.g., be confined to the region between the player and one of the goals.
  • the set of features further comprises at least one of a velocity of the player, a velocity of the ball, and velocities of other players.
  • taking into account the velocity of the player, the ball, and other players, i.e., the dynamical aspect of a game, is advantageous in that a more reliable estimate of the head pose may be obtained as compared to considering only the static aspect of the game, i.e., the positions of the player, the ball, and the other players. For example, the player is more likely to look in his own direction of motion or towards the region of the playing field to which the ball is moving.
  • the tracking data is derived from video frames.
  • the video frames may, e.g., be derived from video footage of a team sport game.
  • the video frames may be extracted from low-resolution video footage. This is advantageous since such footage is easily obtained from one or several video cameras placed near the playing field.
  • Video cameras may, e.g., be placed outside the playing field such that a side-view is obtained, or they may be placed over the playing field such that a top-view is obtained.
  • Tracking data may also be obtained by other means, for example using GPS based or transponder based tracking devices which the players are equipped with. Tracking data obtained from different sources may be combined.
  • from the tracking data, the distance between the player and another entity, e.g., the ball, the puck, or another player, as well as the direction of the other entity with respect to the player, can be obtained.
  • velocities may be derived from tracking data extracted from a set of sequential video frames.
  • the set of features further comprises a camera angle. Taking into account the viewing angle of the camera that produced the video footage from which the tracking data is derived makes it possible to compensate for different camera angles if video footage from several cameras is used.
  • the set of features further comprises features strongly linked to the team sport.
  • the set of features may comprise the position of a goal, a basket, or a net, the position and/or velocity of one or several referees, and the position of one or several coaches.
  • Further examples of additional features are the position of the ball relative to the estimated player's goal, the side of the ball the player is on relative to his goal, i.e., behind or in front of the ball, the estimated head pose of the player in possession of the ball, the distance from the player in possession of the ball to the estimated player's goal, the team in possession of the ball, whether the team of the estimated player is attacking, defending, or neither, the position of the estimated player relative to his defending goal, the strategic position of the estimated player, e.g., attacker, goalkeeper, or inner midfield, the head pose estimates of all other players in the same team and/or in the opposite team as the estimated player, and the head pose estimates of the other players near the estimated player.
  • Using features which are strongly linked to the team sport is advantageous since it reduces the prediction error in estimating the head pose by taking into account the attacking and defending aspect of the game, which is the core of many team sports such as football.
  • the set of features further comprises features pertaining to an appearance of the body part.
  • the features pertaining to the team sport are merged with features pertaining to appearance of a body part, e.g., the head.
  • the merged set of features may then be used with a common trained classifier. Merging the features from the top-down approach with the features from the bottom-up approach and using a common trained classifier is advantageous in that the prediction error for estimating the pose may be reduced.
  • the estimate for the pose is combined with an estimate for the pose determined using the appearance of the body part.
  • Combining the estimate from a top-down approach with an estimate from a bottom-up approach is advantageous since it may decrease the prediction error for estimating the pose.
  • the estimate for the pose determined using the appearance of the body part may be determined using the same machine learning technique as the estimate obtained from tracking data. The two estimates may also be obtained using different machine learning techniques.
  • two separate trained classifiers are used.
  • One classifier is used for situations when the set of features comprises the position of a ball.
  • Another, independent, trained classifier is used for situations when the set of features does not comprise the position of a ball.
  • Using separate trained classifiers is advantageous since it reduces the prediction error in estimating the pose by taking into account the presence of the ball.
  • the ball may, e.g., be absent from the tracking data if the ball could not be identified in the underlying video footage. This might, e.g., be the case if the ball is obscured by a player, or if the ball is
  • a plurality of separate trained classifiers is used for the different types of situations a game can be in.
  • Different types of situations may, e.g., be corner-kick, throw-in, counter-attack, penalty, cross, free-kick, ball out-of-play or ball in-play, goal attempt, attacking, and defending.
  • Using several separate trained classifiers is advantageous since it reduces the prediction error in estimating the pose by taking into account the situation the game is in.
  • e.g., the player's attention might be focused on the ball in some situations, whereas he is more likely to look at the goalkeeper in others.
  • the poses of several body parts are estimated jointly. This is advantageous since the poses of body parts may be correlated due to human anatomy. This is, e.g., the case for the head and the torso of a team sport player.
  • Fig. 1 shows a football field.
  • Fig. 2 shows a system in accordance with an embodiment of the invention.
  • Fig. 1 shows a football field 100 with players 110₁-110₅ of a first team, players 120₁-120₅ of a second team, and a football 130. For simplicity, only five players of each team are shown.
  • An embodiment of the invention may be used to estimate the head pose of a football player, e.g., player 110₅, using a machine learning technique.
  • the head pose may, e.g., be defined as the angle of the player's head with respect to the touch line 101. However, any other direction may be used as reference.
  • bins typically are used for the classification. As an example, one may use eight bins, each bin covering an angular range of 45 degrees.
  • the head pose of player 110₅ is estimated by applying a trained classifier to a set of features extracted from tracking data.
  • the set of features comprises at least the position of the player 110₅ or the position of the ball 130, and preferably both.
  • the set of features may further comprise the positions of other players, e.g., the positions of the players 120₁-120₅ of the other team and/or the positions of the players 110₁-110₄ of the player's own team.
  • the set of features may also comprise further features, such as the velocities of the player, the ball, and the other players, or any other feature linked to football.
  • the result of the estimation, i.e., of applying a machine learning method to the set of features, is the bin with the largest likelihood. This bin
  • trained SVM classifiers may be used for estimating the head pose.
  • SVMs are a reliable and fast machine learning technique.
  • other machine learning techniques, such as nearest neighbor models, neural networks, probabilistic models, tree-based models, or boosting, may also be used.
  • the trained classifier used for estimating the head pose may be trained on tracking data using a supervised or an unsupervised approach. If a supervised approach is used, tracking data extracted from video footage may be used in connection with a supervisor identifying the player's head poses by inspection of the video frames.
  • System 200 comprises a video camera 201 , a tracking unit 202, a body part appearance unit 203, a feature extracting unit 204, and an estimation unit 205.
  • the video camera 201 may be used to generate video footage of a football match, e.g., the scenery sketched in Fig. 1 .
  • the tracking unit 202 may use the video footage for extracting tracking data, e.g., the positions and/or velocities of the players and the ball.
  • the body part appearance unit 203 may analyze the head appearance in a bottom-up approach using a machine learning technique.
  • the feature extracting unit 204 may extract a first set of features, pertaining to the positions and/or velocities of the player, the ball, and the other players, and a second set of features, pertaining to the appearance of the head.
  • the feature extracting unit 204 may further merge the first and the second set of features.
  • the estimation unit 205 estimates the head pose, i.e., the most likely bin, using a machine learning technique. This is achieved by applying a trained classifier of the machine learning technique to the merged set of features obtained from the feature extracting unit 204.
  • the system described with reference to Fig. 2 combines the bottom-up approach, using the appearance of the head as an input for the machine learning technique, with the top-down approach, using the positions and/or velocities of the player, the ball, and the other players as an input.
  • System 200 achieves this by merging the two sets of features, one pertaining to the bottom-up approach and one pertaining to the top-down approach, respectively, and by applying a trained classifier to the merged set of features.
  • the advantage of combining the two approaches is that a more reliable estimate may be obtained, i.e., the prediction error may be reduced and ambiguities may be resolved.
  • the bottom-up approach and the top-down approach may be combined in a different way, in accordance with another embodiment of the invention.
  • the estimating unit 205 may apply two different trained classifiers separately to the two sets of features and combine the obtained head pose estimates.
  • the estimating unit 205 may apply a first trained classifier to the first set of features obtained from the feature extracting unit 204 to obtain a first estimate for the head pose from tracking data, i.e., an estimate obtained from a top-down approach.
  • the estimating unit 205 may apply a second trained classifier to the second set of features obtained from the feature extracting unit 204 to obtain a second estimate for the head pose from the appearance of the head, i.e., an estimate obtained from a bottom-up approach.
  • the estimating unit 205 may combine the two estimates, e.g., by calculating a weighted average.
  • the advantage of combining the two approaches is that a more reliable estimate may be obtained, i.e., the prediction error may be reduced and ambiguities may be resolved.
  • the two estimates may be obtained using the same machine learning technique.
  • the two estimates may also be obtained using two different machine learning techniques.
  • a system may be provided using only the top-down approach. Such a system would not need to include a video camera 201 or a body part appearance unit 203.
  • the system comprises a tracking unit 202, a feature extracting unit 204, and an estimation unit 205.
  • the tracking unit 202 may use tracking data from any type of tracking device, e.g., the positions and/or velocities of the players and the ball.
  • the feature extracting unit 204 may extract a first set of features, pertaining to the positions and/or velocities of the player, the ball, and the other players.
  • the estimation unit 205 estimates the head pose, i.e., the most likely bin, using a machine learning technique. This is achieved by applying a trained classifier of the machine learning technique to the first set of features obtained from the feature extracting unit 204.
  • a more reliable estimate of the pose of a body part may be obtained by estimating the poses of several body parts jointly.
  • One way of jointly estimating dependent labels is sequence labeling, where one assumes that the labels have a sequence structure.
  • This problem can be solved by a wide array of algorithms, including Conditional Random Fields (CRFs), Hidden Markov Models, Max Margin Markov Networks and Structured SVMs.
  • a wide range of approaches can be used and three examples are explained below.
  • the first, straightforward approach to joint estimation is to construct a new problem where one label is assigned to each possible combination of the old labels, transforming the joint classification problem into a multiclass problem.
  • the second approach is to use a CRF to estimate all labels while taking interdependencies into account.
  • a feature extraction step can be performed first where SVMs estimate the probability distribution independently. The probability estimates are then given to the CRF.
  • the third approach is to replace the CRF with one SVM per body part, leading to a multi-layer SVM.
  • the SVMs are trained to predict the body part poses, given estimated probability distributions for the labels.
  • although embodiments have been described with reference to the head pose of a football player, corresponding embodiments may be constructed for body parts other than heads and for team sports other than football.
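The first joint-estimation approach listed above (one new label per combination of old labels) can be sketched as a pair of encode/decode helpers. This is a minimal illustration, not the patent's implementation: it assumes two body parts (head and torso) and eight angular bins per part, and the function names are hypothetical. Any multiclass classifier can then be trained on the joint labels.

```python
# Hypothetical sketch: joint (head, torso) pose estimation recast as one
# multiclass problem by giving every combination of per-part bins its own label.
NUM_BINS = 8  # assumed: eight 45-degree bins per body part


def encode_joint(head_bin: int, torso_bin: int, num_bins: int = NUM_BINS) -> int:
    """Map a (head, torso) bin pair to a single joint class in [0, num_bins**2)."""
    if not (0 <= head_bin < num_bins and 0 <= torso_bin < num_bins):
        raise ValueError("bin index out of range")
    return head_bin * num_bins + torso_bin


def decode_joint(joint_label: int, num_bins: int = NUM_BINS) -> tuple:
    """Recover the (head, torso) bin pair from a joint class label."""
    return divmod(joint_label, num_bins)
```

With eight bins per part this yields 64 joint classes; the price of the simple encoding is that the number of classes grows multiplicatively with each additional body part.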

Abstract

A method for estimating the pose of a body part of a team sport player using a machine learning technique is provided. The method comprises the steps of extracting a set of features from tracking data and determining an estimate for the pose by applying a trained classifier to the set of features. The set of features comprises at least one of the position of the player and the position of a ball. Further, a system (200) for estimating the pose of a body part of a team sport player is provided. The system comprises a video camera (201), a tracking unit (202), a body part appearance unit (203), a feature extracting unit (204), and an estimation unit (205).

Description

HEAD POSE ESTIMATION.
Field of the invention
The invention relates to 3D reconstruction and analysis of team sport games. More specifically, the invention relates to estimating the pose of a body part of a team sport player.
Background of the invention
Known methods for estimating the head pose from video footage are based on the appearance of the player's head.
Typically, in a bottom-up approach, the head of the player is located in a video frame and the sub-frame including the head is subsequently analyzed. If head pose estimation is performed on low-resolution video footage, the method must perform well with sub-frames of sizes about 20x20 pixels or smaller. Known methods for low-resolution head pose estimation are based on skin detection in combination with support vector machines (SVM), nearest neighbor models, neural networks, probabilistic models, tree-based models, and boosting.
Information from bottom-up head pose estimation may also be combined with top-down information, such as information about the orientation of the player's body.
Summary of the invention
It is an object of the present invention to provide a more efficient alternative to the above techniques and prior art.
More specifically, it is an object of the present invention to provide a method for estimating the pose of a body part of a team sport player using a machine learning technique. These and other objects of the present invention are achieved by means of a method for estimating the pose of a body part of a team sport player defined in independent claim 1, by means of a computer program product according to independent claim 12, and by means of a system for estimating the head pose of a football player according to independent claims 13 and 15. Embodiments of the invention are characterized by the dependent claims.
According to a first aspect of the invention, a method for estimating the pose of a body part of a team sport player is provided. The method uses a machine learning technique. The method comprises the steps of extracting a set of features from tracking data and determining an estimate for the pose. The set of features comprises at least one of a position of the player and a position of a ball. The estimate for the pose is determined by applying a trained classifier to the set of features. The classifier is associated with the machine learning technique.
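The two steps of the first aspect (extract features from tracking data, then apply a trained classifier) can be sketched as follows. This is a sketch under stated assumptions, not the claimed method: the feature layout is hypothetical, and a k-nearest-neighbour vote (one of the techniques the document lists) stands in for the SVM, so that the example needs only the standard library. The head pose is assumed to be quantised into eight angular bins, as suggested elsewhere in the document, and the estimate is the bin with the largest likelihood.

```python
import math
from collections import Counter

NUM_BINS = 8  # assumed: eight bins of 45 degrees each


def extract_features(player_xy, ball_xy):
    """Hypothetical feature vector: the ball position relative to the player
    (dx, dy) followed by the player's absolute field position (x, y)."""
    px, py = player_xy
    bx, by = ball_xy
    return [bx - px, by - py, px, py]


def knn_bin_likelihoods(train_X, train_bins, query, k=3):
    """k-nearest-neighbour vote over labelled training features,
    normalised into one likelihood per head pose bin."""
    if not train_X:
        raise ValueError("no training data")
    k = min(k, len(train_X))
    # Sort training samples by Euclidean distance to the query.
    nearest = sorted((math.dist(x, query), b) for x, b in zip(train_X, train_bins))
    votes = Counter(b for _, b in nearest[:k])
    return [votes.get(b, 0) / k for b in range(NUM_BINS)]


def estimate_bin(likelihoods):
    """The estimate for the pose is the bin with the largest likelihood."""
    return max(range(len(likelihoods)), key=likelihoods.__getitem__)
```

Swapping in an SVM or another classifier would only change `knn_bin_likelihoods`; the feature extraction and the arg-max over bins stay the same.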
According to a second aspect of the invention, a computer program product is provided. The computer program product comprises a computer usable medium that has a computer readable program code embodied therein. The computer readable program code is adapted to be executed to implement the method according to the first aspect of the invention.
According to a third aspect of the invention, a system for estimating the pose of a body part of a team sport player using a machine learning technique is provided. The system comprises a tracking unit, a feature extracting unit, and an estimation unit. The tracking unit is configured for determining at least one of the position of the player, the position of a ball, and the positions of other players. The feature extracting unit is configured for extracting a first set of features. The first set of features is extracted from the positions. The estimation unit is configured for determining an estimate for the pose. The estimate is determined by applying a trained classifier to the first set of features. The classifier is associated with the machine learning technique.
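Because the tracking unit determines "at least one of" the player and ball positions, the ball may be missing from some frames. The document describes using two independent trained classifiers for the cases with and without a ball. A minimal sketch of an estimation unit that dispatches between them is shown below; the class name, the feature layouts, and the callable-classifier interface are illustrative assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

# Any trained model that maps a feature vector to a pose bin.
Classifier = Callable[[Sequence[float]], int]


class EstimationUnit:
    """Hypothetical estimation unit holding two independently trained
    classifiers: one for frames in which the ball was tracked and one for
    frames in which it was not."""

    def __init__(self, with_ball: Classifier, without_ball: Classifier):
        self.with_ball = with_ball
        self.without_ball = without_ball

    def estimate(self, player_xy: Tuple[float, float],
                 ball_xy: Optional[Tuple[float, float]]) -> int:
        px, py = player_xy
        if ball_xy is None:
            # Ball absent from the tracking data (e.g., occluded by a player):
            # use the classifier trained on position-only features.
            return self.without_ball([px, py])
        bx, by = ball_xy
        # Ball available: relative ball position plus the player position.
        return self.with_ball([bx - px, by - py, px, py])
```

The same dispatch pattern extends to the per-situation classifiers the document mentions (corner-kick, throw-in, and so on) by keying a dictionary of classifiers on a situation label.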
According to an embodiment of the invention, the system further comprises a video camera and a body part appearance unit. According to the embodiment, the tracking unit uses video frames received from the video camera. The body part appearance unit is configured for analyzing an appearance of the body part. The appearance is derived from the video frames. The feature extracting unit is configured for extracting a first set of features and, according to the embodiment, also a second set of features. The first set of features is extracted from the positions. The second set of features is extracted from the appearance. The feature extracting unit according to the embodiment is further configured for combining the first set of features and the second set of features. The estimation unit is further configured for determining an estimate for the pose. According to the embodiment, the estimate is determined by applying a trained classifier to the combined set of features. The classifier is associated with the machine learning technique.
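Combining the first (position-based) and second (appearance-based) feature sets before a single classifier is applied amounts to early fusion. The sketch below assumes, purely for illustration, that the appearance features are the raw intensities of a small grayscale head sub-frame scaled to the range 0 to 1; the document does not prescribe any particular appearance descriptor, and both function names are hypothetical.

```python
def appearance_features(patch):
    """Hypothetical bottom-up descriptor: flatten a small grayscale head
    sub-frame (e.g., about 20x20 pixels, given as rows of 0..255 values)
    and scale the intensities to the range 0..1."""
    return [p / 255.0 for row in patch for p in row]


def merge_features(position_features, appearance_feats):
    """Early fusion: concatenate the top-down and bottom-up feature sets
    into one vector for a common trained classifier."""
    return [*position_features, *appearance_feats]
```

In practice the two parts usually live on very different scales, so some normalisation of the position features is likely needed before a distance- or margin-based classifier sees the merged vector.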
According to a fourth aspect of the invention, another system for estimating the pose of a body part of a team sport player using machine learning techniques is provided. The system comprises a video camera, a tracking unit, a body part appearance unit, a feature extracting unit, and an estimation unit. The tracking unit is configured for determining at least one of the position of the player, the position of a ball, and the positions of other players. The tracking unit uses video frames received from the video camera. The body part appearance unit is configured for analyzing an appearance of the body part. The appearance is derived from the video frames. The feature extracting unit is configured for extracting a first set of features and a second set of features. The first set of features is extracted from the positions. The second set of features is extracted from the appearance. The estimation unit is configured for determining a first estimate for the pose and for determining a second estimate for the pose. The first estimate is determined by applying a trained first classifier to the first set of features. The second estimate is determined by applying a trained second classifier to the second set of features. The first classifier is associated with a first machine learning technique and the second classifier is associated with a second machine learning technique. The estimation unit is further configured for combining the first estimate and the second estimate. An embodiment of the invention may use any machine learning technique, e.g., support vector machines (SVM), nearest neighbor models, neural networks, probabilistic models, tree-based models, or boosting.
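One way to combine the first and second estimates, following the weighted average the document mentions for the estimation unit, is to average the two per-bin likelihood vectors and take the most likely bin. This is a sketch under the assumption that both classifiers output a likelihood for every bin of the same binning; the weight of 0.5 is an arbitrary default, not a value from the source.

```python
def combine_estimates(top_down, bottom_up, weight=0.5):
    """Weighted average of two per-bin likelihood vectors.

    top_down:  likelihoods from the classifier on tracking features
    bottom_up: likelihoods from the classifier on appearance features
    weight:    hypothetical weight of the top-down estimate, in [0, 1]
    Returns the fused distribution and the bin with the largest likelihood.
    """
    if len(top_down) != len(bottom_up):
        raise ValueError("both estimates must use the same bins")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(top_down, bottom_up)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, best
```

Averaging distributions over bins, rather than averaging two raw angles, avoids the wrap-around problem: a naive mean of 350 and 10 degrees is 180 degrees, which points the opposite way.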
The body part may, e.g., be the head or the torso of the player.
The method may be applied to any team sport, e.g., football, handball, basketball, ice-hockey, or polo.
Even though the method is described with respect to a ball, it may also be applied to a team sport involving a similar item, such as a puck.
The present invention makes use of an understanding that the pose of a body part of a team sport player may be estimated using information pertaining to how the body part could be oriented. This is referred to as a top-down approach. If, e.g., the head pose of a player is to be estimated, information pertaining to where the player could be looking is utilized. In this case the head pose of the player is used as an approximation for the direction in which the player is looking. Such information may, e.g., comprise the position of the player with respect to the playing field. From this information one may estimate, using machine learning methods, in which direction the player is most likely to look. For instance, a player who is located close to the other team's goal is more likely to be looking towards that goal than in any other direction. The information may also comprise the position of a ball, or a puck, with respect to the player. In this case the player would be more likely to look in the direction of the ball than in any other direction.
The additional information may be derived from tracking data comprising at least the position of the player or the position of a ball.
Preferably, the tracking data comprises both the position of the player and the position of a ball as a function of time. Tracking data may be obtained by analyzing video footage, by utilizing GPS receivers with which the players are equipped, or by using transponders. The method according to the first aspect of the invention is advantageous in that it is computationally light. Thus, if implemented on a computer, it can be performed in real time.
Estimated head poses may be utilized, e.g., for analyzing a game or for rendering 3D animations of the players. For the purpose of describing the present invention, a position is defined in an absolute manner with respect to a suitable coordinate system. Such a coordinate system may, e.g., be defined with respect to the playing field. It will be appreciated that a position can also be defined in a relative manner, e.g., with respect to a certain point of reference. For example, the ball has an absolute position with respect to the coordinate system. It is this absolute position which is extracted from tracking data. On the other hand, the position of the ball with respect to the player, i.e., the relative position of the ball, is the position used in connection with machine learning techniques, since it is this relative position that determines in which direction the player is likely to look. Such relative positions can be calculated from absolute positions if the reference position is known. Instead of using a Cartesian coordinate system one may describe a relative position by a direction, i.e., by an angle with respect to a direction of reference, and a distance. The direction of reference may, e.g., be a line of symmetry of the playing field, a camera viewing direction, or any other direction.
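As an illustration of the conversion from absolute to relative positions described above, the following Python sketch expresses the ball's position as seen from the player by a direction and a distance. It assumes 2D coordinates in a pitch-fixed Cartesian frame, a direction of reference along the +x axis (e.g., the touch line), and angles measured counter-clockwise in degrees; the function name `relative_polar` is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def relative_polar(player: Point, target: Point, reference_deg: float = 0.0) -> Tuple[float, float]:
    """Return (angle in degrees, distance) of `target` as seen from `player`.

    Both points are absolute positions in a pitch-fixed frame. The angle is
    measured counter-clockwise from the reference direction (by default the
    +x axis) and wrapped into [0, 360).
    """
    dx = target[0] - player[0]
    dy = target[1] - player[1]
    distance = math.hypot(dx, dy)
    angle = (math.degrees(math.atan2(dy, dx)) - reference_deg) % 360.0
    if angle >= 360.0:  # float round-off can yield exactly 360.0
        angle -= 360.0
    return angle, distance
```

The same function works for other entities, such as another player or a goal, by passing their absolute position as `target`.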
The pose of a body part, e.g., the head pose, may be described by an angle defined with respect to a direction of reference. Typically, a number of bins are used for classification, each bin covering a certain angular range such that the total of all bins covers the whole range, i.e., 360 degrees.
A velocity is assumed to be a vector quantity, i.e., specifying the speed and the direction of motion.
To this end, the pose of a body part of a team sport player, e.g., the head pose of a football player, is estimated by applying a trained classifier of a machine learning technique to a set of features extracted from tracking data, the features pertaining to positions of the player, the ball or puck, and the other players.
A classifier associated with a machine learning method may be trained in a supervised or in an unsupervised manner. If supervised training is employed, a labeled set of features is used for training.
Even though the invention is in some cases described with respect to head poses, corresponding embodiments for estimating the pose of other body parts, in particular the torso, may be constructed. Further, the invention is in some cases described with reference to football, but embodiments for other team sports may be constructed.
According to an embodiment of the invention, the set of features further comprises positions of other players. Taking into account the positions of other players with respect to the player is advantageous in that a more reliable estimate of the player's head pose can be obtained, since the player is likely to watch the actions of the other players. The number of other players which are taken into account may be limited to players which are within a certain distance from the player or within a certain region of interest. Such a region of interest may, e.g., be confined to the region between the player and one of the goals.
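A minimal sketch of limiting the other players to those within a certain distance of the player, assuming pitch coordinates in metres and a hypothetical helper `nearby_players`. Sorting the result by distance is an added assumption that gives the resulting features a stable order.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearby_players(player: Point, others: Sequence[Point], radius: float) -> List[Point]:
    """Keep the other players within `radius` of the player, nearest first."""
    def dist(p: Point) -> float:
        return math.hypot(p[0] - player[0], p[1] - player[1])

    close = [p for p in others if dist(p) <= radius]
    return sorted(close, key=dist)
```

A region of interest other than a circle, e.g., the area between the player and one of the goals, could replace the distance test.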
According to an embodiment of the invention, the set of features further comprises at least one of a velocity of the player, a velocity of the ball, and velocities of other players. Taking into account the velocity of the player, the ball, and other players, i.e., the dynamical aspect of a game, is advantageous in that a more reliable estimate of the head pose may be obtained as compared to considering only the static aspect of the game, i.e., the positions of the player, the ball, and the other players. For example, the player is more likely to look in his own direction of motion or towards the region of the playing field to which the ball is moving.
According to an embodiment of the invention, the tracking data is derived from video frames. The video frames may, e.g., be derived from video footage of a team sport game. In particular, the video frames may be extracted from low-resolution video footage. This is advantageous since video footage is easily obtained from one or several video cameras placed near the playing field. Video cameras may, e.g., be placed outside the playing field such that a side-view is obtained, or they may be placed over the playing field such that a top-view is obtained. Tracking data may also be obtained by other means, for example using GPS based or transponder based tracking devices with which the players are equipped. Tracking data obtained from different sources may be combined. From the positions extracted from the tracking data, the distance between the player and another entity, e.g., the ball, the puck, or another player, as well as the direction of the other entity with respect to the player, can be obtained. Further, velocities may be derived from tracking data extracted from a set of sequential video frames.
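Velocities can be derived from positions in sequential frames. The sketch below uses simple finite differences between consecutive frames, which is one possible choice rather than a prescribed method; the frame rate parameter and the function name are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def finite_difference_velocities(positions: Sequence[Point], frame_rate_hz: float) -> List[Point]:
    """Estimate a velocity vector (vx, vy) per frame from sequential positions.

    Frame i >= 1 gets the backward difference (p[i] - p[i-1]) * frame rate;
    frame 0 reuses the first difference so the output has the same length
    as the input.
    """
    if len(positions) < 2:
        raise ValueError("need positions from at least two frames")
    vel = [((x1 - x0) * frame_rate_hz, (y1 - y0) * frame_rate_hz)
           for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
    return [vel[0]] + vel
```

In practice, noisy tracking data would likely call for smoothing over several frames before or after differencing.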
According to an embodiment of the invention, the set of features further comprises a camera angle. Taking into account the viewing angle of the camera that produced the video footage from which the tracking data is derived makes it possible to compensate for different camera angles if video footage from several cameras is used.
According to an embodiment of the invention, the set of features further comprises features strongly linked to the team sport. For example, the set of features may comprise the position of a goal, a basket, or a net, the position and/or velocity of one or several referees, and the position of one or several coaches. Further examples of additional features are the position of the ball relative to the estimated player's goal, the side of the ball the player is on relative to his goal, i.e., behind or in front of the ball, the estimated head pose of the player in possession of the ball, the distance between the player in possession of the ball and the estimated player's goal, the team in possession of the ball, whether the team of the estimated player is attacking, defending, or neither, the position of the estimated player relative to his defending goal, the strategic position of the estimated player, e.g., attacker, goal keeper, or inner midfield, the head pose estimates of all other players in the same team and/or in the opposite team as the estimated player, and the head pose estimates of the other players near the estimated player. Using features which are strongly linked to the team sport is advantageous since it reduces the prediction error in estimating the head pose by taking into account the attacking and defending aspect of the game, which is at the core of many team sports such as football.
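Two of the sport-specific features listed above, the side of the ball the player is on relative to his own goal and the distance of the ball to that goal, could be computed as in the following sketch. It assumes the pitch length runs along the x axis and that the position of the player's own goal is known; the function name and the 0/1 encoding are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def ball_side_features(player: Point, ball: Point, own_goal: Point) -> Tuple[float, float]:
    """Return (behind_ball, ball_to_goal) for one player.

    behind_ball is 1.0 if the player is closer to his own goal along the
    pitch length (x axis) than the ball is, i.e. he is between the ball and
    his goal ("behind" the ball), and 0.0 otherwise. ball_to_goal is the
    Euclidean distance from the ball to the player's own goal.
    """
    player_gap = abs(player[0] - own_goal[0])
    ball_gap = abs(ball[0] - own_goal[0])
    behind_ball = 1.0 if player_gap < ball_gap else 0.0
    ball_to_goal = math.hypot(ball[0] - own_goal[0], ball[1] - own_goal[1])
    return behind_ball, ball_to_goal
```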
According to another embodiment of the invention, the set of features further comprises features pertaining to an appearance of the body part. In other words, the features pertaining to the team sport are merged with features pertaining to appearance of a body part, e.g., the head. The merged set of features may then be used with a common trained classifier. Merging the features from the top-down approach with the features from the bottom-up approach and using a common trained classifier is advantageous in that the prediction error for estimating the pose may be reduced.
According to an embodiment of the invention, the estimate for the pose is combined with an estimate for the pose determined using the appearance of the body part. Combining the estimate from a top-down approach with an estimate from a bottom-up approach is advantageous since it may decrease the prediction error for estimating the pose. The estimate for the pose determined using the appearance of the body part may be determined using the same machine learning technique as the estimate obtained from tracking data. The two estimates may also be obtained using different machine learning techniques.
According to an embodiment of the invention, two separate trained classifiers are used. One classifier is used for situations when the set of features comprises the position of a ball. Another, independent, trained classifier is used for situations when the set of features does not comprise the position of a ball. Using separate trained classifiers is advantageous since it reduces the prediction error in estimating the pose by taking into account the presence of the ball. The ball may, e.g., be absent from the tracking data if the ball could not be identified in the underlying video footage. This might, e.g., be the case if the ball is obscured by a player, or if the ball is indistinguishable from the background.
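The selection between the two independently trained classifiers can be sketched as a dispatch on whether the ball was found in the tracking data. Here each trained classifier is modelled as a plain callable returning a bin index; the names, and the idea of appending ball features to a base feature vector, are assumptions for illustration.

```python
from typing import Callable, Optional, Sequence

# A trained classifier is modelled as any callable mapping a feature vector to a bin index.
Classifier = Callable[[Sequence[float]], int]


def estimate_pose_bin(
    base_features: Sequence[float],
    ball_features: Optional[Sequence[float]],
    with_ball: Classifier,
    without_ball: Classifier,
) -> int:
    """Route the features to the classifier trained for the matching situation.

    `ball_features` is None when the ball could not be identified in the
    tracking data (e.g., obscured by a player).
    """
    if ball_features is None:
        return without_ball(base_features)
    return with_ball(list(base_features) + list(ball_features))
```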
According to an embodiment of the invention, a plurality of separate trained classifiers is used for the different types of situations a game can be in. Different types of situations may, e.g., be corner-kick, throw-in, counter-attack, penalty, cross, free-kick, ball out-of-play or ball in-play, goal attempt, attacking, and defending. Using several separate trained classifiers is advantageous since it reduces the prediction error in estimating the pose by taking into account the situation the game is in. The behavior of a player with respect to where his attention is directed, i.e., in which direction he is most likely to look, depends on the situation the game is in. For example, the player's attention might be focused on the ball in some situations, whereas he is more likely to look at the goal keeper in other situations.
According to an embodiment of the invention, the poses of several body parts are estimated jointly. This is advantageous since the poses of body parts may be correlated due to human anatomy. This is, e.g., the case for the head and the torso of a team sport player.
Even though the invention has in some cases been described with reference to the method according to the first aspect of the invention, corresponding reasoning applies to the computer program product according to the second aspect of the invention and the system according to the third and the fourth aspect of the invention.
Further objectives of, features of, and advantages of the present invention will become apparent when studying the following detailed disclosure, the drawings and the appended claims. Those skilled in the art realize that different features of the present invention can be combined to create embodiments other than those described in the following.
Brief description of the drawings
The above, as well as additional objects, features and advantages of the present invention, will be better understood through the following illustrative and non-limiting detailed description of embodiments of the present invention, with reference to the appended drawings, in which:
Fig. 1 shows a football field.
Fig. 2 shows a system in accordance with an embodiment of the invention.
All the figures are schematic, not necessarily to scale, and generally only show parts which are necessary in order to elucidate the invention, wherein other parts may be omitted or merely suggested.
Detailed description
Fig. 1 shows a football field 100 with players 1101-1105 of a first team, players 1201-1205 of a second team, and a football 130. For simplicity, only five players of each team are shown. An embodiment of the invention may be used to estimate the head pose of a football player, e.g., player 1105, using a machine learning technique. The head pose may, e.g., be defined as the angle of the player's head with respect to the touch line 101. However, any other direction may be used as reference.
Typically, a number of bins are used for the classification. As an example, one may use eight bins, each bin covering an angular range of 45 degrees.
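For the eight-bin example, mapping an angle to its bin, and back to a representative angle at the bin centre, might look like the sketch below. Letting bin 0 start at 0 degrees of the reference direction is an assumption; other layouts, such as centring bin 0 on the reference direction, are equally possible. The function names are hypothetical.

```python
def angle_to_bin(angle_deg: float, n_bins: int = 8) -> int:
    """Map a pose angle (degrees, any sign) to one of n_bins equal bins.

    Bin k covers [k * w, (k + 1) * w) with w = 360 / n_bins, so with the
    default of eight bins each bin spans 45 degrees.
    """
    width = 360.0 / n_bins
    k = int((angle_deg % 360.0) // width)
    # Guard against float round-off, e.g. (-1e-20) % 360.0 == 360.0.
    return min(k, n_bins - 1)


def bin_center(k: int, n_bins: int = 8) -> float:
    """Centre angle of bin k, usable as a representative pose for that bin."""
    width = 360.0 / n_bins
    return (k + 0.5) * width
```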
The head pose of player 1105 is estimated by applying a trained classifier to a set of features extracted from tracking data. The set of features comprises at least the position of the player 1105 or the position of the ball 130, and preferably both. The set of features may further comprise the positions of other players, e.g., the positions of the players 1201-1205 of the other team and/or the positions of the players 1101-1104 of the player's own team. The set of features may also comprise further features, such as the velocities of the player, the ball, and the other players, or any other feature linked to football.
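A hypothetical assembly of such a feature vector is sketched below. It assumes the ball position is available (the case without a ball is covered by the separate classifiers of the embodiment described earlier), encodes the ball and the k nearest other players relative to the player, and fills missing player slots with a presence flag of 0 so that every vector has the same length. The function name, k, and the padding scheme are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def build_features(player: Point, ball: Point, others: Sequence[Point], k: int = 3) -> List[float]:
    """Fixed-length feature vector for one player at one instant.

    Layout: player (x, y), ball offset (dx, dy), then for each of the k
    nearest other players an offset (dx, dy) and a presence flag 1.0;
    missing slots are filled with (0.0, 0.0, 0.0).
    """
    px, py = player
    feats = [float(px), float(py), float(ball[0] - px), float(ball[1] - py)]
    nearest = sorted(others, key=lambda p: math.hypot(p[0] - px, p[1] - py))[:k]
    for ox, oy in nearest:
        feats += [float(ox - px), float(oy - py), 1.0]
    feats += [0.0, 0.0, 0.0] * (k - len(nearest))
    return feats
```

A fixed length matters because most classifiers, SVMs included, expect every input vector to have the same dimension.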
The result of the estimation, i.e., of applying a machine learning method to the set of features, is the bin with the largest likelihood. This bin corresponds to the most likely angular range of the head pose of player 1105.
For instance, trained SVM classifiers may be used for estimating the head pose. SVMs are a reliable and fast machine learning technique.
However, other machine learning techniques, such as nearest neighbor models, neural networks, probabilistic models, tree-based models, or boosting, may also be used.
The trained classifier used for estimating the head pose may be trained on tracking data using a supervised or an unsupervised approach. If a supervised approach is used, tracking data extracted from video footage may be used in connection with a supervisor identifying the player's head poses by inspection of the video frames.
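Nearest neighbor models are among the techniques named above. The sketch below is a minimal k-nearest-neighbour classifier trained in a supervised manner on labeled feature vectors, where each label is a bin index assigned by a supervisor; it returns the bin with the largest vote share, in the spirit of the "bin with the largest likelihood" output described above. Vote shares are only a crude stand-in for likelihoods, and the class name and parameters are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


class NearestNeighbourPoseClassifier:
    """k-nearest-neighbour classifier over pose bins (supervised training)."""

    def __init__(self, k: int = 3, n_bins: int = 8) -> None:
        self.k = k
        self.n_bins = n_bins
        self._examples: List[Tuple[Sequence[float], int]] = []

    def fit(self, features: Sequence[Sequence[float]], labels: Sequence[int]) -> "NearestNeighbourPoseClassifier":
        # Each label is the bin index a human supervisor assigned to that frame.
        if len(features) != len(labels) or len(features) == 0:
            raise ValueError("need non-empty, equally long features and labels")
        self._examples = list(zip(features, labels))
        return self

    def predict_proba(self, x: Sequence[float]) -> List[float]:
        # Share of the k nearest training examples (Euclidean distance) voting for each bin.
        if not self._examples:
            raise RuntimeError("call fit() first")
        nearest = sorted(self._examples, key=lambda ex: math.dist(ex[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return [votes[b] / len(nearest) for b in range(self.n_bins)]

    def predict(self, x: Sequence[float]) -> int:
        # The bin with the largest score; ties go to the lowest bin index.
        proba = self.predict_proba(x)
        return max(range(self.n_bins), key=proba.__getitem__)
```

An SVM with probability outputs could be dropped in behind the same `fit`/`predict_proba`/`predict` shape.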
With reference to Fig. 2, a system 200 in accordance with an embodiment of the invention is described. System 200 comprises a video camera 201, a tracking unit 202, a body part appearance unit 203, a feature extracting unit 204, and an estimation unit 205.
The video camera 201 may be used to generate video footage of a football match, e.g., the scenery sketched in Fig. 1. The tracking unit 202 may use the video footage for extracting tracking data, e.g., the positions and/or velocities of the players and the ball. The body part appearance unit 203 may analyze the head appearance in a bottom-up approach using a machine learning technique. The feature extracting unit 204 may extract a first set of features, pertaining to the positions and/or velocities of the player, the ball, and the other players, and a second set of features, pertaining to the appearance of the head. The feature extracting unit 204 may further merge the first and the second set of features. The estimation unit 205 estimates the head pose, i.e., the most likely bin, using a machine learning technique. This is achieved by applying a trained classifier of the machine learning technique to the merged set of features obtained from the feature extracting unit 204.
The system described with reference to Fig. 2 combines the bottom-up approach, using the appearance of the head as an input for the machine learning technique, with the top-down approach, using the positions and/or velocities of the player, the ball, and the other players as an input.
System 200 achieves this by merging the two sets of features, one pertaining to the bottom-up approach and one pertaining to the top-down approach, respectively, and by applying a trained classifier to the merged set of features. The advantage of combining the two approaches is that a more reliable estimate may be obtained, i.e., the prediction error may be reduced and ambiguities may be resolved.
As an alternative to the embodiment of system 200 described above, the bottom-up approach and the top-down approach may be combined in a different way, in accordance with another embodiment of the invention.
Instead of merging the two sets of features and applying a trained classifier to the merged set of features, the estimation unit 205 may apply two different trained classifiers separately to the two sets of features and combine the obtained head pose estimates. In other words, the estimation unit 205 may apply a first trained classifier to the first set of features obtained from the feature extracting unit 204 to obtain a first estimate for the head pose from tracking data, i.e., an estimate obtained from a top-down approach. The estimation unit 205 may apply a second trained classifier to the second set of features obtained from the feature extracting unit 204 to obtain a second estimate for the head pose from the appearance of the head, i.e., an estimate obtained from a bottom-up approach. The estimation unit 205 may combine the two estimates, e.g., by calculating a weighted average. The advantage of combining the two approaches is that a more reliable estimate may be obtained, i.e., the prediction error may be reduced and ambiguities may be resolved. The two estimates may be obtained using the same machine learning technique. The two estimates may also be obtained using two different machine learning techniques.
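One way to realise the weighted average is to average the two per-bin probability distributions and pick the largest bin, as sketched below. The default weight and the function name are assumptions. Averaging distributions over bins, rather than averaging two angles directly, sidesteps the wrap-around problem at 0/360 degrees.

```python
from typing import List, Sequence, Tuple


def combine_estimates(p_top_down: Sequence[float], p_bottom_up: Sequence[float],
                      w_top_down: float = 0.5) -> Tuple[int, List[float]]:
    """Weighted average of two per-bin distributions; returns (best bin, combined)."""
    if len(p_top_down) != len(p_bottom_up):
        raise ValueError("both estimates must use the same bins")
    if not 0.0 <= w_top_down <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    combined = [w_top_down * a + (1.0 - w_top_down) * b
                for a, b in zip(p_top_down, p_bottom_up)]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined
```

The weight could, for instance, be tuned on held-out labeled data, or lowered for the bottom-up estimate when the head sub-frame is very small.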
As a further alternative, a system may be provided using only the top-down approach. Such a system would not need to include a video camera 201 or a body part appearance unit 203. The system comprises a tracking unit 202, a feature extracting unit 204, and an estimation unit 205. The tracking unit 202 may use tracking data, e.g., the positions and/or velocities of the players and the ball, from any type of tracking device. The feature extracting unit 204 may extract a first set of features, pertaining to the positions and/or velocities of the player, the ball, and the other players. The estimation unit 205 estimates the head pose, i.e., the most likely bin, using a machine learning technique. This is achieved by applying a trained classifier of the machine learning technique to the first set of features obtained from the feature extracting unit 204.
Finally, a more reliable estimate of the pose of a body part may be obtained by estimating the poses of several body parts jointly.
One way of jointly estimating dependent labels is sequence labeling, where one assumes that the labels have a sequence structure. This problem can be solved by a wide array of algorithms, including Conditional Random Fields (CRFs), Hidden Markov Models, Max Margin Markov Networks, and Structured SVMs. A wide range of approaches can be used; three examples are explained below. The first, straightforward approach to joint estimation is to construct a new problem where one label is assigned to each possible combination of the old labels, transforming the joint classification problem into a multiclass problem.
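The first approach can be illustrated for a head and a torso with eight bins each: every (head bin, torso bin) pair becomes one of 64 classes of an ordinary multiclass problem. The encoding below is a hypothetical sketch; note that the number of classes grows multiplicatively with each added body part, so training data per class thins out quickly.

```python
from typing import Tuple


def encode_joint(head_bin: int, torso_bin: int, n_torso_bins: int = 8) -> int:
    """Map a (head, torso) bin pair to a single class index."""
    return head_bin * n_torso_bins + torso_bin


def decode_joint(joint_class: int, n_torso_bins: int = 8) -> Tuple[int, int]:
    """Inverse of encode_joint: recover the (head, torso) bin pair."""
    return divmod(joint_class, n_torso_bins)
```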
The second approach is to use a CRF to estimate all labels while taking interdependencies into account. To avoid the training time incurred by training it with all features, a feature extraction step can be performed first where SVMs estimate the probability distribution independently. The probability estimates are then given to the CRF.
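The effect of taking interdependencies into account can be illustrated with a much simpler stand-in for a trained CRF: a two-node pairwise model in which per-body-part probability estimates (e.g., from independent SVMs) act as unary terms and a hand-set compatibility table links head and torso bins. Exhaustive MAP search over all pairs, as below, is only feasible for a few body parts; the compatibility values and the function name are assumptions, not a trained model.

```python
import math
from itertools import product
from typing import Optional, Sequence, Tuple


def joint_map(p_head: Sequence[float], p_torso: Sequence[float],
              compat: Sequence[Sequence[float]]) -> Optional[Tuple[int, int]]:
    """Most probable (head bin, torso bin) pair under a two-node pairwise model.

    score(h, t) = log p_head[h] + log p_torso[t] + log compat[h][t]
    The unary terms are independent per-part estimates; `compat` encodes how
    plausible each head/torso combination is (hand-set here, not trained).
    """
    best: Optional[Tuple[int, int]] = None
    best_score = -math.inf
    for h, t in product(range(len(p_head)), range(len(p_torso))):
        terms = (p_head[h], p_torso[t], compat[h][t])
        if min(terms) <= 0.0:
            continue  # log(0) is -inf: this pair is impossible
        score = sum(math.log(v) for v in terms)
        if score > best_score:
            best, best_score = (h, t), score
    return best
```

For instance, with head estimates [0.55, 0.45], torso estimates [0.1, 0.9], and a table that favours matching bins ([[1.0, 0.1], [0.1, 1.0]]), the independent choice (head 0, torso 1) becomes (1, 1) when estimated jointly.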
The third approach is to replace the CRF with one SVM per body part, leading to a multi-layer SVM. The SVMs are trained to predict the body part poses, given estimated probability distributions for the labels.
Even though embodiments have been described with reference to the head pose of a football player, corresponding embodiments may be constructed for body parts other than heads and for team sports other than football.
The person skilled in the art realizes that the present invention by no means is limited to the embodiments described above. On the contrary, many modifications and variations are possible within the scope of the appended claims. For example, different classifiers may be used for different players or different teams. Further, posterior distributions may be calculated as an output from the machine learning method, and an estimate may be obtained by analyzing the posterior distribution. The prediction error of the body part pose estimate may be reduced by adding additional features to the set of features. Even though a system comprising a video camera has been described, video footage may also be obtained from other sources, such as television video footage.

Claims

1. A method for estimating the pose of a body part of a team sport player using a machine learning technique, the method comprising the steps of:
extracting a set of features from tracking data, said set of features comprising at least one of a position of said player and a position of a ball, and
determining an estimate for said pose by applying a trained classifier to said set of features, said classifier being associated with the machine learning technique.
2. The method according to claim 1, wherein said set of features further comprises positions of other players.
3. The method according to claim 1, wherein said set of features further comprises at least one of a velocity of said player, a velocity of said ball, and velocities of said other players.
4. The method according to any one of the claims 1 to 3, wherein said tracking data is derived from video frames.
5. The method according to claim 1, wherein said set of features further comprises a camera angle.
6. The method according to claim 1, wherein said set of features further comprises features strongly linked to said team sport.
7. The method according to claim 1, wherein said set of features further comprises features pertaining to an appearance of said body part.
8. The method according to claim 1, wherein the estimate for said pose is combined with an estimate for said pose determined using the appearance of said body part.
9. The method according to claim 1, wherein two separate trained classifiers are used, one for situations when said set of features comprises the position of a ball, and one for situations when said set of features does not comprise the position of a ball.
10. The method according to claim 1, wherein a plurality of separate trained classifiers is used for different types of situations a game is in.
11. The method according to claim 1, wherein the poses of several body parts are estimated jointly.
12. A computer program product, comprising a computer usable medium having a computer readable program code embodied therein, said computer readable program code adapted to be executed to implement the method according to any one of the claims 1 to 11.
13. A system (200) for estimating the pose of a body part of a team sport player using a machine learning technique, the system comprising: a tracking unit (202) configured for determining at least one of the position of the player, the position of a ball, and the positions of other players, a feature extracting unit (204) configured for extracting a first set of features from said positions, and
an estimation unit (205) configured for determining an estimate for said pose by applying a trained classifier to said first set of features, said classifier being associated with the machine learning technique.
14. The system of claim 13, further comprising:
a video camera (201), and a body part appearance unit (203) configured for analyzing an appearance of the body part, said appearance being derived from said video frames,
wherein:
said tracking unit (202) is further configured to use video frames received from said video camera for determining said at least one of the position of the player, the position of a ball, and the positions of other players, said feature extracting unit (204) is further configured to extract a second set of features from said appearance, and for combining said first set of features and said second set of features, and
said estimation unit (205) is further configured for determining an estimate for said pose by applying a trained classifier to the combined set of features, said classifier being associated with the machine learning technique.
15. A system (200) for estimating the pose of a body part of a team sport player using machine learning techniques, the system comprising:
a video camera (201),
a tracking unit (202) configured for determining at least one of the position of the player, the position of a ball, and the positions of other players, using video frames received from said video camera,
a body part appearance unit (203) configured for analyzing an appearance of the body part, said appearance being derived from said video frames,
a feature extracting unit (204) configured for extracting a first set of features from said positions and a second set of features from said appearance,
an estimation unit (205) configured for determining a first estimate for said pose by applying a trained first classifier to said first set of features, for determining a second estimate for said pose by applying a trained second classifier to said second set of features, said first classifier being associated with a first machine learning technique and said second classifier being associated with a second machine learning technique, and for combining said first estimate and said second estimate.
PCT/EP2010/060431 2009-10-08 2010-07-19 Head pose estimation WO2011042230A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE0950740-1 2009-10-08
SE0950740 2009-10-08

Publications (1)

Publication Number Publication Date
WO2011042230A1 true WO2011042230A1 (en) 2011-04-14

Family

ID=43086162

Country Status (1)

Country Link
WO (1) WO2011042230A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090141941A1 (en) * 2007-12-04 2009-06-04 Sony Corporation, Image processing apparatus and method for estimating orientation

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
"Karl Malone - John Stockton Documentary", 2000, Retrieved from the Internet <URL:http://www.youtube.com/watch?v=zChWYau6SyI> [retrieved on 20101122] *
BYUN H ET AL: "A SURVEY ON PATTERN RECOGNITION APPLICATIONS OF SUPPORT VECTOR MACHINES", INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, WORLD SCIENTIFIC PUBLISHING, SI, vol. 17, no. 3, 1 May 2003 (2003-05-01), pages 459 - 486, XP001171790, ISSN: 0218-0014, DOI: DOI:10.1142/S0218001403002460 *
D'ORAZIO, T. AND LEO, M.: "A review of vision-based systems for soccer video analysis", PATTERN RECOGNITION, vol. 43, no. 8, 2010, pages 2911 - 2926, XP002612059, Retrieved from the Internet <URL:http://linkinghub.elsevier.com/retrieve/pii/S0031320310001299> [retrieved on 20101125] *
DUDA R ET AL: "Pattern Classification, chapter 9: Algorithm-Independent Machine Learning", 1 January 2001, PATTERN CLASSIFICATION, NEW YORK, JOHN WILEY & SONS, US, PAGE(S) 453 - 515, ISBN: 978-0-471-05669-0, XP002472161 *
JAIN A K ET AL: "STATISTICAL PATTERN RECOGNITION: A REVIEW", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE SERVICE CENTER, LOS ALAMITOS, CA, US, vol. 22, no. 1, 1 January 2000 (2000-01-01), pages 4 - 37, XP000936788, ISSN: 0162-8828, DOI: DOI:10.1109/34.824819 *
JOSEPH DEPASQUALE ET AL: "Random Feature Subset Selection for Ensemble Based Classification of Data with Missing Features", 23 May 2007, MULTIPLE CLASSIFIER SYSTEMS; [LECTURE NOTES IN COMPUTER SCIENCE;;LNCS], SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, PAGE(S) 251 - 260, ISBN: 978-3-540-72481-0, XP019079711 *
KRAUSE S ET AL: "An ensemble of classifiers approach for the missing feature problem", IJCNN 2003. PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2003. PORTLAND, OR, JULY 20 - 24, 2003; [INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS], NEW YORK, NY : IEEE, US, vol. 1, 20 July 2003 (2003-07-20), pages 553 - 558, XP010652464, ISBN: 978-0-7803-7898-8, DOI: DOI:10.1109/IJCNN.2003.1223406 *
KRYSTIAN MIKOLAJCZYK ET AL: "Human Detection Based on a Probabilistic Assembly of Robust Part Detectors", 22 April 2004, COMPUTER VISION - ECCV 2004; [LECTURE NOTES IN COMPUTER SCIENCE;;LNCS], SPRINGER-VERLAG, BERLIN/HEIDELBERG, PAGE(S) 69 - 82, ISBN: 978-3-540-21984-2, XP019005811 *
LAUNILA A: "Real-time head pose estimation in low-resolution football footage", MASTER'S THESIS IN COMPUTER SCIENCE AT THE SCHOOL OF COMPUTER SCIENCE AND ENGINEERING ROYAL INSTITUTE OF TECHNOLOGY, 3 February 2010 (2010-02-03), pages 1 - 61, XP008129591, Retrieved from the Internet <URL:http://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2009/rapporter09/launila_andreas_09130.pdf> *
LAUNILA, A. AND SULLIVAN, J.: "Contextual Features for Head Pose Estimation in Football Games", 23 August 2010 (2010-08-23), XP002611804, Retrieved from the Internet <URL:http://www.icpr2010.org/pdfs/icpr2010_MoBT8.15.pdf> [retrieved on 20101122] *
MURPHY-CHUTORIAN E ET AL: "Head Pose Estimation in Computer Vision: A Survey", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE SERVICE CENTER, LOS ALAMITOS, CA, US, vol. 31, no. 4, 1 April 2009 (2009-04-01), pages 607 - 626, XP011266518, ISSN: 0162-8828, DOI: DOI:10.1109/TPAMI.2008.106 *
NEIL ROBERTSON ET AL: "Estimating Gaze Direction from Low-Resolution Faces in Video", 1 January 2006, COMPUTER VISION - ECCV 2006 LECTURE NOTES IN COMPUTER SCIENCE;;LNCS, SPRINGER, BERLIN, DE, PAGE(S) 402 - 415, ISBN: 978-3-540-33834-5, XP019036455 *
R. RONFARD, C. SCHMID, B. TRIGGS: "Learning to Parse Pictures of People", 2002, pages 700 - 714, XP002611803, Retrieved from the Internet <URL:http://www.springerlink.com/content/nx7njmr1073prh0b/> [retrieved on 20101122] *
RAMANAN, D.: "Learning to parse images of articulated bodies", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, vol. 19, 2007, pages 1 - 8, XP002611802, Retrieved from the Internet <URL:http://www.ics.uci.edu/~dramanan/papers/parse.pdf> [retrieved on 20101122] *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016025605A1 (en) * 2014-08-12 2016-02-18 Board Tracking Technologies, Llc Action sports tracking system and method
CN108920999A (en) * 2018-04-16 2018-11-30 深圳市深网视界科技有限公司 A kind of head angle prediction model training method, prediction technique, equipment and medium
US20220012988A1 (en) * 2020-07-07 2022-01-13 Nvidia Corporation Systems and methods for pedestrian crossing risk assessment and directional warning
US11682272B2 (en) * 2020-07-07 2023-06-20 Nvidia Corporation Systems and methods for pedestrian crossing risk assessment and directional warning

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 10740183; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

32PN EP: Public notification in the EP Bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27/07/12)

122 EP: PCT application non-entry in European phase
Ref document number: 10740183; Country of ref document: EP; Kind code of ref document: A1