CN113508604A - System and method for generating trackable video frames from broadcast video - Google Patents
System and method for generating trackable video frames from broadcast video
- Publication number
- CN113508604A (application CN202080017542.3A)
- Authority
- CN
- China
- Prior art keywords
- frames
- trackable
- frame
- cluster
- computing system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/97—Determining parameters from multiple pictures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/235—Processing of additional data, e.g. scrambling of additional data or processing content descriptors
- H04N21/2353—Processing of additional data, e.g. scrambling of additional data or processing content descriptors specifically adapted to content descriptors, e.g. coding, compressing or processing of metadata
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/26603—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel for automatically generating descriptors from content, e.g. when it is not made available by its provider, using content analysis techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8455—Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30221—Sports video; Sports image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/44—Event detection
Abstract
The present disclosure provides a system and method for generating trackable frames from a broadcast video feed. A computing system retrieves a broadcast video feed of a sporting event. The broadcast video feed includes a plurality of video frames. The computing system generates, using a principal component analysis model, a set of frames for classification. The set of frames is a subset of the plurality of video frames. The computing system partitions the set of frames into a plurality of clusters. The computing system classifies each frame as trackable or untrackable. Trackable frames capture a unified view of the sporting event. The computing system compares each cluster to a predetermined threshold to determine whether the cluster includes at least a threshold number of trackable frames. The computing system classifies each cluster that includes at least the threshold number of trackable frames as trackable.
Description
Priority claim
This application claims priority to U.S. Provisional Application No. 62/811,889, filed February 28, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to systems and methods for generating trackable video frames from broadcast video.
Background
Athlete tracking data has been used in many sports for many years for team and athlete analysis. However, conventional player tracking systems require the sports analytics company to install fixed cameras in each venue in which a team competes. This constraint limits the scalability of such player tracking systems and restricts data collection to games played after the cameras are installed. In addition, such constraints can incur significant expense for the sports analytics company due to the hardware installation required at each venue and the associated cost of maintaining that hardware.
Disclosure of Invention
In some embodiments, a method of generating trackable frames from a broadcast video feed is disclosed herein. A computing system retrieves a broadcast video feed of a sporting event. The broadcast video feed includes a plurality of video frames. The computing system generates, using a principal component analysis model, a set of frames for classification. The set of frames is a subset of the plurality of video frames. The computing system partitions the set of frames into a plurality of clusters. The computing system classifies each frame as trackable or untrackable. Trackable frames capture a unified view of the sporting event. The computing system compares each cluster to a predetermined threshold to determine whether the cluster includes at least a threshold number of trackable frames. The computing system classifies each cluster that includes at least the threshold number of trackable frames as trackable.
In some embodiments, a system for generating trackable frames from a broadcast video feed is disclosed herein. The system includes a processor and a memory. The memory has programming instructions stored thereon that, when executed by the processor, perform one or more operations. The one or more operations include retrieving a broadcast video feed of a sporting event. The broadcast video feed includes a plurality of video frames. The one or more operations further include generating, using a principal component analysis model, a set of frames for classification. The set of frames is a subset of the plurality of video frames. The one or more operations further include partitioning the set of frames into a plurality of clusters. The one or more operations further include classifying each frame as trackable or untrackable. Trackable frames capture a unified view of the sporting event. The one or more operations further include comparing each cluster to a predetermined threshold to determine whether the cluster includes at least a threshold number of trackable frames. The one or more operations further include classifying each cluster that includes at least the threshold number of trackable frames as trackable.
In some embodiments, a non-transitory computer-readable medium is disclosed herein. The non-transitory computer-readable medium includes one or more sequences of instructions that, when executed by one or more processors, perform one or more operations. The one or more operations include retrieving a broadcast video feed of a sporting event. The broadcast video feed includes a plurality of video frames. The one or more operations further include generating, using a principal component analysis model, a set of frames for classification. The set of frames is a subset of the plurality of video frames. The one or more operations further include partitioning the set of frames into a plurality of clusters. The one or more operations further include classifying each frame as trackable or untrackable. Trackable frames capture a unified view of the sporting event. The one or more operations further include comparing each cluster to a predetermined threshold to determine whether the cluster includes at least a threshold number of trackable frames. The one or more operations further include classifying each cluster that includes at least the threshold number of trackable frames as trackable.
Drawings
So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this disclosure and are therefore not to be considered limiting of its scope, for the disclosure may admit to other equally effective embodiments.
FIG. 1 is a block diagram illustrating a computing environment, according to an example embodiment.
FIG. 2 is a block diagram illustrating a computing environment, according to an example embodiment.
Fig. 3 is a block diagram illustrating operational aspects discussed in context with fig. 2 and 4-10 in accordance with example embodiments.
FIG. 4 is a flow diagram illustrating a method of generating an athlete track, according to an example embodiment.
Fig. 5 is a flow diagram illustrating a method of generating trackable frames according to an example embodiment.
Fig. 6 is a block diagram illustrating operational aspects discussed above in connection with fig. 5, according to an example embodiment.
Fig. 7 is a flow chart illustrating a method of calibrating a camera for each trackable frame according to an example embodiment.
Fig. 8 is a block diagram illustrating operational aspects discussed above in connection with fig. 7 in accordance with an example embodiment.
FIG. 9 is a flow chart illustrating a method of tracking an athlete according to an example embodiment.
FIG. 10 is a flow chart illustrating a method of tracking an athlete according to an example embodiment.
Fig. 11 is a block diagram illustrating aspects of the operations discussed above in connection with fig. 10 according to an example embodiment.
FIG. 12 is a block diagram illustrating an architecture of a re-identification agent, according to an example embodiment.
Fig. 13A is a block diagram illustrating a computing device, according to an example embodiment.
Fig. 13B is a block diagram illustrating a computing device, according to an example embodiment.
To facilitate understanding, identical reference numerals have been used, where appropriate, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be beneficially utilized on other embodiments without specific recitation.
Detailed Description
Athlete tracking data has been a valuable resource for leagues and teams to evaluate not only the team itself, but also the individual players on the team. Conventional methods of collecting or generating player tracking data are limited, however, because they rely on fixed cameras installed in the venue where the sporting event is held. In other words, for a team to collect or generate player tracking data, conventional methods require that a fixed camera system be installed in every venue where the team plays. Those skilled in the art recognize that such a constraint severely limits the scalability of a player tracking system. In addition, such a constraint limits player tracking data to games played after installation of the fixed camera system, as historical player tracking data simply is not available.
Because a fixed camera system is no longer required, one or more techniques described in this disclosure provide a significant improvement over conventional systems. Specifically, one or more techniques described in this disclosure are directed to generating athlete tracking data using a broadcast video feed of a sporting event. By utilizing a broadcast video feed of a sporting event, not only is there no need to use a dedicated fixed camera system in each venue, but historical player tracking data can now be generated from historical sporting events.
However, utilizing a broadcast video feed of a sporting event is not easy. For example, a broadcast video feed may include various camera angles, player close-ups, crowd shots, coach close-ups, commentator video, advertisements, arena performances, and so forth. To address this, one or more techniques described in this disclosure involve clipping or segmenting a broadcast video feed into its component parts, e.g., different scenes in a movie or advertisements in a basketball game. By clipping or segmenting the broadcast video feed into its component parts, the overall system can better understand the information presented to it and can more efficiently extract data from the underlying broadcast video feed.
FIG. 1 is a block diagram illustrating a computing environment 100, according to an example embodiment. The computing environment 100 may include a camera system 102, an organization computing system 104, and one or more client devices 108 that communicate via a network 105.
The network 105 may be of any suitable type, including individual connections via the Internet, such as cellular or Wi-Fi networks. In some embodiments, the network 105 may use direct connections, such as Radio-Frequency Identification (RFID), Near-Field Communication (NFC), Bluetooth™, Bluetooth Low Energy™ (BLE), Wi-Fi™, ZigBee™, Ambient Backscatter Communication (ABC) protocols, USB, WAN, or LAN, to connect terminals, services, and mobile devices. Since the information transferred may be personalized or confidential, security concerns may require that one or more of these types of connections be encrypted or otherwise secured. In some embodiments, however, the information transferred may not be so personalized, and thus the network connection may be selected for convenience rather than security.
The network 105 may include any type of computer network arrangement for exchanging data or information. For example, the network 105 may be the Internet, a private data network, a virtual private network using a public network and/or other suitable connections, enabling components in the computing environment 100 to send and receive information between components of the environment 100.
The camera system 102 may be positioned in a venue 106. For example, the venue 106 may be configured to host a sporting event that includes one or more players 112. The camera system 102 may be configured to capture the motion of all players on the playing surface, as well as one or more other objects of interest (e.g., the ball, referees, etc.). In some embodiments, the camera system 102 may be an optical-based system, for example, using a plurality of fixed cameras. For example, a system of six stationary, calibrated cameras that project the three-dimensional positions of the players and the ball onto a two-dimensional overhead view of the court may be used. In another example, a mix of stationary and non-stationary cameras may be used to capture the motion of all players and one or more objects of interest on the playing surface. Those skilled in the art will recognize that many different court camera views (e.g., high sideline view, penalty line view, focus view, dispute view, goal zone view, etc.) may be generated using such a camera system (e.g., camera system 102). In general, the camera system 102 may be utilized for the broadcast feed of a given match. Each frame in the broadcast feed may be stored in a game file 110.
The camera system 102 may be configured to communicate with the organization computing system 104 via the network 105. The organization computing system 104 may be configured to manage and analyze the broadcast feed captured by the camera system 102. The organization computing system 104 may include at least a Web client application server 114, a data store 118, an automatic clipping agent 120, a data set generator 122, a camera calibrator 124, a player tracking agent 126, and an interface agent 128. Each of the automatic clipping agent 120, the data set generator 122, the camera calibrator 124, the player tracking agent 126, and the interface agent 128 may be comprised of one or more software modules. The one or more software modules may be code or a collection of instructions stored on a medium (e.g., memory of the organization computing system 104) that represents a sequence of machine instructions (e.g., program code) implementing one or more algorithmic steps. Such machine instructions may be the actual computer code that the processor of the organization computing system 104 interprets to implement the instructions, or may be higher-level code that is interpreted to obtain the actual computer code. The one or more software modules may also include one or more hardware components. One or more aspects of an example algorithm may be performed by the hardware components (e.g., circuitry) themselves, rather than as a result of the instructions.
The data store 118 may be configured to store one or more game files 110. Each game file 110 may include broadcast data for a given match. For example, the broadcast data may be a plurality of video frames captured by the camera system 102.
The automatic clipping agent 120 may be configured to parse the broadcast feed for a given match to identify a unified view of the match. In other words, the automatic clipping agent 120 may be configured to parse the broadcast feed to identify all frames of information captured from the same view. In one example, such as in the sport of basketball, the unified view may be a high sideline view. The automatic clipping agent 120 may clip or segment the broadcast feed (e.g., video) into its component parts (e.g., different scenes in a movie, advertisements in a match, etc.). To generate the unified view, the automatic clipping agent 120 may identify those portions that capture the same view (e.g., the high sideline view). Accordingly, the automatic clipping agent 120 may remove all (or a portion) of the untrackable parts of the broadcast feed (e.g., player close-ups, advertisements, half-time performances, etc.). The unified view may be stored in a database as a set of trackable frames.
The data set generator 122 may be configured to generate a plurality of data sets from the trackable frames. In some embodiments, the data set generator 122 may be configured to identify body pose information. For example, the data set generator 122 may utilize body pose information to detect players in the trackable frames. In some embodiments, the data set generator 122 may be configured to further track the movement of the ball or puck in the trackable frames. In some embodiments, the data set generator 122 may be configured to segment the playing surface on which the event is taking place to identify one or more markings of the playing surface. For example, the data set generator 122 may be configured to identify court markings (e.g., basketball, tennis, etc.), field markings (e.g., baseball, American football, soccer, rugby, etc.), rink markings (e.g., hockey), and so forth. The plurality of data sets generated by the data set generator 122 may then be used by the camera calibrator 124 to calibrate the cameras of each camera system 102.
The camera calibrator 124 may be configured to calibrate the cameras of the camera system 102. For example, the camera calibrator 124 may be configured to project the athlete detected in the trackable frame to real-world coordinates for further analysis. The cameras in the camera system 102 are constantly moving to focus on the ball or critical motion, and therefore such cameras cannot be pre-calibrated. The camera calibrator 124 may be configured to refine or optimize the athlete projection parameters using a homography matrix.
The player tracking agent 126 may be configured to generate a track for each player on the playing surface. For example, the player tracking agent 126 may generate such tracks using player pose detections, camera calibration, and the broadcast frames. In some embodiments, the player tracking agent 126 may also be configured to generate a track for each player even when, for example, the player is temporarily outside the trackable frames. For example, the player tracking agent 126 may utilize body pose information to re-link players that have left the frame of view.
The interface agent 128 may be configured to generate one or more graphical representations corresponding to the track of each player generated by the player tracking agent 126. For example, the interface agent 128 may be configured to generate one or more graphical user interfaces (GUIs) that include graphical representations of each player track predicted by the player tracking agent 126.
The client device 108 may communicate with the organization computing system 104 via the network 105. The client device 108 may be operated by a user. For example, the client device 108 may be a mobile device, a tablet computer, a desktop computer, or any computing system having the capabilities described herein. The user may include, but is not limited to, an individual, such as a subscriber, client, potential customer, or entity customer associated with the organization computing system 104, for example, an individual who has obtained, will obtain, or can obtain a product, service, or consultation from an entity associated with the organization computing system 104.
The client device 108 may include at least an application 132. Application 132 may represent a Web browser that allows access to a website or stand-alone application. The client device 108 may access the application 132 to access one or more functions of the organization computing system 104. Client devices 108 may communicate over network 105 to request Web pages, such as Web pages from Web client application servers 114 of organization computing system 104. For example, the client device 108 may be configured to execute an application 132 to access content managed by the Web client application server 114. Content displayed to the client device 108 may be transferred from the Web client application server 114 to the client device 108 and subsequently processed by the application 132 for display via a Graphical User Interface (GUI) of the client device 108.
FIG. 2 is a block diagram illustrating a computing environment 200 according to an example embodiment. As shown, the computing environment 200 includes an automatic clipping agent 120, a data set generator 122, a camera calibrator 124, and a player tracking agent 126 in communication via a network 205.
The network 205 may be of any suitable type, including individual connections via the Internet, such as cellular or Wi-Fi networks. In some embodiments, the network 205 may use direct connections, such as Radio-Frequency Identification (RFID), Near-Field Communication (NFC), Bluetooth™, Bluetooth Low Energy™ (BLE), Wi-Fi™, ZigBee™, Ambient Backscatter Communication (ABC) protocols, USB, WAN, or LAN, to connect terminals, services, and mobile devices. Since the information transferred may be personalized or confidential, security concerns may require that one or more of these types of connections be encrypted or otherwise secured. In some embodiments, however, the information transferred may not be so personalized, and thus the network connection may be selected for convenience rather than security.
The automatic clipping agent 120 may include a principal component analysis (PCA) agent 202, a clustering model 204, and a neural network 206. As described above, the automatic clipping agent 120 may be used to clip or segment a video into its component parts when attempting to understand and extract data from the broadcast feed. In some embodiments, the automatic clipping agent 120 may focus on separating a predefined unified view (e.g., a high sideline view) from all other portions of the broadcast stream.
The PCA agent 202 may be configured to perform per-frame feature extraction from the broadcast feed using PCA analysis. For example, given a pre-recorded video, the PCA agent 202 may extract one frame every X seconds (e.g., every 10 seconds) to build a PCA model of the video. In some embodiments, the PCA agent 202 may generate the PCA model using incremental PCA, whereby the PCA agent 202 may keep a leading subset of components (e.g., the first 120 components) to generate the PCA model. The PCA agent 202 may also be configured to extract one frame from the broadcast stream every X seconds (e.g., every second) and compress the frames using the PCA model. In some embodiments, the PCA agent 202 may compress each frame into a 120-dimensional representation using the PCA model. For example, the PCA agent 202 may solve for the principal components on a per-video basis and save the first 100 components of each frame to ensure accurate clipping.
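A minimal sketch of this per-video PCA compression is shown below, assuming frames are sampled with OpenCV and compressed with scikit-learn's IncrementalPCA. The sampling intervals and the 120-component choice mirror the example values above; the helper names, downscaled resolution, and grayscale conversion are illustrative assumptions only.

```python
# Hypothetical sketch of the PCA-based frame compression described above.
# Assumes OpenCV and scikit-learn; sampling rates and the 120-component
# choice follow the example values in the text and are not prescriptive.
import cv2
import numpy as np
from sklearn.decomposition import IncrementalPCA

def sample_frames(video_path, every_n_seconds):
    """Yield grayscale, downscaled frames sampled every `every_n_seconds`."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = int(fps * every_n_seconds)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            small = cv2.resize(gray, (160, 90))  # shrink before PCA
            yield small.astype(np.float32).ravel()
        idx += 1
    cap.release()

def build_pca_model(video_path, n_components=120):
    """Fit an incremental PCA model on frames sampled every 10 seconds."""
    pca = IncrementalPCA(n_components=n_components)
    samples = np.stack(list(sample_frames(video_path, every_n_seconds=10)))
    pca.fit(samples)
    return pca

def compress_frames(video_path, pca):
    """Compress one frame per second into the PCA subspace (e.g., 120-D)."""
    frames = np.stack(list(sample_frames(video_path, every_n_seconds=1)))
    return pca.transform(frames)
```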
The clustering model 204 may be configured to cluster the compressed frames into a plurality of clusters. For example, the clustering model 204 may be configured to center, normalize, and cluster the first 120 components into a plurality of clusters. In some embodiments, the clustering model 204 may implement k-means clustering for the compressed frames. In some embodiments, the clustering model 204 may set k = 9 clusters. K-means clustering takes data $x = \{x_1, x_2, \ldots, x_n\}$ and partitions it into $k$ subsets $S = \{S_1, S_2, \ldots, S_k\}$ by optimizing

$$\underset{S}{\arg\min} \; \sum_{j=1}^{k} \sum_{x_i \in S_j} \left\| x_i - \mu_j \right\|^2,$$

where $\mu_j$ is the mean of the data in set $S_j$. In other words, the clustering model 204 attempts to find clusters with the smallest intra-cluster variance using the k-means clustering technique. The clustering model 204 may label each frame with its respective cluster number (e.g., cluster 1, cluster 2, ..., cluster 9).
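A brief sketch of this clustering step, assuming the PCA-compressed frames from the previous stage and scikit-learn's KMeans; k = 9 follows the example value in the text.

```python
# Hypothetical sketch of clustering PCA-compressed frames with k-means (k = 9).
# Uses scikit-learn; preprocessing mirrors the "center and normalize" step above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_compressed_frames(compressed, k=9, seed=0):
    """Return a cluster label (0..k-1) for each compressed frame."""
    scaled = StandardScaler().fit_transform(compressed)  # center and normalize
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return kmeans.fit_predict(scaled)

# Example usage with the compress_frames() output sketched earlier:
# labels = cluster_compressed_frames(compress_frames("game.mp4", pca))
```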
The neural network 206 may be configured to classify each frame as trackable or untrackable. Trackable frames represent frames that capture the unified view (e.g., the high sideline view); untrackable frames represent frames that do not capture the unified view. To train the neural network 206, an input data set comprising thousands of frames, pre-labeled as trackable or untrackable and run through the PCA model, may be used. Each compressed frame and its label (trackable or untrackable) may be provided to the neural network 206 for training.
In some embodiments, the neural network 206 may include four layers: one input layer, two hidden layers, and one output layer. In some embodiments, the input layer may include 120 units, each hidden layer may include 240 units, and the output layer may include 2 units. The input layer and each hidden layer may use a sigmoid activation function; the output layer may use a softmax activation function. To train the neural network 206, the automatic clipping agent 120 may reduce (e.g., minimize) the binary cross-entropy loss between the predicted label $\hat{y}_j$ of each sample and its true label $y_j$:

$$L = -\frac{1}{N} \sum_{j=1}^{N} \left[ y_j \log \hat{y}_j + (1 - y_j) \log\left(1 - \hat{y}_j\right) \right]$$
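A minimal PyTorch sketch of a classifier with this shape (120-240-240-2, sigmoid hidden activations, softmax output) follows. The layer sizes come from the text; the optimizer, learning rate, and training-loop details are illustrative assumptions.

```python
# Hypothetical sketch of the trackable/untrackable frame classifier described
# above: 120 -> 240 -> 240 -> 2 with sigmoid hidden activations. PyTorch's
# CrossEntropyLoss applies the softmax internally, so raw logits are returned;
# optimizer and training-loop details are assumptions.
import torch
import torch.nn as nn

class TrackableFrameClassifier(nn.Module):
    def __init__(self, in_dim=120, hidden=240, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, n_classes),  # softmax folded into the loss
        )

    def forward(self, x):
        return self.net(x)

def train_classifier(model, loader, epochs=10, lr=1e-3):
    """Train on (compressed_frame, label) pairs; label 1 = trackable."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for frames, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(frames), labels)
            loss.backward()
            opt.step()
    return model
```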
Accordingly, once trained, the neural network 206 may be configured to classify each frame as trackable or untrackable. As such, each frame may have two labels: a cluster number and a trackable/untrackable classification. The automatic clipping agent 120 may utilize these two labels to determine whether a given cluster should be considered trackable or untrackable. For example, if the automatic clipping agent 120 determines that at least a threshold number of frames in a cluster are deemed trackable (e.g., 80%), the automatic clipping agent 120 may conclude that all frames in the cluster are trackable. Conversely, if the automatic clipping agent 120 determines that fewer than a threshold number of frames in a cluster are deemed trackable (e.g., below 30%), the automatic clipping agent 120 may conclude that all frames in the cluster are untrackable. Still further, if the automatic clipping agent 120 determines that an intermediate number of frames in a cluster are deemed trackable (e.g., between 30% and 80%), the automatic clipping agent 120 may request that an administrator further analyze the cluster. Once each frame is classified, the automatic clipping agent 120 may clip or segment the trackable frames. The automatic clipping agent 120 may store the segments of trackable frames in a database 205 associated therewith.
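The cluster-level decision rule above can be sketched as a small helper; the 80% and 30% thresholds are the example values from the text, and the "needs_review" outcome corresponds to the request for manual analysis.

```python
# Hypothetical sketch of the cluster-level trackability decision described
# above. The 0.8 / 0.3 thresholds are the example values from the text.
from collections import defaultdict

def classify_clusters(frame_clusters, frame_trackable, hi=0.8, lo=0.3):
    """frame_clusters[i] -> cluster id, frame_trackable[i] -> bool.

    Returns {cluster_id: "trackable" | "untrackable" | "needs_review"}.
    """
    totals, trackable = defaultdict(int), defaultdict(int)
    for cid, is_trackable in zip(frame_clusters, frame_trackable):
        totals[cid] += 1
        trackable[cid] += int(is_trackable)

    decisions = {}
    for cid, n in totals.items():
        ratio = trackable[cid] / n
        if ratio >= hi:
            decisions[cid] = "trackable"
        elif ratio < lo:
            decisions[cid] = "untrackable"
        else:
            decisions[cid] = "needs_review"  # flag for manual analysis
    return decisions
```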
The data set generator 122 may be configured to generate a plurality of data sets from the output of the automatic clipping agent 120. As shown, the data set generator 122 may include a pose detector 212, a ball detector 214, and a field segmenter 216. The pose detector 212 may be configured to detect players within the broadcast feed. The data set generator 122 may provide the trackable frames stored in the database 205 and the broadcast video feed as inputs to the pose detector 212. In some embodiments, the pose detector 212 may implement OpenPose to generate body pose data for detecting players in the broadcast feed and trackable frames. In some embodiments, the pose detector 212 may rely on sensors positioned on the players' bodies to capture body pose information. In general, the pose detector 212 may obtain body pose information from the broadcast video feed and trackable frames using any suitable means. The output from the pose detector 212 may be pose data stored in a database 215 associated with the data set generator 122.
The ball detector 214 may be configured to detect and track the ball (or puck) within the broadcast feed. The data set generator 122 may provide the trackable frames stored in the database 205 and the broadcast video feed as inputs to the ball detector 214. In some embodiments, the ball detector 214 may utilize a faster region-based convolutional neural network (Faster R-CNN) to detect and track the ball in the trackable frames and broadcast video feed. Faster R-CNN is a region proposal network: it uses a convolutional neural network to propose regions of interest and then classifies the object in each region of interest. Because it is a single unified network, the region proposal and classification steps can inform each other, allowing the classification to handle objects of various sizes. The output from the ball detector 214 may be ball detection data stored in the database 215 associated with the data set generator 122.
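As a rough illustration (not the patent's own implementation), a pretrained Faster R-CNN from a recent torchvision release can be run on a trackable frame to produce candidate detections. In practice the model would be fine-tuned on ball or puck annotations; the 0.5 score threshold is an arbitrary assumption.

```python
# Hypothetical sketch of running a Faster R-CNN detector on a video frame.
# torchvision's pretrained model detects generic COCO classes ("sports ball"
# is among them); the patent's detector would be trained on broadcast footage,
# and the 0.5 score threshold here is an arbitrary choice.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_objects(frame_rgb, score_threshold=0.5):
    """frame_rgb: HxWx3 uint8 array. Returns high-confidence detections."""
    with torch.no_grad():
        output = model([to_tensor(frame_rgb)])[0]
    keep = output["scores"] > score_threshold
    return output["boxes"][keep], output["labels"][keep], output["scores"][keep]
```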
The field segmenter 216 may be configured to identify field markers in the broadcast feed. The data set generator 122 may provide trackable frames stored in the database 205 and broadcast video feeds as inputs to the field segmenter 216. In some embodiments, the field segmenter 216 may be configured to identify field markers using a neural network. The output of the field segmenter 216 may be a field marker stored in a database 215 associated with the data set generator 122.
The camera calibrator 124 may be configured to address the problem of moving-camera calibration in sports. The camera calibrator 124 may include a keyframe matching module 222, a spatial transformer network 224, and an optical flow module 226. The camera calibrator 124 may receive as input the segmented field information generated by the field segmenter 216, the trackable clip information, and the pose information. Given such inputs, the camera calibrator 124 may be configured to project coordinates in the image frames to real-world coordinates for tracking analysis.
The keyframe matching module 222 may receive as input the output from the field segmenter 216 and a set of templates. For each frame, the keyframe matching module 222 may attempt to match the output from the field segmenter 216 to a template. Those frames that can be matched to a given template are considered keyframes. In some embodiments, the keyframe matching module 222 may implement a neural network to match one or more frames. In some embodiments, the keyframe matching module 222 may implement cross-correlation to match one or more frames.
The spatial transformer network (STN) 224 may be configured to receive the identified keyframes as input from the keyframe matching module 222. The STN 224 may implement a neural network to fit a playing surface model to the field segmentation information. By fitting the field model to such output, the STN 224 may generate a homography matrix for each keyframe.
The optical flow module 226 may be configured to recognize motion patterns of objects from one trackable frame to another. In some embodiments, the optical flow module 226 may receive as input trackable frame information and body pose information for the athlete in each trackable frame. The optical flow module 226 may use the body pose information to remove the athlete from the trackable frame information. Once removed, the optical flow module 226 may determine inter-frame motion to identify camera motion between successive frames. In other words, optical flow module 226 may identify flow fields from one frame to the next.
The optical flow module 226 and the STN 224 may work together to generate a homography matrix. For example, the optical flow module 226 and the STN 224 may generate a homography matrix for each trackable frame, such that the camera may be calibrated for each frame. The homography matrix may be used to project the athlete's track or location into real world coordinates. For example, the homography matrix may indicate a two-dimensional to two-dimensional transformation, which may be used to project the position of the athlete from image coordinates to real-world coordinates on the playing field.
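To make the projection step concrete, a homography H maps image coordinates to planar field coordinates. A small sketch of applying such a 3x3 matrix to detected player positions follows; the matrix itself is assumed to come from the calibration stage described above.

```python
# Hypothetical sketch of projecting player image coordinates onto the playing
# surface with a 3x3 homography matrix H. H is assumed to be produced by the
# calibration stage described above; the points here are bottom-center points
# of player bounding boxes in pixel coordinates.
import numpy as np

def project_to_field(points_xy, H):
    """points_xy: (N, 2) pixel coordinates. Returns (N, 2) field coordinates."""
    pts = np.asarray(points_xy, dtype=np.float64)
    ones = np.ones((pts.shape[0], 1))
    homogeneous = np.hstack([pts, ones])          # (N, 3)
    projected = homogeneous @ np.asarray(H).T     # apply the homography
    return projected[:, :2] / projected[:, 2:3]   # normalize by w

# Example: feet positions of two detected players.
# field_xy = project_to_field([[640, 540], [1020, 500]], H)
```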
The player tracking agent 126 may be configured to generate a track for each player in the match. The player tracking agent 126 may include a neural network 232 and a re-identification agent 234. The player tracking agent 126 may receive as input the trackable frames, the pose data, the calibration data, and the broadcast video frames. In a first phase, the player tracking agent 126 may match pairs of player patches, which may be derived from the pose information, based on appearance and distance. For example, let $I_j^t$ denote the image patch of the j-th player at time t, described by its bounding box position $(x_j^t, y_j^t)$, width $w_j^t$, and height $h_j^t$. The player tracking agent 126 may associate detection pairs by combining an appearance term $C$ with a spatial term $L$, where $C$ is the cross-correlation between two image patches (i.e., image crops given by their bounding boxes) and measures their similarity, and $L$ is a measure of the difference (e.g., distance) between the two bounding boxes.
Performing this association for every pair may generate a large set of short tracklets. The endpoints of these tracklets may then be associated based on motion consistency and color histogram similarity.
For example, let $v_i$ be the extrapolated velocity at the end of the i-th tracklet and $v_j$ the extrapolated velocity at the beginning of the j-th tracklet. Then $c_{ij} = v_i \cdot v_j$ may represent a motion consistency score. In addition, let $p(h)_i$ denote the likelihood that color $h$ is present in image patch $i$. The player tracking agent 126 may use the Bhattacharyya distance to determine the color histogram similarity:

$$d_{ij} = -\ln \left( \sum_{h} \sqrt{p(h)_i \, p(h)_j} \right)$$
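A compact sketch of these two affinity terms, assuming tracklet endpoint velocities and normalized color histograms are already available; the formulas follow the definitions above.

```python
# Hypothetical sketch of the two tracklet-affinity terms described above:
# a motion-consistency score (dot product of extrapolated velocities) and a
# Bhattacharyya distance between normalized color histograms.
import numpy as np

def motion_consistency(v_end_i, v_start_j):
    """c_ij = v_i . v_j for the end of tracklet i and start of tracklet j."""
    return float(np.dot(v_end_i, v_start_j))

def bhattacharyya_distance(hist_i, hist_j, eps=1e-12):
    """Distance between two normalized color histograms p(h)_i and p(h)_j."""
    bc = np.sum(np.sqrt(np.asarray(hist_i) * np.asarray(hist_j)))
    return float(-np.log(bc + eps))
```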
To reiterate, the player tracking agent 126 finds matching tracklet pairs by jointly considering the motion consistency score and the color histogram similarity.
solving for each pair of interrupted patches may yield a set of net patches, while leaving some patches with large gaps (i.e., many frames). To connect large gaps, player tracking agents may augment the neighbor metrics to include motion field estimates, which may account for player directional changes that occur over many frames.
The motion field may be a vector field that represents, at each location on the playing surface, a velocity magnitude and direction. Given the known velocities of the many players on the playing surface, a complete motion field can be generated using cubic spline interpolation. For example, let $X_i^t$ be player i's field position at each time t. Each such point has an associated displacement $X_i^{t+\lambda} - X_i^t$ for gap sizes $\lambda < T$. Accordingly, the motion field may be expressed as

$$V(x, \lambda) = \sum_{i,\,t} \left( X_i^{t+\lambda} - X_i^t \right) G\!\left(x - X_i^t, 5\right),$$

where $G(x, 5)$ may be a Gaussian kernel with a standard deviation of approximately five feet. In other words, the motion field may be a Gaussian blur of the total displacement.
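A rough numpy sketch of this Gaussian-blurred displacement field, evaluated on a grid of court locations, is given below; the five-foot kernel width follows the text, and the data layout is an assumption.

```python
# Hypothetical sketch of the motion field described above: at each query
# location x on the court, sum the observed player displacements weighted by
# a Gaussian kernel (std. dev. ~5 feet) centered where each displacement
# was observed.
import numpy as np

def motion_field(query_xy, start_xy, displacements, sigma=5.0):
    """query_xy: (Q, 2) court locations to evaluate.
    start_xy: (N, 2) positions X_i^t where displacements were observed.
    displacements: (N, 2) vectors X_i^{t+lam} - X_i^t.
    Returns (Q, 2) motion vectors."""
    diff = query_xy[:, None, :] - start_xy[None, :, :]        # (Q, N, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)                      # (Q, N)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))           # Gaussian kernel
    return weights @ displacements                            # (Q, 2)
```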
The neural network 232 may be used to predict the motion field given ground-truth player trajectories. Given a set of ground-truth player trajectories $X_i$, the velocity of each player at each frame can be computed, which provides a ground-truth field $V$ for the neural network 232 to learn. For example, given the set of ground-truth player trajectories $X_i$, the player tracking agent 126 may be configured to generate a predicted motion field $\hat{V}$, and the neural network 232 may be trained to minimize the difference between $\hat{V}$ and $V$. The player tracking agent 126 may then generate an affinity score for any tracking gap size $\lambda$ as

$$K_{ij} = V(x, \lambda) \cdot d_{ij}$$
The re-identification agent 234 may be configured to link players who have left the frame of view. The re-identification agent 234 may include a track generator 236, a conditional autoencoder 240, and a Siamese network 242.
The track generator 236 may be configured to generate a track gallery. The track generator 236 may receive a plurality of tracks from the database 205. Each track X may include a player identity label y, and for each player patch I, pose information p may be provided by the pose detection stage. Given the set of player tracks, the track generator 236 may build, for each track, a gallery in which the player's uniform number (or some other static feature) is always visible. The body pose information generated by the data set generator 122 allows the track generator 236 to determine the orientation of the player. For example, the track generator 236 may use a heuristic in which the normalized shoulder width determines orientation:

$$S_{\text{orient}} = \frac{\left\| l_{\text{left shoulder}} - l_{\text{right shoulder}} \right\|}{\text{torso length}}$$
where $l$ represents the image position of a body part. The shoulder width may be normalized by the torso length, which removes the effect of scale. The shoulders appear widely separated when the player is facing toward or away from the camera, so the track generator 236 may use those patches whose $S_{\text{orient}}$ exceeds a threshold to build the gallery. After this stage, each track $X_n$ may include a gallery of the form

$$G_n = \left\{ I_i \in X_n \;\middle|\; S_{\text{orient}}(I_i) > \tau \right\},$$

where $\tau$ is the visibility threshold.
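A small sketch of this gallery-building heuristic from pose keypoints follows; the specific keypoint names and the 0.5 threshold are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch of the shoulder-width heuristic for gallery building.
# Keypoints are assumed to be 2D pixel coordinates from the pose detector;
# the keypoint names and the 0.5 threshold are illustrative only.
import numpy as np

def orientation_score(keypoints):
    """S_orient: shoulder separation normalized by torso length."""
    shoulder_width = np.linalg.norm(
        keypoints["left_shoulder"] - keypoints["right_shoulder"])
    torso_length = np.linalg.norm(keypoints["neck"] - keypoints["mid_hip"])
    return shoulder_width / max(torso_length, 1e-6)

def build_gallery(track_patches, track_keypoints, threshold=0.5):
    """Keep patches where the player faces toward/away from the camera,
    so the uniform number is likely visible."""
    return [patch for patch, kp in zip(track_patches, track_keypoints)
            if orientation_score(kp) > threshold]
```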
The conditional autoencoder 240 may be configured to identify one or more features in each track. For example, unlike conventional person re-identification problems, players in team sports have very similar appearance features, such as clothing style, clothing color, and skin tone. One of the more distinguishing differences is the uniform number, which may be displayed on the front and/or back of each uniform. To capture these specific features, the conditional autoencoder 240 may be trained to recognize such features.
In some embodiments, the conditional autoencoder 240 may be a three-layer convolutional autoencoder in which the kernel size of all three layers is 3x3, with 64, 128, and 128 channels, respectively. These hyperparameters may be tuned to ensure that uniform numbers are discernible in the reconstructed images, so that the desired features are learned by the autoencoder. In some embodiments, $f(I_i)$ may be used to denote the features learned from image patch $I_i$.
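A minimal PyTorch sketch of an encoder/decoder with this shape (three 3x3 convolutional layers with 64, 128, and 128 channels) is shown below; the strides, padding, activations, and reconstruction loss are assumptions not specified in the text.

```python
# Hypothetical sketch of the three-layer convolutional autoencoder described
# above (3x3 kernels; 64, 128, 128 channels). Strides, padding, activations,
# and the reconstruction loss are illustrative assumptions.
import torch
import torch.nn as nn

class JerseyAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 128, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, patch):
        features = self.encoder(patch)        # f(I_i): learned features
        reconstruction = self.decoder(features)
        return features, reconstruction

# Training would minimize a reconstruction loss, e.g.
# loss = nn.functional.mse_loss(reconstruction, patch)
```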
The use of the conditional auto-encoder 240 may improve upon conventional approaches for a variety of reasons. First, there is often insufficient training data for each player, as some players only appear in a game for a short time. Second, different teams may have the same uniform colors and uniform numbers, and thus it may be difficult to classify the players.
The Siamese network 242 may be used to determine the similarity between two image patches. For example, the Siamese network 242 may be trained to determine the similarity between two image patches based on their feature representations f(I). In view of two image patches, their feature representations f(I_i) and f(I_j) may be flattened, concatenated, and input into a perceptron network. In some embodiments, the L2 norm may be used to connect the two sub-networks f(I_i) and f(I_j). In some embodiments, the perceptron network may include three layers, which may include 1024, 512, and 216 hidden units, respectively. Such a network can be used to determine the similarity s(I_i, I_j) between each pair of image patches from two traces without temporal overlap. To enhance the robustness of the prediction, the final similarity score of the two traces may be the average of all the pairwise scores across their respective galleries:
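A hedged sketch of this similarity head, assuming PyTorch and a shared feature extractor such as the encoder from the auto-encoder sketch above; the text only states that an L2 norm connects the two sub-networks, so the element-wise squared difference and the final output layer shown here are assumptions:

```python
import torch
import torch.nn as nn

class SiameseSimilarity(nn.Module):
    """Shared encoder + perceptron with 1024, 512, and 216 hidden units producing a similarity score."""

    def __init__(self, encoder, feature_dim):
        super().__init__()
        self.encoder = encoder  # shared weights applied to both patches
        self.perceptron = nn.Sequential(
            nn.Linear(feature_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 216), nn.ReLU(),
            nn.Linear(216, 1), nn.Sigmoid(),   # similarity s(I_i, I_j) in [0, 1]
        )

    def forward(self, patch_i, patch_j):
        f_i = self.encoder(patch_i).flatten(start_dim=1)
        f_j = self.encoder(patch_j).flatten(start_dim=1)
        combined = (f_i - f_j) ** 2            # element-wise squared (L2-style) difference
        return self.perceptron(combined)

def trace_similarity(model, gallery_a, gallery_b):
    """Average all pairwise patch scores between the galleries of two traces."""
    scores = [model(a.unsqueeze(0), b.unsqueeze(0)).item() for a in gallery_a for b in gallery_b]
    return sum(scores) / len(scores)

# `feature_dim` depends on the patch size; the text mentions 10240-unit feature vectors.
```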
the similarity score may be calculated for every two traces that do not have temporal overlap. If the score is above some threshold, the two traces may be associated.
Fig. 3 is a block diagram 300 illustrating operational aspects discussed in context with fig. 2 and 4-10 in accordance with an example embodiment. Block diagram 300 may illustrate the overall workflow of the organization computing system 104 in generating athlete tracking information. Block diagram 300 may include multiple sets of operations 302 through 308. A set of operations 302 may involve generating a trackable frame (e.g., method 500 in fig. 5). A set of operations 304 may involve generating one or more datasets (e.g., operations performed by dataset generator 122) from trackable frames. A set of operations 306 may involve camera calibration operations (e.g., method 700 in fig. 7). A set of operations 308 may be directed to generating and predicting an athlete track (e.g., method 900 in fig. 9 and method 1000 in fig. 10).
FIG. 4 is a flow diagram illustrating a method 400 of generating an athlete track, according to an example embodiment. Method 400 may begin at step 402.
At step 402, the organization computing system 104 may receive (or retrieve) a broadcast feed of an event. In some embodiments, the broadcast feed may be a live feed received in real-time (or near real-time) from the camera system 102. In some embodiments, the broadcast feed may be a broadcast feed of a game that has ended. In general, a broadcast feed may include multiple frames of video data. Each frame may capture a different camera view.
At step 404, the organization computing system 104 may segment the broadcast feed into a unified view. For example, the automatic clip agent 120 may be configured to parse the plurality of frames of video data in the broadcast feed to segment trackable frames from non-trackable frames. In general, trackable frames may include those frames that capture a unified view. For example, the unified view may be a high sideline view. In other examples, the unified view may be a goal zone view. In still other examples, the unified view may be a top camera view.
At step 406, the organization computing system 104 may generate a plurality of data sets from the trackable frames (i.e., the unified view). For example, data set generator 122 may be configured to generate a plurality of data sets based on the trackable clips received from automatic clip agent 120. In some embodiments, pose detector 212 may be configured to detect athletes within the broadcast feed. Data set generator 122 may provide the trackable frames stored in database 205 and the broadcast video feed as inputs to pose detector 212. The output from pose detector 212 may be pose data stored in a database 215 associated with data set generator 122.
The ball detector 214 may be configured to detect and track a ball (or puck) within a broadcast feed. Data set generator 122 may provide trackable frames stored in database 205 and a broadcast video feed as inputs to ball detector 214. In some embodiments, ball detector 214 may utilize Faster R-CNN to detect and track balls in trackable frames and broadcast video feeds. The output from the ball detector 214 may be ball detection data stored in a database 215 associated with the data set generator 122.
The field segmenter 216 may be configured to identify field markers in the broadcast feed. The data set generator 122 may provide trackable frames stored in the database 205 and broadcast video feeds as inputs to the field segmenter 216. In some embodiments, the field segmenter 216 may be configured to identify field markers using a neural network. The output of the field segmenter 216 may be a field marker stored in a database 215 associated with the data set generator 122.
Accordingly, the data set generator 122 may generate information for player positions, ball positions, and field segmentation in all trackable frames for further analysis.
At step 408, the organization computing system 104 may calibrate the camera in each trackable frame based on the data sets generated in step 406. For example, the camera calibrator 124 may be configured to calibrate the camera in each trackable frame by generating a homography matrix using the trackable frames and the body pose information. The homography matrix allows the camera calibrator 124 to take the trajectory of each athlete in a given frame and project those trajectories into real-world coordinates. By projecting the athletes' positions and trajectories into real-world coordinates for each frame, the camera calibrator 124 may ensure that the camera is calibrated for each frame.
At step 410, the organization computing system 104 may be configured to generate or predict a trail for each athlete. For example, the athlete tracking agent 126 may be configured to generate or predict a trail for each athlete in the competition. The athlete tracking agent 126 may receive as input trackable frames, pose data, calibration data, and broadcast video frames. Using such input, the athlete tracking agent 126 may be configured to construct an athlete's motion throughout a given competition. Additionally, the athlete tracking agent 126 may be configured to predict an athlete's trajectory in view of each athlete's previous movements.
Fig. 5 is a flow diagram illustrating a method 500 of generating a trackable frame according to an example embodiment. The method 500 may correspond to operation 404 discussed above in connection with fig. 4. Method 500 may begin at step 502.
At step 502, the organization computing system 104 may receive (or retrieve) a broadcast feed of an event. In some embodiments, the broadcast feed may be a live feed received in real-time (or near real-time) from the camera system 102. In some embodiments, the broadcast feed may be a broadcast feed of a game that has ended. In general, a broadcast feed may include multiple frames of video data. Each frame may capture a different camera view.
At step 504, the organization computing system 104 may generate a set of frames for image classification. For example, the automatic clipping agent 120 may utilize principal component analysis (PCA) to perform frame-by-frame feature extraction from the broadcast feed. For example, in view of a pre-recorded video, the automatic clipping agent 120 may extract one frame every X seconds (e.g., every 10 seconds) to build a PCA model of the video. In some embodiments, the automatic clipping agent 120 may generate the PCA model using incremental PCA, whereby the automatic clipping agent 120 may select a leading subset of components (e.g., the first 120 components) to generate the PCA model. The automatic clip agent 120 may also be configured to extract one frame from the broadcast stream every X seconds (e.g., every second) and compress the frames using the PCA model. In some embodiments, the automatic clipping agent 120 may compress each frame into a 120-dimensional vector using the PCA model. For example, the automatic clipping agent 120 may solve the principal components on a per-video basis and save the first 100 components per frame to ensure accurate clipping. Such a compressed subset of frames may be considered the set of frames for image classification. In other words, the PCA model may be used to compress each frame into a small vector, thereby enabling more efficient clustering of frames. Compression may be performed by selecting the first N components from the PCA model to represent the frame. In some examples, N may be 100.
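A minimal sketch of this PCA-based frame compression, assuming scikit-learn and OpenCV; the file name, frame resizing, and grayscale conversion are placeholders, while the sampling intervals and N = 100 components follow the examples above:

```python
import cv2
import numpy as np
from sklearn.decomposition import IncrementalPCA

def sample_frames(video_path, every_n_seconds):
    """Yield one downscaled grayscale frame every `every_n_seconds` seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = int(fps * every_n_seconds)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            yield cv2.resize(gray, (160, 90)).flatten().astype(np.float32)
        idx += 1
    cap.release()

# Build a video-specific PCA model from sparsely sampled frames (e.g., one every 10 s).
model_frames = np.stack(list(sample_frames("broadcast.mp4", every_n_seconds=10)))
pca = IncrementalPCA(n_components=100)
pca.fit(model_frames)

# Compress densely sampled frames (e.g., one per second) into 100-dimensional vectors.
dense_frames = np.stack(list(sample_frames("broadcast.mp4", every_n_seconds=1)))
compressed = pca.transform(dense_frames)  # shape: (num_frames, 100)
```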
At step 506, the organization computing system 104 may assign each frame in the set of frames to a given cluster. For example, automatic clipping agent 120 may be configured to center, normalize, and cluster the first 120 components into a plurality of clusters. In some embodiments, for clustering of the compressed frames, automatic clip agent 120 may implement K-means clustering. In some embodiments, automatic clip agent 120 may set k = 9 clusters. K-means clustering attempts to take data x = {x_1, x_2, ..., x_n} and divide it into k subsets S = {S_1, S_2, ..., S_k} by optimizing:

argmin_S Σ_{j=1}^{k} Σ_{x ∈ S_j} ||x − μ_j||^2

where μ_j is the mean of the data in set S_j. In other words, clustering model 204 attempts to find the clusters with the smallest within-cluster variance using the K-means clustering technique. Clustering model 204 may label each frame with its respective cluster number (e.g., cluster 1, cluster 2, ..., cluster k).
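A brief sketch of the clustering step, assuming scikit-learn and the `compressed` PCA vectors from the previous sketch; k = 9 follows the example above:

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Center and normalize the compressed frame vectors, then cluster them.
scaled = StandardScaler().fit_transform(compressed)
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(scaled)  # one cluster number per frame
```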
At step 508, the organization computing system 104 may classify each frame as trackable or non-trackable. For example, automatic clipping agent 120 may utilize a neural network to classify each frame as trackable or non-trackable. The trackable frames may represent frames that capture a unified view (e.g., a high sideline view). The non-trackable frames may represent frames that do not capture the unified view. To train the neural network (e.g., neural network 206), an input data set comprising thousands of frames pre-labeled as trackable or non-trackable and run through the PCA model may be used. Each compressed frame and label pair (i.e., cluster number and trackable/non-trackable label) may be provided to the neural network for training. Accordingly, once trained, automatic clip agent 120 may classify each frame as trackable or non-trackable. As such, each frame may have two tags: a cluster number and a trackable/non-trackable classification.
At step 510, the organization computing system 104 may compare each cluster to a threshold. For example, automatic clip agent 120 may utilize both tags to determine whether a given cluster is deemed trackable or non-trackable. In some embodiments, if automatic clip agent 120 determines that at least a threshold number of frames in a cluster are deemed trackable (e.g., 80%), automatic clip agent 120 may conclude that all frames in the cluster are trackable. Additionally, if automatic clip agent 120 determines that less than a threshold number of frames in a cluster are deemed trackable (e.g., below 30%), automatic clip agent 120 may conclude that all frames in the cluster are non-trackable. Still further, if automatic clip agent 120 determines that an intermediate number of frames in a cluster are deemed trackable (e.g., 30% to 80%), automatic clip agent 120 may request that an administrator further analyze the cluster.
If organization computing system 104 determines that more than a threshold number of frames in a cluster are trackable in step 510, automatic clip agent 120 may classify the cluster as trackable in step 512.
However, if organization computing system 104 determines that less than a threshold number of frames in a cluster are trackable in step 510, automatic clip agent 120 may classify the cluster as non-trackable in step 514.
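A short sketch of this per-cluster decision, assuming frame-level trackable labels from the classifier described above; the 80% and 30% cutoffs follow the examples in the text:

```python
def classify_cluster(frame_is_trackable):
    """Label a whole cluster from its frame-level trackable/non-trackable classifications."""
    trackable_fraction = sum(frame_is_trackable) / len(frame_is_trackable)
    if trackable_fraction >= 0.8:
        return "trackable"
    if trackable_fraction < 0.3:
        return "non-trackable"
    return "needs manual review"

# Example: group frames by cluster_labels and classify each cluster separately.
```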
Fig. 6 is a block diagram 600 illustrating aspects of the operations discussed above in connection with the method 500 according to an example embodiment. As shown, block diagram 600 may include multiple sets of operations 602-608.
At a set of operations 602, video data (e.g., broadcast video) may be provided to the automatic clip agent 120. The automatic clip agent 120 may extract frames from the video. In some embodiments, the automatic clip agent 120 may extract frames from the video at a low frame rate. The automatic clipping agent 120 may use an incremental PCA algorithm to select the first 120 principal components from the set of extracted frames. Such operations may generate a video-specific PCA model.
At a set of operations 604, video data (e.g., broadcast video) may be provided to the automatic clip agent 120. The automatic clip agent 120 may extract frames from the video. In some embodiments, the automatic cut agent 120 may extract frames from the video at a medium frame rate. The automatic clipping agent 120 may use a video-specific PCA model to compress the frames extracted by the automatic clipping agent 120.
At a set of operations 606, the compressed frames and a preselected number of desired clusters may be provided to automatic clip agent 120. Automatic clip agent 120 may utilize a K-means clustering technique to group the frames into the preselected number of clusters. The automatic clip agent 120 may assign a cluster label to each compressed frame. The automatic clipping agent 120 may also be configured to classify each frame as trackable or non-trackable. The automatic clip agent 120 may label each respective frame accordingly.
At a set of operations 608, automatic clip agent 120 may analyze each cluster to determine whether the cluster includes at least a threshold number of trackable frames. For example, as shown, if 80% of the frames in a cluster are classified as trackable, automatic clip agent 120 may consider the entire cluster trackable. However, if less than 80% of the frames in the cluster are classified as trackable, the automatic clip agent may determine whether at least a second threshold number of frames in the cluster are non-trackable. For example, if at least 70% of the frames in a cluster are classified as non-trackable, automatic clip agent 120 may consider the entire cluster non-trackable. However, if less than 70% of the frames in the cluster are classified as non-trackable (i.e., between 30% and 80% of the frames are trackable), then manual annotation may be requested.
Fig. 7 is a flow diagram illustrating a method 700 of calibrating a camera for each trackable frame, according to an example embodiment. The method 700 may correspond to operation 408 discussed above in connection with fig. 4. The method 700 may begin at step 702.
At step 702, the organization computing system 104 may retrieve the video data and pose data for analysis. For example, the camera calibrator 124 may retrieve the trackable frames for a given game and the athletes' pose data in each trackable frame from the database 205. Following step 702, the camera calibrator 124 may perform two parallel processes to generate a homography matrix for each frame. Accordingly, the following operations are not intended to be performed strictly sequentially; they may be performed in parallel or sequentially.
At step 704, the organization computing system 104 may remove the athletes from each trackable frame. For example, the camera calibrator 124 may parse each trackable frame retrieved from the database 205 to identify one or more athletes contained therein. The camera calibrator 124 may remove the athletes from each trackable frame using the pose data retrieved from the database 205. For example, the camera calibrator 124 may identify those pixels corresponding to the pose data and remove the identified pixels from a given trackable frame.
At step 706, the organization computing system 104 may identify motion of objects (e.g., surfaces, edges, etc.) between consecutive trackable frames. For example, the camera calibrator 124 may analyze consecutive trackable frames, with the athletes removed therefrom, to determine the motion of an object from one frame to the next. In other words, optical flow module 226 may identify the flow field between consecutive trackable frames.
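A minimal sketch of this step, assuming OpenCV dense optical flow on grayscale frames with player pixels masked out using the pose data; Farneback flow and its parameters are illustrative choices, as the text does not name a specific optical-flow method:

```python
import cv2
import numpy as np

def camera_flow(prev_frame, next_frame, player_mask=None):
    """Estimate the dense flow field between two consecutive trackable frames.

    `player_mask` is an optional boolean array marking player pixels (from pose data)
    that should not contribute to the camera-motion estimate.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )
    if player_mask is not None:
        flow[player_mask] = np.nan  # ignore player pixels when summarizing camera motion
    return flow  # shape: (H, W, 2) displacement per pixel
```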
At step 708, the organization computing system 104 may match the output from the field segmenter 216 to a set of templates. For example, the camera calibrator 124 may match one or more frames containing a clear image of the playing field to one or more templates. The camera calibrator 124 may parse the set of trackable clips to identify those clips that provide a clear picture of the playing field and the markings therein. Based on the selected clips, the camera calibrator 124 may compare such images to playing field templates. Each template may represent a different camera perspective of the playing field. Those frames that can be matched to a given template may be considered key frames. In some embodiments, the camera calibrator 124 may implement a neural network to match the one or more frames. In some embodiments, the camera calibrator 124 may perform cross-correlation to match the one or more frames.
At step 710, the organization computing system 104 may fit a playing field model to each keyframe. For example, the camera calibrator 124 may be configured to receive the identified keyframes as input. The camera calibrator 124 may implement a neural network to fit the playing field model to the field segmentation information. By fitting the field model to such output, the camera calibrator 124 may generate a homography matrix for each keyframe.
At step 712, the organization computing system 104 may generate a homography matrix for each trackable frame. For example, the camera calibrator 124 may generate a homography matrix for each frame using the flow fields identified in step 706 and the homography matrix of each keyframe. The homography matrix may be used to project the athletes' tracks or locations into real-world coordinates. For example, in view of the geometric transformation represented by the homography matrix, the camera calibrator 124 may use that transformation to project the position of an athlete in the image to real-world coordinates on the playing field.
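A brief sketch of projecting image positions into real-world field coordinates with a per-frame homography, assuming OpenCV and a 3 × 3 matrix H mapping image points to field-plane points; the matrix values and athlete positions below are placeholders:

```python
import cv2
import numpy as np

# Placeholder homography for one trackable frame (image plane -> field plane).
H = np.array([[1.2, 0.05, -300.0],
              [0.02, 1.1, -150.0],
              [1e-5, 2e-5, 1.0]], dtype=np.float64)

# Image positions of detected athletes, e.g. the midpoints of their feet.
image_points = np.array([[[640.0, 540.0]], [[820.0, 610.0]]], dtype=np.float64)

# Project each athlete position into real-world field coordinates.
field_points = cv2.perspectiveTransform(image_points, H)
print(field_points.reshape(-1, 2))  # (x, y) positions on the playing field
```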
At step 714, the organization computing system 104 may calibrate each camera based on the homography matrix.
Fig. 8 is a block diagram 800 illustrating aspects of the operations discussed above in connection with the method 700 in accordance with an example embodiment. As shown, block diagram 800 may include an input 802, a first set of operations 804, and a second set of operations 806. The first set of operations 804 and the second set of operations 806 may be performed in parallel.
Input 802 may include video clips 808 and pose detection 810. In some embodiments, video clips 808 may correspond to the trackable frames generated by automatic clip agent 120. In some embodiments, pose detection 810 may correspond to the pose data generated by pose detector 212. As shown, only video clips 808 may be provided as input to the first set of operations 804; both the video clips 808 and the pose detection 810 may be provided as inputs to the second set of operations 806.
The first set of operations 804 may include semantic segmentation 812, key frame matching 814, and STN fitting 816. In semantic segmentation 812, the field segmenter 216 may be configured to identify field markings in the broadcast feed. In some embodiments, the field segmenter 216 may be configured to identify the field markings using a neural network. Such segmentation may be performed in advance and provided to the camera calibrator 124 from the database 215. At key frame matching 814, the key frame matching module 224 may be configured to match one or more frames of the playing field image with one or more templates. At STN fitting 816, the STN 224 may implement a neural network to fit the field model to the field segmentation information. By fitting the field model to such output, the STN 224 may generate a homography matrix for each keyframe.
The second set of operations 806 may include camera motion estimation 818. In camera motion estimation 818, the optical flow module 226 may be configured to identify the motion patterns of objects from one trackable frame to the next. For example, the optical flow module 226 may use the body pose information to remove the athletes from the trackable frames. Once the athletes are removed, the optical flow module 226 may determine inter-frame motion to identify the camera motion between consecutive frames.
The first set of operations 804 and the second set of operations 806 may result in homography interpolation 816. The optical flow module 226 and the STN 224 may work together to generate a homography matrix for each trackable frame so that the camera may be calibrated for each frame. The homography matrix may be used to project the athlete's track or location into real world coordinates.
Fig. 9 is a flow chart illustrating a method 900 of tracking an athlete according to an example embodiment. The method 900 may correspond to operation 410 discussed above in connection with fig. 4. Method 900 may begin at step 902.
At step 902, the organization computing system 104 may retrieve a plurality of trackable frames for matching. Each of the plurality of trackable frames may include one or more sets of metadata associated therewith. Such metadata may include, for example, body posture information and camera calibration data. In some embodiments, the athlete tracking agent 126 may also retrieve broadcast video data.
At step 904, the organization computing system 104 may generate a set of short tracks. For example, the player tracking agent 126 may match pairs of player patches, which may be derived from the pose information, based on appearance and distance to generate the set of short tracks. For example, let I_j^t be the player patch of the jth player at time t, let (x_j^t, y_j^t) be the image coordinates of the jth player at time t, and let w_j^t and h_j^t be the width and height of the patch, whereby the player tracking agent 126 may use appearance correlation and spatial proximity to associate any pair of detections as follows:

Performing this operation for each pair may generate a set of short tracks. The endpoints of these tracks may then be associated based on motion consistency and color histogram similarity.

For example, let v_i be the extrapolated velocity at the end of the ith track and let v_j be the extrapolated velocity at the beginning of the jth track. Then c_ij = v_i · v_j may represent a motion consistency score. In addition, let p_i(h) represent the likelihood that color h is present in image patch i. The athlete tracking agent 126 may use the Bhattacharyya distance to determine the color histogram similarity:
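A short sketch of the color-histogram comparison, assuming OpenCV/NumPy normalized color histograms per image patch; the Bhattacharyya coefficient form shown here is the standard one and is an assumption about the exact expression used:

```python
import cv2
import numpy as np

def color_histogram(patch_bgr, bins=16):
    """Normalized 3D color histogram p_i(h) of an image patch."""
    hist = cv2.calcHist([patch_bgr], [0, 1, 2], None, [bins] * 3,
                        [0, 256, 0, 256, 0, 256])
    return hist.flatten() / (hist.sum() + 1e-12)

def bhattacharyya_similarity(p_i, p_j):
    """Bhattacharyya coefficient: 1.0 means identical color distributions."""
    return float(np.sum(np.sqrt(p_i * p_j)))

# Example: compare the last patch of track i with the first patch of track j.
# sim = bhattacharyya_similarity(color_histogram(patch_i), color_histogram(patch_j))
```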
At step 906, the organization computing system 104 may connect the gaps between the sets of short tracks. For example, the player tracking agent 126 may find matching track pairs as follows:

Solving this for each pair of interrupted tracks may yield a set of connected tracks, while still leaving some tracks separated by large gaps (i.e., many frames). To connect large gaps, player tracking agent 126 may augment the neighbor score metric to include a motion field estimate, which may account for changes in player direction that occur over many frames.
The motion field may be a vector field that determines where a player located at point x on the playing field will be after a period of time λ. For example, let x_i^t be player i's field position at each time t. Then there may be a point at x_i^t with displacement x_i^(t+λ) − x_i^t, provided that t + λ < T. Accordingly, the motion field may be given by:

V(x, λ) = Σ_i Σ_t G(x − x_i^t, 5) · (x_i^(t+λ) − x_i^t)

where G(x, 5) may be a Gaussian kernel with a standard deviation equal to approximately five feet. In other words, the motion field may be a Gaussian blur of the total displacement.
At step 908, the organization computing system 104 may predict the players' movements based on the motion field. For example, the player tracking agent 126 may use a neural network (e.g., the neural network 232) to predict player trajectories in view of ground-truth player trajectories. In view of the set of real player trajectories X_i, the athlete tracking agent 126 may be configured to generate a set of predicted motion fields V̂_i. The athlete tracking agent 126 may train the neural network 232 to reduce (e.g., minimize) the difference between the ground-truth motion field V_i and the predicted motion field V̂_i. The athlete tracking agent 126 may then generate a neighbor score for any tracking gap size λ as follows:

K_ij = V(x, λ) · d_ij

where d_ij is the displacement vector between the interrupted tracks i and j with a gap size of λ. Accordingly, the athlete tracking agent 126 may solve for the matching pairs. For example, in view of the affinity scores, the athlete tracking agent 126 may assign matches between the broken tracks using the Hungarian algorithm. The Hungarian algorithm (also known as the Kuhn-Munkres algorithm) can find the optimal set of matches under the constraint that the matching is one-to-one.
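A minimal sketch of the assignment step, assuming SciPy's implementation of the Hungarian (Kuhn-Munkres) algorithm and an affinity matrix K built from the neighbor scores above; converting affinities to costs by negation and the minimum-affinity cutoff are illustrative choices:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_broken_tracks(affinity, min_affinity=0.0):
    """One-to-one matching of track ends to track starts from an affinity matrix K.

    `affinity[i, j]` is the neighbor score between the end of track i and the
    start of track j. Returns (i, j) pairs whose affinity exceeds `min_affinity`.
    """
    cost = -np.asarray(affinity)                 # the Hungarian algorithm minimizes cost
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if affinity[i][j] > min_affinity]

# Example with a small affinity matrix:
K = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.8, 0.1],
              [0.0, 0.3, 0.7]])
print(match_broken_tracks(K, min_affinity=0.5))  # [(0, 0), (1, 1), (2, 2)]
```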
At step 910, the organization computing system 104 may output a graphical representation of the prediction. For example, the interface agent 128 may be configured to generate one or more graphical representations corresponding to each athlete track generated by the athlete tracking agent 126. For example, the interface agent 128 may be configured to generate one or more graphical user interfaces (GUIs) that include graphical representations of each predicted athlete track generated by the athlete tracking agent 126.
In some cases, during the course of a game, an athlete or player may wander outside the field of view of the camera. Such situations may occur, for example, due to injuries, player fatigue, quick turnovers, fast transitions between offense and defense, and the like. Accordingly, an athlete present in a first trackable frame may no longer be present in a subsequent second or third trackable frame. The athlete tracking agent 126 may address this issue via the re-identification agent 234.
Fig. 10 is a flow diagram illustrating a method 1000 of tracking an athlete according to an example embodiment. The method 1000 may correspond to operation 410 discussed above in connection with fig. 4. Method 1000 may begin at step 1002.
At step 1002, the organization computing system 104 may retrieve a plurality of trackable frames for matching. Each of the plurality of trackable frames may include one or more sets of metadata associated therewith. Such metadata may include, for example, body posture information and camera calibration data. In some embodiments, the athlete tracking agent 126 may also retrieve broadcast video data.
At step 1004, the organization computing system 104 may identify a subset of short traces in which an athlete has left the camera's field of view. Each trace may include a plurality of image patches associated with at least one athlete. An image patch may refer to a corresponding subset of frames of the plurality of trackable frames. In some embodiments, each trace X may include an athlete identity tag y. In some embodiments, each athlete patch I in a given trace X may include pose information generated by data set generator 122. For example, in view of the input video, the pose detections, and the trackable frames, the re-identification agent 234 may generate a trace set that includes several short, interrupted traces of the athletes.
At step 1006, the organization computing system 104 may generate a gallery for each trace. For example, in view of those short traces, the re-identification agent 234 may build a gallery for each trace. The re-identification agent 234 may build a gallery for each trace in which the player's uniform number (or some other static feature) is always visible. The body pose information generated by the data set generator 122 allows the re-identification agent 234 to determine the orientation of each athlete. For example, the re-identification agent 234 may utilize a heuristic in which a normalized shoulder width may be used to determine orientation as follows:

where l may represent the position of a body part. The shoulder width may be normalized by the torso length, which may eliminate scaling effects. The two shoulders should be separated when the athlete is facing toward or away from the camera, so the re-identification agent 234 may use those patches whose S_orient is greater than a threshold value to build the gallery. Accordingly, each trace X_n may include the following gallery:
at step 1008, the tissue computing system 104 may match the traces using a convolutional auto-encoder. For example, the re-identification agent 234 may identify one or more features in each trace using a conditional self-encoder (e.g., the conditional self-encoder 240). For example, unlike conventional methods of re-identifying problems, athletes in team sports may have very similar appearance characteristics, such as clothing style, clothing color, and skin tone. One of the more intuitive differences may be the uniform numbers that may be displayed on the front and/or back of each uniform. To capture these specific features, the re-recognition agent 234 may train a conditional autoencoder to recognize such features.
In some embodiments, the conditional auto-encoder may be a three-layer convolutional auto-encoder, wherein the kernel size of all three layers may be 3 × 3, with 64, 128, and 128 channels, respectively. These hyper-parameters may be tuned to ensure that uniform numbers are discernible in the reconstructed image, so that the desired features can be learned by the auto-encoder. In some embodiments, f(I_i) may be used to represent the features learned from image I_i.
Using a particular example, the re-identification agent 234 may identify a first trace corresponding to a first athlete. Using the conditional auto-encoder 240, the re-identification agent 234 may learn a first set of uniform features associated with the first trace, for example, based on a first set of image patches included in or associated with the first trace. The re-identification agent 234 may also identify a second trace that may initially correspond to a second athlete. Using the conditional auto-encoder 240, the re-identification agent 234 may learn a second set of uniform features associated with the second trace, for example, based on a second set of image patches included in or associated with the second trace.
At step 1010, the organization computing system 104 may use a Siamese network to determine the similarity between matched traces. For example, the re-identification agent 234 may train a Siamese network (e.g., Siamese network 242) to determine the similarity between two image patches based on their feature representations f(I). In view of two image patches, their feature representations f(I_i) and f(I_j) may be flattened, concatenated, and fed into the perceptron network. In some embodiments, the L2 norm may be used to connect the two sub-networks f(I_i) and f(I_j). In some embodiments, the perceptron network may include three layers, including 1024, 512, and 216 hidden units, respectively. Such a network can be used to determine the similarity s(I_i, I_j) between each pair of image patches from two traces without temporal overlap. To enhance the robustness of the prediction, the final similarity score of the two traces may be the average of all the pairwise scores across their respective galleries:
Continuing with the above example, the re-identification agent 234 may utilize the Siamese network 242 to calculate a similarity score between the first learned set of uniform features and the second learned set of uniform features.
At step 1012, if the similarity score for the traces is above a predetermined threshold, organization computing system 104 may associate the traces. For example, the re-identification agent 234 may calculate a similarity score for every two traces that do not have temporal overlap. If the score is above a certain threshold, the re-identification agent 234 may associate the two traces.
Continuing with the above example, if the similarity score generated, for example, by the siernese network 242 is at least above the threshold, the re-identification agent 234 may associate the first trace with the second trace. Assuming that the similarity score is above the threshold, the re-identification agent 234 may determine that the first athlete in the first track and the second athlete in the second track are indeed the same person.
Fig. 11 is a block diagram 1100 illustrating operational aspects discussed above in connection with the method 1000 in accordance with an example embodiment.
As shown, block diagram 1100 may include input video 1102, gesture detection 1104, athlete tracking 1106, trace collection 1108, gallery setup and pairing 1110, and trace connection 1112. Block diagram 1100 illustrates the general pipeline of method 1000 provided above.
In view of the input video 1102, the pose detection information 1104 (e.g., generated by pose detector 212), and the athlete tracking information 1106 (e.g., generated by one or more of the athlete tracking agent 126, the automatic clipping agent 120, and the camera calibrator 124), the re-identification agent 234 may generate trace sets 1108. Each trace set 1108 may include a plurality of short, interrupted traces (e.g., trace 1114) of an athlete. Each trace 1114 may include one or more image patches 1116 contained therein. In view of the traces 1114, the re-identification agent 234 may generate a gallery 1110 for each trace. For example, gallery 1110 may include those image patches 1118 in a given trace in which the athlete's orientation meets a threshold. In other words, the re-identification agent 234 may generate, for each trace 1114, a gallery 1110 that includes the image patches 1118 of the athlete in which the athlete's number may be visible in each frame. Image patches 1118 may be a subset of image patches 1116. The re-identification agent 234 may pair frames across traces to calculate a similarity score via the Siamese network 242. For example, as shown, the re-identification agent 234 may match a first frame from trace II with a second frame from trace I and feed these frames to the Siamese network 242.
The re-identification agent 234 may then connect traces (trace connection 1112) based on the similarity scores. For example, if the similarity score of two frames exceeds a given threshold, the re-identification agent 234 may connect or associate those traces.
Fig. 12 is a block diagram illustrating an architecture 1200 of the Siamese network 242 of the re-identification agent 234, according to an example embodiment. As shown, the Siamese network 242 may include two sub-networks 1202, 1204 and a perceptron network 1205.
Each of the two sub-networks 1202, 1204 may be similarly configured. For example, sub-network 1202 may include a first convolutional layer 1206, a second convolutional layer 1208, and a third convolutional layer 1210. The first sub-network 1202 may receive player patch I_1 as input and may output a set of features from player patch I_1 (denoted f(I_1)). Sub-network 1204 may include a first convolutional layer 1216, a second convolutional layer 1218, and a third convolutional layer 1220. The second sub-network 1204 may receive player patch I_2 as input and may output a set of features from player patch I_2 (denoted f(I_2)). The outputs from sub-networks 1202 and 1204 may be encoded representations of the respective player patches I_1 and I_2. In some embodiments, the outputs from sub-network 1202 and sub-network 1204 may be followed by a flattening operation, which may generate the respective feature vectors f(I_1) and f(I_2). In some embodiments, each feature vector f(I_1) and f(I_2) may include 10240 units. In some embodiments, the L2 norm of f(I_1) and f(I_2) may be calculated and used as input to the perceptron network 1205.
Fig. 13A illustrates a system bus of a computing system architecture 1300, according to an example embodiment. System 1300 may represent at least a portion of the organization computing system 104. One or more components of the system 1300 may be in electrical communication with each other using a bus 1305. The system 1300 may include a processing unit (CPU or processor) 1310 and a system bus 1305 that couples various system components, including a system memory 1315, such as a read-only memory (ROM) 1320 and a random access memory (RAM) 1325, to the processor 1310. The system 1300 may include a cache of high-speed memory connected directly with, in close proximity to, or integrated as part of the processor 1310. The system 1300 may copy data from the memory 1315 and/or the storage device 1330 to the cache 1312 for quick access by the processor 1310. In this manner, the cache 1312 may provide a performance boost by avoiding delays while the processor 1310 waits for data. These and other modules may control or be configured to control the processor 1310 to perform various actions. Other system memory 1315 may also be available for use. The memory 1315 may include multiple different types of memory having different performance characteristics. The processor 1310 may include any general-purpose processor and a hardware module or software module, such as service I 1332, service II 1334, and service III 1336 stored in storage device 1330 and configured to control the processor 1310, as well as a special-purpose processor in which software instructions are incorporated into the actual processor design. The processor 1310 may essentially be a completely self-contained computing system containing multiple cores or processors, a bus, a memory controller, a cache, and the like. A multi-core processor may be symmetric or asymmetric.
To enable user interaction with the computing device 1300, an input device 1345 may represent any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, speech, and so forth. An output device 1335 may also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems may enable a user to provide multiple types of input to communicate with the computing device 1300. The communications interface 1340 may generally govern and manage the user input and system output. There is no restriction on operating on any particular hardware arrangement, and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.
The storage device 1330 may be a non-volatile memory, a hard disk, or another type of computer-readable medium that can store data accessible by a computer, such as magnetic cassettes, flash memory cards, solid-state memory devices, digital versatile disks, random access memories (RAMs) 1325, read-only memories (ROMs) 1320, and hybrids thereof.
Fig. 13B illustrates a computer system 1350 having a chipset architecture that may represent at least a portion of the organization computing system 104. Computer system 1350 may be an example of computer hardware, software, and firmware that can be used to implement the disclosed technology. System 1350 may include a processor 1355, representative of any number of physically and/or logically distinct resources capable of executing software, firmware, and hardware configured to perform identified computations. The processor 1355 may communicate with a chipset 1360, which may control input to and output from the processor 1355. In this example, the chipset 1360 outputs information to an output 1365, such as a display, and may read and write information to a storage device 1370, which may include, for example, magnetic media and solid-state media. The chipset 1360 may also read data from and write data to RAM 1375. A bridge 1380 for interfacing with a variety of user interface components 1385 may be provided for interfacing with the chipset 1360. Such user interface components 1385 may include a keyboard, a microphone, touch detection and processing circuitry, a pointing device such as a mouse, and so on. In general, inputs to the system 1350 may come from any of a variety of sources, machine generated and/or human generated.
It may be appreciated that example systems 1300 and 1350 may have more than one processor 1310 or be part of a group or cluster of computing devices networked together to provide higher processing capabilities.
While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof. For example, aspects of the present disclosure may be implemented in hardware or software, or a combination of hardware and software. One embodiment described in this disclosure may be implemented as a program product for use with a computer system. The program of the program product defines functions of the embodiments (including the methods described in this disclosure) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory (ROM) devices within a computer such as CD-ROM disks readable by a CD-ROM drive, flash memory, ROM chips or any type of solid-state non-volatile memory) on which information is permanently stored; (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access memory) on which alterable information is stored. Such computer-readable storage media, when carrying computer-readable instructions that direct the functions of embodiments of the present disclosure, become embodiments of the present disclosure.
Those skilled in the art will appreciate that the foregoing examples are illustrative only and not limiting. It will be apparent to those skilled in the art upon reading this specification and studying the drawings that all substitutions, enhancements, equivalents and modifications are intended to be within the true spirit and scope of the disclosure. Accordingly, the appended claims are intended to cover all such modifications, permutations, and equivalents as fall within the true spirit and scope of the teachings of this disclosure.
Claims (20)
1. A method of generating trackable frames from a broadcast video feed, comprising:
retrieving, by a computing system, a broadcast video feed of a sporting event, the broadcast video feed comprising a plurality of video frames;
generating, by the computing system, a set of frames for classification using a principal component analysis model, wherein the set of frames is a subset of the plurality of video frames;
dividing, by the computing system, each frame in the set of frames into a plurality of clusters;
classifying, by the computing system, each frame of the plurality of frames as trackable or non-trackable, wherein a trackable frame captures a unified view of the sporting event;
comparing, by the computing system, each cluster to a predetermined threshold to determine whether each cluster includes at least a threshold number of trackable frames; and
classifying, by the computing system, each cluster comprising at least the threshold number of trackable frames as trackable.
2. The method of claim 1, wherein generating, by the computing system, a set of frames for classification using a principal component analysis model comprises:
extracting one frame from the plurality of video frames at selected time intervals to generate the principal component analysis model of the broadcast video feed;
identifying a subset of frames from the extracted frames via the principal component analysis model; and
using the subset of frames as the set of frames for classification.
3. The method of claim 2, wherein dividing, by the computing system, each frame in the set of frames into a plurality of clusters comprises:
labeling each frame in the subset of frames with a corresponding cluster number.
4. The method of claim 3, wherein classifying, by the computing system, each frame of the plurality of frames as trackable or non-trackable comprises:
training a neural network, using a training set comprising a plurality of video frames and a label associated with each video frame, to classify the video frames as trackable or non-trackable, wherein the label is trackable or non-trackable.
5. The method of claim 4, wherein each frame of the plurality of frames comprises a trackable/non-trackable classification and an associated cluster number.
6. The method of claim 5, wherein comparing, by the computing system, each cluster to a predetermined threshold to determine whether each cluster includes at least a threshold number of trackable frames comprises:
identifying each frame corresponding to a given cluster label; and
determining a number of frames corresponding to the given cluster label that contain a trackable classification.
7. The method of claim 1, further comprising:
storing, in a data store, each cluster comprising at least the threshold number of trackable frames.
8. A system for re-identifying athletes in a broadcast video feed, comprising:
a processor; and
a memory having programming instructions stored thereon that, when executed by the processor, perform one or more operations comprising:
retrieving a broadcast video feed of a sporting event, the broadcast video feed comprising a plurality of video frames;
generating a set of frames for classification using a principal component analysis model, wherein the set of frames is a subset of the plurality of video frames;
dividing each frame in a set of frames into a plurality of clusters;
classifying each frame of the plurality of frames as trackable or non-trackable, wherein a trackable frame captures a unified view of the sporting event;
comparing each cluster to a predetermined threshold to determine whether each cluster includes at least a threshold number of trackable frames; and
classifying each cluster comprising at least the threshold number of trackable frames as trackable.
9. The system of claim 8, wherein generating a set of frames for classification using a principal component analysis model comprises:
extracting one frame from the plurality of video frames at selected time intervals to generate the principal component analysis model of the broadcast video feed;
identifying a subset of frames from the extracted frames via the principal component analysis model; and
using the subset of frames as the set of frames for classification.
10. The system of claim 9, wherein dividing each frame in a set of frames into a plurality of clusters comprises:
labeling each frame in the subset of frames with a corresponding cluster number.
11. The system of claim 10, wherein classifying each frame of the plurality of frames as trackable or non-trackable comprises:
training a neural network, using a training set comprising a plurality of video frames and a label associated with each video frame, to classify the video frames as trackable or non-trackable, wherein the label is trackable or non-trackable.
12. The system of claim 11, wherein each frame of the plurality of frames comprises a trackable/non-trackable classification and an associated cluster number.
13. The system of claim 12, wherein comparing each cluster to a predetermined threshold to determine whether each cluster includes at least a threshold number of trackable frames comprises:
identifying each frame corresponding to a given cluster label; and
determining a number of frames corresponding to the given cluster label that contain a trackable classification.
14. The system of claim 13, further comprising:
storing, in a data store, each cluster comprising at least the threshold number of trackable frames.
15. A non-transitory computer-readable medium comprising one or more sequences of instructions which, when executed by one or more processors, perform one or more of the following:
retrieving, by a computing system, a broadcast video feed of a sporting event, the broadcast video feed comprising a plurality of video frames;
generating, by the computing system, a set of frames for classification using a principal component analysis model, wherein the set of frames is a subset of the plurality of video frames;
dividing, by the computing system, each frame in the set of frames into a plurality of clusters;
classifying, by the computing system, each frame of the plurality of frames as trackable or non-trackable, wherein a trackable frame captures a unified view of the sporting event;
comparing, by the computing system, each cluster to a predetermined threshold to determine whether each cluster includes at least a threshold number of trackable frames; and
classifying, by the computing system, each cluster comprising at least the threshold number of trackable frames as trackable.
16. The non-transitory computer-readable medium of claim 15, wherein generating, by the computing system, a set of frames for classification using a principal component analysis model comprises:
extracting one frame from the plurality of video frames at selected time intervals to generate the principal component analysis model of the broadcast video feed;
identifying a subset of frames from the extracted frames via the principal component analysis model; and
using the subset of frames as the set of frames for classification.
17. The non-transitory computer-readable medium of claim 16, wherein dividing, by the computing system, each frame in the set of frames into a plurality of clusters comprises:
labeling each frame in the subset of frames with a corresponding cluster number.
18. The non-transitory computer-readable medium of claim 17, wherein classifying, by the computing system, each frame of the plurality of frames as trackable or non-trackable comprises:
training a neural network, using a training set comprising a plurality of video frames and a label associated with each video frame, to classify the video frames as trackable or non-trackable, wherein the label is trackable or non-trackable.
19. The non-transitory computer-readable medium of claim 18, wherein each frame of the plurality of frames comprises a trackable/non-trackable classification and an associated cluster number.
20. The non-transitory computer-readable medium of claim 19, wherein comparing, by the computing system, each cluster to a predetermined threshold to determine whether each cluster includes at least a threshold number of trackable frames comprises:
identifying each frame corresponding to a given cluster label; and
determining a number of frames corresponding to the given cluster label that contain a trackable classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311306878.3A CN117880607A (en) | 2019-02-28 | 2020-02-28 | Method for generating trackable video frame, identification system and medium |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962811889P | 2019-02-28 | 2019-02-28 | |
US62/811,889 | 2019-02-28 | ||
PCT/US2020/020444 WO2020176873A1 (en) | 2019-02-28 | 2020-02-28 | System and method for generating trackable video frames from broadcast video |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311306878.3A Division CN117880607A (en) | 2019-02-28 | 2020-02-28 | Method for generating trackable video frame, identification system and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113508604A true CN113508604A (en) | 2021-10-15 |
CN113508604B CN113508604B (en) | 2023-10-31 |
Family
ID=72235973
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202080017542.3A Active CN113508604B (en) | 2019-02-28 | 2020-02-28 | System and method for generating trackable video frames from broadcast video |
CN202080017553.1A Active CN113574866B (en) | 2019-02-28 | 2020-02-28 | System and method for calibrating mobile cameras capturing broadcast video |
CN202080017540.4A Active CN113544698B (en) | 2019-02-28 | 2020-02-28 | System and method for re-identifying athlete in broadcast video |
CN202311306878.3A Pending CN117880607A (en) | 2019-02-28 | 2020-02-28 | Method for generating trackable video frame, identification system and medium |
CN202080017536.8A Active CN113508419B (en) | 2019-02-28 | 2020-02-28 | System and method for generating athlete tracking data from broadcast video |
Family Applications After (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202080017553.1A Active CN113574866B (en) | 2019-02-28 | 2020-02-28 | System and method for calibrating mobile cameras capturing broadcast video |
CN202080017540.4A Active CN113544698B (en) | 2019-02-28 | 2020-02-28 | System and method for re-identifying athlete in broadcast video |
CN202311306878.3A Pending CN117880607A (en) | 2019-02-28 | 2020-02-28 | Method for generating trackable video frame, identification system and medium |
CN202080017536.8A Active CN113508419B (en) | 2019-02-28 | 2020-02-28 | System and method for generating athlete tracking data from broadcast video |
Country Status (4)
Country | Link |
---|---|
US (13) | US11182642B2 (en) |
EP (4) | EP3912089A4 (en) |
CN (5) | CN113508604B (en) |
WO (4) | WO2020176875A1 (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11305194B2 (en) * | 2019-01-21 | 2022-04-19 | Tempus Ex Machina, Inc. | Systems and methods for providing a real-time representation of positional information of subjects |
US11182642B2 (en) | 2019-02-28 | 2021-11-23 | Stats Llc | System and method for generating player tracking data from broadcast video |
US11475590B2 (en) * | 2019-09-12 | 2022-10-18 | Nec Corporation | Keypoint based pose-tracking using entailment |
EP4133406A4 (en) * | 2020-04-10 | 2024-04-10 | Stats Llc | End-to-end camera calibration for broadcast video |
US11804042B1 (en) * | 2020-09-04 | 2023-10-31 | Scale AI, Inc. | Prelabeling of bounding boxes in video frames |
CN114511596A (en) * | 2020-10-23 | 2022-05-17 | 华为技术有限公司 | Data processing method and related equipment |
CN112464814A (en) * | 2020-11-27 | 2021-03-09 | 北京百度网讯科技有限公司 | Video processing method and device, electronic equipment and storage medium |
US11532099B2 (en) | 2021-02-03 | 2022-12-20 | BetMIX, LLC | Method and system for compiling performance metrics for racing competitors |
CN113158891B (en) * | 2021-04-20 | 2022-08-19 | 杭州像素元科技有限公司 | Cross-camera pedestrian re-identification method based on global feature matching |
CN113537032B (en) * | 2021-07-12 | 2023-11-28 | 南京邮电大学 | Diversity multi-branch pedestrian re-identification method based on picture block discarding |
US20230070051A1 (en) * | 2021-09-09 | 2023-03-09 | Stats Llc | Estimating Missing Player Locations in Broadcast Video Feeds |
US11704892B2 (en) * | 2021-09-22 | 2023-07-18 | Proposal Pickleball Inc. | Apparatus and method for image classification |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040143434A1 (en) * | 2003-01-17 | 2004-07-22 | Ajay Divakaran | Audio-Assisted segmentation and browsing of news videos |
CN105653700A (en) * | 2015-03-13 | 2016-06-08 | Tcl集团股份有限公司 | Video search method and system |
CN108399435A (en) * | 2018-03-21 | 2018-08-14 | 南京邮电大学 | A kind of video classification methods based on sound feature |
Family Cites Families (151)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010047297A1 (en) * | 2000-02-16 | 2001-11-29 | Albert Wen | Advertisement brokering with remote ad generation system and method in a distributed computer network |
US6950123B2 (en) * | 2002-03-22 | 2005-09-27 | Intel Corporation | Method for simultaneous visual tracking of multiple bodies in a closed structured environment |
WO2004014061A2 (en) * | 2002-08-02 | 2004-02-12 | University Of Rochester | Automatic soccer video analysis and summarization |
US6859547B2 (en) * | 2003-01-25 | 2005-02-22 | The Mostert Group | Methods and computer-readable medium for tracking motion |
JP4394920B2 (en) * | 2003-10-06 | 2010-01-06 | 日本電信電話株式会社 | Video classification display method, system and program |
ES2312881T3 (en) | 2004-06-11 | 2009-03-01 | Saab Ab | MODELING BY PHYSICAL SCENES COMPUTER. |
US7512261B2 (en) * | 2004-07-27 | 2009-03-31 | Microsoft Corp. | System and method for calibrating multiple cameras without employing a pattern by inter-image homography |
US20080192116A1 (en) * | 2005-03-29 | 2008-08-14 | Sportvu Ltd. | Real-Time Objects Tracking and Motion Capture in Sports Events |
US8199199B1 (en) | 2007-08-15 | 2012-06-12 | Yuriy Shlyak | Method and system for real time judging boundary lines on tennis court |
GB2452512B (en) | 2007-09-05 | 2012-02-29 | Sony Corp | Apparatus and method of object tracking |
US8339456B2 (en) | 2008-05-15 | 2012-12-25 | Sri International | Apparatus for intelligent and autonomous video content generation and streaming |
US8406534B2 (en) * | 2008-06-30 | 2013-03-26 | Intel Corporation | System and method for video based scene analysis |
CN101843093A (en) | 2008-09-08 | 2010-09-22 | 索尼公司 | Image processing apparatus and method, imaging apparatus, and program |
JP4670923B2 (en) * | 2008-09-22 | 2011-04-13 | ソニー株式会社 | Display control apparatus, display control method, and program |
JP4905474B2 (en) * | 2009-02-04 | 2012-03-28 | ソニー株式会社 | Video processing apparatus, video processing method, and program |
KR101644789B1 (en) | 2009-04-10 | 2016-08-04 | 삼성전자주식회사 | Apparatus and Method for providing information related to broadcasting program |
GB0907870D0 (en) | 2009-05-07 | 2009-06-24 | Univ Catholique Louvain | Systems and methods for the autonomous production of videos from multi-sensored data |
JP5214533B2 (en) * | 2009-05-21 | 2013-06-19 | 富士フイルム株式会社 | Person tracking method, person tracking apparatus, and person tracking program |
US8755569B2 (en) * | 2009-05-29 | 2014-06-17 | University Of Central Florida Research Foundation, Inc. | Methods for recognizing pose and action of articulated objects with collection of planes in motion |
ES2692289T3 (en) * | 2009-07-20 | 2018-12-03 | Mediaproducción, S.L. | Procedure to calibrate a camera based on an image in which a circle and at least one line are visible |
US8345990B2 (en) * | 2009-08-03 | 2013-01-01 | Indian Institute Of Technology Bombay | System for creating a capsule representation of an instructional video |
US8553982B2 (en) * | 2009-12-23 | 2013-10-08 | Intel Corporation | Model-based play field registration |
WO2011085008A1 (en) * | 2010-01-05 | 2011-07-14 | Isolynx. Llc | Systems and methods for analyzing event data |
GB2477333B (en) | 2010-01-29 | 2014-12-03 | Sony Corp | A method and apparatus for creating a stereoscopic image |
US20120180084A1 (en) | 2011-01-12 | 2012-07-12 | Futurewei Technologies, Inc. | Method and Apparatus for Video Insertion |
US20140037213A1 (en) | 2011-04-11 | 2014-02-06 | Liberovision Ag | Image processing |
CN103649904A (en) * | 2011-05-10 | 2014-03-19 | Nds有限公司 | Adaptive presentation of content |
US20120316843A1 (en) * | 2011-06-08 | 2012-12-13 | Cobra Golf Incorporated | Systems and methods for communicating sports-related information |
US9883163B2 (en) | 2012-01-09 | 2018-01-30 | Disney Enterprises, Inc. | Method and system for determining camera parameters from a long range gradient based on alignment differences in non-point image landmarks |
US20130263166A1 (en) | 2012-03-27 | 2013-10-03 | Bluefin Labs, Inc. | Social Networking System Targeted Message Synchronization |
CN104364824A (en) | 2012-06-13 | 2015-02-18 | 松下知识产权经营株式会社 | Object detection device |
US9058757B2 (en) * | 2012-08-13 | 2015-06-16 | Xerox Corporation | Systems and methods for image or video personalization with selectable effects |
JP6210234B2 (en) * | 2012-09-19 | 2017-10-11 | 日本電気株式会社 | Image processing system, image processing method, and program |
US9237340B2 (en) * | 2012-10-10 | 2016-01-12 | Texas Instruments Incorporated | Camera pose estimation |
WO2014081767A1 (en) | 2012-11-21 | 2014-05-30 | H4 Engineering, Inc. | Automatic cameraman, automatic recording system and video recording network |
US20140169663A1 (en) * | 2012-12-19 | 2014-06-19 | Futurewei Technologies, Inc. | System and Method for Video Detection and Tracking |
US9349413B2 (en) * | 2013-02-05 | 2016-05-24 | Alc Holdings, Inc. | User interface for video preview creation |
DE102013004073A1 (en) | 2013-03-11 | 2014-09-11 | Martin Vorbach | Video stream analysis |
US9064189B2 (en) | 2013-03-15 | 2015-06-23 | Arris Technology, Inc. | Playfield detection and shot classification in sports video |
US9098923B2 (en) | 2013-03-15 | 2015-08-04 | General Instrument Corporation | Detection of long shots in sports video |
EP2790152B1 (en) * | 2013-04-12 | 2015-12-02 | Alcatel Lucent | Method and device for automatic detection and tracking of one or multiple objects of interest in a video |
JP6551226B2 (en) * | 2013-04-26 | 2019-07-31 | 日本電気株式会社 | INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM |
US9471849B2 (en) * | 2013-05-05 | 2016-10-18 | Qognify Ltd. | System and method for suspect search |
WO2014183004A1 (en) * | 2013-05-10 | 2014-11-13 | Robert Bosch Gmbh | System and method for object and event identification using multiple cameras |
KR101480348B1 (en) | 2013-05-31 | 2015-01-09 | 삼성에스디에스 주식회사 | People Counting Apparatus and Method |
US9875550B2 (en) | 2013-08-28 | 2018-01-23 | Disney Enterprises, Inc. | Method and device for tracking sports players with context-conditioned motion models |
US20150125042A1 (en) | 2013-10-08 | 2015-05-07 | Smartlanes Technologies, Llc | Method and system for data collection using processed image data |
US10096114B1 (en) | 2013-11-27 | 2018-10-09 | Google Llc | Determining multiple camera positions from multiple videos |
WO2015101385A1 (en) * | 2013-12-30 | 2015-07-09 | Telecom Italia S.P.A. | Method and system for automatically selecting parts of a video and/or audio media content based on information obtained from social networks |
US10832057B2 (en) * | 2014-02-28 | 2020-11-10 | Second Spectrum, Inc. | Methods, systems, and user interface navigation of video content based spatiotemporal pattern recognition |
WO2018053257A1 (en) * | 2016-09-16 | 2018-03-22 | Second Spectrum, Inc. | Methods and systems of spatiotemporal pattern recognition for video content development |
US11120271B2 (en) * | 2014-02-28 | 2021-09-14 | Second Spectrum, Inc. | Data processing systems and methods for enhanced augmentation of interactive video content |
US10521671B2 (en) * | 2014-02-28 | 2019-12-31 | Second Spectrum, Inc. | Methods and systems of spatiotemporal pattern recognition for video content development |
US20220327830A1 (en) * | 2014-02-28 | 2022-10-13 | Genius Sports Ss, Llc | Methods and systems of combining video content with one or more augmentations to produce augmented video |
US9684056B2 (en) | 2014-05-29 | 2017-06-20 | Abdullah I. Khanfor | Automatic object tracking camera |
KR20150144179A (en) | 2014-06-16 | 2015-12-24 | 삼성전자주식회사 | The Method and Apparatus of Object Part Position Estimation |
US9600723B1 (en) | 2014-07-03 | 2017-03-21 | Google Inc. | Systems and methods for attention localization using a first-person point-of-view device |
JP6027070B2 (en) * | 2014-09-24 | 2016-11-16 | 富士フイルム株式会社 | Area detection apparatus, area detection method, image processing apparatus, image processing method, program, and recording medium |
MX2017004722A (en) * | 2014-10-10 | 2017-11-30 | Livebarn Inc | System and method for optical player tracking in sports venues. |
JP6430914B2 (en) | 2014-12-22 | 2018-11-28 | キヤノンイメージングシステムズ株式会社 | Image processing apparatus and image processing method |
EP3262643A4 (en) * | 2015-02-24 | 2019-02-20 | Plaay, LLC | System and method for creating a sports video |
US20180301169A1 (en) * | 2015-02-24 | 2018-10-18 | Plaay, Llc | System and method for generating a highlight reel of a sporting event |
JP2016163311A (en) | 2015-03-05 | 2016-09-05 | ソニー株式会社 | Video processing device, video processing system, and video processing method |
US10074015B1 (en) * | 2015-04-13 | 2018-09-11 | Google Llc | Methods, systems, and media for generating a summarized video with video thumbnails |
EP3284061B1 (en) * | 2015-04-17 | 2021-11-10 | FotoNation Limited | Systems and methods for performing high speed video capture and depth estimation using array cameras |
JP6820527B2 (en) * | 2015-06-25 | 2021-01-27 | パナソニックIpマネジメント株式会社 | Video synchronization device and video synchronization method |
JP6697743B2 (en) * | 2015-09-29 | 2020-05-27 | パナソニックIpマネジメント株式会社 | Information terminal control method and program |
US10372989B2 (en) | 2015-10-30 | 2019-08-06 | Canon Kabushiki Kaisha | Control apparatus and control method for determining relation of persons included in an image, and storage medium storing a program therefor |
US9799373B2 (en) * | 2015-11-05 | 2017-10-24 | Yahoo Holdings, Inc. | Computerized system and method for automatically extracting GIFs from videos |
JP6534609B2 (en) | 2015-12-04 | 2019-06-26 | クラリオン株式会社 | Tracking device |
WO2017103674A1 (en) | 2015-12-17 | 2017-06-22 | Infinity Cube Ltd. | System and method for mobile feedback generation using video processing and object tracking |
US11132787B2 (en) | 2018-07-09 | 2021-09-28 | Instrumental, Inc. | Method for monitoring manufacture of assembly units |
US10726358B2 (en) | 2016-02-01 | 2020-07-28 | SweatWorks, LLC | Identification of individuals and/or times using image analysis |
CA3012721C (en) | 2016-02-03 | 2022-04-26 | Sportlogiq Inc. | Systems and methods for automated camera calibration |
JP6284086B2 (en) * | 2016-02-05 | 2018-02-28 | パナソニックIpマネジメント株式会社 | Tracking support device, tracking support system, and tracking support method |
US9852513B2 (en) | 2016-03-01 | 2017-12-26 | Intel Corporation | Tracking regions of interest across video frames with corresponding depth maps |
US9911223B2 (en) * | 2016-05-13 | 2018-03-06 | Yahoo Holdings, Inc. | Automatic video segment selection method and apparatus |
US10956723B2 (en) * | 2016-06-03 | 2021-03-23 | Pillar Vision, Inc. | Systems and methods for determining reduced player performance in sporting events |
US9886624B1 (en) | 2016-06-03 | 2018-02-06 | Pillar Vision, Inc. | Systems and methods for tracking dribbling in sporting environments |
JP7033587B2 (en) * | 2016-06-20 | 2022-03-10 | ピクセルロット エルティーディー. | Method and system for automatically creating video highlights
JP6776719B2 (en) * | 2016-08-17 | 2020-10-28 | 富士通株式会社 | Mobile group detection program, mobile group detection device, and mobile group detection method |
JP6918455B2 (en) | 2016-09-01 | 2021-08-11 | キヤノン株式会社 | Image processing equipment, image processing methods and programs |
EP3475848B1 (en) * | 2016-09-05 | 2019-11-27 | Google LLC | Generating theme-based videos |
US10372970B2 (en) | 2016-09-15 | 2019-08-06 | Qualcomm Incorporated | Automatic scene calibration method for video analytics |
US10009586B2 (en) | 2016-11-11 | 2018-06-26 | Christie Digital Systems Usa, Inc. | System and method for projecting images on a marked surface |
US10818018B2 (en) * | 2016-11-24 | 2020-10-27 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, and non-transitory computer-readable storage medium |
TWI607387B (en) | 2016-11-25 | 2017-12-01 | 財團法人工業技術研究院 | Character recognition systems and character recognition methods thereof |
MX2019006588A (en) | 2016-12-05 | 2019-10-09 | Avigilon Corp | System and method for appearance search. |
JP6873678B2 (en) * | 2016-12-09 | 2021-05-19 | キヤノン株式会社 | Image processing equipment, control methods, and programs |
CN106682598B (en) * | 2016-12-14 | 2021-02-19 | 华南理工大学 | Multi-pose face feature point detection method based on cascade regression |
US20180181569A1 (en) * | 2016-12-22 | 2018-06-28 | A9.Com, Inc. | Visual category representation with diverse ranking |
US10503775B1 (en) | 2016-12-28 | 2019-12-10 | Shutterstock, Inc. | Composition aware image querying |
US11042586B2 (en) | 2016-12-29 | 2021-06-22 | Shutterstock, Inc. | Clustering search results based on image composition |
US10628675B2 (en) * | 2017-02-07 | 2020-04-21 | Fyusion, Inc. | Skeleton detection and tracking via client-server communication |
WO2018177134A1 (en) | 2017-03-29 | 2018-10-04 | 腾讯科技(深圳)有限公司 | Method for processing user-generated content, storage medium and terminal |
SG11201909105TA (en) * | 2017-03-30 | 2019-11-28 | Nec Corp | Information processing apparatus, control method, and program |
JP6779365B2 (en) | 2017-03-30 | 2020-11-04 | 三菱電機株式会社 | Object detection device and vehicle |
US10521691B2 (en) * | 2017-03-31 | 2019-12-31 | Ebay Inc. | Saliency-based object counting and localization |
KR20200028330A (en) | 2017-05-09 | 2020-03-16 | 뉴럴라 인코포레이티드 | Systems and methods that enable continuous memory-based learning in deep learning and artificial intelligence to continuously run applications across network compute edges |
US10089556B1 (en) * | 2017-06-12 | 2018-10-02 | Konica Minolta Laboratory U.S.A., Inc. | Self-attention deep neural network for action recognition in surveillance videos |
US10162844B1 (en) | 2017-06-22 | 2018-12-25 | NewVoiceMedia Ltd. | System and methods for using conversational similarity for dimension reduction in deep analytics |
US10402687B2 (en) | 2017-07-05 | 2019-09-03 | Perceptive Automata, Inc. | System and method of predicting human interaction with vehicles |
US10431000B2 (en) * | 2017-07-18 | 2019-10-01 | Sony Corporation | Robust mesh tracking and fusion by using part-based key frames and priori model |
JP6898165B2 (en) * | 2017-07-18 | 2021-07-07 | パナソニック株式会社 | People flow analysis method, people flow analyzer and people flow analysis system |
KR20190009103A (en) | 2017-07-18 | 2019-01-28 | 삼성전자주식회사 | Electronic Device that is moved based on Distance to External Object and the Control Method |
JP6867056B2 (en) * | 2017-07-28 | 2021-04-28 | 日本電気株式会社 | Information processing equipment, control methods, and programs |
US10474988B2 (en) | 2017-08-07 | 2019-11-12 | Standard Cognition, Corp. | Predicting inventory events using foreground/background processing |
US11232294B1 (en) | 2017-09-27 | 2022-01-25 | Amazon Technologies, Inc. | Generating tracklets from digital imagery |
JP6543313B2 (en) | 2017-10-02 | 2019-07-10 | 株式会社エイチアイ | Image generation record display device and program for mobile object |
US20190108631A1 (en) | 2017-10-06 | 2019-04-11 | AgriSight, Inc. | System and method for field pattern analysis |
US10740620B2 (en) * | 2017-10-12 | 2020-08-11 | Google Llc | Generating a video segment of an action from a video |
AU2017254859A1 (en) | 2017-10-31 | 2019-05-16 | Canon Kabushiki Kaisha | Method, system and apparatus for stabilising frames of a captured video sequence |
CN107871120B (en) * | 2017-11-02 | 2022-04-19 | 汕头市同行网络科技有限公司 | Sports event understanding system and method based on machine learning |
JP6615847B2 (en) * | 2017-11-08 | 2019-12-04 | 株式会社東芝 | Image processing apparatus, image processing system, image processing method, and program |
AU2017272325A1 (en) * | 2017-12-08 | 2019-06-27 | Canon Kabushiki Kaisha | System and method of generating a composite frame |
CN107944431B (en) * | 2017-12-19 | 2019-04-26 | 天津天远天合科技有限公司 | Intelligent identification method based on motion change
US10529077B2 (en) | 2017-12-19 | 2020-01-07 | Canon Kabushiki Kaisha | System and method for detecting interaction |
JP2021511729A (en) | 2018-01-18 | 2021-05-06 | Gumgum, Inc. | Extension of the detected area in the image or video data
US20190251204A1 (en) | 2018-02-14 | 2019-08-15 | Commvault Systems, Inc. | Targeted search of backup data using calendar event data |
US10719712B2 (en) | 2018-02-26 | 2020-07-21 | Canon Kabushiki Kaisha | Classify actions in video segments using play state information |
JP7132730B2 (en) | 2018-03-14 | 2022-09-07 | キヤノン株式会社 | Information processing device and information processing method |
CN108509896B (en) * | 2018-03-28 | 2020-10-13 | 腾讯科技(深圳)有限公司 | Trajectory tracking method and device and storage medium |
EP3553740A1 (en) | 2018-04-13 | 2019-10-16 | Koninklijke Philips N.V. | Automatic slice selection in medical imaging |
US11055902B2 (en) | 2018-04-23 | 2021-07-06 | Intel Corporation | Smart point cloud reconstruction of objects in visual scenes in computing environments |
US20190335192A1 (en) | 2018-04-27 | 2019-10-31 | Neulion, Inc. | Systems and Methods for Learning Video Encoders |
EP3579196A1 (en) | 2018-06-05 | 2019-12-11 | Cristian Sminchisescu | Human clothing transfer method, system and device |
US11842572B2 (en) | 2018-06-21 | 2023-12-12 | Baseline Vision Ltd. | Device, system, and method of computer vision, object tracking, image analysis, and trajectory estimation |
CN108875667B (en) * | 2018-06-27 | 2021-03-02 | 北京字节跳动网络技术有限公司 | Target identification method and device, terminal equipment and storage medium |
JP7002050B2 (en) | 2018-07-11 | 2022-02-10 | パナソニックIpマネジメント株式会社 | Imaging device |
US11348252B1 (en) * | 2018-07-12 | 2022-05-31 | Nevermind Capital Llc | Method and apparatus for supporting augmented and/or virtual reality playback using tracked objects |
JP7179515B2 (en) * | 2018-07-13 | 2022-11-29 | キヤノン株式会社 | Apparatus, control method and program |
US11070729B2 (en) | 2018-07-27 | 2021-07-20 | Canon Kabushiki Kaisha | Image processing apparatus capable of detecting moving objects, control method thereof, and image capture apparatus |
US10867390B2 (en) * | 2018-09-10 | 2020-12-15 | Arm Limited | Computer vision processing |
EP3857511A4 (en) * | 2018-09-28 | 2022-05-04 | INTEL Corporation | Methods and apparatus to generate photo-realistic three-dimensional models of photographed environment |
JP7132501B2 (en) * | 2018-11-01 | 2022-09-07 | ミツミ電機株式会社 | ranging camera |
JP2020086659A (en) * | 2018-11-19 | 2020-06-04 | トヨタ自動車株式会社 | Information processing system, program, and information processing method |
US11107099B2 (en) | 2018-12-21 | 2021-08-31 | Google Llc | Brand penetration determination system using image semantic content |
JP7310834B2 (en) | 2019-01-18 | 2023-07-19 | 日本電気株式会社 | Information processing equipment |
US11875567B2 (en) * | 2019-01-22 | 2024-01-16 | Plaay, Llc | System and method for generating probabilistic play analyses |
JP2020119284A (en) * | 2019-01-24 | 2020-08-06 | 日本電気株式会社 | Information processing device, information processing method, and program |
US11182642B2 (en) | 2019-02-28 | 2021-11-23 | Stats Llc | System and method for generating player tracking data from broadcast video |
US12046038B2 (en) * | 2019-03-22 | 2024-07-23 | The Regents Of The University Of California | System and method for generating visual analytics and player statistics |
KR20200113743A (en) | 2019-03-26 | 2020-10-07 | 한국전자통신연구원 | Method and apparatus for estimating and compensating human's pose |
US20200311392A1 (en) | 2019-03-27 | 2020-10-01 | Agt Global Media Gmbh | Determination of audience attention |
US10949960B2 (en) | 2019-06-20 | 2021-03-16 | Intel Corporation | Pose synthesis in unseen human poses |
WO2021108626A1 (en) * | 2019-11-27 | 2021-06-03 | Compound Eye Inc. | System and method for correspondence map determination |
EP4158505A4 (en) * | 2020-05-27 | 2024-05-08 | Helios Sports, Inc. | Intelligent sports video and data generation from ai recognition events |
US11361468B2 (en) * | 2020-06-26 | 2022-06-14 | Standard Cognition, Corp. | Systems and methods for automated recalibration of sensors for autonomous checkout |
US11521411B2 (en) * | 2020-10-22 | 2022-12-06 | Disney Enterprises, Inc. | System and method for providing multi-camera 3D body part labeling and performance metrics |
US20210118182A1 (en) * | 2020-12-22 | 2021-04-22 | Intel Corporation | Methods and apparatus to perform multiple-camera calibration |
JP2024508136A (en) * | 2021-03-05 | 2024-02-22 | トラックマン・アクティーゼルスカブ | System and method for player identification |
- 2020
- 2020-02-28 US US16/805,086 patent/US11182642B2/en active Active
- 2020-02-28 WO PCT/US2020/020446 patent/WO2020176875A1/en unknown
- 2020-02-28 CN CN202080017542.3A patent/CN113508604B/en active Active
- 2020-02-28 CN CN202080017553.1A patent/CN113574866B/en active Active
- 2020-02-28 CN CN202080017540.4A patent/CN113544698B/en active Active
- 2020-02-28 EP EP20762799.3A patent/EP3912089A4/en active Pending
- 2020-02-28 WO PCT/US2020/020444 patent/WO2020176873A1/en unknown
- 2020-02-28 EP EP20762485.9A patent/EP3912362A4/en active Pending
- 2020-02-28 US US16/805,116 patent/US11379683B2/en active Active
- 2020-02-28 WO PCT/US2020/020442 patent/WO2020176872A1/en unknown
- 2020-02-28 US US16/805,157 patent/US11593581B2/en active Active
- 2020-02-28 WO PCT/US2020/020438 patent/WO2020176869A1/en unknown
- 2020-02-28 US US16/805,009 patent/US11176411B2/en active Active
- 2020-02-28 CN CN202311306878.3A patent/CN117880607A/en active Pending
- 2020-02-28 CN CN202080017536.8A patent/CN113508419B/en active Active
- 2020-02-28 EP EP20763955.0A patent/EP3912132A4/en active Pending
- 2020-02-28 EP EP20763348.8A patent/EP3935832A4/en active Pending
- 2021
- 2021-11-15 US US17/454,952 patent/US11586840B2/en active Active
- 2021-11-22 US US17/532,707 patent/US11830202B2/en active Active
- 2022
- 2022-07-01 US US17/810,457 patent/US11861848B2/en active Active
- 2023
- 2023-02-17 US US18/171,066 patent/US11861850B2/en active Active
- 2023-02-27 US US18/175,278 patent/US11935247B2/en active Active
- 2023-10-18 US US18/489,278 patent/US20240046483A1/en active Pending
- 2023-11-17 US US18/512,983 patent/US20240095931A1/en active Pending
- 2023-11-20 US US18/514,326 patent/US20240087138A1/en active Pending
- 2024
- 2024-01-29 US US18/425,629 patent/US20240169554A1/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040143434A1 (en) * | 2003-01-17 | 2004-07-22 | Ajay Divakaran | Audio-Assisted segmentation and browsing of news videos |
CN105653700A (en) * | 2015-03-13 | 2016-06-08 | Tcl集团股份有限公司 | Video search method and system |
CN108399435A (en) * | 2018-03-21 | 2018-08-14 | 南京邮电大学 | Video classification method based on sound features
Non-Patent Citations (4)
Title |
---|
CHUNXI LIU: "Extracting Story Units in Sports Video Based on Unsupervised Video Scene Clustering", IEEE *
CHUNXI LIU: "Extracting Story Units in Sports Video Based on Unsupervised Video Scene Clustering", IEEE, 9 July 2006 (2006-07-09), pages 2-4 *
LU H: "Unsupervised Clustering of Dominant Scenes in Sports Video", Elsevier *
LU H: "Unsupervised Clustering of Dominant Scenes in Sports Video", Elsevier, 1 November 2003 (2003-11-01), pages 1-4 *
Also Published As
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113508604B (en) | System and method for generating trackable video frames from broadcast video | |
Nguyen et al. | STAP: Spatial-temporal attention-aware pooling for action recognition | |
Liu et al. | Deep learning based basketball video analysis for intelligent arena application | |
AU2017272325A1 (en) | System and method of generating a composite frame | |
Kviatkovsky et al. | Online action recognition using covariance of shape and motion | |
Turchini et al. | Understanding sport activities from correspondences of clustered trajectories | |
CN118941640A (en) | System and method for calibrating mobile cameras capturing broadcast video | |
Pontes et al. | Application of machine learning in soccer broadcast: A systematic review | |
US20230047821A1 (en) | Active Learning Event Models | |
Wilson et al. | Event-based sports videos classification using HMM framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |