CN116958872A - Intelligent auxiliary training method and system for badminton - Google Patents

Intelligent auxiliary training method and system for badminton

Info

Publication number: CN116958872A
Application number: CN202310922353.6A
Authority: CN
Prior art keywords: badminton, ball, camera, dimensional, binocular
Legal status: Pending
Original language: Chinese (zh)
Inventors: 韩梁俭, 李雨恒, 韩博
Current and original assignee: Zhejiang University (ZJU)
Application filed by Zhejiang University; priority to CN202310922353.6A

Classifications

    • A63B71/06 — Indicating or scoring devices for games or players, or for other sports activities
    • G06N3/0455 — Auto-encoder networks; encoder-decoder networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/08 — Learning methods
    • G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82 — Image or video recognition or understanding using neural networks
    • G06V20/42 — Higher-level, semantic clustering, classification or understanding of sport video content
    • G06V20/49 — Segmenting video sequences into units such as shots or scenes
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/23 — Recognition of whole body movements, e.g. for sport training
    • G06T2207/10016 — Video; image sequence
    • G06T2207/20081 — Training; learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06T2207/30196 — Human being; person
    • G06T2207/30221 — Sports video; sports image
    • G06T2207/30224 — Ball; puck
    • G06T2207/30241 — Trajectory
    • G06V2201/07 — Target detection

Abstract

The invention is applicable to the field of artificial intelligence and computer vision applications, and particularly relates to an intelligent auxiliary training method and system for badminton, which output the position and temporal information of the ball from two-dimensional ball path detection and tracking, and reconstruct and optimize the trajectory in three-dimensional space. An athlete technical action acquisition method based on binocular and monocular view angles is realized. Additional constraints are introduced from several angles, such as temporal continuity, the court and the rules of the game, to obtain more accurate estimates of the three-dimensional body pose of badminton players under the monocular view. An intelligent auxiliary training prototype system is realized, which integrates the ball path acquisition and athlete technical action acquisition schemes together with the corresponding data acquisition, data storage and indexing functions, forming a closed loop of the badminton video analysis workflow: data acquisition, data analysis, data storage and output.

Description

Intelligent auxiliary training method and system for badminton
Technical Field
The invention belongs to the field of artificial intelligence and computer vision applications, and particularly relates to an intelligent auxiliary training method and system for badminton.
Background
The badminton ball path acquisition problem, i.e., the problem of tracking the ball and computing its movement trajectory, can be divided into ball path tracking schemes based on a monocular view angle and schemes based on multiple view angles.
Ball path tracking based on a monocular view angle traditionally tracks the movement trajectory of the ball visually; researchers usually predict the ball path with Kalman filter or particle filter methods, combining characteristics of the ball such as volume, color, velocity and acceleration. In a generalized ball tracking framework, the ball exhibits nonlinear factors in three-dimensional space such as speed, acceleration and spin, and a switching search method based on a particle filter can recover a ball that disappears, because it is too small or occluded, within a locally continuous image sequence. However, if the ball is occluded for a long time, conventional Kalman or particle filter methods predict the ball path poorly. It then becomes necessary to manually define appearance characteristics of the ball, such as color, volume, shape or statistical features, perform a preliminary detection of the ball target, and convert the ball path tracking problem into one of finding a globally optimal solution. In video images, however, the available feature information is very limited, and feature errors accumulate over a game due to occlusion, shading and changes in appearance, causing the tracking result to drift. In recent years, with the rapid development of deep learning in RGB image processing, popular trackers have also begun to be applied to ball tracking. Machine-learning-based ball path tracking methods detect the ball target in each frame with a pre-trained convolutional neural network, track the fast-moving target, and regress the detection information of the ball from its appearance characteristics.
Ball path tracking based on multiple view angles adopts a multi-camera system to compensate for the inability of individual view angles to capture the ball. Traditional methods use several cameras to measure the three-dimensional positions of players in a tennis match and obtain the players' positions by averaging; a position information fusion method handles the larger measurement errors caused by camera calibration and tracking errors; and the ball path while the ball is in a player's possession is modeled by combining the position information of the ball and the players, so as to address ball occlusion. Machine-learning-based ball path tracking algorithms first detect and track the ball in two-dimensional space and then synthesize the two-dimensional tracking results of multiple cameras into a three-dimensional trajectory, taking the reliability of the results into account.
In terms of how the human pose is modeled, human pose estimation can be classified into two-dimensional and three-dimensional human pose estimation methods.
Deep-learning-based two-dimensional human pose estimation methods can be classified, by the number of people in the input image, into single-person and multi-person methods. A single-person pose estimation method takes a single RGB image as input and locates the joint points of the human body with two-dimensional single-person pose estimation; if several people appear in the image, single-person crops can be obtained by cutting the input image. In general, two-dimensional human pose estimation methods fall into two types: regression-based methods and body-part-detection-based methods. Experiments show that joint points obtained by regression-based methods have larger errors, and accurate two-dimensional pixel coordinates are difficult to obtain from the image. Researchers therefore proposed body-part-detection-based methods, which predict the approximate position of a body part or joint and learn the joint position under supervision, usually represented by a joint heat map (Heat Map); this approach reduces the prediction error well. A heat map is a two-dimensional Gaussian distribution map constructed around the joint point position, representing the confidence that a pixel coordinate is a joint point. Multi-person pose estimation must handle two tasks, human detection and human joint point localization, and can accordingly be divided into two categories by the order and hierarchy in which the two tasks are performed: top-down (Top-Down) methods, which first extract foreground bounding boxes labeled as persons and then estimate the joint points inside each box, and bottom-up (Bottom-Up) methods, which predict all possible limbs in the image and then associate and combine them into human bodies. The main tasks of the top-down method are human target detection and human joint point detection: foreground extraction is performed by a target detection algorithm, each human body against the image background is detected, a human bounding box or contour is identified, and single-person joint point detection is then performed for each human body; most research focuses on human target detection, such as Mask R-CNN, FPN, Faster R-CNN and the YOLO series. The bottom-up method, which detects all possible human joint points or torso heat maps in the image and then clusters the joint points or torsos into the corresponding individuals, is faster.
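As an illustration of the heat map representation described above, the following is a minimal sketch of generating a joint confidence map as a two-dimensional Gaussian centered on the annotated joint; the function name and sigma value are illustrative assumptions:

```python
import numpy as np

def joint_heatmap(width, height, joint_u, joint_v, sigma=2.0):
    """Build a 2D Gaussian confidence map centered on a joint's pixel coordinates.

    Each pixel value is the confidence that this pixel is the joint location;
    sigma controls how far the confidence spreads (commonly a few pixels).
    """
    us, vs = np.arange(width), np.arange(height)
    uu, vv = np.meshgrid(us, vs)                       # (height, width) grids
    d2 = (uu - joint_u) ** 2 + (vv - joint_v) ** 2     # squared pixel distance
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Example: a 64x64 heat map for a joint observed at pixel (20, 30).
hm = joint_heatmap(64, 64, 20, 30)
print(hm.shape, hm.max())  # (64, 64) 1.0 at the joint position
```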
Three-dimensional human pose estimation is the task of predicting human joint positions in three-dimensional space (camera-relative or absolute position space). Because two-dimensional human pose estimation can only represent the position of the body in the image, it lacks depth and height information; the real-world pose information it can provide is limited and cannot express the human pose accurately. Many researchers have therefore combined two-dimensional pose estimation with camera parameter matrices (intrinsic matrix, extrinsic matrix, etc.) obtained by estimation or calibration, yielding three-dimensional human pose estimation methods. By the number of viewpoints at data acquisition, three-dimensional human pose estimation divides into monocular-view and multi-view methods. Monocular-view methods fall mainly into one-stage methods, two-stage methods and parameterized-model-based methods. One-stage methods take the original image as input to a deep neural network and map directly to three-dimensional joint points. Two-stage methods benefit from the successful development of two-dimensional pose estimation and generally proceed in two stages: the first stage takes an RGB image as input and predicts the pixel coordinates of the two-dimensional joint points with a deep neural network; the second stage takes the two-dimensional joint points as input, lifts them to three dimensions through a second neural network, and predicts the three-dimensional joint coordinates. Parameterized models are data-driven: a large amount of human modeling data is obtained by 3D scanning, and the shape parameters of the human body are then learned by machine learning. Multi-view human pose estimation methods address body occlusion and poor camera shooting angles under a single view, and can be regarded as a deterministic problem solved with position information fusion techniques.
Disclosure of Invention
The embodiment of the invention aims to provide an intelligent auxiliary training method for badminton that uses a multi-view human pose estimation method to address the problems that the human body is occluded under a monocular view and that the camera shooting angle is poor.
The embodiment of the invention is realized in such a way that the intelligent auxiliary training method for badminton comprises the following steps:
constructing a binocular view angle badminton training data set and a monocular view angle badminton match data set;
constructing a shuttlecock ball path acquisition model based on binocular view angles;
testing the binocular view angle shuttlecock ball path acquisition model on the binocular view angle badminton training data set and the monocular view angle badminton match data set;
performing a three-dimensional ball path position calculation test on the binocular view angle badminton training data set, and comparing the results before and after ball path improvement;
estimating camera parameters based on the court feature point annotations and the monocular view angle badminton match data set to obtain a camera parameter estimation result;
constructing a badminton player tracking model, and tracking badminton players through human bounding box detection and badminton court boundary marking;
constructing a technical action acquisition model suitable for the binocular view angle badminton training data set;
constructing a technical action acquisition model suitable for the monocular view angle badminton match data set;
evaluating a temporal human pose estimation network on the monocular view angle badminton match data set;
generating auxiliary training data by acquiring binocular view angle video of badminton players and analyzing the players from the angles of physical condition, technique and tactics, and match technique video segmentation.
Preferably, the binocular view angle badminton training data set and the monocular view angle badminton match data set are prepared as follows: a camera scheme that synchronously acquires binocular view angle badminton video data is adopted, some 300,000 frames of image video are collected, and the data are cleaned and annotated; in addition, a public monocular view angle badminton match data set is collected and organized, in which 18 badminton match videos are divided into a training set and a test set.
Preferably, a shuttlecock ball path acquisition scheme based on binocular view angles is designed, comprising designing and realizing the ball path acquisition scheme from the collected binocular view angle badminton training videos and calculating the three-dimensional ball path of the shuttlecock based on target detection and tracking. The scheme is divided into 3 stages: in the first stage, a heat map of the shuttlecock center point is obtained from the monocular image video of each camera; in the second stage, combining the shuttlecock heat map information, the pixel coordinates of the shuttlecock targets predicted from the two view angles are fused based on the epipolar constraint principle and the calibrated camera parameters, obtaining the three-dimensional position of the shuttlecock at each moment; in the third stage, combining temporal information and outlier detection, trajectory noise is removed to obtain a smooth three-dimensional ball path.
Preferably, the shuttlecock ball path acquisition scheme based on binocular view angles is tested on the monocular view angle badminton match data set and the binocular view angle badminton training data set, specifically: the training set is the training set portion of the badminton match data set; the test set is divided into two parts, where the first part adopts all image videos of the test set portion of the monocular view angle badminton match data set and the second part adopts 588 image videos from the binocular view angle badminton training data set. The target detection and tracking accuracy of the network model is evaluated using the prediction accuracy (Prediction Accuracy) evaluation method commonly used in the field of single-target detection and tracking, and the performance of the badminton target tracking convolutional neural network on the monocular view angle badminton match data set and on the binocular view angle badminton training data set is compared.
Preferably, an experiment calculating the three-dimensional ball path position is performed on the binocular view angle badminton training data set, and the results before and after ball path improvement are compared, specifically: the ball path is divided into a serve trajectory and a stroke trajectory; outliers with obvious noise on the movement trajectory of the ball are then removed with the help of the epipolar constraint principle, making the ball path smoother; finally, the trajectories before and after optimization are visualized to complete the comparison experiment.
Preferably, a camera parameter estimation method based on court feature point annotation and suitable for monocular view angle image video is designed and realized, specifically: given an RGB image, the feature points on the court are annotated manually; candidate focal lengths of the camera are calculated from the pixel coordinates of the annotated points and the corresponding world coordinates; and the extrinsic matrix of the camera is finally determined from the camera intrinsic matrix and the conversion errors between the pixel coordinates and world coordinates of the feature points.
Preferably, a player detection and tracking method based on badminton court boundary constraints and suitable for monocular and binocular image videos is designed, tracking badminton players through human bounding box detection and badminton court boundary marking, specifically: a player detection and tracking module is realized, consisting of a human detection network and a player bounding box tracking post-processing sub-module; the backbone of the human detection network adopts a target detection algorithm to predict human labels in the image, obtain all identifiable human targets in the image, and output bounding box information; rapid player instance segmentation is realized through a collision detection post-processing operation based on the boundary marking constraints of the badminton court; the bounding boxes of the output results are cropped, and the technical action of the player in each human detection bounding box is estimated.
Preferably, based on the binocular view angle badminton training data set, a technical action acquisition scheme for badminton players under binocular view angles is designed and realized, specifically: image video of the binocular view angles is input; the player's human bounding box and corresponding id are obtained through player detection and tracking; the cropped and enlarged human bounding box is input for two-dimensional human joint point detection, predicting the pixel coordinates of the player's joint points under each view angle; position information fusion based on epipolar constraints is performed for the same player across the binocular view angles to obtain the player's three-dimensional human pose information; and the three-dimensional human poses are combined frame by frame to obtain the player's technical actions under these view angles.
Preferably, a technical action acquisition scheme for badminton players under a monocular view angle is designed and realized based on the monocular view angle badminton match data set, specifically: image video of the monocular view angle is input; the focal length and intrinsic parameter matrix of the monocular camera are calculated with the camera parameter estimation method based on badminton court feature points; the player's human bounding box and corresponding id are obtained through player detection and tracking; the cropped and enlarged human bounding box is input for three-dimensional human pose estimation, predicting rough joint point coordinates of the player in three-dimensional space; temporal pose optimization based on adaptive filtering is performed on the joint point coordinate sequence of the image video, obtaining smooth player technical actions; the estimate of the player's root joint position is improved based on a ground constraint; and finally smoother and more accurate player technical actions in three-dimensional space are obtained.
Preferably, the performance of the temporal human pose estimation network on three-dimensional human pose estimation from monocular image video is evaluated on the public data set, specifically: the improved result based on pose optimization is compared and evaluated against traditional interpolation or filtering methods, and the absolute positions of the human joint points before and after introducing the ground constraint are compared and evaluated.
Preferably, an intelligent auxiliary training prototype system for badminton is designed and realized according to the data analysis requirements of badminton matches, specifically: a system consisting of a user interface, background functional modules and a database is designed. The user interface provides users with 5 interfaces: video acquisition, video playback, information management, video analysis and user management. The background functional modules are divided into 3 main modules, divided and classified by function type, namely the data acquisition module, the data analysis module and the data management module. The database supports the storage of user data records. Users interact by selecting the corresponding video acquisition, video playback, information management, video analysis or user management interface; internally, the system processes the pipeline through the entry module corresponding to the interface, and the generated data are stored locally or in the database through the integration of the data management module.
Compared with the prior art, the beneficial effects of the invention include at least the following:
A ball path acquisition scheme based on binocular view angles is realized: the position and temporal information of the ball are output from two-dimensional ball path detection and tracking, and the trajectory is reconstructed and optimized in three-dimensional space. An athlete technical action acquisition method based on binocular and monocular view angles is realized; additional constraints are introduced from several angles, such as temporal continuity, the court and the rules of the game, to obtain more accurate estimates of the three-dimensional body pose of badminton players under the monocular view. An intelligent auxiliary training prototype system is realized, which integrates the ball path acquisition and athlete technical action acquisition schemes together with the corresponding data acquisition, data storage and indexing functions, forming a closed loop of the badminton video analysis workflow: data acquisition, data analysis, data storage and output.
Drawings
FIG. 1 is an overall framework diagram of the intelligent auxiliary training method for badminton provided by an embodiment;
FIG. 2 is an overall framework diagram of obtaining the three-dimensional ball path of the shuttlecock in a binocular view angle scene provided by an embodiment;
FIG. 3 is a schematic illustration of the epipolar constraint principle provided by an embodiment;
FIG. 4 is a schematic diagram of the three-dimensional ball path before and after optimization provided by an embodiment;
FIG. 5 is a schematic view of the court feature point annotation provided by an embodiment;
FIG. 6 is a schematic diagram of the athlete detection and tracking module provided by an embodiment;
FIG. 7 is a three-dimensional pose generated from a cropped athlete bounding box provided by an embodiment;
FIG. 8 is a framework diagram of the athlete technical action acquisition scheme under binocular view angles provided by an embodiment;
FIG. 9 is a framework diagram of the athlete technical action acquisition scheme under a monocular view angle provided by an embodiment;
FIG. 10 is a schematic view of athlete technical action acquisition under a monocular view angle provided by an embodiment;
FIG. 11 is a block diagram of the intelligent auxiliary training system for badminton.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another element. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of this disclosure.
As shown in FIG. 1, an overall framework diagram of the intelligent auxiliary training method for badminton is provided, and the method includes:
step 1, constructing a binocular vision angle badminton training data set and a monocular vision angle badminton match data set.
In this step, a shooting scheme with synchronized binocular view angle cameras is adopted to collect badminton amateur match videos and sparring training videos of a badminton team during the National Games period, and the captured videos are then preprocessed to serve as the self-made data set of this embodiment (i.e., the binocular view angle badminton training data set). First, a binocular view angle data acquisition module is developed, with the specific steps: 1) 2 Baumer VLXT-50C.I cameras are paired with a matching synchronization controller; 2) combining the neoAPI provided with the cameras and the synchronization controller, synchronized control and real-time interaction of the two cameras are realized; 3) the camera parameters are adjusted to complete the binocular view angle acquisition function. Then, video data acquisition is carried out, with the steps: 1) before acquisition, the corresponding camera parameters are set through the camera application, and the system records any modifications of the acquisition camera parameters (default parameters are used when the camera is not configured); 2) when a new match record is added, the match information and player information are filled in, and the system records the match and player information of this acquisition; 3) the video is recorded; 4) event information occurring during the match (such as scores, mid-match rest times, etc.) is filled in; 5) the video is submitted to complete the acquisition. Next, the collected videos and data are stored in a database: the system creates a new match record each time a match or training video is recorded, associates the video storage path, player information and camera configuration information, and stores this information in the database in a table structure. Then, the collected videos are annotated: only shuttlecocks visible under the camera view angle are annotated, without predicting or annotating occluded parts; the cork (ball holder) of the shuttlecock under the camera view angle is taken as the annotation center point, and the pixel coordinates of the shuttlecock cork are recorded in each frame of image. In the end, the produced data set comprises 9 video segments totaling 336736 frames over the binocular view angles, with 168368 RGB image videos acquired from each view angle, and the camera parameters of each shooting session are calibrated. Baumer VLXT-50C.I cameras were used during shooting; the acquired images and videos have a resolution of 1224 pixels wide by 1024 pixels high, a shooting frame rate of 20 to 30 frames per second, and an exposure time of 6 to 8 milliseconds.
In the embodiments, the difficulty that open data sets for shuttlecock target tracking are scarce is addressed with an open data set for small-ball (tennis, badminton) target tracking, hereinafter referred to as the monocular view angle badminton match data set. This data set constructs a tennis data set and a badminton data set from public tennis and badminton match videos respectively; for the badminton portion, 55563 frames of video images serve as the training set and 13200 frames as the test set. The training set includes 15 badminton match videos (46038 frames in total) and 3 amateur badminton match videos (9525 frames in total). The resolution of the video images is 1280×720, and the video frame rate of each match is 30 FPS. To avoid overfitting as much as possible while letting the network learn the shuttlecock motion characteristics in continuous image sequences, the training set is split according to the different match backgrounds and view angles over the course of a match; the videos are divided into 125 segments, each containing 2500 to 3000 consecutive images. For each frame of image video, the data label is denoted {frame_id, u, v, vc}, where frame_id is the sequential number of the frame, u and v are the annotated ball pixel coordinates, and vc indicates whether the ball is visible (0 for invisible, 1 for visible). The test set consists solely of badminton match segments that do not appear in the training set at all; for these matches, neither the playing field nor the match background appears in the training set.
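For illustration, a small sketch of loading per-frame labels of the form {frame_id, u, v, vc}, assuming the annotations are stored as a CSV file with those column names (the file layout is an assumption, not specified by the data set):

```python
import csv

def load_labels(csv_path):
    """Read per-frame shuttlecock labels of the form (frame_id, u, v, vc).

    vc = 1 means the shuttlecock is visible in the frame, vc = 0 means it is
    occluded or absent; invisible frames carry no usable (u, v) coordinates.
    """
    labels = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            frame_id = int(row["frame_id"])
            visible = int(row["vc"]) == 1
            uv = (float(row["u"]), float(row["v"])) if visible else None
            labels[frame_id] = {"visible": visible, "uv": uv}
    return labels
```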
Step 2, constructing a shuttlecock ball path acquisition model based on binocular view angles.
In this step, given binocular view angle badminton training image sequences {I_t, I_{t+1}, ..., I_{t+m-1}} and {I′_t, I′_{t+1}, ..., I′_{t+m-1}}, a smooth three-dimensional badminton ball path P̂ is obtained, where {I_t, I′_t} ∈ R^{2×m×w×h×3} denotes the paired RGB images shot simultaneously by the 2 cameras; m denotes the total length of the image sequence; w, h and t denote the image width, image height and shooting timestamp, respectively; {p_t, p′_t} ∈ R^{m×2} denote the two-dimensional image coordinate positions of the shuttlecock center (cork) in the two cameras at time t; P_t denotes its position in the world coordinate system; and P̂ denotes the smoothed badminton coordinates. The specific embodiment is divided into three stages.
The first stage: two-dimensional ball target tracking. First, given a badminton training RGB image sequence {I_t, I_{t+1}, ..., I_{t+m-1}} ∈ R^{m×w×h×3}, a sliding window of length 3 is defined, and the consecutive RGB images at each camera view angle are input at this window size each time. The 2 image sequences then each pass through a badminton ball tracking network (TrackNet) of the same encoder-decoder structure. The specific steps are: 1) after the RGB images are input to the encoder, the feature size is reduced to 1/4 of the original through 2 convolution layers and 1 max-pooling layer (Max-Pooling Layer); the encoder repeats the same operation 3 times, finally reducing the feature size to 1/8; 2) the network then expands the feature map (Feature Map) through a decoder (Decoder) to generate a 512×288 feature map, decoding with an up-sampling (Up Sampling) structure corresponding to the encoder's down-sampling (Down Sampling) structure; 3) to prevent the features of the shuttlecock ball from vanishing, a skip connection mechanism (Skip Connection Mechanism) is introduced: the feature arrays in the encoder network are passed directly to the corresponding feature mapping layers of the decoder network, preserving the image features of small-volume object targets; 4) the final Sigmoid activation layer generates a 1-channel real-valued array recording the values of a two-dimensional normal distribution function spreading outward from the shuttlecock cork as its center; 5) finally, a probability-distribution-based two-dimensional ball target heat map sequence {H_t, H_{t+1}, ..., H_{t+m-1}} ∈ R^{m×w×h×1}, the same size as the input images, is output.
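As a concrete illustration of this encoder-decoder design, the following is a minimal PyTorch sketch; the layer counts, channel widths and the class name BallHeatmapNet are illustrative assumptions rather than the exact TrackNet configuration:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, n_convs):
    """n_convs 3x3 conv + BN + ReLU layers, keeping the spatial size."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, padding=1),
                   nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class BallHeatmapNet(nn.Module):
    """Encoder-decoder with skip connections: k stacked RGB frames in,
    a single-channel heat map of the shuttlecock cork position out."""
    def __init__(self, in_frames=3):
        super().__init__()
        self.enc1 = conv_block(in_frames * 3, 64, 2)
        self.enc2 = conv_block(64, 128, 2)
        self.enc3 = conv_block(128, 256, 3)
        self.pool = nn.MaxPool2d(2)
        self.mid = conv_block(256, 512, 3)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.dec3 = conv_block(512 + 256, 256, 3)
        self.dec2 = conv_block(256 + 128, 128, 2)
        self.dec1 = conv_block(128 + 64, 64, 2)
        self.head = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Sigmoid())

    def forward(self, x):                    # x: (B, 9, H, W) for 3 RGB frames
        e1 = self.enc1(x)                    # full resolution
        e2 = self.enc2(self.pool(e1))        # 1/2
        e3 = self.enc3(self.pool(e2))        # 1/4
        m = self.mid(self.pool(e3))          # 1/8
        d3 = self.dec3(torch.cat([self.up(m), e3], dim=1))   # skip connections
        d2 = self.dec2(torch.cat([self.up(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))
        return self.head(d1)                 # (B, 1, H, W) per-pixel confidence
```

With 3 stacked RGB frames the input has 9 channels; the skip connections carry the small ball's features past the bottleneck, matching the motivation in step 3) above.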
The second stage: binocular view angle position information fusion. First, the shuttlecock prediction heat map sequences {H_t, H′_t} ∈ R^{2×m×w×h×1} of the two view angles are given. Then, the badminton image at any time t is processed with the following specific steps: 1) select the pixel coordinate points p_1, p_2 with the highest scores under the 2 view angles, where 1 and 2 denote the camera numbers; the view angle of camera 1 is defined as the source view angle and that of camera 2 as the reference view angle; 2) represent the two-dimensional coordinate point pair of the badminton image as {p_1, p_2}, where p_1, p_2 are the two-dimensional shuttlecock pixel coordinates (u_i, v_i), i ∈ {1, 2}, predicted by the ball tracking network in the source-view and reference-view pictures; for ease of computation, a constant is appended to expand the two-dimensional pixel coordinates to three dimensions, denoted (u_i, v_i, 1); 3) given the calibration parameter matrices {K_i, R_i, T_i}, i ∈ {1, 2}, of the two cameras, where K denotes the camera intrinsic matrix, R the extrinsic rotation matrix and T the extrinsic displacement matrix, normalize the depths of the 2 camera planes according to the source-view camera parameters {K_1, R_1, T_1} and ball coordinate p_1 and the reference-view camera parameters {K_2, R_2, T_2} and ball coordinate p_2, obtaining the normalized planes, the projection points p′_1, p′_2 of p_1, p_2 on the normalized planes, and the epipolar lines C_1p′_1, C_2p′_2 formed by the predicted shuttlecock center and the camera center under the 2 view angles, where C_1, C_2 are the centers of cameras 1 and 2; 4) as shown in FIG. 3, following the epipolar constraint principle of epipolar geometry, calculate the three-dimensional coordinate points P_i = (x_i, y_i, z_i), i ∈ {1, 2}, of the shuttlecock in the world coordinate system; 5) for the shuttlecock center coordinates P_1, P_2 predicted from the 2 view angles, P_1 = P_2 when the ball target detected by the tracking network is completely correct; if the network mispredicts the positive-sample coordinate P_2 under the reference view angle, we solve, based on the source view angle, for the point P_1 on C_1p′_1 closest to C_2p′_2 and take it as the three-dimensional coordinate point of the shuttlecock in the world coordinate system. Finally, position fusion (Position Fusion) is performed over all binocular view angles on the time sequence, preliminarily obtaining the three-dimensional ball center position sequence of the shuttlecock.
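The closest-point solution of step 5) can be sketched as follows, assuming pinhole cameras whose extrinsics (R, T) map world to camera coordinates; this is an illustrative reconstruction of the fusion idea with hypothetical function names, not the exact routine of the embodiment:

```python
import numpy as np

def backproject_ray(K, R, T, uv):
    """Return (camera center C, unit direction d) of the world-space ray
    through pixel uv, given intrinsics K and world-to-camera extrinsics."""
    p = np.array([uv[0], uv[1], 1.0])        # homogeneous pixel (u, v, 1)
    d_cam = np.linalg.inv(K) @ p             # point on the normalized plane
    d_world = R.T @ d_cam                    # rotate ray into world frame
    C = -R.T @ T                             # camera center in world frame
    return C, d_world / np.linalg.norm(d_world)

def fuse_binocular(K1, R1, T1, uv1, K2, R2, T2, uv2):
    """Epipolar fusion: the point on the source-view ray closest to the
    reference-view ray, taken as the 3D shuttlecock position P_1."""
    C1, d1 = backproject_ray(K1, R1, T1, uv1)
    C2, d2 = backproject_ray(K2, R2, T2, uv2)
    # Minimize ||(C1 + s*d1) - (C2 + t*d2)||^2 via a 2x2 linear system.
    b = C2 - C1
    a11, a12, a22 = d1 @ d1, -(d1 @ d2), d2 @ d2
    s, t = np.linalg.solve(np.array([[a11, a12], [a12, a22]]),
                           np.array([d1 @ b, -(d2 @ b)]))
    return C1 + s * d1                       # point on the source-view ray
```

When the two detections are exact, the two rays intersect and the returned point coincides with the true shuttlecock center; when the reference view is noisy, the result stays anchored to the source-view ray, as the text describes.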
The third stage: smooth improvement of the three-dimensional ball path; a smooth ball path is obtained through detection of obvious noise and curve fitting. The main cause of trajectory noise is the network falsely identifying positive samples (FP), i.e., mistaking objects that are not shuttlecocks for the shuttlecock ball. To repair the errors and complete trajectory optimization (Trajectory Optimization), we first determine which of three specific causes produced this kind of phenomenon: 1) FP appears under only one view angle while the other outputs TP, so an outlier appears in the ball trajectory; 2) the high ball speed causes the predicted ball position under one or both view angles to deviate from the true value, although the distance error is not large; 3) objects similar in appearance to the shuttlecock appear in the scenes of both view angles and are simultaneously detected as the shuttlecock by the networks of both view angles, appearing intermittently. We then apply the corresponding solution according to the cause of the error, with the specific steps: 1) For the class-1 case, we delete outliers. First, a classical outlier detection scheme based on the Isolation Forest selects the three-dimensional shuttlecock positions of a continuous segment each time, combined with the timestamp information, as training samples S of the form (x_i, y_i, z_i, frame_i); then, according to the Boolean prediction label returned for each sample, outliers whose Boolean value is False are deleted, completing a first screening of outliers. 2) For the problems caused by the class-2 case, where non-obvious noise on the position coordinates is not deleted, and for frames where the ball is predicted invisible by the network under both view angles or whose three-dimensional coordinates were deleted by the Isolation-Forest-based outlier removal, projection-based two-dimensional curve interpolation is adopted for the image frames whose ball position information was deleted, obtaining a better fitting effect. First, the movement trajectory of the shuttlecock is projected onto a vertical plane along the center line of the court; the ball path curve projected on this plane is a parabola. Then, the parabolic motion trajectory of the ball path projected onto the court surface is approximately a straight line. Next, for the badminton position sequence projected on the vertical plane, a curve fitting function of the shuttlecock's projection on that plane is obtained by least squares two-dimensional curve fitting; for the position sequence projected on the ground, a fitting function of the projection on that plane is obtained by linear interpolation. Finally, the optimized ball path, i.e., the final ball position sequence, is obtained from the coordinate interpolation results of the vertical plane and the ground. 3) For the class-3 case: since, given the complexity of field backgrounds, objects of this kind rarely appear continuously in the image video during shuttlecock training and matches, the embodiments do not discuss this class.
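The class-1 and class-2 repairs can be sketched as follows, assuming scikit-learn's IsolationForest for outlier screening and NumPy polynomial fits for the projected-plane curves; the axis convention (x, y on the court plane, z vertical) and the function name are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def smooth_trajectory(frames, xyz):
    """Stage-3 sketch: drop Isolation Forest outliers, then refit the rally as
    a parabola in the vertical plane and a straight line on the court plane.

    frames: (m,) frame indices; xyz: (m, 3) raw fused 3D positions.
    """
    samples = np.column_stack([xyz, frames])          # (x_i, y_i, z_i, frame_i)
    keep = IsolationForest(random_state=0).fit_predict(samples) == 1
    f, (x, y, z) = frames[keep], xyz[keep].T          # inliers only
    cx = np.polyfit(f, x, 1)                          # ground track ~ line
    cy = np.polyfit(f, y, 1)
    cz = np.polyfit(f, z, 2)                          # height ~ parabola
    f_all = np.arange(frames.min(), frames.max() + 1) # refill deleted frames
    return np.column_stack([np.polyval(cx, f_all),
                            np.polyval(cy, f_all),
                            np.polyval(cz, f_all)])
```

IsolationForest returns 1 for inliers and -1 for outliers, so the mask above performs the "Boolean False deleted" screening; the polynomial evaluation over all frame indices simultaneously interpolates the frames whose positions were deleted.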
Step 3, testing the shuttlecock ball path acquisition model based on the binocular view angle badminton training data set and the monocular view angle badminton match data set.
In this step, the binocular view angle badminton training data set and the monocular view angle badminton match data set are used. The training set adopts the image videos of the entire training set portion of the badminton match data set (46038 frames). The test set is divided into two parts: the first part adopts all image videos of the test set portion of the monocular view angle badminton match data set (9525 frames); the second part adopts 588 images from the binocular view angle training videos — athlete training video clips shot by the binocular camera, with the 294 images of each of the two view angles corresponding one-to-one by timestamp. The image content consists of video segments of players completing serves, serve receptions and strokes, each segment 20 to 60 frames long. The image resolution of the badminton match data set is 1280×720; following the input parameters of the network, each image is scaled down proportionally to 1/4 of the original size as input. The image resolution of the binocular view angle badminton training data set is 1224×1024; these images are scaled to 0.28 times the original and rounded, i.e., 342×286, and then padded (padding) as follows: borders of 85 pixels are filled on both sides of the image, and borders of 1 pixel are filled above and below. For the badminton target tracking convolutional neural network, training uses an Adadelta optimizer with an initial learning rate of 1.0 to optimize the network parameters; the loss function adopts weighted binary cross entropy (Weighted Binary Cross-Entropy, WBCE), with the probability that each pixel is the ball center represented as 0 or 1; the number of epochs is set to 30 and the tolerance error value to 4. The heat map output by the trained network takes values between 0 and 1; 0.5 is chosen as the threshold to convert each value to 0 or 1, and the ball position is taken as the center of the largest region in the resulting 0-1 heat map. If the Euclidean distance (also called the L2 distance) between the predicted ball coordinates and the ground-truth label is less than the given threshold — the tolerance error parameter set in the experiment, i.e., 4 pixels — the prediction is counted as a true positive.
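For reference, a small sketch of the binocular-frame preprocessing just described, assuming OpenCV and black constant borders for the padding:

```python
import cv2

def preprocess_binocular_frame(img):
    """Resize a 1224x1024 binocular-view frame to 0.28x (342x286), then pad
    85 px on the left/right and 1 px on the top/bottom."""
    small = cv2.resize(img, (342, 286))               # dsize is (width, height)
    return cv2.copyMakeBorder(small, 1, 1, 85, 85,
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))

# 342 + 2*85 = 512 and 286 + 2*1 = 288, matching the 512x288 decoder output
# size of the tracking network described in Step 2.
```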
As for the evaluation method, this embodiment evaluates the target detection and tracking accuracy of the network model using the prediction accuracy (Prediction Accuracy) evaluation methodology commonly used in the field of single-target detection and tracking. Positive samples (Positives) and negative samples (Negatives) are evaluated with the following indices: accuracy (Acc), precision (Prec), recall (Rec), false positive rate (FPR) and F1 score (F1). The evaluation indices are calculated as follows:
1) Accuracy (Acc): the ratio of the sum of correctly predicted positive samples and correctly predicted negative samples to the total number of samples, calculated as Acc = (TP+TN)/(TP+TN+FP+FN).
2) Precision (Prec): among the samples predicted to be positive, the proportion of correctly detected positives, calculated as Prec = TP/(TP+FP).
3) Recall (Rec): the proportion of correctly predicted positives among all true positive samples, calculated as Rec = TP/(TP+FN).
4) False positive rate (FPR): among all samples whose true label is negative, the proportion wrongly judged to be positive; the smaller the value, the better. Calculated as FPR = FP/(FP+TN).
5) F1 score (F1): the F1 score treats recall and precision as equally important, calculated as F1 = 2TP/(2TP+FP+FN).
Here TP is the abbreviation of True Positives, the number of samples correctly identified as positive among samples whose true label is positive. TN is the abbreviation of True Negatives, the number of samples correctly identified as negative among samples whose true label is negative. FP is the abbreviation of False Positives, i.e., the number of negative samples mispredicted or identified as positive. FN is the abbreviation of False Negatives, i.e., the number of positive samples mispredicted or identified as negative.
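For reference, all five indices follow directly from the four counts; a minimal sketch (no zero-division guarding, for brevity):

```python
def detection_metrics(tp, tn, fp, fn):
    """Single-target tracking metrics as defined above."""
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Prec": tp / (tp + fp),
        "Rec": tp / (tp + fn),
        "FPR": fp / (fp + tn),              # negatives wrongly judged positive
        "F1": 2 * tp / (2 * tp + fp + fn),
    }

print(detection_metrics(tp=90, tn=80, fp=10, fn=20))  # illustrative counts
```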
Step 4, performing a three-dimensional ball path position calculation test on the binocular view angle badminton training data set, and comparing the results before and after ball path improvement.
In this step, as shown in FIG. 4, the three-dimensional ball path obtained through position information fusion is analyzed. When tracking the badminton target under binocular view angles, the situation inevitably occurs that the shuttlecock is occluded and predicted invisible under one view angle while still being correctly detected under the other; for this situation, considering the complexity of ball movement in a badminton match, position information fusion is not performed — that is, binocular position information fusion is carried out only for frames screened as correct detections of positive samples (TP) or incorrect predictions of negative samples (FP) under both view angles. The specific steps are: 1) the player's serving position is set as position No. 1 and the player's striking position as position No. 2, and the ball path is divided into a serve trajectory and a stroke trajectory; 2) the serve trajectory and the stroke trajectory are each restored through the epipolar constraint principle, and coordinate points with spatial position errors on the ball trajectory are repaired, so that both ball paths display overall parabolic curves; 3) the serve trajectory and stroke trajectory before and after optimization are compared visually.
Step 5, estimating camera parameters based on the court feature point annotations and the monocular view angle badminton match data set to obtain a camera parameter estimation result.
In this step, as shown in FIG. 2, FIG. 5 and FIG. 6, camera parameter estimation is achieved by manually annotating feature points on the marking lines of the badminton court, exploiting the characteristic that the camera's shooting view angle is relatively fixed. The specific steps are: 1) Manually annotate the feature points. First, for a given RGB image, the feature points on all marking lines of the badminton court are annotated manually; to ensure the accuracy and robustness of the estimation result, 16 points are selected from the ground plane and the net of the badminton court as annotated feature points based on the characteristics of the court. Then, 16 pairs of feature points (p_i, P_i^W), i = 1, 2, 3, ..., 16, are obtained, where p_i denotes the pixel coordinates (u_i, v_i, 1) of the i-th feature point and P_i^W denotes the three-dimensional coordinates of the i-th pair of feature points in the world coordinate system. 2) According to the pixel coordinates of the annotated points and the method based on Chen et al., the camera focal length is estimated from the camera center C and any 2 pairs of feature points (p_i, P_i^W) and (p_j, P_j^W); traversing the annotated feature points pairwise yields 128 candidate camera focal lengths f_k (k = 1, 2, 3, ..., 128). 3) Determine the camera intrinsic and extrinsic matrices: the classical solvePnP algorithm is called with the camera focal length f_k and the 16 pairs of feature points as input, obtaining the corresponding extrinsic rotation matrix R_k; for each pair (f_k, R_k), the projected coordinates P′_i of P_i^W in the image coordinate system are calculated in turn, and the projection error Δp_i against the manually annotated p_i is computed; the (f, R) corresponding to the minimum min(Δp_1, Δp_2, Δp_3, ..., Δp_128) is selected, and f is the camera parameter estimation result.
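Step 3) can be sketched with OpenCV's solvePnP, under the illustrative assumption that the principal point is taken at the image center (the embodiment derives the intrinsics from the estimated focal length):

```python
import numpy as np
import cv2

def estimate_camera(pts_2d, pts_3d, img_w, img_h, focal_candidates):
    """Pick the candidate focal length whose solvePnP pose gives the smallest
    mean reprojection error over the 16 annotated court feature points.

    pts_2d: (16, 2) pixel coords; pts_3d: (16, 3) world coords;
    focal_candidates: the 128 focal lengths from the pairwise search.
    """
    obj = pts_3d.astype(np.float64)
    img = pts_2d.astype(np.float64)
    best = (None, None, np.inf)
    for f in focal_candidates:
        K = np.array([[f, 0, img_w / 2.0],      # assumed centered principal point
                      [0, f, img_h / 2.0],
                      [0, 0, 1.0]])
        ok, rvec, tvec = cv2.solvePnP(obj, img, K, None)
        if not ok:
            continue
        proj, _ = cv2.projectPoints(obj, rvec, tvec, K, None)
        err = np.linalg.norm(proj.reshape(-1, 2) - img, axis=1).mean()
        if err < best[2]:
            best = (f, (rvec, tvec), err)
    return best  # (focal length f, extrinsics, mean projection error)
```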
Step 6, constructing a badminton player tracking model and tracking badminton players through human bounding box detection and badminton court boundary marking.
In this step, a player detection and tracking scheme based on badminton court boundary constraints is designed, realized and encapsulated as a player detection and tracking module (Player Detection Module), which performs bounding box detection and tracking of badminton players in monocular or binocular images. The module consists of a human detection network (Human Detection Net) and a player bounding box tracking post-processing sub-module (Player Tracker).
The human detection network classifies the labels of the target categories in the RGB image and detects bounding boxes, keeps the bounding boxes labeled as human bodies, and completes human bounding box detection under the 2 view angles. The specific steps are as follows. First, a target detection algorithm is selected as the backbone of the human detection network: the YOLOv5l model divides the original RGB image into S×S grid cells and, through a convolutional network, outputs a feature map of size S×S, so that any point on the feature map corresponds to one grid cell of the original image; each cell is called an anchor point. Then, according to the ground-truth labels, the cell containing the center point of each annotated object is determined, and the anchor point of that cell is responsible for detecting the bounding box. The detection flow is: 1) calculate the IoU (Intersection over Union) between all ground-truth labels and each anchor point in the RGB image, obtaining for each ground-truth label the id of the best-matching anchor point; 2) traverse each ground-truth label to find the scale corresponding to its anchor point; 3) finally output an N-dimensional target detection bounding box, where N denotes the object categories of the ground-truth labels. The bounding box data comprise (x, y, w, h, p, c), where x, y denote the center coordinates of the bounding box, w, h its width and height, p the confidence, and c the classification information. Finally, the target detection bounding box with the highest confidence p in the bounding box data participates in calculating the loss function; through the design of the loss function, YOLOv5l converts the target detection task into a regression task.
The player bounding box tracking post-processing module is based on the boundary-marking constraints of the badminton court and achieves fast player instance segmentation through a collision-detection post-processing operation. The specific steps are as follows: 1) For any bounding box bbox_i = (x, y, w, h, p, c) in an RGB image, the 4 vertex coordinates of the box can be computed; in particular, the two bottom vertices (x - w/2, y + h/2) and (x + w/2, y + h/2) are taken as the player's foot points. 2) Based on the boundary feature points defined in step 5, the boundary feature points (P0, P1, P2, P3, P4, P5) of the singles court are selected in the world coordinate system, where half-court A is the region enclosed by the marking lines connecting (P0, P1, P2, P3) in order, and half-court B is the region enclosed by the marking lines connecting (P0, P3, P4, P5) in order. 3) The foot points of all detected bounding boxes and the feature-point coordinates of half-courts A/B are input in turn; if at least 1 of the 2 foot points of a bounding box lies inside a half-court, the bounding box belongs to the player in that half-court, all bounding box data of that player are bound to the corresponding half-court number, and the human bounding box id is bound to the player id.
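The half-court membership test in step 3) is a standard point-in-polygon check. The sketch below is a minimal Python version, assuming the foot points and the half-court polygons built from P0...P5 have already been expressed in a common coordinate system, for example by projecting the court corners into the image with the estimated camera parameters:

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: is 2D point pt inside the polygon given as an
    ordered list of (x, y) vertices?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_half_court(foot_pts, court_a, court_b):
    """Return 'A', 'B', or None depending on where a foot point lies."""
    for pt in foot_pts:
        if point_in_polygon(pt, court_a):
            return 'A'
        if point_in_polygon(pt, court_b):
            return 'B'
    return None
```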
Step 7, constructing a technical action acquisition model suitable for the binocular viewing angle badminton training data set.
In this step, as shown in figs. 7 and 8, the two-dimensional human body joint points of the athlete are obtained through a two-dimensional human joint point estimation network (2D Estimate Net) at the binocular viewing angles; position information fusion is then performed according to the intrinsic and extrinsic camera matrices to obtain the three-dimensional human body pose of the athlete in the world coordinate system, and the athlete's technical actions are formed over consecutive images. The specific steps are as follows. First, the bounding boxes output by the player detection and tracking module are cropped. Then, a high-resolution convolutional neural network is selected as the backbone for two-dimensional human pose estimation, and two-dimensional joint point estimation is carried out within each tracked human bounding box: 1) the network is designed with a parallel structure of multiple branch networks, each branch using feature maps of a different size; 2) feature information from the branch networks is collected through fusion layers, effectively integrating feature-map information at different sizes; 3) new, smaller feature maps are generated from the existing feature maps through transition layers; 4) the high-resolution convolutional neural network outputs predicted heat maps of the two-dimensional human joint positions. Finally, the two-dimensional pose estimation results are mapped to three dimensions: the pixel with the highest score in each joint's heat map is taken as the predicted image coordinate of that joint and input into the position information fusion module mentioned in the second stage of step 2; each pair of two-dimensional pixel coordinates under the binocular viewing angles is mapped to three dimensions, all human joints under the binocular viewing angles are traversed, the athlete's pose in the three-dimensional world coordinate system is obtained, and the athlete's technical action information is formed over consecutive images.
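The text does not spell out the position information fusion numerically. A common way to lift matched binocular joint detections to three dimensions is linear triangulation, sketched here with OpenCV under the assumption that each camera's intrinsics K and extrinsics (R, t) are those estimated earlier:

```python
import cv2
import numpy as np

def triangulate_joints(K1, R1, t1, K2, R2, t2, pts_cam1, pts_cam2):
    """Lift matched 2D joint detections from two calibrated views to 3D.
    pts_cam1/pts_cam2: (J, 2) pixel coordinates of the J joints per view."""
    P1 = K1 @ np.hstack([R1, t1.reshape(3, 1)])   # 3x4 projection, view 1
    P2 = K2 @ np.hstack([R2, t2.reshape(3, 1)])   # 3x4 projection, view 2
    pts_h = cv2.triangulatePoints(P1, P2,
                                  pts_cam1.T.astype(np.float64),
                                  pts_cam2.T.astype(np.float64))  # 4xJ homogeneous
    return (pts_h[:3] / pts_h[3]).T               # (J, 3) world coordinates
```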
Step 8, constructing a technical action acquisition model suitable for the monocular viewing angle badminton match data set.
In this step, as shown in fig. 9, the present invention provides an end-to-end learning method (End-to-End Learning). The solution is divided into 4 parts. First, the two-dimensional body pose of each player is estimated (2D Player Estimation): player detection and tracking (Player Detection) and a two-dimensional human pose estimation network (2D Estimate Net) predict a two-dimensional human joint sequence over the image sequence. Then, a temporal three-dimensional human pose estimation network takes as input a sliding window of size 2w+1, formed by the two-dimensional human keypoint sequences of the w frames before and after frame t together with the estimated camera parameters K; a dilated convolutional network structure predicts the three-dimensional human pose and position for frame t. Next, the athlete's pose over the image sequence is optimized (Pose Refinement): a pose optimization method based on energy-optimized adaptive filtering of keypoint trajectories ensures that, when the human pose is occluded in a single frame or some joint estimates are wrong, consistent and smooth pose actions are maintained, making the three-dimensional body smoother over time. Finally, an improved method for estimating the position of the human root node based on a ground constraint is designed and implemented (Position Correction); by combining the relation between the human bounding box and the court, it estimates the athlete's position in the three-dimensional world coordinate system more accurately, under the premise that the athlete's feet are not off the ground. The specific steps are as follows: 1) First, the foot point P_g of the athlete's leg closer to the camera (nearer to the camera baseline) is computed: in the image coordinate system, the line through the knee keypoint P_knee and the ankle keypoint P_ankle is extended to intersect the bottom edge of the bounding box at P_g(u, v). 2) Second, the position P_g(x, y, z) of the foot point in the world coordinate system is computed; since this point touches the ground, it lies in the court plane xOy of the world coordinate system, i.e. z = 0. 3) The absolute coordinates of the foot point P_g are computed from the estimated camera parameter matrices K, R and T. 4) Finally, from the proportional relation between the intersection point P_g and P_knee, P_ankle in the world coordinate system, P_knee and P_ankle are computed, and the corrected root-node position is derived; the human pose estimated from the monocular image is then corrected by a translation transform.
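Steps 2) and 3) amount to intersecting the camera ray through P_g(u, v) with the court plane z = 0. Below is a minimal sketch, assuming the projection model x_cam = R·X_world + T with the K, R, T estimated in step 5:

```python
import numpy as np

def backproject_to_ground(u, v, K, R, T):
    """Intersect the camera ray through pixel (u, v) with the court plane
    z = 0, assuming the projection model x_cam = R @ X_world + T."""
    T = np.asarray(T, dtype=np.float64).reshape(3)
    cam_center = -R.T @ T                                   # camera center, world frame
    ray = R.T @ (np.linalg.inv(K) @ np.array([u, v, 1.0]))  # ray direction, world frame
    s = -cam_center[2] / ray[2]                             # solve (cam_center + s*ray).z = 0
    return cam_center + s * ray                             # 3D foot point with z = 0
```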
Step 9, as shown in fig. 10, evaluating the temporal human body pose estimation network on the monocular viewing angle badminton match data set.
In this step, the performance of the temporal human body pose estimation network for three-dimensional human pose estimation on monocular video is evaluated. The evaluation comprises: evaluation on a public dataset; comparison of the pose-optimization-based improvement against traditional interpolation or filtering methods; and comparison of the absolute positions of the human joint points before and after introducing the ground constraint. First, training is performed using the MuCo-Temp dataset as the training set, and testing is performed on the MuPoTS-3D dataset. Second, for an input image sequence with image size 2048×2048, each detected player bounding box is scaled to a width of 192 pixels and a height of 256 pixels and input into the high-resolution convolutional neural network for two-dimensional joint point estimation; the optimizer is Adam, the learning rate is 0.001, the batch size is 32, and the number of epochs is 140. Then, the human joint coordinates output by the high-resolution convolutional neural network and the camera focal length f are input into the temporal human body pose estimation network; the optimizer is Adam, the learning rate is 0.001, the batch size is 1024, and the number of epochs is 80. The three-dimensional pose estimation results are evaluated with the metrics MPJPE, MRPE, N-MRPE and N-MPJPE. Finally, the result analysis is completed; the specific evaluation contents are: 1) public-dataset evaluation, comparing the performance of the temporal pose network (TPN) on the MuCo-Temp dataset against other methods; 2) comparison of the temporal pose network (TPN) with traditional interpolation or filtering methods, using linear interpolation and the 1-Euro filter for pose estimation; 3) comparison of the methods before and after introducing the ground constraint on the 32 binocular viewing angle badminton training sequences: the three-dimensional pose obtained by the temporal pose network combined with the pose optimization method, and the pose estimated with the added ground constraint, are each compared against the three-dimensional pose obtained by the binocular viewing angle pose estimation method, using the MRPE and MPJPE metrics; 4) visual result evaluation: the three-dimensional pose obtained by epipolar-constraint information fusion under the binocular viewing angles, the three-dimensional pose obtained by the Temporal PoseNet combined with temporal pose optimization, and the root-node position estimate improved by the introduced ground constraint are overlaid on one figure for comparison.
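For reference, MPJPE is the mean Euclidean distance between predicted and ground-truth joints, commonly computed after aligning the root joints (MRPE measures the root position error separately); a minimal NumPy sketch:

```python
import numpy as np

def mpjpe(pred, gt, root_idx=0, align_root=True):
    """Mean Per Joint Position Error over a batch.
    pred, gt: (N, J, 3) arrays of predicted / ground-truth joint positions."""
    if align_root:  # express joints relative to the root before comparing
        pred = pred - pred[:, root_idx:root_idx + 1]
        gt = gt - gt[:, root_idx:root_idx + 1]
    return np.linalg.norm(pred - gt, axis=-1).mean()
```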
Step 10, performing binocular viewing angle video acquisition of the badminton athletes, analyzing them from the angles of athlete physical performance analysis, athlete technical and tactical analysis, and match technique video segmentation, and generating auxiliary training data.
In this step, by studying the indices considered in the on-site match analysis of the Zhejiang badminton team and combining the actual analysis needs of coaches and athletes, 3 types of requirements are summarized: physical performance index analysis, technical and tactical index analysis, and technical action video segments. The designed badminton auxiliary training prototype system provides quantified auxiliary data support for athletes from the angles of physical performance analysis, technical and tactical analysis, and match technique video segmentation, covers all auxiliary training requirements of badminton matches, and provides algorithm support and persistent storage services for match analysis.
The badminton auxiliary training prototype system is divided into a user interface, background function modules, and a database. The user interface provides 5 interfaces for the user: video acquisition, video playback, information management, video analysis, and user management. Internally, the system is divided into 3 main modules and 1 database supporting storage of user data records. The 3 main modules are divided by function type: a data acquisition module, a data analysis module, and a data management module. The user interacts by selecting the corresponding video acquisition, video playback, video analysis, or information management button interface; the system performs pipeline processing through the entry module corresponding to that interface, and the generated data are stored locally or in the database through the data management module. The system is developed in Python with the PyQt framework and presented as desktop software.
The data acquisition module performs binocular viewing angle video shooting of the badminton athletes' training process; the acquisition of the binocular viewing angle badminton training data set is completed by the prototype system. The system realizes synchronous binocular acquisition on the principle of multi-process asynchronous IO. During data acquisition the system runs 3 processes in total: the main process is the graphical interface, which handles user interaction; the sub-processes are a camera worker sub-process and a storage sub-process. The camera worker sub-process performs synchronous camera acquisition; the storage sub-process creates storage threads and asynchronously saves the acquired image and video data during acquisition. Between the main process and the camera worker sub-process, 2 queues (a command queue and a display queue) are created using a shared-memory inter-process communication mechanism: the command queue passes user operations to the camera worker sub-process, which receives and processes the commands one by one to change camera operation; the display queue returns the image data synchronously acquired in real time by the cameras to the main process's display interface, showing the currently acquired image to the user. Between the camera worker sub-process and the storage sub-process, 1 storage queue is created with the same mechanism; the camera worker sub-process pushes the acquired data one-way to the storage sub-process, which dynamically creates storage threads according to the length of the storage queue and writes the images in the storage queue to a specified path on the local hard disk.
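A minimal sketch of this three-process layout follows, approximating the patent's shared-memory queues with multiprocessing.Queue; grab_stereo_pair is a hypothetical stand-in for the synchronized two-camera capture call, and np.save is a placeholder for image encoding:

```python
import multiprocessing as mp
import os
import time

import numpy as np

def grab_stereo_pair():
    """Hypothetical stand-in for a synchronized two-camera capture call."""
    return [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(2)]

def camera_worker(cmd_q, disp_q, save_q):
    """Camera worker sub-process: apply commands, forward live frames."""
    running = False
    while True:
        while not cmd_q.empty():
            cmd = cmd_q.get()
            if cmd == 'quit':
                return
            running = (cmd == 'start')
        if running:
            frames = grab_stereo_pair()
            disp_q.put(frames)        # live preview for the GUI main process
            save_q.put(frames)        # hand off for asynchronous saving
        else:
            time.sleep(0.01)

def storage_worker(save_q, out_dir):
    """Storage sub-process: drain the storage queue and write frames to disk."""
    os.makedirs(out_dir, exist_ok=True)
    idx = 0
    while True:
        frames = save_q.get()
        for cam_id, img in enumerate(frames):
            np.save(f"{out_dir}/cam{cam_id}_{idx:06d}.npy", img)
        idx += 1

if __name__ == '__main__':
    cmd_q, disp_q, save_q = mp.Queue(), mp.Queue(), mp.Queue()
    mp.Process(target=camera_worker, args=(cmd_q, disp_q, save_q), daemon=True).start()
    mp.Process(target=storage_worker, args=(save_q, './capture'), daemon=True).start()
    cmd_q.put('start')                # the main (GUI) process issues commands
    time.sleep(1.0)
    cmd_q.put('quit')
```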
The data analysis module is an integrated functional module for acquiring badminton ball-path technical segments and athlete technical actions. The module can simultaneously perform ball path analysis and technical action acquisition on the input video, and applies cleaning and position information fusion to the outputs to obtain results organized by training-data index. The specific steps are as follows: 1) The system derives all physical performance index data from the athletes' technical actions, including running distance, running route, number of take-offs, and take-off height. 2) Technical and tactical index analysis: from the generated three-dimensional badminton ball path, the system obtains the ball speed and the serve landing-point area among the technical and tactical indices, and further derives the active-stroke, passive-stroke, and transitional-stroke areas. 3) Based on the technical and tactical analysis, the system recognizes the various swing actions of the badminton players from the three-dimensional human poses and the badminton ball path analysis, comprehensively judges serve and serve-receive rally events, segments the video according to the judged match events, and outputs the corresponding technical video segments.
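Two of these indices reduce to finite differences over the tracked three-dimensional sequences. A minimal sketch, where the capture frame rate fps is an assumption:

```python
import numpy as np

def running_distance(root_positions):
    """Total distance covered by a player, from the (N, 3) sequence of
    per-frame root-joint positions in world coordinates (meters)."""
    steps = np.diff(root_positions, axis=0)
    return np.linalg.norm(steps, axis=1).sum()

def ball_speed(trajectory, fps=60.0):
    """Per-frame ball speed (m/s) from an (N, 3) 3D shuttle trajectory,
    assuming a known, constant capture frame rate."""
    steps = np.diff(trajectory, axis=0)
    return np.linalg.norm(steps, axis=1) * fps
```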
The data management module stores user account information, athlete physical-quality information, and match video information; match video information is divided into the original acquired videos and the auxiliary training analysis data. The system's database is developed on MySQL 8.0.21, with the corresponding database tables built. The user can add, delete, and query athlete information, and the system records the corresponding operations in the database.
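The patent does not publish its table schema; the following sketch shows one hypothetical athlete table and an insert, using the mysql-connector-python package, with all connection parameters and column names being placeholders:

```python
import mysql.connector  # assumes the mysql-connector-python package

DDL = """
CREATE TABLE IF NOT EXISTS athlete (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(64) NOT NULL,
    height_cm FLOAT,
    weight_kg FLOAT
)
"""

conn = mysql.connector.connect(host='localhost', user='badminton',
                               password='***', database='training')
cur = conn.cursor()
cur.execute(DDL)
cur.execute("INSERT INTO athlete (name, height_cm, weight_kg) VALUES (%s, %s, %s)",
            ("Example Player", 178.0, 70.0))
conn.commit()
cur.close()
conn.close()
```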
As shown in fig. 11, the embodiment of the invention further provides an intelligent training aid system for badminton, which comprises:
the user interface module comprises an acquisition interface, a video playback interface, an information management interface, a video analysis interface and a user management interface;
the data acquisition module comprises a camera configuration unit, a field acquisition unit, a video importing unit and a video transmission unit;
the data management module comprises a user account management unit, an athlete management unit and a video management unit;
the data analysis module comprises an action acquisition unit, a ball path detection unit and a data processing unit;
and the database module is used for storing user data, athlete data, game record data, original video data and video analysis data.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, their execution is not strictly limited to that order, and the steps may be executed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed in sequence, but may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.

The foregoing examples illustrate only a few embodiments of the invention and are described in some detail, but they are not thereby to be construed as limiting the scope of the invention. It should be noted that several variations and modifications may be made by those skilled in the art without departing from the concept of the invention, all of which fall within the protection scope of the invention. Accordingly, the protection scope of the invention shall be determined by the appended claims.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (7)

1. An intelligent auxiliary training method for badminton, which is characterized by comprising the following steps:
Constructing a binocular vision angle badminton training data set and a monocular vision angle badminton match data set;
constructing a badminton route acquisition model based on binocular viewing angles;
testing the binocular viewing angle based badminton route acquisition model on the binocular vision angle badminton training data set and the monocular visual angle badminton match data set;
performing a three-dimensional ball path position calculation test on the binocular vision angle badminton training data set, and comparing the results before and after the ball path improvement;
estimating camera parameters based on the site feature point labels and the monocular visual angle badminton match data set to obtain a camera parameter estimation result;
constructing a badminton player tracking model, and tracking the badminton player through human bounding box detection and badminton court boundary marking;
constructing a technical action acquisition model suitable for the binocular vision angle badminton training data set;
constructing a technical action acquisition model applicable to a monocular visual angle badminton match data set;
evaluating the temporal human body pose estimation network on the monocular visual angle badminton match data set;
performing binocular vision angle video acquisition of badminton athletes, and generating auxiliary training data from the angles of athlete physical performance analysis, athlete technical and tactical analysis, and match technique video segmentation.
2. The intelligent auxiliary training method for badminton according to claim 1, wherein the constructing of a badminton route acquisition model based on binocular viewing angles comprises:
tracking a two-dimensional sphere target;
fusing binocular visual angle position information;
and performing smoothing improvement on the three-dimensional ball path, obtaining a smooth ball path through detection of obvious noise and estimation by curve fitting techniques, respectively.
3. The intelligent auxiliary training method for badminton according to claim 1, wherein the badminton route acquisition model comprises a badminton target tracking convolutional neural network; training uses an Adadelta optimizer with an initial learning rate of 1.0 to optimize the network parameters; the loss function represents, with 0 or 1, the probability that each pixel point is the center of the ball; the number of epochs is set to 30 and the tolerance error value is 4; the heat map output by network training contains values between 0 and 1, and 0.5 is selected as the threshold to convert each value to 0 or 1; the ball position is taken as the center of the largest region in the 0-1 heat map; and if the Euclidean distance between the predicted ball coordinates and the ground-truth label is smaller than the given threshold, the prediction is regarded as a true positive.
4. The intelligent auxiliary training method for badminton according to claim 1, wherein the performing of a three-dimensional ball path position calculation test on the binocular vision angle badminton training data set and the comparing of the results before and after the ball path improvement comprise:
setting the player's serve position as position No. 1 and the player's hitting position as position No. 2, and dividing the ball path according to the player position into a serve trajectory and a hitting trajectory; restoring the serve trajectory and the hitting trajectory respectively through the epipolar constraint principle, and restoring coordinate points with spatial position errors on the ball trajectory, so that each of the two ball paths as a whole appears as a parabolic curve; and visually comparing the serve trajectory and the hitting trajectory before and after the optimization.
5. The intelligent auxiliary training method for badminton according to claim 1, wherein the step of estimating the camera parameters based on the site feature point labels and the monocular visual angle badminton match data set to obtain the camera parameter estimation result specifically comprises:
giving an RGB image, and labeling characteristic points in each marking line of the badminton court;
estimating a camera focal length through the camera center and any two pairs of feature points, and traversing the labeled feature points pairwise to obtain candidate camera focal lengths;
determining the camera intrinsic and extrinsic parameter matrices: taking each candidate camera focal length and the feature points as inputs, computing the extrinsic rotation matrix through the solvePnP algorithm, projecting the three-dimensional coordinates of the feature points in the world coordinate system onto the image coordinate system, computing the projection error, and selecting the camera focal length and extrinsic rotation matrix corresponding to the minimum projection error as the camera parameter estimation result.
6. The intelligent auxiliary training method for badminton according to claim 1, wherein the step of constructing a technical action acquisition model suitable for the binocular vision angle badminton training data set comprises: obtaining the two-dimensional human body joint points of the athletes through a two-dimensional human body joint point estimation network at the binocular viewing angles, performing position information fusion according to the intrinsic and extrinsic parameter matrices of the cameras, obtaining the three-dimensional human body poses of the athletes in the world coordinate system, and forming the athletes' technical actions over consecutive images.
7. An intelligent training aid system for badminton, the system comprising:
the user interface module comprises an acquisition interface, a video playback interface, an information management interface, a video analysis interface and a user management interface;
The data acquisition module comprises a camera configuration unit, a field acquisition unit, a video importing unit and a video transmission unit;
the data management module comprises a user account management unit, an athlete management unit and a video management unit;
the data analysis module comprises an action acquisition unit, a ball path detection unit and a data processing unit;
and the database module is used for storing user data, athlete data, game record data, original video data and video analysis data.
CN202310922353.6A 2023-07-26 2023-07-26 Intelligent auxiliary training method and system for badminton Pending CN116958872A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310922353.6A 2023-07-26 2023-07-26 Intelligent auxiliary training method and system for badminton

Publications (1)

Publication Number Publication Date
CN116958872A 2023-10-27

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649597A (en) * 2024-01-29 2024-03-05 吉林大学 Underwater three-dimensional hand gesture estimation method and system based on event camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination