CN113255489A - Multi-mode diving event intelligent evaluation method based on label distribution learning - Google Patents


Info

Publication number
CN113255489A
Authority
CN
China
Prior art keywords
diving
score
pred
global
rgb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110524112.7A
Other languages
Chinese (zh)
Other versions
CN113255489B (en)
Inventor
陈嘉顺
陈杰
田龙岗
宋毅恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202110524112.7A
Publication of CN113255489A
Application granted
Publication of CN113255489B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193 Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a novel multi-modal action quality assessment method based on label distribution learning, which models motion videos in both space and time and applies this modeling to the construction of an intelligent assessment system for diving events. Building on a two-stream Inflated 3D ConvNet (I3D), an optical-flow path is added; the spatio-temporal features of the diving video in the RGB and Flow modalities are extracted and fused into a global feature, which better captures the coherence of the whole action. The global feature is fed into a fully connected layer, and Label Distribution Learning (LDL) is used to output the probability distribution over the mean of all judges' scores; the score with the maximum probability is selected and multiplied by the difficulty coefficient to obtain the final diving score. The invention makes full use of the difficulty-coefficient concept in the diving rules, improves the output strategy of the fully connected layer, and solves for the diving score in a novel way.

Description

Multi-mode diving event intelligent evaluation method based on label distribution learning
Technical Field
The invention relates to an intelligent assessment system for diving events, in particular to a multi-modal intelligent assessment method for diving events based on label distribution learning, and belongs to the fields of deep learning and computer vision.
Background
Action Quality Assessment (AQA) technology can be widely applied in real-world scenarios such as intelligent sports judging, surgical skill scoring, and motor therapy guidance. In diving events in particular, an accurate intelligent evaluation and scoring system can greatly enhance the objectivity and validity of competition scores and thereby promote the fairness of competition; there have been reports questioning referees' judgments. The demand for an intelligent evaluation system for diving events is therefore growing by the day, and the invention aims to generate accurate and objective results from live video, so that athletes who have trained diligently in their discipline for many years are not treated unfairly.
Some researchers have already studied intelligent evaluation systems for sports events and achieved some success, particularly in the application to diving. For example, the three-dimensional Convolutional neural network (C3D) performs well in action-recognition tasks because it captures the spatio-temporal features of appearance and salient motion. Paritosh Parmar et al. treated diving scoring as a multi-task regression problem and collected the largest multi-task AQA dataset to date, comprising 1412 diving samples with multi-task labels covering diving scores, action recognition, and commentary generation. Wang et al. used Spatial Convolutional Networks (SCNs) and Temporal Convolutional Networks (TCNs) trained with a two-stage strategy; in particular, an attention mechanism is introduced in the temporal convolution to fuse features according to their impact on overall performance along the time dimension. These algorithms are effective, but they neither make full use of the objective parameter provided by the dataset, namely the difficulty coefficient, nor pay sufficient attention to the fluency of the overall action and the details of the final splash.
In a diving competition, referees rate scores based on the athlete's approach (on the springboard or platform), take-off, flight, and water entry. The referee panel comes in two configurations: 5 judges or 7 judges. In the first, after all 5 judges have scored, the highest and lowest scores are discarded as invalid, and the sum of the remaining 3 scores is multiplied by the difficulty coefficient of the athlete's dive to obtain the true score of the action. In the second, after all 7 judges have scored, the highest and lowest scores are likewise discarded, and the sum of the remaining 5 scores is multiplied by the difficulty coefficient, then divided by 5 and multiplied by 3. These rules show that judges differ in their scoring preferences, so the given scores vary, but they remain consistent with the level the athlete displays during the dive. This characteristic can naturally be modeled as a normal probability distribution centered on the mean of all judges' scores, and that mean can be estimated well by means of label distribution learning.
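As a concrete illustration, the two panel rules above can be sketched in Python (a hypothetical helper written for this description, not part of the patent):

```python
def diving_score(judge_scores, difficulty):
    """Official score under the panel rules described above.

    Drop the single highest and lowest marks, sum the rest, and multiply
    by the difficulty coefficient (DD). A 7-judge panel's 5-mark sum is
    rescaled back to the 3-judge scale by dividing by 5 and multiplying by 3.
    """
    if len(judge_scores) not in (5, 7):
        raise ValueError("diving panels have 5 or 7 judges")
    kept = sorted(judge_scores)[1:-1]  # discard highest and lowest marks
    total = sum(kept) * difficulty
    if len(judge_scores) == 7:
        total = total / 5 * 3
    return total

# 5-judge panel: keep 8.0, 8.0, 8.5 -> sum 24.5, times DD 3.0 -> 73.5
print(diving_score([8.0, 8.5, 8.0, 7.5, 9.0], 3.0))
```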
In addition, observation of the diving rules shows that the difficulty coefficient is an objective value indicating how hard an action is for the athlete to perform. The international diving competition rules assign a corresponding difficulty coefficient to each diving action, determined by the action group, the event (springboard or platform), the apparatus height, the body posture, the number of somersault rotations, and so on. Inspired by this, the final diving score can be predicted simply by predicting the mean of all judges' scores and multiplying it by the action's difficulty coefficient.
Disclosure of Invention
In order to solve the above problems, the present invention provides a multi-modal intelligent assessment method for diving events based on label distribution learning, which offers a high-quality, high-efficiency action quality assessment technique and produces a reasonable diving score, thereby addressing the problems described in the background.
The technical scheme is as follows: to achieve the above aim, the invention provides a multi-modal intelligent assessment method for diving events based on label distribution learning, which models motion videos in space and time in a novel way and applies this modeling to the construction of an intelligent assessment system for diving events. The global feature is input into a fully connected layer, label distribution learning is used to output the probability distribution of the mean of all judges' scores, the score with the maximum probability is selected, and it is multiplied by the difficulty coefficient to obtain the final diving score.
The global feature extraction method is based on a two-stream Inflated 3D ConvNet (I3D): an optical-flow path is added, the spatio-temporal features of the diving video in the RGB and Flow modalities are extracted, and they are fused to form a global feature f_global.
Given an L-frame RGB-modality diving video V_RGB = {F_i^RGB}, i = 1, …, L, and considering that optical flow is the instantaneous velocity of pixel motion of a spatially moving object on the imaging plane and can effectively represent the motion information of the object between adjacent frames, the video is first converted into the optical-flow modality V_Flow = {F_i^Flow}, i = 1, …, L, using the classic TV-L1 optical flow algorithm.
A sliding-window strategy is then used to divide V_RGB and V_Flow into N mutually overlapping segments. The V_RGB and V_Flow segments are input into the I3D model separately, and each then passes through an MLP block consisting of 3 fully connected layers, outputting a pair of RGB and Flow segment features {f_i^RGB}, i = 1, …, N, and {f_i^Flow}, i = 1, …, N. Note that the weights of these 3 fully connected layers are shared across all segments. The extracted segment features are then fused by temporal average pooling into f_RGB and f_Flow respectively, which are further fused by averaging into the global feature f_global.
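A minimal NumPy sketch of this fusion pipeline follows. The I3D backbone is abstracted away as precomputed 1024-dimensional per-segment features, and the MLP output width of 64 is an assumption: the patent only fixes the 256- and 128-unit hidden layers (see Example 2), not the final layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_block(x, weights):
    """Shared 3-FC-layer block applied to every segment feature.
    Layer sizes 1024 -> 256 -> 128 -> 64 (the 64 is assumed)."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU on the two hidden layers
    return x

def init_weights(dims=(1024, 256, 128, 64)):
    return [(rng.standard_normal((d_in, d_out)) * 0.01, np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def extract_global_feature(rgb_segments, flow_segments, w_rgb, w_flow):
    """rgb_segments, flow_segments: (N, 1024) per-segment I3D features
    for the N overlapping sliding-window segments of one video."""
    f_rgb = mlp_block(rgb_segments, w_rgb).mean(axis=0)    # temporal avg pooling
    f_flow = mlp_block(flow_segments, w_flow).mean(axis=0)
    return 0.5 * (f_rgb + f_flow)                          # average fusion -> f_global

# f_global for a 10-segment video (10 segments matches Example 2)
w_rgb, w_flow = init_weights(), init_weights()
f_global = extract_global_feature(rng.standard_normal((10, 1024)),
                                  rng.standard_normal((10, 1024)),
                                  w_rgb, w_flow)
```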
f_global is fed into the fully connected layer, and the probability distribution of the mean of all judges' scores (the diving score divided by the difficulty coefficient) is predicted based on LDL. First, a score probability distribution is generated from prior knowledge of the judges' scoring to represent the true distribution; network parameters are then iteratively optimized by minimizing the loss between the predicted and true probability distributions to improve prediction accuracy. Finally, the output of the network is multiplied by the difficulty coefficient to predict the diver's score.
In the training stage, given a diving video with ground-truth score S and difficulty coefficient DD, the judges' scoring mean is S_J = S / DD. First, as shown in equation (1), a Gaussian function with mean S_J and standard deviation σ is taken as the prior probability distribution:

g(c) = 1 / (σ√(2π)) · exp(−(c − S_J)² / (2σ²))    (1)
σ is a hyper-parameter measuring the degree of uncertainty in S_J. The score interval is uniformly discretized into a score set c = [c_1, c_2, …, c_m], and the vector g = [g(c_1), g(c_2), …, g(c_m)] represents the degree to which S_J matches each score. The final prior distribution of S_J is p = [p(c_1), p(c_2), …, p(c_m)], obtained by normalizing g; the calculation of each element of p is given in equation (2):

p(c_i) = g(c_i) / Σ_{j=1}^{m} g(c_j)    (2)
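Equations (1) and (2) amount to discretizing a Gaussian over the score set and normalizing it. A short NumPy sketch (function name and the 0.1-point score granularity are assumptions for illustration):

```python
import numpy as np

def gaussian_prior(s_j, sigma, scores):
    """Prior distribution p over the discrete score set c = [c_1, ..., c_m]:
    a Gaussian centered on the judge mean S_J (eq. 1), normalized so the
    entries sum to 1 (eq. 2)."""
    g = np.exp(-(scores - s_j) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
    return g / g.sum()

scores = np.linspace(0.0, 10.0, 101)   # 0.1-point granularity, assumed
p = gaussian_prior(8.0, 1.0, scores)   # prior peaked at the judge mean 8.0
```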
f_global is input into a fully connected layer and a softmax layer and mapped to an m-dimensional probability distribution p_pred = [p_pred(c_1), p_pred(c_2), …, p_pred(c_m)] as the predicted distribution. Finally, the loss function is obtained by computing the KL divergence between p and p_pred, equation (3):

L_KL = Σ_{i=1}^{m} p(c_i) · log( p(c_i) / p_pred(c_i) )    (3)
In p_pred, the score with the maximum probability is selected as the average score of all judges, S_J,pred (equation (4)), and multiplied by the difficulty coefficient DD to obtain the final predicted diving score S_pred, equation (5):

S_J,pred = argmax_{c_i} { p_pred(c_1), p_pred(c_2), …, p_pred(c_m) }    (4)

S_pred = DD × S_J,pred    (5)
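Equations (3) to (5) can be sketched as follows (NumPy; the epsilon guarding log(0) is my addition for numerical safety, not in the patent):

```python
import numpy as np

def kl_loss(p_true, p_pred, eps=1e-12):
    """KL divergence KL(p || p_pred) of eq. (3); eps guards against log(0)."""
    return float(np.sum(p_true * np.log((p_true + eps) / (p_pred + eps))))

def predict_score(p_pred, scores, dd):
    """Eqs. (4)-(5): take the score with maximum predicted probability
    as S_J,pred and multiply by the difficulty coefficient DD."""
    return dd * scores[np.argmax(p_pred)]

scores = np.array([6.0, 7.0, 8.0, 9.0])      # toy discrete score set
p_pred = np.array([0.1, 0.2, 0.5, 0.2])      # toy predicted distribution
final = predict_score(p_pred, scores, dd=2.5)  # argmax score 8.0, times DD
```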
the invention has the following beneficial effects:
(1) The existing I3D model is fully utilized, and an optical-flow path is added on this basis, so that the coherence of the whole action is better captured and the influence of the diving background (such as noisy crowds and railings) and other interfering factors is reduced.
(2) Based on LDL, the difficulty-coefficient concept in the diving rules is fully exploited, the output strategy of the fully connected layer is improved, and the diving score is solved in a novel way.
Drawings
Fig. 1 shows the intelligent diving-quality evaluation pipeline of the present invention.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
Example 1: since the action quality evaluation predicts the quality "score" as a regression problem, SRC (Spearman's rank correlation) used in the conventional literature is used as an evaluation index. The SRC ranges between-1 and 1, with higher SRC meaning that the true and predicted scores are more correlated, i.e. the closer SRC is to 1, the stronger the correlation and the weaker the correlation. In addition, an UNLV-Dive data set is adopted, 370 video materials are contained, and 10m channel water half-finals and finals of a certain sports meeting in 2012 are included. Each video contains two labels, a difficulty coefficient and a diving score. The diving videos are all shot from the same side view, and the change of the view angle is small. This dataset also divides the training and testing set into 300/70.
Example 2: the model was written on the ubuntu 16.04 system using the pytorech deep learning framework, trained iteratively 100 times (training was accelerated by an Nvidia TiTian RTX GPU). An I3D model (including RGB and Flow modalities) trained in advance on Kinetics is used as a feature extractor, which takes a 16-frame motion sequence as an input and outputs 1024-dimensional features. Each UNIV-Dive dataset is padded to 151 frames (video with insufficient number of frames to supplement all zero frames). Each video is divided into 10 segments using a sliding window strategy. Each MLP block contains two hidden layers FC (256, ReLU) and FC (128, ReLU). The network is optimized by adopting an Adam optimizer, and the learning rate is set to be 0.0001. In addition, in experiments, the invention normalizes the final total score to the range of [0,50] to ensure the consistency of the scale, and applies Gaussian distribution to obtain prior probability distribution.
Example 3: the performance of the invention was compared to several other methods of the most advanced and the results evaluated with SRC are shown in table 1. It can be seen that the present invention is superior to the existing methods in the UNLV-Dive dataset. This result demonstrates the unique advantage of the inventive framework of the present invention in terms of saltwater quality assessment.
TABLE 1 Performance of different methods on UNLV-Dive
(table reproduced as an image in the original publication)
The technical means disclosed in the invention scheme are not limited to the technical means disclosed in the above embodiments, but also include the technical scheme formed by any combination of the above technical features.

Claims (3)

1. A multi-modal diving event intelligent evaluation method based on label distribution learning, characterized in that: motion videos are modeled in space and time in a novel way and applied to the construction of an intelligent assessment system for diving events; the global feature is input into a fully connected layer, label distribution learning is used to output the probability distribution of the mean of all judges' scores, the score with the maximum probability is selected, and it is multiplied by the difficulty coefficient to obtain the final diving score.
2. The intelligent multi-modal diving event assessment method based on label distribution learning according to claim 1, characterized in that: in the global feature extraction method, an optical-flow path is added on the basis of the existing two-stream Inflated 3D ConvNet, the spatio-temporal features of the diving video in the RGB and Flow modalities are extracted simultaneously, and they are fused to form a global feature f_global.
3. The intelligent multi-modal diving event assessment method based on label distribution learning according to claim 2, characterized in that: given an L-frame RGB-modality diving video V_RGB = {F_i^RGB}, i = 1, …, L, and considering that optical flow is the instantaneous velocity of pixel motion of a spatially moving object on the imaging plane and can effectively represent the motion information of the object between adjacent frames, the video is first converted into the optical-flow modality V_Flow = {F_i^Flow}, i = 1, …, L, using the classic TV-L1 optical flow algorithm;
a sliding-window strategy is then used to divide V_RGB and V_Flow into N mutually overlapping segments;
the V_RGB and V_Flow segments are input into the I3D model separately, and each then passes through an MLP block consisting of 3 fully connected layers, outputting a pair of RGB and Flow segment features {f_i^RGB}, i = 1, …, N, and {f_i^Flow}, i = 1, …, N;
note that the weights of these 3 fully connected layers are shared across all segments; the extracted segment features are then fused by temporal average pooling into f_RGB and f_Flow respectively, which are further fused by averaging into the global feature f_global;
f_global is fed into the fully connected layer, and the probability distribution of the mean of all judges' scores (the diving score divided by the difficulty coefficient) is predicted based on LDL; first, a score probability distribution is generated from prior knowledge of the judges' scoring to represent the true distribution, and network parameters are iteratively optimized by minimizing the loss between the predicted and true probability distributions to improve prediction accuracy; finally, the output of the network is multiplied by the difficulty coefficient to predict the diver's score;
in the training stage, given a diving video with ground-truth score S and difficulty coefficient DD, the judges' scoring mean is S_J = S / DD; first, as shown in equation (1), a Gaussian function with mean S_J and standard deviation σ is taken as the prior probability distribution;

g(c) = 1 / (σ√(2π)) · exp(−(c − S_J)² / (2σ²))    (1)
σ is a hyper-parameter measuring the degree of uncertainty in S_J; the score interval is uniformly discretized into a score set c = [c_1, c_2, …, c_m], and the vector g = [g(c_1), g(c_2), …, g(c_m)] represents the degree to which S_J matches each score; the final prior distribution of S_J is p = [p(c_1), p(c_2), …, p(c_m)], obtained by normalizing g, with each element of p computed as in equation (2);

p(c_i) = g(c_i) / Σ_{j=1}^{m} g(c_j)    (2)
f_global is input into a fully connected layer and a softmax layer and mapped to an m-dimensional probability distribution p_pred = [p_pred(c_1), p_pred(c_2), …, p_pred(c_m)] as the predicted distribution; finally, the loss function is obtained by computing the KL divergence between p and p_pred, equation (3):

L_KL = Σ_{i=1}^{m} p(c_i) · log( p(c_i) / p_pred(c_i) )    (3)
in p_pred, the score with the maximum probability is selected as the average score of all judges, S_J,pred (equation (4)), and multiplied by the difficulty coefficient DD to obtain the final predicted diving score S_pred, equation (5):

S_J,pred = argmax_{c_i} { p_pred(c_1), p_pred(c_2), …, p_pred(c_m) }    (4)

S_pred = DD × S_J,pred    (5).
CN202110524112.7A 2021-05-13 2021-05-13 Multi-mode diving event intelligent evaluation method based on mark distribution learning Active CN113255489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110524112.7A CN113255489B (en) 2021-05-13 2021-05-13 Multi-mode diving event intelligent evaluation method based on mark distribution learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110524112.7A CN113255489B (en) 2021-05-13 2021-05-13 Multi-mode diving event intelligent evaluation method based on mark distribution learning

Publications (2)

Publication Number Publication Date
CN113255489A true CN113255489A (en) 2021-08-13
CN113255489B CN113255489B (en) 2024-04-16

Family

ID=77181800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110524112.7A Active CN113255489B (en) 2021-05-13 2021-05-13 Multi-mode diving event intelligent evaluation method based on mark distribution learning

Country Status (1)

Country Link
CN (1) CN113255489B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920584A (en) * 2021-10-15 2022-01-11 东南大学 Action quality evaluation method based on time perception feature learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705463A (en) * 2019-09-29 2020-01-17 山东大学 Video human behavior recognition method and system based on multi-mode double-flow 3D network
WO2020088763A1 (en) * 2018-10-31 2020-05-07 Huawei Technologies Co., Ltd. Device and method for recognizing activity in videos
CN111784121A (en) * 2020-06-12 2020-10-16 清华大学 Action quality evaluation method based on uncertainty score distribution learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020088763A1 (en) * 2018-10-31 2020-05-07 Huawei Technologies Co., Ltd. Device and method for recognizing activity in videos
CN110705463A (en) * 2019-09-29 2020-01-17 山东大学 Video human behavior recognition method and system based on multi-mode double-flow 3D network
CN111784121A (en) * 2020-06-12 2020-10-16 清华大学 Action quality evaluation method based on uncertainty score distribution learning

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920584A (en) * 2021-10-15 2022-01-11 东南大学 Action quality evaluation method based on time perception feature learning

Also Published As

Publication number Publication date
CN113255489B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
KR20200068545A (en) System and method for training a convolutional neural network and classifying an action performed by a subject in a video using the trained convolutional neural network
Díaz-Pereira et al. Automatic recognition and scoring of olympic rhythmic gymnastic movements
Chen et al. LSTM with bio inspired algorithm for action recognition in sports videos
CN110490109A (en) A kind of online human body recovery action identification method based on monocular vision
Chen Research on college physical education model based on virtual crowd simulation and digital media
CN114821804A (en) Attention mechanism-based action recognition method for graph convolution neural network
Tang et al. Research on sports dance movement detection based on pose recognition
CN113255489A (en) Multi-mode diving event intelligent evaluation method based on label distribution learning
Pu et al. Orientation and decision-making for soccer based on sports analytics and AI: A systematic review
Sałabun et al. Swimmer Assessment Model (SWAM): expert system supporting sport potential measurement
Nalbant et al. Literature review on the relationship between artificial intelligence technologies with digital sports marketing and sports management
CN113343774B (en) Fine-grained engineering mechanics diving action simulation and evaluation method
Jiang et al. Human-centered artificial intelligence-based ice hockey sports classification system with web 4.0
CN115985462A (en) Rehabilitation and intelligence-developing training system for children cerebral palsy
Zhang Application Analysis of Badminton Intelligence based on Knowledge Graphs
Li et al. Measuring diversity of game scenarios
Zhang et al. A hybrid neural network-based intelligent body posture estimation system in sports scenes
Hu The Application of Artificial Intelligence and Big Data Technology in Basketball Sports Training
Yuan et al. Applications of data mining in intelligent computer-aided athletic training
Tits Expert gesture analysis through motion capture using statistical modeling and machine learning
Casas Ortiz Capturing, modelling, analyzing and providing feedback in martial arts with artificial intelligence to support psychomotor learning activities
Zou et al. Research on athlete training effect evaluation based on machine learning algorithm
CN117766098B (en) Body-building optimization training method and system based on virtual reality technology
Kiciroglu Automated Human Motion Analysis and Synthesis
Du [Retracted] The Selecting Optimal Ball‐Receiving Body Parts Using Pose Sequence Analysis and Sports Biomechanics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant