CN113255489A - Multi-mode diving event intelligent evaluation method based on label distribution learning - Google Patents


Info

Publication number
CN113255489A
Authority
CN
China
Prior art keywords
diving
score
pred
global
rgb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110524112.7A
Other languages
Chinese (zh)
Other versions
CN113255489B (en)
Inventor
陈嘉顺
陈杰
田龙岗
宋毅恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202110524112.7A
Publication of CN113255489A
Application granted
Publication of CN113255489B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193 Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a novel multi-modal action quality assessment method based on label distribution learning, which models motion videos in both space and time and applies this modeling to the construction of an intelligent assessment system for diving events. Building on a two-stream Inflated 3D ConvNet (I3D), an optical-flow path is added; the spatio-temporal features of the diving video in the RGB and Flow modalities are extracted and fused into a global feature, which better captures the coherence of the whole action. The global feature is fed into a fully connected layer, and Label Distribution Learning (LDL) is used to output the probability distribution over the mean of all judges' scores; the score with the maximum probability is selected and multiplied by the difficulty coefficient to obtain the final diving score. The invention makes full use of the difficulty-coefficient concept in the diving rules, improves the output strategy of the fully connected layer, and solves for the diving score in a novel way.

Description

Multi-mode diving event intelligent evaluation method based on label distribution learning
Technical Field
The invention relates to an intelligent assessment system for diving events, in particular to a multi-modal intelligent assessment method for diving events based on label distribution learning, and belongs to the fields of deep learning and computer vision.
Background
Action Quality Assessment (AQA) technology can be widely applied in real-world scenarios such as intelligent sports judging, surgical skill scoring, and motor therapy guidance. In diving events in particular, an accurate intelligent evaluation and scoring system can greatly enhance the objectivity and validity of competition scores and thereby promote the fairness of competition; there have been reports questioning referees' judgments. The demand for an intelligent evaluation system for diving events is therefore growing by the day, and the invention aims to generate accurate and objective results from live video, so that athletes who have trained diligently in their discipline for many years are not treated unfairly.
Some researchers have already studied intelligent evaluation systems for sports events and achieved some success, particularly in the application to diving. For example, the three-dimensional Convolutional neural network (C3D) performs well in action-recognition tasks because it captures the spatio-temporal features of appearance and salient motion. Paritosh Parmar et al. treated diving scoring as a multi-task regression problem and collected the largest multi-task AQA dataset to date, comprising 1412 diving samples with multi-task labels covering diving scores, action recognition, and commentary generation. Wang et al. used Spatial Convolutional Networks (SCNs) and Temporal Convolutional Networks (TCNs) trained with a two-stage strategy; in particular, an attention mechanism is introduced in the temporal convolution to fuse features according to their impact on overall performance along the time dimension. These algorithms are effective, but they neither make full use of the objective parameter provided by the dataset, namely the difficulty coefficient, nor pay sufficient attention to the fluency of the overall action and the details of the final splash.
In a diving competition, referees rate scores based on the athlete's approach (on the springboard or platform), take-off, flight, and water entry. The referee panel comes in two configurations: 5 judges or 7 judges. In the first, after all 5 judges have scored, the highest and lowest scores are discarded as invalid, and the sum of the remaining 3 scores is multiplied by the difficulty coefficient of the athlete's dive to obtain the true score of the action. In the second, after all 7 judges have scored, the highest and lowest scores are likewise discarded, and the sum of the remaining 5 scores is multiplied by the difficulty coefficient, then divided by 5 and multiplied by 3. These rules show that judges differ in their scoring preferences, so the given scores vary, but they remain consistent with the level the athlete displays during the dive. This characteristic can naturally be modeled as a normal probability distribution centered on the mean of all judges' scores, and that mean can be estimated well by means of label distribution learning.
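As a concrete illustration, the two panel rules above can be sketched in Python (a hypothetical helper written for this description, not part of the patent):

```python
def diving_score(judge_scores, difficulty):
    """Official score under the panel rules described above.

    Drop the single highest and lowest marks, sum the rest, and multiply
    by the difficulty coefficient (DD). A 7-judge panel's 5-mark sum is
    rescaled back to the 3-judge scale by dividing by 5 and multiplying by 3.
    """
    if len(judge_scores) not in (5, 7):
        raise ValueError("diving panels have 5 or 7 judges")
    kept = sorted(judge_scores)[1:-1]  # discard highest and lowest marks
    total = sum(kept) * difficulty
    if len(judge_scores) == 7:
        total = total / 5 * 3
    return total

# 5-judge panel: keep 8.0, 8.0, 8.5 -> sum 24.5, times DD 3.0 -> 73.5
print(diving_score([8.0, 8.5, 8.0, 7.5, 9.0], 3.0))
```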
In addition, observation of the diving rules shows that the difficulty coefficient is an objective value indicating how hard an action is for the athlete to perform. The international diving competition rules assign a corresponding difficulty coefficient to each diving action, determined by the action group, the event (springboard or platform), the apparatus height, the body posture, the number of somersault rotations, and so on. Inspired by this, the final diving score can be predicted simply by predicting the mean of all judges' scores and multiplying it by the action's difficulty coefficient.
Disclosure of Invention
In order to solve the above problems, the present invention provides a multi-modal intelligent assessment method for diving events based on label distribution learning, which offers a high-quality, high-efficiency action quality assessment technique and produces a reasonable diving score, thereby addressing the problems described in the background.
The technical scheme is as follows: to achieve the above aim, the invention provides a multi-modal intelligent assessment method for diving events based on label distribution learning, which models motion videos in space and time in a novel way and applies this modeling to the construction of an intelligent assessment system for diving events. The global feature is input into a fully connected layer, label distribution learning is used to output the probability distribution of the mean of all judges' scores, the score with the maximum probability is selected, and it is multiplied by the difficulty coefficient to obtain the final diving score.
The global feature extraction method is based on a two-stream Inflated 3D ConvNet (I3D): an optical-flow path is added, the spatio-temporal features of the diving video in the RGB and Flow modalities are extracted, and they are fused to form a global feature f_global.
Given an L-frame RGB-modality diving video V_RGB = {F_i^RGB}, i = 1, …, L, and considering that optical flow is the instantaneous velocity of pixel motion of a spatially moving object on the imaging plane and can effectively represent the motion information of the object between adjacent frames, the video is first converted into the optical-flow modality V_Flow = {F_i^Flow}, i = 1, …, L, using the classic TV-L1 optical flow algorithm.
A sliding-window strategy is then used to divide V_RGB and V_Flow into N mutually overlapping segments. The V_RGB and V_Flow segments are input into the I3D model separately, and each then passes through an MLP block consisting of 3 fully connected layers, outputting a pair of RGB and Flow segment features {f_i^RGB}, i = 1, …, N, and {f_i^Flow}, i = 1, …, N. Note that the weights of these 3 fully connected layers are shared across all segments. The extracted segment features are then fused by temporal average pooling into f_RGB and f_Flow respectively, which are further fused by averaging into the global feature f_global.
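A minimal NumPy sketch of this fusion pipeline follows. The I3D backbone is abstracted away as precomputed 1024-dimensional per-segment features, and the MLP output width of 64 is an assumption: the patent only fixes the 256- and 128-unit hidden layers (see Example 2), not the final layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_block(x, weights):
    """Shared 3-FC-layer block applied to every segment feature.
    Layer sizes 1024 -> 256 -> 128 -> 64 (the 64 is assumed)."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU on the two hidden layers
    return x

def init_weights(dims=(1024, 256, 128, 64)):
    return [(rng.standard_normal((d_in, d_out)) * 0.01, np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def extract_global_feature(rgb_segments, flow_segments, w_rgb, w_flow):
    """rgb_segments, flow_segments: (N, 1024) per-segment I3D features
    for the N overlapping sliding-window segments of one video."""
    f_rgb = mlp_block(rgb_segments, w_rgb).mean(axis=0)    # temporal avg pooling
    f_flow = mlp_block(flow_segments, w_flow).mean(axis=0)
    return 0.5 * (f_rgb + f_flow)                          # average fusion -> f_global

# f_global for a 10-segment video (10 segments matches Example 2)
w_rgb, w_flow = init_weights(), init_weights()
f_global = extract_global_feature(rng.standard_normal((10, 1024)),
                                  rng.standard_normal((10, 1024)),
                                  w_rgb, w_flow)
```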
f_global is fed into the fully connected layer, and the probability distribution of the mean of all judges' scores (the diving score divided by the difficulty coefficient) is predicted based on LDL. First, a score probability distribution is generated from prior knowledge of the judges' scoring to represent the true distribution; network parameters are then iteratively optimized by minimizing the loss between the predicted and true probability distributions to improve prediction accuracy. Finally, the output of the network is multiplied by the difficulty coefficient to predict the diver's score.
In the training stage, given a diving video with ground-truth score S and difficulty coefficient DD, the judges' scoring mean is S_J = S / DD. First, as shown in equation (1), a Gaussian function with mean S_J and standard deviation σ is taken as the prior probability distribution:

g(c) = 1 / (σ√(2π)) · exp(−(c − S_J)² / (2σ²))    (1)
σ is a hyper-parameter measuring the degree of uncertainty in S_J. The score interval is uniformly discretized into a score set c = [c_1, c_2, …, c_m], and the vector g = [g(c_1), g(c_2), …, g(c_m)] represents the degree to which S_J matches each score. The final prior distribution of S_J is p = [p(c_1), p(c_2), …, p(c_m)], obtained by normalizing g; the calculation of each element of p is given in equation (2):

p(c_i) = g(c_i) / Σ_{j=1}^{m} g(c_j)    (2)
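Equations (1) and (2) amount to discretizing a Gaussian over the score set and normalizing it. A short NumPy sketch (function name and the 0.1-point score granularity are assumptions for illustration):

```python
import numpy as np

def gaussian_prior(s_j, sigma, scores):
    """Prior distribution p over the discrete score set c = [c_1, ..., c_m]:
    a Gaussian centered on the judge mean S_J (eq. 1), normalized so the
    entries sum to 1 (eq. 2)."""
    g = np.exp(-(scores - s_j) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
    return g / g.sum()

scores = np.linspace(0.0, 10.0, 101)   # 0.1-point granularity, assumed
p = gaussian_prior(8.0, 1.0, scores)   # prior peaked at the judge mean 8.0
```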
f_global is input into a fully connected layer and a softmax layer and mapped to an m-dimensional probability distribution p_pred = [p_pred(c_1), p_pred(c_2), …, p_pred(c_m)] as the predicted distribution. Finally, the loss function is obtained by computing the KL divergence between p and p_pred, equation (3):

L_KL = Σ_{i=1}^{m} p(c_i) · log( p(c_i) / p_pred(c_i) )    (3)
In p_pred, the score with the maximum probability is selected as the average score of all judges, S_J,pred (equation (4)), and multiplied by the difficulty coefficient DD to obtain the final predicted diving score S_pred, equation (5):

S_J,pred = argmax_{c_i} { p_pred(c_1), p_pred(c_2), …, p_pred(c_m) }    (4)

S_pred = DD × S_J,pred    (5)
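Equations (3) to (5) can be sketched as follows (NumPy; the epsilon guarding log(0) is my addition for numerical safety, not in the patent):

```python
import numpy as np

def kl_loss(p_true, p_pred, eps=1e-12):
    """KL divergence KL(p || p_pred) of eq. (3); eps guards against log(0)."""
    return float(np.sum(p_true * np.log((p_true + eps) / (p_pred + eps))))

def predict_score(p_pred, scores, dd):
    """Eqs. (4)-(5): take the score with maximum predicted probability
    as S_J,pred and multiply by the difficulty coefficient DD."""
    return dd * scores[np.argmax(p_pred)]

scores = np.array([6.0, 7.0, 8.0, 9.0])      # toy discrete score set
p_pred = np.array([0.1, 0.2, 0.5, 0.2])      # toy predicted distribution
final = predict_score(p_pred, scores, dd=2.5)  # argmax score 8.0, times DD
```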
the invention has the following beneficial effects:
(1) The existing I3D model is fully utilized, and an optical-flow path is added on this basis, so that the coherence of the whole action is better captured and the influence of the diving background (such as noisy crowds and railings) and other interfering factors is reduced.
(2) Based on LDL, the difficulty-coefficient concept in the diving rules is fully exploited, the output strategy of the fully connected layer is improved, and the diving score is solved in a novel way.
Drawings
Fig. 1 shows the intelligent diving-quality evaluation pipeline of the present invention.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
Example 1: since the action quality evaluation predicts the quality "score" as a regression problem, SRC (Spearman's rank correlation) used in the conventional literature is used as an evaluation index. The SRC ranges between-1 and 1, with higher SRC meaning that the true and predicted scores are more correlated, i.e. the closer SRC is to 1, the stronger the correlation and the weaker the correlation. In addition, an UNLV-Dive data set is adopted, 370 video materials are contained, and 10m channel water half-finals and finals of a certain sports meeting in 2012 are included. Each video contains two labels, a difficulty coefficient and a diving score. The diving videos are all shot from the same side view, and the change of the view angle is small. This dataset also divides the training and testing set into 300/70.
Example 2: the model was written on the ubuntu 16.04 system using the pytorech deep learning framework, trained iteratively 100 times (training was accelerated by an Nvidia TiTian RTX GPU). An I3D model (including RGB and Flow modalities) trained in advance on Kinetics is used as a feature extractor, which takes a 16-frame motion sequence as an input and outputs 1024-dimensional features. Each UNIV-Dive dataset is padded to 151 frames (video with insufficient number of frames to supplement all zero frames). Each video is divided into 10 segments using a sliding window strategy. Each MLP block contains two hidden layers FC (256, ReLU) and FC (128, ReLU). The network is optimized by adopting an Adam optimizer, and the learning rate is set to be 0.0001. In addition, in experiments, the invention normalizes the final total score to the range of [0,50] to ensure the consistency of the scale, and applies Gaussian distribution to obtain prior probability distribution.
Example 3: the performance of the invention was compared to several other methods of the most advanced and the results evaluated with SRC are shown in table 1. It can be seen that the present invention is superior to the existing methods in the UNLV-Dive dataset. This result demonstrates the unique advantage of the inventive framework of the present invention in terms of saltwater quality assessment.
TABLE 1 Performance of different methods on UNLV-Dive
(table reproduced as an image in the original publication)
The technical means disclosed in the invention scheme are not limited to the technical means disclosed in the above embodiments, but also include the technical scheme formed by any combination of the above technical features.

Claims (3)

1. A multi-modal diving event intelligent evaluation method based on label distribution learning, characterized in that: motion videos are modeled in space and time in a novel way and applied to the construction of an intelligent assessment system for diving events; the global feature is input into a fully connected layer, label distribution learning is used to output the probability distribution of the mean of all judges' scores, the score with the maximum probability is selected, and it is multiplied by the difficulty coefficient to obtain the final diving score.
2. The intelligent multi-modal diving event assessment method based on label distribution learning according to claim 1, characterized in that: in the global feature extraction method, an optical-flow path is added on the basis of the existing two-stream Inflated 3D ConvNet, the spatio-temporal features of the diving video in the RGB and Flow modalities are extracted simultaneously, and they are fused to form a global feature f_global.
3. The intelligent multi-modal diving event assessment method based on label distribution learning according to claim 2, characterized in that: given an L-frame RGB-modality diving video V_RGB = {F_i^RGB}, i = 1, …, L, and considering that optical flow is the instantaneous velocity of pixel motion of a spatially moving object on the imaging plane and can effectively represent the motion information of the object between adjacent frames, the video is first converted into the optical-flow modality V_Flow = {F_i^Flow}, i = 1, …, L, using the classic TV-L1 optical flow algorithm;
a sliding-window strategy is then used to divide V_RGB and V_Flow into N mutually overlapping segments;
the V_RGB and V_Flow segments are input into the I3D model separately, and each then passes through an MLP block consisting of 3 fully connected layers, outputting a pair of RGB and Flow segment features {f_i^RGB}, i = 1, …, N, and {f_i^Flow}, i = 1, …, N;
note that the weights of these 3 fully connected layers are shared across all segments; the extracted segment features are then fused by temporal average pooling into f_RGB and f_Flow respectively, which are further fused by averaging into the global feature f_global;
f_global is fed into the fully connected layer, and the probability distribution of the mean of all judges' scores (the diving score divided by the difficulty coefficient) is predicted based on LDL; first, a score probability distribution is generated from prior knowledge of the judges' scoring to represent the true distribution, and network parameters are iteratively optimized by minimizing the loss between the predicted and true probability distributions to improve prediction accuracy; finally, the output of the network is multiplied by the difficulty coefficient to predict the diver's score;
in the training stage, given a diving video with ground-truth score S and difficulty coefficient DD, the judges' scoring mean is S_J = S / DD; first, as shown in equation (1), a Gaussian function with mean S_J and standard deviation σ is taken as the prior probability distribution;

g(c) = 1 / (σ√(2π)) · exp(−(c − S_J)² / (2σ²))    (1)
σ is a hyper-parameter measuring the degree of uncertainty in S_J; the score interval is uniformly discretized into a score set c = [c_1, c_2, …, c_m], and the vector g = [g(c_1), g(c_2), …, g(c_m)] represents the degree to which S_J matches each score; the final prior distribution of S_J is p = [p(c_1), p(c_2), …, p(c_m)], obtained by normalizing g, with each element of p computed as in equation (2);

p(c_i) = g(c_i) / Σ_{j=1}^{m} g(c_j)    (2)
f_global is input into a fully connected layer and a softmax layer and mapped to an m-dimensional probability distribution p_pred = [p_pred(c_1), p_pred(c_2), …, p_pred(c_m)] as the predicted distribution; finally, the loss function is obtained by computing the KL divergence between p and p_pred, equation (3):

L_KL = Σ_{i=1}^{m} p(c_i) · log( p(c_i) / p_pred(c_i) )    (3)
in p_pred, the score with the maximum probability is selected as the average score of all judges, S_J,pred (equation (4)), and multiplied by the difficulty coefficient DD to obtain the final predicted diving score S_pred, equation (5):

S_J,pred = argmax_{c_i} { p_pred(c_1), p_pred(c_2), …, p_pred(c_m) }    (4)

S_pred = DD × S_J,pred    (5).
CN202110524112.7A 2021-05-13 2021-05-13 Multi-mode diving event intelligent evaluation method based on mark distribution learning Active CN113255489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110524112.7A CN113255489B (en) 2021-05-13 2021-05-13 Multi-mode diving event intelligent evaluation method based on mark distribution learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110524112.7A CN113255489B (en) 2021-05-13 2021-05-13 Multi-mode diving event intelligent evaluation method based on mark distribution learning

Publications (2)

Publication Number Publication Date
CN113255489A true CN113255489A (en) 2021-08-13
CN113255489B CN113255489B (en) 2024-04-16

Family

ID=77181800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110524112.7A Active CN113255489B (en) 2021-05-13 2021-05-13 Multi-mode diving event intelligent evaluation method based on mark distribution learning

Country Status (1)

Country Link
CN (1) CN113255489B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920584A (en) * 2021-10-15 2022-01-11 东南大学 Action quality evaluation method based on time perception feature learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705463A (en) * 2019-09-29 2020-01-17 山东大学 Video human behavior recognition method and system based on multi-mode double-flow 3D network
WO2020088763A1 (en) * 2018-10-31 2020-05-07 Huawei Technologies Co., Ltd. Device and method for recognizing activity in videos
CN111784121A (en) * 2020-06-12 2020-10-16 清华大学 Action quality evaluation method based on uncertainty score distribution learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020088763A1 (en) * 2018-10-31 2020-05-07 Huawei Technologies Co., Ltd. Device and method for recognizing activity in videos
CN110705463A (en) * 2019-09-29 2020-01-17 山东大学 Video human behavior recognition method and system based on multi-mode double-flow 3D network
CN111784121A (en) * 2020-06-12 2020-10-16 清华大学 Action quality evaluation method based on uncertainty score distribution learning

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920584A (en) * 2021-10-15 2022-01-11 东南大学 Action quality evaluation method based on time perception feature learning

Also Published As

Publication number Publication date
CN113255489B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
KR20200068545A (en) System and method for training a convolutional neural network and classifying an action performed by a subject in a video using the trained convolutional neural network
Díaz-Pereira et al. Automatic recognition and scoring of olympic rhythmic gymnastic movements
Chen et al. LSTM with bio inspired algorithm for action recognition in sports videos
CN110490109A (en) A kind of online human body recovery action identification method based on monocular vision
Chen Research on college physical education model based on virtual crowd simulation and digital media
CN114821804A (en) Attention mechanism-based action recognition method for graph convolution neural network
Tang et al. Research on sports dance movement detection based on pose recognition
CN113255489A (en) Multi-mode diving event intelligent evaluation method based on label distribution learning
Pu et al. Orientation and decision-making for soccer based on sports analytics and AI: A systematic review
Sałabun et al. Swimmer Assessment Model (SWAM): expert system supporting sport potential measurement
Nalbant et al. Literature review on the relationship between artificial intelligence technologies with digital sports marketing and sports management
CN113343774B (en) Fine-grained engineering mechanics diving action simulation and evaluation method
Jiang et al. Human-centered artificial intelligence-based ice hockey sports classification system with web 4.0
CN115985462A (en) Rehabilitation and intelligence-developing training system for children cerebral palsy
Zhang Application Analysis of Badminton Intelligence based on Knowledge Graphs
Li et al. Measuring diversity of game scenarios
Zhang et al. A hybrid neural network-based intelligent body posture estimation system in sports scenes
Hu The Application of Artificial Intelligence and Big Data Technology in Basketball Sports Training
Yuan et al. Applications of data mining in intelligent computer-aided athletic training
Tits Expert gesture analysis through motion capture using statistical modeling and machine learning
Casas Ortiz Capturing, modelling, analyzing and providing feedback in martial arts with artificial intelligence to support psychomotor learning activities
Zou et al. Research on athlete training effect evaluation based on machine learning algorithm
CN117766098B (en) Body-building optimization training method and system based on virtual reality technology
Kiciroglu Automated Human Motion Analysis and Synthesis
Du [Retracted] The Selecting Optimal Ball‐Receiving Body Parts Using Pose Sequence Analysis and Sports Biomechanics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant