WO2021199392A1 - Learning device, learning method, and learning program, and score estimation device, score estimation method, and score estimation program - Google Patents
- Publication number
- WO2021199392A1 (PCT/JP2020/015136)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- loss
- scores
- function
- estimated
- score
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- The present invention relates to, for example, a learning device, a learning method, and a learning program for learning know-how regarding how an athlete's competition is scored, and to a score estimation device, a score estimation method, and a score estimation program for estimating a competition score based on the learning results.
- Non-Patent Document 1 discloses a method of performing action quality assessment (AQA) using deep learning.
- In this method, video data recording an athlete's competition and the true score obtained when an official referee scores that competition are used as training data.
- Features are then extracted from the video data included in the training data by using a deep neural network.
- An estimated score is then estimated from the extracted features.
- The loss between the estimated score and the true score included in the training data is calculated.
- The weights and biases of the deep neural network are repeatedly updated, based on the calculated loss, so as to reduce the loss.
- In Non-Patent Document 1, in addition to the regression loss indicating the loss between the estimated score and the true score, a ranking loss aimed at improving the accuracy of the order among the obtained estimated scores is adopted.
- Between video data whose true scores are close, an error in score estimation may cause the order of the estimated scores to be reversed relative to the order of the true scores.
- Non-Patent Document 1 adopts the ranking loss shown in equation (1) to reduce the probability of such an error, and achieves accuracy higher than that of the prior art.
- ReLU(x) is a function that returns x when the argument x is 0 or more, and returns 0 when x is smaller than 0.
- δ is a margin value and is positive. Therefore, when the magnitude relationship between the estimated scores s_i and s_j does not match the magnitude relationship between the true scores g_i and g_j, the ranking loss increases as the absolute value of the difference between the estimated scores s_i and s_j increases.
- The margin value δ has the effect of separating the two estimated scores s_i and s_j so that they differ by at least the margin value δ when the difference between them is small. Therefore, even when the magnitude relationship between the estimated scores s_i and s_j matches the magnitude relationship between the true scores g_i and g_j, a ranking loss that depends on the margin value δ can still occur.
- Because the margin value δ is a fixed value determined in advance, the same margin value δ is applied to all combinations of video data v_i and video data v_j.
- The margin value δ is a parameter adopted so that the two estimated scores s_i and s_j differ by at least the margin value when the difference between them is small.
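The behaviour described for the fixed-margin ranking loss can be illustrated with a short sketch. Equation (1) itself is not reproduced in this text, so the code assumes the pairwise hinge form ReLU(-(s_j - s_i) * sign(g_j - g_i) + δ) summed over all pairs i < j, which is consistent with the description above. The function names and the example margin are hypothetical.

```python
from itertools import combinations


def relu(x: float) -> float:
    """Return x when x is 0 or more, otherwise 0."""
    return x if x >= 0 else 0.0


def sign(x: float) -> int:
    """Return +1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def fixed_margin_ranking_loss(s, g, delta):
    """Assumed form of equation (1): sum of pairwise hinge terms over i < j.

    s: estimated scores, g: true scores, delta: fixed positive margin.
    """
    total = 0.0
    for i, j in combinations(range(len(s)), 2):
        total += relu(-(s[j] - s[i]) * sign(g[j] - g[i]) + delta)
    return total
```

With a margin of 0.5, a correctly ordered pair separated by more than 0.5 contributes nothing, a reversed pair is penalised in proportion to its gap, and a correctly ordered pair that is closer than 0.5 still contributes a small loss.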
- An object of the present invention is to provide a technique capable of learning know-how regarding how an athlete's competition is scored more accurately than the conventional technique, and of obtaining a more accurate estimated score.
- One aspect of the present invention is a learning device including an input unit, an estimation unit, and a parameter update unit.
- The input unit takes in training data in which each of a plurality of video data recording an athlete's movement during a competition is combined with a true score, which is the score given by a referee for the competition recorded in that video data.
- The estimation unit has a function approximator that approximates a function based on parameters, and estimates an estimated score for the video data by giving the video data taken in by the input unit to the function approximator as input.
- The parameter update unit updates the parameters by performing a learning process that reduces both the regression loss, which is the output of a first loss function, and the ranking loss, which is the output of a second loss function.
- The first loss function obtains the regression loss between each of the plurality of estimated scores and the true score corresponding to that estimated score.
- The second loss function obtains, based on the two estimated scores and the two true scores corresponding to each of all combinations of two different video data, a ranking loss that indicates the degree of order error between the two estimated scores, and corrects the ranking loss in consideration of the magnitude of the difference between the two true scores.
- One aspect of the present invention is a score estimation device including: an input unit that takes in video data recording an athlete's movement during a competition; and an estimation unit that has a function approximator approximating a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2, and that estimates an estimated score for the video data by giving the video data taken in by the input unit to the function approximator as input.
- One aspect of the present invention is a learning method that includes: taking in training data in which each of a plurality of video data recording an athlete's movement during a competition is combined with a true score, which is the score given by a referee for the competition recorded in that video data; estimating an estimated score for the video data by giving the taken-in video data as input to a function approximator that approximates a function based on parameters; and updating the parameters by performing a learning process that reduces both the regression loss, which is the output of a first loss function, and the ranking loss, which is the output of a second loss function.
- Here, the first loss function obtains the regression loss between each of the plurality of estimated scores and the true score corresponding to that estimated score, and the second loss function obtains the ranking loss from the two estimated scores and the two true scores corresponding to each of all combinations of two different video data.
- One aspect of the present invention is a learning program for causing a computer to function as input means, estimation means, and parameter updating means.
- The input means takes in training data in which each of a plurality of video data recording an athlete's movement during a competition is combined with a true score, which is the score given by a referee for the competition recorded in that video data.
- The estimation means has a function approximator that approximates a function based on parameters, and estimates an estimated score for the video data by giving the video data taken in by the input means to the function approximator as input.
- The parameter updating means updates the parameters by performing a learning process that reduces both the regression loss, which is the output of a first loss function, and the ranking loss, which is the output of a second loss function.
- The first loss function obtains the regression loss between each of the plurality of estimated scores and the true score corresponding to that estimated score. The second loss function obtains, based on the two estimated scores and the two true scores corresponding to each of all combinations of two different video data, a ranking loss that indicates the degree of order error between the two estimated scores, and corrects the ranking loss in consideration of the magnitude of the difference between the two true scores.
- One aspect of the present invention is a score estimation method that takes in video data recording an athlete's movement during a competition and estimates an estimated score for the video data by giving the taken-in video data as input to a function approximator that approximates a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2.
- According to the present invention, it is possible to learn know-how regarding how an athlete's competition is scored more accurately than the conventional technique, and to obtain a more accurate estimated score.
- FIG. 1 is a block diagram showing an internal configuration of the learning device 1 according to the first embodiment.
- the learning device 1 includes a training data storage unit 10, an input unit 11, an estimation unit 50, a parameter update unit 14, a feature amount extraction parameter storage unit 15, and a score estimation parameter storage unit 16.
- the estimation unit 50 includes a feature amount extraction unit 12 and a score estimation unit 13.
- the training data storage unit 10 stores in advance a plurality of training data in which each of the plurality of moving image data and each of the plurality of true value scores are combined.
- Each of the plurality of video data is generated, for example, by recording the movement performed by an athlete during the competition with a camera or the like.
- The competition is, for example, a sport with quantitative scoring standards for techniques, such as high diving or gymnastics.
- An athlete is, for example, a person who competes in that sport.
- Each of the plurality of true scores is a score given in advance by an official referee for the athlete's competition recorded in the corresponding video data.
- the input unit 11 repeatedly reads n training data from the training data storage unit 10.
- n is an integer of 2 or more, and is a batch size when the learning process described below is performed.
- The number of training data stored by the training data storage unit 10 is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
- Hereinafter, any one of the video data included in the n training data is denoted v_i or v_j, the true score corresponding to video data v_i is denoted g_i, and the true score corresponding to video data v_j is denoted g_j.
- The input unit 11 outputs the n video data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11 outputs the n true scores g_1 to g_n included in the n read training data to the parameter update unit 14.
- the feature amount extraction parameter storage unit 15 stores the feature amount extraction parameters that serve as weights and biases applied to the first function approximator included in the feature amount extraction unit 12.
- the feature amount extraction unit 12 has a first function approximation device, and applies the feature amount extraction parameter stored in the feature amount extraction parameter storage unit 15 to the first function approximation device.
- the first function approximator approximates the function corresponding to the feature amount extraction parameter by applying the feature amount extraction parameter.
- The feature amount extraction unit 12 extracts the feature amount of video data v_i by giving the video data v_i output by the input unit 11 to the first function approximator as input.
- The first function approximator is any neural network that extracts features from the video data v_i, for example, the one in Non-Patent Document 1, Fig. …
- the score estimation parameter storage unit 16 stores score estimation parameters that serve as weights and biases to be applied to the second function approximator included in the score estimation unit 13.
- the score estimation unit 13 has a second function approximation device, and applies the score estimation parameters stored in the score estimation parameter storage unit 16 to the second function approximation device.
- the second function approximator approximates the function corresponding to the score estimation parameter by applying the score estimation parameter.
- The score estimation unit 13 estimates the estimated score s_i by giving the feature amount extracted by the feature amount extraction unit 12 to the second function approximator as input.
- The second function approximator is an arbitrary neural network that estimates the estimated score from the features. For example, a neural network such as the one in the subsequent stage shown in Fig. 1, with two fully connected layers to which a ReLU layer and a Dropout layer are connected (hereinafter referred to as “fully connected layer 131”), is applied.
- The parameter update unit 14 calculates the regression loss between each of the estimated scores s_1 to s_n and the corresponding true scores g_1 to g_n, based on the n true scores g_1 to g_n output by the input unit 11, the n estimated scores s_1 to s_n estimated by the score estimation unit 13, and a predetermined first loss function.
- As the first loss function, for example, the MSE (mean squared error) shown in equation (2) is applied.
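As a minimal sketch of the regression loss, the following computes the standard mean squared error over a batch of n estimated and true scores. The function name `regression_loss` is hypothetical, and the code shows the textbook MSE definition rather than a transcription of equation (2).

```python
def regression_loss(s, g):
    """Mean squared error between estimated scores s and true scores g."""
    if len(s) != len(g) or not s:
        raise ValueError("s and g must be non-empty and of equal length")
    return sum((s_i - g_i) ** 2 for s_i, g_i in zip(s, g)) / len(s)


# Two videos: the first is estimated exactly, the second is off by 2 points.
loss1 = regression_loss([1.0, 2.0], [1.0, 4.0])
```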
- The parameter update unit 14 calculates, based on the two estimated scores s_i, s_j and the two true scores g_i, g_j corresponding to each of all combinations of two different video data v_i, v_j, and on a predetermined second loss function, a ranking loss that indicates the degree of order error between the two estimated scores s_i and s_j and that takes into account the magnitude of the difference between the two true scores g_i and g_j.
- As the second loss function, for example, the loss function shown in equation (3) is applied, in which the margin value δ of equation (1) is replaced with the absolute value of the difference between the two true scores g_i and g_j.
- The sign(x) function returns the sign of the argument x.
- ReLU(x) returns x when the argument x is 0 or more, and returns 0 when x is smaller than 0.
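The second loss function can be sketched in the same way. Equation (3) is not reproduced in this text, so the code assumes the pairwise form ReLU(-(s_j - s_i) * sign(g_j - g_i) + abs(g_j - g_i)) summed over all pairs i < j, that is, the fixed margin replaced by the true-score gap as described above. The function name is hypothetical.

```python
from itertools import combinations


def ranking_loss(s, g):
    """Assumed form of equation (3): hinge terms with a per-pair margin.

    The margin for each pair (i, j) is abs(g[j] - g[i]) instead of a
    fixed value, so pairs with a large true-score gap are pushed
    further apart than pairs with nearly equal true scores.
    """
    total = 0.0
    for i, j in combinations(range(len(s)), 2):
        gap = g[j] - g[i]
        direction = (gap > 0) - (gap < 0)  # sign(g_j - g_i)
        total += max(0.0, -(s[j] - s[i]) * direction + abs(gap))
    return total
```

Under this assumed form, the loss is zero when the estimated scores reproduce the true scores exactly, and a pair with equal true scores contributes nothing regardless of its estimated scores.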
- the parameter update unit 14 performs learning processing so as to reduce the calculated regression loss, that is, Loss1, which is the output value of the equation (2), and the calculated ranking loss, that is, Loss2, which is the output value of the equation (3).
- the parameter update unit 14 calculates a new parameter for feature quantity extraction and a new parameter for score estimation by performing the learning process.
- the parameter update unit 14 updates the contents of the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16 based on the calculated new feature amount extraction parameter and the new score estimation parameter.
- FIG. 2 is a flowchart showing the flow of the learning process performed by the learning device 1.
- the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16 store the initial value feature amount extraction parameters and the initial value score estimation parameters in advance, respectively.
- The feature amount extraction unit 12 reads the feature amount extraction parameters from the feature amount extraction parameter storage unit 15 and applies them to the neural network of the moving image feature amount extraction layer 121, which is the first function approximator (step S1).
- The score estimation unit 13 reads the score estimation parameters from the score estimation parameter storage unit 16 and applies them to the neural network of the fully connected layer 131, which is the second function approximator (step S2).
- The input unit 11 reads the first n training data from the training data storage unit 10. As shown in FIG. 3, the input unit 11 outputs the n video data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11 outputs the n true scores g_1 to g_n included in the read training data to the parameter update unit 14. The parameter update unit 14 takes in the n true scores g_1 to g_n output by the input unit 11 (step S3).
- Steps S4 and S5 are repeated (loop L1s to L1e).
- As shown in FIG. 3, the feature amount extraction unit 12 gives the video data v_i to the moving image feature amount extraction layer 121 as input and obtains the feature amount of the video data v_i as the output of that layer. The feature amount extraction unit 12 outputs the obtained feature amount of the video data v_i to the score estimation unit 13 (step S4).
- As shown in FIG. 3, the score estimation unit 13 gives the feature amount of the video data v_i to the fully connected layer 131 as input and obtains the estimated score s_i of the video data v_i as the output of that layer. The score estimation unit 13 outputs the obtained estimated score s_i to the parameter update unit 14 (step S5).
- With the same feature amount extraction parameters and the same score estimation parameters applied to the moving image feature amount extraction layer 121 and the fully connected layer 131, respectively, the processing of steps S4 and S5 is performed n times, once for each of the n video data v_1 to v_n.
- The parameter update unit 14 calculates the regression loss Loss1 by equation (2) based on the n estimated scores s_1 to s_n and the n true scores g_1 to g_n (step S6).
- The parameter update unit 14 calculates the ranking loss Loss2 by equation (3) based on the n estimated scores s_1 to s_n and the n true scores g_1 to g_n (step S7).
- the parameter update unit 14 calculates the evaluation loss Loss by, for example, the following equation (4) (step S8).
- The two coefficients in equation (4) are positive constants arbitrarily determined to balance the two losses.
- The remaining term of equation (4) is an L2 regularization term.
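Equation (4) is not reproduced in this text. The following is a minimal sketch, assuming it is a weighted sum of the two losses plus an L2 penalty on the parameters. The names `alpha`, `beta`, and `lam`, their default values, and the flat parameter list are all hypothetical.

```python
def evaluation_loss(loss1, loss2, params, alpha=1.0, beta=1.0, lam=1e-4):
    """Combine the regression and ranking losses (assumed form of equation (4)).

    loss1: regression loss, loss2: ranking loss,
    params: flat iterable of weights and biases (hypothetical layout),
    alpha, beta: positive balancing constants,
    lam: strength of the L2 regularization term.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    l2_term = sum(p * p for p in params)
    return alpha * loss1 + beta * loss2 + lam * l2_term


total = evaluation_loss(2.0, 4.0, [1.0, -2.0], alpha=0.5, beta=0.25, lam=0.1)
```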
- The parameter update unit 14 determines whether or not the calculated evaluation loss Loss satisfies the end condition (step S9). For example, when the evaluation loss Loss is less than a predetermined threshold value, it is determined that the evaluation loss satisfies the end condition.
- When the parameter update unit 14 determines that the evaluation loss Loss satisfies the end condition (step S9, Yes), it ends the process.
- When the end condition is not satisfied (step S9, No), the parameter update unit 14 calculates new feature amount extraction parameters and new score estimation parameters by a learning process that uses, for example, error backpropagation so as to reduce the regression loss Loss1 and the ranking loss Loss2.
- the parameter update unit 14 writes the calculated new feature amount extraction parameter to the feature amount extraction parameter storage unit 15 and updates the feature amount extraction parameter.
- the parameter update unit 14 writes the calculated new score estimation parameter to the score estimation parameter storage unit 16 to update the score estimation parameter (step S10).
- In step S1 performed again, the feature amount extraction unit 12 reads the updated feature amount extraction parameters from the feature amount extraction parameter storage unit 15 and applies them to the moving image feature amount extraction layer 121. Further, in step S2 performed again, the score estimation unit 13 reads the updated score estimation parameters from the score estimation parameter storage unit 16 and applies them to the fully connected layer 131.
- In step S3 performed again, the input unit 11 reads the next n training data from the training data storage unit 10. Once steps S4 and S5 have been performed for all the training data stored in the training data storage unit 10, the input unit 11 starts again from the first n training data and repeats the reading from the training data storage unit 10 in order.
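The reading order described above (n items at a time, wrapping around to the first n after the last batch) can be sketched as a generator. The function name `cyclic_batches` and the use of a Python list as the training data store are assumptions for illustration.

```python
from itertools import islice


def cyclic_batches(training_data, n):
    """Yield consecutive batches of n items forever, restarting from the top.

    The checks run when the first batch is requested. The data length is
    assumed to be a multiple of n (n x m), as stated in the text.
    """
    if n < 2:
        raise ValueError("batch size n must be 2 or more")
    if not training_data or len(training_data) % n != 0:
        raise ValueError("number of training data must be a multiple of n")
    while True:
        for start in range(0, len(training_data), n):
            yield training_data[start:start + n]


# Six training items with batch size 2: the fourth batch wraps around.
first_four = list(islice(cyclic_batches(list(range(6)), 2), 4))
```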
- When the parameter update unit 14 determines in step S9 that the evaluation loss Loss satisfies the end condition, the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16 hold, respectively, the learned feature amount extraction parameters and the learned score estimation parameters for which the regression loss Loss1 and the ranking loss Loss2 have been sufficiently reduced.
- In the learning device 1, the parameter update unit 14 uses a first loss function that obtains the regression loss between each of the plurality of estimated scores estimated by the score estimation unit 13 and the true score corresponding to that estimated score, and a second loss function that obtains, based on the two estimated scores and the two true scores corresponding to each of all combinations of two different video data, a ranking loss indicating the degree of order error between the two estimated scores, corrected in consideration of the magnitude of the difference between the two true scores.
- By performing a learning process that reduces both the regression loss, which is the output of the first loss function, and the ranking loss, which is the output of the second loss function, the parameter update unit 14 updates the parameters applied to the function approximators of the estimation unit 50 (the first function approximator and the second function approximator), that is, the feature amount extraction parameters and the score estimation parameters.
- As a result, the learning device 1 can learn know-how regarding the official referee's scoring method for an athlete's competition more accurately than the technique described in Non-Patent Document 1.
- Equation (3) is used as the ranking loss instead of equation (1), which is adopted by the technique disclosed in Non-Patent Document 1.
- The effect of equation (3) is described below case by case.
- When the magnitude relationship between the estimated scores s_i and s_j matches the magnitude relationship between the true scores g_i and g_j, using equation (3) rather than equation (1) makes it possible to perform a learning process that brings the absolute value of the difference between the estimated scores s_i and s_j closer, more accurately, to the absolute value of the difference between the true scores g_i and g_j.
- When the difference between the estimated scores s_i and s_j is small, using equation (3) makes it possible to increase the absolute value of the difference between the estimated scores s_i and s_j according to the magnitude of abs(g_j - g_i), so that the learning process can bring the absolute value of the difference between the estimated scores closer, more accurately, to the absolute value of the difference between the true scores g_i and g_j.
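A small numerical illustration of these cases follows, under the assumption that both equations share the pairwise form ReLU(-(s_j - s_i) * sign(g_j - g_i) + margin) and differ only in the margin. The scores and the fixed margin of 1.0 are made-up values.

```python
def pair_term(s_i, s_j, g_i, g_j, margin):
    """One pairwise hinge term; only the margin differs between the two losses."""
    direction = (g_j > g_i) - (g_j < g_i)  # sign(g_j - g_i)
    return max(0.0, -(s_j - s_i) * direction + margin)


# True scores differ by 10 points; the estimates are in the right order
# but only 2 points apart.
g_i, g_j = 80.0, 90.0
s_i, s_j = 84.0, 86.0

fixed = pair_term(s_i, s_j, g_i, g_j, margin=1.0)                # fixed margin
adaptive = pair_term(s_i, s_j, g_i, g_j, margin=abs(g_j - g_i))  # true-score gap
```

With the fixed margin the pair already satisfies the constraint and yields no loss, so nothing pushes the estimated gap toward 10. With the true-score gap as the margin, the remaining shortfall of 8 points appears as loss.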
- FIG. 4 is a block diagram showing an internal configuration of the score estimation device 2 according to the first embodiment.
- the score estimation device 2 includes an input unit 11-1, an estimation unit 50, an output unit 17, a learned feature amount extraction parameter storage unit 18, and a learned score estimation parameter storage unit 19.
- the estimation unit 50 includes a feature amount extraction unit 12 and a score estimation unit 13.
- In step S9 shown in FIG. 2, when the parameter update unit 14 determines “Yes”, that is, determines that the evaluation loss Loss satisfies the end condition, the learned feature amount extraction parameters and the learned score estimation parameters are recorded in the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16, respectively.
- the learned feature amount extraction parameter storage unit 18 stores in advance the learned feature amount extraction parameters recorded in the feature amount extraction parameter storage unit 15 when the learning process of the learning device 1 is completed.
- the learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 when the learning process of the learning device 1 is completed.
- Input unit 11-1 takes in arbitrary video data given from the outside.
- the input unit 11-1 outputs the captured moving image data to the feature amount extraction unit 12.
- the feature amount extraction unit 12 reads the learned feature amount extraction parameters from the learned feature amount extraction parameter storage unit 18, and applies the learned feature amount extraction parameters read out to the moving image feature amount extraction layer 121.
- The feature amount extraction unit 12 gives the moving image data output by the input unit 11-1 to the moving image feature amount extraction layer 121 as an input, acquires the feature amount of the moving image data as an output, and outputs the acquired feature amount to the score estimation unit 13.
- the score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19, and applies the learned score estimation parameters read out to the fully connected layer 131.
- the score estimation unit 13 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer 131 as an input, acquires an estimated score as an output, and outputs the acquired estimated score to the output unit 17.
- the output unit 17 outputs the estimated score output by the score estimation unit 13 to the outside.
- The estimation unit 50 has function approximators (the first function approximator and the second function approximator) that approximate functions based on the learned parameters (the learned feature amount extraction parameters and the learned score estimation parameters) obtained by the learning device 1.
- Because the score estimation device 2 can obtain an estimated score for arbitrary video data based on the learned feature amount extraction parameters and the learned score estimation parameters obtained by the learning process of the learning device 1, which learns know-how regarding the official referee's scoring method more accurately than the technique described in Non-Patent Document 1, it can obtain a more accurate estimated score.
- FIG. 5 is a block diagram showing an internal configuration of the learning device 1a according to the second embodiment.
- the same configurations as those of the learning device 1 of the first embodiment are designated by the same reference numerals, and different configurations will be described below.
- The learning device 1a includes a training data storage unit 10a, an input unit 11a, an estimation unit 50a, a parameter update unit 14a, a feature amount extraction parameter storage unit 15, a score estimation parameter storage unit 16, and a class estimation parameter storage unit 21.
- the estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
- the training data storage unit 10a stores in advance a plurality of training data in which each of the plurality of moving image data, each of the plurality of true value scores, and each of the plurality of true value class labels are combined.
- a plurality of video data are classified into a plurality of predetermined classes based on the contents recorded in each video data.
- the class is a type of competition having different scoring criteria such as high diving and gymnastics.
- the true value class label is identification information indicating the class to which the corresponding moving image data belongs by classification.
- the input unit 11a repeatedly reads n training data from the training data storage unit 10a.
- n is an integer of 2 or more, and is a batch size when the learning process described below is performed.
- The number of training data stored by the training data storage unit 10a is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
- Hereinafter, any one of the video data included in the n training data is denoted v_i or v_j, the true score corresponding to video data v_i is denoted g_i, and the true score corresponding to video data v_j is denoted g_j.
- The input unit 11a outputs the n video data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11a outputs the n true scores g_1 to g_n and the n true class labels k_1 to k_n included in the n read training data to the parameter update unit 14a.
- the class estimation parameter storage unit 21 stores class estimation parameters that serve as weights and biases to be applied to the third function approximator of the class estimation unit 20.
- the class estimator 20 has a third function approximation device, and applies the class estimation parameters stored in the class estimation parameter storage unit 21 to the third function approximation device.
- the third function approximator approximates the function corresponding to the class estimation parameter by applying the class estimation parameter.
- Class estimator 20 estimates the estimated class c i by applying the third function approximator the feature amount by the feature extraction unit 12 has extracted as an input.
- the estimated class c i is the information indicated by the probability of each class, by reference to the estimated class c i, to identify whether the probability of belonging to any corresponding moving image data v i class higher Can be done.
- The third function approximator is an arbitrary neural network that estimates the estimated class from the feature amount.
- For example, a neural network in which a Softmax layer is connected after a fully connected layer (hereinafter, "fully connected layer + Softmax layer 201") or the like is applied.
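- As an illustration of the "fully connected layer + Softmax layer" described above, the following is a minimal sketch in plain Python. The function names `softmax` and `estimate_class`, the feature size, and all numeric values are assumptions made for this example; the actual layer sizes and parameters are not specified here.

```python
import math

def softmax(logits):
    """Numerically stable softmax: turns raw scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_class(feature, weights, biases):
    """Fully connected layer followed by a Softmax layer.

    weights[y] is the weight vector for class y. Returns the estimated
    class c_i as a list of per-class probabilities.
    """
    logits = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

# Hypothetical values: a 2-dimensional feature amount and Y = 3 classes.
feature = [0.5, -1.0]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # class estimation parameters (weights)
biases = [0.0, 0.0, 0.0]                         # class estimation parameters (biases)

c_i = estimate_class(feature, weights, biases)
# Index of the class with the highest probability.
most_likely_class = max(range(len(c_i)), key=lambda y: c_i[y])
```

- The probabilities in `c_i` sum to 1, so the class to which the moving image data most likely belongs can be read off as the index of the largest entry.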
- The parameter update unit 14a calculates the regression loss between each of the estimated scores s_1 to s_n and each of the true value scores g_1 to g_n, based on the n true value scores g_1 to g_n output by the input unit 11a, the n estimated scores s_1 to s_n estimated by the score estimation unit 13, and the first loss function represented by the above equation (2).
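- The regression loss can be sketched as follows. Equation (2) is not reproduced in this part of the text, so the sketch assumes a mean squared error, which is a common choice for score regression; the function name `regression_loss` and the sample values are hypothetical.

```python
def regression_loss(estimated_scores, true_scores):
    """Mean squared error between estimated scores s_i and true value scores g_i.

    Assumption: a mean squared error is used in place of equation (2).
    """
    if len(estimated_scores) != len(true_scores):
        raise ValueError("score lists must have the same length")
    n = len(true_scores)
    return sum((s - g) ** 2 for s, g in zip(estimated_scores, true_scores)) / n

# Hypothetical batch of n = 3 estimated and true value scores.
loss1 = regression_loss([8.0, 6.5, 9.0], [8.5, 6.5, 8.0])
```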
- The parameter update unit 14a calculates the class loss between each of the estimated classes c_1 to c_n and each of the true value class labels k_1 to k_n, based on the n true value class labels k_1 to k_n output by the input unit 11a, the n estimated classes c_1 to c_n estimated by the class estimation unit 20, and a predetermined third loss function.
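- The class loss can be sketched as a cross-entropy, which the description names as the third loss function. The exact form of equation (5) is not reproduced here, so averaging over the batch, the 0-based integer labels, and the name `class_loss` are assumptions of this example.

```python
import math

def class_loss(estimated_classes, true_labels, eps=1e-12):
    """Mean cross-entropy between class probabilities c_i and integer labels k_i.

    eps guards against log(0) when a probability is exactly zero.
    """
    if len(estimated_classes) != len(true_labels):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for probabilities, label in zip(estimated_classes, true_labels):
        total += -math.log(max(probabilities[label], eps))
    return total / len(true_labels)

# Hypothetical batch: two videos, Y = 3 classes, labels given as 0-based indices.
estimated = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
loss3 = class_loss(estimated, labels)
```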
- For every combination of two different moving image data v_i and v_j, the parameter update unit 14a calculates the ranking loss based on the corresponding two estimated scores s_i and s_j, the two true value scores g_i and g_j, and the two estimated classes c_i and c_j.
- The ranking loss, which indicates the degree of error in the order of the two estimated scores s_i and s_j, is calculated by considering the magnitude of the difference between the two true value scores g_i and g_j and the correlation between the two estimated classes c_i and c_j.
- the loss function represented by the following equation (6) is applied.
- Here, correlation is a correlation coefficient indicating the degree of similarity between the two estimated classes c_i and c_j.
- Spearman's rank correlation coefficient obtained by the equation (7) is applied as the correlation coefficient.
- Y is the number of classes as in equation (5).
- CR_{i,y} is the rank of class y in the estimated class c_i.
- the probability is 1st, the probability of belonging to Class1 is 2nd, and the probability of belonging to Class3 is 3rd.
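- The rank-based correlation between two estimated classes can be sketched as follows. The sketch uses the standard closed form of Spearman's rank correlation coefficient, 1 - 6 Σ_y (CR_{i,y} - CR_{j,y})^2 / (Y (Y^2 - 1)); that equation (7) has exactly this form is an assumption, and ties between probabilities are not handled. The function names are hypothetical.

```python
def class_ranks(probabilities):
    """Return CR[y]: the rank (1 = most probable) of each class y."""
    order = sorted(range(len(probabilities)), key=lambda y: -probabilities[y])
    ranks = [0] * len(probabilities)
    for rank, y in enumerate(order, start=1):
        ranks[y] = rank
    return ranks

def spearman_correlation(c_i, c_j):
    """Spearman's rank correlation between two estimated classes (no tie handling)."""
    y_count = len(c_i)
    if y_count < 2 or y_count != len(c_j):
        raise ValueError("need two probability vectors of equal length >= 2")
    ranks_i, ranks_j = class_ranks(c_i), class_ranks(c_j)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_i, ranks_j))
    return 1.0 - 6.0 * d_squared / (y_count * (y_count ** 2 - 1))

# Hypothetical estimated classes with Y = 3.
c_i = [0.3, 0.6, 0.1]   # second class ranks 1st, first class 2nd, third class 3rd
c_j = [0.1, 0.2, 0.7]
same = spearman_correlation(c_i, c_i)        # identical rankings
different = spearman_correlation(c_i, c_j)   # largely reversed rankings
```

- Identical class rankings give a coefficient of 1, while rankings that disagree give smaller (possibly negative) values, which is what allows the ranking loss to treat similar and dissimilar competitions differently.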
- The parameter update unit 14a performs the learning process so as to reduce the calculated regression loss, that is, Loss1, which is the output value of equation (2); the calculated class loss, that is, Loss3, which is the output value of equation (5); and the calculated ranking loss, that is, Loss4, which is the output value of equation (6).
- the parameter update unit 14a calculates a new feature amount extraction parameter, a new score estimation parameter, and a new class estimation parameter by learning processing.
- The parameter update unit 14a updates the contents of the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 based on the calculated new feature amount extraction parameter, new score estimation parameter, and new class estimation parameter.
- FIG. 6 is a flowchart showing the flow of the learning process performed by the learning device 1a.
- The feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 store in advance an initial-value feature amount extraction parameter, an initial-value score estimation parameter, and an initial-value class estimation parameter, respectively.
- In steps S21 and S22, the same processing as in steps S1 and S2 of the first embodiment shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13.
- The class estimation unit 20 reads out the class estimation parameters from the class estimation parameter storage unit 21, and applies the read class estimation parameters to the neural network of the fully connected layer + Softmax layer 201, which is the third function approximator (step S23).
- The input unit 11a reads the first n training data from the training data storage unit 10a. As shown in FIG. 7, the input unit 11a outputs the n moving image data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11a outputs the n true value scores g_1 to g_n and the n true value class labels k_1 to k_n included in the read training data to the parameter update unit 14a.
- The parameter update unit 14a takes in the n true value scores g_1 to g_n and the n true value class labels k_1 to k_n output by the input unit 11a (step S24).
- For each item of moving image data v_i among the n moving image data v_1 to v_n, the processing of steps S25, S26, and S27 is repeated (loop L2s to L2e).
- In steps S25 and S26, the same processing as in steps S4 and S5 shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13.
- In step S26, the score estimation unit 13 outputs the obtained estimated score s_i to the parameter update unit 14a.
- The class estimation unit 20 gives the feature amount of the moving image data v_i to the fully connected layer + Softmax layer 201 as an input and, as shown in FIG. 7, obtains the estimated class c_i of the moving image data v_i as the output of the fully connected layer + Softmax layer 201. The class estimation unit 20 outputs the obtained estimated class c_i of the moving image data v_i to the parameter update unit 14a (step S27).
- The same feature amount extraction parameter, the same score estimation parameter, and the same class estimation parameter are used in the moving image feature amount extraction layer 121, the fully connected layer 131, and the fully connected layer + Softmax layer 201, respectively.
- The processes of steps S25, S26, and S27 are performed n times, with each of the n moving image data v_1 to v_n as an input.
- In step S28, the same processing as in step S6 shown in FIG. 2 is performed by the parameter update unit 14a.
- The parameter update unit 14a takes in the n estimated classes c_1 to c_n estimated by the class estimation unit 20 and, based on these n estimated classes and the n true value class labels k_1 to k_n taken in at step S24, calculates the class loss Loss3 by equation (5) (step S29).
- The parameter update unit 14a calculates the ranking loss Loss4 by equation (6) based on the n estimated scores s_1 to s_n, the n true value scores g_1 to g_n, and the n estimated classes c_1 to c_n (step S30).
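- The following sketch shows one possible shape of a pairwise ranking loss with the properties described for Loss4: it penalizes wrongly ordered estimated scores, scales the required separation with the difference between the true value scores, and weights each pair by the similarity of the two estimated classes. Equation (6) is not reproduced in this text, so the hinge form, the mapping of the correlation to a weight in [0, 1], and the helpers `top_class` and `same_top_class` are all assumptions of this example, not the patent's formula.

```python
from itertools import combinations

def ranking_loss(scores, true_scores, classes, similarity):
    """Average weighted pairwise hinge loss over all pairs of different videos."""
    pairs = list(combinations(range(len(scores)), 2))
    if not pairs:
        raise ValueError("at least two items are required")
    total = 0.0
    for i, j in pairs:
        gap = true_scores[i] - true_scores[j]
        margin = abs(gap)                    # larger true-score gaps demand larger separation
        direction = (gap > 0) - (gap < 0)    # +1, 0 or -1: the correct ordering of the pair
        hinge = max(0.0, margin - direction * (scores[i] - scores[j]))
        # Map a correlation in [-1, 1] to a weight in [0, 1]: similar pairs count more.
        weight = (1.0 + similarity(classes[i], classes[j])) / 2.0
        total += weight * hinge
    return total / len(pairs)

def top_class(probabilities):
    """Index of the most probable class."""
    return max(range(len(probabilities)), key=lambda y: probabilities[y])

def same_top_class(c_a, c_b):
    """Hypothetical stand-in for a correlation coefficient: +1 if top classes agree, else -1."""
    return 1.0 if top_class(c_a) == top_class(c_b) else -1.0

true_scores = [9.0, 7.0, 5.0]
classes = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]
well_ordered = ranking_loss([9.0, 7.0, 5.0], true_scores, classes, same_top_class)
mis_ordered = ranking_loss([7.0, 9.0, 5.0], true_scores, classes, same_top_class)
```

- In this sketch, swapping the estimated scores of the first two videos, which share the same most probable class, produces a positive loss, while pairs whose classes differ contribute nothing. A rank correlation coefficient could be passed as `similarity` in place of the stand-in.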
- the parameter update unit 14a calculates the evaluation loss Loss by, for example, the following equation (8) (step S31).
- The three coefficients with subscript 2 that weight the losses in equation (8) are each greater than 0 and are constants arbitrarily determined to balance the three losses. Further, the remaining term of equation (8) is an L2-regularization term.
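- A combined evaluation loss of this kind can be sketched as a weighted sum of the three losses plus an L2 penalty on the parameters. Equation (8) is not reproduced in this text, so the weighted-sum form, the parameter names `w_regression`, `w_class`, `w_ranking`, and `l2_coefficient`, and which constant multiplies which loss are assumptions of this example.

```python
def evaluation_loss(loss1, loss3, loss4, parameters,
                    w_regression=1.0, w_class=1.0, w_ranking=1.0,
                    l2_coefficient=1e-4):
    """Weighted sum of regression, class, and ranking losses plus an L2 penalty."""
    for weight in (w_regression, w_class, w_ranking):
        if weight <= 0:
            raise ValueError("balancing constants must be greater than 0")
    l2_term = l2_coefficient * sum(p * p for p in parameters)
    return (w_regression * loss1 + w_class * loss3
            + w_ranking * loss4 + l2_term)

# Hypothetical loss values and a two-element parameter vector.
loss = evaluation_loss(0.5, 0.25, 0.125, parameters=[1.0, -2.0],
                       w_regression=1.0, w_class=2.0, w_ranking=4.0,
                       l2_coefficient=0.1)
```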
- The parameter update unit 14a determines whether or not the calculated evaluation loss Loss satisfies the end condition (step S32). For example, when the evaluation loss Loss is less than a predetermined threshold value, it is determined that the evaluation loss satisfies the end condition.
- When the parameter update unit 14a determines that the evaluation loss Loss satisfies the end condition (step S32, Yes), it ends the process.
- Otherwise, the parameter update unit 14a calculates a new feature amount extraction parameter, a new score estimation parameter, and a new class estimation parameter by learning processing using, for example, the error backpropagation method, so as to reduce the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4.
- the parameter update unit 14a writes the calculated new feature amount extraction parameter to the feature amount extraction parameter storage unit 15 to update the feature amount extraction parameter.
- the parameter update unit 14a writes the calculated new score estimation parameter to the score estimation parameter storage unit 16 to update the score estimation parameter.
- the parameter update unit 14a writes the calculated new class estimation parameter to the class estimation parameter storage unit 21 to update the class estimation parameter (step S33).
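- The control flow of estimating, computing a loss, checking the end condition, and updating the parameters can be sketched with a deliberately small model. To keep the example self-contained it uses only a mean squared regression loss, a single linear layer on fixed feature amounts, and plain gradient descent in place of backpropagation through a deep network; the function name `train` and all values are hypothetical.

```python
def train(features, true_scores, threshold=1e-4, learning_rate=0.1,
          max_iterations=10_000):
    """Repeat: estimate scores, compute the loss, stop if small enough, else update."""
    weights = [0.0] * len(features[0])   # initial-value parameters
    bias = 0.0
    n = len(features)
    for iteration in range(max_iterations):
        estimates = [sum(w * x for w, x in zip(weights, f)) + bias
                     for f in features]
        errors = [s - g for s, g in zip(estimates, true_scores)]
        loss = sum(e * e for e in errors) / n
        if loss < threshold:             # end condition satisfied: stop
            return weights, bias, loss, iteration
        # Gradient of the mean squared error; stands in for backpropagation.
        for d in range(len(weights)):
            grad = 2.0 * sum(e * f[d] for e, f in zip(errors, features)) / n
            weights[d] -= learning_rate * grad
        bias -= learning_rate * 2.0 * sum(errors) / n
    raise RuntimeError("end condition not reached within max_iterations")

features = [[0.0], [1.0], [2.0]]
true_scores = [1.0, 3.0, 5.0]            # generated by score = 2 * feature + 1
weights, bias, final_loss, steps = train(features, true_scores)
```

- The loss is checked before each update, so the parameters returned are the ones for which the end condition was actually met.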
- In step S21 performed again, the feature amount extraction unit 12 reads the updated feature amount extraction parameter from the feature amount extraction parameter storage unit 15 and applies it to the moving image feature amount extraction layer 121.
- In step S22 performed again, the score estimation unit 13 reads the updated score estimation parameter from the score estimation parameter storage unit 16 and applies it to the fully connected layer 131.
- In step S23 performed again, the class estimation unit 20 reads the updated class estimation parameters from the class estimation parameter storage unit 21 and applies them to the fully connected layer + Softmax layer 201.
- In step S24 performed again, the input unit 11a reads out the next n training data from the training data storage unit 10a.
- The input unit 11a again repeats reading from the training data storage unit 10a in order, starting from the first n training data.
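- Reading n training data at a time and starting over from the first n can be sketched with a generator. The tuple layout (video identifier, true value score, true value class label) and the name `batches` are assumptions made for this example.

```python
from itertools import islice

def batches(training_data, n):
    """Yield batches of n items indefinitely, restarting from the first n when exhausted."""
    if n < 2 or len(training_data) % n != 0:
        raise ValueError("n must be >= 2 and divide the number of training data")
    while True:
        for start in range(0, len(training_data), n):
            yield training_data[start:start + n]

# Hypothetical training data: (video id, true value score, true value class label).
data = [("v1", 8.5, 0), ("v2", 6.0, 1), ("v3", 9.0, 0), ("v4", 7.5, 1)]

# n = 2 and m = 2: the third batch wraps around to the first n training data.
first_three = list(islice(batches(data, 2), 3))
```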
- When the parameter update unit 14a determines in step S32 that the evaluation loss Loss satisfies the end condition, the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 hold, respectively, the learned feature amount extraction parameter, the learned score estimation parameter, and the learned class estimation parameter obtained in a state where the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4 are sufficiently small.
- The parameter update unit 14a performs learning processing so as to reduce each of the regression loss, which is the output of the first loss function and is taken between each of the plurality of estimated scores estimated by the score estimation unit 13 and the true value score corresponding to each estimated score; the class loss, which is the output of the third loss function; and the ranking loss, which is the output of the fourth loss function and indicates the degree of order error between two estimated scores.
- Through this learning processing, the parameters applied to the function approximators of the estimation unit 50a (the first function approximator, the second function approximator, and the third function approximator), that is, the feature amount extraction parameter, the score estimation parameter, and the class estimation parameter, are updated.
- As a result, the learning device 1a can learn know-how regarding the official referee's scoring method for the athlete's competition more accurately than the technique described in Non-Patent Document 1.
- the learning device 1a of the second embodiment has the following effects in addition to the effects of the learning device 1 of the first embodiment.
- The constraint of the ranking loss can be strengthened for similar competitions and, conversely, weakened for competitions that are not similar.
- Since the learning device 1a performs the learning process taking into account differences in the type of competition, it can learn know-how regarding the scoring method of the official referee more accurately than the learning device 1.
- FIG. 8 is a block diagram showing an internal configuration of the score estimation device 2a according to the second embodiment.
- the score estimation device 2a includes an input unit 11a-1, an estimation unit 50a, an output unit 17a, a learned feature amount extraction parameter storage unit 18, a learned score estimation parameter storage unit 19, and a learned class estimation parameter storage unit 22.
- the estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
- In step S32 shown in FIG. 6, when the parameter update unit 14a determines "Yes", that is, determines that the evaluation loss Loss satisfies the end condition, the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 hold the learned feature amount extraction parameter, the learned score estimation parameter, and the learned class estimation parameter, respectively.
- the learned feature amount extraction parameter storage unit 18 stores in advance the learned feature amount extraction parameters recorded in the feature amount extraction parameter storage unit 15 when the learning process of the learning device 1a is completed.
- the learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 when the learning process of the learning device 1a is completed.
- The learned class estimation parameter storage unit 22 stores in advance the learned class estimation parameters recorded in the class estimation parameter storage unit 21 when the learning process of the learning device 1a is completed.
- the input unit 11a-1 takes in arbitrary video data given from the outside.
- the input unit 11a-1 outputs the captured moving image data to the feature amount extraction unit 12.
- the feature amount extraction unit 12 reads the learned feature amount extraction parameters from the learned feature amount extraction parameter storage unit 18, and applies the learned feature amount extraction parameters read out to the moving image feature amount extraction layer 121.
- The feature amount extraction unit 12 gives the moving image data output by the input unit 11a-1 to the moving image feature amount extraction layer 121 as an input, acquires the feature amount of the moving image data as an output, and outputs the acquired feature amount to the score estimation unit 13.
- the score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19, and applies the learned score estimation parameters read out to the fully connected layer 131.
- the score estimation unit 13 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer 131 as an input, acquires an estimated score as an output, and outputs the acquired estimated score to the output unit 17a.
- the class estimation unit 20 reads the learned class estimation parameters from the learned class estimation parameter storage unit 22, and applies the learned class estimation parameters read out to the fully connected layer + Softmax layer 201.
- the class estimation unit 20 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer + Softmax layer 201 as an input, acquires an estimation class as an output, and outputs the acquired estimation class to the output unit 17a.
- the output unit 17a outputs the estimated score output by the score estimation unit 13 to the outside, and outputs the estimation class output by the class estimation unit 20 to the outside.
- the class estimation unit 20 and the learned class estimation parameter storage unit 22 may not be provided.
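- The estimation flow of the score estimation device 2a can be sketched as a shared feature extraction step followed by a score head and an optional class head. The frame-averaging `extract_feature` is only a hypothetical stand-in for the moving image feature amount extraction layer 121, and all parameter values are made up for illustration.

```python
import math

def extract_feature(video_frames):
    """Hypothetical stand-in for the feature extraction layer:
    averages per-frame vectors into a single feature amount."""
    count = len(video_frames)
    return [sum(frame[d] for frame in video_frames) / count
            for d in range(len(video_frames[0]))]

def linear(feature, weights, bias):
    """One output of a fully connected layer."""
    return sum(w * x for w, x in zip(weights, feature)) + bias

def estimate(video_frames, score_params, class_params=None):
    """Return (estimated score, estimated class or None) for one video."""
    feature = extract_feature(video_frames)
    score = linear(feature, *score_params)
    if class_params is None:             # class head not provided
        return score, None
    logits = [linear(feature, w, b) for w, b in class_params]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return score, [e / total for e in exps]

video = [[1.0, 2.0], [3.0, 4.0]]                        # two frames, two values each
score_params = ([0.5, 1.0], 0.0)                        # made-up learned score parameters
class_params = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]   # made-up learned class parameters

score, probabilities = estimate(video, score_params, class_params)
score_only, no_class = estimate(video, score_params)
```

- Passing no class parameters corresponds to a configuration without the class estimation unit, in which case only the estimated score is produced.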
- The estimation unit 50a has function approximators (the first function approximator, the second function approximator, and the third function approximator) that approximate functions based on the learned parameters obtained by the learning device 1a (the learned feature amount extraction parameter, the learned score estimation parameter, and the learned class estimation parameter), and estimates the estimated score of moving image data by giving the moving image data to the function approximators as an input.
- The formula (4) for calculating the evaluation loss Loss of the first embodiment and the formula (8) for calculating the evaluation loss Loss of the second embodiment are examples.
- In the first embodiment, an arbitrary formula that balances the regression loss and the ranking loss may be applied, and in the second embodiment, an arbitrary formula that balances the regression loss, the ranking loss, and the class loss may be applied.
- Although Cross Entropy Loss is applied as the third loss function, another function may be applied as the third loss function.
- Although the Spearman's rank correlation coefficient shown in equation (7) is applied as the correlation coefficient correlation of equation (6), another correlation coefficient may be applied as the correlation coefficient correlation.
- The training data storage units 10, 10a are provided inside the learning devices 1, 1a, but may be provided outside the learning devices 1, 1a. Further, the learned feature amount extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 may also be provided outside the score estimation devices 2 and 2a.
- Since the training data storage units 10, 10a, the learned feature amount extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 are storage units that hold data to be retained, it is desirable to apply a non-volatile storage area to them.
- Since the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 are storage units for temporarily storing data, either a non-volatile storage area or a volatile storage area may be applied to them.
- For the first function approximator, the second function approximator, and the third function approximator shown in the first and second embodiments described above, a neural network of a configuration other than the above-described configuration may be applied.
- Other means capable of learning processing used in machine learning may also be applied.
- In the first embodiment, the first function approximator and the second function approximator may be integrated to form one function approximator, and in the second embodiment, the first function approximator, the second function approximator, and the third function approximator may be integrated to form one function approximator.
- the learning devices 1, 1a and the score estimation devices 2, 2a in the above-described embodiment may be realized by a computer.
- the program for realizing this function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read by the computer system and executed.
- The term "computer system" as used herein includes an OS and hardware such as peripheral devices.
- the "computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built in a computer system.
- Furthermore, a "computer-readable recording medium" may also include a medium that dynamically holds the program for a short period of time, such as a communication line used when transmitting the program via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system that serves as a server or a client in that case. Further, the above program may be for realizing a part of the above-mentioned functions, may realize the above-mentioned functions in combination with a program already recorded in the computer system, or may be realized by using a programmable logic device such as an FPGA (Field Programmable Gate Array).
- 1 Learning device, 10 ... Training data storage unit, 11 ... Input unit, 12 ... Feature quantity extraction unit, 13 ... Score estimation unit, 14 ... Parameter update unit, 15 ... Feature quantity extraction parameter storage unit, 16 ... Score estimation Parameter storage unit, 50 ... estimation unit
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022511452A JP7352119B2 (ja) | 2020-04-02 | 2020-04-02 | 学習装置、学習方法及び学習プログラム、並びに、スコア推定装置、スコア推定方法及びスコア推定プログラム |
| PCT/JP2020/015136 WO2021199392A1 (ja) | 2020-04-02 | 2020-04-02 | 学習装置、学習方法及び学習プログラム、並びに、スコア推定装置、スコア推定方法及びスコア推定プログラム |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/015136 WO2021199392A1 (ja) | 2020-04-02 | 2020-04-02 | 学習装置、学習方法及び学習プログラム、並びに、スコア推定装置、スコア推定方法及びスコア推定プログラム |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021199392A1 true WO2021199392A1 (ja) | 2021-10-07 |
Family
ID=77930154
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2020/015136 Ceased WO2021199392A1 (ja) | 2020-04-02 | 2020-04-02 | 学習装置、学習方法及び学習プログラム、並びに、スコア推定装置、スコア推定方法及びスコア推定プログラム |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7352119B2 |
| WO (1) | WO2021199392A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2023176256A (ja) * | 2022-05-31 | 2023-12-13 | 楽天グループ株式会社 | 画像からデータを予測する方法、コンピュータシステム、及びコンピュータ可読媒体 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2010061376A (ja) * | 2008-09-03 | 2010-03-18 | National Institute Of Advanced Industrial Science & Technology | 動作評価装置および動作評価方法 |
| WO2018070414A1 (ja) * | 2016-10-11 | 2018-04-19 | 富士通株式会社 | 運動認識装置、運動認識プログラムおよび運動認識方法 |
| JP2020038440A (ja) * | 2018-09-03 | 2020-03-12 | 国立大学法人 東京大学 | 動作認識方法及び装置 |
-
2020
- 2020-04-02 WO PCT/JP2020/015136 patent/WO2021199392A1/ja not_active Ceased
- 2020-04-02 JP JP2022511452A patent/JP7352119B2/ja active Active
Non-Patent Citations (1)
| Title |
|---|
| YONGJUN LI ET AL.: "End-To-End Learning for Action Quality Assessment", PCM, 2018, pages 125 - 134, XP047486317, DOI: https://doi.org/10.1007/978-3-030-00767-6 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2021199392A1 | 2021-10-07 |
| JP7352119B2 (ja) | 2023-09-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110147711B (zh) | 视频场景识别方法、装置、存储介质和电子装置 | |
| US12586348B2 (en) | Feature fusion for multi-modal machine learning analysis | |
| CN110267119B (zh) | 视频精彩度的评价方法及相关设备 | |
| US11055537B2 (en) | Systems and methods for determining actions depicted in media contents based on attention weights of media content frames | |
| US9846845B2 (en) | Hierarchical model for human activity recognition | |
| US10671895B2 (en) | Automated selection of subjectively best image frames from burst captured image sequences | |
| CN110942006A (zh) | 运动姿态识别方法、运动姿态识别装置、终端设备及介质 | |
| CN111062951A (zh) | 一种基于语义分割类内特征差异性的知识蒸馏方法 | |
| CN112819852A (zh) | 对基于姿态的运动进行评估 | |
| CN109344884A (zh) | 媒体信息分类方法、训练图片分类模型的方法及装置 | |
| CN111479130A (zh) | 一种视频定位方法、装置、电子设备和存储介质 | |
| Zhao et al. | Learning to acquire the quality of human pose estimation | |
| Yan et al. | Automatic annotation of tennis games: An integration of audio, vision, and learning | |
| US12397195B2 (en) | Method, device, and non-transitory computer-readable recording medium for estimating information regarding golf swing | |
| CN115131879B (zh) | 一种动作评价方法及装置 | |
| CN115240106A (zh) | 任务自适应的小样本行为识别方法及系统 | |
| US12594463B2 (en) | Method, device, and non-transitory computer-readable recording medium for estimating information on golf swing | |
| US11922822B2 (en) | Method of scoring a move of a user and system thereof | |
| CN115905613A (zh) | 音视频多任务学习、评估方法、计算机设备及介质 | |
| JP6600288B2 (ja) | 統合装置及びプログラム | |
| WO2021199392A1 (ja) | 学習装置、学習方法及び学習プログラム、並びに、スコア推定装置、スコア推定方法及びスコア推定プログラム | |
| CN102473409A (zh) | 基准模型适应装置、集成电路、av设备、在线自适应方法以及其程序 | |
| Yang et al. | Applications of cluster-based transfer learning in image and localization tasks | |
| US20240299803A1 (en) | Method and device for estimating information about golf swing, and non-transitory computer-readable recording medium | |
| CN115512435A (zh) | 一种利用人体定位的单阶段多人人体姿态估计方法及装置 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20928921; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022511452; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20928921; Country of ref document: EP; Kind code of ref document: A1 |