WO2021199392A1 - Learning device, learning method, learning program, score estimation device, score estimation method, and score estimation program - Google Patents


Info

Publication number
WO2021199392A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss
scores
function
estimated
score
Prior art date
Application number
PCT/JP2020/015136
Other languages
French (fr)
Japanese (ja)
Inventor
Takamasa Nagai
Shinya Shimizu
Yoshinori Kusachi
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to JP2022511452A (patent JP7352119B2)
Priority to PCT/JP2020/015136 (WO2021199392A1)
Publication of WO2021199392A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present invention provides, for example, a learning device, a learning method, and a learning program for learning know-how regarding the method of scoring a player's competition, as well as a score estimation device, a score estimation method, and a score estimation program for estimating a competition score based on learning results.
  • Non-Patent Document 1 discloses a method of performing AQA (Action Quality Assessment) using deep learning.
  • In this method, video data capturing the athlete's competition and the true score obtained by having an official referee score the competition are used as training data.
  • A feature amount is then extracted from the moving image data included in the training data using a deep neural network.
  • An estimated score is further estimated from the extracted feature amount.
  • The loss between the estimated score and the true score included in the training data is calculated.
  • The weights and biases of the deep neural network are repeatedly updated based on the calculated loss so as to reduce it.
  • In Non-Patent Document 1, in addition to the regression loss indicating the loss between the estimated score and the true score, a ranking loss aimed at improving the accuracy of the order among the obtained estimated scores is adopted.
  • This is because the order of the estimated scores and the order of the true scores may be swapped due to estimation error between video data whose true scores are close.
  • Non-Patent Document 1 adopts the ranking loss shown in the following equation (1) to reduce the probability of such an error and achieve accuracy higher than that of the prior art.
  • ReLU(x) is a function that returns x as the return value when the argument x is 0 or greater, and returns 0 when the argument x is smaller than 0.
  • δ is a margin value and is a positive value. Therefore, if the magnitude relationship between the estimated scores s_i and s_j does not match the magnitude relationship between the true value scores g_i and g_j, the ranking loss increases according to the absolute value of the difference between the estimated scores s_i and s_j.
  • The margin value δ has the effect of separating the two estimated scores s_i and s_j so that they differ by at least the margin value δ when the difference between them is small. Therefore, a ranking loss according to the margin value δ occurs even when the magnitude relationship between the estimated scores s_i and s_j matches the magnitude relationship between the true value scores g_i and g_j.
  • Because the margin value δ is a fixed value determined in advance, the same margin value δ is applied to every combination of video data v_i and v_j.
  • That is, the margin value δ is a parameter adopted for the purpose of ensuring that the two estimated scores s_i and s_j differ by at least the margin value when the difference between them is small.
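  • Equation (1) itself is not reproduced in this text. A common pairwise-hinge formulation consistent with the description above is ReLU(-sign(g_i - g_j)(s_i - s_j) + δ) summed over pairs of samples; the following is a minimal sketch under that assumption, not the patent's literal equation:

```python
def relu(x):
    # ReLU(x): returns x when x >= 0, otherwise 0
    return x if x >= 0.0 else 0.0

def sign(x):
    # sign(x): +1, -1, or 0 depending on the sign of the argument x
    return (x > 0) - (x < 0)

def ranking_loss_fixed(s, g, delta):
    """Pairwise ranking loss with a fixed margin delta over all ordered
    pairs of estimated scores s and true value scores g (a sketch of one
    plausible reading of equation (1))."""
    total, pairs = 0.0, 0
    for i in range(len(s)):
        for j in range(len(s)):
            if i == j:
                continue
            # Penalize a pair when the order of s_i, s_j disagrees with
            # that of g_i, g_j, or when their gap is below the margin.
            total += relu(-sign(g[i] - g[j]) * (s[i] - s[j]) + delta)
            pairs += 1
    return total / max(pairs, 1)
```

  With the order correct and the score gap larger than δ the loss vanishes; with the order reversed the loss grows with the absolute difference of the estimated scores, matching the behavior described above.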
  • an object of the present invention is to provide a technique capable of learning know-how regarding a player's competition scoring method more accurately than the conventional technique and obtaining a more accurate estimated score.
  • One aspect of the present invention is a learning device including: an input unit that takes in training data in which moving image data recording an athlete's movements during a competition is combined with a plurality of true value scores, which are scores assigned by a referee to the competition recorded in the moving image data; an estimation unit that has a function approximator approximating a function based on parameters and estimates an estimated score for the moving image data by giving the moving image data taken in by the input unit to the function approximator as input; and a parameter updating unit that updates the parameters by performing a learning process that reduces each of the regression loss, which is the output of a first loss function for obtaining the regression loss between each of the plurality of estimated scores and each of the true value scores corresponding to each of the estimated scores, and the ranking loss, which is the output of a second loss function that obtains a ranking loss indicating the degree of order error between two estimated scores based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data, and that corrects the ranking loss in consideration of the magnitude of the difference between the two true value scores.
  • Another aspect of the present invention is a score estimation device including: an input unit that takes in moving image data recording the movements of an athlete during a competition; and an estimation unit that has a function approximator approximating a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2, and estimates an estimated score for the moving image data by giving the moving image data taken in by the input unit to the function approximator as input.
  • Another aspect of the present invention is a learning method including: taking in training data in which moving image data recording an athlete's movements during a competition is combined with a plurality of true value scores, which are scores assigned by a referee to the competition recorded in the moving image data; estimating an estimated score for the moving image data by giving the moving image data to a function approximator that approximates a function based on parameters; and updating the parameters by performing a learning process that reduces each of the regression loss, which is the output of a first loss function for obtaining the regression loss between each of the plurality of estimated scores and each of the true value scores corresponding to each of the estimated scores, and the ranking loss, which is the output of a second loss function that obtains a ranking loss indicating the degree of order error between two estimated scores based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data, and that corrects the ranking loss in consideration of the magnitude of the difference between the two true value scores.
  • Another aspect of the present invention is a learning program for causing a computer to function as: an input means for taking in training data in which moving image data recording a competitor's movements during a competition is combined with a plurality of true value scores, which are scores assigned by a referee to the competition recorded in the moving image data; an estimation means that has a function approximator approximating a function based on parameters and estimates an estimated score for the moving image data by giving the moving image data taken in by the input means to the function approximator as input; and a parameter updating means that updates the parameters by performing a learning process that reduces each of the regression loss, which is the output of a first loss function for obtaining the regression loss between each of the plurality of estimated scores and each of the true value scores corresponding to each of the estimated scores, and the ranking loss, which is the output of a second loss function that obtains a ranking loss indicating the degree of order error between two estimated scores based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data, and that corrects the ranking loss in consideration of the magnitude of the difference between the two true value scores.
  • Another aspect of the present invention is a score estimation method for estimating an estimated score for moving image data recording the movements of an athlete during a competition, by taking in the moving image data and giving it as input to a function approximator that approximates a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2.
  • According to the present invention, it is possible to learn the know-how regarding the method of scoring an athlete's competition more accurately than the conventional technique, and to obtain a more accurate estimated score.
  • FIG. 1 is a block diagram showing an internal configuration of the learning device 1 according to the first embodiment.
  • the learning device 1 includes a training data storage unit 10, an input unit 11, an estimation unit 50, a parameter update unit 14, a feature amount extraction parameter storage unit 15, and a score estimation parameter storage unit 16.
  • the estimation unit 50 includes a feature amount extraction unit 12 and a score estimation unit 13.
  • the training data storage unit 10 stores in advance a plurality of training data in which each of the plurality of moving image data and each of the plurality of true value scores are combined.
  • Each of the plurality of video data is generated, for example, by filming, with a camera or the like, the movements performed by the athlete during the competition.
  • The competition is a sports competition for which there is a quantitative scoring standard for techniques, such as high diving and gymnastics.
  • A player is, for example, an athlete who competes in such a sport.
  • Each of the plurality of true value scores is a score scored by an official referee in advance for the competition of the athlete recorded in the video data corresponding to each.
  • the input unit 11 repeatedly reads n training data from the training data storage unit 10.
  • n is an integer of 2 or more, and is a batch size when the learning process described below is performed.
  • The number of training data stored in the training data storage unit 10 is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
  • Any one of the video data included in the n training data is denoted v_i or v_j; the true value score corresponding to the moving image data v_i is denoted g_i, and the true value score corresponding to the moving image data v_j is denoted g_j.
  • The input unit 11 outputs the n moving image data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11 outputs the n true value scores g_1 to g_n included in the n read training data to the parameter update unit 14.
  • the feature amount extraction parameter storage unit 15 stores the feature amount extraction parameters that serve as weights and biases applied to the first function approximator included in the feature amount extraction unit 12.
  • the feature amount extraction unit 12 has a first function approximation device, and applies the feature amount extraction parameter stored in the feature amount extraction parameter storage unit 15 to the first function approximation device.
  • the first function approximator approximates the function corresponding to the feature amount extraction parameter by applying the feature amount extraction parameter.
  • The feature amount extraction unit 12 extracts the feature amount of the moving image data v_i by giving the moving image data v_i output by the input unit 11 to the first function approximator as input.
  • The first function approximator is an arbitrary neural network that extracts a feature amount from the moving image data v_i; for example, a network such as the one shown in Non-Patent Document 1 (hereinafter referred to as the "moving image feature amount extraction layer 121") is applied.
  • the score estimation parameter storage unit 16 stores score estimation parameters that serve as weights and biases to be applied to the second function approximator included in the score estimation unit 13.
  • the score estimation unit 13 has a second function approximation device, and applies the score estimation parameters stored in the score estimation parameter storage unit 16 to the second function approximation device.
  • the second function approximator approximates the function corresponding to the score estimation parameter by applying the score estimation parameter.
  • The score estimation unit 13 estimates the estimated score s_i by giving the feature amount extracted by the feature amount extraction unit 12 to the second function approximator as input.
  • The second function approximator is an arbitrary neural network that estimates the estimated score from the feature amount; for example, a neural network having a two-stage fully connected layer to which a ReLU layer and a Dropout layer are connected (hereinafter referred to as the "fully connected layer 131"), shown in the subsequent stage of FIG. 1, is applied.
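  • The two-stage fully connected head described above can be sketched as follows. All layer sizes, parameter names, and the dropout rate are hypothetical (the patent does not fix them), and Dropout is active only during training, here with inverted scaling so that inference needs no rescaling:

```python
import random

def dense(x, W, b):
    # Fully connected layer: y = W x + b (W as a list of rows)
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def score_head(feat, params, train=False, p_drop=0.5, rng=None):
    """Sketch of the fully connected layer 131: FC -> ReLU -> Dropout
    -> FC -> scalar estimated score. `params` holds hypothetical
    weights W1, b1, W2, b2 standing in for the score estimation parameters."""
    h = dense(feat, params["W1"], params["b1"])
    h = [v if v > 0.0 else 0.0 for v in h]  # ReLU layer
    if train:
        # Dropout layer: zero units with probability p_drop, rescale the rest
        rng = rng or random.Random(0)
        h = [0.0 if rng.random() < p_drop else v / (1 - p_drop) for v in h]
    return dense(h, params["W2"], params["b2"])[0]  # scalar score
```

  At inference the same function is called with train=False, which is how the score estimation device 2 described later would use the learned parameters.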
  • The parameter update unit 14 calculates the regression loss between each of the estimated scores s_1 to s_n and each of the true value scores g_1 to g_n based on the n true value scores g_1 to g_n output by the input unit 11, the n estimated scores s_1 to s_n estimated by the score estimation unit 13, and a predetermined first loss function.
  • As the first loss function, for example, the MSE (Mean Square Error) shown in the following equation (2) is applied to calculate the regression loss.
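  • Equation (2) is not reproduced in this text, but the MSE over a batch of n samples has a standard form; a minimal sketch:

```python
def mse_loss(s, g):
    """Equation (2), sketched: mean squared error between the estimated
    scores s_1..s_n and the true value scores g_1..g_n."""
    assert len(s) == len(g) and len(s) > 0
    return sum((si - gi) ** 2 for si, gi in zip(s, g)) / len(s)
```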
  • The parameter update unit 14 calculates, based on a second loss function defined in advance, a ranking loss indicating the degree of order error between the two estimated scores s_i and s_j corresponding to each of all combinations of two different moving image data v_i and v_j, in consideration of the magnitude of the difference between the two true value scores g_i and g_j.
  • As the second loss function, for example, the following equation (3), in which the margin value δ of equation (1) is replaced with the absolute value of the difference between the two true value scores g_i and g_j, is applied.
  • The sign(x) function is a function whose return value is the sign of the argument x; ReLU(x) returns x as the return value when the argument x is 0 or greater, and returns 0 when the argument x is smaller than 0.
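  • Equation (3) is likewise not reproduced in this text. Under the reading described above, that is, the fixed margin δ of equation (1) replaced with abs(g_i - g_j), a minimal sketch is:

```python
def relu(x):
    # ReLU(x): returns x when x >= 0, otherwise 0
    return x if x >= 0.0 else 0.0

def sign(x):
    # sign(x): +1, -1, or 0 depending on the sign of the argument x
    return (x > 0) - (x < 0)

def ranking_loss_adaptive(s, g):
    """Sketch of equation (3): the required separation between two
    estimated scores tracks the gap between their true value scores,
    instead of a fixed margin delta."""
    total, pairs = 0.0, 0
    for i in range(len(s)):
        for j in range(len(s)):
            if i == j:
                continue
            # The margin is now abs(g_i - g_j): no loss is incurred when
            # the estimated gap matches or exceeds the true gap with the
            # correct order.
            total += relu(-sign(g[i] - g[j]) * (s[i] - s[j])
                          + abs(g[i] - g[j]))
            pairs += 1
    return total / max(pairs, 1)
```

  Note the two behaviors claimed later in the text: when the estimated gap already matches the true gap with the correct order the loss is zero, and when the estimated gap is too small the penalty grows with abs(g_j - g_i).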
  • the parameter update unit 14 performs learning processing so as to reduce the calculated regression loss, that is, Loss1, which is the output value of the equation (2), and the calculated ranking loss, that is, Loss2, which is the output value of the equation (3).
  • the parameter update unit 14 calculates a new parameter for feature quantity extraction and a new parameter for score estimation by performing the learning process.
  • the parameter update unit 14 updates the contents of the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16 based on the calculated new feature amount extraction parameter and the new score estimation parameter.
  • FIG. 2 is a flowchart showing the flow of the learning process performed by the learning device 1.
  • the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16 store the initial value feature amount extraction parameters and the initial value score estimation parameters in advance, respectively.
  • The feature amount extraction unit 12 reads out the feature amount extraction parameter from the feature amount extraction parameter storage unit 15, and applies the read feature amount extraction parameter to the neural network of the moving image feature amount extraction layer 121, which is the first function approximator (step S1).
  • The score estimation unit 13 reads out the score estimation parameters from the score estimation parameter storage unit 16, and applies the read score estimation parameters to the neural network of the fully connected layer 131, which is the second function approximator (step S2).
  • The input unit 11 reads the first n training data from the training data storage unit 10. As shown in FIG. 3, the input unit 11 outputs the n moving image data v_1 to v_n included in the read n training data to the feature amount extraction unit 12 one by one. Further, the input unit 11 outputs the n true value scores g_1 to g_n included in the read training data to the parameter update unit 14. The parameter update unit 14 takes in the n true value scores g_1 to g_n output by the input unit 11 (step S3).
  • For each moving image data, the processing of steps S4 and S5 is repeated (loop L1s to L1e).
  • The feature amount extraction unit 12 gives the moving image data v_i to the moving image feature amount extraction layer 121 as input and, as shown in FIG. 3, acquires the feature amount of the moving image data v_i as the output of the moving image feature amount extraction layer 121.
  • The feature amount extraction unit 12 outputs the acquired feature amount of the moving image data v_i to the score estimation unit 13 (step S4).
  • The score estimation unit 13 gives the feature amount of the moving image data v_i to the fully connected layer 131 as input and, as shown in FIG. 3, acquires the estimated score s_i of the moving image data v_i as the output of the fully connected layer 131. The score estimation unit 13 outputs the acquired estimated score s_i of the moving image data v_i to the parameter update unit 14 (step S5).
  • The processing of steps S4 and S5 is performed n times, with each of the n moving image data v_1 to v_n given as input to the moving image feature amount extraction layer 121 and the fully connected layer 131, to which the same feature amount extraction parameters and the same score estimation parameters are applied, respectively.
  • The parameter update unit 14 calculates the regression loss Loss1 by equation (2) based on the n estimated scores s_1 to s_n and the n true value scores g_1 to g_n (step S6).
  • The parameter update unit 14 calculates the ranking loss Loss2 by equation (3) based on the n estimated scores s_1 to s_n and the n true value scores g_1 to g_n (step S7).
  • the parameter update unit 14 calculates the evaluation loss Loss by, for example, the following equation (4) (step S8).
  • λ1 and γ1 are constants arbitrarily determined to balance the two losses, with λ1 > 0 and γ1 > 0.
  • The remaining term is an L2-regularization term.
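  • Equation (4) can be sketched as a weighted sum of the two losses plus L2 regularization. The names lam1, gam1, and wd stand in for λ1, γ1, and the regularization coefficient, none of which equation (4) fixes to particular values:

```python
def evaluation_loss(loss1, loss2, params, lam1=1.0, gam1=1.0, wd=1e-4):
    """Sketch of equation (4): evaluation loss
    Loss = lam1 * Loss1 + gam1 * Loss2 + wd * ||params||^2,
    where Loss1 is the regression loss and Loss2 is the ranking loss.
    `params` is a flat list standing in for all network weights."""
    l2 = wd * sum(w * w for w in params)  # L2-regularization term
    return lam1 * loss1 + gam1 * loss2 + l2
```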
  • The parameter update unit 14 determines whether or not the calculated evaluation loss Loss satisfies the end condition (step S9). For example, when the evaluation loss Loss is less than a predetermined threshold value, it is determined that the evaluation loss satisfies the end condition.
  • When the parameter update unit 14 determines that the evaluation loss Loss satisfies the end condition (step S9, Yes), the parameter update unit 14 ends the process.
  • Otherwise, the parameter update unit 14 calculates a new feature amount extraction parameter and a new score estimation parameter by a learning process using, for example, the error backpropagation method, so as to reduce the regression loss Loss1 and the ranking loss Loss2.
  • the parameter update unit 14 writes the calculated new feature amount extraction parameter to the feature amount extraction parameter storage unit 15 and updates the feature amount extraction parameter.
  • the parameter update unit 14 writes the calculated new score estimation parameter to the score estimation parameter storage unit 16 to update the score estimation parameter (step S10).
  • In step S1 performed again, the feature amount extraction unit 12 reads the updated feature amount extraction parameter from the feature amount extraction parameter storage unit 15 and applies it to the moving image feature amount extraction layer 121. Further, in step S2 performed again, the score estimation unit 13 reads the updated score estimation parameter from the score estimation parameter storage unit 16 and applies it to the fully connected layer 131.
  • The input unit 11 reads out the next n training data from the training data storage unit 10 in step S3 performed again. When, in the course of this repetition, the processing of steps S4 and S5 has been performed for all the training data stored in the training data storage unit 10, the input unit 11 starts again from the first n training data and repeats reading from the training data storage unit 10 in order.
  • When the parameter update unit 14 determines in step S9 that the evaluation loss Loss satisfies the end condition, the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16 hold the learned feature amount extraction parameter and the learned score estimation parameter, respectively, in a state where the regression loss Loss1 and the ranking loss Loss2 are sufficiently reduced.
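  • The loop of steps S3 to S10 above can be illustrated with a deliberately tiny stand-in model: a single weight w replaces the feature extraction and score estimation parameters, the regression loss alone plays the role of the evaluation loss, and a numerical gradient stands in for backpropagation. The features, learning rate, and threshold here are all hypothetical:

```python
def train_toy(features, g, w=0.0, lr=0.01, eps=1e-6,
              threshold=1e-4, max_iters=10000):
    """Toy illustration of the training loop: estimate scores (S4-S5),
    compute the loss (S6-S8), check the end condition (S9), and update
    the parameter (S10) via a central-difference numerical gradient."""
    def loss_at(w):
        s = [w * f for f in features]  # S4-S5: estimated scores s_i = w * f_i
        return sum((si - gi) ** 2 for si, gi in zip(s, g)) / len(g)  # S6-S8
    for _ in range(max_iters):
        loss = loss_at(w)
        if loss < threshold:  # S9: end condition (loss below threshold)
            break
        grad = (loss_at(w + eps) - loss_at(w - eps)) / (2 * eps)
        w -= lr * grad        # S10: parameter update
    return w, loss_at(w)
```

  A real implementation would of course use backpropagation through the two networks, as the text states, rather than numerical differentiation.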
  • As described above, the parameter update unit 14 updates the parameters applied to the function approximators of the estimation unit 50 (the first function approximator and the second function approximator), that is, the feature amount extraction parameters and the score estimation parameters, by performing a learning process that reduces each of the regression loss, which is the output of the first loss function for obtaining the regression loss between each of the plurality of estimated scores estimated by the score estimation unit 13 and each of the true value scores corresponding to each of the estimated scores, and the ranking loss, which is the output of the second loss function that obtains a ranking loss indicating the degree of order error between two estimated scores based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data and corrects the ranking loss in consideration of the magnitude of the difference between the two true value scores.
  • As a result, the learning device 1 can learn the know-how regarding the official referee's method of scoring the athlete's competition more accurately than the technique described in Non-Patent Document 1.
  • In the learning device 1, equation (3) is used as the ranking loss instead of equation (1) adopted by the technique disclosed in Non-Patent Document 1.
  • The effect of equation (3) will be described below for each case.
  • When the magnitude relationship between the estimated scores s_i and s_j matches that between the true value scores g_i and g_j, using equation (3) rather than equation (1) makes it possible to perform a learning process that more accurately brings the absolute value of the difference between the estimated scores s_i and s_j closer to the absolute value of the difference between the true value scores g_i and g_j.
  • When the difference between the estimated scores s_i and s_j is small, using equation (3) makes it possible to increase the absolute value of the difference between the estimated scores s_i and s_j according to the magnitude of abs(g_j - g_i), so that a learning process that brings the absolute value of the difference between the estimated scores s_i and s_j closer to the absolute value of the difference between the true value scores g_i and g_j can be performed more accurately.
  • FIG. 4 is a block diagram showing an internal configuration of the score estimation device 2 according to the first embodiment.
  • the score estimation device 2 includes an input unit 11-1, an estimation unit 50, an output unit 17, a learned feature amount extraction parameter storage unit 18, and a learned score estimation parameter storage unit 19.
  • the estimation unit 50 includes a feature amount extraction unit 12 and a score estimation unit 13.
  • When the parameter update unit 14 determines "Yes" in step S9 shown in FIG. 2, that is, determines that the evaluation loss Loss satisfies the end condition, the learned feature amount extraction parameters and the learned score estimation parameters are recorded in the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16, respectively.
  • the learned feature amount extraction parameter storage unit 18 stores in advance the learned feature amount extraction parameters recorded in the feature amount extraction parameter storage unit 15 when the learning process of the learning device 1 is completed.
  • the learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 when the learning process of the learning device 1 is completed.
  • Input unit 11-1 takes in arbitrary video data given from the outside.
  • the input unit 11-1 outputs the captured moving image data to the feature amount extraction unit 12.
  • the feature amount extraction unit 12 reads the learned feature amount extraction parameters from the learned feature amount extraction parameter storage unit 18, and applies the learned feature amount extraction parameters read out to the moving image feature amount extraction layer 121.
  • the feature amount extraction unit 12 gives the moving image data output by the input unit 11-1 to the moving image feature amount extraction layer 121 as an input, acquires the feature amount of the moving image data as an output, and transfers the acquired feature amount to the score estimation unit 13. Output.
  • the score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19, and applies the learned score estimation parameters read out to the fully connected layer 131.
  • the score estimation unit 13 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer 131 as an input, acquires an estimated score as an output, and outputs the acquired estimated score to the output unit 17.
  • the output unit 17 outputs the estimated score output by the score estimation unit 13 to the outside.
  • The estimation unit 50 has function approximators (the first function approximator and the second function approximator) that approximate functions based on the learned parameters (the learned feature amount extraction parameters and the learned score estimation parameters) obtained by the learning process of the learning device 1.
  • As described above, the score estimation device 2 can obtain an estimated score for arbitrary moving image data based on the learned feature amount extraction parameters and the learned score estimation parameters obtained by the learning process of the learning device 1, which learns the know-how regarding the official referee's scoring method more accurately than the technique described in Non-Patent Document 1. It is therefore possible to obtain a more accurate estimated score.
  • FIG. 5 is a block diagram showing an internal configuration of the learning device 1a according to the second embodiment.
  • In FIG. 5, the same configurations as those of the learning device 1 of the first embodiment are designated by the same reference numerals, and only the configurations that differ are described below.
  • The learning device 1a includes a training data storage unit 10a, an input unit 11a, an estimation unit 50a, a parameter update unit 14a, a feature amount extraction parameter storage unit 15, a score estimation parameter storage unit 16, and a class estimation parameter storage unit 21.
  • the estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
  • the training data storage unit 10a stores in advance a plurality of training data in which each of the plurality of moving image data, each of the plurality of true value scores, and each of the plurality of true value class labels are combined.
  • a plurality of video data are classified into a plurality of predetermined classes based on the contents recorded in each video data.
  • the class is a type of competition having different scoring criteria such as high diving and gymnastics.
  • the true value class label is identification information indicating the class to which the corresponding moving image data belongs by classification.
  • the input unit 11a repeatedly reads n training data from the training data storage unit 10a.
  • n is an integer of 2 or more, and is a batch size when the learning process described below is performed.
  • The number of training data stored in the training data storage unit 10a is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
  • Any one of the video data included in the n training data is denoted v_i or v_j; the true value score corresponding to the moving image data v_i is denoted g_i, and the true value score corresponding to the moving image data v_j is denoted g_j.
  • The input unit 11a outputs the n moving image data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11a outputs the n true value scores g_1 to g_n and the n true value class labels k_1 to k_n included in the n read training data to the parameter update unit 14a.
  • the class estimation parameter storage unit 21 stores class estimation parameters that serve as weights and biases to be applied to the third function approximator of the class estimation unit 20.
  • The class estimation unit 20 has a third function approximator and applies the class estimation parameters stored in the class estimation parameter storage unit 21 to the third function approximator.
  • The third function approximator approximates the function corresponding to the class estimation parameters when those parameters are applied.
  • The class estimation unit 20 estimates the estimated class c_i by giving the feature amount extracted by the feature amount extraction unit 12 to the third function approximator as an input.
  • The estimated class c_i is information expressed as a probability for each class; by referring to the estimated class c_i, it is possible to identify which class the corresponding moving image data v_i most likely belongs to.
  • the third function approximator is an arbitrary neural network that estimates the estimation class from the features.
  • For example, a neural network consisting of a fully connected layer followed by a Softmax layer (hereinafter, "fully connected layer + Softmax layer 201") is applied.
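As an aside for the reader, the forward pass of such a "fully connected layer + Softmax layer" can be sketched in a few lines of numpy; the weight matrix, bias, and feature values below are invented placeholders, not parameters from this document.

```python
import numpy as np

def fully_connected_softmax(feature, weights, bias):
    """Fully connected layer followed by a Softmax layer: maps a feature
    vector of length D to a probability over Y classes."""
    logits = weights @ feature + bias       # fully connected layer
    z = np.exp(logits - logits.max())       # subtract max for numerical stability
    return z / z.sum()                      # Softmax layer

# Illustrative values (Y = 3 classes, D = 3 feature dimensions):
feature = np.array([0.2, -0.1, 0.4])
W = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, -0.5],
              [0.5, 0.5, 0.0]])
b = np.zeros(3)
c = fully_connected_softmax(feature, W, b)  # one membership probability per class
```

The output plays the role of the estimated class c_i: a vector of class membership probabilities that sums to 1.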
  • The parameter update unit 14a calculates the regression loss between each of the n estimated scores s_1 to s_n estimated by the score estimation unit 13 and each of the n true value scores g_1 to g_n output by the input unit 11a, based on the first loss function represented by the above equation (2).
  • The parameter update unit 14a calculates the class loss between each of the n estimated classes c_1 to c_n estimated by the class estimation unit 20 and each of the n true value class labels k_1 to k_n output by the input unit 11a, based on a predetermined third loss function.
  • For all combinations of two different moving image data v_i and v_j, the parameter update unit 14a calculates, from the corresponding two estimated scores s_i and s_j, the two true value scores g_i and g_j, and the two estimated classes c_i and c_j, the ranking loss indicating the degree of order error between the two estimated scores s_i and s_j, taking into account the magnitude of the difference between the two true value scores g_i and g_j and the correlation between the two estimated classes c_i and c_j.
  • As the fourth loss function, the loss function represented by the following equation (6) is applied.
  • Here, "correlation" is a correlation coefficient indicating the degree of similarity between the two estimated classes c_i and c_j.
  • Spearman's rank correlation coefficient obtained by the equation (7) is applied as the correlation coefficient.
  • Y is the number of classes as in equation (5).
  • CR_{i,y} is the rank of class y within the estimated class c_i.
  • For example, the class with the highest membership probability is ranked 1st; in the example, the probability of belonging to Class1 is ranked 2nd, and the probability of belonging to Class3 is ranked 3rd.
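A minimal numpy sketch of the ranks CR and of Spearman's rank correlation coefficient between two estimated class vectors; it assumes the common no-ties form ρ = 1 − 6Σd² / (Y(Y² − 1)), and the probability vectors are invented examples rather than values from the document.

```python
import numpy as np

def class_ranks(c):
    """CR: rank of each class within an estimated class vector c
    (1 = highest membership probability)."""
    order = np.argsort(-np.asarray(c))
    ranks = np.empty(len(c), dtype=int)
    ranks[order] = np.arange(1, len(c) + 1)
    return ranks

def spearman(c_i, c_j):
    """Spearman's rank correlation between two estimated class vectors
    (no-ties form: rho = 1 - 6*sum(d^2) / (Y*(Y^2 - 1)))."""
    y = len(c_i)
    d = (class_ranks(c_i) - class_ranks(c_j)).astype(float)
    return 1.0 - 6.0 * np.sum(d ** 2) / (y * (y ** 2 - 1))

# Illustrative probability vectors over Y = 3 classes:
c1 = [0.2, 0.5, 0.3]   # second class ranked 1st, third 2nd, first 3rd
c2 = [0.1, 0.6, 0.3]   # same class ranking as c1
c3 = [0.7, 0.1, 0.2]   # reversed class ranking relative to c1
```

Identical class rankings give +1 and reversed rankings give −1, so the coefficient can serve as a similarity weight between two estimated classes.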
  • The parameter update unit 14a performs the learning process so as to reduce the calculated regression loss, that is, Loss1, which is the output value of equation (2); the calculated class loss, that is, Loss3, which is the output value of equation (5); and the calculated ranking loss, that is, Loss4, which is the output value of equation (6).
  • the parameter update unit 14a calculates a new feature amount extraction parameter, a new score estimation parameter, and a new class estimation parameter by learning processing.
  • Based on the calculated new feature amount extraction parameter, new score estimation parameter, and new class estimation parameter, the parameter update unit 14a updates the contents of the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21.
  • FIG. 6 is a flowchart showing the flow of the learning process performed by the learning device 1a.
  • The feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 store an initial feature amount extraction parameter, an initial score estimation parameter, and an initial class estimation parameter, respectively, in advance.
  • In steps S21 and S22, the same processing as in steps S1 and S2 of the first embodiment shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13.
  • The class estimation unit 20 reads out the class estimation parameters from the class estimation parameter storage unit 21 and applies the read class estimation parameters to the neural network of the fully connected layer + Softmax layer 201, which is the third function approximator (step S23).
  • The input unit 11a reads the first n training data from the training data storage unit 10a. As shown in FIG. 7, the input unit 11a outputs the n moving image data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11a outputs the n true value scores g_1 to g_n and the n true value class labels k_1 to k_n included in the read training data to the parameter update unit 14a.
  • The parameter update unit 14a takes in the n true value scores g_1 to g_n and the n true value class labels k_1 to k_n output by the input unit 11a (step S24).
  • For each of the n moving image data v_1 to v_n, the processing of steps S25, S26, and S27 is repeated (loop L2s to L2e).
  • In steps S25 and S26, the same processing as in steps S4 and S5 shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13.
  • In step S26, the score estimation unit 13 outputs the obtained estimated score s_i to the parameter update unit 14a.
  • The class estimation unit 20 gives the feature amount of the moving image data v_i to the fully connected layer + Softmax layer 201 as an input and, as shown in FIG. 7, acquires the estimated class c_i of the moving image data v_i as the output of the fully connected layer + Softmax layer 201. The class estimation unit 20 outputs the acquired estimated class c_i of the moving image data v_i to the parameter update unit 14a (step S27).
  • Within the loop, the same feature amount extraction parameter, the same score estimation parameter, and the same class estimation parameter are used in the moving image feature amount extraction layer 121, the fully connected layer 131, and the fully connected layer + Softmax layer 201, respectively.
  • The processes of steps S25, S26, and S27 are performed n times, with each of the n moving image data v_1 to v_n as an input.
  • In step S28, the same processing as in step S6 shown in FIG. 2 is performed by the parameter update unit 14a.
  • The parameter update unit 14a takes in the n estimated classes c_1 to c_n estimated by the class estimation unit 20.
  • Based on the n estimated classes c_1 to c_n taken in and the n true value class labels k_1 to k_n taken in at step S24, the parameter update unit 14a calculates the class loss Loss3 by equation (5) (step S29).
  • The parameter update unit 14a calculates the ranking loss Loss4 according to equation (6), based on the n estimated scores s_1 to s_n, the n true value scores g_1 to g_n, and the n estimated classes c_1 to c_n (step S30).
  • the parameter update unit 14a calculates the evaluation loss Loss by, for example, the following equation (8) (step S31).
  • α₂, β₂, and γ₂ are constants satisfying α₂ > 0, β₂ > 0, and γ₂ > 0, arbitrarily determined to balance the three losses. Further, the final term of equation (8) is an L2 regularization term.
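A hedged sketch of an evaluation loss in the style of equation (8): a weighted sum of Loss1, Loss3, and Loss4 plus an L2 regularization term over the parameters. The exact form of equation (8) is not reproduced in this excerpt, and the constants and parameter vector below are illustrative placeholders.

```python
import numpy as np

def evaluation_loss(loss1, loss3, loss4, params,
                    alpha2=1.0, beta2=1.0, gamma2=1.0, lam=1e-4):
    """Weighted sum of regression loss Loss1, class loss Loss3, and
    ranking loss Loss4 (alpha2, beta2, gamma2 > 0 balance the three
    losses), plus lam * ||params||^2 as an L2 regularization term."""
    l2 = lam * float(np.sum(np.square(params)))
    return alpha2 * loss1 + beta2 * loss3 + gamma2 * loss4 + l2

# Illustrative call with placeholder loss values and a tiny parameter vector:
loss = evaluation_loss(0.5, 0.2, 0.1, np.array([1.0, -2.0]))
```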
  • The parameter update unit 14a determines whether or not the calculated evaluation loss Loss satisfies the end condition (step S32). For example, when the evaluation loss Loss is less than a predetermined threshold value, it is determined that the end condition is satisfied.
  • When the parameter update unit 14a determines that the evaluation loss Loss satisfies the end condition (step S32, Yes), the parameter update unit 14a ends the process.
  • When the evaluation loss Loss does not satisfy the end condition, the parameter update unit 14a calculates a new feature amount extraction parameter, a new score estimation parameter, and a new class estimation parameter by a learning process using, for example, the error backpropagation method, so as to reduce the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4.
  • the parameter update unit 14a writes the calculated new feature amount extraction parameter to the feature amount extraction parameter storage unit 15 to update the feature amount extraction parameter.
  • the parameter update unit 14a writes the calculated new score estimation parameter to the score estimation parameter storage unit 16 to update the score estimation parameter.
  • the parameter update unit 14a writes the calculated new class estimation parameter to the class estimation parameter storage unit 21 to update the class estimation parameter (step S33).
  • In step S21 performed again, the feature amount extraction unit 12 reads the updated feature amount extraction parameter from the feature amount extraction parameter storage unit 15 and applies it to the moving image feature amount extraction layer 121.
  • In step S22 performed again, the score estimation unit 13 reads the updated score estimation parameter from the score estimation parameter storage unit 16 and applies it to the fully connected layer 131.
  • In step S23 performed again, the class estimation unit 20 reads the updated class estimation parameters from the class estimation parameter storage unit 21 and applies them to the fully connected layer + Softmax layer 201.
  • In step S24 performed again, the input unit 11a reads out the next n training data from the training data storage unit 10a. Once all training data have been read, the input unit 11a repeats reading from the training data storage unit 10a in order, starting again from the first n training data.
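The cyclic batch reading described above, taking n training data at a time and wrapping back to the first n once all n × m have been read, can be sketched as follows; the data values are illustrative.

```python
def cyclic_batches(training_data, n):
    """Yield batches of n training data in order, wrapping back to the
    start once all data have been read (assumes len(training_data) = n * m)."""
    i = 0
    while True:
        yield training_data[i:i + n]
        i = (i + n) % len(training_data)

data = list(range(6))                      # n * m items with n = 2, m = 3
gen = cyclic_batches(data, 2)
batches = [next(gen) for _ in range(4)]    # the 4th batch wraps to the start
```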
  • When the parameter update unit 14a determines in step S32 that the evaluation loss Loss satisfies the end condition, the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 respectively hold the learned feature amount extraction parameter, the learned score estimation parameter, and the learned class estimation parameter in a state where the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4 have become sufficiently small.
  • As described above, the parameter update unit 14a performs a learning process that reduces each of: the regression loss, which is the output of the first loss function and is obtained between each of the plurality of estimated scores estimated by the score estimation unit 13 and each of the true value scores corresponding to the estimated scores; the class loss, which is the output of the third loss function; and the ranking loss, which is the output of the fourth loss function and indicates the degree of order error between two estimated scores.
  • By this learning process, the parameter update unit 14a updates the parameters applied to the function approximators of the estimation unit 50a (the first function approximator, the second function approximator, and the third function approximator), that is, the feature amount extraction parameter, the score estimation parameter, and the class estimation parameter.
  • As a result, the learning device 1a can learn the know-how regarding the official referee's method of scoring an athlete's competition more accurately than the technique described in Non-Patent Document 1.
  • the learning device 1a of the second embodiment has the following effects in addition to the effects of the learning device 1 of the first embodiment.
  • That is, the constraint of the ranking loss can be strengthened for similar competitions and, conversely, weakened for competitions that are not similar.
  • Since the learning device 1a performs the learning process taking differences in competition type into account, it can learn the know-how regarding the official referee's scoring method more accurately than the learning device 1.
  • FIG. 8 is a block diagram showing an internal configuration of the score estimation device 2a according to the second embodiment.
  • the score estimation device 2a includes an input unit 11a-1, an estimation unit 50a, an output unit 17a, a learned feature amount extraction parameter storage unit 18, a learned score estimation parameter storage unit 19, and a learned class estimation parameter storage unit 22.
  • the estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
  • When the parameter update unit 14a determines "Yes" in step S32 shown in FIG. 6, that is, determines that the evaluation loss Loss satisfies the end condition, the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 each record the learned feature amount extraction parameter, the learned score estimation parameter, and the learned class estimation parameter.
  • the learned feature amount extraction parameter storage unit 18 stores in advance the learned feature amount extraction parameters recorded in the feature amount extraction parameter storage unit 15 when the learning process of the learning device 1a is completed.
  • the learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 when the learning process of the learning device 1a is completed.
  • The learned class estimation parameter storage unit 22 stores in advance the learned class estimation parameters recorded in the class estimation parameter storage unit 21 when the learning process of the learning device 1a is completed.
  • the input unit 11a-1 takes in arbitrary video data given from the outside.
  • the input unit 11a-1 outputs the captured moving image data to the feature amount extraction unit 12.
  • the feature amount extraction unit 12 reads the learned feature amount extraction parameters from the learned feature amount extraction parameter storage unit 18, and applies the learned feature amount extraction parameters read out to the moving image feature amount extraction layer 121.
  • the feature amount extraction unit 12 gives the moving image data output by the input unit 11a-1 to the moving image feature amount extraction layer 121 as an input, acquires the feature amount of the moving image data as an output, and transfers the acquired feature amount to the score estimation unit 13. Output.
  • the score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19, and applies the learned score estimation parameters read out to the fully connected layer 131.
  • the score estimation unit 13 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer 131 as an input, acquires an estimated score as an output, and outputs the acquired estimated score to the output unit 17a.
  • the class estimation unit 20 reads the learned class estimation parameters from the learned class estimation parameter storage unit 22, and applies the learned class estimation parameters read out to the fully connected layer + Softmax layer 201.
  • the class estimation unit 20 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer + Softmax layer 201 as an input, acquires an estimation class as an output, and outputs the acquired estimation class to the output unit 17a.
  • the output unit 17a outputs the estimated score output by the score estimation unit 13 to the outside, and outputs the estimation class output by the class estimation unit 20 to the outside.
  • the class estimation unit 20 and the learned class estimation parameter storage unit 22 may not be provided.
  • As described above, the estimation unit 50a of the score estimation device 2a has function approximators (the first function approximator, the second function approximator, and the third function approximator) that approximate functions based on the learned parameters obtained by the learning process of the learning device 1a (the learned feature amount extraction parameters, the learned score estimation parameters, and the learned class estimation parameters). By giving moving image data to the function approximators as an input, the estimation unit 50a estimates the estimated score of the moving image data.
  • The formula (4) for calculating the evaluation loss Loss in the first embodiment and the formula (8) for calculating the evaluation loss Loss in the second embodiment are examples.
  • In the first embodiment, an arbitrary formula that balances the regression loss and the ranking loss may be applied, and in the second embodiment, an arbitrary formula that balances the regression loss, the ranking loss, and the class loss may be applied.
  • Although Cross Entropy Loss is applied as the third loss function, another function may be applied as the third loss function.
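For reference, a generic Cross Entropy Loss over one-hot true value class labels can be sketched as follows; this is the standard formulation and is not claimed to be identical to the document's equation (5). The sample values are invented.

```python
import numpy as np

def cross_entropy_loss(estimated_classes, true_labels, eps=1e-12):
    """Mean cross entropy between estimated class probability vectors
    and one-hot true value class labels."""
    p = np.clip(np.asarray(estimated_classes, dtype=float), eps, 1.0)
    t = np.asarray(true_labels, dtype=float)
    return float(-np.mean(np.sum(t * np.log(p), axis=1)))

# Illustrative: two samples over Y = 2 classes.
c = [[0.9, 0.1], [0.2, 0.8]]   # estimated classes
k = [[1, 0], [0, 1]]           # true value class labels (one-hot)
loss3 = cross_entropy_loss(c, k)
```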
  • Although Spearman's rank correlation coefficient shown in equation (7) is applied as the correlation coefficient "correlation" of equation (6), another correlation coefficient may be applied.
  • In the above embodiments, the training data storage units 10 and 10a are provided inside the learning devices 1 and 1a, but they may be provided outside the learning devices 1 and 1a. Likewise, the learned feature amount extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 may be provided outside the score estimation devices 2 and 2a.
  • Since the training data storage units 10 and 10a, the learned feature amount extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 are storage units that retain the data they store, it is desirable to implement them with a non-volatile storage area.
  • Since the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 are storage units for temporarily storing data, either a non-volatile storage area or a volatile storage area may be applied.
  • As the first function approximator, the second function approximator, and the third function approximator shown in the first and second embodiments described above, a neural network having a configuration other than the ones described above may be applied, or other means capable of the learning processing used in machine learning may be applied.
  • In the first embodiment, the first function approximator and the second function approximator may be integrated to form one function approximator, and in the second embodiment, the first function approximator, the second function approximator, and the third function approximator may be integrated to form one function approximator.
  • the learning devices 1, 1a and the score estimation devices 2, 2a in the above-described embodiment may be realized by a computer.
  • the program for realizing this function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read by the computer system and executed.
  • the term "computer system” as used herein includes hardware such as an OS and peripheral devices.
  • the "computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built in a computer system.
  • The "computer-readable recording medium" may also include a medium that dynamically holds the program for a short period of time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. Further, the above program may realize a part of the functions described above, may realize the functions described above in combination with a program already recorded in the computer system, or may be realized by using a programmable logic device such as an FPGA (Field Programmable Gate Array).
  • 1 Learning device, 10 ... Training data storage unit, 11 ... Input unit, 12 ... Feature quantity extraction unit, 13 ... Score estimation unit, 14 ... Parameter update unit, 15 ... Feature quantity extraction parameter storage unit, 16 ... Score estimation Parameter storage unit, 50 ... estimation unit

Abstract

On the basis of a first loss function that obtains a regression loss between each of a plurality of estimated scores having been estimated by giving a plurality of moving-image data, in which motion of athletes during competition are recorded, as input to a function approximator to which parameters are applied and each of true-value scores corresponding to each of the estimated scores and having been scored by a referee, and a second loss function that obtains, on the basis of two estimated scores and two true-value scores corresponding to each of all combinations of two different moving-image data, a ranking loss that indicates the degree of a sequence error between the two estimated scores, the second loss function correcting the ranking loss taking into account the magnitude of a difference between the two true-value scores, the present invention carries out a learning process of reducing each of the regression loss that is the output of the first loss function and the ranking loss that is the output of the second loss function and thereby updates the parameters.

Description

Learning device, learning method and learning program, and score estimation device, score estimation method and score estimation program
 The present invention relates to, for example, a learning device, a learning method, and a learning program for learning know-how regarding a method of scoring an athlete's competition, and to a score estimation device, a score estimation method, and a score estimation program for estimating a competition score based on the learning results.
 In sports, there are competitions, such as high diving and gymnastics, in which official referees score the performances of athletes and the ranking of each performance is determined based on the scores. Such competitions have quantitative scoring criteria.
 In recent years, studies have advanced on techniques for activity quality assessment in the computer vision field, such as automatically estimating scores in such competitions; a technique known as AQA (Action Quality Assessment) is one of them. For example, Non-Patent Document 1 discloses a method of performing AQA using deep learning.
 In the technique disclosed in Non-Patent Document 1, moving image data in which an athlete's performance is recorded and a true value score obtained by an official referee scoring that performance are taken in as training data. Next, a deep neural network is used to extract a feature amount from the moving image data included in the training data. An estimated score is then estimated from the extracted feature amount.
 In the technique disclosed in Non-Patent Document 1, the loss between the estimated score and the true value score included in the training data is calculated, and, based on the calculated loss, the weights and biases of the deep neural network are repeatedly updated so as to reduce the loss. This allows the know-how of the scoring method performed by official referees to be learned, and the score of a performance by an arbitrary athlete can then be estimated by using the deep neural network to which the learned weights and biases are applied.
 In the technique disclosed in Non-Patent Document 1, in addition to a regression loss indicating the loss between the estimated score and the true value score, a ranking loss is adopted with the aim of improving the accuracy of the ordering among the obtained estimated scores. When learning is performed using only the regression loss, for moving image data whose true value scores are close, the order of the estimated scores may be swapped relative to the order of the true value scores due to errors in score estimation. To solve this problem, Non-Patent Document 1 adopts the ranking loss shown in the following equation (1), lowering the probability of such errors and achieving accuracy exceeding that of the prior art.
  ranking loss = Σ_{i,j} ReLU( −(s_j − s_i) · sign(g_j − g_i) + δ )   …(1)
 Let v_i denote any one of the moving image data. In equation (1), g_i is the true value score of the moving image data v_i, s_i is the estimated score obtained from the moving image data v_i, and the sign(x) function returns the sign of its argument x. The term −(s_j − s_i)sign(g_j − g_i) in equation (1) takes a negative value when the order relation of the estimated scores s_i and s_j matches that of the true value scores g_i and g_j, and a positive value when they do not match.
 ReLU(x) is a function that returns x when the argument x is 0 or more and returns 0 when x is less than 0. δ is a margin value and is positive. Therefore, when the order relation of the estimated scores s_i and s_j does not match that of the true value scores g_i and g_j, the ranking loss increases as the absolute value of the difference between the estimated scores s_i and s_j increases.
 The margin value δ has the effect of separating the two estimated scores s_i and s_j, when their difference is small, so that they differ by at least an amount corresponding to the margin value δ. Therefore, even when the order relation of the estimated scores s_i and s_j matches that of the true value scores g_i and g_j, a ranking loss arises according to the magnitude of the margin value δ.
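The per-pair term of equation (1) can be sketched as follows; the summation over all pairs is omitted and the margin value δ = 0.5 is an illustrative choice. The third call shows the margin behavior discussed above: even a correctly ordered pair incurs a loss when its estimated scores differ by less than δ.

```python
def relu(x):
    """ReLU: x if x >= 0, else 0."""
    return x if x > 0 else 0.0

def sign(x):
    """Sign of x: +1, -1, or 0."""
    return (x > 0) - (x < 0)

def pairwise_ranking_loss(s_i, s_j, g_i, g_j, delta=0.5):
    """ReLU(-(s_j - s_i) * sign(g_j - g_i) + delta) for one pair of
    estimated scores (s_i, s_j) and true value scores (g_i, g_j)."""
    return relu(-(s_j - s_i) * sign(g_j - g_i) + delta)

l0 = pairwise_ranking_loss(1.0, 2.0, 10.0, 20.0)   # correct order, wide gap
l1 = pairwise_ranking_loss(2.0, 1.0, 10.0, 20.0)   # swapped order
l2 = pairwise_ranking_loss(1.0, 1.2, 10.0, 20.0)   # correct order, gap < delta
```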
 However, when equation (1) is used as the ranking loss, the margin value δ is a predetermined fixed value, so the same margin value δ is applied to every combination of moving image data v_i and moving image data v_j.
 As described above, the margin value δ is a parameter adopted so that, when the difference between the two estimated scores s_i and s_j is small, the scores come to differ by at least an amount corresponding to the margin value.
 However, even when the distance between the true value scores g_i and g_j is smaller than the margin value δ, the addition of the margin value δ in equation (1) causes learning that separates the estimated scores s_i and s_j by an extra amount corresponding to the margin value δ.
 In view of the above circumstances, an object of the present invention is to provide a technique capable of learning know-how regarding a method of scoring an athlete's competition more accurately than the conventional technique and of obtaining a more accurate estimated score.
 One aspect of the present invention is a learning device comprising: an input unit that takes in training data combining moving image data, in which an athlete's movements during a competition are recorded, with a plurality of true value scores, which are scores given by a referee to the competition recorded in the moving image data; an estimation unit that has a function approximator for approximating a function based on parameters and that estimates an estimated score of the moving image data by giving the moving image data taken in by the input unit to the function approximator as an input; and a parameter update unit that updates the parameters by performing a learning process that reduces each of a regression loss and a ranking loss, where the regression loss is the output of a first loss function that obtains the regression loss between each of the plurality of estimated scores and each of the true value scores corresponding to the estimated scores, and the ranking loss is the output of a second loss function that obtains, based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data, the ranking loss indicating the degree of order error between the two estimated scores, the second loss function correcting the ranking loss in consideration of the magnitude of the difference between the two true value scores.
 Another aspect of the present invention is a score estimation device comprising: an input unit that takes in moving image data in which an athlete's movements during a competition are recorded; and an estimation unit that has a function approximator for approximating a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2, and that estimates an estimated score of the moving image data by giving the moving image data taken in by the input unit to the function approximator as an input.
Another aspect of the present invention is a learning method comprising: taking in training data combining video data, which records an athlete's movements during a competition, with a plurality of true scores, which are scores given by judges to the competition recorded in the video data; estimating an estimated score for the video data by feeding the taken-in video data to a function approximator that approximates a function based on parameters; and updating the parameters by performing a learning process that reduces both the regression loss output by a first loss function, which computes the regression loss between each of the plurality of estimated scores and the true score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each of all combinations of two different video data items, computes from the two corresponding estimated scores and two true scores a ranking loss indicating the degree of ordering error between the two estimated scores and corrects that ranking loss in consideration of the magnitude of the difference between the two true scores.
Another aspect of the present invention is a learning program for causing a computer to function as: input means for taking in training data combining video data, which records an athlete's movements during a competition, with a plurality of true scores, which are scores given by judges to the competition recorded in the video data; estimation means having a function approximator that approximates a function based on parameters, the estimation means estimating an estimated score for the video data by feeding the video data taken in by the input means to the function approximator as input; and parameter update means for updating the parameters by performing a learning process that reduces both the regression loss output by a first loss function, which computes the regression loss between each of the plurality of estimated scores and the true score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each of all combinations of two different video data items, computes from the two corresponding estimated scores and two true scores a ranking loss indicating the degree of ordering error between the two estimated scores and corrects that ranking loss in consideration of the magnitude of the difference between the two true scores.
Another aspect of the present invention is a score estimation method comprising: taking in video data recording an athlete's movements during a competition; and estimating an estimated score for the video data by feeding the taken-in video data to a function approximator that approximates a function based on the trained parameters obtained by the learning process of the learning device according to claim 1 or claim 2.
Another aspect of the present invention is a learning program for causing a computer to function as: input means for taking in training data combining video data, which records an athlete's movements during a competition, with a plurality of true scores, which are scores given by judges to the competition recorded in the video data; estimation means having a function approximator that approximates a function based on parameters, the estimation means estimating an estimated score for the video data by feeding the video data taken in by the input means to the function approximator as input; and parameter update means for updating the parameters by performing a learning process that reduces both the regression loss output by a first loss function, which computes the regression loss between each of the plurality of estimated scores and the true score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each of all combinations of two different video data items, computes from the two corresponding estimated scores and two true scores a ranking loss indicating the degree of ordering error between the two estimated scores and corrects that ranking loss in consideration of the magnitude of the difference between the two true scores.
According to the present invention, it is possible to learn the know-how of how judges score athletes' competitions more accurately than the conventional technique, and to obtain more accurate estimated scores.
FIG. 1 is a block diagram showing the internal configuration of the learning device of the first embodiment.
FIG. 2 is a flowchart showing the flow of processing by the learning device of the first embodiment.
FIG. 3 is a diagram outlining the processing by the learning device of the first embodiment.
FIG. 4 is a block diagram showing the internal configuration of the score estimation device of the first embodiment.
FIG. 5 is a block diagram showing the internal configuration of the learning device of the second embodiment.
FIG. 6 is a flowchart showing the flow of processing by the learning device of the second embodiment.
FIG. 7 is a diagram outlining the processing by the learning device of the second embodiment.
FIG. 8 is a block diagram showing the internal configuration of the score estimation device of the second embodiment.
(First Embodiment)
Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the internal configuration of the learning device 1 according to the first embodiment. The learning device 1 includes a training data storage unit 10, an input unit 11, an estimation unit 50, a parameter update unit 14, a feature extraction parameter storage unit 15, and a score estimation parameter storage unit 16. The estimation unit 50 includes a feature extraction unit 12 and a score estimation unit 13.
The training data storage unit 10 stores in advance a plurality of training data items, each combining one of a plurality of video data items with one of a plurality of true scores.
Each of the video data items is generated, for example, by capturing an athlete's movements during a competition with a camera or the like. Here, a competition is a sporting event for which quantitative scoring criteria exist for its techniques, such as high diving or gymnastics. An athlete is, for example, a competitor who performs in such an event.
Each of the true scores is a score given in advance by an official judge to the athlete's performance recorded in the corresponding video data.
The input unit 11 repeatedly reads training data items from the training data storage unit 10, n items at a time. Here, n is an integer of 2 or more and is the batch size used in the learning process described below. The number of training data items stored in the training data storage unit 10 is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
In the following description, arbitrary video data items among the n training data items are denoted v_i and v_j, and the true scores corresponding to v_i and v_j are denoted g_i and g_j, respectively, where i and j are integers with i = 1 to n, j = 1 to n, and j > i.
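The index pairs just described (all combinations with j > i) can be enumerated directly; a minimal sketch in Python, with a hypothetical batch size of n = 4:

```python
from itertools import combinations

# Enumerate all index pairs (i, j) with j > i for a batch of size n;
# these are the combinations over which the ranking loss is evaluated.
n = 4  # hypothetical batch size
pairs = list(combinations(range(1, n + 1), 2))
# 4 choose 2 = 6 pairs: (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)
```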
The input unit 11 outputs the n video data items v_1 to v_n contained in the n read training data items to the feature extraction unit 12 one at a time. The input unit 11 also outputs the n true scores g_1 to g_n contained in the n read training data items to the parameter update unit 14.
The feature extraction parameter storage unit 15 stores feature extraction parameters, that is, the weights and biases applied to the first function approximator of the feature extraction unit 12. The feature extraction unit 12 has a first function approximator and applies the feature extraction parameters stored in the feature extraction parameter storage unit 15 to it. With these parameters applied, the first function approximator approximates the function determined by the feature extraction parameters. The feature extraction unit 12 extracts the features of the video data v_i by feeding the video data v_i output by the input unit 11 to the first function approximator as input.
Here, the first function approximator is an arbitrary neural network that extracts features from the video data v_i; for example, the neural network shown in Fig. 1 of Non-Patent Document 1, which has two convolutional stages each followed by a ReLU (Rectified Linear Unit) layer and a Max-Pooling layer (hereinafter referred to as the "video feature extraction layer 121"), may be applied.
The score estimation parameter storage unit 16 stores score estimation parameters, that is, the weights and biases applied to the second function approximator of the score estimation unit 13. The score estimation unit 13 has a second function approximator and applies the score estimation parameters stored in the score estimation parameter storage unit 16 to it. With these parameters applied, the second function approximator approximates the function determined by the score estimation parameters. The score estimation unit 13 estimates the estimated score s_i by feeding the features extracted by the feature extraction unit 12 to the second function approximator as input.
Here, the second function approximator is an arbitrary neural network that estimates an estimated score from the features; for example, the neural network shown in Fig. 1 of Non-Patent Document 1, which has two fully connected stages each followed by a ReLU layer and a Dropout layer (hereinafter referred to as the "fully connected layer 131"), may be applied.
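Taken together, the estimation unit 50 computes s_i by chaining the two approximators: the first maps the video data v_i to features, and the second maps those features to a score. The following is a minimal illustrative sketch only, with simple linear maps standing in for the convolutional and fully connected networks; all values, shapes, and names are hypothetical:

```python
def extract_features(video, w1):
    # Stand-in for the video feature extraction layer 121: maps the
    # video data (here just a list of numbers) to a feature vector.
    return [sum(w * x for w, x in zip(row, video)) for row in w1]

def estimate_score(features, w2):
    # Stand-in for the fully connected layer 131: maps the feature
    # vector to a single estimated score s_i.
    return sum(w * f for w, f in zip(w2, features))

video = [0.5, 1.0, 1.5]       # toy video data v_i (hypothetical values)
w1 = [[1.0, 0.0, 0.0],        # feature extraction parameters
      [0.0, 1.0, 1.0]]
w2 = [2.0, 1.0]               # score estimation parameters
features = extract_features(video, w1)
score = estimate_score(features, w2)
# features == [0.5, 2.5], score == 3.5
```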
The parameter update unit 14 computes the regression loss between each of the estimated scores s_1 to s_n and each of the true scores g_1 to g_n, based on the n true scores g_1 to g_n output by the input unit 11, the n estimated scores s_1 to s_n estimated by the score estimation unit 13, and a predetermined first loss function.
Here, as the first loss function, for example, the MSE (Mean Square Error) shown in the following Equation (2), which computes the regression loss, is applied.
Loss1 = (1/n) · Σ_{i=1}^{n} (s_i − g_i)²   … (2)
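As a concrete sketch of Equation (2), the regression loss Loss1 can be computed as follows (pure Python; the argument names and score values are illustrative):

```python
def regression_loss(estimated, true):
    # Eq. (2): mean squared error between the estimated scores s_1..s_n
    # and the judge-given true scores g_1..g_n.
    n = len(estimated)
    return sum((s - g) ** 2 for s, g in zip(estimated, true)) / n

# Example: two videos, each estimate off by 0.5 points
loss1 = regression_loss([7.0, 8.5], [7.5, 8.0])
# loss1 == 0.25
```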
The parameter update unit 14 also computes, for each of all combinations of two different video data items v_i and v_j, a ranking loss indicating the degree of ordering error between the two estimated scores s_i and s_j, based on the two estimated scores s_i, s_j and the two true scores g_i, g_j corresponding to that combination and on a predetermined second loss function, taking into account the magnitude of the difference between the two true scores g_i and g_j.
Here, as the second loss function, the loss function shown in the following Equation (3) is applied.
Loss2 = Σ_{i=1}^{n} Σ_{j=i+1}^{n} ReLU( −(s_j − s_i) · sign(g_j − g_i) + |g_j − g_i| )   … (3)
Compared with Equation (1), which was adopted in the technique described in Non-Patent Document 1, Equation (3) replaces the margin value δ with the absolute value of the difference between the two true scores g_i and g_j. As in Equation (1), in Equation (3) the sign(x) function returns the sign of its argument x, and ReLU(x) returns x when the argument x is 0 or more and returns 0 when x is less than 0.
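A sketch of Equation (3) follows; it sums the per-pair ReLU terms over all combinations with j > i. Whether the sum is additionally normalized by the number of pairs is not stated here, so the plain sum is shown:

```python
def ranking_loss(estimated, true):
    # Eq. (3): for every pair j > i, penalise an ordering error between
    # the estimated scores; the margin is the true-score gap |g_j - g_i|
    # rather than the fixed margin delta of Eq. (1).
    def sign(x):
        return (x > 0) - (x < 0)
    n = len(estimated)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            term = (-(estimated[j] - estimated[i]) * sign(true[j] - true[i])
                    + abs(true[j] - true[i]))
            loss += max(0.0, term)  # ReLU
    return loss

# Correct order with gaps matching the true gaps: zero loss
# ranking_loss([1.0, 3.0], [1.0, 3.0]) == 0.0
# Correct order but the estimated gap (1.0) is smaller than the
# true gap (2.0): a residual loss of 1.0 remains
# ranking_loss([1.0, 2.0], [1.0, 3.0]) == 1.0
```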
The parameter update unit 14 performs a learning process so as to reduce the computed regression loss, that is, Loss1, the output value of Equation (2), and the computed ranking loss, that is, Loss2, the output value of Equation (3). Through this learning process, the parameter update unit 14 computes new feature extraction parameters and new score estimation parameters.
Based on the newly computed feature extraction parameters and score estimation parameters, the parameter update unit 14 updates the contents of the feature extraction parameter storage unit 15 and the score estimation parameter storage unit 16.
(Processing by the learning device of the first embodiment)
Next, the processing by the learning device 1 of the first embodiment will be described with reference to FIGS. 2 and 3. FIG. 2 is a flowchart showing the flow of the learning process performed by the learning device 1.
Initial feature extraction parameters and initial score estimation parameters are stored in advance in the feature extraction parameter storage unit 15 and the score estimation parameter storage unit 16, respectively.
The feature extraction unit 12 reads the feature extraction parameters from the feature extraction parameter storage unit 15 and applies them to the neural network of the video feature extraction layer 121, which is the first function approximator (step S1).
The score estimation unit 13 reads the score estimation parameters from the score estimation parameter storage unit 16 and applies them to the neural network of the fully connected layer 131, which is the second function approximator (step S2).
The input unit 11 reads the first n training data items from the training data storage unit 10. As shown in FIG. 3, the input unit 11 outputs the n video data items v_1 to v_n contained in the n read training data items to the feature extraction unit 12 one at a time. The input unit 11 also outputs the n true scores g_1 to g_n contained in the read training data to the parameter update unit 14. The parameter update unit 14 takes in the n true scores g_1 to g_n output by the input unit 11 (step S3).
The processing of steps S4 and S5 is repeated for each video data item v_i of the n video data items v_1 to v_n (loop L1s to L1e).
As shown in FIG. 3, the feature extraction unit 12 feeds the video data v_i to the video feature extraction layer 121 as input and obtains the features of the video data v_i as the output of the video feature extraction layer 121. The feature extraction unit 12 outputs the obtained features of the video data v_i to the score estimation unit 13 (step S4).
As shown in FIG. 3, the score estimation unit 13 feeds the features of the video data v_i to the fully connected layer 131 as input and obtains the estimated score s_i of the video data v_i as the output of the fully connected layer 131. The score estimation unit 13 outputs the obtained estimated score s_i of the video data v_i to the parameter update unit 14 (step S5).
That is, as shown in FIG. 3, with the same feature extraction parameters and the same score estimation parameters applied to the video feature extraction layer 121 and the fully connected layer 131, respectively, the processing of steps S4 and S5 is performed n times, once for each of the n video data items v_1 to v_n.
When the parameter update unit 14 has taken in the n estimated scores s_1 to s_n estimated by the score estimation unit 13, it computes the regression loss Loss1 by Equation (2) from those n estimated scores s_1 to s_n and the n true scores g_1 to g_n taken in at step S3 (step S6).
The parameter update unit 14 computes the ranking loss Loss2 by Equation (3) from the n estimated scores s_1 to s_n and the n true scores g_1 to g_n (step S7).
The parameter update unit 14 computes an evaluation loss Loss, for example, by the following Equation (4) (step S8).
Loss = α · Loss1 + β · Loss2 + ||ω||₂   … (4)
In Equation (4) above, α and β are constants with α > 0 and β > 0, set arbitrarily so as to balance the two losses. ||ω||₂ is an L2-regularization term.
The parameter update unit 14 determines whether the computed evaluation loss Loss satisfies a termination condition (step S9). For example, when the evaluation loss Loss is less than a predetermined threshold, it determines that the evaluation loss satisfies the termination condition.
When the parameter update unit 14 determines that the evaluation loss Loss satisfies the termination condition (step S9, Yes), it ends the processing. On the other hand, when it determines that the evaluation loss Loss does not satisfy the termination condition (step S9, No), it computes new feature extraction parameters and new score estimation parameters by a learning process, for example error backpropagation, so as to reduce the regression loss Loss1 and the ranking loss Loss2.
The parameter update unit 14 writes the newly computed feature extraction parameters into the feature extraction parameter storage unit 15 to update the feature extraction parameters, and writes the newly computed score estimation parameters into the score estimation parameter storage unit 16 to update the score estimation parameters (step S10).
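Steps S6 to S10 amount to evaluating Loss1 and Loss2 and adjusting the parameters so that both decrease. The sketch below illustrates this loop on a deliberately tiny stand-in model: a single weight w replaces the two networks, the L2 regularization term of Equation (4) is omitted, and a numerical gradient stands in for error backpropagation; all names and values are hypothetical:

```python
def losses(scores, true_scores):
    # Loss1 per Eq. (2) and Loss2 per Eq. (3)
    n = len(scores)
    loss1 = sum((s - g) ** 2 for s, g in zip(scores, true_scores)) / n
    loss2 = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            sgn = ((true_scores[j] > true_scores[i])
                   - (true_scores[j] < true_scores[i]))
            loss2 += max(0.0, -(scores[j] - scores[i]) * sgn
                         + abs(true_scores[j] - true_scores[i]))
    return loss1, loss2

def evaluation_loss(w, xs, true_scores, alpha=1.0, beta=1.0):
    # Eq. (4) without the L2 term; the "network" is a single weight w
    # applied to a one-dimensional feature x (purely illustrative).
    scores = [w * x for x in xs]
    loss1, loss2 = losses(scores, true_scores)
    return alpha * loss1 + beta * loss2

def update_step(w, xs, true_scores, lr=0.01, eps=1e-6):
    # Numerical-gradient stand-in for error backpropagation (step S10).
    grad = (evaluation_loss(w + eps, xs, true_scores)
            - evaluation_loss(w - eps, xs, true_scores)) / (2 * eps)
    return w - lr * grad

xs = [1.0, 2.0, 3.0]
gs = [2.0, 4.0, 6.0]  # true scores, perfectly fit by w = 2
w = 0.0
for _ in range(200):
    w = update_step(w, xs, gs)
# w converges to the neighbourhood of 2, where Loss1 and Loss2 vanish
```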
Thereafter, the processing from step S1 is repeated. In step S1 performed again, the feature extraction unit 12 reads the updated feature extraction parameters from the feature extraction parameter storage unit 15 and applies them to the video feature extraction layer 121. In step S2 performed again, the score estimation unit 13 reads the updated score estimation parameters from the score estimation parameter storage unit 16 and applies them to the fully connected layer 131.
In step S3 performed again, the input unit 11 reads the next n training data items from the training data storage unit 10. When, in the course of the repeated processing, the processing of steps S4 and S5 has been performed for all the training data stored in the training data storage unit 10, the input unit 11 starts over from the first n training data items and repeats reading from the training data storage unit 10 in order.
At the point when the parameter update unit 14 determines in step S9 that the evaluation loss Loss satisfies the termination condition, the feature extraction parameter storage unit 15 and the score estimation parameter storage unit 16 hold, respectively, the trained feature extraction parameters and the trained score estimation parameters in a state in which the regression loss Loss1 and the ranking loss Loss2 have become sufficiently small.
In the learning device 1 of the first embodiment described above, the parameter update unit 14 updates the parameters applied to the function approximators of the estimation unit 50 (the first and second function approximators), that is, the feature extraction parameters and the score estimation parameters, by performing a learning process that reduces both the regression loss output by the first loss function, which computes the regression loss between each of the plurality of estimated scores estimated by the score estimation unit 13 and the corresponding true score, and the ranking loss output by the second loss function, which, for each of all combinations of two different video data items, computes from the two corresponding estimated scores and two true scores a ranking loss indicating the degree of ordering error between the two estimated scores and corrects that ranking loss in consideration of the magnitude of the difference between the two true scores. By using the second loss function, as shown below, the learning device 1 can learn the official judges' know-how for scoring athletes' competitions more accurately than the technique described in Non-Patent Document 1.
That is, the learning device 1 of the first embodiment described above uses Equation (3) as the ranking loss instead of Equation (1), which was adopted by the technique disclosed in Non-Patent Document 1. The effect of Equation (3) is described below, case by case.
(When the order of the estimated scores s_i, s_j matches the order of the true scores g_i, g_j)
In this case, the term −(s_j − s_i) · sign(g_j − g_i) in Equations (1) and (3) takes a negative value.
In this case, when Equation (1) is applied, the input to the ReLU function becomes positive whenever abs(s_j − s_i) < margin value δ, so a ranking loss arises and a learning process that reduces this ranking loss is performed. Since the order of the estimated scores s_i, s_j already matches the order of the true scores g_i, g_j, the learning process performed here is not one that swaps the order of the estimated scores s_i, s_j, but one that pushes the estimated scores s_i and s_j further apart.
By contrast, when Equation (3) is applied, the input to the ReLU function becomes positive when abs(s_j − s_i) < abs(g_j − g_i), so a ranking loss arises. When abs(g_j − g_i) < margin value δ, the ranking loss of Equation (1) is larger than that of Equation (3), so using Equation (1) results in a learning process that makes the absolute difference between the estimated scores s_i, s_j larger than the absolute difference between the true scores g_i, g_j.
On the other hand, when abs(g_j − g_i) > margin value δ, the ranking loss of Equation (1) is smaller than that of Equation (3), so using Equation (1) results in a learning process that makes the absolute difference between the estimated scores s_i, s_j smaller than the absolute difference between the true scores g_i, g_j.
Therefore, when the order of the estimated scores s_i, s_j matches the order of the true scores g_i, g_j, using Equation (3) rather than Equation (1) enables a learning process that more accurately brings the absolute difference between the estimated scores s_i, s_j toward the absolute difference between the true scores g_i, g_j.
(When the order of the estimated scores s_i, s_j does not match the order of the true scores g_i, g_j)
In this case, the term −(s_j − s_i) · sign(g_j − g_i) in Equations (1) and (3) takes a positive value, so both the margin value δ and abs(g_j − g_i) serve to increase the ranking loss. In this case as well, when the difference between the estimated scores s_i, s_j is small, using Equation (3) increases the absolute difference between the estimated scores s_i, s_j according to the magnitude of abs(g_j − g_i), so a learning process that brings the absolute difference between the estimated scores s_i, s_j closer to the absolute difference between the true scores g_i, g_j can be performed more accurately.
(Score estimation device of the first embodiment)
 FIG. 4 is a block diagram showing the internal configuration of the score estimation device 2 according to the first embodiment. In FIG. 4, the same components as those of the learning device 1 shown in FIG. 1 are denoted by the same reference numerals. The score estimation device 2 includes an input unit 11-1, an estimation unit 50, an output unit 17, a learned feature amount extraction parameter storage unit 18, and a learned score estimation parameter storage unit 19. The estimation unit 50 includes a feature amount extraction unit 12 and a score estimation unit 13.
 As described above, when the parameter update unit 14 makes the determination "Yes" in step S9 shown in FIG. 2, that is, determines that the evaluation loss Loss satisfies the end condition, the learned feature amount extraction parameters and the learned score estimation parameters are recorded in the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16, respectively.
 The learned feature amount extraction parameter storage unit 18 stores in advance the learned feature amount extraction parameters recorded in the feature amount extraction parameter storage unit 15 at the time the learning process of the learning device 1 is completed. The learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 at the time the learning process of the learning device 1 is completed.
 The input unit 11-1 takes in arbitrary video data given from the outside and outputs the captured video data to the feature amount extraction unit 12.
 The feature amount extraction unit 12 reads the learned feature amount extraction parameters from the learned feature amount extraction parameter storage unit 18 and applies them to the video feature amount extraction layer 121. The feature amount extraction unit 12 gives the video data output by the input unit 11-1 to the video feature amount extraction layer 121 as input, obtains the feature amount of the video data as output, and outputs the obtained feature amount to the score estimation unit 13.
 The score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19 and applies them to the fully connected layer 131. The score estimation unit 13 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer 131 as input, obtains an estimated score as output, and outputs the obtained estimated score to the output unit 17. The output unit 17 outputs the estimated score output by the score estimation unit 13 to the outside.
 In the score estimation device 2 of the first embodiment described above, the estimation unit 50 has function approximators (the first function approximator and the second function approximator) that approximate functions based on the learned parameters obtained by the learning process of the learning device 1 (the learned feature amount extraction parameters and the learned score estimation parameters), and estimates the score of given video data by giving the video data to the function approximators as input. Accordingly, the score estimation device 2 can obtain an estimated score for arbitrary video data based on the learned feature amount extraction parameters and learned score estimation parameters obtained by the learning process of the learning device 1, which learns the know-how of the official referees' scoring method more accurately than the technique described in Non-Patent Document 1, so a more accurate estimated score can be obtained.
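For illustration, the inference pipeline of the score estimation device 2 (input unit → feature amount extraction unit 12 → score estimation unit 13) can be sketched as follows. The linear feature extractor and the scalar fully connected layer below are stand-ins for the actual video feature amount extraction layer 121 and fully connected layer 131, and all parameter values and shapes are hypothetical.

```python
def extract_features(video, feature_params):
    # Stand-in for the video feature amount extraction layer 121:
    # a learned linear map (weights, biases) over a flat input vector.
    w, b = feature_params
    return [sum(wi * x for wi, x in zip(row, video)) + bi
            for row, bi in zip(w, b)]

def estimate_score(features, score_params):
    # Stand-in for the fully connected layer 131 producing a scalar score.
    w, b = score_params
    return sum(wi * f for wi, f in zip(w, features)) + b

def score_video(video, feature_params, score_params):
    # Inference pipeline of the score estimation device 2: the learned
    # parameters play the role of storage units 18 and 19.
    return estimate_score(extract_features(video, feature_params),
                          score_params)
```

With hypothetical learned parameters, e.g. an identity feature map and score weights (0.5, 0.5) with bias 1.0, the input [2, 4] yields the estimated score 4.0.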
(Second Embodiment)
 FIG. 5 is a block diagram showing the internal configuration of the learning device 1a according to the second embodiment. The same components as those of the learning device 1 of the first embodiment are denoted by the same reference numerals, and only the different components are described below. The learning device 1a includes a training data storage unit 10a, an input unit 11a, an estimation unit 50a, a parameter update unit 14a, a feature amount extraction parameter storage unit 15, a score estimation parameter storage unit 16, and a class estimation parameter storage unit 21. The estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
 The training data storage unit 10a stores in advance a plurality of training data, each of which combines one of a plurality of pieces of video data, one of a plurality of true-value scores, and one of a plurality of true-value class labels.
 The plurality of pieces of video data are classified into a plurality of predetermined classes based on the content recorded in each piece of video data. Here, a class is, for example, a type of competition with different scoring criteria, such as high diving or gymnastics. The true-value class label is identification information indicating the class to which the corresponding video data belongs as a result of the classification.
 The input unit 11a repeatedly reads training data n at a time from the training data storage unit 10a. Here, n is an integer of 2 or more and is the batch size used when the learning process described below is performed. The number of training data stored in the training data storage unit 10a is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
 In the following description, any one piece of video data included in the n training data is denoted by v_i or v_j, the true-value score corresponding to the video data v_i is denoted by g_i, and the true-value score corresponding to the video data v_j is denoted by g_j. Further, the true-value class label corresponding to the video data v_i is denoted by k_i, and the true-value class label corresponding to the video data v_j is denoted by k_j. Here, i and j are integers with i = 1 to n and j = 1 to n, and j > i.
 The input unit 11a outputs the n pieces of video data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. The input unit 11a also outputs the n true-value scores g_1 to g_n and the n true-value class labels k_1 to k_n included in the n read training data to the parameter update unit 14a.
 The class estimation parameter storage unit 21 stores class estimation parameters serving as weights and biases to be applied to the third function approximator of the class estimation unit 20. The class estimation unit 20 has a third function approximator and applies the class estimation parameters stored in the class estimation parameter storage unit 21 to the third function approximator. By applying the class estimation parameters, the third function approximator approximates a function corresponding to the class estimation parameters. The class estimation unit 20 estimates an estimated class c_i by giving the feature amount extracted by the feature amount extraction unit 12 to the third function approximator as input. Here, the estimated class c_i is information expressed as a probability for each class; by referring to the estimated class c_i, it is possible to identify the class to which the corresponding video data v_i is most likely to belong.
 Here, the third function approximator is an arbitrary neural network that estimates an estimated class from a feature amount; for example, a neural network consisting of a fully connected layer followed by a Softmax layer (hereinafter referred to as the "fully connected layer + Softmax layer 201") is applied.
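A minimal sketch of such a fully connected layer followed by a Softmax layer, with hypothetical weights and biases (the actual structure of layer 201 is not specified in this excerpt beyond the description above):

```python
import math

def softmax(z):
    # Softmax layer: turns logits into class probabilities.
    # Subtracting the max is a standard trick for numerical stability.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def estimate_class(features, weights, biases):
    # Fully connected layer + Softmax layer 201: logits from a linear
    # map over the feature amount, then probabilities per class.
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

The output is a probability vector over the classes, i.e. the estimated class c_i referred to above; the probabilities sum to 1, and the largest entry identifies the most likely class.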
 Similarly to the parameter update unit 14 of the first embodiment, the parameter update unit 14a calculates the regression loss between each of the estimated scores s_1 to s_n and each of the true-value scores g_1 to g_n based on the n true-value scores g_1 to g_n output by the input unit 11a, the n estimated scores s_1 to s_n estimated by the score estimation unit 13, and the first loss function shown in the above equation (2).
 The parameter update unit 14a also calculates the class loss between each of the estimated classes c_1 to c_n and each of the true-value class labels k_1 to k_n based on the n true-value class labels k_1 to k_n output by the input unit 11a, the n estimated classes c_1 to c_n estimated by the class estimation unit 20, and a predetermined third loss function.
 Here, as the third loss function, for example, the Cross Entropy Loss shown in the following equation (5) is applied.
 Loss3 = −(1/n) Σ_{i=1}^{n} Σ_{y=1}^{Y} k_{i,y} log(c_{i,y})   …(5)
 In equation (5), Y is the number of classes. For example, suppose Y = 3 and the three classes are denoted Class1, Class2, and Class3. When the video data v_1 with i = 1 belongs to the class Class1, the probability of belonging to Class1 is 100%, and the probabilities of belonging to Class2 and Class3 are 0%. In this case, the true-value class label k_{1,y} is expressed, for example, in the form k_{1,1} = 1.0, k_{1,2} = 0.0, k_{1,3} = 0.0. The estimated class c_{1,y} expresses the probabilities that the corresponding video data v_1 belongs to each of the three classes, for example, in the form c_{1,1} = 0.8, c_{1,2} = 0.5, c_{1,3} = 0.2.
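Assuming the Cross Entropy Loss of equation (5) takes the standard batch-averaged form (the exact normalization is not spelled out in this excerpt), the class loss Loss3 for the example above can be computed as:

```python
import math

def cross_entropy_loss(true_labels, est_classes):
    # Class loss Loss3: for each sample i, sum -k_{i,y} * log(c_{i,y})
    # over the classes y, then average over the n samples in the batch.
    n = len(true_labels)
    total = 0.0
    for k, c in zip(true_labels, est_classes):
        total += -sum(k_y * math.log(c_y)
                      for k_y, c_y in zip(k, c) if k_y > 0)
    return total / n
```

For the single example with k_1 = (1.0, 0.0, 0.0) and c_1 = (0.8, 0.5, 0.2), only the Class1 term contributes, giving −log(0.8) ≈ 0.223.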
 Further, for each combination of two different pieces of video data v_i and v_j, the parameter update unit 14a calculates a ranking loss indicating the degree of error in the order of the two estimated scores s_i and s_j, based on the two estimated scores s_i and s_j, the two true-value scores g_i and g_j, and the two estimated classes c_i and c_j corresponding to that combination, together with a predetermined fourth loss function, taking into account both the magnitude of the difference between the two true-value scores g_i and g_j and the correlation between the two estimated classes c_i and c_j.
 Here, as the fourth loss function, the loss function shown in the following equation (6) is applied.
 Loss4 = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} correlation × ReLU(abs(g_j − g_i) − (s_j − s_i)sign(g_j − g_i))   …(6)
 Comparing equation (6) with equation (3), which is the second loss function of the first embodiment, the difference is that the output of the ReLU function in equation (3) is multiplied by correlation.
 In equation (6), correlation is a correlation coefficient indicating the similarity between the two estimated classes c_i and c_j. Here, for example, Spearman's rank correlation coefficient obtained by equation (7) is applied as the correlation coefficient.
 correlation = 1 − (6 Σ_{y=1}^{Y} (CR_{i,y} − CR_{j,y})²) / (Y(Y² − 1))   …(7)
 In equation (7), Y is the number of classes, as in equation (5). CR_{i,y} is the rank of class y in the estimated class c_i. For example, when Y = 3 and the estimated class c_i is expressed as c_{i,1} = 0.5, c_{i,2} = 0.8, c_{i,3} = 0.2, the probability of belonging to Class2 is the highest, the probability of belonging to Class1 is the second highest, and the probability of belonging to Class3 is the third highest. In this case, CR_{i,1} = 2, CR_{i,2} = 1, and CR_{i,3} = 3.
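Assuming equation (7) is the textbook Spearman formula over the class ranks CR_{i,y} (no tied ranks) and that the fourth loss function weights each pairwise equation (3) term by this correlation, the computation can be sketched as:

```python
def class_ranks(c):
    # CR_{i,y}: rank of class y within the estimated class c_i,
    # where rank 1 is the class with the highest probability.
    order = sorted(range(len(c)), key=lambda y: c[y], reverse=True)
    ranks = [0] * len(c)
    for r, y in enumerate(order, start=1):
        ranks[y] = r
    return ranks

def spearman(c_i, c_j):
    # Spearman's rank correlation coefficient between two estimated
    # classes, computed from their class ranks (assumes no ties).
    Y = len(c_i)
    d2 = sum((a - b) ** 2
             for a, b in zip(class_ranks(c_i), class_ranks(c_j)))
    return 1 - 6 * d2 / (Y * (Y ** 2 - 1))

def ranking_loss_pair(s_i, s_j, g_i, g_j, c_i, c_j):
    # One pairwise term of the ranking loss Loss4: the equation (3)
    # term weighted by the class correlation (a reconstruction; the
    # source shows the equation only as an image).
    relu = lambda x: max(0.0, x)
    sign = lambda x: (x > 0) - (x < 0)
    return spearman(c_i, c_j) * relu(abs(g_j - g_i)
                                     - (s_j - s_i) * sign(g_j - g_i))
```

For the worked example above, class_ranks([0.5, 0.8, 0.2]) gives [2, 1, 3]; identical estimated classes yield correlation 1 (full constraint), while dissimilar classes reduce the weight of the pairwise term.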
 The parameter update unit 14a performs the learning process so as to reduce the calculated regression loss, that is, Loss1, which is the output value of equation (2); the calculated class loss, that is, Loss3, which is the output value of equation (5); and the calculated ranking loss, that is, Loss4, which is the output value of equation (6). Through the learning process, the parameter update unit 14a calculates new feature amount extraction parameters, new score estimation parameters, and new class estimation parameters.
 Based on the calculated new feature amount extraction parameters, new score estimation parameters, and new class estimation parameters, the parameter update unit 14a updates the contents of the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21.
(Processing by the learning device of the second embodiment)
 Next, the processing by the learning device 1a of the second embodiment will be described with reference to FIGS. 6 and 7. FIG. 6 is a flowchart showing the flow of the learning process performed by the learning device 1a.
 The feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 store in advance initial feature amount extraction parameters, initial score estimation parameters, and initial class estimation parameters, respectively.
 In steps S21 and S22, the same processing as in steps S1 and S2 of the first embodiment shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13. The class estimation unit 20 reads the class estimation parameters from the class estimation parameter storage unit 21 and applies the read class estimation parameters to the neural network of the fully connected layer + Softmax layer 201, which is the third function approximator (step S23).
 The input unit 11a reads the first n training data from the training data storage unit 10a. As shown in FIG. 7, the input unit 11a outputs the n pieces of video data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. The input unit 11a also outputs the n true-value scores g_1 to g_n and the n true-value class labels k_1 to k_n included in the read training data to the parameter update unit 14a. The parameter update unit 14a takes in the n true-value scores g_1 to g_n and the n true-value class labels k_1 to k_n output by the input unit 11a (step S24).
 The processing of steps S25, S26, and S27 is repeated for each piece of video data v_i among the n pieces of video data v_1 to v_n (loop L2s to L2e).
 In steps S25 and S26, the same processing as in steps S4 and S5 shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13. In step S26, the score estimation unit 13 outputs the obtained estimated score s_i to the parameter update unit 14a.
 As shown in FIG. 7, the class estimation unit 20 gives the feature amount of the video data v_i to the fully connected layer + Softmax layer 201 as input and obtains the estimated class c_i of the video data v_i as the output of the fully connected layer + Softmax layer 201. The class estimation unit 20 outputs the obtained estimated class c_i of the video data v_i to the parameter update unit 14a (step S27).
 That is, as shown in FIG. 7, with the same feature amount extraction parameters, the same score estimation parameters, and the same class estimation parameters applied to the video feature amount extraction layer 121, the fully connected layer 131, and the fully connected layer + Softmax layer 201, respectively, the processing of steps S25, S26, and S27 is performed n times, once for each of the n pieces of video data v_1 to v_n as input.
 In step S28, the same processing as in step S6 shown in FIG. 2 is performed by the parameter update unit 14a.
 When the parameter update unit 14a takes in the n estimated classes c_1 to c_n estimated by the class estimation unit 20, it calculates the class loss Loss3 by equation (5) based on the n estimated classes c_1 to c_n taken in and the n true-value class labels k_1 to k_n taken in in step S24 (step S29).
 The parameter update unit 14a calculates the ranking loss Loss4 by equation (6) based on the n estimated scores s_1 to s_n, the n true-value scores g_1 to g_n, and the n estimated classes c_1 to c_n (step S30).
 The parameter update unit 14a calculates the evaluation loss Loss by, for example, the following equation (8) (step S31).
 Loss = α_2 · Loss1 + β_2 · Loss3 + γ_2 · Loss4 + ||ω||_2   …(8)
 In the above equation (8), α_2, β_2, and γ_2 satisfy α_2 > 0, β_2 > 0, and γ_2 > 0, and are constants set arbitrarily so as to balance the three losses. ||ω||_2 is the L2-regularization term.
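Assuming equation (8) is a weighted sum of the three losses plus the L2 norm of the parameters ω (the source shows the equation only as an image, so the exact form of the regularization term is an assumption), the evaluation loss can be sketched as:

```python
def evaluation_loss(loss1, loss3, loss4, weights,
                    alpha2=1.0, beta2=1.0, gamma2=1.0):
    # Evaluation loss of equation (8): the regression loss Loss1, class
    # loss Loss3, and ranking loss Loss4 are balanced by the positive
    # constants alpha2, beta2, gamma2, with an L2 term over the
    # parameters omega (here a flat list of weights).
    l2 = sum(w * w for w in weights) ** 0.5
    return alpha2 * loss1 + beta2 * loss3 + gamma2 * loss4 + l2
```

For instance, with unit balancing constants and parameters (3, 4), the losses 1, 2, and 3 combine with ||ω||_2 = 5 to give Loss = 11.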
 The parameter update unit 14a determines whether the calculated evaluation loss Loss satisfies the end condition (step S32). For example, when the evaluation loss Loss is less than a predetermined threshold, it determines that the evaluation loss satisfies the end condition.
 When the parameter update unit 14a determines that the evaluation loss Loss satisfies the end condition (step S32, Yes), it ends the processing. On the other hand, when the parameter update unit 14a determines that the evaluation loss Loss does not satisfy the end condition (step S32, No), it calculates new feature amount extraction parameters, new score estimation parameters, and new class estimation parameters by a learning process using, for example, backpropagation so as to reduce the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4.
 The parameter update unit 14a writes the calculated new feature amount extraction parameters to the feature amount extraction parameter storage unit 15 to update the feature amount extraction parameters, writes the calculated new score estimation parameters to the score estimation parameter storage unit 16 to update the score estimation parameters, and writes the calculated new class estimation parameters to the class estimation parameter storage unit 21 to update the class estimation parameters (step S33).
 Thereafter, the processing from step S21 is repeated. In step S21 performed again, the feature amount extraction unit 12 reads the updated feature amount extraction parameters from the feature amount extraction parameter storage unit 15 and applies them to the video feature amount extraction layer 121. In step S22 performed again, the score estimation unit 13 reads the updated score estimation parameters from the score estimation parameter storage unit 16 and applies them to the fully connected layer 131. In step S23 performed again, the class estimation unit 20 reads the updated class estimation parameters from the class estimation parameter storage unit 21 and applies them to the fully connected layer + Softmax layer 201.
 In step S24 performed again, the input unit 11a reads the next n training data from the training data storage unit 10a. When, in the course of the repeated processing, the processing of steps S25, S26, and S27 has been performed for all the training data stored in the training data storage unit 10a, the input unit 11a repeats reading from the training data storage unit 10a starting again from the first n training data.
 At the time the parameter update unit 14a determines in step S32 that the evaluation loss Loss satisfies the end condition, the learned feature amount extraction parameters, the learned score estimation parameters, and the learned class estimation parameters, in a state in which the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4 have become sufficiently small, are recorded in the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21, respectively.
 In the learning device 1a of the second embodiment described above, the parameter update unit 14a performs a learning process that reduces each of the regression loss, which is the output of the first loss function; the class loss, which is the output of the third loss function; and the ranking loss, which is the output of the fourth loss function. The first loss function obtains the regression loss between each of the plurality of estimated scores estimated by the score estimation unit 13 and each of the corresponding true-value scores. The third loss function obtains the class loss between each of the plurality of estimated classes estimated by the class estimation unit 20 and each of the corresponding true-value class labels. The fourth loss function obtains, based on the two estimated scores and the two true-value scores corresponding to each combination of two different pieces of video data, a ranking loss indicating the degree of error in the order of the two estimated scores, and corrects the ranking loss in consideration of both the magnitude of the difference between the two true-value scores and the correlation between the two estimated classes. Through this learning process, the parameters applied to the function approximators of the estimation unit 50a (the first function approximator, the second function approximator, and the third function approximator), that is, the feature amount extraction parameters, the score estimation parameters, and the class estimation parameters, are updated. By using the fourth loss function, as described below, the learning device 1a can learn the know-how of the official referees' scoring of athletes' performances more accurately than the technique described in Non-Patent Document 1.
 That is, comparing equation (3), the second loss function in the first embodiment, with equation (6), the fourth loss function in the second embodiment, equation (6) multiplies the ReLU function of equation (3) by the correlation coefficient correlation of the two estimated classes c_i and c_j, thereby adopting a ranking loss that takes the correlation between the two estimated classes c_i and c_j into account. Therefore, the learning device 1a of the second embodiment provides the following effect in addition to the effects provided by the learning device 1 of the first embodiment.
 By using the fourth loss function, the learning device 1a can strengthen the ranking loss constraint for similar competitions and, conversely, weaken the ranking loss constraint for dissimilar competitions. Thus, for example, even when multiple types of competitions, such as high diving and gymnastics, are recorded in the video data v_i included in the training data, the learning device 1a performs the learning process while taking the difference between the types of competitions into account, so it can learn the know-how of the official referees' scoring method even more accurately than the learning device 1.
(Score estimation device of the second embodiment)
 FIG. 8 is a block diagram showing the internal configuration of the score estimation device 2a according to the second embodiment. In FIG. 8, the same components as those of the learning device 1a shown in FIG. 5 are denoted by the same reference numerals. The score estimation device 2a includes an input unit 11a-1, an estimation unit 50a, an output unit 17a, a learned feature amount extraction parameter storage unit 18, a learned score estimation parameter storage unit 19, and a learned class estimation parameter storage unit 22. The estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
 As described above, when the parameter update unit 14a makes a "Yes" determination in step S32 of FIG. 6, that is, determines that the evaluation loss Loss satisfies the termination condition, the feature extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 hold the learned feature extraction parameters, the learned score estimation parameters, and the learned class estimation parameters, respectively.
 The learned feature extraction parameter storage unit 18 stores in advance the learned feature extraction parameters recorded in the feature extraction parameter storage unit 15 at the time the learning process of the learning device 1a is completed. The learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 at that time. The learned class estimation parameter storage unit 22 stores in advance the learned class estimation parameters recorded in the class estimation parameter storage unit 21 at that time.
 The input unit 11a-1 takes in arbitrary video data supplied from the outside and outputs the captured video data to the feature extraction unit 12.
 The feature extraction unit 12 reads the learned feature extraction parameters from the learned feature extraction parameter storage unit 18 and applies them to the video feature extraction layer 121. The feature extraction unit 12 feeds the video data output by the input unit 11a-1 to the video feature extraction layer 121 as input, obtains the features of the video data as its output, and outputs the obtained features to the score estimation unit 13.
 The score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19 and applies them to the fully connected layer 131. The score estimation unit 13 feeds the features output by the feature extraction unit 12 to the fully connected layer 131 as input, obtains an estimated score as its output, and outputs the obtained estimated score to the output unit 17a.
 The class estimation unit 20 reads the learned class estimation parameters from the learned class estimation parameter storage unit 22 and applies them to the fully connected + Softmax layer 201. The class estimation unit 20 feeds the features output by the feature extraction unit 12 to the fully connected + Softmax layer 201 as input, obtains an estimated class as its output, and outputs the obtained estimated class to the output unit 17a. The output unit 17a outputs the estimated score output by the score estimation unit 13 and the estimated class output by the class estimation unit 20 to the outside.
 Note that, in the score estimation device 2a of the second embodiment described above, when only the estimated score is required, the class estimation unit 20 and the learned class estimation parameter storage unit 22 may be omitted.
 In the score estimation device 2a of the second embodiment described above, the estimation unit 50a has function approximators (a first function approximator, a second function approximator, and a third function approximator) that approximate functions based on the learned parameters obtained by the learning process of the learning device 1a (the learned feature extraction parameters, the learned score estimation parameters, and the learned class estimation parameters), and estimates the estimated score of given video data by feeding that video data to the function approximators as input.
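The inference flow described above (features from the video, then a score head and a class head) can be sketched as follows. This is an illustrative NumPy stand-in that assumes a single linear layer per head; the actual layers 121, 131, and 201 in the patent are neural-network layers whose structure is not reproduced here, and all names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def estimate(extractor, w_score, b_score, w_class, b_class, video):
    """Run one video through the two estimation heads.

    extractor        -- stands in for the video feature extraction layer 121
    w_score, b_score -- stand in for the fully connected layer 131
    w_class, b_class -- stand in for the fully connected + Softmax layer 201
    """
    f = extractor(video)                          # feature vector
    score = float(w_score @ f + b_score)          # estimated score
    class_probs = softmax(w_class @ f + b_class)  # estimated class (probabilities)
    return score, class_probs
```

With an identity extractor and hand-picked weights, the score is just a weighted sum of the features, and the class output is a probability vector summing to 1.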
 In the second embodiment, the learning device 1a performs the learning process including the estimated class c_i and the true-value class label k_i. Therefore, unlike in the first embodiment, the learned feature extraction parameters and the learned score estimation parameters of the second embodiment reflect information about the class to which the video data v_i belongs. Accordingly, the score estimation device 2a of the second embodiment can obtain estimated scores even more accurately than the first embodiment for video data covering many types of competition.
 In the first and second embodiments described above, the MSE shown in Equation (2) is applied as the first loss function, but another function that calculates a regression loss, such as the L1 loss, may be applied instead.
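For reference, the two regression losses mentioned here (the MSE of Equation (2) and the L1 loss suggested as an alternative) could be written as follows; the function names are illustrative, not from the patent.

```python
import numpy as np

def mse_loss(est, true):
    """Mean squared error between estimated and true-value scores
    (Equation (2)-style regression loss)."""
    d = np.asarray(est, float) - np.asarray(true, float)
    return float(np.mean(d * d))

def l1_loss(est, true):
    """Mean absolute error, one alternative regression loss."""
    d = np.asarray(est, float) - np.asarray(true, float)
    return float(np.mean(np.abs(d)))
```

The L1 loss penalizes large errors less severely than the MSE, which is the usual reason for swapping one for the other.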
 Equation (4), which calculates the evaluation loss Loss in the first embodiment, and Equation (8), which calculates it in the second embodiment, are merely examples. Any formula may be applied that balances the regression loss and the ranking loss in the first embodiment, or the regression loss, the ranking loss, and the class loss in the second embodiment.
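A generic balanced combination of the loss terms might look like the following sketch; the weights w_reg, w_rank, and w_cls are hypothetical knobs, not quantities defined in the patent, and the actual Equations (4) and (8) may combine the terms differently.

```python
def total_loss(regression, ranking, class_loss=None,
               w_reg=1.0, w_rank=1.0, w_cls=1.0):
    """Weighted sum balancing the loss terms: regression + ranking in the
    first embodiment, plus the class loss in the second embodiment."""
    loss = w_reg * regression + w_rank * ranking
    if class_loss is not None:
        loss += w_cls * class_loss
    return loss
```

Tuning the weights trades off score accuracy (regression), ordering consistency (ranking), and class discrimination (class loss).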
 Although the second embodiment applies the cross-entropy loss as the third loss function, another function may be applied instead. Likewise, although Spearman's rank correlation coefficient shown in Equation (7) is applied as the correlation coefficient in Equation (6), another correlation coefficient may be applied.
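As an illustration of the alternatives mentioned here, a per-sample cross-entropy class loss and Pearson's correlation coefficient (one possible substitute for the Spearman coefficient of Equation (7)) could be sketched as follows; the names are illustrative.

```python
import numpy as np

def cross_entropy(class_probs, true_label):
    """Class loss for one sample: negative log-probability that the
    estimated class distribution assigns to the true class."""
    return float(-np.log(class_probs[true_label]))

def pearson(x, y):
    """Pearson correlation coefficient, an alternative to Spearman's
    rank correlation for weighting the ranking loss."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

Pearson's coefficient measures linear agreement of the raw probability values rather than agreement of their ranks, so the two weightings can differ for skewed distributions.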
 In the first and second embodiments described above, the training data storage units 10 and 10a are provided inside the learning devices 1 and 1a, but they may be provided outside the learning devices 1 and 1a. Similarly, the learned feature extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 may be provided outside the score estimation devices 2 and 2a.
 Since the training data storage units 10 and 10a, the learned feature extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 store data to be preserved, a non-volatile storage area is preferably used for them. In contrast, the feature extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 store data only temporarily, so either a non-volatile or a volatile storage area may be used.
 The first, second, and third function approximators shown in the first and second embodiments may be implemented with neural networks of configurations other than those described above, or with other means capable of the learning processing used in machine learning instead of neural networks. They also need not be separated into a first, second, and third function approximator: in the first embodiment, the first and second function approximators may together form a single function approximator, and in the second embodiment, the first, second, and third function approximators may together form a single function approximator.
 The learning devices 1 and 1a and the score estimation devices 2 and 2a in the embodiments described above may be realized by a computer. In that case, a program for realizing their functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read and executed by a computer system. The term "computer system" as used here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. Furthermore, "computer-readable recording medium" may also include media that hold a program dynamically for a short time, such as a communication line used when transmitting a program via a network such as the Internet or a communication line such as a telephone line, and media that hold a program for a certain period of time, such as the volatile memory inside a computer system serving as a server or client in that case. The above program may realize only some of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized by using a programmable logic device such as an FPGA (Field Programmable Gate Array).
 Although embodiments of the present invention have been described in detail with reference to the drawings, the specific configuration is not limited to these embodiments, and designs and the like within a range that does not deviate from the gist of the present invention are also included.
 The present invention can be used for scoring competitions in sports.
1 ... Learning device, 10 ... Training data storage unit, 11 ... Input unit, 12 ... Feature extraction unit, 13 ... Score estimation unit, 14 ... Parameter update unit, 15 ... Feature extraction parameter storage unit, 16 ... Score estimation parameter storage unit, 50 ... Estimation unit

Claims (7)

  1.  A learning device comprising:
     an input unit that takes in training data combining video data recording the movements of athletes during competition with a plurality of true-value scores, each being a score given by a referee to the competition recorded in the video data;
     an estimation unit that has a function approximator approximating a function based on parameters, and that estimates an estimated score for video data by giving the video data taken in by the input unit to the function approximator as input; and
     a parameter update unit that updates the parameters by performing a learning process that reduces each of the regression loss output by a first loss function, which obtains a regression loss between each of the plurality of estimated scores and the true-value score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each combination of two different items of video data, obtains a ranking loss indicating the degree of ordering error between the two estimated scores based on the two corresponding estimated scores and the two corresponding true-value scores, and which corrects the ranking loss in consideration of the magnitude of the difference between the two true-value scores.
  2.  The learning device according to claim 1, wherein
     the video data are classified in advance into one of a plurality of predetermined classes based on the content recorded in the video data, and each item of video data is given in advance a true-value class label indicating the class to which the video data belongs,
     the input unit takes in training data combining the video data, the true-value score corresponding to the video data, and the true-value class label given to the video data,
     the estimation unit estimates, by giving the video data taken in by the input unit to the function approximator as input, the estimated score of the video data and an estimated class indicating the probability of the class to which the video data belongs, and
     the parameter update unit updates the parameters by performing a learning process that reduces each of the regression loss output by the first loss function, the class loss output by a third loss function, which obtains a class loss between each of the plurality of estimated classes and the true-value class label corresponding to each estimated class, and the ranking loss output by a fourth loss function used in place of the second loss function, which, for each combination of two different items of video data, obtains a ranking loss indicating the degree of ordering error between the two estimated scores based on the two corresponding estimated scores and the two corresponding true-value scores, and which corrects the ranking loss in consideration of the magnitude of the difference between the two true-value scores and of the correlation between the two estimated classes.
  3.  A score estimation device comprising:
     an input unit that takes in video data recording the movements of an athlete during competition; and
     an estimation unit that has a function approximator approximating a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2, and that estimates an estimated score for the video data by giving the video data taken in by the input unit to the function approximator as input.
  4.  A learning method comprising:
     taking in training data combining video data recording the movements of athletes during competition with a plurality of true-value scores, each being a score given by a referee to the competition recorded in the video data;
     estimating an estimated score for the video data by giving the captured video data as input to a function approximator that approximates a function based on parameters; and
     updating the parameters by performing a learning process that reduces each of the regression loss output by a first loss function, which obtains a regression loss between each of the plurality of estimated scores and the true-value score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each combination of two different items of video data, obtains a ranking loss indicating the degree of ordering error between the two estimated scores based on the two corresponding estimated scores and the two corresponding true-value scores, and which corrects the ranking loss in consideration of the magnitude of the difference between the two true-value scores.
  5.  A learning program for causing a computer to function as:
     input means for taking in training data combining video data recording the movements of athletes during competition with a plurality of true-value scores, each being a score given by a referee to the competition recorded in the video data;
     estimation means having a function approximator that approximates a function based on parameters, the estimation means estimating an estimated score for video data by giving the video data taken in by the input means to the function approximator as input; and
     parameter update means for updating the parameters by performing a learning process that reduces each of the regression loss output by a first loss function, which obtains a regression loss between each of the plurality of estimated scores and the true-value score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each combination of two different items of video data, obtains a ranking loss indicating the degree of ordering error between the two estimated scores based on the two corresponding estimated scores and the two corresponding true-value scores, and which corrects the ranking loss in consideration of the magnitude of the difference between the two true-value scores.
  6.  A score estimation method comprising:
     taking in video data recording the movements of an athlete during competition; and
     estimating an estimated score for the video data by giving the captured video data as input to a function approximator that approximates a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2.
  7.  A score estimation program for causing a computer to function as:
     input means for taking in video data recording the movements of an athlete during competition; and
     estimation means having a function approximator that approximates a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2, the estimation means estimating an estimated score for the video data by giving the video data taken in by the input means to the function approximator as input.
PCT/JP2020/015136 2020-04-02 2020-04-02 Learning device, learning method, learning program, score estimation device, score estimation method, and score estimation program WO2021199392A1 (en)

Publication: WO2021199392A1, published 2021-10-07. National-phase document JP2022511452A, granted as JP7352119B2 (2023-09-28).