WO2021199392A1 - Learning device, learning method, learning program, score estimation device, score estimation method, and score estimation program - Google Patents


Info

Publication number
WO2021199392A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss
scores
function
estimated
score
Prior art date
Application number
PCT/JP2020/015136
Other languages
French (fr)
Japanese (ja)
Inventor
Takamasa Nagai
Shinya Shimizu
Yoshinori Kusachi
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to JP2022511452A (patent JP7352119B2)
Priority to PCT/JP2020/015136 (WO2021199392A1)
Publication of WO2021199392A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present invention provides, for example, a learning device, a learning method, and a learning program for learning know-how regarding the method of scoring a player's competition, as well as a score estimation device, a score estimation method, and a score estimation program for estimating a competition score based on learning results.
  • Non-Patent Document 1 discloses a method of performing AQA (Action Quality Assessment) using deep learning.
  • In this method, video data capturing the athlete's competition and the true score obtained by having an official referee score the competition are used as training data.
  • A feature amount is then extracted from the moving image data included in the training data using a deep neural network.
  • An estimated score is further estimated from the extracted feature amount.
  • The loss between the estimated score and the true score included in the training data is calculated.
  • The weights and biases of the deep neural network are repeatedly updated based on the calculated loss so as to reduce it.
  • In Non-Patent Document 1, in addition to the regression loss indicating the loss between the estimated score and the true score, a ranking loss aimed at improving the accuracy of the order among the obtained estimated scores is adopted.
  • This is because the order of the estimated scores and the order of the true scores may be swapped due to estimation error between video data whose true scores are close.
  • Non-Patent Document 1 adopts the ranking loss shown in the following equation (1) to reduce the probability of such an error and achieve accuracy higher than that of the prior art.
  • ReLU(x) is a function that returns x as the return value when the argument x is 0 or greater, and returns 0 when the argument x is smaller than 0.
  • δ is a margin value and is a positive value. Therefore, if the magnitude relationship between the estimated scores s_i and s_j does not match the magnitude relationship between the true value scores g_i and g_j, the ranking loss increases according to the absolute value of the difference between the estimated scores s_i and s_j.
  • The margin value δ has the effect of separating the two estimated scores s_i and s_j so that they differ by at least the margin value δ when the difference between them is small. Therefore, a ranking loss according to the margin value δ occurs even when the magnitude relationship between the estimated scores s_i and s_j matches the magnitude relationship between the true value scores g_i and g_j.
  • Because the margin value δ is a fixed value determined in advance, the same margin value δ is applied to every combination of video data v_i and v_j.
  • That is, the margin value δ is a parameter adopted for the purpose of ensuring that the two estimated scores s_i and s_j differ by at least the margin value when the difference between them is small.
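  • Equation (1) itself is not reproduced in this text. A common pairwise-hinge formulation consistent with the description above is ReLU(-sign(g_i - g_j)(s_i - s_j) + δ) summed over pairs of samples; the following is a minimal sketch under that assumption, not the patent's literal equation:

```python
def relu(x):
    # ReLU(x): returns x when x >= 0, otherwise 0
    return x if x >= 0.0 else 0.0

def sign(x):
    # sign(x): +1, -1, or 0 depending on the sign of the argument x
    return (x > 0) - (x < 0)

def ranking_loss_fixed(s, g, delta):
    """Pairwise ranking loss with a fixed margin delta over all ordered
    pairs of estimated scores s and true value scores g (a sketch of one
    plausible reading of equation (1))."""
    total, pairs = 0.0, 0
    for i in range(len(s)):
        for j in range(len(s)):
            if i == j:
                continue
            # Penalize a pair when the order of s_i, s_j disagrees with
            # that of g_i, g_j, or when their gap is below the margin.
            total += relu(-sign(g[i] - g[j]) * (s[i] - s[j]) + delta)
            pairs += 1
    return total / max(pairs, 1)
```

  With the order correct and the score gap larger than δ the loss vanishes; with the order reversed the loss grows with the absolute difference of the estimated scores, matching the behavior described above.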
  • an object of the present invention is to provide a technique capable of learning know-how regarding a player's competition scoring method more accurately than the conventional technique and obtaining a more accurate estimated score.
  • One aspect of the present invention is a learning device including: an input unit that takes in training data in which moving image data recording an athlete's movements during a competition is combined with a plurality of true value scores, which are scores assigned by a referee to the competition recorded in the moving image data; an estimation unit that has a function approximator approximating a function based on parameters and estimates an estimated score for the moving image data by giving the moving image data taken in by the input unit to the function approximator as input; and a parameter updating unit that updates the parameters by performing a learning process that reduces each of the regression loss, which is the output of a first loss function for obtaining the regression loss between each of the plurality of estimated scores and each of the true value scores corresponding to each of the estimated scores, and the ranking loss, which is the output of a second loss function that obtains a ranking loss indicating the degree of order error between two estimated scores based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data, and that corrects the ranking loss in consideration of the magnitude of the difference between the two true value scores.
  • Another aspect of the present invention is a score estimation device including: an input unit that takes in moving image data recording the movements of an athlete during a competition; and an estimation unit that has a function approximator approximating a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2, and estimates an estimated score for the moving image data by giving the moving image data taken in by the input unit to the function approximator as input.
  • Another aspect of the present invention is a learning method including: taking in training data in which moving image data recording an athlete's movements during a competition is combined with a plurality of true value scores, which are scores assigned by a referee to the competition recorded in the moving image data; estimating an estimated score for the moving image data by giving the moving image data to a function approximator that approximates a function based on parameters; and updating the parameters by performing a learning process that reduces each of the regression loss, which is the output of a first loss function for obtaining the regression loss between each of the plurality of estimated scores and each of the true value scores corresponding to each of the estimated scores, and the ranking loss, which is the output of a second loss function that obtains a ranking loss indicating the degree of order error between two estimated scores based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data, and that corrects the ranking loss in consideration of the magnitude of the difference between the two true value scores.
  • Another aspect of the present invention is a learning program for causing a computer to function as: an input means for taking in training data in which moving image data recording a competitor's movements during a competition is combined with a plurality of true value scores, which are scores assigned by a referee to the competition recorded in the moving image data; an estimation means that has a function approximator approximating a function based on parameters and estimates an estimated score for the moving image data by giving the moving image data taken in by the input means to the function approximator as input; and a parameter updating means that updates the parameters by performing a learning process that reduces each of the regression loss, which is the output of a first loss function for obtaining the regression loss between each of the plurality of estimated scores and each of the true value scores corresponding to each of the estimated scores, and the ranking loss, which is the output of a second loss function that obtains a ranking loss indicating the degree of order error between two estimated scores based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data, and that corrects the ranking loss in consideration of the magnitude of the difference between the two true value scores.
  • Another aspect of the present invention is a score estimation method for estimating an estimated score for moving image data recording the movements of an athlete during a competition, by taking in the moving image data and giving it as input to a function approximator that approximates a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2.
  • According to the present invention, it is possible to learn the know-how regarding the method of scoring an athlete's competition more accurately than the conventional technique, and to obtain a more accurate estimated score.
  • FIG. 1 is a block diagram showing an internal configuration of the learning device 1 according to the first embodiment.
  • the learning device 1 includes a training data storage unit 10, an input unit 11, an estimation unit 50, a parameter update unit 14, a feature amount extraction parameter storage unit 15, and a score estimation parameter storage unit 16.
  • the estimation unit 50 includes a feature amount extraction unit 12 and a score estimation unit 13.
  • the training data storage unit 10 stores in advance a plurality of training data in which each of the plurality of moving image data and each of the plurality of true value scores are combined.
  • Each of the plurality of video data is generated, for example, by filming, with a camera or the like, the movements performed by the athlete during the competition.
  • The competition is a sports competition for which there is a quantitative scoring standard for techniques, such as high diving and gymnastics.
  • A player is, for example, an athlete who competes in such a sport.
  • Each of the plurality of true value scores is a score scored by an official referee in advance for the competition of the athlete recorded in the video data corresponding to each.
  • the input unit 11 repeatedly reads n training data from the training data storage unit 10.
  • n is an integer of 2 or more, and is a batch size when the learning process described below is performed.
  • The number of training data stored in the training data storage unit 10 is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
  • Any one of the video data included in the n training data is denoted v_i or v_j; the true value score corresponding to the moving image data v_i is denoted g_i, and the true value score corresponding to the moving image data v_j is denoted g_j.
  • The input unit 11 outputs the n moving image data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11 outputs the n true value scores g_1 to g_n included in the n read training data to the parameter update unit 14.
  • the feature amount extraction parameter storage unit 15 stores the feature amount extraction parameters that serve as weights and biases applied to the first function approximator included in the feature amount extraction unit 12.
  • the feature amount extraction unit 12 has a first function approximation device, and applies the feature amount extraction parameter stored in the feature amount extraction parameter storage unit 15 to the first function approximation device.
  • the first function approximator approximates the function corresponding to the feature amount extraction parameter by applying the feature amount extraction parameter.
  • The feature amount extraction unit 12 extracts the feature amount of the moving image data v_i by giving the moving image data v_i output by the input unit 11 to the first function approximator as input.
  • The first function approximator is an arbitrary neural network that extracts a feature amount from the moving image data v_i; for example, a network such as the one shown in Non-Patent Document 1 (hereinafter referred to as the "moving image feature amount extraction layer 121") is applied.
  • the score estimation parameter storage unit 16 stores score estimation parameters that serve as weights and biases to be applied to the second function approximator included in the score estimation unit 13.
  • the score estimation unit 13 has a second function approximation device, and applies the score estimation parameters stored in the score estimation parameter storage unit 16 to the second function approximation device.
  • the second function approximator approximates the function corresponding to the score estimation parameter by applying the score estimation parameter.
  • The score estimation unit 13 estimates the estimated score s_i by giving the feature amount extracted by the feature amount extraction unit 12 to the second function approximator as input.
  • The second function approximator is an arbitrary neural network that estimates the estimated score from the feature amount; for example, a neural network having a two-stage fully connected layer to which a ReLU layer and a Dropout layer are connected (hereinafter referred to as the "fully connected layer 131"), shown in the subsequent stage of FIG. 1, is applied.
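  • The two-stage fully connected head described above can be sketched as follows. All layer sizes, parameter names, and the dropout rate are hypothetical (the patent does not fix them), and Dropout is active only during training, here with inverted scaling so that inference needs no rescaling:

```python
import random

def dense(x, W, b):
    # Fully connected layer: y = W x + b (W as a list of rows)
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def score_head(feat, params, train=False, p_drop=0.5, rng=None):
    """Sketch of the fully connected layer 131: FC -> ReLU -> Dropout
    -> FC -> scalar estimated score. `params` holds hypothetical
    weights W1, b1, W2, b2 standing in for the score estimation parameters."""
    h = dense(feat, params["W1"], params["b1"])
    h = [v if v > 0.0 else 0.0 for v in h]  # ReLU layer
    if train:
        # Dropout layer: zero units with probability p_drop, rescale the rest
        rng = rng or random.Random(0)
        h = [0.0 if rng.random() < p_drop else v / (1 - p_drop) for v in h]
    return dense(h, params["W2"], params["b2"])[0]  # scalar score
```

  At inference the same function is called with train=False, which is how the score estimation device 2 described later would use the learned parameters.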
  • The parameter update unit 14 calculates the regression loss between each of the estimated scores s_1 to s_n and each of the true value scores g_1 to g_n based on the n true value scores g_1 to g_n output by the input unit 11, the n estimated scores s_1 to s_n estimated by the score estimation unit 13, and a predetermined first loss function.
  • As the first loss function, for example, the MSE (Mean Square Error) shown in the following equation (2) is applied to calculate the regression loss.
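  • Equation (2) is not reproduced in this text, but the MSE over a batch of n samples has a standard form; a minimal sketch:

```python
def mse_loss(s, g):
    """Equation (2), sketched: mean squared error between the estimated
    scores s_1..s_n and the true value scores g_1..g_n."""
    assert len(s) == len(g) and len(s) > 0
    return sum((si - gi) ** 2 for si, gi in zip(s, g)) / len(s)
```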
  • The parameter update unit 14 calculates, based on a second loss function defined in advance, a ranking loss indicating the degree of order error between the two estimated scores s_i and s_j corresponding to each of all combinations of two different moving image data v_i and v_j, in consideration of the magnitude of the difference between the two true value scores g_i and g_j.
  • As the second loss function, for example, the following equation (3), in which the margin value δ of equation (1) is replaced with the absolute value of the difference between the two true value scores g_i and g_j, is applied.
  • The sign(x) function is a function whose return value is the sign of the argument x; ReLU(x) returns x as the return value when the argument x is 0 or greater, and returns 0 when the argument x is smaller than 0.
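  • Equation (3) is likewise not reproduced in this text. Under the reading described above, that is, the fixed margin δ of equation (1) replaced with abs(g_i - g_j), a minimal sketch is:

```python
def relu(x):
    # ReLU(x): returns x when x >= 0, otherwise 0
    return x if x >= 0.0 else 0.0

def sign(x):
    # sign(x): +1, -1, or 0 depending on the sign of the argument x
    return (x > 0) - (x < 0)

def ranking_loss_adaptive(s, g):
    """Sketch of equation (3): the required separation between two
    estimated scores tracks the gap between their true value scores,
    instead of a fixed margin delta."""
    total, pairs = 0.0, 0
    for i in range(len(s)):
        for j in range(len(s)):
            if i == j:
                continue
            # The margin is now abs(g_i - g_j): no loss is incurred when
            # the estimated gap matches or exceeds the true gap with the
            # correct order.
            total += relu(-sign(g[i] - g[j]) * (s[i] - s[j])
                          + abs(g[i] - g[j]))
            pairs += 1
    return total / max(pairs, 1)
```

  Note the two behaviors claimed later in the text: when the estimated gap already matches the true gap with the correct order the loss is zero, and when the estimated gap is too small the penalty grows with abs(g_j - g_i).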
  • the parameter update unit 14 performs learning processing so as to reduce the calculated regression loss, that is, Loss1, which is the output value of the equation (2), and the calculated ranking loss, that is, Loss2, which is the output value of the equation (3).
  • the parameter update unit 14 calculates a new parameter for feature quantity extraction and a new parameter for score estimation by performing the learning process.
  • the parameter update unit 14 updates the contents of the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16 based on the calculated new feature amount extraction parameter and the new score estimation parameter.
  • FIG. 2 is a flowchart showing the flow of the learning process performed by the learning device 1.
  • the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16 store the initial value feature amount extraction parameters and the initial value score estimation parameters in advance, respectively.
  • The feature amount extraction unit 12 reads out the feature amount extraction parameter from the feature amount extraction parameter storage unit 15, and applies the read feature amount extraction parameter to the neural network of the moving image feature amount extraction layer 121, which is the first function approximator (step S1).
  • The score estimation unit 13 reads out the score estimation parameters from the score estimation parameter storage unit 16, and applies the read score estimation parameters to the neural network of the fully connected layer 131, which is the second function approximator (step S2).
  • The input unit 11 reads the first n training data from the training data storage unit 10. As shown in FIG. 3, the input unit 11 outputs the n moving image data v_1 to v_n included in the read n training data to the feature amount extraction unit 12 one by one. Further, the input unit 11 outputs the n true value scores g_1 to g_n included in the read training data to the parameter update unit 14. The parameter update unit 14 takes in the n true value scores g_1 to g_n output by the input unit 11 (step S3).
  • For each moving image data, the processing of steps S4 and S5 is repeated (loop L1s to L1e).
  • The feature amount extraction unit 12 gives the moving image data v_i to the moving image feature amount extraction layer 121 as input and, as shown in FIG. 3, acquires the feature amount of the moving image data v_i as the output of the moving image feature amount extraction layer 121.
  • The feature amount extraction unit 12 outputs the acquired feature amount of the moving image data v_i to the score estimation unit 13 (step S4).
  • The score estimation unit 13 gives the feature amount of the moving image data v_i to the fully connected layer 131 as input and, as shown in FIG. 3, acquires the estimated score s_i of the moving image data v_i as the output of the fully connected layer 131. The score estimation unit 13 outputs the acquired estimated score s_i of the moving image data v_i to the parameter update unit 14 (step S5).
  • The processing of steps S4 and S5 is performed n times, with each of the n moving image data v_1 to v_n given as input to the moving image feature amount extraction layer 121 and the fully connected layer 131, to which the same feature amount extraction parameters and the same score estimation parameters are applied, respectively.
  • The parameter update unit 14 calculates the regression loss Loss1 by equation (2) based on the n estimated scores s_1 to s_n and the n true value scores g_1 to g_n (step S6).
  • The parameter update unit 14 calculates the ranking loss Loss2 by equation (3) based on the n estimated scores s_1 to s_n and the n true value scores g_1 to g_n (step S7).
  • the parameter update unit 14 calculates the evaluation loss Loss by, for example, the following equation (4) (step S8).
  • λ1 and γ1 are constants arbitrarily determined to balance the two losses, with λ1 > 0 and γ1 > 0.
  • The remaining term is an L2-regularization term.
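  • Equation (4) can be sketched as a weighted sum of the two losses plus L2 regularization. The names lam1, gam1, and wd stand in for λ1, γ1, and the regularization coefficient, none of which equation (4) fixes to particular values:

```python
def evaluation_loss(loss1, loss2, params, lam1=1.0, gam1=1.0, wd=1e-4):
    """Sketch of equation (4): evaluation loss
    Loss = lam1 * Loss1 + gam1 * Loss2 + wd * ||params||^2,
    where Loss1 is the regression loss and Loss2 is the ranking loss.
    `params` is a flat list standing in for all network weights."""
    l2 = wd * sum(w * w for w in params)  # L2-regularization term
    return lam1 * loss1 + gam1 * loss2 + l2
```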
  • The parameter update unit 14 determines whether or not the calculated evaluation loss Loss satisfies the end condition (step S9). For example, when the evaluation loss Loss is less than a predetermined threshold value, it is determined that the evaluation loss satisfies the end condition.
  • When the parameter update unit 14 determines that the evaluation loss Loss satisfies the end condition (step S9, Yes), the parameter update unit 14 ends the process.
  • Otherwise, the parameter update unit 14 calculates a new feature amount extraction parameter and a new score estimation parameter by a learning process using, for example, the error backpropagation method, so as to reduce the regression loss Loss1 and the ranking loss Loss2.
  • the parameter update unit 14 writes the calculated new feature amount extraction parameter to the feature amount extraction parameter storage unit 15 and updates the feature amount extraction parameter.
  • the parameter update unit 14 writes the calculated new score estimation parameter to the score estimation parameter storage unit 16 to update the score estimation parameter (step S10).
  • In step S1 performed again, the feature amount extraction unit 12 reads the updated feature amount extraction parameter from the feature amount extraction parameter storage unit 15 and applies it to the moving image feature amount extraction layer 121. Further, in step S2 performed again, the score estimation unit 13 reads the updated score estimation parameter from the score estimation parameter storage unit 16 and applies it to the fully connected layer 131.
  • The input unit 11 reads out the next n training data from the training data storage unit 10 in step S3 performed again. When, in the course of this repetition, the processing of steps S4 and S5 has been performed for all the training data stored in the training data storage unit 10, the input unit 11 starts again from the first n training data and repeats reading from the training data storage unit 10 in order.
  • When the parameter update unit 14 determines in step S9 that the evaluation loss Loss satisfies the end condition, the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16 hold the learned feature amount extraction parameter and the learned score estimation parameter, respectively, in a state where the regression loss Loss1 and the ranking loss Loss2 are sufficiently reduced.
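  • The loop of steps S3 to S10 above can be illustrated with a deliberately tiny stand-in model: a single weight w replaces the feature extraction and score estimation parameters, the regression loss alone plays the role of the evaluation loss, and a numerical gradient stands in for backpropagation. The features, learning rate, and threshold here are all hypothetical:

```python
def train_toy(features, g, w=0.0, lr=0.01, eps=1e-6,
              threshold=1e-4, max_iters=10000):
    """Toy illustration of the training loop: estimate scores (S4-S5),
    compute the loss (S6-S8), check the end condition (S9), and update
    the parameter (S10) via a central-difference numerical gradient."""
    def loss_at(w):
        s = [w * f for f in features]  # S4-S5: estimated scores s_i = w * f_i
        return sum((si - gi) ** 2 for si, gi in zip(s, g)) / len(g)  # S6-S8
    for _ in range(max_iters):
        loss = loss_at(w)
        if loss < threshold:  # S9: end condition (loss below threshold)
            break
        grad = (loss_at(w + eps) - loss_at(w - eps)) / (2 * eps)
        w -= lr * grad        # S10: parameter update
    return w, loss_at(w)
```

  A real implementation would of course use backpropagation through the two networks, as the text states, rather than numerical differentiation.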
  • As described above, the parameter update unit 14 updates the parameters applied to the function approximators of the estimation unit 50 (the first function approximator and the second function approximator), that is, the feature amount extraction parameters and the score estimation parameters, by performing a learning process that reduces each of the regression loss, which is the output of the first loss function for obtaining the regression loss between each of the plurality of estimated scores estimated by the score estimation unit 13 and each of the true value scores corresponding to each of the estimated scores, and the ranking loss, which is the output of the second loss function that obtains a ranking loss indicating the degree of order error between two estimated scores based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data and corrects the ranking loss in consideration of the magnitude of the difference between the two true value scores.
  • As a result, the learning device 1 can learn the know-how regarding the official referee's method of scoring the athlete's competition more accurately than the technique described in Non-Patent Document 1.
  • In the learning device 1, equation (3) is used as the ranking loss instead of equation (1) adopted by the technique disclosed in Non-Patent Document 1.
  • The effect of equation (3) will be described below for each case.
  • When the magnitude relationship between the estimated scores s_i and s_j matches that between the true value scores g_i and g_j, using equation (3) rather than equation (1) makes it possible to perform a learning process that more accurately brings the absolute value of the difference between the estimated scores s_i and s_j closer to the absolute value of the difference between the true value scores g_i and g_j.
  • When the difference between the estimated scores s_i and s_j is small, using equation (3) makes it possible to increase the absolute value of the difference between the estimated scores s_i and s_j according to the magnitude of abs(g_j - g_i), so that a learning process that brings the absolute value of the difference between the estimated scores s_i and s_j closer to the absolute value of the difference between the true value scores g_i and g_j can be performed more accurately.
  • FIG. 4 is a block diagram showing an internal configuration of the score estimation device 2 according to the first embodiment.
  • the score estimation device 2 includes an input unit 11-1, an estimation unit 50, an output unit 17, a learned feature amount extraction parameter storage unit 18, and a learned score estimation parameter storage unit 19.
  • the estimation unit 50 includes a feature amount extraction unit 12 and a score estimation unit 13.
  • When the parameter update unit 14 determines "Yes" in step S9 shown in FIG. 2, that is, determines that the evaluation loss Loss satisfies the end condition, the learned feature amount extraction parameters and the learned score estimation parameters are recorded in the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16, respectively.
  • the learned feature amount extraction parameter storage unit 18 stores in advance the learned feature amount extraction parameters recorded in the feature amount extraction parameter storage unit 15 when the learning process of the learning device 1 is completed.
  • the learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 when the learning process of the learning device 1 is completed.
  • Input unit 11-1 takes in arbitrary video data given from the outside.
  • the input unit 11-1 outputs the captured moving image data to the feature amount extraction unit 12.
  • the feature amount extraction unit 12 reads the learned feature amount extraction parameters from the learned feature amount extraction parameter storage unit 18, and applies the learned feature amount extraction parameters read out to the moving image feature amount extraction layer 121.
  • the feature amount extraction unit 12 gives the moving image data output by the input unit 11-1 to the moving image feature amount extraction layer 121 as an input, acquires the feature amount of the moving image data as an output, and transfers the acquired feature amount to the score estimation unit 13. Output.
  • the score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19, and applies the learned score estimation parameters read out to the fully connected layer 131.
  • the score estimation unit 13 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer 131 as an input, acquires an estimated score as an output, and outputs the acquired estimated score to the output unit 17.
  • the output unit 17 outputs the estimated score output by the score estimation unit 13 to the outside.
  • The estimation unit 50 has function approximators (the first function approximator and the second function approximator) that approximate functions based on the learned parameters (the learned feature amount extraction parameters and the learned score estimation parameters) obtained by the learning process of the learning device 1.
  • As described above, the score estimation device 2 can obtain an estimated score for arbitrary moving image data based on the learned feature amount extraction parameters and the learned score estimation parameters obtained by the learning process of the learning device 1, which learns the know-how regarding the official referee's scoring method more accurately than the technique described in Non-Patent Document 1. It is therefore possible to obtain a more accurate estimated score.
  • FIG. 5 is a block diagram showing an internal configuration of the learning device 1a according to the second embodiment.
  • In FIG. 5, the same configurations as those of the learning device 1 of the first embodiment are designated by the same reference numerals, and only the configurations that differ are described below.
  • The learning device 1a includes a training data storage unit 10a, an input unit 11a, an estimation unit 50a, a parameter update unit 14a, a feature amount extraction parameter storage unit 15, a score estimation parameter storage unit 16, and a class estimation parameter storage unit 21.
  • the estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
  • the training data storage unit 10a stores in advance a plurality of training data in which each of the plurality of moving image data, each of the plurality of true value scores, and each of the plurality of true value class labels are combined.
  • a plurality of video data are classified into a plurality of predetermined classes based on the contents recorded in each video data.
  • the class is a type of competition having different scoring criteria such as high diving and gymnastics.
  • the true value class label is identification information indicating the class to which the corresponding moving image data belongs by classification.
  • the input unit 11a repeatedly reads n training data from the training data storage unit 10a.
  • n is an integer of 2 or more, and is a batch size when the learning process described below is performed.
  • The number of training data stored in the training data storage unit 10a is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
  • Any one of the video data included in the n training data is denoted v_i or v_j; the true value score corresponding to the moving image data v_i is denoted g_i, and the true value score corresponding to the moving image data v_j is denoted g_j.
  • The input unit 11a outputs the n moving image data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11a outputs the n true value scores g_1 to g_n and the n true value class labels k_1 to k_n included in the n read training data to the parameter update unit 14a.
  • the class estimation parameter storage unit 21 stores class estimation parameters that serve as weights and biases to be applied to the third function approximator of the class estimation unit 20.
  • The class estimation unit 20 has a third function approximator and applies the class estimation parameters stored in the class estimation parameter storage unit 21 to the third function approximator.
  • The third function approximator approximates the function corresponding to the class estimation parameters when those parameters are applied.
  • The class estimation unit 20 estimates the estimated class c_i by giving the feature amount extracted by the feature amount extraction unit 12 to the third function approximator as an input.
  • The estimated class c_i is information expressed as a probability for each class; by referring to the estimated class c_i, it is possible to identify which class the corresponding moving image data v_i most likely belongs to.
  • the third function approximator is an arbitrary neural network that estimates the estimation class from the features.
  • For example, a neural network consisting of a fully connected layer followed by a Softmax layer (hereinafter, "fully connected layer + Softmax layer 201") is applied.
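As an aside for the reader, the forward pass of such a "fully connected layer + Softmax layer" can be sketched in a few lines of numpy; the weight matrix, bias, and feature values below are invented placeholders, not parameters from this document.

```python
import numpy as np

def fully_connected_softmax(feature, weights, bias):
    """Fully connected layer followed by a Softmax layer: maps a feature
    vector of length D to a probability over Y classes."""
    logits = weights @ feature + bias       # fully connected layer
    z = np.exp(logits - logits.max())       # subtract max for numerical stability
    return z / z.sum()                      # Softmax layer

# Illustrative values (Y = 3 classes, D = 3 feature dimensions):
feature = np.array([0.2, -0.1, 0.4])
W = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, -0.5],
              [0.5, 0.5, 0.0]])
b = np.zeros(3)
c = fully_connected_softmax(feature, W, b)  # one membership probability per class
```

The output plays the role of the estimated class c_i: a vector of class membership probabilities that sums to 1.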
  • The parameter update unit 14a calculates the regression loss between each of the n estimated scores s_1 to s_n estimated by the score estimation unit 13 and each of the n true value scores g_1 to g_n output by the input unit 11a, based on the first loss function represented by the above equation (2).
  • The parameter update unit 14a calculates the class loss between each of the n estimated classes c_1 to c_n estimated by the class estimation unit 20 and each of the n true value class labels k_1 to k_n output by the input unit 11a, based on a predetermined third loss function.
  • For all combinations of two different moving image data v_i and v_j, the parameter update unit 14a calculates, from the corresponding two estimated scores s_i and s_j, the two true value scores g_i and g_j, and the two estimated classes c_i and c_j, the ranking loss indicating the degree of order error between the two estimated scores s_i and s_j, taking into account the magnitude of the difference between the two true value scores g_i and g_j and the correlation between the two estimated classes c_i and c_j.
  • As the fourth loss function, the loss function represented by the following equation (6) is applied.
  • Here, "correlation" is a correlation coefficient indicating the degree of similarity between the two estimated classes c_i and c_j.
  • Spearman's rank correlation coefficient obtained by the equation (7) is applied as the correlation coefficient.
  • Y is the number of classes as in equation (5).
  • CR_{i,y} is the rank of class y within the estimated class c_i.
  • For example, the class with the highest membership probability is ranked 1st; in the example, the probability of belonging to Class1 is ranked 2nd, and the probability of belonging to Class3 is ranked 3rd.
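A minimal numpy sketch of the ranks CR and of Spearman's rank correlation coefficient between two estimated class vectors; it assumes the common no-ties form ρ = 1 − 6Σd² / (Y(Y² − 1)), and the probability vectors are invented examples rather than values from the document.

```python
import numpy as np

def class_ranks(c):
    """CR: rank of each class within an estimated class vector c
    (1 = highest membership probability)."""
    order = np.argsort(-np.asarray(c))
    ranks = np.empty(len(c), dtype=int)
    ranks[order] = np.arange(1, len(c) + 1)
    return ranks

def spearman(c_i, c_j):
    """Spearman's rank correlation between two estimated class vectors
    (no-ties form: rho = 1 - 6*sum(d^2) / (Y*(Y^2 - 1)))."""
    y = len(c_i)
    d = (class_ranks(c_i) - class_ranks(c_j)).astype(float)
    return 1.0 - 6.0 * np.sum(d ** 2) / (y * (y ** 2 - 1))

# Illustrative probability vectors over Y = 3 classes:
c1 = [0.2, 0.5, 0.3]   # second class ranked 1st, third 2nd, first 3rd
c2 = [0.1, 0.6, 0.3]   # same class ranking as c1
c3 = [0.7, 0.1, 0.2]   # reversed class ranking relative to c1
```

Identical class rankings give +1 and reversed rankings give −1, so the coefficient can serve as a similarity weight between two estimated classes.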
  • The parameter update unit 14a performs the learning process so as to reduce the calculated regression loss, that is, Loss1, which is the output value of equation (2); the calculated class loss, that is, Loss3, which is the output value of equation (5); and the calculated ranking loss, that is, Loss4, which is the output value of equation (6).
  • the parameter update unit 14a calculates a new feature amount extraction parameter, a new score estimation parameter, and a new class estimation parameter by learning processing.
  • Based on the calculated new feature amount extraction parameter, new score estimation parameter, and new class estimation parameter, the parameter update unit 14a updates the contents of the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21.
  • FIG. 6 is a flowchart showing the flow of the learning process performed by the learning device 1a.
  • The feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 store an initial feature amount extraction parameter, an initial score estimation parameter, and an initial class estimation parameter, respectively, in advance.
  • In steps S21 and S22, the same processing as in steps S1 and S2 of the first embodiment shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13.
  • The class estimation unit 20 reads out the class estimation parameters from the class estimation parameter storage unit 21 and applies the read class estimation parameters to the neural network of the fully connected layer + Softmax layer 201, which is the third function approximator (step S23).
  • The input unit 11a reads the first n training data from the training data storage unit 10a. As shown in FIG. 7, the input unit 11a outputs the n moving image data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. Further, the input unit 11a outputs the n true value scores g_1 to g_n and the n true value class labels k_1 to k_n included in the read training data to the parameter update unit 14a.
  • The parameter update unit 14a takes in the n true value scores g_1 to g_n and the n true value class labels k_1 to k_n output by the input unit 11a (step S24).
  • For each of the n moving image data v_1 to v_n, the processing of steps S25, S26, and S27 is repeated (loop L2s to L2e).
  • In steps S25 and S26, the same processing as in steps S4 and S5 shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13.
  • In step S26, the score estimation unit 13 outputs the obtained estimated score s_i to the parameter update unit 14a.
  • The class estimation unit 20 gives the feature amount of the moving image data v_i to the fully connected layer + Softmax layer 201 as an input and, as shown in FIG. 7, acquires the estimated class c_i of the moving image data v_i as the output of the fully connected layer + Softmax layer 201. The class estimation unit 20 outputs the acquired estimated class c_i of the moving image data v_i to the parameter update unit 14a (step S27).
  • Within the loop, the same feature amount extraction parameter, the same score estimation parameter, and the same class estimation parameter are used in the moving image feature amount extraction layer 121, the fully connected layer 131, and the fully connected layer + Softmax layer 201, respectively.
  • The processes of steps S25, S26, and S27 are performed n times, with each of the n moving image data v_1 to v_n as an input.
  • In step S28, the same processing as in step S6 shown in FIG. 2 is performed by the parameter update unit 14a.
  • The parameter update unit 14a takes in the n estimated classes c_1 to c_n estimated by the class estimation unit 20.
  • Based on the n estimated classes c_1 to c_n taken in and the n true value class labels k_1 to k_n taken in at step S24, the parameter update unit 14a calculates the class loss Loss3 by equation (5) (step S29).
  • The parameter update unit 14a calculates the ranking loss Loss4 according to equation (6), based on the n estimated scores s_1 to s_n, the n true value scores g_1 to g_n, and the n estimated classes c_1 to c_n (step S30).
  • the parameter update unit 14a calculates the evaluation loss Loss by, for example, the following equation (8) (step S31).
  • α₂, β₂, and γ₂ are constants satisfying α₂ > 0, β₂ > 0, and γ₂ > 0, arbitrarily determined to balance the three losses. Further, the final term of equation (8) is an L2 regularization term.
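A hedged sketch of an evaluation loss in the style of equation (8): a weighted sum of Loss1, Loss3, and Loss4 plus an L2 regularization term over the parameters. The exact form of equation (8) is not reproduced in this excerpt, and the constants and parameter vector below are illustrative placeholders.

```python
import numpy as np

def evaluation_loss(loss1, loss3, loss4, params,
                    alpha2=1.0, beta2=1.0, gamma2=1.0, lam=1e-4):
    """Weighted sum of regression loss Loss1, class loss Loss3, and
    ranking loss Loss4 (alpha2, beta2, gamma2 > 0 balance the three
    losses), plus lam * ||params||^2 as an L2 regularization term."""
    l2 = lam * float(np.sum(np.square(params)))
    return alpha2 * loss1 + beta2 * loss3 + gamma2 * loss4 + l2

# Illustrative call with placeholder loss values and a tiny parameter vector:
loss = evaluation_loss(0.5, 0.2, 0.1, np.array([1.0, -2.0]))
```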
  • The parameter update unit 14a determines whether or not the calculated evaluation loss Loss satisfies the end condition (step S32). For example, when the evaluation loss Loss is less than a predetermined threshold value, it is determined that the end condition is satisfied.
  • When the parameter update unit 14a determines that the evaluation loss Loss satisfies the end condition (step S32, Yes), the parameter update unit 14a ends the process.
  • When the evaluation loss Loss does not satisfy the end condition, the parameter update unit 14a calculates a new feature amount extraction parameter, a new score estimation parameter, and a new class estimation parameter by a learning process using, for example, the error backpropagation method, so as to reduce the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4.
  • the parameter update unit 14a writes the calculated new feature amount extraction parameter to the feature amount extraction parameter storage unit 15 to update the feature amount extraction parameter.
  • the parameter update unit 14a writes the calculated new score estimation parameter to the score estimation parameter storage unit 16 to update the score estimation parameter.
  • the parameter update unit 14a writes the calculated new class estimation parameter to the class estimation parameter storage unit 21 to update the class estimation parameter (step S33).
  • In step S21 performed again, the feature amount extraction unit 12 reads the updated feature amount extraction parameter from the feature amount extraction parameter storage unit 15 and applies it to the moving image feature amount extraction layer 121.
  • In step S22 performed again, the score estimation unit 13 reads the updated score estimation parameter from the score estimation parameter storage unit 16 and applies it to the fully connected layer 131.
  • In step S23 performed again, the class estimation unit 20 reads the updated class estimation parameters from the class estimation parameter storage unit 21 and applies them to the fully connected layer + Softmax layer 201.
  • In step S24 performed again, the input unit 11a reads out the next n training data from the training data storage unit 10a. Once all training data have been read, the input unit 11a repeats reading from the training data storage unit 10a in order, starting again from the first n training data.
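The cyclic batch reading described above, taking n training data at a time and wrapping back to the first n once all n × m have been read, can be sketched as follows; the data values are illustrative.

```python
def cyclic_batches(training_data, n):
    """Yield batches of n training data in order, wrapping back to the
    start once all data have been read (assumes len(training_data) = n * m)."""
    i = 0
    while True:
        yield training_data[i:i + n]
        i = (i + n) % len(training_data)

data = list(range(6))                      # n * m items with n = 2, m = 3
gen = cyclic_batches(data, 2)
batches = [next(gen) for _ in range(4)]    # the 4th batch wraps to the start
```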
  • When the parameter update unit 14a determines in step S32 that the evaluation loss Loss satisfies the end condition, the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 respectively hold the learned feature amount extraction parameter, the learned score estimation parameter, and the learned class estimation parameter in a state where the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4 have become sufficiently small.
  • As described above, the parameter update unit 14a performs a learning process that reduces each of: the regression loss, which is the output of the first loss function and is obtained between each of the plurality of estimated scores estimated by the score estimation unit 13 and each of the true value scores corresponding to the estimated scores; the class loss, which is the output of the third loss function; and the ranking loss, which is the output of the fourth loss function and indicates the degree of order error between two estimated scores.
  • By this learning process, the parameter update unit 14a updates the parameters applied to the function approximators of the estimation unit 50a (the first function approximator, the second function approximator, and the third function approximator), that is, the feature amount extraction parameter, the score estimation parameter, and the class estimation parameter.
  • As a result, the learning device 1a can learn the know-how regarding the official referee's method of scoring an athlete's competition more accurately than the technique described in Non-Patent Document 1.
  • the learning device 1a of the second embodiment has the following effects in addition to the effects of the learning device 1 of the first embodiment.
  • That is, the constraint of the ranking loss can be strengthened for similar competitions and, conversely, weakened for competitions that are not similar.
  • Since the learning device 1a performs the learning process taking differences in competition type into account, it can learn the know-how regarding the official referee's scoring method more accurately than the learning device 1.
  • FIG. 8 is a block diagram showing an internal configuration of the score estimation device 2a according to the second embodiment.
  • the score estimation device 2a includes an input unit 11a-1, an estimation unit 50a, an output unit 17a, a learned feature amount extraction parameter storage unit 18, a learned score estimation parameter storage unit 19, and a learned class estimation parameter storage unit 22.
  • the estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
  • When the parameter update unit 14a determines "Yes" in step S32 shown in FIG. 6, that is, determines that the evaluation loss Loss satisfies the end condition, the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 each record the learned feature amount extraction parameter, the learned score estimation parameter, and the learned class estimation parameter.
  • the learned feature amount extraction parameter storage unit 18 stores in advance the learned feature amount extraction parameters recorded in the feature amount extraction parameter storage unit 15 when the learning process of the learning device 1a is completed.
  • the learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 when the learning process of the learning device 1a is completed.
  • The learned class estimation parameter storage unit 22 stores in advance the learned class estimation parameters recorded in the class estimation parameter storage unit 21 when the learning process of the learning device 1a is completed.
  • the input unit 11a-1 takes in arbitrary video data given from the outside.
  • the input unit 11a-1 outputs the captured moving image data to the feature amount extraction unit 12.
  • the feature amount extraction unit 12 reads the learned feature amount extraction parameters from the learned feature amount extraction parameter storage unit 18, and applies the learned feature amount extraction parameters read out to the moving image feature amount extraction layer 121.
  • the feature amount extraction unit 12 gives the moving image data output by the input unit 11a-1 to the moving image feature amount extraction layer 121 as an input, acquires the feature amount of the moving image data as an output, and transfers the acquired feature amount to the score estimation unit 13. Output.
  • the score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19, and applies the learned score estimation parameters read out to the fully connected layer 131.
  • the score estimation unit 13 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer 131 as an input, acquires an estimated score as an output, and outputs the acquired estimated score to the output unit 17a.
  • the class estimation unit 20 reads the learned class estimation parameters from the learned class estimation parameter storage unit 22, and applies the learned class estimation parameters read out to the fully connected layer + Softmax layer 201.
  • the class estimation unit 20 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer + Softmax layer 201 as an input, acquires an estimation class as an output, and outputs the acquired estimation class to the output unit 17a.
  • the output unit 17a outputs the estimated score output by the score estimation unit 13 to the outside, and outputs the estimation class output by the class estimation unit 20 to the outside.
  • the class estimation unit 20 and the learned class estimation parameter storage unit 22 may not be provided.
  • As described above, the estimation unit 50a of the score estimation device 2a has function approximators (the first function approximator, the second function approximator, and the third function approximator) that approximate functions based on the learned parameters obtained by the learning process of the learning device 1a (the learned feature amount extraction parameters, the learned score estimation parameters, and the learned class estimation parameters). By giving moving image data to the function approximators as an input, the estimation unit 50a estimates the estimated score of the moving image data.
  • The formula (4) for calculating the evaluation loss Loss in the first embodiment and the formula (8) for calculating the evaluation loss Loss in the second embodiment are examples.
  • In the first embodiment, an arbitrary formula that balances the regression loss and the ranking loss may be applied, and in the second embodiment, an arbitrary formula that balances the regression loss, the ranking loss, and the class loss may be applied.
  • Although Cross Entropy Loss is applied as the third loss function, another function may be applied as the third loss function.
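For reference, a generic Cross Entropy Loss over one-hot true value class labels can be sketched as follows; this is the standard formulation and is not claimed to be identical to the document's equation (5). The sample values are invented.

```python
import numpy as np

def cross_entropy_loss(estimated_classes, true_labels, eps=1e-12):
    """Mean cross entropy between estimated class probability vectors
    and one-hot true value class labels."""
    p = np.clip(np.asarray(estimated_classes, dtype=float), eps, 1.0)
    t = np.asarray(true_labels, dtype=float)
    return float(-np.mean(np.sum(t * np.log(p), axis=1)))

# Illustrative: two samples over Y = 2 classes.
c = [[0.9, 0.1], [0.2, 0.8]]   # estimated classes
k = [[1, 0], [0, 1]]           # true value class labels (one-hot)
loss3 = cross_entropy_loss(c, k)
```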
  • Although Spearman's rank correlation coefficient shown in equation (7) is applied as the correlation coefficient "correlation" of equation (6), another correlation coefficient may be applied.
  • In the above embodiments, the training data storage units 10 and 10a are provided inside the learning devices 1 and 1a, but they may be provided outside the learning devices 1 and 1a. Likewise, the learned feature amount extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 may be provided outside the score estimation devices 2 and 2a.
  • Since the training data storage units 10 and 10a, the learned feature amount extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 are storage units that retain the data they store, it is desirable to implement them with a non-volatile storage area.
  • Since the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 are storage units for temporarily storing data, either a non-volatile storage area or a volatile storage area may be applied.
  • As the first function approximator, the second function approximator, and the third function approximator shown in the first and second embodiments described above, a neural network having a configuration other than the ones described above may be applied, or other means capable of the learning processing used in machine learning may be applied.
  • In the first embodiment, the first function approximator and the second function approximator may be integrated to form one function approximator, and in the second embodiment, the first function approximator, the second function approximator, and the third function approximator may be integrated to form one function approximator.
  • the learning devices 1, 1a and the score estimation devices 2, 2a in the above-described embodiment may be realized by a computer.
  • the program for realizing this function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read by the computer system and executed.
  • the term "computer system” as used herein includes hardware such as an OS and peripheral devices.
  • the "computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built in a computer system.
  • The "computer-readable recording medium" may also include a medium that dynamically holds the program for a short period of time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. Further, the above program may realize a part of the functions described above, may realize the functions described above in combination with a program already recorded in the computer system, or may be realized by using a programmable logic device such as an FPGA (Field Programmable Gate Array).
  • 1 Learning device, 10 ... Training data storage unit, 11 ... Input unit, 12 ... Feature quantity extraction unit, 13 ... Score estimation unit, 14 ... Parameter update unit, 15 ... Feature quantity extraction parameter storage unit, 16 ... Score estimation Parameter storage unit, 50 ... estimation unit

Abstract

On the basis of a first loss function that obtains a regression loss between each of a plurality of estimated scores having been estimated by giving a plurality of moving-image data, in which motion of athletes during competition are recorded, as input to a function approximator to which parameters are applied and each of true-value scores corresponding to each of the estimated scores and having been scored by a referee, and a second loss function that obtains, on the basis of two estimated scores and two true-value scores corresponding to each of all combinations of two different moving-image data, a ranking loss that indicates the degree of a sequence error between the two estimated scores, the second loss function correcting the ranking loss taking into account the magnitude of a difference between the two true-value scores, the present invention carries out a learning process of reducing each of the regression loss that is the output of the first loss function and the ranking loss that is the output of the second loss function and thereby updates the parameters.

Description

Learning device, learning method and learning program, and score estimation device, score estimation method and score estimation program
 The present invention relates to, for example, a learning device, a learning method, and a learning program for learning know-how regarding a method of scoring an athlete's competition, and to a score estimation device, a score estimation method, and a score estimation program for estimating a competition score based on the learning results.
 In sports, there are competitions, such as high diving and gymnastics, in which official referees score the performances of athletes and the ranking of each performance is determined based on the scores. Such competitions have quantitative scoring criteria.
 In recent years, studies have advanced on techniques for activity quality assessment in the computer vision field, such as automatically estimating scores in such competitions; a technique known as AQA (Action Quality Assessment) is one of them. For example, Non-Patent Document 1 discloses a method of performing AQA using deep learning.
 In the technique disclosed in Non-Patent Document 1, moving image data in which an athlete's performance is recorded and a true value score obtained by an official referee scoring that performance are taken in as training data. Next, a deep neural network is used to extract a feature amount from the moving image data included in the training data. An estimated score is then estimated from the extracted feature amount.
 In the technique disclosed in Non-Patent Document 1, the loss between the estimated score and the true value score included in the training data is calculated, and, based on the calculated loss, the weights and biases of the deep neural network are repeatedly updated so as to reduce the loss. This allows the know-how of the scoring method performed by official referees to be learned, and the score of a performance by an arbitrary athlete can then be estimated by using the deep neural network to which the learned weights and biases are applied.
 In the technique disclosed in Non-Patent Document 1, in addition to a regression loss indicating the loss between the estimated score and the true value score, a ranking loss is adopted with the aim of improving the accuracy of the ordering among the obtained estimated scores. When learning is performed using only the regression loss, for moving image data whose true value scores are close, the order of the estimated scores may be swapped relative to the order of the true value scores due to errors in score estimation. To solve this problem, Non-Patent Document 1 adopts the ranking loss shown in the following equation (1), lowering the probability of such errors and achieving accuracy exceeding that of the prior art.
  ranking loss = Σ_{i,j} ReLU( −(s_j − s_i) · sign(g_j − g_i) + δ )   …(1)
 Let v_i denote any one of the moving image data. In equation (1), g_i is the true value score of the moving image data v_i, s_i is the estimated score obtained from the moving image data v_i, and the sign(x) function returns the sign of its argument x. The term −(s_j − s_i)sign(g_j − g_i) in equation (1) takes a negative value when the order relation of the estimated scores s_i and s_j matches that of the true value scores g_i and g_j, and a positive value when they do not match.
 ReLU(x) is a function that returns x when the argument x is 0 or more and returns 0 when x is less than 0. δ is a margin value and is positive. Therefore, when the order relation of the estimated scores s_i and s_j does not match that of the true value scores g_i and g_j, the ranking loss increases as the absolute value of the difference between the estimated scores s_i and s_j increases.
 The margin value δ has the effect of separating the two estimated scores s_i and s_j, when their difference is small, so that they differ by at least an amount corresponding to the margin value δ. Therefore, even when the order relation of the estimated scores s_i and s_j matches that of the true value scores g_i and g_j, a ranking loss arises according to the magnitude of the margin value δ.
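The per-pair term of equation (1) can be sketched as follows; the summation over all pairs is omitted and the margin value δ = 0.5 is an illustrative choice. The third call shows the margin behavior discussed above: even a correctly ordered pair incurs a loss when its estimated scores differ by less than δ.

```python
def relu(x):
    """ReLU: x if x >= 0, else 0."""
    return x if x > 0 else 0.0

def sign(x):
    """Sign of x: +1, -1, or 0."""
    return (x > 0) - (x < 0)

def pairwise_ranking_loss(s_i, s_j, g_i, g_j, delta=0.5):
    """ReLU(-(s_j - s_i) * sign(g_j - g_i) + delta) for one pair of
    estimated scores (s_i, s_j) and true value scores (g_i, g_j)."""
    return relu(-(s_j - s_i) * sign(g_j - g_i) + delta)

l0 = pairwise_ranking_loss(1.0, 2.0, 10.0, 20.0)   # correct order, wide gap
l1 = pairwise_ranking_loss(2.0, 1.0, 10.0, 20.0)   # swapped order
l2 = pairwise_ranking_loss(1.0, 1.2, 10.0, 20.0)   # correct order, gap < delta
```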
 However, when equation (1) is used as the ranking loss, the margin value δ is a predetermined fixed value, so the same margin value δ is applied to every combination of moving image data v_i and moving image data v_j.
 As described above, the margin value δ is a parameter adopted so that, when the difference between the two estimated scores s_i and s_j is small, the scores come to differ by at least an amount corresponding to the margin value.
 However, even when the distance between the true value scores g_i and g_j is smaller than the margin value δ, the addition of the margin value δ in equation (1) causes learning that separates the estimated scores s_i and s_j by an extra amount corresponding to the margin value δ.
 In view of the above circumstances, an object of the present invention is to provide a technique capable of learning know-how regarding a method of scoring an athlete's competition more accurately than the conventional technique and of obtaining a more accurate estimated score.
 One aspect of the present invention is a learning device comprising: an input unit that takes in training data combining moving image data, in which an athlete's movements during a competition are recorded, with a plurality of true value scores, which are scores given by a referee to the competition recorded in the moving image data; an estimation unit that has a function approximator for approximating a function based on parameters and that estimates an estimated score of the moving image data by giving the moving image data taken in by the input unit to the function approximator as an input; and a parameter update unit that updates the parameters by performing a learning process that reduces each of a regression loss and a ranking loss, where the regression loss is the output of a first loss function that obtains the regression loss between each of the plurality of estimated scores and each of the true value scores corresponding to the estimated scores, and the ranking loss is the output of a second loss function that obtains, based on the two estimated scores and the two true value scores corresponding to each of all combinations of two different moving image data, the ranking loss indicating the degree of order error between the two estimated scores, the second loss function correcting the ranking loss in consideration of the magnitude of the difference between the two true value scores.
 Another aspect of the present invention is a score estimation device comprising: an input unit that takes in moving image data in which an athlete's movements during a competition are recorded; and an estimation unit that has a function approximator for approximating a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2, and that estimates an estimated score of the moving image data by giving the moving image data taken in by the input unit to the function approximator as an input.
Another aspect of the present invention is a learning method comprising: taking in training data combining video data, which records an athlete's movements during a competition, with a plurality of true scores, which are scores given by judges to the competition recorded in the video data; estimating an estimated score for the video data by feeding the taken-in video data to a function approximator that approximates a function based on parameters; and updating the parameters by performing a learning process that reduces both the regression loss output by a first loss function, which computes the regression loss between each of the plurality of estimated scores and the true score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each of all combinations of two different video data items, computes from the two corresponding estimated scores and two true scores a ranking loss indicating the degree of ordering error between the two estimated scores and corrects that ranking loss in consideration of the magnitude of the difference between the two true scores.
Another aspect of the present invention is a learning program for causing a computer to function as: input means for taking in training data combining video data, which records an athlete's movements during a competition, with a plurality of true scores, which are scores given by judges to the competition recorded in the video data; estimation means having a function approximator that approximates a function based on parameters, the estimation means estimating an estimated score for the video data by feeding the video data taken in by the input means to the function approximator as input; and parameter update means for updating the parameters by performing a learning process that reduces both the regression loss output by a first loss function, which computes the regression loss between each of the plurality of estimated scores and the true score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each of all combinations of two different video data items, computes from the two corresponding estimated scores and two true scores a ranking loss indicating the degree of ordering error between the two estimated scores and corrects that ranking loss in consideration of the magnitude of the difference between the two true scores.
Another aspect of the present invention is a score estimation method comprising: taking in video data recording an athlete's movements during a competition; and estimating an estimated score for the video data by feeding the taken-in video data to a function approximator that approximates a function based on the trained parameters obtained by the learning process of the learning device according to claim 1 or claim 2.
Another aspect of the present invention is a learning program for causing a computer to function as: input means for taking in training data combining video data, which records an athlete's movements during a competition, with a plurality of true scores, which are scores given by judges to the competition recorded in the video data; estimation means having a function approximator that approximates a function based on parameters, the estimation means estimating an estimated score for the video data by feeding the video data taken in by the input means to the function approximator as input; and parameter update means for updating the parameters by performing a learning process that reduces both the regression loss output by a first loss function, which computes the regression loss between each of the plurality of estimated scores and the true score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each of all combinations of two different video data items, computes from the two corresponding estimated scores and two true scores a ranking loss indicating the degree of ordering error between the two estimated scores and corrects that ranking loss in consideration of the magnitude of the difference between the two true scores.
According to the present invention, it is possible to learn the know-how of how judges score athletes' competitions more accurately than the conventional technique, and to obtain more accurate estimated scores.
FIG. 1 is a block diagram showing the internal configuration of the learning device of the first embodiment.
FIG. 2 is a flowchart showing the flow of processing by the learning device of the first embodiment.
FIG. 3 is a diagram outlining the processing by the learning device of the first embodiment.
FIG. 4 is a block diagram showing the internal configuration of the score estimation device of the first embodiment.
FIG. 5 is a block diagram showing the internal configuration of the learning device of the second embodiment.
FIG. 6 is a flowchart showing the flow of processing by the learning device of the second embodiment.
FIG. 7 is a diagram outlining the processing by the learning device of the second embodiment.
FIG. 8 is a block diagram showing the internal configuration of the score estimation device of the second embodiment.
(First Embodiment)
Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the internal configuration of the learning device 1 according to the first embodiment. The learning device 1 includes a training data storage unit 10, an input unit 11, an estimation unit 50, a parameter update unit 14, a feature extraction parameter storage unit 15, and a score estimation parameter storage unit 16. The estimation unit 50 includes a feature extraction unit 12 and a score estimation unit 13.
The training data storage unit 10 stores in advance a plurality of training data items, each combining one of a plurality of video data items with one of a plurality of true scores.
Each of the video data items is generated, for example, by capturing an athlete's movements during a competition with a camera or the like. Here, a competition is a sporting event for which quantitative scoring criteria exist for its techniques, such as high diving or gymnastics. An athlete is, for example, a competitor who performs in such an event.
Each of the true scores is a score given in advance by an official judge to the athlete's performance recorded in the corresponding video data.
The input unit 11 repeatedly reads training data items from the training data storage unit 10, n items at a time. Here, n is an integer of 2 or more and is the batch size used in the learning process described below. The number of training data items stored in the training data storage unit 10 is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
In the following description, arbitrary video data items among the n training data items are denoted v_i and v_j, and the true scores corresponding to v_i and v_j are denoted g_i and g_j, respectively, where i and j are integers with i = 1 to n, j = 1 to n, and j > i.
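The index pairs just described (all combinations with j > i) can be enumerated directly; a minimal sketch in Python, with a hypothetical batch size of n = 4:

```python
from itertools import combinations

# Enumerate all index pairs (i, j) with j > i for a batch of size n;
# these are the combinations over which the ranking loss is evaluated.
n = 4  # hypothetical batch size
pairs = list(combinations(range(1, n + 1), 2))
# 4 choose 2 = 6 pairs: (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)
```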
The input unit 11 outputs the n video data items v_1 to v_n contained in the n read training data items to the feature extraction unit 12 one at a time. The input unit 11 also outputs the n true scores g_1 to g_n contained in the n read training data items to the parameter update unit 14.
The feature extraction parameter storage unit 15 stores feature extraction parameters, that is, the weights and biases applied to the first function approximator of the feature extraction unit 12. The feature extraction unit 12 has a first function approximator and applies the feature extraction parameters stored in the feature extraction parameter storage unit 15 to it. With these parameters applied, the first function approximator approximates the function determined by the feature extraction parameters. The feature extraction unit 12 extracts the features of the video data v_i by feeding the video data v_i output by the input unit 11 to the first function approximator as input.
Here, the first function approximator is an arbitrary neural network that extracts features from the video data v_i; for example, the neural network shown in Fig. 1 of Non-Patent Document 1, which has two convolutional stages each followed by a ReLU (Rectified Linear Unit) layer and a Max-Pooling layer (hereinafter referred to as the "video feature extraction layer 121"), may be applied.
The score estimation parameter storage unit 16 stores score estimation parameters, that is, the weights and biases applied to the second function approximator of the score estimation unit 13. The score estimation unit 13 has a second function approximator and applies the score estimation parameters stored in the score estimation parameter storage unit 16 to it. With these parameters applied, the second function approximator approximates the function determined by the score estimation parameters. The score estimation unit 13 estimates the estimated score s_i by feeding the features extracted by the feature extraction unit 12 to the second function approximator as input.
Here, the second function approximator is an arbitrary neural network that estimates an estimated score from the features; for example, the neural network shown in Fig. 1 of Non-Patent Document 1, which has two fully connected stages each followed by a ReLU layer and a Dropout layer (hereinafter referred to as the "fully connected layer 131"), may be applied.
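Taken together, the estimation unit 50 computes s_i by chaining the two approximators: the first maps the video data v_i to features, and the second maps those features to a score. The following is a minimal illustrative sketch only, with simple linear maps standing in for the convolutional and fully connected networks; all values, shapes, and names are hypothetical:

```python
def extract_features(video, w1):
    # Stand-in for the video feature extraction layer 121: maps the
    # video data (here just a list of numbers) to a feature vector.
    return [sum(w * x for w, x in zip(row, video)) for row in w1]

def estimate_score(features, w2):
    # Stand-in for the fully connected layer 131: maps the feature
    # vector to a single estimated score s_i.
    return sum(w * f for w, f in zip(w2, features))

video = [0.5, 1.0, 1.5]       # toy video data v_i (hypothetical values)
w1 = [[1.0, 0.0, 0.0],        # feature extraction parameters
      [0.0, 1.0, 1.0]]
w2 = [2.0, 1.0]               # score estimation parameters
features = extract_features(video, w1)
score = estimate_score(features, w2)
# features == [0.5, 2.5], score == 3.5
```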
The parameter update unit 14 computes the regression loss between each of the estimated scores s_1 to s_n and each of the true scores g_1 to g_n, based on the n true scores g_1 to g_n output by the input unit 11, the n estimated scores s_1 to s_n estimated by the score estimation unit 13, and a predetermined first loss function.
Here, as the first loss function, for example, the MSE (Mean Square Error) shown in the following Equation (2), which computes the regression loss, is applied.
Loss1 = (1/n) · Σ_{i=1}^{n} (s_i − g_i)²   … (2)
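As a concrete sketch of Equation (2), the regression loss Loss1 can be computed as follows (pure Python; the argument names and score values are illustrative):

```python
def regression_loss(estimated, true):
    # Eq. (2): mean squared error between the estimated scores s_1..s_n
    # and the judge-given true scores g_1..g_n.
    n = len(estimated)
    return sum((s - g) ** 2 for s, g in zip(estimated, true)) / n

# Example: two videos, each estimate off by 0.5 points
loss1 = regression_loss([7.0, 8.5], [7.5, 8.0])
# loss1 == 0.25
```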
The parameter update unit 14 also computes, for each of all combinations of two different video data items v_i and v_j, a ranking loss indicating the degree of ordering error between the two estimated scores s_i and s_j, based on the two estimated scores s_i, s_j and the two true scores g_i, g_j corresponding to that combination and on a predetermined second loss function, taking into account the magnitude of the difference between the two true scores g_i and g_j.
Here, as the second loss function, the loss function shown in the following Equation (3) is applied.
Loss2 = Σ_{i=1}^{n} Σ_{j=i+1}^{n} ReLU( −(s_j − s_i) · sign(g_j − g_i) + |g_j − g_i| )   … (3)
Compared with Equation (1), which was adopted in the technique described in Non-Patent Document 1, Equation (3) replaces the margin value δ with the absolute value of the difference between the two true scores g_i and g_j. As in Equation (1), in Equation (3) the sign(x) function returns the sign of its argument x, and ReLU(x) returns x when the argument x is 0 or more and returns 0 when x is less than 0.
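A sketch of Equation (3) follows; it sums the per-pair ReLU terms over all combinations with j > i. Whether the sum is additionally normalized by the number of pairs is not stated here, so the plain sum is shown:

```python
def ranking_loss(estimated, true):
    # Eq. (3): for every pair j > i, penalise an ordering error between
    # the estimated scores; the margin is the true-score gap |g_j - g_i|
    # rather than the fixed margin delta of Eq. (1).
    def sign(x):
        return (x > 0) - (x < 0)
    n = len(estimated)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            term = (-(estimated[j] - estimated[i]) * sign(true[j] - true[i])
                    + abs(true[j] - true[i]))
            loss += max(0.0, term)  # ReLU
    return loss

# Correct order with gaps matching the true gaps: zero loss
# ranking_loss([1.0, 3.0], [1.0, 3.0]) == 0.0
# Correct order but the estimated gap (1.0) is smaller than the
# true gap (2.0): a residual loss of 1.0 remains
# ranking_loss([1.0, 2.0], [1.0, 3.0]) == 1.0
```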
The parameter update unit 14 performs a learning process so as to reduce the computed regression loss, that is, Loss1, the output value of Equation (2), and the computed ranking loss, that is, Loss2, the output value of Equation (3). Through this learning process, the parameter update unit 14 computes new feature extraction parameters and new score estimation parameters.
Based on the newly computed feature extraction parameters and score estimation parameters, the parameter update unit 14 updates the contents of the feature extraction parameter storage unit 15 and the score estimation parameter storage unit 16.
(Processing by the learning device of the first embodiment)
Next, the processing by the learning device 1 of the first embodiment will be described with reference to FIGS. 2 and 3. FIG. 2 is a flowchart showing the flow of the learning process performed by the learning device 1.
Initial feature extraction parameters and initial score estimation parameters are stored in advance in the feature extraction parameter storage unit 15 and the score estimation parameter storage unit 16, respectively.
The feature extraction unit 12 reads the feature extraction parameters from the feature extraction parameter storage unit 15 and applies them to the neural network of the video feature extraction layer 121, which is the first function approximator (step S1).
The score estimation unit 13 reads the score estimation parameters from the score estimation parameter storage unit 16 and applies them to the neural network of the fully connected layer 131, which is the second function approximator (step S2).
The input unit 11 reads the first n training data items from the training data storage unit 10. As shown in FIG. 3, the input unit 11 outputs the n video data items v_1 to v_n contained in the n read training data items to the feature extraction unit 12 one at a time. The input unit 11 also outputs the n true scores g_1 to g_n contained in the read training data to the parameter update unit 14. The parameter update unit 14 takes in the n true scores g_1 to g_n output by the input unit 11 (step S3).
The processing of steps S4 and S5 is repeated for each video data item v_i of the n video data items v_1 to v_n (loop L1s to L1e).
As shown in FIG. 3, the feature extraction unit 12 feeds the video data v_i to the video feature extraction layer 121 as input and obtains the features of the video data v_i as the output of the video feature extraction layer 121. The feature extraction unit 12 outputs the obtained features of the video data v_i to the score estimation unit 13 (step S4).
As shown in FIG. 3, the score estimation unit 13 feeds the features of the video data v_i to the fully connected layer 131 as input and obtains the estimated score s_i of the video data v_i as the output of the fully connected layer 131. The score estimation unit 13 outputs the obtained estimated score s_i of the video data v_i to the parameter update unit 14 (step S5).
That is, as shown in FIG. 3, with the same feature extraction parameters and the same score estimation parameters applied to the video feature extraction layer 121 and the fully connected layer 131, respectively, the processing of steps S4 and S5 is performed n times, once for each of the n video data items v_1 to v_n.
When the parameter update unit 14 has taken in the n estimated scores s_1 to s_n estimated by the score estimation unit 13, it computes the regression loss Loss1 by Equation (2) from those n estimated scores s_1 to s_n and the n true scores g_1 to g_n taken in at step S3 (step S6).
The parameter update unit 14 computes the ranking loss Loss2 by Equation (3) from the n estimated scores s_1 to s_n and the n true scores g_1 to g_n (step S7).
The parameter update unit 14 computes an evaluation loss Loss, for example, by the following Equation (4) (step S8).
Loss = α · Loss1 + β · Loss2 + ||ω||₂   … (4)
In Equation (4) above, α and β are constants with α > 0 and β > 0, set arbitrarily so as to balance the two losses. ||ω||₂ is an L2-regularization term.
The parameter update unit 14 determines whether the computed evaluation loss Loss satisfies a termination condition (step S9). For example, when the evaluation loss Loss is less than a predetermined threshold, it determines that the evaluation loss satisfies the termination condition.
When the parameter update unit 14 determines that the evaluation loss Loss satisfies the termination condition (step S9, Yes), it ends the processing. On the other hand, when it determines that the evaluation loss Loss does not satisfy the termination condition (step S9, No), it computes new feature extraction parameters and new score estimation parameters by a learning process, for example error backpropagation, so as to reduce the regression loss Loss1 and the ranking loss Loss2.
The parameter update unit 14 writes the newly computed feature extraction parameters into the feature extraction parameter storage unit 15 to update the feature extraction parameters, and writes the newly computed score estimation parameters into the score estimation parameter storage unit 16 to update the score estimation parameters (step S10).
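Steps S6 to S10 amount to evaluating Loss1 and Loss2 and adjusting the parameters so that both decrease. The sketch below illustrates this loop on a deliberately tiny stand-in model: a single weight w replaces the two networks, the L2 regularization term of Equation (4) is omitted, and a numerical gradient stands in for error backpropagation; all names and values are hypothetical:

```python
def losses(scores, true_scores):
    # Loss1 per Eq. (2) and Loss2 per Eq. (3)
    n = len(scores)
    loss1 = sum((s - g) ** 2 for s, g in zip(scores, true_scores)) / n
    loss2 = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            sgn = ((true_scores[j] > true_scores[i])
                   - (true_scores[j] < true_scores[i]))
            loss2 += max(0.0, -(scores[j] - scores[i]) * sgn
                         + abs(true_scores[j] - true_scores[i]))
    return loss1, loss2

def evaluation_loss(w, xs, true_scores, alpha=1.0, beta=1.0):
    # Eq. (4) without the L2 term; the "network" is a single weight w
    # applied to a one-dimensional feature x (purely illustrative).
    scores = [w * x for x in xs]
    loss1, loss2 = losses(scores, true_scores)
    return alpha * loss1 + beta * loss2

def update_step(w, xs, true_scores, lr=0.01, eps=1e-6):
    # Numerical-gradient stand-in for error backpropagation (step S10).
    grad = (evaluation_loss(w + eps, xs, true_scores)
            - evaluation_loss(w - eps, xs, true_scores)) / (2 * eps)
    return w - lr * grad

xs = [1.0, 2.0, 3.0]
gs = [2.0, 4.0, 6.0]  # true scores, perfectly fit by w = 2
w = 0.0
for _ in range(200):
    w = update_step(w, xs, gs)
# w converges to the neighbourhood of 2, where Loss1 and Loss2 vanish
```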
Thereafter, the processing from step S1 is repeated. In step S1 performed again, the feature extraction unit 12 reads the updated feature extraction parameters from the feature extraction parameter storage unit 15 and applies them to the video feature extraction layer 121. In step S2 performed again, the score estimation unit 13 reads the updated score estimation parameters from the score estimation parameter storage unit 16 and applies them to the fully connected layer 131.
In step S3 performed again, the input unit 11 reads the next n training data items from the training data storage unit 10. When, in the course of the repeated processing, the processing of steps S4 and S5 has been performed for all the training data stored in the training data storage unit 10, the input unit 11 starts over from the first n training data items and repeats reading from the training data storage unit 10 in order.
At the point when the parameter update unit 14 determines in step S9 that the evaluation loss Loss satisfies the termination condition, the feature extraction parameter storage unit 15 and the score estimation parameter storage unit 16 hold, respectively, the trained feature extraction parameters and the trained score estimation parameters in a state in which the regression loss Loss1 and the ranking loss Loss2 have become sufficiently small.
In the learning device 1 of the first embodiment described above, the parameter update unit 14 updates the parameters applied to the function approximators of the estimation unit 50 (the first and second function approximators), that is, the feature extraction parameters and the score estimation parameters, by performing a learning process that reduces both the regression loss output by the first loss function, which computes the regression loss between each of the plurality of estimated scores estimated by the score estimation unit 13 and the corresponding true score, and the ranking loss output by the second loss function, which, for each of all combinations of two different video data items, computes from the two corresponding estimated scores and two true scores a ranking loss indicating the degree of ordering error between the two estimated scores and corrects that ranking loss in consideration of the magnitude of the difference between the two true scores. By using the second loss function, as shown below, the learning device 1 can learn the official judges' know-how for scoring athletes' competitions more accurately than the technique described in Non-Patent Document 1.
That is, the learning device 1 of the first embodiment described above uses Equation (3) as the ranking loss instead of Equation (1), which was adopted by the technique disclosed in Non-Patent Document 1. The effect of Equation (3) is described below, case by case.
(When the order of the estimated scores s_i, s_j matches the order of the true scores g_i, g_j)
In this case, the term −(s_j − s_i) · sign(g_j − g_i) in Equations (1) and (3) takes a negative value.
In this case, when Equation (1) is applied, the input to the ReLU function becomes positive whenever abs(s_j − s_i) < margin value δ, so a ranking loss arises and a learning process that reduces this ranking loss is performed. Since the order of the estimated scores s_i, s_j already matches the order of the true scores g_i, g_j, the learning process performed here is not one that swaps the order of the estimated scores s_i, s_j, but one that pushes the estimated scores s_i and s_j further apart.
By contrast, when Equation (3) is applied, the input to the ReLU function becomes positive when abs(s_j − s_i) < abs(g_j − g_i), so a ranking loss arises. When abs(g_j − g_i) < margin value δ, the ranking loss of Equation (1) is larger than that of Equation (3), so using Equation (1) results in a learning process that makes the absolute difference between the estimated scores s_i, s_j larger than the absolute difference between the true scores g_i, g_j.
On the other hand, when abs(g_j − g_i) > margin value δ, the ranking loss of Equation (1) is smaller than that of Equation (3), so using Equation (1) results in a learning process that makes the absolute difference between the estimated scores s_i, s_j smaller than the absolute difference between the true scores g_i, g_j.
Therefore, when the order of the estimated scores s_i, s_j matches the order of the true scores g_i, g_j, using Equation (3) rather than Equation (1) enables a learning process that more accurately brings the absolute difference between the estimated scores s_i, s_j toward the absolute difference between the true scores g_i, g_j.
(When the order of the estimated scores s_i, s_j does not match the order of the true scores g_i, g_j)
In this case, the term −(s_j − s_i) · sign(g_j − g_i) in Equations (1) and (3) takes a positive value, so both the margin value δ and abs(g_j − g_i) serve to increase the ranking loss. In this case as well, when the difference between the estimated scores s_i, s_j is small, using Equation (3) increases the absolute difference between the estimated scores s_i, s_j according to the magnitude of abs(g_j − g_i), so a learning process that brings the absolute difference between the estimated scores s_i, s_j closer to the absolute difference between the true scores g_i, g_j can be performed more accurately.
(Score estimation device of the first embodiment)
 FIG. 4 is a block diagram showing the internal configuration of the score estimation device 2 according to the first embodiment. In FIG. 4, the same components as those of the learning device 1 shown in FIG. 1 are denoted by the same reference numerals. The score estimation device 2 includes an input unit 11-1, an estimation unit 50, an output unit 17, a learned feature amount extraction parameter storage unit 18, and a learned score estimation parameter storage unit 19. The estimation unit 50 includes a feature amount extraction unit 12 and a score estimation unit 13.
 As described above, when the parameter update unit 14 makes the determination "Yes" in step S9 shown in FIG. 2, that is, determines that the evaluation loss Loss satisfies the end condition, the learned feature amount extraction parameters and the learned score estimation parameters are recorded in the feature amount extraction parameter storage unit 15 and the score estimation parameter storage unit 16, respectively.
 The learned feature amount extraction parameter storage unit 18 stores in advance the learned feature amount extraction parameters recorded in the feature amount extraction parameter storage unit 15 at the time the learning process of the learning device 1 is completed. The learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 at the time the learning process of the learning device 1 is completed.
 The input unit 11-1 takes in arbitrary video data given from the outside and outputs the captured video data to the feature amount extraction unit 12.
 The feature amount extraction unit 12 reads the learned feature amount extraction parameters from the learned feature amount extraction parameter storage unit 18 and applies them to the video feature amount extraction layer 121. The feature amount extraction unit 12 gives the video data output by the input unit 11-1 to the video feature amount extraction layer 121 as input, obtains the feature amount of the video data as output, and outputs the obtained feature amount to the score estimation unit 13.
 The score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19 and applies them to the fully connected layer 131. The score estimation unit 13 gives the feature amount output by the feature amount extraction unit 12 to the fully connected layer 131 as input, obtains an estimated score as output, and outputs the obtained estimated score to the output unit 17. The output unit 17 outputs the estimated score output by the score estimation unit 13 to the outside.
 In the score estimation device 2 of the first embodiment described above, the estimation unit 50 has function approximators (the first function approximator and the second function approximator) that approximate functions based on the learned parameters obtained by the learning process of the learning device 1 (the learned feature amount extraction parameters and the learned score estimation parameters), and estimates the score of given video data by giving the video data to the function approximators as input. Accordingly, the score estimation device 2 can obtain an estimated score for arbitrary video data based on the learned feature amount extraction parameters and learned score estimation parameters obtained by the learning process of the learning device 1, which learns the know-how of the official referees' scoring method more accurately than the technique described in Non-Patent Document 1, so a more accurate estimated score can be obtained.
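For illustration, the inference pipeline of the score estimation device 2 (input unit → feature amount extraction unit 12 → score estimation unit 13) can be sketched as follows. The linear feature extractor and the scalar fully connected layer below are stand-ins for the actual video feature amount extraction layer 121 and fully connected layer 131, and all parameter values and shapes are hypothetical.

```python
def extract_features(video, feature_params):
    # Stand-in for the video feature amount extraction layer 121:
    # a learned linear map (weights, biases) over a flat input vector.
    w, b = feature_params
    return [sum(wi * x for wi, x in zip(row, video)) + bi
            for row, bi in zip(w, b)]

def estimate_score(features, score_params):
    # Stand-in for the fully connected layer 131 producing a scalar score.
    w, b = score_params
    return sum(wi * f for wi, f in zip(w, features)) + b

def score_video(video, feature_params, score_params):
    # Inference pipeline of the score estimation device 2: the learned
    # parameters play the role of storage units 18 and 19.
    return estimate_score(extract_features(video, feature_params),
                          score_params)
```

With hypothetical learned parameters, e.g. an identity feature map and score weights (0.5, 0.5) with bias 1.0, the input [2, 4] yields the estimated score 4.0.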
(Second Embodiment)
 FIG. 5 is a block diagram showing the internal configuration of the learning device 1a according to the second embodiment. The same components as those of the learning device 1 of the first embodiment are denoted by the same reference numerals, and only the different components are described below. The learning device 1a includes a training data storage unit 10a, an input unit 11a, an estimation unit 50a, a parameter update unit 14a, a feature amount extraction parameter storage unit 15, a score estimation parameter storage unit 16, and a class estimation parameter storage unit 21. The estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
 The training data storage unit 10a stores in advance a plurality of training data, each of which combines one of a plurality of pieces of video data, one of a plurality of true-value scores, and one of a plurality of true-value class labels.
 The plurality of pieces of video data are classified into a plurality of predetermined classes based on the content recorded in each piece of video data. Here, a class is, for example, a type of competition with different scoring criteria, such as high diving or gymnastics. The true-value class label is identification information indicating the class to which the corresponding video data belongs as a result of the classification.
 The input unit 11a repeatedly reads training data n at a time from the training data storage unit 10a. Here, n is an integer of 2 or more and is the batch size used when the learning process described below is performed. The number of training data stored in the training data storage unit 10a is assumed to be a multiple of n, that is, n × m (where m is an integer of 1 or more).
 In the following description, any one piece of video data included in the n training data is denoted by v_i or v_j, the true-value score corresponding to the video data v_i is denoted by g_i, and the true-value score corresponding to the video data v_j is denoted by g_j. Further, the true-value class label corresponding to the video data v_i is denoted by k_i, and the true-value class label corresponding to the video data v_j is denoted by k_j. Here, i and j are integers with i = 1 to n and j = 1 to n, and j > i.
 The input unit 11a outputs the n pieces of video data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. The input unit 11a also outputs the n true-value scores g_1 to g_n and the n true-value class labels k_1 to k_n included in the n read training data to the parameter update unit 14a.
 The class estimation parameter storage unit 21 stores class estimation parameters serving as weights and biases to be applied to the third function approximator of the class estimation unit 20. The class estimation unit 20 has a third function approximator and applies the class estimation parameters stored in the class estimation parameter storage unit 21 to the third function approximator. By applying the class estimation parameters, the third function approximator approximates a function corresponding to the class estimation parameters. The class estimation unit 20 estimates an estimated class c_i by giving the feature amount extracted by the feature amount extraction unit 12 to the third function approximator as input. Here, the estimated class c_i is information expressed as a probability for each class; by referring to the estimated class c_i, it is possible to identify the class to which the corresponding video data v_i is most likely to belong.
 Here, the third function approximator is an arbitrary neural network that estimates an estimated class from a feature amount; for example, a neural network consisting of a fully connected layer followed by a Softmax layer (hereinafter referred to as the "fully connected layer + Softmax layer 201") is applied.
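A minimal sketch of such a fully connected layer followed by a Softmax layer, with hypothetical weights and biases (the actual structure of layer 201 is not specified in this excerpt beyond the description above):

```python
import math

def softmax(z):
    # Softmax layer: turns logits into class probabilities.
    # Subtracting the max is a standard trick for numerical stability.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def estimate_class(features, weights, biases):
    # Fully connected layer + Softmax layer 201: logits from a linear
    # map over the feature amount, then probabilities per class.
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

The output is a probability vector over the classes, i.e. the estimated class c_i referred to above; the probabilities sum to 1, and the largest entry identifies the most likely class.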
 Similarly to the parameter update unit 14 of the first embodiment, the parameter update unit 14a calculates the regression loss between each of the estimated scores s_1 to s_n and each of the true-value scores g_1 to g_n based on the n true-value scores g_1 to g_n output by the input unit 11a, the n estimated scores s_1 to s_n estimated by the score estimation unit 13, and the first loss function shown in the above equation (2).
 The parameter update unit 14a also calculates the class loss between each of the estimated classes c_1 to c_n and each of the true-value class labels k_1 to k_n based on the n true-value class labels k_1 to k_n output by the input unit 11a, the n estimated classes c_1 to c_n estimated by the class estimation unit 20, and a predetermined third loss function.
 Here, as the third loss function, for example, the Cross Entropy Loss shown in the following equation (5) is applied.
 Loss3 = −(1/n) Σ_{i=1}^{n} Σ_{y=1}^{Y} k_{i,y} log(c_{i,y})   …(5)
 In equation (5), Y is the number of classes. For example, suppose Y = 3 and the three classes are denoted Class1, Class2, and Class3. When the video data v_1 with i = 1 belongs to the class Class1, the probability of belonging to Class1 is 100%, and the probabilities of belonging to Class2 and Class3 are 0%. In this case, the true-value class label k_{1,y} is expressed, for example, in the form k_{1,1} = 1.0, k_{1,2} = 0.0, k_{1,3} = 0.0. The estimated class c_{1,y} expresses the probabilities that the corresponding video data v_1 belongs to each of the three classes, for example, in the form c_{1,1} = 0.8, c_{1,2} = 0.5, c_{1,3} = 0.2.
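Assuming the Cross Entropy Loss of equation (5) takes the standard batch-averaged form (the exact normalization is not spelled out in this excerpt), the class loss Loss3 for the example above can be computed as:

```python
import math

def cross_entropy_loss(true_labels, est_classes):
    # Class loss Loss3: for each sample i, sum -k_{i,y} * log(c_{i,y})
    # over the classes y, then average over the n samples in the batch.
    n = len(true_labels)
    total = 0.0
    for k, c in zip(true_labels, est_classes):
        total += -sum(k_y * math.log(c_y)
                      for k_y, c_y in zip(k, c) if k_y > 0)
    return total / n
```

For the single example with k_1 = (1.0, 0.0, 0.0) and c_1 = (0.8, 0.5, 0.2), only the Class1 term contributes, giving −log(0.8) ≈ 0.223.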
 Further, for each combination of two different pieces of video data v_i and v_j, the parameter update unit 14a calculates a ranking loss indicating the degree of error in the order of the two estimated scores s_i and s_j, based on the two estimated scores s_i and s_j, the two true-value scores g_i and g_j, and the two estimated classes c_i and c_j corresponding to that combination, together with a predetermined fourth loss function, taking into account both the magnitude of the difference between the two true-value scores g_i and g_j and the correlation between the two estimated classes c_i and c_j.
 Here, as the fourth loss function, the loss function shown in the following equation (6) is applied.
 Loss4 = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} correlation × ReLU(abs(g_j − g_i) − (s_j − s_i)sign(g_j − g_i))   …(6)
 Comparing equation (6) with equation (3), which is the second loss function of the first embodiment, the difference is that the output of the ReLU function in equation (3) is multiplied by correlation.
 In equation (6), correlation is a correlation coefficient indicating the similarity between the two estimated classes c_i and c_j. Here, for example, Spearman's rank correlation coefficient obtained by equation (7) is applied as the correlation coefficient.
 correlation = 1 − (6 Σ_{y=1}^{Y} (CR_{i,y} − CR_{j,y})²) / (Y(Y² − 1))   …(7)
 In equation (7), Y is the number of classes, as in equation (5). CR_{i,y} is the rank of class y in the estimated class c_i. For example, when Y = 3 and the estimated class c_i is expressed as c_{i,1} = 0.5, c_{i,2} = 0.8, c_{i,3} = 0.2, the probability of belonging to Class2 is the highest, the probability of belonging to Class1 is the second highest, and the probability of belonging to Class3 is the third highest. In this case, CR_{i,1} = 2, CR_{i,2} = 1, and CR_{i,3} = 3.
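Assuming equation (7) is the textbook Spearman formula over the class ranks CR_{i,y} (no tied ranks) and that the fourth loss function weights each pairwise equation (3) term by this correlation, the computation can be sketched as:

```python
def class_ranks(c):
    # CR_{i,y}: rank of class y within the estimated class c_i,
    # where rank 1 is the class with the highest probability.
    order = sorted(range(len(c)), key=lambda y: c[y], reverse=True)
    ranks = [0] * len(c)
    for r, y in enumerate(order, start=1):
        ranks[y] = r
    return ranks

def spearman(c_i, c_j):
    # Spearman's rank correlation coefficient between two estimated
    # classes, computed from their class ranks (assumes no ties).
    Y = len(c_i)
    d2 = sum((a - b) ** 2
             for a, b in zip(class_ranks(c_i), class_ranks(c_j)))
    return 1 - 6 * d2 / (Y * (Y ** 2 - 1))

def ranking_loss_pair(s_i, s_j, g_i, g_j, c_i, c_j):
    # One pairwise term of the ranking loss Loss4: the equation (3)
    # term weighted by the class correlation (a reconstruction; the
    # source shows the equation only as an image).
    relu = lambda x: max(0.0, x)
    sign = lambda x: (x > 0) - (x < 0)
    return spearman(c_i, c_j) * relu(abs(g_j - g_i)
                                     - (s_j - s_i) * sign(g_j - g_i))
```

For the worked example above, class_ranks([0.5, 0.8, 0.2]) gives [2, 1, 3]; identical estimated classes yield correlation 1 (full constraint), while dissimilar classes reduce the weight of the pairwise term.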
 The parameter update unit 14a performs the learning process so as to reduce the calculated regression loss, that is, Loss1, which is the output value of equation (2); the calculated class loss, that is, Loss3, which is the output value of equation (5); and the calculated ranking loss, that is, Loss4, which is the output value of equation (6). Through the learning process, the parameter update unit 14a calculates new feature amount extraction parameters, new score estimation parameters, and new class estimation parameters.
 Based on the calculated new feature amount extraction parameters, new score estimation parameters, and new class estimation parameters, the parameter update unit 14a updates the contents of the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21.
(Processing by the learning device of the second embodiment)
 Next, the processing by the learning device 1a of the second embodiment will be described with reference to FIGS. 6 and 7. FIG. 6 is a flowchart showing the flow of the learning process performed by the learning device 1a.
 The feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 store in advance initial feature amount extraction parameters, initial score estimation parameters, and initial class estimation parameters, respectively.
 In steps S21 and S22, the same processing as in steps S1 and S2 of the first embodiment shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13. The class estimation unit 20 reads the class estimation parameters from the class estimation parameter storage unit 21 and applies the read class estimation parameters to the neural network of the fully connected layer + Softmax layer 201, which is the third function approximator (step S23).
 The input unit 11a reads the first n training data from the training data storage unit 10a. As shown in FIG. 7, the input unit 11a outputs the n pieces of video data v_1 to v_n included in the n read training data to the feature amount extraction unit 12 one by one. The input unit 11a also outputs the n true-value scores g_1 to g_n and the n true-value class labels k_1 to k_n included in the read training data to the parameter update unit 14a. The parameter update unit 14a takes in the n true-value scores g_1 to g_n and the n true-value class labels k_1 to k_n output by the input unit 11a (step S24).
 The processing of steps S25, S26, and S27 is repeated for each piece of video data v_i among the n pieces of video data v_1 to v_n (loop L2s to L2e).
 In steps S25 and S26, the same processing as in steps S4 and S5 shown in FIG. 2 is performed by the feature amount extraction unit 12 and the score estimation unit 13. In step S26, the score estimation unit 13 outputs the obtained estimated score s_i to the parameter update unit 14a.
 As shown in FIG. 7, the class estimation unit 20 gives the feature amount of the video data v_i to the fully connected layer + Softmax layer 201 as input and obtains the estimated class c_i of the video data v_i as the output of the fully connected layer + Softmax layer 201. The class estimation unit 20 outputs the obtained estimated class c_i of the video data v_i to the parameter update unit 14a (step S27).
 That is, as shown in FIG. 7, with the same feature amount extraction parameters, the same score estimation parameters, and the same class estimation parameters applied to the video feature amount extraction layer 121, the fully connected layer 131, and the fully connected layer + Softmax layer 201, respectively, the processing of steps S25, S26, and S27 is performed n times, once for each of the n pieces of video data v_1 to v_n as input.
 In step S28, the same processing as in step S6 shown in FIG. 2 is performed by the parameter update unit 14a.
 When the parameter update unit 14a takes in the n estimated classes c_1 to c_n estimated by the class estimation unit 20, it calculates the class loss Loss3 by equation (5) based on the n estimated classes c_1 to c_n taken in and the n true-value class labels k_1 to k_n taken in in step S24 (step S29).
 The parameter update unit 14a calculates the ranking loss Loss4 by equation (6) based on the n estimated scores s_1 to s_n, the n true-value scores g_1 to g_n, and the n estimated classes c_1 to c_n (step S30).
 The parameter update unit 14a calculates the evaluation loss Loss by, for example, the following equation (8) (step S31).
 Loss = α_2 · Loss1 + β_2 · Loss3 + γ_2 · Loss4 + ||ω||_2   …(8)
 In the above equation (8), α_2, β_2, and γ_2 satisfy α_2 > 0, β_2 > 0, and γ_2 > 0, and are constants set arbitrarily so as to balance the three losses. ||ω||_2 is the L2-regularization term.
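Assuming equation (8) is a weighted sum of the three losses plus the L2 norm of the parameters ω (the source shows the equation only as an image, so the exact form of the regularization term is an assumption), the evaluation loss can be sketched as:

```python
def evaluation_loss(loss1, loss3, loss4, weights,
                    alpha2=1.0, beta2=1.0, gamma2=1.0):
    # Evaluation loss of equation (8): the regression loss Loss1, class
    # loss Loss3, and ranking loss Loss4 are balanced by the positive
    # constants alpha2, beta2, gamma2, with an L2 term over the
    # parameters omega (here a flat list of weights).
    l2 = sum(w * w for w in weights) ** 0.5
    return alpha2 * loss1 + beta2 * loss3 + gamma2 * loss4 + l2
```

For instance, with unit balancing constants and parameters (3, 4), the losses 1, 2, and 3 combine with ||ω||_2 = 5 to give Loss = 11.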
 The parameter update unit 14a determines whether the calculated evaluation loss Loss satisfies the end condition (step S32). For example, when the evaluation loss Loss is less than a predetermined threshold, it determines that the evaluation loss satisfies the end condition.
 When the parameter update unit 14a determines that the evaluation loss Loss satisfies the end condition (step S32, Yes), it ends the processing. On the other hand, when the parameter update unit 14a determines that the evaluation loss Loss does not satisfy the end condition (step S32, No), it calculates new feature amount extraction parameters, new score estimation parameters, and new class estimation parameters by a learning process using, for example, backpropagation so as to reduce the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4.
 The parameter update unit 14a writes the calculated new feature amount extraction parameters to the feature amount extraction parameter storage unit 15 to update the feature amount extraction parameters, writes the calculated new score estimation parameters to the score estimation parameter storage unit 16 to update the score estimation parameters, and writes the calculated new class estimation parameters to the class estimation parameter storage unit 21 to update the class estimation parameters (step S33).
 Thereafter, the processing from step S21 is repeated. In step S21 performed again, the feature amount extraction unit 12 reads the updated feature amount extraction parameters from the feature amount extraction parameter storage unit 15 and applies them to the video feature amount extraction layer 121. In step S22 performed again, the score estimation unit 13 reads the updated score estimation parameters from the score estimation parameter storage unit 16 and applies them to the fully connected layer 131. In step S23 performed again, the class estimation unit 20 reads the updated class estimation parameters from the class estimation parameter storage unit 21 and applies them to the fully connected layer + Softmax layer 201.
 In step S24 performed again, the input unit 11a reads the next n training data from the training data storage unit 10a. When, in the course of the repeated processing, the processing of steps S25, S26, and S27 has been performed for all the training data stored in the training data storage unit 10a, the input unit 11a repeats reading from the training data storage unit 10a starting again from the first n training data.
 At the time the parameter update unit 14a determines in step S32 that the evaluation loss Loss satisfies the end condition, the learned feature amount extraction parameters, the learned score estimation parameters, and the learned class estimation parameters, in a state in which the regression loss Loss1, the class loss Loss3, and the ranking loss Loss4 have become sufficiently small, are recorded in the feature amount extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21, respectively.
 In the learning device 1a of the second embodiment described above, the parameter update unit 14a performs a learning process that reduces each of the regression loss, which is the output of the first loss function; the class loss, which is the output of the third loss function; and the ranking loss, which is the output of the fourth loss function. The first loss function obtains the regression loss between each of the plurality of estimated scores estimated by the score estimation unit 13 and each of the corresponding true-value scores. The third loss function obtains the class loss between each of the plurality of estimated classes estimated by the class estimation unit 20 and each of the corresponding true-value class labels. The fourth loss function obtains, based on the two estimated scores and the two true-value scores corresponding to each combination of two different pieces of video data, a ranking loss indicating the degree of error in the order of the two estimated scores, and corrects the ranking loss in consideration of both the magnitude of the difference between the two true-value scores and the correlation between the two estimated classes. Through this learning process, the parameters applied to the function approximators of the estimation unit 50a (the first function approximator, the second function approximator, and the third function approximator), that is, the feature amount extraction parameters, the score estimation parameters, and the class estimation parameters, are updated. By using the fourth loss function, as described below, the learning device 1a can learn the know-how of the official referees' scoring of athletes' performances more accurately than the technique described in Non-Patent Document 1.
 That is, comparing equation (3), the second loss function in the first embodiment, with equation (6), the fourth loss function in the second embodiment, equation (6) multiplies the ReLU function of equation (3) by the correlation coefficient correlation of the two estimated classes c_i and c_j, thereby adopting a ranking loss that takes the correlation between the two estimated classes c_i and c_j into account. Therefore, the learning device 1a of the second embodiment provides the following effect in addition to the effects provided by the learning device 1 of the first embodiment.
 By using the fourth loss function, the learning device 1a can strengthen the ranking loss constraint for similar competitions and, conversely, weaken the ranking loss constraint for dissimilar competitions. Thus, for example, even when multiple types of competitions, such as high diving and gymnastics, are recorded in the video data v_i included in the training data, the learning device 1a performs the learning process while taking the difference between the types of competitions into account, so it can learn the know-how of the official referees' scoring method even more accurately than the learning device 1.
(Score estimation device of the second embodiment)
 FIG. 8 is a block diagram showing the internal configuration of the score estimation device 2a according to the second embodiment. In FIG. 8, the same components as those of the learning device 1a shown in FIG. 5 are denoted by the same reference numerals. The score estimation device 2a includes an input unit 11a-1, an estimation unit 50a, an output unit 17a, a learned feature amount extraction parameter storage unit 18, a learned score estimation parameter storage unit 19, and a learned class estimation parameter storage unit 22. The estimation unit 50a includes a feature amount extraction unit 12, a score estimation unit 13, and a class estimation unit 20.
 As described above, when the parameter update unit 14a makes a "Yes" determination in step S32 of FIG. 6, that is, determines that the evaluation loss Loss satisfies the termination condition, the feature extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 hold the learned feature extraction parameters, the learned score estimation parameters, and the learned class estimation parameters, respectively.
 The learned feature extraction parameter storage unit 18 stores in advance the learned feature extraction parameters recorded in the feature extraction parameter storage unit 15 at the time the learning process of the learning device 1a is completed. The learned score estimation parameter storage unit 19 stores in advance the learned score estimation parameters recorded in the score estimation parameter storage unit 16 at that time. The learned class estimation parameter storage unit 22 stores in advance the learned class estimation parameters recorded in the class estimation parameter storage unit 21 at that time.
 The input unit 11a-1 takes in arbitrary video data supplied from the outside and outputs the captured video data to the feature extraction unit 12.
 The feature extraction unit 12 reads the learned feature extraction parameters from the learned feature extraction parameter storage unit 18 and applies them to the video feature extraction layer 121. The feature extraction unit 12 feeds the video data output by the input unit 11a-1 to the video feature extraction layer 121 as input, obtains the features of the video data as its output, and outputs the obtained features to the score estimation unit 13.
 The score estimation unit 13 reads the learned score estimation parameters from the learned score estimation parameter storage unit 19 and applies them to the fully connected layer 131. The score estimation unit 13 feeds the features output by the feature extraction unit 12 to the fully connected layer 131 as input, obtains an estimated score as its output, and outputs the obtained estimated score to the output unit 17a.
 The class estimation unit 20 reads the learned class estimation parameters from the learned class estimation parameter storage unit 22 and applies them to the fully connected + Softmax layer 201. The class estimation unit 20 feeds the features output by the feature extraction unit 12 to the fully connected + Softmax layer 201 as input, obtains an estimated class as its output, and outputs the obtained estimated class to the output unit 17a. The output unit 17a outputs the estimated score output by the score estimation unit 13 and the estimated class output by the class estimation unit 20 to the outside.
 Note that, in the score estimation device 2a of the second embodiment described above, when only the estimated score is required, the class estimation unit 20 and the learned class estimation parameter storage unit 22 may be omitted.
 In the score estimation device 2a of the second embodiment described above, the estimation unit 50a has function approximators (a first function approximator, a second function approximator, and a third function approximator) that approximate functions based on the learned parameters obtained by the learning process of the learning device 1a (the learned feature extraction parameters, the learned score estimation parameters, and the learned class estimation parameters), and estimates the estimated score of given video data by feeding that video data to the function approximators as input.
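The inference flow described above (features from the video, then a score head and a class head) can be sketched as follows. This is an illustrative NumPy stand-in that assumes a single linear layer per head; the actual layers 121, 131, and 201 in the patent are neural-network layers whose structure is not reproduced here, and all names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def estimate(extractor, w_score, b_score, w_class, b_class, video):
    """Run one video through the two estimation heads.

    extractor        -- stands in for the video feature extraction layer 121
    w_score, b_score -- stand in for the fully connected layer 131
    w_class, b_class -- stand in for the fully connected + Softmax layer 201
    """
    f = extractor(video)                          # feature vector
    score = float(w_score @ f + b_score)          # estimated score
    class_probs = softmax(w_class @ f + b_class)  # estimated class (probabilities)
    return score, class_probs
```

With an identity extractor and hand-picked weights, the score is just a weighted sum of the features, and the class output is a probability vector summing to 1.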
 In the second embodiment, the learning device 1a performs the learning process including the estimated class c_i and the true-value class label k_i. Therefore, unlike in the first embodiment, the learned feature extraction parameters and the learned score estimation parameters of the second embodiment reflect information about the class to which the video data v_i belongs. Accordingly, the score estimation device 2a of the second embodiment can obtain estimated scores even more accurately than the first embodiment for video data covering many types of competition.
 In the first and second embodiments described above, the MSE shown in Equation (2) is applied as the first loss function, but another function that calculates a regression loss, such as the L1 loss, may be applied instead.
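For reference, the two regression losses mentioned here (the MSE of Equation (2) and the L1 loss suggested as an alternative) could be written as follows; the function names are illustrative, not from the patent.

```python
import numpy as np

def mse_loss(est, true):
    """Mean squared error between estimated and true-value scores
    (Equation (2)-style regression loss)."""
    d = np.asarray(est, float) - np.asarray(true, float)
    return float(np.mean(d * d))

def l1_loss(est, true):
    """Mean absolute error, one alternative regression loss."""
    d = np.asarray(est, float) - np.asarray(true, float)
    return float(np.mean(np.abs(d)))
```

The L1 loss penalizes large errors less severely than the MSE, which is the usual reason for swapping one for the other.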
 Equation (4), which calculates the evaluation loss Loss in the first embodiment, and Equation (8), which calculates it in the second embodiment, are merely examples. Any formula may be applied that balances the regression loss and the ranking loss in the first embodiment, or the regression loss, the ranking loss, and the class loss in the second embodiment.
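A generic balanced combination of the loss terms might look like the following sketch; the weights w_reg, w_rank, and w_cls are hypothetical knobs, not quantities defined in the patent, and the actual Equations (4) and (8) may combine the terms differently.

```python
def total_loss(regression, ranking, class_loss=None,
               w_reg=1.0, w_rank=1.0, w_cls=1.0):
    """Weighted sum balancing the loss terms: regression + ranking in the
    first embodiment, plus the class loss in the second embodiment."""
    loss = w_reg * regression + w_rank * ranking
    if class_loss is not None:
        loss += w_cls * class_loss
    return loss
```

Tuning the weights trades off score accuracy (regression), ordering consistency (ranking), and class discrimination (class loss).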
 Although the second embodiment applies the cross-entropy loss as the third loss function, another function may be applied instead. Likewise, although Spearman's rank correlation coefficient shown in Equation (7) is applied as the correlation coefficient in Equation (6), another correlation coefficient may be applied.
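As an illustration of the alternatives mentioned here, a per-sample cross-entropy class loss and Pearson's correlation coefficient (one possible substitute for the Spearman coefficient of Equation (7)) could be sketched as follows; the names are illustrative.

```python
import numpy as np

def cross_entropy(class_probs, true_label):
    """Class loss for one sample: negative log-probability that the
    estimated class distribution assigns to the true class."""
    return float(-np.log(class_probs[true_label]))

def pearson(x, y):
    """Pearson correlation coefficient, an alternative to Spearman's
    rank correlation for weighting the ranking loss."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

Pearson's coefficient measures linear agreement of the raw probability values rather than agreement of their ranks, so the two weightings can differ for skewed distributions.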
 In the first and second embodiments described above, the training data storage units 10 and 10a are provided inside the learning devices 1 and 1a, but they may be provided outside the learning devices 1 and 1a. Similarly, the learned feature extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 may be provided outside the score estimation devices 2 and 2a.
 Since the training data storage units 10 and 10a, the learned feature extraction parameter storage unit 18, the learned score estimation parameter storage unit 19, and the learned class estimation parameter storage unit 22 store data to be preserved, a non-volatile storage area is preferably used for them. In contrast, the feature extraction parameter storage unit 15, the score estimation parameter storage unit 16, and the class estimation parameter storage unit 21 store data only temporarily, so either a non-volatile or a volatile storage area may be used.
 The first, second, and third function approximators shown in the first and second embodiments may be implemented with neural networks of configurations other than those described above, or with other means capable of the learning processing used in machine learning instead of neural networks. They also need not be separated into a first, second, and third function approximator: in the first embodiment, the first and second function approximators may together form a single function approximator, and in the second embodiment, the first, second, and third function approximators may together form a single function approximator.
 The learning devices 1 and 1a and the score estimation devices 2 and 2a in the embodiments described above may be realized by a computer. In that case, a program for realizing their functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read and executed by a computer system. The term "computer system" as used here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. Furthermore, "computer-readable recording medium" may also include media that hold a program dynamically for a short time, such as a communication line used when transmitting a program via a network such as the Internet or a communication line such as a telephone line, and media that hold a program for a certain period of time, such as the volatile memory inside a computer system serving as a server or client in that case. The above program may realize only some of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized by using a programmable logic device such as an FPGA (Field Programmable Gate Array).
 Although embodiments of the present invention have been described in detail with reference to the drawings, the specific configuration is not limited to these embodiments, and designs and the like within a range that does not deviate from the gist of the present invention are also included.
 The present invention can be used for scoring competitions in sports.
1 ... Learning device, 10 ... Training data storage unit, 11 ... Input unit, 12 ... Feature extraction unit, 13 ... Score estimation unit, 14 ... Parameter update unit, 15 ... Feature extraction parameter storage unit, 16 ... Score estimation parameter storage unit, 50 ... Estimation unit

Claims (7)

  1.  A learning device comprising:
     an input unit that takes in training data combining video data recording the movements of athletes during competition with a plurality of true-value scores, each being a score given by a referee to the competition recorded in the video data;
     an estimation unit that has a function approximator approximating a function based on parameters, and that estimates an estimated score for video data by giving the video data taken in by the input unit to the function approximator as input; and
     a parameter update unit that updates the parameters by performing a learning process that reduces each of the regression loss output by a first loss function, which obtains a regression loss between each of the plurality of estimated scores and the true-value score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each combination of two different items of video data, obtains a ranking loss indicating the degree of ordering error between the two estimated scores based on the two corresponding estimated scores and the two corresponding true-value scores, and which corrects the ranking loss in consideration of the magnitude of the difference between the two true-value scores.
  2.  The learning device according to claim 1, wherein
     the video data are classified in advance into one of a plurality of predetermined classes based on the content recorded in the video data, and each item of video data is given in advance a true-value class label indicating the class to which the video data belongs,
     the input unit takes in training data combining the video data, the true-value score corresponding to the video data, and the true-value class label given to the video data,
     the estimation unit estimates, by giving the video data taken in by the input unit to the function approximator as input, the estimated score of the video data and an estimated class indicating the probability of the class to which the video data belongs, and
     the parameter update unit updates the parameters by performing a learning process that reduces each of the regression loss output by the first loss function, the class loss output by a third loss function, which obtains a class loss between each of the plurality of estimated classes and the true-value class label corresponding to each estimated class, and the ranking loss output by a fourth loss function used in place of the second loss function, which, for each combination of two different items of video data, obtains a ranking loss indicating the degree of ordering error between the two estimated scores based on the two corresponding estimated scores and the two corresponding true-value scores, and which corrects the ranking loss in consideration of the magnitude of the difference between the two true-value scores and of the correlation between the two estimated classes.
  3.  A score estimation device comprising:
     an input unit that takes in video data recording the movements of an athlete during competition; and
     an estimation unit that has a function approximator approximating a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2, and that estimates an estimated score for the video data by giving the video data taken in by the input unit to the function approximator as input.
  4.  A learning method comprising:
     taking in training data combining video data recording the movements of athletes during competition with a plurality of true-value scores, each being a score given by a referee to the competition recorded in the video data;
     estimating an estimated score for the video data by giving the captured video data as input to a function approximator that approximates a function based on parameters; and
     updating the parameters by performing a learning process that reduces each of the regression loss output by a first loss function, which obtains a regression loss between each of the plurality of estimated scores and the true-value score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each combination of two different items of video data, obtains a ranking loss indicating the degree of ordering error between the two estimated scores based on the two corresponding estimated scores and the two corresponding true-value scores, and which corrects the ranking loss in consideration of the magnitude of the difference between the two true-value scores.
  5.  A learning program for causing a computer to function as:
     input means for taking in training data combining video data recording the movements of athletes during competition with a plurality of true-value scores, each being a score given by a referee to the competition recorded in the video data;
     estimation means having a function approximator that approximates a function based on parameters, the estimation means estimating an estimated score for video data by giving the video data taken in by the input means to the function approximator as input; and
     parameter update means for updating the parameters by performing a learning process that reduces each of the regression loss output by a first loss function, which obtains a regression loss between each of the plurality of estimated scores and the true-value score corresponding to each estimated score, and the ranking loss output by a second loss function, which, for each combination of two different items of video data, obtains a ranking loss indicating the degree of ordering error between the two estimated scores based on the two corresponding estimated scores and the two corresponding true-value scores, and which corrects the ranking loss in consideration of the magnitude of the difference between the two true-value scores.
  6.  A score estimation method comprising:
     taking in video data recording the movements of an athlete during competition; and
     estimating an estimated score for the video data by giving the captured video data as input to a function approximator that approximates a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2.
  7.  A score estimation program for causing a computer to function as:
     input means for taking in video data recording the movements of an athlete during competition; and
     estimation means having a function approximator that approximates a function based on learned parameters obtained by the learning process of the learning device according to claim 1 or claim 2, the estimation means estimating an estimated score for the video data by giving the video data taken in by the input means to the function approximator as input.
PCT/JP2020/015136 2020-04-02 2020-04-02 Learning device, learning method, learning program, score estimation device, score estimation method, and score estimation program WO2021199392A1 (en)

Publication: WO2021199392A1, published 2021-10-07. National-phase document JP2022511452A, granted as JP7352119B2 (2023-09-28).