WO2022049694A1 - Learning device, estimation device, learning method, and learning program - Google Patents
Learning device, estimation device, learning method, and learning program
- Publication number
- WO2022049694A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video data
- learning
- score
- mask
- data
- Prior art date
Links
- 238000012549 training Methods 0.000 title claims abstract description 10
- 238000000034 method Methods 0.000 title claims description 34
- 238000011156 evaluation Methods 0.000 claims abstract description 11
- 238000012545 processing Methods 0.000 claims description 90
- 230000006870 function Effects 0.000 claims description 42
- 238000013459 approach Methods 0.000 claims description 2
- 230000000873 masking effect Effects 0.000 claims description 2
- 238000013500 data storage Methods 0.000 description 22
- 238000009827 uniform distribution Methods 0.000 description 10
- 238000010586 diagram Methods 0.000 description 8
- 238000005516 engineering process Methods 0.000 description 5
- 230000009189 diving Effects 0.000 description 4
- 238000009826 distribution Methods 0.000 description 3
- 238000012935 Averaging Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000000611 regression analysis Methods 0.000 description 2
- 230000002411 adverse Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 238000001303 quality assessment method Methods 0.000 description 1
- 238000013441 quality evaluation Methods 0.000 description 1
- 238000013077 scoring method Methods 0.000 description 1
- 230000009182 swimming Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63B—APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
- A63B71/00—Games or sports accessories not covered in groups A63B1/00 - A63B69/00
- A63B71/06—Indicating or scoring devices for games or players, or for other sports activities
- A63B71/0619—Displays, user interfaces and indicating devices, specially adapted for sport equipment, e.g. display mounted on treadmills
- A63B71/0669—Score-keepers or score display devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30221—Sports video; Sports image
Definitions
- The present invention relates to, for example, a learning device, a learning method, and a learning program for learning know-how regarding the scoring of an athlete's performance, and to an estimation device for estimating a competition score based on the learning result.
- In Non-Patent Document 1, a method has been proposed in which video data recording a series of movements performed by an athlete is used as input data, and features are extracted from the video data by deep learning to estimate a score.
- FIG. 10 is a block diagram showing a schematic configuration of a learning device 100 and an estimation device 200 in the technique described in Non-Patent Document 1.
- The learning processing unit 101 of the learning device 100 is given, as learning data, video data recording a series of movements performed by an athlete, and a true score t_score awarded by the referee for the athlete's performance.
- The learning processing unit 101 includes a DNN (Deep Neural Network) and applies coefficients such as weights and biases stored in the learning model data storage unit 102 to the DNN.
- The learning processing unit 101 calculates a loss L_SR from the estimated score y_score, obtained as the output value when the video data is given to the DNN, and the true score t_score corresponding to that video data.
- The learning processing unit 101 calculates new coefficients to be applied to the DNN by the error backpropagation method so as to reduce the calculated loss L_SR.
- The learning processing unit 101 updates the coefficients by writing the newly calculated coefficients to the learning model data storage unit 102.
- The estimation device 200 includes an estimation processing unit 201 having a DNN of the same configuration as that of the learning processing unit 101, and a learning model data storage unit 202 that stores in advance the trained learning model data held in the learning model data storage unit 102 of the learning device 100. The trained learning model data stored in the learning model data storage unit 202 is applied to the DNN of the estimation processing unit 201.
- By giving the DNN video data recording a series of movements performed by an arbitrary athlete, the estimation processing unit 201 obtains an estimated score y_score for that performance as the output value of the DNN.
- FIG. 11(a) shows video data (hereinafter, "original video data") recording a series of movements performed by an athlete, and FIG. 11(b) shows a plurality of image frames included in the original video data in which the area displaying the athlete is surrounded by rectangular areas 301, 302, and 303, each rectangular area being filled with the average color of its image frame (hereinafter, "athlete-concealed video data").
- In FIG. 11(b), the extent of the areas 301, 302, and 303 is indicated by dotted frames; these dotted frames are shown only to clarify the extent and do not exist in the actual athlete-concealed video data.
- The accuracy of the estimated score y_score obtained when the original video data was given to the estimation processing unit 201 was 0.8890, while the accuracy obtained when the athlete-concealed video data was given to the estimation processing unit 201 was 0.8563. This experimental result shows that, even when the athlete-concealed video data is given to the estimation processing unit 201, the score is estimated with high accuracy although the athlete's movement cannot be seen, and that the estimation accuracy is hardly reduced compared to the case of the original video data in which the athlete's movement is visible.
- In Non-Patent Document 1, only video data is given as learning data, without explicitly providing features related to the athlete's movement, such as joint coordinates. The above experimental result therefore suggests that the technique described in Non-Patent Document 1 extracts features in the image that are unrelated to the athlete's movement, for example background features such as the venue, and that the learning model data is not generalized to the athlete's movement. Since background features such as the venue are extracted, the technique described in Non-Patent Document 1 is presumed to lose accuracy on video data containing an unknown background.
- In view of the above, an object of the present invention is to provide a technique that makes it possible to generate, from video data recording an athlete's movements, learning model data that is generalized to the athlete's movements, without explicitly providing information that is difficult to estimate, such as joint information.
- One aspect of the present invention is a learning device provided with a learning processing unit that generates learning model data representing the relationship between mask video data, in which a part of the region surrounding the athlete in each of a plurality of image frames included in video data recording the athlete's movement is arbitrarily masked, and a mask score obtained by weighting the true value score, which is the evaluation value for the athlete's performance recorded in the video data, according to the ratio of the masked region.
- One aspect of the present invention is a learning method for generating learning model data representing the relationship between mask video data, in which a part of the region surrounding the athlete in each of a plurality of image frames included in video data recording the athlete's movement is arbitrarily masked, and a mask score obtained by weighting the true value score, which is the evaluation value for the athlete's performance recorded in the video data, according to the ratio of the masked region.
- One aspect of the present invention is a learning program for causing a computer to execute a procedure of generating learning model data representing the relationship between mask video data, in which a part of the region surrounding the athlete in each of a plurality of image frames included in video data recording the athlete's movement is arbitrarily masked, and a mask score obtained by weighting the true value score, which is the evaluation value for the athlete's performance recorded in the video data, according to the ratio of the masked region.
- FIG. 1 is a block diagram showing the configuration of a learning device according to an embodiment of the present invention. FIG. 2 is a diagram showing an example of an image frame in the embodiment. FIG. 3 is a flowchart showing the processing flow of the learning data generation unit of the embodiment. FIG. 4 is a diagram showing the relationship between an image frame of the embodiment, the region indicated by the athlete region identification data, and a mask region. FIG. 5 is a diagram showing a state in which the mask region of an image frame of the embodiment has been masked. FIG. 6 is a flowchart showing the processing flow of the learning processing unit of the embodiment. FIG. 7 is a diagram showing an example of the function approximator provided in the learning processing unit of the embodiment and the data given to the function approximator.
- FIG. 1 is a block diagram showing a configuration of a learning device 1 according to an embodiment of the present invention.
- The learning device 1 includes an input unit 11, a learning data generation unit 12, a learning processing unit 13, and a learning model data storage unit 14.
- The input unit 11 takes in video data in which a series of movements to be evaluated for scoring, among the movements performed by an athlete, is recorded together with the background. For example, in the case of high diving, the video data records the athlete's movements from standing on the diving platform, through the jump, twists, and rotations, until the entry into the pool is complete.
- The input unit 11 also takes in athlete region identification data indicating, for each of the plurality of image frames included in each piece of video data, the position of the rectangular region surrounding the area in which the athlete is displayed.
- FIG. 2 is a diagram showing one image frame 41 included in video data recording a high diving competition; the rectangular region 51, drawn as a dotted line surrounding the entire image 71 of the athlete, is the region indicated by the athlete region identification data.
- The athlete region identification data is, for example, data containing the XY coordinates of the four vertices of the rectangle, where the position of each pixel of the image frame 41 is expressed in XY coordinates with the upper-left corner as the origin.
- The athlete region identification data may be generated automatically from each image frame included in the video data by, for example, the technique shown in the following reference, or it may be generated manually by visually inspecting all image frames included in the video data.
- The input unit 11 also takes in the true value score, which is the evaluation value for the athlete's movements recorded in the video data. The true value score is, for example, the score actually awarded by the referee, at the time the video data was recorded, for the athlete's movements recorded in that video data.
- The input unit 11 takes in a plurality of pieces of video data, athlete region identification data for each image frame included in each piece of video data, and one true value score per piece of video data. The true value score is associated with the video data, and each piece of athlete region identification data is associated with one of the plurality of image frames included in the video data.
- Based on the video data output by the input unit 11 and the athlete region identification data corresponding to that video data, the learning data generation unit 12 generates, for each piece of video data, mask video data in which a part of the region indicated by the athlete region identification data corresponding to each of the plurality of image frames included in the video data is arbitrarily masked. The learning data generation unit 12 also generates, for each piece of video data, a mask score obtained by weighting the true value score corresponding to that video data, output by the input unit 11, according to the ratio of the masked region.
- The learning processing unit 13 generates learning model data representing the relationship between the mask video data and the mask score corresponding to the mask video data. More specifically, the learning processing unit 13 has a function approximator, reads the coefficients of the function approximator stored in the learning model data storage unit 14, and applies the read coefficients to the function approximator. The learning processing unit 13 updates the coefficients of the function approximator by performing learning processing so that the estimated score, obtained as the output value when the mask video data is given to the function approximator, approaches the mask score corresponding to that mask video data.
- The function approximator is, for example, a DNN. The coefficients are the weights and biases applied to each of the plurality of neurons included in the DNN.
- The learning model data storage unit 14 stores in advance the initial values of the coefficients that are applied to the function approximator of the learning processing unit 13 in the initial state. Each time the learning processing unit 13 calculates new coefficients in the learning processing, the coefficients stored in the learning model data storage unit 14 are rewritten with the new coefficients.
- FIG. 3 is a flowchart showing the flow of the process of generating the mask video data and the mask score performed by the learning data generation unit 12.
- The learning data generation unit 12 takes in the plurality of pieces of video data output by the input unit 11 and the athlete region identification data and true value score corresponding to each piece of video data (step Sa1).
- The learning data generation unit 12 repeats the processing of steps Sa2 to Sa8 for each piece of video data (loop La1s to La1e).
- The learning data generation unit 12 randomly selects a predetermined ratio (α), indicating the proportion of the area to be masked (hereinafter, the "mask region"), from values between 0 and 1. For example, the learning data generation unit 12 selects the predetermined ratio (α) based on a uniform distribution in which every value between 0 and 1 appears with equal probability (step Sa2).
- The learning data generation unit 12 calculates the mask score from the true value score corresponding to the video data being processed and the selected predetermined ratio (α). For example, where the true value score is t_score and the mask score is m_score, the learning data generation unit 12 calculates the mask score m_score by the following equation (1) (step Sa3).
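- Although equation (1) is not reproduced above, a weighting of the assumed form m_score = (1 − α) × t_score would behave as described: α = 0 leaves the true value score unchanged, and α = 1 drives the mask score to zero. This form is an illustrative assumption, not necessarily the patent's exact equation.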
- The learning data generation unit 12 repeats the processing of steps Sa4 to Sa8 for each of the plurality of image frames included in the video data being processed (loop La2s to La2e).
- Steps Sa4 to Sa8 will be described with reference to FIGS. 4 and 5. The image frame 41 shown in FIG. 4 is the image frame being processed by the learning data generation unit 12.
- The learning data generation unit 12 calculates the vertical length (H), the horizontal length (W), and the area (S) of the region 51 indicated by the athlete region identification data corresponding to the image frame 41 being processed, based on the XY coordinates of the four vertices contained in the athlete region identification data (step Sa4).
- The learning data generation unit 12 calculates the area (S') of the mask region from the predetermined ratio (α) selected in step Sa2 and the calculated area (S) of the region 51 indicated by the athlete region identification data, for example by the following equation (2) (step Sa5).
- The learning data generation unit 12 selects the extent of the mask region so that it has the mask region area (S'). Specifically, the learning data generation unit 12 selects the vertical length (H') and the horizontal length (W') of the mask region. For example, the learning data generation unit 12 randomly selects the horizontal length (W') of the mask region from the range given by the following equation (3), and calculates the vertical length (H') of the mask region from the selected horizontal length (W') and the mask region area (S') according to the following equation (4).
- Alternatively, the vertical length (H') of the mask region may be selected first. In that case, the learning data generation unit 12 randomly selects the vertical length (H') of the mask region from the range given by the following equation (5), and calculates the horizontal length (W') of the mask region from the selected vertical length (H') and the mask region area (S') according to the following equation (6).
- Selecting from the range of equation (3) or equation (5) ensures that the mask region falls within the region 51 indicated by the athlete region identification data. The learning data generation unit 12 makes these random selections based on a uniform distribution (step Sa6).
- The learning data generation unit 12 then randomly selects the position of the mask region, taking its vertical length (H') and horizontal length (W') into account, within the range in which the entire mask region fits inside the region 51 indicated by the athlete region identification data.
- Each pixel of the image frame 41 is identified, for example, by XY coordinates with the upper-left corner as the origin, the X coordinate increasing to the right and the Y coordinate increasing downward. Let the upper-left coordinate of the region 51 indicated by the athlete region identification data be (X1, Y1). The learning data generation unit 12 then randomly selects, based on a uniform distribution, the upper-left X coordinate of the mask region from the range X1 to X1 + (W − W'), and the upper-left Y coordinate of the mask region from the range Y1 to Y1 + (H − H') (step Sa7).
- FIG. 4 shows an example of four mask regions 61, 62, 63, and 64 randomly selected with respect to the region 51 indicated by the athlete region identification data in one image frame 41. Because the learning data generation unit 12 randomly selects one mask region per image frame, any one of the mask regions 61, 62, 63, and 64 would be selected as the mask region of the image frame 41. As shown in FIG. 4, all four mask regions 61, 62, 63, and 64 are placed at positions within the range of the region 51 indicated by the athlete region identification data.
- The learning data generation unit 12 selects a color with which to fill the mask region. For example, the learning data generation unit 12 selects the average color of the image frame being processed as the fill color. The learning data generation unit 12 performs the mask processing by filling the range of the mask region of the image frame being processed with the selected color (step Sa8).
- FIG. 5 shows an example in which the mask regions 61, 62, 63, and 64 shown in FIG. 4 have been applied to the image frame 41 and their ranges filled with the average color of the image frame 41. In this way, a part of the region 51 indicated by the athlete region identification data in the image frame 41 is arbitrarily masked.
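- The per-frame mask generation of steps Sa4 to Sa8 can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: equations (2) to (6) are not reproduced above, so it assumes S' = α × S for equation (2) and a range for W' that keeps H' = S'/W' inside the region; all function and variable names are hypothetical.

```python
import numpy as np

def mask_frame(frame, box, alpha, rng):
    """Mask one random sub-region of the athlete box (steps Sa4-Sa8, assumed forms).

    frame: (H_img, W_img, 3) uint8 image; box: (x1, y1, x2, y2) of region 51;
    alpha: predetermined ratio in [0, 1]; rng: numpy random Generator.
    """
    x1, y1, x2, y2 = box
    W, H = x2 - x1, y2 - y1                      # step Sa4: width/height of region 51
    S_mask = alpha * W * H                       # step Sa5: mask area S' (assumed eq. (2))
    if S_mask <= 0:
        return frame
    w = rng.uniform(S_mask / H, W)               # step Sa6: W' (assumed range of eq. (3))
    h = S_mask / w                               # H' = S' / W' (assumed eq. (4))
    w, h = max(1, round(w)), min(H, max(1, round(h)))
    x = x1 + rng.integers(0, W - w + 1)          # step Sa7: upper-left corner chosen
    y = y1 + rng.integers(0, H - h + 1)          #   uniformly, so the mask stays in region 51
    masked = frame.copy()
    avg = frame.reshape(-1, 3).mean(axis=0)      # average color of this image frame
    masked[y:y + h, x:x + w] = avg.astype(frame.dtype)   # step Sa8: fill the mask region
    return masked
```

- In the flow of FIG. 3, the same α (and hence the same mask score) is reused for every frame of one piece of video data, while the mask size and position are drawn anew for each frame.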
- The learning data generation unit 12 performs the processing of steps Sa4 to Sa8 for each of the image frames included in the video data being processed (loop La2e), generating mask video data in which all image frames of that video data have been masked. The learning data generation unit 12 associates the generated mask video data with the mask score m_score calculated in step Sa3 and outputs them to the learning processing unit 13 (step Sa9).
- In the mask video data, depending on the range and position of the mask region, the entire image of the athlete may remain visible, or part of the image of the athlete may be displayed while randomly hidden by the mask.
- The learning data generation unit 12 repeats the processing of steps Sa2 to Sa8 for all of the video data (loop La1e). As a result, from the plurality of pieces of video data and the athlete region identification data and true value scores corresponding to each, the learning data generation unit 12 can generate, as the training data used by the learning processing unit 13 for the learning processing, a plurality of pieces of mask video data and a plurality of mask scores m_score, one associated with each piece of mask video data.
- In the above, the learning data generation unit 12 selects the predetermined ratio (α) from values between 0 and 1 based on a uniform distribution, but the predetermined ratio (α) may instead be selected based on a distribution other than the uniform distribution. The candidates may also be limited to a fixed set of values, for example the five values 0.0, 0.25, 0.5, 0.75, and 1.0, with any one of them randomly selected as the predetermined ratio (α); more generally, the selectable range of 0 to 1 may be divided with an arbitrary step size other than 0.25, and a value randomly selected from the resulting fixed values may be used as the predetermined ratio (α).
- Similarly, the learning data generation unit 12 randomly selects the horizontal length (W') or the vertical length (H') of the mask region based on a uniform distribution, but it may select them based on a distribution other than the uniform distribution. As with the selection of the predetermined ratio (α), a value randomly selected from a set of fixed values obtained by dividing the selectable range with an arbitrary step size may be used as the horizontal length (W') or the vertical length (H').
- Likewise, in step Sa7 the learning data generation unit 12 randomly selects the position of the mask region based on a uniform distribution, but the position may be randomly selected based on a distribution other than the uniform distribution. As with the selection of the predetermined ratio (α), a value randomly selected from a set of fixed values obtained by dividing the selectable range with an arbitrary step size may be used as the position of the mask region.
- In step Sa3, the learning data generation unit 12 calculates the mask score m_score by equation (1), but the mask score m_score may instead be calculated from the true value score t_score by applying α to another function, for example a function with variable parameters such as a sigmoid function.
- In the above, the learning data generation unit 12 selects the average color of the image frame being processed as the color filling the mask region corresponding to that image frame, but the embodiment is not limited to this form. The learning data generation unit 12 may select the average color of all image frames included in the video data being processed as the color filling all mask regions corresponding to that video data, or may fill all mask regions with one and the same arbitrarily determined color. Since the mask region is preferably inconspicuous, an inconspicuous color should be chosen according to the overall hue of each image frame; selecting the average color of each image frame is considered most effective.
- FIG. 6 is a flowchart showing the flow of learning processing performed by the learning processing unit 13.
- The learning processing unit 13 stores in advance, in its internal storage area, the upper limit of the number of learning steps required for the coefficients of its internal function approximator to converge sufficiently.
- The learning model data storage unit 14 stores in advance the initial values of the coefficients applied to the function approximator of the learning processing unit 13.
- The learning processing unit 13 takes in the plurality of pieces of mask video data output by the learning data generation unit 12 and the plurality of mask scores m_score associated with them. The learning processing unit 13 assigns a number indicating the processing order to each combination of mask video data and mask score m_score and writes the combinations to its internal storage area (step Sb1).
- The learning processing unit 13 creates, in its internal storage area, an area for storing a variable n indicating the number of learning steps (hereinafter, the "learning step count n") and writes "1" to the created area (step Sb2).
- The learning processing unit 13 reads the coefficients stored in the learning model data storage unit 14 and applies the read coefficients to the function approximator (step Sb3).
- The learning processing unit 13 reads, from the internal storage area, the mask video data that is first in the processing order and its mask score m_score, and gives the read mask video data to the function approximator as input data (step Sb4).
- The learning processing unit 13 calculates the error between the estimated score (hereinafter, the "estimated score y_score"), which is the output value of the function approximator, and the mask score m_score read in step Sb4 (step Sb5).
- The learning processing unit 13 applies a loss function to the calculated error to calculate the loss, calculates new coefficients for the function approximator by a method such as the error backpropagation method so as to reduce the calculated loss, and writes the newly calculated coefficients to the learning model data storage unit 14 to update the coefficients (step Sb6).
- As the loss function, a function that calculates the L1 distance (Manhattan distance) between the estimated score y_score and the mask score m_score may be used, a function that calculates the L2 distance (Euclidean distance) between them may be used, or a function that calculates the sum of the L1 distance and the L2 distance may be used.
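- A minimal sketch of these loss options follows (Python; since the text does not fix the exact form, the L2 option is written here in the squared form common in score regression, and the function name is hypothetical):

```python
import torch

def mask_score_loss(y_score, m_score, mode="l1"):
    """Loss between the estimated score y_score and the mask score m_score."""
    diff = y_score - m_score
    l1 = diff.abs()             # L1 (Manhattan) distance
    l2 = diff ** 2              # L2 term, in its squared form (assumption)
    if mode == "l1":
        return l1.mean()
    if mode == "l2":
        return l2.mean()
    return (l1 + l2).mean()     # sum of the L1 and L2 terms
```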
- The learning processing unit 13 reads the learning step count n from the internal storage area and determines whether the read learning step count n matches the upper limit stored in the internal storage area (step Sb7). When the learning processing unit 13 determines that the read learning step count n does not match the upper limit (step Sb7, No), it adds 1 to the read learning step count n, writes the resulting value n + 1 as the new learning step count n to the learning-step-count area of the internal storage area (step Sb8), and performs the processing from step Sb3 again.
- In the repeated processing, the learning processing unit 13 reads the coefficients updated in step Sb6 from the learning model data storage unit 14 and applies the read coefficients to the function approximator. The learning processing unit 13 then reads the mask video data that is next in the processing order and its mask score m_score, and gives the read mask video data to the function approximator.
- While the processing of steps Sb3 to Sb6 is repeated, the learning processing unit 13 performs the processing of steps Sb4 to Sb6 for the combinations of all the mask video data and mask scores m_score; once all combinations have been processed, it reads the combinations out again in order, starting from the combination that is first in the processing order, and performs the processing of steps Sb4 to Sb6.
- When the learning processing unit 13 determines that the read learning step count n matches the upper limit (step Sb7, Yes), it ends the processing. At that point, the learning model data storage unit 14 stores the sufficiently converged trained coefficients, and these trained coefficients constitute the learning model data representing the trained learning model.
- FIG. 6 shows an online learning method in which the coefficients of the function approximator are updated for each combination of mask video data and mask score m_score. Alternatively, mini-batch learning, in which the coefficients of the function approximator are updated for every fixed number of combinations of mask video data and mask scores m_score, or batch learning, in which the coefficients of the function approximator are updated once per pass over all combinations of mask video data and mask scores m_score, may be performed.
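- A minimal sketch of the online variant of FIG. 6, assuming PyTorch, the hypothetical mask_score_loss above, and an illustrative optimizer and learning rate (none of which are specified in the patent):

```python
import torch

def train(model, pairs, max_steps, lr=1e-4):
    """pairs: re-iterable sequence of (mask_video, m_score) combinations in processing order."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    n = 1                                              # step Sb2: learning step count n
    while True:
        for mask_video, m_score in pairs:              # steps Sb3-Sb4: next combination
            y_score = model(mask_video)                # output value of the function approximator
            loss = mask_score_loss(y_score, m_score)   # step Sb5: error -> loss
            opt.zero_grad()
            loss.backward()                            # step Sb6: backpropagation
            opt.step()                                 # update the coefficients
            if n == max_steps:                         # step Sb7: upper limit reached
                return model
            n += 1                                     # step Sb8: n <- n + 1
```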
- FIG. 7 is a diagram showing the DNN configuration of the function approximator 30, which is an example of the function approximator included in the learning processing unit 13.
- The learning processing unit 13 resamples the mask video data to 96 frames, divides the 96 frames into segments of 16 frames each, and generates six pieces of divided mask video data.
- The function approximator 30 includes three-dimensional convolutional network layers 31-1 to 31-6, an averaging unit 32, and a score regression network layer 33. Each of the three-dimensional convolutional network layers 31-1 to 31-6 takes in one of the six pieces of divided mask video data, performs feature extraction on it, and outputs the feature amount of that divided mask video data.
- The averaging unit 32 averages the feature amounts of the divided mask video data output by the three-dimensional convolutional network layers 31-1 to 31-6 and outputs the average.
- The score regression network layer 33 performs regression analysis based on the average of the feature amounts of the divided mask video data output by the averaging unit 32 and the mask score m_score corresponding to the mask video data, extracting the relationship between the average of the feature amounts and the mask score m_score.
- The coefficients stored in the learning model data storage unit 14 are applied to the three-dimensional convolutional network layers 31-1 to 31-6 and to the score regression network layer 33. Shared coefficients, that is, the same coefficients, are applied to each of the three-dimensional convolutional network layers 31-1 to 31-6.
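- A sketch of the FIG. 7 arrangement in PyTorch follows. Because the coefficients are shared, a single 3D convolutional module is applied to each of the six 16-frame segments; the internal layer sizes and all names are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class FunctionApproximator30(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Shared 3D convolutional feature extractor standing in for layers 31-1 to 31-6.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(64, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),              # -> (batch, feat_dim, 1, 1, 1)
        )
        self.score_head = nn.Linear(feat_dim, 1)  # score regression network layer 33

    def forward(self, video):                     # video: (batch, 3, 96, height, width)
        segments = torch.split(video, 16, dim=2)  # six 16-frame divided mask videos
        feats = [self.backbone(s).flatten(1) for s in segments]  # shared coefficients
        avg = torch.stack(feats).mean(dim=0)      # averaging unit 32
        return self.score_head(avg).squeeze(1)    # estimated score y_score
```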
- FIG. 8 is a block diagram showing the configuration of the estimation device 2 according to the embodiment of the present invention.
- The estimation device 2 includes an input unit 21, an estimation processing unit 22, and a learning model data storage unit 23.
- The learning model data storage unit 23 stores in advance the trained coefficients stored in the learning model data storage unit 14 of the learning device 1, that is, the trained learning model data.
- The input unit 21 takes in arbitrary video data, that is, video data in which a series of movements performed by an arbitrary athlete is recorded together with the background.
- The estimation processing unit 22 calculates an estimated score corresponding to the video data based on the arbitrary video data taken in by the input unit 21 and the trained learning model data stored in the learning model data storage unit 23. The estimation processing unit 22 includes a function approximator of the same configuration as that of the learning processing unit 13.
- FIG. 9 is a flowchart showing the flow of processing by the estimation device 2.
- The input unit 21 takes in arbitrary video data and outputs the taken-in video data to the estimation processing unit 22 (step Sc1).
- The estimation processing unit 22 takes in the video data output by the input unit 21, reads the trained learning model data, that is, the trained coefficients, from the learning model data storage unit 23, and applies the read trained coefficients to the function approximator (step Sc2).
- The estimation processing unit 22 gives the taken-in video data to the function approximator as input data (step Sc3), and outputs the output value of the function approximator as the estimated score for the video data (step Sc4).
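- The estimation flow of FIG. 9 then reduces to a single forward pass with the trained coefficients already applied; a sketch with the same hypothetical names:

```python
import torch

def estimate_score(model, video):
    """Steps Sc3-Sc4: give the video data to the function approximator and return its output."""
    model.eval()
    with torch.no_grad():
        return model(video)   # output value = estimated score for the video data
```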
- As described above, the input unit 11 takes in the video data, the athlete region identification data identifying the region surrounding the athlete in each of the plurality of image frames included in the video data, and the true value score, which is the evaluation value for the athlete's performance recorded in the video data.
- For each of the plurality of image frames included in the video data, the learning data generation unit 12 masks a region at an arbitrary position within the region indicated by the athlete region identification data corresponding to that image frame, the region having a size of a predetermined ratio arbitrarily determined for each piece of video data, thereby generating the mask video data; it also weights the true value score of each piece of video data according to the predetermined ratio corresponding to that video data to generate the mask score.
- The learning processing unit 13 then generates the learning model data representing the relationship between the mask video data corresponding to the video data and the mask score corresponding to the video data.
- In the above embodiment, the shape of the region indicated by the athlete region identification data is rectangular, but the shape is not limited to a rectangle and may be any other shape.
- In the above embodiment, the true value score is a score actually awarded by a referee, but it may instead be a score assigned according to a standard other than the quantitative scoring standard adopted in the actual competition.
- The function approximator included in the learning processing unit 13 of the learning device 1 and in the estimation processing unit 22 of the estimation device 2 in the above embodiment is, for example, a DNN, for which the configuration shown in FIG. 7 is one example; neural networks other than a DNN, or other machine learning means, may also be applied.
- The learning device 1 and the estimation device 2 may also be configured as a single integrated device. The device integrating the learning device 1 and the estimation device 2 has a learning mode and an estimation mode.
- The learning mode is a mode in which the learning processing by the learning device 1 is performed to generate a trained learning model. That is, in the learning mode, the device integrating the learning device 1 and the estimation device 2 executes the process shown in FIG.
- The estimation mode is a mode in which an estimated score is output using the trained model. That is, in the estimation mode, the device integrating the learning device 1 and the estimation device 2 executes the process shown in FIG.
- The learning device 1 and the estimation device 2 in the above embodiment may be realized by a computer. In that case, a program for realizing their functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed.
- The term "computer system" as used here includes an OS and hardware such as peripheral devices.
- The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system.
- The "computer-readable recording medium" may also include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. Further, the above program may realize part of the functions described above, may realize the functions described above in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).
- 1 ... learning device, 11 ... input unit, 12 ... learning data generation unit, 13 ... learning processing unit, 14 ... learning model data storage unit
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Physical Education & Sports Medicine (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (7)
- 1. A learning device comprising a learning processing unit that generates learning model data representing a relationship between mask video data, in which a part of a region surrounding an athlete in each of a plurality of image frames included in video data recording the athlete's movement is arbitrarily masked, and a mask score obtained by weighting a true value score, which is an evaluation value for the athlete's performance recorded in the video data, according to the ratio of the masked region.
- 2. The learning device according to claim 1, wherein the learning processing unit has a function approximator and updates the learning model data, which consists of the coefficients of the function approximator, by performing learning processing so that an estimated score obtained as an output value by giving the mask video data to the function approximator approaches the mask score corresponding to that mask video data.
- 3. The learning device according to claim 1 or 2, further comprising: an input unit that takes in the video data, athlete region identification data identifying the region surrounding the athlete in each of the plurality of image frames included in the video data, and the true value score corresponding to the video data; and a learning data generation unit that, in each of the plurality of image frames included in the video data, masks a region at an arbitrary position within the region indicated by the athlete region identification data corresponding to that image frame, the region having a size of a predetermined ratio arbitrarily determined for each piece of video data, to generate the mask video data, and weights the true value score of each piece of video data according to the predetermined ratio corresponding to that video data to generate the mask score.
- 4. The learning device according to claim 3, wherein the learning data generation unit performs masking by filling the mask region corresponding to an image frame with the average color of that image frame, by filling all mask regions corresponding to the video data with the average color of that video data, or by filling all mask regions with one and the same arbitrarily determined color.
- 5. An estimation device comprising: an input unit that takes in video data recording an athlete's movement; and an estimation processing unit that calculates an estimated score corresponding to the video data based on the video data and on learning model data representing a relationship between mask video data, in which a part of a region surrounding the athlete in each of a plurality of image frames included in video data recording the athlete's movement is arbitrarily masked, and a mask score obtained by weighting a true value score, which is an evaluation value for the athlete's performance recorded in the video data, according to the ratio of the masked region.
- 6. A learning method for generating learning model data representing a relationship between mask video data, in which a part of a region surrounding an athlete in each of a plurality of image frames included in video data recording the athlete's movement is arbitrarily masked, and a mask score obtained by weighting a true value score, which is an evaluation value for the athlete's performance recorded in the video data, according to the ratio of the masked region.
- 7. A learning program for causing a computer to execute a procedure of generating learning model data representing a relationship between mask video data, in which a part of a region surrounding an athlete in each of a plurality of image frames included in video data recording the athlete's movement is arbitrarily masked, and a mask score obtained by weighting a true value score, which is an evaluation value for the athlete's performance recorded in the video data, according to the ratio of the masked region.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/033425 WO2022049694A1 (ja) | 2020-09-03 | 2020-09-03 | Learning device, estimation device, learning method, and learning program |
US18/023,859 US20240024756A1 (en) | 2020-09-03 | 2020-09-03 | Learning device, estimation device, learning method, and learning program |
JP2022546795A JP7393701B2 (ja) | 2020-09-03 | 2020-09-03 | Learning device, estimation device, learning method, and learning program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2020/033425 WO2022049694A1 (ja) | 2020-09-03 | 2020-09-03 | Learning device, estimation device, learning method, and learning program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022049694A1 true WO2022049694A1 (ja) | 2022-03-10 |
Family
ID=80491897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/033425 WO2022049694A1 (ja) | 2020-09-03 | 2020-09-03 | 学習装置、推定装置、学習方法、及び学習プログラム |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240024756A1 (ja) |
JP (1) | JP7393701B2 (ja) |
WO (1) | WO2022049694A1 (ja) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004535129A (ja) * | 2001-07-10 | 2004-11-18 | Vistas Unlimited, Inc. | Method and system for calculating the duration for which a target area is included in an image stream |
US20090220124A1 (en) * | 2008-02-29 | 2009-09-03 | Fred Siegel | Automated scoring system for athletics |
US9230159B1 (en) * | 2013-12-09 | 2016-01-05 | Google Inc. | Action recognition and detection on videos |
JP2017537403A (ja) * | 2014-11-27 | 2017-12-14 | Nokia Technologies Oy | Method, apparatus and computer program product for generating super-resolved images |
WO2018122956A1 (ja) * | 2016-12-27 | 2018-07-05 | NEC Corporation | Sports motion analysis support system, method and program |
Non-Patent Citations (1)
Title |
---|
SHIN JEONGEUN, SHINJI OZAWA: "ken-system: A Study on Motion Analysis of an Artistic Gymnastics by using Dynamic Image Processing -- for A Development of Automatic Scoring System of Horizontal Bar -", IEICE TECHNICAL REPORT, 15 May 2008 (2008-05-15), pages 13 - 18, XP055643227 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024023911A1 (ja) * | 2022-07-26 | 2024-02-01 | Nippon Telegraph And Telephone Corporation | Learning device, learning method and program |
Also Published As
Publication number | Publication date |
---|---|
US20240024756A1 (en) | 2024-01-25 |
JP7393701B2 (ja) | 2023-12-07 |
JPWO2022049694A1 (ja) | 2022-03-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108283809B (zh) | Data processing method and apparatus, computer device and storage medium | |
Kurach et al. | Google research football: A novel reinforcement learning environment | |
US11559738B2 (en) | Machine learned virtual gaming environment | |
Schrum et al. | Interactive evolution and exploration within latent level-design space of generative adversarial networks | |
Yoshida et al. | Application of Monte-Carlo tree search in a fighting game AI | |
Firoiu et al. | At human speed: Deep reinforcement learning with action delay | |
Green et al. | Mario level generation from mechanics using scene stitching | |
WO2022049694A1 (ja) | Learning device, estimation device, learning method, and learning program | |
Barros et al. | Balanced civilization map generation based on open data | |
US20230245366A1 (en) | 3d avatar generation using biomechanical analysis | |
CN113593013A (zh) | Interaction method, system, terminal and VR device based on VR simulation of the deceased | |
CN105531003B (zh) | Simulation device and simulation method | |
CN103561831A (zh) | Automatic sensor-driven match scheduling | |
JP6103683B2 (ja) | Role-playing game strategy calculation device, calculation method, calculation program, and recording medium recording the program | |
US11596867B2 (en) | AI-based content generation for gaming applications | |
CN115944921B (zh) | Game data processing method, apparatus, device and medium | |
WO2022244135A1 (ja) | Learning device, estimation device, learning model data generation method, estimation method, and program | |
JPWO2020110432A1 (ja) | Learning device, foreground region estimation device, learning method, foreground region estimation method, and program | |
CN114220312B (zh) | Virtual training method and device, and virtual training system | |
CN105536251A (zh) | Automatic game task construction method based on a user quality-of-experience fluctuation model | |
JP2024503596A (ja) | Volumetric video from an image source | |
Edwards et al. | Search-based exploration and diagnosis of TOAD-GAN | |
Anderson-Coto et al. | Fandom culture and identity in esports | |
JP2005078413A (ja) | Electronic device and response information output method in electronic device | |
Nam et al. | Procedural Content Generation of Super Mario Levels Considering Natural Connection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20952433 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022546795 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18023859 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20952433 Country of ref document: EP Kind code of ref document: A1 |