WO2024247080A1 - Learning device, evaluation device, learning method, evaluation method, and program - Google Patents

Learning device, evaluation device, learning method, evaluation method, and program

Info

Publication number
WO2024247080A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
evaluation
video data
data
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2023/020016
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
隆昌 永井
翔一郎 武田
健司 江崎
仁志 瀬下
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2025523719A
Priority to PCT/JP2023/020016
Publication of WO2024247080A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • The present invention relates to a learning device, an evaluation device, a learning method, an evaluation method, and a program.
  • The environments in which sports are played and medical procedures are performed are not necessarily the same.
  • For example, the lighting may differ for an indoor activity, and the weather may differ for an outdoor activity.
  • Video footage captures not only the person being evaluated but also these environmental differences.
  • As a result, the results of computer-based evaluation may vary with the environment.
  • Consequently, the accuracy of computer-based evaluation may be poor. This problem is not limited to cases where the evaluation subject is a human; it also arises when the evaluation subject is an animal, such as a dog or cat.
  • In view of the above, an object of the present invention is to provide a technology that improves the accuracy of motion evaluation.
  • One aspect of the present invention is a learning device including a control unit that trains a learning target using: motion video data, which is video data of a video showing the motion of an evaluation subject, that is, a human or animal whose motion is to be evaluated; motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor, that is, a sensor that is attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results according to the motion of the evaluation subject; and ground truth data indicating the result of evaluating the motion. The learning target is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.
  • One aspect of the present invention is an evaluation device including: an interface unit that acquires a set of motion video data showing the motion of an evaluation subject and motion measurement data obtained during the motion shown in the motion video data; and a trained evaluation model execution unit that evaluates the motion shown in the motion video data included in the set, using the set acquired by the interface unit and a trained mathematical model obtained by a learning device. The learning device includes a control unit that trains a learning target using motion video data showing the motion of an evaluation subject, that is, a human or animal whose motion is to be evaluated; motion measurement data, that is, a time series of results obtained during the motion shown in the motion video data by a motion measurement sensor that is attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results according to the motion of the evaluation subject; and ground truth data indicating the result of evaluating the motion. The learning target is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.
  • One aspect of the present invention is a learning method including a control step of training a learning target using: motion video data, which is video data of a video showing the motion of an evaluation subject, that is, a human or animal whose motion is to be evaluated; motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor, that is, a sensor that is attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results according to the motion of the evaluation subject; and ground truth data indicating the result of evaluating the motion. The learning target is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.
  • One aspect of the present invention is an evaluation method including: an interface step of acquiring a set of motion video data showing the motion of an evaluation subject and motion measurement data obtained during the motion shown in the motion video data; and a trained evaluation model execution step of evaluating the motion shown in the motion video data included in the set, using the set acquired in the interface step and a trained mathematical model obtained by a learning method. The learning method includes a control step of training a learning target using motion video data showing the motion of an evaluation subject, that is, a person or animal whose motion is to be evaluated; motion measurement data, that is, a time series of results obtained during the motion shown in the motion video data by a motion measurement sensor that is attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject during the motion, and that obtains results according to the motion of the evaluation subject; and ground truth data indicating the result of evaluating the motion. The learning target is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.
  • One aspect of the present invention is a program for causing a computer to function as either or both of the above-mentioned learning device and the above-mentioned evaluation device.
  • According to the present invention, it is possible to provide a technology that improves the accuracy of motion evaluation.
  • FIG. 1 is a diagram showing an example of the configuration of an evaluation system according to an embodiment.
  • FIG. 2 is a diagram showing an example of a hardware configuration of a learning device according to an embodiment.
  • FIG. 3 is a diagram showing an example of a hardware configuration of an evaluation device according to an embodiment.
  • FIG. 4 is a flowchart showing an example of a flow of processing executed by a learning device in an embodiment.
  • FIG. 5 is a flowchart showing an example of a flow of processing executed by an evaluation device in the embodiment.
  • FIGS. 13A to 13C are diagrams showing an example of a first modification process in a modified example.
  • FIG. 11 is an explanatory diagram illustrating an example of a process for assigning a noteworthy identifier in a modified example.
  • The evaluation system 100 includes a learning device 1 and an evaluation device 2.
  • The learning device 1 includes a control unit 11, which includes a processor 91 such as a CPU (Central Processing Unit) and a memory 92 connected by a bus, and executes a program.
  • The control unit 11 executes a learning process.
  • The learning process is a process of training the mathematical model that is the learning target.
  • The mathematical model that is the learning target is a model that evaluates the motion of the evaluation subject (hereinafter referred to as an "evaluation model"). It is trained using sets of motion video data, motion measurement data, and ground truth data (hereinafter referred to as "learning data").
  • The motion video data is video data of a video showing the motion of a person or animal whose motion is to be evaluated (hereinafter referred to as the "evaluation subject").
  • The evaluation subject may be any person or animal, as long as its motion is the object of evaluation.
  • The evaluation subject may be, for example, an athlete in a sport such as figure skating, surfing, or diving.
  • In that case, the motions evaluated by the evaluation model are the athlete's motions during competition.
  • The evaluation subject may also be, for example, a medical intern.
  • In that case, the motions evaluated by the evaluation model are motions performed during a medical procedure such as surgery.
  • The motion measurement data is a time series of results obtained by a motion measurement sensor during the motion captured in the motion video data.
  • The motion measurement sensor is a sensor that is attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results according to the motion of the evaluation subject.
  • The motion measurement sensor is, for example, an acceleration sensor.
  • In that case, the result obtained by the motion measurement sensor is acceleration, and the time series of results obtained by the sensor is therefore a time series of acceleration.
  • The motion measurement sensor may also be, for example, an angular velocity sensor. In that case, the result obtained by the sensor is angular velocity, and the time series of results is a time series of angular velocity.
  • The motion measurement sensor may also be, for example, an inertial measurement unit (IMU), or a sensor that obtains biosignals that change in response to the motion of the evaluation subject, such as heart rate, brain waves, pulse, blood pressure, breathing, and sweating.
  • When a sensor that obtains results according to the motion of the evaluation subject is attached to the subject's body or to something the subject wears, the results obtained by the sensor are less affected than video by the environment, such as the intensity and color of lighting, and are highly correlated with the subject's motion. Therefore, an evaluation based on both motion video data and motion measurement data is more accurate than an evaluation based only on motion video data.
  • Consider, for example, the case where the motion evaluated by the evaluation model is a medical procedure.
  • In that case, the motion measurement sensor may be attached to medical equipment used during the procedure, such as a scalpel, whose movement reflects the motion of the evaluation subject.
  • The motion measurement sensor then obtains results that are highly correlated with the motion of the evaluation subject, even though it is not attached to the subject's body. These results, too, are less affected than video by the environment, such as the intensity and color of lighting.
  • When the evaluation subject is an animal, the motion measurement sensor may be attached to its collar, for example, or, if a tag is attached to its leg or the like, to that tag.
  • Such motion measurement sensors may obtain information on the motion of the evaluation subject in each of three linearly independent directions, such as forward/backward, left/right, and up/down, or in only one or two dimensions.
  • The motion measurement data used by the evaluation model need not be the results obtained by a single motion measurement sensor.
  • The evaluation model may use motion measurement data obtained by multiple motion measurement sensors. In that case, multiple pieces of motion measurement data are input to the evaluation model.
  • The multiple motion measurement sensors may be of different types that obtain different physical quantities, for example, a sensor that obtains acceleration and a sensor that obtains biosignals. In that case, the evaluation model uses time series of multiple different physical quantities as motion measurement data for the evaluation.
  • Ground truth data is data indicating the result of evaluating the evaluation subject's motion.
  • The control unit 11 updates the evaluation model, based on the evaluation result output by the evaluation model and the ground truth data, so as to reduce the difference between the two.
  • The difference may be expressed, for example, by the L1 norm, the L2 norm, or a linear sum of the L1 norm and the L2 norm.
  • The learning process is executed until a predetermined end condition for learning (hereinafter referred to as the "learning end condition") is satisfied.
  • The learning end condition is, for example, that the change in the learning target caused by learning has become smaller than a predetermined amount.
  • The learning end condition may also be, for example, that the learning target has been updated a predetermined number of times or more.
  • The evaluation device 2 evaluates the motion of the evaluation subject using a trained evaluation model, that is, the evaluation model at the time the learning end condition is satisfied.
  • The evaluation device 2 accepts input of a set of motion video data showing the motion of the evaluation subject and motion measurement data obtained during the motion shown in that video data (hereinafter referred to as "inference stage data").
  • The evaluation device 2 performs the evaluation based on the accepted inference stage data.
  • The evaluation model may be of any type, as long as it can be trained with motion video data, motion measurement data, and ground truth data to evaluate the motion of the evaluation subject.
  • The evaluation model includes, for example, a video data feature acquisition process and a measurement data feature acquisition process.
  • The video data feature acquisition process acquires features of the motion video data.
  • The measurement data feature acquisition process acquires features of the motion measurement data.
  • An evaluation model that includes both processes evaluates the motion of the evaluation subject based on the results of the video data feature acquisition process and the results of the measurement data feature acquisition process.
  • In the video data feature acquisition process, features may be obtained for each of the groups (hereinafter referred to as "partial video data") into which the motion video data is divided along the time axis. In this case, a feature is obtained for each piece of partial video data.
  • For example, when the motion video data consists of P × Q frames (P is an integer of 1 or more and Q is an integer of 2 or more), the video may be divided in frame-number order into partial video data of P frames each, and a feature acquired for each piece of partial video data. In that case, the video data feature acquisition process obtains Q features from the motion video data.
  • The process of dividing the motion video data into partial video data (hereinafter referred to as the "video data division process") may be performed at any time before the video data feature acquisition process.
  • The video data division process divides the motion video data along the time axis, for example, every predetermined number of frames.
  • Motion video data that has been divided into partial video data is still motion video data. Therefore, the motion video data input to the evaluation model may be motion video data that has already undergone the video data division process.
  • The video data division process may be executed, for example, by the control unit 11, or by a device other than the learning device 1.
  • In the latter case, the control unit 11 acquires the results, for example via the interface unit 12 described below, and uses them for evaluation by executing the evaluation model.
  • Alternatively, the motion video data input to the evaluation model may be motion video data before the video data division process. In that case, if the video data division process is performed, it is executed by the evaluation model itself.
  • The same applies to the measurement data feature acquisition process.
  • In the measurement data feature acquisition process, features may be obtained for each of the groups (hereinafter referred to as "partial measurement data") into which the motion measurement data is divided along the time axis at a predetermined time width. In this case, a feature is obtained for each piece of partial measurement data.
  • The number of features obtained is then greater than when the motion measurement data is not divided into partial measurement data.
  • The process of dividing the motion measurement data into partial measurement data (hereinafter referred to as the "measurement data division process") may be performed at any time before the measurement data feature acquisition process.
  • The measurement data division process divides the motion measurement data, which is a kind of time series, along the time axis, for example, at every predetermined time width.
  • The motion measurement data input to the evaluation model may therefore be motion measurement data that has already undergone the measurement data division process.
  • The measurement data division process may be executed, for example, by the control unit 11, or by a device other than the learning device 1.
  • In the latter case, the control unit 11 acquires the results, for example via the interface unit 12 described below, and uses them for evaluation by executing the evaluation model.
  • Alternatively, the motion measurement data input to the evaluation model may be motion measurement data before the measurement data division process. In that case, if the measurement data division process is performed, it is executed by the evaluation model itself.
  • The evaluation model evaluates the motion of the evaluation subject based on the results of the video data feature acquisition process and the measurement data feature acquisition process.
  • The two results may be integrated and used in the integrated state to evaluate the motion of the evaluation subject.
  • Alternatively, the two results may be used without being integrated to evaluate the motion of the evaluation subject.
  • Integration is a process of combining different pieces of data into one.
  • For example, when the data are expressed as vectors, integration is the process of taking the direct product of the vectors, that is, concatenating them into a single vector.
  • When the data are expressed as matrices, integration may be the process of concatenating the matrices in the row or column direction.
  • For this processing, one or more fully connected layers may be used, for example, or the Transformer Encoder described in Reference 1 below may be used.
  • Any number of Transformer Blocks may be used; for example, two Transformer Blocks may be connected in series.
  • <Example of hardware configuration> FIG. 2 is a diagram showing an example of the hardware configuration of the learning device 1 according to the embodiment.
  • The learning device 1 includes the control unit 11 and executes a program.
  • By executing the program, the learning device 1 functions as a device including the control unit 11, an interface unit 12, and a storage unit 13.
  • The processor 91 reads a program stored in the storage unit 13 and loads the read program into the memory 92.
  • The processor 91 executes the program loaded in the memory 92, whereby the learning device 1 functions as a device including the control unit 11, the interface unit 12, and the storage unit 13.
  • The control unit 11 performs, for example, the learning process.
  • The control unit 11 performs, for example, the video data division process.
  • The control unit 11 performs, for example, the measurement data division process.
  • The control unit 11 controls, for example, the operation of each functional unit of the learning device 1.
  • The control unit 11 acquires learning data, for example, via the interface unit 12.
  • The control unit 11 acquires, for example, information stored in the storage unit 13.
  • Acquiring information stored in the storage unit 13 specifically means reading it.
  • The interface unit 12 includes a communication interface for connecting the learning device 1 to an external device.
  • The interface unit 12 communicates with the external device by wired or wireless communication.
  • The external device is, for example, a device that transmits learning data. In that case, the interface unit 12 acquires the learning data by communicating with that device.
  • The interface unit 12 may obtain motion video data that has undergone the video data division process in the external device as the motion video data included in the learning data.
  • The interface unit 12 may obtain motion measurement data that has undergone the measurement data division process in the external device as the motion measurement data included in the learning data.
  • The external device may also be, for example, the evaluation device 2.
  • In that case, the interface unit 12 transmits the trained evaluation model obtained by the learning process to the evaluation device 2 by communicating with the evaluation device 2.
  • The interface unit 12 may include input devices such as a mouse, a keyboard, and a touch panel.
  • The interface unit 12 may be configured as an interface that connects these input devices to the learning device 1. In this way, the interface unit 12 accepts input of various information to the learning device 1 via the input devices, by wire or wirelessly. Note that learning data need not be input to the communication interface and may be input via an input device.
  • The interface unit 12 may output various types of information.
  • The interface unit 12 includes, for example, a display device such as a CRT (Cathode Ray Tube) display, a liquid crystal display, or an organic EL (Electro-Luminescence) display.
  • The interface unit 12 may be configured as an interface that connects such display devices to the learning device 1.
  • The interface unit 12 outputs, for example, information that has been input to the interface unit 12.
  • The storage unit 13 is configured using a computer-readable storage medium device (non-transitory computer-readable recording medium) such as a magnetic hard disk device or a semiconductor storage device.
  • The storage unit 13 stores various information related to the learning device 1.
  • The storage unit 13 stores, for example, information necessary for the learning process.
  • The storage unit 13 stores, for example, the evaluation model in advance.
  • The storage unit 13 stores, for example, various information generated by the operation of the control unit 11.
  • The storage unit 13 stores, for example, the trained evaluation model.
  • The storage unit 13 stores, for example, information acquired by the interface unit 12.
  • FIG. 3 is a diagram showing an example of the hardware configuration of the evaluation device 2 in an embodiment.
  • The evaluation device 2 includes a control unit 21 (an example of a trained evaluation model execution unit) and executes a program. By executing the program, the evaluation device 2 functions as a device including the control unit 21, an interface unit 22, and a storage unit 23.
  • The processor 93 reads a program stored in the storage unit 23 and loads the read program into the memory 94.
  • The processor 93 executes the program loaded in the memory 94, whereby the evaluation device 2 functions as a device including the control unit 21, the interface unit 22, and the storage unit 23.
  • The control unit 21 executes the trained evaluation model on, for example, the inference stage data. This allows the control unit 21 to evaluate the motion shown in the motion video data included in the inference stage data. In other words, the control unit 21 uses the trained evaluation model and the inference stage data to evaluate that motion.
  • If the control unit 11 performed the video data division process during the learning stage of the evaluation model but the video data division process is not included in the evaluation model, the control unit 21 performs the video data division process before executing the trained evaluation model. Likewise, if the control unit 11 performed the measurement data division process during the learning stage but the measurement data division process is not included in the evaluation model, the control unit 21 performs the measurement data division process before executing the trained evaluation model.
  • The control unit 21 controls, for example, the operation of each functional unit of the evaluation device 2.
  • The control unit 21 acquires, for example, various information via the interface unit 22.
  • The control unit 21 acquires, for example, the data on which the trained evaluation model is to be executed (that is, the inference stage data).
  • The control unit 21 acquires, for example, information stored in the storage unit 23.
  • Acquiring information stored in the storage unit 23 specifically means reading it.
  • The interface unit 22 includes a communication interface for connecting the evaluation device 2 to an external device.
  • The interface unit 22 communicates with the external device by wired or wireless communication.
  • The external device is, for example, a device that transmits inference stage data.
  • In that case, the interface unit 22 acquires the inference stage data by communicating with that device.
  • The external device may also be, for example, the learning device 1.
  • In that case, the interface unit 22 acquires the trained evaluation model by communicating with the learning device 1.
  • The interface unit 22 may obtain motion video data that has undergone the video data division process in an external device as the motion video data included in the inference stage data.
  • In that case, the evaluation device 2 obtains the result of the video data division process executed by the external device.
  • The interface unit 22 may obtain motion measurement data that has undergone the measurement data division process in an external device as the motion measurement data included in the inference stage data.
  • In that case, the evaluation device 2 obtains the result of the measurement data division process executed by the external device.
  • The interface unit 22 may include input devices such as a mouse, a keyboard, and a touch panel.
  • The interface unit 22 may be configured as an interface that connects these input devices to the evaluation device 2. In this way, the interface unit 22 accepts input of various information to the evaluation device 2 via the input devices, by wire or wirelessly.
  • Note that the inference stage data need not be input to the communication interface and may be input via an input device.
  • The interface unit 22 outputs various types of information.
  • The interface unit 22 includes, for example, a display device such as a CRT display, a liquid crystal display, or an organic EL display.
  • The interface unit 22 may be configured as an interface that connects such display devices to the evaluation device 2.
  • The interface unit 22 outputs, for example, information input to the interface unit 22.
  • The storage unit 23 is configured using a computer-readable storage medium device such as a magnetic hard disk device or a semiconductor storage device.
  • The storage unit 23 stores various information related to the evaluation device 2.
  • The storage unit 23 stores, for example, the trained evaluation model.
  • The storage unit 23 stores, for example, various information generated by the operation of the control unit 21.
  • The storage unit 23 stores, for example, information acquired by the interface unit 22.
  • FIG. 4 is a flowchart showing an example of the flow of processing executed by the learning device 1 in the embodiment.
  • the control unit 11 acquires learning data (step S101).
  • the control unit 11 executes a learning process using the acquired learning data until a learning end condition is satisfied (step S102).
  • the control unit 11 outputs the trained evaluation model to a predetermined output destination (step S103).
  • the predetermined output destination is, for example, the memory unit 13. Note that output to the memory unit 13 means writing to the memory unit 13.
  • the predetermined output destination may be, for example, an external device such as the evaluation device 2 that is communicatively connected via the interface unit 12.
  • FIG. 5 is a flowchart showing an example of the flow of processing executed by the evaluation device 2 in an embodiment. For ease of explanation, it is assumed in the example of the flowchart in FIG. 5 that the trained evaluation model has been stored in advance in the storage unit 23.
  • Inference stage data is input to the interface unit 22, and the control unit 21 acquires the inference stage data input to the interface unit 22 (step S201). Inputting inference stage data to the interface unit 22 means that the interface unit 22 acquires the inference stage data.
  • The control unit 21 executes the trained evaluation model on the inference stage data obtained in step S201 (step S202). Step S202 evaluates the action shown in the action video data included in that inference stage data. After step S202, the control unit 21 outputs the evaluation result to a predetermined output destination (step S203).
  • The predetermined output destination is, for example, the storage unit 23. Outputting to the storage unit 23 means writing to the storage unit 23.
  • the predetermined output destination may be, for example, a predetermined external device connected for communication via the interface unit 22.
  • The learning device 1 configured in this manner trains the evaluation model on movement measurement data as well as movement video data. Because the movement measurement data is obtained by a movement measurement sensor, it is less affected than video by environmental factors such as lighting intensity and color. An evaluation model trained on both kinds of data can therefore evaluate movements more accurately than one trained on movement video data alone, so the learning device 1 improves the accuracy of movement evaluation.
  • The evaluation device 2 configured in this manner evaluates movements using the trained evaluation model obtained by the learning device 1, and can therefore also improve the accuracy of movement evaluation.
  • The evaluation system 100 configured in this manner includes the learning device 1 and can therefore improve the accuracy of action evaluation.
  • In the first transformation process, the features obtained in the video data feature acquisition process and those obtained in the measurement data feature acquisition process are brought to the same dimension by a Linear layer and passed, together with the output of the embedding processing, into a Transformer block; an estimator then converts the output of the Transformer block into an evaluation.
  • the estimator is, for example, a plurality of fully connected layers.
  • The Linear layer only serves to align the dimensions of the features obtained in the video data feature acquisition process and the measurement data feature acquisition process, so it need not be executed if those dimensions are already the same.
  • The Linear layer may have any structure; for example, it may be a single fully connected layer.
  • Because the Transformer block processes its inputs in parallel, it cannot by itself take into account the chronological order of the inputs or whether each input comes from the video data feature acquisition process or the measurement data feature acquisition process. Here, "cannot take into account" means that such information cannot be used. Therefore, when a Transformer is used, information indicating the attributes of the input features must be supplied by the embedding process to the Transformer block, which is the layer that executes the Transformer.
  • the information indicating the attribute of a feature includes, for example, information as to whether the feature is the result of a video data feature acquisition process or a measurement data feature acquisition process. For example, if the feature is the result of a video data feature acquisition process, the information indicating the attribute may include information indicating which partial video data the feature is a feature of. For example, if the feature is the result of a measurement data feature acquisition process, the information indicating the attribute may include information indicating which partial measurement data the feature is a feature of.
  • The embedding process supplies information indicating the attributes of the features so that the Transformer processing can be based on those attributes.
  • The embedding process in the first transformation process includes, for example, sensor embedding and positional embedding.
  • Sensor embedding is a process that assigns an identifier to each feature input to the Transformer, identifying whether it is the result of a video data feature acquisition process or a measurement data feature acquisition process.
  • Any identifier may be used as long as it distinguishes results of the video data feature acquisition process from results of the measurement data feature acquisition process.
  • For example, the identifier may be "1" for results of the video data feature acquisition process and "0" for results of the measurement data feature acquisition process. It need not be an integer such as 1 or 0; it may be a non-integer value such as 0.1, or not a number at all.
  • Which identifier is assigned to results of the video data feature acquisition process and which to results of the measurement data feature acquisition process is determined in advance.
  • Positional embedding is a process that assigns an identifier indicating which partial video data the feature is from when the feature is the result of a video data feature acquisition process, and assigns an identifier indicating which partial measurement data the feature is from when the feature is the result of a measurement data feature acquisition process.
  • the assigned identifier is, for example, an identifier that indicates the temporal order of the partial video data or partial measurement data.
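The identifier assignment in sensor embedding and positional embedding can be sketched as follows. This is a minimal illustration, not the patent's implementation: it uses the example values given above (1 for video-derived features, 0 for measurement-derived features) and assumes that the positional identifier is the clip's temporal index within its own stream. The function name `assign_identifiers` and the token ordering (video clips first, then each measurement stream) are hypothetical.

```python
from typing import List, Tuple

VIDEO_ID = 1        # example identifier for video-derived features (value from the text)
MEASUREMENT_ID = 0  # example identifier for measurement-derived features (value from the text)


def assign_identifiers(num_video_clips: int,
                       measurement_clip_counts: List[int]) -> List[Tuple[int, int]]:
    """Return one (sensor_id, position_id) pair per feature token.

    Hypothetical layout: all video clips first, then each measurement stream in turn.
    position_id is the temporal order of the clip within its own stream.
    """
    ids = [(VIDEO_ID, t) for t in range(num_video_clips)]
    for count in measurement_clip_counts:
        ids.extend((MEASUREMENT_ID, t) for t in range(count))
    return ids


# One video stream split into 3 clips, two measurement streams split into 2 clips each.
tokens = assign_identifiers(3, [2, 2])
print(tokens)  # [(1, 0), (1, 1), (1, 2), (0, 0), (0, 1), (0, 0), (0, 1)]
```

With two measurement streams, as in FIG. 6, both streams receive sensor identifier 0. If the streams also had to be told apart, giving each stream its own identifier would be a natural extension, but the text only requires the video/measurement distinction.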
  • FIG. 6 is a diagram showing an example of the first transformation process in the modified example.
  • In the example of FIG. 6, one piece of motion video data and two pieces of motion measurement data are used.
  • The motion video data is divided into partial video data, and each piece of motion measurement data is divided into partial measurement data.
  • Both the partial video data and the partial measurement data are labeled "Clip".
  • Position Embedding and Sensor Embedding are performed.
  • The output of the Linear layer is input to the Transformer block together with the results of Position Embedding and Sensor Embedding.
  • Evaluation is performed based on the output of the Transformer block.
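A toy forward pass of the first transformation process, using the FIG. 6 configuration (one video stream and two measurement streams), might look like the NumPy sketch below. Everything concrete here is assumed rather than taken from the text: the dimensions (`D_MODEL`, 512-dimensional video features, 64-dimensional measurement features), learned lookup tables for the sensor and position embeddings that are added to the projected features, a single-head self-attention layer with a residual connection standing in for the Transformer block (no LayerNorm or feed-forward sublayer), mean pooling before the estimator, and random untrained weights.

```python
import numpy as np

D_MODEL = 16    # common feature dimension after the Linear layer (assumed value)
MAX_CLIPS = 32  # number of rows in the position-embedding table (assumed value)


def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def init_params(video_dim: int, meas_dim: int, rng: np.random.Generator) -> dict:
    def w(*shape: int) -> np.ndarray:
        return rng.normal(scale=0.1, size=shape)  # random, untrained weights
    return {
        "proj_video": w(video_dim, D_MODEL),  # Linear layer for video features
        "proj_meas": w(meas_dim, D_MODEL),    # Linear layer for measurement features
        "sensor_emb": w(2, D_MODEL),          # row 1 = video, row 0 = measurement
        "pos_emb": w(MAX_CLIPS, D_MODEL),     # one row per clip index
        "wq": w(D_MODEL, D_MODEL), "wk": w(D_MODEL, D_MODEL),
        "wv": w(D_MODEL, D_MODEL), "wo": w(D_MODEL, D_MODEL),
        "fc1": w(D_MODEL, D_MODEL), "fc2": w(D_MODEL, 1),  # estimator: two FC layers
    }


def evaluate(video_feats: np.ndarray, meas_feats_list: list, prm: dict) -> float:
    # Linear layer: bring every modality to D_MODEL dimensions.
    parts = [video_feats @ prm["proj_video"]]
    parts += [m @ prm["proj_meas"] for m in meas_feats_list]
    # Sensor IDs (1 = video, 0 = measurement) and position IDs (clip order per stream).
    sensor_ids = [1] * len(video_feats)
    pos_ids = list(range(len(video_feats)))
    for m in meas_feats_list:
        sensor_ids += [0] * len(m)
        pos_ids += list(range(len(m)))
    x = np.concatenate(parts) + prm["sensor_emb"][sensor_ids] + prm["pos_emb"][pos_ids]
    # Transformer block: single-head self-attention with a residual connection.
    q, k, v = x @ prm["wq"], x @ prm["wk"], x @ prm["wv"]
    x = x + softmax(q @ k.T / np.sqrt(D_MODEL)) @ v @ prm["wo"]
    # Estimator: mean-pool over tokens, then two fully connected layers -> one score.
    h = np.maximum(x.mean(axis=0) @ prm["fc1"], 0.0)  # ReLU
    return float((h @ prm["fc2"])[0])


rng = np.random.default_rng(0)
prm = init_params(video_dim=512, meas_dim=64, rng=rng)
video = rng.normal(size=(4, 512))                            # 4 video clips
meas = [rng.normal(size=(4, 64)), rng.normal(size=(4, 64))]  # 2 measurement streams
score = evaluate(video, meas, prm)
```

Self-attention followed by mean pooling is insensitive to the order of its input tokens, so in this sketch the order of clips within a stream reaches the score only through the added position embeddings, which is the reason the text gives for the embedding process.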
  • The action video data and the action measurement data may contain movements that are not suitable for evaluating the action.
  • For example, such data may cover the period before the competition begins or after it ends, such as when athletes are being changed over.
  • The movements in such periods are not subject to evaluation and may act as noise when the action is evaluated.
  • The learning device 1 may therefore reduce this effect by sampling the action video data and the action measurement data non-uniformly in time.
  • The evaluation model may also perform evaluation based on an attention identifier, which marks data in the action video data and the action measurement data whose influence on the evaluation of the action is stronger than that of other data. In such a case, the evaluation model can evaluate with higher accuracy.
  • The attention identifier is assigned to samples in a period that satisfies a predetermined condition and that includes the time at which a predetermined index indicating the intensity of the movement is highest in the graph of the movement measurement data.
  • the evaluation model evaluates samples that have been assigned an attention identifier by weighting them more heavily than samples that have not been assigned an attention identifier.
  • the predetermined condition may be any condition that specifies a period of time that includes the time when a predetermined index indicating the degree of intensity of the movement is highest in the graph shown by the movement measurement data.
  • For example, the predetermined condition may specify a period extending a predetermined time span before and after the time at which the predetermined index indicating the intensity of the movement is highest in the graph of the movement measurement data.
  • The attention identifier is assigned to samples from the period around the peak of this intensity index because the more intense the movement, the more strongly it should influence the evaluation of the action. Since the evaluation model weights samples assigned an attention identifier more heavily than other samples, it can evaluate with greater accuracy.
  • Assigning an attention identifier to data means determining that the data satisfies the condition indicated by the attention identifier and recording the result of that determination in a predetermined storage device from which the evaluation model can read information, such as the storage unit 13 or the storage unit 23.
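One simple way to realize "weighting samples assigned an attention identifier more heavily" is a weighted average of per-sample features, sketched below. The fixed weight `attn_weight` and the function name `weighted_pool` are hypothetical; the text does not specify how the weighting is done, and a learned weighting would be an equally valid reading.

```python
import numpy as np


def weighted_pool(features: np.ndarray, attention_flags: np.ndarray,
                  attn_weight: float = 3.0) -> np.ndarray:
    """Weighted mean of per-sample features (rows of `features`).

    Samples flagged with an attention identifier get weight `attn_weight`
    (a hypothetical hyperparameter); all other samples get weight 1.0.
    """
    weights = np.where(attention_flags, attn_weight, 1.0)
    return (weights[:, None] * features).sum(axis=0) / weights.sum()


feats = np.array([[0.0], [0.0], [10.0], [0.0]])
flags = np.array([False, False, True, False])
print(weighted_pool(feats, flags))  # [5.]  (a plain mean would give [2.5])
```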
  • The process of sampling the action video data and the action measurement data non-uniformly in time may include dividing them into a plurality of Clips. In that case, the data may be divided so that the number of Clips in the period that satisfies the predetermined condition, and that includes the time at which the predetermined index indicating the intensity of the action is highest in the graph of the action measurement data, is greater than in other periods.
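The non-uniform division into Clips can be sketched as below, assuming (this is not stated in the text) that Clips inside the attention period are simply shorter than Clips elsewhere, so that more of them fall in that period. Working in sample indices, the function name `split_into_clips`, and the Clip lengths are illustrative choices.

```python
def split_into_clips(n: int, attn_start: int, attn_end: int,
                     inner_len: int, outer_len: int) -> list[tuple[int, int]]:
    """Split sample indices [0, n) into contiguous (start, end) clips.

    Clips inside the attention period [attn_start, attn_end) are inner_len samples
    long; clips outside it are outer_len samples long (the last clip of a region may
    be shorter). With inner_len < outer_len, the attention period gets more clips.
    """
    if not 0 <= attn_start <= attn_end <= n:
        raise ValueError("attention period must lie within [0, n)")
    clips = []
    regions = ((0, attn_start, outer_len),          # before the attention period
               (attn_start, attn_end, inner_len),   # the attention period itself
               (attn_end, n, outer_len))            # after the attention period
    for region_start, region_end, length in regions:
        for s in range(region_start, region_end, length):
            clips.append((s, min(s + length, region_end)))
    return clips


clips = split_into_clips(n=90, attn_start=30, attn_end=60, inner_len=5, outer_len=15)
print(len(clips))  # 10
```

In this example the 30-sample attention period yields 6 Clips, while the 30-sample periods before and after it yield only 2 each.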
  • FIG. 7 is an explanatory diagram illustrating an example of the process of assigning an attention identifier in a modified example.
  • FIG. 7 shows three types of time series.
  • The three time series are IMU (inertial measurement unit) signals.
  • An attention identifier is assigned to motion video data and motion measurement data within a period that satisfies a predetermined attention period condition, which includes at least the condition that the maximum derivative time falls within the period.
  • The maximum derivative time is the time at which the derivative of the movement measurement data is largest in its graph. While the athlete is moving, the value of the movement measurement data should change more sharply than while the athlete is not moving.
  • The maximum derivative time is an example of the time at which a predetermined index indicating the intensity of the movement is highest in the graph of the movement measurement data.
  • In FIG. 7, the samples assigned an attention identifier are those in the period labeled "Technique".
  • the horizontal axis indicates time.
  • "Preparation Period" is the period when the athlete is preparing.
  • "Technique" is the period when the athlete is performing the action.
  • "Post-Execution" is the period after the athlete has finished performing the action and is no longer performing it.
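The maximum derivative time can be computed from sampled measurement data with finite differences, as in the sketch below. Because FIG. 7 shows three IMU series, the sketch has to combine channels somehow; taking the largest absolute derivative across channels at each time step is an assumption, and a vector norm would be another reasonable choice. The function name and the synthetic signals are made up for illustration.

```python
import numpy as np


def max_derivative_time(signals: np.ndarray, timestamps: np.ndarray) -> float:
    """Return the time at which the measurement signals change most steeply.

    signals: shape (num_channels, num_samples), e.g. three IMU series.
    Assumption: channels are combined by taking, at each time step, the largest
    absolute derivative over all channels.
    """
    deriv = np.gradient(signals, timestamps, axis=1)  # d(signal)/dt for each channel
    strength = np.abs(deriv).max(axis=0)              # combine channels per time step
    return float(timestamps[np.argmax(strength)])


t = np.linspace(0.0, 3.0, 301)  # 3 s sampled at 100 Hz
signals = np.stack([
    np.tanh((t - 1.5) * 20.0),    # sharp change around t = 1.5 s (the "Technique")
    0.1 * np.sin(2 * np.pi * t),  # slow drift
    np.zeros_like(t),             # idle axis
])
t_star = max_derivative_time(signals, t)
print(round(t_star, 2))  # 1.5
```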
  • the process of assigning an attention identifier to the action video data and action measurement data within a period that satisfies the attention period condition based on the action measurement data is referred to as the attention identifier assignment process.
  • the attention period condition may be any condition including a condition that the maximum derivative time is included within the period.
  • the attention period condition may be, for example, a condition that the period has a predetermined time width before and after the maximum derivative time.
  • the attention period condition may be, for example, a condition that the period has a predetermined time width before and after the maximum derivative time, and is longer than a period that does not satisfy the attention period condition.
  • For example, the attention period condition may be that the period extends a predetermined time width before and after the maximum derivative time and that its length is a predetermined length of at least 1/3 of the time from the start to the end of the video shown by the motion video data.
  • the image shown by the motion video data may show a preparatory period when the athlete is getting ready, a period when the athlete is performing, and a period after the athlete has finished performing.
  • If the preparatory period, the period when the athlete is performing, and the period after the athlete has finished performing all have the same duration, it is preferable to give the evaluation model a period centered on the maximum derivative time whose length is 1/3 of the video shown by the motion video data.
  • This is why, in the example above, the attention period is at least 1/3 as long as the video shown by the action video data.
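Given a maximum derivative time, an attention period that satisfies the "at least 1/3 of the video" example can be computed as below. Centering the period on the maximum derivative time follows the text; shifting the period when it would overrun the start or end of the video, and the `fraction` parameter, are assumptions of this sketch.

```python
def attention_period(t_star: float, duration: float,
                     fraction: float = 1 / 3) -> tuple[float, float]:
    """Return (start, end) of an attention period centered on t_star.

    The period is fraction * duration long (the text asks for at least 1/3 of the
    video). If it would run past the start or end of the video, it is shifted
    rather than shortened (an assumption of this sketch).
    """
    if not 0.0 <= t_star <= duration:
        raise ValueError("t_star must lie within the video")
    if not 1 / 3 <= fraction <= 1.0:
        raise ValueError("fraction must be between 1/3 and 1")
    length = fraction * duration
    start = min(max(t_star - length / 2, 0.0), duration - length)
    return start, start + length


print(attention_period(45.0, 90.0))  # centered: roughly (30.0, 60.0)
print(attention_period(5.0, 90.0))   # near the start: shifted to roughly (0.0, 30.0)
```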
  • When the attention identifier assignment process is executed, it may be executed at any time before the evaluation model determines the evaluation of the action.
  • For example, the attention identifier assignment process is executed before the video data feature acquisition process and the measurement data feature acquisition process.
  • In that case, the evaluation model acquires features in the video data feature acquisition process and the measurement data feature acquisition process while weighting data assigned an attention identifier more heavily than other data.
  • The attention identifier assignment process may also be executed, for example, after the video data division process and the measurement data division process. In that case, an attention identifier may be assigned to each piece of partial video data and partial measurement data, thereby assigning it to the samples they contain.
  • When executed after the division processes, the attention identifier assignment process may, for example, also be executed after the video data feature acquisition process and the measurement data feature acquisition process.
  • In that case, the evaluation model may determine the evaluation by giving a larger weight to the features assigned an attention identifier among the obtained features.
  • In the learning stage, the attention identifier assignment process may be executed by the control unit 11 or, when the video data division process and the measurement data division process are executed by an external device, by that external device.
  • In the latter case, the control unit 11 obtains the results of the attention identifier assignment process, together with the results of the video data division process and the measurement data division process, via the interface unit 12.
  • In the inference stage, the evaluation device 2 also uses the result of the attention identifier assignment process.
  • The control unit 21 obtains this result, for example, by executing the attention identifier assignment process itself.
  • Alternatively, the control unit 21 may have an external device execute the attention identifier assignment process and obtain the result from it.
  • the learning device 1 may be implemented using multiple information processing devices connected to each other so that they can communicate with each other via a network. In this case, each process executed by the control unit 11 may be executed in a distributed manner by the multiple information processing devices.
  • the evaluation device 2 may be implemented using multiple information processing devices connected to each other so that they can communicate with each other via a network. In this case, each process executed by the control unit 21 may be executed in a distributed manner by the multiple information processing devices.
  • the learning device 1 and the evaluation device 2 do not necessarily need to be implemented in different housings, and may be integrated into a single housing.
  • The learning device 1 and the evaluation device 2 may be implemented as a single computer that functions as both the learning device 1 and the evaluation device 2 by executing a program.
  • All or part of the functions of the learning device 1 and the evaluation device 2 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array).
  • The program may be recorded on a computer-readable recording medium. Examples of computer-readable recording media include portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and storage devices such as hard disks built into computer systems.
  • the program may be transmitted via a telecommunications line.
  • the control unit 21 is an example of a trained evaluation model execution unit.

PCT/JP2023/020016 2023-05-30 2023-05-30 Learning device, evaluation device, learning method, evaluation method, and program Ceased WO2024247080A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2025523719A JPWO2024247080A1 2023-05-30 2023-05-30
PCT/JP2023/020016 WO2024247080A1 (ja) 2023-05-30 2023-05-30 Learning device, evaluation device, learning method, evaluation method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2023/020016 WO2024247080A1 (ja) 2023-05-30 2023-05-30 Learning device, evaluation device, learning method, evaluation method, and program

Publications (1)

Publication Number Publication Date
WO2024247080A1 (ja) 2024-12-05

Family

ID=93656963

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/020016 Ceased WO2024247080A1 (ja) 2023-05-30 2023-05-30 Learning device, evaluation device, learning method, evaluation method, and program

Country Status (2)

Country Link
JP (1) JPWO2024247080A1
WO (1) WO2024247080A1

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170132785A1 (en) * 2015-11-09 2017-05-11 Xerox Corporation Method and system for evaluating the quality of a surgical procedure from in-vivo video
US20170220854A1 (en) * 2016-01-29 2017-08-03 Conduent Business Services, Llc Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action
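CN110711374A (zh) * 2019-10-15 2020-01-21 Shijiazhuang Tiedao University Multimodal dance movement evaluation method
WO2022049700A1 (ja) * 2020-09-03 2022-03-10 Nippon Telegraph and Telephone Corp Motion evaluation method, computer program, and motion evaluation system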

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZENG, LING-AN ET AL.: "Hybrid Dynamic-Static Context-Aware Attention Network for Action Assessment in Long Videos", MM '20 : PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, pages 2526 - 2534, XP059453862, Retrieved from the Internet <URL:https://dl.acm.org/doi/abs/10.1145/3394171.3413560> [retrieved on 20230803], DOI: 10.1145/3394171.3413560 *

Also Published As

Publication number Publication date
JPWO2024247080A1 2024-12-05

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23939559

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2025523719

Country of ref document: JP

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 2025523719

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE