WO2021192149A1 - Information processing method, information processing device, and program - Google Patents

Information processing method, information processing device, and program

Info

Publication number
WO2021192149A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
information
specific scene
scene
specific
Prior art date
Application number
PCT/JP2020/013705
Other languages
French (fr)
Japanese (ja)
Inventor
悠二 石村
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Priority to PCT/JP2020/013705
Publication of WO2021192149A1

Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63B: APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B69/00: Training appliances or apparatus for special sports
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion

Definitions

  • the present invention relates to an information processing method, an information processing device and a program.
  • Posture estimation technology extracts multiple key points from an image of a target person or object (if the target is a human, multiple feature points indicating the shoulders, elbows, wrists, hips, knees, ankles, and so on) and estimates the posture of the target based on the relative positions of those key points.
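  • As an illustration of the relative-position idea, the following is a minimal sketch of deriving a joint angle from three key points; the keypoint names and pixel coordinates are hypothetical and do not come from the patent:

```python
import math

# Hypothetical 2D keypoints (pixel coordinates) as produced by a pose estimator.
keypoints = {"hip": (320, 400), "knee": (330, 480), "ankle": (325, 560)}

def joint_angle(a, b, c):
    """Angle at point b (degrees) formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

# Knee flexion estimated purely from the relative positions of three key points.
print(joint_angle(keypoints["hip"], keypoints["knee"], keypoints["ankle"]))
```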
  • Posture estimation technology is expected to be applied in a wide range of fields such as learning support in sports, health care, autonomous driving and danger prediction.
  • Japanese Unexamined Patent Publication No. 2013-138742; Japanese Unexamined Patent Publication No. 2012-066026; Japanese Unexamined Patent Publication No. 2019-096328
  • this disclosure proposes an information processing method, an information processing device, and a program capable of efficiently grasping the operation of a target to be focused on and the analysis result thereof.
  • according to the present disclosure, there is provided an information processing method executed by a computer which comprises: analyzing, based on posture information of a target in a specific scene extracted from moving image data, the movement of the target in the specific scene; pausing the movement of the target in the specific scene; and displaying analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  • an information processing device that implements this information processing method and a program that realizes this information processing method on a computer are provided.
  • FIG. 1 is a diagram showing an example of a motion analysis service CS using cloud computing.
  • the motion of the target is analyzed based on the video data.
  • the application AP for analysis is created using the software development kit SDK.
  • the user U downloads the application AP uploaded by the developer DV to the store STR and installs it on the client terminal 100.
  • a program that supplies posture information to the application AP is installed in the client terminal 100.
  • the client terminal 100 is an information processing device that analyzes the operation of the target using moving image data of the target.
  • the client terminal 100 extracts one or more frame images indicating a specific scene to be analyzed from the moving image data.
  • the client terminal 100 transmits one or more extracted frame images to the server 200.
  • the server 200 extracts the posture information of the target for each frame image from the extracted one or more frame images.
  • the client terminal 100 acquires the posture information of the target extracted for each frame image from one or more frame images by the server 200.
  • the application AP analyzes the movement of the target in a specific scene by using the posture information of the target acquired from the server 200.
  • the image processed by the server 200 is only the frame image of the specific scene. Therefore, the cost incurred when using the server 200 is reduced.
  • the movements before and after the specific scene are often shot so that the specific scene is surely included in the moving image data.
  • the moving image data before and after the specific scene does not contribute to the motion analysis. By omitting the image processing of the data area that does not contribute to the motion analysis, the time and cost required for the motion analysis can be reduced.
  • the motion analysis service CS can be applied to a wide range of fields such as learning support in sports, health care, autonomous driving and danger prediction.
  • the scene to be analyzed is appropriately defined according to the field to which the motion analysis service CS is applied, the purpose of analysis, and the like.
  • a specific motion scene according to a coaching target is defined as a specific scene.
  • the scene of functional recovery training is defined as a specific scene.
  • a scene in which a pedestrian is detected is defined as a specific scene.
  • a scene for detecting an abnormal posture state is defined as a specific scene.
  • the following describes an example in which the motion analysis service CS is applied to the field of learning support in sports.
  • FIG. 2 is a schematic view of the information processing system 1 of the first embodiment.
  • the information processing system 1 has, for example, a client terminal 100 and a server 200.
  • the client terminal 100 is, for example, an information terminal such as a smartphone, a tablet terminal, a notebook personal computer, and a desktop personal computer.
  • the client terminal 100 and the server 200 are connected via a network NW.
  • the client terminal 100 has, for example, a processing device 110, a storage device 120, a communication device 130, a camera 140, and a display device 150.
  • the processing device 110 includes, for example, a moving image acquisition unit 111, a scene extraction unit 112, a motion analysis unit 113, and an output unit 114.
  • the video acquisition unit 111 acquires, for example, the video data of the target TG shot by the camera 140.
  • the moving image includes, for example, a specific scene to be analyzed and a scene before and after the specific scene.
  • the camera 140 includes, for example, an image sensor such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor.
  • the scene extraction unit 112 acquires the moving image data output from the moving image acquisition unit 111.
  • the scene extraction unit 112 extracts one or more frame images indicating a specific scene from the moving image data.
  • the number of frame images to be extracted is, for example, 1 or more and 10 or less.
  • the scene extraction unit 112 determines a specific scene based on, for example, the operation of the target TG. For example, the scene extraction unit 112 determines a specific scene by collating the operation characteristics of the target TG with the scene information stored in the storage device 120.
  • Information about a specific scene is stored in the storage device 120 as, for example, scene information 122.
  • in the scene information 122, for example, for each coaching target, one or more specific scenes to be analyzed and a determination condition for determining each specific scene are defined in association with each other.
  • in the example of soccer learning support, dribbling, shooting, and heading are defined as coaching targets.
  • when the coaching target is a soccer shot, for example, (i) the timing of stepping on the axis foot, (ii) the timing when the thigh of the kicking foot moves toward the ball, (iii) the timing of impact, and (iv) the timing a specified number of seconds after impact are defined as specific scenes.
  • the determination conditions for a specific scene are defined based on, for example, the angle of a specific joint, the relative position of the ball and a specific key point, and the like.
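  • As a rough sketch of how such scene information might be organized, the structure below pairs each specific scene with a determination condition; the keys, names, and threshold values are illustrative assumptions, not the patent's actual schema:

```python
# Illustrative sketch of scene information 122: an ordered list of specific
# scenes per coaching target, each with an assumed determination condition.
SCENE_INFO = {
    "soccer_shot": [
        {"scene": "(i) axis foot planted",
         "condition": {"foot_ball_distance_max": 1.5}},   # in ball radii
        {"scene": "(ii) kicking thigh toward ball",
         "condition": {"thigh_extension_through_ball": True}},
        {"scene": "(iii) impact",
         "condition": {"distance_reversal_ratio_min": 0.2}},
        {"scene": "(iv) after impact",
         "condition": {"delay_seconds": 1.0}},
    ],
}

def scenes_for(coaching_target: str):
    """Return the ordered list of specific scenes defined for a coaching target."""
    return SCENE_INFO.get(coaching_target, [])
```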
  • the scene extraction unit 112 extracts the posture information of the target TG using, for example, the first analysis model 123 obtained by machine learning.
  • the first analysis model 123 is, for example, an analysis model having a lower posture estimation accuracy than the analysis model (the second analysis model 221) used when the server 200 extracts posture information.
  • the scene extraction unit 112 determines the operation of the target TG based on, for example, a change in the posture of the target TG.
  • the video data includes information on a series of operations including a plurality of specific scenes that occur in time series.
  • the scene extraction unit 112 determines which specific scene is occurring, in the flow of the movement, from individual viewpoints while considering the context before and after. For example, in the shooting motion, the specific scene of (i) above is first determined, and then each specific scene is determined in the order of (ii), (iii), and (iv) from the moving image data after (i). Each specific scene is determined based on the body movement assumed for each specific scene.
  • to facilitate the determination, the scene extraction unit 112 determines a specific scene based on, for example, the movement of the target TG when the target TG and a specific object (such as a ball in the case of soccer) have a predetermined positional relationship, or based on a change in the positional relationship between the target TG and the specific object.
  • in this configuration, the specific scene is determined more accurately than when it is determined based only on the relative positional relationship between parts of the skeleton.
  • for example, the determination of the specific scene of (i) above is performed as follows. First, a singular region, in which the axis foot is assumed to move little when it is stepped on, is defined based on the relative positional relationship with the ball.
  • the singular region is defined as, for example, an image region having a radius A × r (r is the radius of the ball; A is a number larger than 1) from the center of the ball.
  • the scene extraction unit 112 extracts a frame image in which the distance between the axis foot and the ball is within the threshold value as a reference frame image.
  • the scene extraction unit 112 extracts N frame images up to the reference frame image from the frame image traced back by (N-1) frames from the reference frame image (N is an integer of 1 or more).
  • the scene extraction unit 112 extracts a skeleton region in which the skeleton of the ankle of the target TG fits in each of the N frame images.
  • the scene extraction unit 112 extracts a skeletal motion region in which all N skeletal regions are contained.
  • the scene extraction unit 112 determines that the axis foot has been stepped on when the size of the skeletal motion region is within the threshold value and the skeletal motion region is included in the singular region.
  • the scene extraction unit 112 extracts one or more frame images indicating the timing at which the axis foot is stepped on from the moving image data.
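  • The check described above might look like the following sketch, assuming per-frame ankle bounding boxes and a detected ball position are available; the box representation and threshold values are assumptions for illustration:

```python
import numpy as np

def detect_axis_foot_plant(ankle_boxes, ball_center, ball_radius,
                           n_frames=5, region_scale=2.0, size_thresh=0.5):
    """Check the last n_frames ankle bounding boxes (x0, y0, x1, y1) for the
    axis-foot plant: the union of the boxes must be small and must lie inside
    the singular region of radius region_scale * ball_radius around the ball."""
    boxes = np.asarray(ankle_boxes[-n_frames:], dtype=float)
    # Skeletal motion region: the union of the N per-frame skeleton regions.
    x0, y0 = boxes[:, 0].min(), boxes[:, 1].min()
    x1, y1 = boxes[:, 2].max(), boxes[:, 3].max()
    # The region must be small, i.e. the ankle has barely moved.
    if max(x1 - x0, y1 - y0) > size_thresh * ball_radius:
        return False
    # Every corner of the union must fall inside the singular region.
    corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    r_max = region_scale * ball_radius
    return all(np.hypot(cx - ball_center[0], cy - ball_center[1]) <= r_max
               for cx, cy in corners)
```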
  • the scene extraction unit 112 moves on to the work of extracting the frame image of the specific scene of the above (ii).
  • the scene extraction unit 112 determines, for example, the timing at which the extension line of the foot detected as the axis foot passes through the ball as the specific scene of (ii) above.
  • the determination of the specific scene of the above (ii) is performed on the moving image data after the specific scene of the above (i). Considering the context before and after in the flow of the operation, it is considered that the specific scene of (ii) above occurs immediately after the specific scene of (i) above.
  • the scene extraction unit 112 determines that the scene is the specific scene of the above (ii), and extracts one or more frame images indicating the specific scene from the moving image data.
  • after that, the scene extraction unit 112 moves on to extracting the frame image of the specific scene of (iii) above. For example, the scene extraction unit 112 determines, as the specific scene of (iii), the timing at which the distance between the center of the waist and the center of the ball shrinks and then begins to expand at a speed greater than the speed at which it was shrinking.
  • until just before impact, the distance between the center of the waist and the center of the ball tends to shrink; when the ball is impacted, the distance begins to widen at a speed much higher than the speed at which it was shrinking.
  • the scene extraction unit 112 calculates the distance between the center of the waist and the center of the ball in each frame image, and determines that the mode of change in the distance has reversed when the value obtained by dividing the inter-frame difference in distance by the diameter of the ball exceeds a threshold value. The scene extraction unit 112 determines that the scene immediately before the change in distance reverses is the specific scene of (iii) above.
  • the determination of the specific scene of (iii) above is performed on the moving image data after the specific scene of (ii) above. Considering the context before and after in the flow of the movement, the specific scene of (iii) is considered to occur immediately after the specific scene of (ii). Therefore, if the above-mentioned change in distance occurs within a predetermined time immediately after the specific scene of (ii), the scene is highly likely to be the specific scene of (iii). In that case, the scene extraction unit 112 determines that the scene is the specific scene of (iii) and extracts one or more frame images indicating the specific scene from the moving image data.
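  • A sketch of this distance-reversal test, assuming per-frame waist and ball centers are available; the threshold value is an illustrative assumption:

```python
import math

def detect_impact(waist_centers, ball_centers, ball_diameter, thresh=0.15):
    """Scan consecutive frames for the moment the waist-to-ball distance stops
    shrinking and starts expanding faster than a threshold; returns the index
    of the frame just before the reversal, or None."""
    dists = [math.dist(w, b) for w, b in zip(waist_centers, ball_centers)]
    for i in range(1, len(dists) - 1):
        shrinking = dists[i] - dists[i - 1] < 0
        # Normalize the inter-frame jump by the ball diameter, as in the text.
        jump = (dists[i + 1] - dists[i]) / ball_diameter
        if shrinking and jump > thresh:
            return i  # the scene immediately before the reversal
    return None
```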
  • the scene extraction unit 112 moves on to the work of extracting the frame image of the specific scene of the above (iv).
  • the frame image of the specific scene in (iv) above is used to analyze the posture after shooting.
  • the specific scene of (iv) is defined as a scene after a predetermined time has elapsed after the specific scene of (iii). The time it takes for a posture suitable for analysis to appear depends on the run-up to the ball and the speed of the motion. Therefore, how much time after the shot is determined as the specific scene of (iv) above differs for each target TG.
  • for example, the scene extraction unit 112 determines, as the specific scene of (iv) above, the timing at which a predetermined multiple of the frame time from the specific scene of (ii) above to the specific scene of (iii) above has elapsed after the impact.
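  • Under that reading, the timing of (iv) could be computed as in this small sketch; measuring the elapsed time as a multiple of the (ii)-to-(iii) frame interval, and the multiplier value, are both assumptions:

```python
def scene_iv_frame(t2, t3, multiplier=2.0):
    """Frame index for scene (iv): a predetermined multiple of the (ii)->(iii)
    frame interval after the impact frame t3. t2 and t3 are the frame indices
    of scenes (ii) and (iii); the multiplier is an illustrative assumption."""
    return int(round(t3 + multiplier * (t3 - t2)))
```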
  • Posture estimation accuracy varies depending on the scale of the neural network used in the analysis model.
  • with a large-scale neural network, many key points are extracted from the image data, and the various movements of the target TG are estimated with high accuracy. Even if information is missing due to occlusion or the like, the key points of the target TG are accurately extracted.
  • as methods of increasing the scale of the neural network, there are a method of increasing the number of feature maps (channels) and a method of deepening the layers. With either method, the processing amount of the convolution operations increases and the calculation speed decreases; there is a trade-off between posture estimation accuracy and calculation speed.
  • the scene extraction unit 112 extracts the posture information of the target TG from all the frame images constituting the moving image data by using, for example, the small-scale, low-accuracy, low-computation first analysis model 123. If only the movement scene of the target TG is to be determined, it suffices to grasp the rough movement of the target TG. Even if information is missing due to occlusion or the like, the characteristics of the movement can be grasped from rough changes in posture. Therefore, the movement scene of the target TG can be determined even using the low-accuracy, low-computation first analysis model 123. When the first analysis model 123 is used, the processing amount of the convolution operations per frame image is small, so rapid processing is possible even if the moving image data is large.
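  • The two-model division of labor can be sketched as follows; the model call signatures and the scene test are stand-ins, not the actual interfaces of the first and second analysis models:

```python
def analyze_video(frames, light_model, heavy_model, is_specific_scene):
    """Two-stage pipeline: a small, fast model screens every frame, and the
    large, accurate model runs only on frames judged to show a specific scene.
    light_model and heavy_model are hypothetical callables standing in for the
    first and second analysis models."""
    rough_poses = [light_model(f) for f in frames]          # cheap, all frames
    selected = [i for i, p in enumerate(rough_poses) if is_specific_scene(p)]
    precise_poses = {i: heavy_model(frames[i]) for i in selected}  # costly, few frames
    return precise_poses
```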
  • Data of one or more frame images indicating a specific scene is transmitted to the server 200 via the communication device 130.
  • the motion analysis unit 113 acquires the posture information of the target TG extracted for each frame image from the one or more frame images by the server 200.
  • the motion analysis unit 113 analyzes the motion of the target TG in the specific scene based on the posture information of the target TG acquired from the server 200 (posture information of the target TG in the specific scene extracted from the moving image data).
  • the motion analysis unit 113 outputs the analysis result to the output unit 114.
  • the output unit 114 notifies the user U of one or more analysis information MAI indicating the analysis result of the motion analysis unit 113, for example.
  • the analysis information MAI includes, for example, information on the evaluation result based on the comparison with the movement of the specific person RM as a model of the movement.
  • Information on the operation of the specific person RM is stored in the storage device 120 as role model information 121.
  • the output unit 114 displays, for example, a still image IM including the analysis information MAI on the display device 150.
  • the display device 150 is a display unit of the client terminal 100 that displays various information.
  • the display device 150 is, for example, an LCD (Liquid Crystal Display) or an OLED (Organic Light Emitting Diode).
  • the method of motion analysis by the motion analysis unit 113 and the notification mode of the analysis information MAI notified by the output unit 114 follow the algorithm of the application AP (see FIG. 1).
  • FIG. 3 is a diagram showing an example of analysis information MAI.
  • the output unit 114 notifies, for example, the first analysis information MAI1 and the second analysis information MAI2 as one or more analysis information MAIs.
  • the first analysis information MAI1 includes, for example, information indicating a comparison between the operation of the target TG and the operation of the specific person RM as a model in a specific scene.
  • the second analysis information MAI2 includes, for example, information indicating a guideline for bringing the movement of the target TG closer to the movement of the specific person RM.
  • the skeleton information of the specific person RM in the specific scene is included in, for example, the role model information 121. In the example of FIG. 3, a comment that "the kick foot is swung up high" and an evaluation point of "86 points" are shown. The evaluation points indicate the degree of achievement of the evaluation items set in the specific scene.
  • the first analysis information MAI1 includes, for example, the skeleton information SI of the target TG in a specific scene and one or more reference skeleton information RSI as a reference for comparison.
  • in the example of FIG. 3, the specific scene is the timing of stepping on the axis foot.
  • as the one or more reference skeleton information RSI, for example, first reference skeleton information RSI1, second reference skeleton information RSI2, and third reference skeleton information RSI3 are displayed.
  • the first reference skeleton information RSI1 is, for example, skeleton information of an operation that serves as a model.
  • the second reference skeleton information RSI2 is, for example, skeleton information of a specific level (for example, a level of 80 points when the model is 100 points) that is less than the model.
  • the first reference skeleton information RSI1 and the second reference skeleton information RSI2 are model skeleton information at the timing when the position of the waist coincides with that of the target TG.
  • the third reference skeleton information RSI3 is, for example, model skeleton information at the timing when the position of the axis foot coincides with that of the target TG.
  • the third reference skeleton information RSI3 is displayed continuously, in conjunction with the movement of the target TG, during the series of movements from the stepping on the axis foot to immediately after the impact.
  • the third reference skeleton information RSI3 is used to compare the series of movements from the stepping on the axis foot to immediately after the impact with that of the target TG. Therefore, unlike the first reference skeleton information RSI1 and the second reference skeleton information RSI2, it shows skeleton information of the whole body.
  • the time required for a series of operations differs between the specific person RM and the target TG. Therefore, effective timings for making comparisons (for example, impact timings or stepping timings) are defined, and the third reference skeleton information RSI3 is superimposed on the target TG so that the defined timings match.
  • in the example of FIG. 3, the timing of stepping is matched; which timing should be aligned is set appropriately according to the purpose of the lesson and the like.
  • the output unit 114 offsets the display position of the third reference skeleton information RSI3 so that the position of the ankle of the target TG and the position of the ankle of the specific person RM match at the defined timing. This makes it easier to see how the stepping positions of the target TG and the specific person RM differ.
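  • A minimal sketch of this offset alignment, assuming keypoints are stored as name-to-(x, y) mappings (an assumed representation):

```python
def align_reference_skeleton(ref_keypoints, target_ankle, ref_ankle):
    """Shift every reference key point by the offset that maps the model's
    ankle onto the target's ankle at the defined timing."""
    dx = target_ankle[0] - ref_ankle[0]
    dy = target_ankle[1] - ref_ankle[1]
    return {name: (x + dx, y + dy) for name, (x, y) in ref_keypoints.items()}
```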
  • skeleton information corresponding to the site of the target TG to be analyzed in a specific scene is selectively displayed as the skeleton information SI of the target TG and one or more reference skeleton information RSI.
  • the waist and leg skeleton information is selectively displayed as the skeleton information SI and one or more reference skeleton information RSI.
  • One or more reference skeleton information RSIs are generated, for example, by using skeleton information obtained by modifying skeleton information in a specific scene of a specific person RM based on the physical disparity between the target TG and the specific person RM.
  • the scale of the reference skeleton information RSI is set as follows, for example. First, one or more bones suitable for comparing the physiques of the specific person RM and the target TG are defined. In the example of FIG. 3, the spine and the leg bones are defined as the criteria for comparison.
  • the motion analysis unit 113 detects, for example, the lengths of the spine and the leg bones at the timing when the postures of the specific person RM and the target TG are aligned.
  • the motion analysis unit 113 calculates the ratio of the sums of the lengths of the spine and the leg bones as the ratio of the body sizes of the specific person RM and the target TG, and scales the skeleton of the specific person RM based on this ratio. This facilitates comparison with the specific person RM and makes it easier to understand how the target TG should move.
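  • A sketch of this rescaling, assuming the bone lengths have already been measured and an anchor point about which to scale has been chosen (both assumptions for illustration):

```python
def scale_reference_skeleton(ref_keypoints, ref_bone_lengths, tg_bone_lengths,
                             anchor):
    """Scale the model skeleton about an anchor point using the ratio of the
    summed spine and leg bone lengths, as described above. The bone-length
    dicts and the anchor choice are illustrative assumptions."""
    ratio = sum(tg_bone_lengths.values()) / sum(ref_bone_lengths.values())
    ax, ay = anchor
    return {name: (ax + ratio * (x - ax), ay + ratio * (y - ay))
            for name, (x, y) in ref_keypoints.items()}
```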
  • FIG. 4 is a diagram showing an example of the notification mode of the analysis information MAI.
  • the analysis information MAI is displayed superimposed on the frame image indicating the specific scene, for example, when the specific scene is reproduced.
  • the display device 150 pauses the reproduction of the moving image data in a specific scene. Then, the display device 150 displays a still image IM in which the analysis information MAI indicating the analysis result of the target TG in the specific scene is superimposed on the frame image of the specific scene.
  • the reproduction of the moving image data is paused for each specific scene, and the analysis information MAI in the specific scene is notified.
  • slow-motion playback may be used so that the posture of the target TG can be easily confirmed. Further, slow-motion playback may be applied only to the section of the video data from the first specific scene to the last specific scene, and the video data before and after that section may be played back at the normal playback speed.
  • FIG. 4 shows an example in which three specific scenes A1 to A3 are set.
  • the specific scene A1 is, for example, the timing of stepping on the shaft foot.
  • the specific scene A2 is, for example, the timing of impact.
  • the specific scene A3 is, for example, a timing immediately after the impact (a few seconds after the impact).
  • the display device 150 reproduces the moving image of the target TG based on the reproduction operation for the client terminal 100.
  • the display device 150 pauses the reproduction of the moving image data at the timing when the specific scene A1 is reproduced.
  • the display device 150 displays the still image IM (first still image IM1) in which the analysis information MAI of the operation of the target TG in the specific scene A1 is superimposed on the frame image of the specific scene A1.
  • the display device 150 starts playing the moving image after the specific scene A1 when a preset time has elapsed from the playback operation on the client terminal 100 or the start of displaying the first still image IM1.
  • the display device 150 pauses the reproduction of the moving image data at the timing when the specific scene A2 is reproduced. Then, the display device 150 displays the still image IM (second still image IM2) in which the analysis information MAI of the operation of the target TG in the specific scene A2 is superimposed on the frame image of the specific scene A2. After that, the display device 150 starts playing the moving image after the specific scene A2 when a preset time has elapsed from the playback operation on the client terminal 100 or the start of displaying the second still image IM2.
  • the display device 150 pauses the reproduction of the moving image data at the timing when the specific scene A3 is reproduced. Then, the display device 150 displays the still image IM (third still image IM3) in which the analysis information MAI of the operation of the target TG in the specific scene A3 is superimposed on the frame image of the specific scene A3. After that, the display device 150 starts playing the moving image after the specific scene A3 when a preset time has elapsed from the playback operation on the client terminal 100 or the start of displaying the third still image IM3. As a result, the display device 150 can pause the movement of the target TG in the specific scene and display the analysis information indicating the analysis result of the target TG in the specific scene together with the still image of the target TG in the specific scene.
  • the display device 150 plays the remaining moving images to the end.
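  • The pause-and-resume playback behavior can be sketched as follows; the rendering callback and the fixed pause duration are assumptions, not the display device's actual interface:

```python
import time

def play_with_scene_pauses(frames, scene_overlays, show, pause_seconds=3.0):
    """Play frames in order; whenever a frame index has analysis overlay data,
    render the overlaid still image and hold it for a preset time before
    resuming. `show` is a hypothetical rendering callback."""
    for i, frame in enumerate(frames):
        if i in scene_overlays:
            show(frame, overlay=scene_overlays[i])  # still image + analysis MAI
            time.sleep(pause_seconds)               # pause, then resume playback
        else:
            show(frame)
```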
  • the client terminal 100 may generate new moving image data (modified moving image data) incorporating the analysis information MAI, and the generated modified moving image data may be reproduced on the display device 150.
  • analysis information MAI is written in one or more frame images indicating a specific scene of the modified moving image data.
  • in the modified moving image data, the playback timing is adjusted so that the movement of the target TG stops in the specific scene, the still image of the target TG including the analysis information MAI is displayed for a predetermined time, and then the movement of the target TG after the specific scene resumes.
  • FIG. 5 is a diagram showing another example of the notification mode of the analysis information MAI.
  • FIG. 5 shows an example in which the motion analysis service CS is applied to golf learning support.
  • FIG. 5 shows an example in which six specific scenes are set.
  • the backswing timing, the downswing timing, the timing immediately before the impact, the impact timing, the timing immediately after the impact, and the follow-through timing are set as specific scenes.
  • the moving image data is paused at the timing when each specific scene is reproduced, and the analysis information MAI is superimposed and displayed.
  • the analysis information MAI of the past specific scene is not erased and continues to be displayed on the display device 150 as it is.
  • model skeletal information is not displayed.
  • in the example of FIG. 5, even after the notification of the analysis information MAI of a specific scene is completed, the analysis information MAI is not erased and continues to be displayed on the screen.
  • the display mode of the analysis information MAI is not limited to this.
  • for example, the analysis information MAI may be erased once before the next specific scene is displayed, and, when the last specific scene is played back or after playback ends, the analysis information MAI of all the specific scenes may be redisplayed together.
  • the communication device 130 is a communication unit of the client terminal 100 that transmits and receives various data to and from an external device. For example, the communication device 130 transmits the one or more frame images extracted by the scene extraction unit 112 to the server 200. The communication device 130 also acquires from the server 200 the posture information of the target TG extracted by the server 200 for each frame image from the one or more frame images.
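  • A sketch of this exchange, assuming a hypothetical HTTP endpoint that accepts one JPEG frame per request and returns per-frame keypoints as JSON; the endpoint, payload, and response shape are all illustrative:

```python
import json
import urllib.request

def fetch_pose_info(frame_jpegs, server_url="https://example.invalid/pose"):
    """Send the extracted specific-scene frames to the server and receive the
    per-frame posture information. The endpoint URL, payload format, and
    response format are assumptions made for illustration."""
    results = []
    for jpeg_bytes in frame_jpegs:
        req = urllib.request.Request(server_url, data=jpeg_bytes,
                                     headers={"Content-Type": "image/jpeg"})
        with urllib.request.urlopen(req) as resp:
            results.append(json.loads(resp.read()))  # keypoints for one frame
    return results
```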
  • the storage device 120 stores, for example, the program 124 executed by the processing device 110, the application AP, the first analysis model 123, the role model information 121, and the scene information 122.
  • the program 124 and the application AP are programs that cause a computer to execute information processing according to the present embodiment.
  • the processing device 110 performs various processes according to the program 124 and the application AP stored in the storage device 120.
  • the storage device 120 may be used as a work area for temporarily storing the processing result of the processing device 110.
  • the storage device 120 includes any non-transient storage medium such as, for example, a semiconductor storage medium and a magnetic storage medium.
  • the storage device 120 includes, for example, an optical disk, a magneto-optical disk, or a flash memory.
  • Program 124 is stored, for example, in a non-transient storage medium that can be read by a computer.
  • the processing device 110 is, for example, a computer composed of a processor and a memory.
  • the memory of the processing device 110 includes a RAM (Random Access Memory) and a ROM (Read Only Memory).
  • the processing device 110 functions as the moving image acquisition unit 111 and the scene extraction unit 112 by executing the program 124.
  • the processing device 110 functions as the motion analysis unit 113 and the output unit 114 by executing the application AP.
  • the server 200 has, for example, a processing device 210, a storage device 220, and a communication device 230.
  • the processing device 210 has a posture information extraction unit 211.
  • the posture information extraction unit 211 acquires one or more frame images indicating a specific scene transmitted from the client terminal 100 via the communication device 230.
  • the posture information extraction unit 211 uses the second analysis model 221 obtained by machine learning to extract the posture information of the target TG for each frame image from one or more frame images showing a specific scene.
  • the second analysis model 221 is an analysis model having higher posture estimation accuracy than the analysis model (first analysis model 123) used when the scene extraction unit 112 determines a specific scene.
  • the posture information extraction unit 211 extracts the posture information of the target TG from the specific one or more frame images by using, for example, the large-scale, high-accuracy, high-computation second analysis model 221.
  • the target of the posture estimation process by the posture information extraction unit 211 is only a specific one or more frame images selected from a plurality of frame images constituting the moving image data. Therefore, even if the processing amount of the convolution operation for each frame image is large, rapid processing is possible.
  • the storage device 220 stores, for example, the program 222 executed by the processing device 210 and the second analysis model 221.
  • the program 222 is a program that causes a computer to execute information processing according to the present embodiment.
  • the processing device 210 performs various processes according to the program 222 stored in the storage device 220.
  • the storage device 220 may be used as a work area for temporarily storing the processing result of the processing device 210.
  • the storage device 220 includes any non-transient storage medium such as, for example, a semiconductor storage medium and a magnetic storage medium.
  • the storage device 220 includes, for example, an optical disk, a magneto-optical disk, or a flash memory.
  • Program 222 is stored, for example, in a non-transient storage medium that can be read by a computer.
  • the processing device 210 is, for example, a computer composed of a processor and a memory.
  • the memory of the processing device 210 includes a RAM and a ROM.
  • the processing device 210 functions as the posture information extraction unit 211 by executing the program 222.
  • the communication device 130 of the client terminal 100 and the communication device 230 of the server 200 are connected to a network NW such as the Internet.
  • the client terminal 100 and the server 200 transmit and receive data via the network NW.
  • a known method is adopted as the communication method of the communication device 130 and the communication device 230.
  • in step S1 of FIG. 8, the client terminal 100 shoots a moving image of the target TG.
  • the moving image data MD is composed of a plurality of frame images arranged in chronological order.
  • the moving image includes a specific scene to be analyzed and scenes before and after the specific scene.
  • in step S2 of FIG. 8, the client terminal 100 extracts one or more frame images FI (specific frame images SFI) indicating a specific scene from the moving image data MD.
  • the determination of the specific scene is performed based on, for example, the operation of the target TG.
  • the movement of the target TG is estimated based on, for example, the posture information LPI of the target TG (information indicating the low-precision posture estimation result by the first analysis model 123) extracted from all the frame images FI of the moving image data MD using the low-accuracy, low-computation first analysis model 123.
  • in step S3 of FIG. 8, the server 200 extracts the posture information HPI of the target TG for each frame image FI from the extracted one or more frame images FI (specific frame images SFI).
  • the posture information HPI of the target TG is extracted only from the one or more specific frame images SFI using, for example, the high-accuracy, high-computation second analysis model 221.
  • in step S4 of FIG. 8, the client terminal 100 performs motion analysis of the target TG based on the extracted posture information HPI (information indicating the high-precision posture estimation result by the second analysis model 221).
  • in step S5 of FIG. 8, the client terminal 100 notifies the user U of the analysis information MAI indicating the analysis result.
  • the analysis information MAI is notified, for example, by a combination of text, diagrams, and sound.
  • the client terminal 100 has a motion analysis unit 113 and a display device 150.
  • the motion analysis unit 113 analyzes the motion of the target TG in the specific scene based on the posture information HPI of the target TG in the specific scene extracted from the moving image data MD.
  • the display device 150 pauses the movement of the target TG in the specific scene, and displays the analysis information MAI indicating the analysis result of the target TG in the specific scene together with the still image IM of the target TG in the specific scene.
  • the program 124 of the present embodiment causes a computer to execute the information processing of the client terminal 100 described above; that is, it makes the computer realize the above-mentioned information processing method.
  • the analysis result is provided in a form linked to the playback scene of the moving image. Therefore, the operation of the target TG to be noted and the analysis result thereof can be efficiently grasped.
  • the analysis information MAI includes, for example, information indicating a comparison between the operation of the target TG in the specific scene and the operation of the specific person RM as a model.
  • the analysis information MAI includes, for example, the skeleton information SI of the target TG in a specific scene and one or more reference skeleton information RSI as a reference for comparison.
  • skeleton information corresponding to the site of the target TG to be analyzed in a specific scene is selectively displayed as the skeleton information SI of the target TG and one or more reference skeleton information RSI.
  • One or more reference skeleton information RSIs are generated, for example, by using skeleton information obtained by modifying skeleton information in a specific scene of a specific person RM based on the physical disparity between the target TG and the specific person RM.
  • the analysis information MAI includes, for example, information indicating a guideline for bringing the movement of the target TG closer to the movement of the specific person RM.
  • FIG. 9 is a schematic view of the information processing system 2 of the second embodiment.
  • the difference from the first embodiment is that the function of performing motion analysis is realized by the server 300.
  • the differences from the first embodiment will be mainly described.
  • the client terminal 400 has a communication device 430, a camera 140, and a display device 150.
  • the communication device 430 is connected to the network NW.
  • the client terminal 400 and the server 300 transmit and receive data via the network NW.
  • the processing device 310 of the server 300 has the moving image acquisition unit 111, the scene extraction unit 112, the motion analysis unit 113, and the output unit 114.
  • the role model information 121, the scene information 122, the first analysis model 123, the program 124, and the application AP are stored in the storage device 320 of the server 300.
  • by executing the program 124, the processing device 310 functions as the moving image acquisition unit 111, the scene extraction unit 112, the motion analysis unit 113, and the output unit 114.
  • the motion analysis unit 113 generates, for example, new moving image data (modified moving image data) incorporating the analysis information MAI.
  • the analysis information MAI is written in one or more frame images indicating a specific scene of the modified moving image data.
  • in the modified moving image data, the playback timing is adjusted so that the movement of the target TG stops in the specific scene, the still image of the target TG including the analysis information MAI is displayed for a predetermined time, and then the movement of the target TG after the specific scene resumes.
  • the motion analysis unit 113 transmits the modified moving image data to the client terminal 400 via the output unit 114 and the communication device 130.
  • the client terminal 400 causes the display device 150 to display the modified moving image data.
  • (1) An information processing method executed by a computer, comprising: analyzing the movement of a target in a specific scene based on posture information of the target in the specific scene extracted from moving image data; and suspending the movement of the target in the specific scene and displaying analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  • (2) The information processing method according to (1) above, wherein the analysis information includes information indicating a comparison between the movement of the target and the movement of a specific person serving as a model in the specific scene.
  • (3) The information processing method according to (2) above, wherein the analysis information includes the skeleton information of the target in the specific scene and one or more reference skeleton information serving as a reference for the comparison.
  • (4) The information processing method according to (3) above, wherein skeleton information corresponding to the target site to be analyzed in the specific scene is selectively displayed as the target skeleton information and the one or more reference skeleton information.
  • (5) The information processing method according to (3) or (4) above, wherein the one or more reference skeleton information is generated using skeleton information obtained by modifying the skeleton information of the specific person in the specific scene based on the physical disparity between the target and the specific person.
  • An information processing device comprising: a motion analysis unit that analyzes the movement of the target in the specific scene based on the posture information of the target in the specific scene extracted from the moving image data; and a display device that pauses the movement of the target in the specific scene and displays analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  • A program that causes a computer to analyze the movement of the target in the specific scene based on the posture information of the target in the specific scene extracted from the moving image data, suspend the movement of the target in the specific scene, and display analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Image Analysis (AREA)

Abstract

This information processing method causes a computer to execute: a process in which the movement of a target in a specific scene extracted from video data is analyzed on the basis of posture information about the target in the specific scene; and a process in which the movement of the target in the specific scene is paused, and analysis information indicating an analysis result for the target in the specific scene is displayed together with a still image of the target in the specific scene.

Description

Information processing method, information processing device, and program
The present invention relates to an information processing method, an information processing device, and a program.
In recent years, methods of motion analysis using posture estimation techniques have been proposed (see, for example, Patent Documents 1 to 3). Posture estimation technology extracts multiple key points from an image of a target person or object (if the target is a human, multiple feature points indicating the shoulders, elbows, wrists, hips, knees, ankles, and so on) and estimates the posture of the target based on the relative positions of those key points. Posture estimation technology is expected to be applied in a wide range of fields such as learning support in sports, health care, autonomous driving, and danger prediction.
Patent Document 1: Japanese Unexamined Patent Publication No. 2013-138742; Patent Document 2: Japanese Unexamined Patent Publication No. 2012-066026; Patent Document 3: Japanese Unexamined Patent Publication No. 2019-096328
When performing motion analysis using a video of a target, the movements before and after a specific scene that needs analysis are often also shot so that the specific scene is reliably included in the video data. In the above prior art, the analysis result is not provided in a manner linked to the playback scene of the video. Therefore, it is difficult to efficiently grasp the movement of the target to be focused on and the analysis result thereof.
Therefore, this disclosure proposes an information processing method, an information processing device, and a program capable of efficiently grasping the movement of a target to be focused on and the analysis result thereof.
According to the present disclosure, based on posture information of a target in a specific scene extracted from moving image data, the movement of the target in the specific scene is analyzed, the movement of the target is paused in the specific scene, and analysis information indicating the analysis result of the target in the specific scene is displayed together with a still image of the target in the specific scene; an information processing method executed by a computer comprising these steps is provided. Further, according to the present disclosure, an information processing device that implements this information processing method and a program that realizes this information processing method on a computer are provided.
FIG. 1 is a diagram showing an example of a motion analysis service using cloud computing. FIG. 2 is a schematic view of the information processing system of the first embodiment. FIGS. 3 to 5 are diagrams showing examples of the notification mode of analysis information. FIGS. 6 to 8 are diagrams showing examples of the information processing method. FIG. 9 is a schematic view of the information processing system of the second embodiment.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In each of the following embodiments, the same parts are designated by the same reference numerals, and duplicate description will be omitted.
The description will be given in the following order.
[1. First Embodiment]
 [1-1. Overview of the motion analysis service]
 [1-2. Configuration of the information processing system]
 [1-3. Information processing method]
 [1-4. Effects]
[2. Second Embodiment]
 [2-1. Configuration of the information processing system]
 [2-2. Effects]
[1. First Embodiment]
[1-1. Overview of the motion analysis service]
FIG. 1 is a diagram showing an example of a motion analysis service CS using cloud computing.
In the motion analysis service CS, the motion of the target is analyzed based on video data. The application AP for analysis is created using the software development kit SDK. The user U downloads the application AP uploaded by the developer DV to the store STR and installs it on the client terminal 100. A program that supplies posture information to the application AP is installed in the client terminal 100.
The client terminal 100 is an information processing device that analyzes the movement of the target using moving image data of the target. The client terminal 100 extracts, from the moving image data, one or more frame images indicating a specific scene to be analyzed. The client terminal 100 transmits the one or more extracted frame images to the server 200. The server 200 extracts the posture information of the target for each frame image from the extracted one or more frame images. The client terminal 100 acquires the posture information of the target extracted for each frame image from the one or more frame images by the server 200. The application AP analyzes the movement of the target in the specific scene using the posture information of the target acquired from the server 200.
In the motion analysis service CS, the image processed by the server 200 is only the frame image of the specific scene. Therefore, the cost incurred when using the server 200 is reduced. When shooting a video of the target, the movements before and after the specific scene are often shot so that the specific scene is reliably included in the moving image data. The moving image data before and after the specific scene does not contribute to the motion analysis. By omitting the image processing of the data areas that do not contribute to the motion analysis, the time and cost required for the motion analysis can be reduced.
The motion analysis service CS can be applied to a wide range of fields such as learning support in sports, health care, autonomous driving, and danger prediction. The scene to be analyzed is appropriately defined according to the field to which the motion analysis service CS is applied, the purpose of analysis, and the like.
For example, in the field of learning support in sports, a specific motion scene according to a coaching target (a soccer shot, a tennis serve, a golf swing, etc.) is defined as a specific scene. In the field of health care, a scene of functional recovery training is defined as a specific scene. In the field of autonomous driving, a scene in which a pedestrian is detected is defined as a specific scene. In the field of danger prediction, a scene in which an abnormal posture state (lying down, crouching for a long time, the movements of a drunk person, suspicious behavior, falling, etc.) is detected is defined as a specific scene.
The following describes an example in which the motion analysis service CS is applied to the field of learning support in sports.
[1-2. Configuration of the information processing system]
FIG. 2 is a schematic view of the information processing system 1 of the first embodiment.
The information processing system 1 has, for example, a client terminal 100 and a server 200. The client terminal 100 is, for example, an information terminal such as a smartphone, a tablet terminal, a notebook personal computer, or a desktop personal computer. The client terminal 100 and the server 200 are connected via a network NW.
The client terminal 100 has, for example, a processing device 110, a storage device 120, a communication device 130, a camera 140, and a display device 150.
The processing device 110 includes, for example, a moving image acquisition unit 111, a scene extraction unit 112, a motion analysis unit 113, and an output unit 114.
The video acquisition unit 111 acquires, for example, the video data of the target TG shot by the camera 140. The video includes, for example, a specific scene to be analyzed and the scenes before and after the specific scene. The camera 140 includes, for example, an image sensor such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor.
The scene extraction unit 112 acquires the moving image data output from the moving image acquisition unit 111. The scene extraction unit 112 extracts one or more frame images indicating a specific scene from the moving image data. The number of frame images to be extracted is, for example, 1 or more and 10 or less. The scene extraction unit 112 determines a specific scene based on, for example, the movement of the target TG. For example, the scene extraction unit 112 determines a specific scene by collating the movement characteristics of the target TG with the scene information stored in the storage device 120.
Information about specific scenes is stored in the storage device 120 as, for example, scene information 122. In the scene information 122, for example, for each coaching target, one or more specific scenes to be analyzed and a determination condition for determining each specific scene are defined in association with each other.
In the example of soccer learning support, for example, dribbling, shooting, and heading are defined as coaching targets. When the coaching target is a soccer shot, for example, (i) the timing of stepping on the axis foot, (ii) the timing when the thigh of the kicking foot moves toward the ball, (iii) the timing of impact, and (iv) the timing a specified number of seconds after impact are defined as specific scenes. The determination conditions for a specific scene are defined based on, for example, the angle of a specific joint, the relative position of the ball and a specific key point, and the like.
 The scene extraction unit 112 extracts posture information of the target TG using, for example, a first analysis model 123 obtained by machine learning. The first analysis model 123 is an analysis model whose posture estimation accuracy is lower than that of the analysis model used by the server 200 to extract posture information (a second analysis model 221). The scene extraction unit 112 determines the motion of the target TG based on, for example, changes in the posture of the target TG.
 The moving image data contains information on a series of motions including a plurality of specific scenes that occur in time series. The scene extraction unit 112 determines which specific scene is occurring from an individual viewpoint for each scene, taking into account the surrounding context in the flow of the motion. For example, in a shooting motion, the specific scene (i) above is determined first, and the specific scenes (ii), (iii), and (iv) are then determined in that order from the moving image data following (i). Each specific scene is determined based on the body movement expected for that scene.
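 The ordered search described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation; the detector interface (a callable returning the frame index of its scene, or None) is an assumption:

```python
from typing import Callable, List, Optional, Sequence

def detect_scenes_in_order(
    frames: Sequence[dict],
    detectors: Sequence[Callable[[Sequence[dict], int], Optional[int]]],
) -> List[Optional[int]]:
    """Locate each specific scene only in the footage after the previous one.

    Mirrors the ordered determination (i) -> (ii) -> (iii) -> (iv): each
    detector searches from the frame following the previously found scene.
    """
    found: List[Optional[int]] = []
    start = 0
    for detect in detectors:
        idx = detect(frames, start)
        found.append(idx)
        if idx is None:
            break              # later scenes cannot be located without this one
        start = idx + 1        # continue the search after the found scene
    return found
```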
 To make the determination easier, the scene extraction unit 112 determines the specific scene based on, for example, the motion of the target TG when the target TG and a specific object (such as the ball in the case of soccer) are in a predetermined positional relationship, or based on a change in the positional relationship between the target TG and the specific object. With this configuration, the specific scene is determined more accurately than when it is determined based only on the relative positional relationship between skeletal parts.
 For example, the specific scene (i) above is determined as follows. First, a singular region in which the plant foot is expected to stop moving when it is planted is defined based on the relative positional relationship with the ball. The singular region is defined as, for example, an image region of radius A × r centered on the ball (where r is the radius of the ball and A is a number greater than 1).
 For example, the scene extraction unit 112 extracts a frame image in which the distance between the plant foot and the ball is within a threshold value as a reference frame image. The scene extraction unit 112 then extracts the N frame images from the frame image (N−1) frames before the reference frame image up to the reference frame image (where N is an integer of 1 or more). For each of the N frame images, the scene extraction unit 112 extracts a skeleton region containing the ankle skeleton of the target TG, and then extracts a skeletal motion region containing all N skeleton regions. The scene extraction unit 112 determines that the plant foot has been planted when the size of the skeletal motion region is within a threshold value and the skeletal motion region is contained in the singular region, and extracts one or more frame images showing the timing at which the plant foot is planted from the moving image data.
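 As a non-limiting sketch of the plant-foot determination just described (Python; the bounding-box representation of the skeleton regions, the default value of A, and the thresholds are assumptions for illustration):

```python
import math

def in_singular_region(point, ball_center, ball_radius, a=2.0):
    # True if `point` lies within radius a * r of the ball center.
    # The disclosure only requires A > 1; a = 2.0 is an illustrative default.
    return math.dist(point, ball_center) <= a * ball_radius

def plant_foot_detected(ankle_boxes, ball_center, ball_radius,
                        size_threshold, a=2.0):
    """Check the plant-foot condition over the last N frames.

    `ankle_boxes` holds one (xmin, ymin, xmax, ymax) ankle skeleton region
    per frame. Their union is the skeletal motion region; the foot counts
    as planted when that union is small (the ankle has stopped moving) and
    lies inside the singular region around the ball.
    """
    xmin = min(b[0] for b in ankle_boxes)
    ymin = min(b[1] for b in ankle_boxes)
    xmax = max(b[2] for b in ankle_boxes)
    ymax = max(b[3] for b in ankle_boxes)
    if max(xmax - xmin, ymax - ymin) > size_threshold:
        return False  # the ankle is still moving too much
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return all(in_singular_region(c, ball_center, ball_radius, a)
               for c in corners)
```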
 After the frame images of the specific scene (i) are extracted, the scene extraction unit 112 moves on to extracting the frame images of the specific scene (ii). The scene extraction unit 112 determines, for example, the timing at which the extension line of the foot detected as the plant foot passes through the ball as the specific scene (ii). This determination is performed on the moving image data after the specific scene (i). Considering the surrounding context in the flow of the motion, the specific scene (ii) is expected to occur immediately after the specific scene (i). Therefore, if there is a scene within a predetermined time immediately after the specific scene (i) in which the extension line of the foot detected as the plant foot passes through the ball, that scene is highly likely to be the specific scene (ii). The scene extraction unit 112 therefore determines that scene to be the specific scene (ii) and extracts one or more frame images showing it from the moving image data.
 After the frame images of the specific scene (ii) are extracted, the scene extraction unit 112 moves on to extracting the frame images of the specific scene (iii). The scene extraction unit 112 determines, for example, the timing at which the distance between the center of the waist and the center of the ball, after shrinking, begins to widen at a speed greater than the speed at which it was shrinking as the specific scene (iii). Until just before impact, the distance between the center of the waist and the center of the ball tends to shrink, but at impact the distance begins to widen at a speed far greater than the speed at which it was shrinking. Exploiting this, the scene extraction unit 112 calculates the distance between the center of the hipbone and the center of the ball in each frame image, and determines that the trend of the distance has reversed when the difference in distance between frames divided by the diameter of the ball exceeds a threshold value. The scene extraction unit 112 determines the scene immediately before this reversal to be the specific scene (iii).
 The determination of the specific scene (iii) is performed on the moving image data after the specific scene (ii). Considering the surrounding context in the flow of the motion, the specific scene (iii) is expected to occur immediately after the specific scene (ii). Therefore, if the change in distance described above occurs within a predetermined time immediately after the specific scene (ii), that scene is highly likely to be the specific scene (iii). The scene extraction unit 112 therefore determines it to be the specific scene (iii) and extracts one or more frame images showing it from the moving image data.
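 A compact sketch of this distance-reversal test (Python; the threshold value and the list-based interface are illustrative assumptions):

```python
def find_impact_frame(hip_ball_distances, ball_diameter, threshold, start=0):
    """Return the index of the frame just before the distance trend flips.

    `hip_ball_distances[t]` is the hip-center-to-ball-center distance in
    frame t. Impact is declared when the frame-to-frame increase, divided
    by the ball diameter, exceeds `threshold` after a shrinking phase.
    """
    shrinking = False
    for t in range(start + 1, len(hip_ball_distances)):
        delta = hip_ball_distances[t] - hip_ball_distances[t - 1]
        if delta < 0:
            shrinking = True                  # approach phase before impact
        elif shrinking and delta / ball_diameter > threshold:
            return t - 1                      # scene (iii): just before reversal
    return None
```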
 After the frame images of the specific scene (iii) are extracted, the scene extraction unit 112 moves on to extracting the frame images of the specific scene (iv), which are used to analyze the posture after the shot. The specific scene (iv) is defined as the scene a predetermined time after the specific scene (iii). The time it takes for a posture suitable for analysis to appear depends on the run-up to the ball and the speed of the motion, so how long after the shot the specific scene (iv) should be judged to occur differs for each target TG. To account for these individual differences, the scene extraction unit 112 determines the specific scene (iv) as, for example, the timing at which a frame time equal to a predetermined multiple of the number of frames from the specific scene (ii) to the specific scene (iii) has elapsed.
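 This motion-dependent delay can be expressed in a few lines; the multiplier below is an illustrative assumption:

```python
def follow_through_index(scene_ii_idx, scene_iii_idx, multiplier=1.0):
    # Scene (iv) is placed a delay after impact that scales with the frame
    # count from scene (ii) to scene (iii), so faster motions wait less.
    span = scene_iii_idx - scene_ii_idx
    return scene_iii_idx + int(round(multiplier * span))
```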
 Posture estimation accuracy varies with the scale of the neural network used in the analysis model. With a large-scale neural network, many key points are extracted from the image data and the various motions of the target TG are estimated with high accuracy; even when information is missing due to occlusion or the like, the key points of the target TG are extracted accurately. The scale of a neural network can be increased by adding feature maps (channels) or by deepening the layers. Either way, the processing load of the convolution operations increases and the computation speed drops, so posture estimation accuracy and computation speed are in a trade-off relationship.
 The scene extraction unit 112 extracts the posture information of the target TG from all the frame images constituting the moving image data using, for example, the low-accuracy, low-computation first analysis model 123, whose neural network is small. To determine the motion scene of the target TG, it is sufficient to grasp the rough motion of the target TG; even when information is missing due to occlusion or the like, the characteristics of the motion can be grasped from rough changes in posture. Therefore, the motion scene of the target TG can be determined even with the low-accuracy, low-computation first analysis model 123. With the first analysis model 123, the convolution processing per frame image is small, so rapid processing is possible even when the moving image data is large.
 The data of the one or more frame images showing the specific scene is transmitted to the server 200 via the communication device 130. The motion analysis unit 113 acquires the posture information of the target TG that the server 200 extracts, frame by frame, from those frame images. Based on the posture information of the target TG acquired from the server 200 (the posture information of the target TG in the specific scene extracted from the moving image data), the motion analysis unit 113 analyzes the motion of the target TG in the specific scene and outputs the analysis result to the output unit 114.
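 The division of labor between the two models can be sketched as follows. The model and detector interfaces are assumptions for illustration; the point is that the cheap model touches every frame while the expensive one touches only the selected frames:

```python
def analyze_clip(frames, light_model, heavy_model, scene_detector):
    """Two-tier pose pipeline sketch (interfaces are illustrative).

    `light_model`/`heavy_model` map a frame to posture information;
    `scene_detector` maps the rough poses to the indices of the specific
    scenes.
    """
    rough_poses = [light_model(f) for f in frames]    # every frame, low cost
    key_indices = scene_detector(rough_poses)         # specific scenes only
    precise_poses = {i: heavy_model(frames[i]) for i in key_indices}
    return precise_poses                              # input to motion analysis
```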
 The output unit 114 notifies the user U of, for example, one or more pieces of analysis information MAI indicating the analysis results of the motion analysis unit 113. The analysis information MAI includes, for example, information on an evaluation result based on a comparison with the motion of a specific person RM who serves as a model. Information on the motion of the specific person RM is stored in the storage device 120 as role model information 121.
 The notification is given by, for example, a combination of text, graphics, and sound. In the example of FIG. 2, the analysis information MAI is presented as text information and skeleton information showing the skeleton. The output unit 114 displays, for example, a still image IM including the analysis information MAI on the display device 150. The display device 150 is the display unit of the client terminal 100 that displays various kinds of information, and is, for example, an LCD (Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display.
 The motion analysis method of the motion analysis unit 113 and the manner in which the output unit 114 notifies the analysis information MAI follow the algorithm of the application AP (see FIG. 1).
 FIG. 3 is a diagram showing an example of the analysis information MAI.
 The output unit 114 notifies, for example, first analysis information MAI1 and second analysis information MAI2 as the one or more pieces of analysis information MAI. The first analysis information MAI1 includes, for example, information indicating a comparison between the motion of the target TG and the motion of the model specific person RM in the specific scene. The second analysis information MAI2 includes, for example, information indicating guidance for bringing the motion of the target TG closer to the motion of the specific person RM. The skeleton information of the specific person RM in the specific scene is included in, for example, the role model information 121. In the example of FIG. 3, the comment "The kicking foot is swung up high" and an evaluation score of "86 points" are shown. The evaluation score indicates the degree of achievement of the evaluation item set for the specific scene.
 The first analysis information MAI1 includes, for example, the skeleton information SI of the target TG in the specific scene and one or more pieces of reference skeleton information RSI serving as the basis for comparison. In the example of FIG. 3, the specific scene is the timing at which the plant foot is planted. As the one or more pieces of reference skeleton information RSI, for example, first reference skeleton information RSI1, second reference skeleton information RSI2, and third reference skeleton information RSI3 are displayed.
 The first reference skeleton information RSI1 is, for example, the skeleton information of the model motion. The second reference skeleton information RSI2 is, for example, the skeleton information of a motion at a specific level below the model (for example, the level of 80 points when the model is 100 points). The first reference skeleton information RSI1 and the second reference skeleton information RSI2 are the model's skeleton information at the timing when the position of the waist coincides with that of the target TG. The third reference skeleton information RSI3 is, for example, the model's skeleton information at the timing when the position of the plant foot coincides with that of the target TG.
 The third reference skeleton information RSI3 is displayed in conjunction with the movement of the target TG at all times during the series of motions from the planting of the plant foot to immediately after impact. The third reference skeleton information RSI3 is used to compare this series of motions with that of the target TG; therefore, unlike the first reference skeleton information RSI1 and the second reference skeleton information RSI2, it shows the skeleton information of the whole body.
 The time required for the series of motions differs between the specific person RM and the target TG. Therefore, a timing effective for comparison (for example, the timing of impact or the timing of the foot plant) is defined, and the third reference skeleton information RSI3 is superimposed on the target TG so that the defined timings coincide. In the example of FIG. 3, the plant timings are matched, but which timing to align is set appropriately according to the purpose of the lesson and the like.
 The output unit 114 displays the third reference skeleton information RSI3 with its position offset so that, for example, the position of the ankle of the target TG and the position of the ankle of the specific person RM coincide at the defined timing. This makes it easier to understand how far apart the plant positions of the target TG and the specific person RM are.
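 A sketch of this offset (Python; the keypoint dictionary format is an assumption):

```python
def offset_reference_skeleton(ref_points, ref_ankle, target_ankle):
    """Shift every reference keypoint so both ankles coincide on screen.

    `ref_points` maps joint names to (x, y) image coordinates; the ankle
    is the alignment joint, following the description above.
    """
    dx = target_ankle[0] - ref_ankle[0]
    dy = target_ankle[1] - ref_ankle[1]
    return {name: (x + dx, y + dy) for name, (x, y) in ref_points.items()}
```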
 In the still image IM, skeleton information corresponding to the parts of the target TG to be analyzed in the specific scene is selectively displayed as the skeleton information SI of the target TG and the one or more pieces of reference skeleton information RSI. In the example of FIG. 3, the skeleton information of the waist and legs is selectively displayed as the skeleton information SI and the one or more pieces of reference skeleton information RSI. The one or more pieces of reference skeleton information RSI are generated using, for example, skeleton information obtained by correcting the skeleton information of the specific person RM in the specific scene based on the difference in build between the target TG and the specific person RM.
 The scale of the reference skeleton information RSI is set, for example, as follows. First, one or more bones suitable for comparing the builds of the specific person RM and the target TG are defined; in the example of FIG. 3, the spine and the leg bones are defined as the basis for comparison. The motion analysis unit 113 detects, for example, the lengths of the spine and the leg bones of the specific person RM and of the target TG at a timing when their postures are aligned. The motion analysis unit 113 calculates the ratio of the sums of the spine and leg bone lengths as the ratio of the body sizes of the specific person RM and the target TG, and rescales the skeleton of the specific person RM based on this ratio. This makes comparison with the specific person RM easier and makes it easier to understand how the target TG should move.
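 A sketch of this rescaling (Python; the choice of hip center as the scaling origin and the data format are assumptions for illustration):

```python
def scale_ratio(target_bone_lengths, model_bone_lengths):
    # Body-size ratio from the comparison bones (e.g. spine and leg),
    # measured in frames where the two postures are aligned.
    return sum(target_bone_lengths) / sum(model_bone_lengths)

def rescale_about(points, center, ratio):
    # Rescale every model keypoint about `center` (e.g. the hip center)
    # so the model skeleton matches the target's build.
    cx, cy = center
    return {name: (cx + (x - cx) * ratio, cy + (y - cy) * ratio)
            for name, (x, y) in points.items()}
```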
 FIG. 4 is a diagram showing an example of how the analysis information MAI is notified.
 The analysis information MAI is displayed superimposed on a frame image showing the specific scene, for example, when the specific scene is played back. The display device 150 pauses playback of the moving image data at the specific scene and displays a still image IM in which the analysis information MAI indicating the analysis result of the target TG in the specific scene is superimposed on the frame image of that scene. When a plurality of specific scenes are set, playback of the moving image data is paused at each specific scene and the analysis information MAI for that scene is notified. Slow-motion playback may be used so that the posture of the target TG can be confirmed easily. Slow-motion playback may also be applied only to the section of the moving image data from the first specific scene to the last specific scene, with the moving image data before and after that section played back at normal speed.
 FIG. 4 shows an example in which three specific scenes A1 to A3 are set. The specific scene A1 is, for example, the timing at which the plant foot is planted. The specific scene A2 is, for example, the timing of impact. The specific scene A3 is, for example, the timing immediately after impact (a specified number of seconds after impact).
 First, the display device 150 plays the moving image of the target TG in response to a playback operation on the client terminal 100. The display device 150 pauses playback of the moving image data at the timing when the specific scene A1 is played, and displays a still image IM (a first still image IM1) in which the analysis information MAI on the motion of the target TG in the specific scene A1 is superimposed on the frame image of the specific scene A1. Thereafter, the display device 150 resumes playback of the moving image from the specific scene A1 onward, triggered by a playback operation on the client terminal 100 or by the elapse of a preset time from the start of display of the first still image IM1.
 The display device 150 pauses playback of the moving image data at the timing when the specific scene A2 is played, and displays a still image IM (a second still image IM2) in which the analysis information MAI on the motion of the target TG in the specific scene A2 is superimposed on the frame image of the specific scene A2. Thereafter, the display device 150 resumes playback of the moving image from the specific scene A2 onward, triggered by a playback operation on the client terminal 100 or by the elapse of a preset time from the start of display of the second still image IM2.
 The display device 150 pauses playback of the moving image data at the timing when the specific scene A3 is played, and displays a still image IM (a third still image IM3) in which the analysis information MAI on the motion of the target TG in the specific scene A3 is superimposed on the frame image of the specific scene A3. Thereafter, the display device 150 resumes playback of the moving image from the specific scene A3 onward, triggered by a playback operation on the client terminal 100 or by the elapse of a preset time from the start of display of the third still image IM3. In this way, the display device 150 can pause the movement of the target TG at a specific scene and display analysis information indicating the analysis result of the target TG in that scene together with a still image of the target TG in that scene.
 When the analysis information MAI of all the specific scenes has been notified, the display device 150 plays the rest of the moving image to the end.
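 The pause-and-resume playback described above can be sketched as a simple loop. The rendering callback, timing values, and data structures are assumptions for illustration:

```python
import time

def play_with_pauses(frames, scene_frames, overlays, show,
                     fps=30.0, pause_seconds=3.0):
    """Play a clip, pausing on each specific scene to show its overlay.

    `scene_frames` maps frame indices to scene ids, `overlays` maps scene
    ids to analysis information (MAI), and `show` renders one frame plus
    an optional overlay.
    """
    for i, frame in enumerate(frames):
        if i in scene_frames:
            show(frame, overlays[scene_frames[i]])  # still image plus MAI
            time.sleep(pause_seconds)               # preset pause, then resume
        else:
            show(frame, None)
            time.sleep(1.0 / fps)
```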
 Note that the example shown here pauses playback of the moving image data at the specific scenes and superimposes the analysis information MAI on the moving image screen. However, the method of notifying the analysis information MAI is not limited to this. For example, the client terminal 100 may generate new moving image data (modified moving image data) incorporating the analysis information MAI, and the generated modified moving image data may be played on the display device 150. For example, the analysis information MAI is written into the one or more frame images showing the specific scene in the modified moving image data. In the modified moving image data, the display is adjusted so that the movement of the target TG stops at the specific scene, a still image of the target TG including the analysis information MAI is displayed for a predetermined time, and the movement of the target TG from the specific scene onward then resumes.
 FIG. 5 is a diagram showing another example of how the analysis information MAI is notified. FIG. 5 shows an example in which the motion analysis service CS is applied to golf learning support.
 FIG. 5 shows an example in which six specific scenes are set: for example, the timing of the backswing, the timing of the downswing, the timing immediately before impact, the timing of impact, the timing immediately after impact, and the timing of the follow-through. As in the example described above, playback of the moving image data is paused at the timing when each specific scene is played, and the analysis information MAI is superimposed. In the example of FIG. 5, the analysis information MAI of past specific scenes is not erased and remains displayed on the display device 150, and no model skeleton information is displayed. When the last specific scene is played, or when the analysis information MAI of all the specific scenes has been notified and the rest of the moving image has been played to the end, the evaluation scores for the specific scenes are displayed together.
 In the example of FIG. 5, the analysis information MAI of a specific scene remains displayed on the screen even after its notification ends. However, the display mode of the analysis information MAI is not limited to this. The analysis information MAI may instead be erased once the notification for a specific scene ends and before the next specific scene is displayed, and the analysis information MAI of all the specific scenes may then be redisplayed together when the last specific scene is played, or when the analysis information MAI of all the specific scenes has been notified and the rest of the moving image has been played to the end.
 The communication device 130 is the communication unit of the client terminal 100 that transmits and receives various data to and from external devices. For example, the communication device 130 transmits the one or more frame images extracted by the scene extraction unit 112 to the server 200, and acquires from the server 200 the posture information of the target TG that the server 200 extracts, frame by frame, from those frame images.
 Returning to FIG. 2, the storage device 120 stores, for example, the program 124 executed by the processing device 110, the application AP, the first analysis model 123, the role model information 121, and the scene information 122. The program 124 and the application AP are programs that cause a computer to execute the information processing according to the present embodiment. The processing device 110 performs various kinds of processing in accordance with the program 124 and the application AP stored in the storage device 120. The storage device 120 may also be used as a work area for temporarily storing the processing results of the processing device 110. The storage device 120 includes any non-transitory storage medium, such as a semiconductor storage medium or a magnetic storage medium, and is configured to include, for example, an optical disk, a magneto-optical disk, or a flash memory. The program 124 is stored in, for example, a computer-readable non-transitory storage medium.
 The processing device 110 is, for example, a computer composed of a processor and a memory. The memory of the processing device 110 includes a RAM (Random Access Memory) and a ROM (Read Only Memory). By executing the program 124, the processing device 110 functions as the moving image acquisition unit 111 and the scene extraction unit 112; by executing the application AP, it functions as the motion analysis unit 113 and the output unit 114.
 The server 200 includes, for example, a processing device 210, a storage device 220, and a communication device 230.
 The processing device 210 has a posture information extraction unit 211. The posture information extraction unit 211 acquires, via the communication device 230, the one or more frame images showing the specific scene transmitted from the client terminal 100. Using a second analysis model 221 obtained by machine learning, the posture information extraction unit 211 extracts the posture information of the target TG, frame by frame, from those frame images.
 The second analysis model 221 is an analysis model whose posture estimation accuracy is higher than that of the analysis model used by the scene extraction unit 112 to determine the specific scene (the first analysis model 123). The posture information extraction unit 211 extracts the posture information of the target TG from the specific one or more frame images using, for example, the high-accuracy, high-computation second analysis model 221, whose neural network is large. Only the specific one or more frame images selected from the plurality of frame images constituting the moving image data are subject to the posture estimation processing of the posture information extraction unit 211, so rapid processing is possible even though the convolution processing per frame image is large.
 The storage device 220 stores, for example, the program 222 executed by the processing device 210 and the second analysis model 221. The program 222 is a program that causes a computer to execute the information processing according to the present embodiment. The processing device 210 performs various kinds of processing in accordance with the program 222 stored in the storage device 220. The storage device 220 may also be used as a work area for temporarily storing the processing results of the processing device 210. The storage device 220 includes any non-transitory storage medium, such as a semiconductor storage medium or a magnetic storage medium, and is configured to include, for example, an optical disk, a magneto-optical disk, or a flash memory. The program 222 is stored in, for example, a computer-readable non-transitory storage medium.
 The processing device 210 is, for example, a computer composed of a processor and a memory. The memory of the processing device 210 includes a RAM and a ROM. By executing the program 222, the processing device 210 functions as the posture information extraction unit 211.
 The communication device 130 of the client terminal 100 and the communication device 230 of the server 200 are connected to a network NW such as the Internet. The client terminal 100 and the server 200 transmit and receive data via the network NW. Known methods are adopted for the communication of the communication device 130 and the communication device 230.
[1-3. Information processing method]
 FIGS. 6 to 8 are diagrams showing an example of the information processing method of the present embodiment.
 In step S1 of FIG. 8, the client terminal 100 shoots a moving image of the target TG. As shown in FIG. 6, the moving image data MD is composed of a plurality of frame images arranged in chronological order. The moving image includes the specific scenes to be analyzed and the scenes before and after them.
 In step S2 of FIG. 8, the client terminal 100 extracts one or more frame images FI showing a specific scene (specific frame images SFI) from the moving image data MD. The specific scene is determined based on, for example, the motion of the target TG. As shown in FIG. 6, the motion of the target TG is estimated based on, for example, posture information LPI of the target TG (information indicating the low-accuracy posture estimation results of the first analysis model 123) extracted from all the frame images FI of the moving image data MD using the low-accuracy, low-computation first analysis model 123.
 In step S3 of FIG. 8, the server 200 extracts posture information HPI of the target TG, frame by frame, from the extracted one or more frame images FI (specific frame images SFI). As shown in FIG. 7, the posture information HPI of the target TG is extracted only from the one or more specific frame images SFI using, for example, the high-accuracy, high-computation second analysis model 221.
 In step S4 of FIG. 8, the client terminal 100 analyzes the motion of the target TG based on the extracted posture information HPI (information indicating the high-accuracy posture estimation results of the second analysis model 221).
 In step S5 of FIG. 8, the client terminal 100 notifies the user U of the analysis information MAI indicating the analysis results. As shown in FIG. 7, the analysis information MAI is notified by, for example, a combination of text, graphics, and sound.
[1-4. Effects]
 The client terminal 100 has the motion analysis unit 113 and the display device 150. The motion analysis unit 113 analyzes the motion of the target TG in a specific scene based on the posture information HPI of the target TG in the specific scene extracted from the moving image data MD. The display device 150 pauses the movement of the target TG at the specific scene and displays the analysis information MAI indicating the analysis result of the target TG in the specific scene together with a still image IM of the target TG in the specific scene. The information processing method of the present embodiment causes a computer to execute the information processing of the client terminal 100 described above. The program 124 of the present embodiment causes a computer to realize this information processing method.
 With this configuration, the analysis results are provided in a form linked to the playback scenes of the moving image. Therefore, the motions of the target TG that deserve attention and their analysis results can be grasped efficiently.
 The analysis information MAI includes, for example, information indicating a comparison between the motion of the target TG in the specific scene and the motion of the model specific person RM.
 With this configuration, how the target TG is moving can be easily grasped based on the comparison with the model.
 The analysis information MAI includes, for example, the skeleton information SI of the target TG in the specific scene and one or more pieces of reference skeleton information RSI serving as the basis for comparison.
 With this configuration, the difference between the target TG and the model is easy to grasp.
 In the still image IM, skeleton information corresponding to the parts of the target TG to be analyzed in the specific scene is selectively displayed as, for example, the skeleton information SI of the target TG and the one or more pieces of reference skeleton information RSI.
 With this configuration, the skeleton information that deserves attention is easily grasped.
 The one or more pieces of reference skeleton information RSI are generated using, for example, skeleton information obtained by correcting the skeleton information of the specific person RM in the specific scene based on the difference in build between the target TG and the specific person RM.
 With this configuration, the motion of the target TG and the model motion can be compared accurately even when the target TG and the specific person RM differ in build.
 The analysis information MAI includes, for example, information indicating guidance for bringing the motion of the target TG closer to the motion of the specific person RM.
 With this configuration, improvement of the motion of the target TG can be encouraged based on the guidance.
[2. Second embodiment]
[2-1. Configuration of the information processing system]
 FIG. 9 is a schematic view of the information processing system 2 of the second embodiment.
 The present embodiment differs from the first embodiment in that the motion analysis function is realized by a server 300. The following description focuses on the differences from the first embodiment.
 The client terminal 400 has a communication device 430, the camera 140, and the display device 150. The communication device 430 is connected to the network NW. The client terminal 400 and the server 300 transmit and receive data via the network NW.
 The moving image acquisition unit 111, the scene extraction unit 112, the motion analysis unit 113, and the output unit 114 are provided in a processing device 310 of the server 300. The role model information 121, the scene information 122, the first analysis model 123, the program 124, and the application AP are stored in a storage device 320 of the server 300. By executing the program 124, the processing device 310 functions as the moving image acquisition unit 111, the scene extraction unit 112, the motion analysis unit 113, and the output unit 114.
 The motion analysis unit 113 generates, for example, new moving image data (modified moving image data) incorporating the analysis information MAI. The analysis information MAI is written into the one or more frame images showing the specific scene in the modified moving image data. In the modified moving image data, the display is adjusted so that the movement of the target TG stops at the specific scene, a still image of the target TG including the analysis information MAI is displayed for a predetermined time, and the movement of the target TG from the specific scene onward then resumes.
 The motion analysis unit 113 transmits the modified moving image data to the client terminal 400 via the output unit 114 and the communication device 130. The client terminal 400 displays the modified moving image data on the display device 150.
[2-2. Effects]
 In the present embodiment, the main functions for motion analysis are moved from the client terminal 400 to the server 300. Therefore, the computational load on the client terminal 400 is reduced.
 Note that the effects described in this specification are merely examples and are not limiting; other effects may also be obtained.
 Note that the present technology can also have the following configurations.
(1)
 An information processing method executed by a computer, comprising:
 analyzing, based on posture information of a target in a specific scene extracted from moving image data, the motion of the target in the specific scene; and
 pausing the movement of the target at the specific scene and displaying analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
(2)
 The information processing method according to (1) above, wherein the analysis information includes information indicating a comparison between the motion of the target and the motion of a model specific person in the specific scene.
(3)
 The information processing method according to (2) above, wherein the analysis information includes skeleton information of the target in the specific scene and one or more pieces of reference skeleton information serving as the basis for the comparison.
(4)
 The information processing method according to (3) above, wherein in the still image, skeleton information corresponding to the parts of the target to be analyzed in the specific scene is selectively displayed as the skeleton information of the target and the one or more pieces of reference skeleton information.
(5)
 The information processing method according to (3) or (4) above, wherein the one or more pieces of reference skeleton information are generated using skeleton information obtained by correcting the skeleton information of the specific person in the specific scene based on the difference in build between the target and the specific person.
(6)
 The information processing method according to (5) above, wherein the analysis information includes information indicating guidance for bringing the motion of the target closer to the motion of the specific person.
(7)
 An information processing device comprising:
 a motion analysis unit that analyzes, based on posture information of a target in a specific scene extracted from moving image data, the motion of the target in the specific scene; and
 a display device that pauses the movement of the target at the specific scene and displays analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
(8)
 A program that causes a computer to:
 analyze, based on posture information of a target in a specific scene extracted from moving image data, the motion of the target in the specific scene; and
 pause the movement of the target at the specific scene and display analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
100 Client terminal (information processing device)
113 Motion analysis unit
150 Display device
FI Frame image
HPI Posture information
IM Still image
MAI Analysis information
MD Moving image data
RM Specific person
RSI Reference skeleton information
SI Skeleton information of the target
TG Target

Claims (8)

  1.  An information processing method executed by a computer, comprising:
      analyzing, based on posture information of a target in a specific scene extracted from moving image data, a motion of the target in the specific scene; and
      pausing a movement of the target at the specific scene and displaying analysis information indicating an analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  2.  The information processing method according to claim 1, wherein the analysis information includes information indicating a comparison between the motion of the target and a motion of a model specific person in the specific scene.
  3.  The information processing method according to claim 2, wherein the analysis information includes skeleton information of the target in the specific scene and one or more pieces of reference skeleton information serving as a basis for the comparison.
  4.  The information processing method according to claim 3, wherein, in the still image, skeleton information corresponding to a part of the target to be analyzed in the specific scene is selectively displayed as the skeleton information of the target and the one or more pieces of reference skeleton information.
  5.  The information processing method according to claim 3, wherein the one or more pieces of reference skeleton information are generated using skeleton information obtained by correcting skeleton information of the specific person in the specific scene based on a difference in build between the target and the specific person.
  6.  The information processing method according to claim 5, wherein the analysis information includes information indicating guidance for bringing the motion of the target closer to the motion of the specific person.
  7.  An information processing device comprising:
      a motion analysis unit that analyzes, based on posture information of a target in a specific scene extracted from moving image data, a motion of the target in the specific scene; and
      a display device that pauses a movement of the target at the specific scene and displays analysis information indicating an analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  8.  A program that causes a computer to:
      analyze, based on posture information of a target in a specific scene extracted from moving image data, a motion of the target in the specific scene; and
      pause a movement of the target at the specific scene and display analysis information indicating an analysis result of the target in the specific scene together with a still image of the target in the specific scene.
PCT/JP2020/013705 2020-03-26 2020-03-26 Information processing method, information processing device, and program WO2021192149A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/013705 WO2021192149A1 (en) 2020-03-26 2020-03-26 Information processing method, information processing device, and program


Publications (1)

Publication Number Publication Date
WO2021192149A1 true WO2021192149A1 (en) 2021-09-30

Family

ID=77891008



Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015119833A (en) * 2013-12-24 2015-07-02 Casio Computer Co., Ltd. Exercise support system, exercise support method, and exercise support program
JP2020005192A (en) * 2018-06-29 2020-01-09 Canon Inc. Information processing unit, information processing method, and program

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015119833A * 2013-12-24 2015-07-02 Casio Computer Co., Ltd. Exercise support system, exercise support method, and exercise support program
JP2020005192A * 2018-06-29 2020-01-09 Canon Inc. Information processing unit, information processing method, and program

Similar Documents

Publication Publication Date Title
CN110705390A (en) Body posture recognition method and device based on LSTM and storage medium
Bloom et al. G3D: A gaming action dataset and real time action recognition evaluation framework
CN103336576B (en) A kind of moving based on eye follows the trail of the method and device carrying out browser operation
KR102594938B1 (en) Apparatus and method for comparing and correcting sports posture using neural network
KR102241414B1 (en) Electronic device for providing a feedback for a specivic motion using a machine learning model a and machine learning model and method for operating thereof
WO2021098616A1 (en) Motion posture recognition method, motion posture recognition apparatus, terminal device and medium
US20220362630A1 (en) Method, device, and non-transitory computer-readable recording medium for estimating information on golf swing
US11568617B2 (en) Full body virtual reality utilizing computer vision from a single camera and associated systems and methods
KR102412553B1 (en) Method and apparatus for comparing dance motion based on ai
JP2019136493A (en) Exercise scoring method, system and program
WO2023108842A1 (en) Motion evaluation method and system based on fitness teaching training
CN114926762A (en) Motion scoring method, system, terminal and storage medium
WO2021192149A1 (en) Information processing method, information processing device, and program
CN109407826A (en) Ball game analogy method, device, storage medium and electronic equipment
CN110148072A (en) Sport course methods of marking and system
US20230285802A1 (en) Method, device, and non-transitory computer-readable recording medium for estimating information on golf swing
Fung et al. Hybrid markerless tracking of complex articulated motion in golf swings
Destelle et al. A multi-modal 3D capturing platform for learning and preservation of traditional sports and games
WO2021192143A1 (en) Information processing method, information processing device, and information processing system
KR20220052450A (en) Method and apparatus for assisting in golf swing practice
CN110996149A (en) Information processing method, device and system
CN113842622A (en) Motion teaching method, device, system, electronic equipment and storage medium
WO2020153031A1 (en) User attribute estimation device and user attribute estimation method
JP2021191356A (en) Correction content learning device, operation correction device and program
US20230398408A1 (en) Method, device, and non-transitory computer-readable recording medium for estimating information on golf swing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20926426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP