WO2021192149A1 - Information processing method, information processing device, and program - Google Patents

Information processing method, information processing device, and program

Info

Publication number
WO2021192149A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
information
specific scene
scene
specific
Prior art date
Application number
PCT/JP2020/013705
Other languages
French (fr)
Japanese (ja)
Inventor
悠二 石村
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Priority to PCT/JP2020/013705
Publication of WO2021192149A1

Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63B: APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B69/00: Training appliances or apparatus for special sports
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion

Definitions

  • the present invention relates to an information processing method, an information processing device and a program.
  • Posture estimation technology extracts multiple key points from an image of a target person or object (if the target is a human, multiple feature points indicating the shoulders, elbows, wrists, hips, knees, ankles, and so on) and estimates the posture of the target based on the relative positions of those key points.
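  • As an illustration of the relative-position idea, the following is a minimal sketch of deriving a joint angle from three key points; the keypoint names and pixel coordinates are hypothetical and do not come from the patent:

```python
import math

# Hypothetical 2D keypoints (pixel coordinates) as produced by a pose estimator.
keypoints = {"hip": (320, 400), "knee": (330, 480), "ankle": (325, 560)}

def joint_angle(a, b, c):
    """Angle at point b (degrees) formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

# Knee flexion estimated purely from the relative positions of three key points.
print(joint_angle(keypoints["hip"], keypoints["knee"], keypoints["ankle"]))
```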
  • Posture estimation technology is expected to be applied in a wide range of fields such as learning support in sports, health care, autonomous driving and danger prediction.
  • Japanese Unexamined Patent Publication No. 2013-138742; Japanese Unexamined Patent Publication No. 2012-066026; Japanese Unexamined Patent Publication No. 2019-096328
  • this disclosure proposes an information processing method, an information processing device, and a program capable of efficiently grasping the operation of a target to be focused on and the analysis result thereof.
  • according to the present disclosure, there is provided an information processing method executed by a computer which comprises: analyzing, based on posture information of a target in a specific scene extracted from moving image data, the movement of the target in the specific scene; pausing the movement of the target in the specific scene; and displaying analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  • an information processing device that implements this information processing method and a program that realizes this information processing method on a computer are provided.
  • FIG. 1 is a diagram showing an example of a motion analysis service CS using cloud computing.
  • the motion of the target is analyzed based on the video data.
  • the application AP for analysis is created using the software development kit SDK.
  • the user U downloads the application AP uploaded by the developer DV to the store STR and installs it on the client terminal 100.
  • a program that supplies posture information to the application AP is installed in the client terminal 100.
  • the client terminal 100 is an information processing device that analyzes the operation of the target using moving image data of the target.
  • the client terminal 100 extracts one or more frame images indicating a specific scene to be analyzed from the moving image data.
  • the client terminal 100 transmits one or more extracted frame images to the server 200.
  • the server 200 extracts the posture information of the target for each frame image from the extracted one or more frame images.
  • the client terminal 100 acquires the posture information of the target extracted for each frame image from one or more frame images by the server 200.
  • the application AP analyzes the movement of the target in a specific scene by using the posture information of the target acquired from the server 200.
  • the image processed by the server 200 is only the frame image of the specific scene. Therefore, the cost incurred when using the server 200 is reduced.
  • the movements before and after the specific scene are often shot so that the specific scene is surely included in the moving image data.
  • the moving image data before and after the specific scene does not contribute to the motion analysis. By omitting the image processing of the data area that does not contribute to the motion analysis, the time and cost required for the motion analysis can be reduced.
  • the motion analysis service CS can be applied to a wide range of fields such as learning support in sports, health care, autonomous driving and danger prediction.
  • the scene to be analyzed is appropriately defined according to the field to which the motion analysis service CS is applied, the purpose of analysis, and the like.
  • a specific motion scene according to a coaching target is defined as a specific scene.
  • the scene of functional recovery training is defined as a specific scene.
  • a scene in which a pedestrian is detected is defined as a specific scene.
  • a scene for detecting an abnormal posture state is defined as a specific scene.
  • the following describes an example in which the motion analysis service CS is applied to the field of learning support in sports.
  • FIG. 2 is a schematic view of the information processing system 1 of the first embodiment.
  • the information processing system 1 has, for example, a client terminal 100 and a server 200.
  • the client terminal 100 is, for example, an information terminal such as a smartphone, a tablet terminal, a notebook personal computer, and a desktop personal computer.
  • the client terminal 100 and the server 200 are connected via a network NW.
  • the client terminal 100 has, for example, a processing device 110, a storage device 120, a communication device 130, a camera 140, and a display device 150.
  • the processing device 110 includes, for example, a moving image acquisition unit 111, a scene extraction unit 112, a motion analysis unit 113, and an output unit 114.
  • the video acquisition unit 111 acquires, for example, the video data of the target TG shot by the camera 140.
  • the moving image includes, for example, a specific scene to be analyzed and a scene before and after the specific scene.
  • the camera 140 includes, for example, an image sensor such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor.
  • the scene extraction unit 112 acquires the moving image data output from the moving image acquisition unit 111.
  • the scene extraction unit 112 extracts one or more frame images indicating a specific scene from the moving image data.
  • the number of frame images to be extracted is, for example, 1 or more and 10 or less.
  • the scene extraction unit 112 determines a specific scene based on, for example, the operation of the target TG. For example, the scene extraction unit 112 determines a specific scene by collating the operation characteristics of the target TG with the scene information stored in the storage device 120.
  • Information about a specific scene is stored in the storage device 120 as, for example, scene information 122.
  • in the scene information 122, for example, for each coaching target, one or more specific scenes to be analyzed and a determination condition for determining each specific scene are defined in association with each other.
  • in the example of soccer learning support, dribbling, shooting, and heading are defined as coaching targets.
  • when the coaching target is a soccer shot, for example, (i) the timing of stepping on the axis foot, (ii) the timing when the thigh of the kicking foot moves toward the ball, (iii) the timing of impact, and (iv) the timing a specified number of seconds after impact are defined as specific scenes.
  • the determination conditions for a specific scene are defined based on, for example, the angle of a specific joint, the relative position of the ball and a specific key point, and the like.
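  • As a rough sketch of how such scene information might be organized, the structure below pairs each specific scene with a determination condition; the keys, names, and threshold values are illustrative assumptions, not the patent's actual schema:

```python
# Illustrative sketch of scene information 122: an ordered list of specific
# scenes per coaching target, each with an assumed determination condition.
SCENE_INFO = {
    "soccer_shot": [
        {"scene": "(i) axis foot planted",
         "condition": {"foot_ball_distance_max": 1.5}},   # in ball radii
        {"scene": "(ii) kicking thigh toward ball",
         "condition": {"thigh_extension_through_ball": True}},
        {"scene": "(iii) impact",
         "condition": {"distance_reversal_ratio_min": 0.2}},
        {"scene": "(iv) after impact",
         "condition": {"delay_seconds": 1.0}},
    ],
}

def scenes_for(coaching_target: str):
    """Return the ordered list of specific scenes defined for a coaching target."""
    return SCENE_INFO.get(coaching_target, [])
```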
  • the scene extraction unit 112 extracts the posture information of the target TG using, for example, the first analysis model 123 obtained by machine learning.
  • the first analysis model 123 is, for example, an analysis model having a lower posture estimation accuracy than the analysis model (the second analysis model 221) used when the server 200 extracts posture information.
  • the scene extraction unit 112 determines the operation of the target TG based on, for example, a change in the posture of the target TG.
  • the video data includes information on a series of operations including a plurality of specific scenes that occur in time series.
  • the scene extraction unit 112 determines which specific scene is occurring, in the flow of the movement, from individual viewpoints while considering the context before and after. For example, in the shooting motion, the specific scene of (i) above is first determined, and then each specific scene is determined in the order of (ii), (iii), and (iv) from the moving image data after (i). Each specific scene is determined based on the body movement assumed for each specific scene.
  • to facilitate the determination, the scene extraction unit 112 determines a specific scene based on, for example, the movement of the target TG when the target TG and a specific object (such as a ball in the case of soccer) have a predetermined positional relationship, or based on a change in the positional relationship between the target TG and the specific object.
  • in this configuration, the specific scene is determined more accurately than when it is determined based only on the relative positional relationship between parts of the skeleton.
  • for example, the determination of the specific scene of (i) above is performed as follows. First, a singular region, in which the axis foot is assumed to move little when it is stepped on, is defined based on the relative positional relationship with the ball.
  • the singular region is defined as, for example, an image region having a radius A × r (r is the radius of the ball; A is a number larger than 1) from the center of the ball.
  • the scene extraction unit 112 extracts a frame image in which the distance between the axis foot and the ball is within the threshold value as a reference frame image.
  • the scene extraction unit 112 extracts N frame images up to the reference frame image from the frame image traced back by (N-1) frames from the reference frame image (N is an integer of 1 or more).
  • the scene extraction unit 112 extracts a skeleton region in which the skeleton of the ankle of the target TG fits in each of the N frame images.
  • the scene extraction unit 112 extracts a skeletal motion region in which all N skeletal regions are contained.
  • the scene extraction unit 112 determines that the axis foot has been stepped on when the size of the skeletal motion region is within the threshold value and the skeletal motion region is included in the singular region.
  • the scene extraction unit 112 extracts one or more frame images indicating the timing at which the axis foot is stepped on from the moving image data.
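  • The check described above might look like the following sketch, assuming per-frame ankle bounding boxes and a detected ball position are available; the box representation and threshold values are assumptions for illustration:

```python
import numpy as np

def detect_axis_foot_plant(ankle_boxes, ball_center, ball_radius,
                           n_frames=5, region_scale=2.0, size_thresh=0.5):
    """Check the last n_frames ankle bounding boxes (x0, y0, x1, y1) for the
    axis-foot plant: the union of the boxes must be small and must lie inside
    the singular region of radius region_scale * ball_radius around the ball."""
    boxes = np.asarray(ankle_boxes[-n_frames:], dtype=float)
    # Skeletal motion region: the union of the N per-frame skeleton regions.
    x0, y0 = boxes[:, 0].min(), boxes[:, 1].min()
    x1, y1 = boxes[:, 2].max(), boxes[:, 3].max()
    # The region must be small, i.e. the ankle has barely moved.
    if max(x1 - x0, y1 - y0) > size_thresh * ball_radius:
        return False
    # Every corner of the union must fall inside the singular region.
    corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    r_max = region_scale * ball_radius
    return all(np.hypot(cx - ball_center[0], cy - ball_center[1]) <= r_max
               for cx, cy in corners)
```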
  • the scene extraction unit 112 moves on to the work of extracting the frame image of the specific scene of the above (ii).
  • the scene extraction unit 112 determines, for example, the timing at which the extension line of the foot detected as the axis foot passes through the ball as the specific scene of (ii) above.
  • the determination of the specific scene of the above (ii) is performed on the moving image data after the specific scene of the above (i). Considering the context before and after in the flow of the operation, it is considered that the specific scene of (ii) above occurs immediately after the specific scene of (i) above.
  • the scene extraction unit 112 determines that the scene is the specific scene of the above (ii), and extracts one or more frame images indicating the specific scene from the moving image data.
  • after that, the scene extraction unit 112 moves on to extracting the frame image of the specific scene of (iii) above. For example, the scene extraction unit 112 determines, as the specific scene of (iii), the timing at which the distance between the center of the waist and the center of the ball shrinks and then begins to expand at a speed greater than the speed at which it was shrinking.
  • until just before impact, the distance between the center of the waist and the center of the ball tends to shrink; when the ball is impacted, the distance begins to widen at a speed much higher than the speed at which it was shrinking.
  • the scene extraction unit 112 calculates the distance between the center of the waist and the center of the ball in each frame image, and determines that the mode of change in the distance has reversed when the value obtained by dividing the inter-frame difference in distance by the diameter of the ball exceeds a threshold value. The scene extraction unit 112 determines that the scene immediately before the change in distance reverses is the specific scene of (iii) above.
  • the determination of the specific scene of (iii) above is performed on the moving image data after the specific scene of (ii) above. Considering the context before and after in the flow of the movement, the specific scene of (iii) is considered to occur immediately after the specific scene of (ii). Therefore, if the above-mentioned change in distance occurs within a predetermined time immediately after the specific scene of (ii), the scene is highly likely to be the specific scene of (iii). In that case, the scene extraction unit 112 determines that the scene is the specific scene of (iii) and extracts one or more frame images indicating the specific scene from the moving image data.
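  • A sketch of this distance-reversal test, assuming per-frame waist and ball centers are available; the threshold value is an illustrative assumption:

```python
import math

def detect_impact(waist_centers, ball_centers, ball_diameter, thresh=0.15):
    """Scan consecutive frames for the moment the waist-to-ball distance stops
    shrinking and starts expanding faster than a threshold; returns the index
    of the frame just before the reversal, or None."""
    dists = [math.dist(w, b) for w, b in zip(waist_centers, ball_centers)]
    for i in range(1, len(dists) - 1):
        shrinking = dists[i] - dists[i - 1] < 0
        # Normalize the inter-frame jump by the ball diameter, as in the text.
        jump = (dists[i + 1] - dists[i]) / ball_diameter
        if shrinking and jump > thresh:
            return i  # the scene immediately before the reversal
    return None
```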
  • the scene extraction unit 112 moves on to the work of extracting the frame image of the specific scene of the above (iv).
  • the frame image of the specific scene in (iv) above is used to analyze the posture after shooting.
  • the specific scene of (iv) is defined as a scene after a predetermined time has elapsed after the specific scene of (iii). The time it takes for a posture suitable for analysis to appear depends on the run-up to the ball and the speed of the motion. Therefore, how much time after the shot is determined as the specific scene of (iv) above differs for each target TG.
  • for example, the scene extraction unit 112 determines, as the specific scene of (iv) above, the timing at which a predetermined multiple of the frame time from the specific scene of (ii) above to the specific scene of (iii) above has elapsed after the impact.
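  • Under that reading, the timing of (iv) could be computed as in this small sketch; measuring the elapsed time as a multiple of the (ii)-to-(iii) frame interval, and the multiplier value, are both assumptions:

```python
def scene_iv_frame(t2, t3, multiplier=2.0):
    """Frame index for scene (iv): a predetermined multiple of the (ii)->(iii)
    frame interval after the impact frame t3. t2 and t3 are the frame indices
    of scenes (ii) and (iii); the multiplier is an illustrative assumption."""
    return int(round(t3 + multiplier * (t3 - t2)))
```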
  • Posture estimation accuracy varies depending on the scale of the neural network used in the analysis model.
  • with a large-scale neural network, many key points are extracted from the image data, and the various movements of the target TG are estimated with high accuracy. Even if information is missing due to occlusion or the like, the key points of the target TG are accurately extracted.
  • as methods of increasing the scale of the neural network, there are a method of increasing the number of feature maps (channels) and a method of deepening the layers. With either method, the processing amount of the convolution operations increases and the calculation speed decreases; there is a trade-off between posture estimation accuracy and calculation speed.
  • the scene extraction unit 112 extracts the posture information of the target TG from all the frame images constituting the moving image data by using, for example, the small-scale, low-accuracy, low-computation first analysis model 123. If only the movement scene of the target TG is to be determined, it suffices to grasp the rough movement of the target TG. Even if information is missing due to occlusion or the like, the characteristics of the movement can be grasped from rough changes in posture. Therefore, the movement scene of the target TG can be determined even using the low-accuracy, low-computation first analysis model 123. When the first analysis model 123 is used, the processing amount of the convolution operations per frame image is small, so rapid processing is possible even if the moving image data is large.
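  • The two-model division of labor can be sketched as follows; the model call signatures and the scene test are stand-ins, not the actual interfaces of the first and second analysis models:

```python
def analyze_video(frames, light_model, heavy_model, is_specific_scene):
    """Two-stage pipeline: a small, fast model screens every frame, and the
    large, accurate model runs only on frames judged to show a specific scene.
    light_model and heavy_model are hypothetical callables standing in for the
    first and second analysis models."""
    rough_poses = [light_model(f) for f in frames]          # cheap, all frames
    selected = [i for i, p in enumerate(rough_poses) if is_specific_scene(p)]
    precise_poses = {i: heavy_model(frames[i]) for i in selected}  # costly, few frames
    return precise_poses
```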
  • Data of one or more frame images indicating a specific scene is transmitted to the server 200 via the communication device 130.
  • the motion analysis unit 113 acquires the posture information of the target TG extracted for each frame image from the one or more frame images by the server 200.
  • the motion analysis unit 113 analyzes the motion of the target TG in the specific scene based on the posture information of the target TG acquired from the server 200 (posture information of the target TG in the specific scene extracted from the moving image data).
  • the motion analysis unit 113 outputs the analysis result to the output unit 114.
  • the output unit 114 notifies the user U of one or more analysis information MAI indicating the analysis result of the motion analysis unit 113, for example.
  • the analysis information MAI includes, for example, information on the evaluation result based on the comparison with the movement of the specific person RM as a model of the movement.
  • Information on the operation of the specific person RM is stored in the storage device 120 as role model information 121.
  • the output unit 114 displays, for example, a still image IM including the analysis information MAI on the display device 150.
  • the display device 150 is a display unit of the client terminal 100 that displays various information.
  • the display device 150 is, for example, an LCD (Liquid Crystal Display) or an OLED (Organic Light Emitting Diode).
  • the method of motion analysis by the motion analysis unit 113 and the notification mode of the analysis information MAI notified by the output unit 114 follow the algorithm of the application AP (see FIG. 1).
  • FIG. 3 is a diagram showing an example of analysis information MAI.
  • the output unit 114 notifies, for example, the first analysis information MAI1 and the second analysis information MAI2 as one or more analysis information MAIs.
  • the first analysis information MAI1 includes, for example, information indicating a comparison between the operation of the target TG and the operation of the specific person RM as a model in a specific scene.
  • the second analysis information MAI2 includes, for example, information indicating a guideline for bringing the movement of the target TG closer to the movement of the specific person RM.
  • the skeleton information of the specific person RM in the specific scene is included in, for example, the role model information 121. In the example of FIG. 3, a comment that "the kick foot is swung up high" and an evaluation point of "86 points" are shown. The evaluation points indicate the degree of achievement of the evaluation items set in the specific scene.
  • the first analysis information MAI1 includes, for example, the skeleton information SI of the target TG in a specific scene and one or more reference skeleton information RSI as a reference for comparison.
  • in the example of FIG. 3, the specific scene is the timing of stepping on the axis foot.
  • as the one or more reference skeleton information RSI, for example, first reference skeleton information RSI1, second reference skeleton information RSI2, and third reference skeleton information RSI3 are displayed.
  • the first reference skeleton information RSI1 is, for example, skeleton information of an operation that serves as a model.
  • the second reference skeleton information RSI2 is, for example, skeleton information of a specific level (for example, a level of 80 points when the model is 100 points) that is less than the model.
  • the first reference skeleton information RSI1 and the second reference skeleton information RSI2 are model skeleton information at the timing when the position of the waist coincides with that of the target TG.
  • the third reference skeleton information RSI3 is, for example, model skeleton information at the timing when the position of the axis foot coincides with that of the target TG.
  • the third reference skeleton information RSI3 is displayed continuously, in conjunction with the movement of the target TG, during the series of movements from the stepping on the axis foot to immediately after the impact.
  • the third reference skeleton information RSI3 is used to compare the series of movements from the stepping on the axis foot to immediately after the impact with that of the target TG. Therefore, unlike the first reference skeleton information RSI1 and the second reference skeleton information RSI2, it shows skeleton information of the whole body.
  • the time required for a series of operations differs between the specific person RM and the target TG. Therefore, effective timings for making comparisons (for example, impact timings or stepping timings) are defined, and the third reference skeleton information RSI3 is superimposed on the target TG so that the defined timings match.
  • in the example of FIG. 3, the timing of stepping is matched; which timing should be aligned is set appropriately according to the purpose of the lesson and the like.
  • the output unit 114 offsets the display position of the third reference skeleton information RSI3 so that the position of the ankle of the target TG and the position of the ankle of the specific person RM match at the defined timing. This makes it easier to see how the stepping positions of the target TG and the specific person RM differ.
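  • A minimal sketch of this offset alignment, assuming keypoints are stored as name-to-(x, y) mappings (an assumed representation):

```python
def align_reference_skeleton(ref_keypoints, target_ankle, ref_ankle):
    """Shift every reference key point by the offset that maps the model's
    ankle onto the target's ankle at the defined timing."""
    dx = target_ankle[0] - ref_ankle[0]
    dy = target_ankle[1] - ref_ankle[1]
    return {name: (x + dx, y + dy) for name, (x, y) in ref_keypoints.items()}
```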
  • skeleton information corresponding to the site of the target TG to be analyzed in a specific scene is selectively displayed as the skeleton information SI of the target TG and one or more reference skeleton information RSI.
  • the waist and leg skeleton information is selectively displayed as the skeleton information SI and one or more reference skeleton information RSI.
  • One or more reference skeleton information RSIs are generated, for example, by using skeleton information obtained by modifying skeleton information in a specific scene of a specific person RM based on the physical disparity between the target TG and the specific person RM.
  • the scale of the reference skeleton information RSI is set as follows, for example. First, one or more bones suitable for comparing the physiques of the specific person RM and the target TG are defined. In the example of FIG. 3, the spine and the leg bones are defined as the criteria for comparison.
  • the motion analysis unit 113 detects, for example, the lengths of the spine and the leg bones at the timing when the postures of the specific person RM and the target TG are aligned.
  • the motion analysis unit 113 calculates the ratio of the sums of the lengths of the spine and the leg bones as the ratio of the body sizes of the specific person RM and the target TG, and scales the skeleton of the specific person RM based on this ratio. This facilitates comparison with the specific person RM and makes it easier to understand how the target TG should move.
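  • A sketch of this rescaling, assuming the bone lengths have already been measured and an anchor point about which to scale has been chosen (both assumptions for illustration):

```python
def scale_reference_skeleton(ref_keypoints, ref_bone_lengths, tg_bone_lengths,
                             anchor):
    """Scale the model skeleton about an anchor point using the ratio of the
    summed spine and leg bone lengths, as described above. The bone-length
    dicts and the anchor choice are illustrative assumptions."""
    ratio = sum(tg_bone_lengths.values()) / sum(ref_bone_lengths.values())
    ax, ay = anchor
    return {name: (ax + ratio * (x - ax), ay + ratio * (y - ay))
            for name, (x, y) in ref_keypoints.items()}
```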
  • FIG. 4 is a diagram showing an example of the notification mode of the analysis information MAI.
  • the analysis information MAI is displayed superimposed on the frame image indicating the specific scene, for example, when the specific scene is reproduced.
  • the display device 150 pauses the reproduction of the moving image data in a specific scene. Then, the display device 150 displays a still image IM in which the analysis information MAI indicating the analysis result of the target TG in the specific scene is superimposed on the frame image of the specific scene.
  • the reproduction of the moving image data is paused for each specific scene, and the analysis information MAI in the specific scene is notified.
  • slow-motion playback may be used so that the posture of the target TG can be easily confirmed. Further, slow-motion playback may be applied only to the section of the video data from the first specific scene to the last specific scene, and the video data before and after that section may be played back at the normal playback speed.
  • FIG. 4 shows an example in which three specific scenes A1 to A3 are set.
  • the specific scene A1 is, for example, the timing of stepping on the shaft foot.
  • the specific scene A2 is, for example, the timing of impact.
  • the specific scene A3 is, for example, a timing immediately after the impact (a few seconds after the impact).
  • the display device 150 reproduces the moving image of the target TG based on the reproduction operation for the client terminal 100.
  • the display device 150 pauses the reproduction of the moving image data at the timing when the specific scene A1 is reproduced.
  • the display device 150 displays the still image IM (first still image IM1) in which the analysis information MAI of the operation of the target TG in the specific scene A1 is superimposed on the frame image of the specific scene A1.
  • the display device 150 starts playing the moving image after the specific scene A1 when a preset time has elapsed from the playback operation on the client terminal 100 or the start of displaying the first still image IM1.
  • the display device 150 pauses the reproduction of the moving image data at the timing when the specific scene A2 is reproduced. Then, the display device 150 displays the still image IM (second still image IM2) in which the analysis information MAI of the operation of the target TG in the specific scene A2 is superimposed on the frame image of the specific scene A2. After that, the display device 150 starts playing the moving image after the specific scene A2 when a preset time has elapsed from the playback operation on the client terminal 100 or the start of displaying the second still image IM2.
  • the display device 150 pauses the reproduction of the moving image data at the timing when the specific scene A3 is reproduced. Then, the display device 150 displays the still image IM (third still image IM3) in which the analysis information MAI of the operation of the target TG in the specific scene A3 is superimposed on the frame image of the specific scene A3. After that, the display device 150 starts playing the moving image after the specific scene A3 when a preset time has elapsed from the playback operation on the client terminal 100 or the start of displaying the third still image IM3. As a result, the display device 150 can pause the movement of the target TG in the specific scene and display the analysis information indicating the analysis result of the target TG in the specific scene together with the still image of the target TG in the specific scene.
  • the display device 150 plays the remaining moving images to the end.
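  • The pause-and-resume playback behavior can be sketched as follows; the rendering callback and the fixed pause duration are assumptions, not the display device's actual interface:

```python
import time

def play_with_scene_pauses(frames, scene_overlays, show, pause_seconds=3.0):
    """Play frames in order; whenever a frame index has analysis overlay data,
    render the overlaid still image and hold it for a preset time before
    resuming. `show` is a hypothetical rendering callback."""
    for i, frame in enumerate(frames):
        if i in scene_overlays:
            show(frame, overlay=scene_overlays[i])  # still image + analysis MAI
            time.sleep(pause_seconds)               # pause, then resume playback
        else:
            show(frame)
```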
  • the client terminal 100 may generate new moving image data (modified moving image data) incorporating the analysis information MAI, and the generated modified moving image data may be reproduced on the display device 150.
  • analysis information MAI is written in one or more frame images indicating a specific scene of the modified moving image data.
  • in the modified moving image data, the playback timing is adjusted so that the movement of the target TG stops in the specific scene, the still image of the target TG including the analysis information MAI is displayed for a predetermined time, and then the movement of the target TG after the specific scene resumes.
  • FIG. 5 is a diagram showing another example of the notification mode of the analysis information MAI.
  • FIG. 5 shows an example in which the motion analysis service CS is applied to golf learning support.
  • FIG. 5 shows an example in which six specific scenes are set.
  • the backswing timing, the downswing timing, the timing immediately before the impact, the impact timing, the timing immediately after the impact, and the follow-through timing are set as specific scenes.
  • the moving image data is paused at the timing when each specific scene is reproduced, and the analysis information MAI is superimposed and displayed.
  • the analysis information MAI of the past specific scene is not erased and continues to be displayed on the display device 150 as it is.
  • model skeletal information is not displayed.
  • in the example of FIG. 5, even after the notification of the analysis information MAI of a specific scene is completed, the analysis information MAI is not erased and continues to be displayed on the screen.
  • the display mode of the analysis information MAI is not limited to this.
  • for example, the analysis information MAI may be erased once before the next specific scene is displayed, and, when the last specific scene is played back or after playback ends, the analysis information MAI of all the specific scenes may be redisplayed together.
  • the communication device 130 is a communication unit of the client terminal 100 that transmits and receives various data to and from an external device. For example, the communication device 130 transmits the one or more frame images extracted by the scene extraction unit 112 to the server 200. The communication device 130 also acquires from the server 200 the posture information of the target TG extracted by the server 200 for each frame image from the one or more frame images.
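  • A sketch of this exchange, assuming a hypothetical HTTP endpoint that accepts one JPEG frame per request and returns per-frame keypoints as JSON; the endpoint, payload, and response shape are all illustrative:

```python
import json
import urllib.request

def fetch_pose_info(frame_jpegs, server_url="https://example.invalid/pose"):
    """Send the extracted specific-scene frames to the server and receive the
    per-frame posture information. The endpoint URL, payload format, and
    response format are assumptions made for illustration."""
    results = []
    for jpeg_bytes in frame_jpegs:
        req = urllib.request.Request(server_url, data=jpeg_bytes,
                                     headers={"Content-Type": "image/jpeg"})
        with urllib.request.urlopen(req) as resp:
            results.append(json.loads(resp.read()))  # keypoints for one frame
    return results
```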
  • the storage device 120 stores, for example, the program 124 executed by the processing device 110, the application AP, the first analysis model 123, the role model information 121, and the scene information 122.
  • the program 124 and the application AP are programs that cause a computer to execute information processing according to the present embodiment.
  • the processing device 110 performs various processes according to the program 124 and the application AP stored in the storage device 120.
  • the storage device 120 may be used as a work area for temporarily storing the processing result of the processing device 110.
  • the storage device 120 includes any non-transient storage medium such as, for example, a semiconductor storage medium and a magnetic storage medium.
  • the storage device 120 includes, for example, an optical disk, a magneto-optical disk, or a flash memory.
  • Program 124 is stored, for example, in a non-transient storage medium that can be read by a computer.
  • the processing device 110 is, for example, a computer composed of a processor and a memory.
  • the memory of the processing device 110 includes a RAM (Random Access Memory) and a ROM (Read Only Memory).
  • the processing device 110 functions as the moving image acquisition unit 111 and the scene extraction unit 112 by executing the program 124.
  • the processing device 110 functions as the motion analysis unit 113 and the output unit 114 by executing the application AP.
  • the server 200 has, for example, a processing device 210, a storage device 220, and a communication device 230.
  • the processing device 210 has a posture information extraction unit 211.
  • the posture information extraction unit 211 acquires one or more frame images indicating a specific scene transmitted from the client terminal 100 via the communication device 230.
  • the posture information extraction unit 211 uses the second analysis model 221 obtained by machine learning to extract the posture information of the target TG for each frame image from one or more frame images showing a specific scene.
  • the second analysis model 221 is an analysis model having higher posture estimation accuracy than the analysis model (first analysis model 123) used when the scene extraction unit 112 determines a specific scene.
  • the posture information extraction unit 211 extracts the posture information of the target TG from the specific one or more frame images by using, for example, the large-scale, high-accuracy, high-computation second analysis model 221.
  • the target of the posture estimation process by the posture information extraction unit 211 is only a specific one or more frame images selected from a plurality of frame images constituting the moving image data. Therefore, even if the processing amount of the convolution operation for each frame image is large, rapid processing is possible.
  • the storage device 220 stores, for example, the program 222 executed by the processing device 210 and the second analysis model 221.
  • the program 222 is a program that causes a computer to execute information processing according to the present embodiment.
  • the processing device 210 performs various processes according to the program 222 stored in the storage device 220.
  • the storage device 220 may be used as a work area for temporarily storing the processing result of the processing device 210.
  • the storage device 220 includes any non-transient storage medium such as, for example, a semiconductor storage medium and a magnetic storage medium.
  • the storage device 220 includes, for example, an optical disk, a magneto-optical disk, or a flash memory.
  • Program 222 is stored, for example, in a non-transient storage medium that can be read by a computer.
  • the processing device 210 is, for example, a computer composed of a processor and a memory.
  • the memory of the processing device 210 includes a RAM and a ROM.
  • the processing device 210 functions as the posture information extraction unit 211 by executing the program 222.
  • the communication device 130 of the client terminal 100 and the communication device 230 of the server 200 are connected to a network NW such as the Internet.
  • the client terminal 100 and the server 200 transmit and receive data via the network NW.
  • a known method is adopted as the communication method of the communication device 130 and the communication device 230.
  • in step S1 of FIG. 8, the client terminal 100 shoots a moving image of the target TG.
  • the moving image data MD is composed of a plurality of frame images arranged in chronological order.
  • the moving image includes a specific scene to be analyzed and scenes before and after the specific scene.
  • in step S2 of FIG. 8, the client terminal 100 extracts one or more frame images FI (specific frame images SFI) indicating a specific scene from the moving image data MD.
  • the determination of the specific scene is performed based on, for example, the operation of the target TG.
  • the movement of the target TG is estimated based on, for example, the posture information LPI of the target TG (information indicating the low-precision posture estimation result by the first analysis model 123) extracted from all the frame images FI of the moving image data MD using the low-accuracy, low-computation first analysis model 123.
  • in step S3 of FIG. 8, the server 200 extracts the posture information HPI of the target TG for each frame image FI from the extracted one or more frame images FI (specific frame images SFI).
  • the posture information HPI of the target TG is extracted only from the one or more specific frame images SFI using, for example, the high-accuracy, high-computation second analysis model 221.
  • in step S4 of FIG. 8, the client terminal 100 performs motion analysis of the target TG based on the extracted posture information HPI (information indicating the high-precision posture estimation result by the second analysis model 221).
  • in step S5 of FIG. 8, the client terminal 100 notifies the user U of the analysis information MAI indicating the analysis result.
  • the analysis information MAI is notified, for example, by a combination of text, diagrams, and sound.
  • the client terminal 100 has a motion analysis unit 113 and a display device 150.
  • the motion analysis unit 113 analyzes the motion of the target TG in the specific scene based on the posture information HPI of the target TG in the specific scene extracted from the moving image data MD.
  • the display device 150 pauses the movement of the target TG in the specific scene, and displays the analysis information MAI indicating the analysis result of the target TG in the specific scene together with the still image IM of the target TG in the specific scene.
  • the program 124 of the present embodiment causes a computer to execute the information processing of the client terminal 100 described above; that is, it makes the computer realize the above-mentioned information processing method.
  • the analysis result is provided in a form linked to the playback scene of the moving image. Therefore, the operation of the target TG to be noted and the analysis result thereof can be efficiently grasped.
  • the analysis information MAI includes, for example, information indicating a comparison between the operation of the target TG in the specific scene and the operation of the specific person RM as a model.
  • the analysis information MAI includes, for example, the skeleton information SI of the target TG in a specific scene and one or more reference skeleton information RSI as a reference for comparison.
  • skeleton information corresponding to the site of the target TG to be analyzed in a specific scene is selectively displayed as the skeleton information SI of the target TG and one or more reference skeleton information RSI.
  • One or more reference skeleton information RSIs are generated, for example, by using skeleton information obtained by modifying skeleton information in a specific scene of a specific person RM based on the physical disparity between the target TG and the specific person RM.
  • the analysis information MAI includes, for example, information indicating a guideline for bringing the movement of the target TG closer to the movement of the specific person RM.
  • FIG. 9 is a schematic view of the information processing system 2 of the second embodiment.
  • the difference from the first embodiment is that the function of performing motion analysis is realized by the server 300.
  • the differences from the first embodiment will be mainly described.
  • the client terminal 400 has a communication device 430, a camera 140, and a display device 150.
  • the communication device 430 is connected to the network NW.
  • the client terminal 400 and the server 300 transmit and receive data via the network NW.
  • the processing device 310 of the server 300 has the moving image acquisition unit 111, the scene extraction unit 112, the motion analysis unit 113, and the output unit 114.
  • the role model information 121, the scene information 122, the first analysis model 123, the program 124, and the application AP are stored in the storage device 320 of the server 300.
  • by executing the program 124, the processing device 310 functions as the moving image acquisition unit 111, the scene extraction unit 112, the motion analysis unit 113, and the output unit 114.
  • the motion analysis unit 113 generates, for example, new moving image data (modified moving image data) incorporating the analysis information MAI.
  • the analysis information MAI is written in one or more frame images indicating a specific scene of the modified moving image data.
  • in the modified moving image data, the playback timing is adjusted so that the movement of the target TG stops in the specific scene, the still image of the target TG including the analysis information MAI is displayed for a predetermined time, and then the movement of the target TG after the specific scene resumes.
  • the motion analysis unit 113 transmits the modified moving image data to the client terminal 400 via the output unit 114 and the communication device 130.
  • the client terminal 400 causes the display device 150 to display the modified moving image data.
  • (1) An information processing method executed by a computer, comprising: analyzing the movement of a target in a specific scene based on posture information of the target in the specific scene extracted from moving image data; and suspending the movement of the target in the specific scene and displaying analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  • (2) The information processing method according to (1) above, wherein the analysis information includes information indicating a comparison between the movement of the target and the movement of a specific person serving as a model in the specific scene.
  • (3) The information processing method according to (2) above, wherein the analysis information includes the skeleton information of the target in the specific scene and one or more reference skeleton information serving as a reference for the comparison.
  • (4) The information processing method according to (3) above, wherein skeleton information corresponding to the target site to be analyzed in the specific scene is selectively displayed as the target skeleton information and the one or more reference skeleton information.
  • (5) The information processing method according to (3) or (4) above, wherein the one or more reference skeleton information is generated using skeleton information obtained by modifying the skeleton information of the specific person in the specific scene based on the physical disparity between the target and the specific person.
  • An information processing device comprising: a motion analysis unit that analyzes the movement of the target in the specific scene based on the posture information of the target in the specific scene extracted from the moving image data; and a display device that pauses the movement of the target in the specific scene and displays analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  • A program that causes a computer to analyze the movement of the target in the specific scene based on the posture information of the target in the specific scene extracted from the moving image data, suspend the movement of the target in the specific scene, and display analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Image Analysis (AREA)

Abstract

This information processing method causes a computer to execute: a process in which the movement of a target in a specific scene extracted from video data is analyzed on the basis of posture information about the target in the specific scene; and a process in which the movement of the target in the specific scene is paused, and analysis information indicating an analysis result for the target in the specific scene is displayed together with a still image of the target in the specific scene.

Description

Information processing method, information processing device, and program
The present invention relates to an information processing method, an information processing device, and a program.
In recent years, methods of motion analysis using posture estimation techniques have been proposed (see, for example, Patent Documents 1 to 3). Posture estimation technology extracts multiple key points from an image of a target person or object (if the target is a human, multiple feature points indicating the shoulders, elbows, wrists, hips, knees, ankles, and so on) and estimates the posture of the target based on the relative positions of those key points. Posture estimation technology is expected to be applied in a wide range of fields such as learning support in sports, health care, autonomous driving, and danger prediction.
Patent Document 1: Japanese Unexamined Patent Publication No. 2013-138742; Patent Document 2: Japanese Unexamined Patent Publication No. 2012-066026; Patent Document 3: Japanese Unexamined Patent Publication No. 2019-096328
When performing motion analysis using a video of a target, the movements before and after a specific scene that needs analysis are often also shot so that the specific scene is reliably included in the video data. In the above prior art, the analysis result is not provided in a manner linked to the playback scene of the video. Therefore, it is difficult to efficiently grasp the movement of the target to be focused on and the analysis result thereof.
Therefore, this disclosure proposes an information processing method, an information processing device, and a program capable of efficiently grasping the movement of a target to be focused on and the analysis result thereof.
According to the present disclosure, based on posture information of a target in a specific scene extracted from moving image data, the movement of the target in the specific scene is analyzed, the movement of the target is paused in the specific scene, and analysis information indicating the analysis result of the target in the specific scene is displayed together with a still image of the target in the specific scene; an information processing method executed by a computer comprising these steps is provided. Further, according to the present disclosure, an information processing device that implements this information processing method and a program that realizes this information processing method on a computer are provided.
FIG. 1 is a diagram showing an example of a motion analysis service using cloud computing. FIG. 2 is a schematic view of the information processing system of the first embodiment. FIGS. 3 to 5 are diagrams showing examples of the notification mode of analysis information. FIGS. 6 to 8 are diagrams showing examples of the information processing method. FIG. 9 is a schematic view of the information processing system of the second embodiment.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In each of the following embodiments, the same parts are designated by the same reference numerals, and duplicate description will be omitted.
The description will be given in the following order.
[1. First Embodiment]
 [1-1. Overview of the motion analysis service]
 [1-2. Configuration of the information processing system]
 [1-3. Information processing method]
 [1-4. Effects]
[2. Second Embodiment]
 [2-1. Configuration of the information processing system]
 [2-2. Effects]
[1. First Embodiment]
[1-1. Overview of the motion analysis service]
FIG. 1 is a diagram showing an example of a motion analysis service CS using cloud computing.
In the motion analysis service CS, the motion of the target is analyzed based on video data. The application AP for analysis is created using the software development kit SDK. The user U downloads the application AP uploaded by the developer DV to the store STR and installs it on the client terminal 100. A program that supplies posture information to the application AP is installed in the client terminal 100.
The client terminal 100 is an information processing device that analyzes the movement of the target using moving image data of the target. The client terminal 100 extracts, from the moving image data, one or more frame images indicating a specific scene to be analyzed. The client terminal 100 transmits the one or more extracted frame images to the server 200. The server 200 extracts the posture information of the target for each frame image from the extracted one or more frame images. The client terminal 100 acquires the posture information of the target extracted for each frame image from the one or more frame images by the server 200. The application AP analyzes the movement of the target in the specific scene using the posture information of the target acquired from the server 200.
In the motion analysis service CS, the image processed by the server 200 is only the frame image of the specific scene. Therefore, the cost incurred when using the server 200 is reduced. When shooting a video of the target, the movements before and after the specific scene are often shot so that the specific scene is reliably included in the moving image data. The moving image data before and after the specific scene does not contribute to the motion analysis. By omitting the image processing of the data areas that do not contribute to the motion analysis, the time and cost required for the motion analysis can be reduced.
The motion analysis service CS can be applied to a wide range of fields such as learning support in sports, health care, autonomous driving, and danger prediction. The scene to be analyzed is appropriately defined according to the field to which the motion analysis service CS is applied, the purpose of analysis, and the like.
For example, in the field of learning support in sports, a specific motion scene according to a coaching target (a soccer shot, a tennis serve, a golf swing, etc.) is defined as a specific scene. In the field of health care, a scene of functional recovery training is defined as a specific scene. In the field of autonomous driving, a scene in which a pedestrian is detected is defined as a specific scene. In the field of danger prediction, a scene in which an abnormal posture state (lying down, crouching for a long time, the movements of a drunk person, suspicious behavior, falling, etc.) is detected is defined as a specific scene.
The following describes an example in which the motion analysis service CS is applied to the field of learning support in sports.
[1-2. Configuration of the information processing system]
FIG. 2 is a schematic view of the information processing system 1 of the first embodiment.
The information processing system 1 has, for example, a client terminal 100 and a server 200. The client terminal 100 is, for example, an information terminal such as a smartphone, a tablet terminal, a notebook personal computer, or a desktop personal computer. The client terminal 100 and the server 200 are connected via a network NW.
The client terminal 100 has, for example, a processing device 110, a storage device 120, a communication device 130, a camera 140, and a display device 150.
The processing device 110 includes, for example, a moving image acquisition unit 111, a scene extraction unit 112, a motion analysis unit 113, and an output unit 114.
The video acquisition unit 111 acquires, for example, the video data of the target TG shot by the camera 140. The video includes, for example, a specific scene to be analyzed and the scenes before and after the specific scene. The camera 140 includes, for example, an image sensor such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor.
The scene extraction unit 112 acquires the moving image data output from the moving image acquisition unit 111. The scene extraction unit 112 extracts one or more frame images indicating a specific scene from the moving image data. The number of frame images to be extracted is, for example, 1 or more and 10 or less. The scene extraction unit 112 determines a specific scene based on, for example, the movement of the target TG. For example, the scene extraction unit 112 determines a specific scene by collating the movement characteristics of the target TG with the scene information stored in the storage device 120.
Information about specific scenes is stored in the storage device 120 as, for example, scene information 122. In the scene information 122, for example, for each coaching target, one or more specific scenes to be analyzed and a determination condition for determining each specific scene are defined in association with each other.
In the example of soccer learning support, for example, dribbling, shooting, and heading are defined as coaching targets. When the coaching target is a soccer shot, for example, (i) the timing of stepping on the axis foot, (ii) the timing when the thigh of the kicking foot moves toward the ball, (iii) the timing of impact, and (iv) the timing a specified number of seconds after impact are defined as specific scenes. The determination conditions for a specific scene are defined based on, for example, the angle of a specific joint, the relative position of the ball and a specific key point, and the like.
 The scene extraction unit 112 extracts posture information of the target TG using, for example, a first analysis model 123 obtained by machine learning. The first analysis model 123 is an analysis model whose posture estimation accuracy is lower than that of the analysis model used by the server 200 to extract posture information (a second analysis model 221). The scene extraction unit 112 determines the motion of the target TG based on, for example, changes in the posture of the target TG.
 The moving image data contains information on a series of motions including a plurality of specific scenes that occur in time series. The scene extraction unit 112 determines which specific scene is occurring from an individual viewpoint for each scene, taking into account the surrounding context in the flow of the motion. For example, in a shooting motion, the specific scene (i) above is determined first, and the specific scenes (ii), (iii), and (iv) are then determined in that order from the moving image data following (i). Each specific scene is determined based on the body movement expected for that scene.
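 The ordered search described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation; the detector interface (a callable returning the frame index of its scene, or None) is an assumption:

```python
from typing import Callable, List, Optional, Sequence

def detect_scenes_in_order(
    frames: Sequence[dict],
    detectors: Sequence[Callable[[Sequence[dict], int], Optional[int]]],
) -> List[Optional[int]]:
    """Locate each specific scene only in the footage after the previous one.

    Mirrors the ordered determination (i) -> (ii) -> (iii) -> (iv): each
    detector searches from the frame following the previously found scene.
    """
    found: List[Optional[int]] = []
    start = 0
    for detect in detectors:
        idx = detect(frames, start)
        found.append(idx)
        if idx is None:
            break              # later scenes cannot be located without this one
        start = idx + 1        # continue the search after the found scene
    return found
```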
 To make the determination easier, the scene extraction unit 112 determines the specific scene based on, for example, the motion of the target TG when the target TG and a specific object (such as the ball in the case of soccer) are in a predetermined positional relationship, or based on a change in the positional relationship between the target TG and the specific object. With this configuration, the specific scene is determined more accurately than when it is determined based only on the relative positional relationship between skeletal parts.
 For example, the specific scene (i) above is determined as follows. First, a singular region in which the plant foot is expected to stop moving when it is planted is defined based on the relative positional relationship with the ball. The singular region is defined as, for example, an image region of radius A × r centered on the ball (where r is the radius of the ball and A is a number greater than 1).
 For example, the scene extraction unit 112 extracts a frame image in which the distance between the plant foot and the ball is within a threshold value as a reference frame image. The scene extraction unit 112 then extracts the N frame images from the frame image (N−1) frames before the reference frame image up to the reference frame image (where N is an integer of 1 or more). For each of the N frame images, the scene extraction unit 112 extracts a skeleton region containing the ankle skeleton of the target TG, and then extracts a skeletal motion region containing all N skeleton regions. The scene extraction unit 112 determines that the plant foot has been planted when the size of the skeletal motion region is within a threshold value and the skeletal motion region is contained in the singular region, and extracts one or more frame images showing the timing at which the plant foot is planted from the moving image data.
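 As a non-limiting sketch of the plant-foot determination just described (Python; the bounding-box representation of the skeleton regions, the default value of A, and the thresholds are assumptions for illustration):

```python
import math

def in_singular_region(point, ball_center, ball_radius, a=2.0):
    # True if `point` lies within radius a * r of the ball center.
    # The disclosure only requires A > 1; a = 2.0 is an illustrative default.
    return math.dist(point, ball_center) <= a * ball_radius

def plant_foot_detected(ankle_boxes, ball_center, ball_radius,
                        size_threshold, a=2.0):
    """Check the plant-foot condition over the last N frames.

    `ankle_boxes` holds one (xmin, ymin, xmax, ymax) ankle skeleton region
    per frame. Their union is the skeletal motion region; the foot counts
    as planted when that union is small (the ankle has stopped moving) and
    lies inside the singular region around the ball.
    """
    xmin = min(b[0] for b in ankle_boxes)
    ymin = min(b[1] for b in ankle_boxes)
    xmax = max(b[2] for b in ankle_boxes)
    ymax = max(b[3] for b in ankle_boxes)
    if max(xmax - xmin, ymax - ymin) > size_threshold:
        return False  # the ankle is still moving too much
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return all(in_singular_region(c, ball_center, ball_radius, a)
               for c in corners)
```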
 After the frame images of the specific scene (i) are extracted, the scene extraction unit 112 moves on to extracting the frame images of the specific scene (ii). The scene extraction unit 112 determines, for example, the timing at which the extension line of the foot detected as the plant foot passes through the ball as the specific scene (ii). This determination is performed on the moving image data after the specific scene (i). Considering the surrounding context in the flow of the motion, the specific scene (ii) is expected to occur immediately after the specific scene (i). Therefore, if there is a scene within a predetermined time immediately after the specific scene (i) in which the extension line of the foot detected as the plant foot passes through the ball, that scene is highly likely to be the specific scene (ii). The scene extraction unit 112 therefore determines that scene to be the specific scene (ii) and extracts one or more frame images showing it from the moving image data.
 After the frame images of the specific scene (ii) are extracted, the scene extraction unit 112 moves on to extracting the frame images of the specific scene (iii). The scene extraction unit 112 determines, for example, the timing at which the distance between the center of the waist and the center of the ball, after shrinking, begins to widen at a speed greater than the speed at which it was shrinking as the specific scene (iii). Until just before impact, the distance between the center of the waist and the center of the ball tends to shrink, but at impact the distance begins to widen at a speed far greater than the speed at which it was shrinking. Exploiting this, the scene extraction unit 112 calculates the distance between the center of the hipbone and the center of the ball in each frame image, and determines that the trend of the distance has reversed when the difference in distance between frames divided by the diameter of the ball exceeds a threshold value. The scene extraction unit 112 determines the scene immediately before this reversal to be the specific scene (iii).
 The determination of the specific scene (iii) is performed on the moving image data after the specific scene (ii). Considering the surrounding context in the flow of the motion, the specific scene (iii) is expected to occur immediately after the specific scene (ii). Therefore, if the change in distance described above occurs within a predetermined time immediately after the specific scene (ii), that scene is highly likely to be the specific scene (iii). The scene extraction unit 112 therefore determines it to be the specific scene (iii) and extracts one or more frame images showing it from the moving image data.
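 A compact sketch of this distance-reversal test (Python; the threshold value and the list-based interface are illustrative assumptions):

```python
def find_impact_frame(hip_ball_distances, ball_diameter, threshold, start=0):
    """Return the index of the frame just before the distance trend flips.

    `hip_ball_distances[t]` is the hip-center-to-ball-center distance in
    frame t. Impact is declared when the frame-to-frame increase, divided
    by the ball diameter, exceeds `threshold` after a shrinking phase.
    """
    shrinking = False
    for t in range(start + 1, len(hip_ball_distances)):
        delta = hip_ball_distances[t] - hip_ball_distances[t - 1]
        if delta < 0:
            shrinking = True                  # approach phase before impact
        elif shrinking and delta / ball_diameter > threshold:
            return t - 1                      # scene (iii): just before reversal
    return None
```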
 After the frame images of the specific scene (iii) are extracted, the scene extraction unit 112 moves on to extracting the frame images of the specific scene (iv), which are used to analyze the posture after the shot. The specific scene (iv) is defined as the scene a predetermined time after the specific scene (iii). The time it takes for a posture suitable for analysis to appear depends on the run-up to the ball and the speed of the motion, so how long after the shot the specific scene (iv) should be judged to occur differs for each target TG. To account for these individual differences, the scene extraction unit 112 determines the specific scene (iv) as, for example, the timing at which a frame time equal to a predetermined multiple of the number of frames from the specific scene (ii) to the specific scene (iii) has elapsed.
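 This motion-dependent delay can be expressed in a few lines; the multiplier below is an illustrative assumption:

```python
def follow_through_index(scene_ii_idx, scene_iii_idx, multiplier=1.0):
    # Scene (iv) is placed a delay after impact that scales with the frame
    # count from scene (ii) to scene (iii), so faster motions wait less.
    span = scene_iii_idx - scene_ii_idx
    return scene_iii_idx + int(round(multiplier * span))
```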
 Posture estimation accuracy varies with the scale of the neural network used in the analysis model. With a large-scale neural network, many key points are extracted from the image data and the various motions of the target TG are estimated with high accuracy; even when information is missing due to occlusion or the like, the key points of the target TG are extracted accurately. The scale of a neural network can be increased by adding feature maps (channels) or by deepening the layers. Either way, the processing load of the convolution operations increases and the computation speed drops, so posture estimation accuracy and computation speed are in a trade-off relationship.
 The scene extraction unit 112 extracts the posture information of the target TG from all the frame images constituting the moving image data using, for example, the low-accuracy, low-computation first analysis model 123, whose neural network is small. To determine the motion scene of the target TG, it is sufficient to grasp the rough motion of the target TG; even when information is missing due to occlusion or the like, the characteristics of the motion can be grasped from rough changes in posture. Therefore, the motion scene of the target TG can be determined even with the low-accuracy, low-computation first analysis model 123. With the first analysis model 123, the convolution processing per frame image is small, so rapid processing is possible even when the moving image data is large.
 The data of the one or more frame images showing the specific scene is transmitted to the server 200 via the communication device 130. The motion analysis unit 113 acquires the posture information of the target TG that the server 200 extracts, frame by frame, from those frame images. Based on the posture information of the target TG acquired from the server 200 (the posture information of the target TG in the specific scene extracted from the moving image data), the motion analysis unit 113 analyzes the motion of the target TG in the specific scene and outputs the analysis result to the output unit 114.
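 The division of labor between the two models can be sketched as follows. The model and detector interfaces are assumptions for illustration; the point is that the cheap model touches every frame while the expensive one touches only the selected frames:

```python
def analyze_clip(frames, light_model, heavy_model, scene_detector):
    """Two-tier pose pipeline sketch (interfaces are illustrative).

    `light_model`/`heavy_model` map a frame to posture information;
    `scene_detector` maps the rough poses to the indices of the specific
    scenes.
    """
    rough_poses = [light_model(f) for f in frames]    # every frame, low cost
    key_indices = scene_detector(rough_poses)         # specific scenes only
    precise_poses = {i: heavy_model(frames[i]) for i in key_indices}
    return precise_poses                              # input to motion analysis
```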
 The output unit 114 notifies the user U of, for example, one or more pieces of analysis information MAI indicating the analysis results of the motion analysis unit 113. The analysis information MAI includes, for example, information on an evaluation result based on a comparison with the motion of a specific person RM who serves as a model. Information on the motion of the specific person RM is stored in the storage device 120 as role model information 121.
 The notification is given by, for example, a combination of text, graphics, and sound. In the example of FIG. 2, the analysis information MAI is presented as text information and skeleton information showing the skeleton. The output unit 114 displays, for example, a still image IM including the analysis information MAI on the display device 150. The display device 150 is the display unit of the client terminal 100 that displays various kinds of information, and is, for example, an LCD (Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display.
 The motion analysis method of the motion analysis unit 113 and the manner in which the output unit 114 notifies the analysis information MAI follow the algorithm of the application AP (see FIG. 1).
 FIG. 3 is a diagram showing an example of the analysis information MAI.
 The output unit 114 notifies, for example, first analysis information MAI1 and second analysis information MAI2 as the one or more pieces of analysis information MAI. The first analysis information MAI1 includes, for example, information indicating a comparison between the motion of the target TG and the motion of the model specific person RM in the specific scene. The second analysis information MAI2 includes, for example, information indicating guidance for bringing the motion of the target TG closer to the motion of the specific person RM. The skeleton information of the specific person RM in the specific scene is included in, for example, the role model information 121. In the example of FIG. 3, the comment "The kicking foot is swung up high" and an evaluation score of "86 points" are shown. The evaluation score indicates the degree of achievement of the evaluation item set for the specific scene.
 The first analysis information MAI1 includes, for example, the skeleton information SI of the target TG in the specific scene and one or more pieces of reference skeleton information RSI serving as the basis for comparison. In the example of FIG. 3, the specific scene is the timing at which the plant foot is planted. As the one or more pieces of reference skeleton information RSI, for example, first reference skeleton information RSI1, second reference skeleton information RSI2, and third reference skeleton information RSI3 are displayed.
 The first reference skeleton information RSI1 is, for example, the skeleton information of the model motion. The second reference skeleton information RSI2 is, for example, the skeleton information of a motion at a specific level below the model (for example, the level of 80 points when the model is 100 points). The first reference skeleton information RSI1 and the second reference skeleton information RSI2 are the model's skeleton information at the timing when the position of the waist coincides with that of the target TG. The third reference skeleton information RSI3 is, for example, the model's skeleton information at the timing when the position of the plant foot coincides with that of the target TG.
 The third reference skeleton information RSI3 is displayed in conjunction with the movement of the target TG at all times during the series of motions from the planting of the plant foot to immediately after impact. The third reference skeleton information RSI3 is used to compare this series of motions with that of the target TG; therefore, unlike the first reference skeleton information RSI1 and the second reference skeleton information RSI2, it shows the skeleton information of the whole body.
 The time required for the series of motions differs between the specific person RM and the target TG. Therefore, a timing effective for comparison (for example, the timing of impact or the timing of the foot plant) is defined, and the third reference skeleton information RSI3 is superimposed on the target TG so that the defined timings coincide. In the example of FIG. 3, the plant timings are matched, but which timing to align is set appropriately according to the purpose of the lesson and the like.
 The output unit 114 displays the third reference skeleton information RSI3 with its position offset so that, for example, the position of the ankle of the target TG and the position of the ankle of the specific person RM coincide at the defined timing. This makes it easier to understand how far apart the plant positions of the target TG and the specific person RM are.
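 A sketch of this offset (Python; the keypoint dictionary format is an assumption):

```python
def offset_reference_skeleton(ref_points, ref_ankle, target_ankle):
    """Shift every reference keypoint so both ankles coincide on screen.

    `ref_points` maps joint names to (x, y) image coordinates; the ankle
    is the alignment joint, following the description above.
    """
    dx = target_ankle[0] - ref_ankle[0]
    dy = target_ankle[1] - ref_ankle[1]
    return {name: (x + dx, y + dy) for name, (x, y) in ref_points.items()}
```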
 In the still image IM, skeleton information corresponding to the parts of the target TG to be analyzed in the specific scene is selectively displayed as the skeleton information SI of the target TG and the one or more pieces of reference skeleton information RSI. In the example of FIG. 3, the skeleton information of the waist and legs is selectively displayed as the skeleton information SI and the one or more pieces of reference skeleton information RSI. The one or more pieces of reference skeleton information RSI are generated using, for example, skeleton information obtained by correcting the skeleton information of the specific person RM in the specific scene based on the difference in build between the target TG and the specific person RM.
 The scale of the reference skeleton information RSI is set, for example, as follows. First, one or more bones suitable for comparing the builds of the specific person RM and the target TG are defined; in the example of FIG. 3, the spine and the leg bones are defined as the basis for comparison. The motion analysis unit 113 detects, for example, the lengths of the spine and the leg bones of the specific person RM and of the target TG at a timing when their postures are aligned. The motion analysis unit 113 calculates the ratio of the sums of the spine and leg bone lengths as the ratio of the body sizes of the specific person RM and the target TG, and rescales the skeleton of the specific person RM based on this ratio. This makes comparison with the specific person RM easier and makes it easier to understand how the target TG should move.
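 A sketch of this rescaling (Python; the choice of hip center as the scaling origin and the data format are assumptions for illustration):

```python
def scale_ratio(target_bone_lengths, model_bone_lengths):
    # Body-size ratio from the comparison bones (e.g. spine and leg),
    # measured in frames where the two postures are aligned.
    return sum(target_bone_lengths) / sum(model_bone_lengths)

def rescale_about(points, center, ratio):
    # Rescale every model keypoint about `center` (e.g. the hip center)
    # so the model skeleton matches the target's build.
    cx, cy = center
    return {name: (cx + (x - cx) * ratio, cy + (y - cy) * ratio)
            for name, (x, y) in points.items()}
```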
 FIG. 4 is a diagram showing an example of how the analysis information MAI is notified.
 The analysis information MAI is displayed superimposed on a frame image showing the specific scene, for example, when the specific scene is played back. The display device 150 pauses playback of the moving image data at the specific scene and displays a still image IM in which the analysis information MAI indicating the analysis result of the target TG in the specific scene is superimposed on the frame image of that scene. When a plurality of specific scenes are set, playback of the moving image data is paused at each specific scene and the analysis information MAI for that scene is notified. Slow-motion playback may be used so that the posture of the target TG can be confirmed easily. Slow-motion playback may also be applied only to the section of the moving image data from the first specific scene to the last specific scene, with the moving image data before and after that section played back at normal speed.
 FIG. 4 shows an example in which three specific scenes A1 to A3 are set. The specific scene A1 is, for example, the timing at which the plant foot is planted. The specific scene A2 is, for example, the timing of impact. The specific scene A3 is, for example, the timing immediately after impact (a specified number of seconds after impact).
 First, the display device 150 plays the moving image of the target TG in response to a playback operation on the client terminal 100. The display device 150 pauses playback of the moving image data at the timing when the specific scene A1 is played, and displays a still image IM (a first still image IM1) in which the analysis information MAI on the motion of the target TG in the specific scene A1 is superimposed on the frame image of the specific scene A1. Thereafter, the display device 150 resumes playback of the moving image from the specific scene A1 onward, triggered by a playback operation on the client terminal 100 or by the elapse of a preset time from the start of display of the first still image IM1.
 The display device 150 pauses playback of the moving image data at the timing when the specific scene A2 is played, and displays a still image IM (a second still image IM2) in which the analysis information MAI on the motion of the target TG in the specific scene A2 is superimposed on the frame image of the specific scene A2. Thereafter, the display device 150 resumes playback of the moving image from the specific scene A2 onward, triggered by a playback operation on the client terminal 100 or by the elapse of a preset time from the start of display of the second still image IM2.
 The display device 150 pauses playback of the moving image data at the timing when the specific scene A3 is played, and displays a still image IM (a third still image IM3) in which the analysis information MAI on the motion of the target TG in the specific scene A3 is superimposed on the frame image of the specific scene A3. Thereafter, the display device 150 resumes playback of the moving image from the specific scene A3 onward, triggered by a playback operation on the client terminal 100 or by the elapse of a preset time from the start of display of the third still image IM3. In this way, the display device 150 can pause the movement of the target TG at a specific scene and display analysis information indicating the analysis result of the target TG in that scene together with a still image of the target TG in that scene.
 When the analysis information MAI of all the specific scenes has been notified, the display device 150 plays the rest of the moving image to the end.
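 The pause-and-resume playback described above can be sketched as a simple loop. The rendering callback, timing values, and data structures are assumptions for illustration:

```python
import time

def play_with_pauses(frames, scene_frames, overlays, show,
                     fps=30.0, pause_seconds=3.0):
    """Play a clip, pausing on each specific scene to show its overlay.

    `scene_frames` maps frame indices to scene ids, `overlays` maps scene
    ids to analysis information (MAI), and `show` renders one frame plus
    an optional overlay.
    """
    for i, frame in enumerate(frames):
        if i in scene_frames:
            show(frame, overlays[scene_frames[i]])  # still image plus MAI
            time.sleep(pause_seconds)               # preset pause, then resume
        else:
            show(frame, None)
            time.sleep(1.0 / fps)
```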
 Note that the example shown here pauses playback of the moving image data at the specific scenes and superimposes the analysis information MAI on the moving image screen. However, the method of notifying the analysis information MAI is not limited to this. For example, the client terminal 100 may generate new moving image data (modified moving image data) incorporating the analysis information MAI, and the generated modified moving image data may be played on the display device 150. For example, the analysis information MAI is written into the one or more frame images showing the specific scene in the modified moving image data. In the modified moving image data, the display is adjusted so that the movement of the target TG stops at the specific scene, a still image of the target TG including the analysis information MAI is displayed for a predetermined time, and the movement of the target TG from the specific scene onward then resumes.
 FIG. 5 is a diagram showing another example of how the analysis information MAI is notified. FIG. 5 shows an example in which the motion analysis service CS is applied to golf learning support.
 FIG. 5 shows an example in which six specific scenes are set: for example, the timing of the backswing, the timing of the downswing, the timing immediately before impact, the timing of impact, the timing immediately after impact, and the timing of the follow-through. As in the example described above, playback of the moving image data is paused at the timing when each specific scene is played, and the analysis information MAI is superimposed. In the example of FIG. 5, the analysis information MAI of past specific scenes is not erased and remains displayed on the display device 150, and no model skeleton information is displayed. When the last specific scene is played, or when the analysis information MAI of all the specific scenes has been notified and the rest of the moving image has been played to the end, the evaluation scores for the specific scenes are displayed together.
 In the example of FIG. 5, the analysis information MAI of a specific scene remains displayed on the screen even after its notification ends. However, the display mode of the analysis information MAI is not limited to this. The analysis information MAI may instead be erased once the notification for a specific scene ends and before the next specific scene is displayed, and the analysis information MAI of all the specific scenes may then be redisplayed together when the last specific scene is played, or when the analysis information MAI of all the specific scenes has been notified and the rest of the moving image has been played to the end.
 The communication device 130 is the communication unit of the client terminal 100 that transmits and receives various data to and from external devices. For example, the communication device 130 transmits the one or more frame images extracted by the scene extraction unit 112 to the server 200, and acquires from the server 200 the posture information of the target TG that the server 200 extracts, frame by frame, from those frame images.
 Returning to FIG. 2, the storage device 120 stores, for example, the program 124 executed by the processing device 110, the application AP, the first analysis model 123, the role model information 121, and the scene information 122. The program 124 and the application AP are programs that cause a computer to execute the information processing according to the present embodiment. The processing device 110 performs various kinds of processing in accordance with the program 124 and the application AP stored in the storage device 120. The storage device 120 may also be used as a work area for temporarily storing the processing results of the processing device 110. The storage device 120 includes any non-transitory storage medium, such as a semiconductor storage medium or a magnetic storage medium, and is configured to include, for example, an optical disk, a magneto-optical disk, or a flash memory. The program 124 is stored in, for example, a computer-readable non-transitory storage medium.
 The processing device 110 is, for example, a computer composed of a processor and a memory. The memory of the processing device 110 includes a RAM (Random Access Memory) and a ROM (Read Only Memory). By executing the program 124, the processing device 110 functions as the moving image acquisition unit 111 and the scene extraction unit 112; by executing the application AP, it functions as the motion analysis unit 113 and the output unit 114.
 The server 200 includes, for example, a processing device 210, a storage device 220, and a communication device 230.
 The processing device 210 has a posture information extraction unit 211. The posture information extraction unit 211 acquires, via the communication device 230, the one or more frame images showing the specific scene transmitted from the client terminal 100. Using a second analysis model 221 obtained by machine learning, the posture information extraction unit 211 extracts the posture information of the target TG, frame by frame, from those frame images.
 The second analysis model 221 is an analysis model whose posture estimation accuracy is higher than that of the analysis model used by the scene extraction unit 112 to determine the specific scene (the first analysis model 123). The posture information extraction unit 211 extracts the posture information of the target TG from the specific one or more frame images using, for example, the high-accuracy, high-computation second analysis model 221, whose neural network is large. Only the specific one or more frame images selected from the plurality of frame images constituting the moving image data are subject to the posture estimation processing of the posture information extraction unit 211, so rapid processing is possible even though the convolution processing per frame image is large.
 The storage device 220 stores, for example, the program 222 executed by the processing device 210 and the second analysis model 221. The program 222 is a program that causes a computer to execute the information processing according to the present embodiment. The processing device 210 performs various kinds of processing in accordance with the program 222 stored in the storage device 220. The storage device 220 may also be used as a work area for temporarily storing the processing results of the processing device 210. The storage device 220 includes any non-transitory storage medium, such as a semiconductor storage medium or a magnetic storage medium, and is configured to include, for example, an optical disk, a magneto-optical disk, or a flash memory. The program 222 is stored in, for example, a computer-readable non-transitory storage medium.
 The processing device 210 is, for example, a computer composed of a processor and a memory. The memory of the processing device 210 includes a RAM and a ROM. By executing the program 222, the processing device 210 functions as the posture information extraction unit 211.
 The communication device 130 of the client terminal 100 and the communication device 230 of the server 200 are connected to a network NW such as the Internet. The client terminal 100 and the server 200 transmit and receive data via the network NW. Known methods are adopted for the communication of the communication device 130 and the communication device 230.
[1-3. Information processing method]
 FIGS. 6 to 8 are diagrams showing an example of the information processing method of the present embodiment.
 In step S1 of FIG. 8, the client terminal 100 shoots a moving image of the target TG. As shown in FIG. 6, the moving image data MD is composed of a plurality of frame images arranged in chronological order. The moving image includes the specific scenes to be analyzed and the scenes before and after them.
 In step S2 of FIG. 8, the client terminal 100 extracts one or more frame images FI showing a specific scene (specific frame images SFI) from the moving image data MD. The specific scene is determined based on, for example, the motion of the target TG. As shown in FIG. 6, the motion of the target TG is estimated based on, for example, posture information LPI of the target TG (information indicating the low-accuracy posture estimation results of the first analysis model 123) extracted from all the frame images FI of the moving image data MD using the low-accuracy, low-computation first analysis model 123.
 In step S3 of FIG. 8, the server 200 extracts posture information HPI of the target TG, frame by frame, from the extracted one or more frame images FI (specific frame images SFI). As shown in FIG. 7, the posture information HPI of the target TG is extracted only from the one or more specific frame images SFI using, for example, the high-accuracy, high-computation second analysis model 221.
 In step S4 of FIG. 8, the client terminal 100 analyzes the motion of the target TG based on the extracted posture information HPI (information indicating the high-accuracy posture estimation results of the second analysis model 221).
 In step S5 of FIG. 8, the client terminal 100 notifies the user U of the analysis information MAI indicating the analysis results. As shown in FIG. 7, the analysis information MAI is notified by, for example, a combination of text, graphics, and sound.
[1-4. Effects]
 The client terminal 100 has the motion analysis unit 113 and the display device 150. The motion analysis unit 113 analyzes the motion of the target TG in a specific scene based on the posture information HPI of the target TG in the specific scene extracted from the moving image data MD. The display device 150 pauses the movement of the target TG at the specific scene and displays the analysis information MAI indicating the analysis result of the target TG in the specific scene together with a still image IM of the target TG in the specific scene. The information processing method of the present embodiment causes a computer to execute the information processing of the client terminal 100 described above. The program 124 of the present embodiment causes a computer to realize this information processing method.
 With this configuration, the analysis results are provided in a form linked to the playback scenes of the moving image. Therefore, the motions of the target TG that deserve attention and their analysis results can be grasped efficiently.
 The analysis information MAI includes, for example, information indicating a comparison between the motion of the target TG in the specific scene and the motion of the model specific person RM.
 With this configuration, how the target TG is moving can be easily grasped based on the comparison with the model.
 The analysis information MAI includes, for example, the skeleton information SI of the target TG in the specific scene and one or more pieces of reference skeleton information RSI serving as the basis for comparison.
 With this configuration, the difference between the target TG and the model is easy to grasp.
 In the still image IM, skeleton information corresponding to the parts of the target TG to be analyzed in the specific scene is selectively displayed as, for example, the skeleton information SI of the target TG and the one or more pieces of reference skeleton information RSI.
 With this configuration, the skeleton information that deserves attention is easily grasped.
 The one or more pieces of reference skeleton information RSI are generated using, for example, skeleton information obtained by correcting the skeleton information of the specific person RM in the specific scene based on the difference in build between the target TG and the specific person RM.
 With this configuration, the motion of the target TG and the model motion can be compared accurately even when the target TG and the specific person RM differ in build.
 The analysis information MAI includes, for example, information indicating guidance for bringing the motion of the target TG closer to the motion of the specific person RM.
 With this configuration, improvement of the motion of the target TG can be encouraged based on the guidance.
[2. Second embodiment]
[2-1. Configuration of the information processing system]
 FIG. 9 is a schematic view of the information processing system 2 of the second embodiment.
 The present embodiment differs from the first embodiment in that the motion analysis function is realized by a server 300. The following description focuses on the differences from the first embodiment.
 The client terminal 400 has a communication device 430, the camera 140, and the display device 150. The communication device 430 is connected to the network NW. The client terminal 400 and the server 300 transmit and receive data via the network NW.
 The moving image acquisition unit 111, the scene extraction unit 112, the motion analysis unit 113, and the output unit 114 are provided in a processing device 310 of the server 300. The role model information 121, the scene information 122, the first analysis model 123, the program 124, and the application AP are stored in a storage device 320 of the server 300. By executing the program 124, the processing device 310 functions as the moving image acquisition unit 111, the scene extraction unit 112, the motion analysis unit 113, and the output unit 114.
 The motion analysis unit 113 generates, for example, new moving image data (modified moving image data) incorporating the analysis information MAI. The analysis information MAI is written into the one or more frame images showing the specific scene in the modified moving image data. In the modified moving image data, the display is adjusted so that the movement of the target TG stops at the specific scene, a still image of the target TG including the analysis information MAI is displayed for a predetermined time, and the movement of the target TG from the specific scene onward then resumes.
 The motion analysis unit 113 transmits the modified moving image data to the client terminal 400 via the output unit 114 and the communication device 130. The client terminal 400 displays the modified moving image data on the display device 150.
[2-2. Effects]
 In the present embodiment, the main functions for motion analysis are moved from the client terminal 400 to the server 300. Therefore, the computational load on the client terminal 400 is reduced.
 Note that the effects described in this specification are merely examples and are not limiting; other effects may also be obtained.
 Note that the present technology can also have the following configurations.
(1)
 An information processing method executed by a computer, comprising:
 analyzing, based on posture information of a target in a specific scene extracted from moving image data, the motion of the target in the specific scene; and
 pausing the movement of the target at the specific scene and displaying analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
(2)
 The information processing method according to (1) above, wherein the analysis information includes information indicating a comparison between the motion of the target and the motion of a model specific person in the specific scene.
(3)
 The information processing method according to (2) above, wherein the analysis information includes skeleton information of the target in the specific scene and one or more pieces of reference skeleton information serving as the basis for the comparison.
(4)
 The information processing method according to (3) above, wherein in the still image, skeleton information corresponding to the parts of the target to be analyzed in the specific scene is selectively displayed as the skeleton information of the target and the one or more pieces of reference skeleton information.
(5)
 The information processing method according to (3) or (4) above, wherein the one or more pieces of reference skeleton information are generated using skeleton information obtained by correcting the skeleton information of the specific person in the specific scene based on the difference in build between the target and the specific person.
(6)
 The information processing method according to (5) above, wherein the analysis information includes information indicating guidance for bringing the motion of the target closer to the motion of the specific person.
(7)
 An information processing device comprising:
 a motion analysis unit that analyzes, based on posture information of a target in a specific scene extracted from moving image data, the motion of the target in the specific scene; and
 a display device that pauses the movement of the target at the specific scene and displays analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
(8)
 A program that causes a computer to:
 analyze, based on posture information of a target in a specific scene extracted from moving image data, the motion of the target in the specific scene; and
 pause the movement of the target at the specific scene and display analysis information indicating the analysis result of the target in the specific scene together with a still image of the target in the specific scene.
100 Client terminal (information processing device)
113 Motion analysis unit
150 Display device
FI Frame image
HPI Posture information
IM Still image
MAI Analysis information
MD Moving image data
RM Specific person
RSI Reference skeleton information
SI Skeleton information of the target
TG Target

Claims (8)

  1.  An information processing method executed by a computer, comprising:
      analyzing, based on posture information of a target in a specific scene extracted from moving image data, a motion of the target in the specific scene; and
      pausing a movement of the target at the specific scene and displaying analysis information indicating an analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  2.  The information processing method according to claim 1, wherein the analysis information includes information indicating a comparison between the motion of the target and a motion of a model specific person in the specific scene.
  3.  The information processing method according to claim 2, wherein the analysis information includes skeleton information of the target in the specific scene and one or more pieces of reference skeleton information serving as a basis for the comparison.
  4.  The information processing method according to claim 3, wherein, in the still image, skeleton information corresponding to a part of the target to be analyzed in the specific scene is selectively displayed as the skeleton information of the target and the one or more pieces of reference skeleton information.
  5.  The information processing method according to claim 3, wherein the one or more pieces of reference skeleton information are generated using skeleton information obtained by correcting skeleton information of the specific person in the specific scene based on a difference in build between the target and the specific person.
  6.  The information processing method according to claim 5, wherein the analysis information includes information indicating guidance for bringing the motion of the target closer to the motion of the specific person.
  7.  An information processing device comprising:
      a motion analysis unit that analyzes, based on posture information of a target in a specific scene extracted from moving image data, a motion of the target in the specific scene; and
      a display device that pauses a movement of the target at the specific scene and displays analysis information indicating an analysis result of the target in the specific scene together with a still image of the target in the specific scene.
  8.  A program that causes a computer to:
      analyze, based on posture information of a target in a specific scene extracted from moving image data, a motion of the target in the specific scene; and
      pause a movement of the target at the specific scene and display analysis information indicating an analysis result of the target in the specific scene together with a still image of the target in the specific scene.
PCT/JP2020/013705 2020-03-26 2020-03-26 Information processing method, information processing device, and program WO2021192149A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/013705 WO2021192149A1 (en) 2020-03-26 2020-03-26 Information processing method, information processing device, and program


Publications (1)

Publication Number Publication Date
WO2021192149A1 true WO2021192149A1 (en) 2021-09-30

Family

ID=77891008



Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015119833A (en) * 2013-12-24 2015-07-02 Casio Computer Co., Ltd. Exercise support system, exercise support method, and exercise support program
JP2020005192A (en) * 2018-06-29 2020-01-09 Canon Inc. Information processing unit, information processing method, and program

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015119833A * 2013-12-24 2015-07-02 Casio Computer Co., Ltd. Exercise support system, exercise support method, and exercise support program
JP2020005192A * 2018-06-29 2020-01-09 Canon Inc. Information processing unit, information processing method, and program

Similar Documents

Publication Publication Date Title
CN110705390A (en) Body posture recognition method and device based on LSTM and storage medium
Bloom et al. G3D: A gaming action dataset and real time action recognition evaluation framework
CN103336576B (en) A kind of moving based on eye follows the trail of the method and device carrying out browser operation
KR102594938B1 (en) Apparatus and method for comparing and correcting sports posture using neural network
KR102241414B1 (en) Electronic device for providing a feedback for a specivic motion using a machine learning model a and machine learning model and method for operating thereof
WO2021098616A1 (en) Motion posture recognition method, motion posture recognition apparatus, terminal device and medium
US20220362630A1 (en) Method, device, and non-transitory computer-readable recording medium for estimating information on golf swing
US11568617B2 (en) Full body virtual reality utilizing computer vision from a single camera and associated systems and methods
KR102412553B1 (en) Method and apparatus for comparing dance motion based on ai
JP2019136493A (en) Exercise scoring method, system and program
WO2023108842A1 (en) Motion evaluation method and system based on fitness teaching training
CN114926762A (en) Motion scoring method, system, terminal and storage medium
WO2021192149A1 (en) Information processing method, information processing device, and program
CN109407826A (en) Ball game analogy method, device, storage medium and electronic equipment
CN110148072A (en) Sport course methods of marking and system
US20230285802A1 (en) Method, device, and non-transitory computer-readable recording medium for estimating information on golf swing
Fung et al. Hybrid markerless tracking of complex articulated motion in golf swings
Destelle et al. A multi-modal 3D capturing platform for learning and preservation of traditional sports and games
WO2021192143A1 (en) Information processing method, information processing device, and information processing system
KR20220052450A (en) Method and apparatus for assisting in golf swing practice
CN110996149A (en) Information processing method, device and system
CN113842622A (en) Motion teaching method, device, system, electronic equipment and storage medium
WO2020153031A1 (en) User attribute estimation device and user attribute estimation method
JP2021191356A (en) Correction content learning device, operation correction device and program
US20230398408A1 (en) Method, device, and non-transitory computer-readable recording medium for estimating information on golf swing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20926426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP