WO2017207802A1

WO2017207802A1 - Physical activity feedback

Info

Publication number: WO2017207802A1
Application number: PCT/EP2017/063559
Authority: WO
Inventors: Michel ANTUNES; Girum DEMISSE; Djamila AOUADA
Original assignee: Université Du Luxembourg
Priority date: 2016-06-03
Filing date: 2017-06-02
Publication date: 2017-12-07

Abstract

The invention is directed to a method for analysing the position of a human body using a computing device, comprising the following steps: (a) capturing at least one frame; (b) detecting on the at least one captured image frame the position of joints of the at least one portion of the body; (c) registering the detected positions and predefined reference positions of the joints in a common reference system; (d) computing a displacement of at least one of the joints that minimizes the error between the detected position and the corresponding predefined reference position of said joints; (e) using output means of said computing device, providing an indication of said displacement as an output. In step (d), the computed displacement is at least one rotation of one of the at least one body part.

Description

PHYSICAL ACTIVITY FEEDBACK

Technical field

[0001] The invention is directed to the field of physical training activity, more particularly to automated physical activity training and feedback.

Background art

[0002] Physical training activity is vital for the general population in maintaining a healthy lifestyle. It is crucial for elderly people in the prevention of diseases, the maintenance of independence and the improvement of quality of life. For stroke survivors it is essential for recovering some autonomy in daily life activities. Despite the benefits of physical activity, many stroke survivors do not exercise regularly, for reasons such as lack of motivation, confidence and skill. Traditionally, post-stroke patients initially undergo physical therapy in rehabilitation centres under the supervision of a health professional, with the aim of restoring and maintaining activities of daily living. The physiotherapist explains the movement to be performed by the patient, continuously advises her/him on how to improve the motion, and interrupts the exercise in case of health-related risks. Because of the high economic burden, on-site rehabilitation is usually short, and prescribed treatments and activities for home-based rehabilitation are usually suggested afterwards. However, stroke patients, and more frequently older adults, do not adhere appropriately to the recommended treatments because, among other factors, they do not always understand or remember well enough what they are supposed to do and how to do it.

[0003] In order to support the rehabilitation of stroke patients at home, human tracking and gesture therapy systems are being investigated for monitoring and assistance purposes. These home rehabilitation systems are advantageous not only because they are less costly for the patients and for the health care systems, but also because users tend to exercise more when the system is at home and regularly available. A well accepted sensing technology for these purposes is the RGB-D sensor (e.g. Kinect™ of Microsoft®), which is affordable and versatile and captures colour and depth information in real time.

[0004] Existing systems and research either (1) combine exercises with video games as a means to educate and train people, while keeping a high level of motivation; or (2) try to emulate a physical therapy session. These works usually involve the detection, recognition and analysis of specific motions and actions performed. Very recent works tackle the problem of assessing how well the people perform certain actions, which can be used in rehabilitation e.g. to evaluate mobility and measure the risk of relapse.

[0005] The scientific publication of Pirsiavash, H., Vondrick, C., Torralba, A.: "Assessing the quality of actions" in Computer Vision-ECCV 2014, pp. 556-571, Springer (2014), uses computer vision, i.e. videos, for assessing the quality of movements of persons practising sport. The approach followed in that document is based on a regression model that correlates spatiotemporal pose features of the body with scores obtained from expert judges. More specifically, the pose of the person is obtained in every frame. The body joint positions are computed as vectors. In the regression model, the observed videos are compared with reference videos showing perfect execution. In addition to providing a quality assessment, feedback is also provided to the performer by differentiating a scoring function with respect to joint location. More specifically, the gradient of the scoring function with respect to the location of each joint is computed in order to find the joint movements that maximize the score. Feedback vectors are also computed and superposed on specific joint(s) in the video to give the performer an incentive to improve. The corrective feedback is also analysed per joint, which involves a complex set of instructions for suggesting a particular body-part motion.

[0006] In the medical community, the publication of Ofli, F., Kurillo, G., Obdržálek, S., Bajcsy, R., Jimison, H.B., Pavel, M.: "Design and evaluation of an interactive exercise coaching system for older adults: Lessons learned", IEEE J. Biomedical and Health Informatics (2016), provides assistive feedback during the performance of exercises. For each particular movement, they define constraints such as keeping hands close to each other or maintaining the torso in an upright position. These constraints are constantly measured during the exercise for assessing if the movement is performed correctly and in case pre-defined values for metrics on these constraints are violated, then corrective feedback is provided. The motion constraints are however action specific and manually defined.

Summary of invention

Technical Problem

[0007] The invention has for technical problem to alleviate at least one drawback of the above mentioned prior art. More specifically, the invention has for technical problem to provide an improved feedback to persons practising physical exercise.

Technical solution

[0008] The invention is directed to a method for analysing the position of a human body using a computing device, comprising the following steps: (a) capturing, using image capturing means of said device, at least one frame comprising at least a portion of a human body, said portion comprising joints of articulation of said body; (b) detecting on the at least one captured image frame the position of the joints of the at least one portion of the body; (c) registering the detected positions and predefined reference positions of the joints, said predefined reference positions being pre-stored in a memory element, in a common reference system; (d) computing a displacement of at least one of the joints that minimizes an error between the detected position of the joints, and the corresponding predefined reference position of said joints; (e) using output means of said computing device, providing an indication of said displacement as an output; wherein in step (d) the computed displacement is at least one rotation of one of the at least one body part of the at least one portion of the human body, each of said at least one body part comprising at least two of the joints.

[0009] According to a preferred embodiment, in step (d) the at least one portion of the human body is split into a set of N body-parts $B = \{b^1, \ldots, b^k, \ldots, b^N\}$, where each body-part $b^k$ is composed of $n^k$ joints, $b^k = \{b^k_1, \ldots, b^k_{n^k}\}$.

[0010] According to a preferred embodiment, in step (d) the at least one body part whose rotation is computed is/are the one among the body parts whose rotation causes the smallest error for said body part.

[0011] According to a preferred embodiment, in step (d) the computed displacement is a sequence of rotations of the at least one rotation, said rotations being sorted digressively according to the impact of each rotation on the error for the corresponding body part, the rotation with the highest error reduction being the first one of said sequence.

[0012] According to a preferred embodiment, the error for a body part is based on the Euclidian distance between the joints of said body part and the predefined reference positions of said joints.

[0013] According to a preferred embodiment, the Euclidian distance $m^k$ for a body part $b^k$ is computed as follows: $m^k = \sum_{j=1}^{n^k} \| b^k_j - \hat{b}^k_j \|^2$, where the predefined reference positions are $\hat{B} = \{\hat{b}^1, \ldots, \hat{b}^k, \ldots, \hat{b}^N\}$ for the set of N body-parts B.

[0014] According to a preferred embodiment, the sequence of rotations is computed as follows: $R = \{R_1, \ldots, R_i, \ldots, R_N\}$, where $R_i$ is a rotation $R^k$ for each body part $b^k$ that minimizes the error $e^k(R^k) = \sum_{j=1}^{n^k} \| R^k b^k_j - \hat{b}^k_j \|^2$.

[0015] According to a preferred embodiment, the sorting of the rotations is based on iteratively selecting the body part $b^k$ that maximizes the cost $c^k_i = m^k - e^k(R^k)$, where in each iteration i the body parts $b^k$ selected in the previous i - 1 iterations are not taken into account.

[0016] According to a preferred embodiment, in step (e) the indication of the displacement comprises at least one feedback vector illustrating the at least one rotation of the body-part(s).

[0017] According to a preferred embodiment, each of the at least one feedback vector is anchored to the corresponding body-part and/or to a spatial centroid of the corresponding body-part.

[0018] According to a preferred embodiment, each of the at least one feedback vector f_k is calculated as follows f_k = R^kc^k - c^k where c^k is a vector of a centroid of the body part b^k.

[0019] According to a preferred embodiment, in step (e) the indication of the displacement comprises visual and/or oral feedback information describing the displacement and identifying the corresponding body parts.

[0020] According to a preferred embodiment, the feedback information indicating the displacement is based on the at least one feedback vector.

[0021] According to a preferred embodiment, in step (e) the Cartesian coordinates of the at least one feedback vector are analysed to identify the coordinate showing the highest magnitude, the feedback information indicating the displacement being determined by the direction and/or the sign of said coordinate.

[0022] According to a preferred embodiment, the feedback information indicating the displacement comprises expressions that correspond to at least one of a right, left, forward, backward, downward and/or upward displacement.

[0023] According to a preferred embodiment, steps (a) to (e) are executed in an iterative manner for different successive frames.

[0024] The invention is also directed to a computer program comprising instructions that are executable by a computer, wherein the instructions are configured for executing the steps of the method according to the invention when running on said computer.

[0025] The invention is also directed to a computing device with a storage medium for a computer program, wherein the storage medium comprises a computer program according to the invention.

[0026] According to a preferred embodiment, the computing device further comprises: the image capturing means; the output means; a computer connected with the image capture means and the output means for computing the feedback to the user.

[0027] According to a preferred embodiment, the output means comprises a visual display device and/or a sound output device.

Advantages of the invention

[0028] The invention is particularly interesting in that it does not compute feedback for single joints, but rather rotations for body-parts, defined as configurations of skeleton joints that may or may not move rigidly. Such a computation is less complex and requires less computer resources than some solutions of the prior art.

[0029] Also, feedback proposals are automatically computed by comparing the movement being performed with a template action, without specifying pose constraints of joint configurations.

[0030] In addition, feedback instructions are not only presented visually, but also human interpretable feedback can be proposed from discretized spatial transformations that can be suggested to the user using, for example, audio messages.

Brief description of the drawings

[0031] Figure 1 illustrates a schematic representation of the skeleton of a performer body composed of a series of joints.

[0032] Figures 2 and 3 illustrate different body-parts of the skeleton of figure 1.

[0033] Figure 4 illustrates a first example of a feedback proposal according to the invention.

[0034] Figure 5 illustrates a second example of a feedback proposal according to the invention.

[0035] Figure 6 illustrates two successive feedback proposals for the movement illustrated in figure 4.

[0036] Figure 7 illustrates a template pose, the effective poses of two different subjects, and the best poses of these subjects that minimize that relative error.

Description of an embodiment

[0037] Figure 1 illustrates in a schematic way the skeleton of a subject that intends to perform a physical movement to achieve a given body pose. The skeleton comprises a series of joints, for instance 21 joints, interconnected by skeleton sections that are considered rigid.

[0038] Let $S = [j_1, \ldots, j_n, \ldots, j_N]$ denote a skeleton with N joints, where each joint is given by its 3D coordinates $j = [j_x, j_y, j_z]^\top$. We define an action or movement as a skeleton sequence $M = [S_1, \ldots, S_f, \ldots, S_F]$, where F is the number of frames of the sequence. Given a template skeleton sequence $\hat{M}$ and a subject performing a movement $M$, feedback proposals are provided at each time instant such that the movement can be iteratively improved to better match $\hat{M}$.

[0039] As a first step, the input skeleton data is pre-processed. Existing approaches previously introduced in the literature (e.g. Vemulapalli, R., Arrate, F., Chellappa, R.: Human action recognition by representing 3d skeletons as points in a lie group. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)) are adapted for the present invention.

[0040] A first requirement for comparing two skeletal sequences is that they be spatially registered. This is achieved by transforming the joints of each skeleton S such that the world coordinate system is placed at the hip center, and the projection of the vector from the left hip to the right hip onto the x-y plane is parallel to the x-axis. Then, for achieving invariance to absolute locations, the skeletons in $M$ are normalized such that the body-part lengths match the corresponding part lengths of the skeletons in $\hat{M}$. This is performed without modifying the joint angles.
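The spatial registration of a single skeleton can be illustrated with a minimal sketch. It assumes that joints are stored as rows of an array, that the indices of the hip centre and of the two hips are known, and that "parallel to the x-axis" means pointing along +x; the function name and argument names are hypothetical. The body-part length normalization is not covered here.

```python
import numpy as np

def register_skeleton(joints, hip_center, left_hip, right_hip):
    """Place the origin at the hip centre, then rotate about the z-axis so that
    the left-to-right hip vector, projected onto the x-y plane, lies along +x.
    The joint indices are assumptions about the skeleton layout."""
    joints = np.asarray(joints, dtype=float)          # shape (N, 3)
    centered = joints - joints[hip_center]            # origin at the hip centre
    hip_vec = centered[right_hip] - centered[left_hip]
    angle = np.arctan2(hip_vec[1], hip_vec[0])        # heading of the projection
    c, s = np.cos(-angle), np.sin(-angle)             # rotate by -angle about z
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    return centered @ rot_z.T
```

Applying the same function to the skeletons of both sequences places them in a common reference system, so that joint positions can be compared directly.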

[0041] Different subjects, or the same subject at different times, perform a particular action or movement at different rates. In order to handle rate variations and mitigate the temporal misalignment of time series, Dynamic Time Warping (DTW) is usually employed (e.g. Rabiner, L., Juang, B.H.: Fundamentals of speech recognition. Prentice Hall (1993)). In the present case, it is sought to align a given sequence $M$ with a template sequence $\hat{M}$. The template sequence $\hat{M}$ can be aligned with respect to $M$, or vice-versa. It is assumed that the subject is trying to replicate the same action as $\hat{M}$, and given $M$, it is sought to provide feedback proposals. Since a feedback proposal is to be computed for each temporal instant of $M$, it is reasonable to compute the temporal correspondences of $\hat{M}$ with respect to $M$.
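As an illustration of the temporal alignment, the following is a minimal sketch of textbook Dynamic Time Warping between a performed sequence and a template sequence. It assumes each frame is an array of registered joint positions and uses the Euclidean (Frobenius) distance between frames as the local cost, which is an assumption; the function names are hypothetical.

```python
import numpy as np

def dtw_path(seq, template):
    """Return the DTW warping path as (frame of seq, frame of template) pairs."""
    seq = np.asarray(seq, dtype=float)            # shape (F, N, 3)
    template = np.asarray(template, dtype=float)  # shape (F_hat, N, 3)
    n, m = len(seq), len(template)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq[i - 1] - template[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end of both sequences to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path

def template_index_per_frame(path, n_frames):
    """For each frame of the performed sequence, keep one matched template frame."""
    match = [0] * n_frames
    for f, g in path:
        match[f] = g
    return match
```

With this mapping, every frame of the performed movement has a template frame against which a feedback proposal can be computed.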

[0042] After the spatial and temporal alignment processing described in the previous section, the skeleton instance $S_f$ in $M$ will be in correspondence with $\hat{S}_f$ in $\hat{M}$. This section explains how to compute the body motion required to align corresponding body-parts of aligned skeletons $S$ and $\hat{S}$, and discloses a method for extracting human-interpretable feedback from these transformations.

[0043] With reference to figure 2, the human motion is analysed using a body-part based representation of the subject. The skeleton $S$ is represented by a set of body-parts $B = \{b^1, \ldots, b^k, \ldots, b^N\}$. Each body part $b^k$ is composed of $n^k$ joints, $b^k = [b^k_1, \ldots, b^k_{n^k}]$, and has a local reference system defined by the reference joint $b^k_r$. In figure 2, the skeleton comprises 12 body-parts $b^1, \ldots, b^{12}$ being the right forearm, the left forearm, the back, the right arm, the left arm, the right leg, the left leg, the torso, the upper body, the lower body, the full upper body and the full body. It is however understood that other definitions of the body-parts can be considered.

[0044] Given the aligned skeletons $S$ and $\hat{S}$, the objective is to compute the motion that each body-part of $S$ needs to undergo to better match the template skeleton. This analysis is performed for each body-part using the corresponding local coordinate system. As a metric for measuring how similar the poses of corresponding body-parts are, we use the Euclidean distance as the scoring function. Following this, the error between $b^k$ and $\hat{b}^k$ is given by:

$m^k = \sum_{j=1}^{n^k} \| b^k_j - \hat{b}^k_j \|^2$

[0045] It is to be noticed that $\| b^k_r - \hat{b}^k_r \| = 0$, because the previous computation is performed using the local coordinate systems that are assumed to be in correspondence.

[0046] For providing feedback to the performer of skeleton $S$ on how the movement can be improved to better match $\hat{S}$, we compute the transformation that each body-part $b^k$ needs to undergo for decreasing the scoring function $m^k$. We anchor the reference joints $b^k_r$ and $\hat{b}^k_r$ of the corresponding body-parts. The aim is then to compute the rotation $R^k \in SO(3)$ that minimizes the following error:

$e^k(R^k) = \sum_{j=1}^{n^k} \| R^k b^k_j - \hat{b}^k_j \|^2$

which can be computed in closed form.
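The text does not name the closed-form solution. One standard choice for this kind of least-squares rotation problem is the SVD-based Kabsch solution, sketched below under the assumption that the joints of a body-part are already expressed in its local reference system, so that no translation needs to be estimated. The function names are hypothetical.

```python
import numpy as np

def pose_error(part, ref, rotation=None):
    """Sum of squared distances between (optionally rotated) joints and
    reference joints: m^k without a rotation, e^k(R^k) with one."""
    P = np.asarray(part, dtype=float)   # shape (n_k, 3), local coordinates
    Q = np.asarray(ref, dtype=float)    # shape (n_k, 3), local coordinates
    if rotation is not None:
        P = P @ np.asarray(rotation, dtype=float).T
    return float(np.sum((P - Q) ** 2))

def best_rotation(part, ref):
    """Rotation in SO(3) minimising sum_j ||R p_j - q_j||^2 (Kabsch solution)."""
    P = np.asarray(part, dtype=float)
    Q = np.asarray(ref, dtype=float)
    H = P.T @ Q                                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
```

For a body-part that moves rigidly, the residual `pose_error(part, ref, best_rotation(part, ref))` is close to zero; a large residual indicates that a single rotation does not explain the difference between the two poses.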

It is important to note that, since human motion is articulated, a given body-part $b^k$ may or may not move rigidly, depending on the movement being performed. This is not a critical issue because body-parts that do not move rigidly have a high joint matching error and will be considered not relevant by the method described next. Note that different body-parts $b^k$ can contain subsets of the same joints, which implies that the transformation $R^k$ will also have an impact on the location of the other body-parts $b^l$, $l \neq k$. Taking this into account, we want to compute a sequence of transformations $R = \{R_1, \ldots, R_i, \ldots, R_N\}$, one rotation $R_i = R^k$ for each body-part $b^k$, such that the first rotation $R_1$ yields the highest decrease in the joint location error, down to $R_N$, which has the lowest impact on the human pose matching. This sorting is performed by maximizing the following cost:

$c^k_i = m^k - e^k(R^k)$

where in iteration i, the body-parts $b^k$ selected in the previous i - 1 iterations are not taken into account.

[0047] Here is the pseudo code for computing the above:

Input: S, Ŝ, B

Output: sequence of rotations R, list of body-part indexes K

L := B, K = { }, R = { }, i = 1; e^k(R^k)
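The iterative selection described in paragraph [0046] can be sketched as a greedy ordering. This is only an illustration: it assumes that the errors $m^k$ and $e^k(R^k)$ have been computed once per body-part and are not recomputed after each selection, which the text does not specify. The function name and the dictionary layout are hypothetical.

```python
def order_body_parts(m, e):
    """Order body-part indexes by decreasing error reduction m[k] - e[k].

    m: dict mapping body-part index k to the error before rotation.
    e: dict mapping body-part index k to the residual error after the best
       rotation of that body-part.
    """
    remaining = set(m)          # plays the role of the list L in the listing
    order = []                  # plays the role of the index list K
    while remaining:
        # Pick the body-part with the highest gain among those not yet
        # selected; sorting makes ties deterministic.
        best = max(sorted(remaining), key=lambda k: m[k] - e[k])
        order.append(best)
        remaining.remove(best)
    return order
```

The first index of the returned list identifies the body-part whose rotation is presented as the first feedback proposal.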

[0048] The rotations $R_i = R^k$ correspond to the motion required for the best alignment of $b^k$ and $\hat{b}^k$. However, it might be difficult to present this rigid-body transformation as feedback proposals on, for example, a screen. To overcome this, we can compute feedback vectors for suggesting improvements on the motion. For each body-part, we can pre-calculate the spatial centroid $c^k$ (note that in the case of single limbs, this point is located on the body-part itself). Then, the feedback vector anchored to $c^k$ is defined as $f_k = R^k c^k - c^k$.

[0049] Figure 4 illustrates a feedback proposal for reaching a waving target pose $\hat{S}$. The left images are two views of the target pose $\hat{S}$, which is for instance a waving pose. The central images are two corresponding views of the actual pose $S$ of the subject, for instance a clapping pose. The right images are the superposition of the left and central views and illustrate the feedback vector $f_k$ anchored to the right arm (corresponding to the body-part $b^4$ as illustrated in figure 2). The vector suggests that the subject move the right arm upwards in order to reach the waving target pose $\hat{S}$. Here only the first feedback proposal $R_1$ is shown.
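The feedback vector defined in paragraph [0048] can be sketched in a few lines. The sketch assumes that the spatial centroid is the mean of the joint positions of the body-part in its local reference system; the text does not give a formula for the centroid, so this is an assumption, and the function name is hypothetical.

```python
import numpy as np

def feedback_vector(part, rotation):
    """Return R c - c, where c is the centroid of the body-part joints.
    Taking the centroid as the mean of the joints is an assumption."""
    P = np.asarray(part, dtype=float)            # shape (n_k, 3)
    R = np.asarray(rotation, dtype=float)        # shape (3, 3)
    c = P.mean(axis=0)                           # spatial centroid of the part
    return R @ c - c
```

The resulting vector can be drawn starting at the centroid, pointing to where the centroid would be after the suggested rotation.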

[0050] Figure 5 illustrates another feedback proposal for reaching another target pose $\hat{S}$. Similarly to figure 4, the left images are two views of the target pose $\hat{S}$, which is for instance a standing pose. The central images are two corresponding views of the actual pose $S$ of the subject, for instance a bending pose. The right images are the superposition of the left and central views and illustrate the feedback vector $f_k$ anchored to the torso (corresponding to the body-part $b^8$ as illustrated in figure 2). The vector suggests that the subject move the torso backwards and upwards in order to reach the standing target pose $\hat{S}$. Similarly to figure 4, here only the first feedback proposal $R_1$ is shown.

[0051] Not all persons have the same spatial awareness to realize how to perform the motion suggested by the feedback vector $f_k$ as discussed above and illustrated in figures 4 and 5. This difficulty is even more evident in cognitively impaired individuals. In order to support patients in improving their movements, simple human-interpretable feedback messages can be shown and/or spoken to the patient by the computer system.

[0052] Let us analyse the case of the body-part $b^k$ that needs to undergo the largest motion $R_1 = R^k$. Initially, each $b^k$ was assigned a body-part name BN, e.g. $b^1$ is the right forearm and $b^8$ is the torso (see Fig. 2). These labels are used directly for informing the user which body-parts should be moved. Then, the feedback vector $f^k = [f^k_x, f^k_y, f^k_z]$ is discretized by selecting the dimension d with the highest magnitude $|f^k_d|$. The messages regarding the direction of the motion BD are then defined as:

if d = x

• if $f^k_x < 0$, then BD=Right

• if $f^k_x > 0$, then BD=Left

if d = y

• if $f^k_y < 0$, then BD=Forth

• if $f^k_y > 0$, then BD=Back

if d = z

• if $f^k_z < 0$, then BD=Down

• if $f^k_z > 0$, then BD=Up

The feedback proposal messages are represented as the concatenation of strings: Feedback message := "Move" + BN + BD.
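The discretization rules above can be sketched as follows. The function name is hypothetical, and the spaces between the concatenated strings are an assumption made for readability. The case of a component equal to zero is not defined in the text; in this sketch it falls into the positive branch.

```python
def feedback_message(body_part_name, f):
    """Build a message from the body-part name BN and the feedback vector f.

    f is a sequence of three Cartesian components [f_x, f_y, f_z].
    """
    # (label for a negative component, label for a positive component)
    labels = {0: ("Right", "Left"), 1: ("Forth", "Back"), 2: ("Down", "Up")}
    # Dimension d with the highest magnitude.
    d = max(range(3), key=lambda i: abs(f[i]))
    negative, positive = labels[d]
    direction = negative if f[d] < 0 else positive
    return "Move " + body_part_name + " " + direction
```

For instance, `feedback_message("Right Arm", [0.1, -0.2, 0.9])` selects the z dimension and yields a message asking to move the right arm up.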

[0053] Figure 6 illustrates feedback proposals comprising a vector representation and a message according to the above. Here the first and second feedback proposals $R_1$ and $R_2$ are shown. The two images on the left illustrate a waving target pose $\hat{S}$ and an actual clapping pose $S$ of the subject. The two images on the right illustrate a superposition of the target and actual poses with a feedback 1 and a feedback 2, respectively. The feedback 1 comprises a vector anchored on the right arm and oriented upwards (similarly to figure 4) and a message reading "Move Right Arm up". The feedback 2 comprises a vector anchored to the left arm and oriented also generally upwards and a message reading "Move Left Arm up". A colour coding can be used for identifying the directions BD.

[0054] In this section, we experimentally evaluate the proposed system using data captured with the Kinect™ version 2. The idea is to simulate a person who has suffered a stroke: the impaired arm resulting from the paralysis of an upper limb is simulated by lifting a kettle-bell with one of the arms, and the balance problem is replicated using a balance ball. The objective in this section is to simulate a simple physiotherapy session at home, and to test whether the feedback proposals are able to guide the user. We assume that a person needs to perform a template human pose $\hat{S}$. The subject stands on the balance ball and lifts the kettle-bell. Given only the guidance of the feedback vectors, body-part motion intensity and feedback messages, the objective is to converge to the template pose without actually seeing it. The exercise lasts for 20 seconds and feedback proposals are shown at each time instant.

[0055] The experimental results are shown in Figure 7. Part (a) shows two views of the template pose $\hat{S}$. Parts (b) and (c) show a first pose $S$ and the best pose $S_{Best}$ for two subjects. The best pose $S_{Best}$ is the one that minimizes the error $m^{12}$ for the body-part $b^{12}$ (i.e. the whole body, see figure 2). Parts (d) and (e) show the relative error (difference between initial and current error divided by the initial error) in % for the body-part $b^{12}$.
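The relative error plotted in parts (d) and (e) follows directly from the definition given above. A minimal sketch, with a hypothetical function name and an added guard for a zero initial error that the text does not discuss:

```python
def relative_error_percent(initial_error, current_error):
    """(initial - current) / initial, expressed in percent."""
    if initial_error == 0:
        raise ValueError("initial error must be non-zero")
    return 100.0 * (initial_error - current_error) / initial_error
```

Under this definition a value of 0 % corresponds to the initial pose and 100 % to a perfect match with the template for the body-part considered.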

[0056] Physical activity is essential for stroke survivors for recovering some autonomy in daily life activities. Post-stroke patients are initially subject to physical therapy under the supervision of a health professional, but due to economical aspects, home based rehabilitation is eventually suggested. In order to support the physical activity of stroke patients at home, this paper presents a system for guiding the user in how to properly perform certain actions and movements. This is achieved by presenting feedback in form of visual information and human-interpretable messages. The core of the proposed approach is the analysis of the motion required for aligning body- parts with respect to a template skeleton pose, and how this information can be presented to the user in form of simple recommendations. Experimental results in three datasets show the potential of the proposed framework.

[0057] As discussed previously, the objective of this invention is not only to assess the quality of an action, but also to provide feedback on how to improve the movement being performed. In contrast to previous works, there are three main contributions: We do not compute feedback for single joints, but for body-parts, defined as configurations of skeleton joints that may or may not move rigidly; Feedback proposals are automatically computed by comparing the movement being performed with a template action, without specifying pose constraints of joint configurations; Feedback instructions are not only presented visually, but also human interpretable feedback is proposed from discretized spatial transformations that can be suggested to the user using, for example, audio messages.

Claims

1. Method for analysing the position of a human body using a computing device, comprising the following steps:

(a) capturing, using image capturing means of said device, at least one frame comprising at least a portion of a human body, said portion comprising joints of articulation of said body;

(b) detecting on the at least one captured image frame the position of the joints of the at least one portion of the body;

(c) registering the detected positions and predefined reference positions of the joints, said predefined reference positions being pre-stored in a memory element, in a common reference system;

(d) computing a displacement of at least one of the joints that minimizes the error between the detected position of the joints, and the corresponding predefined reference position of said joints;

(e) using output means of said computing device, providing an indication of said displacement as an output;

characterized in that

in step (d) the computed displacement is at least one rotation of one of the at least one body part of the at least one portion of the human body, each of said at least one body part comprising at least two of the joints.

2. Method according to claim 1, wherein in step (d) the at least one portion of the human body is split into a set of N body parts $B = \{b^1, \ldots, b^k, \ldots, b^N\}$, where each body part $b^k$ is composed of $n^k$ joints, $b^k = \{b^k_1, \ldots, b^k_{n^k}\}$.

3. Method according to one of claims 1 and 2, wherein in step (d) the at least one body part whose rotation is computed is/are the one among the body parts whose rotation causes the smallest error for said body part.

4. Method according to any one of claims 1 to 3, wherein in step (d) the computed displacement is a sequence of rotations of the at least one rotation, said rotations being sorted digressively according to the impact of each rotation on the error for the corresponding body part, the rotation with the highest error reduction being the first one of said sequence.

5. Method according to claim 2 and one of claims 3 and 4, wherein the error for a body part is based on the Euclidian distance between the joints of said body part and the predefined reference positions of said joints.

6. Method according to claim 5 wherein the Euclidian distance $m^k$ for a body part $b^k$ is computed as follows: $m^k = \sum_{j=1}^{n^k} \| b^k_j - \hat{b}^k_j \|^2$, where the predefined reference positions are $\hat{B} = \{\hat{b}^1, \ldots, \hat{b}^k, \ldots, \hat{b}^N\}$ for the set of N body parts B.

7. Method according to claim 4 and one of claims 5 and 6, wherein the sequence of rotations is computed as follows: $R = \{R_1, \ldots, R_i, \ldots, R_N\}$, where $R_i$ is a rotation $R^k$ for each body part $b^k$ that minimizes the error $e^k(R^k) = \sum_{j=1}^{n^k} \| R^k b^k_j - \hat{b}^k_j \|^2$.

8. Method according to claim 7 wherein the sorting of the rotations is based on iteratively selecting the body part $b^k$ that maximizes the cost $c^k_i = m^k - e^k(R^k)$, where in each iteration i the body parts $b^k$ selected in the previous i - 1 iterations are not taken into account.

9. Method according to any one of claims 1 to 8, wherein in step (e) the representation of the displacement comprises at least one feedback vector illustrating the at least one rotation of the body part(s).

10. Method according to claim 9 wherein each of the at least one feedback vector is anchored to the corresponding body part and/or to a spatial centroid of the corresponding body part.

11. Method according to one of claims 7 and 8 and according to one of claims 9 and 10 wherein each of the at least one feedback vector $f_k$ is calculated as follows: $f_k = R^k c^k - c^k$, where $c^k$ is a vector of a centroid of the body part $b^k$.

12. Method according to any one of claims 1 to 11, wherein in step (e) the indication of the displacement comprises visual and/or oral feedback information describing the displacement and identifying the corresponding body parts.

13. Method according to any one of claims 9 to 11, and according to claim 12, wherein the feedback information describing the displacement is based on the at least one feedback vector.

14. Method according to claim 13, wherein in step (e) the Cartesian coordinates of the at least one feedback vector are analysed to identify the coordinate showing the highest magnitude, the feedback information describing the displacement being determined by the direction and/or the sign of said coordinate.

15. Method according to claim 14, wherein the feedback information describing the displacement comprises expressions that correspond to at least one of a right, left, forward, backward, downward and/or upward displacement.

16. Method according to any one of claims 1 to 15 wherein steps (a) to (e) are executed in an iterative manner for different successive frames.

17. A computer program comprising instructions that are executable by a computer, characterized in that the instructions are configured for executing the steps of the method according to any one of claims 1 to 16 when running on said computer.

18. A computing device with a storage medium for a computer program, characterized in that the storage medium comprises a computer program according to claim 17.

19. The computing device according to claim 18, wherein said device further comprises:

- the image capturing means;

- the output means;

- a computer connected with the image capture means and the output means for computing the feedback to the user.

20. The computing device according to claim 19, characterized in that the output means comprises a visual display device and/or a sound output device.