WO2021129487A1 - Method, device, medium, and system for determining the positions of a user's limb nodes - Google Patents

Method, device, medium, and system for determining the positions of a user's limb nodes

Info

Publication number
WO2021129487A1
Authority
WO
WIPO (PCT)
Prior art keywords
limb
limb node
occluded
time period
node
Application number
PCT/CN2020/136834
Other languages
English (en)
French (fr)
Inventor
Jiang Yonghang (姜永航)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021129487A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion

Definitions

  • One or more embodiments of the present application generally relate to the field of artificial intelligence, and specifically relate to a method, device, medium, and system for determining the position of a user's limb node.
  • In the prior art, the position of an occluded limb is usually inferred directly through deep learning (for example, a neural network). Specifically, the possible poses of occluded parts are manually labeled in the training sample set, the model learns these occlusion cases through training, and the poses of the occluded parts are inferred directly during use.
  • An embodiment of the present application provides a method for determining the position of at least one limb node of a user.
  • The method includes: when the at least one limb node is not occluded, determining, according to the position of the at least one limb node at a first moment and its position at a second moment, a first displacement of the at least one limb node in a first time period between the first moment and the second moment; acquiring first motion data related to the motion of the at least one limb node in the first time period; and training an inference model based at least in part on the first displacement and the first motion data, where the inference model is used to infer the occluded position of the at least one limb node when the at least one limb node is occluded.
  • In the embodiments of the present application, the motion data and displacement of at least one limb node of the user's limb are used to train the inference model. Because there is a direct correspondence between the motion data and the displacement of the limb node, compared with the prior art, in which possible postures of occluded parts are manually guessed as training labels, the accuracy and robustness of the inference model in the embodiments of the application will be higher.
  • The first motion data includes at least one of a first acceleration, a first angular velocity, a first movement direction, and a first movement pattern.
  • Determining the first displacement of the at least one limb node in the first time period between the first moment and the second moment further includes: acquiring a first image frame at the first moment and a second image frame at the second moment; and determining the first displacement of the at least one limb node in the first time period according to the position of the at least one limb node in the first image frame and its position in the second image frame.
  • Training the inference model based at least in part on the first displacement and the first motion data further includes: training the inference model at least in part using the first motion data as a feature input and the first displacement as the training target.
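  • As an illustration of this implementation (not code from the application), a minimal Python sketch of assembling one training pair is shown below; the 7-dimensional IMU sample layout and the 2-D image coordinates are assumptions:

```python
import numpy as np

def build_training_pair(pos_t1, pos_t2, imu_samples):
    """Pair the motion data of one inter-frame interval with its displacement label.

    pos_t1, pos_t2 : (x, y) image coordinates of the limb node at the first and
                     second moments (both unoccluded).
    imu_samples    : shape (n, 7) readings collected in the interval, assumed to
                     be [ax, ay, az, gx, gy, gz, heading] per sample.
    """
    displacement = np.asarray(pos_t2, float) - np.asarray(pos_t1, float)  # target
    features = np.asarray(imu_samples, float)                             # feature input
    return features, displacement
```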
  • The inference model includes at least one of a recurrent neural network (RNN), a long short-term memory (LSTM) network, a gated recurrent unit (GRU) network, and a bidirectional recurrent neural network (BRNN).
  • The method further includes: in a case where the at least one limb node goes from being unoccluded to being occluded, acquiring second motion data related to the motion in a second time period, where the second time period includes the time period between the moment when the at least one limb node is unoccluded and the moment when it is occluded; using the inference model to infer, based on the second motion data, a second displacement of the at least one limb node in the second time period; and determining, based at least in part on the second displacement and the unoccluded position of the at least one limb node at the moment when it is unoccluded, the occluded position of the at least one limb node while it is occluded.
  • In the embodiments of the present application, the motion data of at least one limb node of the user's limb during the time period from being unoccluded to being occluded is used to infer the displacement of the limb node during that time period, and thereby to obtain the occluded position of the limb node while it is occluded. Since there is a direct correspondence between the motion data and the displacement of the limb node, the displacement inferred according to the embodiments of the present application is more accurate than the manually guessed postures of occluded parts used in the prior art.
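  • A minimal sketch of this inference step, assuming a trained model `predict_displacement` that maps one interval's motion data to a 2-D displacement (hypothetical names):

```python
import numpy as np

def infer_occluded_position(last_unoccluded_pos, motion_segments, predict_displacement):
    """Accumulate inferred per-interval displacements onto the last unoccluded position."""
    pos = np.asarray(last_unoccluded_pos, float)
    for segment in motion_segments:   # motion data for each interval of the occlusion
        pos = pos + predict_displacement(segment)
    return pos                        # inferred occluded position
```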
  • The second motion data includes at least one of a second acceleration, a second angular velocity, a second movement direction, and a second movement pattern.
  • The length of the second time period is the same as the length of the first time period.
  • The method further includes: in a case where the at least one limb node goes from being unoccluded, to being occluded, to being unoccluded again, acquiring third motion data related to the motion in a third time period, where the third time period includes the time period between the moment when the node is unoccluded and the moment when it is unoccluded again; using the inference model to infer, based on the third motion data, a third displacement of the at least one limb node in the third time period; and determining, based at least in part on the third displacement and at least one of the unoccluded position of the at least one limb node at the moment when it is unoccluded and its unoccluded position at the moment when it is unoccluded again, the occluded position of the at least one limb node while it is occluded.
  • In the embodiments of the present application, the motion data of the limb node between the moment when it is occluded and the moment when it is unoccluded again is used as posterior knowledge, so the displacement of the limb node between the moment when it is unoccluded and the moment when it is occluded can be inferred more accurately, further improving the accuracy of displacement estimation.
  • The third motion data includes at least one of a third acceleration, a third angular velocity, a third movement direction, and a third movement pattern.
  • The length of the third time period is the same as the length of the first time period.
  • The method further includes: receiving other inference models for at least one other user, where the other inference models are used to infer the occluded positions of limb nodes of the at least one other user when those limb nodes are occluded; integrating the inference model with the other inference models to obtain an integrated inference model; and, in a case where at least one limb node of the user is occluded, using the integrated inference model to infer the occluded position of the at least one limb node.
  • In the embodiments of the present application, integrating the user's inference model with other users' inference models can improve the accuracy of the estimation of the displacement of the user's limb nodes, especially when the user's training data is limited and the user's own inference model therefore performs poorly.
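  • The application does not specify how the models are integrated; one common interpretation, shown here purely as an assumption, is federated-style averaging of the parameters of same-architecture models, optionally weighted by how much training data each user contributed:

```python
import numpy as np

def integrate_models(weight_sets, importance=None):
    """Average corresponding parameters of several same-architecture models.

    weight_sets : one list of numpy parameter arrays per user (e.g. [U, W, V]).
    importance  : optional per-user weights, e.g. proportional to training data size.
    """
    if importance is None:
        importance = [1.0] * len(weight_sets)
    total = float(sum(importance))
    merged = []
    for layer_params in zip(*weight_sets):  # the same parameter across all users
        merged.append(sum(w * p for w, p in zip(importance, layer_params)) / total)
    return merged
```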
  • An embodiment of the present application provides a method for determining the position of at least one limb node of a user.
  • The method includes: in a case where the at least one limb node goes from being unoccluded to being occluded, acquiring first motion data related to the motion within a first time period, where the first time period includes the time period between the moment when the at least one limb node is unoccluded and the moment when it is occluded; using an inference model to infer, based on the first motion data, a first displacement of the at least one limb node in the first time period; and determining, based at least in part on the first displacement and the unoccluded position of the at least one limb node at the moment when it is unoccluded, the occluded position of the at least one limb node at the moment when it is occluded.
  • In the embodiments of the present application, the motion data of at least one limb node of the user's limb during the time period from being unoccluded to being occluded is used to infer the displacement of the limb node during that time period, and thereby to obtain the occluded position of the limb node at the moment when it is occluded. Since there is a direct correspondence between the motion data and the displacement of the limb node, the displacement inferred according to the embodiments of the present application is more accurate than the manually guessed postures of occluded parts used in the prior art.
  • The first motion data includes at least one of a first acceleration, a first angular velocity, a first movement direction, and a first movement pattern.
  • The inference model includes a model trained at least in part on second motion data and a second displacement of the at least one limb node in a second time period, where the at least one limb node is not occluded during the second time period, and the length of the second time period is the same as the length of the first time period.
  • The second motion data includes at least one of a second acceleration, a second angular velocity, a second movement direction, and a second movement pattern.
  • The inference model includes at least one of a recurrent neural network, a long short-term memory network, and a gated recurrent unit network.
  • Determining the occluded position of the at least one limb node at the occluded moment based at least in part on the first displacement and the unoccluded position of the at least one limb node at the unoccluded moment further includes: acquiring an unoccluded image frame of the at least one limb node at the moment when it is unoccluded, and determining the unoccluded position according to the unoccluded image frame.
  • The embodiments of the present application provide a method for determining the position of at least one limb node of a user.
  • The method includes: in a case where at least one limb node of the user goes from being unoccluded, to being occluded, to being unoccluded again, acquiring first motion data related to the motion in a first time period, where the first time period includes the time period between the moment when the node is unoccluded and the moment when it is unoccluded again; using an inference model to infer, based on the first motion data, a first displacement of the at least one limb node in the first time period; and determining, based at least in part on the first displacement and at least one of the unoccluded position of the at least one limb node at the moment when it is unoccluded and its unoccluded position at the moment when it is unoccluded again, the occluded position of the at least one limb node at the moment when it is occluded.
  • In the embodiments of the present application, the motion data of at least one limb node of the user's limb in the time period from being unoccluded to being unoccluded again is used to infer the displacement of the limb node in that time period, from which the position of the limb node at the moment of being occluded is obtained. Since there is a direct correspondence between the motion data and the displacement of the limb node, the displacement inferred according to the embodiments of the present application is more accurate than the manually guessed postures of occluded parts used in the prior art.
  • Moreover, the motion data of the limb node between the moment when it is occluded and the moment when it is unoccluded again is used as posterior knowledge to infer the displacement of the limb node between the unoccluded moment and the occluded moment, which can further improve the accuracy of displacement estimation.
  • The first motion data includes at least one of a first acceleration, a first angular velocity, a first movement direction, and a first movement pattern.
  • The inference model includes a model trained at least in part on second motion data and a second displacement of the at least one limb node in a second time period, where the at least one limb node is not occluded during the second time period, and where the length of the second time period is the same as the length of the time period between the unoccluded moment and the occluded moment, and/or the same as the length of the time period between the occluded moment and the moment of being unoccluded again.
  • The second motion data includes at least one of a second acceleration, a second angular velocity, a second movement direction, and a second movement pattern.
  • The inference model includes a bidirectional recurrent neural network.
  • The first displacement includes at least one of a displacement from the unoccluded position to the occluded position and a displacement from the occluded position to the position of being unoccluded again.
  • Determining the occluded position of the at least one limb node while it is occluded further includes: acquiring an unoccluded image frame of the at least one limb node at the moment when it is unoccluded, and determining the unoccluded position according to the unoccluded image frame; and/or acquiring a re-unoccluded image frame at the moment when the at least one limb node is unoccluded again, and determining the re-unoccluded position according to the re-unoccluded image frame.
  • An embodiment of the present application provides a computer-readable storage medium on which instructions are stored. When the instructions are executed on a machine, the machine executes any of the above methods.
  • An embodiment of the present application provides a system for determining the position of at least one limb node of a user.
  • The system includes: a processor; and a memory storing instructions that, when executed by the processor, cause the processor to execute any of the above methods.
  • An embodiment of the present application provides a device for determining the position of at least one limb node of a user.
  • The device includes: an image processing module, configured to determine, when the at least one limb node is not occluded, a first displacement of the at least one limb node in a first time period between a first moment and a second moment according to the position of the at least one limb node at the first moment and its position at the second moment;
  • a motion data acquisition module, configured to acquire first motion data related to the motion of the at least one limb node in the first time period; and an inference model training module, configured to train an inference model based at least in part on the first displacement and the first motion data, where the inference model is used to infer the occluded position of the at least one limb node when the at least one limb node is occluded.
  • In the embodiments of the present application, the motion data and displacement of at least one limb node of the user's limb are used to train the inference model. Because there is a direct correspondence between the motion data and the displacement of the limb node, compared with the prior art, in which possible postures of occluded parts are manually guessed as training labels, the accuracy and robustness of the inference model in the embodiments of the application will be higher.
  • The first motion data includes at least one of a first acceleration, a first angular velocity, a first movement direction, and a first movement pattern.
  • The device further includes an image acquisition module configured to acquire a first image frame at the first moment and a second image frame at the second moment; and the image processing module determines the first displacement of the at least one limb node in the first time period according to the position of the at least one limb node in the first image frame and its position in the second image frame.
  • The inference model training module is configured to train the inference model based at least in part on the first displacement and the first motion data, including being configured to train the inference model at least in part using the first motion data as a feature input and the first displacement as the training target.
  • The inference model includes at least one of a recurrent neural network, a long short-term memory network, a gated recurrent unit network, and a bidirectional recurrent neural network.
  • The motion data acquisition module is further configured to acquire second motion data related to the motion in a second time period when the at least one limb node goes from being unoccluded to being occluded, where the second time period includes the time period between the moment when the at least one limb node is unoccluded and the moment when it is occluded. The device further includes an inference module configured to use the inference model to infer, based on the second motion data, a second displacement of the at least one limb node in the second time period; the inference module is further configured to determine, based at least in part on the second displacement and the unoccluded position of the at least one limb node at the moment when it is unoccluded, the occluded position of the at least one limb node while it is occluded.
  • In the embodiments of the present application, the motion data of at least one limb node of the user's limb during the time period from being unoccluded to being occluded is used to infer the displacement of the limb node during that time period, and thereby to obtain the occluded position of the limb node while it is occluded. Since there is a direct correspondence between the motion data and the displacement of the limb node, the displacement inferred according to the embodiments of the present application is more accurate than the manually guessed postures of occluded parts used in the prior art.
  • The second motion data includes at least one of a second acceleration, a second angular velocity, a second movement direction, and a second movement pattern.
  • The length of the second time period is the same as the length of the first time period.
  • The motion data acquisition module is further configured to acquire third motion data related to the motion in a third time period when the at least one limb node goes from being unoccluded, to being occluded, to being unoccluded again, where the third time period includes the time period between the moment when the node is unoccluded and the moment when it is unoccluded again. The device further includes an inference module configured to use the inference model to infer, based on the third motion data, a third displacement of the at least one limb node in the third time period; the inference module is further configured to determine, based at least in part on the third displacement and at least one of the unoccluded position of the at least one limb node at the moment when it is unoccluded and its unoccluded position at the moment when it is unoccluded again, the occluded position of the at least one limb node while it is occluded.
  • In the embodiments of the present application, the motion data of the limb node between the moment when it is occluded and the moment when it is unoccluded again is used as posterior knowledge, so the displacement of the limb node between the moment when it is unoccluded and the moment when it is occluded can be inferred more accurately, further improving the accuracy of displacement estimation.
  • The third motion data includes at least one of a third acceleration, a third angular velocity, a third movement direction, and a third movement pattern.
  • The length of the third time period is the same as the length of the first time period.
  • The device further includes a communication module for receiving other inference models for at least one other user, where the other inference models are used to infer the occluded positions of limb nodes of the at least one other user when those limb nodes are occluded. The inference model training module is further configured to integrate the inference model with the other inference models to obtain an integrated inference model, and the inference module is further configured to, in a case where at least one limb node of the user is occluded, infer the occluded position of the at least one limb node using the integrated inference model.
  • In the embodiments of the present application, integrating the user's inference model with other users' inference models can improve the accuracy of the estimation of the displacement of the user's limb nodes, especially when the user's training data is limited and the user's own inference model therefore performs poorly.
  • An embodiment of the present application provides an apparatus for determining the position of at least one limb node of a user.
  • The apparatus includes: a motion data acquisition module, configured to acquire, in a case where the at least one limb node goes from being unoccluded to being occluded, first motion data related to the motion in a first time period, where the first time period includes the time period between the moment when the at least one limb node is unoccluded and the moment when it is occluded; and an inference module, configured to use an inference model to infer, based on the first motion data, a first displacement of the at least one limb node in the first time period, and further configured to determine, based at least in part on the first displacement and the unoccluded position of the at least one limb node at the moment when it is unoccluded, the occluded position of the at least one limb node at the moment when it is occluded.
  • In the embodiments of the present application, the motion data of at least one limb node of the user's limb during the time period from being unoccluded to being occluded is used to infer the displacement of the limb node during that time period, and thereby to obtain the occluded position of the limb node at the moment when it is occluded. Since there is a direct correspondence between the motion data and the displacement of the limb node, the displacement inferred according to the embodiments of the present application is more accurate than the manually guessed postures of occluded parts used in the prior art.
  • The first motion data includes at least one of a first acceleration, a first angular velocity, a first movement direction, and a first movement pattern.
  • The inference model includes a model trained at least in part on second motion data and a second displacement of the at least one limb node in a second time period, where the at least one limb node is not occluded during the second time period, and the length of the second time period is the same as the length of the first time period.
  • The second motion data includes at least one of a second acceleration, a second angular velocity, a second movement direction, and a second movement pattern.
  • The inference model includes at least one of a recurrent neural network, a long short-term memory network, and a gated recurrent unit network.
  • The device further includes an image acquisition module and an image processing module, where the image acquisition module is configured to acquire an unoccluded image frame of the at least one limb node when it is unoccluded, and the image processing module is configured to determine the unoccluded position according to the unoccluded image frame.
  • An embodiment of the present application provides a device for determining the position of at least one limb node of a user.
  • The device includes: a motion data acquisition module, configured to acquire, in a case where at least one limb node of the user goes from being unoccluded, to being occluded, to being unoccluded again, first motion data related to the motion in a first time period, where the first time period includes the time period between the moment when the node is unoccluded and the moment when it is unoccluded again; and an inference module, configured to use an inference model to infer, based on the first motion data, a first displacement of the at least one limb node in the first time period, and further configured to determine, based at least in part on the first displacement and at least one of the unoccluded position of the at least one limb node at the moment when it is unoccluded and its unoccluded position at the moment when it is unoccluded again, the occluded position of the at least one limb node at the moment when it is occluded.
  • In the embodiments of the present application, the motion data of at least one limb node of the user's limb in the time period from being unoccluded to being unoccluded again is used to infer the displacement of the limb node in that time period, from which the position of the limb node at the moment of being occluded is obtained. Since there is a direct correspondence between the motion data and the displacement of the limb node, the displacement inferred according to the embodiments of the present application is more accurate than the manually guessed postures of occluded parts used in the prior art.
  • Moreover, the motion data of the limb node between the moment when it is occluded and the moment when it is unoccluded again is used as posterior knowledge to infer the displacement of the limb node between the unoccluded moment and the occluded moment, which can further improve the accuracy of displacement estimation.
  • The first motion data includes at least one of a first acceleration, a first angular velocity, a first movement direction, and a first movement pattern.
  • The inference model includes a model trained at least in part on second motion data and a second displacement of the at least one limb node in a second time period, where the at least one limb node is not occluded during the second time period, and where the length of the second time period is the same as the length of the time period between the unoccluded moment and the occluded moment, and/or the same as the length of the time period between the occluded moment and the moment of being unoccluded again.
  • The second motion data includes at least one of a second acceleration, a second angular velocity, a second movement direction, and a second movement pattern.
  • The inference model includes a bidirectional recurrent neural network.
  • The first displacement includes at least one of a displacement from the unoccluded position to the occluded position and a displacement from the occluded position to the position of being unoccluded again.
  • The device further includes: an image acquisition module, configured to acquire an unoccluded image frame of the at least one limb node at the moment when it is unoccluded, and/or acquire a re-unoccluded image frame at the moment when the at least one limb node is unoccluded again; and an image processing module, configured to determine the unoccluded position according to the unoccluded image frame, and/or determine the re-unoccluded position according to the re-unoccluded image frame.
  • Fig. 1 shows a schematic diagram of the principle of limb posture estimation according to an embodiment of the present application
  • Fig. 2 shows a schematic structural diagram of a limb posture estimation device according to an embodiment of the present application
  • Fig. 3 shows a schematic diagram of an image frame sequence in which a limb is not occluded according to an embodiment of the present application
  • Fig. 4 shows a schematic structural diagram of a recurrent neural network according to an embodiment of the present application
  • Fig. 5 shows a schematic structural diagram of a bidirectional recurrent neural network according to an embodiment of the present application
  • Fig. 6A shows a schematic diagram of an image frame sequence including an image frame in which a limb is occluded according to an embodiment of the present application
  • Fig. 6B shows another schematic diagram of an image frame sequence including an image frame in which a limb is occluded according to an embodiment of the present application
  • Fig. 7 shows a schematic flowchart of a method for training an inference model for limb posture inference according to an embodiment of the present application
  • Fig. 8 shows a schematic flowchart of a method for estimating a limb posture according to an embodiment of the present application
  • Fig. 9 shows another schematic flowchart of a method for estimating a limb posture according to an embodiment of the present application.
  • Fig. 10 shows a schematic structural diagram of a limb posture estimation system according to an embodiment of the present application.
  • Fig. 1 shows a schematic diagram of the principle of limb posture estimation according to an embodiment of the present application. The limb posture refers to the posture or state presented by a limb and can be determined by the positions of multiple limb nodes of the limb, where a limb node may include, but is not limited to, a skeletal node such as a hand, a wrist, an elbow, or a shoulder.
  • As shown in Fig. 1, limb posture estimation includes two stages: the training stage of the inference model and the inference stage.
  • The training stage of the inference model includes, under the condition that at least one limb node of the user's limb is not occluded (for example, but not limited to, not blocked by other objects and not beyond the image collection range), obtaining the positions 10 of the at least one limb node at two unoccluded moments and determining the displacement 20 of the at least one limb node between the two unoccluded moments, where an unoccluded moment may include, but is not limited to, the collection moment of an image frame in which the at least one limb node is not occluded, and the position 10 may include, but is not limited to, the position of the at least one limb node in the image frame collected at the unoccluded moment.
  • The training stage of the inference model also includes obtaining motion data 30 related to the movement of the at least one limb node between the two unoccluded moments, where the motion data 30 may include, but is not limited to, data reflecting the movement state of the at least one limb node, such as at least one of acceleration, angular velocity, and movement direction, which can be obtained, for example, by sensors worn on the at least one limb node (for example, but not limited to, an acceleration sensor, a gyroscope, and a magnetometer).
  • The motion data 30 may also include a movement pattern, which refers to the type of body movement the user is doing, such as, but not limited to, jumping, squatting, or arm swinging. Movement patterns can be obtained as prior knowledge.
  • The training stage of the inference model further includes training the inference model 40 using the displacement 20 of the at least one limb node between the two unoccluded moments and the motion data 30.
  • The inference stage includes, when at least one limb node of the user's limb is occluded (for example, but not limited to, blocked by other objects or beyond the image collection range), using the inference model 40 to infer, based on the motion data 30 of the at least one limb node from the unoccluded moment to the occluded moment, the displacement 20 of the at least one limb node over that interval, where the occluded moment may include, but is not limited to, the collection moment of the image frame in which the at least one limb node is occluded. The inference stage also includes determining the position 50 of the at least one limb node at the occluded moment based on this displacement 20 and the position 10 of the at least one limb node at the unoccluded moment, and then determining the limb posture 60 at the occluded moment according to the position 50, where the position 50 may include, but is not limited to, the occluded position of the at least one limb node in the image frame collected at the occluded moment.
  • Fig. 2 shows a schematic structural diagram of a limb posture estimation device 100 according to an embodiment of the present application. The limb posture estimation device 100 includes, but is not limited to, an image acquisition module 110, an image processing module 120, a motion data acquisition module 130, an inference model training module 140, an inference module 150, and an optional communication module 160.
  • One or more components of the limb posture estimation device 100 (for example, one or more of the image acquisition module 110, the image processing module 120, the motion data acquisition module 130, the inference model training module 140, the inference module 150, and the communication module 160) may be composed of any combination of application-specific integrated circuits (ASICs), electronic circuits, (shared, dedicated, or group) processors and/or memories that execute one or more software or firmware programs, combinational logic circuits, and other suitable components that provide the described functions.
  • The processor may be a microprocessor, a digital signal processor, a microcontroller, etc., and/or any combination thereof, and may be a single-core processor, a multi-core processor, etc., and/or any combination thereof.
  • The image acquisition module 110 is used to acquire image data of a user, where the image data may include multiple image frames. The image acquisition module 110 may be, but is not limited to, a video camera, a camera, or the like.
  • The image processing module 120 is configured to use, for example, but not limited to, skeletal node recognition technology to perform node recognition on users in the multiple image frames collected by the image acquisition module 110.
  • The image processing module 120 is also used to determine the positions (for example, coordinates) of at least one limb node of the user's limb in the multiple image frames when the at least one limb node is not occluded, and thereby determine the displacement of the at least one limb node between the collection moments of two image frames, where the collection moments of the two image frames may have a predetermined time interval, and the predetermined time interval may be a multiple of the reciprocal of the image acquisition frame rate, for example, but not limited to, 1, 2, or 3 times the reciprocal of the frame rate.
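  • For example, with node-recognition results stored per frame, the displacement over an interval of k times the reciprocal of the frame rate is simply the coordinate difference between frames i and i+k. A minimal sketch, with a hypothetical data layout, follows:

```python
def node_displacement(keypoints, frame_a, frame_b, node="right_wrist"):
    """Displacement of one limb node between the collection moments of two frames.

    keypoints : dict frame_index -> {node_name: (x, y)} from node recognition.
    """
    xa, ya = keypoints[frame_a][node]
    xb, yb = keypoints[frame_b][node]
    return (xb - xa, yb - ya)

# interval of 2 / frame_rate, e.g. between frames F1 and F3:
# s = node_displacement(kp, 1, 3)
```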
  • The motion data acquisition module 130 is used to acquire the motion data 30 in Fig. 1. The motion data acquisition module 130 may include, but is not limited to, at least one sensor worn on at least one limb node of the user's limb, such as, but not limited to, an acceleration sensor, a gyroscope, or a magnetometer, where the acceleration sensor is used to obtain the acceleration of the limb node, the gyroscope is used to obtain the angular velocity of the limb node, and the magnetometer is used to obtain the movement direction of the limb node.
  • The clock of the motion data acquisition module 130 may be synchronized with that of the image acquisition module 110.
  • In some embodiments, the motion data acquisition module 130 can obtain the user's current movement pattern based on an instruction received by the user, where the instruction requires the user to perform a certain type of limb movement, and the instruction may come from the limb posture estimation device 100 or another device.
  • In other embodiments, the motion data acquisition module 130 may determine the user's current movement pattern using the positions of the unoccluded limb nodes in the multiple image frames determined by the image processing module 120.
  • The inference model training module 140 is used to train the inference model using the displacement of at least one limb node of the user's limb between the collection moments of two image frames, obtained from the image processing module 120 when the at least one limb node is not occluded, together with the motion data of the at least one limb node between the collection moments of the two image frames, acquired from the motion data acquisition module 130.
  • In order to train the inference model, the inference model training module 140 can obtain multiple displacements and multiple sets of motion data related to multiple groups of image frames, where each group includes two image frames whose collection moments are separated by the predetermined time interval described above.
  • In the embodiments of the present application, the inference model may include, but is not limited to, at least one of a recurrent neural network (RNN), a long short-term memory (LSTM) network, a gated recurrent unit (GRU) network, and a bidirectional recurrent neural network (BRNN).
  • In some embodiments, the inference model training module 140 is also used to integrate the user's inference model with other users' inference models, so as to improve the accuracy of the inference model.
  • The inference module 150 is used to, when at least one limb node of the user's limb is occluded, infer the displacement of the at least one limb node from the moment when it is unoccluded to the moment when it is occluded, based on the motion data acquired from the motion data acquisition module 130 over that interval, and to determine the position of the at least one limb node in the image frame at the occluded moment according to its position in the image frame at the unoccluded moment.
  • In some embodiments, the time period between the unoccluded moment and the occluded moment is the same as the predetermined time interval described above.
  • The inference module 150 is further configured to determine the posture of the limb in the image frame at the occluded moment according to the position of at least one limb node of the limb in that image frame.
  • The communication module 160 is configured to send the user's inference model to an external server and to receive inference models of at least one other user from the external server.
  • The image processing module 120 may determine the displacement of at least one limb node of the user's limb between the collection moments of two image frames as follows.
  • Fig. 3 shows the result of node recognition performed by the image processing module 120 on the image frame sequence F1-F5. Taking the user's right wrist as an example, the image processing module 120 of Fig. 2 can determine the position coordinates of the right wrist (the node shown in gray in Fig. 3) in the image frames F1-F5 and determine the displacement of the right wrist between the collection moments of any two of the image frames F1-F5.
  • For example, the time interval between the collection moments of two image frames can be 1 times the reciprocal of the frame rate; that is, the image processing module 120 in Fig. 2 can determine the displacement of the right wrist between the collection moments of the image frames F1 and F2. For another example, the time interval may be twice the reciprocal of the frame rate, in which case the image processing module 120 can determine the displacement of the right wrist between the collection moments of the image frames F1 and F3.
  • It can be understood that other nodes of the right arm, such as one or more of the right hand, right elbow, and right shoulder, can also wear sensors, and the image processing module 120 of Fig. 2 can likewise determine the positions of these nodes in multiple image frames and their displacements between the collection moments of two image frames. In addition, the number of image frames collected by the image acquisition module and the posture of the user in the image frames are not limited to those shown in Fig. 3.
  • The inference model training module 140 may obtain from the motion data acquisition module 130 the motion data of at least one limb node of the user's limb between the collection moments of two image frames, such as, but not limited to, acceleration, angular velocity, and movement direction. For example, the inference model training module 140 of Fig. 2 may obtain the motion data x1 of the right wrist between the collection moments of the image frames F1 and F2, the motion data x2 between the collection moments of the image frames F2 and F3, the motion data x3 between the collection moments of the image frames F3 and F4, and the motion data x4 between the collection moments of the image frames F4 and F5.
  • The inference model training module 140 may use the motion data of at least one limb node of the user's limb between the collection moments of two image frames as training data, and the displacement of the at least one limb node between the collection moments of the two image frames as the training label, to train the inference model. In some embodiments, the inference model training module 140 may train a recurrent neural network through the back-propagation (BP) algorithm, Newton's gradient descent algorithm, or other algorithms.
  • Fig. 4 shows a schematic structural diagram of a recurrent neural network. As shown in Fig. 4, the recurrent neural network includes t (t is a positive integer) neurons A_1 to A_t; the input of the t-th neuron is x_t, the output is y_t, and the hidden state is h_t. In the embodiments of the present application, the input x_t of the t-th neuron may include the motion data of a node of the user's limb between the collection moments of two image frames, for example, but not limited to, acceleration, angular velocity, and movement direction, and the output y_t may include the displacement of the node between the collection moments of the two image frames.
  • The output y_t and hidden state h_t of the t-th neuron can be calculated by the following formulas:

    h_t = f(U·x_t + W·h_{t-1})
    y_t = g(V·h_t)

    where h_{t-1} represents the hidden state of the (t-1)-th neuron, f and g are activation functions (f can be an activation function such as tanh, relu, or sigmoid, and g can be an activation function such as softmax), U represents the weights related to the input, W represents the weights related to the hidden state, and V represents the weights related to the output. Therefore, in a recurrent neural network, the output of a neuron is related not only to the input of that neuron but also to the hidden state of the previous neuron.
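  • These formulas translate directly into code; the following numpy sketch uses tanh for f and, since the output here is a displacement regression rather than a classification, assumes a linear g (the softmax mentioned above would suit a categorical output):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, f=np.tanh, g=lambda z: z):
    """One neuron of Fig. 4: h_t = f(U*x_t + W*h_{t-1}), y_t = g(V*h_t)."""
    h_t = f(U @ x_t + W @ h_prev)   # hidden state from input and previous hidden state
    y_t = g(V @ h_t)                # inferred displacement for this interval
    return y_t, h_t
```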
  • When the inference model training module 140 trains a recurrent neural network for a node of the user's limb, it can initialize the hidden state and weight parameters of the recurrent neural network, feed the multiple sets of motion data of the node related to multiple groups of image frames as the inputs of the neurons of the recurrent neural network, and obtain the output of each neuron, that is, the multiple displacements of the node related to the multiple groups of image frames. The inference model training module 140 can then optimize the weight parameters of the recurrent neural network in the reverse direction according to the error between the displacement output by each neuron and the actual displacement determined by the image processing module 120. For example, the inference model training module 140 of Fig. 2 may use the motion data x1, x2, x3, and x4 of the right wrist as the inputs of the first to fourth neurons of the recurrent neural network in Fig. 4, respectively, and optimize the weight parameters of the recurrent neural network according to the real displacements s1, s2, s3, and s4 of the right wrist.
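  • A minimal PyTorch sketch of this training loop, where the 7-dimensional motion features, 2-D displacement labels, hidden size, and optimizer settings are all assumptions:

```python
import torch
import torch.nn as nn

class DisplacementRNN(nn.Module):
    """Maps a sequence of per-interval motion data to per-interval displacements."""
    def __init__(self, in_dim=7, hidden_dim=64, out_dim=2):
        super().__init__()
        self.rnn = nn.RNN(in_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):                  # x: (batch, seq, in_dim)
        h, _ = self.rnn(x)
        return self.head(h)                # (batch, seq, out_dim)

model = DisplacementRNN()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x = torch.randn(1, 4, 7)                   # placeholder for [x1, x2, x3, x4]
s = torch.randn(1, 4, 2)                   # placeholder for real displacements [s1..s4]

for _ in range(100):                       # back-propagation optimizes the weights
    optimizer.zero_grad()
    loss = loss_fn(model(x), s)
    loss.backward()
    optimizer.step()
```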
  • It can be understood that the inference model training module 140 can train the inference models of other sensor-wearing nodes of the limb based on similar principles, and can also train long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks based on similar principles.
  • In other embodiments, the inference model training module 140 may train a bidirectional recurrent neural network through the back-propagation (BP) algorithm, Newton's gradient descent algorithm, or other algorithms.
  • Fig. 5 shows a schematic diagram of the structure of the bidirectional recurrent neural network. As shown in Fig. 5, the bidirectional recurrent neural network is composed of two recurrent neural networks with opposite directions superimposed on each other, and includes t+1 (t is a positive integer) neuron groups (A_1, A'_1) to (A_{t+1}, A'_{t+1}); the input of the t-th neuron group is x_t, the output is y_t, the forward hidden state is h_t, and the reverse hidden state is h'_t. In the embodiments of the present application, the input x_t of the t-th neuron group may include the motion data of a node of the user's limb between the collection moments of two image frames, for example, but not limited to, acceleration, angular velocity, and movement direction, and the output y_t may include the displacement of the node between the collection moments of the two image frames.
  • The output y_t, forward hidden state h_t, and reverse hidden state h'_t of the t-th neuron group can be calculated by the following formulas:

    h_t = f(U·x_t + W·h_{t-1})
    h'_t = f(U'·x_t + W'·h'_{t+1})
    y_t = g(V·h_t + V'·h'_t)

    where h_{t-1} represents the forward hidden state of the (t-1)-th neuron group, h'_{t+1} represents the reverse hidden state of the (t+1)-th neuron group, f and g are activation functions (f can be an activation function such as tanh, relu, or sigmoid, and g can be an activation function such as softmax), U, W, and V represent the weights related to the input, the hidden state, and the output of the forward recurrent neural network, respectively, and U', W', and V' represent the corresponding weights of the reverse recurrent neural network. Therefore, in a bidirectional recurrent neural network, the output of a neuron group is related not only to the input of that group but also to the hidden states of the neuron groups before and after it.
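  • A numpy sketch of this bidirectional pass (the primed weights are written Ub, Wb, Vb; a linear output is assumed for displacement regression):

```python
import numpy as np

def brnn_forward(xs, U, W, Ub, Wb, V, Vb, h0, hb0, f=np.tanh):
    """Compute y_t = V*h_t + V'*h'_t with forward states h_t and reverse states h'_t."""
    T = len(xs)
    h, hb = [None] * T, [None] * T
    prev = h0
    for t in range(T):                     # forward recurrent network
        prev = f(U @ xs[t] + W @ prev)
        h[t] = prev
    nxt = hb0
    for t in reversed(range(T)):           # reverse recurrent network
        nxt = f(Ub @ xs[t] + Wb @ nxt)
        hb[t] = nxt
    return [V @ h[t] + Vb @ hb[t] for t in range(T)]
```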
  • The training process of the inference model training module 140 for the bidirectional recurrent neural network can refer to the above-mentioned training process for the recurrent neural network, which will not be repeated here.
  • In some embodiments, for different movement patterns, the inference model training module 140 can use the acceleration, angular velocity, movement direction, and other data of the limb node in each movement pattern, obtained by the motion data acquisition module 130, to train an inference model for the limb node in that movement pattern in the manner described in the above embodiments.
  • The inference module 150 may infer the position of at least one limb node of the user's limb in the image frame at the moment when the at least one limb node is occluded, as follows.
  • Fig. 6A shows the result of the image processing module 120 performing node recognition on the image frame sequence F6-F9. As shown in the figure, taking the user's right wrist (shown as a gray node in Fig. 6A) as an example, the right wrist is occluded in the image frames collected at times t8 and t9. In the case of using a recurrent neural network, the inference module 150 can use the inference model to estimate the displacement of the user's right wrist between t7 and t8 based on the motion data of the right wrist between t7 and t8 (for example, but not limited to, acceleration, angular velocity, and movement direction), and to estimate the displacement of the right wrist between t8 and t9 based on the motion data between t8 and t9. Specifically, referring to Fig. 4, the inference module 150 can use the motion data of the right wrist between t7 and t8 and between t8 and t9 as the inputs of neurons A1 and A2, respectively, and the outputs of the two neurons may then include the displacements of the right wrist between t7 and t8 and between t8 and t9, respectively. The inference module 150 may then determine the position coordinates of the right wrist in the image frame at time t8 based on its position coordinates in the image frame at time t7, as determined by the image processing module 120, and the inferred displacement between t7 and t8, and similarly determine its position at time t9.
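  • A numpy sketch of this rollout: the hidden state is carried from neuron A1 to A2, and each inferred displacement is accumulated onto the last visible position (the trained weights U, W, V and the starting hidden state h0 are assumptions standing in for the state reached on the unoccluded portion of the sequence):

```python
import numpy as np

def rollout_occluded(pos_t7, x_list, U, W, V, h0):
    """Infer positions at t8 and t9 from the position at t7 and the motion data
    of the intervals t7-t8 and t8-t9."""
    h, pos, track = h0, np.asarray(pos_t7, float), []
    for x in x_list:                  # one motion-data vector per interval
        h = np.tanh(U @ x + W @ h)    # hidden state carried between neurons
        pos = pos + V @ h             # inferred displacement for this interval
        track.append(pos.copy())
    return track                      # [position at t8, position at t9]
```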
  • As mentioned above, in a recurrent neural network the output of a neuron is related to the input of the neuron and the hidden state of the previous neuron, so the inference module 150 can use the recurrent neural network to infer the positions of limb nodes in real time. It can be understood that the recurrent neural network can also be used in scenarios where the position of a limb node is estimated non-real-time.
  • Fig. 6B shows a situation where the image frame sequence in Fig. 6A also includes image frame F10, in which the right wrist is unoccluded again at time t10. In the case of using a bidirectional recurrent neural network, the inference module 150 can use the inference model to estimate the displacements of the user's right wrist between t7 and t8 and between t8 and t9 based on the motion data of the right wrist between t7 and t8, between t8 and t9, and between t9 and t10 (for example, but not limited to, acceleration, angular velocity, and movement direction). Specifically, referring to Fig. 5, the inference module 150 can use the motion data of the right wrist between t7 and t8, between t8 and t9, and between t9 and t10 as the inputs of the neuron groups, and the outputs of the neuron groups may include the displacements of the right wrist between t7 and t8, between t8 and t9, and between t9 and t10, respectively. The inference module 150 may then determine the position coordinates of the right wrist in the image frames at times t8 and t9 based on its position coordinates in the image frame at time t7, as determined by the image processing module 120, and the inferred displacements between t7 and t8 and between t8 and t9.
  • As mentioned above, in a bidirectional recurrent neural network the output of a neuron group is related not only to the input of the group and the hidden state of the previous neuron group, but also to the hidden state of the following neuron group, which serves as posterior knowledge; therefore, the inference module 150 can use the bidirectional recurrent neural network to infer the position of at least one limb node in the image frame at the occluded moment non-real-time.
  • In some embodiments, the inference module 150 may use the user's current movement pattern, acquired by the motion data acquisition module 130, as prior knowledge to infer the position of at least one limb node of the user's limb at the occluded moment. Specifically, the inference module 150 may select the inference model corresponding to the current movement pattern and then use that inference model to infer the position of the at least one limb node at the occluded moment in the manner described in the above embodiments.
  • In other embodiments, the inference module 150 may also use the user's current movement pattern, acquired by the motion data acquisition module 130, as prior knowledge in a different way. Specifically, when at least one limb node of the user's limb is occluded, the inference module 150 may use the inference model to infer multiple candidate displacements based on the motion data of the at least one limb node acquired from the motion data acquisition module 130 (for example, but not limited to, acceleration, angular velocity, and movement direction); the inference module 150 can then reduce the probabilities of candidate displacements that do not conform to the user's current movement pattern, correspondingly increase the probabilities of candidate displacements that do conform to it, and finally output the displacement with the maximum estimated probability.
• the inference module 150 can infer the position coordinates of the other nodes of the user's right arm in the image frame at the occluded moment based on a principle similar to the above, and thereby determine the posture of the user's right arm in that image frame. Further, if only some of the occluded limb nodes of the user's right arm wear sensors, the inference module 150 can also infer, based on the position coordinates of those sensor-equipped nodes in the image frame at the occluded moment, the position coordinates of the other occluded nodes of the arm in that image frame, and thereby determine the posture of the user's right arm in the image frame at the occluded moment.
• for example, when the user's entire right arm is occluded and only the right wrist wears a sensor, in one example the inference module 150 can use an inverse kinematics (IK) method, which solves for the rotation angle of each joint in the limb kinematic chain given the positions of the extremity and the fixed end, to determine the position coordinates of the other occluded nodes of the user's right arm in the image frame at the occluded moment from the position coordinates of the right wrist; in another example, the inference module 150 can use the position coordinates of the user's right wrist in the image frame at the occluded moment to infer the position coordinates of the other occluded nodes of the right arm, based on the constraints that human joint motion is limited, that arm length is fixed, and that the positions of limb nodes change continuously.
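• as an illustration of the constraint-based idea, the following planar two-link sketch (a simplification: real arms move in 3-D, and the segment lengths, coordinates, and the choice between the two mirror-image elbow solutions are assumptions) recovers an elbow position from a fixed shoulder, a known wrist position, and fixed segment lengths via the law of cosines:

```python
import numpy as np

def elbow_from_shoulder_wrist(shoulder, wrist, l_upper, l_fore, bend=1.0):
    """Planar two-link IK: given the fixed end (shoulder), the extremity
    (wrist), and fixed segment lengths, return one elbow position.
    bend=+1/-1 selects between the two mirror-image solutions."""
    shoulder = np.asarray(shoulder, dtype=float)
    wrist = np.asarray(wrist, dtype=float)
    d_vec = wrist - shoulder
    d0 = np.linalg.norm(d_vec)
    u = d_vec / d0                                   # unit shoulder-to-wrist axis
    d = np.clip(d0, abs(l_upper - l_fore), l_upper + l_fore)  # reachability
    a = (l_upper**2 - l_fore**2 + d**2) / (2 * d)    # law-of-cosines projection
    h = np.sqrt(max(l_upper**2 - a**2, 0.0))         # elbow offset from the axis
    perp = np.array([-u[1], u[0]])
    return shoulder + a * u + bend * h * perp

# Made-up coordinates and segment lengths (metres):
print(elbow_from_shoulder_wrist([0.0, 0.0], [0.5, 0.1], 0.30, 0.28))
```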
• since the motion data of one limb node (for example, but not limited to, acceleration, angular velocity, movement direction, etc.) is associated with the displacements of all nodes of the same limb, when some nodes of the limb do not wear sensors, the speculation model training module 140 may, while the limb nodes are unoccluded, train the speculation model for a sensor-equipped node (for example, the right wrist of the right arm) using the displacements, determined by the image processing module 120, of both the sensor-equipped node and the sensor-free node (for example, the right elbow of the right arm) between the collection moments of two image frames, together with the movement data of the sensor-equipped node between those collection moments obtained from the movement data acquisition module 130; when the limb nodes are occluded, the inference module 150 can then use this model to infer the displacements of both the sensor-equipped and sensor-free nodes, and hence their positions and the limb's posture at the occluded moment.
• the number of image frames collected by the image acquisition module 110 and the user's posture in the image frames are not limited to those shown in FIGS. 6A and 6B, and the inference module 150 may infer the position coordinates of at least one limb node of the user's other limbs in the image frame at the occluded moment based on principles similar to those described above.
• when at least one limb node of the user's limb is not occluded, the inference module 150 can also use the speculation model to infer the position coordinates of the at least one limb node in the image frame at the unoccluded moment; the speculation model training module 140 may then compare these with the position coordinates of the same node determined by the image processing module 120 in that image frame, and obtain the estimation accuracy of the speculation model. In one example, for the speculation model corresponding to one node of the user's limb, the speculation model training module 140 may calculate the distance (for example, but not limited to, Euclidean distance, cosine distance, etc.) between the position coordinates of the node in an image frame inferred by the model and the position coordinates of the node in that frame determined by the image processing module 120, and compute the estimation accuracy from multiple such distances over multiple image frames, for example by taking the mean, maximum, or median of these distances as the estimation accuracy of the model.
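• a minimal sketch of this accuracy computation using the Euclidean distance (the coordinates below are made-up values for illustration):

```python
import numpy as np

def estimation_accuracy(pred, truth, reduce="mean"):
    """Per-frame Euclidean distance between inferred and image-derived node
    coordinates, aggregated into one accuracy figure (smaller = better)."""
    d = np.linalg.norm(np.asarray(pred) - np.asarray(truth), axis=-1)
    return {"mean": d.mean(), "max": d.max(), "median": np.median(d)}[reduce]

pred  = np.array([[10.0, 5.0], [11.0, 6.0], [12.5, 7.0]])   # model output
truth = np.array([[10.2, 5.1], [11.4, 5.8], [12.0, 7.3]])   # from image frames
print(estimation_accuracy(pred, truth), estimation_accuracy(pred, truth, "max"))
```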
• further, the speculation model training module 140 may send the user's physical parameters (for example, but not limited to, the length of each limb segment), the speculation models of at least one limb node of the user's limb, and the estimation accuracy of those models to an external server through the communication module 160. The external server may return the speculation models of other users whose physical parameters are similar to those of the user, where those models are used to infer the positions of at least one limb node of the other users' limbs at occluded moments, and their estimation accuracy is greater than or equal to a predetermined accuracy value.
• further, when at least one limb node of the user's limb is not occluded, the speculation model training module 140 may integrate the user's speculation model with the speculation models of the other users, and when at least one limb node of the user's limb is occluded, the inference module 150 may use the integrated speculation model to infer its position at the occluded moment. In one example, the speculation model training module 140 may perform the integration based on the Bagging algorithm (bootstrap aggregating). Bagging reduces generalization error by combining multiple models: several different models are trained separately, and their outputs on a test set are then voted on according to certain rules, for example by taking the average of the outputs as the final output. In the embodiments of the present application, the test set may include the movement data of at least one limb node of the limb between the collection moments of two image frames while the node is unoccluded, and the speculation model training module 140 can optimize the voting rules based on the actual displacement of the node between those collection moments as determined by the image processing module 120.
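• one way such voting-rule optimization could look (a sketch, not the patented procedure: here the "rule" is a per-model weight fitted by least squares on unoccluded test intervals, which generalizes plain output averaging):

```python
import numpy as np

def fit_vote_weights(member_preds, true_disp):
    """member_preds: (M, N, D) displacements predicted by M models on N
    unoccluded test intervals; true_disp: (N, D) image-derived displacements.
    Returns one weight per model (least squares on flattened outputs)."""
    M = member_preds.shape[0]
    A = member_preds.reshape(M, -1).T          # (N*D, M)
    w, *_ = np.linalg.lstsq(A, true_disp.ravel(), rcond=None)
    return w

def ensemble_predict(member_preds, w=None):
    if w is None:                              # plain Bagging-style average
        return member_preds.mean(axis=0)
    return np.tensordot(w, member_preds, axes=1)

rng = np.random.default_rng(2)
truth = rng.normal(size=(50, 2))
members = np.stack([truth + rng.normal(scale=s, size=truth.shape)
                    for s in (0.05, 0.2, 0.4)])   # 3 models, varying noise
w = fit_vote_weights(members, truth)
print(w.round(2), np.abs(ensemble_predict(members, w) - truth).mean().round(3))
```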
• in the embodiments of the present application, when at least one limb node of the user's limb is not occluded, the motion data and displacement of the at least one limb node are used to train the speculation model; because there is a direct correspondence between the motion data and the displacement, the accuracy and robustness of the speculation model according to the embodiments of the present application will be higher than in the prior art, where manually guessed possible postures of the occluded parts serve as the training labels.
• further, in non-real-time estimation scenarios, using a bidirectional recurrent network, the movement data of a limb node after the occluded moment and its position at the moment it becomes unoccluded again serve as posterior knowledge for inferring the node's displacement, which can improve the accuracy of displacement estimation.
• further, using the user's motion mode as prior knowledge to estimate the position of at least one limb node at the occluded moment can also improve the accuracy of displacement estimation.
• further, by integrating the user's speculation model with those of other users, the accuracy of the estimation of the displacement of the user's limb nodes can be improved, especially when the user's training data is scarce and the user's own speculation model therefore performs poorly.
• FIG. 7 shows a schematic flowchart of a method for training a speculation model for limb posture inference according to an embodiment of the present application. One or more modules of the limb posture inference apparatus 100 in FIG. 2 may implement different blocks or other parts of the method. The training method may include:
• Step 701: when at least one limb node of the user's limb is not occluded, collect image data of the user's movement through the image acquisition module 110, where the image data may include multiple image frames. Examples of the image acquisition module 110 may be, but are not limited to, video cameras, cameras, and the like.
• Step 702: through the image processing module 120, perform node identification on the user in the multiple image frames collected by the image acquisition module 110, for example, but not limited to, identifying the user's skeletal nodes, such as the head, wrists, elbows, shoulders, knees, and ankles, through skeletal node recognition technology.
• Step 703: through the image processing module 120, determine the positions of at least one limb node of the user's limb in the multiple image frames and its displacement between the collection moments of two image frames. The positions may include, but are not limited to, the relative coordinates of the at least one limb node in the multiple image frames. The collection moments of the two image frames may have a predetermined time interval, which may be a multiple of the reciprocal of the image acquisition frame rate, for example, but not limited to, 1, 2, or 3 times the reciprocal of the frame rate. It should be noted that the image processing module 120 can determine the displacements of the at least one limb node with respect to multiple groups of image frames, where each group includes two image frames whose collection moments are separated by the predetermined time interval described above.
• Step 704: obtain the movement data of at least one limb node of the user's limb through the movement data acquisition module 130, such as, but not limited to, acceleration, angular velocity, movement direction, motion mode, and the like.
• Step 705: through the speculation model training module 140, train a speculation model based on the displacement of at least one limb node of the user's limb between the collection moments of two image frames, obtained from the image processing module 120, and the motion data of the at least one limb node between those collection moments, obtained from the motion data acquisition module 130. It should be noted that the speculation model training module 140 can obtain multiple displacements and motion data related to multiple groups of image frames, where each group includes two image frames whose collection moments are separated by the predetermined time interval described above.
• examples of the speculation model may include, but are not limited to, at least one of a recurrent neural network (RNN), a long short-term memory (LSTM) network, a gated recurrent unit (GRU) network, and a bidirectional recurrent neural network (BRNN).
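• whichever model is chosen, the training data can be assembled as (motion sequence, displacement) pairs; the sketch below shows one plausible pairing for a predetermined interval of k frame periods (the array shapes, the feature count, and the helper name build_training_pairs are illustrative assumptions):

```python
import numpy as np

def build_training_pairs(node_xy, motion, k=1):
    """node_xy: (F, 2) node coordinates per image frame (node unoccluded);
    motion: (F-1, d) sensor features per inter-frame interval.
    Pairs each k-frame displacement (the training label) with the motion
    data covering the same span (the input sequence)."""
    inputs, labels = [], []
    for i in range(len(node_xy) - k):
        labels.append(node_xy[i + k] - node_xy[i])     # displacement over k frames
        inputs.append(motion[i:i + k])                 # motion over the same span
    return np.stack(inputs), np.stack(labels)

rng = np.random.default_rng(3)
frames = rng.normal(size=(6, 2))       # e.g. wrist coordinates in F1..F6
imu = rng.normal(size=(5, 9))          # per-interval acceleration/gyro/direction
X, y = build_training_pairs(frames, imu, k=2)
print(X.shape, y.shape)                # (4, 2, 9) (4, 2)
```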
• Step 706: when at least one limb node of the user's limb is not occluded, determine the estimation accuracy of the speculation model through the speculation model training module 140, and send the user's physical parameters (for example, but not limited to, the length of each limb segment), the speculation model of at least one limb node of the user's limb, and the estimation accuracy of the model to an external server through the communication module 160.
• in one example, when at least one limb node of the user's limb is not occluded, the inference module 150 can use the speculation model to infer the position coordinates of the at least one limb node in the image frame at the unoccluded moment, and the speculation model training module 140 may compare these with the position coordinates determined by the image processing module 120 for the same frame to obtain the estimation accuracy. For example, for the speculation model corresponding to one node, the training module may calculate the distance (for example, but not limited to, Euclidean distance, cosine distance, etc.) between the inferred and the image-derived position coordinates of the node in an image frame, and compute the estimation accuracy from multiple such distances over multiple image frames, for example by taking their mean, maximum, or median as the estimation accuracy of the model.
• Step 707: through the communication module 160, receive from the external server the speculation models of other users whose physical parameters are similar to those of the user, where those models are used to infer the positions of at least one limb node of the other users' limbs at occluded moments, and their estimation accuracy is greater than or equal to a predetermined accuracy value.
• Step 708: integrate the user's speculation model with the speculation model of at least one other user through the speculation model training module 140, and obtain an integrated speculation model.
• in one example, the speculation model training module 140 may perform the integration based on the Bagging algorithm (bootstrap aggregating), which reduces generalization error by combining multiple models: several different models are trained separately, and their outputs on a test set are voted on according to certain rules, for example by taking the average of the outputs as the final output. In the embodiments of the present application, the test set may include the movement data of at least one limb node of the limb between the collection moments of two image frames while the node is unoccluded, and the speculation model training module 140 can optimize the voting rules based on the actual displacement of the node between those collection moments as determined by the image processing module 120.
  • FIG. 8 shows a schematic flow chart of a limb posture estimation method according to an embodiment of the present application.
  • One or more modules of the limb posture estimation apparatus 100 in Fig. 2 may implement different blocks or other parts of the method.
  • the method for inferring body posture may include:
• Step 801: collect image data of the user's movement through the image acquisition module 110, where the image data may include image frames. Examples of the image acquisition module 110 may be, but are not limited to, a video camera, a camera, and the like.
• Step 802: through the image processing module 120, perform node recognition on the user in the current image frame, for example, but not limited to, recognizing the user's skeletal nodes, such as the head, wrists, elbows, shoulders, knees, and ankles, through skeletal node recognition technology.
• Step 803: through the image processing module 120, determine whether any limb node is occluded in the current image frame; if so, perform step 804, and if not, perform step 807. As an example, the image processing module 120 may compare the node recognition result of the current image frame with the complete set of human body nodes to determine whether any of the user's limb nodes are occluded in the current image frame, and which nodes they are.
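• a minimal sketch of this comparison (the node names are hypothetical; a real skeleton-recognition library would define its own set):

```python
# A sketch of the comparison in step 803 (hypothetical node names):
FULL_BODY = {"head", "left_wrist", "right_wrist", "left_elbow", "right_elbow",
             "left_shoulder", "right_shoulder", "left_knee", "right_knee",
             "left_ankle", "right_ankle"}

def occluded_nodes(recognized: set[str]) -> set[str]:
    """Nodes expected on a complete body but absent from this frame."""
    return FULL_BODY - recognized

frame_nodes = FULL_BODY - {"right_wrist", "right_elbow"}   # detector output
print(occluded_nodes(frame_nodes))  # {'right_wrist', 'right_elbow'}
```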
• Step 804: through the image processing module 120, determine the image frame preceding the current image frame, and determine the motion data of the occluded at least one limb node of the user's limb between the collection moments of the current image frame and the previous image frame, for example, but not limited to, acceleration, angular velocity, movement direction, motion mode, etc. The collection moments of the previous and current image frames have a predetermined time interval, which may be a multiple of the reciprocal of the image acquisition frame rate, for example, but not limited to, 1, 2, or 3 times the reciprocal of the frame rate. In addition, the position of the at least one limb node in the previous image frame is known: if the node was not occluded in the previous image frame, the image processing module 120 can determine its position there; if it was occluded, the inference module 150 can determine its position in the previous image frame according to this embodiment.
• Step 805: through the inference module 150, use the speculation model, for example, but not limited to, a recurrent neural network, to infer the displacement of the at least one occluded limb node between the collection moments of the current and previous image frames, based on its movement data between those moments.
• Step 806: through the inference module 150, determine the position of the at least one occluded limb node in the current image frame, based on its position in the previous image frame and the displacement determined in step 805 between the collection moments of the current and previous image frames.
• Step 807: through the inference module 150, determine the posture of the limb based on the positions of the limb's nodes in the current image frame.
• among the occluded limb nodes of the user's limb, if only some nodes wear sensors, the inference module 150 can also infer the positions of the other occluded limb nodes in the current image frame based on the positions of those sensor-equipped nodes, and thereby determine the posture of the user's limb in the current image frame. For example, when the user's entire right arm is occluded and only the right wrist wears a sensor, the inference module 150 can determine the positions of the other limb nodes of the right arm in the current image frame through, but not limited to, the inverse kinematics method, which solves for the rotation angle of each joint in the limb kinematic chain given the positions of the extremity and the fixed end; in another example, the inference module 150 can infer the position coordinates of the other occluded nodes of the right arm from the position coordinates of the right wrist, based on the constraints that human joint motion is limited, that arm length is fixed, and that the positions of limb nodes change continuously.
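• the per-frame decision of steps 803-806 can be summarized by the following sketch, where infer_displacement stands in for the trained speculation model and the numbers are made up:

```python
import numpy as np

def track_node(prev_pos, detected_pos, motion_interval, infer_displacement):
    """One iteration of the FIG. 8 loop for a single sensor-equipped node:
    if the detector saw the node, use its coordinates; otherwise add the
    model-inferred displacement for the last interval to the previous
    (known or previously inferred) position."""
    if detected_pos is not None:               # step 803: not occluded
        return np.asarray(detected_pos, dtype=float)
    disp = infer_displacement(motion_interval) # step 805
    return prev_pos + disp                     # step 806

# Stand-in model: displacement = simple scaling of the mean motion features.
infer = lambda m: m.mean(axis=0)[:2] * 0.04
pos = np.array([120.0, 80.0])
for det, imu in [(None, np.ones((4, 6))), ([123.0, 81.0], np.ones((4, 6)))]:
    pos = track_node(pos, det, imu, infer)
    print(pos)
```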
• Fig. 9 shows another schematic flowchart of a limb posture estimation method according to an embodiment of the present application.
  • One or more modules of the limb posture estimation apparatus 100 in Fig. 2 may implement different blocks or other parts of the method.
  • the method for inferring body posture may include:
• Step 901: collect image data of the user's movement through the image acquisition module 110, where the image data may include multiple image frames. Examples of the image acquisition module 110 may be, but are not limited to, a video camera, a camera, and the like.
• Step 902: through the image processing module 120, perform node identification on the user in the multiple image frames collected by the image acquisition module 110, for example, but not limited to, identifying the user's skeletal nodes, such as the head, wrists, elbows, shoulders, knees, and ankles, through skeletal node recognition technology.
• Step 903: through the image processing module 120, determine whether any of the multiple image frames contains an occluded limb node; if so, perform step 904, and if not, perform step 908. As an example, the image processing module 120 may compare the node recognition results of the multiple image frames with the complete set of human body nodes to determine, for each image frame, whether any limb nodes are occluded and which ones they are.
• Step 904: through the image processing module 120, for a limb node that is occluded in at least one of the multiple image frames, determine among the multiple image frames the image frame at the unoccluded moment before the node became occluded and the image frame at the moment it becomes unoccluded again. Between these two image frames there may be at least one image frame at an occluded moment, and among the image frames at the unoccluded moment, the occluded moments, and the re-unoccluded moment, the collection moments of two temporally adjacent image frames may be separated by a predetermined time interval, which may be a multiple of the reciprocal of the image acquisition frame rate, for example, but not limited to, 1, 2, or 3 times the reciprocal of the frame rate.
• Step 905: obtain the movement data of at least one limb node of the user's limb through the movement data acquisition module 130, such as, but not limited to, acceleration, angular velocity, movement direction, motion mode, etc., including the movement data of the at least one limb node that is occluded in at least one of the multiple image frames.
• Step 906: for a limb node that is occluded in at least one of the multiple image frames, through the inference module 150, use the speculation model to infer the displacement of the node between the collection moments of two image frames based on its movement data between those moments, where the two image frames are temporally adjacent frames among the image frame at the unoccluded moment, the image frames at the occluded moments, and the image frame at the re-unoccluded moment.
• Step 907: for a limb node that is occluded in at least one of the multiple image frames, through the inference module 150, determine the position of the node in the image frame at each occluded moment, based on its position in the image frame at the unoccluded moment and the displacements between the collection moments of the two image frames determined in step 906. In another example, the position of the node in the image frame at each occluded moment may instead be determined based on its position in the image frame at the re-unoccluded moment and the displacements determined in step 906.
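• the following sketch shows one way the forward pass (from the unoccluded anchor) and the backward pass (from the re-unoccluded anchor) could be combined, here by simple averaging (an assumption; the embodiments describe using either anchor on its own):

```python
import numpy as np

def fill_occluded(pos_start, pos_end, disps):
    """pos_start: node position at the last unoccluded frame; pos_end: at the
    first re-unoccluded frame; disps: (T, 2) inferred displacements for the T
    consecutive intervals spanning the gap. Returns the T-1 occluded
    positions, averaging a forward and a backward reconstruction."""
    fwd = pos_start + np.cumsum(disps[:-1], axis=0)           # from the left anchor
    bwd = pos_end - np.cumsum(disps[1:][::-1], axis=0)[::-1]  # from the right anchor
    return (fwd + bwd) / 2.0

start, end = np.array([0.0, 0.0]), np.array([0.3, 0.6])
disps = np.array([[0.1, 0.2], [0.1, 0.2], [0.1, 0.2]])  # t7-t8, t8-t9, t9-t10
print(fill_occluded(start, end, disps))  # positions at the occluded moments t8, t9
```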
• Step 908: through the inference module 150, determine the posture of the limb in the image frame at the occluded moment based on the positions of the limb's nodes in that image frame.
• for an image frame in which at least one limb node of the user's limb is occluded, if only some of the occluded limb nodes wear sensors, the inference module 150 can also infer the positions of the other occluded limb nodes in that image frame based on the positions of the sensor-equipped nodes, for example through, but not limited to, the inverse kinematics method, and then determine the posture of the user's limb in that image frame.
• in the embodiments of the present application, when at least one limb node of the user's limb is not occluded, the motion data and displacement of the at least one limb node are used to train the speculation model; because there is a direct correspondence between the motion data and the displacement, the accuracy and robustness of the speculation model according to the embodiments of the present application will be higher than in the prior art, where manually guessed possible postures of the occluded parts serve as the training labels.
• further, in non-real-time estimation scenarios, using a bidirectional recurrent network, the movement data of a limb node after the occluded moment and its position at the moment it becomes unoccluded again serve as posterior knowledge for inferring the node's displacement, which can improve the accuracy of displacement estimation.
• further, using the user's motion mode as prior knowledge to estimate the position of at least one limb node at the occluded moment can improve the accuracy of displacement estimation.
• further, by integrating the user's speculation model with those of other users, the accuracy of the estimation of the displacement of the user's limb nodes can be improved, especially when the user's training data is scarce and the user's own speculation model therefore performs poorly.
• FIG. 10 shows a schematic structural diagram of a limb posture estimation device 1000 according to an embodiment of the present application.
• the device 1000 may include one or more processors 1002, system control logic 1008 connected to at least one of the processors 1002, system memory 1004 connected to the system control logic 1008, non-volatile memory (NVM) 1006 connected to the system control logic 1008, and a network interface 1010 connected to the system control logic 1008.
  • the processor 1002 may include one or more single-core or multi-core processors.
  • the processor 1002 may include any combination of a general-purpose processor and a special-purpose processor (for example, a graphics processor, an application processor, a baseband processor, etc.).
• the processor 1002 may be configured to execute the methods according to the various embodiments shown in FIGS. 7-9.
  • system control logic 1008 may include any suitable interface controller to provide any suitable interface to at least one of the processors 1002 and/or any suitable device or component in communication with the system control logic 1008.
  • system control logic 1008 may include one or more memory controllers to provide an interface to the system memory 1004.
  • the system memory 1004 can be used to load and store data and/or instructions.
  • the memory 1004 of the device 1000 may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
  • the NVM/memory 1006 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions.
• the NVM/memory 1006 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as at least one of an HDD (hard disk drive), a CD (compact disc) drive, and a DVD (digital versatile disc) drive.
• the NVM/memory 1006 may include a portion of the storage resources installed on the apparatus 1000, or it may be accessible by the device without necessarily being a part of the device.
  • the NVM/storage 1006 can be accessed through the network via the network interface 1010.
• the system memory 1004 and the NVM/memory 1006 may respectively include a temporary copy and a permanent copy of the instructions 1020. The instructions 1020 may include instructions that, when executed by at least one of the processors 1002, cause the apparatus 1000 to implement the methods shown in FIGS. 7-9.
  • instructions 1020, hardware, firmware, and/or software components thereof may additionally/alternatively be placed in system control logic 1008, network interface 1010, and/or processor 1002.
  • the network interface 1010 may include a transceiver to provide a radio interface for the device 1000, and then communicate with any other suitable equipment (such as a front-end module, an antenna, etc.) through one or more networks.
  • the network interface 1010 may be integrated with other components of the device 1000.
  • the network interface 1010 may be integrated in at least one of the processor 1002, the system memory 1004, the NVM/storage 1006, and a firmware device (not shown) with instructions.
  • the network interface 1010 may further include any suitable hardware and/or firmware to provide a multiple input multiple output radio interface.
  • the network interface 1010 may be a network adapter, a wireless network adapter, a telephone modem and/or a wireless modem.
  • At least one of the processors 1002 may be packaged with the logic of one or more controllers used for the system control logic 1008 to form a system in package (SiP). In one embodiment, at least one of the processors 1002 may be integrated on the same die with the logic of one or more controllers used for the system control logic 1008 to form a system on chip (SoC).
  • the device 1000 may further include: an input/output (I/O) interface 1012.
• the I/O interface 1012 may include a user interface to enable the user to interact with the device 1000, and a peripheral component interface designed so that peripheral components can also interact with the device 1000.
  • the device 1000 further includes a sensor for determining at least one of environmental conditions and location information related to the device 1000.
• the user interface may include, but is not limited to, a display (e.g., a liquid crystal display or a touch screen display), speakers, a microphone, one or more cameras (e.g., still image cameras and/or video cameras), a flashlight (e.g., an LED flash), and a keyboard.
  • the peripheral component interface may include, but is not limited to, a non-volatile memory port, an audio jack, and a power interface.
• the sensor may include, but is not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit.
  • the positioning unit may also be part of or interact with the network interface 1010 to communicate with components of the positioning network (eg, global positioning system (GPS) satellites).
• as used herein, "module" or "unit" may refer to, be, or include an application-specific integrated circuit (ASIC), an electronic circuit, a (shared, dedicated, or group) processor and/or memory that executes one or more software or firmware programs, combinational logic circuits, and/or other suitable components that provide the described functionality.
  • the various embodiments of the mechanism disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation methods.
• the embodiments of the present application may be implemented as a computer program or program code executed on a programmable system, where the programmable system includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
• program code may be applied to input instructions to perform the functions described in this application and to generate output information, and the output information may be applied to one or more output devices in a known manner.
  • a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
  • the program code can be implemented in a high-level programming language or an object-oriented programming language to communicate with the processing system.
  • assembly language or machine language can also be used to implement the program code.
  • the mechanism described in this application is not limited to the scope of any particular programming language. In either case, the language can be a compiled language or an interpreted language.
  • the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof.
• one or more aspects of at least some embodiments may be implemented by representative instructions stored on a computer-readable storage medium, which represent various logic in the processor and which, when read by a machine, cause the machine to fabricate logic to perform the techniques described in this application. Such representations, known as "IP cores", can be stored on a tangible computer-readable storage medium and supplied to various customers or production facilities to be loaded into the fabrication machines that actually make the logic or processor.
• such computer-readable storage media may include, but are not limited to, non-transitory tangible arrangements of articles manufactured or formed by a machine or device, including storage media such as hard disks; any other type of disk, including floppy disks, optical disks, compact disc read-only memories (CD-ROMs), compact disc rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, and electrically erasable programmable read-only memories (EEPROMs); phase change memories (PCMs); magnetic or optical cards; or any other type of medium suitable for storing electronic instructions.
• therefore, each embodiment of the present application also includes a non-transitory computer-readable storage medium containing instructions or containing design data, such as hardware description language (HDL), which defines the structures, circuits, devices, processors, and/or system characteristics described in the present application.


Abstract

A method, apparatus, medium, and system for determining the position of at least one limb node of a user, wherein the method comprises: when the at least one limb node is not occluded, determining, according to the position of the at least one limb node at a first moment and its position at a second moment, a first displacement of the at least one limb node within a first time period between the first moment and the second moment; acquiring first motion data related to the motion of the at least one limb node within the first time period; and training a speculation model at least partially according to the first displacement and the first motion data, wherein the speculation model is used to speculate, when the at least one limb node is occluded, the occluded position of the at least one limb node. By training the speculation model with motion data and displacements between which there is a direct correspondence, the method can improve the accuracy and robustness of the speculation model.

Description

一种用户的肢体节点的位置确定方法、装置、介质及系统
本申请要求2019年12月25日递交的申请号为201911358174.4、发明名称为“一种用户的肢体节点的位置确定方法、装置、介质及系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请的一个或多个实施例通常涉及人工智能领域,具体涉及一种用户的肢体节点的位置确定方法、装置、介质及系统。
背景技术
基于图像的人体姿态识别中,经常遇到部分肢体不可见(被遮挡、超出摄像头视野等)的情况,此时该部分肢体的姿态通常难以准确估计,给上层的应用带来不便。
在现有技术中,通常直接通过深度学习(如神经网络)猜测被遮挡肢体的位置。具体方法为,在训练样本集中,人工标注被遮挡部位的可能姿态,通过训练让模型学习这些被遮挡的情况,并在使用过程中,直接推测被遮挡部分肢体的姿态。
发明内容
以下从多个方面介绍本申请,以下多个方面的实施方式和有益效果可互相参考。
第一方面,本申请实施例提供了一种确定用户的至少一个肢体节点的位置的方法,该方法包括在至少一个肢体节点未被遮挡的情况下,根据至少一个肢体节点在第一时刻的位置和在第二时刻的位置,确定至少一个肢体节点在第一时刻和第二时刻之间的第一时间段内的第一位移;获取与至少一个肢体节点在第一时间段内的运动相关的第一运动数据;至少部分地根据第一位移和第一运动数据,训练推测模型,其中推测模型用于在至少一个肢体节点被遮挡的情况下,推测至少一个肢体节点的被遮挡位置。
根据本申请的实施例,在用户肢体的至少一个肢体节点未被遮挡的情况下,利用用户肢体的至少一个肢体节点的运动数据和位移来训练推测模型,由于用户肢体的至少一个肢体节点的运动数据和位移之间存在直接的对应关系,因此,相对于现有技术中在训练推测模型时以人工猜测被遮挡部位的可能姿态作为训练标签,本申请实施例的推测模型的准确性和鲁棒性会更高。
在一些实施例中,第一运动数据包括第一加速度、第一角速度、第一运动方向以及第一运动模式中的至少一个。
在一些实施例中,根据至少一个肢体节点在第一时刻的位置和在第二时刻的位置,确定至少一个肢体节点在第一时刻和第二时刻之间的第一时间段内的第一位移,还包括:在第一时刻获取第一图像帧,并且在第二时刻获取第二图像帧;根据第一图像帧 中至少一个肢体节点的位置和在第二图像帧中至少一个肢体节点的位置,确定至少一个肢体节点在第一时间段内的第一位移。
在一些实施例中,至少部分地根据第一位移和第一运动数据,训练推测模型,还包括:至少部分地将第一运动数据作为特征输入并且将第一位移作为目标类别,训练推测模型。
在一些实施例中,推测模型包括循环神经网络(recurrent neural network,RNN)、长短期记忆(long short-term memory,LSTM)网络、门控循环单元(gated recurrent unit,GRU)网络、双向循环神经网络(bidirectional recurrent neural network,BRNN)中的至少一种。
在一些实施例中,该方法还包括:在至少一个肢体节点从未被遮挡到被遮挡的情况下,获取与第二时间段内的运动相关的第二运动数据,其中第二时间段包括至少一个肢体节点在未被遮挡的时刻到被遮挡的时刻之间的时间段;利用推测模型,基于第二运动数据,推测至少一个肢体节点在第二时间段内的第二位移;至少部分地基于第二位移以及至少一个肢体节点在未被遮挡的时刻的未被遮挡位置,确定至少一个肢体节点在被遮挡的情况下的被遮挡位置。
根据本申请的实施例,基于推测模型,利用用户肢体的至少一个肢体节点在从未被遮挡到被遮挡的时间段内的运动数据来推测用户肢体的至少一个肢体节点在该时间段内的位移,进而得到至少一个肢体节点在被遮挡的情况下的被遮挡位置,由于用户肢体的至少一个肢体节点的运动数据和位移之间存在直接的对应关系,因此,相对于现有技术中利用人工猜测的被遮挡部位的可能姿态,根据本申请实施例推测的位移的准确性更高。
在一些实施例中,第二运动数据包括第二加速度、第二角速度、第二运动方向以及第二运动模式中的至少一个。
在一些实施例中,第二时间段的长度与第一时间段的长度相同。
在一些实施例中,该方法还包括:在至少一个肢体节点在从未被遮挡经被遮挡到再次未被遮挡的情况下,获取与第三时间段内的运动相关的第三运动数据,其中,第三时间段包括在未被遮挡的时刻和再次未被遮挡的时刻之间的时间段;利用推测模型,基于第三运动数据,推测至少一个肢体节点在第三时间段内的第三位移;至少部分地基于第三位移以及至少一个肢体节点在未被遮挡的时刻的未被遮挡位置和在再次未被遮挡的时刻的再次未被遮挡位置中的至少一个,确定至少一个肢体节点在被遮挡的情况下的被遮挡位置。
根据本申请的实施例,以肢体节点在被遮挡时刻和再次未被遮挡的时刻之间的运动数据作为后验知识,推测肢体节点在未被遮挡时刻和被遮挡时刻之间的位移,可以提高位移推测的准确性。
在一些实施例中,第三运动数据包括第三加速度、第三角速度、第三运动方向以及第三运动模式中的至少一个。
在一些实施例中,其中第三时间段的长度与第一时间段的长度相同。
在一些实施例中,该方法还包括:接收用于至少一个其他用户的其他推测模型,其中其他推测模型用于在至少一个其他用户的至少一个肢体节点被遮挡的情况下,推 测至少一个其他用户的至少一个肢体节点的被遮挡位置;对推测模型和其他推测模型进行集成,并获取集成的推测模型;在用户的至少一个肢体节点被遮挡的情况下,利用集成的推测模型推测至少一个肢体节点的被遮挡位置。
根据本申请的实施例,通过对用户的推测模型和其他用户的推测模型进行集成,可以提升对用户肢体节点的位移的推测精度,尤其是在用户的训练数据较少导致用户的推测模型的推测性能较差的情况下。
第二方面,本申请实施例提供了一种确定用户的至少一个肢体节点的位置的方法,该方法包括:在至少一个肢体节点从未被遮挡到被遮挡的情况下,获取与第一时间段内的运动相关的第一运动数据,其中第一时间段包括至少一个肢体节点在未被遮挡的时刻到被遮挡的时刻之间的时间段;利用推测模型,基于第一运动数据,推测至少一个肢体节点在第一时间段内的第一位移;至少部分地基于第一位移以及至少一个肢体节点在未被遮挡的时刻的未被遮挡位置,确定至少一个肢体节点在被遮挡的时刻的被遮挡位置。
根据本申请的实施例,基于推测模型,利用用户肢体的至少一个肢体节点在从未被遮挡到被遮挡的时间段内的运动数据来推测用户肢体的至少一个肢体节点在该时间段内的位移,进而得到至少一个肢体节点在被遮挡的时刻的被遮挡位置,由于用户肢体的至少一个肢体节点的运动数据和位移之间存在直接的对应关系,因此,相对于现有技术中利用人工猜测的被遮挡部位的可能姿态,根据本申请实施例推测的位移的准确性更高。
在一些实施例中,第一运动数据包括第一加速度、第一角速度、第一运动方向以及第一运动模式中的至少一个。
在一些实施例中,推测模型包括至少部分地基于至少一个肢体节点在第二时间段内的第二运动数据和第二位移训练的模型,其中至少一个肢体节点在第二时间段内未被遮挡,并且第二时间段的长度与第一时间段的长度相同。
在一些实施例中,第二运动数据包括第二加速度、第二角速度、第二运动方向以及第二运动模式中的至少一个
在一些实施例中,推测模型包括循环神经网络、长短期记忆网络、门控循环单元中的至少一种。
在一些实施例中,至少部分地基于第一位移以及至少一个肢体节点在未被遮挡时刻的未被遮挡位置,确定至少一个肢体节点在被遮挡时刻的被遮挡位置,还包括:在未被遮挡的时刻获取至少一个肢体节点的未被遮挡图像帧,并根据未被遮挡图像帧确定未被遮挡位置。
第三方面,本申请实施例提供了一种确定用户的至少一个肢体节点的位置的方法,
该方法包括,在用户的至少一个肢体节点从未被遮挡经被遮挡到再次未被遮挡的情况下,获取与第一时间段内的运动相关的第一运动数据,其中第一时间段包括在未被遮挡的时刻和再次未被遮挡的时刻之间的时间段;利用推测模型,基于第一运动数据,推测至少一个肢体节点在第一时间段内的第一位移;至少部分地基于第一位移以及至少一个肢体节点在未被遮挡的时刻的未被遮挡位置和再次未被遮挡的时刻的再次未被遮挡位置中的至少一个,确定至少一个肢体节点在被遮挡的时刻的被遮挡位置。
根据本申请的实施例,基于推测模型,利用用户肢体的至少一个肢体节点在从未被遮挡到再次未被遮挡的时间段内的运动数据来推测用户肢体的至少一个肢体节点在该时间段内的位移,进而得到至少一个肢体节点在被遮挡的时刻的被遮挡位置,由于用户肢体的至少一个肢体节点的运动数据和位移之间存在直接的对应关系,因此,相对于现有技术中利用人工猜测的被遮挡部位的可能姿态,根据本申请实施例推测的位移的准确性更高,另外,以肢体节点在被遮挡时刻和再次未被遮挡的时刻之间的运动数据作为后验知识,推测肢体节点在未被遮挡时刻和被遮挡时刻之间的位移,可以进一步提高位移推测的准确性。
在一些实施例中,第一运动数据包括第一加速度、第一角速度、第一运动方向以及第一运动模式中的至少一个。
在一些实施例中,推测模型包括至少部分地基于至少一个肢体节点在第二时间段内的第二运动数据和第二位移训练的模型,其中至少一个肢体节点在第二时间段内未被遮挡,并且其中第二时间段的长度与从未被遮挡的时刻到被遮挡的时刻之间的时间段的长度相同,和/或,第二时间段的长度与从被遮挡的时刻到再次未被遮挡的时刻之间的时间段的长度相同。
在一些实施例中,第二运动数据包括第二加速度、第二角速度、第二运动方向以及第二运动模式中的至少一个。
在一些实施例中,推测模型包括双向循环神经网络。
在一些实施例中,第一位移包括从未被遮挡位置到被遮挡位置的位移和从被遮挡位置到再次未被遮挡位置的位移中的至少一个。
在一些实施例中,至少部分地基于第一位移以及至少一个肢体节点在未被遮挡的时刻的未被遮挡位置和再次未被遮挡的时刻的再次未被遮挡位置中的至少一个,确定至少一个肢体节点在被遮挡情况下的被遮挡位置,还包括:获取至少一个肢体节点在未被遮挡的时刻的未被遮挡图像帧,并根据未被遮挡图像帧确定未被遮挡位置;和/或获取至少一个肢体节点在再次未被遮挡的时刻的再次未被遮挡图像帧,并根据再次未被遮挡图像帧确定再次未被遮挡位置。
第四方面,本申请实施例提供了一种计算机可读存储介质,在该计算机可读存储上存储有指令,当指令在机器上运行时,使得机器执行以上任意一种方法。
第五方面,本申请实施例提供了一种确定用户的至少一个肢体节点的位置的系统,该系统包括:处理器;存储器,在存储器上存储有指令,当指令被处理器运行时,使得处理器执行以上任意一种方法。
第六方面,本申请实施例提供了一种确定用户的至少一个肢体节点的位置的装置,该装置包括:图像处理模块,用于在至少一个肢体节点未被遮挡的情况下,根据至少一个肢体节点在第一时刻的位置和在第二时刻的位置,确定至少一个肢体节点在第一时刻和第二时刻之间的第一时间段内的第一位移;运动数据获取模块,获取与至少一个肢体节点在第一时间段内的运动相关的第一运动数据;推测模型训练模块,至少部分地根据第一位移和第一运动数据,训练推测模型,其中推测模型用于在至少一个肢体节点被遮挡的情况下,推测至少一个肢体节点的被遮挡位置。
根据本申请的实施例,在用户肢体的至少一个肢体节点未被遮挡的情况下,利用 用户肢体的至少一个肢体节点的运动数据和位移来训练推测模型,由于用户肢体的至少一个肢体节点的运动数据和位移之间存在直接的对应关系,因此,相对于现有技术中在训练推测模型时以人工猜测被遮挡部位的可能姿态作为训练标签,本申请实施例的推测模型的准确性和鲁棒性会更高。
在一些实施例中,第一运动数据包括第一加速度、第一角速度、第一运动方向以及第一运动模式中的至少一个。
在一些实施例中,所述装置还包括图像采集模块,用于在所述第一时刻获取第一图像帧,并且在所述第二时刻获取第二图像帧;并且其中,所述图像处理模块根据所述第一图像帧中所述至少一个肢体节点的位置和在所述第二图像帧中所述至少一个肢体节点的位置,确定所述至少一个肢体节点在所述第一时间段内的所述第一位移。
在一些实施例中,推测模型训练模块用于至少部分地根据第一位移和第一运动数据,训练推测模型,包括用于:至少部分地将第一运动数据作为特征输入并且将第一位移作为目标类别,训练推测模型。
在一些实施例中,推测模型包括循环神经网络、长短期记忆网络、门控循环单元网络、双向循环神经网络中的至少一种。
在一些实施例中,运动数据获取模块还用于,在至少一个肢体节点从未被遮挡到被遮挡的情况下,获取与第二时间段内的运动相关的第二运动数据,其中第二时间段包括至少一个肢体节点在未被遮挡的时刻到被遮挡的时刻之间的时间段;和装置还包括推测模块,推测模块用于,利用推测模型,基于第二运动数据,推测至少一个肢体节点在第二时间段内的第二位移;以及推测模块还用于,至少部分地基于第二位移以及至少一个肢体节点在未被遮挡的时刻的未被遮挡位置,确定至少一个肢体节点在被遮挡的情况下的被遮挡位置。
根据本申请的实施例,基于推测模型,利用用户肢体的至少一个肢体节点在从未被遮挡到被遮挡的时间段内的运动数据来推测用户肢体的至少一个肢体节点在该时间段内的位移,进而得到至少一个肢体节点在被遮挡的情况下的被遮挡位置,由于用户肢体的至少一个肢体节点的运动数据和位移之间存在直接的对应关系,因此,相对于现有技术中利用人工猜测的被遮挡部位的可能姿态,根据本申请实施例推测的位移的准确性更高。
在一些实施例中,第二运动数据包括第二加速度、第二角速度、第二运动方向以及第二运动模式中的至少一个。
在一些实施例中,第二时间段的长度与第一时间段的长度相同。
在一些实施例中,运动数据获取模块还用于,在至少一个肢体节点在从未被遮挡经被遮挡到再次未被遮挡的情况下,获取与第三时间段内的运动相关的第三运动数据,其中,第三时间段包括在未被遮挡的时刻和再次未被遮挡的时刻之间的时间段;装置还包括推测模块,推测模块用于,利用推测模型,基于第三运动数据,推测至少一个肢体节点在第三时间段内的第三位移;以及推测模块还用于,至少部分地基于第三位移以及至少一个肢体节点在未被遮挡的时刻的未被遮挡位置和在再次未被遮挡的时刻的再次未被遮挡位置中的至少一个,确定至少一个肢体节点在被遮挡的情况下的被遮挡位置。
根据本申请的实施例,以肢体节点在被遮挡时刻和再次未被遮挡的时刻之间的运动数据作为后验知识,推测肢体节点在未被遮挡时刻和被遮挡时刻之间的位移,可以提高位移推测的准确性。
在一些实施例中,第三运动数据包括第三加速度、第三角速度、第三运动方向以及第三运动模式中的至少一个。
在一些实施例中,其中第三时间段的长度与第一时间段的长度相同。
在一些实施例中,装置还包括通信模块,用于接收用于至少一个其他用户的其他推测模型,其中其他推测模型用于在至少一个其他用户的至少一个肢体节点被遮挡的情况下,推测至少一个其他用户的至少一个肢体节点的被遮挡位置;和推测模型训练模块还用于,对推测模型和其他推测模型进行集成,并获取集成的推测模型;以及推测模块还用于,在用户的至少一个肢体节点被遮挡的情况下,利用集成的推测模型推测至少一个肢体节点的被遮挡位置。
根据本申请的实施例,通过对用户的推测模型和其他用户的推测模型进行集成,可以提升对用户肢体节点的位移的推测精度,尤其是在用户的训练数据较少导致用户的推测模型的推测性能较差的情况下。
第七方面,本申请实施例提供了一种确定用户的至少一个肢体节点的位置的装置,该装置包括:运动数据获取模块,用于在至少一个肢体节点从未被遮挡到被遮挡的情况下,获取与第一时间段内的运动相关的第一运动数据,其中第一时间段包括至少一个肢体节点在未被遮挡的时刻到被遮挡的时刻之间的时间段;推测模块,用于利用推测模型,基于第一运动数据,推测至少一个肢体节点在第一时间段内的第一位移;推测模块还用于,至少部分地基于第一位移以及至少一个肢体节点在未被遮挡的时刻的未被遮挡位置,确定至少一个肢体节点在被遮挡的时刻的被遮挡位置。
根据本申请的实施例,基于推测模型,利用用户肢体的至少一个肢体节点在从未被遮挡到被遮挡的时间段内的运动数据来推测用户肢体的至少一个肢体节点在该时间段内的位移,进而得到至少一个肢体节点在被遮挡的时刻的被遮挡位置,由于用户肢体的至少一个肢体节点的运动数据和位移之间存在直接的对应关系,因此,相对于现有技术中利用人工猜测的被遮挡部位的可能姿态,根据本申请实施例推测的位移的准确性更高。
在一些实施例中,第一运动数据包括第一加速度、第一角速度、第一运动方向以及第一运动模式中的至少一个。
在一些实施例中,推测模型包括至少部分地基于至少一个肢体节点在第二时间段内的第二运动数据和第二位移训练的模型,其中至少一个肢体节点在第二时间段内未被遮挡,并且第二时间段的长度与第一时间段的长度相同。
在一些实施例中,第二运动数据包括第二加速度、第二角速度、第二运动方向以及第二运动模式中的至少一个。
在一些实施例中,推测模型包括循环神经网络、长短期记忆网络、门控循环单元中的至少一种。
在一些实施例中,该装置还包括图像采集模块和图像处理模块,其中,图像采集模块用于在未被遮挡的时刻获取至少一个肢体节点的未被遮挡图像帧,图像处理模块 用于根据未被遮挡图像帧确定未被遮挡位置。
第八方面,本申请实施例提供了一种确定用户的至少一个肢体节点的位置的装置,该装置包括:运动数据获取模块,用于在用户的至少一个肢体节点从未被遮挡经被遮挡到再次未被遮挡的情况下,获取与第一时间段内的运动相关的第一运动数据,其中第一时间段包括在未被遮挡的时刻和再次未被遮挡的时刻之间的时间段;推测模块,用于利用推测模型,基于第一运动数据,推测至少一个肢体节点在第一时间段内的第一位移;推测模块还用于,至少部分地基于第一位移以及至少一个肢体节点在未被遮挡的时刻的未被遮挡位置和再次未被遮挡的时刻的再次未被遮挡位置中的至少一个,确定至少一个肢体节点在被遮挡的时刻的被遮挡位置。
根据本申请的实施例,基于推测模型,利用用户肢体的至少一个肢体节点在从未被遮挡到再次未被遮挡的时间段内的运动数据来推测用户肢体的至少一个肢体节点在该时间段内的位移,进而得到至少一个肢体节点在被遮挡的时刻的被遮挡位置,由于用户肢体的至少一个肢体节点的运动数据和位移之间存在直接的对应关系,因此,相对于现有技术中利用人工猜测的被遮挡部位的可能姿态,根据本申请实施例推测的位移的准确性更高,另外,以肢体节点在被遮挡时刻和再次未被遮挡的时刻之间的运动数据作为后验知识,推测肢体节点在未被遮挡时刻和被遮挡时刻之间的位移,可以进一步提高位移推测的准确性。
在一些实施例中,第一运动数据包括第一加速度、第一角速度、第一运动方向以及第一运动模式中的至少一个。
在一些实施例中,推测模型包括至少部分地基于至少一个肢体节点在第二时间段内的第二运动数据和第二位移训练的模型,其中至少一个肢体节点在第二时间段内未被遮挡,并且其中第二时间段的长度与从未被遮挡的时刻到被遮挡的时刻之间的时间段的长度相同,和/或,第二时间段的长度与从被遮挡的时刻到再次未被遮挡的时刻之间的时间段的长度相同。
在一些实施例中,第二运动数据包括第二加速度、第二角速度、第二运动方向以及第二运动模式中的至少一个。
在一些实施例中,推测模型包括双向循环神经网络。
在一些实施例中,第一位移包括从未被遮挡位置到被遮挡位置的位移和从被遮挡位置到再次未被遮挡位置的位移中的至少一个。
在一些实施例中,装置还包括:图像采集模块,用于获取至少一个肢体节点在未被遮挡的时刻的未被遮挡图像帧,和/或,获取至少一个肢体节点在再次未被遮挡的时刻的再次未被遮挡图像帧;图像处理模块,用于根据未被遮挡图像帧确定未被遮挡位置,和/或,根据再次未被遮挡图像帧确定再次未被遮挡位置。
附图说明
图1示出根据本申请实施例的肢体姿态推测的原理示意图;
图2示出了根据本申请实施例的肢体姿态推测装置的一种结构示意图;
图3示出了根据本申请实施例的肢体未被遮挡时的图像帧序列的示意图;
图4示出了根据本申请实施例的循环神经网络的一种结构示意图;
图5示出了根据本申请实施例的双向循环神经网络的一种结构示意图;
图6A示出了根据本申请实施例的包括肢体被遮挡图像帧的图像帧序列的一种示意图;
图6B示出了根据本申请实施例的包括肢体被遮挡图像帧的图像帧序列的另一种示意图;
图7示出了根据本申请实施例的用于肢体姿态推测的推测模型的训练方法的一种流程示意图;
图8示出了根据本申请实施例的肢体姿态推测方法的一种流程示意图;
图9示出了根据本申请实施例的肢体姿态推测方法的另一种流程示意图;
图10示出了根据本申请实施例的肢体姿态推测系统的一种结构示意图。
具体实施方式
下面结合具体实施例和附图对本申请做进一步说明。此处描述的具体实施例仅仅是为了解释本申请,而非对本申请的限定。此外,为了便于描述,附图中仅示出了与本申请相关的部分而非全部的结构或过程。应注意的是,在本说明书中,相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。
图1示出了根据本申请实施例的肢体姿态推测的原理示意图,其中,肢体姿态指的是肢体呈现的姿势或状态,肢体姿态可以通过肢体的多个肢体节点的位置来确定,其中,肢体节点可以包括,但不限于,骨骼节点,例如,在肢体是手臂的情况下,肢体节点可以包括,但不限于,手、手腕、手肘以及肩膀。
如图1所示,肢体姿态推测包括两个阶段,即推测模型的训练阶段和推测阶段,其中,推测模型的训练阶段包括在用户肢体的至少一个肢体节点未被遮挡的情况下(例如,但不限于,未被其他物体遮挡或者未超出图像的采集范围),获取肢体的至少一个肢体节点在两个未被遮挡时刻的位置10,并以此确定肢体的至少一个肢体节点在该两个未被遮挡时刻之间的位移20,其中,未被遮挡时刻可以包括,但不限于,肢体的至少一个肢体节点未被遮挡的图像帧的采集时刻,肢体的至少一个肢体节点在未被遮挡时刻的位置10可以包括,但不限于,肢体的至少一个肢体节点在未被遮挡时刻采集的图像帧中的位置。推测模型的训练阶段还包括获取与肢体的至少一个肢体节点在该两个未被遮挡时刻之间的运动相关的运动数据30,其中,肢体的至少一个肢体节点的运动指的是肢体的至少一个肢体节点的非静止状态,与肢体的至少一个肢体节点的运动相关的运动数据30可以包括,但不限于体现至少一个肢体节点的上述运动状态的数据,例如,运动数据30可以包括,但不限于,加速度、角速度、运动方向等中的至少一个,其中,加速度、角速度和运动方向等可以通过例如佩戴在肢体的至少一个肢体节点上的传感器(例如,但不限于,加速度传感器、陀螺仪、磁力计等)采集获得,或者通过其他方式获得,运动数据30还可以包括运动模式,运动模式指的是用户正在进行的肢体动作的类型,例如,但不限于,跳跃动作、深蹲动作、手臂上举动作等,运动模式可以作为先验知识而获得。推测模型的训练阶段还包括利用肢体的至少一个肢体节点在该两个未被遮挡时刻之间的位移20以及运动数据30训练推测模型40。
推测模型的推测阶段包括在用户肢体的至少一个肢体节点被遮挡的情况下(例如,但不限于,被其他物体遮挡或者超出图像的采集范围),利用推测模型40,基于肢体的至少一个肢体节点从未被遮挡时刻到被遮挡时刻的运动数据30,推测肢体的至少一个肢体节点从未被遮挡时刻到被遮挡时刻的位移20,其中,遮挡时刻可以包括,但不限于,肢体的至少一个肢体节点被遮挡的图像帧的采集时刻。推测模型的推测阶段还包括根据肢体的至少一个肢体节点从未被遮挡时刻到被遮挡时刻的位移20,以及肢体的至少一个肢体节点在未被遮挡时刻的位置10,确定肢体的至少一个肢体节点在被遮挡时刻的位置50,最后,根据肢体的至少一个肢体节点在被遮挡时刻的位置50确定肢体在被遮挡时刻的姿态60,其中,肢体的至少一个肢体节点在被遮挡时刻的位置50可以包括,但不限于,肢体的至少一个肢体节点在被遮挡时刻采集的图像帧中的被遮挡位置。
图2示出了根据本申请实施例的肢体姿态推测装置100的一种结构示意图,如图2所示,肢体姿态推测装置100包括,但不限于,图像采集模块110、图像处理模块120、运动数据获取模块130、推测模型训练模块140、推测模块150以及可选的通信模块160。其中,肢体姿态推测装置100的一个或多个组件(例如,图像采集模块110、图像处理模块120、运动数据获取模块130、推测模型训练模块140、推测模块150以及通信模块160中的一个或多个),可以由专用集成电路(ASIC)、电子电路、执行一个或多个软件或固件程序的(共享、专用或组)处理器和/或存储器、组合逻辑电路、提供所描述的功能的其他合适的组件的任意组合构成。根据一个方面,处理器可以是微处理器、数字信号处理器、微控制器等,和/或其任何组合。根据另一个方面,所述处理器可以是单核处理器,多核处理器等,和/或其任何组合。
根据本申请的一些实施例,图像采集模块110用于采集用户的图像数据,其中图像数据可以包括多个图像帧,图像采集模块110的示例可以是,但不限于,摄像机、照相机等。
根据本申请的一些实施例,图像处理模块120用于通过,但不限于,骨骼节点识别技术,对图像采集模块110采集的多个图像帧中的用户进行节点识别。图像处理模块120还用于在用户肢体的至少一个肢体节点未被遮挡的情况下,确定用户肢体的至少一个肢体节点在多个图像帧中的位置(例如,坐标),并以此确定用户肢体的至少一个肢体节点在两个图像帧的采集时刻之间的位移,其中,该两个图像帧的采集时刻可以具有预定的时间间隔,该预定的时间间隔可以是图像采集帧率的倒数的倍数,例如,但不限于,帧率倒数的1倍、2倍、3倍等。
根据本申请的一些实施例,运动数据获取模块130用于获取图1中的运动数据30,在运动数据30包括加速度、角速度、运动方向等中的至少一个的情况下,运动数据获取模块130可以包括但不限于,佩戴在用户肢体的至少一个肢体节点上的至少一个传感器,例如,但不限于,加速度传感器、陀螺仪、磁力计等,其中,加速度传感器用于获取肢体节点的加速度,陀螺仪用于获取肢体节点的角速度,磁力计用于获取肢体节点的运动方向等。另外,运动数据获取模块130的时间可以与图像采集模块110的时间同步。
根据本申请的另一些实施例,在运动数据30还包括运动模式的情况下,运动数据 获取模块130可以基于用户接收的指令而获知用户当前的运动模式,其中,该指令要求用户进行某个类型的肢体工作,并且该指令可以来自肢体姿态推测装置100,也可以来自其他装置。在另一种示例中,运动数据获取模块130可以通过利用图像处理模块120确定的多个图像帧中未被遮挡的肢体节点的位置,确定用户当前的运动模式。
根据本申请的一些实施例,推测模型训练模块140用于在用户肢体的至少一个肢体节点未被遮挡的情况下,根据从图像处理模块120获取的用户肢体的至少一个肢体节点在两个图像帧的采集时刻之间的位移,以及从运动数据获取模块130获取的用户肢体的至少一个肢体节点在该两个图像帧的采集时刻之间的运动数据,训练推测模型。需要说明的是,推测模型训练模块140可以获取与多组图像帧相关的多个位移和运动数据,其中,每组图像帧包括两个图像帧,并且这两个图像帧的采集时刻具有以上所述的预定的时间间隔。推测模型的示例可以包括,但不限于,循环神经网络(recurrent neural network,RNN)、长短期记忆(long short-term memory,LSTM)网络、门控循环单元(gated recurrent unit,GRU)网络、双向循环神经网络(bidirectional recurrent neural network,BRNN)中的至少一种。
根据本申请的一些实施例,推测模型训练模块140还用于对用户的推测模型和其他用户的推测模型进行集成,以提升推测模型的推测精度。
根据本申请的一些实施例,推测模块150用于在用户肢体的至少一个肢体节点被遮挡的情况下,根据从传感器模块13获取的肢体的至少一个肢体节点从未被遮挡时刻到被遮挡时刻的运动数据,推测肢体的至少一个肢体节点从未被遮挡时刻到被遮挡时刻的位移,并且根据肢体的至少一个肢体节点在未被遮挡时刻的图像帧中的位置,确定肢体的至少一个肢体节点在被遮挡时刻的图像帧中的位置。其中,未被遮挡时刻和被遮挡时刻之间的时间段与以上所述的预定的时间间隔相同。
根据本申请的一些实施例,推测模块150还用于根据肢体的至少一个肢体节点在被遮挡时刻的图像帧中的位置确定肢体在被遮挡时刻的图像帧中的姿态。
根据本申请的一些实施例,通信模块160用于向外部服务器发送用户的推测模型,以及从外部服务器接收至少一个其他用户的推测模型。
以下,参考图3-图6进一步介绍图2中肢体姿态推测装置100的多个模块的功能。
根据本申请的一些实施例,图像处理模块120可以确定用户肢体的至少一个肢体节点在两个图像帧的采集时刻之间的位移。图3示出了通过图像处理模块120对图像帧序列F1-F5进行节点识别的结果,以用户的肢体为右手臂作为举例,右手臂包括四个节点,右手、右手腕、右手肘以及右肩膀,在右手腕佩戴传感器的情况下,图2的图像处理模块120可以确定右手腕(在图3中以灰色示出的节点)在图像帧F1-F5中的位置坐标,并确定右手腕在图像帧F1-F5中的两个图像帧的采集时刻之间的位移。在一种示例中,两个图像帧的采集时刻之间的时间间隔可以是帧率的倒数的1倍,即图2的图像处理模块120可以确定右手腕在图像帧F1和F2的采集时刻之间的位移s1、图像帧F2和F3的采集时刻之间的位移s2、图像帧F3和F4的采集时刻之间的位移s3以及图像帧F4和F5的采集时刻之间的位移s4。在另一种示例中,两个图像帧的采集时刻之间的时间间隔可以是帧率的倒数的2倍,图2的图像处理模块120可以确定右手腕在图像帧F1和F3的采集时刻之间的位移s5、图像帧F3和F5的采集时刻之间的 位移s6。
需要说明的是,右手臂的其他节点,例如右手、右手肘以及右肩膀中的一个或多个也可以佩戴传感器,并且图2的图像处理模块120同样也可以确定这些节点在多个图像帧中的位置以及在两个图像帧的采集时刻之间的位移。另外,经图像采集模块采集的图像帧的数量以及图像帧中用户的姿态不限于图3中所示出的。
根据本申请的一些实施例,推测模型训练模块140可以从运动数据获取模块130获取用户肢体的至少一个肢体节点在两个图像帧的采集时刻之间的运动数据,例如,但不限于,加速度、角速度、运动方向等。例如,在图3的示例中,图2的推测模型训练模块140可以从运动数据获取模块130获取右手腕在图像帧F1和F2的采集时刻之间的运动数据x1、图像帧F2和F3的采集时刻之间的运动数据x2、图像帧F3和F4的采集时刻之间的运动数据x3以及图像帧F4和F5的采集时刻之间的运动数据x4。然后,推测模型训练模块140可以以用户肢体的至少一个肢体节点在两个图像帧的采集时刻之间的运动数据作为训练数据,以用户肢体的至少一个肢体节点在该两个图像帧的采集时刻之间的位移作为训练标签,训练推测模型。
在一种示例中,推测模型训练模块140可以通过反向传播算法(Back Propagation,BP)、牛顿梯度下降算法或其他算法来训练循环神经网络。图4示出了循环神经网络的一种结构示意图,如图4所示,循环神经网络包括t(t为正整数)个神经元A 1-A t,第t个神经元的输入为x t,输出为y t,隐藏状态为h t,在本申请的实施例中,第t个神经元的输入x t可以包括用户肢体的一个节点在两个图像帧的采集时刻之间的运动数据例如,但不限于,加速度、角速度、运动方向等,输出y t可以包括用户肢体的一个节点在该两个图像帧的采集时刻之间的位移。在循环神经网络中,第t组神经元的输出y t和隐藏状态h t可以通过以下公式计算:
h t=f(Ux t+Wh t-1)             公式1
y t=g(Vh t)            公式2
其中,h t-1表示第t-1个神经元的隐藏状态,f和g均为激活函数,其中f可以是tanh、relu、sigmoid等激活函数,g可以是softmax等激活函数,U表示与输入相关的权重,W表示与隐藏状态相关的权重,V表示与输出相关的权重。因此,在循环神经网络中,一个神经元的输出不仅与该神经元的输入相关,还与该神经元的前一个神经元的隐藏状态相关。
以反向传播算法为例,推测模型训练模块140在训练循环神经网络时,对于用户肢体的一个节点,推测模型训练模块140可以初始化循环神经网络的隐藏状态和权重参数,以该节点与多组图像帧相关的多组运动数据分别作为循环神经网络的各个神经元的输入,得到各个神经元的输出,即该节点与多组图像帧相关的多个位移,然后推测模型训练模块140可以根据各个神经元输出的位移与通过图像处理模块120确定的各个真实位移之间的误差值,反向优化循环神经网络的权重参数。例如,在图3的示例中,图2的推测模型训练模块140可以将右手腕的运动数据x1、x2、x3、x4分别作为图4中的循环神经网络的第1至4个神经元的输入,并根据右手腕的真实位移s1、s2、s3、s4来优化循环神经网络的权重参数。
需要说明的是,推测模型训练模块140可以基于类似的原理训练肢体的佩戴传感 器的其他节点的推测模型,并且推测模型训练模块140也可以基于类似的原理训练长短期记忆(long short-term memory,LSTM)网络、门控循环单元(gated recurrent unit,GRU)网络或者其他具有记忆能力的神经网络。
在另一种示例中,推测模型训练模块140可以通过反向传播算法(Back Propagation,BP)、牛顿梯度下降算法或其他算法来训练双向循环神经网络。图5示出了双向循环神经网络的一种结构示意图,如图5所示,双向循环神经网络由两个方向相反的循环神经网络上下叠加在一起组成,包括t+1(t为正整数)组神经元(A 1,A' 1)~(A t+1,A' t+1),第t组神经元的输入为x t,输出为y t,正向隐藏状态为h t,反向隐藏状态为h' t,在本申请的实施例中,第t组神经元的输入x t可以包括用户肢体的一个节点在两个图像帧的采集时刻之间的运动数据,例如,但不限于,加速度、角速度、运动方向等,输出y t可以包括用户肢体的一个节点在该两个图像帧的采集时刻之间的位移。在循环神经网络中,第t组神经元的输出y t、隐藏状态h t和隐藏状态h' t可以通过以下公式计算:
h t=f(Ux t+Wh t-1)            公式3
h' t=f(U'x t+W'h' t+1)            公式4
y t=g(Vh t+V'h' t)            公式5
其中,h t-1表示第t-1组神经元的正向隐藏状态,h' t+1表示第t+1组神经元的反向隐藏状态,f和g均为激活函数,其中f可以是tanh、relu、sigmoid等激活函数,g可以是softmax等激活函数,U表示正向循环神经网络的与输入相关的权重,U'表示反向循环神经网络的与输入相关的权重,W表示正向循环神经网络的与隐藏状态相关的权重,W'表示反向循环神经网络的与隐藏状态相关的权重,V表示正向循环神经网络的与输出相关的权重,V'表示反向循环神经网络的与输出相关的权重。因此,在双向循环神经网络中,一组神经元的输出不仅与该组神经元的输入相关,还与该组神经元的前后两组神经元的隐藏状态相关。
推测模型训练模块140对双向循环神经网络的训练过程可以参照上述对循环神经网络的训练过程,在此不再赘述。
根据本申请的另一些实施例,在运动数据30还包括运动模式的情况下,对于一个运动模式,推测模型训练模块140可以使用通过运动数据获取模块130获取的、肢体节点在该运动模式下的加速度、角速度、运动方向等数据,以上述实施例中描述的方式,训练肢体节点在该运动模式下的推测模型。
根据本申请的一些实施例,推测模块150可以在用户肢体的至少一个肢体节点被遮挡的情况下,推测肢体的至少一个肢体节点在被遮挡时刻的图像帧中的位置。图6A示出了图像处理模块120对图像帧序列F6-F9进行节点识别后的结果,如图所示,以用户的右手腕(在图6中以灰色节点示出)为例,用户的右手腕在图像帧F6的采集时刻t6和图像帧F7的采集时刻t7未被遮挡,在图像帧F8的采集时刻t8和图像帧F9的采集时刻t9被遮挡,那么推测模块150可以利用推测模型,基于用户的右手腕在t7时刻和t8时刻之间的运动数据(例如,但不限于,加速度、角速度、运动方向等)来推测用户的右手腕在t7时刻和t8时刻之间的位移,基于用户的右手腕在t8时刻和t9时刻之间的运动数据(例如,但不限于,加速度、角速度、运动方向等)来推测用 户的右手腕在t8时刻和t9时刻之间的位移。例如,在图4循环神经网络的示例中,令t=2,推测模块150可以将用户的右手腕在t7时刻和t8时刻之间的运动数据(例如,但不限于,加速度、角速度、运动方向等)以及在t8时刻和t9时刻之间的运动数据(例如,但不限于,加速度、角速度、运动方向等)分别作为神经元A1和A2的输入,那么该两个神经元的输出可以分别包括用户的右手腕在t7时刻和t8时刻之间以及在t8时刻和t9时刻之间的位移。
进一步地,推测模块150可以基于通过图像处理模块120确定的用户的右手腕在t7时刻的图像帧中的位置坐标,以及用户的右手腕在t7时刻和t8时刻之间的位移,确定用户的右手腕在t8时刻的图像帧中的位置坐标;同样地,推测模块150可以基于用户的右手腕在t8时刻的图像帧中的位置坐标,以及用户的右手腕在t8时刻和t9时刻之间的位移,确定用户的右手腕在t9时刻的图像帧中的位置坐标。
需要说明的是,由于在循环神经网络中,一个神经元的输出与该神经元的输入和该神经元的前一神经元的隐藏状态相关,因此推测模块150可以使用循环神经网络实时地推测肢体的至少一个肢体节点在被遮挡时刻的图像帧中的位置,另外,循环神经网络也可以用于肢体节点位置的非实时推测场景中。
图6B示出了图6A中的图像帧序列还包括图像帧F10的情况,如图所示,用户的右手腕在图像帧F10的采集时刻t10再次未被遮挡,那么推测模块150可以利用推测模型,基于用户的右手腕在t7时刻和t8时刻之间、在t8时刻和t9时刻之间以及在t9时刻和t10时刻之间的运动数据(例如,但不限于,加速度、角速度、运动方向等)来推测用户的右手腕在t7时刻和t8时刻之间以及在t8时刻和t9时刻之间的位移。例如,在图6双向循环神经网络的示例中,令t=2,那么推测模块150可以将用户的右手腕在t7时刻和t8时刻之间、在t8时刻和t9时刻之间以及在t9时刻和t10时刻之间的运动数据(例如,但不限于,加速度、角速度、运动方向等)分别作为神经元组(A1,A' 1)~(A3,A' 3)的输入,那么该三个神经元组的输出可以分别包括用户的右手腕在t7时刻和t8时刻之间、在t8时刻和t9时刻之间以及在t9时刻和t10时刻之间的位移。
进一步地,推测模块150可以基于通过图像处理模块120确定的用户的右手腕在t7时刻图像帧中的位置坐标,以及用户的右手腕在t7时刻和t8时刻之间、在t8时刻和t9时刻之间的位移,确定用户的右手腕在t8时刻和t9时刻的图像帧中的位置坐标;或者,可以基于通过图像处理模块120确定的用户的右手腕在t10时刻图像帧中的位置坐标,以及用户的右手腕在t9时刻和t10时刻之间、在t8时刻和t9时刻之间的位移,确定用户的右手腕在t9时刻和t8时刻的图像帧中的位置坐标。
需要说明的是,由于在双向循环神经网络中,一个神经元组的输出既与该神经元组的输入和该神经元组的前一神经元组的隐藏状态相关,又需要该神经元组的后一神经元组的隐藏状态作为后验知识,因此推测模块150可以使用双向循环神经网络非实时地推测肢体的至少一个肢体节点在被遮挡时刻的图像帧中的位置。
根据本申请的另一些实施例,在用于训练推测模型的运动数据30还包括运动模式的情况下,推测模块150可以以通过运动数据获取模块130获取的用户当前的运动模式作为先验知识来推测用户肢体的至少一个肢体节点在被遮挡时刻的位置。具体地, 推测模块150可以通过用户当前的运动模式,确定与该运动模式相对应的推测模型,然后利用推测模型,以上述实施例中描述的方式,推测用户肢体的至少一个肢体节点在被遮挡时刻的位置。
根据本申请的另一些实施例,在用于训练推测模型的运动数据30不包括运动模式的情况下,推测模块150也可以以通过运动数据获取模块130获取的用户当前的运动模式作为先验知识来推测用户肢体的至少一个肢体节点在被遮挡时刻的位置。具体地,在用户肢体的至少一个肢体节点被遮挡的情况下,推测模块150可以利用推测模型,基于从运动数据获取模块130获取的用户肢体的至少一个肢体节点的运动数据(例如,但不限于,加速度、角速度、运动方向等),获得多个位移的推测(或分类)概率,推测模块150可以减小该多个位移中不符合用户当前的运动模式的位移的概率,并相应地增大符合用户当前的运动模式的位移的概率,最终输出具有最大推测概率的位移。
需要说明的是,推测模块150可以基于与上述类似的原理推测用户右手腕的其他节点在被遮挡时刻的图像帧中的位置坐标,并由此确定用户的右手臂在被遮挡时刻的图像帧中的姿态。进一步地,在用户右手臂的被遮挡的至少一个肢体节点中,如果只有部分节点佩戴了传感器,那么推测模块150也可以基于该部分节点在被遮挡时刻的图像帧中的位置坐标,推测用户右手臂的被遮挡的其他节点在被遮挡时刻的图像帧中的位置坐标,进而确定用户的右手臂在被遮挡时刻的图像帧中的姿态。例如,在用户的右手臂均被遮挡并且只有用户的右手腕佩戴了传感器的情况下,在一种示例中,推测模块150可以通过,反向运动学方法(Inverse Kinematics,IK),基于用户右手腕在被遮挡时刻的图像帧中的位置坐标,确定用户右手臂的被遮挡的其他节点在被遮挡时刻的图像帧中的位置坐标,其中,反向运动学方法通过给定肢端和固定端的位置求解肢体运动链上各关节的转角;在另一种示例中,推测模块150可以基于人体关节运动的限制、手臂长度是固定的以及肢体节点的位置是连续变化的这几个约束,利用用户右手腕在被遮挡时刻的图像帧中的位置坐标,来推测用户右手臂的被遮挡的其他节点在被遮挡时刻的图像帧中的位置坐标。
需要说明的是,由于肢体的一个肢体节点的运动数据(例如,但不限于,加速度、角速度、运动方向等)与该肢体的各个肢体节点的位移相关联,因此,在构成肢体的多个肢体节点中存在未佩戴传感器的肢体节点的情况下,为了确定肢体在被遮挡时刻的姿态,在肢体的至少一个肢体节点(包括佩戴传感器的肢体节点和未佩戴传感器的肢体节点)未被遮挡的情况下,对于肢体的一个佩戴传感器的肢体节点(例如,右手臂的右手腕),推测模型训练模块140可以利用通过图像处理模块120确定的、该佩戴传感器的肢体节点和未佩戴传感器的肢体节点(例如,右手臂的右手肘)在两个图像帧的采集时刻之间的位移,以及从运动数据获取模块130获得的、该佩戴传感器的肢体节点在两个图像帧的采集时刻之间的运动数据(例如,但不限于,加速度、角速度、运动方向等),训练推测模型;在肢体的至少一个肢体节点(包括佩戴传感器的肢体节点和未佩戴传感器的肢体节点)被遮挡的情况下,对于肢体的一个佩戴传感器的肢体节点(例如,右手臂的右手腕),推测模块150可以利用该佩戴传感器的肢体节点的推测模型,推测该佩戴传感器的肢体节点以及未佩戴传感器的肢体节点(例如,右手臂的右手肘)从未被遮挡时刻到被遮挡时刻的位移,进而确定该佩戴传感器的肢 体节点以及未佩戴传感器的肢体节点在遮挡时刻的位置,以及肢体在被遮挡时刻的姿态(例如,右手臂的姿态)。
需要说明的是,经图像采集模块110采集的图像帧的数量以及图像帧中用户的姿态不限于图6A和图6B中所示出的,并且推测模块150可以基于与上述类似的原理推测用户其他肢体的至少一个肢体节点在被遮挡时刻的图像帧中的位置坐标。
根据本申请的实施例,在用户肢体的至少一个肢体节点未被遮挡的情况下,推测模块150也可以通过推测模型推测用户肢体的至少一个肢体节点在未被遮挡时刻的图像帧中的位置坐标,推测模型训练模块140可以将其与通过图像处理模块120确定的用户肢体的至少一个肢体节点在未被遮挡时刻的图像帧中的位置坐标进行对比,并获得推测模型的推测精度。在一种示例中,对于与用户肢体的一个节点对应的推测模型,推测模型训练模块140可以计算通过该推测模型推测的该节点在一个图像帧中的位置坐标与通过图像处理模块120确定的该节点在该图像帧中的位置坐标之间的距离(例如,但不限于,欧式距离、余弦距离等),并根据与多个图像帧相关的多个距离计算推测模型的推测精度,例如,以这些距离的均值、最大值、中位数等作为推测模型的推测精度。
进一步地,推测模型训练模块140可以通过通信模块160,将用户的身体参数(例如,但不限于,肢体各部位的长度)、用户肢体的至少一个肢体节点的推测模型以及推测模型的推测精度,发送给外部服务器。外部服务器可以返回与用户的身体参数相近的其他用户的推测模型,其中,其他用户的推测模型用于在其他用户的肢体的至少一个肢体节点在被遮挡的情况下,推测其他用户的肢体的至少一个肢体节点在被遮挡时刻的位置,并且其他用户的推测模型的推测精度大于或等于预定的精度值。
进一步地,在用户肢体的至少一个肢体节点未被遮挡的情况下,推测模型训练模块140可以对用户的推测模型和其他用户的推测模型进行集成,并且在用户肢体的至少一个肢体节点被遮挡的情况下,推测模块150可以使用集成的推测模型推测用户肢体的至少一个肢体节点在被遮挡时刻的位置。在一种示例中,推测模型训练模块140可以基于Bagging算法(bootstrap aggregating,引导聚集算法)进行集成,Bagging算法通过组合多个模型来减少泛化误差的技术,其原理是单独训练多个不同的模型,然后按照规则对多个模型在测试集的输出进行票选,例如,以多个模型的输出的平均值作为最后的输出,其中,在本申请的实施例中,测试集可以包括在肢体的至少一个肢体节点未被遮挡的情况下,肢体的至少一个肢体节点在两个图像帧的采集时刻之间的运动数据,并且推测模型训练模块140可以根据通过图像处理模块120确定的肢体的至少一个肢体节点在该两个图像帧的采集时刻之间的真实位移来优化票选规则。
在本申请的实施例中,在用户肢体的至少一个肢体节点未被遮挡的情况下,利用用户肢体的至少一个肢体节点的运动数据和位移来训练推测模型,由于用户肢体的至少一个肢体节点的运动数据和位移之间存在直接的对应关系,因此,相对于现有技术中在训练推测模型时以人工猜测被遮挡部位的可能姿态作为训练标签,根据本申请实施例的推测模型的准确性和鲁棒性会更高。
进一步地,在本申请的实施例中,在肢体姿态的非实时推测场景下,通过使用双向循环网络,以肢体节点在被遮挡时刻之后的运动数据以及肢体节点在再次未被遮挡 时刻的位置作为后验知识,推测肢体节点的位移,可以提高位移推测的准确性。
进一步地,在本申请的实施例中,以用户的运动模式作为先验知识来推测用户肢体的至少一个肢体节点在被遮挡时刻的位置,可以提高位移推测的准确性。
进一步地,在本申请的实施例中,通过对用户的推测模型和其他用户的推测模型进行集成,可以提升对用户肢体节点的位移的推测精度,尤其是在用户的训练数据较少导致用户的推测模型的推测性能较差的情况下。
图7示出了根据本申请实施例的用于肢体姿态推测的推测模型的训练方法的一种流程示意图,图2中的肢体姿态推测装置100的一个或多个模块可以实施该方法的不同块或其他部分。对于上述装置实施例中未描述的内容,可以参见下述方法实施例,同样,对于方法实施例中未描述的内容,可参见上述装置实施例。如图7所示,用于肢体姿态推测的推测模型的训练方法可以包括:
步骤701,在用户肢体的至少一个肢体节点未被遮挡的情况下,通过图像采集模块110,采集用户运动的图像数据,其中图像数据可以包括多个图像帧,图像采集模块110的示例可以是,但不限于,摄像机、照相机等。
步骤702,通过图像处理模块120,对图像采集模块110采集的多个图像帧中的用户进行节点识别,例如,但不限于,通过骨骼节点识别技术,识别用户的骨骼节点,诸如头部、手腕、手肘、肩膀、膝盖、脚踝等。
步骤703,通过图像处理模块120,确定用户肢体的至少一个肢体节点在多个图像帧中的位置以及在两个图像帧的采集时刻之间的位移。
其中,用户肢体的至少一个肢体节点在多个图像帧中的位置可以包括,但不限于,用户肢体的至少一个肢体节点在多个图像帧中的相对坐标。其中,两个图像帧的采集时刻可以具有预定的时间间隔,该预定的时间间隔可以是图像采集帧率的倒数的倍数,例如,但不限于,帧率倒数的1倍、2倍、3倍等。需要说明的是,图像处理模块120可以确定用户肢体的至少一个肢体节点与多组图像帧相关的位移,其中,每组图像帧包括两个图像帧,并且这两个图像帧的采集时刻具有以上所述的预定的时间间隔。
步骤704,通过运动数据获取模块130,获取用户肢体的至少一个肢体节点的运动数据,例如,但不限于,加速度、角速度、运动方向、运动模式等。
步骤705,通过推测模型训练模块140,根据从图像处理模块120获取的用户肢体的至少一个肢体节点在两个图像帧的采集时刻之间的位移,以及从运动数据获取模块130获取的用户肢体的至少一个肢体节点在该两个图像帧的采集时刻之间的运动数据,训练推测模型。
需要说明的是,推测模型训练模块140可以获取与多组图像帧相关的多个位移和运动数据,其中,每组图像帧包括两个图像帧,并且这两个图像帧的采集时刻具有以上所述的预定的时间间隔。
推测模型的示例可以包括,但不限于,循环神经网络(recurrent neural network,RNN)、长短期记忆(long short-term memory,LSTM)网络、门控循环单元(gated recurrent unit,GRU)网络、双向循环神经网络(bidirectional recurrent neural network,BRNN)中的至少一种。另外,对推测模型的具体训练过程可以参照以上与推测模型训练模块140相关的描述,在此不再赘述。
Step 706: when at least one limb node of the user's limb is not occluded, determine the inference precision of the inference model through the inference model training module 140, and send the user's body parameters (for example, but not limited to, the lengths of the limb segments), the inference model of the at least one limb node, and the inference precision of the model to an external server through the communication module 160.
In one example, when at least one limb node of the user's limb is not occluded, the inference module 150 may use the inference model to infer the position coordinates of the at least one limb node in the image frame at the unoccluded moment, and the inference model training module 140 may compare these with the position coordinates determined by the image processing module 120 and obtain the inference precision of the model. For example, for the inference model corresponding to one node of the user's limb, the inference model training module 140 may compute the distance (for example, but not limited to, Euclidean distance or cosine distance) between the position coordinates of the node in an image frame inferred by the model and those determined by the image processing module 120, and compute the inference precision of the model from the distances associated with multiple image frames, for example taking the mean, maximum, or median of those distances as the inference precision.
Step 707: through the communication module 160, receive from the external server inference models of other users whose body parameters are close to the user's, where an inference model of another user is used to infer, when at least one limb node of that other user's limb is occluded, the position of that limb node at the occluded moment, and the inference precision of the received models is greater than or equal to a predetermined precision value.
Step 708: through the inference model training module 140, ensemble the user's inference model with the inference model of at least one other user, and obtain the ensembled inference model.
In one example, the inference model training module 140 may perform the ensembling based on the Bagging (bootstrap aggregating) algorithm. Bagging is a technique that reduces generalization error by combining multiple models: several different models are trained separately, and their outputs on a test set are then voted on according to a rule, for example taking the average of the models' outputs as the final output. In the embodiments of the present application, the test set may include motion data of at least one limb node of the limb between the capture instants of two image frames while the node is not occluded, and the inference model training module 140 may optimize the voting rule according to the true displacement of the node between those two capture instants as determined by the image processing module 120.
FIG. 8 shows a schematic flowchart of a limb pose inference method according to an embodiment of the present application; one or more modules of the limb pose inference apparatus 100 in FIG. 2 may implement different blocks or other parts of the method. For content not described in the apparatus embodiments above, refer to the method embodiments below; likewise, for content not described in the method embodiments, refer to the apparatus embodiments above. As shown in FIG. 8, the limb pose inference method may include:
Step 801: capture image data of the user's motion through the image acquisition module 110, where the image data may include image frames; examples of the image acquisition module 110 may be, but are not limited to, a video camera, a still camera, and the like.
Step 802: through the image processing module 120, perform node recognition on the user in the current image frame, for example, but not limited to, recognizing the user's skeleton nodes, such as the head, wrists, elbows, shoulders, knees, and ankles, through a skeleton node recognition technique.
Step 803: through the image processing module 120, determine whether any limb node is occluded in the current image frame; if so, perform step 804; otherwise, perform step 807.
As an example, the image processing module 120 may compare the node recognition result of the current image frame with the complete set of human body nodes, to determine whether any node of the user's limb is occluded in the current frame, and which nodes are occluded.
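A minimal sketch of this comparison (the node names in `FULL_SKELETON` are assumed; a real system would use whatever node set its skeleton recognizer outputs):

```python
FULL_SKELETON = {
    "head", "left_shoulder", "right_shoulder", "left_elbow",
    "right_elbow", "left_wrist", "right_wrist", "left_knee",
    "right_knee", "left_ankle", "right_ankle",
}

def occluded_nodes(detected_nodes):
    """Compare the recognition result of the current frame with the
    complete node set; whatever is missing is treated as occluded."""
    return FULL_SKELETON - set(detected_nodes)
```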
Step 804: through the image processing module 120, determine the image frame preceding the current image frame, and determine the motion data of the occluded at least one limb node of the user's limb between the capture instants of the current frame and the preceding frame, for example, but not limited to, acceleration, angular velocity, movement direction, motion pattern, and the like.
The capture instants of the preceding frame and the current frame have a predetermined time interval, which may be a multiple of the inverse of the image capture frame rate, for example, but not limited to, 1, 2, or 3 times the inverse of the frame rate. In addition, the position of the at least one limb node in the preceding frame is known: if the at least one limb node was not occluded in the preceding frame, its position there can be determined by the image processing module 120; if it was occluded in the preceding frame, its position there can be determined by the inference module 150 according to this embodiment.
Step 805: through the inference module 150, use the inference model, for example, but not limited to, a recurrent neural network, to infer the displacement of the occluded at least one limb node between the capture instants of the current and preceding frames, based on the motion data of the node between those capture instants.
It should be noted that, for the specific inference process using the inference model, refer to the description relating to the inference module 150 above, which is not repeated here.
Step 806: through the inference module 150, determine the position of the occluded at least one limb node in the current frame, based on its position in the preceding frame and the displacement between the capture instants of the current and preceding frames determined in step 805.
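Steps 805 and 806 amount to one model call plus a vector addition; a minimal sketch, assuming a model object with a `predict` method (an interface introduced here for illustration):

```python
import numpy as np

def occluded_position(prev_position, motion_window, model):
    """Infer the displacement over the interval between the preceding
    and the current frame from the motion data (step 805), then add it
    to the node's known position in the preceding frame (step 806)."""
    displacement = model.predict(motion_window)   # assumed interface
    return np.asarray(prev_position) + np.asarray(displacement)
```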
Step 807: through the inference module 150, determine the pose of the limb based on the positions of the limb's nodes in the current frame.
Among the occluded limb nodes of the user's limb, if only some of the nodes wear sensors, the inference module 150 may also infer the positions of the other occluded limb nodes in the current frame based on the positions of those sensor-wearing nodes in the current frame, and thereby determine the pose of the user's limb in the current frame. For example, when the user's entire right arm is occluded and only the user's right wrist wears a sensor, the inference module 150 may determine the positions of the other limb nodes of the right arm in the current frame through, but not limited to, an inverse kinematics method, which solves the joint angles along the limb's kinematic chain given the positions of the limb end and the fixed end. In another example, the inference module 150 may, based on the constraints that human joint motion is limited, that arm length is fixed, and that node positions vary continuously, use the position coordinates of the user's right wrist in the image frame at the occluded moment to infer the position coordinates of the other occluded nodes of the right arm in that frame.
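For the two-dimensional case, the elbow position can be recovered in closed form as the intersection of two circles centered on the shoulder and the wrist; the sketch below is illustrative only, and the `bend_sign` parameter stands in for the joint-limit and continuity constraints mentioned above:

```python
import numpy as np

def elbow_from_wrist(shoulder, wrist, upper_arm, forearm, bend_sign=1.0):
    """2-D two-link inverse kinematics: given the fixed end (shoulder)
    and the limb end (wrist), recover the elbow as the intersection of
    circles of radii `upper_arm` and `forearm` (fixed segment lengths).

    bend_sign selects one of the two geometrically valid elbows; joint
    limits or continuity with earlier frames would normally decide.
    """
    s = np.asarray(shoulder, dtype=float)
    w = np.asarray(wrist, dtype=float)
    d = np.clip(np.linalg.norm(w - s), 1e-9, upper_arm + forearm)
    a = (upper_arm**2 - forearm**2 + d**2) / (2 * d)
    h = np.sqrt(max(upper_arm**2 - a**2, 0.0))
    u = (w - s) / d                         # unit vector shoulder -> wrist
    perp = np.array([-u[1], u[0]])          # its 2-D normal
    return s + a * u + bend_sign * h * perp
```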
FIG. 9 shows a schematic flowchart of a limb pose inference method according to an embodiment of the present application; one or more modules of the limb pose inference apparatus 100 in FIG. 2 may implement different blocks or other parts of the method. For content not described in the apparatus embodiments above, refer to the method embodiments below; likewise, for content not described in the method embodiments, refer to the apparatus embodiments above. As shown in FIG. 9, the limb pose inference method may include:
Step 901: capture image data of the user's motion through the image acquisition module 110, where the image data may include multiple image frames; examples of the image acquisition module 110 may be, but are not limited to, a video camera, a still camera, and the like.
Step 902: through the image processing module 120, perform node recognition on the user in the multiple image frames captured by the image acquisition module 110, for example, but not limited to, recognizing the user's skeleton nodes, such as the head, wrists, elbows, shoulders, knees, and ankles, through a skeleton node recognition technique.
Step 903: through the image processing module 120, determine whether any of the multiple image frames contains an occluded limb node; if so, perform step 904; otherwise, perform step 908.
As an example, the image processing module 120 may compare the node recognition results of the multiple image frames with the complete set of human body nodes, to determine, for each of the multiple frames, whether any limb node is occluded and which limb nodes are occluded.
Step 904: through the image processing module 120, for a limb node occluded in at least one of the multiple image frames, determine among the multiple frames the frame at the unoccluded moment before the node became occluded and the frame at the re-unoccluded moment.
Between the frame at the node's unoccluded moment and the frame at the re-unoccluded moment there may be at least one frame at an occluded moment, and among the frames at the unoccluded moment, the occluded moments, and the re-unoccluded moment, the capture instants of any two temporally adjacent frames may have a predetermined time interval, which may be a multiple of the inverse of the image capture frame rate, for example, but not limited to, 1, 2, or 3 times the inverse of the frame rate.
Step 905: through the motion data acquisition module 130, acquire motion data of at least one limb node of the user's limb, for example, but not limited to, acceleration, angular velocity, movement direction, motion pattern, and the like, including the motion data of the at least one limb node occluded in at least one of the multiple image frames.
Step 906: for a limb node occluded in at least one of the multiple image frames, through the inference module 150, use the inference model to infer the node's displacement between the capture instants of two image frames, based on the node's motion data between those capture instants.
The two image frames here are two temporally adjacent frames among the frames at the unoccluded moment, the occluded moments, and the re-unoccluded moment.
It should be noted that, for the specific inference process using the inference model, refer to the description relating to the inference module 150 above, which is not repeated here.
Step 907: for a limb node occluded in at least one of the multiple image frames, through the inference module 150, determine the node's position in the frame(s) at the at least one occluded moment, based on its position in the frame at the unoccluded moment and the displacements between the capture instants of the two image frames determined in step 906.
In another example, for a limb node occluded in at least one of the multiple image frames, the inference module 150 may determine the node's position in the frame(s) at the at least one occluded moment based on its position in the frame at the re-unoccluded moment and the displacements between the capture instants of the two image frames determined in step 906.
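One plausible (purely illustrative) way to combine step 907 with this alternative is to integrate the inferred displacements forward from the last unoccluded position and backward from the re-unoccluded position, then blend the two tracks; nothing in the weighting scheme below is prescribed by the application:

```python
import numpy as np

def fuse_occluded_track(pos_before, pos_after, step_displacements):
    """Blend a forward pass from the unoccluded position with a
    backward pass from the re-unoccluded position.

    step_displacements[i]: inferred displacement between temporally
    adjacent frames; the first entry leaves pos_before and the last
    entry arrives at pos_after, so n occluded frames need n+1 entries.
    Returns the (n, 2) positions of the occluded frames.
    """
    disp = np.asarray(step_displacements, dtype=float)         # (n+1, 2)
    fwd = np.asarray(pos_before, float) + np.cumsum(disp, axis=0)[:-1]
    bwd = np.asarray(pos_after, float) - np.cumsum(disp[::-1], axis=0)[::-1][1:]
    n = len(fwd)
    w = (np.arange(1, n + 1) / (n + 1))[:, None]   # trust backward more later
    return (1 - w) * fwd + w * bwd
```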
Step 908: through the inference module 150, determine the pose of the limb in the frame at the occluded moment, based on the positions of the limb's nodes in that frame.
For an image frame in which at least one limb node of the user's limb is occluded, if only some of the occluded limb nodes wear sensors, the inference module 150 may also infer the positions of the other occluded limb nodes in that frame based on the positions of the sensor-wearing nodes in the frame, for example through, but not limited to, an inverse kinematics method, and thereby determine the pose of the user's limb in that frame.
In the embodiments of the present application, when at least one limb node of the user's limb is not occluded, the inference model is trained with the motion data and displacement of that limb node. Because there is a direct correspondence between the motion data and the displacement, the inference model according to the embodiments of the present application achieves higher accuracy and robustness than the prior art, in which manually guessed poses of the occluded parts serve as training labels.
Further, in the embodiments of the present application, in a non-real-time limb pose inference scenario, using a bidirectional recurrent network and taking the motion data of the limb node after the occluded moment and the position of the limb node at the re-unoccluded moment as posterior knowledge to infer the displacement of the limb node can improve the accuracy of displacement inference.
Further, in the embodiments of the present application, using the user's motion pattern as prior knowledge to infer the position of at least one limb node of the user's limb at the occluded moment can improve the accuracy of displacement inference.
Further, in the embodiments of the present application, ensembling the user's inference model with other users' inference models can improve the inference precision for the displacement of the user's limb nodes, especially when the user's training data is scarce and the user's own inference model therefore performs poorly.
FIG. 10 shows a schematic structural diagram of an apparatus 1000 for limb pose inference according to an embodiment of the present application. The apparatus 1000 may include one or more processors 1002, system control logic 1008 connected to at least one of the processors 1002, system memory 1004 connected to the system control logic 1008, non-volatile memory (NVM) 1006 connected to the system control logic 1008, and a network interface 1010 connected to the system control logic 1008.
The processor 1002 may include one or more single-core or multi-core processors. The processor 1002 may include any combination of general-purpose and dedicated processors (for example, graphics processors, application processors, baseband processors, and the like). In the embodiments of the present application, the processor 1002 may be configured to perform one or more of the various embodiments shown in FIGS. 7-9.
In some embodiments, the system control logic 1008 may include any suitable interface controller to provide any suitable interface to at least one of the processors 1002 and/or to any suitable device or component communicating with the system control logic 1008.
In some embodiments, the system control logic 1008 may include one or more memory controllers to provide an interface to the system memory 1004. The system memory 1004 may be used to load and store data and/or instructions. In some embodiments, the memory 1004 of the apparatus 1000 may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
The NVM/storage 1006 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the NVM/storage 1006 may include any suitable non-volatile memory such as flash memory, and/or any suitable non-volatile storage device, for example at least one of an HDD (Hard Disk Drive), a CD (Compact Disc) drive, and a DVD (Digital Versatile Disc) drive.
The NVM/storage 1006 may include part of the storage resources installed on the apparatus 1000, or it may be accessible by the device without necessarily being part of the device. For example, the NVM/storage 1006 may be accessed over a network via the network interface 1010.
In particular, the system memory 1004 and the NVM/storage 1006 may respectively include a temporary copy and a permanent copy of instructions 1020. The instructions 1020 may include instructions that, when executed by at least one of the processors 1002, cause the apparatus 1000 to implement the methods shown in FIGS. 3-4. In some embodiments, the instructions 1020, the hardware, the firmware, and/or software components thereof may additionally/alternatively reside in the system control logic 1008, the network interface 1010, and/or the processor 1002.
The network interface 1010 may include a transceiver for providing a radio interface for the apparatus 1000, so as to communicate over one or more networks with any other suitable device (such as a front-end module, an antenna, and the like). In some embodiments, the network interface 1010 may be integrated with other components of the apparatus 1000. For example, the network interface 1010 may be integrated with at least one of the processor 1002, the system memory 1004, the NVM/storage 1006, and a firmware device (not shown) having instructions.
The network interface 1010 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, the network interface 1010 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In one embodiment, at least one of the processors 1002 may be packaged together with the logic of one or more controllers for the system control logic 1008 to form a system in package (SiP). In one embodiment, at least one of the processors 1002 may be integrated on the same die with the logic of one or more controllers for the system control logic 1008 to form a system on chip (SoC).
The apparatus 1000 may further include an input/output (I/O) interface 1012. The I/O interface 1012 may include a user interface enabling a user to interact with the apparatus 1000, and a peripheral component interface designed so that peripheral components can also interact with the apparatus 1000. In some embodiments, the apparatus 1000 further includes sensors for determining at least one of environmental conditions and position information related to the apparatus 1000.
In some embodiments, the user interface may include, but is not limited to, a display (for example, a liquid crystal display or a touchscreen display), a speaker, a microphone, one or more cameras (for example, a still camera and/or a video camera), a flashlight (for example, a light-emitting diode flash), and a keyboard.
In some embodiments, the peripheral component interface may include, but is not limited to, a non-volatile memory port, an audio jack, and a power interface.
In some embodiments, the sensors may include, but are not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit. The positioning unit may also be part of, or interact with, the network interface 1010 to communicate with components of a positioning network (for example, Global Positioning System (GPS) satellites).
Although the description of the present application is presented in conjunction with preferred embodiments, this does not mean that the features of the invention are limited to those embodiments. On the contrary, the purpose of describing the invention in conjunction with the embodiments is to cover other alternatives or modifications that may be derived from the claims of the present application. To provide an in-depth understanding of the present application, the following description contains many specific details; the present application may also be practiced without these details. In addition, some specific details are omitted from the description to avoid confusing or obscuring the focus of the present application. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
Moreover, various operations are described as multiple discrete operations in the manner most helpful for understanding the illustrative embodiments; however, the order of description should not be construed as implying that these operations must be order-dependent. In particular, the operations need not be performed in the order of presentation.
Unless the context dictates otherwise, the terms "comprising", "having", and "including" are synonyms. The phrase "A/B" means "A or B". The phrase "A and/or B" means "(A and B) or (A or B)".
As used here, the term "module" or "unit" may refer to, be, or include an application-specific integrated circuit (ASIC), an electronic circuit, a (shared, dedicated, or group) processor and/or memory executing one or more software or firmware programs, combinational logic circuits, and/or other suitable components providing the described functionality.
In the drawings, some structural or method features are shown in a particular arrangement and/or order. However, it should be understood that such a particular arrangement and/or ordering may not be required. In some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative drawings. In addition, the inclusion of structural or method features in a particular drawing is not meant to imply that such features are required in all embodiments; in some embodiments these features may not be included or may be combined with other features.
The embodiments of the mechanisms disclosed in the present application may be implemented in hardware, software, firmware, or a combination of these implementation approaches. The embodiments of the present application may be implemented as computer programs or program code executed on a programmable system including at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described in the present application and generate output information. The output information may be applied to one or more output devices in a known manner. For the purposes of the present application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
The program code may be implemented in a high-level procedural language or an object-oriented programming language to communicate with the processing system. When desired, the program code may also be implemented in assembly or machine language. In fact, the mechanisms described in the present application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. In some cases, one or more aspects of at least some embodiments may be implemented by representative instructions stored on a computer-readable storage medium, the instructions representing various logic in a processor which, when read by a machine, cause the machine to fabricate logic for performing the techniques described in the present application. These representations, known as "IP cores", may be stored on a tangible computer-readable storage medium and supplied to various customers or production facilities to be loaded into the fabrication machines that actually manufacture the logic or processor.
Such computer-readable storage media may include, but are not limited to, non-transitory tangible arrangements of articles manufactured or formed by a machine or device, including storage media such as: hard disks; any other type of disk, including floppy disks, optical disks, compact disc read-only memories (CD-ROM), compact disc rewritables (CD-RW), and magneto-optical disks; semiconductor devices such as read-only memories (ROM), random access memories (RAM) such as dynamic random access memories (DRAM) and static random access memories (SRAM), erasable programmable read-only memories (EPROM), flash memories, and electrically erasable programmable read-only memories (EEPROM); phase-change memories (PCM); magnetic or optical cards; or any other type of medium suitable for storing electronic instructions.
Accordingly, the embodiments of the present application also include non-transitory computer-readable storage media containing instructions or containing design data, such as a hardware description language (HDL), which defines the structures, circuits, apparatuses, processors, and/or system features described in the present application.

Claims (52)

  1. A method for determining a position of at least one limb node of a user, characterized in that the method comprises:
    when the at least one limb node is not occluded, determining, according to a position of the at least one limb node at a first moment and a position at a second moment, a first displacement of the at least one limb node within a first time period between the first moment and the second moment;
    acquiring first motion data related to motion of the at least one limb node within the first time period;
    training an inference model at least partially according to the first displacement and the first motion data, wherein the inference model is used to infer, when the at least one limb node is occluded, an occluded position of the at least one limb node.
  2. The method according to claim 1, characterized in that the first motion data comprises at least one of a first acceleration, a first angular velocity, a first movement direction, and a first motion pattern.
  3. The method according to claim 1, characterized in that determining, according to the position of the at least one limb node at the first moment and the position at the second moment, the first displacement of the at least one limb node within the first time period between the first moment and the second moment further comprises:
    acquiring a first image frame at the first moment and a second image frame at the second moment;
    determining the first displacement of the at least one limb node within the first time period according to the position of the at least one limb node in the first image frame and the position of the at least one limb node in the second image frame.
  4. The method according to claim 1, characterized in that training the inference model at least partially according to the first displacement and the first motion data further comprises:
    training the inference model at least partially with the first motion data as a feature input and the first displacement as a target category.
  5. The method according to claim 1, characterized in that the inference model comprises at least one of a recurrent neural network (RNN), a long short-term memory (LSTM) network, a gated recurrent unit (GRU) network, and a bidirectional recurrent neural network (BRNN).
  6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    when the at least one limb node goes from being unoccluded to being occluded, acquiring second motion data related to motion within a second time period, wherein the second time period comprises the time period between the moment at which the at least one limb node is unoccluded and the moment at which it is occluded;
    inferring, using the inference model and based on the second motion data, a second displacement of the at least one limb node within the second time period;
    determining the occluded position of the at least one limb node when occluded, at least partially based on the second displacement and the unoccluded position of the at least one limb node at the unoccluded moment.
  7. The method according to claim 6, characterized in that the second motion data comprises at least one of a second acceleration, a second angular velocity, a second movement direction, and a second motion pattern.
  8. The method according to claim 6, characterized in that the length of the second time period is the same as the length of the first time period.
  9. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    when the at least one limb node goes from being unoccluded, through being occluded, to being unoccluded again, acquiring third motion data related to motion within a third time period, wherein the third time period comprises the time period between the unoccluded moment and the re-unoccluded moment;
    inferring, using the inference model and based on the third motion data, a third displacement of the at least one limb node within the third time period;
    determining the occluded position of the at least one limb node when occluded, at least partially based on the third displacement and at least one of the unoccluded position of the at least one limb node at the unoccluded moment and the re-unoccluded position at the re-unoccluded moment.
  10. The method according to claim 9, characterized in that the third motion data comprises at least one of a third acceleration, a third angular velocity, a third movement direction, and a third motion pattern.
  11. The method according to claim 9, characterized in that the length of the third time period is the same as the length of the first time period.
  12. The method according to any one of claims 1 to 11, characterized in that the method further comprises:
    receiving another inference model for at least one other user, wherein the other inference model is used to infer, when at least one limb node of the at least one other user is occluded, an occluded position of the at least one limb node of the at least one other user;
    ensembling the inference model and the other inference model, and obtaining an ensembled inference model;
    when the at least one limb node of the user is occluded, inferring the occluded position of the at least one limb node using the ensembled inference model.
  13. A method for determining a position of at least one limb node of a user, characterized in that the method comprises:
    when the at least one limb node goes from being unoccluded to being occluded, acquiring first motion data related to motion within a first time period, wherein the first time period comprises the time period between the moment at which the at least one limb node is unoccluded and the moment at which it is occluded;
    inferring, using an inference model and based on the first motion data, a first displacement of the at least one limb node within the first time period;
    determining an occluded position of the at least one limb node at the occluded moment, at least partially based on the first displacement and an unoccluded position of the at least one limb node at the unoccluded moment.
  14. The method according to claim 13, characterized in that the first motion data comprises at least one of a first acceleration, a first angular velocity, a first movement direction, and a first motion pattern.
  15. The method according to claim 13 or 14, characterized in that the inference model comprises a model trained at least partially based on second motion data and a second displacement of the at least one limb node within a second time period, wherein the at least one limb node is not occluded within the second time period, and the length of the second time period is the same as the length of the first time period.
  16. The method according to claim 15, characterized in that the second motion data comprises at least one of a second acceleration, a second angular velocity, a second movement direction, and a second motion pattern.
  17. The method according to any one of claims 13 to 16, characterized in that the inference model comprises at least one of a recurrent neural network, a long short-term memory network, and a gated recurrent unit.
  18. The method according to any one of claims 13 to 17, characterized in that determining the occluded position of the at least one limb node at the occluded moment, at least partially based on the first displacement and the unoccluded position of the at least one limb node at the unoccluded moment, further comprises:
    acquiring an unoccluded image frame of the at least one limb node at the unoccluded moment, and determining the unoccluded position according to the unoccluded image frame.
  19. A method for determining a position of at least one limb node of a user, characterized in that the method comprises:
    when at least one limb node of the user goes from being unoccluded, through being occluded, to being unoccluded again, acquiring first motion data related to motion within a first time period, wherein the first time period comprises the time period between the unoccluded moment and the re-unoccluded moment;
    inferring, using an inference model and based on the first motion data, a first displacement of the at least one limb node within the first time period;
    determining an occluded position of the at least one limb node at the occluded moment, at least partially based on the first displacement and at least one of the unoccluded position of the at least one limb node at the unoccluded moment and the re-unoccluded position at the re-unoccluded moment.
  20. The method according to claim 19, characterized in that the first motion data comprises at least one of a first acceleration, a first angular velocity, a first movement direction, and a first motion pattern.
  21. The method according to claim 19 or 20, characterized in that the inference model comprises a model trained at least partially based on second motion data and a second displacement of the at least one limb node within a second time period, wherein the at least one limb node is not occluded within the second time period, and wherein the length of the second time period is the same as the length of the time period from the unoccluded moment to the occluded moment, and/or the length of the second time period is the same as the length of the time period from the occluded moment to the re-unoccluded moment.
  22. The method according to claim 21, characterized in that the second motion data comprises at least one of a second acceleration, a second angular velocity, a second movement direction, and a second motion pattern.
  23. The method according to any one of claims 19 to 22, characterized in that the inference model comprises a bidirectional recurrent neural network.
  24. The method according to any one of claims 19 to 23, characterized in that the first displacement comprises at least one of a displacement from the unoccluded position to the occluded position and a displacement from the occluded position to the re-unoccluded position.
  25. The method according to any one of claims 19 to 24, characterized in that determining the occluded position of the at least one limb node when occluded, at least partially based on the first displacement and at least one of the unoccluded position of the at least one limb node at the unoccluded moment and the re-unoccluded position at the re-unoccluded moment, further comprises:
    acquiring an unoccluded image frame of the at least one limb node at the unoccluded moment, and determining the unoccluded position according to the unoccluded image frame; and/or
    acquiring a re-unoccluded image frame of the at least one limb node at the re-unoccluded moment, and determining the re-unoccluded position according to the re-unoccluded image frame.
  26. A computer-readable storage medium, characterized in that instructions are stored on the computer-readable storage medium which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 25.
  27. A system for determining a position of at least one limb node of a user, characterized by comprising:
    a processor;
    a memory having instructions stored thereon which, when run by the processor, cause the processor to perform the method according to any one of claims 1 to 25.
  28. An apparatus for determining a position of at least one limb node of a user, characterized in that the apparatus comprises:
    an image processing module, configured to, when the at least one limb node is not occluded, determine, according to a position of the at least one limb node at a first moment and a position at a second moment, a first displacement of the at least one limb node within a first time period between the first moment and the second moment;
    a motion data acquisition module, configured to acquire first motion data related to motion of the at least one limb node within the first time period;
    an inference model training module, configured to train an inference model at least partially according to the first displacement and the first motion data, wherein the inference model is used to infer, when the at least one limb node is occluded, an occluded position of the at least one limb node.
  29. The apparatus according to claim 28, characterized in that the first motion data comprises at least one of a first acceleration, a first angular velocity, a first movement direction, and a first motion pattern.
  30. The apparatus according to claim 28, characterized in that the apparatus further comprises an image acquisition module configured to acquire a first image frame at the first moment and a second image frame at the second moment; and
    wherein the image processing module determines the first displacement of the at least one limb node within the first time period according to the position of the at least one limb node in the first image frame and the position of the at least one limb node in the second image frame.
  31. The apparatus according to claim 28, characterized in that the inference model training module being configured to train the inference model at least partially according to the first displacement and the first motion data comprises being configured to:
    train the inference model at least partially with the first motion data as a feature input and the first displacement as a target category.
  32. The apparatus according to claim 28, characterized in that the inference model comprises at least one of a recurrent neural network, a long short-term memory network, a gated recurrent unit network, and a bidirectional recurrent neural network.
  33. The apparatus according to any one of claims 28 to 32, characterized in that:
    the motion data acquisition module is further configured to, when the at least one limb node goes from being unoccluded to being occluded, acquire second motion data related to motion within a second time period, wherein the second time period comprises the time period between the moment at which the at least one limb node is unoccluded and the moment at which it is occluded; and
    the apparatus further comprises an inference module configured to infer, using the inference model and based on the second motion data, a second displacement of the at least one limb node within the second time period; and
    the inference module is further configured to determine the occluded position of the at least one limb node when occluded, at least partially based on the second displacement and the unoccluded position of the at least one limb node at the unoccluded moment.
  34. The apparatus according to claim 33, characterized in that the second motion data comprises at least one of a second acceleration, a second angular velocity, a second movement direction, and a second motion pattern.
  35. The apparatus according to claim 33, characterized in that the length of the second time period is the same as the length of the first time period.
  36. The apparatus according to any one of claims 28 to 32, characterized in that:
    the motion data acquisition module is further configured to, when the at least one limb node goes from being unoccluded, through being occluded, to being unoccluded again, acquire third motion data related to motion within a third time period, wherein the third time period comprises the time period between the unoccluded moment and the re-unoccluded moment;
    the apparatus further comprises an inference module configured to infer, using the inference model and based on the third motion data, a third displacement of the at least one limb node within the third time period; and
    the inference module is further configured to determine the occluded position of the at least one limb node when occluded, at least partially based on the third displacement and at least one of the unoccluded position of the at least one limb node at the unoccluded moment and the re-unoccluded position at the re-unoccluded moment.
  37. The apparatus according to claim 36, characterized in that the third motion data comprises at least one of a third acceleration, a third angular velocity, a third movement direction, and a third motion pattern.
  38. The apparatus according to claim 36, characterized in that the length of the third time period is the same as the length of the first time period.
  39. The apparatus according to any one of claims 28 to 38, characterized in that:
    the apparatus further comprises a communication module configured to receive another inference model for at least one other user, wherein the other inference model is used to infer, when at least one limb node of the at least one other user is occluded, an occluded position of the at least one limb node of the at least one other user; and
    the inference model training module is further configured to ensemble the inference model and the other inference model and obtain an ensembled inference model; and
    the inference module is further configured to, when the at least one limb node of the user is occluded, infer the occluded position of the at least one limb node using the ensembled inference model.
  40. An apparatus for determining a position of at least one limb node of a user, characterized in that the apparatus comprises:
    a motion data acquisition module, configured to, when the at least one limb node goes from being unoccluded to being occluded, acquire first motion data related to motion within a first time period, wherein the first time period comprises the time period between the moment at which the at least one limb node is unoccluded and the moment at which it is occluded;
    an inference module, configured to infer, using an inference model and based on the first motion data, a first displacement of the at least one limb node within the first time period;
    wherein the inference module is further configured to determine an occluded position of the at least one limb node at the occluded moment, at least partially based on the first displacement and an unoccluded position of the at least one limb node at the unoccluded moment.
  41. The apparatus according to claim 40, characterized in that the first motion data comprises at least one of a first acceleration, a first angular velocity, a first movement direction, and a first motion pattern.
  42. The apparatus according to claim 40 or 41, characterized in that the inference model comprises a model trained at least partially based on second motion data and a second displacement of the at least one limb node within a second time period, wherein the at least one limb node is not occluded within the second time period, and the length of the second time period is the same as the length of the first time period.
  43. The apparatus according to claim 42, characterized in that the second motion data comprises at least one of a second acceleration, a second angular velocity, a second movement direction, and a second motion pattern.
  44. The apparatus according to any one of claims 40 to 43, characterized in that the inference model comprises at least one of a recurrent neural network, a long short-term memory network, and a gated recurrent unit.
  45. The apparatus according to any one of claims 40 to 44, characterized in that the apparatus further comprises an image acquisition module and an image processing module, wherein the image acquisition module is configured to acquire an unoccluded image frame of the at least one limb node at the unoccluded moment, and the image processing module is configured to determine the unoccluded position according to the unoccluded image frame.
  46. An apparatus for determining a position of at least one limb node of a user, characterized in that the apparatus comprises:
    a motion data acquisition module, configured to, when at least one limb node of the user goes from being unoccluded, through being occluded, to being unoccluded again, acquire first motion data related to motion within a first time period, wherein the first time period comprises the time period between the unoccluded moment and the re-unoccluded moment;
    an inference module, configured to infer, using an inference model and based on the first motion data, a first displacement of the at least one limb node within the first time period;
    wherein the inference module is further configured to determine an occluded position of the at least one limb node at the occluded moment, at least partially based on the first displacement and at least one of the unoccluded position of the at least one limb node at the unoccluded moment and the re-unoccluded position at the re-unoccluded moment.
  47. The apparatus according to claim 46, characterized in that the first motion data comprises at least one of a first acceleration, a first angular velocity, a first movement direction, and a first motion pattern.
  48. The apparatus according to claim 46 or 47, characterized in that the inference model comprises a model trained at least partially based on second motion data and a second displacement of the at least one limb node within a second time period, wherein the at least one limb node is not occluded within the second time period, and wherein the length of the second time period is the same as the length of the time period from the unoccluded moment to the occluded moment, and/or the length of the second time period is the same as the length of the time period from the occluded moment to the re-unoccluded moment.
  49. The apparatus according to claim 48, characterized in that the second motion data comprises at least one of a second acceleration, a second angular velocity, a second movement direction, and a second motion pattern.
  50. The apparatus according to any one of claims 46 to 49, characterized in that the inference model comprises a bidirectional recurrent neural network.
  51. The apparatus according to any one of claims 46 to 50, characterized in that the first displacement comprises at least one of a displacement from the unoccluded position to the occluded position and a displacement from the occluded position to the re-unoccluded position.
  52. The apparatus according to any one of claims 46 to 51, characterized in that the apparatus further comprises:
    an image acquisition module, configured to acquire an unoccluded image frame of the at least one limb node at the unoccluded moment, and/or acquire a re-unoccluded image frame of the at least one limb node at the re-unoccluded moment;
    an image processing module, configured to determine the unoccluded position according to the unoccluded image frame, and/or determine the re-unoccluded position according to the re-unoccluded image frame.
PCT/CN2020/136834 2019-12-25 2020-12-16 Method, apparatus, medium and system for determining the position of a user's limb node WO2021129487A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911358174.4 2019-12-25
CN201911358174.4A CN113111678B (zh) 2019-12-25 Method, apparatus, medium and system for determining the position of a user's limb node

Publications (1)

Publication Number Publication Date
WO2021129487A1 true WO2021129487A1 (zh) 2021-07-01

Family

ID=76573673

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136834 WO2021129487A1 (zh) 2019-12-25 2020-12-16 Method, apparatus, medium and system for determining the position of a user's limb node

Country Status (2)

Country Link
CN (1) CN113111678B (zh)
WO (1) WO2021129487A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681216A (zh) * 2023-07-31 2023-09-01 山东莱恩光电科技股份有限公司 Stamping apparatus safety monitoring method based on safety light curtain historical data
CN118094475A (zh) * 2024-04-19 2024-05-28 华南理工大学 Gesture recognition system based on multi-sensor fusion

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201225008A (en) * 2010-12-06 2012-06-16 Ind Tech Res Inst System for estimating location of occluded skeleton, method for estimating location of occluded skeleton and method for reconstructing occluded skeleton
JP2017168029A (ja) * 2016-03-18 2017-09-21 Kddi株式会社 Device, program and method for predicting the position of a survey target based on action value
CN107833271A (zh) * 2017-09-30 2018-03-23 中国科学院自动化研究所 Kinect-based skeleton retargeting method and device
CN107847187A (zh) * 2015-07-07 2018-03-27 皇家飞利浦有限公司 Apparatus and method for motion tracking of at least part of a limb
CN108537156A (zh) * 2018-03-30 2018-09-14 广州幻境科技有限公司 Occlusion-resistant hand key-node tracking method
CN108919943A (zh) * 2018-05-22 2018-11-30 南京邮电大学 Real-time hand tracking method based on a depth sensor

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9165199B2 (en) * 2007-12-21 2015-10-20 Honda Motor Co., Ltd. Controlled human pose estimation from depth image streams

Also Published As

Publication number Publication date
CN113111678B (zh) 2024-05-24
CN113111678A (zh) 2021-07-13

Similar Documents

Publication Publication Date Title
Herath et al. Ronin: Robust neural inertial navigation in the wild: Benchmark, evaluations, & new methods
Yuan et al. 3d ego-pose estimation via imitation learning
Wu et al. Action recognition using context and appearance distribution features
US20160292497A1 (en) Fusion of inertial and depth sensors for movement measurements and recognition
CN107767419A (zh) Human skeleton key point detection method and device
WO2021129487A1 (zh) Method, apparatus, medium and system for determining the position of a user's limb node
Liu et al. When video meets inertial sensors: Zero-shot domain adaptation for finger motion analytics with inertial sensors
BR102017026251A2 (pt) Method and system for sensor data recognition using data enrichment for the learning process
Ahmad et al. Human action recognition using convolutional neural network and depth sensor data
US20220362630A1 (en) Method, device, and non-transitory computer-readable recording medium for estimating information on golf swing
Xiao et al. Machine learning for placement-insensitive inertial motion capture
KR20170036747A (ko) Method for tracking keypoints in a scene
KR102436906B1 (ko) Method for identifying a subject's gait pattern and electronic device performing the same
Wang et al. A2dio: Attention-driven deep inertial odometry for pedestrian localization based on 6d imu
KR20220129905A (ko) Method and apparatus for tracking a target object, and electronic device
US10551195B2 Portable device with improved sensor position change detection
TWI812053B Positioning method, electronic device, and computer-readable storage medium
US20200320283A1 (en) Determining golf swing characteristics
US20230285802A1 (en) Method, device, and non-transitory computer-readable recording medium for estimating information on golf swing
CN115471863A (zh) Three-dimensional pose acquisition method, model training method, and related devices
CN116563450A (zh) Expression transfer method, model training method, and apparatus
JP2022092528A (ja) Three-dimensional human pose estimation device, method, and program
Jia et al. Condor: Mobile Golf Swing Tracking via Sensor Fusion using Conditional Generative Adversarial Networks.
US20230381584A1 (en) Method, system, and non-transitory computer-readable recording medium for estimating information on golf swing posture
TWI797916B Human body detection method, human body detection device, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20907579

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20907579

Country of ref document: EP

Kind code of ref document: A1