CN115471908A - Attitude processing method, apparatus, system and storage medium - Google Patents

Attitude processing method, apparatus, system and storage medium

Info

Publication number
CN115471908A
CN115471908A (application CN202210983612.1A)
Authority
CN
China
Prior art keywords
time
attitude
target object
data
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210983612.1A
Other languages
Chinese (zh)
Inventor
易正琨
李晓宇
吴新宇
柳义文
柳程亮
苏园哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202210983612.1A priority Critical patent/CN115471908A/en
Priority to PCT/CN2022/137068 priority patent/WO2024036825A1/en
Publication of CN115471908A publication Critical patent/CN115471908A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The application discloses an attitude processing method, apparatus, system and storage medium. The attitude processing method includes: acquiring attitude data of different parts of a target object at the same moment; performing feature extraction on the attitude data with a memory network to obtain timing features; based on a graph structure, performing feature extraction on the timing features with a graph convolution network to obtain spatio-temporal features, the graph structure being derived from reference attitude data of the target object; and performing attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object. In this way, the attitude of the target object can be acquired accurately.

Description

Attitude processing method, apparatus, system and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular to an attitude processing method, apparatus, system and storage medium.
Background
High-precision motion tracking is needed in many fields, such as VR games, film special effects, and biological and kinematic analysis. Demand for it has therefore grown in recent years, and motion tracking realized with sensors in particular has attracted increasing attention.
However, related-art methods that realize motion tracking with sensors usually process the collected sensor signals with only a simple time-series network, so the tracking accuracy is not high.
Disclosure of Invention
The technical problem mainly solved by the application is to provide an attitude processing method, apparatus, system and storage medium that can accurately acquire the attitude of a target object.
To solve the above technical problem, one technical solution adopted by the application is to provide an attitude processing method, including: acquiring attitude data of different parts of a target object at the same moment;
performing feature extraction on the attitude data with a memory network to obtain timing features; based on a graph structure, performing feature extraction on the timing features with a graph convolution network to obtain spatio-temporal features, the graph structure being derived from reference attitude data of the target object; and performing attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object.
In some embodiments, performing feature extraction on the attitude data with the memory network to obtain the timing features includes: in the memory network, performing feature extraction on the attitude data at the current moment using a parameter matrix, an offset and the timing feature at the previous moment, to obtain the timing feature at the current moment.
In some embodiments, before the timing features are processed by the graph convolution network, the attitude data and the timing features are fused to obtain enhanced timing features; the spatio-temporal features are then obtained by performing feature extraction on the enhanced timing features with the graph convolution network based on the graph structure.
In some embodiments, performing feature extraction on the enhanced timing features with the graph convolution network based on the graph structure includes: performing feature extraction on the enhanced timing features with the graph convolution network to obtain initial timing features; and fusing the adjacency matrix corresponding to the graph structure with the initial timing features to obtain the spatio-temporal features.
In some embodiments, before the attitude reconstruction, the attitude data and the spatio-temporal features are fused to obtain enhanced spatio-temporal features, and the attitude reconstruction is then performed based on the enhanced spatio-temporal features to obtain the attitude map of the target object.
In some embodiments, performing attitude reconstruction based on the spatio-temporal features to obtain the attitude map of the target object includes: predicting from the spatio-temporal features with a fully connected layer to obtain the spatial coordinate corresponding to each piece of attitude data; and obtaining the attitude map from the spatial coordinates.
In some embodiments, the graph structure is derived from the reference attitude data of the target object as follows:
acquiring reference attitude data corresponding to different parts of the target object; mapping the part corresponding to each piece of reference attitude data to a node in the graph structure; determining the connection relation between each target node and the remaining nodes using the Euclidean distances between them; and connecting the target node with the remaining nodes according to the connection relation to form the graph structure.
In some embodiments, the attitude data is normalized before feature extraction with the memory network.
To solve the above technical problem, another technical solution adopted by the application is to provide an attitude processing apparatus, including: an acquisition module for acquiring attitude data of different parts of a target object at the same moment; a first extraction module for performing feature extraction on the attitude data with a memory network to obtain timing features; a second extraction module for performing feature extraction on the timing features with a graph convolution network based on a graph structure to obtain spatio-temporal features, the graph structure being derived from reference attitude data of the target object; and a reconstruction module for performing attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object.
To solve the above technical problem, another technical solution adopted by the application is to provide an attitude processing system, including: data acquisition units arranged at different parts of a target object to acquire attitude data of the corresponding parts; and an attitude processing apparatus in communication connection with the data acquisition units.
To solve the above technical problem, yet another technical solution adopted by the application is to provide a computer-readable storage medium storing program data which, when executed by a processor, implements the attitude processing method described above.
The beneficial effects of the application are as follows. Unlike the prior art, a graph structure carrying spatial position information assists the graph convolution network in extracting features from the timing features, yielding spatio-temporal features that carry both temporal and spatial information; a more accurate attitude map is then reconstructed from these spatio-temporal features. This improves the accuracy of target object attitude recognition and addresses the low accuracy of existing attitude tracking.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a schematic flowchart of a first embodiment of the attitude processing method provided in the present application;
FIG. 2 is a schematic diagram of the graph structure of key points provided by the present application;
FIG. 3 is a schematic diagram of the training of a network model provided in the present application;
FIG. 4 is a schematic structural diagram of a motion tracking model based on a long short-term memory network and a graph convolution network;
FIG. 5 is a schematic diagram of an embodiment of the attitude processing system provided in the present application;
FIG. 6 is a schematic structural diagram of an embodiment of the attitude processing apparatus provided in the present application;
FIG. 7 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
As shown in fig. 1, the attitude processing method described in the present application may include the following steps. Step 100: acquire attitude data of different parts of the target object at the same moment. Step 200: perform feature extraction on the attitude data with a memory network to obtain timing features. Step 300: based on the graph structure, perform feature extraction on the timing features with a graph convolution network to obtain spatio-temporal features; the graph structure is derived from reference attitude data of the target object. Step 400: perform attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object.
That is to say, the application additionally constructs a graph structure from the reference attitude data of the target object and uses the graph convolution network, guided by this graph structure, to extract features from the timing features and obtain spatio-temporal features. This makes full use of the spatial positions of the sensors as a detection dimension, avoids the errors that neglecting the sensors' spatial positions introduces into the processed data, and improves the accuracy of target object attitude recognition.
A first embodiment of the attitude processing method of the present application is described in detail below.
Step 100: acquire attitude data of different parts of the target object at the same moment.
The target object may be a human or another object such as an animal; the embodiments of the present application mainly take a human as an example.
For example, the different parts of the target object may be various parts of the human body, including but not limited to the elbow joints, knee joints, hip joints, spine, shoulders, and hands.
Optionally, sensors may be mounted on the target object to acquire the attitude data. The sensors may be accelerometers, gyroscopes, magnetometers, or other sensors capable of accurately capturing human activity information.
It should be noted, however, that the sensors used in the present application are mainly soft (flexible) sensors, owing to their relatively low cost.
In addition, besides readings from conventional strain sensors, the attitude data may also be data with a geometric position relationship, such as human joint angles and finger angles.
Generally, feature extraction is performed on the attitude data after it is acquired. In machine learning, however, different features are different evaluation indexes with different dimensions and dimensional units, which affects subsequent data analysis. To eliminate this dimensional influence between features, the acquired data is first standardized so that it follows a standard normal distribution, which also speeds up later training.
Specifically, in some embodiments, the attitude data may be normalized after it is obtained.
The present application does not limit the specific normalization scheme.
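As a minimal sketch, per-channel z-score standardization is one common choice; it is an assumption for illustration, since the application does not fix the method:

```python
# Sketch of the standardization step described above. Per-channel z-scoring
# is an illustrative assumption; the application does not fix the scheme.
import numpy as np

def normalize_attitude_data(x: np.ndarray) -> np.ndarray:
    """Standardize sensor readings x of shape (T, N): zero mean, unit variance per channel."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True) + 1e-8  # guard against constant channels
    return (x - mean) / std
```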
Step 200: and performing feature extraction on the attitude data by using a memory network to obtain time sequence features. The memory network can comprise a long-time memory network and a short-time memory network, and the long-time memory network can well solve the problems of cross-time memory and gradient disappearance. Therefore, the application mainly uses a long-term memory network (LSTM).
For example, in the memory network, the attitude data at the current time is subjected to feature extraction by using the parameter matrix, the offset and the time sequence feature at the previous time, so as to obtain the time sequence feature at the current time.
Specifically, the formula may be employed: h t =σ(W·[H t-1 ,X t ]) And + b, performing feature extraction on the attitude data at the current moment to obtain the time sequence feature of the current moment.
Where σ is the activation function, W is the weight matrix, b is the offset, H t-1 For the time-series characteristic of the preceding instant, X t As attitude data at the present time, H t Is the time sequence characteristic of the current time.
In order to facilitate subsequent citation of a formula, the posture data is marked as X, feature extraction is performed on the posture data X by using a memory network, and the obtained time sequence feature is marked as H, so that the formula can be replaced by H = LSTM (X).
In addition, to retain more timing information, some embodiments further process the timing features after they are obtained, fusing the attitude data with the timing features. Specifically, a fusion-and-splicing (concatenation) operation may be used:

H_in = [X, LSTM(X)]

where H_in is the enhanced timing feature, X is the attitude data and LSTM(X) is the timing feature.
Fusing the attitude data with the timing features yields an enhanced timing feature that carries more of the information in the raw attitude data, so that this information is not lost during feature extraction.
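A minimal sketch of this stage, assuming a PyTorch implementation; the hidden size and single LSTM layer are illustrative assumptions, not values from the application:

```python
import torch
import torch.nn as nn

class TimingFeatureExtractor(nn.Module):
    """Computes H = LSTM(X), then the enhanced timing feature H_in = [X, H]."""

    def __init__(self, n_sensors: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_sensors, hidden_size=hidden, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, n_sensors) attitude data
        h, _ = self.lstm(x)               # timing features H, one per time step
        return torch.cat([x, h], dim=-1)  # enhanced timing features H_in
```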
A memory network generally extracts only temporal information from the acquired attitude data, and processing the time dimension alone can drop part of the acquired information or introduce errors into later analysis. Therefore, after the timing features are obtained from the memory network and fused with the attitude data into enhanced timing features, the present application uses a graph convolution network to extract spatial information as well, reducing the tracking error and obtaining a more accurate attitude of the target object, as described in step 300.
Step 300: and based on the graph structure, performing feature extraction on the time sequence features by utilizing a graph convolution network to obtain space-time features.
Wherein the graph structure is derived based on reference pose data of the target object.
The reference posture data is posture data of the target object in a normal state, that is, the target object has not been subjected to angle changes such as translation or inclination.
Optionally, in some embodiments, the graph structure may be obtained by:
step 301: acquiring reference attitude data corresponding to different parts of a target object;
step 302: mapping the part corresponding to each datum attitude data into a node in a graph structure;
step 303: determining the connection relation between the target node and the other nodes by using the Euclidean distances between the target node and the other nodes;
step 304: and connecting the target node with the other nodes according to the connection relation to form a graph structure.
For the formation of the graph structure in steps 301 to 304, refer to fig. 2, a schematic diagram of a graph structure generated by the k-nearest neighbor method. The details are as follows:
Taking a person as the target object, the nodes 01 to 20 in fig. 2 denote the sensors, each corresponding to a part of the human body: 01 and 02 correspond to the elbow joints, 03 and 04 to the trapezius muscles, 05 and 06 to the pectoralis major muscles, 07 to 10 to the upper and lower halves of the back, 11 to 14 to the upper and lower halves of the muscles along the spine, 15 to 18 to the rear and sides adjoining the hip joints, and 19 and 20 to the knee joints.
Optionally, the graph structure may be constructed with the k-nearest neighbor method. Specifically, the actual three-dimensional coordinates of each node in space are first obtained, and the Euclidean distance between each target node and every remaining node is computed as

d_ij = sqrt((x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2)

where d_ij is the Euclidean distance between node i and node j, and (x_i, y_i, z_i) and (x_j, y_j, z_j) are the coordinates of node i and node j on the three axes. Each target node is then connected to its k nearest remaining nodes according to the Euclidean distances, forming k undirected edges. Generally k = 2, but when several remaining nodes lie comparably close to the target node, k may float upward within a set range, e.g. to 3 or 4.
Illustratively, take 07 as the target node and the remaining 19 nodes as candidates. Since 07 also lies close to 09 in fig. 2, k may be 3 for this node: the 3 nearest remaining nodes of 07 are 09, 03 and 05, and the lines connecting 07 to these three nodes are undirected edges.
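A small sketch of this construction, using a fixed k for simplicity; the adaptive upward float of k described above is omitted:

```python
import numpy as np

def build_knn_graph(coords: np.ndarray, k: int = 2) -> np.ndarray:
    """Build the undirected adjacency matrix A from the reference 3D node
    coordinates, coords of shape (N, 3). k = 2 follows the text."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances d_ij
    np.fill_diagonal(dist, np.inf)            # a node is not its own neighbor
    adj = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:     # the k nearest remaining nodes
            adj[i, j] = adj[j, i] = 1.0       # undirected edge
    return adj
```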
Optionally, after the timing features have been fused into enhanced timing features, some embodiments perform feature extraction on the enhanced timing features with the graph convolution network, based on the graph structure, to obtain the spatio-temporal features.
For example, this feature extraction by the graph convolution network may include the following sub-steps:
Step 310: perform feature extraction on the enhanced timing features with the graph convolution network to obtain initial timing features;
Step 320: fuse the adjacency matrix corresponding to the graph structure with the initial timing features to obtain the spatio-temporal features.
The graph convolution network is abbreviated GCN below.
The adjacency matrix corresponding to the graph structure is A ∈ R^{N×N}, defined by

A_ij = 1 if (v_i, v_j) ∈ E, otherwise A_ij = 0

where A_ij = 1 means node i is adjacent to node j, (v_i, v_j) is the undirected edge connecting node i and node j, and E is the set of all undirected edges. A degree matrix D is further defined by D_ii = Σ_j A_ij; the degree D_ii of node v_i is the number of undirected edges incident to that node.
Illustratively, if a node has 3 undirected edges, its degree D_ii is 3.
Similarly, to reduce noise information and increase the training speed, some embodiments normalize the adjacency matrix A. Self-connections are first introduced through the identity matrix I: let A_tilde = A + I and D_tilde_ii = Σ_j A_tilde_ij. The normalized adjacency matrix is then

A_hat = D_tilde^{-1/2} · A_tilde · D_tilde^{-1/2}

Next, the normalized adjacency matrix is fused with the initial timing feature to obtain the spatio-temporal feature:

H_sk = A_hat · H_cs · W

where W is a weight matrix, H_cs is the initial timing feature and H_sk is the spatio-temporal feature.
It should be noted that the identity matrix I serves to introduce the self-connections.
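A sketch of this propagation rule, assuming a PyTorch implementation; whether an activation follows the propagation is not specified in the text and is left out here:

```python
import torch
import torch.nn as nn

class GraphFusionLayer(nn.Module):
    """Computes H_sk = D_tilde^{-1/2} (A + I) D_tilde^{-1/2} H_cs W as
    reconstructed above. adj is a float (N, N) adjacency matrix."""

    def __init__(self, adj: torch.Tensor, in_dim: int, out_dim: int):
        super().__init__()
        a_tilde = adj + torch.eye(adj.size(0))            # self-connections via I
        d_inv_sqrt = torch.diag(a_tilde.sum(dim=1).pow(-0.5))
        self.register_buffer("a_hat", d_inv_sqrt @ a_tilde @ d_inv_sqrt)
        self.w = nn.Linear(in_dim, out_dim, bias=False)   # weight matrix W

    def forward(self, h_cs: torch.Tensor) -> torch.Tensor:
        # h_cs: (batch, N, in_dim) initial timing features, one row per node
        return self.a_hat @ self.w(h_cs)                  # spatio-temporal features H_sk
```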
In addition, to retain more features, some embodiments further process the spatio-temporal features after they are obtained, fusing the attitude data with the spatio-temporal features:

S = [X, H_sk]

where S is the enhanced spatio-temporal feature, X is the attitude data and H_sk is the spatio-temporal feature.
Step 400: and carrying out attitude reconstruction based on the space-time characteristics to obtain an attitude diagram of the target object.
Optionally, in some embodiments, performing pose reconstruction based on the spatiotemporal features to obtain a pose graph of the target object may include the following sub-steps:
step 401: and predicting the time-space characteristics by using the full-connection layer to obtain a space coordinate corresponding to each attitude data.
The full link layer may predict the temporal-spatial features using a modified linear activation function (ReLU), and since each node on the full link layer is connected to all nodes on the previous layer to integrate the extracted features, the parameters of the full link layer are typically the largest, and the parameters are 128, 64, and M × 3, respectively, where the last parameter M × 3 corresponds to the spatial coordinates (x, y, z) of the M tracking points.
Step 402: and obtaining a posture graph according to the space coordinates.
Specifically, three-dimensional reconstruction is performed according to the space coordinates to obtain a posture graph.
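A minimal sketch of such a prediction head with the stated sizes 128, 64 and M×3; placing ReLU between all layers is an assumption consistent with the text:

```python
import torch.nn as nn

def make_prediction_head(feat_dim: int, m_points: int) -> nn.Sequential:
    """Fully connected head with layer sizes 128, 64 and M*3; the M*3 outputs
    are the (x, y, z) coordinates of the M tracking points."""
    return nn.Sequential(
        nn.Linear(feat_dim, 128),
        nn.ReLU(),
        nn.Linear(128, 64),
        nn.ReLU(),
        nn.Linear(64, m_points * 3),  # reshape downstream to (M, 3)
    )
```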
Optionally, in some embodiments, after the spatio-temporal features have been fused into enhanced spatio-temporal features, the attitude reconstruction may be performed based on the enhanced spatio-temporal features to obtain the attitude map of the target object.
With reference to the above embodiments, the training of the network model involved in the present application is described below with reference to fig. 3 and fig. 4, where fig. 3 is a schematic flow chart of human motion tracking and fig. 4 shows the structure of the motion tracking model based on the long short-term memory network and the graph convolution network. The steps are as follows:
1) Preprocessing data;
specifically, a bending sensor is arranged on a human joint to acquire attitude data, and acquired data signals are preprocessed and collected to obtain a data set.
Among them, the public data sets adopted in the present embodiment are the depfull-Bodydataset and the Stretch-Sens ingglovedataet dataset, the depfull-Bodydataset provides the whole-body microfluidic sensor data of one adult under three actions, the Stretch-Sens ingglovedataet dataset provides the capacitive Stretch sensor data of one hand of ten adults, and the two datasets cover the whole-body action tracking and the hand action tracking.
2) Dividing a data set;
for example, when testing a first data set, the training set and the test set are partitioned according to a specification. When testing the second data set, the data was randomly shuffled and divided into training and test sets in the ratio of 9.
3) Calculating through the long short-term memory network;
Specifically, the sample data in the training set, i.e. the attitude data X, is input into the long short-term memory network for feature extraction, and the timing feature H is output.
For example, feature extraction may be performed on the attitude data at the current moment using the parameter matrix, the offset and the timing feature at the previous moment, to obtain the timing feature at the current moment.
Specifically, the formula

H_t = σ(W·[H_{t-1}, X_t] + b)

is used to extract features from the attitude data at the current moment, where σ is the activation function, W is the weight matrix, b is the offset, H_{t-1} is the timing feature at the previous moment, X_t is the attitude data at moment t (the current moment), and H_t is the timing feature at the current moment. Likewise, X_{t-1} and X_{t-2} denote the attitude data at moments (t-1) and (t-2).
4) Fusing and splicing the attitude data with the timing features, in order to retain more timing information:

H_in = [X, H]

where H_in is the enhanced timing feature, X is the attitude data and H is the timing feature.
5) Calculating through the graph convolution network;
Specifically, the enhanced timing feature H_in is input into the graph convolution network for feature extraction, and the initial timing feature H_cs is output.
6) Generating the graph structure from the preprocessed data;
Specifically, the reference attitude data in the training set is obtained; the part corresponding to each piece of reference attitude data is mapped to a node in the graph structure; the connection relation between each target node and the remaining nodes is determined from the Euclidean distances between them; and the target node is connected with the remaining nodes according to the connection relation to form the graph structure.
7) Acquiring the adjacency matrix of the graph structure and normalizing it.
8) The output initial timing feature H_cs and the normalized adjacency matrix A_hat are input into the graph convolution network again for fusion, and the spatio-temporal feature H_sk is output.
9) The attitude data and the spatio-temporal features are fused to obtain the enhanced spatio-temporal features.
10) The obtained enhanced spatio-temporal features are input into the fully connected layer for three-dimensional reconstruction, and the attitude map of the human motion is output, completing one training pass.
It should be noted that the network parameters of the graph convolution network are updated after each training pass, and training continues until the precision of the motion tracking model meets the requirement, at which point training ends.
11) The validity of the trained model is verified with the attitude data of the test set; the evaluation index is the root-mean-square error (RMSE) of the data tracking. A compact sketch of this loop follows.
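A compact sketch of the training and evaluation loop under stated assumptions; Adam and an MSE training loss are illustrative choices, since the application only states that parameters are updated each pass and that RMSE is the evaluation index:

```python
import torch

def rmse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Root-mean-square tracking error used as the evaluation index."""
    return torch.sqrt(((pred - target) ** 2).mean())

def train(model, train_loader, test_loader, epochs: int = 50, lr: float = 1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in train_loader:                 # x: attitude data, y: ground-truth coordinates
            opt.zero_grad()
            loss = ((model(x) - y) ** 2).mean()   # MSE surrogate; RMSE is monotone in it
            loss.backward()
            opt.step()
    with torch.no_grad():                         # verify on the test set
        errs = [rmse(model(x), y) for x, y in test_loader]
    return torch.stack(errs).mean()
```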
Referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of the attitude processing system provided in the present application. The attitude processing system 10 includes a data acquisition unit 01 and an attitude processing device 02.
The data acquisition unit 01 is arranged at different parts of the target object and acquires the attitude data of the corresponding parts of the target object; the attitude processing device 02 is in communication connection with the data acquisition unit 01.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of the attitude processing device provided in the present application. The attitude processing device 02 includes an acquisition module 001, a first extraction module 002, a second extraction module 003 and a reconstruction module 004, whose functions are as follows:
the acquisition module 001 is used for acquiring attitude data of different parts of the target object at the same moment;
the first extraction module 002 is used for performing feature extraction on the attitude data with a memory network to obtain timing features;
the second extraction module 003 is used for performing feature extraction on the timing features with a graph convolution network, based on a graph structure, to obtain spatio-temporal features, the graph structure being derived from reference attitude data of the target object;
and the reconstruction module 004 is used for performing attitude reconstruction based on the spatio-temporal features to obtain the attitude map of the target object.
It is understood that each module is also used for implementing the method of any of the above embodiments.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the computer-readable storage medium 140 provided in the present application. The computer-readable storage medium 140 stores program data 141 which, when executed by a processor, implements the following method:
acquiring attitude data of different parts of a target object at the same moment; performing feature extraction on the attitude data with a memory network to obtain timing features; based on a graph structure, performing feature extraction on the timing features with a graph convolution network to obtain spatio-temporal features, the graph structure being derived from reference attitude data of the target object; and performing attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object.
It will be appreciated that program data 141, when executed by a processor, is also used to implement the method of any of the embodiments described above.
The embodiments of the present application may be implemented as software functional units and, when sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The above embodiments are merely examples and are not intended to limit the patent scope of the present application; all equivalent structure or flow modifications made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, are likewise included in the patent protection scope of the present application.

Claims (11)

1. An attitude processing method, the method comprising:
acquiring attitude data of different parts of a target object at the same moment;
performing feature extraction on the attitude data by using a memory network to obtain timing features;
based on a graph structure, performing feature extraction on the timing features by using a graph convolution network to obtain spatio-temporal features, wherein the graph structure is derived from reference attitude data of the target object;
and performing attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object.
2. The method of claim 1, wherein the performing feature extraction on the attitude data by using a memory network to obtain timing features comprises:
in the memory network, performing feature extraction on the attitude data at the current moment by using a parameter matrix, an offset and the timing feature at the previous moment, to obtain the timing feature at the current moment.
3. The method of claim 1, wherein before the performing feature extraction on the timing features by using a graph convolution network to obtain spatio-temporal features, the method comprises:
fusing the attitude data and the timing features to obtain enhanced timing features;
and wherein the performing feature extraction on the timing features by using a graph convolution network based on the graph structure to obtain spatio-temporal features comprises:
based on the graph structure, performing feature extraction on the enhanced timing features by using the graph convolution network to obtain the spatio-temporal features.
4. The method of claim 3, wherein the performing feature extraction on the enhanced timing features by using the graph convolution network based on the graph structure to obtain the spatio-temporal features comprises:
performing feature extraction on the enhanced timing features by using the graph convolution network to obtain initial timing features;
and fusing the adjacency matrix corresponding to the graph structure with the initial timing features to obtain the spatio-temporal features.
5. The method according to any one of claims 1 to 4, wherein before the performing attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object, the method comprises:
fusing the attitude data and the spatio-temporal features to obtain enhanced spatio-temporal features;
and wherein the performing attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object comprises:
performing attitude reconstruction based on the enhanced spatio-temporal features to obtain the attitude map of the target object.
6. The method of claim 1, wherein the performing attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object comprises:
predicting from the spatio-temporal features by using a fully connected layer to obtain a spatial coordinate corresponding to each piece of attitude data;
and obtaining the attitude map according to the spatial coordinates.
7. The method of claim 1, wherein the graph structure is derived from the reference attitude data of the target object by:
acquiring reference attitude data corresponding to different parts of the target object;
mapping the part corresponding to each piece of reference attitude data to a node in the graph structure;
determining the connection relation between a target node and the remaining nodes by using the Euclidean distances between the target node and the remaining nodes;
and connecting the target node with the remaining nodes according to the connection relation to form the graph structure.
8. The method of claim 1, wherein before the performing feature extraction on the attitude data by using a memory network to obtain timing features, the method comprises:
normalizing the attitude data.
9. An attitude processing apparatus, comprising:
an acquisition module, configured to acquire attitude data of different parts of a target object at the same moment;
a first extraction module, configured to perform feature extraction on the attitude data by using a memory network to obtain timing features;
a second extraction module, configured to perform feature extraction on the timing features by using a graph convolution network based on a graph structure to obtain spatio-temporal features, wherein the graph structure is derived from reference attitude data of the target object;
and a reconstruction module, configured to perform attitude reconstruction based on the spatio-temporal features to obtain an attitude map of the target object.
10. An attitude processing system, comprising:
data acquisition units arranged at different parts of a target object to acquire attitude data of the corresponding parts of the target object;
and the attitude processing apparatus of claim 9, in communication connection with the data acquisition units.
11. A computer-readable storage medium, wherein the computer-readable storage medium stores program data which, when executed by a processor, implements the attitude processing method of any one of claims 1 to 8.
CN202210983612.1A 2022-08-16 2022-08-16 Attitude processing method, apparatus, system and storage medium Pending CN115471908A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210983612.1A CN115471908A (en) 2022-08-16 2022-08-16 Attitude processing method, apparatus, system and storage medium
PCT/CN2022/137068 WO2024036825A1 (en) 2022-08-16 2022-12-06 Attitude processing method, apparatus and system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210983612.1A CN115471908A (en) 2022-08-16 2022-08-16 Attitude processing method, apparatus, system and storage medium

Publications (1)

Publication Number Publication Date
CN115471908A true CN115471908A (en) 2022-12-13

Family

ID=84368261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210983612.1A Pending CN115471908A (en) 2022-08-16 2022-08-16 Attitude processing method, apparatus, system and storage medium

Country Status (2)

Country Link
CN (1) CN115471908A (en)
WO (1) WO2024036825A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118097785A (en) * 2024-03-07 2024-05-28 广西师范大学 Human body posture analysis method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376663A (en) * 2018-10-29 2019-02-22 广东工业大学 A kind of human posture recognition method and relevant apparatus
US11514319B2 (en) * 2019-09-16 2022-11-29 Honda Motor Co., Ltd. Action prediction
CN110796110B (en) * 2019-11-05 2022-07-26 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN111079656A (en) * 2019-12-18 2020-04-28 林晓莹 Children motion attitude automatic identification technology based on 3D convolution long-term and short-term memory network
CN112686211A (en) * 2021-01-25 2021-04-20 广东工业大学 Fall detection method and device based on attitude estimation
CN113989849A (en) * 2021-11-04 2022-01-28 杭州轻象科技有限公司 Posture identification method and device based on skeleton separation, unification and attention mechanism
CN114881205A (en) * 2022-04-20 2022-08-09 苏州大学 Shield attitude prediction method, medium, electronic device and system

Also Published As

Publication number Publication date
WO2024036825A1 (en) 2024-02-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination