CN114972874A - Three-dimensional human body classification and generation method and system for complex action sequence - Google Patents


Info

Publication number
CN114972874A
CN114972874A (application number CN202210635201.3A)
Authority
CN
China
Prior art keywords
sequence
action
complex
training
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210635201.3A
Other languages
Chinese (zh)
Inventor
宋文凤
张欣宇
侯霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University filed Critical Beijing Information Science and Technology University
Priority to CN202210635201.3A priority Critical patent/CN114972874A/en
Publication of CN114972874A publication Critical patent/CN114972874A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 40/23: Recognition of whole body movements, e.g. for sport training

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a three-dimensional human body classification and generation method and system for complex action sequences, applied to the field of virtual reality, comprising the following steps: acquiring and preprocessing complex motion video to construct a data set; performing key-point recognition on the data set to obtain human key points and action-sequence pose information as a training set; constructing a complex-action-sequence classification coding model based on three-dimensional geometry, combining the model's input and output into a sequence for encoding and decoding training, and thereby constructing a complex-action-sequence generation model based on three-dimensional geometry; and inputting the test set into the model to obtain action sequences for each test-set action category. By encoding the standard three-dimensional geometric sequence into geometric parameters that carry temporal information, the invention strengthens the network's learning of how different action types are distributed in the latent space, so that action types can be accurately recognized even for complex actions and reasonable action sequences can be generated, improving recognition accuracy and action diversity.

Description

Three-dimensional human body classification and generation method and system for complex action sequence
Technical Field
The invention relates to the field of virtual reality, in particular to a three-dimensional human body classification and generation method and system for complex action sequences.
Background
In the field of human three-dimensional reconstruction, human motion prediction is a very challenging task. Based on a CVAE, semantic labels of actions are used as prior conditions and fed into network training together with the action sequences, so that an unlimited number of three-dimensional human action sequences can be generated from the labels, and the generated sequences look more realistic. Much previous work has been based on motion sequences, implicitly modeling the spatial structure of the human skeleton through structural prediction applied to individual joints. Most current deep learning methods for action recognition use shallow convolutional networks. With convolutional neural networks, features can be learned end to end by stochastic gradient descent, reducing the reliance on support vector machines and hand-crafted features. The extracted features are learned directly from the training data by convolution filters; the main advantages are that the feature extractor and classifier parameters are optimized jointly in a simple end-to-end manner, and that the extracted features are adaptively optimized for the target attributes. Multi-label convolutional neural networks are preferred over support vector machines because they can learn the relationships between attributes more comprehensively.
Three-dimensional human action sequence generation aims to produce, under given conditions, a three-dimensional human body performing the expected action type. Deep-learning-based approaches now dominate action recognition. These methods can be broadly divided into two categories: the first extracts action features from three-dimensional information to obtain the distribution of different actions in a latent space and generates complex, diverse action sequences from those distributions; the second, given action data, predicts skeletons from images by style transfer, binds skinning weights, and generates single human actions in a variety of styles.
Existing small three-dimensional human motion data sets (NTU RGB+D, HumanAct12, UESTC, BABEL) are all built by capturing human action sequences with an ordinary or depth camera and then extracting human key points and body-shape/pose information with VIBE; the largest of these data sets contains tens of thousands of pictures. When facing the shooting angles of different scenes, learning the feature changes needed to obtain a camera-view transformation model is a complex process.
Existing three-dimensional motion generation technology is still at an early stage: when a model covers many action types, the generated actions are overly simple, or the model can only generate sequences of a single action. In a complex action sequence, related actions may overlap, which further reduces action accuracy and makes it harder to generate the corresponding action sequence.
Therefore, a three-dimensional human body classification and generation method and system that can accurately recognize and classify human action sequences and generate the corresponding sequences under many complex action types is an urgent need for those skilled in the art.
Disclosure of Invention
In view of this, the invention provides a method and system for three-dimensional human body classification and generation of complex action sequences. VIBE is used to process the picture data and obtain the human key points and action-sequence pose information in each frame, which weakens the influence of lens distortion on action generation as much as possible. A standard three-dimensional geometric sequence is passed through two linear fully connected layers followed by a long short-term memory (LSTM) network carrying temporal information, yielding geometric parameters that contain time information; these parameters are fed as a prior into a complex-action-sequence generation model based on three-dimensional geometry, strengthening the network's learning of the latent-space distribution of different action types. The complex actions in the data set are parameterized, so that action types can still be accurately recognized for complex actions, reasonable action sequences are generated, and recognition accuracy and action diversity are improved.
To achieve the above purpose, the invention adopts the following technical solution:
A three-dimensional human body classification and generation method for complex action sequences comprises the following steps:
Step (1): acquire a complex motion video, preprocess it, and construct a data set.
Step (2): perform key-point recognition on the data set to obtain human key points and action-sequence pose information as a training set.
Step (3): construct a complex-action-sequence classification coding model based on three-dimensional geometry, and take the training set as input to output geometric parameters containing time information for each training-set action category.
Step (4): combine the input and output of the three-dimensional-geometry-based classification coding model into a sequence for encoding and decoding training, and construct a complex-action-sequence generation model based on three-dimensional geometry.
Step (5): input a test set of actions to be generated into the three-dimensional-geometry-based classification coding model, obtain the geometric parameters containing time information for each test-set action category, and pass them through the trained generation model to obtain action sequences for each test-set action category.
Optionally, in step (1), preprocessing the complex motion video comprises: clipping, truncating frames, converting the video data into picture data, and selecting pictures with rich motion characteristics to construct the data set.
Optionally, in step (2), key-point recognition on the data set specifically comprises: processing the picture data with VIBE to obtain the human key points and action-sequence pose information in each frame.
Optionally, in step (3), the three-dimensional-geometry-based classification coding model is built on categorical coding, and a trained complex-action-sequence classification model is constructed through the following steps:
processing the training set and its labels with a dataset function to obtain the length of the training set;
iterating over the training set in a data iterator to obtain the tensor of the corresponding three-dimensional geometric model;
and moving the categorical codes to the GPU, then obtaining the geometric parameters containing time information through two linear fully connected layers and a long short-term memory (LSTM) network.
Optionally, in step (4), the three-dimensional-geometry-based complex-action-sequence generation model is built on a transformer model and comprises two stages: encoding training and decoding training.
Optionally, the encoding training specifically comprises:
combining the training set and its corresponding time-aware geometric parameters into a sequence, and feeding the sequence through a gated recurrent unit (GRU) into the transformer encoder to obtain the Gaussian distribution of each action type in the latent space;
the decoding training specifically comprises:
learning the variance and mean of each action category in the latent space, obtaining the human three-dimensional information and action pose from the transformer decoder, and rendering these parameters with the SMPL (Skinned Multi-Person Linear) model to obtain a complete human body sequence.
Optionally, the training loss function of the transformer-based complex-action-sequence generation model is:
[Loss function rendered as an image in the original: Figure BDA0003681812040000041]
wherein V_t denotes the three-dimensional geometric information of the input data, and P_t denotes the predicted human joint points and human motion pose information.
Optionally, a DropPath module is introduced while building the transformer-based complex-action-sequence generation model, so that the encoding-training and decoding-training stages alternate.
The invention also provides a three-dimensional human body classification and generation system for complex action sequences, comprising:
an acquisition module, for acquiring a complex motion video, preprocessing it, and constructing a data set;
a data recognition module, for performing key-point recognition on the data set to obtain the human key points and action-sequence pose information in each frame as a training set;
a first construction module, for encoding and training the three-dimensional geometric model corresponding to the training set based on categorical coding to obtain geometric parameters containing time information, thereby constructing the three-dimensional-geometry-based classification coding model;
a second construction module, for combining the training set and its time-aware geometric parameters into a sequence and performing transformer-based encoding and decoding training until the training loss function converges, thereby constructing the three-dimensional-geometry-based complex-action-sequence generation model;
and an input generation module, for inputting a test set of actions to be generated into the classification coding model, obtaining the time-aware geometric parameters for each test-set action category, and producing the action sequences of each test-set action category through the trained generation model.
Compared with the prior art, the above technical solution shows the benefits of the disclosed method and system. VIBE is used to process the picture data and obtain the human key points and action-sequence pose information in each frame, weakening the influence of lens distortion on action generation as much as possible. The standard three-dimensional geometric sequence is passed through two linear fully connected layers and an LSTM carrying temporal information to obtain geometric parameters containing time information, which are fed as a prior into the three-dimensional-geometry-based generation model, strengthening the network's learning of the latent-space distribution of different action types. The complex actions in the data set are parameterized, so action types can still be accurately recognized for complex actions, reasonable action sequences are generated, and recognition accuracy and action diversity are improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a single action type training process according to the present invention.
Fig. 3 is a schematic diagram of the system structure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Example 1
Embodiment 1 of the invention discloses a three-dimensional human body classification and generation method for complex action sequences, comprising the following steps:
Step (1): acquire a complex motion video and preprocess it by clipping, truncating frames, converting the video data into picture data, and selecting pictures with rich motion characteristics to construct a data set. Specifically:
motion videos of several action types are collected with a camera;
each action video is clipped so that a human body appears in every frame;
and each video is frame-truncated, the video data is converted into picture data, and pictures with rich motion characteristics are used to construct the data set.
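The frame-truncation step above can be sketched as follows. This is a hypothetical illustration: the patent only says pictures "with rich motion characteristics" are kept, so the per-frame motion score, the stride, and the threshold below are assumptions, not the patent's actual selection rule.

```python
# Hypothetical sketch of frame truncation: from a ~30 fps, 15-20 s clip,
# keep every `stride`-th frame whose motion score exceeds `threshold`.
# The motion-score input and the threshold are illustrative assumptions.

def select_rich_frames(motion_scores, stride=3, threshold=0.2):
    """Return indices of sampled frames considered motion-rich."""
    kept = []
    for idx in range(0, len(motion_scores), stride):
        if motion_scores[idx] > threshold:
            kept.append(idx)
    return kept

# A 30 fps, 15 s clip has ~450 frames; fake scores alternate low/high.
scores = [0.1 if i % 2 == 0 else 0.9 for i in range(450)]
frames = select_rich_frames(scores, stride=3, threshold=0.2)
```

Any real implementation would compute the motion score from frame differences or pose velocity; the selection logic stays the same.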
Step (2): and processing the picture data by adopting VIBE, and identifying key points of the data set to obtain the key points of the human body and the gesture information of the action sequence in each frame of picture as a training set.
Further, regarding the acquisition and construction of the data set: data are collected with the rear camera of an ordinary mobile phone. The captured video must keep the human body within the viewing angle, without occlusion and without leaving the spot (in-place actions); if data were collected arbitrarily, much of it would be invalid. To address this, each video is captured briefly and at high frequency, ensuring that every clip contains valid actions. For better experimental results, videos with a resolution of 1080 × 1920 at an average of 30 frames per second were collected, and the duration of each clip was kept to around 15-20 seconds. In addition to the 12 action types of the HumanAct12 data set, five videos with complex action types are newly defined:
1. Walking, sitting down, and drinking water
2. Squatting, standing up, and walking
3. Sitting down and answering a phone call
4. Running and jumping
5. Eating something and throwing things away
Further, to reduce the size of the network, the input is simplified to the human key points and action-sequence poses in the video. Therefore, to improve the quality of the data set, it is processed with VIBE in this embodiment.
And (3): constructing a complex action sequence classification coding model based on three-dimensional geometry based on the catagorical coding, which specifically comprises the following steps:
and processing the training set and the corresponding label by adopting a data set function to obtain the length of the training set.
And iterating the training set in the data iterator to obtain the corresponding tensor of the three-dimensional geometric model.
And migrating the categorical codes to a GPU, and obtaining the geometric parameters containing time information through two linear full-connection layers and a long and short memory neural network with time information.
And taking the training set as input to obtain the output of the geometric parameters containing the time information corresponding to the action type of the training set.
And (4): and combining the input and the output of the complex motion sequence classification coding model based on the three-dimensional geometry into a sequence, carrying out coding and decoding training based on a transformer model, and constructing a complex motion sequence generation model based on the three-dimensional geometry.
The coding training specifically comprises:
combining a training set and geometric information parameters containing time information corresponding to the training set into a sequence, inputting the sequence into a transform coding through a gate control cycle unit, and obtaining Gaussian distribution of different action types in a hidden space.
The decoding training specifically comprises:
the variance and the mean value of different action categories in the hidden space are learned, human body three-dimensional information and action postures are obtained in a transform decoder, and the parameters are rendered by using an SMPL (simple Markov chain) model to obtain a complete human body sequence.
The training loss function of the transformer-based complex-action-sequence generation model is:
[Loss function rendered as an image in the original: Figure BDA0003681812040000081]
wherein V_t denotes the three-dimensional geometric information of the input data, and P_t denotes the predicted human joint points and human motion pose information.
A DropPath module is also introduced while building the transformer-based generation model, so that the encoding-training and decoding-training stages alternate. Specifically: 1. the Join layer is dropped at random with a certain probability, but at least one branch must remain active; 2. in global mode, a single branch is selected at random and the rest are dropped.
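The two DropPath rules above can be sketched in plain Python. The function name and mode flags are illustrative; only the two rules (random local dropping with at least one branch kept, and global selection of exactly one branch) come from the text.

```python
# Sketch of the DropPath rules: local mode drops each branch with
# probability p but guarantees at least one stays on; global mode keeps
# exactly one randomly chosen branch. Illustrative pure-Python version.
import random

def drop_path_mask(n_branches, p=0.5, mode="local", rng=random.Random(0)):
    """Return a keep/drop boolean mask over the branches."""
    if mode == "global":                       # rule 2: keep exactly one
        keep = rng.randrange(n_branches)
        return [i == keep for i in range(n_branches)]
    mask = [rng.random() >= p for _ in range(n_branches)]
    if not any(mask):                          # rule 1: at least one on
        mask[rng.randrange(n_branches)] = True
    return mask

mask = drop_path_mask(4, p=0.5)
```

In a real network the mask would gate branch outputs during training only, with outputs rescaled at inference.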
And (5): inputting a test set of actions to be generated into the complex action sequence classified coding model based on the three-dimensional geometry, acquiring geometric parameters (model hidden space coding corresponding to the action category of the test set) containing time information corresponding to the action category of the test set, and generating a model (transform variable self-encoder) through the trained complex action sequence based on the three-dimensional geometry to obtain action sequences of multiple action categories of the test set.
In addition, the model can be tested by training and testing each individual action type separately. The data and results are shown in Table 1: videos of the same action type in the data set are first separated and trained individually to output training results, and the recognition accuracy and action-generation accuracy of each action model are then tallied, confirming that each sub-model recognizes and generates its corresponding action more accurately.
TABLE 1 Single-action test data and results
[Table 1 rendered as an image in the original: Figure BDA0003681812040000091]
Example 2
Embodiment 2 of the invention discloses a three-dimensional human body classification and generation system for complex action sequences, comprising:
an acquisition module, for acquiring a complex motion video, preprocessing it, and constructing a data set;
a data recognition module, for processing the picture data with VIBE and performing key-point recognition on the data set to obtain the human key points and action-sequence pose information in each frame as a training set;
a first construction module, for encoding and training the three-dimensional geometric model corresponding to the training set based on categorical coding to obtain geometric parameters containing time information, thereby constructing the three-dimensional-geometry-based classification coding model;
a second construction module, for combining the training set and its time-aware geometric parameters into a sequence and performing transformer-based encoding and decoding training until the training loss function converges, thereby constructing the three-dimensional-geometry-based complex-action-sequence generation model;
an input generation module, for inputting a test set of actions to be generated into the classification coding model, obtaining the time-aware geometric parameters for each test-set action category (the model's latent-space codes for those categories), and producing the action sequences of each test-set action category through the trained generation model.
The system further comprises a single-action detection module, for training and testing each individual action type separately.
In summary, the invention discloses a three-dimensional human body classification and generation method and system for complex action sequences. VIBE is used to process the picture data and obtain the human key points and action-sequence pose information in each frame, weakening the influence of lens distortion on action generation as much as possible. The standard three-dimensional geometric sequence is passed through two linear fully connected layers and an LSTM carrying temporal information to obtain geometric parameters containing time information, which are fed as a prior into the three-dimensional-geometry-based generation model, strengthening the network's learning of the latent-space distribution of different action types. The complex actions in the data set are parameterized, so action types can still be accurately recognized for complex actions, reasonable action sequences are generated, and recognition accuracy and action diversity are improved.
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts can be cross-referenced. Since the disclosed device corresponds to the disclosed method, its description is kept brief; refer to the method section for the relevant details.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A three-dimensional human body classification and generation method for complex action sequences, characterized by comprising the following steps:
Step (1): acquiring a complex motion video, preprocessing it, and constructing a data set;
Step (2): performing key-point recognition on the data set to obtain human key points and action-sequence pose information as a training set;
Step (3): constructing a complex-action-sequence classification coding model based on three-dimensional geometry, and taking the training set as input to output geometric parameters containing time information for each training-set action category;
Step (4): combining the input and output of the three-dimensional-geometry-based classification coding model into a sequence for encoding and decoding training, and constructing a complex-action-sequence generation model based on three-dimensional geometry;
Step (5): inputting a test set of actions to be generated into the three-dimensional-geometry-based classification coding model, obtaining the geometric parameters containing time information for each test-set action category, and obtaining action sequences for each test-set action category through the trained generation model.
2. The three-dimensional human body classification and generation method for complex action sequences according to claim 1, wherein in step (1), preprocessing the complex action video comprises: clipping the video, extracting frames, converting the video data into picture data, and selecting the picture data with rich action features to construct the data set.
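The frame-selection step in claim 2 (keeping only pictures with rich action features) can be sketched as a simple inter-frame difference filter. This is a minimal illustration, not the patent's actual criterion; `select_rich_frames` and its threshold are hypothetical names.

```python
import numpy as np

def select_rich_frames(frames, threshold):
    # frames: (T, N) flattened grayscale frames; keep a frame only when its
    # mean absolute difference from the previously kept frame exceeds
    # `threshold`, i.e. when it carries noticeably new motion content
    kept = [0]
    for t in range(1, len(frames)):
        if np.mean(np.abs(frames[t] - frames[kept[-1]])) > threshold:
            kept.append(t)
    return kept
```

In practice the kept indices would then be used to export the corresponding pictures into the data set.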
3. The three-dimensional human body classification and generation method for complex action sequences according to claim 2, wherein in step (2), the key point identification on the data set specifically comprises: processing the picture data with VIBE to obtain the human body key points and the action sequence posture information in each frame of picture.
4. The method according to claim 1, wherein in step (3), constructing the three-dimensional-geometry-based complex action sequence classification coding model trains a category-coding-based complex action sequence classification model, specifically comprising the following steps:
processing the training set and the corresponding labels with a dataset function to obtain the length of the training set;
iterating over the training set in a data iterator to obtain the tensors of the corresponding three-dimensional geometric models;
migrating the category codes to a GPU, and obtaining the geometric parameters containing time information through two linear fully-connected layers and a long short-term memory (LSTM) neural network that carries the time information.
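The network shape in claim 4 — two linear fully-connected layers followed by an LSTM that accumulates time information — can be sketched in NumPy. The layer sizes, tanh activations, and random parameters below are illustrative assumptions, not the patent's configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # one LSTM time step; the four gates are stacked along the last axis
    # in the order [input, forget, cell-candidate, output]
    H = h.shape[-1]
    z = x @ W + h @ U + b
    i, f = sigmoid(z[:, :H]), sigmoid(z[:, H:2 * H])
    g, o = np.tanh(z[:, 2 * H:3 * H]), sigmoid(z[:, 3 * H:])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def encode_sequence(seq, params):
    # seq: (T, D) per-frame pose features for one action sequence
    W1, b1, W2, b2, W, U, b = params
    H = U.shape[0]
    h = np.zeros((1, H))
    c = np.zeros((1, H))
    for x in seq:
        x = np.tanh(x[None, :] @ W1 + b1)   # first linear layer
        x = np.tanh(x @ W2 + b2)            # second linear layer
        h, c = lstm_step(x, h, c, W, U, b)  # LSTM carries the time information
    return h  # time-aware geometric parameter vector for this sequence
```

The final hidden state plays the role of the "geometric parameters containing time information" that the generation model consumes later.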
5. The three-dimensional human body classification and generation method for complex action sequences according to claim 1, wherein in step (4), constructing the three-dimensional-geometry-based complex action sequence generation model based on a Transformer model comprises two stages: encoding training and decoding training.
6. The three-dimensional human body classification and generation method for complex action sequences according to claim 5, wherein the encoding training specifically comprises:
combining the training set and its corresponding geometric information parameters containing time information into a sequence, and feeding the sequence through a gated recurrent unit (GRU) into the Transformer encoder to obtain the Gaussian distributions of the different action categories in the latent space;
the decoding training specifically comprises:
learning the variances and means of the different action categories in the latent space, obtaining the human body three-dimensional information and action postures in the Transformer decoder, and rendering these parameters with the SMPL (Skinned Multi-Person Linear) model to obtain a complete human body sequence.
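Sampling from the per-class Gaussian learned in the latent space is typically done with the VAE-style reparameterisation trick; a minimal sketch follows, with `mu`/`logvar` as assumed encoder outputs (the Transformer decoder and SMPL rendering are omitted).

```python
import numpy as np

def sample_latent(mu, logvar, rng):
    # draw a latent action code z ~ N(mu, exp(logvar)) for one action class;
    # writing z = mu + sigma * eps keeps mu and logvar differentiable,
    # which is why this form is used during training
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps
```

The sampled `z` would then condition the decoder to produce the pose parameters for every frame of the generated sequence.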
7. The method according to claim 6, wherein the training loss function of the Transformer-based three-dimensional-geometry complex action sequence generation model is:
$$L = \sum_{t=1}^{T} \left\| V_t - P_t \right\|_2^2$$

wherein: $V_t$ represents the three-dimensional geometric information of the input data at frame $t$, $P_t$ represents the predicted human body joint points and human body action posture information, and $T$ is the sequence length.
8. The method according to claim 6, characterized by further comprising introducing a DropPath module in the process of constructing the Transformer-based three-dimensional-geometry complex action sequence generation model, so that the encoding training and the decoding training are performed alternately.
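DropPath (stochastic depth), as it is commonly implemented, zeroes an entire residual branch for a random subset of samples during training and rescales the survivors; a generic sketch follows (how the patent wires it into the alternating encoder/decoder training is not shown here).

```python
import numpy as np

def drop_path(x, drop_prob, training, rng):
    # stochastic depth: drop the whole branch output for a random subset of
    # samples, rescaling the kept samples so the expected value is unchanged
    if not training or drop_prob == 0.0:
        return x
    keep_prob = 1.0 - drop_prob
    # one Bernoulli draw per sample, broadcast across all remaining axes
    mask = rng.random((x.shape[0],) + (1,) * (x.ndim - 1)) < keep_prob
    return x * mask / keep_prob
```

At inference time (`training=False`) the function is the identity, which is what makes the train-time rescaling by `1 / keep_prob` necessary.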
9. A three-dimensional human body classification and generation system for complex action sequences, characterized by comprising:
an acquisition module: used for acquiring a complex action video, preprocessing the complex action video, and constructing a data set;
a data identification module: used for performing key point identification on the data set to obtain the human body key points and action sequence posture information in each frame of picture as a training set;
a first construction module: used for performing, on a category basis, encoding training on the three-dimensional geometric models corresponding to the training set to obtain geometric information parameters containing time information, thereby constructing the three-dimensional-geometry-based complex action sequence classification coding model;
a second construction module: used for combining the training set and its corresponding geometric information parameters containing time information into a sequence, and performing encoding and decoding training based on a Transformer model until the training loss function of the Transformer-based three-dimensional-geometry complex action sequence generation model converges, thereby constructing the three-dimensional-geometry-based complex action sequence generation model;
an input generation module: used for inputting a test set of actions to be generated into the three-dimensional-geometry-based complex action sequence classification coding model, acquiring the geometric parameters containing time information that correspond to the action categories of the test set, and obtaining action sequences of the various action categories of the test set through the trained three-dimensional-geometry-based complex action sequence generation model.
CN202210635201.3A 2022-06-07 2022-06-07 Three-dimensional human body classification and generation method and system for complex action sequence Pending CN114972874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210635201.3A CN114972874A (en) 2022-06-07 2022-06-07 Three-dimensional human body classification and generation method and system for complex action sequence


Publications (1)

Publication Number Publication Date
CN114972874A true CN114972874A (en) 2022-08-30



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination