CN111222459B - Visual angle independent video three-dimensional human body gesture recognition method - Google Patents

Info

Publication number
CN111222459B
CN111222459B CN202010010324.9A CN202010010324A CN111222459B CN 111222459 B CN111222459 B CN 111222459B CN 202010010324 A CN202010010324 A CN 202010010324A CN 111222459 B CN111222459 B CN 111222459B
Authority
CN
China
Prior art keywords
dimensional
human body
video
module
visual angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010010324.9A
Other languages
Chinese (zh)
Other versions
CN111222459A (en)
Inventor
邱丰
马利庄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202010010324.9A
Publication of CN111222459A
Application granted
Publication of CN111222459B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a visual angle independent video three-dimensional human body gesture recognition method comprising the following steps. Step 1, virtual data generation stage: virtual camera parameters are synthesized for any human body posture data set containing three-dimensional labels, and two-dimensional/three-dimensional data tuples are then generated. Step 2, model training stage: the generated two-dimensional/three-dimensional data tuples are used to train, respectively, a first module of a modular neural network, which yields a model that generalizes across camera views, and a second module of the modular neural network, which yields a model that preserves inter-frame motion continuity. Step 3, unconstrained video inference stage: the multi-module deep neural network trained in step 2 is used to predict on video acquired under any unconstrained conditions, yielding a three-dimensional human body posture recognition result. Compared with the prior art, the method is based on joint training of a modular neural network and effectively improves the generalization capability of three-dimensional human body gesture recognition.

Description

Visual angle independent video three-dimensional human body gesture recognition method
Technical Field
The invention relates to three-dimensional human body posture recognition in the technical field of computer vision, and in particular to a visual angle independent video three-dimensional human body posture recognition method comprising unknown-view data synthesis, modular neural network training, and a preprocessing method for video tasks.
Background
In recent decades, with the development of artificial intelligence and deep learning, research on human body posture recognition has advanced steadily. Video human body posture recognition, and in particular three-dimensional human body posture recognition from video, has long been an important topic in computer vision and intelligent human-computer interaction. It draws on multiple disciplines, including digital image processing, human-computer interaction, computer graphics and computer vision, and with the spread of security monitoring networks, intelligent robots, and portable mobile electronic devices such as smartphones and tablet computers, it has increasingly become part of everyday life.
Existing three-dimensional human body posture recognition algorithms can be divided, according to the prediction target, into single-stage and multi-stage human body posture recognition. Single-stage methods exploit more of the latent information in the picture and achieve higher accuracy in a laboratory environment; however, because RGB picture data with three-dimensional labels are scarce and cannot be collected outside a laboratory acquisition environment, these methods generalize poorly and are difficult to turn into usable products with commercial value. In multi-stage methods, the two-dimensional human body posture estimation part can be trained on a large number of unconstrained internet pictures with manual annotations, and the paper of Martinez et al. showed that two-dimensional to three-dimensional prediction is a relatively easy task. To ease conversion into products, the invention adopts a multi-stage human body posture recognition architecture. However, even when built on existing strong two-dimensional human body key point detection models, existing methods still tend to overfit to the camera parameters of the data set, because three-dimensional acquisition data cover only a limited set of viewing angles.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a visual angle independent video three-dimensional human body gesture recognition method. It provides a virtual view synthesis method in which a camera view angle enhancement module generates random views and the camera projection relationship is applied to obtain two-dimensional/three-dimensional data tuples, which are used for multi-module training and for verifying the generalization capability of the neural network. In addition, the input to three-dimensional prediction is normalized by means of the two-dimensional human body detection frame, so that three-dimensional human body posture estimation in an unconstrained environment is freed from the limits of the camera's intrinsic and extrinsic parameters and generalizes better.
The aim of the invention can be achieved by the following technical scheme:
A visual angle independent video three-dimensional human body gesture recognition method, the recognition method comprising:
step 1: virtual data generation stage: synthesizing virtual camera parameters for any human body posture data set containing three-dimensional labels, and then generating two-dimensional/three-dimensional data tuples;
step 2: model training stage: using the generated two-dimensional/three-dimensional data tuples to train, respectively, a first module of a modular neural network, to obtain a model that generalizes across camera views, and a second module of the modular neural network, to obtain a model that preserves inter-frame motion continuity;
step 3: unconstrained video inference stage: using the multi-module deep neural network trained in step 2 to predict on video acquired under any unconstrained conditions, to obtain a three-dimensional human body posture recognition result.
Further, step 1 specifically includes: for any human body posture data set containing three-dimensional labels, a camera view angle enhancement module is used to synthesize virtual camera parameters, and the projection relationship is used to generate two-dimensional/three-dimensional data tuples.
Further, the camera parameters include extrinsic parameters, which determine the position and orientation of the camera, and intrinsic parameters, which determine the projection focal length of the camera.
Further, the first module in step 2 is trained for view enhancement using single-frame data tuples.
Further, the second module in step 2 is trained as a time sequence model using continuous sequences of data tuples.
The first module and the second module need only satisfy the following conditions: the first module is a single-frame two-dimensional to three-dimensional prediction module, the second module is a time sequence three-dimensional to three-dimensional correction module, and the two modules are connected in series to complete the two-dimensional to three-dimensional prediction.
Further, before the data are input to the neural network, steps 2 and 3 further include a camera-independent two-dimensional detection normalization preprocessing of the two-dimensional detection result, described by the following formula:
[Formula image BDA0002356917620000031]
where K_{x,y} denotes the two-dimensional point coordinates after the two-dimensional detection normalization preprocessing, [symbol image BDA0002356917620000032] denotes the original two-dimensional point coordinates, [symbol image BDA0002356917620000033] denotes the center coordinates of the two-dimensional detection frame, and w_d and h_d are the width and the height of the two-dimensional detection frame, respectively.
Further, the video obtained by unconstrained acquisition in step 3 specifically includes video sequences acquired under natural conditions or subjected to scaling, cropping, speed change, and other transformations such as color adjustment.
Compared with the prior art, the invention has the following advantages:
(1) In the virtual data generation stage of the visual angle independent video three-dimensional human body gesture recognition method provided by the invention, reasonable random views are assumed in place of the camera views used when the data set was acquired, as in the original fixed-view training, which removes the dependence on the data set's camera intrinsic and extrinsic parameters. In the model training stage, the modular design allows the two independent modules to be trained separately or trained fully in series on video stream data tuples; each module has a clear task, can be verified independently, and generalizes well.
(2) Because the invention uses time sequence model training, long-range temporal cues for prediction can be obtained while the receptive field remains controlled. In the unconstrained video inference stage, the effective design and selection of the normalization frame decouples the method from the projection relationship, so that good predictions can be obtained for the large number of videos collected from the internet that lack camera parameters, show the person at an extreme scale (typically, the person occupies too small a portion of the original frame), or have undergone cropping and other processing.
(3) The invention trains a multi-module neural network on a large number of two-dimensional/three-dimensional data tuples enhanced with camera views, and preprocesses the two-dimensional input with a camera-independent two-dimensional detection normalization method. The first module adapts to unconstrained three-dimensional human body posture estimation tasks and obtains stronger camera generalization capability, while the second module effectively exploits continuity over the time sequence, so that the predicted key points are spatially more stable and the overall prediction reaches the desired precision.
Drawings
FIG. 1 is a flow chart of a method structure of the present invention;
FIG. 2 is a schematic diagram of rotation (attitude angle) control at the time of camera parameter generation in the method of the present invention;
FIG. 3 is a diagram showing an example of the structure of a first modular neural network and a second modular neural network in the method of the present invention;
FIG. 4 is a schematic view of the projection relationship in the method of the present invention;
FIG. 5 is a schematic diagram of a two-dimensional detection frame normalization method in the method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
Fig. 1 is a flow chart of the overall structure of the visual angle independent video three-dimensional human body gesture recognition method of the invention, which mainly includes three stages: the virtual data generation stage, the model training stage and the unconstrained video inference stage. The method also includes a camera-independent two-dimensional detection normalization method that is used in both the training stage and the inference stage;
virtual data generation stage: for any published academic three-dimensional human body posture data set, or any three-dimensional human body posture data set acquired by a motion capture system, corresponding two-dimensional projection and three-dimensional posture data tuples are generated through the camera view synthesis principle and the projection transformation provided by the invention;
model training stage: the neural network model is trained. The first module provided by the invention is trained on large-scale view-enhanced single-frame data tuples to obtain better robustness to view changes and decoupling from camera parameters; the second module provided by the invention uses video stream data containing three-dimensional labels for time sequence learning and prediction, so as to obtain spatial continuity over the time sequence and improve posture recognition accuracy;
unconstrained video inference stage: for a video data stream captured in a general (in-the-wild) environment, two-dimensional human body key points are obtained with any two-dimensional human body key point detection module and preprocessed with the dedicated normalization method for two-dimensional key point detection results; the processed two-dimensional data are then passed forward through the first module and the second module in sequence to obtain the human body posture represented by three-dimensional key points.
Human body posture data are represented mainly by key points and skeletons. The first module provided by the invention mainly improves view generalization capability, and the second module mainly provides a larger receptive field and stable time sequence prediction. Applicable scenarios include, but are not limited to, research and applications involving video human body posture recognition. The invention is based on joint training of a modular neural network and effectively improves the generalization capability of three-dimensional human body gesture recognition.
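As a rough illustration of how a time sequence module can cover a long window of frames, the sketch below computes the receptive field of a stack of stride-1 one-dimensional convolutions over the frame axis, with and without dilation (the "hole" convolution used by the second module in the embodiment). It relies only on the standard receptive-field formula; the function name, kernel sizes and dilation rates are illustrative assumptions and are not values taken from the invention.

```python
def receptive_field(kernel_sizes, dilations):
    """Number of input frames seen by one output of stacked stride-1 1D convolutions."""
    if len(kernel_sizes) != len(dilations):
        raise ValueError("one dilation rate is needed per layer")
    frames = 1
    for kernel, dilation in zip(kernel_sizes, dilations):
        # Each layer widens the window by (kernel - 1) * dilation frames.
        frames += (kernel - 1) * dilation
    return frames


# Hypothetical four-layer stacks with kernel size 3.
plain = receptive_field([3, 3, 3, 3], [1, 1, 1, 1])
dilated = receptive_field([3, 3, 3, 3], [1, 3, 9, 27])
print(plain, dilated)  # prints: 9 81
```

With the same number of layers, the dilated stack in this example sees nine times as many frames, which is the sense in which dilation enlarges the receptive field without adding layers.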
The virtual data generation stage method applies to, but is not limited to, published academic data sets and data sets acquired by motion capture systems; it is applicable whenever three-dimensional labels and camera parameters are available (that is, whenever a two-dimensional/three-dimensional projection relationship exists).
The implementation of the first module and the second module defined in the model training stage and the unconstrained video inference stage is not limited to the implementation given in this specification: any neural network or similar model suited to predicting three-dimensional human body posture from single-frame two-dimensional human body detection, or to predicting continuous-sequence three-dimensional human body posture with a time sequence model, can replace the first module and the second module referred to by the invention.
The unconstrained video inference stage applies to unconstrained video, that is, video sequences acquired under natural conditions or subjected to transformations including, but not limited to, scaling, cropping, speed change, and color adjustment.
The normalization method, which serves as a dedicated preprocessing stage in the model training stage and the unconstrained video inference stage, is suited to unconstrained video: even if the relative projection relationship between the person and the camera has been destroyed, the method is applicable as long as two-dimensional human body key points can be detected or a detection module is available.
Further, the specific flow of each stage in the method of the invention is as follows:
in the visual angle independent video three-dimensional human body gesture recognition method provided by the invention, the virtual data generation stage further comprises the following steps: for an existing three-dimensional human body posture data set, a number of different camera parameters are generated through a reasonably designed random scheme, including extrinsic parameters, which determine the position and orientation of the camera, and intrinsic parameters, which determine the projection focal length of the camera. Based on these random parameters, the projection relationship is then used to generate, for the three-dimensional human body data, the corresponding two-dimensional human body data under the different camera parameters, yielding two-dimensional/three-dimensional human body posture data tuples. For a video data set, the motion range of the observed person over the video sequence is taken into account, and reasonable intrinsic parameters are derived from the motion trajectory so that the projection view frustum contains as much as possible of the set of moving three-dimensional human body key points.
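One way to read the requirement that the view frustum contain the moving key points is sketched below: given all key points of a sequence in camera coordinates, it returns the largest focal length (in pixels) that still projects every point inside the image. This is a minimal sketch under stated assumptions, not the procedure of the invention: the function name `max_focal_length`, the pinhole model with the principal point at the image center, and the `margin` factor are all hypothetical.

```python
def max_focal_length(points_cam, width, height, margin=0.9):
    """Largest pinhole focal length (pixels) keeping all points inside the image.

    points_cam: iterable of (x, y, z) in camera coordinates, z > 0 in front of the camera.
    Assumes the principal point sits at the image center (an assumption of this sketch).
    """
    f_max = float("inf")
    for x, y, z in points_cam:
        if z <= 0:
            raise ValueError("every point must lie in front of the camera")
        slope_x = abs(x) / z
        slope_y = abs(y) / z
        # A point projects to f * x / z from the center, which must not exceed half the image size.
        if slope_x > 0:
            f_max = min(f_max, (width / 2) / slope_x)
        if slope_y > 0:
            f_max = min(f_max, (height / 2) / slope_y)
    return margin * f_max


# Hypothetical trajectory points (meters) and a 1000 x 1000 pixel image.
trajectory = [(1.0, 0.5, 5.0), (-2.0, 1.0, 4.0)]
focal = max_focal_length(trajectory, 1000, 1000)
print(focal)  # prints: 900.0
```

The margin below 1 leaves a border around the outermost key point; any smaller focal length also satisfies the containment condition but makes the person smaller in the image.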
In the visual angle independent video three-dimensional human body gesture recognition method provided by the invention, the model training stage further comprises the following steps: the neural network is trained module by module. That is, the first module can be trained for view enhancement using single-frame or continuous-sequence data tuples, though mainly single-frame tuples, to obtain a model that generalizes across camera views; the second module is trained as a time sequence model, mainly using continuous sequences of data tuples, to obtain a model that preserves inter-frame motion continuity. Note that, in general, for RGB video input the first module can be understood as a two-dimensional to three-dimensional regression problem and the second module as a sequential three-dimensional to sequential three-dimensional regression problem; for RGB-D video input, an additional depth dimension may be added to both the first and the second module.
In the visual angle independent video three-dimensional human body gesture recognition method provided by the invention, the unconstrained video inference stage further comprises the following steps: a two-dimensional detection result is obtained with any known two-dimensional human body key point detection method, the data are preprocessed with the camera-independent two-dimensional detection normalization method, and forward inference is carried out through the first module and the second module in sequence to obtain the three-dimensional human body posture estimation result for the video sequence.
In the visual angle independent video three-dimensional human body gesture recognition method provided by the invention, the camera-independent two-dimensional detection normalization method further comprises the following steps: the two-dimensional human body key point detection frame (optionally multiplied by a suitable coefficient) is used as the normalization standard for the two-dimensional human body key point input. This normalization is more robust to camera changes and to loss or corruption of the projection relationship caused by local zooming, cropping and similar operations on the picture.
The video three-dimensional human body gesture recognition method independent of the visual angle provided by the invention is specifically described below with reference to the specific embodiment.
In the first stage of the method of the invention, the virtual data generation stage: first, starting from an existing three-dimensional human body posture data set, such as the published academic data set Human3.6M, a random configuration of camera parameters is generated for the three-dimensional human body world coordinates of each video segment. The configuration can set the camera position and rotation angles according to the height and activity range of the observed person. For example: taking the mean of the projection of the person's activity range onto the ground plane as the center point, and a height of 0.75 times the person's height as the center of the observation sphere, a point O through which the camera's optical axis passes is randomly determined with a Gaussian radius of 0.5 times the person's height; the Euclidean distance from the camera to point O is drawn from a uniform distribution between 4.0 meters and 6.5 meters. The camera attitude angles are shown in fig. 2: the roll angle (rotation about the camera's direction vector) is fixed, the pitch angle (rotation about the cross product of the camera's up and direction vectors) is randomly generated between -15 degrees and +15 degrees, and the yaw angle (rotation about the camera's up vector) is randomly generated between 0 degrees and +360 degrees. Because the Human3.6M data set already provides camera intrinsic parameters, intrinsic parameter generation may be omitted here for the time being. These values are randomly regenerated at each sampling, and the projection relationship is used to obtain a two-dimensional/three-dimensional human body posture data tuple.
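The sampling scheme of this embodiment can be sketched in a few lines of plain Python. The sketch below draws point O, the camera distance, pitch and yaw from the stated ranges, builds a zero-roll camera that looks at O, and projects three-dimensional joints to obtain a two-dimensional/three-dimensional tuple. Several details are assumptions of this sketch and not statements of the invention: the "Gaussian radius" is read as a standard deviation, the sampled pitch is applied as the elevation of the camera about O, the world z axis points up, the three-dimensional label is expressed in camera coordinates, and the focal length, principal point and function names are hypothetical.

```python
import math
import random


def sample_camera(center_xy, person_height, rng):
    """Draw one virtual camera that looks at a randomly jittered point O."""
    sigma = 0.5 * person_height  # assumption: "Gaussian radius" read as a standard deviation
    target = (
        rng.gauss(center_xy[0], sigma),
        rng.gauss(center_xy[1], sigma),
        rng.gauss(0.75 * person_height, sigma),
    )
    distance = rng.uniform(4.0, 6.5)  # meters between the camera and O
    pitch = math.radians(rng.uniform(-15.0, 15.0))
    yaw = math.radians(rng.uniform(0.0, 360.0))
    # Unit vector from O towards the camera (z is "up" in this sketch).
    out = (math.cos(pitch) * math.cos(yaw),
           math.cos(pitch) * math.sin(yaw),
           math.sin(pitch))
    position = tuple(t + distance * o for t, o in zip(target, out))
    forward = tuple(-o for o in out)
    # Roll is fixed at zero, so the camera's right axis stays horizontal.
    norm = math.hypot(forward[0], forward[1])
    right = (forward[1] / norm, -forward[0] / norm, 0.0)
    up = (right[1] * forward[2] - right[2] * forward[1],
          right[2] * forward[0] - right[0] * forward[2],
          right[0] * forward[1] - right[1] * forward[0])
    return {"target": target, "position": position,
            "forward": forward, "right": right, "up": up}


def project(point, cam, focal, cx, cy):
    """Return ((u, v), (x, y, z)): pixel coordinates and camera-space coordinates."""
    rel = tuple(p - c for p, c in zip(point, cam["position"]))
    x = sum(r * a for r, a in zip(rel, cam["right"]))
    y = -sum(r * a for r, a in zip(rel, cam["up"]))  # image y grows downwards
    z = sum(r * a for r, a in zip(rel, cam["forward"]))
    return (focal * x / z + cx, focal * y / z + cy), (x, y, z)


rng = random.Random(0)
camera = sample_camera((0.0, 0.0), 1.7, rng)
pose_3d = [(0.0, 0.0, 1.7), (0.0, 0.0, 0.9), (0.2, 0.0, 0.0)]  # hypothetical joints, meters
data_tuple = [project(joint, camera, 1000.0, 500.0, 500.0) for joint in pose_3d]
```

Calling `sample_camera` again for each training sample gives a fresh view of the same three-dimensional pose, which is the augmentation effect described above.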
In the second stage of the invention, the model training stage: the first module (camera-agnostic regressor) is trained on random single-frame data tuples. In this implementation it is a two-iteration regression model with residual connections based on a deep neural network, which regresses three-dimensional human body posture key points from two-dimensional human body posture key points; the model is camera independent and covers a wide range of views. The second module (temporal regressor) is an improved model based on dilated (hole) convolution over the time sequence: a three-dimensional posture correction network is designed that uses dilated convolution to enlarge the receptive field and to increase the spatial continuity of the three-dimensional predictions over time, thereby compensating the predictions of the first module. The loss functions used to supervise the two parts of the training process are shown below.
[Formula image BDA0002356917620000061]
[Formula image BDA0002356917620000062]
[Formula image BDA0002356917620000063]
where L is the total loss, [symbol image BDA0002356917620000071] and [symbol image BDA0002356917620000072] are the weights of the first module and the second module, respectively, and [symbol image BDA0002356917620000073] and [symbol image BDA0002356917620000074] are the losses of the first module and the second module, respectively.
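Since the formula images are not reproduced here, the following is only a plausible reading of the symbol descriptions above: a total loss formed as the weighted sum of the two module losses. Both the weighted-sum form and the per-module loss used in the sketch (mean Euclidean distance between predicted and labeled three-dimensional joints) are assumptions, as are the function names and default weights.

```python
import math


def mean_joint_error(pred, target):
    """Mean Euclidean distance between corresponding 3D joints (assumed per-module loss)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(pred, target):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p, t)))
    return total / len(pred)


def total_loss(loss_first, loss_second, weight_first=1.0, weight_second=1.0):
    """Assumed combination: weighted sum of the first-module and second-module losses."""
    return weight_first * loss_first + weight_second * loss_second


# Hypothetical two-joint example: one joint is off by 1 unit, the other is exact.
label = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
first_output = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
second_output = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.0)]
loss = total_loss(mean_joint_error(first_output, label),
                  mean_joint_error(second_output, label),
                  weight_first=1.0, weight_second=2.0)
print(loss)  # prints: 1.0
```

Setting one weight to zero reduces this to training a single module on its own, which matches the statement that the two modules can be trained separately or in series.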
In the third stage of the invention, the unconstrained video inference stage: based on existing two-dimensional detection results, the multi-module deep neural network obtained once training has converged in the second stage is used to predict, module by module in sequence, on video acquired under any unconstrained conditions, yielding the three-dimensional human body posture result. The implementation of the first and second modules and the inference process are shown in fig. 3, where single-frame and multi-frame denote a single two-dimensional frame and multiple two-dimensional frames, respectively.
The camera-independent two-dimensional detection normalization method used in the second and third stages of the invention: fig. 4 shows the projection relationship, in which the principal point and the optical point respectively represent the center points of the two-dimensional detection frame before and after projection. Conventional methods generally use the pixel size of the original picture, or the size of the square circumscribing the original picture, as the normalization standard, with the following calculation functions:
[Formula image BDA0002356917620000075]
[Formula image BDA0002356917620000076]
It can be seen that the conventional normalization depends on the camera parameters (focal length) and, equivalently, on the original picture size, so it is not robust to transformations such as cropping. The normalization method provided by the invention is shown in fig. 5, and its calculation function is given below; it keeps the scale of the two-dimensional detection stable and is independent of the camera parameters:
[Formula image BDA0002356917620000077]
where K_{x,y} denotes the two-dimensional point coordinates after the two-dimensional detection normalization preprocessing, [symbol image BDA0002356917620000078] denotes the original two-dimensional point coordinates, [symbol image BDA0002356917620000079] denotes the center coordinates of the two-dimensional detection frame, m is a coefficient greater than 1 (1.2 in this embodiment), and w_d and h_d are the width and the height of the two-dimensional detection frame, respectively.
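The exact formula is an image that is not reproduced here, so the sketch below shows only one plausible form consistent with the symbol descriptions: each key point is shifted by the detection frame center and divided by m times the larger side of the detection frame. The use of the larger side, the function name and the sample numbers are assumptions. The example also checks the property claimed above: scaling and cropping the picture (which moves and resizes key points and detection frame together) leaves the normalized coordinates unchanged.

```python
def normalize_keypoints(keypoints, box_center, box_width, box_height, m=1.2):
    """Normalize 2D key points by the detection frame instead of the picture size.

    Assumed form: (K_o - C_d) / (m * max(w_d, h_d)); m = 1.2 follows the embodiment.
    """
    scale = m * max(box_width, box_height)
    cx, cy = box_center
    return [((x - cx) / scale, (y - cy) / scale) for x, y in keypoints]


# Hypothetical detection in the original picture (pixels).
keypoints = [(100.0, 200.0), (150.0, 260.0)]
original = normalize_keypoints(keypoints, (125.0, 230.0), 50.0, 60.0)

# The same detection after cropping 40 px / 10 px off the left / top and downscaling by 0.5.
shift_x, shift_y, zoom = 40.0, 10.0, 0.5
moved = [((x - shift_x) * zoom, (y - shift_y) * zoom) for x, y in keypoints]
moved_center = ((125.0 - shift_x) * zoom, (230.0 - shift_y) * zoom)
transformed = normalize_keypoints(moved, moved_center, 50.0 * zoom, 60.0 * zoom)

# Both lists hold the same normalized coordinates (up to floating-point rounding).
print(original)
print(transformed)
```

Because no picture size or focal length enters the computation, the normalized input to the first module depends only on where the key points sit inside the detection frame.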
While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (7)

1. A visual angle independent video three-dimensional human body gesture recognition method, characterized by comprising the following steps:
step 1: virtual data generation stage: synthesizing virtual camera parameters for any human body posture data set containing three-dimensional labels, and then generating two-dimensional/three-dimensional data tuples;
step 2: model training stage: using the generated two-dimensional/three-dimensional data tuples to train, respectively, a first module of a modular neural network, to obtain a model that generalizes across camera views, and a second module of the modular neural network, to obtain a model that preserves inter-frame motion continuity;
step 3: unconstrained video inference stage: using the multi-module deep neural network trained in step 2 to predict on video acquired under any unconstrained conditions, to obtain a three-dimensional human body posture recognition result.
2. The visual angle independent video three-dimensional human body gesture recognition method according to claim 1, wherein step 1 specifically comprises: for any human body posture data set containing three-dimensional labels, using a camera view angle enhancement module to synthesize virtual camera parameters, and using the projection relationship to generate two-dimensional/three-dimensional data tuples.
3. The visual angle independent video three-dimensional human body gesture recognition method according to claim 2, wherein the camera parameters include extrinsic parameters determining the position and orientation of the camera and intrinsic parameters determining the projection focal length of the camera.
4. The visual angle independent video three-dimensional human body gesture recognition method according to claim 1, wherein the first module in step 2 is trained for view enhancement using single-frame data tuples.
5. The visual angle independent video three-dimensional human body gesture recognition method according to claim 1, wherein the second module in step 2 is trained as a time sequence model using continuous sequences of data tuples.
6. The visual angle independent video three-dimensional human body gesture recognition method according to claim 1, wherein, before the data are input to the neural network, the method further comprises a camera-independent two-dimensional detection normalization preprocessing of the two-dimensional detection result, described by the following formula:
[Formula image FDA0002356917610000011]
where K_{x,y} denotes the two-dimensional point coordinates after the two-dimensional detection normalization preprocessing, [symbol image FDA0002356917610000012] denotes the original two-dimensional point coordinates, [symbol image FDA0002356917610000013] denotes the center coordinates of the two-dimensional detection frame, and w_d and h_d are the width and the height of the two-dimensional detection frame, respectively.
7. The visual angle independent video three-dimensional human body gesture recognition method according to claim 1, wherein the video obtained by unconstrained acquisition in step 3 specifically comprises video sequences acquired under natural conditions or subjected to scaling, cropping, speed change, and other transformations such as color adjustment.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010010324.9A CN111222459B (en) 2020-01-06 2020-01-06 Visual angle independent video three-dimensional human body gesture recognition method

Publications (2)

Publication Number Publication Date
CN111222459A (en) 2020-06-02
CN111222459B (en) 2023-05-12

Family

ID=70825945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010010324.9A Active CN111222459B (en) 2020-01-06 2020-01-06 Visual angle independent video three-dimensional human body gesture recognition method

Country Status (1)

Country Link
CN (1) CN111222459B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111833439B (en) * 2020-07-13 2024-06-21 郑州胜龙信息技术股份有限公司 Artificial intelligence based ammunition throwing analysis and mobile simulation training method
CN112183184B (en) * 2020-08-13 2022-05-13 浙江大学 Motion capture method based on asynchronous video
CN112990032B (en) * 2021-03-23 2022-08-16 中国人民解放军海军航空大学航空作战勤务学院 Face image processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631861A (en) * 2015-12-21 2016-06-01 浙江大学 Method of restoring three-dimensional human body posture from unmarked monocular image in combination with height map
CN106600667A (en) * 2016-12-12 2017-04-26 南京大学 Method for driving face animation with video based on convolution neural network
CN108062170A (en) * 2017-12-15 2018-05-22 南京师范大学 Multi-class human posture recognition method based on convolutional neural networks and intelligent terminal
CN108345869A (en) * 2018-03-09 2018-07-31 南京理工大学 Driver's gesture recognition method based on depth image and virtual data
DE102019106123A1 (en) * 2018-03-12 2019-09-12 Nvidia Corporation Three-dimensional (3D) pose estimation from the side of a monocular camera
CN110647991A (en) * 2019-09-19 2020-01-03 浙江大学 Three-dimensional human body posture estimation method based on unsupervised field self-adaption

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
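Human body posture recognition using a neural network algorithm optimized by improved PSO; He Jiajia et al.; Transducer and Microsystem Technologies (《传感器与微系统》); Vol. 36 (No. 01); 115-118 *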

Also Published As

Publication number Publication date
CN111222459A (en) 2020-06-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant