WO2021051526A1 - Multi-view 3D human pose estimation method and related apparatus - Google Patents

多视图3d人体姿态估计方法及相关装置 Download PDF

Info

Publication number
WO2021051526A1
WO2021051526A1 · PCT/CN2019/116990 · CN2019116990W
Authority
WO
WIPO (PCT)
Prior art keywords
view
views
coordinates
key points
confidence
Prior art date
Application number
PCT/CN2019/116990
Other languages
English (en)
French (fr)
Inventor
王义文
王健宗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021051526A1 publication Critical patent/WO2021051526A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Definitions

  • This application relates to the field of human body pose estimation, and in particular to a multi-view 3D human body pose estimation method and related devices.
  • 3D pose reconstruction is a very active area of research and development.
  • an accurate system can extract 3D information from a given human body, extracting the joints and limbs.
  • however, compared with vision-based solutions, these systems are usually very expensive, susceptible to interference, and require significant hardware and human resources, so market demand is low.
  • With the development of deep learning, a popular approach to 3D human pose estimation is extraction from a single two-dimensional image, because extracting depth information from a single view offers great flexibility.
  • However, 3D pose estimation is inherently under-constrained, and due to external factors such as changes in human appearance, clothing, or self-occlusion, the accuracy of pose estimation from a single two-dimensional image is low.
  • the embodiments of the present application provide a multi-view 3D human body pose estimation method and related devices, which can improve the accuracy of the estimated 3D human body pose.
  • an embodiment of the present application provides a method for estimating a multi-view 3D human body pose, and the method includes:
  • the continuous time period includes multiple moments, and the first moment is any moment in the continuous time period;
  • the 3D human body pose at all moments in the continuous time period is input into the target LSTM network model to obtain the estimated 3D human body pose at the next moment in the continuous time period.
  • an embodiment of the present application provides a multi-view 3D human body pose estimation device, and the multi-view 3D human body pose estimation device includes:
  • the acquiring unit is configured to acquire the 3D human body pose at the first moment in the continuous time period until the 3D human body pose at all moments in the continuous time period is acquired, where the continuous time period includes multiple moments, and the first moment is any moment in the continuous time period;
  • the estimation unit is configured to input the 3D human body pose at all moments in the continuous time period into the target LSTM network model to obtain the estimated 3D human body pose at the next moment in the continuous time period.
  • an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps in the first aspect of the embodiments of the present application.
  • embodiments of the present application provide a computer non-volatile readable storage medium, wherein the computer non-volatile readable storage medium stores a computer program for electronic data exchange, and the computer program enables the computer to execute part or all of the steps described in the first aspect of the embodiments of the present application.
  • the embodiments of the present application can improve the accuracy of the estimated 3D human body posture.
  • FIG. 1 is a schematic flowchart of a multi-view 3D human body pose estimation method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a process of acquiring a 3D human body pose at a first moment according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of a process of acquiring a 3D human body pose at a first moment according to an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
  • FIG. 5 is a schematic structural diagram of a multi-view 3D human body pose estimation apparatus provided by an embodiment of the application.
  • FIG. 1 is a schematic flowchart of a method for estimating a multi-view 3D human body pose according to an embodiment of the application, and the method includes:
  • the server obtains the 3D human body posture at the first moment in a continuous time period until obtaining the 3D human body posture at all moments in the continuous time period, where the continuous time period includes multiple moments, and the first moment is any moment in the continuous time period.
  • the multi-view 3D human body pose estimation method in the embodiment of the present application is applied to a server; the continuous time period can be 1 s, 2 s, 5 s, 10 s, etc., and because the period is short, the 3D human body pose obtained within the continuous time period changes smoothly and continuously.
  • the acquiring the 3D human body pose at the first moment in the continuous time period includes:
  • A1. Acquire N 2D views from different viewing angles at the first moment in the continuous time period, where N is a positive integer.
  • N 2D views of the human body are acquired through N cameras placed at different angles at each time in the continuous time period.
  • A2. Obtain N confidence maps and N partial affinity fields of the N 2D views through the 2D pose detection network model, where the confidence map is a matrix of the probabilities that each pixel in the 2D view is a key point, and the partial affinity field is a 2D vector field group used to encode the position and direction of the body segment.
  • the confidence map is a matrix with the same size as the original image, in which each element stores the probability that the corresponding pixel in the view is a key point; according to the confidence map, the pixels that are key points can be determined in the view.
  • the partial affinity field is a set of 2D vector fields used to encode the position and direction of body segments on the image domain; for each pixel in the region of a specific body segment, the direction of the 2D vector field points from one part of the body segment to another. After the limb key points are confirmed, the limbs of the human body can be connected segment by segment according to the partial affinity fields, finally yielding the overall skeleton of the human body.
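The role of a confidence map can be illustrated with a small sketch (assumptions: toy 4x4 maps, an invented threshold value, and a simple per-map argmax in place of the non-maximum suppression a real detector would use; the function name is also invented):

```python
import numpy as np

def keypoints_from_confidence_maps(conf_maps, threshold=0.1):
    """For each key point's confidence map, pick the pixel with the
    highest probability; detections below `threshold` count as missed.
    conf_maps: array of shape (K, H, W), one map per key point."""
    keypoints = []
    for cmap in conf_maps:
        idx = np.unravel_index(np.argmax(cmap), cmap.shape)
        score = float(cmap[idx])
        # (x, y, confidence), or None for a missed detection
        keypoints.append((int(idx[1]), int(idx[0]), score) if score >= threshold else None)
    return keypoints

# toy example: 2 key points on a 4x4 image
maps = np.zeros((2, 4, 4))
maps[0, 1, 2] = 0.9   # key point 0 most likely at (x=2, y=1)
maps[1, 3, 0] = 0.05  # key point 1 below threshold -> missed
print(keypoints_from_confidence_maps(maps))  # [(2, 1, 0.9), None]
```

The partial affinity fields would then be used to connect the key points returned here into limbs, which the sketch does not attempt.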
  • A3. Input the N confidence maps and the N partial affinity fields into the 3D pose reconstruction network model to obtain the 3D human body pose at the first moment. Given the known intrinsic and extrinsic camera parameters, the 3D optimized coordinates of the human key points can be obtained according to the N confidence maps and the N partial affinity fields, and the 3D human pose can further be obtained according to the 3D optimized coordinates.
  • the server inputs the 3D human body pose at all moments in the continuous time period into a target LSTM network model to obtain an estimated 3D human body pose at the next moment in the continuous time period.
  • the target LSTM network model is trained in advance so that, given the input 3D human body poses over the continuous time period, it can estimate the 3D human body pose at the next moment after the continuous time period; the training method is as follows:
  • establish an initial LSTM network model; select training samples from the Human3.6M data set; train the initial LSTM network model through the training samples to obtain the target LSTM network model.
  • the Human3.6M data set has 3.6 million 3D human poses and corresponding images, with 11 subjects and 17 action scenes; the data were captured by 4 digital cameras, 1 time sensor, and 10 motion cameras, so 3D body pose images of the same person at different moments in a continuous period can be selected from it as training samples.
  • the number of frames for training in each iteration is 2048, and the number of epochs (the number of frames per unit time) is 200.
  • random operations are applied to the training set during training so that each training batch does not contain very similar sequences.
  • during training, the mean squared error is selected as the loss function, and the Adam optimization algorithm is selected to train on the data set, thereby speeding up convergence and reducing the range of hyperparameter variation.
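The loss and optimizer choice can be illustrated with a self-contained numpy sketch (this is not the LSTM itself: the model below is a deliberately trivial scalar predictor, and the data, sizes, and learning rate are invented for the example; it only shows a mean-squared-error objective being minimized by an Adam-style update with bias correction):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 42))   # 42 values = 14 key points x 3 coordinates
Y = 0.5 * X + 0.1                # synthetic "next-frame" poses

a, b = 0.0, 0.0                  # toy model: Y_hat = a * X + b
ma = va = mb = vb = 0.0
beta1, beta2, lr, eps = 0.9, 0.999, 0.01, 1e-8

for t in range(1, 2001):
    err = a * X + b - Y                  # residual of the prediction
    ga = np.mean(2.0 * err * X)          # d(MSE)/da
    gb = np.mean(2.0 * err)              # d(MSE)/db
    # Adam moment estimates with bias correction
    ma = beta1 * ma + (1 - beta1) * ga
    va = beta2 * va + (1 - beta2) * ga ** 2
    mb = beta1 * mb + (1 - beta1) * gb
    vb = beta2 * vb + (1 - beta2) * gb ** 2
    a -= lr * (ma / (1 - beta1 ** t)) / (np.sqrt(va / (1 - beta2 ** t)) + eps)
    b -= lr * (mb / (1 - beta1 ** t)) / (np.sqrt(vb / (1 - beta2 ** t)) + eps)

# a and b approach the generating values 0.5 and 0.1
print(abs(a - 0.5) < 0.05, abs(b - 0.1) < 0.05)
```

In the actual system the same loss and optimizer would drive an LSTM over pose sequences rather than this one-parameter regression.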
  • the poses in successive time steps are closely related, and the pose changes are small.
  • the frame rate is 50 frames per second
  • the trajectory performed by the human limbs and joints should be regarded as smooth.
  • the 3D pose estimation can be improved by integrating spatial and temporal information.
  • system performance is less affected by missed detections.
  • the system can improve inaccurate 3D estimation based on the smoothness of the process history.
  • the pose at time t can be estimated from the 3D poses at time steps t-D to t-1, where D is the size of the time window, i.e., the number of previous frames used for estimation.
  • the windows fed to the LSTM neural network are sliding windows over the video sequence, with a stride of 1 frame.
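The sliding-window construction can be sketched directly (a minimal illustration; the function name and the toy frame values are invented):

```python
def sliding_windows(poses, D):
    """Build (input, target) pairs: frames t-D .. t-1 predict frame t.
    `poses` is a per-frame sequence of pose data; the stride is 1 frame."""
    samples = []
    for t in range(D, len(poses)):
        samples.append((poses[t - D:t], poses[t]))
    return samples

frames = list(range(6))            # stand-in for 6 pose frames
pairs = sliding_windows(frames, D=3)
print(pairs[0])                    # ([0, 1, 2], 3)
print(len(pairs))                  # 3
```

With a 50 fps frame rate, a window of D frames spans D/50 seconds of the smooth pose trajectory described above.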
  • the structural characteristics of the LSTM neural network are as follows:
  • N3 is the number of key points; in the embodiment of the present application, N3 is 14.
  • 3D human pose estimation is less affected by missed key points.
  • 3D human pose estimation can eliminate the impact of missed key points based on the process history.
  • the system can also improve inaccurate 3D estimates based on the smoothness of the process history.
  • the 3D human body pose at the first moment in the continuous time period is acquired until the 3D human body pose at all moments in the continuous time period is acquired.
  • the continuous time period includes multiple moments, and the first moment is any moment in the continuous time period; the 3D human body poses at all moments in the continuous time period are input into the target LSTM network model to obtain the estimated 3D human body pose at the next moment in the continuous time period;
  • the embodiments of the present application can improve the accuracy of the estimated 3D human posture.
  • FIG. 2 is a schematic diagram of a process for obtaining a 3D human body pose at a first moment according to an embodiment of the application, including:
  • Acquire N 2D views from different viewing angles at the first moment in the continuous time period, where N is a positive integer.
  • Obtain N confidence maps of the N 2D views through a 2D pose detection network model, where the N confidence maps correspond one-to-one to the N 2D views.
  • the 2D pose detection network is a dual-branch multi-level CNN neural network, which is composed of continuous convolutional layers and is divided into two branches. The two branches are used to determine the confidence map and the partial affinity field.
  • the 2D pose detection network is trained in advance through the data in the Microsoft COCO dataset and Human3.6M dataset.
  • the Microsoft COCO dataset is a dataset for image recognition, segmentation, and captioning; its human body model uses 18 key points to represent the human pose.
  • the Human3.6M data set is a 3D data set; it considers a full-body model with 32 key points but uses 17 key points to represent the human pose. After comprehensive consideration, the 14 key points shared by the two data sets are selected as the human key points in this application.
  • Table 1 below shows the correspondence between the key points of the COCO data set and the key points of the Human3.6M data set; according to Table 1, the correspondence between the key points in the two models can be determined. The serial numbers in Table 1 are the indices of the key points in their respective data sets.
  • the partial affinity field of the target view can be acquired through the 2D pose detection network with high accuracy and fast output of results.
  • FIG. 3 is a schematic diagram of a process for obtaining a 3D human body pose at a first moment according to an embodiment of the application, including:
  • Acquire N 2D views from different viewing angles at the first moment in the continuous time period, where N is a positive integer.
  • Obtain N confidence maps and N partial affinity fields of the N 2D views through the 2D pose detection network model, where the confidence map is a probability matrix in which each pixel in the 2D view is a key point, and the partial affinity field is a 2D vector field group used to encode the position and direction of the body segment.
  • step 304 includes:
  • the importance of different key points differs, and different weights are assigned according to importance.
  • for example, the neck is connected to both the head and the torso, and its position is stable and difficult to change; the neck is therefore often used for localization during image acquisition and image processing and can be detected accurately, so the weight given to the neck is greater than that of other key points.
  • when the confidence of the important key points in a 2D view is higher, the computed overall confidence of that 2D view is also higher.
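The per-view overall confidence can be sketched as a weighted average of key-point confidences (a hedged illustration: the specific weight values are invented — the text only states that the neck receives a greater weight than the other key points):

```python
def overall_confidence(kp_confidences, weights):
    """Weighted overall confidence of one 2D view.
    kp_confidences[i]: detection confidence of key point i in this view;
    weights[i]: importance weight of key point i."""
    s = sum(c * w for c, w in zip(kp_confidences, weights))
    return s / sum(weights)

weights = [2.0] + [1.0] * 13        # hypothetical: neck first, weight 2
view_a = [0.9] * 14                 # all 14 key points well detected
view_b = [0.3] + [0.9] * 13         # same, but the neck is poorly detected
# the view with the poorly detected neck scores lower overall
print(overall_confidence(view_a, weights) > overall_confidence(view_b, weights))  # True
```

Sorting the N views by this score and dropping those below a threshold yields the first view, the second view, and the set of "other views" used later.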
  • the N 2D views are sorted from high to low according to the N overall confidence levels, and the 2D views whose overall confidence is below the threshold can be filtered out.
  • the overall confidence of the first view is the highest among the N 2D views, and the overall confidence of the second view is the second highest among the N 2D views.
  • step 306 includes:
  • the other views are the views other than the first view, the second view, and the 2D view whose overall confidence is lower than the threshold among the N 2D views.
  • the i-th key point is any one of the M key points, and i is a positive integer not greater than M.
  • the reprojection error of the i-th key point in the r-th 2D view, RPE((x,y,z)_{i,t}, r), is obtained as the distance between the original plane coordinates of the i-th key point detected in the r-th 2D view and the projection coordinates obtained by projecting the 3D point (x,y,z)_{i,t} into the r-th 2D view.
  • in the weighted sum, each of the other views is assigned a weight according to its overall confidence; the reprojection error of a key point in each view is multiplied by the weight of the corresponding view, and the products are added to obtain the reprojection error sum of that key point. The M reprojection error sums of the M key points are obtained in this way.
  • the weight c_{P,S}(t, r, i) represents the detection confidence of the i-th key point of the skeleton pose (P, S) at time t (the first moment) in the r-th 2D view, and T is the set of 2D views whose detection confidence is higher than the threshold.
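The weighted reprojection error sum over the retained views T can be sketched as follows (a hedged illustration: the orthographic "cameras", coordinates, confidences, and threshold value are all invented for the example; a real system would use calibrated perspective projection with known intrinsic and extrinsic parameters):

```python
import math

def reprojection_error_sum(point3d, views, threshold=0.5):
    """Weighted reprojection error sum of one key point over the views.
    Each view supplies: a project() function (toy camera model), the
    detected 2D coordinates, and a detection confidence used as the
    weight; views below the confidence threshold are discarded (the
    set T in the text)."""
    total = 0.0
    for project, detected, conf in views:
        if conf < threshold:
            continue                              # not in T
        u, v = project(point3d)
        du, dv = u - detected[0], v - detected[1]
        total += conf * math.hypot(du, dv)        # weight x reprojection error
    return total

# two toy "cameras": orthographic projections onto the xy and xz planes
views = [
    (lambda p: (p[0], p[1]), (1.0, 2.0), 0.9),
    (lambda p: (p[0], p[2]), (1.0, 4.0), 0.8),
    (lambda p: (p[0], p[1]), (0.0, 0.0), 0.2),   # low confidence: ignored
]
print(reprojection_error_sum((1.0, 2.0, 3.0), views))  # 0.8
```

Only the second view contributes here: the first matches exactly and the third falls below the threshold, so the sum is 0.8 x 1.0.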
  • minimizing each of the reprojection error sums in the M reprojection error sums to obtain the M 3D optimized coordinates includes:
  • the target reprojection error sum is the sum of the reprojection errors of the first node in each of the other views, and S3 is executed;
  • the reference node is the actual point in space corresponding to the i-th key point; it is obtained by random selection in the neighborhood of the first node, and the range of the neighborhood can be defined according to actual conditions.
  • the minimization of the reprojection error sum is in practice implemented by the Levenberg-Marquardt least-squares method.
  • with the Levenberg-Marquardt least-squares method, key points whose confidence is below the threshold can be discarded, so as to ensure that the reprojection error sum over the views is minimized.
  • Take the point at the 3D initial coordinates of the i-th key point as the first node, that is, the initial point, and search for an optimal value within its neighborhood over a finite number of iterations starting from the initial point. If the reprojection error sum of the i-th key point decreases in an iteration, replace the previous reprojection error sum with the new one and continue iterating; otherwise, discard the reprojection error sum obtained in this iteration and reselect a point to compute the reprojection error sum.
  • for example, the number of iterations is at most 15; this application does not limit the number of iterations.
  • after the iterations, the minimum reprojection error sum of the i-th key point is obtained, along with the 3D coordinates corresponding to the minimum reprojection error sum.
  • by the above method, the M 3D optimized coordinates can be obtained, and the 3D human pose at the first moment can be obtained according to the M 3D optimized coordinates and the directions of the M key points.
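The accept-if-better iteration described above can be sketched as follows (illustrative only: the neighborhood radius, iteration cap, and toy error function are invented; as the text notes, the actual minimization is performed with the Levenberg-Marquardt least-squares method):

```python
import random

def refine_point(p0, error_sum, radius=0.05, iters=15, seed=0):
    """Iteratively refine a key point's 3D coordinates (steps S2-S5):
    sample a reference node in the current node's neighborhood, keep it
    if it lowers the reprojection error sum, otherwise discard it."""
    rng = random.Random(seed)
    best, best_err = p0, error_sum(p0)          # S2: first node
    for _ in range(iters):                      # S5: repeat S3 and S4
        cand = tuple(c + rng.uniform(-radius, radius) for c in best)  # S3
        err = error_sum(cand)
        if err < best_err:                      # S4: keep the smaller sum
            best, best_err = cand, err
    return best, best_err

# toy error: squared distance to the (unknown) true point (1, 2, 3)
target = (1.0, 2.0, 3.0)
err = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
p, e = refine_point((1.1, 1.9, 3.1), err, iters=50)
print(e <= err((1.1, 1.9, 3.1)))  # True: the error never increases
```

Because only improving moves are accepted, the returned error sum is guaranteed not to exceed the initial one, mirroring the replace-or-discard rule in the text.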
  • the missed key points are regarded as key points that are very far from the key point "neck", that is, key points more than 2 meters away from the neck are regarded as missed key points and will not be processed.
  • the accurate 3D human body pose at the first moment can be obtained through the 3D pose reconstruction network model with high efficiency.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the application. As shown in the figure, the device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor.
  • the program includes instructions for performing the following steps:
  • acquire the 3D human body pose at the first moment in a continuous time period until the 3D human body pose at all moments in the continuous time period is acquired, where the continuous time period includes multiple moments and the first moment is any moment in the continuous time period; input the 3D human body pose at all moments in the continuous time period into the target LSTM network model to obtain the estimated 3D human body pose at the next moment in the continuous time period.
  • the program includes instructions for executing the following steps:
  • Acquire N 2D views from different perspectives at the first moment in the continuous time period, where N is a positive integer; obtain N confidence maps and N partial affinity fields of the N 2D views through a 2D pose detection network model, where the confidence map is a probability matrix in which each pixel in the 2D view is a key point, and the partial affinity field is a 2D vector field group used to encode the position and direction of the body segment; input the N confidence maps and the N partial affinity fields into the 3D posture reconstruction network model to obtain the 3D human body posture at the first moment.
  • the program includes instructions for executing the following steps:
  • Obtain N confidence maps of the N 2D views through the 2D pose detection network model, the N confidence maps corresponding one-to-one to the N 2D views; determine the M key points of each of the N 2D views according to the N confidence maps, where M is a positive integer; determine the positions and directions of the M key points of each of the N 2D views to obtain the N partial affinity fields of the N 2D views.
  • In a possible example, in the aspect of inputting the N confidence maps and the N partial affinity fields into the 3D pose reconstruction network model to obtain the 3D human body pose at the first moment, the program includes instructions for performing the following steps: input the N confidence maps and the N partial affinity fields at the first moment into the 3D pose reconstruction network model; obtain N overall confidence levels of the N 2D views according to the N confidence maps, where the N overall confidence levels correspond one-to-one to the N 2D views; according to the N overall confidence levels, select from the N 2D views the first view with the highest overall confidence and the second view with the second-highest overall confidence; obtain the 3D initial coordinates of each of the M key points according to the partial affinity fields of the first view and the second view; project the 3D initial coordinates of each of the M key points into the other views to obtain the reprojection error of each key point in each of the other views; calculate the weighted sum of the reprojection errors of each of the M key points over the other views to obtain the M reprojection error sums of the M key points; and minimize each of the M reprojection error sums to obtain the M 3D optimized coordinates.
  • the N confidence maps include the r-th confidence map of the r-th 2D view, where r is a positive integer not greater than N, and the N overall confidence levels are obtained from the N confidence maps.
  • the program includes instructions for executing the following steps:
  • the M key points include the i-th key point, where i is a positive integer not greater than M; in the aspect of minimizing each of the reprojection error sums in the M reprojection error sums,
  • the program includes instructions for executing the following steps:
  • the target reprojection error sum is the sum of the reprojection errors of the first node in each of the other views, and S3 is executed;
  • the 3D initial coordinates of each of the M key points are acquired according to the partial affinity field in the first view and the partial affinity field in the second view.
  • the program also includes instructions for executing the following steps:
  • the program also includes instructions for performing the following steps:
  • establish an initial LSTM network model; select training samples from the Human3.6M data set; train the initial LSTM network model through the training samples to obtain the target LSTM network model.
  • FIG. 5 is a schematic structural diagram of a multi-view 3D human body pose estimation device 500 according to an embodiment of the application.
  • the multi-view 3D human body pose estimation device is applied to an electronic device.
  • the multi-view 3D human body pose estimation device includes:
  • the acquiring unit 501 is configured to acquire the 3D human body pose at the first moment in a continuous time period until the 3D human body pose at all moments in the continuous time period is acquired, the continuous time period includes multiple moments, and the first moment is Any time in the continuous time period;
  • the estimation unit 502 is configured to input the 3D human body posture at all moments in the continuous time period into the target LSTM network model to obtain the estimated 3D human body posture at the next moment in the continuous time period.
  • the acquiring unit 501 is specifically configured to:
  • the confidence map is a probability matrix in which each pixel in the 2D view is a key point, and the partial affinity field is a 2D vector field group used to encode the position and direction of the body segment; the acquiring unit is further configured to input the N confidence maps and the N partial affinity fields into the 3D posture reconstruction network model to obtain the 3D human body posture at the first moment.
  • the acquiring unit 501 is specifically configured to:
  • N confidence maps of the N 2D views through the 2D pose detection network model, and the N confidence maps correspond to the N 2D views in a one-to-one correspondence; according to the N confidence maps Determine the M key points of each 2D view in the N 2D views, where M is a positive integer; determine the positions and directions of the M key points of each 2D view in the N 2D views to obtain the The N partial affinity fields of N 2D views.
  • the acquiring unit 501 is specifically configured to:
  • the N confidence maps include the r-th confidence map of the r-th 2D view, where r is a positive integer not greater than N, and the N overall confidence levels are obtained from the N confidence maps.
  • the acquiring unit 501 is specifically configured to:
  • the M key points include the i-th key point, where i is a positive integer not greater than M; in the aspect of minimizing each of the reprojection error sums in the M reprojection error sums,
  • the obtaining unit 501 is specifically configured to:
  • S2. Take the point located at the 3D initial coordinates of the i-th key point as the first node, and calculate the target reprojection error sum of the i-th key point according to the 3D initial coordinates of the first node, where the target reprojection error sum is the sum of the reprojection errors of the first node in each of the other views, and execute S3;
  • S3. Select a reference node in the neighborhood of the first node, determine the 3D coordinates of the reference node, calculate the reference reprojection error sum according to the 3D coordinates of the reference node, and execute S4;
  • S4. Compare the target reprojection error sum with the reference reprojection error sum, select the smaller of the two as the new target reprojection error sum, replace the target reprojection error sum with the new target reprojection error sum, and execute S5;
  • S5. Repeat S3 and S4 until a preset condition is met; the 3D optimized coordinates of the i-th key point are obtained, and the 3D optimized coordinates of the i-th key point are added to the first set;
  • the 3D initial coordinates of each of the M key points are acquired according to the partial affinity field in the first view and the partial affinity field in the second view.
  • the acquiring unit 501 is configured to:
  • in a possible example, the multi-view 3D human body pose estimation device further includes a training unit 503, and before the 3D human body pose at all moments in the continuous time period is input into the target LSTM network model to obtain the estimated 3D human body pose at the next moment in the continuous time period, the training unit 503 is configured to:
  • establish an initial LSTM network model; select training samples from the Human3.6M data set; train the initial LSTM network model through the training samples to obtain the target LSTM network model.
  • the 3D human body pose at the first moment in the continuous time period is acquired until the 3D human body pose at all moments in the continuous time period is acquired.
  • the continuous time period includes multiple moments, and the first moment is any moment in the continuous time period; the 3D human body poses at all moments in the continuous time period are input into the target LSTM network model to obtain the estimated 3D human body pose at the next moment in the continuous time period;
  • the embodiments of the present application can improve the accuracy of the estimated 3D human posture.
  • the embodiment of the present application also provides a computer non-volatile readable storage medium that stores a computer program for electronic data exchange, where the computer program enables the computer to execute part or all of the steps of any multi-view 3D human pose estimation method recorded in the above method embodiments.
  • the embodiments of the present application also provide a computer program product.
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program causes a computer to execute part or all of the steps of any multi-view 3D human pose estimation method described in the above method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A multi-view 3D human pose estimation method and related apparatus, comprising: a server acquires the 3D human body pose at the first moment in a continuous time period, until the 3D human body poses at all moments in the continuous time period are acquired, the continuous time period including multiple moments and the first moment being any moment in the continuous time period (101); the server inputs the 3D human body poses at all moments in the continuous time period into a target LSTM network model to obtain the estimated 3D human body pose at the next moment after the continuous time period (102). The method can improve the accuracy of the estimated 3D human body pose.

Description

Multi-view 3D human pose estimation method and related apparatus
This application claims priority to the Chinese patent application with application number 201910880173X, titled "Multi-view 3D human pose estimation method and related apparatus", filed with the Chinese Patent Office on September 18, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of human pose estimation, and in particular to a multi-view 3D human pose estimation method and related apparatus.
Background
3D pose reconstruction is a very active area of research and development. An accurate system can extract 3D information from a given human body, extracting the joints and limbs. However, compared with vision-based solutions, these systems are usually very expensive, susceptible to interference, and require significant hardware and human resources, so market demand is low.
With the development of deep learning, a popular approach to 3D human pose estimation is extraction from a single two-dimensional image, because extracting depth information from a single view offers great flexibility. However, 3D pose estimation is inherently under-constrained, and due to external factors such as changes in human appearance, clothing, or self-occlusion, the accuracy of pose estimation from a single two-dimensional image is low.
发明内容
本申请实施例提供了一种多视图3D人体姿态估计方法及相关装置可提高预估的3D人体姿态的精确性。
第一方面,本申请实施例提供一种多视图3D人体姿态估计方法,所述方法包括:
获取连续时间段中第一时刻的3D人体姿态,直至获取所述连续时间段中所有时刻的3D人体姿态,所述连续时间段包括多个时刻,所述第一时刻为所述连续时间段中任意一个时刻;
将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态。
第二方面,本申请实施例提供一种多视图3D人体姿态估计装置,所述多视图3D人体姿态估计装置包括:
获取单元,用于获取连续时间段中第一时刻的3D人体姿态,直至获取所述连续时间段中所有时刻的3D人体姿态,所述连续时间段包括多个时刻,所述第一时刻为所述连续时间段中任意一个时刻;
预估单元,用于将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态。
第三方面,本申请实施例提供一种电子设备,包括处理器、存储器、通信接口,以及一个或多个程序,其中,上述一个或多个程序被存储在上述存储器中,并且被配置由上述处理器执行,上述程序包括用于执行本申请实施例第一方面中的步骤的指令。
第四方面,本申请实施例提供了一种计算机非易失性可读存储介质,其中,上述计算机非易失性可读存储介质存储用于电子数据交换的计算机程序,其中,上述计算机程序使得计算机执行如本申请实施例第一方面中所描述的部分或全部步骤。
本申请实施例可提高预估的3D人体姿态的精确性。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种多视图3D人体姿态估计方法的流程示意图;
图2是本申请实施例提供的获取第一时刻的3D人体姿态的流程示意图;
图3是本申请实施例提供的获取第一时刻的3D人体姿态的流程示意图;
图4为本申请实施例提供的一种电子设备的结构示意图;
图5为本申请实施例提供了一种多视图3D人体姿态估计装置的结构示意图。
具体实施方式
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
参阅图1,图1为本申请实施例提供的一种多视图3D人体姿态估计方法的流程示意图,所述方法包括:
101、服务器获取连续时间段中第一时刻的3D人体姿态,直至获取所述连续时间段中所有时刻的3D人体姿态,所述连续时间段包括多个时刻,所述第一时刻为所述连续时间段中任意一个时刻。
其中,本申请实施例中的多视图3D人体姿态估计方法应用于服务器,所述连续时间段可以为1s,2s,5s,10s等,因为时间较短,所以在所述连续时间段中所得到的3D人体姿态是平滑且连续变化的。
其中,所述获取连续时间段中第一时刻的3D人体姿态包括:
A1、获取连续时间段中第一时刻的不同视角的N张2D视图,N为正整数。
其中,所述连续时间段中每个时刻都通过N个放置于不同角度的摄像机获取人体的N张2D视图。
A2、通过2D姿态检测网络模型获取所述N张2D视图的N个置信度图和N个部分亲和字段,其中,置信度图为2D视图中各个像素点为关键点的概率矩阵,部分亲和字段为2D矢量字段组,用于对体段的位置和方向进行编码。
其中,置信度图是与原始图像具有相同大小的矩阵,其中每个元素存储视图中各像素点为关键点的概率,根据置信度图可以从视图中确定为关键点的像素点。部分亲和字段是一组2D矢量字段,用于对图像域上的体段的位置和方向进行编码,对于属于特定体段区域中的每个像素,2D矢量字段的方向为从体段的一部分指向另一部分,在确认肢体各关键点后,可根据部分亲和字段将人体的肢体逐段、逐部分地连接起来,最终得到人体的整体骨架。
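作为说明,下面给出一段从置信度图中提取关键点的示意性Python代码:对每个概率高于阈值且为其3×3邻域内局部极大值的像素点,将其记为一个关键点(阈值与邻域大小为本文为演示而假设的参数,并非专利原文的实现):

```python
import numpy as np

def extract_keypoints(confidence_map, threshold=0.5):
    """从置信度图中提取关键点。

    confidence_map: H x W 矩阵,每个元素为该像素点是关键点的概率。
    返回 (行, 列, 置信度) 三元组的列表。
    """
    keypoints = []
    h, w = confidence_map.shape
    for y in range(h):
        for x in range(w):
            p = confidence_map[y, x]
            if p < threshold:
                continue
            # 仅保留 3x3 邻域内的局部极大值,避免同一关键点被重复提取
            y0, y1 = max(0, y - 1), min(h, y + 2)
            x0, x1 = max(0, x - 1), min(w, x + 2)
            if p >= confidence_map[y0:y1, x0:x1].max():
                keypoints.append((y, x, float(p)))
    return keypoints
```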
A3、将所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中,得到所述第一时刻的3D人体姿态。
其中,在已知摄像机的内参数和外参数的前提下,可根据所述N个置信度图和所述N个部分亲和字段得到人体关键点的3D优化坐标,进一步根据3D优化坐标得到所述3D人体姿态。
102、所述服务器将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态。
其中,所述目标LSTM网络模型预先经过训练,可以实现根据输入的连续时间段内的所述3D人体姿态预估连续时间段之后下一时刻的3D人体姿态,其训练方法如下:
建立初始LSTM网络模型;
从Human3.6M数据集中选取训练样本;
通过所述训练样本对所述初始LSTM网络模型进行训练,得到所述目标LSTM网络模型。
其中,Human3.6M数据集有360万个3D人体姿势和相应的图像,共有11个实验者,17个动作场景,该数据由4个数字摄像机、1个时间传感器、10个运动摄像机捕获,因此可从中选取同一人在连续时间段中不同时刻的3D人体姿势图像作为训练样本。每次迭代训练的帧数量为2048,时期(epoch)数量为200,训练前对训练集应用随机打乱操作,以避免同一批次中出现非常相似的序列。在训练过程中,选取均方误差作为损失函数,选择Adam优化算法对数据集进行优化训练,从而加快收敛速度,减少超参数变化范围。
其中,连续时间步中的姿态密切相关,且姿态变化很小,当帧速率为每秒50帧时,人体肢体和关节运动的轨迹可视为平滑,通过整合空间和时间信息可以改进3D姿态估计结果:一方面,系统性能受漏检的影响较小;另一方面,系统可以根据过程历史的平滑性来改善不精确的3D估计。据此可以根据在时间步长t-D到t-1的3D姿态来估计在时间t的姿态,其中D为时间窗口,即用于估计的先前帧的数量。该时间窗口是视频序列上的滑动窗口,步幅为1帧。LSTM神经网络的结构特点如下:
(i)大小为D×N3×3的输入数据;
(ii)具有256个隐藏单元的隐藏层的LSTM;
(iii)具有N3×3隐藏单元的完全连接网络ReLU作为激活函数;
(iv)大小为N3×3的输出层。
其中,N3是关键点的个数,在本申请实施例中,N3为14。
可见,通过这种方法估计3D人体姿态估计受漏检关键点的影响较小,3D人体姿态估计可以基于过程历史来消除漏检关键点的影响,此外,系统也可以根据过程历史的平滑性来改善不精确的3D估计。
可以看出,本申请实施例中获取连续时间段中第一时刻的3D人体姿态,直至获取所述连续时间段中所有时刻的3D人体姿态,所述连续时间段包括多个时刻,所述第一时刻为所述连续时间段中任意一个时刻;将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态;本申请实施例可提高预估的3D人体姿态的精确性。
参阅图2,图2为本申请实施例提供的获取第一时刻的3D人体姿态的流程示意图,包括:
201、获取连续时间段中第一时刻的不同视角的N张2D视图,N为正整数。
202、通过2D姿态检测网络模型获取N张2D视图的N个置信度图,所述N个置信度图与所述N张2D视图一一对应。
203、根据所述N个置信度图确定所述N张2D视图中每张2D视图的M个关键点,M为正整数。
204、确定所述N张2D视图中每张2D视图的所述M个关键点的位置和方向,得到所述N张2D视图的N个部分亲和字段。
205、将所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中,得到所述第一时刻的3D人体姿态。
其中,所述2D姿态检测网络为双分支多级CNN神经网络,它由连续的卷积层组成,分为两个分支,两个分支分别用于确定置信度图和部分亲和字段,所述2D姿态检测网络预先通过Microsoft COCO数据集和Human3.6M数据集中的数据进行训练。Microsoft COCO数据集是一个图像识别、分割和字幕的数据集,它使用18个关键点组成的全身模型来表示人体姿态;Human3.6M数据集是3D数据集,该数据集考虑了32个关键点的全身模型,但它使用17个关键点来表示人体姿态。综合考虑,选择两个数据集中共有的14个关键点作为本申请中人体的关键点。下表1为COCO数据集的关键点和Human3.6M数据集的关键点之间的对应关系,根据表1可确定两种模型中关键点之间的对应关系,表1中的序号为各关键点在各自的数据集中的编号顺序。训练后的所述2D姿态检测网络对输入的所述N张2D视图中的一张视图进行分析,得到该图像的特征图F,然后根据该特征图F生成一组检测置信图S1=ρ1(F),通过贪婪算法对检测置信图S1进行迭代预测,直至损失函数最小为止,得到该视图的所述置信度图。因为所述置信度图中存有该视图各个像素点为关键点的概率,因此可根据所述置信度图得到M个关键点,然后根据M个关键点和人体各段的位置和方向得到所述部分亲和字段。
表1 COCO数据集的关键点和Human3.6M数据集的关键点之间的对应关系
描述对象 COCO关键点 Human3.6M关键点
鼻子 0 14
颈部 1 13
右肩 2 25
右肘 3 26
右手腕 4 27
左肩 5 17
左肘 6 18
左手腕 7 19
右髋 8 1
右膝 9 2
右脚踝 10 3
左髋 11 6
左膝 12 7
左脚踝 13 8
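表1的对应关系可以直接表示为一个映射表,便于在两个数据集的关键点编号之间转换(示意性Python代码):

```python
# COCO 关键点编号 -> Human3.6M 关键点编号(对应表1中14个共有关键点)
COCO_TO_H36M = {
    0: 14,   # 鼻子
    1: 13,   # 颈部
    2: 25,   # 右肩
    3: 26,   # 右肘
    4: 27,   # 右手腕
    5: 17,   # 左肩
    6: 18,   # 左肘
    7: 19,   # 左手腕
    8: 1,    # 右髋
    9: 2,    # 右膝
    10: 3,   # 右脚踝
    11: 6,   # 左髋
    12: 7,   # 左膝
    13: 8,   # 左脚踝
}

def to_h36m(coco_ids):
    """将一组 COCO 关键点编号转换为对应的 Human3.6M 编号。"""
    return [COCO_TO_H36M[i] for i in coco_ids]
```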
可见,通过所述2D姿态检测网络可以获取目标视图的所述部分亲和字段,且精确度高,结果输出快。
参阅图3,图3为本申请实施例提供的获取第一时刻的3D人体姿态的流程示意图,包括:
301、获取连续时间段中第一时刻的不同视角的N张2D视图,N为正整数。
302、通过2D姿态检测网络模型获取所述N张2D视图的N个置信度图和N个部分亲和字段,其中,置信度图为2D视图中各个像素点为关键点的概率矩阵,部分亲和字段为2D矢量字段组,用于对体段的位置和方向进行编码。
303、将所述第一时刻的所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中。
304、根据所述N个置信度图获取所述N张2D视图的N个整体置信度,所述N个整体置信度与所述N张2D视图一一对应。
其中,步骤304包括:
B1、根据所述第r个置信度图确定所述第r张2D视图的M个关键点以及所述M个关键点的M个置信度,所述M个关键点与所述M个置信度一一对应;
B2、赋予所述M个关键点M个权重,所述M个关键点与所述M个权重一一对应;
B3、根据所述M个置信度和所述M个权重计算所述第r张2D视图的整体置信度;
B4、重复执行B1-B3,直至得到所述N张2D视图的N个整体置信度。
其中,不同的关键点重要程度不同,根据其重要程度赋予不同的权重,例如颈部因为同时连接头部和身体,并且性质稳定不易改变,所以在获取图像以及图像处理的过程中,常使用颈部来定位和实现精确检测,相应的,赋予颈部的权重较其他关键点更大,显然,若一张2D视图中重要的关键点的置信度较高,则计算出来的该2D视图的整体置信度也较高。
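上述B1-B4的整体置信度计算,以及后续按整体置信度挑选视图、去除低置信度视图的做法,可用如下示意性Python代码表示(以加权平均作为整体置信度,具体权重与阈值均为本文假设):

```python
def overall_confidence(confidences, weights):
    """根据 M 个关键点的置信度及其权重计算一张 2D 视图的整体置信度。"""
    total = sum(weights)
    return sum(c * w for c, w in zip(confidences, weights)) / total

def select_views(view_scores, threshold):
    """view_scores: {视图编号: 整体置信度}。
    返回整体置信度最高的两张视图编号,以及去除低于阈值视图后的视图编号列表。"""
    ranked = sorted(view_scores, key=view_scores.get, reverse=True)
    kept = [v for v in ranked if view_scores[v] >= threshold]
    return ranked[:2], kept
```

例如,颈部等重要关键点赋予较大权重后,若其置信度较高,视图的整体置信度也随之提高。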
可见,通过确定2D视图的整体置信度可以筛选出整体置信度更高的视图,也可以找出整体置信度较低的视图,从而可根据整体置信度更高的视图得到更精确的3D人体姿态。
305、根据所述N个整体置信度从所述N张2D视图中选取整体置信度最高的第一视图和第二视图,并从所述N张2D视图中去除所述整体置信度低于阈值的2D视图。
其中,将所述N张2D视图按照所述N个整体置信度从高到低排序,可以筛选出所述整体置信度最高的2D视图以及所述整体置信度低于阈值的2D视图,所述第一视图的所述整体置信度在所述N张2D视图中最高,所述第二视图的所述整体置信度在所述N张2D视图中第二高。
306、根据所述第一视图中的所述部分亲和字段和所述第二视图中的所述部分亲和字段获取所述M个关键点中每个关键点的3D初始坐标。
其中,步骤306包括:
C1、根据所述第一视图中的所述部分亲和字段获取所述第一视图中的所述M个关键点的M个第一平面坐标,根据所述第二视图中的所述部分亲和字段获取所述第二视图中的所述M个关键点的M个第二平面坐标;
C2、获取所述第一视图对应的摄像机的第一内参数矩阵,获取所述第二视图对应的摄像机的第二内参数矩阵;
C3、根据所述第一内参数矩阵、所述第二内参数矩阵、所述M个第一平面坐标和所述M个第二平面坐标建立方程组;
C4、解所述方程组,得到所述M个关键点中每个关键点的所述3D初始坐标。
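上述C1-C4的求解过程可以用直接线性变换(DLT)实现:由两台摄像机的投影矩阵(内参数矩阵与外参数矩阵之积)和同一关键点在两视图中的平面坐标建立齐次线性方程组,取其最小二乘解作为3D初始坐标。以下为示意性Python实现:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """P1, P2: 两台摄像机的 3x4 投影矩阵; uv1, uv2: 同一关键点在两视图中的平面坐标。
    返回该关键点的 3D 初始坐标。"""
    u1, v1 = uv1
    u2, v2 = uv2
    # 每个视图贡献两个线性方程,构成 4x4 系数矩阵 A,求解 A X = 0
    A = np.stack([u1 * P1[2] - P1[0],
                  v1 * P1[2] - P1[1],
                  u2 * P2[2] - P2[0],
                  v2 * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # 最小奇异值对应的奇异向量即最小二乘解
    return X[:3] / X[3]             # 齐次坐标归一化
```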
307、将所述M个关键点中每个关键点的所述3D初始坐标投影到其他视图中,得到所述M个关键点中每个关键点的所述3D初始坐标在所述其他视图中的投影坐标,所述其他视图为所述N张2D视图中除所述第一视图和所述第二视图以及所述整体置信度低于阈值的所述2D视图以外的视图。
308、根据所述M个关键点中每个关键点在所述其他视图中每个视图的原始平面坐标和所述投影坐标计算所述M个关键点中每个关键点在所述其他视图中每个视图的再投影误差。
其中,第i个关键点为所述M个关键点中任意一个关键点,i为不大于M的正整数,通过以下公式得到第i个关键点的再投影误差:
RPE((x,y,z)_i, t, r) = (E_x^(t,r,i) - P_x^(t,r,i))^2 + (E_y^(t,r,i) - P_y^(t,r,i))^2
其中,所述第i个关键点在所述第r张2D视图中的所述原始平面坐标为E^(t,r,i) = (E_x^(t,r,i), E_y^(t,r,i)),所述投影坐标为P^(t,r,i) = (P_x^(t,r,i), P_y^(t,r,i)),所述第i个关键点在所述第r张2D视图中的再投影误差为RPE((x,y,z)_i, t, r)。所述第r张2D视图可为所述N张2D视图中任意一张2D视图,在该实施例中,因为是所述第一时刻,所以t=1。上述公式中所述再投影误差由所述原始平面坐标和所述投影坐标的平方差求和得到,其中,E表示原始平面坐标,P表示投影坐标。
309、计算所述M个关键点中每个关键点在所述其他视图中每个视图的所述再投影误差的加权和,得到所述M个关键点的M个再投影误差和,所述M个再投影误差和与所述M个关键点一一对应。
其中,所述加权和是根据所述其他视图中每个视图的所述整体置信度分别赋予不同的权重后,再计算所述M个关键点中每个关键点在所述其他视图中每个视图的所述再投影误差,将某个关键点在每个视图中的所述再投影误差与其对应视图的权重相乘后相加,即得到该关键点的再投影误差和,通过上述方法获得所述M个关键点的M个再投影误差和,所述第i个关键点的所述再投影误差和如下:
∑_{r∈T} r_{P,S}(t, r, i) · RPE((x,y,z)_i, t, r)
其中,r_{P,S}(t,r,i)表示在时间t(第一时刻)和所述第r张2D视图处对所述第i个关键点的骨架姿态(P,S)的检测置信度,即权重,T是检测置信度高于阈值的一组2D视图。可见,关键点的所述再投影误差和越大,则该关键点的3D坐标与实际偏差越大,越不精确,因此需要最小化每个关键点的所述再投影误差和。
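上述再投影误差及其加权和的计算可用如下示意性Python代码表示(函数名为本文假设):

```python
def rpe(orig_xy, proj_xy):
    """单张视图中的再投影误差:原始平面坐标与投影坐标的平方差之和。"""
    ex, ey = orig_xy
    px, py = proj_xy
    return (ex - px) ** 2 + (ey - py) ** 2

def weighted_rpe_sum(orig, proj, view_conf):
    """某关键点的再投影误差和:各视图的再投影误差按该视图的检测置信度加权求和。
    orig/proj: {视图编号: (x, y)}; view_conf: {视图编号: 检测置信度}。"""
    return sum(view_conf[r] * rpe(orig[r], proj[r]) for r in orig)
```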
310、最小化所述M个再投影误差和中每个再投影误差和,得到M个3D优化坐标,根据所述M个3D优化坐标得到所述第一时刻的3D人体姿态,所述M个3D优化坐标与所述M个关键点一一对应。
其中,所述最小化所述M个再投影误差和中每个再投影误差和,得到M个3D优化坐标包括:
S1、赋值i=1;
S2、以位于所述第i个关键点的所述3D初始坐标处的点为第一节点,根据所述第一节点的3D初始坐标计算所述第i个关键点的目标再投影误差和,所述目标再投影误差和为所述第一节点在所述其他视图中每个视图的所述再投影误差的和,执行S3;
S3、在所述第一节点领域内选取参考节点,确定所述参考节点的3D坐标,根据所述参考节点的3D坐标计算参考再投影误差和,执行S4;
S4、比较所述目标再投影误差和与所述参考再投影误差和的大小,选取所述目标再投影误差和与所述参考再投影误差和中较小者作为新目标再投影误差和,用所述新目标再投影误差和替代所述目标再投影误差和,执行S5;
S5、重复执行S3和S4,直至满足预设条件,得到所述第i个关键点的所述3D优化坐标,将所述第i个关键点的所述3D优化坐标加入第一集合;
S6、i=i+1,判断i是否大于M,若i小于等于M,返回S2,若i大于M,输出i=M时的所述第一集合,根据i=M时的所述第一集合得到所述M个3D优化坐标。
其中,所述参考节点为所述第i个关键点在空间中实际对应的点,通过在所述第一节点领域内随机选取得到,可根据实际情况定义所述第一节点领域的范围,上述获取再投影误差和的方法实际上通过Levenberg-Marquardt最小二乘法实现,在Levenberg-Marquardt最小二乘法中,可舍弃置信度低于阈值的关键点,从而可保证每个视图中的再投影误差和最小化,以所述第i个关键点的所述3D初始坐标处的点为第一节点即初始点,根据该初始点在其领域范围内在有限次迭代计算过程中寻求一个最优值,如果在一次迭代中所述第i个关键点的再投影误差和下降,则用新的再投影误差和取代上一个再投影误差和,继续迭代,否则舍弃在这次迭代中所得到的再投影误差和,重新选点计算再投影误差和,在本实施例中,迭代次数最多15次,本申请对迭代次数不做限定,当达到预设条件即迭代了15次时,可得到所述第i个关键点的最小再投影误差和,同时也得到最小再投影误差和对应的3D坐标。重复执行获取所述第i个关键点的3D优化坐标的方法,可得到所述M个3D优化坐标,根据所述M个3D优化坐标以及所述M个关键点的方向可以得到所述第一时刻的3D人体姿态。
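上述S1-S6中“在初始点邻域内取参考节点、保留较小再投影误差和”的迭代过程,可用如下示意性Python代码勾勒(邻域采样方式与半径为本文假设,实际实现可采用Levenberg-Marquardt最小二乘法):

```python
import numpy as np

def refine_keypoint(x0, rpe_sum, radius=0.05, iters=15, seed=0):
    """x0: 关键点的 3D 初始坐标; rpe_sum: 给定 3D 坐标返回再投影误差和的函数。
    在 x0 邻域内随机取参考节点,若误差和下降则接受,迭代 iters 次后
    返回 3D 优化坐标及其再投影误差和。"""
    rng = np.random.default_rng(seed)
    best = np.asarray(x0, dtype=float)
    best_err = rpe_sum(best)
    for _ in range(iters):
        cand = best + rng.normal(0.0, radius, size=3)  # 邻域内随机采样参考节点
        err = rpe_sum(cand)
        if err < best_err:                             # 保留较小的再投影误差和
            best, best_err = cand, err
    return best, best_err
```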
此外,本实施例中将漏检的关键点视为距离关键点“颈部”非常远的关键点,即距离颈部2米以上的关键点认为是漏检关键点,不作处理。
可见,通过所述3D姿态重建网络模型可以得到所述第一时刻的精确的3D人体姿态,且耗时短,效率高。
请参阅图4,图4为本申请实施例提供的一种电子设备的结构示意图,如图所示,包括处理器、存储器、通信接口,以及一个或多个程序,所述程序被存储在所述存储器中,并且被配置由所述处理器执行。所述程序包括用于执行以下步骤的指令:
获取连续时间段中第一时刻的3D人体姿态,直至获取所述连续时间段中所有时刻的3D人体姿态,所述连续时间段包括多个时刻,所述第一时刻为所述连续时间段中任意一个时刻;将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态。
在一个可能的示例中,在所述获取连续时间段中第一时刻的3D人体姿态方面,所述程序包括用于执行以下步骤的指令:
获取连续时间段中第一时刻的不同视角的N张2D视图,N为正整数;通过2D姿态检测网络模型获取所述N张2D视图的N个置信度图和N个部分亲和字段,其中,置信度图为2D视图中各个像素点为关键点的概率矩阵,部分亲和字段为2D矢量字段组,用于对体段的位置和方向进行编码;将所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中,得到所述第一时刻的3D人体姿态。
在一个可能的示例中,在所述通过2D姿态检测网络模型获取所述N张2D视图的N个置信度图和N个部分亲和字段方面,所述程序包括用于执行以下步骤的指令:
通过所述2D姿态检测网络模型获取所述N张2D视图的所述N个置信度图,所述N个置信度图与所述N张2D视图一一对应;根据所述N个置信度图确定所述N张2D视图中每张2D视图的M个关键点,M为正整数;确定所述N张2D视图中每张2D视图的所述M个关键点的位置和方向,得到所述N张2D视图的所述N个部分亲和字段。
在一个可能的示例中,在所述将所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中,得到所述第一时刻的3D人体姿态方面,所述程序包括用于执行以下步骤的指令:
将所述第一时刻的所述N个置信度图和所述N个部分亲和字段输入所述3D姿态重建网络模型中;根据所述N个置信度图获取所述N张2D视图的N个整体置信度,所述N个整体置信度与所述N张2D视图一一对应;根据所述N个整体置信度从所述N张2D视图中选取整体置信度最高的第一视图和第二视图,并从所述N张2D视图中去除所述整体置信度低于阈值的2D视图;根据所述第一视图中的所述部分亲和字段和所述第二视图中的所述部分亲和字段获取所述M个关键点中每个关键点的3D初始坐标;将所述M个关键点中每个关键点的所述3D初始坐标投影到其他视图中,得到所述M个关键点中每个关键点的所述3D初始坐标在所述其他视图中的投影坐标,所述其他视图为所述N张2D视图中除所述第一视图和所述第二视图以及所述整体置信度低于阈值的所述2D视图以外的视图;根据所述M个关键点中每个关键点在所述其他视图中每个视图的原始平面坐标和所述投影坐标计算所述M个关键点中每个关键点在所述其他视图中每个视图的再投影误差;计算所述M个关键点中每个关键点在所述其他视图中每个视图的所述再投影误差的加权和,得到所述M个关键点的M个再投影误差和,所述M个再投影误差和与所述M个关键点一一对应;最小化所述M个再投影误差和中每个再投影误差和,得到M个3D优化坐标,根据所述M个3D优化坐标得到所述第一时刻的3D人体姿态,所述M个3D优化坐标与所述M个关键点一一对应。
在一个可能的示例中,所述N个置信度图包括第r张2D视图的第r个置信度图,r为不大于N的正整数,在所述根据所述N个置信度图获取所述N张2D视图的N个整体置信度方面,所述程序包括用于执行以下步骤的指令:
B1、根据所述第r个置信度图确定所述第r张2D视图的M个关键点以及所述M个关键点的M个置信度,所述M个关键点与所述M个置信度一一对应;
B2、赋予所述M个关键点M个权重,所述M个关键点与所述M个权重一一对应;
B3、根据所述M个置信度和所述M个权重计算所述第r张2D视图的整体置信度;
B4、重复执行B1-B3,直至得到所述N张2D视图的N个整体置信度。
在一个可能的示例中,所述M个关键点包括第i个关键点,i为不大于M的正整数,在所述最小化所述M个再投影误差和中每个再投影误差和,得到M个3D优化坐标方面,所述程序包括用于执行以下步骤的指令:
S1、赋值i=1;
S2、以位于所述第i个关键点的所述3D初始坐标处的点为第一节点,根据所述第一节点的3D初始坐标计算所述第i个关键点的目标再投影误差和,所述目标再投影误差和为所述第一节点在所述其他视图中每个视图的所述再投影误差的和,执行S3;
S3、在所述第一节点领域内选取参考节点,确定所述参考节点的3D坐标,根据所述参考节点的3D坐标计算参考再投影误差和,执行S4;
S4、比较所述目标再投影误差和与所述参考再投影误差和的大小,选取所述目标再投影误差和与所述参考再投影误差和中较小者作为新目标再投影误差和,用所述新目标再投影误差和替代所述目标再投影误差和,执行S5;
S5、重复执行S3和S4,直至满足预设条件,得到所述第i个关键点的所述3D优化坐标,将所述第i个关键点的所述3D优化坐标加入第一集合;
S6、i=i+1,判断i是否大于M,若i小于等于M,返回S2,若i大于M,输出i=M时的所述第一集合,根据i=M时的所述第一集合得到所述M个3D优化坐标。
在一可能的示例中,在所述根据所述第一视图中的所述部分亲和字段和所述第二视图中的所述部分亲和字段获取所述M个关键点中每个关键点的3D初始坐标方面,所述程序还包括用于执行以下步骤的指令:
根据所述第一视图中的所述部分亲和字段获取所述第一视图中的所述M个关键点的M个第一平面坐标,根据所述第二视图中的所述部分亲和字段获取所述第二视图中的所述M个关键点的M个第二平面坐标;获取所述第一视图对应的摄像机的第一内参数矩阵,获取所述第二视图对应的摄像机的第二内参数矩阵;根据所述第一内参数矩阵、所述第二内参数矩阵、所述M个第一平面坐标和所述M个第二平面坐标建立方程组;解所述方程组,得到所述M个关键点中每个关键点的所述3D初始坐标。
在一可能的示例中,在所述将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态之前,所述程序还包括用于执行以下步骤的指令:
建立初始LSTM网络模型;从Human3.6M数据集中选取训练样本;通过所述训练样本对所述初始LSTM网络模型进行训练,得到所述目标LSTM网络模型。
与上述一致的,请参阅图5,图5为本申请实施例提供了一种多视图3D人体姿态估计装置500的结构示意图,所述多视图3D人体姿态估计装置应用于电子设备,所述多视图3D人体姿态估计装置包括:
获取单元501,用于获取连续时间段中第一时刻的3D人体姿态,直至获取所述连续时间段中所有时刻的3D人体姿态,所述连续时间段包括多个时刻,所述第一时刻为所述连续时间段中任意一个时刻;
预估单元502,用于将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态。
在一可能的示例中,在所述获取连续时间段中第一时刻的3D人体姿态方面,所述获取单元501具体用于:
获取连续时间段中第一时刻的不同视角的N张2D视图,N为正整数,以及通过2D姿态检测网络模型获取所述N张2D视图的N个置信度图和N个部分亲和字段,其中,置信度图为2D视图中各个像素点为关键点的概率矩阵,部分亲和字段为2D矢量字段组,用于对体段的位置和方向进行编码,以及将所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中,得到所述第一时刻的3D人体姿态。
在一可能的示例中,在所述通过2D姿态检测网络模型获取所述N张2D视图的N个置信度图和N个部分亲和字段方面,所述获取单元501具体用于:
通过所述2D姿态检测网络模型获取所述N张2D视图的所述N个置信度图,所述N个置信度图与所述N张2D视图一一对应;根据所述N个置信度图确定所述N张2D视图中每张2D视图的M个关键点,M为正整数;确定所述N张2D视图中每张2D视图的所述M个关键点的位置和方向,得到所述N张2D视图的所述N个部分亲和字段。
在一可能的示例中,在所述将所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中,得到所述第一时刻的3D人体姿态方面,所述获取单元501具体用于:
将所述第一时刻的所述N个置信度图和所述N个部分亲和字段输入所述3D姿态重建网络模型中;根据所述N个置信度图获取所述N张2D视图的N个整体置信度;根据所述N个整体置信度从所述N张2D视图中选取整体置信度最高的第一视图和第二视图,并从所述N张2D视图中去除所述整体置信度低于阈值的2D视图;根据所述第一视图中的所述部分亲和字段和所述第二视图中的所述部分亲和字段获取所述M个关键点中每个关键点的3D初始坐标;将所述M个关键点中每个关键点的所述3D初始坐标投影到其他视图中,得到所述M个关键点中每个关键点的所述3D初始坐标在所述其他视图中的投影坐标;根据所述M个关键点中每个关键点在所述其他视图中每个视图的原始平面坐标和所述投影坐标计算所述M个关键点中每个关键点在所述其他视图中每个视图的再投影误差;计算所述M个关键点中每个关键点在所述其他视图中每个视图的所述再投影误差的加权和,得到所述M个关键点的M个再投影误差和;最小化所述M个再投影误差和中每个再投影误差和,得到M个3D优化坐标,根据所述M个3D优化坐标得到所述第一时刻的3D人体姿态。
在一可能的示例中,所述N个置信度图包括第r张2D视图的第r个置信度图,r为不大于N的正整数,在所述根据所述N个置信度图获取所述N张2D视图的N个整体置信度方面,所述获取单元501具体用于:
B1、根据所述第r个置信度图确定所述第r张2D视图的M个关键点以及所述M个关键点的M个置信度,所述M个关键点与所述M个置信度一一对应;
B2、赋予所述M个关键点M个权重,所述M个关键点与所述M个权重一一对应;
B3、根据所述M个置信度和所述M个权重计算所述第r张2D视图的整体置信度;
B4、重复执行B1-B3,直至得到所述N张2D视图的N个整体置信度。
在一个可能的示例中,所述M个关键点包括第i个关键点,i为不大于M的正整数,在所述最小化所述M个再投影误差和中每个再投影误差和,得到M个3D优化坐标方面,所述获取单元501具体用于:
S1:赋值i=1;
S2:以位于所述第i个关键点的所述3D初始坐标处的点为第一节点,根据所述第一节点的3D初始坐标计算所述第i个关键点的目标再投影误差和,所述目标再投影误差和为所述第一节点在所述其他视图中每个视图的所述再投影误差的和,执行S3;
S3:在所述第一节点领域内选取参考节点,确定所述参考节点的3D坐标,根据所述参考节点的3D坐标计算参考再投影误差和,执行S4;
S4:比较所述目标再投影误差和与所述参考再投影误差和的大小,选取所述目标再投影误差和与所述参考再投影误差和中较小者作为新目标再投影误差和,用所述新目标再投影误差和替代所述目标再投影误差和,执行S5;
S5:重复执行S3和S4,直至满足预设条件,得到所述第i个关键点的所述3D优化坐标,将所述第i个关键点的所述3D优化坐标加入第一集合;
S6:i=i+1,判断i是否大于M,若i小于等于M,返回S2,若i大于M,输出i=M时的所述第一集合,根据i=M时的所述第一集合得到所述M个3D优化坐标。
在一可能的示例中,在所述根据所述第一视图中的所述部分亲和字段和所述第二视图中的所述部分亲和字段获取所述M个关键点中每个关键点的3D初始坐标方面,所述获取单元501用于:
根据所述第一视图中的所述部分亲和字段获取所述第一视图中的所述M个关键点的M个第一平面坐标,根据所述第二视图中的所述部分亲和字段获取所述第二视图中的所述M个关键点的M个第二平面坐标;获取所述第一视图对应的摄像机的第一内参数矩阵,获取所述第二视图对应的摄像机的第二内参数矩阵;根据所述第一内参数矩阵、所述第二内参数矩阵、所述M个第一平面坐标和所述M个第二平面坐标建立方程组;解所述方程组,得到所述M个关键点中每个关键点的所述3D初始坐标。
在一可能的示例中,所述多视图3D人体姿态估计装置还包括训练单元503,在所述将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态之前,所述训练单元503用于:
建立初始LSTM网络模型;从Human3.6M数据集中选取训练样本;通过所述训练样本对所述初始LSTM网络模型进行训练,得到所述目标LSTM网络模型。
可以看出,本申请实施例中获取连续时间段中第一时刻的3D人体姿态,直至获取所述连续时间段中所有时刻的3D人体姿态,所述连续时间段包括多个时刻,所述第一时刻为所述连续时间段中任意一个时刻;将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态;本申请实施例可提高预估的3D人体姿态的精确性。
本申请实施例还提供一种计算机非易失性可读存储介质,存储用于电子数据交换的计算机程序,该计算机程序使得计算机执行如上述方法实施例中记载的任何一种多视图3D人体姿态估计方法的部分或全部步骤。
本申请实施例还提供一种计算机程序产品,所述计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质,该计算机程序使得计算机执行如上述方法实施例中记载的任何一种多视图3D人体姿态估计方法的部分或全部步骤。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制。尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例记载的技术方案进行修改,或者对其中部分技术特征进行等同替换。而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。

Claims (20)

  1. 一种多视图3D人体姿态估计方法,其特征在于,所述方法包括:
    获取连续时间段中第一时刻的3D人体姿态,直至获取所述连续时间段中所有时刻的3D人体姿态,所述连续时间段包括多个时刻,所述第一时刻为所述连续时间段中任意一个时刻;
    将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态。
  2. 根据权利要求1所述的方法,其特征在于,所述获取连续时间段中第一时刻的3D人体姿态包括:
    获取连续时间段中第一时刻的不同视角的N张2D视图,N为正整数;
    通过2D姿态检测网络模型获取所述N张2D视图的N个置信度图和N个部分亲和字段,其中,置信度图为2D视图中各个像素点为关键点的概率矩阵,部分亲和字段为2D矢量字段组,用于对体段的位置和方向进行编码;
    将所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中,得到所述第一时刻的3D人体姿态。
  3. 根据权利要求1所述的方法,其特征在于,所述通过2D姿态检测网络模型获取所述N张2D视图的N个置信度图和N个部分亲和字段,包括:
    通过所述2D姿态检测网络模型获取所述N张2D视图的所述N个置信度图,所述N个置信度图与所述N张2D视图一一对应;
    根据所述N个置信度图确定所述N张2D视图中每张2D视图的M个关键点,M为正整数;
    确定所述N张2D视图中每张2D视图的所述M个关键点的位置和方向,得到所述N张2D视图的所述N个部分亲和字段。
  4. 根据权利要求1所述的方法,其特征在于,所述将所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中,得到所述第一时刻的3D人体姿态,包括:
    将所述第一时刻的所述N个置信度图和所述N个部分亲和字段输入所述3D姿态重建网络模型中;
    根据所述N个置信度图获取所述N张2D视图的N个整体置信度,所述N个整体置信度与所述N张2D视图一一对应;
    根据所述N个整体置信度从所述N张2D视图中选取整体置信度最高的第一视图和第二视图,并从所述N张2D视图中去除所述整体置信度低于阈值的2D视图;
    根据所述第一视图中的所述部分亲和字段和所述第二视图中的所述部分亲和字段获取所述M个关键点中每个关键点的3D初始坐标;
    将所述M个关键点中每个关键点的所述3D初始坐标投影到其他视图中,得到所述M个关键点中每个关键点的所述3D初始坐标在所述其他视图中的投影坐标,所述其他视图为所述N张2D视图中除所述第一视图和所述第二视图以及所述整体置信度低于阈值的所述2D视图以外的视图;
    根据所述M个关键点中每个关键点在所述其他视图中每个视图的原始平面坐标和所述投影坐标计算所述M个关键点中每个关键点在所述其他视图中每个视图的再投影误差;
    计算所述M个关键点中每个关键点在所述其他视图中每个视图的所述再投影误差的加权和,得到所述M个关键点的M个再投影误差和,所述M个再投影误差和与所述M个关键点一一对应;
    最小化所述M个再投影误差和中每个再投影误差和,得到M个3D优化坐标,根据所述M个3D优化坐标得到所述第一时刻的3D人体姿态,所述M个3D优化坐标与所述M个关键点一一对应。
  5. 根据权利要求4所述的方法,其特征在于,所述N个置信度图包括第r张2D视图的第r个置信度图,r为不大于N的正整数,所述根据所述N个置信度图获取所述N张2D视图的N个整体置信度,包括:
    B1、根据所述第r个置信度图确定所述第r张2D视图的M个关键点以及所述M个关键点的M个置信度,所述M个关键点与所述M个置信度一一对应;
    B2、赋予所述M个关键点M个权重,所述M个关键点与所述M个权重一一对应;
    B3、根据所述M个置信度和所述M个权重计算所述第r张2D视图的整体置信度;
    B4、重复执行B1-B3,直至得到所述N张2D视图的N个整体置信度。
  6. 根据权利要求5所述的方法,其特征在于,所述M个关键点包括第i个关键点,i为不大于M的正整数,所述根据所述M个关键点中每个关键点在所述其他视图中每个视图的原始平面坐标和所述投影坐标计算所述M个关键点中每个关键点在所述其他视图中每个视图的再投影误差,由以下公式得到:
    RPE((x,y,z)_i, t, r) = (E_x^(t,r,i) - P_x^(t,r,i))^2 + (E_y^(t,r,i) - P_y^(t,r,i))^2
    其中,所述第i个关键点在所述第r张2D视图中的所述原始平面坐标为E^(t,r,i) = (E_x^(t,r,i), E_y^(t,r,i)),所述投影坐标为P^(t,r,i) = (P_x^(t,r,i), P_y^(t,r,i)),所述第i个关键点在所述第r张2D视图中的再投影误差为RPE((x,y,z)_i, t, r)。
  7. 根据权利要求6所述的方法,其特征在于,所述最小化所述M个再投影误差和中每个再投影误差和,得到M个3D优化坐标,包括:
    S1、赋值i=1;
    S2、以位于所述第i个关键点的所述3D初始坐标处的点为第一节点,根据所述第一节点的3D初始坐标计算所述第i个关键点的目标再投影误差和,所述目标再投影误差和为所述第一节点在所述其他视图中每个视图的所述再投影误差的和,执行S3;
    S3、在所述第一节点领域内选取参考节点,确定所述参考节点的3D坐标,根据所述参考节点的3D坐标计算参考再投影误差和,执行S4;
    S4、比较所述目标再投影误差和与所述参考再投影误差和的大小,选取所述目标再投影误差和与所述参考再投影误差和中较小者作为新目标再投影误差和,用所述新目标再投影误差和替代所述目标再投影误差和,执行S5;
    S5、重复执行S3和S4,直至满足预设条件,得到所述第i个关键点的所述3D优化坐标,将所述第i个关键点的所述3D优化坐标加入第一集合;
    S6、i=i+1,判断i是否大于M,若i小于等于M,返回S2,若i大于M,输出i=M时的所述第一集合,根据i=M时的所述第一集合得到所述M个3D优化坐标。
  8. 根据权利要求4所述的方法,其特征在于,所述根据所述第一视图中的所述部分亲和字段和所述第二视图中的所述部分亲和字段获取所述M个关键点中每个关键点的3D初始坐标,包括:
    根据所述第一视图中的所述部分亲和字段获取所述第一视图中的所述M个关键点的M个第一平面坐标,根据所述第二视图中的所述部分亲和字段获取所述第二视图中的所述M个关键点的M个第二平面坐标;
    获取所述第一视图对应的摄像机的第一内参数矩阵,获取所述第二视图对应的摄像机的第二内参数矩阵;
    根据所述第一内参数矩阵、所述第二内参数矩阵、所述M个第一平面坐标和所述M个第二平面坐标建立方程组;
    解所述方程组,得到所述M个关键点中每个关键点的所述3D初始坐标。
  9. 根据权利要求1中所述的方法,其特征在于,在所述将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态之前,所述方法还包括:
    建立初始LSTM网络模型;
    从Human3.6M数据集中选取训练样本;
    通过所述训练样本对所述初始LSTM网络模型进行训练,得到所述目标LSTM网络模型。
  10. 一种多视图3D人体姿态估计装置,其特征在于,所述多视图3D人体姿态估计装置包括:
    获取单元,用于获取连续时间段中第一时刻的3D人体姿态,直至获取所述连续时间段中所有时刻的3D人体姿态,所述连续时间段包括多个时刻,所述第一时刻为所述连续时间段中任意一个时刻;
    预估单元,用于将所述连续时间段中所有时刻的所述3D人体姿态输入目标LSTM网络模型中,得到所述连续时间段的下一时刻的预估3D人体姿态。
  11. 根据权利要求10所述的装置,其特征在于,所述获取单元用于:
    获取连续时间段中第一时刻的不同视角的N张2D视图,N为正整数;
    通过2D姿态检测网络模型获取所述N张2D视图的N个置信度图和N个部分亲和字段,其中,置信度图为2D视图中各个像素点为关键点的概率矩阵,部分亲和字段为2D矢量字段组,用于对体段的位置和方向进行编码;
    将所述N个置信度图和所述N个部分亲和字段输入3D姿态重建网络模型中,得到所述第一时刻的3D人体姿态。
  12. 根据权利要求10所述的装置,其特征在于,所述获取单元具体用于:
    通过所述2D姿态检测网络模型获取所述N张2D视图的所述N个置信度图,所述N个置信度图与所述N张2D视图一一对应;
    根据所述N个置信度图确定所述N张2D视图中每张2D视图的M个关键点,M为正整数;
    确定所述N张2D视图中每张2D视图的所述M个关键点的位置和方向,得到所述N张2D视图的所述N个部分亲和字段。
  13. 根据权利要求10所述的装置,其特征在于,所述获取单元还用于:
    将所述第一时刻的所述N个置信度图和所述N个部分亲和字段输入所述3D姿态重建网络模型中;
    根据所述N个置信度图获取所述N张2D视图的N个整体置信度;
    根据所述N个整体置信度从所述N张2D视图中选取整体置信度最高的第一视图和第二视图,并从所述N张2D视图中去除所述整体置信度低于阈值的2D视图;
    根据所述第一视图中的所述部分亲和字段和所述第二视图中的所述部分亲和字段获取所述M个关键点中每个关键点的3D初始坐标;
    将所述M个关键点中每个关键点的所述3D初始坐标投影到其他视图中,得到所述M个关键点中每个关键点的所述3D初始坐标在所述其他视图中的投影坐标;
    根据所述M个关键点中每个关键点在所述其他视图中每个视图的原始平面坐标和所述投影坐标计算所述M个关键点中每个关键点在所述其他视图中每个视图的再投影误差;
    计算所述M个关键点中每个关键点在所述其他视图中每个视图的所述再投影误差的加权和,得到所述M个关键点的M个再投影误差和;
    最小化所述M个再投影误差和中每个再投影误差和,得到M个3D优化坐标,根据所述M个3D优化坐标得到所述第一时刻的3D人体姿态。
  14. 根据权利要求13所述的装置,其特征在于,所述获取单元还用于:
    B1、根据所述第r个置信度图确定所述第r张2D视图的M个关键点以及所述M个关键点的M个置信度,所述M个关键点与所述M个置信度一一对应;
    B2、赋予所述M个关键点M个权重,所述M个关键点与所述M个权重一一对应;
    B3、根据所述M个置信度和所述M个权重计算所述第r张2D视图的整体置信度;
    B4、重复执行B1-B3,直至得到所述N张2D视图的N个整体置信度。
  15. 根据权利要求14所述的装置,其特征在于,所述获取单元还用于通过以下公式计算所述M个关键点中每个关键点在所述其他视图中每个视图的再投影误差:
    RPE((x,y,z)_i, t, r) = (E_x^(t,r,i) - P_x^(t,r,i))^2 + (E_y^(t,r,i) - P_y^(t,r,i))^2
    其中,所述第i个关键点在所述第r张2D视图中的所述原始平面坐标为E^(t,r,i) = (E_x^(t,r,i), E_y^(t,r,i)),所述投影坐标为P^(t,r,i) = (P_x^(t,r,i), P_y^(t,r,i)),所述第i个关键点在所述第r张2D视图中的再投影误差为RPE((x,y,z)_i, t, r)。
  16. 根据权利要求15所述的装置,其特征在于,所述获取单元还用于:
    S1:赋值i=1;
    S2:以位于所述第i个关键点的所述3D初始坐标处的点为第一节点,根据所述第一节点的3D初始坐标计算所述第i个关键点的目标再投影误差和,所述目标再投影误差和为所述第一节点在所述其他视图中每个视图的所述再投影误差的和,执行S3;
    S3:在所述第一节点领域内选取参考节点,确定所述参考节点的3D坐标,根据所述参考节点的3D坐标计算参考再投影误差和,执行S4;
    S4:比较所述目标再投影误差和与所述参考再投影误差和的大小,选取所述目标再投影误差和与所述参考再投影误差和中较小者作为新目标再投影误差和,用所述新目标再投影误差和替代所述目标再投影误差和,执行S5;
    S5:重复执行S3和S4,直至满足预设条件,得到所述第i个关键点的所述3D优化坐标,将所述第i个关键点的所述3D优化坐标加入第一集合;
    S6:i=i+1,判断i是否大于M,若i小于等于M,返回S2,若i大于M,输出i=M时的所述第一集合,根据i=M时的所述第一集合得到所述M个3D优化坐标。
  17. 根据权利要求13所述的装置,其特征在于,所述获取单元还用于:
    根据所述第一视图中的所述部分亲和字段获取所述第一视图中的所述M个关键点的M个第一平面坐标,根据所述第二视图中的所述部分亲和字段获取所述第二视图中的所述M个关键点的M个第二平面坐标;
    获取所述第一视图对应的摄像机的第一内参数矩阵,获取所述第二视图对应的摄像机的第二内参数矩阵;
    根据所述第一内参数矩阵、所述第二内参数矩阵、所述M个第一平面坐标和所述M个第二平面坐标建立方程组;
    解所述方程组,得到所述M个关键点中每个关键点的所述3D初始坐标。
  18. 根据权利要求10所述的装置,其特征在于,还包括训练单元,用于:
    建立初始LSTM网络模型;
    从Human3.6M数据集中选取训练样本;
    通过所述训练样本对所述初始LSTM网络模型进行训练,得到所述目标LSTM网络模型。
  19. 一种电子设备,其特征在于,包括处理器、存储器、通信接口,以及一个或多个程序,所述程序被存储在所述存储器中,并且被配置由所述处理器执行,所述程序包括用于执行如权利要求1-9任一项所述的方法中的步骤的指令。
  20. 一种计算机非易失性可读存储介质,其特征在于,存储用于电子数据交换的计算机程序,其中,所述计算机程序使得计算机执行如权利要求1-9任一项所述的方法。
PCT/CN2019/116990 2019-09-18 2019-11-11 多视图3d人体姿态估计方法及相关装置 WO2021051526A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910880173.X 2019-09-18
CN201910880173.XA CN110751039B (zh) 2019-09-18 2019-09-18 多视图3d人体姿态估计方法及相关装置

Publications (1)

Publication Number Publication Date
WO2021051526A1 true WO2021051526A1 (zh) 2021-03-25

Family

ID=69276574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116990 WO2021051526A1 (zh) 2019-09-18 2019-11-11 多视图3d人体姿态估计方法及相关装置

Country Status (2)

Country Link
CN (1) CN110751039B (zh)
WO (1) WO2021051526A1 (zh)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401340B (zh) * 2020-06-02 2020-12-25 腾讯科技(深圳)有限公司 目标对象的运动检测方法和装置
US11380121B2 (en) 2020-08-25 2022-07-05 Sony Group Corporation Full skeletal 3D pose recovery from monocular camera
CN112257582A (zh) * 2020-10-21 2021-01-22 北京字跳网络技术有限公司 脚部姿态确定方法、装置、设备和计算机可读介质
CN112613490B (zh) * 2021-01-08 2022-02-01 云从科技集团股份有限公司 一种行为识别方法、装置、机器可读介质及设备
CN112907892A (zh) * 2021-01-28 2021-06-04 上海电机学院 基于多视图的人体跌倒报警方法

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106780569A (zh) * 2016-11-18 2017-05-31 深圳市唯特视科技有限公司 一种人体姿态估计行为分析方法
CN108389227A (zh) * 2018-03-01 2018-08-10 深圳市唯特视科技有限公司 一种基于多视图深感知器框架的三维姿势估计方法
US10102629B1 (en) * 2015-09-10 2018-10-16 X Development Llc Defining and/or applying a planar model for object detection and/or pose estimation

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP3745117B2 (ja) * 1998-05-08 2006-02-15 キヤノン株式会社 画像処理装置及び画像処理方法
CN109271933B (zh) * 2018-09-17 2021-11-16 北京航空航天大学青岛研究院 基于视频流进行三维人体姿态估计的方法

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US10102629B1 (en) * 2015-09-10 2018-10-16 X Development Llc Defining and/or applying a planar model for object detection and/or pose estimation
CN106780569A (zh) * 2016-11-18 2017-05-31 深圳市唯特视科技有限公司 一种人体姿态估计行为分析方法
CN108389227A (zh) * 2018-03-01 2018-08-10 深圳市唯特视科技有限公司 一种基于多视图深感知器框架的三维姿势估计方法

Non-Patent Citations (1)

Title
ZE PENG: "Elderly Fall Detection Based on 3D Human Pose Estimation", CHINESE MASTER'S THESES FULL-TEXT DATABASE, no. 8, 1 May 2019 (2019-05-01), pages 1 - 67, XP055793161, ISSN: 1674-0246 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN113469030A (zh) * 2021-06-30 2021-10-01 珠海市亿点科技有限公司 一种基于人工智能与身体阴影评估的人员定位方法及系统
CN113469030B (zh) * 2021-06-30 2023-09-01 上海天齐智能建筑股份有限公司 一种基于人工智能与身体阴影评估的人员定位方法及系统
CN113643366A (zh) * 2021-07-12 2021-11-12 中国科学院自动化研究所 一种多视角三维对象姿态估计方法及装置
CN113643366B (zh) * 2021-07-12 2024-03-05 中国科学院自动化研究所 一种多视角三维对象姿态估计方法及装置

Also Published As

Publication number Publication date
CN110751039A (zh) 2020-02-04
CN110751039B (zh) 2023-07-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19946161; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19946161; Country of ref document: EP; Kind code of ref document: A1)