CN112926475A - Human body three-dimensional key point extraction method - Google Patents
Human body three-dimensional key point extraction method
- Publication number
- CN112926475A (application CN202110251506.XA)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- human body
- key point
- dimensional key
- key points
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a human body three-dimensional key point extraction method, applied in the field of human body three-dimensional key point detection and aimed at the poor estimation precision of the prior art. First, human body action data are collected from two viewpoints; then a two-branch multi-stage structure detects the two-dimensional key point confidence maps of the human body in the data of the two views; a three-dimensional key point generation model is further established; finally, the two-dimensional key point confidence maps detected for the human body action data to be processed are input into the three-dimensional key point generation model to obtain the three-dimensional key point coordinates. The method can effectively improve the estimation precision of the three-dimensional key points of the human body.
Description
Technical Field
The invention belongs to the field of image processing, and particularly relates to a three-dimensional key point detection technology.
Background
At present, human body motion capture technology is widely adopted in areas such as health monitoring and film and television production: the motion of a virtual character is rendered more realistically by reproducing the real motion of a human body, and human key point detection is the basis for realizing such motion reproduction. Key point detection can be divided into two-dimensional and three-dimensional detection according to whether the result contains depth information. Two-dimensional key point detection has been studied more extensively, but occlusion and changes in light and shadow easily cause false and missed detections, degrading detection precision.
Existing three-dimensional key point detection methods fall mainly into two types. The first detects three-dimensional key points directly from an image; the Chinese invention patent "A combined target classification and three-dimensional attitude estimation method based on a residual network" (CN108280481A) performs key point feature extraction and classification based on the residual network ResNet-50 to realize three-dimensional key point detection. The second first acquires two-dimensional key point coordinates from images and then generates three-dimensional coordinates from them; the Chinese patent "A method for estimating the three-dimensional posture of a human body based on structural information" (CN110427877A) first inputs monocular RGB images into a two-dimensional pose detector to obtain the two-dimensional key point coordinates, then constructs a graph convolution network based on the structural information of the two-dimensional key points and outputs the three-dimensional key point coordinates.
The existing three-dimensional key point detection methods have the following defects: 1) methods that regress three-dimensional key point coordinates directly from images usually depend on additional parameters, such as the camera projection matrix, which are usually not annotated in video data; 2) direct three-dimensional key point labeling is difficult: the existing training data come almost entirely from motion capture systems, whose scenes and subjects are limited, so the generalization ability of the trained models is restricted; 3) three-dimensional key points in video are generally detected frame by frame, processing each frame as a static image and ignoring the timing information between consecutive frames and the motion change between adjacent frames.
Disclosure of Invention
To solve these technical problems, the invention provides a human body three-dimensional key point extraction method that first extracts features from the original images of two views and preliminarily generates two-dimensional key point coordinates through confidence map detection, while a three-dimensional key point generation model is established; the two views then cooperatively predict the three-dimensional key point coordinates, improving the detection precision of human body three-dimensional key points.
The technical scheme adopted by the invention is as follows: a human body three-dimensional key point extraction method comprises the following steps:
s1, collecting human body action data by adopting double visual angles;
s2, detecting two-dimensional key point confidence maps of the human body on the data of the two visual angles by adopting a double-branch multi-stage structure;
s3, establishing a three-dimensional key point generation model;
s4, processing the human body action data to be detected, collected as in step S1, through step S2 to obtain the corresponding two-dimensional key point confidence maps, and inputting them into the three-dimensional key point generation model established in step S3 to obtain the three-dimensional key point coordinates.
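Steps S1 to S4 form a simple pipeline. The stub sketch below only illustrates the data flow; every function name and data placeholder is hypothetical, not from the patent:

```python
# Hypothetical stub pipeline for steps S1-S4; all names are illustrative.

def collect_dual_view(n_frames):
    """S1: synchronized frame pairs from cameras A and B."""
    return [{"A": f"A{t}", "B": f"B{t}"} for t in range(n_frames)]

def detect_confidence_maps(frames):
    """S2: two-branch multi-stage detection of 2-D confidence maps."""
    return [{"A": f"cmapA{t}", "B": f"cmapB{t}"} for t in range(len(frames))]

def generation_model(cmaps):
    """S3/S4: 3-D key point generation model applied to the maps."""
    return [f"xyz{t}" for t in range(len(cmaps))]

coords = generation_model(detect_confidence_maps(collect_dual_view(4)))
```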
Step S1 specifically comprises: two cameras, denoted camera A and camera B, simultaneously acquire human action data, and synchronous frame sampling is performed on the acquired video data.
The two-branch multi-stage structure of step S2 is specifically: the upper branch learns the positions of the key points seen by camera A, the lower branch learns the positions of the key points seen by camera B, and both branches comprise several stages, wherein stage 1 adopts three layers of 3 × 3 convolution and two layers of 1 × 1 convolution, and each remaining stage adopts five layers of 7 × 7 convolution and two layers of 1 × 1 convolution.
The method also comprises extracting original image features with a single three-dimensional CNN layer; the input of the first stage is the original image features extracted by this layer, and the input of each subsequent stage is these original image features together with the confidence map prediction of the previous stage.
This three-dimensional CNN layer extracts the image features of the current frame and of its preceding and following frames.
Its convolution kernel size is 3 × 3 × 3.
The three-dimensional key point generation model of step S3 specifically comprises three convolutional layers and one fully connected layer, with a sigmoid function as the output unit and the ReLU function as the activation function of the convolutional layers.
The method further comprises weakly supervised training of the three-dimensional key point generation model, specifically: the optimization target is to minimize a loss function; back-propagation weight training is carried out by gradient descent, and the parameters of the three-dimensional key point generation model are updated iteratively.
The loss function expression is:

Loss = L_D + L_TD + f

where L_D represents the distance error loss function, L_TD represents the inter-frame loss function, and f represents the two-dimensional confidence loss function of the whole two-branch multi-stage structure.
The beneficial effects of the invention are as follows: video data of two views are acquired with ordinary cameras, features are extracted by a CNN, and a two-branch multi-stage structure detects the human two-dimensional key point confidence maps in the data of the two views; a three-dimensional CNN model is designed to generate three-dimensional key point coordinates from the detected two-dimensional confidence maps, and the model is trained under weak supervision by combining the two views. The method of the invention has the following advantages:
1) The three-dimensional convolutional neural network simultaneously extracts the spatial features and trajectory information of the key points and the temporal features of the whole activity, and uses inter-frame correlation to reduce the error of the two-dimensional key point confidence maps.
2) The method generates the three-dimensional coordinates from the detected two-dimensional confidence maps, reducing the error introduced by converting confidence maps into two-dimensional key point coordinates; it trains the model with a loss function that combines the consistency of time points and human posture across the two views, which overcomes the lack of three-dimensional key point annotations, repairs missed detections in a single view, and markedly improves the estimation precision of the human three-dimensional key points.
Drawings
FIG. 1 is a flow chart of three-dimensional keypoint coordinate estimation;
FIG. 2 is a two-branch multi-stage structure provided by the present invention;
fig. 3 is a schematic diagram of three-dimensional coordinate generation.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
As shown in fig. 1, the method of the invention mainly comprises video data acquisition, two-dimensional key point detection, and three-dimensional key point coordinate generation; the specific steps are as follows:
1. data acquisition
The invention adopts two ordinary cameras, camera A and camera B, oriented 90 degrees apart, which acquire human action data simultaneously. Synchronous frame sampling is performed on the acquired video data at a frequency of 30 Hz, i.e. 30 frames are sampled per second, and the size of each frame is w × h pixels.
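As a rough illustration of the synchronized 30 Hz frame sampling described above, the following sketch pairs the nearest frames from two camera streams on a common time grid. The timestamps, tolerance, and pairing rule are assumptions for illustration, not from the patent:

```python
def sample_synchronized(ts_a, ts_b, rate_hz=30.0, tol=0.5 / 30.0):
    """Pair frames from cameras A and B on a common 30 Hz grid.

    ts_a, ts_b: sorted lists of frame timestamps (seconds) from each camera.
    Returns (index_in_A, index_in_B) pairs, one per grid tick, keeping only
    ticks where both cameras have a frame within tolerance `tol`.
    """
    def nearest(ts, t):
        # index of the timestamp in `ts` closest to t
        return min(range(len(ts)), key=lambda i: abs(ts[i] - t))

    t_end = min(ts_a[-1], ts_b[-1])
    n_ticks = round(t_end * rate_hz) + 1
    pairs = []
    for k in range(n_ticks):
        t = k / rate_hz
        ia, ib = nearest(ts_a, t), nearest(ts_b, t)
        if abs(ts_a[ia] - t) <= tol and abs(ts_b[ib] - t) <= tol:
            pairs.append((ia, ib))
    return pairs

# Camera B drifts 5 ms behind camera A; both still pair up within tolerance.
ts_a = [k / 30.0 for k in range(10)]
ts_b = [k / 30.0 + 0.005 for k in range(10)]
pairs = sample_synchronized(ts_a, ts_b)
```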
The azimuth angles of the camera a and the camera B are not limited to 90 degrees, and may be other angles.
2. Video-based three-dimensional keypoint detection
The invention performs three-dimensional key point detection by combining a three-dimensional CNN (Convolutional Neural Network) with a two-branch CPM (Convolutional Pose Machine) network. The network is designed as a multi-stage dual-view branch structure, as shown in FIG. 2: each branch corrects its confidence maps over multiple stages, and every stage is trained with supervision, which avoids the optimization difficulty of an overly deep network. Stage 1 (block1 in FIG. 2) adopts three layers of 3 × 3 convolution and two layers of 1 × 1 convolution, and each remaining stage (block2 in FIG. 2) adopts five layers of 7 × 7 convolution and two layers of 1 × 1 convolution. The upper branch learns the key point positions in view A, represented as confidence maps, i.e. the probability of a key point at each image location; the lower branch learns the key point positions in view B. Finally, the two branches cooperate to realize three-dimensional key point prediction. The "C" inside the boxes of FIG. 2 denotes convolution.
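For the stage structures just described, and under the assumption of stride-1 convolutions, the receptive field of each stage can be computed directly; this shows why the later stages, with their 7 × 7 kernels, see a much wider image context when refining the confidence maps:

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions:
    rf = 1 + sum(k - 1) over the stacked kernels."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

# Stage 1: three 3x3 convolutions followed by two 1x1 convolutions.
stage1 = receptive_field([3, 3, 3, 1, 1])
# Later stages: five 7x7 convolutions followed by two 1x1 convolutions.
stage_n = receptive_field([7, 7, 7, 7, 7, 1, 1])
```

So a stage-1 output pixel sees a 7 × 7 patch, while each later stage sees a 31 × 31 patch of its input.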
The specific process is as follows:
1) two-dimensional confidence map detection
Images of size w × h × T collected from view A and view B, where T denotes the number of image frames, are input into the two-branch multi-stage network. First, one three-dimensional CNN layer extracts the image features F_A and F_B of each frame together with its preceding and following frames; all convolution kernels are set to 3 × 3 × 3 so as to extract the timing features across three frames. The data size after convolution is (w-2) × (h-2) × T. F_A and F_B correspond to the two branches and serve as the input of the first stage. From the image features F_A and F_B, the first stage of the network generates two sets of detection confidence maps

S_A^1 = { S_A^{1,j} ∈ R^{w×h} },  S_B^1 = { S_B^{1,j} ∈ R^{w×h} }

where R denotes the real numbers, so each S_A^{1,j} is a w × h matrix representing the confidence map of the j-th key point in view A at the first stage, j ∈ {1…J}, with J the number of key points; S_B^{1,j} likewise represents the confidence map of the j-th key point in view B at the first stage.
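The stated spatial shrink from w × h to (w − 2) × (h − 2) while T is preserved is consistent with a 3 × 3 × 3 kernel applied without spatial padding but with the temporal dimension padded by one frame on each side; that padding scheme is an assumption here. A minimal NumPy check of the shape arithmetic:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_temporal_same(x, kernel):
    """3-D convolution, 'valid' spatially and 'same' temporally.
    x: (T, h, w) frame stack; kernel: (3, 3, 3)."""
    xp = np.pad(x, ((1, 1), (0, 0), (0, 0)))         # pad the time axis only
    windows = sliding_window_view(xp, kernel.shape)   # (T, h-2, w-2, 3, 3, 3)
    return np.tensordot(windows, kernel, axes=([3, 4, 5], [0, 1, 2]))

T, h, w = 5, 8, 10
feat = conv3d_temporal_same(np.ones((T, h, w)), np.ones((3, 3, 3)))
# feat.shape == (T, h - 2, w - 2): time preserved, space shrunk by 2.
```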
Each subsequent stage is similar to the first: its input is the confidence map prediction of the previous stage together with the original image features F_A and F_B, and it produces a more accurate prediction. Let g_A^n and g_B^n denote the network structure of the n-th stage in each branch (those skilled in the art will note that this structure is equivalent to a process function); the n-th stage produces the sets S_A^n and S_B^n:

S_A^n = g_A^n(F_A, S_A^{n-1}),  S_B^n = g_B^n(F_B, S_B^{n-1})
where S_A^{n,j} represents the confidence map of the j-th key point at the n-th stage in view A, j ∈ {1…J} with J the number of key points, and n ∈ {1…N} with N the number of model stages; S_B^{n,j} represents the confidence map of the j-th key point at the n-th stage in view B.
The confidence map loss function of stage n is calculated as follows:

f_n = Σ_{j=1}^{J} Σ_{p} W(p) · ( ‖S_A^{n,j}(p) − S_A^{*,j}(p)‖² + ‖S_B^{n,j}(p) − S_B^{*,j}(p)‖² )

where p ranges over the pixel locations in the image, S_A^{n,j} is the predicted confidence map of the j-th key point at the n-th stage in view A and S_A^{*,j} is the true confidence map of the j-th key point in view A; S_B^{n,j} and S_B^{*,j} are the corresponding predicted and true confidence maps in view B. W is a binary mask matrix used to reduce the error caused by missing label values: when the label at position p is missing, W(p) = 0; otherwise W(p) = 1.
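The masked per-stage confidence loss described above can be written directly in NumPy; the toy shapes and values below are illustrative, not the patent's:

```python
import numpy as np

def stage_confidence_loss(pred_a, true_a, pred_b, true_b, mask):
    """Per-stage confidence-map loss with a binary label mask.

    pred_*/true_*: (J, h, w) confidence maps for J key points;
    mask: (h, w) binary matrix, 0 where the label value is missing.
    """
    err_a = mask * (pred_a - true_a) ** 2
    err_b = mask * (pred_b - true_b) ** 2
    return float(np.sum(err_a) + np.sum(err_b))

J, h, w = 2, 4, 4
true = np.zeros((J, h, w))
pred = np.full((J, h, w), 0.5)
mask = np.ones((h, w))
mask[0, 0] = 0   # simulate one missing label position
loss = stage_confidence_loss(pred, true, pred, true, mask)
```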
The overall two-dimensional confidence loss function is expressed as

f = Σ_{n=1}^{N} f_n

where N is the total number of stages of the network structure. The key point confidence maps are then parsed by a greedy algorithm, and the two-dimensional human key point coordinates are output.
Because of occlusion, lighting and similar problems in the image, the two-dimensional key point coordinates detected above may contain missed detections. A threshold th = 0.4 is set: if the maximum probability value of a point in the confidence map generated in the last stage does not exceed th, the point is judged to be missed. The missed key points of the two views are then processed: if a key point is judged missed in view A but successfully detected in view B, its coordinates in view A are set to those in view B; likewise, if a key point is judged missed in view B but successfully detected in view A, its coordinates in view B are set to those in view A.
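The cross-view repair rule just described — declare a key point missed when its peak confidence falls below th = 0.4 and, if the other view detected it, copy that view's coordinates — can be sketched as follows (array shapes are assumed for illustration):

```python
import numpy as np

TH = 0.4  # missed-detection threshold from the description

def repair_missed(coords_a, conf_a, coords_b, conf_b, th=TH):
    """coords_*: (J, 2) 2-D key point coordinates per view;
    conf_*: (J,) peak confidence of each key point's map.
    A point missed in one view (peak < th) but detected in the
    other inherits the other view's coordinates."""
    out_a, out_b = coords_a.copy(), coords_b.copy()
    for j in range(len(conf_a)):
        if conf_a[j] < th <= conf_b[j]:
            out_a[j] = coords_b[j]
        elif conf_b[j] < th <= conf_a[j]:
            out_b[j] = coords_a[j]
    return out_a, out_b

coords_a = np.array([[1.0, 1.0], [2.0, 2.0]])
conf_a = np.array([0.9, 0.1])                 # key point 1 missed in view A
coords_b = np.array([[5.0, 5.0], [6.0, 6.0]])
conf_b = np.array([0.8, 0.7])
rep_a, rep_b = repair_missed(coords_a, conf_a, coords_b, conf_b)
```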
The threshold th = 0.4 is chosen as a compromise. Missed detections are filtered by this threshold: too large a threshold flags too many points and increases the cost of filtering and repair, while too small a threshold lets some missed detections pass and increases the error. The value is therefore set as a compromise between detection precision and algorithm efficiency.
2) Three-dimensional coordinate generation:
The two-dimensional confidence map sets S_A^N and S_B^N obtained in the last stage are input into a three-dimensional CNN, which extracts the information of neighboring key points and outputs the three-dimensional coordinates of the corresponding key points; the network structure is shown in FIG. 3. The input size is w × h × J, where J is the number of key points. The CNN comprises three convolutional layers, each with kernel size 3 × 3 × 3. The ReLU function is used as the activation function of the convolutional layers to introduce a nonlinear mapping into the model; the ReLU activation function is:
ReLU(x)=max(0,x)
where x represents a function argument.
The convolutional layers extract the spatial features among the key points through the convolution operation, and the convolution result is input into a fully connected Dense(J) layer, where J is the number of regression targets (i.e. the number of human key points); in the invention J = 17. Finally, a sigmoid function is taken as the output unit:

sigmoid(x) = 1 / (1 + e^{-x})

The corresponding three-dimensional coordinate points are obtained by applying inverse normalization to the output.
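The output stage just described — a sigmoid unit followed by inverse normalization back to coordinate values — can be sketched as follows. The coordinate range used in `denormalize` is an assumed example; the patent does not specify one:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)           # activation of the conv layers

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # output unit, maps to (0, 1)

def denormalize(y, lo, hi):
    """Inverse normalization: map sigmoid output in (0, 1) back to an
    assumed coordinate range [lo, hi] used during training."""
    return lo + y * (hi - lo)

z = np.array([0.0, 2.0, -3.0])          # hypothetical dense-layer outputs
y = sigmoid(z)
coords = denormalize(y, -1.0, 1.0)      # e.g. a normalized world range
```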
3) Coordinate point processing
The world coordinate system is taken to be that of camera A, and the output of the view-B branch is converted into world coordinates. The initial results P_A^t and P_B^t represent the camera-coordinate outputs of view A and view B respectively. The transformation from the camera-B coordinate system to the world coordinate system is

P̃_B^t = R · P_B^t + b

where R is the orthogonal rotation matrix determined by (r_x, r_y, r_z), the angular deviation of camera B relative to camera A; b = (b_x, b_y, b_z) is the position of camera B in the world coordinate system; and P̃_B^t denotes the view-B output expressed in world coordinates.
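A NumPy sketch of this camera-to-world conversion, with R built from the angular deviations (r_x, r_y, r_z). The patent does not fix the Euler-angle convention, so the intrinsic Z·Y·X ordering below is an assumption:

```python
import numpy as np

def rotation_matrix(rx, ry, rz):
    """Orthogonal rotation matrix R from angular deviations (radians),
    composed as Rz @ Ry @ Rx (assumed convention)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def to_world(points_b, rx, ry, rz, b):
    """Map (J, 3) camera-B coordinates into camera-A's world frame."""
    R = rotation_matrix(rx, ry, rz)
    return points_b @ R.T + np.asarray(b)

# Camera B rotated 90 degrees about the y axis and offset 1 unit in x.
p_world = to_world(np.array([[1.0, 0.0, 0.0]]), 0.0, np.pi / 2, 0.0,
                   [1.0, 0.0, 0.0])
```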
After this conversion of the view-B output, the final results of the two branches are P_A^t and P̃_B^t, t ∈ {1…T}, i.e. the human key point coordinates output by view A and view B in the world coordinate system for the t-th image frame; the outputs of the model at the same moment should be consistent.
4) Weakly supervised model training
The loss function of the three-dimensional key point model comprises a distance error loss and an inter-frame loss. The distance error loss function measures the dual-view consistency:

L_D = (1/T) Σ_{t=1}^{T} ‖ P_A^t − P̃_B^t ‖₂

where T denotes the number of frames simultaneously input to the network and ‖·‖₂ denotes the Euclidean distance.
The inter-frame error loss function penalizes the displacement between consecutive frames:

L_TD = (1/(T−1)) Σ_{t=2}^{T} ( ‖ P_A^t − P_A^{t−1} ‖₂ + ‖ P̃_B^t − P̃_B^{t−1} ‖₂ )
the final joint loss function is defined as the sum of two losses of the two-dimensional key point branch and the three-dimensional key point branch in the last stage:
Loss=LD+LTD+f
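The joint loss described above can be assembled numerically as follows; the exact forms of L_D and L_TD here follow the reconstructions in this section and should be read as assumptions, and the shapes and values are toy examples:

```python
import numpy as np

def distance_loss(pa, pb):
    """Dual-view consistency: mean Euclidean distance between the
    view-A and (world-converted) view-B outputs, (T, J, 3) each."""
    return float(np.mean(np.linalg.norm(pa - pb, axis=-1)))

def interframe_loss(p):
    """Temporal smoothness: mean displacement between consecutive frames."""
    return float(np.mean(np.linalg.norm(p[1:] - p[:-1], axis=-1)))

def joint_loss(pa, pb, conf_loss):
    """Loss = L_D + L_TD + f, with f the 2-D confidence loss."""
    l_d = distance_loss(pa, pb)
    l_td = interframe_loss(pa) + interframe_loss(pb)
    return l_d + l_td + conf_loss

T, J = 3, 2
pa = np.zeros((T, J, 3))
pb = np.zeros((T, J, 3))
pb[..., 0] = 1.0                      # view B offset 1 unit in x, static
loss = joint_loss(pa, pb, conf_loss=0.5)
```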
the goal of the model training is to minimize the loss function, perform back propagation weight training by using a gradient descent method, and iteratively update model parameters, which are known to those skilled in the art and specifically refer to weight parameters and bias parameters in the neural network calculation.
It will be appreciated by those of ordinary skill in the art that the embodiments described here are intended to help the reader understand the principles of the invention, and the invention is not limited to these specifically recited embodiments and examples. Various modifications and alterations will occur to those skilled in the art; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention falls within the scope of the claims of the present invention.
Claims (9)
1. A human body three-dimensional key point extraction method is characterized by comprising the following steps:
s1, collecting human body action data by adopting double visual angles;
s2, detecting two-dimensional key point confidence maps of the human body on the data of the two visual angles by adopting a double-branch multi-stage structure;
s3, establishing a three-dimensional key point generation model;
s4, processing the human body action behavior data to be detected acquired in the step S1 in the step S2 to obtain a corresponding two-dimensional key point confidence map, and inputting the two-dimensional key point confidence map into the three-dimensional key point generation model established in the step S3 to obtain three-dimensional key point coordinates.
2. The method for extracting three-dimensional key points of a human body according to claim 1, wherein the step S1 specifically comprises: two cameras are adopted and marked as a camera A and a camera B, human action behavior data acquisition is carried out simultaneously, and synchronous frame sampling is carried out on acquired video data.
3. The method for extracting three-dimensional key points of a human body according to claim 2, wherein the two-branch multi-stage structure of step S2 is specifically: the upper branch is used for learning the positions of key points in the camera A, the lower branch is used for learning the positions of key points in the camera B, and the upper branch and the lower branch comprise a plurality of stages, wherein 3 layers of 3 × 3 convolution and two layers of 1 × 1 convolution are adopted in the stage 1, and 5 layers of 7 × 7 convolution and two layers of 1 × 1 convolution are adopted in the rest stages.
4. The method for extracting human body three-dimensional key points according to claim 3, further comprising extracting original image features by using a layer of three-dimensional CNN, wherein the input of the first stage is the original image features extracted by the layer of three-dimensional CNN; the input of the subsequent stage is the original image characteristics extracted by the three-dimensional CNN of the layer and the confidence map prediction result of the previous stage.
5. The method as claimed in claim 4, wherein the layer of three-dimensional CNN is used for extracting image features of the current frame and frames before and after the current frame.
6. The method as claimed in claim 5, wherein the size of the three-dimensional CNN convolution kernel is 3 x 3.
7. The method for extracting three-dimensional key points of a human body according to any one of claims 1 to 6, wherein the step S3 of generating a model of the three-dimensional key points specifically includes: and the three convolutional layers and the one full-connection layer adopt a sigmoid function as an output unit and use a ReLU function as an activation function of the convolutional layers.
8. The method for extracting three-dimensional key points of a human body according to claim 7, further comprising performing weak supervision training on a three-dimensional key point generation model, specifically: and the optimization target is a minimum loss function, a gradient descent method is adopted for carrying out back propagation weight training, and the parameters of the three-dimensional key point generation model are updated in an iterative manner.
9. The method for extracting three-dimensional key points of a human body according to claim 8, wherein the loss function expression is as follows:
Loss = L_D + L_TD + f

wherein L_D represents the distance error loss function, L_TD represents the inter-frame loss function, and f represents the two-dimensional confidence loss function of the whole two-branch multi-stage structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110251506.XA CN112926475B (en) | 2021-03-08 | 2021-03-08 | Human body three-dimensional key point extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112926475A true CN112926475A (en) | 2021-06-08 |
CN112926475B CN112926475B (en) | 2022-10-21 |
Family
ID=76171889
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110251506.XA Expired - Fee Related CN112926475B (en) | 2021-03-08 | 2021-03-08 | Human body three-dimensional key point extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112926475B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113780120A (en) * | 2021-08-27 | 2021-12-10 | 深圳云天励飞技术股份有限公司 | Method, device, server and storage medium for generating human body three-dimensional model |
CN113989283A (en) * | 2021-12-28 | 2022-01-28 | 中科视语(北京)科技有限公司 | 3D human body posture estimation method and device, electronic equipment and storage medium |
CN114757822A (en) * | 2022-06-14 | 2022-07-15 | 之江实验室 | Binocular-based human body three-dimensional key point detection method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299669A (en) * | 2018-08-30 | 2019-02-01 | 清华大学 | Video human face critical point detection method and device based on double intelligent bodies |
CN109635843A (en) * | 2018-11-14 | 2019-04-16 | 浙江工业大学 | A kind of three-dimensional object model classification method based on multi-view image |
CN109949368A (en) * | 2019-03-14 | 2019-06-28 | 郑州大学 | A kind of human body three-dimensional Attitude estimation method based on image retrieval |
CN110544301A (en) * | 2019-09-06 | 2019-12-06 | 广东工业大学 | Three-dimensional human body action reconstruction system, method and action training system |
CN110874865A (en) * | 2019-11-14 | 2020-03-10 | 腾讯科技(深圳)有限公司 | Three-dimensional skeleton generation method and computer equipment |
CN111108507A (en) * | 2017-09-22 | 2020-05-05 | 祖克斯有限公司 | Generating a three-dimensional bounding box from two-dimensional images and point cloud data |
CN111738220A (en) * | 2020-07-27 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Three-dimensional human body posture estimation method, device, equipment and medium |
CN111753747A (en) * | 2020-06-28 | 2020-10-09 | 高新兴科技集团股份有限公司 | Violent motion detection method based on monocular camera and three-dimensional attitude estimation |
- 2021-03-08: application CN202110251506.XA filed; patent CN112926475B granted, status not active (expired, fee related)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111108507A (en) * | 2017-09-22 | 2020-05-05 | 祖克斯有限公司 | Generating a three-dimensional bounding box from two-dimensional images and point cloud data |
CN109299669A (en) * | 2018-08-30 | 2019-02-01 | 清华大学 | Video human face critical point detection method and device based on double intelligent bodies |
CN109635843A (en) * | 2018-11-14 | 2019-04-16 | 浙江工业大学 | A kind of three-dimensional object model classification method based on multi-view image |
CN109949368A (en) * | 2019-03-14 | 2019-06-28 | 郑州大学 | A kind of human body three-dimensional Attitude estimation method based on image retrieval |
CN110544301A (en) * | 2019-09-06 | 2019-12-06 | 广东工业大学 | Three-dimensional human body action reconstruction system, method and action training system |
CN110874865A (en) * | 2019-11-14 | 2020-03-10 | 腾讯科技(深圳)有限公司 | Three-dimensional skeleton generation method and computer equipment |
CN111753747A (en) * | 2020-06-28 | 2020-10-09 | 高新兴科技集团股份有限公司 | Violent motion detection method based on a monocular camera and three-dimensional pose estimation |
CN111738220A (en) * | 2020-07-27 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Three-dimensional human body posture estimation method, device, equipment and medium |
Non-Patent Citations (2)
Title |
---|
Zhang Guangpian: "Three-dimensional human body modeling method based on two-dimensional point cloud images", Computer Engineering and Applications * |
Xiao Aowen: "CNN-based three-dimensional human pose estimation method", Journal of Wuhan Institute of Technology * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113780120A (en) * | 2021-08-27 | 2021-12-10 | 深圳云天励飞技术股份有限公司 | Method, device, server and storage medium for generating human body three-dimensional model |
CN113989283A (en) * | 2021-12-28 | 2022-01-28 | 中科视语(北京)科技有限公司 | 3D human body posture estimation method and device, electronic equipment and storage medium |
CN114757822A (en) * | 2022-06-14 | 2022-07-15 | 之江实验室 | Binocular-based human body three-dimensional key point detection method and system |
Also Published As
Publication number | Publication date |
---|---|
CN112926475B (en) | 2022-10-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112926475B (en) | Human body three-dimensional key point extraction method | |
CN110135319B (en) | Abnormal behavior detection method and system | |
CN109684924B (en) | Face living body detection method and device | |
CN107492121B (en) | Two-dimensional human body skeleton point localization method for monocular depth video | |
CN109684925B (en) | Depth image-based human face living body detection method and device | |
CN110555412B (en) | End-to-end human body gesture recognition method based on combination of RGB and point cloud | |
CN107767419A (en) | Skeleton key point detection method and device | |
CN111814661A (en) | Human behavior identification method based on residual error-recurrent neural network | |
CN111199207B (en) | Two-dimensional multi-person pose estimation method based on a deep residual neural network | |
CN111695457A (en) | Human body posture estimation method based on weak supervision mechanism | |
CN110852182A (en) | Depth-video human behavior recognition method based on three-dimensional spatio-temporal modeling | |
CN112257513B (en) | Training method, translation method and system for sign language video translation model | |
WO2022052782A1 (en) | Image processing method and related device | |
CN111898566B (en) | Attitude estimation method, attitude estimation device, electronic equipment and storage medium | |
CN115376024A (en) | Semantic segmentation method for power accessory of power transmission line | |
CN114611600A (en) | Self-supervised three-dimensional pose estimation method for skiers | |
CN114724185A (en) | Light-weight multi-person posture tracking method | |
CN115482523A (en) | Small object target detection method and system of lightweight multi-scale attention mechanism | |
CN116092178A (en) | Gesture recognition and tracking method and system for mobile terminal | |
Li et al. | Mask-FPAN: semi-supervised face parsing in the wild with de-occlusion and UV GAN | |
CN108537156B (en) | Anti-shielding hand key node tracking method | |
CN113762009B (en) | Crowd counting method based on multi-scale feature fusion and double-attention mechanism | |
CN111274901B (en) | Gesture depth image continuous detection method based on depth gating recursion unit | |
CN111950476A (en) | Deep learning-based automatic river channel ship identification method in complex environment | |
WO2019136591A1 (en) | Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20221021 |