CN112926475B - Human body three-dimensional key point extraction method - Google Patents
- Publication number
- CN112926475B (application CN202110251506.XA)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- key point
- dimensional key
- human body
- stage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a human body three-dimensional key point extraction method, applied to the field of human body three-dimensional key point detection and aimed at the poor estimation accuracy of the prior art. First, human body action behavior data are collected from two viewing angles; then a dual-branch multi-stage structure is used to detect two-dimensional key point confidence maps of the human body in the data from each view; a three-dimensional key point generation model is further established; finally, the two-dimensional key point confidence maps detected from the human body action behavior data to be measured are input into the three-dimensional key point generation model to obtain the three-dimensional key point coordinates. The method can effectively improve the estimation accuracy of human body three-dimensional key points.
Description
Technical Field
The invention belongs to the field of image processing, and particularly relates to a three-dimensional key point detection technology.
Background
Human motion capture technology is now widely used in health monitoring, film and television production, and similar applications: the motion of a virtual character is rendered more realistically according to the reproduced real motion of a human body, and human key point detection is the basis for realizing such motion reproduction. Depending on whether the detection result contains three-dimensional depth information, the task divides into two-dimensional and three-dimensional key point detection. Two-dimensional key point detection has been studied extensively, but occlusion and changes in light and shadow easily cause false and missed detections, which degrades detection accuracy.
Current three-dimensional key point detection methods fall mainly into two types. The first detects three-dimensional key points directly from an image: the Chinese invention patent "A combined target classification and three-dimensional attitude estimation method based on a residual network" (CN 108280481A) performs key point feature extraction and classification based on the residual network ResNet-50 to realize three-dimensional key point detection. The second acquires two-dimensional key point coordinates from an image and generates three-dimensional coordinates from them: the Chinese patent "A method for estimating human body three-dimensional posture based on structural information" (CN 110427877A) first inputs a monocular RGB image into a two-dimensional pose detector to obtain the two-dimensional key point coordinates, then constructs a graph convolution network based on the structural information of the two-dimensional key points and outputs the three-dimensional key point coordinates.
The existing three-dimensional key point detection methods have the following defects: 1) methods that detect three-dimensional key point coordinates directly from images usually depend on additional parameters, such as the camera projection matrix, which are usually not annotated in video data; 2) direct three-dimensional key point labeling is difficult; the existing training data come almost entirely from motion-capture systems, whose scenes and subjects are limited, so the generalization ability of the trained models is limited; 3) three-dimensional key point detection in video is generally carried out frame by frame, processing each frame as a static image and ignoring the temporal information between consecutive frames and the action changes between adjacent frames.
Disclosure of Invention
To solve the above technical problems, the invention provides a human body three-dimensional key point extraction method: features are first extracted from the original images of the two views and two-dimensional key point coordinates are preliminarily generated through two-dimensional key point confidence detection; at the same time a three-dimensional key point generation model is established, and the three-dimensional key point coordinates are predicted cooperatively from the two views, improving the detection accuracy of human body three-dimensional key points.
The technical scheme adopted by the invention is as follows: a human body three-dimensional key point extraction method comprises the following steps:
s1, collecting human body action behavior data by adopting double visual angles;
s2, detecting the confidence maps of the two-dimensional key points of the human body by adopting a double-branch multi-stage structure;
s3, establishing a three-dimensional key point generation model;
and S4, processing the human body action behavior data to be detected acquired in the step S1 in the step S2 to obtain a corresponding two-dimensional key point confidence map, and inputting the two-dimensional key point confidence map into the three-dimensional key point generation model established in the step S3 to obtain three-dimensional key point coordinates.
The step S1 specifically comprises the following steps: two cameras are adopted and marked as a camera A and a camera B, human action behavior data acquisition is carried out simultaneously, and synchronous frame sampling is carried out on acquired video data.
The dual-branch multi-stage structure in step S2 is specifically: the upper branch learns the key point positions in camera A and the lower branch learns the key point positions in camera B; both branches comprise several stages, where stage 1 uses three layers of 3 × 3 convolution and two layers of 1 × 1 convolution, and the remaining stages use five layers of 7 × 7 convolution and two layers of 1 × 1 convolution.
The method also comprises extracting original image features with one layer of three-dimensional CNN; the input of the first stage is the original image features extracted by this layer, and the input of each subsequent stage is those original image features together with the confidence map prediction result of the previous stage.
The layer of three-dimensional CNN is used for extracting image features of the current frame and frames before and after the current frame.
The three-dimensional CNN convolution kernel size of this layer is 3 × 3 × 3.
S3, the three-dimensional key point generation model is specifically: three convolutional layers and one fully connected layer, with a sigmoid function as the output unit and the ReLU function as the activation function of the convolutional layers.
The method further comprises weakly supervised training of the three-dimensional key point generation model, specifically: the optimization target is to minimize the loss function; back-propagation weight training is performed by gradient descent, and the parameters of the three-dimensional key point generation model are updated iteratively.
The loss function expression is:

Loss = L_D + L_TD + f

where L_D represents the distance-error loss function, L_TD represents the inter-frame error loss function, and f represents the two-dimensional confidence loss function of the whole dual-branch multi-stage structure.
The invention has the following beneficial effects. Video data of two views are obtained with ordinary cameras, features are extracted by a CNN, and a dual-branch multi-stage structure detects the two-dimensional key point confidence maps of the human body in the data of the two views; a three-dimensional CNN model is designed to generate three-dimensional key point coordinates from the two-dimensional key point confidence maps detected in the action behavior data to be measured, and the model is trained with weak supervision through the combination of the two views. The method of the invention has the following advantages:
1) The three-dimensional convolutional neural network simultaneously extracts the spatial features and trajectory information of the key points and the temporal features of the whole activity, and uses inter-frame correlation to reduce the error of the two-dimensional key point confidence maps.
2) The method generates three-dimensional coordinates from the detected two-dimensional confidence maps, which reduces the error introduced when converting confidence maps into two-dimensional key point coordinates. A loss-function training scheme built on the consistency of the two views in time and human pose overcomes the lack of three-dimensional key point annotations, repairs the effect of missed key point detections in a single view, and markedly improves the estimation accuracy of human body three-dimensional key points.
Drawings
FIG. 1 is a flow chart of three-dimensional keypoint coordinate estimation;
FIG. 2 is a two-branch multi-stage structure provided by the present invention;
fig. 3 is a schematic diagram of three-dimensional coordinate generation.
Detailed Description
In order to facilitate understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
As shown in fig. 1, the method of the present invention mainly includes the steps of video data acquisition, two-dimensional key point detection, three-dimensional key point coordinate generation, etc., and the specific steps are as follows:
1. data acquisition
The invention uses two ordinary cameras, camera A and camera B, whose viewing directions form a 90-degree angle, to collect human body action data simultaneously. Synchronous frame sampling is performed on the collected video data at 30 Hz, i.e. 30 frames are sampled per second, and the size of each frame is w × h pixels.
The azimuth angles of the camera a and the camera B are not limited to 90 degrees, and may be other angles.
2. Video-based three-dimensional keypoint detection
The invention performs three-dimensional key point detection by combining a three-dimensional CNN (Convolutional Neural Network) with a two-layer CPM (Convolutional Pose Machine) network. The network is designed as a multi-stage dual-view branch structure, as shown in Fig. 2; each branch corrects its confidence maps over multiple stages, and every stage is trained with supervision, which avoids the difficulty of optimizing an overly deep network. Stage 1 (block1 in Fig. 2) uses three layers of 3 × 3 convolution and two layers of 1 × 1 convolution, while the remaining stages (block2 in Fig. 2) use five layers of 7 × 7 convolution and two layers of 1 × 1 convolution. The upper branch learns the key point positions in view A, expressed as confidence maps, i.e. the probability that a key point lies at a given image location; the lower branch learns the key point positions in view B. Finally, the two branches cooperate to predict the three-dimensional key points. The letter C inside the boxes in Fig. 2 denotes convolution.
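As a rough sanity check on this stage design, the receptive field of each stage can be computed from the kernel sizes. This is a sketch assuming stride-1 convolutions and no dilation; the patent does not state the layer strides:

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions:
    each k x k layer adds (k - 1) pixels of context."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

# stage 1: three 3x3 layers plus two 1x1 layers
print(receptive_field([3, 3, 3, 1, 1]))        # 7
# later stages: five 7x7 layers plus two 1x1 layers
print(receptive_field([7, 7, 7, 7, 7, 1, 1]))  # 31
```

The 1 × 1 layers add nothing to the receptive field, while the five 7 × 7 layers in the later stages widen it considerably; that is what lets later stages use longer-range context when correcting the confidence maps.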
The specific process is as follows:
1) Two-dimensional confidence map detection
Images of size w × h × T collected from view A and view B, where T denotes the number of image frames, are input into the dual-branch multi-stage network. First, a single three-dimensional CNN layer extracts image features F_A and F_B of each frame together with its preceding and following frames; all convolution kernels are of size 3 × 3 × 3 so as to extract temporal features across three frames, and the data size after convolution is (w − 2) × (h − 2) × T. F_A and F_B correspond to the two branches and serve as the input of the first stage. Based on F_A and F_B, the first stage of the network generates two sets of detection confidence maps S_A^1 = {S_{A,j}^1} and S_B^1 = {S_{B,j}^1}, where S_{A,j}^1 ∈ R^{w×h} is a w × h real-valued matrix representing the confidence map of the j-th key point in view A at the first stage, j ∈ {1 … J}, and J is the number of key points; S_{B,j}^1 is defined analogously for view B.
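The (w − 2) × (h − 2) × T figure quoted above follows from the standard convolution output-size formula. A minimal check, assuming no spatial padding and one frame of temporal padding (the frame dimensions here are hypothetical):

```python
def conv_output_size(n, k, pad=0, stride=1):
    """Output length of a convolution: floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1

w, h, T = 64, 48, 30   # hypothetical frame width, height, and clip length
k = 3                  # the 3x3x3 feature-extraction kernel

print(conv_output_size(w, k))         # 62  (= w - 2, no spatial padding)
print(conv_output_size(h, k))         # 46  (= h - 2)
print(conv_output_size(T, k, pad=1))  # 30  (temporal padding keeps T frames)
```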
Each subsequent stage is similar to the first: it takes as input the confidence map prediction of the previous stage together with the original image features F_A and F_B, and produces a more accurate prediction. Let g_A^n and g_B^n denote the network structure of the n-th stage (it should be noted by those skilled in the art that the network structure here is equivalent to a processing function); the sets S_A^n and S_B^n obtained at the n-th stage are

S_A^n = g_A^n(F_A, S_A^{n−1}),  S_B^n = g_B^n(F_B, S_B^{n−1}),  n ≥ 2

where S_{A,j}^n denotes the confidence map of the j-th key point at the n-th stage in view A, j ∈ {1 … J}, J is the number of key points, n ∈ {1 … N}, N is the number of model stages; S_{B,j}^n denotes the confidence map of the j-th key point at the n-th stage in view B.
The confidence map loss function for stage n is calculated as follows:

f_n = Σ_{j=1}^{J} Σ_p W(p) · ( ‖S_{A,j}^n(p) − S_{A,j}^*(p)‖_2^2 + ‖S_{B,j}^n(p) − S_{B,j}^*(p)‖_2^2 )

where p ranges over the pixel locations in the image, S_{A,j}^n is the predicted confidence map of the j-th key point at the n-th stage in view A and S_{A,j}^* is the ground-truth confidence map of the j-th key point in view A; S_{B,j}^n and S_{B,j}^* are the corresponding predicted and ground-truth maps in view B. W is a binary mask matrix used to reduce errors caused by missing label values: W(p) = 0 when the label at position p is missing, and W(p) = 1 otherwise.
The overall two-dimensional confidence loss function is expressed as:

f = Σ_{n=1}^{N} f_n

where N denotes the total number of stages of the network structure. The key point confidence maps are then parsed by a greedy algorithm, and the two-dimensional key point coordinates of the human body are output.
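The patent does not detail the greedy parsing of the confidence maps; for a single person it reduces to taking the per-map argmax. A minimal sketch (the array shapes are assumptions):

```python
import numpy as np

def keypoints_from_confidence_maps(maps):
    """maps: (J, h, w) array, one confidence map per key point.
    Returns (J, 2) integer (x, y) coordinates and the peak confidences."""
    J = maps.shape[0]
    coords = np.zeros((J, 2), dtype=int)
    confs = np.zeros(J)
    for j in range(J):
        # unravel the flat argmax index into (row, col) = (y, x)
        y, x = np.unravel_index(np.argmax(maps[j]), maps[j].shape)
        coords[j] = (x, y)
        confs[j] = maps[j, y, x]
    return coords, confs

# toy check: one 5x5 map whose single peak sits at (x=3, y=1)
m = np.zeros((1, 5, 5))
m[0, 1, 3] = 0.9
coords, confs = keypoints_from_confidence_maps(m)
print(coords[0], confs[0])  # [3 1] 0.9
```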
Because of occlusion, lighting, and similar problems in the image, the two-dimensional key point coordinates detected in the above steps may contain missed detections. A threshold th = 0.4 is set: if the maximum probability value of a key point in the confidence map generated at the last stage does not exceed th, that point is judged a missed detection. Missed key points are then repaired across the two views: if a key point is judged missing in view A but successfully detected in view B, its coordinates in view A are set consistent with those in view B; conversely, if it is missing in view B but detected in view A, its coordinates in view B are set consistent with those in view A.
The threshold th = 0.4 is a compromise. Missed detections are filtered by the threshold: if it is too large, the filtering becomes more complex and repair is hindered; if it is too small, some missed detections may be overlooked and the error grows. The value therefore balances detection accuracy against algorithm efficiency.
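The cross-view repair rule described above can be sketched directly; the coordinate and confidence array layouts are assumptions:

```python
import numpy as np

def repair_missed_keypoints(coords_a, confs_a, coords_b, confs_b, th=0.4):
    """A key point whose peak confidence does not exceed th is treated as a
    missed detection and copied from the other view when that view detected
    it successfully. Operates on copies; returns the repaired coordinates."""
    coords_a, coords_b = coords_a.copy(), coords_b.copy()
    for j in range(len(confs_a)):
        missed_a = confs_a[j] <= th
        missed_b = confs_b[j] <= th
        if missed_a and not missed_b:
            coords_a[j] = coords_b[j]
        elif missed_b and not missed_a:
            coords_b[j] = coords_a[j]
    return coords_a, coords_b

ca = np.array([[0, 0], [10, 12]]); cfa = np.array([0.2, 0.9])
cb = np.array([[5, 6], [11, 13]]); cfb = np.array([0.8, 0.7])
ra, rb = repair_missed_keypoints(ca, cfa, cb, cfb)
print(ra[0], rb[1])  # [5 6] [11 13]  (key point 0 in view A repaired from B)
```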
2) Three-dimensional coordinate generation:
The two-dimensional confidence map sets S_A^N and S_B^N obtained at the last stage are input into a three-dimensional CNN, which extracts information from neighboring key points and outputs the three-dimensional coordinates of the corresponding key points; the network structure is shown in Fig. 3. The input size is w × h × J, where J represents the number of key points. The CNN comprises three convolutional layers, each with kernels of size 3 × 3 × 3. The ReLU function is used as the activation function of the convolutional layers to introduce a nonlinear mapping into the model; the ReLU activation function is:
ReLU(x)=max(0,x)
where x represents a function argument.
The convolutional layers extract the spatial features among the key points through convolution operations, and the convolution result is input into a fully connected Dense(J) layer, where J is the number of regression targets (i.e. the number of human body key points); in the invention J = 17. Finally, a sigmoid function is taken as the output unit:

sigmoid(x) = 1 / (1 + e^(−x))

The output result is inverse-normalized to obtain the corresponding three-dimensional coordinate points.
3) Coordinate point processing
The world coordinate system is fixed at camera A, and the output of the view-B branch is converted into world coordinates. The initial results P_A^t and P_B^t denote the outputs in the respective camera coordinate systems of view A and view B. The transfer matrix from the camera-B coordinate system to the world coordinate system is

T = [ R  b ]
    [ 0  1 ]

where R is an orthogonal rotation matrix constructed from (r_x, r_y, r_z), the angular deviation of camera B relative to camera A, and b = (b_x, b_y, b_z) is the position of camera B in the world coordinate system; the view-B output in world coordinates is therefore Q_B^t = R · P_B^t + b.

After this conversion of the view-B branch output, the final results of the two branches are Q_A^t and Q_B^t, the human body key point coordinates output by view A and view B in the world coordinate system, where t ∈ {1 … T} indexes the image frames; the model outputs at the same moment should be consistent.
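The camera-B-to-world conversion can be sketched with numpy. The Euler-angle composition order (Rz · Ry · Rx) is an assumption, since the patent only says R is an orthogonal rotation matrix built from (r_x, r_y, r_z):

```python
import numpy as np

def rotation_from_euler(rx, ry, rz):
    """Orthogonal rotation matrix R = Rz @ Ry @ Rx (composition order assumed)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def camera_b_to_world(points, rx, ry, rz, b):
    """points: (J, 3) coordinates in the camera-B frame; b: camera-B position
    in the world frame (fixed at camera A). Returns world-frame coordinates."""
    R = rotation_from_euler(rx, ry, rz)
    return points @ R.T + np.asarray(b)

# camera B rotated 90 degrees about the vertical axis and placed 2 m to the
# side: a point 1 m in front of camera B lands at x = 3 m in the world frame
p = np.array([[0.0, 0.0, 1.0]])
q = camera_b_to_world(p, 0.0, np.pi / 2, 0.0, [2.0, 0.0, 0.0])
print(q)  # approximately [[3, 0, 0]], up to floating-point error
```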
4) Weakly supervised model training
The three-dimensional key point model loss function comprises a distance-error loss and an inter-frame loss. The distance-error loss function is defined as

L_D = Σ_{t=1}^{T} ‖Q_A^t − Q_B^t‖_2

where T denotes the number of frames simultaneously input to the network and ‖·‖_2 denotes the Euclidean distance.
The inter-frame error loss function is

L_TD = Σ_{t=2}^{T} ( ‖Q_A^t − Q_A^{t−1}‖_2 + ‖Q_B^t − Q_B^{t−1}‖_2 )

The final joint loss function is defined as the sum of the last-stage loss of the two-dimensional key point branches and the two losses of the three-dimensional key point branch:
Loss = L_D + L_TD + f
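The joint loss above can be sketched numerically. The exact summation structure (e.g. whether the inter-frame term covers both views) is an assumption based on the definitions given:

```python
import numpy as np

def distance_error_loss(Qa, Qb):
    """L_D: Euclidean distance between the two views' world-frame key points,
    summed over frames and key points. Qa, Qb: (T, J, 3) arrays."""
    return np.linalg.norm(Qa - Qb, axis=-1).sum()

def interframe_loss(Q):
    """L_TD term for one view: distance between consecutive-frame outputs."""
    return np.linalg.norm(Q[1:] - Q[:-1], axis=-1).sum()

def joint_loss(Qa, Qb, f):
    """Loss = L_D + L_TD + f, with the inter-frame term applied to both views."""
    return distance_error_loss(Qa, Qb) + interframe_loss(Qa) + interframe_loss(Qb) + f

# toy check: two frames, one key point; the views disagree by 1 m in every frame
Qa = np.zeros((2, 1, 3))
Qb = np.zeros((2, 1, 3)); Qb[:, :, 0] = 1.0
print(joint_loss(Qa, Qb, 0.5))  # 2.5  (L_D = 2, L_TD = 0, f = 0.5)
```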
the goal of the model training is to minimize the loss function, perform back propagation weight training by using a gradient descent method, and iteratively update model parameters, which are known to those skilled in the art and specifically refer to weight parameters and bias parameters in the neural network calculation.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the invention is not limited to the specifically recited embodiments and examples. Various modifications and alterations will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.
Claims (4)
1. A human body three-dimensional key point extraction method is characterized by comprising the following steps:
s1, collecting human body action behavior data by adopting double visual angles;
s2, detecting two-dimensional key point confidence maps of the human body on the data of the two visual angles by adopting a double-branch multi-stage structure; the double-branch multi-stage structure in step S2 is specifically: the upper branch is used for learning the positions of key points in the camera A, the lower branch is used for learning the positions of key points in the camera B, and the upper branch and the lower branch comprise a plurality of stages, wherein 3 layers of 3 × 3 convolution and two layers of 1 × 1 convolution are adopted in the stage 1, and 5 layers of 7 × 7 convolution and two layers of 1 × 1 convolution are adopted in the rest stages;
the method also comprises extracting original image features with one layer of three-dimensional CNN, where the input of the first stage is the original image features extracted by this layer, and the input of each subsequent stage is the original image features extracted by this layer together with the confidence map prediction result of the previous stage; the three-dimensional CNN extracts image features of the current frame and its preceding and following frames; the size of the three-dimensional CNN convolution kernel of this layer is 3 × 3 × 3;
S3, establishing a three-dimensional key point generation model; in step S3 the three-dimensional key point generation model is specifically: three convolutional layers and one fully connected layer, with a sigmoid function as the output unit and the ReLU function as the activation function of the convolutional layers;
and S4, processing the human body action behavior data to be detected acquired in the step S1 in the step S2 to obtain a corresponding two-dimensional key point confidence map, and inputting the two-dimensional key point confidence map into the three-dimensional key point generation model established in the step S3 to obtain a three-dimensional key point coordinate.
2. The method for extracting three-dimensional key points of a human body according to claim 1, wherein the step S1 specifically comprises: two cameras are adopted and marked as a camera A and a camera B, human action behavior data acquisition is carried out simultaneously, and synchronous frame sampling is carried out on acquired video data.
3. The method for extracting three-dimensional key points of a human body according to claim 2, further comprising performing weak supervision training on the three-dimensional key point generation model, specifically: and the optimization target is a minimum loss function, a gradient descent method is adopted for carrying out back propagation weight training, and the parameters of the three-dimensional key point generation model are updated in an iterative manner.
4. The method for extracting three-dimensional key points of a human body according to claim 3, wherein the loss function expression is:

Loss = L_D + L_TD + f

where L_D represents the distance-error loss function, L_TD represents the inter-frame error loss function, and f represents the two-dimensional confidence loss function of the whole dual-branch multi-stage structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110251506.XA CN112926475B (en) | 2021-03-08 | 2021-03-08 | Human body three-dimensional key point extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112926475A CN112926475A (en) | 2021-06-08 |
CN112926475B true CN112926475B (en) | 2022-10-21 |
Family
ID=76171889
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110251506.XA Expired - Fee Related CN112926475B (en) | 2021-03-08 | 2021-03-08 | Human body three-dimensional key point extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112926475B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113780120A (en) * | 2021-08-27 | 2021-12-10 | 深圳云天励飞技术股份有限公司 | Method, device, server and storage medium for generating human body three-dimensional model |
CN113989283B (en) * | 2021-12-28 | 2022-04-05 | 中科视语(北京)科技有限公司 | 3D human body posture estimation method and device, electronic equipment and storage medium |
CN114299128A (en) * | 2021-12-30 | 2022-04-08 | 咪咕视讯科技有限公司 | Multi-view positioning detection method and device |
CN114757822B (en) * | 2022-06-14 | 2022-11-04 | 之江实验室 | Binocular-based human body three-dimensional key point detection method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299669A (en) * | 2018-08-30 | 2019-02-01 | 清华大学 | Video human face critical point detection method and device based on double intelligent bodies |
CN109635843A (en) * | 2018-11-14 | 2019-04-16 | 浙江工业大学 | A kind of three-dimensional object model classification method based on multi-view image |
CN109949368A (en) * | 2019-03-14 | 2019-06-28 | 郑州大学 | A kind of human body three-dimensional Attitude estimation method based on image retrieval |
CN110544301A (en) * | 2019-09-06 | 2019-12-06 | 广东工业大学 | Three-dimensional human body action reconstruction system, method and action training system |
CN110874865A (en) * | 2019-11-14 | 2020-03-10 | 腾讯科技(深圳)有限公司 | Three-dimensional skeleton generation method and computer equipment |
CN111108507A (en) * | 2017-09-22 | 2020-05-05 | 祖克斯有限公司 | Generating a three-dimensional bounding box from two-dimensional images and point cloud data |
CN111738220A (en) * | 2020-07-27 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Three-dimensional human body posture estimation method, device, equipment and medium |
CN111753747A (en) * | 2020-06-28 | 2020-10-09 | 高新兴科技集团股份有限公司 | Violent motion detection method based on monocular camera and three-dimensional attitude estimation |
2021-03-08: CN application CN202110251506.XA granted as patent CN112926475B (not active: Expired - Fee Related)
Non-Patent Citations (2)
Title |
---|
Xiao Aowen, "A CNN-based three-dimensional human pose estimation method", Journal of Wuhan Institute of Technology, 2019-04-15, Vol. 41, No. 2, pp. 168-172 * |
Zhang Guangpian, "Three-dimensional human body modeling method based on two-dimensional point cloud images", Computer Engineering and Applications, 2020-10-01, Vol. 56, No. 19, pp. 205-215 * |
Also Published As
Publication number | Publication date |
---|---|
CN112926475A (en) | 2021-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112926475B (en) | Human body three-dimensional key point extraction method | |
CN110135319B (en) | Abnormal behavior detection method and system | |
CN109684924B (en) | Face liveness detection method and device | |
WO2022002150A1 (en) | Method and device for constructing visual point cloud map | |
CN109684925B (en) | Depth image-based face liveness detection method and device | |
CN111814661A (en) | Human behavior recognition method based on residual-recurrent neural network | |
CN111199207B (en) | Two-dimensional multi-person pose estimation method based on deep residual neural network | |
CN111695457A (en) | Human body posture estimation method based on weak supervision mechanism | |
WO2019136591A1 (en) | Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network | |
CN113361542A (en) | Local feature extraction method based on deep learning | |
CN111898566B (en) | Attitude estimation method, attitude estimation device, electronic equipment and storage medium | |
CN113139489A (en) | Crowd counting method and system based on background extraction and multi-scale fusion network | |
CN115376024A (en) | Semantic segmentation method for power accessory of power transmission line | |
CN114724185A (en) | Light-weight multi-person posture tracking method | |
CN114611600A (en) | Self-supervision technology-based three-dimensional attitude estimation method for skiers | |
CN115482523A (en) | Small object target detection method and system of lightweight multi-scale attention mechanism | |
CN116092178A (en) | Gesture recognition and tracking method and system for mobile terminal | |
CN113516232B (en) | Through-wall radar human body posture reconstruction method based on self-attention mechanism | |
CN115393928A (en) | Face recognition method and device based on depth separable convolution and additive angle interval loss | |
CN111523586A (en) | Noise-aware-based full-network supervision target detection method | |
CN112257513B (en) | Training method, translation method and system for sign language video translation model | |
CN113762009B (en) | Crowd counting method based on multi-scale feature fusion and double-attention mechanism | |
CN111274901B (en) | Gesture depth image continuous detection method based on depth gating recursion unit | |
CN111950476A (en) | Deep learning-based automatic river channel ship identification method in complex environment | |
CN116092189A (en) | Bimodal human behavior recognition method based on RGB data and bone data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20221021 |