CN112001217A - Multi-person human body posture estimation algorithm based on deep learning - Google Patents
- Publication number
- CN112001217A (application CN202010560950.5A)
- Authority
- CN
- China
- Prior art keywords
- human body
- limb
- image
- joint
- postures
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a multi-person human body posture estimation algorithm based on deep learning, which comprises the following steps: an image or video file containing the postures of multiple persons is input into the built model; a 50-layer ResNet network extracts image features of the limbs and joint points of the multiple persons from the input image or video; a Convolutional Pose Machine detects candidate joint points; a Gaussian function selects the heat map of the optimal joint point from the detected candidates; Part Affinity Fields (PAFs) match joint-point pairs to obtain the set of all limb types and joint points required for the human body postures; and the Hungarian algorithm, together with a human body limb frame, matches the limb-type and joint-point sets to assemble the postures, completing the estimation of the postures of the multiple persons in the image. The invention can be applied to a rescue robot platform to accurately and efficiently estimate the postures of multiple persons to be rescued in complex environments such as dust, wetland and narrow spaces.
Description
Technical Field
The invention belongs to the technical field of multi-person human body image processing in a complex environment, and particularly relates to a multi-person human body posture estimation algorithm based on deep learning.
Background
Rescue in outdoor land environments is one of the main forms of human rescue. When facing complex land environments such as sand, dust, wetland and narrow spaces, traditional rescue methods cannot guarantee timely and accurate arrival at the scene, which adds many unstable factors to rescue tasks; at the same time, secondary disasters that may occur during search and rescue greatly threaten the safety of the personnel involved. To make up for the inability of existing search-and-rescue equipment to cover complex terrain on land, it is necessary to develop a portable, highly adaptable ground robot system that can operate across diverse search-and-rescue terrains. The main task of such a rescue robot is to quickly determine the posture of the injured person in preparation for further rescue measures. Because image information about the injured person is rich in content and quick to acquire, computer vision (CV) technology is widely used in land rescue robots. Visual search involves image classification, target detection, and target pose judgment and estimation. In actual rescue, the visual information of the injured person is easily affected by the harsh outdoor environment; in particular, interference from the image background and from the posture of the injured person (a single occluded person, or multiple mutually occluding persons) makes it difficult to obtain an effective image, so the posture estimation solution is not unique, and accurate, stable posture estimation of the injured person cannot be achieved.
To solve the problem of accurately locating injured people in complex environments by machine vision, developing a multi-person human posture estimation model that effectively resists outdoor environmental interference, improves the efficiency of posture estimation for injured people, and possesses a degree of robustness is of great significance to the development of current rescue robots.
Disclosure of Invention
In view of the above, the present invention is directed to a multi-person human body posture estimation algorithm based on deep learning, so as to solve the above-mentioned problems in the background art.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a multi-person human body posture estimation algorithm based on deep learning transfers an image or video file into the model; a 50-layer ResNet network then extracts image features of the limbs and joint points of the multiple persons from the input image or video; a Convolutional Pose Machine (CPM) detects the joint points; a Gaussian function selects the heat map of the optimal joint point among the detected candidates; Part Affinity Fields (PAFs) match the obtained joint points with limbs to obtain the set of all limb types and joint points required for the human body postures; the Hungarian algorithm together with a human body limb frame then matches the limb-type and joint-point sets to obtain the complete human body posture estimate, finally completing multi-person posture estimation in the image.
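As a non-authoritative sketch of the pipeline order stated above, the stages can be expressed as a single function whose stage callables are placeholders (all names here are illustrative, not the patent's API):

```python
def estimate_multi_person_poses(image, extract_features, detect_joints,
                                select_peaks, compute_pafs, match_limbs):
    """Pipeline order described in the claim: ResNet-50 features -> CPM joint
    detection -> Gaussian peak selection -> PAF limb matching -> Hungarian
    assembly with the human limb frame. Each argument is a stage callable."""
    features = extract_features(image)      # 50-layer ResNet feature extraction
    heatmaps = detect_joints(features)      # Convolutional Pose Machine
    joints = select_peaks(heatmaps)         # Gaussian selection of optimal joints
    limbs = compute_pafs(features, joints)  # Part Affinity Fields
    return match_limbs(limbs, joints)       # Hungarian algorithm + limb frame
```

Passing real networks for the five callables would realize the pipeline; here the function only fixes the data flow between stages.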
Further, the model input specifically includes an RGB color image within 1000 × 1000 pixel size or a video file containing a multi-person image, and the file format is MP4 format.
Further, the feature extraction network consists of a 50-layer ResNet network. One core of ResNet is the deep bottleneck architecture: several identity-mapping layers (i.e., y = x, output equal to input) are added behind a shallower network to increase depth and improve the network's non-linear capacity, while the identity mapping cannot increase the error — a deeper network should not produce a larger training error. Residual networks successfully address two problems that frequently appear in conventional neural networks: first, when a layer is too far from the input layer, the derivative propagated back becomes too small, so the update value approaches zero and training stalls; second, each layer must learn an entirely new output function F(x), and as the network depth grows greatly, the number of such functions causes problems such as high computational pressure. The original input of the network is fed from a bypass directly into a deeper layer, which supplies the residual connection and remedies the vanishing of the residual signal; during training, each layer of a residual network only has to learn a residual relative to its input rather than the full mapping.
A conventional convolutional neural network tries to extract all information at once, which increases the risk of vanishing gradients. A residual block instead splits the computation into two paths: the first passes through the stacked layers and attempts to learn the residual F(x) directly from x; the second is a shortcut that carries the input x unchanged. Let the input be x and the result to be fitted (the output) be H(x). The residual module decomposes the output as H(x) = x + y and lets y = F(x), i.e., the residual is also fitted from x; the fitted residual is then added to x to obtain the layer's output. Since the mapping value H(x) differs from the input x by exactly the required residual, the residual structure only has to fit F(x) = H(x) − x, and the calculation formula is shown in equation (1):

H(x) = F(x) + x   (1)
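A minimal numerical sketch of the relation H(x) = F(x) + x (an illustration, not the patent's implementation — `residual_fn` stands in for the stacked convolution layers F):

```python
import numpy as np

def residual_block(x, residual_fn):
    """Compute H(x) = F(x) + x: the block only has to fit the residual F."""
    return residual_fn(x) + x

# Toy residual function standing in for two conv layers (hypothetical).
f = lambda x: 0.1 * x

x = np.array([1.0, 2.0, 3.0])
h = residual_block(x, f)

# If F learns the zero function, the block degenerates to the identity
# mapping y = x, so adding such blocks cannot increase the training error.
identity = residual_block(x, lambda x_: np.zeros_like(x_))
```

The second call shows why identity mappings are "free": a residual block whose residual branch outputs zero passes its input through unchanged.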
Further, in the feature detection process, confidence maps, denoted S, represent the 2D detection positions of specific key points in the image. For example, if there is only one person in the image and the joint point is visible, each confidence map should contain a single peak; if there are k people in the image and a given joint (say, the neck) is visible for j of them, there should be j peaks. The feature points obtained in the first step of the model are input into the Convolutional Pose Machine network for joint-point detection, yielding a batch of potential joint-point confidence maps; the optimal joint point is then obtained from the potential joint points X_{j,k} and the true joint position p using equation (2).
S*_{j,k}(p) = exp(−‖p − X_{j,k}‖² / σ²)   (2)

where σ controls the spread of the peak and p is the image coordinate of the evaluated point. The resulting set of per-person maps is aggregated into the final output predicted confidence map by taking a per-pixel maximum, as in equation (3):

S*_j(p) = max_k S*_{j,k}(p)   (3)
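A sketch of the Gaussian confidence map and the per-pixel maximum aggregation (grid size and σ are illustrative assumptions):

```python
import numpy as np

def confidence_map(joint_xy, shape, sigma=2.0):
    """S*_{j,k}(p) = exp(-||p - X_{j,k}||^2 / sigma^2) on a pixel grid."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (xs - joint_xy[0]) ** 2 + (ys - joint_xy[1]) ** 2
    return np.exp(-d2 / sigma ** 2)

def merge_maps(maps):
    """Per-pixel max over people keeps every person's peak distinct,
    rather than blurring nearby peaks together as averaging would."""
    return np.max(maps, axis=0)

# Two people's necks at (x=3, y=4) and (x=7, y=2) on a 10x10 grid.
m1 = confidence_map((3, 4), (10, 10))
m2 = confidence_map((7, 2), (10, 10))
merged = merge_maps(np.stack([m1, m2]))
```

The merged map has two unit-height peaks, matching the statement that j visible necks should produce j peaks.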
Furthermore, the joint points in the image are obtained from the joint-detection confidence maps, and the network model connects the key points using Part Affinity Fields. Part Affinity Fields (PAFs), the core of the OpenPose model, store position and orientation information over the limb region. PAFs are further divided into the single-person and multi-person cases.
Further, in single-person detection each limb joint point points toward the other end of the limb, and each limb has a corresponding affinity field connecting the body parts it involves. Let X_{j1,k} and X_{j2,k} denote the coordinates of joint points j1 and j2, and let the vector v represent limb c of the k-th person formed by these two joint points. When a point P lies on this limb, the ground-truth PAF at P is the unit vector v pointing from j1 to j2; at all other points it is the zero vector. The judgment conditions are shown in equations (4) and (5):

L*_{c,k}(P) = v if P lies on limb c of person k, and 0 otherwise   (4)

v = (X_{j2,k} − X_{j1,k}) / ‖X_{j2,k} − X_{j1,k}‖   (5)
A point P is considered to lie on limb c when it satisfies both equations (6) and (7):

0 ≤ v · (P − X_{j1,k}) ≤ l_c   (6)

|v⊥ · (P − X_{j1,k})| ≤ σ_l   (7)

where l_c is the length of the limb, v⊥ is the vector perpendicular to the unit vector v, and σ_l is the width of the limb. When several limbs c of different people overlap at a point in the figure, the vectors must be averaged, as shown in equation (8):

L*_c(P) = (1/n_c(P)) Σ_k L*_{c,k}(P)   (8)
where n_c(P) is the number of non-zero vectors at point P. Candidate pairs formed by detected joint points are then tested: the true association pairs, and the limbs consistent with reality, are screened by computing the line integral of the PAF along the segment connecting the pair, as shown in equations (9) and (10):

E = ∫₀¹ L_c(p(u)) · (d_{j2} − d_{j1}) / ‖d_{j2} − d_{j1}‖ du   (9)

p(u) = (1 − u)·d_{j1} + u·d_{j2}   (10)

where p(u) is the interpolated position between the two joint points d_{j1} and d_{j2}.
Furthermore, in multi-person detection, non-maximum suppression is applied to the detected confidence maps to obtain a discrete candidate set of joint-point positions. In an image of multiple people these candidates must be assigned to different persons, so multiple solutions exist; the multi-person posture solution is obtained through the joint action of the Hungarian algorithm and the body limb frame.
Further, for the Hungarian algorithm the body limb parts and joint points are modeled as an undirected graph G = (V, E). The vertex set V can be divided into two mutually disjoint subsets X and Y (with no edges inside a subset) such that the two endpoints of every edge belong to different subsets; such a graph G is called a bipartite graph. The matching process must pair as many endpoints of X and Y as possible one-to-one without repetition. If |V₁| ≤ |V₂| (the subset that needs matching has no more endpoints than the other) and the matching M satisfies |M| = |V₁|, then M is called a complete matching; if additionally |V₁| = |V₂|, it is called a perfect matching.
Further, to help the Hungarian algorithm quickly match limb pairs that are hard to match in the graph, a human limb frame model is introduced, in which points represent the important joint points of the human body and lines represent the limbs. Since neither points nor lines carry volume, the model is built with a non-volumetric method, and a connection exists between adjacent joint points and limbs only where they are actually joined. When occlusion among multiple people is not severe, or the joint features of the human bodies are not distinct, the model can be fitted to the subset of limbs already detected; the spatial rotation range of the remaining limbs in the model then provides the network with prioritized candidate regions for the missing limbs and joint points. These regions carry high detection and matching weights, improving the network's recognition accuracy on images of overlapping people.
Compared with the prior art, the multi-person human body posture estimation algorithm based on deep learning has the following advantages:
the invention aims at the problems that the rescue robot has inaccurate recognition when recognizing the posture of a person in a complex land environment and the accuracy of an Open position human posture estimation model is to be further improved, and the invention carries out two improvements:
(1) the multi-person human body posture estimation algorithm based on deep learning is used as a multi-person human body posture estimation core algorithm in the complex environment recognized by the robot, and the robot is effectively helped to recognize the human body posture in the complex environment on the land.
(2) The matching problem of the human body limbs and the joint points of a plurality of people is solved under the combined action of the Hungarian algorithm and the human body limb framework, and the matching precision of the human body limbs and the joint points is further improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating algorithm detection according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a single limb PAFs in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a candidate pair of nodes according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of bipartite graph matching according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a human body structure based on component synthesis according to an embodiment of the present invention;
FIG. 6 is a graph of the test results according to the embodiment of the present invention;
Detailed Description
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
On the basis of the OpenPose algorithm, the invention uses a 50-layer ResNet network as the human body feature extraction network, improving the robustness of multi-person posture estimation in unfamiliar environments, and then uses the Hungarian algorithm and a human body limb frame to match the limbs and joint points of multiple people, obtaining the multi-person human body posture estimation result.
1. First an image or video file is input into the model.
2. The input image or video file is then fed into the feature extraction network to obtain the feature points required for the multi-person human body postures.
3. During the detection process, confidence maps, denoted S, represent the 2D detection positions of specific key points in the image. For example, if there is only one person in the image and the joint point is visible, each confidence map should contain a single peak; if there are k people in the image and a given joint (say, the neck) is visible for j of them, there should be j peaks. The feature points obtained in the first step of the model are input into the Convolutional Pose Machine network for joint-point detection, yielding a batch of potential joint-point confidence maps; the optimal joint point is then obtained from the potential joint points X_{j,k} and the true joint position p using equation (11):

S*_{j,k}(p) = exp(−‖p − X_{j,k}‖² / σ²)   (11)

where σ controls the spread of the peak and p is the image coordinate of the evaluated point. The resulting set of per-person maps is aggregated into the final output predicted confidence map by taking a per-pixel maximum, as in equation (12):

S*_j(p) = max_k S*_{j,k}(p)   (12)
The joint points in the image are obtained from the joint-detection confidence maps, and the network model connects the key points into limbs using the Part Affinity Fields shown in FIG. 1.
Let X_{j1,k} and X_{j2,k} denote the coordinates of joint points j1 and j2, and let the vector v represent limb c of the k-th person formed by these two joint points. When a point P lies on this limb, the ground-truth PAF at P is the unit vector v pointing from j1 to j2; at all other points it is the zero vector. The judgment conditions are shown in equations (13) and (14):

L*_{c,k}(P) = v if P lies on limb c of person k, and 0 otherwise   (13)

v = (X_{j2,k} − X_{j1,k}) / ‖X_{j2,k} − X_{j1,k}‖   (14)
A point P is considered to lie on limb c when it satisfies both equations (15) and (16):

0 ≤ v · (P − X_{j1,k}) ≤ l_c   (15)

|v⊥ · (P − X_{j1,k})| ≤ σ_l   (16)

where l_c is the length of the limb, v⊥ is the vector perpendicular to the unit vector v, and σ_l is the width of the limb. When several limbs c overlap at a point in the figure, the vectors must be averaged, as shown in equation (17):

L*_c(P) = (1/n_c(P)) Σ_k L*_{c,k}(P)   (17)
where n_c(P) is the number of non-zero vectors at point P. Candidate pairs formed by the detected joint points are then tested: the true association pairs, and the limbs consistent with reality, are screened by computing the line integral of the PAF along the segment connecting the pair, as shown in equations (18) and (19):

E = ∫₀¹ L_c(p(u)) · (d_{j2} − d_{j1}) / ‖d_{j2} − d_{j1}‖ du   (18)

p(u) = (1 − u)·d_{j1} + u·d_{j2}   (19)

where p(u) is the interpolated position between the two joint points d_{j1} and d_{j2}.
After non-maximum suppression is applied to the detected confidence maps, a discrete candidate set of joint-point positions is obtained. In an image of multiple people there are multiple possible solutions, for example as shown in FIG. 2, where the candidate points must be assigned to different persons. Boxes of the same color in the figure represent the same joint point, and the possible ways the three joint points can be connected into limbs are shown in (b). The network model uses the Hungarian algorithm, the global context implicitly encoded by the pairwise association scores contained in the PAFs, and the human body limb frame to obtain high-quality multi-person key-point pair connections.
4. The idea of the Hungarian algorithm is as follows. Let G = (V, E) be an undirected graph whose vertex set V can be divided into two mutually disjoint subsets X and Y (with no edges inside a subset) such that the two endpoints of every edge belong to different subsets; such a graph G is a bipartite graph. The matching process must pair as many endpoints of X and Y as possible one-to-one without repetition. If |V₁| ≤ |V₂| (the subset that needs matching has no more endpoints than the other) and the matching M satisfies |M| = |V₁|, then M is called a complete matching; if additionally |V₁| = |V₂|, it is called a perfect matching.
An augmenting path may be defined as follows. Let M be the set of successfully matched edges in bipartite graph G, as shown in FIG. 3. If P is a path in G connecting two unmatched vertices (the start of P may lie in either X or Y) on which edges belonging to M and edges not belonging to M alternate, then P is an augmenting path with respect to M. The Hungarian algorithm proceeds by initializing M as empty, finding an augmenting path P for M, and inverting the path (swapping matched and unmatched edges along P) to obtain a larger matching M' that replaces M. This step is repeated until no further augmenting path can be found, so the core of the Hungarian algorithm is to find as many augmenting paths as possible.
The pseudo code of the augmenting-path search is as follows:
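The pseudo code itself is not reproduced in this extract. A minimal augmenting-path matcher of the kind the passage describes (a simplified unweighted Hungarian/Kuhn algorithm; variable names are illustrative) could look like:

```python
def max_bipartite_matching(adj, n_left, n_right):
    """adj[u] lists the right-side vertices connectable to left vertex u.
    For each unmatched left vertex, search for an augmenting path and
    invert it, exactly as the definition above prescribes."""
    match_right = [-1] * n_right  # match_right[v] = left vertex matched to v

    def try_augment(u, visited):
        for v in adj.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            # (this recursion is the path inversion):
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    matched = 0
    for u in range(n_left):
        if try_augment(u, set()):
            matched += 1
    return matched, match_right

# Three candidate joint points on each side; edges are plausible limb links.
adj = {0: [0, 1], 1: [0], 2: [1, 2]}
size, pairing = max_bipartite_matching(adj, 3, 3)
```

In the example, greedily matching left vertex 0 to right vertex 0 would block left vertex 1; the augmenting-path inversion re-routes 0 to right vertex 1 so that all three pairs are matched — a perfect matching in the sense defined above.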
In the algorithm, the network model takes the two sets of limb endpoints that can be correctly connected as the subsets X and Y, and obtains the correct limb combinations through the Hungarian algorithm to form the complete human body posture structure.
To help the Hungarian algorithm quickly match limb pairs that are hard to match in the graph, a human limb frame model is introduced, in which points represent the important joint points of the human body and lines represent the limbs; since neither carries volume, the model is built with a non-volumetric method, and a connection exists between adjacent joint points and limbs only where they are actually joined. When occlusion among multiple people is not severe, or the human joint features are not distinct, the model can be fitted onto the subset of limbs already detected; the spatial rotation range of the remaining limbs in the model then provides the network with prioritized candidate regions for the missing limbs and joint points, and these regions carry high detection and matching weights, improving recognition accuracy on images of overlapping people. The model is organized in four layers: the first layer is the overall human posture; the second layer comprises the head, trunk, left arm, left leg, right arm and right leg; the third layer comprises the left upper arm, left forearm, right upper arm, right forearm, left thigh, left shank, right thigh and right shank; and the fourth layer comprises the joints connecting the parts of the third layer (for example, the fourth layer below the right forearm contains the wrist joint and the elbow joint). The overall structure is shown in FIG. 4, with arrows pointing from higher layers to lower layers.
During detection, a limb whose match has already been determined is first taken as a stable anchor. Since human limbs are connected to one another by hinge-like joints at both ends, each limb can be simplified as a rigid body, and the lengths of human limbs obey certain proportional relations. The constraints on human limbs are therefore divided into two parts: the first is a length constraint within the same limb group, calculated as in equation (20); the second is a length constraint between symmetrically positioned limbs, calculated as in equation (21).
where R_i represents a group of limbs having a certain similarity, S_i represents the i-th limb, and the remaining term represents the mean ratio between the lengths of all limbs in a group and their mean value.
After the estimated limb length is obtained, the joint point of the limb is taken as the center and the estimated limb length as the radius; joint points and limbs related to that limb are searched for within this radius, and the Hungarian algorithm is then run again to compute all limb matchings.
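The patent's exact formulas (20) and (21) are not reproduced in this extract, so the following sketch of the two constraint types uses assumed tolerance values and illustrative names:

```python
import numpy as np

def limb_length(p1, p2):
    """Euclidean length of a limb given its two joint coordinates."""
    return float(np.linalg.norm(np.asarray(p1, float) - np.asarray(p2, float)))

def within_group_ratio(length, group_mean, tol=0.25):
    """Constraint 1 (cf. eq. (20)): a limb's length stays near the mean
    of its similarity group R_i. The tolerance is an assumption."""
    return abs(length - group_mean) <= tol * group_mean

def symmetric_lengths_match(left_len, right_len, tol=0.15):
    """Constraint 2 (cf. eq. (21)): symmetrically positioned limbs
    (e.g. left/right forearm) have similar lengths."""
    return abs(left_len - right_len) <= tol * max(left_len, right_len)

forearm_l = limb_length((0, 0), (0, 10))
forearm_r = limb_length((5, 0), (5, 10.5))
ok_sym = symmetric_lengths_match(forearm_l, forearm_r)
ok_grp = within_group_ratio(forearm_l, group_mean=10.2)
```

Candidates failing either check would be excluded from the radius search before the Hungarian algorithm is re-run.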
Experiments and analyses
To test the generalization ability of the model across environments, multi-person photos were randomly selected from campus, battlefield, earthquake, fire and dust environments and tested on the trained model; the test results are shown in FIGS. 5 to 6.
Analysis of results
To quantitatively describe the detection accuracy of the model in complex environments, 100 images were randomly drawn from three low-visibility environments (war, earthquake and smoke), detected, and compared with the correct results to obtain the detection accuracy for each human body and for each limb; the results are shown in Tables 2 and 3.
TABLE 2 human body detection accuracy in complex environments
TABLE 3 accuracy of estimation of human body limbs in complex environment
As can be seen from Table 2, the mean human-body detection accuracy across the three environments is 0.83, although this evaluates only the number of human postures the model detects. Table 3 gives the detection accuracy of each limb in the three environments. Limb detection accuracy is lowest in the war environment, because the variation of human postures there is largest and the environment is extremely harsh, so the accuracy of each limb drops considerably compared with the model's original accuracy. In the earthquake environment, the pose of personnel may be occluded by the surroundings, so accuracy on the trunk and the limb extremities drops sharply while other parts drop only slightly. Compared with the other two environments, low-visibility conditions such as smoke reduce visibility and affect detection at the limb extremities, but pose variation is small, so the per-limb detection accuracy is relatively high.
In summary, when the figures in the picture stand against a clear background, occlusion is not severe, and the image is a close-range scene, the model detects well with high accuracy. When the background is complex but the figures are sparse and occlusion is not severe, the detection rate and accuracy remain high. When the figures are densely stacked and blend into the background, the detection effect degrades: under heavy stacking and background blending, the features extracted by the feature extraction network are poor, and features judged ineffective may even be discarded, so the joint-point features of those people disappear and the model ultimately cannot detect bodies that are stacked together and fully blended with the background.
This work addresses, in three respects, the problems that rescue robots estimate multi-person postures inaccurately in complex land environments and that the accuracy of existing human posture estimation models needs improvement: (1) a multi-person human body posture estimation model based on deep learning is proposed; (2) a 50-layer ResNet network is used as the feature extraction network; (3) the Hungarian algorithm and the human body limb frame are used together to obtain the postures of multiple human bodies.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present invention shall fall within its scope of protection.
Claims (3)
1. A multi-person human pose estimation model based on deep learning, characterized in that: an image is input into the network, multi-person pose estimation features are obtained through a feature-extraction network, and these features are then fed into a matching network for human limbs and joint points to estimate the poses of multiple people.
2. The model of claim 1, characterized in that: the model input is an RGB color image no larger than 1000 × 1000 pixels, or a video file containing multi-person images in MP4 format.
3. The model of claim 1, characterized in that: the human limbs and joint points are matched using the Hungarian algorithm together with human limb boxes.
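Claim 2's input constraint (an RGB image within 1000 × 1000 pixels, or an MP4 video file) is simple to enforce before inference. A hypothetical validation helper might look as follows; the function and parameter names are illustrative, not from the patent:

```python
def validate_input(kind, *, width=None, height=None, channels=None, filename=None):
    """Check a model input against the claimed constraints: an RGB image
    no larger than 1000x1000 pixels, or a video file in MP4 format."""
    if kind == "image":
        if channels != 3:
            raise ValueError("image must be RGB (3 channels)")
        if width > 1000 or height > 1000:
            raise ValueError("image must be within 1000x1000 pixels")
        return True
    if kind == "video":
        if not filename.lower().endswith(".mp4"):
            raise ValueError("video must be in MP4 format")
        return True
    raise ValueError("kind must be 'image' or 'video'")

print(validate_input("image", width=640, height=480, channels=3))  # True
```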
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010560950.5A CN112001217A (en) | 2020-06-18 | 2020-06-18 | Multi-person human body posture estimation algorithm based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112001217A true CN112001217A (en) | 2020-11-27 |
Family
ID=73466633
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010560950.5A Pending CN112001217A (en) | 2020-06-18 | 2020-06-18 | Multi-person human body posture estimation algorithm based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112001217A (en) |
History
- 2020-06-18: Application CN202010560950.5A filed in China; published as CN112001217A, status Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107886069A (en) * | 2017-11-10 | 2018-04-06 | 东北大学 | A kind of multiple target human body 2D gesture real-time detection systems and detection method |
CN110084138A (en) * | 2019-04-04 | 2019-08-02 | 高新兴科技集团股份有限公司 | A kind of more people's Attitude estimation methods of 2D |
CN111199207A (en) * | 2019-12-31 | 2020-05-26 | 华南农业大学 | Two-dimensional multi-human body posture estimation method based on depth residual error neural network |
Non-Patent Citations (1)
Title |
---|
ZHE CAO et al.: "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", arXiv * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113221924A (en) * | 2021-06-02 | 2021-08-06 | 福州大学 | Portrait shooting system and method based on OpenPose |
CN113368487A (en) * | 2021-06-10 | 2021-09-10 | 福州大学 | OpenPose-based 3D private fitness system and working method thereof |
CN113269166A (en) * | 2021-07-19 | 2021-08-17 | 环球数科集团有限公司 | Fire detection algorithm for cross-media analysis and inference |
CN113269166B (en) * | 2021-07-19 | 2021-09-24 | 环球数科集团有限公司 | Fire detection algorithm for cross-media analysis and inference |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135375B (en) | Multi-person attitude estimation method based on global information integration | |
Gao et al. | Dual-hand detection for human–robot interaction by a parallel network based on hand detection and body pose estimation | |
CN108052896B (en) | Human body behavior identification method based on convolutional neural network and support vector machine | |
CN109522850B (en) | Action similarity evaluation method based on small sample learning | |
CN112001217A (en) | Multi-person human body posture estimation algorithm based on deep learning | |
CN104794737B (en) | A kind of depth information Auxiliary Particle Filter tracking | |
CN109559320A (en) | Realize that vision SLAM semanteme builds the method and system of figure function based on empty convolution deep neural network | |
CN102075686B (en) | Robust real-time on-line camera tracking method | |
CN110176016B (en) | Virtual fitting method based on human body contour segmentation and skeleton recognition | |
CN110008913A (en) | The pedestrian's recognition methods again merged based on Attitude estimation with viewpoint mechanism | |
CN108875586B (en) | Functional limb rehabilitation training detection method based on depth image and skeleton data multi-feature fusion | |
US20210216759A1 (en) | Recognition method, computer-readable recording medium recording recognition program, and learning method | |
CN111199207B (en) | Two-dimensional multi-human body posture estimation method based on depth residual error neural network | |
CN105869166A (en) | Human body action identification method and system based on binocular vision | |
CN108154066B (en) | Three-dimensional target identification method based on curvature characteristic recurrent neural network | |
CN111046734A (en) | Multi-modal fusion sight line estimation method based on expansion convolution | |
CN111881716A (en) | Pedestrian re-identification method based on multi-view-angle generation countermeasure network | |
CN105279522A (en) | Scene object real-time registering method based on SIFT | |
CN112257741B (en) | Method for detecting generative anti-false picture based on complex neural network | |
CN113111857A (en) | Human body posture estimation method based on multi-mode information fusion | |
CN111507184B (en) | Human body posture detection method based on parallel cavity convolution and body structure constraint | |
CN115035546B (en) | Three-dimensional human body posture detection method and device and electronic equipment | |
Hachaj et al. | Real-time recognition of selected karate techniques using GDL approach | |
CN113076891B (en) | Human body posture prediction method and system based on improved high-resolution network | |
CN114463619A (en) | Infrared dim target detection method based on integrated fusion features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201127 |