CN112001217A - Multi-person human body posture estimation algorithm based on deep learning - Google Patents

Multi-person human body posture estimation algorithm based on deep learning

Info

Publication number
CN112001217A
Authority
CN
China
Prior art keywords
human body
limb
image
joint
postures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010560950.5A
Other languages
Chinese (zh)
Inventor
周旺发
邓三鹏
祁宇明
马瑞军
权利红
王帅
王文
邓茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Bo Wan Robot Co ltd
Hubei Bono Robot Co ltd
Tianjin Bonuo Intelligent Creative Robotics Technology Co ltd
Tianjin University of Technology and Education China Vocational Training Instructor Training Center
Original Assignee
Anhui Bo Wan Robot Co ltd
Hubei Bono Robot Co ltd
Tianjin Bonuo Intelligent Creative Robotics Technology Co ltd
Tianjin University of Technology and Education China Vocational Training Instructor Training Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Bo Wan Robot Co ltd, Hubei Bono Robot Co ltd, Tianjin Bonuo Intelligent Creative Robotics Technology Co ltd, Tianjin University of Technology and Education China Vocational Training Instructor Training Center filed Critical Anhui Bo Wan Robot Co ltd
Priority to CN202010560950.5A priority Critical patent/CN112001217A/en
Publication of CN112001217A publication Critical patent/CN112001217A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-person human body posture estimation algorithm based on deep learning, which comprises the following steps: an image or video file containing the postures of multiple persons is input into the built model; a 50-layer ResNet network extracts image features of the limbs and joint points of the multiple persons from the input image or video; a convolutional pose machine detects candidate joint points; a Gaussian function selects the heat map of the optimal joint point from the detected candidates; part affinity fields match the joint-point pairs to obtain all the limb types and joint-point sets required for the human postures; and the Hungarian algorithm together with a human limb frame matches the limb types and joint-point sets to generate the postures, completing the estimation of the postures of the multiple persons in the image. The invention can be applied to a rescue robot platform to accurately and efficiently estimate the postures of multiple persons awaiting rescue in complex environments such as dust, wetlands and narrow spaces.

Description

Multi-person human body posture estimation algorithm based on deep learning
Technical Field
The invention belongs to the technical field of multi-person human body image processing in a complex environment, and particularly relates to a multi-person human body posture estimation algorithm based on deep learning.
Background
Rescue in outdoor land environments is one of the main forms of humanitarian rescue. Facing complex terrain such as sand, dust, wetlands and narrow spaces, existing traditional rescue methods cannot guarantee timely and accurate arrival at the scene, which adds many unstable factors to rescue tasks; at the same time, secondary disasters that may occur during search and rescue greatly threaten the safety of the personnel involved. To make up for the inability of existing search-and-rescue equipment to cover complex terrain on land, it is necessary to develop a portable, highly adaptable ground robot system that can operate across a variety of search-and-rescue terrains. The main task of the rescue robot is to quickly determine the posture of the injured person in preparation for further rescue measures. Because image information of the injured person is rich in content and quick to acquire, computer vision (CV) technology is very common in land rescue robots. Visual search involves the processing of several kinds of information: image classification, target detection, and target pose judgment and estimation. In the actual rescue process, visual information about the injured is easily affected by the harsh outdoor environment; in particular, interference from the image background and occlusion of the injured person's posture (single-person or multi-person occlusion) make it difficult to obtain an effective image, so the posture estimation solution is not unique, and accurate, stable posture estimation of the injured cannot be achieved.
To solve the problem of accurately finding injured people in a complex environment with machine vision, developing a multi-person human posture estimation model that effectively resists outdoor interference, improves the efficiency of posture estimation of the injured, and possesses a degree of robustness is of great significance for the development of current rescue robots.
Disclosure of Invention
In view of the above, the present invention is directed to a multi-person human body posture estimation algorithm based on deep learning, so as to solve the problems noted in the background art above.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
A multi-person human body posture estimation algorithm based on deep learning transfers an image or video file into the model; a 50-layer ResNet network then extracts image features of the limbs and joints of the multiple persons from the input image or video; a convolutional pose machine (CPM) detects the joint points; a Gaussian function selects the heat map of the optimal joint point from the detected candidates; part affinity fields (PAFs) match the obtained joint points with the limbs to obtain the set of all limb types and joint points required for the human postures; the Hungarian algorithm together with a human limb frame then matches the set of limb types and joint points to obtain a complete human posture estimate, finally completing the multi-person posture estimation in the image.
Further, the model input specifically includes an RGB color image within 1000 × 1000 pixel size or a video file containing a multi-person image, and the file format is MP4 format.
Further, the feature extraction network consists of a 50-layer ResNet. One core idea of ResNet is the deep bottleneck architecture: several identity mapping layers (y = x, where the output equals the input) are added behind a shallower network to increase its depth and improve its nonlinear capacity. Because an identity mapping cannot increase the error, a deeper network should not produce a larger training error. The residual network successfully addresses two problems that frequently appear in conventional neural networks: when a layer is too far from the input, the derivative propagated back to it becomes too small, so the adjustment approaches zero and the signal is nearly lost; and each layer must learn a new output function f(x), so when the depth increases greatly, the number of output functions causes high computational pressure. In a residual network, the original input is routed through a bypass directly into a deeper layer, remedying the vanishing of the residual; during training, each layer only has to learn a residual relative to the original data instead of directly fitting the mapping f(x).
A conventional convolutional neural network extracts all the information at once, which increases the risk of vanishing gradients; the residual network only learns the residual, splitting the computation into two paths. The first path passes downward directly, attempting to learn the residual F(x) from x; the second path, the shortcut, simply carries the input x. Let the input be x and the result to be fitted (the output) be H(x). According to the residual module structure, the output is decomposed as H(x) = x + y; letting y = F(x), the residual itself is fitted from x, and the obtained residual is added back to x to give the layer's output, the mapping H(x) = F(x) + x. Subtracting the input x from the mapping gives the required residual, so the residual structure actually only needs to fit F(x); the calculation is shown in equation (1).
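As a minimal illustrative sketch (not the patent's implementation), the residual identity H(x) = F(x) + x can be demonstrated in NumPy, with a hypothetical two-layer transform standing in for F:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """H(x) = F(x) + x: the block only has to fit the residual F(x)."""
    f = relu(x @ w1) @ w2  # F(x): a small two-layer transform (illustrative)
    return f + x           # identity shortcut adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# With zero weights, F(x) = 0 and the block reduces to the identity mapping,
# so added depth cannot increase the training error.
y = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
assert np.allclose(y, x)
```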
H(x) = F(x) + x    (1)
Further, in the feature detection process, confidence maps S represent the 2D detection positions of specific key points in the image. For example, if there is only one person in the image and the joint point is visible, each confidence map should have a single peak; if k people are in the image and a joint point, say the neck, is visible for j of them, there should be j peaks. The feature points obtained in the first stage of the model are input into the convolutional pose machine network for joint-point detection, giving a batch of potential joint-point confidence maps; the optimal joint point is then calculated from the potential joint points x_{j,k} and the real joint point p using equation (2).
S*_{j,k}(p) = exp(-||p - x_{j,k}||_2^2 / σ^2)    (2)

where σ controls the spread of the peak and p is the image coordinate of the point. The resulting set of maps S*_{j,k} is aggregated into the final output predicted confidence map by equation (3):

S*_j(p) = max_k S*_{j,k}(p)    (3)
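As a hedged sketch of the Gaussian-peak confidence map (eq. (2)) and the max-aggregation that follows, the computation can be written as below; the grid size, σ, and joint coordinates are arbitrary illustrative choices, not values from the patent:

```python
import numpy as np

def confidence_map(joints, shape, sigma=2.0):
    """Eq. (2): a Gaussian peak per visible joint; then a max over people."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    maps = [np.exp(-((xs - jx) ** 2 + (ys - jy) ** 2) / sigma ** 2)
            for jx, jy in joints]
    # Taking the max (rather than the mean) keeps every peak distinct, so the
    # merged map still shows one peak per visible person.
    return np.max(maps, axis=0)

S = confidence_map([(5, 5), (20, 10)], (32, 32))  # two people, one joint each
assert S[5, 5] == 1.0 and S[10, 20] == 1.0        # a unit peak at each joint
```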
Furthermore, joint points in the image are obtained from the confidence maps of joint-point detection, and the network model connects the key points using part affinity fields. Part affinity fields (PAFs), the core of the OpenPose model, store position and orientation information over the limb area. PAFs are further divided into the single-person and multi-person cases.
Further, in single-person detection each limb joint point points toward the other joint of the same limb, and each limb has a corresponding affinity field connecting the body parts it relates. Let x_{j1,k} and x_{j2,k} denote the coordinates of joint points j1 and j2, and let the vector field L*_{c,k}(p) represent limb c of the k-th person formed by these two joint points. When a point p lies on this limb, the field equals the unit vector v pointing from j1 to j2; at all other points it is the zero vector. The conditions are given in equations (4) and (5):

L*_{c,k}(p) = v,  if p lies on limb c of person k;  0 otherwise    (4)

v = (x_{j2,k} - x_{j1,k}) / ||x_{j2,k} - x_{j1,k}||_2    (5)
A point p lies on limb c exactly when it satisfies both (6) and (7):

0 ≤ v · (p - x_{j1,k}) ≤ l_{c,k}    (6)

|v⊥ · (p - x_{j1,k})| ≤ σ_l    (7)
where l_{c,k} is the length of the limb, v⊥ is the vector perpendicular to the unit vector v, and σ_l is the width of the limb. When several limbs of the same type c overlap at a point in the image, the vectors must be averaged, as shown in equation (8):

L*_c(p) = (1 / n_c(p)) Σ_k L*_{c,k}(p)    (8)
where n_c(p) is the number of non-zero vectors at point p. The associated point pairs formed by the detected joint points are then screened: the true pairs and realistic limbs are selected by computing the line integral of the PAF along the segment formed by each candidate pair. The integral is given in equations (9) and (10):

E = ∫_0^1 L_c(p(u)) · (d_{j2} - d_{j1}) / ||d_{j2} - d_{j1}||_2 du    (9)

p(u) = (1 - u) d_{j1} + u d_{j2}    (10)

where p(u) is the interpolated position between the two candidate joint positions d_{j1} and d_{j2}.
Furthermore, in multi-person detection, non-maximum suppression is applied to the detected confidence maps to obtain a discrete candidate set of joint-point positions. In an image of multiple people the candidate points must be assigned to different persons, so multiple solutions exist; the multi-person posture solution is obtained through the combined action of the Hungarian algorithm and the human limb frame.
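The non-maximum suppression step can be sketched as a local-peak test; the 4-neighbour comparison and the threshold value below are illustrative assumptions, not the patent's parameters:

```python
import numpy as np

def nms_peaks(cmap, threshold=0.5):
    """Keep confidence-map pixels that exceed the threshold and all 4 neighbours."""
    padded = np.pad(cmap, 1, constant_values=0.0)
    c = padded[1:-1, 1:-1]
    peak = ((c > threshold)
            & (c >= padded[:-2, 1:-1]) & (c >= padded[2:, 1:-1])   # up / down
            & (c >= padded[1:-1, :-2]) & (c >= padded[1:-1, 2:]))  # left / right
    return list(zip(*np.nonzero(peak)))  # (row, col) joint-position candidates

cmap = np.zeros((8, 8))
cmap[3, 4] = 0.9   # a true peak
cmap[3, 5] = 0.6   # suppressed: its neighbour at (3, 4) is higher
assert nms_peaks(cmap) == [(3, 4)]
```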
Further, in the Hungarian algorithm the body limb parts and joint points are taken as a graph G, where G = (V, E) is an undirected graph. The vertex set V can be divided into two mutually disjoint subsets X and Y (with no edges inside a subset) such that the two endpoints of any edge belong to different subsets; such a graph G is called a bipartite graph. The matching process must pair as many endpoints of X and Y as possible one-to-one without repetition. If |V1| ≤ |V2| (the number of endpoints to be matched in subset 1 is no greater than in subset 2) and |M| = |V1|, the matching is called a complete matching; if moreover |V1| = |V2|, it is called a perfect matching.
Further, to help the Hungarian algorithm quickly match limb pairs that are hard to match in the graph, a human limb frame model is introduced, in which points represent the important joint points of the human body and lines represent the limbs. Since points and lines carry no volume, the model is built by a non-volumetric method; among all joint points and limbs of the body, a connection exists only between adjacent joint points and limbs that are actually joined. When occlusion among the multiple persons is not severe, or the joint features of the bodies are not distinct, the model can be fitted onto the limbs already detected; according to the spatial rotation range of the remaining limbs in the model, a priority candidate region for the missing limbs and joint points is then supplied to the network. This region receives a high detection and matching weight, improving the recognition accuracy of the network in images where multiple people overlap.
Compared with the prior art, the multi-person human body posture estimation algorithm based on deep learning has the following advantages:
the invention aims at the problems that the rescue robot has inaccurate recognition when recognizing the posture of a person in a complex land environment and the accuracy of an Open position human posture estimation model is to be further improved, and the invention carries out two improvements:
(1) The multi-person human body posture estimation algorithm based on deep learning serves as the core algorithm for multi-person posture estimation in the complex environments the robot must recognize, effectively helping the robot recognize human postures in complex land environments.
(2) The matching problem of the limbs and joint points of multiple people is solved by the combined action of the Hungarian algorithm and the human limb frame, further improving the matching precision of the limbs and joint points.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating algorithm detection according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a single limb PAFs in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a candidate pair of nodes according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of bipartite graph matching according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a human body structure based on component synthesis according to an embodiment of the present invention;
FIG. 6 is a graph of the test results according to the embodiment of the present invention;
Detailed Description
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
According to the invention, on the basis of the OpenPose algorithm, a 50-layer ResNet network is used as the human feature extraction network to improve the robustness of multi-person posture estimation in unfamiliar environments; the Hungarian algorithm together with the human limb frame then matches the limbs and joint points of the multiple persons to obtain the multi-person posture estimation result.
1. First an image or video file is input into the model.
2. The input image or video file is then passed to the feature extraction network to obtain the feature points required for the multi-person human postures.
3. During detection, confidence maps S represent the 2D detection positions of specific key points in the image. For example, if there is only one person in the image and the joint point is visible, each confidence map should have a single peak; if k people are in the image and a joint point, say the neck, is visible for j of them, there should be j peaks. The feature points obtained in the first step of the model are input into the convolutional pose machine network for joint-point detection, giving a batch of potential joint-point confidence maps; the optimal joint point is then calculated from the potential joint points x_{j,k} and the real joint point p using equation (11):

S*_{j,k}(p) = exp(-||p - x_{j,k}||_2^2 / σ^2)    (11)

where σ controls the spread of the peak and p is the image coordinate of the point. The resulting set of maps S*_{j,k} is aggregated into the final output predicted confidence map by equation (12):

S*_j(p) = max_k S*_{j,k}(p)    (12)
Joint points in the image are obtained from the confidence maps of joint-point detection, and the network model connects the key points into limbs using the part affinity fields shown in figure 1.
Let x_{j1,k} and x_{j2,k} denote the coordinates of joint points j1 and j2, and let the vector field L*_{c,k}(p) represent limb c of the k-th person formed by these two joint points. When a point p lies on this limb, the field equals the unit vector v pointing from j1 to j2; at all other points it is the zero vector. The conditions are given in equations (13) and (14):

L*_{c,k}(p) = v,  if p lies on limb c of person k;  0 otherwise    (13)

v = (x_{j2,k} - x_{j1,k}) / ||x_{j2,k} - x_{j1,k}||_2    (14)
A point p lies on limb c exactly when it satisfies both (15) and (16):

0 ≤ v · (p - x_{j1,k}) ≤ l_{c,k}    (15)

|v⊥ · (p - x_{j1,k})| ≤ σ_l    (16)
where l_{c,k} is the length of the limb, v⊥ is the vector perpendicular to the unit vector v, and σ_l is the width of the limb. When several limbs of the same type c overlap at a point in the image, the vectors must be averaged, as shown in equation (17):

L*_c(p) = (1 / n_c(p)) Σ_k L*_{c,k}(p)    (17)
where n_c(p) is the number of non-zero vectors at point p. The associated point pairs formed by the detected joint points are then screened: the true pairs and realistic limbs are selected by computing the line integral of the PAF along the segment formed by each candidate pair. The integral is given in equations (18) and (19):

E = ∫_0^1 L_c(p(u)) · (d_{j2} - d_{j1}) / ||d_{j2} - d_{j1}||_2 du    (18)

p(u) = (1 - u) d_{j1} + u d_{j2}    (19)

where p(u) is the interpolated position between the two candidate joint positions d_{j1} and d_{j2}.
After non-maximum suppression is applied to the detected confidence maps, a discrete candidate set of joint-point positions is obtained. In an image of multiple people there are multiple solutions, as illustrated in fig. 2, because the candidate points must be assigned to different persons. Boxes of the same color in the figure denote the same joint point; the possible ways the three joint points can be connected into limbs are shown in (b). The network model obtains high-quality connections of multi-person key-point pairs using the Hungarian algorithm, the global context implicitly encoded by the pairwise association scores contained in the PAFs, and the human limb frame.
4. The idea of the algorithm is as follows: let G = (V, E) be an undirected graph. The vertex set V can be divided into two mutually disjoint subsets X and Y (with no edges inside a subset) such that the two endpoints of any edge belong to different subsets; such a graph G is called a bipartite graph. The matching process must pair as many endpoints of X and Y as possible one-to-one without repetition. If |V1| ≤ |V2| (the number of endpoints to be matched in subset 1 is no greater than in subset 2) and |M| = |V1|, the matching is called a complete matching; if moreover |V1| = |V2|, it is called a perfect matching.
An augmenting path is defined as follows: let M be the set of successful matches in the bipartite graph G, as shown in FIG. 3. If P is a path in G connecting two unmatched points (the start of P may lie in either X or Y) and the edges belonging to M and those not belonging to M alternate along P, then P is an augmenting path with respect to M. The Hungarian algorithm proceeds by initializing M as empty, finding an augmenting path P for M, and then inverting the path to obtain a larger matching M' that replaces M. This step is repeated until no more augmenting paths can be found, so the core of the Hungarian algorithm is to find as many augmenting paths as possible.
The algorithmic pseudo code of the augmented path is as follows:
[Pseudocode for the augmenting-path search, rendered as images in the original.]
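Since the pseudocode itself is rendered as images in the original, here is a hedged Python sketch of the same augmenting-path idea (Kuhn's algorithm for unweighted bipartite matching; the adjacency data is a made-up example, not the patent's limb sets):

```python
def max_bipartite_matching(adj, n_left, n_right):
    """Maximum matching via augmenting paths, as in the Hungarian method."""
    match_right = [-1] * n_right  # match_right[v] = left vertex matched to v

    def try_augment(u, seen):
        for v in adj.get(u, []):
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere:
            # inverting the alternating path grows the matching by one edge.
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(n_left))

# Two left-side joints compete for right-side joint 0; the augmenting path
# re-routes the first match so both can be paired.
assert max_bipartite_matching({0: [0], 1: [0, 1]}, 2, 2) == 2
```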
In the algorithm, the network model takes the two types of limb sets that can be correctly connected as the subsets X and Y, and obtains the correct limb combinations through the Hungarian algorithm to form a complete human posture structure.
To help the Hungarian algorithm quickly match limb pairs that are hard to match in the graph, a human limb frame model is introduced, in which points represent the important joint points of the human body and lines represent the limbs. Since points and lines carry no volume, the model is built by a non-volumetric method; among all joint points and limbs of the body, a connection exists only between adjacent joint points and limbs that are actually joined. When occlusion among the multiple persons is not severe, or the joint features are not distinct, the model can be fitted onto the limbs already detected; according to the spatial rotation range of the remaining limbs in the model, a priority candidate region for the missing limbs and joint points is then supplied to the network, and this region receives a high detection and matching weight to improve recognition accuracy in images where multiple people overlap. The model is divided into four layers: the first layer is the overall human posture; the second layer comprises the head, trunk, left arm, left leg, right arm and right leg; the third layer comprises the left upper arm, left forearm, right upper arm, right forearm, left thigh, left shank, right thigh and right shank; and the fourth layer comprises the joints connecting the two ends of each third-layer segment (for example, the fourth layer below the right forearm contains the wrist joint and the elbow joint). The overall structure is shown schematically in figure 4, with arrows pointing from higher layers to lower ones.
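A hedged sketch of the layered limb frame follows (the segment and joint names are illustrative, not the patent's definitions); looking up a detected segment yields the adjacent joints whose neighbourhood should receive priority in the search:

```python
# Third-layer segments mapped to their fourth-layer bounding joints, grouped
# by the second-layer part they belong to.
LIMB_FRAME = {
    "right arm": {"right upper arm": ["shoulder", "elbow"],
                  "right forearm": ["elbow", "wrist"]},
    "right leg": {"right thigh": ["hip", "knee"],
                  "right shank": ["knee", "ankle"]},
}

def priority_joints(detected_segment):
    """Joints bounding a detected segment: candidates for the priority region."""
    for segments in LIMB_FRAME.values():
        if detected_segment in segments:
            return segments[detected_segment]
    return []

assert priority_joints("right forearm") == ["elbow", "wrist"]
assert priority_joints("left wing") == []  # not a human limb segment
```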
During detection, the limbs already matched and confirmed are first taken as stable anchors. Since the limbs of the human body are connected by hinge-like joints at both ends, each limb can be simplified as a rigid body, and the lengths of human limbs obey certain proportional relations. The constraints on the human limbs are therefore divided into two parts: the first is a length constraint on the same limb, calculated as in equation (20); the second is a length constraint on symmetrically positioned limbs, calculated as in equation (21).
[Equations (20) and (21), rendered as images in the original.]
where R_i represents a group of limbs having a certain similarity, S_i represents the i-th limb, and the remaining symbol (rendered as an image in the original) represents the mean of the ratios between the lengths of all limbs in the group and their mean value.
After the estimated limb length is obtained, the joint point within the limb is taken as the center and the estimated limb length as the radius of the detection range for the related joint points and limbs; the Hungarian algorithm is then run again to compute all limb matches.
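Since equations (20) and (21) are rendered as images in the original, the following is only a hedged sketch of the idea: a candidate limb length is accepted when it stays close to the mean length of its similarity group. The tolerance value is an assumption, not a parameter from the patent.

```python
import numpy as np

def length_consistent(candidate_len, group_lens, tol=0.25):
    """Accept a candidate limb whose length ratio to the group mean is near 1.
    tol = 0.25 is an illustrative tolerance, not a value from the patent."""
    mean = float(np.mean(group_lens))
    return abs(candidate_len - mean) / mean <= tol

group = [1.00, 1.05, 0.95]                 # lengths of similar limbs (a group R_i)
assert length_consistent(1.10, group)      # near the group mean: kept
assert not length_consistent(2.00, group)  # far outside: rejected
```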
Experiments and analyses
To test the generalization ability of the model in various environments, multi-person photos were randomly selected from campus, battlefield, earthquake, fire and dust environments and run through the trained model; the test results are shown in figs. 5 to 6.
Analysis of results
To quantitatively describe the detection accuracy of the model in complex environments, 100 images were randomly drawn from three low-visibility environments (war, earthquake and smoke), detected, and compared with the correct results to obtain the detection accuracy per human body and per limb; the results are shown in tables 2 and 3.
TABLE 2 human body detection accuracy in complex environments
[Table 2, rendered as an image in the original.]
TABLE 3 accuracy of estimation of human body limbs in complex environment
[Table 3, rendered as an image in the original.]
As table 2 shows, the mean human-body detection accuracy across the three environments is 0.83, though this only evaluates how many human postures the model detects. Table 3 gives the detection accuracy of each limb in the three environments. Accuracy is lowest in the war environment: posture variation there is largest and the environment is harsh, so the accuracy of each limb drops considerably from the model's original accuracy. In the earthquake environment, the poses of personnel are occluded by the surroundings, so accuracy on the trunk and the ends of the limbs falls sharply while other parts fall only slightly. In low-visibility environments such as smoke, although reduced visibility affects detection at the limb ends, pose variation is small, so the per-limb detection accuracy is comparatively high.
In summary, when the figures in the picture stand against a clear background, occlusion is light, and the image is a close-range scene, the model detects well and accurately. When the background is complex but the figures are sparse and occlusion is light, the detection rate and accuracy remain high. When people are densely stacked and the background blends with the human bodies, detection degrades: the features extracted by the feature extraction network become poor, and features not judged effective are even discarded, so the joint-point features of the people disappear and the model ultimately cannot detect body features that are stacked together and fully merged with the background.
This work addresses the inaccurate multi-person posture estimation of rescue robots in complex land environments and the need to improve the accuracy of existing human posture estimation models in three respects: (1) a multi-person human posture estimation model based on deep learning is proposed; (2) a 50-layer ResNet network is used as the feature extraction network; (3) the Hungarian algorithm and the human limb frame are used together to obtain the postures of the multiple human bodies.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (3)

1. A multi-person human body posture estimation model based on deep learning, characterized in that: an image is input into the network, multi-person posture estimation features are obtained through a feature extraction network, and these features are then fed into a matching network for human limbs and joint points to estimate the postures of multiple people.
2. The model of claim 1, characterized in that: the model input is an RGB color image no larger than 1000 × 1000 pixels, or a video file in MP4 format containing multi-person images.
3. The model of claim 1, characterized in that: the human limbs and joint points are matched by jointly using the Hungarian algorithm and human limb bounding boxes.
CN202010560950.5A 2020-06-18 2020-06-18 Multi-person human body posture estimation algorithm based on deep learning Pending CN112001217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010560950.5A CN112001217A (en) 2020-06-18 2020-06-18 Multi-person human body posture estimation algorithm based on deep learning


Publications (1)

Publication Number Publication Date
CN112001217A true CN112001217A (en) 2020-11-27

Family

ID=73466633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010560950.5A Pending CN112001217A (en) 2020-06-18 2020-06-18 Multi-person human body posture estimation algorithm based on deep learning

Country Status (1)

Country Link
CN (1) CN112001217A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886069A (en) * 2017-11-10 2018-04-06 东北大学 A kind of multiple target human body 2D gesture real-time detection systems and detection method
CN110084138A (en) * 2019-04-04 2019-08-02 高新兴科技集团股份有限公司 A kind of more people's Attitude estimation methods of 2D
CN111199207A (en) * 2019-12-31 2020-05-26 华南农业大学 Two-dimensional multi-human body posture estimation method based on depth residual error neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhe Cao et al.: "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", arXiv *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221924A (en) * 2021-06-02 2021-08-06 福州大学 Portrait shooting system and method based on OpenPose
CN113368487A (en) * 2021-06-10 2021-09-10 福州大学 OpenPose-based 3D private fitness system and working method thereof
CN113269166A (en) * 2021-07-19 2021-08-17 环球数科集团有限公司 Fire detection algorithm for cross-media analysis and inference
CN113269166B (en) * 2021-07-19 2021-09-24 环球数科集团有限公司 Fire detection algorithm for cross-media analysis and inference


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201127