CN112733767B - Human body key point detection method and device, storage medium and terminal equipment - Google Patents

Human body key point detection method and device, storage medium and terminal equipment

Info

Publication number
CN112733767B
Authority
CN
China
Prior art keywords
human body
key point
heat map
detection network
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110059465.4A
Other languages
Chinese (zh)
Other versions
CN112733767A (en)
Inventor
谢雪梅
陈奕蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110059465.4A
Publication of CN112733767A
Application granted
Publication of CN112733767B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body key point detection method, which comprises the following steps: acquiring a human body image to be detected; acquiring a human body key point prediction heat map according to the human body image to be detected based on a preset key point detection network; the key point detection network takes a Gaussian response heat map of a preset human body part as supervision information, and the Gaussian response heat map is constructed in advance according to a human body data set and the connection relation between preset human body key points; and detecting the positions of the key points of the human body according to the human body key point prediction heat map. Correspondingly, the invention also discloses a human body key point detection device, a computer readable storage medium and terminal equipment. By adopting the technical scheme of the invention, the human body key point detection can be carried out by combining the constraint between the human body structure information, thereby reducing the quantization error and improving the accuracy of the detection result.

Description

Human body key point detection method and device, storage medium and terminal equipment
Technical Field
The invention relates to the technical field of image processing, in particular to a human body key point detection method and device, a computer readable storage medium and terminal equipment.
Background
Human body key point detection aims to locate specific positions on an image, such as the joints and facial features of a human body, so as to describe the human body posture. In natural images, the complexity of the shooting scene and the diversity of human body postures pose great challenges to the human body posture estimation task. Human body key point detection is one of the most challenging problems in computer vision; the emergence of diverse human body posture estimation data sets and the development of deep learning have gradually improved its accuracy. Commonly used representative human body key point detection networks include Hourglass, CPN, MSPN and HRNet. These networks, which are based on confidence heat map regression, mostly adopt a multi-level feature fusion module to capture rich multi-scale information, but do so at the cost of increased network complexity, ignore the spatial constraints of human body structure information, and introduce a certain quantization error when the specific positions of key points are obtained. As a result, existing human body key point detection technology cannot meet the high-accuracy requirements of practical applications.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a method and an apparatus for detecting key points of a human body, a computer-readable storage medium, and a terminal device, which can perform key point detection of a human body by combining constraints between pieces of human body structure information, thereby reducing quantization errors and improving accuracy of detection results.
In order to solve the above technical problem, an embodiment of the present invention provides a method for detecting a human body key point, including:
acquiring a human body image to be detected;
acquiring a human body key point prediction heat map according to the human body image to be detected based on a preset key point detection network; the key point detection network takes a Gaussian response heat map of a preset human body part as supervision information, and the Gaussian response heat map is constructed in advance according to a human body data set and the connection relation between preset human body key points; the Gaussian response heat map comprises a key point response heat map and a trunk response heat map, the key point response heat map is constructed by taking key points as centers, and the trunk response heat map is constructed by taking trunk connections among the key points as centers;
and detecting the positions of the key points of the human body according to the human body key point prediction heat map.
Further, the method pre-constructs the gaussian response heatmap by:
acquiring a human body data file, and performing data enhancement processing on the human body data file to correspondingly acquire a human body data set; the human body data file at least comprises a human body image file, a marking file of a human body detection frame and a marking file of a position coordinate of a human body key point;
constructing the Gaussian response heat map according to the connection relation between the human body data set and the human body key points; wherein, the connection relation between the key points of the human body at least comprises the connection relation between the head, the neck, the right shoulder, the right elbow, the right wrist, the left shoulder, the left elbow, the left wrist, the chest, the pelvis, the left hip, the right hip, the left knee, the left ankle, the right knee and the right ankle of the human body.
Further, the key point detection network comprises a first-stage detection network, a second-stage detection network, a third-stage detection network and a fourth-stage detection network;
then, the acquiring a human body key point prediction heat map according to the human body image to be detected based on the preset key point detection network specifically includes:
extracting the characteristics of the human body image to be detected according to the first-level detection network, and correspondingly obtaining first-level characteristic information of the human body image to be detected;
extracting the characteristics of the human body image to be detected according to the second-level detection network, and correspondingly obtaining second-level characteristic information of the human body image to be detected;
extracting the characteristics of the human body image to be detected according to the third-level detection network, and correspondingly obtaining third-level characteristic information of the human body image to be detected;
performing feature fusion on the first-level feature information, the second-level feature information and the third-level feature information according to the fourth-level detection network, and correspondingly obtaining the human body key point prediction heat map;
wherein the first level detection network and the second level detection network have the torso response heatmap as supervisory information and the third level detection network and the fourth level detection network have the keypoint response heatmap as supervisory information.
Furthermore, the first-level detection network, the second-level detection network and the third-level detection network are each formed by an Hourglass network; the fourth-level detection network comprises a 3 x 3 convolutional neural network.
Further, the detecting the positions of the key points of the human body according to the human body key point prediction heat map specifically includes:
acquiring initial coordinates of a target key point from the human body key point prediction heat map according to maximum likelihood estimation, and acquiring coordinates of N adjacent points of the target key point; wherein N ≥ 2;
and determining the position coordinates of the target key point according to the initial coordinates of the target key point and the coordinates of the N adjacent points by adopting an interpolation method.
Further, N = 2; then, the determining, by using an interpolation method, the position coordinates of the target key point according to the initial coordinates of the target key point and the coordinates of the N adjacent points specifically includes:
acquiring the pixel value h_0 corresponding to the initial coordinates (x_0, y_0) of the target key point in the human body key point prediction heat map;
acquiring the pixel value h_1 corresponding to the coordinates (x_1, y_1) of the first adjacent point and the pixel value h_2 corresponding to the coordinates (x_2, y_2) of the second adjacent point in the human body key point prediction heat map;
using Newton interpolation, according to the formula
(formula shown in Figure GDA0003258418390000031)
determining the position coordinates (x, y) of the target key point.
In order to solve the above technical problem, an embodiment of the present invention further provides a human body key point detection device, including:
the human body image acquisition module is used for acquiring a human body image to be detected;
the human body key point prediction heat map acquisition module is used for acquiring a human body key point prediction heat map according to the human body image to be detected based on a preset key point detection network; the key point detection network takes a Gaussian response heat map of a preset human body part as supervision information, and the Gaussian response heat map is constructed in advance according to a human body data set and the connection relation between preset human body key points; the Gaussian response heat map comprises a key point response heat map and a trunk response heat map, the key point response heat map is constructed by taking key points as centers, and the trunk response heat map is constructed by taking trunk connections among the key points as centers;
and the human body key point position detection module is used for detecting the positions of the human body key points according to the human body key point prediction heat map.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; wherein the computer program, when running, controls the device on which the computer readable storage medium is located to execute any one of the above human key point detection methods.
An embodiment of the present invention further provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor implements any one of the above human body key point detection methods when executing the computer program.
Compared with the prior art, the embodiment of the invention provides a human body key point detection method, a human body key point detection device, a computer readable storage medium and a terminal device, wherein a Gaussian response heat map constructed by combining the connection relation between human body key points is used as the supervision information of a key point detection network, a human body image to be detected is used as the input of the key point detection network, a human body key point prediction heat map is correspondingly obtained, the positions of the human body key points are detected according to the obtained human body key point prediction heat map, the human body key points can be detected by combining the constraint between human body structure information, namely the spatial constraint of the human body structure information is considered, so that the quantization error is reduced, and the accuracy of the detection result is improved.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a method for detecting key points in a human body according to the present invention;
FIG. 2 is a schematic structural diagram of a preferred embodiment of a key point detection network according to the present invention;
FIG. 3 is a block diagram of a preferred embodiment of a human body keypoint detection apparatus provided by the present invention;
FIG. 4 is a block diagram of a preferred embodiment of a terminal device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
An embodiment of the present invention provides a human body key point detection method. FIG. 1 is a flowchart of a preferred embodiment of the method, which includes steps S11 to S13:
step S11, acquiring a human body image to be detected;
step S12, acquiring a human body key point prediction heat map according to the human body image to be detected based on a preset key point detection network; the key point detection network takes a Gaussian response heat map of a preset human body part as supervision information, and the Gaussian response heat map is constructed in advance according to a human body data set and the connection relation between preset human body key points;
and step S13, detecting the positions of the key points of the human body according to the human body key point prediction heat map.
Specifically, a Gaussian response heat map of human body parts is constructed in advance from a large human body data set and the preset connection relation between human body key points, and a trained key point detection network is provided in advance. When human body key point detection is actually carried out, the image on which detection is to be performed, namely the human body image to be detected, is first obtained; this image is understood to contain a human body part. The key point detection network, which takes the Gaussian response heat map as its supervision information, then receives the human body image to be detected as input, processes it, and outputs a human body key point prediction heat map. The positions of the human body key points are then detected from the obtained prediction heat map.
It should be noted that the human body key points at least include the head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, chest, pelvis, left hip, right hip, left knee, left ankle, right knee and right ankle of the human body. Once the human body key point prediction heat map has been obtained, the position coordinates of any one of the human body key points can be determined from it.
According to the method for detecting the key points of the human body, the Gaussian response heat map constructed by combining the connection relation among the key points of the human body is used as the supervision information of the key point detection network, the image of the human body to be detected is used as the input of the key point detection network, the prediction heat map of the key points of the human body is correspondingly obtained, the positions of the key points of the human body are detected according to the obtained prediction heat map of the key points of the human body, the key points of the human body can be detected by combining the constraint among the structural information of the human body (equivalent to the connection relation among the key points of the human body), namely the spatial constraint of the structural information of the human body is considered, so that the quantization error is reduced, and the accuracy of the detection result is improved.
In another preferred embodiment, the method pre-constructs the gaussian response heatmap by:
acquiring a human body data file, and performing data enhancement processing on the human body data file to correspondingly acquire a human body data set; the human body data file at least comprises a human body image file, a labeling file of a human body detection frame and a labeling file of a position coordinate of a human body key point;
constructing the Gaussian response heat map according to the connection relation between the human body data set and the human body key points; wherein, the connection relation between the key points of the human body at least comprises the connection relation between the head, the neck, the right shoulder, the right elbow, the right wrist, the left shoulder, the left elbow, the left wrist, the chest, the pelvis, the left hip, the right hip, the left knee, the left ankle, the right knee and the right ankle of the human body.
Specifically, with reference to the above embodiments, the embodiment of the present invention predefines the connection relation between the human body key points, which at least includes the connection relation between the head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, chest, pelvis, left hip, right hip, left knee, left ankle, right knee and right ankle of the human body. When the Gaussian response heat map of a human body part is constructed in advance, a large number of human body data files are acquired, each at least comprising a human body image file, a labeling file of the human body detection frame and a labeling file of the position coordinates of the human body key points. Data enhancement processing is performed on the acquired human body data files to expand the data and obtain the human body data set, and the Gaussian response heat map of the human body part is then constructed from the human body data set and the predefined connection relation between the human body key points.
When the human body data file is obtained, images containing at least 10000 human body postures can be selected as the human body image files; the human body detection frame in each image is labeled to obtain the corresponding labeling file of the human body detection frame, and the positions of the human body key points in each image are labeled to obtain the corresponding labeling file of the position coordinates of the human body key points. When data enhancement processing is performed on the obtained human body data file, the data can be augmented by horizontal flipping, random rotation, scaling, multi-data-set mixed training and the like. The embodiment of the present invention does not specifically limit the data acquisition mode or the data enhancement mode. In addition, the Gaussian response heat map can be constructed by a method commonly used in the prior art, which the embodiment of the present invention likewise does not specifically limit.
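As an illustration of one of the data enhancement modes named above, the following is a minimal sketch of horizontal flipping in Python with NumPy. The patent does not fix an index order for the key points or a data layout; the list `KEYPOINTS`, the function name `horizontal_flip` and the (16, 2) array of (x, y) coordinates are assumptions made only for this example. The sketch mirrors the x coordinates and swaps the left and right labels so that, for example, a left wrist is still labeled as a left wrist after flipping.

```python
import numpy as np

# Hypothetical key point order; the patent lists these 16 parts but fixes no indices.
KEYPOINTS = ["head", "neck", "right_shoulder", "right_elbow", "right_wrist",
             "left_shoulder", "left_elbow", "left_wrist", "chest", "pelvis",
             "left_hip", "right_hip", "left_knee", "left_ankle",
             "right_knee", "right_ankle"]


def horizontal_flip(image, keypoints_xy):
    """Flip an H x W (x C) image and its (16, 2) key point array left to right."""
    width = image.shape[1]
    flipped_image = image[:, ::-1].copy()

    # Mirror the x coordinate of every key point; y is unchanged.
    mirrored = keypoints_xy.astype(float).copy()
    mirrored[:, 0] = (width - 1) - mirrored[:, 0]

    # After mirroring, the anatomical left side appears on the other side of
    # the image, so the left/right labels must be exchanged.
    out = mirrored.copy()
    for i, name in enumerate(KEYPOINTS):
        if name.startswith("left_"):
            j = KEYPOINTS.index("right_" + name[len("left_"):])
            out[i], out[j] = mirrored[j], mirrored[i]
    return flipped_image, out
```

Random rotation and scaling would transform the coordinates with the same affine map that is applied to the image; they are omitted here for brevity.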
It can be understood that the key points of the human body at least include the head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, chest, pelvis, left hip, right hip, left knee, left ankle, right knee and right ankle of the human body, and accordingly, the connection relationship between the key points of the human body at least includes the following 15: the head-neck, neck-right shoulder, right shoulder-right elbow, right elbow-right wrist, neck-left shoulder, left shoulder-left elbow, left elbow-left wrist, neck-chest, chest-pelvis, pelvis-left hip, pelvis-right hip, left hip-left knee, left knee-left ankle, right hip-right knee, and right knee-right ankle, and the connection between key points of a human body constitutes the spatial constraint of the structural information of the human body.
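The 16 key points and 15 connections listed above can be written down as a small data structure. The sketch below is in Python; the identifier names (`KEYPOINTS`, `CONNECTIONS`, `is_connected`) and the index order are assumptions for illustration, as the patent only names the parts and the pairs. The helper checks that every key point is reachable from the head; 16 nodes joined by 15 connections that are all reachable form a tree, which is the structural constraint the description relies on.

```python
# Hypothetical index order; only the part names come from the description.
KEYPOINTS = ["head", "neck", "right_shoulder", "right_elbow", "right_wrist",
             "left_shoulder", "left_elbow", "left_wrist", "chest", "pelvis",
             "left_hip", "right_hip", "left_knee", "left_ankle",
             "right_knee", "right_ankle"]

# The 15 connections named in the description.
CONNECTIONS = [
    ("head", "neck"),
    ("neck", "right_shoulder"), ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"),
    ("neck", "left_shoulder"), ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
    ("neck", "chest"), ("chest", "pelvis"),
    ("pelvis", "left_hip"), ("pelvis", "right_hip"),
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
]

INDEX = {name: i for i, name in enumerate(KEYPOINTS)}
CONNECTION_INDICES = [(INDEX[a], INDEX[b]) for a, b in CONNECTIONS]


def is_connected(num_nodes, edges):
    """Return True if every node is reachable from node 0 over the edges."""
    neighbors = {i: [] for i in range(num_nodes)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seen, stack = {0}, [0]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == num_nodes
```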
As an improvement to the above, the gaussian response heatmap comprises a keypoint response heatmap and a torso response heatmap; the key point response heat map is constructed by taking key point positions as centers, and the trunk response heat map is constructed by taking trunk connections among the key points as centers.
Specifically, with reference to the above embodiments, when the Gaussian response heat map of a human body part is actually constructed, different Gaussian response heat maps may be constructed from different information: a key point response heat map is constructed with the human body key points as centers, and a trunk response heat map is constructed with the trunk connections between human body key points as centers. In both cases the size of the Gaussian kernel depends on the chosen standard deviation.
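The two kinds of response heat map can be sketched as follows in Python with NumPy. The description leaves the exact construction to the prior art, so this is only one plausible reading: the key point map is a Gaussian of the distance to the key point, and the trunk (torso) map is a Gaussian of the distance to the line segment joining two connected key points. The function names, the default `sigma=2.0` and the use of point-to-segment distance are assumptions for illustration.

```python
import numpy as np


def keypoint_heatmap(height, width, center_xy, sigma=2.0):
    """Gaussian response centered on one key point (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center_xy
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def torso_heatmap(height, width, start_xy, end_xy, sigma=2.0):
    """Gaussian response centered on the segment joining two key points."""
    ys, xs = np.mgrid[0:height, 0:width]
    ax, ay = start_xy
    bx, by = end_xy
    abx, aby = bx - ax, by - ay
    seg_len2 = abx ** 2 + aby ** 2
    if seg_len2 == 0:
        # Degenerate segment: both key points coincide.
        t = np.zeros((height, width))
    else:
        # Parameter of the closest point on the segment, clamped to [0, 1].
        t = np.clip(((xs - ax) * abx + (ys - ay) * aby) / seg_len2, 0.0, 1.0)
    px, py = ax + t * abx, ay + t * aby
    dist2 = (xs - px) ** 2 + (ys - py) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))
```

A larger `sigma` widens the response around the key point or connection, which is the sense in which the kernel size depends on the standard deviation.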
It should be noted that using the Gaussian response heat maps (the key point response heat map and the trunk response heat map) as the supervision information of the key point detection network has two benefits: on the one hand, it addresses the problem of gradients vanishing as the detection network becomes deeper; on the other hand, it allows the detection network to first learn feature information covering the connection regions between human body key points.
In yet another preferred embodiment, the keypoint detection network comprises a first level detection network, a second level detection network, a third level detection network, and a fourth level detection network;
then, the acquiring a human body key point prediction heat map according to the human body image to be detected based on the preset key point detection network specifically includes:
extracting the characteristics of the human body image to be detected according to the first-level detection network, and correspondingly obtaining first-level characteristic information of the human body image to be detected;
extracting the characteristics of the human body image to be detected according to the second-level detection network, and correspondingly obtaining second-level characteristic information of the human body image to be detected;
extracting the characteristics of the human body image to be detected according to the third-level detection network, and correspondingly obtaining third-level characteristic information of the human body image to be detected;
performing feature fusion on the first-level feature information, the second-level feature information and the third-level feature information according to the fourth-level detection network, and correspondingly obtaining the human body key point prediction heat map;
wherein the first level detection network and the second level detection network have the torso response heatmap as supervisory information and the third level detection network and the fourth level detection network have the keypoint response heatmap as supervisory information.
Specifically, with reference to the foregoing embodiments, the key point detection network in the embodiment of the present invention adopts a four-stage prediction network structure comprising a first-level, a second-level, a third-level and a fourth-level detection network. When the human body key point prediction heat map is actually acquired: the first-level detection network, with the trunk response heat map as its supervision information, takes the human body image to be detected as input and performs feature extraction on it to obtain the first-level feature information of the human body image to be detected; the second-level detection network, with the trunk response heat map as its supervision information, takes the human body image to be detected as input and performs feature extraction on it to obtain the second-level feature information; the third-level detection network, with the key point response heat map as its supervision information, takes the human body image to be detected as input and performs feature extraction on it to obtain the third-level feature information; and the fourth-level detection network performs feature fusion on the first-level, second-level and third-level feature information to obtain the human body key point prediction heat map.
It should be noted that the key point detection network may adopt a four-stage prediction network structure, or may also adopt a five-stage or other-stage network structure, and a specific stage may be selected according to an actual need, which is not specifically limited in the embodiment of the present invention.
As an improvement of the above scheme, the first-level detection network, the second-level detection network and the third-level detection network are each formed by an Hourglass network; the fourth-level detection network comprises a 3 x 3 convolutional neural network.
It should be noted that the first three levels of the key point detection network are each formed by an Hourglass network. Multi-level refined Hourglass network modules can fully capture feature information at different resolutions and thereby obtain highly semantic features; however, the more Hourglass network modules are used, the greater the network complexity, so the embodiment of the present invention adopts 3 Hourglass network modules as the basic network structure for human body key point detection. The last level of the key point detection network comprises a 3 x 3 convolutional neural network. The features extracted by Hourglass network modules at different depths differ in semantics: shallower features contain more positional detail, while deeper features are semantically stronger. Therefore, to obtain more discriminative feature information, after the features of the human body image to be detected are extracted by the basic network structure (namely, the first three levels of detection networks), the features output by the Hourglass network modules at different depths are cascaded and a 3 x 3 convolutional neural network is used for feature learning, so that feature fusion is realized and the human body key point prediction is obtained from the fused features.
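The fusion step of the last level can be sketched in Python with NumPy as below. This is not the patented network: the Hourglass modules are not implemented, the three stage outputs are stand-in arrays, "cascading" is read here as concatenation along the channel axis (an assumption), and the function names and the shapes in the usage note are hypothetical. The 3 x 3 convolution is written as the cross-correlation with zero padding that deep learning frameworks usually call convolution.

```python
import numpy as np


def conv3x3(features, weights, bias):
    """3 x 3 convolution with zero padding.

    features: (C_in, H, W); weights: (C_out, C_in, 3, 3); bias: (C_out,).
    """
    _, height, width = features.shape
    padded = np.pad(features, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weights.shape[0], height, width))
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the input seen by kernel tap (dy, dx).
            window = padded[:, dy:dy + height, dx:dx + width]
            out += np.einsum("oc,chw->ohw", weights[:, :, dy, dx], window)
    return out + bias[:, None, None]


def fuse_stage_features(stage_features, weights, bias):
    """Concatenate per-stage feature maps on the channel axis, then convolve."""
    stacked = np.concatenate(stage_features, axis=0)
    return conv3x3(stacked, weights, bias)
```

For example, three stage outputs of shape (4, 8, 8) would be stacked into a (12, 8, 8) array, and weights of shape (16, 12, 3, 3) would map them to 16 output channels, one per key point under the assumption that each key point gets its own prediction heat map.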
According to the human body key point detection method provided by the embodiment of the invention, on the basis of the existing deep learning method, multi-level features are better utilized, the advantages of multi-level feature fusion in deep feature extraction are fully utilized, the balance between the complexity of a network and the key point detection accuracy is better considered, the higher key point detection accuracy is obtained with less network complexity, and the method has more practical application value.
In another preferred embodiment, the detecting the positions of the key points of the human body according to the human body key point prediction heat map specifically includes:
acquiring initial coordinates of a target key point from the human body key point prediction heat map according to maximum likelihood estimation, and acquiring coordinates of N adjacent points of the target key point, where N ≥ 2;
and determining the position coordinates of the target key point according to the initial coordinates of the target key point and the coordinates of the N adjacent points by adopting an interpolation method.
Specifically, with reference to the above embodiment, when the position of any target key point is detected according to the obtained human body key point prediction heat map, the initial coordinates of the target key point may be obtained from the heat map through maximum likelihood estimation, and the coordinates of N (N ≥ 2) adjacent points of the target key point may be correspondingly obtained. The accurate position coordinates of the target key point may then be determined from the initial coordinates of the target key point and the coordinates of its N adjacent points by using an interpolation method.
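As a rough illustration of the first step, the sketch below takes the largest response in the heat map as the initial coordinate and returns its two horizontal neighbours. Using the plain maximum as a stand-in for the maximum likelihood estimate, clamping neighbours at the image border, and the function names `initial_keypoint` and `horizontal_neighbours` are all assumptions, not details taken from the patent.

```python
from typing import List, Tuple


def initial_keypoint(heatmap: List[List[float]]) -> Tuple[int, int]:
    """Return (x0, y0) of the largest response; x is the column, y is the row."""
    best = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > heatmap[best[1]][best[0]]:
                best = (x, y)
    return best


def horizontal_neighbours(heatmap: List[List[float]],
                          x0: int, y0: int) -> List[Tuple[int, int]]:
    """Return the N = 2 adjacent points (x0 - 1, y0) and (x0 + 1, y0).

    Neighbours are clamped to the image border (an assumption of this sketch).
    """
    width = len(heatmap[0])
    left = (max(x0 - 1, 0), y0)
    right = (min(x0 + 1, width - 1), y0)
    return [left, right]


# Toy usage on a 3 x 3 heat map whose peak is in the centre.
toy = [[0.1, 0.2, 0.1],
       [0.3, 0.9, 0.4],
       [0.0, 0.2, 0.1]]
x0, y0 = initial_keypoint(toy)
neighbours = horizontal_neighbours(toy, x0, y0)
```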
It should be noted that the adjacent points of the target key point may be selected according to actual needs. For example, when N = 2 the target key point has two adjacent points: assuming that the coordinates of the target key point are $(x_0, y_0)$, the point with coordinates $(x_0 - 1, y_0)$ may be selected as one adjacent point and the point with coordinates $(x_0 + 1, y_0)$ as the other. The obtained human body key point prediction heat map is subjected to Gaussian smoothing. Taking the target key point $(x_0, y_0)$ as an example, in the one-dimensional case along the abscissa, the likelihood function satisfied by each pixel point on the heat map along the x-axis is:
Figure GDA0003258418390000101
where σ is the standard deviation of the Gaussian distribution, $x_\mu$ is the coordinate of the maximum point of the Gaussian distribution, and θ is the parameter of the likelihood function to be estimated. Taking the logarithm of the likelihood function gives
Figure GDA0003258418390000102
and by solving the differential equation
Figure GDA0003258418390000103
the coordinate of the extreme point correspondingly obtained is the abscissa $x_0$ of the target key point. Similarly, the ordinate $y_0$ of the target key point can be obtained.
It can be understood that after the coordinates $(x_0, y_0)$ of the target key point are obtained, the coordinates of the two adjacent points of the target key point, namely $(x_0 - 1, y_0)$ and $(x_0 + 1, y_0)$, can be correspondingly obtained from them.
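The three formulas above survive only as image references in this text. For orientation, a standard one-dimensional Gaussian likelihood consistent with the stated symbols (σ, the maximum point coordinate written here as $x_\mu$, and θ) would read as follows; this is an assumed textbook form, not a transcription of the patent's figures, and the symbol $L$ is introduced here only for illustration.

```latex
% Assumed standard form of a 1-D Gaussian likelihood along the x-axis
L(x;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}
              \exp\!\left(-\frac{(x - x_\mu)^2}{2\sigma^2}\right)

% Logarithm of the likelihood
\ln L(x;\theta) = -\ln\!\left(\sqrt{2\pi}\,\sigma\right)
                  - \frac{(x - x_\mu)^2}{2\sigma^2}

% Setting the derivative to zero gives the extreme point
\frac{\partial \ln L}{\partial x} = -\frac{x - x_\mu}{\sigma^2} = 0
\quad\Longrightarrow\quad x = x_\mu
```

Under this assumed form the extreme point coincides with the peak of the smoothed response, which matches the statement that the resulting coordinate is the abscissa of the target key point.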
As an improvement of the above scheme, N = 2; then, the determining, by using an interpolation method, the position coordinates of the target key point according to the initial coordinates of the target key point and the coordinates of the N adjacent points specifically includes:
acquiring the pixel value $h_0$ corresponding to the initial coordinates $(x_0, y_0)$ of the target key point in the human body key point prediction heat map;
acquiring the pixel value $h_1$ corresponding to the coordinates $(x_1, y_1)$ of the first adjacent point and the pixel value $h_2$ corresponding to the coordinates $(x_2, y_2)$ of the second adjacent point in the human body key point prediction heat map;
using Newton interpolation, according to the formula
Figure GDA0003258418390000111
determining the position coordinates (x, y) of the target key point.
Specifically, in combination with the above embodiments, after the initial coordinates $(x_0, y_0)$ of the target key point and the coordinates $(x_0 - 1, y_0)$ and $(x_0 + 1, y_0)$ of its two adjacent points are determined by maximum likelihood estimation, the following pixel values can be correspondingly obtained from the human body key point prediction heat map: the pixel value $h_0$ corresponding to the initial coordinates $(x_0, y_0)$ of the target key point, the pixel value $h_1$ corresponding to the adjacent point coordinates $(x_0 - 1, y_0)$ (i.e., the coordinates $(x_1, y_1)$ of the first adjacent point), and the pixel value $h_2$ corresponding to the adjacent point coordinates $(x_0 + 1, y_0)$ (i.e., the coordinates $(x_2, y_2)$ of the second adjacent point). Newton interpolation can then be used, according to the formula
Figure GDA0003258418390000112
to determine the position coordinates (x, y) of the target key point, that is, to obtain more accurate position coordinates of the target key point.
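The patent's interpolation formula is available here only as an image reference, so the sketch below should be read as one plausible realisation rather than the claimed formula. It builds the Newton divided-difference polynomial through the three samples at $x_0 - 1$, $x_0$ and $x_0 + 1$ and returns the abscissa at which that quadratic is maximal. The function name `newton_refine` and the fallback to the initial estimate when the samples do not form a strict peak are assumptions.

```python
def newton_refine(x0: float, h1: float, h0: float, h2: float) -> float:
    """Refine x0 from samples h1 at x0 - 1, h0 at x0 and h2 at x0 + 1.

    Newton form on the nodes x0 - 1, x0, x0 + 1:
        p(x) = h1 + d01 * (x - (x0 - 1)) + d012 * (x - (x0 - 1)) * (x - x0)
    """
    d01 = h0 - h1               # first divided difference f[x0 - 1, x0]
    d12 = h2 - h0               # first divided difference f[x0, x0 + 1]
    d012 = (d12 - d01) / 2.0    # second divided difference f[x0 - 1, x0, x0 + 1]
    if d012 >= 0.0:
        # The samples do not form a strict peak; keep the initial estimate.
        return float(x0)
    # Solve p'(x) = d01 + d012 * (2 * (x - x0) + 1) = 0 for x.
    return x0 - 0.5 - d01 / (2.0 * d012)


# Toy usage: samples of h(x) = 10 - (x - 5.3)^2 taken at x = 4, 5, 6.
refined_x = newton_refine(5, 8.31, 9.91, 9.51)
```

With horizontal neighbours only the abscissa is refined; the ordinate could be refined in the same way from the vertical neighbours $(x_0, y_0 - 1)$ and $(x_0, y_0 + 1)$, which is likewise an assumption of this sketch.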
According to the human body key point detection method provided by the embodiment of the invention, when the position of any target key point is detected according to the human body key point prediction heat map, the initial coordinates of the key points are optimized by adopting a post-processing mode (namely, maximum likelihood estimation and an interpolation method), so that the accurate key point position is obtained, and the accuracy of the detection result is further improved.
Referring to fig. 2, which is a schematic structural diagram of a preferred embodiment of a key point detection network provided by the present invention, the following specifically describes the working principle of the embodiment of the present invention with reference to all the embodiments described above and fig. 2:
The first three levels of the key point detection network are Hourglass networks of different depths. The first-level Hourglass network performs feature extraction on the human body image to be detected and correspondingly obtains its low-level features (namely, the first-level feature information); the second-level Hourglass network performs feature extraction on the human body image to be detected and correspondingly obtains its middle-level features (namely, the second-level feature information); and the third-level Hourglass network performs feature extraction on the human body image to be detected and correspondingly obtains its deep-level features (namely, the third-level feature information). The fourth-level detection network cascades the feature information output by the three Hourglass networks of different depths and uses a 3 x 3 convolutional neural network for feature learning, thereby realizing feature fusion and correspondingly obtaining the human body key point prediction heat map. When the positions of the key points are detected, the position coordinates of the key points are optimized, based on the obtained human body key point prediction heat map, by post-processing (namely, maximum likelihood estimation and an interpolation method), so as to obtain the final detection result, that is, more accurate position coordinates of the key points.
An embodiment of the present invention further provides a human body key point detection device, which is shown in fig. 3 and is a block diagram of a preferred embodiment of the human body key point detection device provided by the present invention, and the device includes:
the human body image acquisition module 11 is used for acquiring a human body image to be detected;
a human body key point prediction heat map acquisition module 12, configured to acquire a human body key point prediction heat map according to the to-be-detected human body image based on a preset key point detection network; the key point detection network takes a Gaussian response heat map of a preset human body part as supervision information, and the Gaussian response heat map is constructed in advance according to a human body data set and the connection relation between preset human body key points;
and the human body key point position detection module 13 is used for detecting the positions of the human body key points according to the human body key point prediction heat map.
Preferably, the apparatus further comprises a gaussian response heatmap construction module; the gaussian response heatmap construction module specifically comprises:
the human body data processing unit is used for acquiring a human body data file, performing data enhancement processing on the human body data file and correspondingly acquiring the human body data set; the human body data file at least comprises a human body image file, a marking file of a human body detection frame and a marking file of a position coordinate of a human body key point;
the Gaussian response heat map construction unit is used for constructing the Gaussian response heat map according to the connection relation between the human body data set and the human body key points; wherein, the connection relation between the key points of the human body at least comprises the connection relation between the head, the neck, the right shoulder, the right elbow, the right wrist, the left shoulder, the left elbow, the left wrist, the chest, the pelvis, the left hip, the right hip, the left knee, the left ankle, the right knee and the right ankle of the human body.
Preferably, the gaussian response heatmap comprises a keypoint response heatmap and a torso response heatmap; the key point response heat map is constructed by taking key point positions as centers, and the trunk response heat map is constructed by taking trunk connections among the key points as centers.
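The two kinds of supervision heat map can be sketched as follows. The key point map is a Gaussian centred on the key point, and the torso map is taken here to be a Gaussian of the distance to the line segment joining two connected key points. The exact construction used by the patent is not given in this passage, so the segment-distance formulation, the default `sigma` and all function names are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) with x the column and y the row


def keypoint_heatmap(width: int, height: int, center: Point,
                     sigma: float = 1.0) -> List[List[float]]:
    """Gaussian response centred on a key point position."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        # Projection parameter clamped to the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def torso_heatmap(width: int, height: int, a: Point, b: Point,
                  sigma: float = 1.0) -> List[List[float]]:
    """Gaussian response centred on the connection between key points a and b."""
    return [[math.exp(-_distance_to_segment((x, y), a, b) ** 2 / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]


# Toy usage on a 5 x 5 grid.
kp_map = keypoint_heatmap(5, 5, (2, 2))
torso_map = torso_heatmap(5, 5, (1, 2), (3, 2))
```

The key point map peaks at a single pixel, whereas the torso map keeps the maximum response along the whole connection, which is how the structural relation between key points enters the supervision in this sketch.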
Preferably, the key point detection network comprises a first-stage detection network, a second-stage detection network, a third-stage detection network and a fourth-stage detection network;
then, the human body key point prediction heat map obtaining module 12 specifically includes:
the first feature extraction unit is used for extracting features of the human body image to be detected according to the first-level detection network and correspondingly obtaining first-level feature information of the human body image to be detected;
the second feature extraction unit is used for extracting features of the human body image to be detected according to the second-level detection network and correspondingly obtaining second-level feature information of the human body image to be detected;
the third feature extraction unit is used for extracting features of the human body image to be detected according to the third-level detection network, and correspondingly obtaining third-level feature information of the human body image to be detected;
a human body key point prediction heat map acquisition unit, configured to perform feature fusion on the first-level feature information, the second-level feature information, and the third-level feature information according to the fourth-level detection network, so as to correspondingly acquire the human body key point prediction heat map;
wherein the first level detection network and the second level detection network have the torso response heatmap as supervisory information and the third level detection network and the fourth level detection network have the keypoint response heatmap as supervisory information.
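The assignment of supervision targets to the four levels can be written down as a small lookup plus a summed loss. The text states which heat map supervises which level; the use of mean squared error, the equal weighting of the four levels and the names `STAGE_TARGETS`, `mse` and `total_loss` are assumptions made for this sketch.

```python
from typing import Dict, List

HeatMap = List[List[float]]

# Supervision target per level, as stated in the text:
# levels 1 and 2 use the torso response heat map, levels 3 and 4 the key point one.
STAGE_TARGETS: Dict[int, str] = {1: "torso", 2: "torso", 3: "keypoint", 4: "keypoint"}


def mse(pred: HeatMap, target: HeatMap) -> float:
    """Mean squared error between two heat maps of equal size (assumed loss)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def total_loss(stage_outputs: Dict[int, HeatMap],
               torso_target: HeatMap,
               keypoint_target: HeatMap) -> float:
    """Sum the per-level losses, each level against its own supervision target."""
    targets = {"torso": torso_target, "keypoint": keypoint_target}
    return sum(mse(stage_outputs[level], targets[name])
               for level, name in STAGE_TARGETS.items())


# Toy usage with 1 x 2 heat maps.
torso = [[1.0, 0.0]]
keypoint = [[0.0, 1.0]]
perfect = {1: torso, 2: torso, 3: keypoint, 4: keypoint}
one_wrong = {1: keypoint, 2: torso, 3: keypoint, 4: keypoint}
```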
Preferably, the first-stage detection network, the second-stage detection network and the third-stage detection network are all formed by a Hourglass network; the fourth stage detection network comprises a 3 x 3 convolutional neural network.
Preferably, the human body key point position detecting module 13 specifically includes:
the initial coordinate acquisition unit is used for acquiring initial coordinates of a target key point from the human body key point prediction heat map according to maximum likelihood estimation and acquiring coordinates of N adjacent points of the target key point, where N ≥ 2;
and the human body key point position detection unit is used for determining the position coordinates of the target key point according to the initial coordinates of the target key point and the coordinates of the N adjacent points by adopting an interpolation method.
Preferably, N = 2; then, the human body key point position detection unit is specifically configured to:
acquire the pixel value $h_0$ corresponding to the initial coordinates $(x_0, y_0)$ of the target key point in the human body key point prediction heat map;
acquire the pixel value $h_1$ corresponding to the coordinates $(x_1, y_1)$ of the first adjacent point and the pixel value $h_2$ corresponding to the coordinates $(x_2, y_2)$ of the second adjacent point in the human body key point prediction heat map;
and, using Newton interpolation, according to the formula
Figure GDA0003258418390000141
determine the position coordinates (x, y) of the target key point.
It should be noted that the human body key point detection device provided in the embodiment of the present invention can implement all the processes of the human body key point detection method described in any one of the above embodiments. The functions and technical effects of each module and unit in the device are respectively the same as those of the human body key point detection method described in the above embodiments, and are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; wherein, when running, the computer program controls the device where the computer readable storage medium is located to execute the human body key point detection method described in any of the above embodiments.
An embodiment of the present invention further provides a terminal device, as shown in fig. 4, which is a block diagram of a preferred embodiment of the terminal device provided in the present invention, the terminal device includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10, and the processor 10 implements the human body keypoint detection method described in any of the above embodiments when executing the computer program.
Preferably, the computer program can be divided into one or more modules/units (e.g., computer program 1, computer program 2, and so on), which are stored in the memory 20 and executed by the processor 10 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used for describing the execution process of the computer program in the terminal device.
The processor 10 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor 10 may be any conventional processor. The processor 10 is the control center of the terminal device and connects the various parts of the terminal device by using various interfaces and lines.
The memory 20 mainly includes a program storage area that may store an operating system, an application program required for at least one function, and the like, and a data storage area that may store related data and the like. In addition, the memory 20 may be a high speed random access memory, may also be a non-volatile memory, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), and the like, or the memory 20 may also be other volatile solid state memory devices.
It should be noted that the terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the structural block diagram of fig. 4 is only an example of the terminal device and does not constitute a limitation on it; the terminal device may include more or fewer components than those shown, may combine some components, or may use different components.
To sum up, the embodiments of the present invention provide a human body key point detection method and device, a computer-readable storage medium, and a terminal device. A Gaussian response heat map constructed by combining the connection relations between human body key points is used as the supervision information of a key point detection network, and a human body image to be detected is used as the input of the key point detection network, so as to obtain a human body key point prediction heat map; the positions of the human body key points are then detected according to the obtained prediction heat map. In this way, the human body key points can be detected in combination with the constraints between items of human body structure information (equivalent to the connection relations between human body key points), that is, the spatial constraints of the human body structure information are taken into account, thereby reducing quantization errors and improving the accuracy of the detection result. In addition, on the basis of existing deep learning methods, multi-level features and the advantages of multi-level feature fusion in deep feature extraction are better utilized, and a better balance is struck between network complexity and key point detection accuracy, so that higher key point detection accuracy is obtained with lower network complexity and the method has greater practical application value.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (9)

1. A human body key point detection method is characterized by comprising the following steps:
acquiring a human body image to be detected;
acquiring a human body key point prediction heat map according to the human body image to be detected based on a preset key point detection network; the key point detection network takes a Gaussian response heat map of a preset human body part as supervision information, and the Gaussian response heat map is constructed in advance according to a human body data set and the connection relation between preset human body key points; the Gaussian response heat map comprises a key point response heat map and a trunk response heat map, the key point response heat map is constructed by taking key points as centers, and the trunk response heat map is constructed by taking trunk connections among the key points as centers;
and detecting the positions of the key points of the human body according to the human body key point prediction heat map.
2. The method of human keypoint detection according to claim 1, characterized in that it pre-constructs said gaussian response heatmap by:
acquiring a human body data file, and performing data enhancement processing on the human body data file to correspondingly acquire the human body data set; the human body data file at least comprises a human body image file, a marking file of a human body detection frame and a marking file of a position coordinate of a human body key point;
constructing the Gaussian response heat map according to the connection relation between the human body data set and the human body key points; wherein, the connection relation between the key points of the human body at least comprises the connection relation between the head, the neck, the right shoulder, the right elbow, the right wrist, the left shoulder, the left elbow, the left wrist, the chest, the pelvis, the left hip, the right hip, the left knee, the left ankle, the right knee and the right ankle of the human body.
3. The human keypoint detection method of claim 1 or 2, wherein said keypoint detection network comprises a first level detection network, a second level detection network, a third level detection network and a fourth level detection network;
then, the acquiring a human body key point prediction heat map according to the human body image to be detected based on the preset key point detection network specifically includes:
extracting the characteristics of the human body image to be detected according to the first-level detection network, and correspondingly obtaining first-level characteristic information of the human body image to be detected;
extracting the characteristics of the human body image to be detected according to the second-level detection network, and correspondingly obtaining second-level characteristic information of the human body image to be detected;
extracting the characteristics of the human body image to be detected according to the third-level detection network, and correspondingly obtaining third-level characteristic information of the human body image to be detected;
performing feature fusion on the first-level feature information, the second-level feature information and the third-level feature information according to the fourth-level detection network, and correspondingly obtaining the human body key point prediction heat map;
wherein the first level detection network and the second level detection network have the torso response heatmap as supervisory information and the third level detection network and the fourth level detection network have the keypoint response heatmap as supervisory information.
4. The human keypoint detection method of claim 3, wherein said first level detection network, said second level detection network and said third level detection network are all constituted by Hourglass networks; the fourth stage detection network comprises a 3 x 3 convolutional neural network.
5. The method according to claim 1, wherein the detecting the positions of the human key points according to the human key point prediction heat map specifically comprises:
acquiring initial coordinates of a target key point from the human body key point prediction heat map according to maximum likelihood estimation, and acquiring coordinates of N adjacent points of the target key point, where N ≥ 2;
and determining the position coordinates of the target key point according to the initial coordinates of the target key point and the coordinates of the N adjacent points by adopting an interpolation method.
6. The human keypoint detection method of claim 5, wherein N = 2; then, the determining, by using an interpolation method, the position coordinates of the target key point according to the initial coordinates of the target key point and the coordinates of the N adjacent points specifically includes:
acquiring the pixel value $h_0$ corresponding to the initial coordinates $(x_0, y_0)$ of the target key point in the human body key point prediction heat map;
acquiring the pixel value $h_1$ corresponding to the coordinates $(x_1, y_1)$ of the first adjacent point of the target key point and the pixel value $h_2$ corresponding to the coordinates $(x_2, y_2)$ of the second adjacent point in the human body key point prediction heat map;
using Newton interpolation, according to the formula
Figure FDA0003341989760000031
determining the position coordinates (x, y) of the target key point.
7. A human key point detection device, comprising:
the human body image acquisition module is used for acquiring a human body image to be detected;
the human body key point prediction heat map acquisition module is used for acquiring a human body key point prediction heat map according to the human body image to be detected based on a preset key point detection network; the key point detection network takes a Gaussian response heat map of a preset human body part as supervision information, and the Gaussian response heat map is constructed in advance according to a human body data set and the connection relation between preset human body key points; the Gaussian response heat map comprises a key point response heat map and a trunk response heat map, the key point response heat map is constructed by taking key points as centers, and the trunk response heat map is constructed by taking trunk connections among the key points as centers;
and the human body key point position detection module is used for detecting the positions of the human body key points according to the human body key point prediction heat map.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program; wherein the computer program, when running, controls an apparatus on which the computer-readable storage medium is located to perform the human key point detection method according to any one of claims 1 to 6.
9. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the human keypoint detection method according to any of claims 1 to 6 when executing the computer program.
CN202110059465.4A 2021-01-15 2021-01-15 Human body key point detection method and device, storage medium and terminal equipment Active CN112733767B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110059465.4A CN112733767B (en) 2021-01-15 2021-01-15 Human body key point detection method and device, storage medium and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110059465.4A CN112733767B (en) 2021-01-15 2021-01-15 Human body key point detection method and device, storage medium and terminal equipment

Publications (2)

Publication Number Publication Date
CN112733767A CN112733767A (en) 2021-04-30
CN112733767B true CN112733767B (en) 2022-05-31

Family

ID=75591892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110059465.4A Active CN112733767B (en) 2021-01-15 2021-01-15 Human body key point detection method and device, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN112733767B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092963B (en) * 2021-10-14 2023-09-22 北京百度网讯科技有限公司 Method, device, equipment and storage medium for key point detection and model training
CN114863473B (en) * 2022-03-29 2023-06-16 北京百度网讯科技有限公司 Human body key point detection method, device, equipment and storage medium
CN116645699B (en) * 2023-07-27 2023-09-29 杭州华橙软件技术有限公司 Key point detection method, device, terminal and computer readable storage medium
CN117422721B (en) * 2023-12-19 2024-03-08 天河超级计算淮海分中心 Intelligent labeling method based on lower limb CT image

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400316A (en) * 2019-04-19 2019-11-01 杭州健培科技有限公司 A kind of orthopaedics image measuring method and device based on deep learning

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359568A (en) * 2018-09-30 2019-02-19 南京理工大学 A kind of human body critical point detection method based on figure convolutional network
US10713948B1 (en) * 2019-01-31 2020-07-14 StradVision, Inc. Method and device for alerting abnormal driver situation detected by using humans' status recognition via V2V connection
CN109919122A (en) * 2019-03-18 2019-06-21 中国石油大学(华东) A kind of timing behavioral value method based on 3D human body key point
CN110210526A (en) * 2019-05-14 2019-09-06 广州虎牙信息科技有限公司 Predict method, apparatus, equipment and the storage medium of the key point of measurand
CN110705365A (en) * 2019-09-06 2020-01-17 北京达佳互联信息技术有限公司 Human body key point detection method and device, electronic equipment and storage medium
CN110674785A (en) * 2019-10-08 2020-01-10 中兴飞流信息科技有限公司 Multi-person posture analysis method based on human body key point tracking
CN111967406A (en) * 2020-08-20 2020-11-20 高新兴科技集团股份有限公司 Method, system, equipment and storage medium for generating human body key point detection model

Also Published As

Publication number Publication date
CN112733767A (en) 2021-04-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant