CN113095254A - Method and system for positioning key points of human body part - Google Patents

Publication number
CN113095254A
Authority
CN
China
Prior art keywords
key point
neural network
convolutional neural
coordinate
stages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110422052.8A
Other languages
Chinese (zh)
Other versions
CN113095254B (en)
Inventor
王好谦
蔡元昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University
Priority to CN202110422052.8A
Publication of CN113095254A
Application granted
Publication of CN113095254B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/117 Biometrics derived from hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and a system for positioning key points of human body parts. The method comprises the following steps: S1, preprocessing an image containing a human body part; S2, inputting the image preprocessed in step S1 into a convolutional neural network branch to obtain a key point heatmap, and decoding the key point heatmap to obtain initial coordinates of the key points; convolving the feature maps of each stage of the convolutional neural network branch through a connection layer to obtain a relay heatmap for each stage, and encoding and decoding the relay heatmap of each stage to generate the node features and relay key point coordinates of that stage; inputting the node features and relay key point coordinates of each stage into the corresponding stage of a graph convolutional neural network branch to obtain a coordinate compensation for the key points; and S3, calculating the final coordinates of the key points from the initial coordinates of the key points and the coordinate compensation. The method and the system can improve the detection precision of key points of human body parts.

Description

Method and system for positioning key points of human body part
Technical Field
The invention relates to the technical field of computer data processing, in particular to a method and a system for positioning key points of human body parts.
Background
The main goal of human pose estimation is to locate all human skeletal keypoints in a single RGB image and connect them into individual human instances. Human pose estimation is an important and fundamental task in computer vision. Traditional algorithms treat the task as a tree-structured or graph-structured model and solve it using hand-crafted features. Such methods have limited representational capacity and do not perform well. With the continuing breakthroughs in deep learning, the field of human pose estimation has made rapid progress.
The current mainstream human pose estimation algorithms fall into two types: top-down and bottom-up. A top-down algorithm first uses a human detector to output rectangular bounding boxes that mark the location of each pedestrian. A rectangular bounding box is generally a four-tuple (x, y, w, h), where x and y denote the abscissa and ordinate of the upper left corner of the box, and w and h denote its width and height; this four-tuple gives the position and size of the box. The rectangular region containing each pedestrian is then cropped out, and single-person pose estimation is carried out on each human instance. In single-person pose estimation, a picture containing a single person is input into a designed convolutional neural network. Assuming a person has K skeletal keypoints, the network outputs a heatmap with K channels, where each channel represents the probability that each position in the picture is the skeletal keypoint of that type. The heatmap of each channel is then decoded (generally by taking the peak position with a peak-to-peak shift) to obtain the two-dimensional coordinates of each skeletal keypoint.
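The top-down flow described above, cropping a box and then reading a keypoint location off one heatmap channel, can be illustrated with a minimal Python sketch. The helper names `crop_box` and `decode_peak` are hypothetical, images are plain nested lists, and the decoder takes only the peak location (the peak-to-peak shift is omitted for brevity).

```python
def crop_box(image, box):
    """Crop an (x, y, w, h) box from an image stored as a list of rows.

    (x, y) is the upper left corner; w and h are the width and height.
    """
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def decode_peak(heatmap):
    """Return the (x, y) position of the largest response in one channel."""
    best_val = heatmap[0][0]
    best_pos = (0, 0)
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:
                best_val, best_pos = val, (x, y)
    return best_pos


# Toy 5 x 6 "image" whose pixel value encodes its row and column.
image = [[10 * r + c for c in range(6)] for r in range(5)]
person = crop_box(image, (2, 1, 3, 2))  # x=2, y=1, w=3, h=2

# Toy single-channel heatmap with its peak in the centre.
heatmap = [
    [0.0, 0.1, 0.0],
    [0.2, 0.9, 0.3],
    [0.0, 0.1, 0.0],
]
peak = decode_peak(heatmap)
```

With K keypoints, `decode_peak` would be applied once per channel to obtain K coordinate pairs.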
A bottom-up algorithm first detects all human skeletal keypoints in the whole picture without instance labels. Specifically, the whole picture containing multiple people is input into a convolutional neural network, which outputs heatmaps of all skeletal keypoints, again with K channels. The heatmap of each channel is decoded to obtain the two-dimensional coordinates of each type of skeletal keypoint, and the keypoints belonging to the same person are then connected to obtain individual human instances.
However, the methods and algorithms in the prior art are not sufficiently accurate for human pose estimation. Through careful study, the inventors found that prior art methods focus solely on learning better and more sophisticated image representations in order to generate higher quality keypoint heatmaps. In a heatmap, however, the information at each pixel position is compressed into a single probability value for the corresponding human keypoint, so the other information carried by the pixel is erased. For example, when a location on a heatmap has a strong response, we can only conclude that the location belongs to the corresponding keypoint; we cannot tell the in-plane or out-of-plane rotation of the keypoint, or the direction in which the limbs hinged at the keypoint extend. In addition, the variety of clothing and severe occlusion make appearance representations difficult to learn. In response to this problem, the applicant found that the performance of learning and estimating human pose from appearance features alone can be improved by implicitly modeling the spatial relationships between mutually hinged skeletal keypoints.
The graph convolutional network (GCN) is a neural network proposed by Thomas N. Kipf and Max Welling in "Semi-Supervised Classification with Graph Convolutional Networks". Such networks are designed specifically for graph-structured data. Graph convolutional networks can generally be divided into two categories: spectral-based and spatial-based. The former uses the Fourier transform to implement convolution, while the latter extends the spatial definition of ordinary convolution and applies it to each node and its neighboring nodes in the graph. In general, spectral-based graph convolution is suited to graph data with a fixed topology, while spatial-based graph convolution handles graph data with a varying topology well.
A simple graph convolutional layer can be defined as follows:
$$X_{out} = \sigma(\tilde{A} X W)$$
where $X$ represents the input node features, $X_{out}$ denotes the layer output, $\tilde{A}$ is a normalized version of the adjacency matrix $A$, $W$ is a learnable parameter matrix, and $\sigma(\cdot)$ denotes the activation function, commonly the ReLU function. However, the applicant has found that simple graph convolutional networks are not suitable for modeling the spatial relationships among skeletal keypoints, for the following reasons: (1) the learnable matrix $W$ is shared by all edges in the graph structure, so the internal structure of the graph data is not well utilized; (2) the adjacency matrix $\tilde{A}$ limits the simple graph convolutional layer to capturing features from the first-order neighborhood of each node; (3) simple graph convolutional networks exploit only spatial information such as two-dimensional skeletal point coordinates and ignore limb-based semantic features.
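As an illustration of the simple graph convolutional layer and of limitation (2), the following is a minimal pure-Python sketch. The text only says that the adjacency matrix is normalized; the symmetric normalization with self-loops used here (as in Kipf and Welling's paper) is an assumption, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj):
    """Assumed normalization: D^(-1/2) (A + I) D^(-1/2)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def simple_gcn_layer(x, adj, w):
    """One layer: sigma(A_norm X W) with ReLU as the activation."""
    return relu(matmul(matmul(normalize_adjacency(adj), x), w))


# Three joints connected in a chain: 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # one 2-d feature per joint
w = [[1.0, 0.0], [0.0, 1.0]]               # identity weights for clarity

a_norm = normalize_adjacency(adj)
out = simple_gcn_layer(x, adj, w)
```

In this sketch `a_norm[0][2]` is zero, so joint 0 receives nothing from joint 2 in a single layer, which is the first-order neighborhood restriction noted above.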
Disclosure of Invention
In order to solve the above problems, the present invention provides a method and a system for positioning key points of a human body part, which can improve the detection accuracy of the key points.
The invention provides a method for positioning key points of human body parts, which comprises the following steps: S1, preprocessing an image containing a human body part; S2, inputting the image preprocessed in step S1 into a convolutional neural network branch in a spatial structure representation network module to obtain a key point heatmap, and decoding the key point heatmap to obtain initial coordinates of the key points; convolving the feature maps of each stage of the convolutional neural network branch through a connection layer in the spatial structure representation network module to obtain a relay heatmap for each stage, and encoding and decoding the relay heatmap of each stage to generate the node features and relay key point coordinates of that stage; inputting the node features and relay key point coordinates of each stage into the corresponding stage of the graph convolutional neural network branch in the spatial structure representation network module to obtain the coordinate compensation of the key points; and S3, calculating the final coordinates of the key points from the initial coordinates of the key points and the coordinate compensation of the key points.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for locating key points of a human body part as described above.
The present application further provides a system for positioning key points of human body parts, comprising: a preprocessing module, used for preprocessing an image containing a human body part; a spatial structure representation network module, comprising a convolutional neural network branch, a connection layer and a graph convolutional neural network branch, wherein the convolutional neural network branch is used for receiving the preprocessed image to obtain a key point heatmap and decoding the key point heatmap to obtain initial coordinates of the key points; the connection layer is used for connecting the corresponding stages of the convolutional neural network branch and the graph convolutional neural network branch, convolving the feature maps of each stage of the convolutional neural network branch to obtain a relay heatmap for each stage, and encoding and decoding the relay heatmap of each stage to generate the node features and relay key point coordinates of that stage; and the graph convolutional neural network branch is used for receiving the node features and relay key point coordinates of each stage to obtain the coordinate compensation of the key points; and a final coordinate calculation module, used for calculating the final coordinates of the key points from the initial coordinates of the key points and the coordinate compensation of the key points.
The invention has the beneficial effects that:
1) In traditional algorithms, the spatial structure representation is designed by hand, and hand-crafted features generalize poorly. The invention uses a convolutional neural network and a graph convolutional neural network as two parallel, interconnected branches that iterate jointly. On the one hand, the convolutional neural network branch makes it possible to use the most advanced and best-performing human body part key point estimation algorithms directly; on the other hand, the graph convolutional neural network branch implicitly models the spatial structure relationships among the key points of human body parts. This overcomes the limitations of both traditional methods and current mainstream methods for estimating key points of human body parts and greatly improves the detection precision of the key points.
2) The method is flexible and extensible. By replacing the convolutional neural network branch and the graph convolutional neural network branch as appropriate, key point estimation can be performed for different human body parts, for example human pose estimation, human body key point detection and gesture estimation. The spatial structure representation network module is applicable to both single-stage convolutional networks and multi-stage networks and is highly extensible.
3) Traditional convolution operates on a two-dimensional image by sliding a filter over it, which occupies a large amount of memory and has high computational complexity. The graph convolution in the invention operates on node features using matrix multiplication, so both its memory complexity and its computational complexity are low. The design is therefore computationally efficient, adds almost no extra computation time, and can meet the requirements of deployment on mobile devices, such as small model size, high computation speed and low latency.
4) Ordinary simple graph convolution has weak representational capacity when processing human skeleton data. The graph convolutional layer in the graph convolutional neural network provided by the invention is carefully designed to capture stable dependencies between key points of human body parts, and can automatically generate key point coordinate compensation offsets from information-rich context to guide the optimization of the pose of the human body part. In addition, the graph convolutional neural network branch can use local key point representations and global background representations to suppress the propagation of meaningless information and reduce noise interference. The graph convolutional layer can also incorporate non-local blocks to further improve performance.
5) The design is optimized end to end and does not require a complicated multi-step training process; the best results can be achieved by direct end-to-end training. Moreover, the memory and computational complexity of the graph convolutional network branch are small, so it occupies little GPU memory, the training overhead is kept under control, and training is fast.
Drawings
Fig. 1 is a block diagram of the system for positioning key points of human bones in embodiment 1 of the present invention.
Fig. 2 is a block diagram of the spatial structure representation network module in embodiment 1 of the present invention.
Fig. 3 is a block diagram of the convolutional neural network branch in embodiment 1 of the present invention.
Fig. 4 is a block diagram of a graph convolutional layer in the graph convolutional neural network branch in embodiment 1 of the present invention.
Fig. 5 is a block diagram of a graph convolutional layer carrying non-local blocks in the graph convolutional neural network branch in embodiment 1 of the present invention.
Fig. 6 is a flowchart of the method for positioning key points of a human body part in embodiment 1 of the present invention.
Fig. 7 shows the results of the positioning experiment on key points of human bones in embodiment 1 of the present invention.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments and the attached drawings. It should be emphasized that the following description is only exemplary and is not intended to limit the scope or application of the present invention.
The invention relates to a method and a system for positioning key points of human body parts. The positioning of the key points of the human body parts comprises positioning of key points of human skeleton, detection of key points of human face, gesture estimation and the like. In the following examples, the key points of human bones are taken as examples for explanation.
Example 1
This embodiment provides a high-precision human pose estimation method, in particular a method and a system for positioning key points of human bones. As shown in Fig. 1, the system comprises three modules: a preprocessing module 10, a spatial structure representation network module 20 and a final coordinate calculation module 30. The spatial structure representation network module 20 further includes a convolutional neural network branch (CNN) 21, a connection layer 22 and a graph convolutional neural network branch (GCN) 23.
Based on this system, the positioning method comprises the following steps. S1: preprocess an image containing a human body part. S2: input the image preprocessed in step S1 into the convolutional neural network branch of the spatial structure representation network module to obtain a key point heatmap, and decode the key point heatmap to obtain the initial coordinates of the key points; convolve the feature maps of each stage of the convolutional neural network branch through the connection layer of the spatial structure representation network module to obtain a relay heatmap for each stage, and encode and decode the relay heatmap of each stage to generate the node features and relay key point coordinates of that stage; input the node features and relay key point coordinates of each stage into the corresponding stage of the graph convolutional neural network branch of the spatial structure representation network module to obtain the coordinate compensation of the key points. S3: calculate the final coordinates of the key points from the initial coordinates of the key points and the coordinate compensation of the key points.
The modules and the corresponding method steps are further explained below.
1. Pre-processing module
The preprocessing module 10 is used for preprocessing an image including a human body part, and includes two steps: step 101 and step 102.
101: Pre-train the convolutional neural network branch on the ImageNet dataset and save the weights for later use.
102: First, a pedestrian detector outputs a series of rectangular bounding boxes to detect the people in the picture, and the boxed regions are cropped out as training data. The training data is then resized to a uniform size (256 × 256, 256 × 192 or 384 × 288) and augmented with random cropping, rotation, flipping, occlusion, truncation and so on.
2. Spatial structure representation network module
As shown in Fig. 2, the spatial structure representation network module includes two branches and a connection layer. $CNN_1$ to $CNN_M$ represent the convolutional neural network branch 21, which comprises M stages; $GCN_1$ to $GCN_M$ represent the graph convolutional neural network branch 23, which also comprises M stages; the arrows connecting the stages of the CNN and the GCN represent the connection layer 22. The convolutional neural network branch 21 receives the preprocessed image to obtain a key point heatmap and decodes the key point heatmap to obtain the initial coordinates of the key points. The connection layer 22 connects the corresponding stages of the convolutional neural network branch and the graph convolutional neural network branch; it convolves the feature maps of each stage of the convolutional neural network branch to obtain a relay heatmap for each stage, and encodes and decodes the relay heatmap of each stage to generate the node features and relay key point coordinates of that stage. The graph convolutional neural network branch 23 receives the node features and relay key point coordinates of each stage to obtain the coordinate compensation of the key points.
Let the input image be $I_{in}$. $I_{in}$ passes through several head convolutional layers (defined as $Conv_{down}(\cdot)$) to obtain the initial feature $F_0$. Let the backbone of the convolutional neural network branch have M stages, with the k-th stage defined as $CNN_k(\cdot)$. This flow can be written as follows:
$$F_0 = Conv_{down}(I_{in}), \quad F_k = CNN_k(F_{k-1}) \qquad \text{Formula (1)}$$
where $1 \le k \le M$. The feature $F_k$ of the k-th stage passes through the connection layer to generate the k-th stage relay heatmap, denoted here $T_k \in \mathbb{R}^{K \times H \times W}$, where $\mathbb{R}$ represents the real number domain, K denotes the number of skeletal keypoint classes, and H and W denote the height and width of the feature map, respectively. $T_k$ is then used to generate the node features $N_k \in \mathbb{R}^{K+1}$ of the k-th relay heatmap and the two-dimensional relay keypoint coordinates $C_k \in \mathbb{R}^{K \times 2}$ through an encoding function $En(\cdot)$ and a decoding function $De(\cdot)$, respectively. This process can be written as follows:
$$\hat{N}_k = En(T_k), \quad \hat{C}_k = De(T_k) \qquad \text{Formula (2)}$$
where the encoding function $En(\cdot)$ is a 1 × 1 convolutional layer, and the decoding function $De(\cdot)$ is a weighted sum of the two-dimensional coordinates of the top-k largest heatmap responses. Because they differ from the ground-truth labels, $\hat{N}_k$ and $\hat{C}_k$ are used to denote the network predictions that are input into the graph convolutional neural network branch. The graph convolutional neural network branch also has M stages. The graph convolutional network of the k-th stage is defined as $GCN_k(\cdot)$, and its output is defined as $G_k$. $G_k$ is composed of two parts: the node features $G_k(0)$ and the keypoint coordinate compensation $G_k(1)$. The main task of $GCN_k(\cdot)$ is to extract the spatial structure relationships among the skeletal keypoints in order to generate the two-dimensional coordinate compensation corresponding to the input $\hat{C}_k$. $G_k$ is given by the following formula:
[Formula (3): equation image not recovered]
where λ is a hyper-parameter for balancing the inputs of $GCN_k$, and $\hat{N}_1$ and $\hat{C}_1$ respectively represent the node features and the keypoint coordinates output in the first stage. From this, the keypoint prediction of the k-th stage, denoted here $P_k$, can be further written as:
$$P_k = \hat{C}_k + G_k(1) \qquad \text{Formula (4)}$$
As previously described, $G_k(1)$ is the coordinate compensation for the relay keypoint coordinates $\hat{C}_k$. Therefore, $P_k$ represents the coordinates of $\hat{C}_k$ after optimization and refinement. The final coordinates of the keypoints output by the final coordinate calculation module can be expressed as:
[Formula (5): equation image not recovered]
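The decoding function De(·) and the coordinate refinement step can be sketched in a few lines of Python. This is a hedged illustration, not the patented implementation: the text says only that De(·) is a weighted sum of the coordinates of the top-k responses, so weighting each coordinate by its normalized response value is an assumption, as is applying the compensation by simple addition. The names `decode_topk` and `refine` are hypothetical.

```python
def decode_topk(heatmap, k=2):
    """Weighted sum of the (x, y) positions of the k largest responses.

    Assumption: weights are the response values normalized to sum to 1,
    and at least one of the top-k responses is positive.
    """
    cells = [(val, x, y)
             for y, row in enumerate(heatmap)
             for x, val in enumerate(row)]
    top = sorted(cells, reverse=True)[:k]
    total = sum(val for val, _, _ in top)
    cx = sum(val * x for val, x, _ in top) / total
    cy = sum(val * y for val, _, y in top) / total
    return (cx, cy)


def refine(coords, offsets):
    """Add a per-keypoint (dx, dy) compensation to each coordinate."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(coords, offsets)]


# One heatmap channel whose two largest responses sit side by side.
heatmap = [
    [0.0, 0.00, 0.00],
    [0.0, 0.75, 0.25],
    [0.0, 0.00, 0.00],
]
relay = decode_topk(heatmap, k=2)          # sub-pixel relay coordinate
final = refine([relay], [(0.5, -0.25)])    # compensation as the GCN branch would supply
```

Unlike a plain argmax, the top-k weighted sum yields sub-pixel coordinates, which gives the graph branch a continuous quantity to compensate.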
2.1 convolutional neural network Branch
In this embodiment, the convolutional neural network branch is HRNet, a currently leading high-resolution network; the details of the network structure are shown in Fig. 3. The whole network is divided into 4 stages and, according to the spatial resolution of the feature maps, into four layers. Denote the convolutional network of the i-th stage and j-th layer as $N_{ij}$; the spatial resolutions of the four layers, from top to bottom, are 1/4, 1/8, 1/16 and 1/32 of the original size. After the preprocessed training data is fed into the convolutional neural network branch, a relay heatmap is generated at each stage of HRNet. The relay heatmap is used for relay supervision on the one hand, and on the other hand is encoded to generate node features and decoded to generate relay keypoint coordinates, which together serve as the multi-part input of the graph convolutional neural network branch. The key point heatmap finally output by the convolutional neural network branch is used to generate the initial coordinates of the human skeletal points.
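As a small worked example of the four resolutions, the sketch below computes the feature-map size of each layer for one of the input sizes mentioned in the preprocessing step. Reading 256 × 192 as height × width is an assumption, and `layer_sizes` is a hypothetical helper.

```python
def layer_sizes(height, width, strides=(4, 8, 16, 32)):
    """Feature-map size of each layer at 1/4, 1/8, 1/16 and 1/32 scale."""
    return [(height // s, width // s) for s in strides]


# Assumed (height, width) reading of the 256 x 192 input size.
sizes = layer_sizes(256, 192)
```

For this input the highest-resolution layer is 64 × 48 and the lowest is 8 × 6.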
2.2 graph convolution neural network Branch
In order to overcome the deficiency of the simple graph convolution layer in processing the human skeleton data, the present embodiment specially designs the graph convolution layer in the graph convolution neural network branch, and the block diagram of the graph convolution layer is shown in fig. 4.
In Fig. 4, $J_{in}$ represents the spatial coordinate information of the keypoints, $H_{in}$ represents the node features of local skeletal keypoints, and $G_{in}$ represents the node features of the relay heatmap in the global context.
2.2.1 Basic graph convolutional layer. To facilitate training, this embodiment introduces a skip connection through a fully connected (fc) layer. In a basic graph convolutional layer, a simple graph convolution is computed with the adjacency matrix $A$, so that the input keypoint spatial coordinate information $J_{in}$ is propagated through $A$ and the learnable parameter matrix $W$. This process can be written as follows:
$$J_{out} = \sigma(fc_1(\sigma(A J_{in} W)) + fc_2(J_{in})) \qquad \text{Equation (6)}$$
where batch normalization is omitted, $fc_1$ and $fc_2$ are fully connected layers that do not share parameters, and $J_{out}$ represents the spatial coordinate information of the keypoints after processing by the graph convolutional layer.
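Equation (6) can be traced with a minimal pure-Python sketch. It assumes that the two fully connected layers are plain weight matrices without bias, that σ is ReLU, and that batch normalization is left out as in the equation; the function names and toy values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def base_graph_conv(j_in, adj, w, fc1_w, fc2_w):
    """J_out = relu(fc1(relu(A J_in W)) + fc2(J_in)), biases omitted."""
    hidden = relu(matmul(matmul(adj, j_in), w))  # simple graph convolution
    main = matmul(hidden, fc1_w)                 # fc1 on the graph path
    skip = matmul(j_in, fc2_w)                   # fc2 skip connection
    return relu(add(main, skip))


identity = [[1.0, 0.0], [0.0, 1.0]]
adj = [[1.0, 1.0], [1.0, 1.0]]       # two joints, fully connected with self-loops
j_in = [[1.0, 2.0], [3.0, -4.0]]     # one (x, y) coordinate per joint

j_out = base_graph_conv(j_in, adj, identity, identity, identity)
```

The skip path lets the input coordinates reach the output even when the graph path is zeroed by the activation, which is one reason such connections ease training.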
2.2.2 Limb-based and background-based features. The features of local keypoints are encoded into the node features $H_{in}$, so these node features contain a limb-based representation. The local features contain rich spatial texture context information and semantic description information: the former facilitates the localization of keypoints, and the latter facilitates their classification. On the other hand, the features of some background regions carry little meaning or are even noise; such information reduces computational efficiency and detection performance. The invention is therefore designed to suppress the propagation of useless information so that the network concentrates on the regions around body keypoints. To this end, $H_{in}$ and $G_{in}$ are introduced as inputs to the skeleton-structure graph convolutional layer. The output limb-based and background-based features $H_{out}$ and $G_{out}$ can then be written as follows:
[Formula (7): equation image not recovered]
where abs(·) represents the absolute value, [·] represents the concatenation operation, and split represents splitting the features channel by channel.
2.2.3 keypoint perceptual steering matrices. By means of HinAnd GinTo generate a key-point-aware steering matrix AkwThe derivation process can be derived as follows:
Akw=fc(repeat(Hout) Equation (8)
Wherein the repeat (. cndot.) function represents a negative profile and concatenates them to produce a
Figure BDA0003028206780000082
Of the matrix of (a). This matrix produces A by a full connection layer and a batch normalizationkw。AkwRich limb-based features can be maintained while suppressing interference of meaningless information. In addition, due to the copy operation using the full connection layer, AkwSpatial relationships between skeletal points can also be modeled. Then, in AkwAn, W and JinA simple graph convolution operation is performed. An indicator indicates a matrix bit-wise multiplication operation. Then J is finallyoutCan be expressed as follows:
$J_{out} = \sigma(fc_1(\sigma(A_{kw} \odot A) J_{in} W)) + fc_2(J_{in})$    Equation (9)
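The graph convolution of Equation (9) can be illustrated with a minimal NumPy sketch. It reads the equation literally as printed, and several details are assumptions that the text does not fix: the activation σ is taken to be ReLU, the fully connected layers are plain affine maps built by a hypothetical helper `make_fc`, the skeleton is a four-point chain, and all weights (including the steering matrix) are random stand-ins rather than trained values.

```python
import numpy as np

def relu(x):
    """Stand-in for the activation sigma (assumed to be ReLU here)."""
    return np.maximum(x, 0.0)

def make_fc(weight, bias):
    """Hypothetical helper: fully connected layer x -> x @ weight + bias."""
    def fc(x):
        return x @ weight + bias
    return fc

def graph_conv_layer(j_in, adj, a_kw, w, fc1, fc2):
    """J_out = sigma(fc1(sigma(A_kw * A) @ J_in @ W)) + fc2(J_in)."""
    guided_adj = relu(a_kw * adj)      # element-wise product, then activation
    hidden = guided_adj @ j_in @ w     # aggregate over neighbours, then project
    return relu(fc1(hidden)) + fc2(j_in)

rng = np.random.default_rng(0)
num_kpts, c_in, c_hidden = 4, 2, 8

# Adjacency of a chain skeleton 0-1-2-3 with self-loops (illustrative only).
adj = np.eye(num_kpts) + np.eye(num_kpts, k=1) + np.eye(num_kpts, k=-1)
a_kw = rng.normal(size=(num_kpts, num_kpts))   # random stand-in for A_kw
j_in = rng.normal(size=(num_kpts, c_in))       # input 2D key point coordinates
w = rng.normal(size=(c_in, c_hidden))          # learnable matrix W
fc1 = make_fc(rng.normal(size=(c_hidden, c_in)), np.zeros(c_in))
fc2 = make_fc(rng.normal(size=(c_in, c_in)), np.zeros(c_in))

j_out = graph_conv_layer(j_in, adj, a_kw, w, fc1, fc2)
print(j_out.shape)
```

Because the steering matrix multiplies the adjacency matrix element-wise, an entry of $A_{kw}$ can only reweight an edge that already exists in $A$; it cannot create a connection between key points that the skeleton graph does not link.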
2.2.4 Non-local blocks. The graph convolution layer designed above can be combined seamlessly with non-local blocks. As shown in Fig. 5, adding non-local blocks strengthens the long-range dependencies between human skeleton key points.
2.3 Training loss function. The final training loss is the mean squared error of the heatmaps on the convolutional neural network branch plus the regression loss of the two-dimensional key point coordinates and the limb lengths on the graph convolutional neural network branch, as shown in the following formula:
Figure BDA0003028206780000091
wherein L represents a total loss function, γ is a hyperparameter, M is the number of stages of a convolutional neural network branch or a graph convolutional neural network branch,
Figure BDA0003028206780000092
representing the loss function of the convolutional neural network at stage k,
Figure BDA0003028206780000093
represents the loss function of the k-th stage graph convolutional neural network.
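The structure of this loss can be sketched in NumPy. The formula image is not reproduced in this text, so the exact form is an assumption: the sketch sums, over the M stages, the CNN-branch loss plus γ times the GCN-branch loss. The use of an L1 penalty for the coordinate and limb-length regression, and the helper names `heatmap_mse`, `limb_lengths`, `gcn_regression_loss` and `total_loss`, are likewise illustrative choices rather than details taken from the patent.

```python
import numpy as np

def heatmap_mse(pred, target):
    """Mean squared error between predicted and target heatmaps."""
    return float(np.mean((pred - target) ** 2))

def limb_lengths(coords, limbs):
    """Euclidean length of each limb, given (K, 2) coordinates and index pairs."""
    start = coords[[i for i, _ in limbs]]
    end = coords[[j for _, j in limbs]]
    return np.linalg.norm(start - end, axis=-1)

def gcn_regression_loss(pred_xy, gt_xy, limbs):
    """Coordinate regression plus limb-length regression (L1 is an assumption)."""
    coord_term = np.mean(np.abs(pred_xy - gt_xy))
    limb_term = np.mean(np.abs(limb_lengths(pred_xy, limbs)
                               - limb_lengths(gt_xy, limbs)))
    return float(coord_term + limb_term)

def total_loss(cnn_losses, gcn_losses, gamma):
    """Assumed form: sum over stages k of L_cnn^k + gamma * L_gcn^k."""
    if len(cnn_losses) != len(gcn_losses):
        raise ValueError("both branches must have the same number of stages")
    return sum(lc + gamma * lg for lc, lg in zip(cnn_losses, gcn_losses))

# Two-stage toy example with hand-picked per-stage losses.
print(total_loss([1.0, 2.0], [0.5, 0.5], gamma=2.0))
```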
The overall procedure is shown in Fig. 6. The preprocessed image (a) is input into the convolutional neural network (CNN) branch of the spatial structure characterization network module. The whole CNN is divided into 4 stages; the convolutional network at the i-th stage and the j-th layer is denoted $N_{ij}$, and the spatial resolutions of the four layers from top to bottom are 1/4, 1/8, 1/16 and 1/32 of the original size, respectively. The CNN branch finally outputs a key point heatmap (the heatmap in the upper right corner of the figure), which is decoded to obtain the initial coordinates of the key points.
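As an illustration of the decoding step, the sketch below takes the location of the maximum response in each key point heatmap and scales it back to image coordinates. Both the argmax rule and the stride of 4 (a heatmap at 1/4 of the input resolution) are assumptions made for the example; this passage does not specify the decoder, and `decode_heatmaps` is a hypothetical name.

```python
import numpy as np

def decode_heatmaps(heatmaps, stride=4):
    """Decode (K, H, W) heatmaps into (K, 2) image coordinates in (x, y) order.

    Each key point is placed at the pixel with the highest response, then
    scaled by `stride` to map heatmap pixels back to input-image pixels.
    """
    num_kpts, height, width = heatmaps.shape
    flat_idx = heatmaps.reshape(num_kpts, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (height, width))
    return np.stack([xs, ys], axis=1).astype(np.float64) * stride

# Two toy heatmaps, each with a single peak.
heatmaps = np.zeros((2, 8, 6))
heatmaps[0, 3, 5] = 1.0   # peak at row 3, column 5
heatmaps[1, 7, 0] = 1.0   # peak at row 7, column 0
initial_xy = decode_heatmaps(heatmaps)
print(initial_xy)
```

Plain argmax decoding is limited to the heatmap grid, so its error can reach a few pixels at a stride of 4; a later refinement step can reduce that quantization error.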
The feature maps with the highest spatial resolution at each stage of the CNN ($N_{11}$, $N_{21}$, $N_{31}$, $N_{41}$) are each passed through a 1 × 1 convolutional layer to obtain the relay heatmap (b) of that stage. The relay heatmaps are then encoded and decoded to generate the corresponding node features (c) and relay key point coordinates (d) of each stage.
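A 1 × 1 convolution is a per-pixel linear map across channels, so producing one relay heatmap channel per key point can be sketched with a single `einsum`. The shapes, weights and the function name `conv1x1` are illustrative assumptions, not values from the patent.

```python
import numpy as np

def conv1x1(feature_map, weight, bias):
    """Apply a 1x1 convolution.

    feature_map: (C_in, H, W) stage feature map
    weight:      (K, C_in), one row per output key point channel
    bias:        (K,)
    returns:     (K, H, W) relay heatmaps
    """
    return np.einsum("kc,chw->khw", weight, feature_map) + bias[:, None, None]

# Toy example: 3 input channels, 2 key point channels, 2x2 spatial size.
features = np.ones((3, 2, 2))
weight = np.array([[1.0, 1.0, 1.0],
                   [0.0, 0.0, 0.0]])
bias = np.array([0.0, 5.0])
relay = conv1x1(features, weight, bias)
print(relay.shape)
```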
The node features (c) and relay key point coordinates (d) of each stage are then input into the corresponding stages ($GCN_1$, $GCN_2$, $GCN_3$, $GCN_4$) of the graph convolutional neural network branch in the spatial structure representation network module, whose output is the coordinate compensation of the key points.
Finally, the final coordinates (e) of the key points are calculated from the initial coordinates output by the convolutional neural network branch and the coordinate compensation output by the graph convolutional neural network branch.
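A minimal sketch of this last step follows, under the assumption that the coordinate compensation is an additive offset applied to the initial coordinates; the passage only states that the final coordinates are calculated from the two quantities. The function name `apply_compensation` and the sample values are hypothetical.

```python
import numpy as np

def apply_compensation(initial_xy, compensation_xy):
    """Add per-key-point offsets (K, 2) to the initial coordinates (K, 2)."""
    initial_xy = np.asarray(initial_xy, dtype=np.float64)
    compensation_xy = np.asarray(compensation_xy, dtype=np.float64)
    if initial_xy.shape != compensation_xy.shape:
        raise ValueError("initial coordinates and compensation must match in shape")
    return initial_xy + compensation_xy

initial = [[20.0, 12.0], [0.0, 28.0]]    # e.g. decoded from the CNN heatmap
offsets = [[0.5, -1.0], [1.5, 0.25]]     # e.g. output of the GCN branch
final_xy = apply_compensation(initial, offsets)
print(final_xy)
```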
When the method is applied to locating human skeleton key points, the experimental results obtained are shown in Fig. 6: the key points are located accurately.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method of the embodiments of the present invention.
Embodiment 2
The difference from Embodiment 1 is that the convolutional neural network branch in this embodiment is replaced with SimpleBaseline-152 from the paper "Simple Baselines for Human Pose Estimation and Tracking", and the graph convolutional neural network branch is replaced with SemGCN from the paper "Semantic Graph Convolutional Networks for 3D Human Pose Regression".
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention.

Claims (10)

1. A method for locating key points of a human body part, characterized by comprising:
S1, preprocessing an image containing a human body part;
S2, inputting the image preprocessed in step S1 into a convolutional neural network branch in a spatial structure representation network module to obtain a key point heatmap, and decoding the key point heatmap to obtain initial coordinates of the key points;
convolving the feature maps of each stage in the convolutional neural network branch through a connecting layer in the spatial structure representation network module to obtain the corresponding relay heatmap of each stage, and encoding and decoding the relay heatmap of each stage to generate the corresponding node features and relay key point coordinates of each stage;
inputting the node features and the relay key point coordinates of each stage into the corresponding stage of the graph convolutional neural network branch in the spatial structure representation network module to obtain the coordinate compensation of the key points;
S3, calculating the final coordinates of the key points from the initial coordinates of the key points and the coordinate compensation of the key points.
2. The method for locating key points of human body parts according to claim 1, wherein the preprocessing of step S1 includes:
pre-training the convolutional neural network branch on the ImageNet dataset;
detecting the human body parts in the image one by one using a detector, and performing data augmentation.
3. The method according to claim 1, wherein the convolutional neural network branch comprises HRNet or SimpleBaseline-152.
4. The method as claimed in claim 3, wherein the HRNet comprises 4 stages and is divided into four layers according to the spatial resolution, and the spatial resolution of the four layers from top to bottom is 1/4, 1/8, 1/16 and 1/32.
5. The method of claim 1, wherein the graph convolutional neural network branch comprises a SemGCN.
6. The method of claim 1, wherein the output of the graph convolution layer in the graph convolutional neural network branch is:
$J_{out} = \sigma(fc_1(\sigma((A_{kw} \odot A) J_{in} W)) + fc_2(J_{in}))$
wherein $fc_1$ and $fc_2$ are fully connected layers that do not share weights with each other; $A$ is the adjacency matrix; $W$ is a learnable parameter matrix; $J_{in}$ is the input key point spatial coordinate information; $A_{kw}$ is the key-point-aware steering matrix, with $A_{kw} = fc(\mathrm{repeat}(H_{out}))$, where the repeat(·) function replicates the feature map and concatenates the copies to produce a matrix whose dimension is given by
Figure FDA0003028206770000021
and this matrix is passed through a fully connected layer and batch normalization to produce $A_{kw}$;
$H_{out}$ satisfies:
Figure FDA0003028206770000022
wherein $H_{in}$ and $G_{in}$ are the inputs of the skeleton-structure graph convolution layer, abs(·) denotes the absolute value, [·, ·] denotes the concatenation operation, and split denotes splitting the features along the channel dimension; $H_{out}$ and $G_{out}$ are the output limb-based and background-based features.
7. The method according to claim 1, wherein the training loss function of the spatial structure characterization network module is:
Figure FDA0003028206770000023
wherein γ is a hyperparameter, M is the number of stages of the convolutional neural network branch or the graph convolutional neural network branch,
Figure FDA0003028206770000024
representing the loss function of the convolutional neural network at stage k,
Figure FDA0003028206770000025
represents the loss function of the k-th stage graph convolutional neural network.
8. The method of claim 1, wherein the human body part comprises: human skeleton, human face, human palm.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method for locating key points of a body part according to any one of claims 1 to 8.
10. A system for locating key points on a human body part, comprising:
a preprocessing module, configured to preprocess an image containing a human body part;
a spatial structure characterization network module, comprising:
a convolutional neural network branch, configured to receive the preprocessed image to obtain a key point heatmap, and to decode the key point heatmap to obtain initial coordinates of the key points;
a connecting layer, configured to connect the stages of the convolutional neural network branch with the stages of the graph convolutional neural network branch, to convolve the feature maps of each stage in the convolutional neural network branch to obtain the corresponding relay heatmap of each stage, and to encode and decode the relay heatmap of each stage to generate the corresponding node features and relay key point coordinates of each stage;
a graph convolutional neural network branch, configured to receive the node features and the relay key point coordinates of each stage to obtain the coordinate compensation of the key points;
a final coordinate calculation module, configured to calculate the final coordinates of the key points from the initial coordinates of the key points and the coordinate compensation of the key points.
CN202110422052.8A 2021-04-20 2021-04-20 Method and system for positioning key points of human body part Active CN113095254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110422052.8A CN113095254B (en) 2021-04-20 2021-04-20 Method and system for positioning key points of human body part

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110422052.8A CN113095254B (en) 2021-04-20 2021-04-20 Method and system for positioning key points of human body part

Publications (2)

Publication Number Publication Date
CN113095254A true CN113095254A (en) 2021-07-09
CN113095254B CN113095254B (en) 2022-05-24

Family

ID=76678732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110422052.8A Active CN113095254B (en) 2021-04-20 2021-04-20 Method and system for positioning key points of human body part

Country Status (1)

Country Link
CN (1) CN113095254B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674207A (en) * 2021-07-21 2021-11-19 电子科技大学 Automatic PCB component positioning method based on graph convolution neural network
CN113706463A (en) * 2021-07-22 2021-11-26 杭州键嘉机器人有限公司 Method, device and equipment for automatically detecting key points of joint image based on deep learning and storage medium
CN113837130A (en) * 2021-09-29 2021-12-24 福州大学 Human hand skeleton detection method and system
CN114757822A (en) * 2022-06-14 2022-07-15 之江实验室 Binocular-based human body three-dimensional key point detection method and system
EP4167194A1 (en) * 2021-10-14 2023-04-19 Beijing Baidu Netcom Science And Technology Co. Ltd. Key point detection method and apparatus, model training method and apparatus, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359568A (en) * 2018-09-30 2019-02-19 南京理工大学 A kind of human body critical point detection method based on figure convolutional network
CN110969105A (en) * 2019-11-22 2020-04-07 清华大学深圳国际研究生院 Human body posture estimation method
CN111753669A (en) * 2020-05-29 2020-10-09 广州幻境科技有限公司 Hand data identification method, system and storage medium based on graph convolution network
CN112001403A (en) * 2020-08-11 2020-11-27 北京化工大学 Image contour detection method and system
US20210000404A1 (en) * 2019-07-05 2021-01-07 The Penn State Research Foundation Systems and methods for automated recognition of bodily expression of emotion
US20210082144A1 (en) * 2019-09-12 2021-03-18 Nec Laboratories America, Inc Keypoint based pose-tracking using entailment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Bin Xiao et al.: "Simple Baselines for Human Pose Estimation and Tracking", Computer Vision - ECCV 2018, 15th European Conference, Proceedings: Lecture Notes in Computer Science (LNCS 11210), 8 September 2018 (2018-09-08), pages 1-6 *
Haoqian Wang et al.: "Magnify-Net for Multi-Person 2D Pose Estimation", 2018 IEEE International Conference on Multimedia and Expo (ICME), 23 July 2018 (2018-07-23), pages 1-16 *
Long Zhao et al.: "Semantic Graph Convolutional Networks for 3D Human Pose Regression", 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16 June 2019 (2019-06-16), pages 1-13 *
Wang Haoqian et al.: "Identification of Dalbergia bariensis and Dalbergia cochinchinensis based on NIR and SVM algorithms", Spectroscopy and Spectral Analysis, vol. 36, no. 10, 31 October 2016 (2016-10-31), pages 69-72 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674207A (en) * 2021-07-21 2021-11-19 电子科技大学 Automatic PCB component positioning method based on graph convolution neural network
CN113674207B (en) * 2021-07-21 2023-04-07 电子科技大学 Automatic PCB component positioning method based on graph convolution neural network
CN113706463A (en) * 2021-07-22 2021-11-26 杭州键嘉机器人有限公司 Method, device and equipment for automatically detecting key points of joint image based on deep learning and storage medium
CN113706463B (en) * 2021-07-22 2024-04-26 杭州键嘉医疗科技股份有限公司 Joint image key point automatic detection method and device based on deep learning
CN113837130A (en) * 2021-09-29 2021-12-24 福州大学 Human hand skeleton detection method and system
CN113837130B (en) * 2021-09-29 2023-08-08 福州大学 Human hand skeleton detection method and system
EP4167194A1 (en) * 2021-10-14 2023-04-19 Beijing Baidu Netcom Science And Technology Co. Ltd. Key point detection method and apparatus, model training method and apparatus, device and storage medium
JP2023059231A (en) * 2021-10-14 2023-04-26 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Key point detection and model training method, apparatus, device, and storage medium
JP7443647B2 (en) 2021-10-14 2024-03-06 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Keypoint detection and model training method, apparatus, device, storage medium, and computer program
CN114757822A (en) * 2022-06-14 2022-07-15 之江实验室 Binocular-based human body three-dimensional key point detection method and system
CN114757822B (en) * 2022-06-14 2022-11-04 之江实验室 Binocular-based human body three-dimensional key point detection method and system

Also Published As

Publication number Publication date
CN113095254B (en) 2022-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant