CN111626159B - Human body key point detection method based on attention residual error module and branch fusion - Google Patents


Info

Publication number
CN111626159B
Authority
CN
China
Prior art keywords
branch
convolution
layer
attention
feature
Prior art date
Legal status
Active
Application number
CN202010410104.5A
Other languages
Chinese (zh)
Other versions
CN111626159A (en
Inventor
刘峰
龙芳芳
干宗良
崔子冠
赵峥来
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202010410104.5A priority Critical patent/CN111626159B/en
Publication of CN111626159A publication Critical patent/CN111626159A/en
Application granted granted Critical
Publication of CN111626159B publication Critical patent/CN111626159B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

The invention discloses a human body key point detection method based on an attention residual module and branch fusion. The method belongs to the technical field of computer vision and comprises the following steps: performing feature processing on the input picture with a feature extraction network to obtain a feature map; inputting the feature map into a region proposal network to obtain target proposal boxes; performing region pooling to obtain a region-of-interest feature map; inputting this into the convolutional layers for feature extraction to obtain feature map I; performing feature extraction and fusion with branch one and branch two; superposing the results of the two branches, first restoring the resolution with deconvolution and then upsampling with bilinear interpolation; the key point locations are modeled as one-hot binary masks for training. The invention improves the diversity of the information output by the network and better captures different fields of view; it not only effectively resolves key point confusion in simple scenes but also improves accuracy and efficiency, and adapts well to complex scenes.

Description

Human body key point detection method based on attention residual module and branch fusion
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a human body key point detection method based on an attention residual module and branch fusion.
Background
Human body posture detection has attracted wide attention from scholars at home and abroad and is an important subject in the field of computer vision. Its core task is to detect human targets in pictures through techniques such as image processing and analysis, machine learning and pattern recognition, distinguish the human body parts, and further detect the human joint points. According to how the raw data describing the posture are acquired, recent research at home and abroad divides posture detection into methods based on wearable sensors and methods based on computer vision. The former are mostly contact-type posture analysis systems: they analyse the human body well, but the sensors that collect motion parameters must be attached to the body or to accessories, which is inconvenient to wear and feels unnatural; such systems are also expensive, hard to operate, unsuited to remote use and difficult to popularize. Although the best current human detection algorithms perform well, errors remain, and these errors keep the accuracy of the detection task low. Representing the human posture by optical flow, silhouette, contour, skeleton or joint points in the image avoids solving for the parameters of a full human model and thus simplifies posture estimation. Deep learning algorithms provide a new idea for posture detection: they generally match and analyse the global features of the image, which effectively avoids the feature-matching ambiguity that local-feature methods suffer under complex postures and occlusion, and ensures better robustness.
Disclosure of Invention
In view of the above problems, the present invention provides a human body key point detection method based on an attention residual module and branch fusion, which addresses the poor detection effect and low accuracy of the prior art.
The technical scheme of the invention is as follows: a human body key point detection method based on an attention residual module and branch fusion specifically comprises the following steps:
step (1.1), performing feature processing on an input picture with a feature extraction network to obtain a feature map; inputting the feature map into a region proposal network to obtain target proposal boxes, and then performing region pooling with the feature map to obtain a region-of-interest feature map;
step (1.2), inputting the obtained region-of-interest feature map into the convolutional layers for feature extraction, and recording the result as feature map I;
step (1.3), inputting feature map I into branch one and branch two respectively for feature processing;
the specific steps of branch one for processing feature map I are as follows:
(1.3.1) two identical attention residual modules are designed at the input of branch one; data bypasses connect the front and rear layers of the network, the two attention residual modules are connected pairwise and superposed at the pixel level, and in this cascading way every module in the network receives feature maps from all preceding modules;
(1.3.2) the result is reduced in dimension by a convolutional layer and input into a fully connected layer; finally it is reshaped to obtain an output of the same size as that of branch two;
the operation of branch two on feature map I is as follows:
the first, second and third dilated convolutional layers arranged in branch two, which have different dilation rates, are taken as one combination; through the combination different receptive fields are obtained, and thereby multi-scale information;
step (1.4), superposing the results of feature map I processed in branch one and branch two, recording the result as feature map II, deconvolving feature map II and then upsampling, and finally obtaining the joint point information through a one-hot binary mask.
Further, in step (1.3.1), the attention residual module consists of a residual small module of dilated convolution cooperating with an attention mechanism:
the residual small module of dilated convolution comprises three convolutional layers: a dimension-reduction convolutional layer, a dilated convolutional layer and a dimension-raising convolutional layer; the convolution weight obtained through the convolution operations of these three layers is denoted V;
the attention mechanism proceeds as follows: after a convolution operation on V, global weighted pooling, a pointwise (1×1) convolution and a Sigmoid activation are applied in turn, and the network produces the spatial attention weight; finally the spatial attention weight re-weights V to realize the channel attention output, giving the spatially attention-weighted feature.
Further, the output parameters of the two branches are superposed to obtain feature map II; resolution is restored on feature map II with a deconvolution layer, bilinear interpolation upsampling generates the high-resolution output, and finally the human joint point positions are modeled as one-hot binary masks to obtain the joint point information.
The invention has the following beneficial effects. The invention is a top-down human posture detection method in the field of computer vision; it fuses the features of an attention residual module with a data bypass, and has stronger robustness and practicality as well as higher accuracy. (1) The attention residual module in branch one assigns a weight to each channel feature and highlights the information of the feature map adaptively in both space and channel; at the same time, two cross-layer connections between the attention residual modules link the front and rear layers of the network, so that signals can flow at high speed between the input and output layers. This design improves the information flow between layers, enriches the information, and lays the foundation for the high accuracy and efficiency of the subsequent detection. (2) Branch two adopts a fully convolutional network (FCN) combined with dilated convolution, so the results before and after each group of convolutions interleave and depend on each other, which enlarges the receptive field and alleviates the local-information loss of dilated convolution (the gridding problem); it can also capture multi-scale context information and obtain local-information dependence, effectively avoiding the joint-point confusion caused by a single receptive field that gathers too little context and cannot "see" the whole picture. (3) The two branches are fused by addition, which yields more diverse information and better captures the different views of each target region; by combining the predictions of the two fields of view, the diversity of the network output is improved, joint-point confusion is effectively resolved in simple scenes, accuracy and efficiency are improved, and the method adapts well to complex scenes.
Drawings
FIG. 1 is a schematic structural view of the present invention;
FIG. 2 is a schematic diagram of an attention residual module according to the present invention;
FIG. 3 is a diagram illustrating an exemplary structure of a grid problem in the present invention;
FIG. 4 is a schematic representation of the human joint of the present invention.
Detailed Description
In order to illustrate the technical solution of the present invention more clearly, the invention is further described below; obviously, the following describes only some of the embodiments, and a person skilled in the art can apply the technical solution of the invention to other similar situations without creative effort; the technical solution is described in detail below with reference to the accompanying drawings:
a human body key point detection method based on attention residual error module and branch fusion comprises the steps of using a feature extraction network to perform feature processing on an input picture to obtain a feature map; inputting the feature map into the area to generate a network to obtain a target suggestion box; performing region pooling operation by combining the characteristic map to obtain a characteristic map of the region of interest; inputting the obtained characteristic diagram of the region of interest into the convolution layer for characteristic extraction operation to obtain a characteristic diagram I; carrying out deeper feature extraction and fusion by using a brand new neural network; after the results of the two branches are superposed, resolution restoration is carried out by deconvolution, and then twice linear interpolation upsampling is carried out; the positions of the joint points are modeled as one-hot binary masks for training.
As shown in fig. 1, the detection method specifically includes the following steps:
step (1.1), performing feature processing on an input picture with a feature extraction network to obtain a feature map; inputting the feature map into a region proposal network to obtain target proposal boxes, and then performing region pooling with the feature map to obtain a region-of-interest feature map;
step (1.2), inputting the obtained region-of-interest feature map into the convolutional layers for feature extraction, and recording the result as feature map I;
step (1.3), inputting feature map I into branch one and branch two respectively for feature processing;
step (1.4), superposing the results of feature map I processed in branch one and branch two, recording the result as feature map II, deconvolving feature map II and then upsampling, and finally obtaining the joint point information through a one-hot binary mask.
Further, in the step (1.2), the convolutional layers are three identical convolutional layers;
For convenience of description, the parameters of a convolutional layer are defined here: the width, height and channel number of the input feature map are W, H and C respectively, i.e. the map lies in R^(W×H×C); the convolution kernel size is k, written k×k; the stride is s; the padding is p. The width of the output feature map after the convolution is:
W_out = (W - k + 2p)/s + 1 (1)
and the height is obtained in the same way;
each of the three convolutional layers uses a 3×3 kernel with stride and padding both 1, so by formula (1) feature map I obtained after the convolutional layers has the same size R^(W×H×C) as the region-of-interest feature map.
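The size bookkeeping of formula (1) can be checked with a short helper; `conv_out_size` is a hypothetical name introduced here for illustration, not part of the patent:

```python
def conv_out_size(w: int, k: int, s: int = 1, p: int = 0) -> int:
    """Output width of a convolution per formula (1): (W - k + 2p)/s + 1."""
    return (w - k + 2 * p) // s + 1

# A 3x3 layer with stride 1 and padding 1 preserves the spatial size,
# so feature map I keeps the W x H x C shape of the region-of-interest map.
print(conv_out_size(14, k=3, s=1, p=1))  # -> 14
```

The same call with stride 2 shows the usual halving behaviour, e.g. `conv_out_size(28, 3, 2, 1)` gives 14.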
Further, in step (1.3), the specific steps of branch one for processing feature map I are as follows:
(1.3.1) two identical attention residual modules are designed at the input of branch one; data bypasses connect the front and rear layers of the network, the two attention residual modules are connected pairwise and superposed at the pixel level, and in this cascading way every module in the network receives feature maps from all preceding modules; wherein:
(1) the residual small module of dilated convolution comprises three convolutional layers: a dimension-reduction convolutional layer, a dilated convolutional layer and a dimension-raising convolutional layer; the convolution weight obtained through their convolution operations is denoted V;
(2) the attention mechanism: after a convolution operation on V, global weighted pooling, a pointwise convolution and a Sigmoid activation are applied in turn to produce the spatial attention weight; finally the spatial attention weight re-weights V to realize the channel attention output, giving the spatially attention-weighted feature;
Specifically, 1), the residual small module of dilated convolution: a dilated convolution has a settable dilation rate d, meaning that (d-1) zeros are inserted between the elements of the convolution kernel, i.e. (d-1) pixels are skipped; setting different dilation rates therefore yields different receptive fields, i.e. multi-scale information is obtained; continuing the parameter definitions above, the effective kernel size of the dilated convolution is:
n = k + (k-1)*(d-1) (2)
so the width of the output feature map is:
W_out = (W - n + 2p)/s + 1 (3)
and the height is obtained in the same way;
dilated convolution can enlarge the receptive field freely without introducing extra parameters, but the overall computation of the algorithm grows with the resolution, so the rate cannot be increased blindly; dilated convolution also suffers from the gridding problem, namely information loss, in which remotely sampled information lacks correlation (particularly evident for small targets);
to enlarge the receptive field while reducing computation, the parameters of the dimension-reduction, dilated and dimension-raising convolutional layers are set respectively as: input dimension C, output dimension C/4, k = 1, s = 1; input dimension C/4, output dimension C/4, k = 3, s = 1, p = 2, d = 2; input dimension C/4, output dimension C, k = 1, s = 1;
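Formulas (2) and (3) can be sketched together; the helper names are hypothetical, and the numbers below use the module's middle layer (k = 3, d = 2, p = 2, s = 1):

```python
def dilated_kernel_size(k: int, d: int) -> int:
    """Effective kernel size of a dilated convolution, formula (2)."""
    return k + (k - 1) * (d - 1)

def dilated_out_size(w: int, k: int, d: int, s: int = 1, p: int = 0) -> int:
    """Output width of a dilated convolution, formula (3)."""
    n = dilated_kernel_size(k, d)
    return (w - n + 2 * p) // s + 1

# Middle layer of the residual small module: k=3, d=2 gives n=5,
# and padding 2 with stride 1 preserves the spatial size.
print(dilated_kernel_size(3, 2))          # -> 5
print(dilated_out_size(14, 3, 2, 1, 2))   # -> 14
```

With d = 1 the formulas fall back to the plain convolution case of formula (1).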
2) The attention mechanism proceeds as follows: let the input of the attention residual module be V ∈ R^(H×W×C), the learned residual mapping be V' ∈ R^(H×W×C), and the dimension-reduction factor be r; the output of the attention residual module is U ∈ R^(H×W×C); then:
U = V + β ⊙ V' (4)
where ⊙ denotes element-wise multiplication over the spatial domain; the spatial attention weight β ∈ R^(H×W) is produced as follows. First a convolution operation yields the convolution weight W1 ∈ R^(H×W×C/r); then global weighted pooling (GDC) is applied to the obtained feature map: if the number of groups in the convolution is G and the number of output feature maps is N, the effect of GDC is achieved when formulas (5) and (6) are satisfied:
G = N = C/r (5)
k = H = W (6)
that is, the number of groups and the number of output feature maps both equal the number of input feature maps, and the convolution kernel has the same spatial size as the input feature map; the learned convolution weight is W2 ∈ R^(1×1×C/r). GDC gives each position a learnable weight and at the same time regularizes the whole network structure over the spatial range, preventing overfitting. A pointwise convolution with kernel size 1×1×C/r is then applied to the output; this step combines W2 with weights along the depth direction to generate W3 ∈ R^(1×1×C). Finally the spatial attention weight is obtained through a Sigmoid activation, β = Sigmoid(W3 ∗ V), where W3 is the convolution weight and Sigmoid is the S-shaped growth curve. Finally β re-weights the input V of the attention residual module to realize the channel attention output, giving the spatially attention-weighted feature at the (i, j)-th element of the spatial domain:
Ũ_(i,j) = β_(i,j) ⊙ V_(i,j) (7)
where β_(i,j) and V_(i,j) denote the values of β and V at the (i, j)-th spatial element, and ⊙ denotes the element-wise multiplication between the (i, j)-th elements;
as a concrete embodiment, W = 14, C = 512 and r = 4 may be chosen.
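A minimal numpy sketch of formulas (4) and (7), under the embodiment's sizes (H = W = 14, C = 512): the learned conv + GDC + pointwise-convolution stack is collapsed into a single hypothetical weight vector `w3`, and all weights are random stand-ins rather than trained parameters of the patented network.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 14, 14, 512          # sizes from the embodiment (r = 4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V = rng.standard_normal((H, W, C))      # module input
V_res = rng.standard_normal((H, W, C))  # stands in for the learned residual V'

# Hypothetical stand-in for the conv + GDC + pointwise-conv stack:
# collapse the channel axis to one score per spatial position, then
# Sigmoid, giving the spatial attention weight beta in R^(H x W).
w3 = rng.standard_normal(C) / np.sqrt(C)
beta = sigmoid(V @ w3)                  # shape (H, W), values in (0, 1)

# Formula (4): re-weight the residual branch by beta per formula (7)
# and add it back onto the module input V.
U = V + beta[:, :, None] * V_res

print(U.shape, beta.shape)
```

The broadcast `beta[:, :, None] * V_res` is exactly the element-wise spatial multiplication ⊙ of formula (7), applied across all C channels.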
(1.3.2) The result is reduced in dimension by a convolutional layer and input into a fully connected layer; so that it can be superposed with the final result of branch two, it is finally reshaped to the same size as the output of branch two; specifically:
1) the features obtained in step (1.3.1) are reduced in dimension by a dimension-reduction convolution; as an example, the parameters are set to: input dimension C, output dimension C/2, k = 3, s = 1;
2) the features obtained in 1) are fed into a fully connected (FC) layer. The FC layer has different properties from an FCN: the FCN predicts each pixel from a local receptive field and shares parameters across spatial positions, whereas the FC layer is location-sensitive and realizes predictions at different spatial positions through different parameter sets. It therefore has the capacity to adapt to different spatial positions, which helps to predict at each spatial position using the global information of the whole proposal and to distinguish and identify the individual joint parts belonging to the same object; this is not only efficient but also allows more samples to train the parameters of the FC layer, avoiding overfitting and improving generality. As a concrete embodiment, the feature size used is 14×14, so the FC layer produces a 196×1×1 vector; to fuse this result with the output of branch two, the dimensions must be kept consistent, so the obtained vector is reshaped so that its dimensions match those of the output of branch two.
Further, the specific steps of inputting feature map I into branch two in step (1.3) are as follows: the first, second and third dilated convolutional layers arranged in branch two, which have different dilation rates, are taken as one combination; through the combination different receptive fields are obtained, and thereby multi-scale information; the specific parameter calculation is as follows:
let the size of the receptive field of the j-th layer be rf_j; then:
rf_j = (n-1)*j + 1 (8)
where rf_0 = 1; as shown in fig. 3, the layers from left to right are in a top-to-bottom relationship (convolution proceeds from left to right); the three convolution kernels all have k = 3 and d = 2, so n = 5 by formula (2), and the receptive field of the central pixel of the third (rightmost) layer is 13 by formula (8); however, only 75% of the pixels in that field actually take part in the computation. To prevent this problem, the design groups three convolutional layers together, each group uses continuously increasing dilation rates, and the other groups repeat the pattern; the goal is that the final receptive field fully covers the whole area (without any holes or missing edges); this requires:
M_i = Max[M_(i+1) - 2r_i, 2r_i - M_(i+1), r_i] (9)
where Max[a, b, c] takes the maximum of its arguments, M_i is the maximum dilation rate allowed at layer i, M_(i+1) is the maximum dilation rate at layer i+1, and r_i is the dilation rate of the i-th layer, with M_n = r_n for the last layer; the design goal is M_2 ≤ k;
assuming k = 3 and taking r = [1, 2, 5], for the second layer formula (9) gives:
M_2 = Max[M_3 - 2r_2, 2r_2 - M_3, r_2]
= Max[1, -1, 2] = 2 < 3
so the condition is satisfied; therefore, as described above, r = [1, 2, 5] may be selected as a group.
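Formulas (8) and (9) can be exercised with two small helpers (hypothetical names, written here for illustration); the rate sequence [1, 2, 5] from the example passes the no-gridding check, while a common-factor sequence such as [2, 4, 8] does not:

```python
def receptive_field(n: int, j: int) -> int:
    """Formula (8): receptive field of layer j when every layer has
    effective kernel size n (rf_0 = 1)."""
    return (n - 1) * j + 1

def hdc_ok(rates, k):
    """No-gridding condition of formula (9): compute M_i downward from
    M_n = r_n via M_i = max(M_{i+1} - 2 r_i, 2 r_i - M_{i+1}, r_i)
    and require M_2 <= k."""
    M = rates[-1]
    for r in reversed(rates[1:-1]):
        M = max(M - 2 * r, 2 * r - M, r)
    return M <= k

# Three k=3, d=2 layers: n = 5, so the third-layer field is 13.
print(receptive_field(5, 3))    # -> 13
print(hdc_ok([1, 2, 5], k=3))   # -> True: rates 1,2,5 leave no holes
print(hdc_ok([2, 4, 8], k=3))   # -> False: common-factor rates grid
```

For [1, 2, 5] the loop reproduces the example: M_3 = 5, M_2 = max(1, -1, 2) = 2 ≤ 3.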
Further, the output parameters of the two branches are superposed to obtain feature map II; resolution is restored on feature map II with a deconvolution layer, bilinear interpolation upsampling generates the high-resolution output, and the human joint point positions are modeled as one-hot binary masks to obtain the joint point information;
specifically, 1), the two branches are fused by addition, further fusing the features; combining the predictions of the two fields of view improves the diversity of the network output and the quality of the output mask, giving better joint point predictions;
2) the fused features described in 1) first undergo resolution restoration by deconvolution, yielding a size written as width × height × dimension, i.e. W×H×K (for example 28×28×17), and then bilinear interpolation upsampling generates a high-resolution output of 2W×2H×K (for example 56×56×17);
3) the joint point positions are modeled as one-hot binary masks: using the 2W×2H×K high-resolution output described in 2), a one-hot M×M (for example 56×56) binary mask is made for each of the K joint points of the instance, with only one pixel in each mask labeled as foreground; training then yields the K joint points;
in addition, during training, for each labeled ground-truth joint point, the cross-entropy loss over the softmax of the M^2 locations is minimized (this encourages the detection of a single point); the K joint points are still treated independently, each corresponding to one joint type (e.g. eye, left shoulder).
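The one-hot mask target and the cross-entropy over all M^2 softmax locations can be sketched for a single joint point; the helper names and the random logits are illustrative stand-ins, with M = 56 as in the embodiment:

```python
import numpy as np

M = 56                              # mask resolution (56 x 56 in the embodiment)

def one_hot_mask(y: int, x: int, m: int = M) -> np.ndarray:
    """Binary mask with exactly one foreground pixel at the keypoint."""
    mask = np.zeros((m, m))
    mask[y, x] = 1.0
    return mask

def keypoint_loss(logits: np.ndarray, y: int, x: int) -> float:
    """Cross-entropy over the softmax of all M^2 locations for one keypoint."""
    z = logits.reshape(-1)
    z = z - z.max()                 # numerical stability before exp
    log_softmax = z - np.log(np.exp(z).sum())
    return float(-log_softmax[y * logits.shape[1] + x])

rng = np.random.default_rng(2)
logits = rng.standard_normal((M, M))    # one keypoint's predicted mask logits
gt_y, gt_x = 20, 33                     # hypothetical ground-truth location

loss = keypoint_loss(logits, gt_y, gt_x)
pred = np.unravel_index(logits.argmax(), logits.shape)  # predicted keypoint
print(loss, pred)
```

Because the softmax spans all M^2 positions jointly, minimizing this loss pushes exactly one location toward probability 1, matching the single-foreground-pixel target; each of the K joint types gets its own independent mask and loss.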
Through the above steps, the K (for example 17) joint points to be detected can finally be calibrated unambiguously, so that the problem of joint point detection confusion is effectively resolved in simple scenes, the accuracy and efficiency are improved, and the method adapts well to complex scenes.
Finally, it should be understood that the embodiments described herein merely illustrate the principles of the invention; other variations are possible within its scope; thus, by way of example and not limitation, alternative configurations of the embodiments may be regarded as consistent with the teachings of the invention; accordingly, the invention is not limited to the embodiments explicitly described and depicted.

Claims (3)

1. A human body key point detection method based on an attention residual module and branch fusion, characterized by comprising the following steps:
step (1.1), performing feature extraction on an input picture with a feature extraction network to obtain a feature map; inputting the feature map into a region proposal network to obtain target proposal boxes, and then performing a region pooling operation in combination with the feature map to obtain a region-of-interest feature map;
step (1.2), inputting the obtained region-of-interest feature map into a convolutional layer for a feature extraction operation, and recording the result as feature map one;
step (1.3), inputting feature map one into branch one and branch two respectively for feature processing;
the specific steps of branch one for processing feature map one are as follows:
(1.3.1) arranging two identical attention residual modules at the input of branch one, connecting front and rear layers of the network through a data bypass (skip connection), connecting the two attention residual modules pairwise and superposing them at the pixel level, so that each module in the network receives, in a cascaded manner, the feature maps of all preceding modules;
(1.3.2), after dimensionality reduction by a convolutional layer, inputting the result into a fully connected layer; finally, reshaping the output to obtain a feature map of the same size as feature map one;
branch two performs feature processing on feature map one as follows:
taking the first, second and third dilated convolution layers arranged in branch two, which have different dilation rates, as a combination; different receptive fields are obtained through this combination, thereby capturing multi-scale information;
and step (1.4), superposing the results of feature map one processed by branch one and branch two, and recording the result as feature map two; performing deconvolution on feature map two for upsampling, and finally obtaining joint point information through a one-hot binary mask.
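As context for the dilated convolutions of branch two in claim 1, the receptive-field growth they provide can be sketched with standard convolution arithmetic. This is an illustrative aid, not part of the claims: the 3×3 kernel size and the dilation rates 1, 2 and 4 below are assumptions, since the claims only require that the three rates differ.

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation rate d."""
    return d * (k - 1) + 1

def stacked_receptive_field(kernels, dilations):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernels, dilations):
        rf += effective_kernel(k, d) - 1
    return rf

# Three plain 3x3 convolutions versus three dilated 3x3 convolutions
# with (assumed) dilation rates 1, 2, 4: the dilated stack sees a much
# larger context at the same parameter count.
plain = stacked_receptive_field([3, 3, 3], [1, 1, 1])    # -> 7
dilated = stacked_receptive_field([3, 3, 3], [1, 2, 4])  # -> 15
```

Because each layer in the stack has a different receptive field, their combined responses carry the multi-scale information that branch two contributes before the two branches are superposed.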
2. The human body key point detection method based on an attention residual module and branch fusion according to claim 1, wherein in step (1.3.1) the attention residual module is composed of a dilated-convolution residual sub-module combined with an attention mechanism:
wherein the dilated-convolution residual sub-module comprises three convolutional layers, namely a dimension-reduction convolutional layer, a dilated convolutional layer and a dimension-raising convolutional layer; the feature obtained through the convolution operations of these three layers is denoted V;
the attention mechanism comprises the following specific steps: after a convolution operation is performed on V through a convolutional layer, global weighted pooling, a 1×1 (dot-product) convolution and a sigmoid function are applied in sequence, and the spatial attention weights are obtained through the network; finally, the spatial attention weights are applied to V to realize the channel-attention output, obtaining the spatially attention-weighted feature.
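The weighting step of claim 2 can be illustrated with a deliberately simplified, single-channel analogue: the global-pooling and 1×1-convolution stages are collapsed into a subtraction of the global mean (a hypothetical stand-in), and only the sigmoid gating and pixel-level multiplication of the claims are kept.

```python
import math

def sigmoid(x):
    """S-shaped growth curve (sigmoid) used to squash attention scores to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def spatial_attention(v):
    """Toy spatial attention over a 2D feature map `v` (list of rows).

    Each position's score is its value minus the global mean (a stand-in
    for the claimed global-pooling + 1x1-convolution steps); the sigmoid
    of the score becomes that position's attention weight, and the output
    is the element-wise (pixel-level) weighted feature V * weight.
    """
    flat = [x for row in v for x in row]
    mean = sum(flat) / len(flat)
    return [[sigmoid(x - mean) * x for x in row] for row in v]

v = [[0.0, 2.0],
     [4.0, 6.0]]
out = spatial_attention(v)
# Strong activations keep weights near 1; weak ones are suppressed toward 0.
```

In the actual module the weights would be learned by the network rather than derived from the mean; the sketch only shows how sigmoid gating re-scales each spatial position of V.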
3. The human body key point detection method based on an attention residual module and branch fusion according to any one of claims 1-2, wherein the output parameters of the two branches are superposed to obtain feature map two; finally, the positions of the human body joint points are modeled as one-hot binary masks to obtain the joint point information.
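The one-hot binary mask modelling of claims 1 and 3 can be sketched as follows: for each joint, the network's response map is reduced to a mask that is 1 at exactly one position (the predicted joint location) and 0 elsewhere. The heatmap values below are made up for illustration.

```python
def one_hot_mask(heatmap):
    """Convert a 2D response map into a one-hot binary mask.

    The mask is 1 at the single highest-response position (the predicted
    joint location) and 0 everywhere else, matching the claims' modelling
    of each human body joint as a one-hot binary mask.
    """
    h, w = len(heatmap), len(heatmap[0])
    best = max(range(h * w), key=lambda i: heatmap[i // w][i % w])
    by, bx = best // w, best % w
    return [[1 if (r, c) == (by, bx) else 0 for c in range(w)]
            for r in range(h)]

heat = [[0.1, 0.3, 0.2],
        [0.4, 0.9, 0.5],
        [0.2, 0.1, 0.0]]
mask = one_hot_mask(heat)  # 1 only at row 1, column 1
```

Reading off the coordinates of the single 1 in each mask then yields the joint point information for every detected keypoint.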
CN202010410104.5A 2020-05-15 2020-05-15 Human body key point detection method based on attention residual error module and branch fusion Active CN111626159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010410104.5A CN111626159B (en) 2020-05-15 2020-05-15 Human body key point detection method based on attention residual error module and branch fusion


Publications (2)

Publication Number Publication Date
CN111626159A CN111626159A (en) 2020-09-04
CN111626159B true CN111626159B (en) 2022-07-26

Family

ID=72271858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010410104.5A Active CN111626159B (en) 2020-05-15 2020-05-15 Human body key point detection method based on attention residual error module and branch fusion

Country Status (1)

Country Link
CN (1) CN111626159B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653899B (en) * 2020-12-18 2022-07-12 北京工业大学 Network live broadcast video feature extraction method based on joint attention ResNeSt under complex scene
CN112733672A (en) * 2020-12-31 2021-04-30 深圳一清创新科技有限公司 Monocular camera-based three-dimensional target detection method and device and computer equipment
CN112784856A (en) * 2021-01-29 2021-05-11 长沙理工大学 Channel attention feature extraction method and identification method of chest X-ray image
CN113012229A (en) * 2021-03-26 2021-06-22 北京华捷艾米科技有限公司 Method and device for positioning human body joint points
CN113269077B (en) * 2021-05-19 2023-04-07 青岛科技大学 Underwater acoustic communication signal modulation mode identification method based on improved gating network and residual error network
CN115019338B (en) * 2022-04-27 2023-09-22 淮阴工学院 Multi-person gesture estimation method and system based on GAMHR-Net
CN114783065B (en) * 2022-05-12 2024-03-29 大连大学 Parkinsonism early warning method based on human body posture estimation
CN115546779B (en) * 2022-11-26 2023-02-07 成都运荔枝科技有限公司 Logistics truck license plate recognition method and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516670A (en) * 2019-08-26 2019-11-29 广西师范大学 Suggested based on scene grade and region from the object detection method for paying attention to module
CN111047515A (en) * 2019-12-29 2020-04-21 兰州理工大学 Cavity convolution neural network image super-resolution reconstruction method based on attention mechanism


Also Published As

Publication number Publication date
CN111626159A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN111626159B (en) Human body key point detection method based on attention residual error module and branch fusion
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN109377530B (en) Binocular depth estimation method based on depth neural network
CN111047548B (en) Attitude transformation data processing method and device, computer equipment and storage medium
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN111582483B (en) Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN111145131A (en) Infrared and visible light image fusion method based on multi-scale generation type countermeasure network
CN113283525B (en) Image matching method based on deep learning
CN107767419A (en) A kind of skeleton critical point detection method and device
CN113160375B (en) Three-dimensional reconstruction and camera pose estimation method based on multi-task learning algorithm
CN106886986B (en) Image interfusion method based on adaptive group structure sparse dictionary study
CN109993103A Human behavior recognition method based on point cloud data
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN112767467B (en) Double-image depth estimation method based on self-supervision deep learning
CN113792641B (en) High-resolution lightweight human body posture estimation method combined with multispectral attention mechanism
CN115359372A (en) Unmanned aerial vehicle video moving object detection method based on optical flow network
CN110660020A (en) Image super-resolution method of countermeasure generation network based on fusion mutual information
CN111833400B (en) Camera pose positioning method
CN112084934A (en) Behavior identification method based on two-channel depth separable convolution of skeletal data
CN114663509A (en) Self-supervision monocular vision odometer method guided by key point thermodynamic diagram
CN116934592A (en) Image stitching method, system, equipment and medium based on deep learning
CN114170290A (en) Image processing method and related equipment
Zhou et al. PADENet: An efficient and robust panoramic monocular depth estimation network for outdoor scenes
CN111539288B (en) Real-time detection method for gestures of both hands
CN112419387B (en) Unsupervised depth estimation method for solar greenhouse tomato plant image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant