CN112949498A - Target key point detection method based on heterogeneous convolutional neural network - Google Patents
Target key point detection method based on heterogeneous convolutional neural network
- Publication number
- CN112949498A
- Authority
- CN
- China
- Prior art keywords
- convolution
- target
- heterogeneous
- key point
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a target key point detection method based on a heterogeneous convolutional neural network. ResNet-50 is used as the backbone network, and the standard 3 × 3 convolutions in its bottleneck blocks are replaced by heterogeneous convolutions. An atrous spatial pyramid pooling (ASPP) layer is added after the last layer of the backbone network. Finally, feature pyramid fusion is performed on the feature maps with resolutions of 8 × 8, 16 × 16, 32 × 32 and 64 × 64, a feature map with a resolution of 64 × 64 and 16 channels is output, a detection heat map is generated using a Gaussian kernel function, and the pose estimation result is output. The model thus combines an atrous spatial pyramid pooling layer with a feature pyramid fusion module to form a new lightweight target key point detection algorithm that can perform multi-target pose estimation on pictures of any size.
Description
Technical Field
The invention belongs to the technical field of computer vision and digital image processing, and particularly relates to a target key point detection method based on a heterogeneous convolutional neural network.
Background
Human pose estimation is a fundamental problem in human behavior recognition, and the resulting skeleton structure provides high-level semantics for human action recognition. Human pose estimation itself has many practical applications, for example: standardizing sports movements, posture correction, virtual reality games, video surveillance, robot motion control and the like.
Existing human pose estimation methods fall into two types: bottom-up and top-down. The present method is top-down; existing top-down methods have relatively high recall and accuracy. However, they maximize model accuracy at the cost of more parameters and more floating point operations. Human pose estimation is deployed in many practical applications, some of which run on mobile phones and microcomputers. Because the storage capacity and computing power of such devices are limited, reducing the parameter count and floating point operations of the model is one of the important requirements for improving human pose estimation.
To address the large parameter count and floating point operation count of such models, the method combines standard convolution and grouped convolution into a heterogeneous convolution, which reduces the parameters and floating point operations of the model while preserving accuracy and receptive field.
Disclosure of Invention
The invention relates to a target key point detection algorithm based on a heterogeneous convolutional neural network, characterized in that a heterogeneous convolution is proposed on the basis of standard convolution and grouped convolution. The network model is divided into three parts: a backbone network, an atrous spatial pyramid pooling (ASPP) part and a feature pyramid module. ResNet-50 is used as the backbone network, and the standard 3 × 3 convolutions in its bottleneck blocks are replaced by heterogeneous convolutions; an atrous spatial pyramid pooling layer is added after the last layer of the backbone network; finally, feature pyramid fusion is performed on the feature maps with resolutions of 8 × 8, 16 × 16, 32 × 32 and 64 × 64, a feature map with a resolution of 64 × 64 and 16 channels is output, a detection heat map is generated using a Gaussian kernel function, and the pose estimation result is output.
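The Gaussian heat map mentioned above can be illustrated with a minimal sketch. The text does not give the kernel width or the exact normalization, so `sigma` and the unit peak value here are assumptions, and the function name `gaussian_heatmap` is hypothetical:

```python
import math


def gaussian_heatmap(cx, cy, size=64, sigma=2.0):
    """Return a size x size heat map (list of rows) peaking at (cx, cy).

    sigma is an assumed kernel width; the source does not state it.
    """
    heatmap = []
    for y in range(size):
        row = []
        for x in range(size):
            # Squared distance from this pixel to the key point location.
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        heatmap.append(row)
    return heatmap


# One 64 x 64 map for a key point at column 20, row 30.
hm = gaussian_heatmap(cx=20, cy=30)
```

One such map would be produced per key point, so a 16-channel output corresponds to 16 maps of this kind.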
The invention mainly provides a heterogeneous convolution that combines standard convolution and grouped convolution, and the model uses an atrous spatial pyramid pooling layer and a feature pyramid fusion module. A new lightweight target key point detection algorithm is thereby constructed, which can perform multi-target pose estimation (human key point detection) on pictures of any size. The method mainly comprises the following steps:
Step 1: Input a picture, detect the targets in the picture using a trained Faster R-CNN target detector, obtain the coordinates of each target box, and store the coordinates in a data structure.
Step 2: Using the coordinates obtained by the target detector in step 1, enlarge each detected target box by 20% and crop it out separately.
Step 3: Input each single target crop from step 2 into the heterogeneous convolutional neural network for target key point detection.
Step 3-1: Resize the single target image to a resolution of 256 × 256.
Step 3-2: Extract features using a ResNet50 network with a dilation rate of 2 as the backbone network, so that the resolution of the feature map is 8 × 8. In this network, a grouped convolution with 4 groups replaces each 3 × 3 standard convolution with a stride of 1, and a grouped convolution with 16 groups replaces each 1 × 1 standard convolution with a stride of 1, so that the parameter count and floating point operations of the model are reduced while a large receptive field is retained.
The parameter count of a standard convolution is:
Scp = N × C × K × K
The parameter count of a grouped convolution is:
The parameter count of the heterogeneous convolution is:
where N is the number of channels of the input feature map, C is the number of channels of the output feature map, G is the number of groups of the grouped convolution, and K, K1 and K2 are convolution kernel sizes. These counts satisfy:
Gcp ≤ SHp ≤ Scp
Compared with grouped convolution, heterogeneous convolution effectively integrates the channels of the feature map; compared with standard convolution, it improves the receptive field of the model.
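To make the scale of the saving concrete, the following sketch counts weights for a standard convolution and for a grouped convolution using the symbols N, C, K and G defined above. The grouped count uses the usual textbook relation (each group connects N/G inputs to C/G outputs); the heterogeneous convolution is not modeled because its formula is not reproduced in the text. Bias terms are ignored, and the channel width of 256 is a hypothetical example value:

```python
def standard_conv_params(n_in, c_out, k):
    """Weights of a standard convolution: N x C x K x K (bias ignored)."""
    return n_in * c_out * k * k


def grouped_conv_params(n_in, c_out, k, groups):
    """Weights of a grouped convolution with `groups` groups (bias ignored).

    Each group maps n_in/groups input channels to c_out/groups outputs.
    """
    if n_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return groups * (n_in // groups) * (c_out // groups) * k * k


# Hypothetical layer width, chosen only for illustration.
N = C = 256
std_3x3 = standard_conv_params(N, C, 3)        # 589824
grp_3x3 = grouped_conv_params(N, C, 3, 4)      # 147456
std_1x1 = standard_conv_params(N, C, 1)        # 65536
grp_1x1 = grouped_conv_params(N, C, 1, 16)     # 4096
```

With these example widths, grouping by 4 cuts the 3 × 3 weights to a quarter, and grouping by 16 cuts the 1 × 1 weights to a sixteenth.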
Step 3-3: Apply atrous spatial pyramid pooling to the 8 × 8 feature map obtained in step 3-2.
Step 3-4: Up-sample the feature map obtained in step 3-3 three times to obtain a feature map with a resolution of 64 × 64.
Step 3-4: Using skip connections, concatenate the 16 × 16, 32 × 32 and 64 × 64 feature maps produced by the backbone network in step 3-2 with the up-sampled feature maps of the corresponding resolutions obtained in step 3-3.
Step 3-5: Use a 1 × 1 convolution to adjust the number of channels of the 64 × 64 feature map obtained in step 3-4 to the number of key points in the data set, thereby outputting the coordinates of the corresponding key points.
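The text does not say how coordinates are read out of the per-key-point output maps. One common choice, assumed here purely for illustration, is to take the location of the maximum response in each map. The function name `decode_keypoints` is hypothetical:

```python
def decode_keypoints(heatmaps):
    """Return one (x, y, score) tuple per heat map using a plain argmax.

    heatmaps: list of 2-D maps, each given as a list of rows.
    Argmax decoding is an assumption, not a detail stated in the source.
    """
    keypoints = []
    for hm in heatmaps:
        best = (0, 0, hm[0][0])
        for y, row in enumerate(hm):
            for x, value in enumerate(row):
                if value > best[2]:
                    best = (x, y, value)
        keypoints.append(best)
    return keypoints


# Toy 3 x 3 map standing in for one 64 x 64 output channel.
toy = [[[0.0, 0.1, 0.0],
        [0.2, 0.9, 0.1],
        [0.0, 0.3, 0.0]]]
decoded = decode_keypoints(toy)
```

The returned coordinates are in heat-map pixels and still have to be mapped back to the original picture.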
During training, the network is optimized iteratively by stochastic gradient descent. The loss function used is the mean square error loss function:
where m is the number of key points, yi denotes the coordinates of the annotated ground-truth key points, the corresponding model output denotes the predicted key point coordinates, n is the number of training samples in each batch, and i is the index of the current key point.
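A mean square error over key point coordinates can be sketched as follows. The loss formula itself is not reproduced in the text, so averaging over both the n samples and the m key points is an assumption, as is the function name `mse_loss`:

```python
def mse_loss(pred, target):
    """Mean squared error over a batch of key point coordinates.

    pred, target: n samples, each a list of m (x, y) pairs.
    Dividing by n * m is an assumed normalization.
    """
    n = len(pred)
    m = len(pred[0])
    total = 0.0
    for pred_sample, target_sample in zip(pred, target):
        for (px, py), (tx, ty) in zip(pred_sample, target_sample):
            # Squared Euclidean distance between prediction and label.
            total += (px - tx) ** 2 + (py - ty) ** 2
    return total / (n * m)


# One sample with two key points; only the second is off, by 2 pixels in y.
loss = mse_loss([[(1.0, 2.0), (3.0, 4.0)]],
                [[(1.0, 2.0), (3.0, 6.0)]])
```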
Step 4: Map the detected key points of each single human target back onto the picture from step 1, thereby obtaining the multi-person pose estimation result.
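Steps 2 and 4 are mostly coordinate bookkeeping, which the sketch below illustrates. Several details are assumptions: the 20% enlargement is applied to both width and height and split evenly between the two sides, the enlarged box is clipped to the picture, and the crop is stretched to the network input without preserving aspect ratio. The names `expand_box` and `to_image_coords` are hypothetical:

```python
def expand_box(x1, y1, x2, y2, img_w, img_h, ratio=0.2):
    """Enlarge a box by `ratio` of its size, clipped to the image bounds."""
    w, h = x2 - x1, y2 - y1
    dx, dy = w * ratio / 2.0, h * ratio / 2.0
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(float(img_w), x2 + dx), min(float(img_h), y2 + dy))


def to_image_coords(kx, ky, box, map_size=64):
    """Map a key point from map_size x map_size space back to the picture."""
    x1, y1, x2, y2 = box
    return (x1 + kx * (x2 - x1) / map_size,
            y1 + ky * (y2 - y1) / map_size)


# A detection at (100, 100)-(200, 300) in a 640 x 480 picture.
box = expand_box(100, 100, 200, 300, img_w=640, img_h=480)
# The centre of the 64 x 64 output map lands at the centre of the box.
point = to_image_coords(32, 32, box)
```

Running this mapping for every detected box places all single-target results in the coordinate frame of the original picture.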
The invention provides a heterogeneous convolution based on standard convolution and grouped convolution. Compared with a standard convolution, it significantly reduces the number of floating point operations while keeping a receptive field of the same size. Compared with grouped convolution, it effectively integrates the convolution channels and improves the accuracy of the model. The proposed method is verified on the MPII data set. The experimental results show that, using a backbone network in which standard convolution is replaced by heterogeneous convolution and adding an atrous spatial pyramid pooling layer and a feature pyramid fusion module, the accuracy is improved by 1.2% and the floating point operations are reduced by 72.18% compared with the original ResNet-50 method.
Drawings
FIG. 1 is a diagram of the heterogeneous convolutional neural network model based on ResNet50
FIG. 2 is a structural diagram of grouped convolution, standard convolution and heterogeneous convolution
FIG. 3 shows the human pose estimation detection results
Detailed Description
The superiority of the invention over other algorithms is examined below with reference to examples.
We trained the model on the training set of the MPII data set and tested the validity of the algorithm on the MPII validation set. The experimental environment was Ubuntu 18.04.3 LTS, Intel(R) Xeon(R) Silver 4110 CPU @ 2.10 GHz × 32, 64 GB of memory and an RTX 2080 Ti graphics card; the software platform was CUDA 10.0.130, cuDNN 7.5, PyTorch 1.4 and Python 3.6.
During training, the batch size is set to 64 and the image resolution is set to 256 × 256. The initial learning rate is 0.001; the learning rate is reduced by 10% at the 170th and 200th epochs, and training runs for 210 epochs in total.
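The step schedule described above can be written as a small function. The default `factor=0.9` follows the literal wording "reduced by 10%"; if the intended meaning is a reduction to one tenth, `factor=0.1` would be used instead. Zero-based epoch numbering and the function name `learning_rate` are also assumptions:

```python
def learning_rate(epoch, base_lr=0.001, milestones=(170, 200), factor=0.9):
    """Learning rate in effect at a given (zero-based) epoch.

    The rate is multiplied by `factor` once for every milestone reached.
    """
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= factor
    return lr


# Rates for the three phases of a 210-epoch run.
phases = [learning_rate(e) for e in (0, 170, 200)]
```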
To verify the accuracy and efficiency of the improved algorithm, we compared it with pose estimation networks using ResNet18 and ResNet50. Experimental results show that the method reduces model parameters and floating point operations while maintaining accuracy. The results are shown in Table 1.
TABLE 1 Comparison of results on the MPII data set
Claims (3)
1. A target key point detection method based on a heterogeneous convolutional neural network, characterized by comprising the following steps:
step 1: inputting a picture, detecting the targets in the picture using a trained Faster R-CNN target detector, obtaining the coordinates of each target box, and storing the coordinates in a data structure;
step 2: using the coordinates obtained by the target detector in step 1, enlarging each detected target box by 20% and cropping out the enlarged target box separately;
step 3: inputting each single target crop obtained in step 2 into the heterogeneous convolutional neural network for target key point detection;
step 4: mapping the detected key points of each single human target back onto the picture from step 1, thereby obtaining the multi-person pose estimation result.
2. The target key point detection method based on a heterogeneous convolutional neural network as claimed in claim 1, wherein step 3 specifically comprises the following steps:
step 3-1: resizing the single target image to a resolution of 256 × 256;
step 3-2: extracting features using a ResNet50 network with a dilation rate of 2 as the backbone network, so that the resolution of the feature map is 8 × 8, replacing each 3 × 3 standard convolution with a stride of 1 by a grouped convolution with 4 groups, and replacing each 1 × 1 standard convolution with a stride of 1 by a grouped convolution with 16 groups;
the parameter count of a standard convolution is:
Scp = N × C × K × K
the parameter count of a grouped convolution is:
the parameter count of the heterogeneous convolution is:
where N is the number of channels of the input feature map, C is the number of channels of the output feature map, G is the number of groups of the grouped convolution, and K, K1 and K2 are convolution kernel sizes;
Gcp ≤ SHp ≤ Scp
compared with grouped convolution, heterogeneous convolution effectively integrates the channels of the feature map, and compared with standard convolution, heterogeneous convolution improves the receptive field of the model;
step 3-3: applying atrous spatial pyramid pooling to the 8 × 8 feature map obtained in step 3-2;
step 3-4: up-sampling the feature map obtained in step 3-3 three times to obtain a feature map with a resolution of 64 × 64;
step 3-4: using skip connections, concatenating the 16 × 16, 32 × 32 and 64 × 64 feature maps produced by the backbone network in step 3-2 with the up-sampled feature maps of the corresponding resolutions obtained in step 3-3;
step 3-5: using a 1 × 1 convolution to adjust the number of channels of the 64 × 64 feature map obtained in step 3-4 to the number of key points in the data set, thereby outputting the coordinates of the corresponding key points.
3. The target key point detection method based on a heterogeneous convolutional neural network as claimed in claim 2, wherein: during training, the network is optimized iteratively by stochastic gradient descent; the loss function used is the mean square error loss function:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110242260.XA CN112949498B (en) | 2021-03-04 | 2021-03-04 | Target key point detection method based on heterogeneous convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112949498A (en) | 2021-06-11 |
CN112949498B (en) | 2023-11-14 |
Family
ID=76247778
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202110242260.XA | Active, CN112949498B (en) | 2021-03-04 | 2021-03-04 | Target key point detection method based on heterogeneous convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112949498B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762137A (en) * | 2021-09-02 | 2021-12-07 | 甘肃同兴智能科技发展有限责任公司 | Remote sensing image forest region carbon sink calculation method based on deep learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190303762A1 (en) * | 2018-03-30 | 2019-10-03 | Xilinx, Inc. | Methods of optimization of computational graphs of neural networks |
US20190311249A1 (en) * | 2018-04-04 | 2019-10-10 | Megvii (Beijing) Technology Co., Ltd. | Image processing method, image processing apparatus, and computer-readable storage medium |
CN111144209A (en) * | 2019-11-25 | 2020-05-12 | 浙江工商大学 | Monitoring video head detection method based on heterogeneous multi-branch deep convolutional neural network |
CN111160085A (en) * | 2019-11-19 | 2020-05-15 | 天津中科智能识别产业技术研究院有限公司 | Human body image key point posture estimation method |
CN111402226A (en) * | 2020-03-13 | 2020-07-10 | 浙江工业大学 | Surface defect detection method based on cascade convolution neural network |
CN111738111A (en) * | 2020-06-10 | 2020-10-02 | 杭州电子科技大学 | Road extraction method of high-resolution remote sensing image based on multi-branch cascade void space pyramid |
CN111738237A (en) * | 2020-04-29 | 2020-10-02 | 上海海事大学 | Target detection method of multi-core iteration RPN based on heterogeneous convolution |
CN112184752A (en) * | 2020-09-08 | 2021-01-05 | 北京工业大学 | Video target tracking method based on pyramid convolution |
Non-Patent Citations (6)
Title |
---|
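SHUO CHANG et al.: "Siamese Feature Pyramid Network for Visual Tracking", 2019 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS IN CHINA (ICCC WORKSHOPS), pages 164 - 168 * |
卢鑫; 田莹: "Face liveness detection based on deep learning", Journal of University of Science and Technology Liaoning, vol. 42, no. 05, pages 389 - 397 * |
尹晓杰 et al.: "A survey of feature extraction algorithms in human action recognition", Computer Science, vol. 46, no. 10, pages 157 - 160 * |
徐晗智; 艾中良; 张志超: "A lightweight target detection network based on channel shuffling", Computer and Modernization, no. 02, pages 92 - 96 * |
王桢: "Research on person search with joint detection and re-identification and its embedded implementation", China Master's Theses Full-text Database, Information Science and Technology, no. 2, pages 138 - 1302 * |
胡鑫欣: "Image segmentation based on deep neural networks and its application in visual assistance for the blind", China Master's Theses Full-text Database, Information Science and Technology, no. 2, pages 138 - 2233 * |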
Also Published As
Publication number | Publication date |
---|---|
CN112949498B (en) | 2023-11-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111126472B (en) | SSD-based improved target detection method | |
CN111354017B (en) | Target tracking method based on twin neural network and parallel attention module | |
CN108596258B (en) | Image classification method based on convolutional neural network random pooling | |
CN110598788B (en) | Target detection method, target detection device, electronic equipment and storage medium | |
CN112967341B (en) | Indoor visual positioning method, system, equipment and storage medium based on live-action image | |
CN112381061B (en) | Facial expression recognition method and system | |
CN113869282B (en) | Face recognition method, hyper-resolution model training method and related equipment | |
CN109948457B (en) | Real-time target recognition method based on convolutional neural network and CUDA acceleration | |
CN111967471A (en) | Scene text recognition method based on multi-scale features | |
CN113011253B (en) | Facial expression recognition method, device, equipment and storage medium based on ResNeXt network | |
CN112131959A (en) | 2D human body posture estimation method based on multi-scale feature reinforcement | |
CN114627502A (en) | Improved YOLOv 5-based target recognition detection method | |
CN113837275B (en) | Improved YOLOv3 target detection method based on expanded coordinate attention | |
CN113052185A (en) | Small sample target detection method based on fast R-CNN | |
CN112036475A (en) | Fusion module, multi-scale feature fusion convolutional neural network and image identification method | |
CN112966574A (en) | Human body three-dimensional key point prediction method and device and electronic equipment | |
CN111274999A (en) | Data processing method, image processing method, device and electronic equipment | |
CN113298032A (en) | Unmanned aerial vehicle visual angle image vehicle target detection method based on deep learning | |
CN111507184B (en) | Human body posture detection method based on parallel cavity convolution and body structure constraint | |
CN116052025A (en) | Unmanned aerial vehicle video image small target tracking method based on twin network | |
CN112949498A (en) | Target key point detection method based on heterogeneous convolutional neural network | |
CN108876776B (en) | Classification model generation method, fundus image classification method and device | |
CN111222534A (en) | Single-shot multi-frame detector optimization method based on bidirectional feature fusion and more balanced L1 loss | |
CN112149518A (en) | Pine cone detection method based on BEGAN and YOLOV3 models | |
CN113327227B (en) | MobileNetV3-based wheat head rapid detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||