CN112949498A - Target key point detection method based on heterogeneous convolutional neural network - Google Patents


Info

Publication number
CN112949498A
Authority
CN
China
Prior art keywords
convolution
target
heterogeneous
key point
network
Prior art date
Legal status
Granted
Application number
CN202110242260.XA
Other languages
Chinese (zh)
Other versions
CN112949498B (en)
Inventor
何宁
尹晓杰
于海港
Current Assignee
Beijing Union University
Original Assignee
Beijing Union University
Priority date
2021-03-04
Filing date
2021-03-04
Publication date
2021-06-11
Application filed by Beijing Union University
Priority to CN202110242260.XA
Publication of CN112949498A
Application granted
Publication of CN112949498B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target key point detection method based on a heterogeneous convolutional neural network. ResNet-50 is used as the backbone network, and the standard 3 × 3 convolutions in its bottleneck blocks are replaced with heterogeneous convolutions. An atrous spatial pyramid pooling (ASPP) layer is added after the last layer of the backbone. Finally, feature pyramid fusion is performed on the feature maps with resolutions of 8 × 8, 16 × 16, 32 × 32 and 64 × 64, producing a 64 × 64 feature map with 16 channels; detection heatmaps are generated with a Gaussian kernel function, and the pose estimation result is output. By combining the ASPP layer with the feature pyramid fusion module, the method forms a new lightweight target key point detection algorithm that can perform multi-target pose estimation on images of any size.

Description

Target key point detection method based on heterogeneous convolutional neural network
Technical Field
The invention belongs to the technical field of computer vision and digital image processing, and in particular relates to a target key point detection method based on a heterogeneous convolutional neural network.
Background
Human pose estimation is a fundamental problem in human behavior recognition, and the skeleton structure it produces provides high-level semantics for human action recognition. Pose estimation itself also has many real-world applications, such as checking the correctness of sports movements, posture correction, virtual reality games, video surveillance and robot motion control.
Existing human pose estimation methods fall into two categories: bottom-up methods and top-down methods. This invention adopts the top-down approach, because existing top-down methods achieve relatively high recall and accuracy. However, efforts to maximize accuracy have also increased the number of model parameters and floating-point operations (FLOPs). Many practical applications of human pose estimation are deployed on mobile phones and microcomputers whose storage and computing power are limited, so reducing a model's parameters and FLOPs is an important requirement for improving human pose estimation.
To address the large number of parameters and FLOPs, the method combines standard convolution with grouped convolution into a heterogeneous convolution, which reduces the model's parameters and FLOPs while preserving accuracy and receptive field.
Disclosure of Invention
The invention relates to a target key point detection algorithm based on a heterogeneous convolutional neural network, characterized by a heterogeneous convolution built from standard convolution and grouped convolution. The network model is divided into three parts: the backbone network, the atrous spatial pyramid pooling (ASPP) module and the feature pyramid module. ResNet-50 serves as the backbone network, and the standard 3 × 3 convolutions in its bottleneck blocks are replaced with heterogeneous convolutions; an ASPP layer is added after the last layer of the backbone; finally, feature pyramid fusion is performed on the feature maps with resolutions of 8 × 8, 16 × 16, 32 × 32 and 64 × 64, producing a 64 × 64 feature map with 16 channels, detection heatmaps are generated with a Gaussian kernel function, and the pose estimation result is output.
The main contribution of the invention is a heterogeneous convolution that combines standard convolution and grouped convolution, used in a model together with an ASPP layer and a feature pyramid fusion module. The resulting lightweight target key point detection algorithm can perform multi-target pose estimation (human body key point detection) on images of any size. The method comprises the following steps:
Step 1: Input an image, detect the targets in it with a trained Faster R-CNN object detector, obtain the coordinates of each target bounding box, and store them in a data structure.
Step 2: Using the coordinates obtained by the detector in Step 1, enlarge each detected bounding box by 20% and crop each enlarged box out as a separate image.
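Step 2 can be sketched as follows. This is a minimal illustration, not the patent's code: it assumes boxes are `(x1, y1, x2, y2)` pixel tuples and that "enlarge by 20%" means the total width and height each grow by 20% about the box centre (10% per side). The names `enlarge_box` and `crop` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def enlarge_box(box: Box, img_w: int, img_h: int, ratio: float = 0.2) -> Box:
    """Grow a detector box about its centre, then clip it to the image.

    Assumption: width and height each grow by `ratio` in total.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * (1.0 + ratio) / 2.0
    half_h = (y2 - y1) * (1.0 + ratio) / 2.0
    # Clip so the enlarged crop never extends past the image borders.
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    )


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Crop a row-major image (list of rows) to an integer-rounded box."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    return [row[x1:x2] for row in image[y1:y2]]
```

For example, `enlarge_box((100, 100, 200, 300), 640, 480)` gives `(90.0, 80.0, 210.0, 320.0)`. With a NumPy image array the crop would simply be `image[y1:y2, x1:x2]`.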
Step 3: Input each single-target image cropped in Step 2 into the heterogeneous convolutional neural network for target key point detection.
Step 3-1: Resize the single-target image to 256 × 256 pixels.
Step 3-2: Use a ResNet-50 network with an expansion factor of 2 as the backbone for feature extraction, downsampling the feature map to 8 × 8. In this backbone, the standard 3 × 3 convolutions with stride 1 are replaced by grouped convolutions with 4 groups, and the standard 1 × 1 convolutions with stride 1 are replaced by grouped convolutions with 16 groups, so that the model's parameters and FLOPs are reduced while a large receptive field is retained.
The number of parameters of a standard convolution is:
$S_{cp} = N \times C \times K \times K$
The number of parameters of a grouped convolution is:
$G_{cp} = \frac{N \times C \times K \times K}{G}$
The number of parameters of the heterogeneous convolution is:
[Equation image in the source; not recoverable from the text.]
where N is the number of channels of the input feature map, C is the number of channels of the output feature map, G is the number of groups of the grouped convolution, and K, $K_1$ and $K_2$ are convolution kernel sizes. The three parameter counts satisfy:
$G_{cp} \le S_{Hp} \le S_{cp}$
Compared with grouped convolution, the heterogeneous convolution fuses information across the channels of the feature map more effectively; compared with standard convolution, it enlarges the model's receptive field.
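The parameter counts above can be checked numerically. The first two functions follow $S_{cp}$ and $G_{cp}$ (bias terms ignored). Because the patent's heterogeneous-convolution formula is only available as an image, the third function is a purely hypothetical mix chosen for illustration: `std_out` of the C output channels come from a standard $K_1 \times K_1$ convolution and the rest from a $K_2 \times K_2$ grouped convolution. It is not the patent's definition, but it shows why a mixture of the two lands between $G_{cp}$ and $S_{cp}$ when the kernel sizes are equal.

```python
def standard_conv_params(n: int, c: int, k: int) -> int:
    """S_cp = N x C x K x K (bias ignored)."""
    return n * c * k * k


def grouped_conv_params(n: int, c: int, k: int, g: int) -> int:
    """G_cp = N x C x K x K / G; N and C must both be divisible by G."""
    if n % g or c % g:
        raise ValueError("channel counts must be divisible by the group count")
    return n * c * k * k // g


def hetero_conv_params_hypothetical(n: int, c: int, k1: int, k2: int,
                                    g: int, std_out: int) -> int:
    """HYPOTHETICAL mix (not the patent's formula): `std_out` output channels
    use a standard k1 x k1 conv over all N inputs; the remaining C - std_out
    output channels use a k2 x k2 grouped conv with g groups."""
    return (standard_conv_params(n, std_out, k1)
            + grouped_conv_params(n, c - std_out, k2, g))


if __name__ == "__main__":
    n = c = 256
    s = standard_conv_params(n, c, 3)                  # 589824
    gp = grouped_conv_params(n, c, 3, 4)               # 147456 (4 groups, as in Step 3-2)
    h = hetero_conv_params_hypothetical(n, c, 3, 3, 4, std_out=64)  # 258048
    print(gp <= h <= s)                                # True
    # A 1 x 1 conv with 16 groups (also from Step 3-2) needs 16x fewer weights:
    print(standard_conv_params(n, c, 1), grouped_conv_params(n, c, 1, 16))  # 65536 4096
```

For convolutions with stride 1, FLOPs scale with the same factor per output pixel, so the reduction ratios carry over to compute as well.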
Step 3-3: Apply atrous spatial pyramid pooling to the 8 × 8 feature map (the output of the final convolution and residual addition) obtained in Step 3-2.
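ASPP runs several parallel 3 × 3 atrous (dilated) convolutions with different dilation rates and concatenates their outputs along the channel axis, which requires every branch to keep the input's spatial size. The sketch below checks this with the standard convolution output-size formula (the one PyTorch's `Conv2d` documents). The dilation rates used in the example are hypothetical, since the patent does not list them.

```python
from typing import List, Tuple


def effective_kernel(k: int, dilation: int) -> int:
    """Spatial span of a k x k kernel with the given dilation (atrous) rate."""
    return k + (k - 1) * (dilation - 1)


def conv_out_size(size: int, k: int, stride: int = 1,
                  padding: int = 0, dilation: int = 1) -> int:
    """floor((size + 2p - d(k - 1) - 1) / s) + 1."""
    return (size + 2 * padding - dilation * (k - 1) - 1) // stride + 1


def aspp_branches(size: int, rates: List[int], k: int = 3) -> List[Tuple[int, int, int]]:
    """Return (rate, effective kernel span, output size) for each ASPP branch.

    Padding = rate * (k - 1) // 2 keeps the spatial size unchanged, so the
    branch outputs can be concatenated channel-wise.
    """
    result = []
    for r in rates:
        pad = r * (k - 1) // 2
        result.append((r, effective_kernel(k, r), conv_out_size(size, k, 1, pad, r)))
    return result


if __name__ == "__main__":
    # Hypothetical rates for the 8 x 8 backbone output.
    print(aspp_branches(8, [1, 2, 4]))  # [(1, 3, 8), (2, 5, 8), (4, 9, 8)]
```

One design consequence: on an 8 × 8 map, a rate of 6 already gives a 13-pixel span, so most of its taps would fall on zero padding. This is why small rates are the natural choice at this resolution.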
Step 3-4: Upsample the feature map obtained in Step 3-3 three times to obtain a feature map with a resolution of 64 × 64. Through skip connections, the 16 × 16, 32 × 32 and 64 × 64 feature maps produced by the backbone in Step 3-2 are concatenated with the upsampled feature maps of the same resolution.
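The resolutions in Steps 3-2 to 3-5 can be traced with a small shape calculator. It assumes ResNet-style stage strides of 4, 8, 16 and 32 for the 256 × 256 input, which yields the 64, 32, 16 and 8 resolutions named in the text. All channel counts are hypothetical: the skip widths below are those of a standard ResNet-50 (the patent's expansion factor of 2 may change them), and `up_channels` is an invented width for each upsampling layer.

```python
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def backbone_resolutions(input_size: int = 256,
                         strides: Sequence[int] = (4, 8, 16, 32)) -> List[int]:
    """Spatial size of each backbone stage output for a square input."""
    return [input_size // s for s in strides]


def trace_decoder(aspp_out: Shape, skip_channels: Dict[int, int],
                  up_channels: int, num_keypoints: int = 16) -> List[Shape]:
    """Shapes after each x2 upsampling + skip concatenation, then the 1x1 head.

    skip_channels: {resolution: channels} of the backbone maps at 16, 32, 64.
    up_channels:   HYPOTHETICAL channel count produced by each upsampling layer.
    """
    c, h, w = aspp_out
    shapes: List[Shape] = []
    for _ in range(3):                       # 8 -> 16 -> 32 -> 64
        h, w = h * 2, w * 2
        c = up_channels + skip_channels[h]   # concatenate with same-resolution skip
        shapes.append((c, h, w))
    shapes.append((num_keypoints, h, w))     # Step 3-5: 1x1 conv, one channel per keypoint
    return shapes


if __name__ == "__main__":
    print(backbone_resolutions())            # [64, 32, 16, 8]
    for s in trace_decoder((256, 8, 8), {16: 1024, 32: 512, 64: 256}, 256):
        print(s)
```

The last shape, `(16, 64, 64)`, matches the 64 × 64, 16-channel output described in the abstract.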
Step 3-5: Use a 1 × 1 convolution to reduce the number of channels of the 64 × 64 feature map obtained in Step 3-4 to the number of key points in the dataset (16 for MPII), thereby outputting the coordinates of the corresponding key points.
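The patent does not state how coordinates are read off the 16 output channels. A common choice is sketched below, assuming that each channel is a heatmap (as the abstract's "detection heatmaps" suggests) and that the peak location is taken as the key point. It then scales the 64 × 64 heatmap position to the 256 × 256 input. `decode_heatmaps` is a hypothetical helper. Many implementations also refine the peak by a sub-pixel offset, which is omitted here.

```python
from typing import List, Sequence, Tuple


def decode_heatmaps(heatmaps: Sequence[Sequence[Sequence[float]]],
                    input_size: int = 256) -> List[Tuple[float, float, float]]:
    """Return (x, y, score) per key point in input-image pixels.

    heatmaps: K x H x W nested sequences, one heatmap per key point.
    Assumption: the arg-max location is the key point position.
    """
    results = []
    for hm in heatmaps:
        h, w = len(hm), len(hm[0])
        # Peak value and its (x, y) position in heatmap coordinates.
        score, x, y = max((v, x, y) for y, row in enumerate(hm)
                          for x, v in enumerate(row))
        # Map heatmap pixels to the resized input (factor 4 for 64 -> 256).
        results.append((x * input_size / w, y * input_size / h, score))
    return results


if __name__ == "__main__":
    hm = [[0.0] * 64 for _ in range(64)]
    hm[10][20] = 1.0                 # peak at row 10, column 20
    print(decode_heatmaps([hm]))     # [(80.0, 40.0, 1.0)]
```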
During training, the network is optimized iteratively with stochastic gradient descent. The loss function is the mean squared error (MSE) loss:
[Equation image in the source: mean squared error loss; not recoverable from the text.]
where m is the number of key points, $y_i$ is the labeled ground-truth coordinate of key point i, $\hat{y}_i$ is the coordinate of that key point predicted by the model, n is the number of training samples in each batch, and i is the index of the current key point.
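Because the loss equation itself is only an image in the source, the following is a minimal sketch under an assumed normalization: the squared coordinate error is summed over the m key points of each of the n samples in a batch and divided by n·m. The exact scaling in the patent may differ; `mse_keypoint_loss` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mse_keypoint_loss(pred: Sequence[Sequence[Point]],
                      target: Sequence[Sequence[Point]]) -> float:
    """Mean squared coordinate error over a batch.

    pred, target: n samples x m key points x (x, y).
    Assumption: normalized by n * m (samples times key points).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same size")
    n, m = len(pred), len(pred[0])
    total = 0.0
    for p_sample, t_sample in zip(pred, target):
        for (px, py), (tx, ty) in zip(p_sample, t_sample):
            total += (px - tx) ** 2 + (py - ty) ** 2  # squared error of y_i vs. predicted y_i
    return total / (n * m)


if __name__ == "__main__":
    print(mse_keypoint_loss([[(0, 0), (1, 1)]], [[(0, 0), (2, 3)]]))  # 2.5
```

In heatmap-based training (as with the Gaussian heatmaps mentioned in the abstract), the same averaging is usually applied pixel-wise to predicted and target heatmaps instead of to coordinates.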
Step 4: Map the key points detected for each single target back onto the image from Step 1 to obtain the multi-person pose estimation result.
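Step 4 inverts the crop and resize from Steps 2 and 3-1. A minimal sketch, assuming the crop was resized to 256 × 256 without preserving aspect ratio (as Step 3-1 reads) and that the enlarged box is `(x1, y1, x2, y2)` in original-image pixels. `crop_to_image` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # enlarged crop box (x1, y1, x2, y2)


def crop_to_image(points: Sequence[Point], box: Box,
                  input_size: int = 256) -> List[Point]:
    """Map key points from the resized crop back to original-image pixels."""
    x1, y1, x2, y2 = box
    sx = (x2 - x1) / input_size  # horizontal scale of the resize step
    sy = (y2 - y1) / input_size  # vertical scale of the resize step
    return [(x1 + x * sx, y1 + y * sy) for x, y in points]


if __name__ == "__main__":
    # Centre of the 256 x 256 crop maps to the centre of the box.
    print(crop_to_image([(128, 128)], (90, 80, 210, 320)))  # [(150.0, 200.0)]
```

Running this for every detected person and collecting the results gives the multi-person output.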
The invention provides a heterogeneous convolution based on standard convolution and grouped convolution. Compared with standard convolution, it significantly reduces FLOPs while keeping a receptive field of the same size; compared with grouped convolution, it fuses information across convolution channels more effectively and improves the model's accuracy. The proposed method was evaluated on the MPII dataset. Experimental results show that, with a backbone in which standard convolutions are replaced by heterogeneous convolutions, plus an ASPP layer and a feature pyramid fusion module, the method improves accuracy by 1.2% over the original ResNet-50 method while reducing FLOPs by 72.18%.
Drawings
FIG. 1 is a diagram of the heterogeneous convolutional neural network model based on ResNet-50
FIG. 2 is a structural diagram of grouped convolution, standard convolution and heterogeneous convolution
FIG. 3 shows human pose estimation results
Detailed Description
The advantages of the invention over other algorithms are examined below with reference to examples.
We trained the model on the MPII training set and tested the effectiveness of the algorithm on the MPII validation set. The experimental environment was Ubuntu 18.04.3 LTS with an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz × 32, 64 GB of memory and an RTX 2080 Ti graphics card; the software platform was CUDA 10.0.130, cuDNN 7.5, PyTorch 1.4 and Python 3.6.
During training, the batch size is 64 and the input resolution is 256 × 256. The initial learning rate is 0.001 and is reduced by a factor of 10 at the 170th and 200th epochs; training runs for 210 epochs in total.
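The step schedule can be written as a plain function. This sketch assumes 0-indexed epochs and a decay factor of 0.1 at each milestone, as described above. In PyTorch the same behaviour comes from `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[170, 200], gamma=0.1)` with one `scheduler.step()` call per epoch.

```python
from typing import Sequence


def learning_rate(epoch: int, base_lr: float = 1e-3,
                  milestones: Sequence[int] = (170, 200),
                  gamma: float = 0.1) -> float:
    """Multiply base_lr by gamma once for every milestone already reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr


if __name__ == "__main__":
    for e in (0, 169, 170, 199, 200, 209):  # 210 epochs: 0..209
        print(e, learning_rate(e))
```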
To verify the accuracy and efficiency of the improved algorithm, we compared pose estimation networks built on ResNet-18 and ResNet-50. The experimental results show that the method reduces model parameters and FLOPs while maintaining accuracy. The results are shown in Table 1.
TABLE 1 Comparison of results on the MPII dataset
[Table 1 and the accompanying PCKh evaluation formula are images in the source and are not reproduced here.]
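where the coefficient shown in the source image is a constant, and l is the head size, taken as 60% of the diagonal of the ground-truth head bounding box. PCKh@0.5 sets this constant to 0.5: a predicted key point counts as correct when it lies within 0.5 × l of the ground truth.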
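The PCKh@0.5 metric described above can be sketched as follows. This is an illustrative implementation, not the official MPII evaluation script: it assumes every key point is annotated (the official tool skips invisible or missing joints) and takes head boxes as `(x1, y1, x2, y2)`. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def head_size(head_box: Box) -> float:
    """l = 0.6 x diagonal of the ground-truth head bounding box."""
    x1, y1, x2, y2 = head_box
    return 0.6 * math.hypot(x2 - x1, y2 - y1)


def pckh(preds: Sequence[Sequence[Point]], gts: Sequence[Sequence[Point]],
         head_boxes: Sequence[Box], alpha: float = 0.5) -> float:
    """Fraction of key points whose error is at most alpha * l."""
    correct = total = 0
    for p_person, g_person, hb in zip(preds, gts, head_boxes):
        threshold = alpha * head_size(hb)
        for (px, py), (gx, gy) in zip(p_person, g_person):
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Head box 30 x 40 -> diagonal 50, l = 30, threshold = 15 pixels.
    preds = [[(0, 0), (10, 0), (20, 0)]]
    gts = [[(0, 0), (0, 0), (0, 0)]]
    print(pckh(preds, gts, [(0, 0, 30, 40)]))  # 2 of 3 within 15 px
```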

Claims (3)

1. A target key point detection method based on a heterogeneous convolutional neural network, characterized by comprising the following steps:
Step 1: inputting an image, detecting the targets in the image with a trained Faster R-CNN object detector, obtaining the coordinates of each target bounding box, and storing the coordinates in a data structure;
Step 2: using the coordinates obtained by the detector in Step 1, enlarging each detected bounding box by 20% and cropping each enlarged box out separately;
Step 3: inputting each single-target image obtained in Step 2 into the heterogeneous convolutional neural network for target key point detection;
Step 4: mapping the key points detected for each single target back onto the image from Step 1 to obtain the multi-person pose estimation result.
2. The target key point detection method based on a heterogeneous convolutional neural network according to claim 1, wherein Step 3 specifically comprises the following steps: Step 3-1: resizing the single-target image to 256 × 256 pixels;
Step 3-2: performing feature extraction with a ResNet-50 network with an expansion factor of 2 as the backbone, so that the feature map is downsampled to 8 × 8, wherein the standard 3 × 3 convolutions with stride 1 are replaced by grouped convolutions with 4 groups, and the standard 1 × 1 convolutions with stride 1 are replaced by grouped convolutions with 16 groups;
the number of parameters of a standard convolution is:
$S_{cp} = N \times C \times K \times K$
the number of parameters of a grouped convolution is:
$G_{cp} = \frac{N \times C \times K \times K}{G}$
the number of parameters of the heterogeneous convolution is:
[Equation image in the source; not recoverable from the text.]
where N is the number of channels of the input feature map, C is the number of channels of the output feature map, G is the number of groups of the grouped convolution, and K, $K_1$ and $K_2$ are convolution kernel sizes;
$G_{cp} \le S_{Hp} \le S_{cp}$
compared with grouped convolution, the heterogeneous convolution fuses information across the channels of the feature map more effectively, and compared with standard convolution, it enlarges the model's receptive field;
Step 3-3: applying atrous spatial pyramid pooling to the 8 × 8 feature map obtained in Step 3-2;
Step 3-4: upsampling the feature map obtained in Step 3-3 three times to obtain a feature map with a resolution of 64 × 64, and concatenating, through skip connections, the 16 × 16, 32 × 32 and 64 × 64 feature maps produced by the backbone in Step 3-2 with the upsampled feature maps of the same resolution;
Step 3-5: using a 1 × 1 convolution to adjust the number of channels of the 64 × 64 feature map obtained in Step 3-4 to the number of key points in the dataset, thereby outputting the coordinates of the corresponding key points.
3. The target key point detection method based on a heterogeneous convolutional neural network according to claim 2, wherein the network is optimized iteratively with stochastic gradient descent during training, and the loss function is the mean squared error loss:
[Equation image in the source: mean squared error loss; not recoverable from the text.]
where m is the number of key points, $y_i$ is the labeled ground-truth coordinate of key point i, $\hat{y}_i$ is the coordinate of that key point predicted by the model, n is the number of training samples in each batch, and i is the index of the current key point.
CN202110242260.XA 2021-03-04 2021-03-04 Target key point detection method based on heterogeneous convolutional neural network Active CN112949498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110242260.XA CN112949498B (en) 2021-03-04 2021-03-04 Target key point detection method based on heterogeneous convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110242260.XA CN112949498B (en) 2021-03-04 2021-03-04 Target key point detection method based on heterogeneous convolutional neural network

Publications (2)

Publication Number Publication Date
CN112949498A (en) 2021-06-11
CN112949498B (en) 2023-11-14

Family

ID=76247778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110242260.XA Active CN112949498B (en) 2021-03-04 2021-03-04 Target key point detection method based on heterogeneous convolutional neural network

Country Status (1)

Country Link
CN (1) CN112949498B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762137A (en) * 2021-09-02 2021-12-07 甘肃同兴智能科技发展有限责任公司 Remote sensing image forest region carbon sink calculation method based on deep learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190303762A1 (en) * 2018-03-30 2019-10-03 Xilinx, Inc. Methods of optimization of computational graphs of neural networks
US20190311249A1 (en) * 2018-04-04 2019-10-10 Megvii (Beijing) Technology Co., Ltd. Image processing method, image processing apparatus, and computer-readable storage medium
CN111144209A (en) * 2019-11-25 2020-05-12 浙江工商大学 Monitoring video head detection method based on heterogeneous multi-branch deep convolutional neural network
CN111160085A (en) * 2019-11-19 2020-05-15 天津中科智能识别产业技术研究院有限公司 Human body image key point posture estimation method
CN111402226A (en) * 2020-03-13 2020-07-10 浙江工业大学 Surface defect detection method based on cascade convolution neural network
CN111738111A (en) * 2020-06-10 2020-10-02 杭州电子科技大学 Road extraction method of high-resolution remote sensing image based on multi-branch cascade atrous spatial pyramid
CN111738237A (en) * 2020-04-29 2020-10-02 上海海事大学 Target detection method of multi-core iteration RPN based on heterogeneous convolution
CN112184752A (en) * 2020-09-08 2021-01-05 北京工业大学 Video target tracking method based on pyramid convolution


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
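SHUO CHANG et al.: "Siamese Feature Pyramid Network for Visual Tracking", 2019 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS IN CHINA (ICCC WORKSHOPS), pages 164 - 168 *
卢鑫; 田莹: "Face liveness detection based on deep learning", 辽宁科技大学学报 (Journal of University of Science and Technology Liaoning), vol. 42, no. 05, pages 389 - 397 *
尹晓杰 et al.: "A survey of feature extraction algorithms in human action recognition", 计算机科学 (Computer Science), vol. 46, no. 10, pages 157 - 160 *
徐晗智; 艾中良; 张志超: "A lightweight object detection network based on channel shuffle", 计算机与现代化 (Computer and Modernization), no. 02, pages 92 - 96 *
王桢: "Research on person search with joint detection and re-identification and its embedded implementation", 中国优秀硕士学位论文全文数据库信息科技辑 (China Master's Theses Full-text Database, Information Science and Technology), no. 2, pages 138 - 1302 *
胡鑫欣: "Image segmentation technology based on deep neural networks and its application to visual assistance for the blind", 中国优秀硕士学位论文全文数据库信息科技辑 (China Master's Theses Full-text Database, Information Science and Technology), no. 2, pages 138 - 2233 *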


Also Published As

Publication number Publication date
CN112949498B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN111126472B (en) SSD (Single Shot MultiBox Detector)-based improved target detection method
CN111354017B (en) Target tracking method based on Siamese neural network and parallel attention module
CN108596258B (en) Image classification method based on convolutional neural network random pooling
CN110598788B (en) Target detection method, target detection device, electronic equipment and storage medium
CN112967341B (en) Indoor visual positioning method, system, equipment and storage medium based on live-action image
CN112381061B (en) Facial expression recognition method and system
CN113869282B (en) Face recognition method, hyper-resolution model training method and related equipment
CN109948457B (en) Real-time target recognition method based on convolutional neural network and CUDA acceleration
CN111967471A (en) Scene text recognition method based on multi-scale features
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112131959A (en) 2D human body posture estimation method based on multi-scale feature reinforcement
CN114627502A (en) Improved YOLOv5-based target recognition detection method
CN113837275B (en) Improved YOLOv3 target detection method based on expanded coordinate attention
CN113052185A (en) Small sample target detection method based on fast R-CNN
CN112036475A (en) Fusion module, multi-scale feature fusion convolutional neural network and image identification method
CN112966574A (en) Human body three-dimensional key point prediction method and device and electronic equipment
CN111274999A (en) Data processing method, image processing method, device and electronic equipment
CN113298032A (en) Unmanned aerial vehicle visual angle image vehicle target detection method based on deep learning
CN111507184B (en) Human body posture detection method based on parallel atrous convolution and body structure constraint
CN116052025A (en) Unmanned aerial vehicle video image small target tracking method based on Siamese network
CN112949498A (en) Target key point detection method based on heterogeneous convolutional neural network
CN108876776B (en) Classification model generation method, fundus image classification method and device
CN111222534A (en) Single Shot MultiBox Detector optimization method based on bidirectional feature fusion and more balanced L1 loss
CN112149518A (en) Pine cone detection method based on BEGAN and YOLOV3 models
CN113327227B (en) MobileNetV3-based rapid wheat head detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant