CN112949498B - Target key point detection method based on heterogeneous convolutional neural network - Google Patents

Info

Publication number
CN112949498B
Authority
CN
China
Prior art keywords
convolution
target
heterogeneous
multiplied
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110242260.XA
Other languages
Chinese (zh)
Other versions
CN112949498A (en)
Inventor
何宁
尹晓杰
于海港
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Union University
Original Assignee
Beijing Union University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Union University filed Critical Beijing Union University
Priority to CN202110242260.XA priority Critical patent/CN112949498B/en
Publication of CN112949498A publication Critical patent/CN112949498A/en
Application granted granted Critical
Publication of CN112949498B publication Critical patent/CN112949498B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention discloses a target key point detection method based on a heterogeneous convolutional neural network. ResNet-50 is used as the backbone network, and the standard convolutions with a 3×3 kernel in its bottleneck blocks are replaced with heterogeneous convolutions. An atrous spatial pyramid pooling (ASPP) layer is added after the last layer of the backbone network. Finally, feature pyramid fusion is performed on the feature maps with resolutions of 8×8, 16×16, 32×32 and 64×64, a feature map with a resolution of 64×64 and 16 channels is output, a detection heat map is generated using a Gaussian kernel function, and the pose estimation result is output. By using the ASPP layer and the feature pyramid fusion module in the model, a novel lightweight target key point detection algorithm is constructed that can perform multi-target pose estimation on pictures of any size.

Description

Target key point detection method based on heterogeneous convolutional neural network
Technical Field
The invention belongs to the technical field of computer vision and digital image processing, and particularly relates to a target key point detection method based on a convolutional neural network.
Background
Human pose estimation is a fundamental problem in human behavior recognition, and the skeleton structure it produces provides high-level semantics for human action recognition. Human pose estimation also has many practical applications, for example: sports action standardization, posture correction, virtual reality games, video surveillance, robot motion control, and the like.
Existing human pose estimation methods fall into two categories: bottom-up and top-down. The present method is top-down; existing top-down methods have relatively high recall and accuracy. However, pursuing maximum model accuracy increases the parameter count and the number of floating-point operations of the model. Human pose estimation is being deployed in many practical applications, some of them on mobile phones and microcomputers, where storage and computing capacity are limited, so reducing the parameter count and floating-point operations of the model is one of the important requirements for improving human pose estimation.
To address the large parameter count and number of floating-point operations of the model, the method combines standard convolution and group convolution into a heterogeneous convolution, which reduces the parameter count and floating-point operations of the model while preserving accuracy and receptive field.
Disclosure of Invention
The invention discloses a target key point detection algorithm based on a heterogeneous convolutional neural network, characterized by a heterogeneous convolution built from standard convolution and group convolution. The network model is divided into three parts: the backbone network, the atrous spatial pyramid pooling (ASPP) part, and the feature pyramid module. ResNet-50 is used as the backbone network, and the standard convolutions with a 3×3 kernel in its bottleneck blocks are replaced with heterogeneous convolutions. An ASPP layer is added after the last layer of the backbone network. Finally, feature pyramid fusion is performed on the feature maps with resolutions of 8×8, 16×16, 32×32 and 64×64, a feature map with a resolution of 64×64 and 16 channels is output, a detection heat map is generated using a Gaussian kernel function, and the pose estimation result is output.
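The heat map generation mentioned above can be illustrated with a minimal sketch. The text only states that a Gaussian kernel function is used on a 64×64 map; the standard deviation `sigma`, the function names, and the arg-max decoding of the key point are assumptions made for illustration.

```python
import math

def gaussian_heatmap(cx, cy, size=64, sigma=2.0):
    """Return a size x size heat map (list of rows) with a Gaussian peak at (cx, cy).

    sigma is a hypothetical value; the source does not specify it.
    """
    heatmap = []
    for y in range(size):
        row = []
        for x in range(size):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        heatmap.append(row)
    return heatmap

def argmax_2d(heatmap):
    """Return the (x, y) position of the largest value (assumed decoding rule)."""
    best = (0, 0)
    best_val = heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best_val:
                best_val = v
                best = (x, y)
    return best

hm = gaussian_heatmap(cx=20, cy=41)
peak = argmax_2d(hm)
```

Decoding by arg-max returns the location of the key point on the 64×64 grid; one such map would be produced per key point channel.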
The invention mainly provides a heterogeneous convolution that combines standard convolution and group convolution, and uses an atrous spatial pyramid pooling layer and a feature pyramid fusion module in the model. The result is a novel lightweight target key point detection algorithm that can perform multi-target pose estimation (human body key point detection) on pictures of any size. The method mainly comprises the following steps:
Step 1: input a picture, detect the targets in the picture using a trained Faster R-CNN object detector, obtain the coordinates of each target box, and store them in a data structure.
Step 2: widen each target box detected in step 1 by 20%, and crop each widened target box individually.
Step 3: input each single-target crop from step 2 into the heterogeneous convolutional neural network and detect the target key points.
Step 3-1: resize the single-target image to a resolution of 256×256.
Step 3-2: extract features using a ResNet50 network with a dilation rate of 2 as the backbone network, so that the resolution of the feature map is 8×8. The 3×3 standard convolutions with a stride of 1 are replaced by group convolutions with 4 groups, and the 1×1 standard convolutions with a stride of 1 are replaced by group convolutions with 16 groups, which reduces the parameter count and floating-point operations of the model while keeping a large receptive field.
The parameter count of the standard convolution is:
$$S_{cp} = N \times C \times K \times K$$
The parameter count of the group convolution is:
$$G_{cp} = \frac{N \times C \times K \times K}{G}$$
The parameter count of the heterogeneous convolution, $H_{cp}$, is:
[equation missing from the source text]
where $N$ is the number of channels of the input feature map, $C$ is the number of channels of the output feature map, $G$ is the number of groups of the group convolution, and $K$, $K_1$ and $K_2$ are all convolution kernel sizes.
$$G_{cp} \le H_{cp} \le S_{cp}$$
Compared with group convolution, the heterogeneous convolution effectively integrates the channels of the feature map; compared with standard convolution, the heterogeneous convolution improves the receptive field of the model.
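The saving from replacing standard convolutions with group convolutions can be checked numerically. The sketch below assumes bias-free convolutions and the usual group-convolution count N×C×K×K/G. The channel width of 256 is a hypothetical value chosen for illustration, and the heterogeneous convolution itself is not modelled because its formula is not given here.

```python
def standard_conv_params(n_in, c_out, k):
    """Weights of a bias-free standard convolution: N x C x K x K."""
    return n_in * c_out * k * k

def group_conv_params(n_in, c_out, k, groups):
    """Weights of a bias-free group convolution: each group maps N/G to C/G channels."""
    if n_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    return (n_in // groups) * (c_out // groups) * k * k * groups

# Hypothetical layer width; the group counts 4 and 16 follow step 3-2.
N = C = 256
std_3x3 = standard_conv_params(N, C, 3)      # 589824
grp_3x3 = group_conv_params(N, C, 3, 4)      # 147456
std_1x1 = standard_conv_params(N, C, 1)      # 65536
grp_1x1 = group_conv_params(N, C, 1, 16)     # 4096
```

With these widths the 3×3 layer shrinks by a factor of 4 and the 1×1 layer by a factor of 16, which matches the 1/G scaling of the group-convolution count.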
Step 3-3: feed the 8×8 feature map obtained in step 3-2 into the atrous spatial pyramid pooling layer.
Step 3-4: upsample the feature map obtained in step 3-3 three times to obtain a feature map with a resolution of 64×64.
Step 3-4: using skip connections, concatenate the 16×16, 32×32 and 64×64 feature maps obtained from the backbone network in step 3-2 with the feature maps of the corresponding resolutions obtained by upsampling the output of step 3-3.
Step 3-5: use a 1×1 convolution to adjust the number of channels of the 64×64 feature map obtained in step 3-4 to the number of key points in the dataset, and output the coordinates of the corresponding key points.
During training, the network is optimized iteratively by stochastic gradient descent. The loss function used is the mean squared error loss function:
[equation missing from the source text]
where $m$ is the number of key points, $y_i$ is the coordinates of the annotated ground-truth key point, $\hat{y}_i$ is the coordinates of the key point predicted by the model, $n$ is the number of training samples in each batch, and $i$ is the index of the current key point.
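A mean squared error over key point coordinates can be sketched as follows. The normalization (averaging the squared Euclidean distance over the n samples of a batch and the m key points of each sample) is an assumption, since the exact formula is not reproduced in this text; the function name is hypothetical.

```python
def mse_loss(pred, target):
    """Mean squared error over a batch.

    pred and target hold n samples, each a list of m key points given as (x, y).
    Assumed normalization: divide the summed squared distance by n * m.
    """
    n = len(pred)
    m = len(pred[0])
    total = 0.0
    for sample_pred, sample_true in zip(pred, target):
        for (px, py), (tx, ty) in zip(sample_pred, sample_true):
            total += (px - tx) ** 2 + (py - ty) ** 2
    return total / (n * m)

target = [[(10.0, 10.0), (20.0, 20.0)]]   # one sample, two key points
pred = [[(13.0, 14.0), (20.0, 20.0)]]     # first key point is off by (3, 4)
loss = mse_loss(pred, target)             # (9 + 16 + 0) / (1 * 2) = 12.5
```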
Step 4: map the key points detected for each single target back onto the picture from step 1, thereby obtaining the multi-person pose estimation result.
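The box handling in steps 2 and 4 can be sketched as follows. The text does not say how the 20% widening is distributed, so this sketch assumes it is split evenly between the two sides of each axis and clipped to the image; the function names and the 256-pixel crop coordinate system are likewise assumptions for illustration.

```python
def expand_box(box, image_w, image_h, ratio=0.2):
    """Widen (x1, y1, x2, y2) by `ratio` of its width and height, clipped to the image.

    Assumption: the extra margin is split evenly between opposite sides.
    """
    x1, y1, x2, y2 = box
    dw = (x2 - x1) * ratio / 2.0
    dh = (y2 - y1) * ratio / 2.0
    return (max(0.0, x1 - dw), max(0.0, y1 - dh),
            min(float(image_w), x2 + dw), min(float(image_h), y2 + dh))

def to_image_coords(keypoints, box, input_size=256):
    """Map key points from the resized crop back to the original picture (step 4)."""
    x1, y1, x2, y2 = box
    sx = (x2 - x1) / input_size
    sy = (y2 - y1) / input_size
    return [(x1 + kx * sx, y1 + ky * sy) for kx, ky in keypoints]

wide = expand_box((100, 50, 200, 250), image_w=640, image_h=480)
points = to_image_coords([(0, 0), (128, 128)], wide)
```

Here a 100×200 box grows to 120×240, and the centre of the 256×256 crop maps back to the centre of the widened box in the original picture.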
The invention provides a heterogeneous convolution based on standard convolution and group convolution. Compared with standard convolution, it significantly reduces the number of floating-point operations while keeping a receptive field of the same size. Compared with group convolution, it effectively integrates the channels of the convolution and improves the accuracy of the model. The proposed method is validated on the MPII dataset. Experimental results show that the model whose backbone network uses heterogeneous convolution in place of standard convolution, together with the atrous spatial pyramid pooling layer and the feature pyramid fusion module, improves accuracy by 1.2% and reduces floating-point operations by 72.18% compared with the original ResNet-50 method.
Drawings
FIG. 1 is a diagram of the heterogeneous convolutional neural network model based on ResNet50.
FIG. 2 is a structural diagram of group convolution, standard convolution and heterogeneous convolution.
FIG. 3 shows human pose estimation detection results.
Detailed Description
The invention is compared with other algorithms in the following example.
The model is trained on the training set of the MPII dataset, and the validation set of the MPII dataset is used to test the effectiveness of the algorithm. The experimental environment is Ubuntu 18.04.3 LTS with an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10 GHz × 32, 64 GB of memory and an RTX 2080 Ti graphics card; the software platform is CUDA 10.0.130, cuDNN 7.5, PyTorch 1.4 and Python 3.6.
During training, the batch size is set to 64 and the image resolution is set to 256×256. The initial learning rate is 0.001 and is decreased by 10% at the 170th and 200th epochs; training runs for a total of 210 epochs.
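The step schedule described above can be written as a small function. This sketch takes the text literally and multiplies the rate by 0.9 (a 10% decrease) at each milestone. Whether the intended factor is 0.9 or a reduction to 10% of the previous value is not settled by the text, so `factor` is left as a parameter, and the function name and epoch indexing are assumptions.

```python
def learning_rate(epoch, base_lr=0.001, milestones=(170, 200), factor=0.9):
    """Step schedule: multiply the rate by `factor` once per milestone epoch reached.

    factor=0.9 is the literal reading of "decreased by 10%" (an assumption).
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops

# Rate at a few representative epochs of the 210-epoch run.
schedule = {epoch: learning_rate(epoch) for epoch in (1, 169, 170, 200, 210)}
```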
To verify the accuracy and efficiency of the improved algorithm, ResNet18 and ResNet50 were used as pose estimation networks for model comparison. Experimental results show that the method reduces the parameter count and floating-point operations of the model while maintaining accuracy. The experimental results are shown in Table 1.
Table 1. Comparison of results on the MPII dataset
Results are reported as PCKh@0.5, in which the matching threshold is a constant multiple of the head size, and the head size is defined as 60% of the diagonal of the head bounding box in the ground truth.
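The PCKh@0.5 metric can be sketched as follows, using the usual MPII convention: a predicted key point counts as correct when its distance to the ground truth is at most 0.5 times the head size, where the head size is 60% of the diagonal of the ground-truth head bounding box. The function name and the single-person input layout are assumptions for illustration.

```python
import math

def pckh(pred, truth, head_box, alpha=0.5, head_scale=0.6):
    """Fraction of key points within alpha * head size of the ground truth."""
    hx1, hy1, hx2, hy2 = head_box
    head_size = head_scale * math.hypot(hx2 - hx1, hy2 - hy1)
    threshold = alpha * head_size
    correct = sum(
        1 for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= threshold
    )
    return correct / len(truth)

# Head box with a diagonal of 50 gives a head size of 30 and a threshold of 15.
truth = [(100.0, 100.0), (200.0, 200.0)]
pred = [(106.0, 108.0), (200.0, 220.0)]   # distances 10 and 20
score = pckh(pred, truth, head_box=(0.0, 0.0, 30.0, 40.0))
```

In this example only the first key point falls inside the threshold, so the score is one half.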

Claims (2)

1. A target key point detection method based on a heterogeneous convolutional neural network, characterized in that the method comprises the following steps:
step 1: inputting a picture, detecting the targets in the picture using a trained Faster R-CNN object detector, obtaining the coordinates of each target box, and storing the coordinates in a data structure;
step 2: widening each target box by 20% based on the coordinates obtained by the target detector in step 1, and cropping each widened target box individually;
step 3: inputting each single-target crop from step 2 into the heterogeneous convolutional neural network to detect the target key points;
step 4: mapping the key points detected for each single target back onto the picture from step 1, thereby obtaining the multi-person pose estimation result;
wherein step 3 specifically comprises the following steps: step 3-1: resizing the single-target image to a resolution of 256×256;
step 3-2: extracting features using a ResNet50 network with a dilation rate of 2 as the backbone network, so that the resolution of the feature map is 8×8, wherein the 3×3 standard convolutions with a stride of 1 are replaced by group convolutions with 4 groups, and the 1×1 standard convolutions with a stride of 1 are replaced by group convolutions with 16 groups;
the parameter count of the standard convolution is:
$$S_{cp} = N \times C \times K \times K$$
the parameter count of the group convolution is:
$$G_{cp} = \frac{N \times C \times K \times K}{G}$$
the parameter count of the heterogeneous convolution, $H_{cp}$, is:
[equation missing from the source text]
where $N$ is the number of channels of the input feature map, $C$ is the number of channels of the output feature map, $G$ is the number of groups of the group convolution, and $K$, $K_1$ and $K_2$ are all convolution kernel sizes;
$$G_{cp} \le H_{cp} \le S_{cp}$$
compared with group convolution, the heterogeneous convolution effectively integrates the channels of the feature map, and compared with standard convolution, the heterogeneous convolution improves the receptive field of the model;
step 3-3: feeding the 8×8 feature map obtained in step 3-2 into the atrous spatial pyramid pooling layer;
step 3-4: upsampling the feature map obtained in step 3-3 three times to obtain a feature map with a resolution of 64×64;
step 3-4: using skip connections, concatenating the 16×16, 32×32 and 64×64 feature maps obtained from the backbone network in step 3-2 with the feature maps of the corresponding resolutions obtained by upsampling the output of step 3-3;
step 3-5: using a 1×1 convolution to adjust the number of channels of the 64×64 feature map obtained in step 3-4 to the number of key points in the dataset, so that the coordinates of the corresponding key points are output.
2. The target key point detection method based on the heterogeneous convolutional neural network according to claim 1, characterized in that: during training, the network is optimized iteratively by stochastic gradient descent; the loss function used is the mean squared error loss function:
[equation missing from the source text]
where $m$ is the number of key points, $y_i$ is the coordinates of the annotated ground-truth key point, $\hat{y}_i$ is the coordinates of the key point predicted by the model, $n$ is the number of training samples in each batch, and $i$ is the index of the current key point.
CN202110242260.XA 2021-03-04 2021-03-04 Target key point detection method based on heterogeneous convolutional neural network Active CN112949498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110242260.XA CN112949498B (en) 2021-03-04 2021-03-04 Target key point detection method based on heterogeneous convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110242260.XA CN112949498B (en) 2021-03-04 2021-03-04 Target key point detection method based on heterogeneous convolutional neural network

Publications (2)

Publication Number Publication Date
CN112949498A CN112949498A (en) 2021-06-11
CN112949498B CN112949498B (en) 2023-11-14

Family

ID=76247778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110242260.XA Active CN112949498B (en) 2021-03-04 2021-03-04 Target key point detection method based on heterogeneous convolutional neural network

Country Status (1)

Country Link
CN (1) CN112949498B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762137A (en) * 2021-09-02 2021-12-07 甘肃同兴智能科技发展有限责任公司 Remote sensing image forest region carbon sink calculation method based on deep learning

Citations (6)

Publication number Priority date Publication date Assignee Title
CN111144209A (en) * 2019-11-25 2020-05-12 浙江工商大学 Monitoring video head detection method based on heterogeneous multi-branch deep convolutional neural network
CN111160085A (en) * 2019-11-19 2020-05-15 天津中科智能识别产业技术研究院有限公司 Human body image key point posture estimation method
CN111402226A (en) * 2020-03-13 2020-07-10 浙江工业大学 Surface defect detection method based on cascade convolution neural network
CN111738237A (en) * 2020-04-29 2020-10-02 上海海事大学 Target detection method of multi-core iteration RPN based on heterogeneous convolution
CN111738111A (en) * 2020-06-10 2020-10-02 杭州电子科技大学 Road extraction method of high-resolution remote sensing image based on multi-branch cascade void space pyramid
CN112184752A (en) * 2020-09-08 2021-01-05 北京工业大学 Video target tracking method based on pyramid convolution

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN110321999B (en) * 2018-03-30 2021-10-01 赛灵思电子科技(北京)有限公司 Neural network computational graph optimization method
CN108875904A (en) * 2018-04-04 2018-11-23 北京迈格威科技有限公司 Image processing method, image processing apparatus and computer readable storage medium

Non-Patent Citations (7)

Title
Siamese Feature Pyramid Network for Visual Tracking;Shuo Chang等;2019 IEEE/CIC International Conference on Communications Workshops in China (ICCC Workshops);164-168 *
一种基于通道重排的轻量级目标检测网络;徐晗智;艾中良;张志超;;计算机与现代化(02);92-96 *
人体动作识别中特征提取算法的研究综述;尹晓杰等;计算机科学;第46卷(第10A期);157-160 *
卢鑫等.基于深度学习的人脸活体检测.辽宁科技大学学报.2019,第42卷(第5期),389-397. *
基于深度学习的人脸活体检测;卢鑫;田莹;;辽宁科技大学学报(05);73-80 *
基于深度神经网络的图像分割技术及其在盲人视觉辅助中的应用;胡鑫欣;中国优秀硕士学位论文全文数据库信息科技辑(第2期);I138-2233 *
联合检测和重识别的行人搜索研究及嵌入式实现;王桢;中国优秀硕士学位论文全文数据库信息科技辑(第2期);I138-1302 *

Also Published As

Publication number Publication date
CN112949498A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
US11908244B2 (en) Human posture detection utilizing posture reference maps
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
CN110765865B (en) Underwater target detection method based on improved YOLO algorithm
CN112381061B (en) Facial expression recognition method and system
CN113869282B (en) Face recognition method, hyper-resolution model training method and related equipment
CN113837275B (en) Improved YOLOv3 target detection method based on expanded coordinate attention
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112966574A (en) Human body three-dimensional key point prediction method and device and electronic equipment
CN113239825B (en) High-precision tobacco beetle detection method in complex scene
CN113052185A (en) Small sample target detection method based on fast R-CNN
CN112287802A (en) Face image detection method, system, storage medium and equipment
CN111626379B (en) X-ray image detection method for pneumonia
CN111325190A (en) Expression recognition method and device, computer equipment and readable storage medium
CN111652054A (en) Joint point detection method, posture recognition method and device
CN111274999A (en) Data processing method, image processing method, device and electronic equipment
CN112949498B (en) Target key point detection method based on heterogeneous convolutional neural network
CN111507184B (en) Human body posture detection method based on parallel cavity convolution and body structure constraint
CN116052025A (en) Unmanned aerial vehicle video image small target tracking method based on twin network
CN112364974A (en) Improved YOLOv3 algorithm based on activation function
CN115410030A (en) Target detection method, target detection device, computer equipment and storage medium
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111582057B (en) Face verification method based on local receptive field
CN113221855A (en) Small target detection method and system based on scale sensitive loss and feature fusion
CN111274985A (en) Video text recognition network model, video text recognition device and electronic equipment
CN115761220A (en) Target detection method for enhancing detection of occluded target based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant