CN109033945B - Human body contour extraction method based on deep learning - Google Patents


Info

Publication number
CN109033945B
CN109033945B (application CN201810582283.3A)
Authority
CN
China
Prior art keywords
layer
human body
point
image
original image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810582283.3A
Other languages
Chinese (zh)
Other versions
CN109033945A (en
Inventor
王林
董楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN201810582283.3A priority Critical patent/CN109033945B/en
Publication of CN109033945A publication Critical patent/CN109033945A/en
Application granted granted Critical
Publication of CN109033945B publication Critical patent/CN109033945B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters


Abstract

The invention discloses a human body contour extraction method based on deep learning, which is implemented according to the following steps: step 1, extracting Gabor texture features of an original image; step 2, extracting Canny edge characteristics of the original image; step 3, building a convolutional neural network framework suitable for human body contour extraction; step 4, transmitting the original image, the Gabor texture feature map extracted in the step 1 and the Canny edge feature map extracted in the step 2 into the convolutional neural network constructed in the step 3 together for training to generate a CNN character model; step 5, testing the structure of the trained CNN character model to obtain a human body contour image; and 6, recording the overlapping rate and the time consumption of the human body contour image through the testing process of the step 5, and evaluating the human body contour image. The method of the invention achieves higher accuracy, improves the detection rate and shortens the test time.

Description

Human body contour extraction method based on deep learning
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a human body contour extraction method based on deep learning.
Background
Human body contour extraction plays an important role in the field of computer vision, and is a core technology of human body detection and human body behavior identification. The human body contour extraction technology is widely applied to the fields of intelligent monitoring, medical treatment and the like at present. The virtual reconstruction of the human body model is a key technology in a modern medical visualization system, and the accurate human body contour information acquisition can ensure that the reasonable medical analysis can be carried out on the diseases of the patient. On the other hand, along with the enhancement of the modern society on personal and public property safety requirements, the utilization rate of the intelligent monitoring system is gradually increased. The primary objective of the intelligent video monitoring technology is to acquire monitoring data by using various monitoring devices, so as to automatically understand and describe events occurring in a detected scene and predict events which may occur in the future. The human body contour extraction is used as a key supporting technology of the intelligent monitoring system, can provide the position and contour information of a human body in an image, is convenient for automatically tracking the human body and identifying behaviors, and therefore the purpose of intelligent monitoring is achieved.
At home and abroad, scholars propose various methods for realizing accurate human body recognition by extracting different features of images and combining classifier training aiming at the difficulty in human body detection of static images. However, although these conventional feature extraction methods can determine the position of the human body, they cannot accurately extract the contour of the human body. Aiming at the problem of target contour extraction, a plurality of effective schemes such as an active contour model, visual saliency and the like are provided. Although these methods can extract the target contour, they have some limitations in terms of computational complexity, real-time performance, and the like.
In recent years, a deep learning method gradually replaces a traditional feature extraction method, and breakthrough progress is made in the fields of target detection, image segmentation and the like. The purpose of deep learning is to automatically perform characteristic learning by simulating the operation of the human brain neural structure during data processing, and further complete the data processing result. Convolutional Neural Networks (CNN) are a type of model in deep learning methods, and their unique weight sharing structure and sparse connection mode make the Network itself dominant in image analysis.
Disclosure of Invention
The invention aims to provide a human body contour extraction method based on deep learning, and solves the problems that in the prior art, the human body contour extraction effect in a static image is poor and the model training speed is slow.
The invention adopts the technical scheme that a human body contour extraction method based on deep learning is implemented according to the following steps:
step 1, extracting Gabor texture features of an original image;
step 2, extracting Canny edge characteristics of the original image;
step 3, building a convolutional neural network framework suitable for human body contour extraction;
step 4, transmitting the original image, the Gabor texture feature map extracted in the step 1 and the Canny edge feature map extracted in the step 2 into the convolutional neural network constructed in the step 3 together for training to generate a CNN character model;
step 5, testing the structure of the trained CNN character model to obtain a human body contour image;
and 6, recording the overlapping rate and the time consumption of the human body contour image through the testing process of the step 5, and evaluating the human body contour image.
The present invention is also characterized in that,
the step 1 is implemented according to the following steps:
step 1.1, obtaining a two-dimensional Gabor filter according to a formula (1):
Ψ_{u,v}(z) = (‖k_{u,v}‖² / σ²) · exp(−‖k_{u,v}‖² ‖z‖² / (2σ²)) · [exp(i k_{u,v}·z) − exp(−σ²/2)]   (1)
in the formula (1), Ψ_{u,v} is the two-dimensional Gabor filter, u and v are respectively the orientation and scale of the Gabor kernel, where u is 0, π/4, π/2, 3π/4, π, 5π/4, 6π/4 or 7π/4, k_{u,v} is used for controlling the width of the Gaussian window, z = (x, y) is the spatial position coordinate, σ = 2π is the ratio of the Gaussian window width to the wavelength, and i is the imaginary unit;
wherein k_{u,v}, which determines the direction and wavelength of the oscillating part, is:
k_{u,v} = k_v · e^{i·φ_u}   (2)
in the formula (2), k_v = k_max/F^v is the sampling frequency of the filter, k_max = π/2 is the maximum sampling frequency, F = √2 is the spacing factor used to limit the sampling frequency of the filter, and φ_u = u is the directional selectivity of the filter;
step 1.2, performing a convolution operation between the original image I(x, y) and the two-dimensional Gabor filter obtained in step 1.1 to extract the Gabor feature G_{u,v}(x, y) of the original image at position (x, y), obtaining the Gabor texture features in 8 directions:
G_{u,v}(x, y) = I(x, y) * Ψ_{u,v}   (3).
the step 2 is implemented according to the following steps:
step 2.1, setting weight parameters for RGB three channels to complete gray processing of the original image, wherein the RGB three channel parameter setting expression is as follows:
Gray=R*0.299+G*0.587+B*0.114
step 2.2, processing the gray level image processed in the step 2.1 by using a first derivative of a two-dimensional Gaussian function, wherein the expression of the two-dimensional Gaussian function is as follows:
G(x, y) = (1 / (2πδ²)) · exp(−(x² + y²) / (2δ²))   (4)
in the formula (4), δ is a smoothing parameter, and the larger δ is, the more remarkable the smoothing effect is;
and smoothing the 3 × 3 area of the original image by using a 3 × 3 Gaussian convolution kernel, wherein the Gaussian convolution kernel is as follows:
K = (1/16) · [1 2 1; 2 4 2; 1 2 1]
step 2.3, searching for the positions of strongest gray-intensity change in the gray image processed in step 2.2, and calculating the first derivatives Z_x and Z_y of the gray image in the horizontal direction (abscissa) x and the vertical direction (ordinate) y with the Sobel operator, obtaining the boundary gradient magnitude |Z| and direction β:
|Z| = sqrt(Z_x² + Z_y²)
β = arctan(Z_y / Z_x)
wherein the Sobel operators in the abscissa x and ordinate y directions are:
S_x = [−1 0 1; −2 0 2; −1 0 1],  S_y = [−1 −2 −1; 0 0 0; 1 2 1]
step 2.4, dividing the boundary gradient magnitude |Z| obtained in step 2.3 equally into four gradient regions, each corresponding to one quadrant of the coordinate axes; then traversing all the points of each region one by one along the gradient direction β of each point, and comparing the gradient magnitude |Z| of each point with that of its two neighbours: if the point is larger than both the preceding and the following point it is kept, and if it is smaller than them it is set to zero; this performs non-maximum suppression on the gray image processed in step 2.3, thinning the edges and eliminating non-edge noise points;
step 2.5, setting the high threshold at 70% of the overall gray-level distribution of the gray image processed in step 2.4, and the low threshold at 1/2 of the high threshold; if the gray level of a point processed in step 2.4 is greater than the high threshold, its pixel value is set to 255; if it is less than the low threshold, its pixel value is set to 0; if it lies between the high threshold and the low threshold, the 8 neighbouring pixel values are examined: if no point with the value 255 exists among the 8 neighbours, the pixel value of the point is set to 0, and if a point with the value 255 exists in the neighbouring gradient region, the pixel value of the point is set to 255; the edge feature extraction is completed when all points have been processed.
Step 3 is specifically implemented according to the following steps:
step 3.1, modifying the VGG16 network structure on the basis of the VGG16 network model, reducing the 5 convolutional layers of the VGG16 network model to 4, and connecting a pooling layer behind each convolutional layer, using max pooling with a pooling window size of 2 × 2 and a stride of 2;
let P be an unknown pixel point and Q_11, Q_12, Q_21, Q_22 be four surrounding points whose pixel values are known; given the values of the known function f at Q_11 = (x_1, y_1), Q_12 = (x_1, y_2), Q_21 = (x_2, y_1) and Q_22 = (x_2, y_2), linear interpolation in the x direction gives the pixel values f(R_1) and f(R_2) of two intermediate points R_1, R_2, with which the pooled layers are filled back to the size of the original image:
f(R_1) ≈ ((x_2 − x)/(x_2 − x_1))·f(Q_11) + ((x − x_1)/(x_2 − x_1))·f(Q_21)
f(R_2) ≈ ((x_2 − x)/(x_2 − x_1))·f(Q_12) + ((x − x_1)/(x_2 − x_1))·f(Q_22)
wherein R_1 = (x, y_1), R_2 = (x, y_2);
then linear interpolation of R_1, R_2 in the y direction gives the pixel value f(P) of the unknown point P:
f(P) ≈ ((y_2 − y)/(y_2 − y_1))·f(R_1) + ((y − y_1)/(y_2 − y_1))·f(R_2)
the resulting pixel value f(x, y) of the unknown point P is:
f(x, y) ≈ [f(Q_11)(x_2 − x)(y_2 − y) + f(Q_21)(x − x_1)(y_2 − y) + f(Q_12)(x_2 − x)(y − y_1) + f(Q_22)(x − x_1)(y − y_1)] / ((x_2 − x_1)(y_2 − y_1));
further generating a deconvolution structure;
step 3.2, introducing a network in the network on the basis of the step 3.1, namely replacing each original convolutional layer structure with an MLP convolutional layer structure;
and 3.3, adding a dropout layer between the convolution layer processed in the step 3.2 and the deconvolution layer to prevent over-fitting of the network, and forming a symmetrical convolutional neural network.
Step 3.2 is specifically implemented according to the following steps:
after an original convolutional layer, a plurality of convolutional layers with convolution kernel of 1 × 1 are connected, and the feature map of each convolutional layer to the last convolutional layer is calculated by the following formula (13):
f^t_{x,y,k_t} = f(w^t_{k_t} · a_{x,y} + b^t_{k_t})   (13)
in the formula (13), (x, y) is the pixel index of the feature map, a_{x,y} is the input block centered at (x, y), k_t is the index of the feature map, t is the number of the MLP layer, f is the activation function, w is the weight coefficient and b is the bias;
and outputting the feature map of the current layer through the ReLU activation.
The convolutional neural network is:
the first layer is an input layer, and the input size of this layer is 224 × 224 × 12;
the second layer is a convolutional layer, and the input size of this layer is 112 × 112 × 8;
the third layer is a convolutional layer, and the input size of this layer is 56 × 56 × 16;
the fourth layer is a convolutional layer, and the input size of this layer is 28 × 28 × 32;
the fifth layer is a dropout layer, and the input size of this layer is 28 × 28 × 64;
the sixth layer is a deconvolution layer, and the output size of this layer is 56 × 56 × 32;
the seventh layer is a deconvolution layer, and the output size of this layer is 112 × 112 × 16;
the eighth layer is a deconvolution layer, and the output size of this layer is 224 × 224 × 8;
the ninth layer is an output layer with an output size of 224 × 224 × 1;
each convolutional layer has three convolutional sublayers, with a 1 × 1 convolutional kernel in between each two 3 × 3 convolutional kernels.
Step 4 is specifically implemented according to the following steps:
step 4.1, combining the Gabor characteristics in 8 directions obtained in the step 1, the Canny edge characteristics obtained in the step 2 and the RGB three-channel characteristics of the original image into 12 characteristic channels;
and 4.2, adjusting the image size of the 12 characteristic channels obtained in step 4.1 to 224 x 224, transmitting the 12 characteristic channels into the convolutional neural network constructed in step 3, and training by taking the ground-truth label map of the original image as the teacher signal of the CNN to generate the CNN character model.
Step 5 is specifically implemented according to the following steps:
step 5.1, respectively calculating, for the image to be detected, the Gabor texture features as extracted in step 1 and the Canny edge features as extracted in step 2;
step 5.2, inputting RGB three-channel characteristics, Gabor texture characteristics and Canny edge characteristics of the image to be detected into the CNN model trained in the step 4 to obtain a human body contour heat map;
step 5.3, performing opening and closing operation on the human body contour heat map obtained in the step 5.2, and performing smoothing treatment on the image through a Gaussian low-pass filter to obtain a human body mask;
and 5.4, performing an AND operation on the original image and the human body mask obtained in the step 5.3, namely performing an AND operation on each pixel point of the original image and a corresponding pixel point of the generated human body mask, setting the corresponding position in the original image as 0 if the pixel at the corresponding position of the mask is 0, and taking the original pixel at the corresponding position in the original image to obtain a human body contour image if the pixel in the mask is 1.
The beneficial effect of the invention is that,
(1) according to the human body contour extraction method based on deep learning, the operation of region selection on an original image is not needed, and a series of complex operations such as region combination on an output object are not needed;
(2) according to the human body contour extraction method based on deep learning, Canny edge characteristics are added on the basis of Gabor-CNN, high accuracy is achieved, the detection rate is improved, and the testing time is shortened.
Drawings
FIG. 1 is a schematic diagram of bilinear interpolation;
FIG. 2 is a graph of a convolutional neural network upsampling DAG used by the extraction method of the present invention;
FIG. 3 is a general structure diagram of a CNN used in the extraction method of the present invention;
FIG. 4 is a structural diagram of the extraction method of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a human body contour extraction method based on deep learning, which is implemented according to the following steps:
step 1, extracting Gabor texture features of an original image;
step 1.1, obtaining a two-dimensional Gabor filter according to a formula (1):
Ψ_{u,v}(z) = (‖k_{u,v}‖² / σ²) · exp(−‖k_{u,v}‖² ‖z‖² / (2σ²)) · [exp(i k_{u,v}·z) − exp(−σ²/2)]   (1)
in the formula (1), Ψ_{u,v} is the two-dimensional Gabor filter, u and v are respectively the orientation and scale of the Gabor kernel, where u is 0, π/4, π/2, 3π/4, π, 5π/4, 6π/4 or 7π/4, k_{u,v} is used for controlling the width of the Gaussian window, z = (x, y) is the spatial position coordinate, σ = 2π is the ratio of the Gaussian window width to the wavelength, and i is the imaginary unit;
wherein k_{u,v}, which determines the direction and wavelength of the oscillating part, is:
k_{u,v} = k_v · e^{i·φ_u}   (2)
in the formula (2), k_v = k_max/F^v is the sampling frequency of the filter, k_max = π/2 is the maximum sampling frequency, F = √2 is the spacing factor used to limit the sampling frequency of the filter, and φ_u = u is the directional selectivity of the filter;
step 1.2, performing a convolution operation between the original image I(x, y) and the two-dimensional Gabor filter obtained in step 1.1 to extract the Gabor feature G_{u,v}(x, y) of the original image at position (x, y):
G_{u,v}(x, y) = I(x, y) * Ψ_{u,v}   (3)
In order to obtain the features of the original image, in particular the local saliency features in multiple directions, a set of two-dimensional Gabor filters (Gabor kernel functions) with the 8 directions u = 0, π/4, π/2, 3π/4, π, 5π/4, 6π/4, 7π/4 is used, with σ = 2π, k_max = π/2 and F = √2, obtaining the Gabor texture features in 8 directions;
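As a non-authoritative illustration of this step, the 8-direction Gabor filtering can be sketched in Python with OpenCV. Note that cv2.getGaborKernel uses a (sigma, lambda, gamma, psi) parameterization rather than the k_{u,v}/σ form of formula (1), so the numeric parameter values below are illustrative assumptions, not the values prescribed above:

import cv2
import numpy as np

def gabor_features(gray, ksize=31):
    # Sketch of step 1: filter a grayscale image with 8 Gabor orientations.
    # OpenCV parameterizes the kernel by (sigma, lambda, gamma, psi) instead of
    # the k_{u,v}/sigma form of formula (1); the values here are assumptions.
    responses = []
    for k in range(8):
        theta = k * np.pi / 4  # u = 0, pi/4, ..., 7*pi/4
        kernel = cv2.getGaborKernel((ksize, ksize), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0.0,
                                    ktype=cv2.CV_32F)
        responses.append(cv2.filter2D(gray, cv2.CV_32F, kernel))
    return np.stack(responses, axis=-1)  # H x W x 8 Gabor texture channels

# usage: feats = gabor_features(cv2.imread("person.jpg", cv2.IMREAD_GRAYSCALE))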
step 2, extracting Canny edge characteristics of the original image;
step 2.1, setting weight parameters for RGB three channels to complete gray processing of the original image, wherein the RGB three channel parameter setting expression is as follows:
Gray=R*0.299+G*0.587+B*0.114
step 2.2, processing the gray level image processed in the step 2.1 by using a first derivative of a two-dimensional Gaussian function, wherein the expression of the two-dimensional Gaussian function is as follows:
G(x, y) = (1 / (2πδ²)) · exp(−(x² + y²) / (2δ²))   (4)
in the formula (4), δ is a smoothing parameter, and the larger δ is, the more remarkable the smoothing effect is;
and smoothing the 3 × 3 area of the original image by using a 3 × 3 Gaussian convolution kernel, wherein the Gaussian convolution kernel is as follows:
K = (1/16) · [1 2 1; 2 4 2; 1 2 1]
step 2.3, searching for the positions of strongest gray-intensity change in the gray image processed in step 2.2, and calculating the first derivatives Z_x and Z_y of the gray image in the horizontal direction (abscissa) x and the vertical direction (ordinate) y with the Sobel operator, obtaining the boundary gradient magnitude |Z| and direction β:
|Z| = sqrt(Z_x² + Z_y²)
β = arctan(Z_y / Z_x)
wherein the Sobel operators in the abscissa x and ordinate y directions are:
S_x = [−1 0 1; −2 0 2; −1 0 1],  S_y = [−1 −2 −1; 0 0 0; 1 2 1]
step 2.4, dividing the boundary gradient magnitude |Z| obtained in step 2.3 equally into four gradient regions, each corresponding to one quadrant of the coordinate axes; then traversing all the points of each region one by one along the gradient direction β of each point, and comparing the gradient magnitude |Z| of each point with that of its two neighbours: if the point is larger than both the preceding and the following point it is kept, and if it is smaller than them it is set to zero; this performs non-maximum suppression on the gray image processed in step 2.3, thinning the edges and eliminating non-edge noise points;
step 2.5, setting the high threshold at 70% of the overall gray-level distribution of the gray image processed in step 2.4, and the low threshold at 1/2 of the high threshold; if the gray level of a point processed in step 2.4 is greater than the high threshold, its pixel value is set to 255; if it is less than the low threshold, its pixel value is set to 0; if it lies between the high threshold and the low threshold, the 8 neighbouring pixel values are examined: if no point with the value 255 exists among the 8 neighbours, the pixel value of the point is set to 0, and if a point with the value 255 exists in the neighbouring gradient region, the pixel value of the point is set to 255; the edge feature extraction is completed when all points have been processed;
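A minimal Python/OpenCV sketch of steps 2.1 to 2.5 follows. cv2.Canny performs the Gaussian smoothing, Sobel gradients, non-maximum suppression and hysteresis thresholding internally, so only the grayscale weighting and the threshold choice are reproduced explicitly; reading the high threshold as the 70th percentile of the gray levels and the low threshold as half of it is an assumption:

import cv2
import numpy as np

def canny_edges(bgr):
    # Sketch of step 2: weighted grayscale conversion followed by Canny edges.
    b, g, r = cv2.split(bgr.astype(np.float32))
    gray = (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)   # step 2.1
    high = float(np.percentile(gray, 70))  # assumed reading of the high threshold
    low = high / 2.0                       # assumed reading of the low threshold
    return cv2.Canny(gray, low, high)      # 0/255 edge feature map

# usage: edges = canny_edges(cv2.imread("person.jpg"))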
step 3, building a convolutional neural network framework suitable for human body contour extraction;
step 3.1, modifying the VGG16 network structure on the basis of the VGG16 network model, reducing the 5 convolutional layers of the VGG16 network model to 4, and connecting a pooling layer behind each convolutional layer, using max pooling with a pooling window size of 2 × 2 and a stride of 2;
let P be an unknown pixel point and, as shown in FIG. 1, Q_11, Q_12, Q_21, Q_22 be four surrounding points whose pixel values are known; given the values of the known function f at Q_11 = (x_1, y_1), Q_12 = (x_1, y_2), Q_21 = (x_2, y_1) and Q_22 = (x_2, y_2), linear interpolation in the x direction gives the pixel values f(R_1) and f(R_2) of two intermediate points R_1, R_2, with which the pooled layers are filled back to the size of the original image:
f(R_1) ≈ ((x_2 − x)/(x_2 − x_1))·f(Q_11) + ((x − x_1)/(x_2 − x_1))·f(Q_21)
f(R_2) ≈ ((x_2 − x)/(x_2 − x_1))·f(Q_12) + ((x − x_1)/(x_2 − x_1))·f(Q_22)
wherein R_1 = (x, y_1), R_2 = (x, y_2);
then linear interpolation of R_1, R_2 in the y direction gives the pixel value f(P) of the unknown point P:
f(P) ≈ ((y_2 − y)/(y_2 − y_1))·f(R_1) + ((y − y_1)/(y_2 − y_1))·f(R_2)
the resulting pixel value f(x, y) of the unknown point P is:
f(x, y) ≈ [f(Q_11)(x_2 − x)(y_2 − y) + f(Q_21)(x − x_1)(y_2 − y) + f(Q_12)(x_2 − x)(y − y_1) + f(Q_22)(x − x_1)(y − y_1)] / ((x_2 − x_1)(y_2 − y_1));
thereby generating a deconvolution structure as shown in fig. 2;
The deconvolution process can combine the outputs of multiple stages of the neural network to strengthen the result. It is realized by bilinear interpolation: the pixel value of an intermediate point is obtained from the pixel values of the four surrounding points, so that the pooled layers can be filled back to the size of the original image;
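A minimal sketch of this bilinear filling, assuming the pooled feature map is simply resized back to the target size; in PyTorch, F.interpolate with mode="bilinear" applies the four-neighbour interpolation of the formulas above over the whole map:

import torch
import torch.nn.functional as F

def bilinear_upsample(feat, out_hw):
    # feat: tensor of shape (N, C, h, w); out_hw: target (H, W).
    # Each unknown pixel is interpolated from its four known neighbours.
    return F.interpolate(feat, size=out_hw, mode="bilinear", align_corners=False)

# usage (assumed sizes): bilinear_upsample(torch.randn(1, 64, 28, 28), (56, 56))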
step 3.2, introducing a network in the network on the basis of the step 3.1, namely replacing each original convolutional layer structure with an MLP convolutional layer structure, and the specific process is as follows:
a 1 × 1 convolution kernel is added between every two 3 × 3 convolution kernels, i.e. a convolutional layer with a 1 × 1 kernel follows each original convolutional layer, and according to the principle of the MLP convolutional layer the feature map of each convolutional layer up to the last convolutional layer is calculated by the following formula (13):
f^t_{x,y,k_t} = f(w^t_{k_t} · a_{x,y} + b^t_{k_t})   (13)
in the formula (13), (x, y) is the pixel index of the feature map, i.e. the x and y coordinate axes, a_{x,y} is the input block centered at (x, y), k_t is the index of the feature map, t is the number of the MLP layer, f is the activation function, w is the weight coefficient, and b is the bias;
then, activating and outputting a feature map of the current layer through the ReLU;
step 3.3, adding a dropout layer between the convolution layer processed in the step 3.2 and the deconvolution layer to prevent network overfitting, so as to form a symmetrical convolution neural network, wherein the specific process of preventing network overfitting is as follows: randomly discarding part of parameters in the iteration process of the VGG16 network model in the network, and setting the randomly discarded part of parameters as 0;
the neural network structure is as follows:
the first layer is an input layer, and the input size of this layer is 224 × 224 × 12;
the second layer is a convolutional layer, and the input size of this layer is 112 × 112 × 8;
the third layer is a convolutional layer, and the input size of this layer is 56 × 56 × 16;
the fourth layer is a convolutional layer, and the input size of this layer is 28 × 28 × 32;
the fifth layer is a dropout layer, and the input size of this layer is 28 × 28 × 64;
the sixth layer is a deconvolution layer, and the output size of this layer is 56 × 56 × 32;
the seventh layer is a deconvolution layer, and the output size of this layer is 112 × 112 × 16;
the eighth layer is a deconvolution layer, and the output size of this layer is 224 × 224 × 8;
the ninth layer is an output layer with an output size of 224 × 224 × 1;
each convolution layer has three convolution sublayers, and a 1 × 1 convolution kernel is arranged between every two 3 × 3 convolution kernels;
the feature extraction reduces the size of the input image from 224 × 224 to 28 × 28, and the deconvolution layers then restore the size to generate the feature map of the human body contour of the image;
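A non-authoritative PyTorch sketch of this symmetric structure is given below. The channel and spatial plan (12 → 8 → 16 → 32 → 64 → 32 → 16 → 8 → 1, 224 → 112 → 56 → 28 → 224) follows the layer list above, while the padding, the dropout rate and the use of ConvTranspose2d for the deconvolution layers are assumptions; the text only prescribes 3 × 3 convolutions interleaved with 1 × 1 convolutions, 2 × 2 max pooling with stride 2, and bilinear-style upsampling:

import torch
import torch.nn as nn

def mlp_conv(c_in, c_out):
    # One "MLP convolution" block: 3x3 conv, 1x1 conv, 3x3 conv, each with ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class ContourNet(nn.Module):
    # Sketch of the 9-layer symmetric network; kernel sizes, padding, dropout
    # rate and ConvTranspose2d upsampling are assumptions, the channel/size
    # plan is taken from the layer list in the text.
    def __init__(self, p_drop=0.5):
        super().__init__()
        self.pool = nn.MaxPool2d(2, stride=2)
        self.enc1 = mlp_conv(12, 8)     # 224 -> 112 after pooling
        self.enc2 = mlp_conv(8, 16)     # 112 -> 56
        self.enc3 = mlp_conv(16, 32)    # 56 -> 28
        self.enc4 = mlp_conv(32, 64)    # stays 28 x 28
        self.drop = nn.Dropout2d(p_drop)
        self.dec1 = nn.ConvTranspose2d(64, 32, 2, stride=2)  # 28 -> 56
        self.dec2 = nn.ConvTranspose2d(32, 16, 2, stride=2)  # 56 -> 112
        self.dec3 = nn.ConvTranspose2d(16, 8, 2, stride=2)   # 112 -> 224
        self.out = nn.Conv2d(8, 1, 1)                        # 224 x 224 x 1 output

    def forward(self, x):
        x = self.pool(self.enc1(x))
        x = self.pool(self.enc2(x))
        x = self.pool(self.enc3(x))
        x = self.drop(self.enc4(x))
        x = torch.relu(self.dec1(x))
        x = torch.relu(self.dec2(x))
        x = torch.relu(self.dec3(x))
        return torch.sigmoid(self.out(x))

# usage: ContourNet()(torch.randn(1, 12, 224, 224)).shape  -> (1, 1, 224, 224)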
step 4, transmitting the original image, the Gabor texture feature map extracted in the step 1 and the Canny edge feature map extracted in the step 2 into the convolutional neural network constructed in the step 3 together for training to generate a CNN character model;
step 4.1, combining the Gabor characteristics in 8 directions obtained in the step 1, the Canny edge characteristics obtained in the step 2 and the RGB three-channel characteristics of the original image into 12 characteristic channels;
step 4.2, adjusting the image size of the 12 characteristic channels obtained in step 4.1 to 224 x 224, then transmitting the 12 characteristic channels into the convolutional neural network constructed in step 3, and training by taking the ground-truth label map of the original image (the pixels of the human body area are 1, and the remaining pixels are 0) as the teacher signal of the CNN to generate the CNN character model; the training process comprises the forward calculation of the neural network model and the backward error transfer calculation of the neural network model, and the forward calculation and the backward error calculation are processed iteratively, with 800 iterations;
the forward calculation, the reverse calculation and the iteration process are all represented by pseudo codes, each iteration is a process of realizing one forward calculation and one error back propagation, and the core pseudo codes in the neural network training process are as follows:
Step 1: initModel(model);          // initialize the neural network model
Step 2: for iter <- 1 to N do
    Step 2.1: forward(model);      // neural network model forward computation
    Step 2.2: backward(model);     // neural network model backward error transfer calculation
    Step 2.3: update(model);       // update the neural network weights
Step 3: return trained model.      // return the trained model
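Under the assumptions of the network sketch above, the loop summarized by this pseudocode could be written in PyTorch as follows; the binary cross-entropy loss, the SGD optimizer and the learning rate are illustrative choices not specified in the text, while the 800 iterations and the 0/1 ground-truth mask as teacher signal come from step 4.2:

import torch
import torch.nn as nn

def train_contour_net(model, loader, iterations=800, lr=1e-3, device="cpu"):
    # Sketch of the forward / backward / update loop of step 4.
    # loader is assumed to yield (inputs, masks) pairs: inputs of shape
    # (N, 12, 224, 224) stacking RGB + Gabor + Canny channels, masks of shape
    # (N, 1, 224, 224) with body pixels 1 and the rest 0.
    model = model.to(device).train()
    criterion = nn.BCELoss()  # heat map vs. 0/1 teacher signal (assumed loss)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    it = 0
    while it < iterations:
        for inputs, masks in loader:
            if it >= iterations:
                break
            inputs, masks = inputs.to(device), masks.to(device)
            optimizer.zero_grad()
            pred = model(inputs)     # forward computation
            loss = criterion(pred, masks)
            loss.backward()          # backward error transfer
            optimizer.step()         # weight update
            it += 1
    return model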
Step 5, testing the structure of the trained CNN character model to obtain a human body contour image;
step 5.1, respectively calculating, for the image to be detected, the Gabor texture features as extracted in step 1 and the Canny edge features as extracted in step 2;
step 5.2, inputting RGB three-channel characteristics, Gabor texture characteristics and Canny edge characteristics of the image to be detected into the CNN model trained in the step 4 to obtain a human body contour heat map;
step 5.3, performing opening and closing operation on the human body contour heat map obtained in the step 5.2, and performing smoothing treatment on the image through a Gaussian low-pass filter to obtain a human body mask;
step 5.4, performing an and operation on the original image and the human body mask obtained in the step 5.3, namely performing an and operation on each pixel point of the original image and a corresponding pixel point of the generated human body mask (a human body contour region is 1, and a non-human body region is 0), setting the corresponding position in the original image to be 0 if the pixel of the corresponding position of the mask is 0, and taking the original pixel from the corresponding position in the original image to obtain a human body contour image if the pixel of the mask is 1;
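A hedged OpenCV sketch of steps 5.3 and 5.4: the heat-map threshold, the structuring-element size and the Gaussian kernel size are assumptions, since the text only specifies opening and closing operations, Gaussian low-pass smoothing, and a per-pixel AND with the original image:

import cv2
import numpy as np

def contour_from_heatmap(original_bgr, heatmap, thresh=0.5):
    # Sketch of steps 5.3-5.4: heat map -> binary human mask -> masked image.
    # heatmap is the (H, W) network output in [0, 1]; the 0.5 threshold,
    # 5x5 structuring element and 5x5 Gaussian kernel are assumptions.
    mask = (heatmap > thresh).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove small noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes
    smooth = cv2.GaussianBlur(mask.astype(np.float32), (5, 5), 0)
    body_mask = (smooth > 0.5).astype(np.uint8)              # final 0/1 human mask
    return original_bgr * body_mask[:, :, None]               # per-pixel AND with the image

# usage: contour_img = contour_from_heatmap(img, heatmap)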
step 6, recording the overlapping rate and the time consumption of the human body contour image through the testing process of the step 5, and evaluating the human body contour image;
wherein, the overlapping rate is:
S = (A_P ∩ A_GT) / (A_P ∪ A_GT)
examples
The data set source is the Baidu human body image segmentation database, whose data are images containing human bodies shot from various angles. The database contains 5387 training images and the corresponding labeled samples. The invention selects 1000 images as the training set and 500 of the remaining images as the test set. The network input image size was fixed at 224 x 224 in the experiments. In order to evaluate the effect of the method accurately and objectively, and to facilitate comparison with existing methods, the performance of the human contour extraction model of the improved method is measured by the overlap rate, which is defined as follows:
S = (A_P ∩ A_GT) / (A_P ∪ A_GT)
wherein S is the degree of overlap, A_P is the body region predicted by the human contour extraction network, and A_GT is the actual body region. The higher S is, the higher the degree of overlap and the better the human body contour extraction effect.
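The overlap rate above is the standard intersection-over-union of the predicted and actual body regions; on binary masks it can be computed as follows (a small sketch, assuming equal-sized 0/1 arrays):

import numpy as np

def overlap_rate(pred_mask, gt_mask):
    # S = |A_P intersect A_GT| / |A_P union A_GT| for two 0/1 masks of equal shape.
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: define perfect overlap
    return np.logical_and(pred, gt).sum() / union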
The differences of the five methods in terms of input image size, overlapping rate, time consumption and display card are shown in table 1;
TABLE 1 comparison of five human body segmentation methods
(Table 1 is reproduced as an image in the original publication; it compares the input image size, overlap rate, time consumption and graphics card of the five human body segmentation methods.)
In the training process of the neural network, the GPU can be 100 or even 1000 times faster than the CPU: training on a mid-range GTX750 graphics card takes about several days, while running on a high-end i7 CPU would take at least a month. The testing process is different: in the testing stage the CPU speed is still acceptable, at roughly 10 s per image, while the GPU reaches millisecond-level testing speed. As can be seen from Table 1, the method provided by the invention achieves an overlap rate of 92.03% with a single-image test time of 68.84 ms. The Pixel-by-Pixel method reaches an overlap rate of 86.83%, but its test time is too long and it has no real-time capability. The Alex-seg-net method consumes the least time per image, but its overlap rate only reaches 80.2%, which is not ideal. The proposed method is higher than the Pool5-net method in overlap rate but not as good in test time; however, the GTX960 graphics card used by Pool5-net is a mid-to-high-end card whose performance is clearly better than the GTX750 used in this experiment. The method of the invention adds Canny edge features on the basis of Gabor-CNN, which both improves the detection rate and shortens the test time; it can therefore achieve good results even with limited hardware. In summary, the method provided by the invention combines traditional features with deep learning, achieving higher accuracy with a short test time, and can meet the requirements of some practical applications.

Claims (4)

1. A human body contour extraction method based on deep learning is characterized by comprising the following steps:
step 1, extracting Gabor texture features of an original image;
step 2, extracting Canny edge characteristics of the original image;
step 3, building a convolutional neural network framework suitable for human body contour extraction:
step 3.1, modifying the VGG16 network structure on the basis of the VGG16 network model, reducing the 5 convolutional layers of the VGG16 network model to 4, and connecting a pooling layer behind each convolutional layer, using max pooling with a pooling window size of 2 × 2 and a stride of 2;
let P be an unknown pixel point and Q_11, Q_12, Q_21, Q_22 be four surrounding points whose pixel values are known; given the values of the known function f at Q_11 = (x_1, y_1), Q_12 = (x_1, y_2), Q_21 = (x_2, y_1) and Q_22 = (x_2, y_2), linear interpolation in the x direction gives the pixel values f(R_1) and f(R_2) of two intermediate points R_1, R_2, with which the pooled layers are filled back to the size of the original image:
f(R_1) ≈ ((x_2 − x)/(x_2 − x_1))·f(Q_11) + ((x − x_1)/(x_2 − x_1))·f(Q_21)
f(R_2) ≈ ((x_2 − x)/(x_2 − x_1))·f(Q_12) + ((x − x_1)/(x_2 − x_1))·f(Q_22)
wherein R_1 = (x, y_1), R_2 = (x, y_2);
then linear interpolation of R_1, R_2 in the y direction gives the pixel value f(P) of the unknown point P:
f(P) ≈ ((y_2 − y)/(y_2 − y_1))·f(R_1) + ((y − y_1)/(y_2 − y_1))·f(R_2)
the resulting pixel value f(x, y) of the unknown point P is:
f(x, y) ≈ [f(Q_11)(x_2 − x)(y_2 − y) + f(Q_21)(x − x_1)(y_2 − y) + f(Q_12)(x_2 − x)(y − y_1) + f(Q_22)(x − x_1)(y − y_1)] / ((x_2 − x_1)(y_2 − y_1));
further generating a deconvolution structure;
step 3.2, introducing a network in the network on the basis of the step 3.1, namely replacing each original convolutional layer structure with an MLP convolutional layer structure, specifically:
after an original convolutional layer, a plurality of convolutional layers with convolution kernel of 1 × 1 are connected, and the feature map of each convolutional layer to the last convolutional layer is calculated by the following formula (13):
f^t_{x,y,k_t} = f(w^t_{k_t} · a_{x,y} + b^t_{k_t})   (13)
in the formula (13), (x, y) is the pixel index of the feature map, a_{x,y} is the input block centered at (x, y), k_t is the index of the feature map, t is the number of the MLP layer, f is the activation function, w is the weight coefficient and b is the bias;
then, activating and outputting a feature map of the current layer through the ReLU;
3.3, adding a dropout layer between the convolution layer processed in the step 3.2 and the deconvolution layer to prevent over-fitting of the network, and forming a symmetrical convolution neural network;
the convolutional neural network is:
the first layer is an input layer, and the input size of this layer is 224 × 224 × 12;
the second layer is a convolutional layer, and the input size of this layer is 112 × 112 × 8;
the third layer is a convolutional layer, and the input size of this layer is 56 × 56 × 16;
the fourth layer is a convolutional layer, and the input size of this layer is 28 × 28 × 32;
the fifth layer is a dropout layer, and the input size of this layer is 28 × 28 × 64;
the sixth layer is a deconvolution layer, and the output size of this layer is 56 × 56 × 32;
the seventh layer is a deconvolution layer, and the output size of this layer is 112 × 112 × 16;
the eighth layer is a deconvolution layer, and the output size of this layer is 224 × 224 × 8;
the ninth layer is an output layer with an output size of 224 × 224 × 1;
each convolution layer has three convolution sublayers, and a 1 × 1 convolution kernel is arranged between every two 3 × 3 convolution kernels;
step 4, transmitting the original image, the Gabor texture feature map extracted in the step 1 and the Canny edge feature map extracted in the step 2 into the convolutional neural network constructed in the step 3 together for training to generate a CNN character model;
step 4.1, combining the Gabor characteristics in 8 directions obtained in the step 1, the Canny edge characteristics obtained in the step 2 and the RGB three-channel characteristics of the original image into 12 characteristic channels;
step 4.2, adjusting the image size of the 12 characteristic channels obtained in step 4.1 to 224 x 224, then transmitting the 12 characteristic channels into the convolutional neural network constructed in step 3, and training by taking the ground-truth label map of the original image as the teacher signal of the CNN to generate the CNN character model;
step 5, testing the structure of the trained CNN character model to obtain a human body contour image;
and 6, recording the overlapping rate and the time consumption of the human body contour image through the testing process of the step 5, and evaluating the human body contour image.
2. The method for extracting the human body contour based on the deep learning as claimed in claim 1, wherein the step 1 is implemented according to the following steps:
step 1.1, obtaining a two-dimensional Gabor filter according to a formula (1):
Ψ_{u,v}(z) = (‖k_{u,v}‖² / σ²) · exp(−‖k_{u,v}‖² ‖z‖² / (2σ²)) · [exp(i k_{u,v}·z) − exp(−σ²/2)]   (1)
in the formula (1), Ψ_{u,v} is the two-dimensional Gabor filter, u and v are respectively the orientation and scale of the Gabor kernel, where u is 0, π/4, π/2, 3π/4, π, 5π/4, 6π/4 or 7π/4, k_{u,v} is used for controlling the width of the Gaussian window, z = (x, y) is the spatial position coordinate, σ = 2π is the ratio of the Gaussian window width to the wavelength, and i is the imaginary unit;
wherein k_{u,v}, which determines the direction and wavelength of the oscillating part, is:
k_{u,v} = k_v · e^{i·φ_u}   (2)
in the formula (2), k_v = k_max/F^v is the sampling frequency of the filter, k_max = π/2 is the maximum sampling frequency, F = √2 is the spacing factor used to limit the sampling frequency of the filter, and φ_u = u is the directional selectivity of the filter;
step 1.2, performing a convolution operation between the original image I(x, y) and the two-dimensional Gabor filter obtained in step 1.1 to extract the Gabor feature G_{u,v}(x, y) of the original image at position (x, y), obtaining the Gabor texture features in 8 directions:
G_{u,v}(x, y) = I(x, y) * Ψ_{u,v}   (3).
3. the method for extracting the human body contour based on the deep learning as claimed in claim 1, wherein the step 2 is implemented according to the following steps:
step 2.1, setting weight parameters for RGB three channels to complete gray processing of the original image, wherein the RGB three channel parameter setting expression is as follows:
Gray=R*0.299+G*0.587+B*0.114
step 2.2, processing the gray level image processed in the step 2.1 by using a first derivative of a two-dimensional Gaussian function, wherein the expression of the two-dimensional Gaussian function is as follows:
G(x, y) = (1 / (2πδ²)) · exp(−(x² + y²) / (2δ²))   (4)
in the formula (4), δ is a smoothing parameter, and the larger δ is, the more remarkable the smoothing effect is;
and smoothing the 3 × 3 area of the original image by using a 3 × 3 Gaussian convolution kernel, wherein the Gaussian convolution kernel is as follows:
K = (1/16) · [1 2 1; 2 4 2; 1 2 1]
step 2.3, searching for the positions of strongest gray-intensity change in the gray image processed in step 2.2, and calculating the first derivatives Z_x and Z_y of the gray image in the horizontal direction x and the vertical direction y with the Sobel operator, obtaining the boundary gradient magnitude |Z| and direction β:
|Z| = sqrt(Z_x² + Z_y²)
β = arctan(Z_y / Z_x)
wherein the Sobel operators in the abscissa x and ordinate y directions are:
S_x = [−1 0 1; −2 0 2; −1 0 1],  S_y = [−1 −2 −1; 0 0 0; 1 2 1]
step 2.4, dividing the boundary gradient magnitude |Z| obtained in step 2.3 equally into four gradient regions, each corresponding to one quadrant of the coordinate axes; then traversing all the points of each region one by one along the gradient direction β of each point, and comparing the gradient magnitude |Z| of each point with that of its two neighbours: if the point is larger than both the preceding and the following point it is kept, and if it is smaller than them it is set to zero; this performs non-maximum suppression on the gray image processed in step 2.3, thinning the edges and eliminating non-edge noise points;
step 2.5, setting the high threshold at 70% of the overall gray-level distribution of the gray image processed in step 2.4, and the low threshold at 1/2 of the high threshold; if the gray level of a point processed in step 2.4 is greater than the high threshold, its pixel value is set to 255; if it is less than the low threshold, its pixel value is set to 0; if it lies between the high threshold and the low threshold, the 8 neighbouring pixel values are examined: if no point with the value 255 exists among the 8 neighbours, the pixel value of the point is set to 0, and if a point with the value 255 exists in the neighbouring gradient region, the pixel value of the point is set to 255; the edge feature extraction is completed when all points have been processed.
4. The method for extracting the human body contour based on the deep learning as claimed in claim 1, wherein the step 5 is implemented according to the following steps:
step 5.1, respectively calculating the Gabor texture features extracted in the step 1 and the Canny edge features extracted in the step 2 for the image to be detected;
step 5.2, inputting RGB three-channel characteristics, Gabor texture characteristics and Canny edge characteristics of the image to be detected into the CNN model trained in the step 4 to obtain a human body contour heat map;
step 5.3, performing opening and closing operation on the human body contour heat map obtained in the step 5.2, and performing smoothing treatment on the image through a Gaussian low-pass filter to obtain a human body mask;
and 5.4, performing an AND operation on the original image and the human body mask obtained in the step 5.3, namely performing an AND operation on each pixel point of the original image and a corresponding pixel point of the generated human body mask, setting the corresponding position in the original image as 0 if the pixel at the corresponding position of the mask is 0, and taking the original pixel at the corresponding position in the original image to obtain a human body contour image if the pixel in the mask is 1.
CN201810582283.3A 2018-06-07 2018-06-07 Human body contour extraction method based on deep learning Expired - Fee Related CN109033945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810582283.3A CN109033945B (en) 2018-06-07 2018-06-07 Human body contour extraction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810582283.3A CN109033945B (en) 2018-06-07 2018-06-07 Human body contour extraction method based on deep learning

Publications (2)

Publication Number Publication Date
CN109033945A CN109033945A (en) 2018-12-18
CN109033945B true CN109033945B (en) 2021-04-06

Family

ID=64612339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810582283.3A Expired - Fee Related CN109033945B (en) 2018-06-07 2018-06-07 Human body contour extraction method based on deep learning

Country Status (1)

Country Link
CN (1) CN109033945B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109872326B (en) * 2019-01-25 2022-04-05 广西科技大学 Contour detection method based on deep reinforced network jump connection
CN109903301B (en) * 2019-01-28 2021-04-13 杭州电子科技大学 Image contour detection method based on multistage characteristic channel optimization coding
CN109920049B (en) * 2019-02-26 2021-05-04 清华大学 Edge information assisted fine three-dimensional face reconstruction method and system
CN113570052B (en) * 2020-04-28 2023-10-31 北京达佳互联信息技术有限公司 Image processing method, device, electronic equipment and storage medium
CN112102141B (en) * 2020-09-24 2022-04-08 腾讯科技(深圳)有限公司 Watermark detection method, watermark detection device, storage medium and electronic equipment
CN112258440B (en) * 2020-10-29 2024-01-02 北京达佳互联信息技术有限公司 Image processing method, device, electronic equipment and storage medium
CN112257729B (en) * 2020-11-13 2023-10-03 腾讯科技(深圳)有限公司 Image recognition method, device, equipment and storage medium
CN113128614B (en) * 2021-04-29 2023-06-16 西安微电子技术研究所 Convolution method based on image gradient, neural network based on direction convolution and classification method
CN113536968B (en) * 2021-06-25 2022-08-16 天津中科智能识别产业技术研究院有限公司 Method for automatically acquiring boundary coordinates of inner and outer circles of iris
CN113570627B (en) * 2021-07-02 2024-04-16 上海健康医学院 Training method of deep learning segmentation network and medical image segmentation method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678792A (en) * 2016-02-25 2016-06-15 中南大学 Method and system for extracting body profile

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4699298B2 (en) * 2006-06-28 2011-06-08 富士フイルム株式会社 Human body region extraction method, apparatus, and program
CN103093474B (en) * 2013-01-28 2015-03-25 电子科技大学 Three-dimensional mammary gland ultrasound image partition method based on homoplasmon and partial energy
CN105335716B (en) * 2015-10-29 2019-03-26 北京工业大学 A kind of pedestrian detection method extracting union feature based on improvement UDN
CN106529447B (en) * 2016-11-03 2020-01-21 河北工业大学 Method for identifying face of thumbnail
CN106778481A (en) * 2016-11-15 2017-05-31 上海百芝龙网络科技有限公司 A kind of body heath's monitoring method
CN106781282A (en) * 2016-12-29 2017-05-31 天津中科智能识别产业技术研究院有限公司 A kind of intelligent travelling crane driver fatigue early warning system
CN107301408B (en) * 2017-07-17 2020-06-23 成都通甲优博科技有限责任公司 Human body mask extraction method and device
CN108009472B (en) * 2017-10-25 2020-07-21 五邑大学 Finger back joint print recognition method based on convolutional neural network and Bayes classifier

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678792A (en) * 2016-02-25 2016-06-15 中南大学 Method and system for extracting body profile

Also Published As

Publication number Publication date
CN109033945A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109033945B (en) Human body contour extraction method based on deep learning
US11908244B2 (en) Human posture detection utilizing posture reference maps
CN108053417B (en) lung segmentation device of 3D U-Net network based on mixed rough segmentation characteristics
Wang et al. Haze concentration adaptive network for image dehazing
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN106408001A (en) Rapid area-of-interest detection method based on depth kernelized hashing
CN112016682B (en) Video characterization learning and pre-training method and device, electronic equipment and storage medium
CN112446862A (en) Dynamic breast ultrasound video full-focus real-time detection and segmentation device and system based on artificial intelligence and image processing method
CN111310609B (en) Video target detection method based on time sequence information and local feature similarity
CN110060225B (en) Medical image fusion method based on rapid finite shear wave transformation and sparse representation
Pandey et al. Segmentation of liver lesions with reduced complexity deep models
Cheng et al. DDU-Net: A dual dense U-structure network for medical image segmentation
Zheng et al. Interactive multi-scale feature representation enhancement for small object detection
CN114049503A (en) Saliency region detection method based on non-end-to-end deep learning network
CN111401209B (en) Action recognition method based on deep learning
CN116884036A (en) Live pig posture detection method, device, equipment and medium based on YOLOv5DA
Yin et al. Super resolution reconstruction of CT images based on multi-scale attention mechanism
CN110706209B (en) Method for positioning tumor in brain magnetic resonance image of grid network
Song et al. Spatial-aware dynamic lightweight self-supervised monocular depth estimation
Jin et al. Deep neural network-based noisy pixel estimation for breast ultrasound segmentation
Wu et al. Two-Stage Progressive Underwater Image Enhancement
Kim et al. Tackling Structural Hallucination in Image Translation with Local Diffusion
Yin et al. Visual Attention and ODE-inspired Fusion Network for image dehazing
Yuan et al. Enhanced target tracking algorithm for autonomous driving based on visible and infrared image fusion
Asha et al. Segmentation of Brain Tumors using traditional Multiscale bilateral Convolutional Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210406