CN109033945B - Human body contour extraction method based on deep learning - Google Patents
- Authority: CN (China)
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
Abstract
The invention discloses a human body contour extraction method based on deep learning, implemented according to the following steps: step 1, extracting Gabor texture features of an original image; step 2, extracting Canny edge features of the original image; step 3, building a convolutional neural network framework suitable for human body contour extraction; step 4, feeding the original image, the Gabor texture feature map extracted in step 1 and the Canny edge feature map extracted in step 2 into the convolutional neural network built in step 3 for training, to generate a CNN character model; step 5, testing the trained CNN character model to obtain a human body contour image; and step 6, recording the overlap rate and time consumption of the human body contour image during the testing process of step 5, and evaluating the human body contour image. The method achieves higher accuracy, improves the detection rate and shortens the test time.
Description
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a human body contour extraction method based on deep learning.
Background
Human body contour extraction plays an important role in computer vision and is a core technology of human body detection and human behavior recognition. It is now widely applied in fields such as intelligent monitoring and medicine. Virtual reconstruction of the human body model is a key technology in modern medical visualization systems, and accurate human body contour information enables sound medical analysis of a patient's condition. On the other hand, as modern society demands more protection of personal and public property, the use of intelligent monitoring systems is gradually increasing. The primary objective of intelligent video monitoring is to acquire monitoring data with various monitoring devices so as to automatically understand and describe events occurring in the monitored scene and to predict events that may occur in the future. As a key supporting technology of intelligent monitoring systems, human body contour extraction provides the position and contour information of a human body in an image, facilitating automatic human tracking and behavior recognition and thereby achieving the goal of intelligent monitoring.
To address the difficulty of human body detection in static images, researchers in China and abroad have proposed various methods that achieve accurate human body recognition by extracting different image features and training classifiers on them. However, although these traditional feature extraction methods can determine the position of the human body, they cannot accurately extract its contour. For the problem of target contour extraction, several effective schemes have been proposed, such as active contour models and visual saliency. Although these methods can extract the target contour, they have limitations in computational complexity, real-time performance, and other respects.
In recent years, deep learning methods have gradually replaced traditional feature extraction and have made breakthrough progress in fields such as target detection and image segmentation. The aim of deep learning is to perform feature learning automatically by imitating how the neural structures of the human brain process data. Convolutional Neural Networks (CNNs) are one class of deep learning models; their distinctive weight-sharing structure and sparse connectivity give them an advantage in image analysis.
Disclosure of Invention
The invention aims to provide a human body contour extraction method based on deep learning, and solves the problems that in the prior art, the human body contour extraction effect in a static image is poor and the model training speed is slow.
The invention adopts the technical scheme that a human body contour extraction method based on deep learning is implemented according to the following steps:
step 1, extracting Gabor texture features of an original image;
step 2, extracting Canny edge features of the original image;
step 3, building a convolutional neural network framework suitable for human body contour extraction;
step 4, feeding the original image, the Gabor texture feature map extracted in step 1 and the Canny edge feature map extracted in step 2 into the convolutional neural network built in step 3 for training, to generate a CNN character model;
step 5, testing the trained CNN character model to obtain a human body contour image;
and step 6, recording the overlap rate and time consumption of the human body contour image during the testing process of step 5, and evaluating the human body contour image.
The present invention is also characterized in that,
the step 1 is implemented according to the following steps:
step 1.1, obtaining a two-dimensional Gabor filter according to formula (1):

$$\Psi_{u,v}(z)=\frac{\|k_{u,v}\|^{2}}{\sigma^{2}}\exp\!\left(-\frac{\|k_{u,v}\|^{2}\|z\|^{2}}{2\sigma^{2}}\right)\left[\exp(i\,k_{u,v}\cdot z)-\exp\!\left(-\frac{\sigma^{2}}{2}\right)\right]\qquad(1)$$

in formula (1), Ψ_{u,v} is the two-dimensional Gabor filter; u and v are the orientation and scale of the Gabor kernel, where u indexes the 8 orientations 0, π/4, π/2, 3π/4, π, 5π/4, 6π/4, 7π/4; k_{u,v} controls the width of the Gaussian window; z = (x, y) is the spatial position coordinate; σ = 2π is the ratio of the Gaussian window width to the wavelength; and i is the imaginary unit;

wherein k_{u,v}, which determines the direction and wavelength of the oscillating part, is:

$$k_{u,v}=k_{v}e^{i\phi_{u}}\qquad(2)$$

in formula (2), k_v = k_max/f^v is the sampling frequency of the filter, k_max = π/2 is the maximum sampling frequency, f = √2 is the spacing factor that limits the interval between kernels in the frequency domain, and φ_u = πu/8 is the orientation selectivity of the filter;
step 1.2, performing a convolution of the original image I(x, y) with the two-dimensional Gabor filter obtained in step 1.1 to extract the Gabor feature G_{u,v}(x, y) of the original image at position (x, y), obtaining the Gabor texture features in the 8 directions:

$$G_{u,v}(x,y)=I(x,y)*\Psi_{u,v}\qquad(3)$$
the step 2 is implemented according to the following steps:
step 2.1, setting weight parameters for RGB three channels to complete gray processing of the original image, wherein the RGB three channel parameter setting expression is as follows:
Gray=R*0.299+G*0.587+B*0.114
step 2.2, processing the grayscale image obtained in step 2.1 with the first derivative of a two-dimensional Gaussian function, where the two-dimensional Gaussian function is:

$$G(x,y)=\frac{1}{2\pi\delta^{2}}\exp\!\left(-\frac{x^{2}+y^{2}}{2\delta^{2}}\right)\qquad(4)$$

in formula (4), δ is the smoothing parameter; the larger δ is, the stronger the smoothing effect;

each 3 × 3 region of the original image is smoothed with a 3 × 3 Gaussian convolution kernel:

$$K=\frac{1}{16}\begin{bmatrix}1&2&1\\2&4&2\\1&2&1\end{bmatrix}$$
step 2.3, searching for the positions of strongest gray-level change in the image processed in step 2.2 by computing the first derivatives Z_x and Z_y in the horizontal direction x and vertical direction y with the Sobel operator, obtaining the boundary gradient magnitude |Z| and direction β:

$$|Z|=\sqrt{Z_{x}^{2}+Z_{y}^{2}},\qquad \beta=\arctan\!\left(\frac{Z_{y}}{Z_{x}}\right)$$

wherein the Sobel operators in the abscissa x and ordinate y directions are:

$$S_{x}=\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix},\qquad S_{y}=\begin{bmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{bmatrix}$$
step 2.4, equally dividing the gradient directions obtained in step 2.3 into four regions, each corresponding to a quadrant of the coordinate axes. All points in each region are then processed one by one along the gradient direction β of each point: the gradient magnitude |Z| of each point is compared with its two neighbors along β, and the point is retained if it is larger than both the preceding and the following point, otherwise it is set to zero. This non-maximum suppression on the image processed in step 2.3 thins the edges and removes non-edge noise points;
step 2.5, setting the high threshold to 70% of the overall gray-level distribution of the image processed in step 2.4 and the low threshold to 1/2 of the high threshold. If the gray level of a point is greater than the high threshold, its pixel value is set to 255; if it is less than the low threshold, its pixel value is set to 0; if it lies between the two thresholds, the 8 neighboring pixels are examined: if none of them has the value 255, the point is set to 0, and if one of them has the value 255, the point is set to 255. When all points have been processed, the edge feature extraction is complete.
Step 3 is specifically implemented according to the following steps:
step 3.1, modifying the VGG16 network structure based on the VGG16 network model: the 5 convolutional layers of VGG16 are reduced to 4, a pooling layer is connected after each convolutional layer, max pooling is used, the pooling window size is 2 × 2, and the stride is 2;
let P be an unknown pixel and Q_{11}, Q_{12}, Q_{21}, Q_{22} four points with known pixels around P. With the known function f at Q_{11} = (x_1, y_1), Q_{12} = (x_1, y_2), Q_{21} = (x_2, y_1) and Q_{22} = (x_2, y_2), linear interpolation in the x direction gives the pixel values f(R_1) and f(R_2) at the two intermediate points R_1, R_2, which are used to fill the pooled layers back to the size of the original image:

$$f(R_{1})=\frac{x_{2}-x}{x_{2}-x_{1}}f(Q_{11})+\frac{x-x_{1}}{x_{2}-x_{1}}f(Q_{21})$$

$$f(R_{2})=\frac{x_{2}-x}{x_{2}-x_{1}}f(Q_{12})+\frac{x-x_{1}}{x_{2}-x_{1}}f(Q_{22})$$

wherein R_1 = (x, y_1), R_2 = (x, y_2);

then linear interpolation of R_1, R_2 in the y direction gives the pixel value f(P) of the unknown point P:

$$f(P)=\frac{y_{2}-y}{y_{2}-y_{1}}f(R_{1})+\frac{y-y_{1}}{y_{2}-y_{1}}f(R_{2})$$

the resulting pixel value f(x, y) of the unknown point P is:

$$f(x,y)=\frac{(x_{2}-x)(y_{2}-y)f(Q_{11})+(x-x_{1})(y_{2}-y)f(Q_{21})+(x_{2}-x)(y-y_{1})f(Q_{12})+(x-x_{1})(y-y_{1})f(Q_{22})}{(x_{2}-x_{1})(y_{2}-y_{1})}$$
further generating a deconvolution structure;
step 3.2, introducing a Network-in-Network structure on the basis of step 3.1, i.e., replacing each original convolutional layer structure with an MLP convolutional layer structure;
and 3.3, adding a dropout layer between the convolution layer processed in the step 3.2 and the deconvolution layer to prevent over-fitting of the network, and forming a symmetrical convolutional neural network.
Step 3.2 is specifically implemented according to the following steps:
after an original convolutional layer, several convolutional layers with 1 × 1 convolution kernels are connected, and the feature map passed from each such layer to the next is calculated by formula (13):

$$f_{k_{t}}(x,y)=\max\!\left(0,\;w_{k_{t}}^{T}a_{xy}+b_{k_{t}}\right)\qquad(13)$$

in formula (13), (x, y) is the pixel index of the feature map, a_{xy} is the input patch centered at (x, y), k_t is the index of the feature map, t is the index of the MLP layer, f is the activation function, w is the weight coefficient and b is the bias;
the feature map of the current layer is then output through the ReLU activation.
The convolutional neural network is:
the first layer is the input layer; the input size of this layer is 224 × 224 × 12;
the second layer is a convolutional layer; its input size is 112 × 112 × 8;
the third layer is a convolutional layer; its input size is 56 × 56 × 16;
the fourth layer is a convolutional layer; its input size is 28 × 28 × 32;
the fifth layer is a dropout layer; its input size is 28 × 28 × 64;
the sixth layer is a deconvolution layer; the output size of this layer is 56 × 56 × 32;
the seventh layer is a deconvolution layer; the output size of this layer is 112 × 112 × 16;
the eighth layer is a deconvolution layer; the output size of this layer is 224 × 224 × 8;
the ninth layer is the output layer, with an output size of 224 × 224 × 1;
each convolutional layer has three convolutional sublayers, with a 1 × 1 convolutional kernel in between each two 3 × 3 convolutional kernels.
Step 4 is specifically implemented according to the following steps:
step 4.1, combining the Gabor characteristics in 8 directions obtained in the step 1, the Canny edge characteristics obtained in the step 2 and the RGB three-channel characteristics of the original image into 12 characteristic channels;
and 4.2, resizing the images of the 12 feature channels obtained in step 4.1 to 224 × 224, feeding the 12 feature channels into the convolutional neural network built in step 3, and training with the ground-truth label map of the original image as the teacher signal of the CNN to generate the CNN character model.
Step 5 is specifically implemented according to the following steps:
step 5.1, computing, for the image to be detected, the Gabor texture features as extracted in step 1 and the Canny edge features as extracted in step 2, respectively;
step 5.2, inputting RGB three-channel characteristics, Gabor texture characteristics and Canny edge characteristics of the image to be detected into the CNN model trained in the step 4 to obtain a human body contour heat map;
step 5.3, performing opening and closing operation on the human body contour heat map obtained in the step 5.2, and performing smoothing treatment on the image through a Gaussian low-pass filter to obtain a human body mask;
and 5.4, performing an AND operation between the original image and the human body mask obtained in step 5.3, i.e., combining each pixel of the original image with the corresponding pixel of the generated human body mask: if the mask pixel at a position is 0, the corresponding position in the original image is set to 0; if the mask pixel is 1, the original pixel at the corresponding position is kept, yielding the human body contour image.
The beneficial effect of the invention is that,
(1) according to the human body contour extraction method based on deep learning, the operation of region selection on an original image is not needed, and a series of complex operations such as region combination on an output object are not needed;
(2) according to the human body contour extraction method based on deep learning, Canny edge characteristics are added on the basis of Gabor-CNN, high accuracy is achieved, the detection rate is improved, and the testing time is shortened.
Drawings
FIG. 1 is a schematic diagram of bilinear interpolation;
FIG. 2 is a graph of a convolutional neural network upsampling DAG used by the extraction method of the present invention;
FIG. 3 is a general structure diagram of a CNN used in the extraction method of the present invention;
FIG. 4 is a structural diagram of the extraction method of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a human body contour extraction method based on deep learning, which is implemented according to the following steps:
step 1, extracting Gabor texture features of an original image;
step 1.1, obtaining a two-dimensional Gabor filter according to formula (1):

$$\Psi_{u,v}(z)=\frac{\|k_{u,v}\|^{2}}{\sigma^{2}}\exp\!\left(-\frac{\|k_{u,v}\|^{2}\|z\|^{2}}{2\sigma^{2}}\right)\left[\exp(i\,k_{u,v}\cdot z)-\exp\!\left(-\frac{\sigma^{2}}{2}\right)\right]\qquad(1)$$

in formula (1), Ψ_{u,v} is the two-dimensional Gabor filter; u and v are the orientation and scale of the Gabor kernel, where u indexes the 8 orientations 0, π/4, π/2, 3π/4, π, 5π/4, 6π/4, 7π/4; k_{u,v} controls the width of the Gaussian window; z = (x, y) is the spatial position coordinate; σ = 2π is the ratio of the Gaussian window width to the wavelength; and i is the imaginary unit;

wherein k_{u,v}, which determines the direction and wavelength of the oscillating part, is:

$$k_{u,v}=k_{v}e^{i\phi_{u}}\qquad(2)$$

in formula (2), k_v = k_max/f^v is the sampling frequency of the filter, k_max = π/2 is the maximum sampling frequency, f = √2 is the spacing factor that limits the interval between kernels in the frequency domain, and φ_u = πu/8 is the orientation selectivity of the filter;
step 1.2, performing a convolution of the original image I(x, y) with the two-dimensional Gabor filter obtained in step 1.1 to extract the Gabor feature G_{u,v}(x, y) of the original image at position (x, y):

$$G_{u,v}(x,y)=I(x,y)*\Psi_{u,v}\qquad(3)$$

To capture the features of the original image, in particular the locally salient features in multiple directions, a bank of two-dimensional Gabor filters (Gabor kernel functions) with the 8 orientations u = 0, π/4, π/2, 3π/4, π, 5π/4, 6π/4, 7π/4 is used, with σ = 2π, k_max = π/2 and f = √2, yielding the Gabor texture features in 8 directions;
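The Gabor bank above can be sketched directly from formulas (1) and (2). This is a minimal NumPy sketch; the 9 × 9 kernel size and the single scale v = 0 are illustrative assumptions (the text fixes only σ = 2π, k_max = π/2, f = √2 and the 8 orientations):

```python
import numpy as np

def gabor_kernel(u, v, size=9, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Build one complex 2-D Gabor kernel for orientation index u and scale v."""
    k_v = k_max / (f ** v)                       # sampling frequency k_v = k_max / f^v
    phi_u = np.pi * u / 8                        # orientation selectivity phi_u = pi*u/8
    kx, ky = k_v * np.cos(phi_u), k_v * np.sin(phi_u)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    z2 = x ** 2 + y ** 2                         # ||z||^2
    k2 = kx ** 2 + ky ** 2                       # ||k_{u,v}||^2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

# one kernel per orientation, single scale v = 0
bank = [gabor_kernel(u, 0) for u in range(8)]

# Gabor feature of formula (3) at one position: magnitude of the local response
img = np.random.rand(32, 32)
patch = img[10:19, 10:19]
g = abs(np.sum(patch * bank[0]))
print(len(bank), bank[0].shape)
```

In practice the full feature map G_{u,v} is obtained by sliding this kernel over the whole image (e.g. with an FFT-based convolution).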
step 2, extracting Canny edge characteristics of the original image;
step 2.1, setting weight parameters for RGB three channels to complete gray processing of the original image, wherein the RGB three channel parameter setting expression is as follows:
Gray=R*0.299+G*0.587+B*0.114
step 2.2, processing the grayscale image obtained in step 2.1 with the first derivative of a two-dimensional Gaussian function, where the two-dimensional Gaussian function is:

$$G(x,y)=\frac{1}{2\pi\delta^{2}}\exp\!\left(-\frac{x^{2}+y^{2}}{2\delta^{2}}\right)\qquad(4)$$

in formula (4), δ is the smoothing parameter; the larger δ is, the stronger the smoothing effect;

each 3 × 3 region of the original image is smoothed with a 3 × 3 Gaussian convolution kernel:

$$K=\frac{1}{16}\begin{bmatrix}1&2&1\\2&4&2\\1&2&1\end{bmatrix}$$
step 2.3, searching for the positions of strongest gray-level change in the image processed in step 2.2 by computing the first derivatives Z_x and Z_y in the horizontal (abscissa) direction x and vertical (ordinate) direction y with the Sobel operator, obtaining the boundary gradient magnitude |Z| and direction β:

$$|Z|=\sqrt{Z_{x}^{2}+Z_{y}^{2}},\qquad \beta=\arctan\!\left(\frac{Z_{y}}{Z_{x}}\right)$$

wherein the Sobel operators in the abscissa x and ordinate y directions are:

$$S_{x}=\begin{bmatrix}-1&0&1\\-2&0&2\\-1&0&1\end{bmatrix},\qquad S_{y}=\begin{bmatrix}-1&-2&-1\\0&0&0\\1&2&1\end{bmatrix}$$
step 2.4, equally dividing the gradient directions obtained in step 2.3 into four regions, each corresponding to a quadrant of the coordinate axes. All points in each region are then processed one by one along the gradient direction β of each point: the gradient magnitude |Z| of each point is compared with its two neighbors along β, and the point is retained if it is larger than both the preceding and the following point, otherwise it is set to zero. This non-maximum suppression on the image processed in step 2.3 thins the edges and removes non-edge noise points;
step 2.5, setting the high threshold to 70% of the overall gray-level distribution of the image processed in step 2.4 and the low threshold to 1/2 of the high threshold. If the gray level of a point is greater than the high threshold, its pixel value is set to 255; if it is less than the low threshold, its pixel value is set to 0; if it lies between the two thresholds, the 8 neighboring pixels are examined: if none of them has the value 255, the point is set to 0, and if one of them has the value 255, the point is set to 255. When all points have been processed, the edge feature extraction is complete;
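Steps 2.3 and 2.5 above can be sketched in NumPy as follows. This is a toy illustration, not the full pipeline: the non-maximum suppression of step 2.4 is omitted, and the numeric thresholds are illustrative rather than the 70%/half-of-high rule:

```python
import numpy as np

def sobel_gradients(gray):
    """First derivatives Z_x, Z_y plus gradient magnitude and direction (step 2.3)."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    sy = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float)
    h, w = gray.shape
    zx = np.zeros((h - 2, w - 2))
    zy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            win = gray[i:i + 3, j:j + 3]
            zx[i, j] = np.sum(win * sx)
            zy[i, j] = np.sum(win * sy)
    return np.hypot(zx, zy), np.arctan2(zy, zx)

def double_threshold(mag, high, low):
    """Step 2.5: 255 above high, 0 below low, weak pixels kept next to a strong one."""
    strong = mag >= high
    weak = (mag >= low) & ~strong
    out = np.where(strong, 255, 0)
    for i, j in zip(*np.nonzero(weak)):
        # promote a weak pixel if any of its 8 neighbours is strong
        if strong[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2].any():
            out[i, j] = 255
    return out

# vertical step edge: left half dark, right half bright
gray = np.zeros((8, 8))
gray[:, 4:] = 200.0
mag, direction = sobel_gradients(gray)
edges = double_threshold(mag, high=400, low=200)
print(edges.max())
```

On this step image the gradient magnitude peaks along the brightness boundary, so the output contains a vertical line of 255-valued edge pixels.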
step 3, building a convolutional neural network framework suitable for human body contour extraction;
step 3.1, modifying the VGG16 network structure based on the VGG16 network model: the 5 convolutional layers of VGG16 are reduced to 4, a pooling layer is connected after each convolutional layer, max pooling is used, the pooling window size is 2 × 2, and the stride is 2;
as shown in FIG. 1, let P be an unknown pixel and Q_{11}, Q_{12}, Q_{21}, Q_{22} four points with known pixels around P. With the known function f at Q_{11} = (x_1, y_1), Q_{12} = (x_1, y_2), Q_{21} = (x_2, y_1) and Q_{22} = (x_2, y_2), linear interpolation in the x direction gives the pixel values f(R_1) and f(R_2) at the two intermediate points R_1, R_2, which are used to fill the pooled layers back to the size of the original image:

$$f(R_{1})=\frac{x_{2}-x}{x_{2}-x_{1}}f(Q_{11})+\frac{x-x_{1}}{x_{2}-x_{1}}f(Q_{21})$$

$$f(R_{2})=\frac{x_{2}-x}{x_{2}-x_{1}}f(Q_{12})+\frac{x-x_{1}}{x_{2}-x_{1}}f(Q_{22})$$

wherein R_1 = (x, y_1), R_2 = (x, y_2);

then linear interpolation of R_1, R_2 in the y direction gives the pixel value f(P) of the unknown point P:

$$f(P)=\frac{y_{2}-y}{y_{2}-y_{1}}f(R_{1})+\frac{y-y_{1}}{y_{2}-y_{1}}f(R_{2})$$

the resulting pixel value f(x, y) of the unknown point P is:

$$f(x,y)=\frac{(x_{2}-x)(y_{2}-y)f(Q_{11})+(x-x_{1})(y_{2}-y)f(Q_{21})+(x_{2}-x)(y-y_{1})f(Q_{12})+(x-x_{1})(y-y_{1})f(Q_{22})}{(x_{2}-x_{1})(y_{2}-y_{1})}$$
thereby generating a deconvolution structure as shown in fig. 2;
the deconvolution process can combine the outputs of multiple stages of the neural network to strengthen the result. It is realized by bilinear interpolation: the pixel value of an intermediate point is obtained from the pixel values of the four surrounding points, so that the pooled layers can be filled back to the size of the original image;
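The bilinear interpolation described above can be written directly from its formulas. This is a plain-Python sketch; the function and argument names are ours, not from the patent:

```python
def bilinear(p, q11, q12, q21, q22):
    """Interpolate f at p = (x, y) from four known corner pixels.

    Each q is a ((xi, yi), value) pair, laid out as in the text:
    Q11 = (x1, y1), Q12 = (x1, y2), Q21 = (x2, y1), Q22 = (x2, y2).
    """
    x, y = p
    (x1, y1), f11 = q11
    (_, y2), f12 = q12
    (x2, _), f21 = q21
    _, f22 = q22
    # interpolate along x at rows y1 and y2 (intermediate points R1 and R2)
    fr1 = (x2 - x) / (x2 - x1) * f11 + (x - x1) / (x2 - x1) * f21
    fr2 = (x2 - x) / (x2 - x1) * f12 + (x - x1) / (x2 - x1) * f22
    # interpolate along y between R1 and R2 to get f(P)
    return (y2 - y) / (y2 - y1) * fr1 + (y - y1) / (y2 - y1) * fr2

# the midpoint of a unit square averages the four corner values
v = bilinear((0.5, 0.5), ((0, 0), 10), ((0, 1), 20), ((1, 0), 30), ((1, 1), 40))
print(v)
```

Upsampling a pooled feature map applies this at every new pixel position, which is why the deconvolution stage can restore the original image size.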
step 3.2, introducing a Network-in-Network structure on the basis of step 3.1, i.e., replacing each original convolutional layer structure with an MLP convolutional layer structure, as follows:
a 1 × 1 convolution kernel is inserted between every two 3 × 3 convolution kernels, i.e., a convolutional layer with 1 × 1 kernels follows each original convolutional layer. Following the principle of the MLP convolutional layer, the feature map passed from each such layer to the next is calculated by formula (13):

$$f_{k_{t}}(x,y)=\max\!\left(0,\;w_{k_{t}}^{T}a_{xy}+b_{k_{t}}\right)\qquad(13)$$

in formula (13), (x, y) is the pixel index of the feature map (along the x and y coordinate axes), a_{xy} is the input patch centered at (x, y), k_t is the index of the feature map, t is the index of the MLP layer, f is the activation function, w is the weight coefficient and b is the bias;

the feature map of the current layer is then output through the ReLU activation;
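Formula (13) with a 1 × 1 kernel reduces to the same linear map plus ReLU applied independently at every pixel, which is what makes the layer a per-pixel MLP. A NumPy sketch with illustrative channel counts (32 in, 8 out; none of these sizes come from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.random((16, 16, 32))        # H x W x C feature map from a 3x3 conv layer
w = rng.standard_normal((32, 8))       # 1x1 kernels: one shared 32 -> 8 linear map
b = np.zeros(8)                        # bias b

# formula (13): f_{k_t}(x, y) = max(0, w^T a_xy + b) at every pixel (ReLU)
out = np.maximum(fmap @ w + b, 0.0)
print(out.shape)
```

The matrix product over the channel axis is exactly the 1 × 1 convolution, so no spatial loop is needed.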
step 3.3, adding a dropout layer between the convolution layer processed in the step 3.2 and the deconvolution layer to prevent network overfitting, so as to form a symmetrical convolution neural network, wherein the specific process of preventing network overfitting is as follows: randomly discarding part of parameters in the iteration process of the VGG16 network model in the network, and setting the randomly discarded part of parameters as 0;
the neural network structure is as follows:
the first layer is the input layer; the input size of this layer is 224 × 224 × 12;
the second layer is a convolutional layer; its input size is 112 × 112 × 8;
the third layer is a convolutional layer; its input size is 56 × 56 × 16;
the fourth layer is a convolutional layer; its input size is 28 × 28 × 32;
the fifth layer is a dropout layer; its input size is 28 × 28 × 64;
the sixth layer is a deconvolution layer; the output size of this layer is 56 × 56 × 32;
the seventh layer is a deconvolution layer; the output size of this layer is 112 × 112 × 16;
the eighth layer is a deconvolution layer; the output size of this layer is 224 × 224 × 8;
the ninth layer is the output layer, with an output size of 224 × 224 × 1;
each convolution layer has three convolution sublayers, and a 1 × 1 convolution kernel is arranged between every two 3 × 3 convolution kernels;
feature extraction reduces the input image from 224 × 224 to 28 × 28, and the deconvolution layers then restore the size, generating a feature map of the human body contour of the image;
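The listed layer sizes imply three effective 2 × 2, stride-2 poolings (224 → 112 → 56 → 28) mirrored by three 2× deconvolutions on the way back up. A small sketch of this size bookkeeping (the helper names are ours):

```python
def pool_out(n, win=2, stride=2):
    """Spatial size after a win x win pooling with the given stride."""
    return (n - win) // stride + 1

def deconv_out(n, factor=2):
    """Spatial size after a 2x bilinear-upsampling deconvolution."""
    return n * factor

size = 224
for _ in range(3):            # contracting path: 224 -> 112 -> 56 -> 28
    size = pool_out(size)
print(size)
for _ in range(3):            # expanding path: 28 -> 56 -> 112 -> 224
    size = deconv_out(size)
print(size)
```

The symmetry of the two loops is what makes the network output a 224 × 224 contour map matching the input resolution.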
step 4, transmitting the original image, the Gabor texture feature map extracted in the step 1 and the Canny edge feature map extracted in the step 2 into the convolutional neural network constructed in the step 3 together for training to generate a CNN character model;
step 4.1, combining the Gabor characteristics in 8 directions obtained in the step 1, the Canny edge characteristics obtained in the step 2 and the RGB three-channel characteristics of the original image into 12 characteristic channels;
step 4.2, resizing the images of the 12 feature channels obtained in step 4.1 to 224 × 224 and feeding the 12 feature channels into the convolutional neural network built in step 3. Training uses the ground-truth label map of the original image (pixels of the human body region are 1, all other pixels are 0) as the teacher signal of the CNN to generate the CNN character model. The training process consists of forward computation of the neural network model and backward error propagation, iterated for 800 iterations;
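Assembling the 12 feature channels of step 4.1 is a channel-wise concatenation of the RGB image, the 8 Gabor maps and the Canny map. A NumPy sketch with zero placeholder arrays standing in for the real feature maps:

```python
import numpy as np

h = w = 224
rgb = np.zeros((h, w, 3))       # original RGB image (3 channels)
gabor = np.zeros((h, w, 8))     # Gabor texture maps, one per orientation
canny = np.zeros((h, w, 1))     # Canny edge map

# stack along the channel axis to form the 12-channel network input
x = np.concatenate([rgb, gabor, canny], axis=-1)
print(x.shape)
```

The result matches the 224 × 224 × 12 input size of the network's first layer.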
the forward calculation, the reverse calculation and the iteration process are all represented by pseudo codes, each iteration is a process of realizing one forward calculation and one error back propagation, and the core pseudo codes in the neural network training process are as follows:
Step1:initModel(model);
v/initialize neural network model
Step2:for iter<-1to N do
Step2.1:forward(model);
V/neural network model Forward computation
Step2.2:backward(model);
V/neural network model inverse error transfer calculation
Step2.3:update(model);
// updating neural network weights
Step3:return trained model.
// Return to trained model
Step 5, testing the trained CNN character model to obtain a human body contour image;
step 5.1, computing, for the image to be detected, the Gabor texture features as extracted in step 1 and the Canny edge features as extracted in step 2, respectively;
step 5.2, inputting RGB three-channel characteristics, Gabor texture characteristics and Canny edge characteristics of the image to be detected into the CNN model trained in the step 4 to obtain a human body contour heat map;
step 5.3, performing opening and closing operation on the human body contour heat map obtained in the step 5.2, and performing smoothing treatment on the image through a Gaussian low-pass filter to obtain a human body mask;
step 5.4, performing an AND operation between the original image and the human body mask obtained in step 5.3 (human body contour region 1, non-human-body region 0), i.e., combining each pixel of the original image with the corresponding pixel of the generated mask: if the mask pixel at a position is 0, the corresponding position in the original image is set to 0; if the mask pixel is 1, the original pixel at the corresponding position is kept, yielding the human body contour image;
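The AND operation of step 5.4 amounts to element-wise masking of the image by the binary mask. A NumPy sketch on a toy image (sizes and pixel values are illustrative):

```python
import numpy as np

img = np.full((4, 4, 3), 120, dtype=np.uint8)   # toy "original image"
mask = np.zeros((4, 4), dtype=np.uint8)         # human body mask: 1 inside contour
mask[1:3, 1:3] = 1

# keep original pixels where mask == 1, zero everything else
contour_img = img * mask[:, :, None]
print(contour_img[2, 2], contour_img[0, 0])
```

Broadcasting the 2-D mask over the channel axis applies the same keep/zero decision to all three color channels of each pixel.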
step 6, recording the overlapping rate and the time consumption of the human body contour image through the testing process of the step 5, and evaluating the human body contour image;
wherein the overlap rate is:

$$S=\frac{A_{P}\cap A_{GT}}{A_{P}\cup A_{GT}}$$
examples
The data set comes from the Baidu human body image segmentation database; the data in the database are images containing human bodies shot from various angles. The database contains 5387 training images with corresponding labeled samples. 1000 images were selected as the training set and 500 of the remaining images as the test set. The network input image size was fixed at 224 × 224 in the experiments. To evaluate the effect of the method accurately and objectively, and to allow comparison with existing methods, the performance of the human body contour extraction model is measured by the overlap rate, defined as:

$$S=\frac{A_{P}\cap A_{GT}}{A_{P}\cup A_{GT}}$$

where S is the degree of overlap, A_P is the human body region predicted by the contour extraction network, and A_GT is the actual human body region. The higher S is, the greater the overlap and the better the human body contour extraction.
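The overlap rate S is the intersection-over-union of the predicted and ground-truth body regions, and can be computed directly on binary masks. A NumPy sketch:

```python
import numpy as np

def overlap_rate(pred, gt):
    """S = |A_P intersect A_GT| / |A_P union A_GT| for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

# two 6x6 squares offset by 2 pixels: intersection 16 px, union 56 px
pred = np.zeros((10, 10), bool)
pred[2:8, 2:8] = True
gt = np.zeros((10, 10), bool)
gt[4:10, 4:10] = True
s = overlap_rate(pred, gt)
print(round(s, 4))
```

A perfect prediction gives S = 1.0, and disjoint regions give S = 0, matching the interpretation in the text that higher S means better extraction.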
The differences of the five methods in terms of input image size, overlapping rate, time consumption and display card are shown in table 1;
TABLE 1 comparison of five human body segmentation methods
During neural network training, a GPU can be 100 or even 1000 times faster than a CPU: training on a mid-range GTX750 graphics card takes about several days, while running on a high-end i7 CPU would take at least a month. The testing stage is different: on a CPU the speed is still acceptable at about 10 s per picture, while a GPU reaches millisecond-level testing speed. As can be seen from Table 1, the proposed method achieves an overlap rate of 92.03% with a single-picture test time of 68.84 ms. The Pixel-by-Pixel method reaches an overlap rate of 86.83%, but its test time is too long for real-time use. The Alex-seg-net method has the shortest single-image test time, but its overlap rate is only 80.2%, which is unsatisfactory. The proposed method exceeds the Pool5-net method in overlap rate but not in test time; however, the GTX960 card used by Pool5-net is a mid-to-high-end card whose performance is clearly better than the GTX750 used in this experiment. By adding Canny edge features on top of Gabor-CNN, the proposed method both improves the detection rate and shortens the test time, and thus performs well even with limited hardware. In summary, the proposed method combines traditional features with deep learning, achieving both higher accuracy and shorter test time, and can meet the requirements of some practical applications.
Claims (4)
1. A human body contour extraction method based on deep learning is characterized by comprising the following steps:
step 1, extracting Gabor texture features of an original image;
step 2, extracting Canny edge characteristics of the original image;
step 3, building a convolutional neural network framework suitable for human body contour extraction:
step 3.1, modifying the VGG16 network structure based on the VGG16 network model: reducing the 5 convolutional layers of VGG16 to 4, connecting a pooling layer after each convolutional layer, using max pooling with a 2 × 2 pooling window and a stride of 2;
let P be an unknown pixel point and Q11, Q12, Q21, Q22 be four surrounding points with known pixel values, the known function f taking values at Q11 = (x1, y1), Q12 = (x1, y2), Q21 = (x2, y1) and Q22 = (x2, y2); linear interpolation of these four points in the x direction gives the pixel values f(R1) and f(R2) of the two intermediate points R1 and R2, which are used to fill the pooled layers back to the size of the original image:

f(R1) = ((x2 − x)/(x2 − x1)) f(Q11) + ((x − x1)/(x2 − x1)) f(Q21)
f(R2) = ((x2 − x)/(x2 − x1)) f(Q12) + ((x − x1)/(x2 − x1)) f(Q22)

wherein R1 = (x, y1), R2 = (x, y2);
linear interpolation of R1 and R2 in the y direction then gives the pixel value f(P) of the unknown point P:

f(P) = ((y2 − y)/(y2 − y1)) f(R1) + ((y − y1)/(y2 − y1)) f(R2)

so the resulting pixel value f(x, y) of the unknown point P is:

f(x, y) = [f(Q11)(x2 − x)(y2 − y) + f(Q21)(x − x1)(y2 − y) + f(Q12)(x2 − x)(y − y1) + f(Q22)(x − x1)(y − y1)] / ((x2 − x1)(y2 − y1))
further generating a deconvolution structure;
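The bilinear interpolation used above to restore pooled layers to the original image size can be sketched directly. A minimal Python version (the function name `bilinear` is illustrative; `f` is passed as a mapping from the four known corner coordinates to their pixel values):

```python
def bilinear(f, x1, x2, y1, y2, x, y):
    """Bilinear interpolation of f at the unknown point P = (x, y) from
    the four known corners Q11=(x1,y1), Q12=(x1,y2), Q21=(x2,y1),
    Q22=(x2,y2). f maps (xi, yi) -> pixel value."""
    # Interpolate along x at y1 and y2 to get R1 = (x, y1), R2 = (x, y2).
    fR1 = (x2 - x) / (x2 - x1) * f[(x1, y1)] + (x - x1) / (x2 - x1) * f[(x2, y1)]
    fR2 = (x2 - x) / (x2 - x1) * f[(x1, y2)] + (x - x1) / (x2 - x1) * f[(x2, y2)]
    # Interpolate along y between R1 and R2 to get f(P).
    return (y2 - y) / (y2 - y1) * fR1 + (y - y1) / (y2 - y1) * fR2
```

For example, with corner values 0, 0, 1, 1 on a unit square, the centre interpolates to 0.5, which matches the expanded form of f(x, y).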
step 3.2, introducing a network in the network on the basis of the step 3.1, namely replacing each original convolutional layer structure with an MLP convolutional layer structure, specifically:
after each original convolutional layer, several convolutional layers with 1 × 1 convolution kernels are connected, and the feature map passed from each such layer to the next is calculated by formula (13):

f_{x,y,k_t} = f(w_{k_t} · a_{x,y} + b_{k_t})    (13)

in formula (13), (x, y) is the pixel index of the feature map, a_{x,y} is the input patch centred at (x, y), k_t is the channel index of the feature map, t is the index of the MLP layer, f is the activation function, w is the weight coefficient, and b is the bias;
the feature map of the current layer is then activated and output through the ReLU;
step 3.3, adding a dropout layer between the convolutional layers processed in step 3.2 and the deconvolution layers to prevent over-fitting of the network, forming a symmetric convolutional neural network;
the convolutional neural network is:
the first layer is an input layer, and the input size of the layer is: 224 × 224 × 12;
the second layer is a convolutional layer, and the input size of the layer is: 112 × 112 × 8;
the third layer is a convolutional layer, and the input size of the layer is: 56 × 56 × 16;
the fourth layer is a convolutional layer, and the input size of the layer is: 28 × 28 × 32;
the fifth layer is a dropout layer, and the input size of the layer is: 28 × 28 × 64;
the sixth layer is a deconvolution layer, and the output size of the layer is: 56 × 56 × 32;
the seventh layer is a deconvolution layer, and the output size of the layer is: 112 × 112 × 16;
the eighth layer is a deconvolution layer, and the output size of the layer is: 224 × 224 × 8;
the ninth layer is an output layer, and the output size of the layer is: 224 × 224 × 1;
each convolution layer has three convolution sublayers, and a 1 × 1 convolution kernel is arranged between every two 3 × 3 convolution kernels;
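The symmetric halving and restoring of spatial size in the layer sizes above (224 → 112 → 56 → 28, then back to 224) can be traced with a toy numpy sketch. This is not the patent's network: channel dimensions are omitted, and nearest-neighbour upsampling stands in for the bilinear-filled deconvolution of step 3.1.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (H and W must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbour 2x upsampling, a stand-in for the
    interpolation-filled deconvolution layers."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(224 * 224, dtype=float).reshape(224, 224)
for _ in range(3):            # three halvings: 224 -> 112 -> 56 -> 28
    x = max_pool_2x2(x)
assert x.shape == (28, 28)
for _ in range(3):            # three doublings: 28 -> 56 -> 112 -> 224
    x = upsample_2x(x)
assert x.shape == (224, 224)
```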
step 4, transmitting the original image, the Gabor texture feature map extracted in the step 1 and the Canny edge feature map extracted in the step 2 into the convolutional neural network constructed in the step 3 together for training to generate a CNN character model;
step 4.1, combining the Gabor characteristics in 8 directions obtained in the step 1, the Canny edge characteristics obtained in the step 2 and the RGB three-channel characteristics of the original image into 12 characteristic channels;
step 4.2, resizing the images of the 12 feature channels obtained in step 4.1 to 224 × 224, then feeding the 12 feature channels into the convolutional neural network built in step 3, and training with the ground-truth label map of the original image as the teacher signal of the CNN to generate a CNN character model;
step 5, testing with the trained CNN character model to obtain a human body contour image;
and 6, recording the overlapping rate and the time consumption of the human body contour image through the testing process of the step 5, and evaluating the human body contour image.
2. The method for extracting the human body contour based on the deep learning as claimed in claim 1, wherein the step 1 is implemented according to the following steps:
step 1.1, obtaining a two-dimensional Gabor filter according to a formula (1):
Ψ_{u,v}(z) = (‖k_{u,v}‖²/σ²) exp(−‖k_{u,v}‖²‖z‖²/(2σ²)) [exp(i k_{u,v}·z) − exp(−σ²/2)]    (1)

in formula (1), Ψ_{u,v} is the two-dimensional Gabor filter; u and v are respectively the orientation and scale of the Gabor kernel, where u is 0, π/4, π/2, 3π/4, π, 5π/4, 6π/4 or 7π/4; k_{u,v} controls the width of the Gaussian window; z = (x, y) is the spatial position coordinate; σ = 2π is the ratio of the Gaussian window width to the wavelength; and i is the imaginary unit;
wherein the orientation and wavelength of the oscillating part, k_{u,v}, is:

k_{u,v} = k_v e^{i φ_u}    (2)

in formula (2), the sampling frequency of the filter is k_v = k_max/F^v, k_max = π/2 is the maximum sampling frequency, F is a spacing factor used to limit the filter sampling frequency, and φ_u is the directional selectivity of the filter;
step 1.2, carrying out a convolution operation between the original image I(x, y) and the two-dimensional Gabor filter obtained in step 1.1 to extract the Gabor feature G_{u,v}(x, y) of the original image at position (x, y), obtaining Gabor texture features in 8 directions:

G_{u,v}(x, y) = I(x, y) * Ψ_{u,v}    (3).
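A Gabor kernel of this form can be sketched in numpy. The kernel size of 31, the spacing factor F = √2, and treating u directly as the orientation angle are illustrative assumptions; σ = 2π and k_max = π/2 follow the values stated in the claim.

```python
import numpy as np

def gabor_kernel(u, v=0, size=31, sigma=2 * np.pi,
                 k_max=np.pi / 2, F=np.sqrt(2)):
    """Two-dimensional Gabor kernel Psi_{u,v}: a Gaussian envelope times
    a complex oscillation, minus the DC-compensation term exp(-sigma^2/2).
    u is taken as the orientation angle, v as the scale index."""
    k_v = k_max / F ** v                       # sampling frequency k_v = k_max / F^v
    kx, ky = k_v * np.cos(u), k_v * np.sin(u)  # components of k_{u,v}
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2, z2 = k_v ** 2, x ** 2 + y ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2))
    oscillation = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * oscillation

# One kernel per direction; convolving the image with each of the 8
# kernels would yield the 8-direction Gabor features of formula (3).
bank = [gabor_kernel(u) for u in np.arange(8) * np.pi / 4]
```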
3. the method for extracting the human body contour based on the deep learning as claimed in claim 1, wherein the step 2 is implemented according to the following steps:
step 2.1, setting weight parameters for RGB three channels to complete gray processing of the original image, wherein the RGB three channel parameter setting expression is as follows:
Gray=R*0.299+G*0.587+B*0.114
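The weighted grayscale conversion above is a one-liner in numpy (the function name `to_gray` is illustrative):

```python
import numpy as np

def to_gray(rgb):
    """Weighted grayscale conversion Gray = 0.299 R + 0.587 G + 0.114 B.
    rgb is an H x W x 3 array in channel order R, G, B."""
    return (0.299 * rgb[..., 0]
            + 0.587 * rgb[..., 1]
            + 0.114 * rgb[..., 2])
```

Since the three weights sum to 1, a pure white pixel (255, 255, 255) maps to gray level 255.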
step 2.2, processing the gray-scale image obtained in step 2.1 with the first derivative of a two-dimensional Gaussian function, whose expression is:

G(x, y) = (1/(2πδ²)) exp(−(x² + y²)/(2δ²))    (4)

in formula (4), δ is the smoothing parameter; the larger δ is, the stronger the smoothing effect;
and smoothing the 3 × 3 area of the original image by using a 3 × 3 Gaussian convolution kernel, wherein the Gaussian convolution kernel is as follows:
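The 3 × 3 smoothing step can be sketched as follows. The kernel coefficients shown are the commonly used 1/16-normalised Gaussian kernel, an assumption here since the patent's exact coefficients depend on δ; the weights sum to 1, so flat regions are left unchanged.

```python
import numpy as np

# Commonly used 3x3 Gaussian kernel (illustrative coefficients).
G = np.array([[1., 2., 1.],
              [2., 4., 2.],
              [1., 2., 1.]]) / 16.0

def smooth3x3(img):
    """'Same'-size convolution of a grayscale image with G,
    using edge replication at the borders."""
    padded = np.pad(img, 1, mode='edge')
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + 3, j:j + 3] * G).sum()
    return out
```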
step 2.3, searching for the positions of strongest gray-level change in the gray-scale image processed in step 2.2: the first derivatives Z_x and Z_y of the gray-scale image in the horizontal direction x and the vertical direction y are computed with the Sobel operator, giving the boundary gradient magnitude |Z| and direction β:

|Z| = sqrt(Z_x² + Z_y²),  β = arctan(Z_y / Z_x)

wherein the Sobel operators in the abscissa x and ordinate y directions are:

S_x =
| −1 0 1 |
| −2 0 2 |
| −1 0 1 |

S_y =
| −1 −2 −1 |
|  0  0  0 |
|  1  2  1 |
step 2.4, dividing the gradient directions obtained in step 2.3 into four sectors, each corresponding to one quadrant of the coordinate axes; then, for every point in each sector in turn, comparing its gradient magnitude |Z| with the two neighbouring points along its gradient direction β: if the point is larger than both neighbours it is kept, otherwise it is set to zero; this non-maximum suppression on the image processed in step 2.3 thins the edges and removes non-edge noise points;
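A minimal Python sketch of this non-maximum suppression step (the function name and the exact sector boundaries of 22.5° are illustrative; the row index is assumed to increase downwards, and border pixels are simply zeroed):

```python
import numpy as np

def non_max_suppress(mag, beta):
    """Keep a pixel only if its gradient magnitude is a local maximum
    along its gradient direction, quantised into four sectors."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    angle = np.rad2deg(beta) % 180.0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:        # gradient along x: compare left/right
                p, q = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:                    # diagonal sector
                p, q = mag[i - 1, j - 1], mag[i + 1, j + 1]
            elif a < 112.5:                   # gradient along y: compare up/down
                p, q = mag[i - 1, j], mag[i + 1, j]
            else:                             # other diagonal sector
                p, q = mag[i - 1, j + 1], mag[i + 1, j - 1]
            if mag[i, j] >= p and mag[i, j] >= q:
                out[i, j] = mag[i, j]
    return out
```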
step 2.5, setting the high threshold at the 70% point of the overall gray-level distribution of the image processed in step 2.4, and the low threshold at 1/2 of the high threshold; if the gray level of a point processed in step 2.4 is greater than the high threshold, its pixel value is set to 255; if it is less than the low threshold, its pixel value is set to 0; if it lies between the high and low thresholds, the 8 neighbouring pixel values are examined: if none of the 8 neighbouring pixel values equals 255, the pixel value of the point is set to 0, and if any of them equals 255, the pixel value of the point is set to 255; edge feature extraction is complete when all points have been processed.
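The double-threshold step above can be sketched as a single pass over the image (the function name `hysteresis` is illustrative; the claim's one-shot 8-neighbour check is used rather than the iterative edge tracking of some Canny implementations):

```python
import numpy as np

def hysteresis(img, high, low):
    """Strong pixels (> high) become 255; weak pixels (between low and
    high) are kept only if some 8-neighbour is strong; the rest are 0."""
    strong = img > high
    weak = (img >= low) & ~strong
    out = np.where(strong, 255, 0)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            if weak[i, j]:
                i0, i1 = max(i - 1, 0), min(i + 2, h)
                j0, j1 = max(j - 1, 0), min(j + 2, w)
                if strong[i0:i1, j0:j1].any():
                    out[i, j] = 255
    return out
```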
4. The method for extracting the human body contour based on the deep learning as claimed in claim 1, wherein the step 5 is implemented according to the following steps:
step 5.1, respectively calculating the Gabor texture features extracted in the step 1 and the Canny edge features extracted in the step 2 for the image to be detected;
step 5.2, inputting RGB three-channel characteristics, Gabor texture characteristics and Canny edge characteristics of the image to be detected into the CNN model trained in the step 4 to obtain a human body contour heat map;
step 5.3, performing opening and closing operation on the human body contour heat map obtained in the step 5.2, and performing smoothing treatment on the image through a Gaussian low-pass filter to obtain a human body mask;
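The opening and closing operations on the binary heat-map mask can be sketched with 3 × 3 binary erosion and dilation (a toy numpy version with zero padding; a real implementation would use a morphology library):

```python
import numpy as np

def dilate(b):
    """3x3 binary dilation (zero-padded borders)."""
    p = np.pad(b, 1)
    h, w = b.shape
    return np.array([[p[i:i + 3, j:j + 3].any() for j in range(w)]
                     for i in range(h)])

def erode(b):
    """3x3 binary erosion (zero padding shrinks the border; fine for a sketch)."""
    p = np.pad(b, 1)
    h, w = b.shape
    return np.array([[p[i:i + 3, j:j + 3].all() for j in range(w)]
                     for i in range(h)])

def open_close(b):
    """Opening (erode then dilate) removes speckle noise; closing
    (dilate then erode) fills small holes in the mask."""
    opened = dilate(erode(b))
    return erode(dilate(opened))
```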
step 5.4, performing an AND operation between the original image and the human body mask obtained in step 5.3, i.e. each pixel of the original image is combined with the corresponding pixel of the generated human body mask: if the mask pixel is 0, the corresponding position in the original image is set to 0; if the mask pixel is 1, the original pixel value at that position is kept; the result is the human body contour image.
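The per-pixel AND with a {0, 1} mask reduces to a broadcast multiplication in numpy (the function name `apply_mask` is illustrative):

```python
import numpy as np

def apply_mask(original, mask):
    """Keep the original pixel where the binary mask is 1, zero it
    where the mask is 0 (the per-pixel AND of step 5.4)."""
    return original * mask[..., None]   # broadcast mask over the RGB channels
```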
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810582283.3A CN109033945B (en) | 2018-06-07 | 2018-06-07 | Human body contour extraction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109033945A CN109033945A (en) | 2018-12-18 |
CN109033945B true CN109033945B (en) | 2021-04-06 |
Family
ID=64612339
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810582283.3A Expired - Fee Related CN109033945B (en) | 2018-06-07 | 2018-06-07 | Human body contour extraction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109033945B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109872326B (en) * | 2019-01-25 | 2022-04-05 | 广西科技大学 | Contour detection method based on deep reinforced network jump connection |
CN109903301B (en) * | 2019-01-28 | 2021-04-13 | 杭州电子科技大学 | Image contour detection method based on multistage characteristic channel optimization coding |
CN109920049B (en) * | 2019-02-26 | 2021-05-04 | 清华大学 | Edge information assisted fine three-dimensional face reconstruction method and system |
CN113570052B (en) * | 2020-04-28 | 2023-10-31 | 北京达佳互联信息技术有限公司 | Image processing method, device, electronic equipment and storage medium |
CN112102141B (en) * | 2020-09-24 | 2022-04-08 | 腾讯科技(深圳)有限公司 | Watermark detection method, watermark detection device, storage medium and electronic equipment |
CN112258440B (en) * | 2020-10-29 | 2024-01-02 | 北京达佳互联信息技术有限公司 | Image processing method, device, electronic equipment and storage medium |
CN112257729B (en) * | 2020-11-13 | 2023-10-03 | 腾讯科技(深圳)有限公司 | Image recognition method, device, equipment and storage medium |
CN113128614B (en) * | 2021-04-29 | 2023-06-16 | 西安微电子技术研究所 | Convolution method based on image gradient, neural network based on direction convolution and classification method |
CN113536968B (en) * | 2021-06-25 | 2022-08-16 | 天津中科智能识别产业技术研究院有限公司 | Method for automatically acquiring boundary coordinates of inner and outer circles of iris |
CN113570627B (en) * | 2021-07-02 | 2024-04-16 | 上海健康医学院 | Training method of deep learning segmentation network and medical image segmentation method |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105678792A (en) * | 2016-02-25 | 2016-06-15 | 中南大学 | Method and system for extracting body profile |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4699298B2 (en) * | 2006-06-28 | 2011-06-08 | 富士フイルム株式会社 | Human body region extraction method, apparatus, and program |
CN103093474B (en) * | 2013-01-28 | 2015-03-25 | 电子科技大学 | Three-dimensional mammary gland ultrasound image partition method based on homoplasmon and partial energy |
CN105335716B (en) * | 2015-10-29 | 2019-03-26 | 北京工业大学 | A kind of pedestrian detection method extracting union feature based on improvement UDN |
CN106529447B (en) * | 2016-11-03 | 2020-01-21 | 河北工业大学 | Method for identifying face of thumbnail |
CN106778481A (en) * | 2016-11-15 | 2017-05-31 | 上海百芝龙网络科技有限公司 | A kind of body heath's monitoring method |
CN106781282A (en) * | 2016-12-29 | 2017-05-31 | 天津中科智能识别产业技术研究院有限公司 | A kind of intelligent travelling crane driver fatigue early warning system |
CN107301408B (en) * | 2017-07-17 | 2020-06-23 | 成都通甲优博科技有限责任公司 | Human body mask extraction method and device |
CN108009472B (en) * | 2017-10-25 | 2020-07-21 | 五邑大学 | Finger back joint print recognition method based on convolutional neural network and Bayes classifier |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20210406 |