CN110263806B - Skin image actual area estimation method based on deep learning - Google Patents

Skin image actual area estimation method based on deep learning

Info

Publication number
CN110263806B
CN110263806B
Authority
CN
China
Prior art keywords
convolution
layer
feature map
map
skin
Prior art date
Legal status
Active
Application number
CN201910385989.5A
Other languages
Chinese (zh)
Other versions
CN110263806A (en)
Inventor
李东
彭国豪
王颖
庄洪林
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910385989.5A
Publication of CN110263806A
Application granted
Publication of CN110263806B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/14 - Vascular patterns
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a skin image actual area estimation method based on deep learning, which comprises the following steps: S1, preprocessing a shot skin image to obtain its pore density map, and using the pore density map as the label of the skin image; S2, constructing a PDiNet convolutional neural network model, taking the shot skin image and the corresponding pore density map label as the input of the model, and training the model by stochastic gradient descent; and S3, generating a pore density map of the shot skin image with the trained model, integrating the density map to obtain the number of pores in the image, and obtaining the actual size of the shot skin according to a formula. The method uses a convolutional neural network to learn pore characteristics, has a simple structure and low complexity, and improves the accuracy of pore-count estimation.

Description

Skin image actual area estimation method based on deep learning
Technical Field
The invention relates to the technical field of deep learning, in particular to a skin image actual area estimation method based on deep learning.
Background
Current skin area measurement protocols include ruler measurements, checkerboard measurements, manual measurements by taking a picture, automated measurements, and the like.
The ruler measurement method measures the length and width of the skin with a ruler and then computes the skin area by multiplication. This solution requires contact with the skin and is subject to considerable error.
The checkerboard measurement method overlays a transparent checkerboard on the skin and determines the skin area by counting the number of squares the skin covers. This solution requires contact with the skin and is subject to considerable error.
The photograph-based manual measurement method takes a picture of the skin, manually segments the skin contour in image software, and calculates the area. Although this solution does not require contact with the skin, it is extremely labor intensive.
The automatic measurement method uses algorithms such as edge detection to segment the skin automatically and calculate the area. This scheme requires little manual work, but it cannot handle complex skin, such as skin with complex texture and impurities, and it is affected by the shooting environment (for example, ambient brightness), so the skin edge recognition success rate is low and the stability is poor.
In view of the above problems, a high-precision and high-stability skin image actual area estimation method based on deep learning is needed.
Disclosure of Invention
In order to overcome the defects, the invention provides a skin image actual area estimation method based on deep learning.
In order to solve the technical problems, the invention adopts the following technical scheme:
A skin image actual area estimation method based on deep learning comprises the following steps:
S1, preprocessing a shot skin image to generate a skin pore density map, and taking the pore density map as the label of the corresponding shot skin image;
S2, constructing a PDiNet convolutional neural network model, taking the shot skin image and the corresponding pore density map label as the input of the model, and training the model by stochastic gradient descent;
S3, testing the shot skin image with the trained model, and integrating the pore density map generated by the model to obtain the number of pores in the image.
In a preferred embodiment, the skin pore density map in S1 is generated as follows: for each shot skin image, the original image is converted into a pore density map by the following formula:

F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x)

where x represents a pore at a certain pixel position in the picture, x_i is the position of the i-th pore, N represents the number of all pores in the picture, i ∈ N, δ(x) is the Dirac delta function, and G(x) is a Gaussian kernel function.
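As an illustration only, the following Python sketch shows one common way to carry out this conversion: a unit impulse is placed at every annotated pore position and then smoothed with a Gaussian kernel. The helper name pores_to_density_map and the kernel width sigma are assumptions made for the sketch; the patent text does not fix these details.

import numpy as np
from scipy.ndimage import gaussian_filter

def pores_to_density_map(pore_coords, height, width, sigma=3.0):
    """Place a Dirac-style unit impulse at each annotated pore position and
    smooth with a Gaussian kernel, so the map integrates to the pore count."""
    impulses = np.zeros((height, width), dtype=np.float32)
    for row, col in pore_coords:                 # pixel coordinates of pores
        impulses[int(row), int(col)] += 1.0
    return gaussian_filter(impulses, sigma=sigma)

# Usage: summing the density map approximately recovers the number of pores.
# coords = [(120, 88), (131, 90)]
# density = pores_to_density_map(coords, 512, 512)
# print(density.sum())   # ≈ 2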
In a preferred embodiment, the convolutional neural network model includes a front-end network and a back-end network; the front-end network comprises ten convolutional layers and three pooling layers, and the back-end network comprises one convolutional layer and six dilated convolutional layers; wherein the front-end network comprises a convolutional layer C1, a convolutional layer C2, a convolutional layer C3, a convolutional layer C4, a convolutional layer C5, a convolutional layer C6, a convolutional layer C7, a convolutional layer C8, a convolutional layer C9, a convolutional layer C10, a pooling layer P1, a pooling layer P2 and a pooling layer P3; the back-end network comprises a convolutional layer C11, a dilated convolutional layer DC1, a dilated convolutional layer DC2, a dilated convolutional layer DC3, a dilated convolutional layer DC4, a dilated convolutional layer DC5 and a dilated convolutional layer DC6.
In a preferred embodiment, the loss function is formulated as follows:
L(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F(X_i;\theta) - F_i^{GT} \right\|_2^2

where N represents the number of all pores in the picture, i ∈ N, F(X_i; θ) represents the pore density map predicted by the model, θ is a parameter of the model, and F_i^{GT} represents the reference (ground-truth) pore density map.
In a preferred embodiment, the step S3 includes the following steps:
S31, directly inputting the shot skin pore picture into the trained convolutional neural network model to obtain the corresponding skin pore density map, and computing the total number of pores in the skin pore density map by the following formula:

N = \sum_{l=1}^{L} \sum_{w=1}^{W} Z_{l,w}

where N represents the total number of pores, L and W represent the length and width of the pore density map, and Z_{l,w} represents the value at coordinate (l, w) in the pore density map;
S32, computing the actual area of the skin shot in the image according to the following formula:

S = N / \rho

where S denotes the actual area of the shot skin, N denotes the total number of pores, and ρ denotes the average density of skin pores.
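As a brief illustration with purely hypothetical numbers (the invention does not prescribe specific values): if integrating the predicted density map yields N = 12000 pores and the assumed average pore density is ρ = 400 pores/cm², then S = N/ρ = 12000/400 = 30 cm².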
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the method uses the convolutional neural network to learn the pore characteristics, has simple structure and low complexity, and improves the accuracy of pore counting estimation.
Drawings
Fig. 1 is a flowchart of a skin image actual area estimation method based on deep learning according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the PDiNet convolutional neural network.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and are used for illustration only, and should not be construed as limiting the patent. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
As shown in Fig. 1, a skin image actual area estimation method based on deep learning includes the following steps:
S1, shooting a skin image, preprocessing the shot skin image to generate a skin pore density map, and taking the pore density map as the label of the corresponding shot skin image;
To convert the original skin pore image into a density map, a Gaussian kernel is chosen to perform the conversion. A pore at position x_i in the skin pore image can be represented by δ(x − x_i); convolving this impulse with a Gaussian function and accumulating the pores at all positions yields the skin pore density map. The specific formula is as follows:

F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x)

where x represents a pore at a certain pixel position of the picture, N represents the number of all pores in the picture, i ∈ N, δ(x) is the Dirac delta function, and G(x) is a Gaussian kernel function;
S2, constructing a PDiNet convolutional neural network model; as shown in FIG. 2, the PDiNet convolutional neural network model comprises a front-end network and a back-end network; the front-end network comprises ten convolutional layers and three pooling layers, and the back-end network comprises one convolutional layer and six dilated convolutional layers; wherein the front-end network comprises a convolutional layer C1, a convolutional layer C2, a convolutional layer C3, a convolutional layer C4, a convolutional layer C5, a convolutional layer C6, a convolutional layer C7, a convolutional layer C8, a convolutional layer C9, a convolutional layer C10, a pooling layer P1, a pooling layer P2 and a pooling layer P3; the back-end network comprises a convolutional layer C11, a dilated convolutional layer DC1, a dilated convolutional layer DC2, a dilated convolutional layer DC3, a dilated convolutional layer DC4, a dilated convolutional layer DC5 and a dilated convolutional layer DC6;
The shot skin image and the label obtained in step S1, namely the skin pore density map, are input into the model, which finally outputs a feature map one eighth the size of the original image. The specific flow is as follows:
S21, skin images of size 8W×8H×1 are input into the C1 convolution layer of the PDiNet model. The skin image is convolved by 32 filters of size 3×3 with a stride of 1 pixel and a ReLU activation function, and a C1 convolution layer feature map of 8W×8H×32 is output, where 8W is the width of the feature map, 8H is the height of the feature map, and 32 is the number of filters. C1 and C2 have the same convolution parameters, so, following the same operation flow as the C1 convolution layer, a C2 convolution layer feature map of 8W×8H×32 is output after the C2 convolution layer.
S22, the C2 convolution layer feature map of 8W×8H×32 obtained in step S21 is input into the P1 pooling layer. The C2 convolution feature map is divided into 2×2 sub-regions, the maximum value of each region is taken, and a P1 down-sampled feature map of 4W×4H×32 is output.
S23, the P1 down-sampled feature map of 4W×4H×32 obtained in step S22 is input into the C3 convolution layer. The P1 down-sampled feature map is convolved by 64 filters of size 3×3 with a stride of 1 pixel and a ReLU activation function, and a C3 convolution feature map of 4W×4H×64 is output. C4 has exactly the same convolution parameters as C3, so, following the same operation flow as the C3 convolution layer, a C4 convolution layer feature map of 4W×4H×64 is output after the C4 convolution layer.
S24, the C4 convolution layer feature map of 4W×4H×64 obtained in step S23 is input into the P2 pooling layer. The C4 convolution feature map is divided into 2×2 sub-regions, the maximum value of each region is taken, and a P2 down-sampled feature map of 2W×2H×64 is output.
S25, the P2 down-sampled feature map of 2W×2H×64 obtained in step S24 is input into the C5 convolution layer. The P2 down-sampled feature map is convolved by 128 filters of size 3×3 with a stride of 1 pixel and a ReLU activation function, and a C5 convolution feature map of 2W×2H×128 is output. C6 and C7 have exactly the same convolution parameters as C5, so, following the same operation flow as the C5 convolution layer, the feature map of 2W×2H×128 passes through the C6 and C7 convolution layers and a C7 convolution layer feature map of 2W×2H×128 is output.
S26, the C7 convolution layer feature map of 2W×2H×128 obtained in step S25 is input into the P3 pooling layer. The C7 convolution feature map is divided into 2×2 sub-regions, the maximum value of each region is taken, and a P3 down-sampled feature map of W×H×128 is output.
S27, the P3 down-sampled feature map of W×H×128 obtained in step S26 is input into the C8 convolution layer. The P3 down-sampled feature map is convolved by 256 filters of size 3×3 with a stride of 1 pixel and a ReLU activation function, and a C8 convolution feature map of W×H×256 is output. C9 and C10 have exactly the same convolution parameters as C8, so, following the same operation flow as the C8 convolution layer, the feature map of W×H×256 passes through the C9 and C10 convolution layers and a C10 convolution layer feature map of W×H×256 is output.
S28, the C10 convolution layer feature map of W×H×256 obtained in step S27 is input into the DC1 dilated convolution layer. Using 256 dilated filters of size 3×3 with the dilation rate set to 2, the C10 convolution feature map is convolved with a ReLU activation function, and a DC1 dilated convolution feature map of W×H×256 is output. DC2 and DC3 have exactly the same convolution parameters as DC1, so, following the same operation flow as the DC1 dilated convolution layer, the feature map of W×H×256 passes through the DC2 and DC3 dilated convolution layers and a DC3 dilated convolution feature map of W×H×256 is output.
S29, the DC3 dilated convolution feature map of W×H×256 obtained in step S28 is input into the DC4 dilated convolution layer. Using 128 dilated filters of size 3×3 with the dilation rate set to 2, the DC3 dilated convolution feature map is convolved with a ReLU activation function, and a DC4 dilated convolution feature map of W×H×128 is output;
S210, the DC4 dilated convolution feature map of W×H×128 obtained in step S29 is input into the DC5 dilated convolution layer. Using 64 dilated filters of size 3×3 with the dilation rate set to 2, the DC4 dilated convolution feature map is convolved with a ReLU activation function, and a DC5 dilated convolution feature map of W×H×64 is output;
S211, the DC5 dilated convolution feature map of W×H×64 obtained in step S210 is input into the DC6 dilated convolution layer. Using 32 dilated filters of size 3×3 with the dilation rate set to 2, the DC5 dilated convolution feature map is convolved with a ReLU activation function, and a DC6 dilated convolution feature map of W×H×32 is output;
S212, the DC6 dilated convolution feature map of W×H×32 obtained in step S211 is input into the C11 convolution layer. The DC6 dilated convolution feature map is convolved by one filter of size 1×1 with a stride of 1 pixel and a ReLU activation function, and a C11 convolution feature map of W×H×1 is output;
S213, a bilinear interpolation operation is applied to the C11 convolution feature map of W×H×1 obtained in step S212 to enlarge the feature map to 8 times its original size, giving a feature map of 8W×8H×1. After the feature map of 8W×8H×1 is obtained, a loss function is constructed and optimized by stochastic gradient descent, so that the parameters of the PDiNet model are adjusted to the optimal state. Specifically, the loss function is formulated as follows:
L(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F(X_i;\theta) - F_i^{GT} \right\|_2^2

where N represents the number of all pores in the picture, i ∈ N, F(X_i; θ) represents the pore density map predicted by the model, θ is a parameter of the model, and F_i^{GT} represents the reference (ground-truth) pore density map. The significance of this loss function is to compute the Euclidean distance between the pore density map predicted by the model and its reference.
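For illustration only, the layer sequence described in steps S21 to S213 (front end C1 to C10 with pooling layers P1 to P3, back end DC1 to DC6 with dilation rate 2, the 1×1 convolution C11, and 8× bilinear upsampling) can be arranged roughly as sketched below as a PyTorch module, together with a Euclidean-style loss. The padding choices, the loss normalization, the use of torch.optim.SGD, and all names are assumptions made so the sketch runs; they are not taken from the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv(in_ch, out_ch, dilation=1):
    # 3x3 convolution, stride 1, padding chosen so the spatial size is kept, then ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1,
                  padding=dilation, dilation=dilation),
        nn.ReLU(inplace=True),
    )

class PDiNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.front = nn.Sequential(                                          # front-end network
            conv(1, 32), conv(32, 32), nn.MaxPool2d(2),                      # C1, C2, P1
            conv(32, 64), conv(64, 64), nn.MaxPool2d(2),                     # C3, C4, P2
            conv(64, 128), conv(128, 128), conv(128, 128), nn.MaxPool2d(2),  # C5-C7, P3
            conv(128, 256), conv(256, 256), conv(256, 256),                  # C8-C10
        )
        self.back = nn.Sequential(                                           # back-end network
            conv(256, 256, 2), conv(256, 256, 2), conv(256, 256, 2),         # DC1-DC3
            conv(256, 128, 2), conv(128, 64, 2), conv(64, 32, 2),            # DC4-DC6
            nn.Conv2d(32, 1, kernel_size=1), nn.ReLU(inplace=True),          # C11
        )

    def forward(self, x):                  # x: (B, 1, 8H, 8W) grayscale skin image
        y = self.back(self.front(x))       # (B, 1, H, W) density map at 1/8 scale
        return F.interpolate(y, scale_factor=8, mode="bilinear",
                             align_corners=False)                            # back to (B, 1, 8H, 8W)

def density_loss(pred, target):
    # Euclidean-style loss between predicted and ground-truth density maps,
    # to be minimized with stochastic gradient descent (e.g. torch.optim.SGD).
    return 0.5 * ((pred - target) ** 2).flatten(1).sum(dim=1).mean()

Training would then follow the usual pattern: forward a batch of shot skin images, compare the upsampled output with the label density maps produced in S1 using density_loss, and update the parameters with SGD.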
S3, after the PDiNet model is optimized, further calculation is needed in order to obtain the number of skin pores and the actual size of the skin. The shot skin pore picture is directly input into the trained PDiNet model to obtain the skin pore density map corresponding to the picture, and then, according to the formula:

N = \sum_{l=1}^{L} \sum_{w=1}^{W} Z_{l,w}

where N represents the total number of pores, L and W represent the length and width of the pore density map, and Z_{l,w} represents the value at coordinate (l, w) in the pore density map, the total number of pores is obtained. Then, according to the formula:

S = N / \rho

where S denotes the actual area of the shot skin, N denotes the total number of pores, and ρ denotes the average density of skin pores, the actual skin area is obtained.
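As a minimal sketch of this final step (assuming a trained model such as the one sketched above): the predicted density map is summed to obtain the pore count N, which is then divided by an assumed average pore density rho to estimate the photographed area. The concrete value and unit of rho are placeholders, since the patent only states that an average skin-pore density is used.

import torch

@torch.no_grad()
def estimate_skin_area(model, image, rho):
    """image: tensor of shape (1, 1, 8H, 8W); rho: assumed average pores per unit skin area."""
    density = model(image)           # predicted pore density map
    n_pores = density.sum().item()   # N = sum of Z_{l,w} over the whole map
    return n_pores / rho             # S = N / rho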
It should be understood that the above-described embodiments of the present invention are merely examples provided to illustrate the present invention clearly, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (5)

1. A skin image actual area estimation method based on deep learning is characterized by comprising the following steps:
S1, shooting a skin image, preprocessing the shot skin image to generate a skin pore density map, and taking the pore density map as the label of the corresponding shot skin image;
S2, constructing a PDiNet convolutional neural network model, taking the shot skin image and the corresponding pore density map label as the input of the model, and training the model by stochastic gradient descent; the PDiNet convolutional neural network model comprises a front-end network and a back-end network; the front-end network comprises ten convolutional layers and three pooling layers, and the back-end network comprises one convolutional layer and six dilated convolutional layers; wherein the front-end network comprises a convolutional layer C1, a convolutional layer C2, a convolutional layer C3, a convolutional layer C4, a convolutional layer C5, a convolutional layer C6, a convolutional layer C7, a convolutional layer C8, a convolutional layer C9, a convolutional layer C10, a pooling layer P1, a pooling layer P2 and a pooling layer P3; the back-end network comprises a convolutional layer C11, a dilated convolutional layer DC1, a dilated convolutional layer DC2, a dilated convolutional layer DC3, a dilated convolutional layer DC4, a dilated convolutional layer DC5 and a dilated convolutional layer DC6; the method specifically comprises the following steps:
S21, inputting a skin image with a size of 8W×8H×1 into the C1 convolution layer of the PDiNet model, performing a convolution operation on the skin image through 32 filters of 3×3 with a step size of 1 pixel and a ReLU activation function, and outputting a C1 convolution layer feature map of 8W×8H×32, wherein 8W is the width of the feature map, 8H is the height of the feature map, and 32 is the number of filters; C1 and C2 have the same convolution parameters, and after the feature map of 8W×8H×32 passes through the C2 convolution layer, a C2 convolution layer feature map of 8W×8H×32 is output;
S22, inputting the C2 convolution layer feature map of 8W×8H×32 obtained in step S21 into the P1 pooling layer, dividing the C2 convolution feature map into 2×2 sub-regions, taking the maximum value of each region, and outputting a P1 down-sampled feature map of 4W×4H×32;
S23, inputting the P1 down-sampled feature map of 4W×4H×32 obtained in step S22 into the C3 convolution layer, performing a convolution operation on the P1 down-sampled feature map through 64 filters of 3×3 with a step size of 1 pixel and a ReLU activation function, and outputting a C3 convolution feature map of 4W×4H×64; C4 has exactly the same convolution parameters as C3, and after the feature map of 4W×4H×64 passes through the C4 convolution layer, a C4 convolution layer feature map of 4W×4H×64 is output;
S24, inputting the C4 convolution layer feature map of 4W×4H×64 obtained in step S23 into the P2 pooling layer, dividing the C4 convolution feature map into 2×2 sub-regions, taking the maximum value of each region, and outputting a P2 down-sampled feature map of 2W×2H×64;
S25, inputting the P2 down-sampled feature map of 2W×2H×64 obtained in step S24 into the C5 convolution layer, performing a convolution operation on the P2 down-sampled feature map through 128 filters of 3×3 with a step size of 1 pixel and a ReLU activation function, and outputting a C5 convolution feature map of 2W×2H×128; C6 and C7 have exactly the same convolution parameters as C5, and after the feature map of 2W×2H×128 passes through the C6 and C7 convolution layers, a C7 convolution layer feature map of 2W×2H×128 is output;
S26, inputting the C7 convolution layer feature map of 2W×2H×128 obtained in step S25 into the P3 pooling layer, dividing the C7 convolution feature map into 2×2 sub-regions, taking the maximum value of each region, and outputting a P3 down-sampled feature map of W×H×128;
S27, inputting the P3 down-sampled feature map of W×H×128 obtained in step S26 into the C8 convolution layer, performing a convolution operation on the P3 down-sampled feature map through 256 filters of 3×3 with a step size of 1 pixel and a ReLU activation function, and outputting a C8 convolution feature map of W×H×256; C9 and C10 have exactly the same convolution parameters as C8, and after the feature map of W×H×256 passes through the C9 and C10 convolution layers, a C10 convolution feature map of W×H×256 is output;
S28, inputting the C10 convolution feature map of W×H×256 obtained in step S27 into the DC1 dilated convolution layer, performing a convolution operation on the C10 convolution feature map through 256 dilated filters of 3×3 with the dilation rate set to 2 and a ReLU activation function, and outputting a DC1 dilated convolution feature map of W×H×256; DC2 and DC3 have exactly the same convolution parameters as DC1, and after the feature map of W×H×256 passes through the DC2 and DC3 dilated convolution layers, a DC3 dilated convolution feature map of W×H×256 is output;
S29, inputting the DC3 dilated convolution feature map of W×H×256 obtained in step S28 into the DC4 dilated convolution layer, performing a convolution operation on the DC3 dilated convolution feature map through 128 dilated filters of 3×3 with the dilation rate set to 2 and a ReLU activation function, and outputting a DC4 dilated convolution feature map of W×H×128;
S210, inputting the DC4 dilated convolution feature map of W×H×128 obtained in step S29 into the DC5 dilated convolution layer, performing a convolution operation on the DC4 dilated convolution feature map through 64 dilated filters of 3×3 with the dilation rate set to 2 and a ReLU activation function, and outputting a DC5 dilated convolution feature map of W×H×64;
S211, inputting the DC5 dilated convolution feature map of W×H×64 obtained in step S210 into the DC6 dilated convolution layer, performing a convolution operation on the DC5 dilated convolution feature map through 32 dilated filters of 3×3 with the dilation rate set to 2 and a ReLU activation function, and outputting a DC6 dilated convolution feature map of W×H×32;
S212, inputting the DC6 dilated convolution feature map of W×H×32 obtained in step S211 into the C11 convolution layer, performing a convolution operation on the DC6 dilated convolution feature map through one filter of 1×1 with a step size of 1 pixel and a ReLU activation function, and outputting a C11 convolution feature map of W×H×1;
S213, performing a bilinear interpolation operation on the C11 convolution feature map of W×H×1 obtained in step S212 to enlarge the feature map to 8 times its original size, namely obtaining a feature map of 8W×8H×1, constructing a loss function, and optimizing the loss function by stochastic gradient descent to adjust the parameters of the PDiNet model to an optimal state;
S3, testing the shot skin image by using the trained model, integrating the pore density map generated by the model to obtain the number of pores on the image, and calculating the actual area of the skin image according to the number of pores.
2. The method for estimating the actual area of the skin image based on deep learning of claim 1, wherein the skin pore density map in S1 is generated as follows: for each captured skin image, the original image is converted into a pore density map by the following formula:

F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G(x)

where x denotes a pore at a certain pixel position in the picture, x_i is the position of the i-th pore in the picture, N represents the number of all pores in the picture, i ∈ N, δ(x) is the Dirac delta function, and G(x) is a Gaussian kernel function.
3. The method for estimating the actual area of the skin image based on deep learning of claim 1, wherein the feature map of the C11 convolution layer is 1/8 the size of the input image; the output feature map is enlarged eight times by bilinear interpolation to obtain a feature map of 8W×8H×1, and the parameters of the PDiNet model are adjusted to an optimal state by constructing a loss function and optimizing it by stochastic gradient descent.
4. The method for estimating the actual area of the skin image based on deep learning of claim 1, wherein the loss function is expressed as follows:

L(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F(X_i;\theta) - F_i^{GT} \right\|_2^2

where N represents the number of all pores in the picture, i ∈ N, F(X_i; θ) represents the pore density map predicted by the model, θ is a parameter of the model, and F_i^{GT} represents the reference (ground-truth) pore density map.
5. The method for estimating the actual area of the skin image based on deep learning of claim 3, wherein the step S3 comprises the following steps:
S31, directly inputting the shot skin pore picture into the trained convolutional neural network model to obtain the corresponding skin pore density map, and computing the total number of pores in the skin pore density map by the following formula:

N = \sum_{l=1}^{L} \sum_{k=1}^{K} Z_{l,k}

where N represents the total number of pores, L and K represent the length and width, respectively, of the pore density map, and Z_{l,k} represents the value at coordinate (l, k) in the pore density map;
S32, computing the actual area of the skin shot in the image according to the following formula:

S = N / \rho

where S denotes the actual area of the shot skin, N denotes the total number of pores, and ρ denotes the average density of skin pores.
CN201910385989.5A 2019-05-09 2019-05-09 Skin image actual area estimation method based on deep learning Active CN110263806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910385989.5A CN110263806B (en) 2019-05-09 2019-05-09 Skin image actual area estimation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910385989.5A CN110263806B (en) 2019-05-09 2019-05-09 Skin image actual area estimation method based on deep learning

Publications (2)

Publication Number Publication Date
CN110263806A CN110263806A (en) 2019-09-20
CN110263806B true CN110263806B (en) 2023-04-18

Family

ID=67914581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910385989.5A Active CN110263806B (en) 2019-05-09 2019-05-09 Skin image actual area estimation method based on deep learning

Country Status (1)

Country Link
CN (1) CN110263806B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700398A (en) * 2019-10-22 2021-04-23 华为技术有限公司 Face skin detection method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3919722B2 (en) * 2003-09-11 2007-05-30 花王株式会社 Skin shape measuring method and skin shape measuring apparatus
US8463006B2 (en) * 2007-04-17 2013-06-11 Francine J. Prokoski System and method for using three dimensional infrared imaging to provide detailed anatomical structure maps
AU2010214017B2 (en) * 2009-01-20 2015-05-07 Myskin, Inc. Skin analysis methods
CN105069818A (en) * 2015-09-02 2015-11-18 泰山学院 Image-analysis-based skin pore identification method
WO2018182173A1 (en) * 2017-03-27 2018-10-04 (주)아모레퍼시픽 Two-dimensional skin image analysis system and analysis method therefor
CN107403166B (en) * 2017-08-02 2021-01-26 广东工业大学 Method and device for extracting pore characteristics of face image
CN107679507B (en) * 2017-10-17 2019-12-24 北京大学第三医院 Facial pore detection system and method
CN109583330B (en) * 2018-11-15 2023-05-05 北京理工大学 Pore detection method for face photo

Also Published As

Publication number Publication date
CN110263806A (en) 2019-09-20

Similar Documents

Publication Publication Date Title
CN108304820B (en) Face detection method and device and terminal equipment
CN106909888B (en) Face key point tracking system and method applied to mobile equipment terminal
CN107507208B (en) Image feature point extraction method based on curvature estimation on contour
CN109284733B (en) Shopping guide negative behavior monitoring method based on yolo and multitask convolutional neural network
CN110659565B (en) 3D multi-person human body posture estimation method based on porous convolution
CN109584271A (en) High speed correlation filtering tracking based on high confidence level more new strategy
CN110246148B (en) Multi-modal significance detection method for depth information fusion and attention learning
CN110751212B (en) Efficient fine-grained image recognition method on mobile equipment
CN106296674B (en) A kind of automatic reading method of the ellipse without grid oil level indicator
CN107871103B (en) Face authentication method and device
CN111768392A (en) Target detection method and device, electronic equipment and storage medium
CN103778436A (en) Pedestrian gesture inspecting method based on image processing
CN110263806B (en) Skin image actual area estimation method based on deep learning
KR20210088656A (en) Methods, devices, devices and media for image generation and neural network training
CN104809706A (en) Single lens computational imaging method based on gentle image color change priori
CN110838145A (en) Visual positioning and mapping method for indoor dynamic scene
CN103679740B (en) ROI (Region of Interest) extraction method of ground target of unmanned aerial vehicle
CN112241664A (en) Face recognition method, face recognition device, server and storage medium
CN114898111B (en) Pre-training model generation method and device, and target detection method and device
CN115587987A (en) Storage battery defect detection method and device, storage medium and electronic equipment
CN105096304A (en) Image characteristic estimation method and device
CN116630828B (en) Unmanned aerial vehicle remote sensing information acquisition system and method based on terrain environment adaptation
CN111563895A (en) Picture definition determining method, device, equipment and storage medium
CN113435367A (en) Social distance evaluation method and device and storage medium
CN109816710B (en) Parallax calculation method for binocular vision system with high precision and no smear

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant