CN116823852A - Strip-shaped skin scar image segmentation method and system based on convolutional neural network - Google Patents


Info

Publication number
CN116823852A
Authority
CN
China
Prior art keywords
module
shaped
strip
decoding
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310682091.0A
Other languages
Chinese (zh)
Other versions
CN116823852B (en)
Inventor
石霏
周健
夏文涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202310682091.0A
Publication of CN116823852A
Application granted
Publication of CN116823852B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to a strip-shaped skin scar image segmentation method and system based on a convolutional neural network. The method comprises the following steps: acquiring an image with a strip-shaped skin scar; inputting the image into a U-shaped encoding and decoding network, and realizing image segmentation of the strip-shaped skin scar through the U-shaped encoding and decoding network. The U-shaped encoding and decoding network comprises an SADC module and a GGMC module: the SADC module is used for making the U-shaped encoding and decoding network focus on the strip-shaped characteristics of skin scars, and the GGMC module is used for increasing the perception capability of the U-shaped encoding and decoding network for skin scars of different scales and different lengths. The SADC module designed by the application makes the network pay more attention to the strip-shaped characteristic of scars, and the GGMC module gives the network stronger perception of targets of different scales and different lengths.

Description

Strip-shaped skin scar image segmentation method and system based on convolutional neural network
Technical Field
The application relates to the technical field of image segmentation, in particular to a strip-shaped skin scar image segmentation method and system based on a convolutional neural network.
Background
A dermal scar forms when the dermal tissue is completely damaged and fibrous connective tissue develops during healing. Scar analysis may be used for injury identification, crime evidence collection, and skin repair assessment. With the rapid development of computer technology, scar analysis methods based on digital images have gradually emerged and are mainly used for measuring and quantitatively analyzing clinical scars. Segmenting scars in skin scar images remains a difficult problem, and at present most segmentation is done manually or with semi-automatic segmentation tools. Therefore, automatic segmentation of scars in skin scar images can greatly improve the efficiency of scar analysis and avoid the subjectivity of manual measurement.
Image segmentation assigns a class label to each pixel in an image. Traditional image segmentation methods segment the image using basic features such as color and texture. In recent years, image segmentation techniques based on deep learning have developed rapidly; among them, the fully convolutional network (Fully Convolutional Network, FCN), based on a convolutional neural network, was the first to be applied to image segmentation tasks. Since then, various convolutional-neural-network-based methods have emerged one after another.
In recent years, many image segmentation algorithms based on deep learning have appeared. The fully convolutional network FCN replaces the final fully connected layer of a traditional classification CNN with a convolution layer and recovers the feature size by up-sampling, so that the network can handle input images of any size and perform end-to-end segmentation. UNet is a variant of FCN designed for medical image segmentation: its encoder uses cascaded convolution blocks, and its decoder uses bilinear interpolation and skip connections to recover features, where the skip connections fuse low-level position information with deep semantic information through concatenation. PSPNet proposes a pyramid pooling module (Pyramid Pooling Module, PPM) that aggregates multi-scale features and fuses information between different scales and different subregions. The DeepLab series of networks use dilated convolution to enlarge the receptive field and build an atrous spatial pyramid pooling structure (Atrous Spatial Pyramid Pooling, ASPP) to extract multi-scale targets. CENet proposes a context extractor to generate higher-level semantic feature maps, containing a dense atrous convolution (Dense Atrous Convolution, DAC) block and a residual multi-kernel pooling (Residual Multi-kernel Pooling, RMP) block. CPFNet adds two pyramid modules to fuse global and multi-scale context information, namely a scale-aware pyramid fusion (Scale-Aware Pyramid Fusion, SAPF) module and a global pyramid guidance (Global Pyramid Guidance, GPG) module.
The disadvantages of the prior art are as follows:
In image segmentation tasks, encoder-decoder convolutional neural networks represented by UNet extract features through convolution, enlarge the receptive field through pooled down-sampling, and compensate for detail loss during feature recovery through skip connections. Although existing encoder-decoder networks represented by UNet achieve good segmentation results in many tasks, the following disadvantages remain:
(1) UNet uses ordinary convolution to extract features; at a given resolution it has only local perception, and its ability to extract global context is insufficient. In addition, UNet's ability to fuse features across levels is very limited, being implemented only by channel concatenation (concat) of the corresponding-level features.
(2) In the decoder part, the prior art typically uses only a simple cascade of convolution and up-sampling, where up-sampling usually employs bilinear interpolation or deconvolution. This decoding process is inadequate for recovering the spatial information of the features, because the semantic information of each level of features is not fully utilized.
(3) In terms of data, deep learning models are driven by large amounts of supervised data, so model performance depends on the amount and quality of that data. When the amount of data is insufficient, it is difficult to train a model with strong generalization ability. Data augmentation techniques such as random rotation, flipping, and contrast enhancement are typically used to expand the data, but they bring only limited improvement in model performance.
Disclosure of Invention
Therefore, the application aims to solve the technical problems in the prior art of insufficient perception of strip-shaped targets by the neural network and insufficient extraction of multi-scale features.
In order to solve the technical problems, the application provides a strip-shaped skin scar image segmentation method based on a convolutional neural network, which comprises the following steps:
acquiring an image with strip-shaped skin scars;
inputting the image into a U-shaped encoding and decoding network, and realizing image segmentation of the strip-shaped skin scar through the U-shaped encoding and decoding network;
the U-shaped encoding and decoding network comprises an SADC module and a GGMC module, wherein the SADC module is used for enabling the U-shaped encoding and decoding network to focus on strip-shaped characteristics of skin scars, and the GGMC module is used for increasing the perception capability of the U-shaped encoding and decoding network on the skin scars with different scales and different lengths.
In one embodiment of the present application, the U-shaped codec network includes an encoder for extracting features, a decoder for feature recovery of the features extracted by the encoder, and a header structure for image segmentation of the features recovered by the decoder;
the header structure comprises up-sampling, 3 x 3 convolution and 1 x 1 convolution connected in sequence;
the encoder adopts ResNet34, which comprises a convolution layer and a first, second, third and fourth residual coding module connected in sequence; each of the first, second, third and fourth residual coding modules comprises a 3×3 convolution, ReLU activation and a second 3×3 convolution connected in sequence, and the input features are added to the output of the second 3×3 convolution to obtain the output result;
the decoder comprises a third decoding module, a second decoding module, a first decoding module and a zeroth decoding module which are sequentially connected;
the characteristics output by the first residual error coding module are input to a first decoding module after passing through a CBR module;
the characteristics output by the second residual error coding module are input to a second decoding module after passing through a CBR module;
the characteristics output by the third residual error coding module are input to a third decoding module after passing through a CBR module;
the characteristics output by the fourth residual error coding module are input to a third decoding module after passing through a CBR module;
in the first decoding module, the second decoding module and the third decoding module, the features of the upper layer pass through the SADC module, and are added and fused with the features of the layer corresponding to the residual coding module after residual operation and up-sampling, so that feature reconstruction is realized;
in the zeroth decoding module, the last-level features pass through the SADC module, and output results are obtained after residual error operation and up-sampling;
the first decoding module and the second decoding module are added with the output of the zeroth decoding module after passing through the GGMC module, and then the added result is input into the head structure.
In one embodiment of the application, the CBR module comprises a 3×3 convolution, a batch normalization layer, and a ReLU activation function layer connected in sequence.
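As an illustration only, a minimal PyTorch sketch of such a CBR block might look as follows; the class and argument names are illustrative and are not taken from the patent:

```python
import torch.nn as nn

class CBR(nn.Module):
    """3x3 convolution -> batch normalization -> ReLU, as described for the CBR module."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```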
In one embodiment of the present application, the SADC module includes:
for an input feature map X ∈ R^(C×H×W), where C denotes the number of channels and H and W denote the height and width of the feature map, respectively: the feature map X is first processed by convolution, batch normalization and an activation layer; the processed feature map is subjected to horizontal pooling and vertical pooling to obtain two feature vectors X_v ∈ R^(C×H×1) and X_h ∈ R^(C×1×W); the feature vectors X_v and X_h are each passed through a 3×3 convolution, batch normalization and an up-sampling operation and then added, and the sum is passed through a convolution and batch normalization to obtain X_m ∈ R^(C×H×W); X_m is passed through a Sigmoid function and multiplied with the feature map X to obtain a global descriptor Z;
the global descriptor Z is input into a multi-layer perceptron formed by two fully connected layers, which outputs a kernel allocation matrix K = [k_1, k_2, ..., k_N], where N denotes the number of independent 3×3 convolution kernels and k_i denotes the weight of the i-th convolution kernel; the weights are constrained by Softmax so that Σ_i k_i = 1. Let the N independent convolution kernel parameters be θ = [θ_1, θ_2, ..., θ_N]; the kernel allocation matrix K and the convolution kernel parameters θ are fused and then applied to the feature map X to produce the final output O.
In one embodiment of the present application, the fusion of the kernel distribution matrix K and the convolution kernel parameter θ is applied to the feature map X, and the final output O is given by the formula:
O = DConv(X) = Conv(X, Σ_i k_i·θ_i)
where DConv denotes a dynamic convolution operation and Conv denotes a convolution operation.
In one embodiment of the present application, the GGMC module includes:
for an input feature map X ∈ R^(C×H×W), where C denotes the number of channels and H and W denote the height and width of the feature map, respectively: average pooling is used to scale the features in the spatial dimension, converting the feature map X into a feature X_s ∈ R^(C×h×w) with height h and width w; the feature X_s is then flattened along the spatial dimension, and the flattened feature is passed through a multi-layer perceptron to obtain a global descriptor G ∈ R^(C×1×1); the global descriptor G and X_s are fused channel by channel to obtain a feature map X_G ∈ R^(C×h×w) with global perception;
the feature map X_G is passed through depthwise convolution kernels of sizes 3×3, 5×5 and 7×7, respectively, to capture multi-scale local perception features; a 1×1 point-wise convolution is applied to the feature map X_G to capture channel perception features;
after the 3×3, 5×5 and 7×7 depthwise convolutions and the 1×1 point-wise convolution, batch normalization is applied to align the features; finally, the multi-scale local perception features and the channel perception features are added to obtain the final output.
In one embodiment of the present application, the method further comprises training the U-shaped codec network by constructing an image dataset, the method of constructing an image dataset comprising:
acquiring multi-view images of the same strip skin scar, performing three-dimensional reconstruction on the multi-view images by using an SFM algorithm to obtain a three-dimensional model of the strip skin scar and camera parameters of each view, and obtaining a rotation transformation matrix of each view image relative to the three-dimensional model according to the camera parameters;
interpolation is carried out between adjacent rotation transformation matrices, and the three-dimensional model is projected in three dimensions using the interpolation results, so as to generate a plurality of two-dimensional pseudo views.
In one embodiment of the present application, the rotation transformation matrix of each view image relative to the three-dimensional model is obtained from the camera parameters and is expressed as a quaternion:
q_i = [cos(θ_i/2), u_ix·sin(θ_i/2), u_iy·sin(θ_i/2), u_iz·sin(θ_i/2)], i = 1, 2, ..., N
where q_i denotes the rotation transformation of the i-th view, [u_ix, u_iy, u_iz] is a three-dimensional vector of unit length, u_i denotes the rotation axis of the i-th view defined by the unit vector, θ_i denotes the rotation angle of the i-th view, and N denotes the total number of views.
In one embodiment of the present application, the interpolation between adjacent rotation transformation matrices is performed by linear interpolation:
q_t = (1 − t)·q_i + t·q_(i+1)
where t ∈ (0, 1) denotes the interpolation ratio parameter, q_i and q_(i+1) denote the rotation transformations of two adjacent views, and q_t denotes the interpolation result.
In order to solve the technical problems, the application provides a strip-shaped skin scar image segmentation system based on a convolutional neural network, which comprises:
an acquisition module, for obtaining an image with a strip-shaped skin scar;
an image segmentation module, for inputting the image into a U-shaped encoding and decoding network and realizing image segmentation of the strip-shaped skin scar through the U-shaped encoding and decoding network;
the U-shaped encoding and decoding network comprises an SADC module and a GGMC module, wherein the SADC module is used for enabling the U-shaped encoding and decoding network to focus on strip-shaped characteristics of skin scars, and the GGMC module is used for increasing the perception capability of the U-shaped encoding and decoding network on the skin scars with different scales and different lengths.
Compared with the prior art, the technical scheme of the application has the following advantages:
the application creatively designs a dynamic convolution module based on strip attention (the SADC module), which uses strip pooling as its entry point to enhance the network's perception of strip-shaped targets, and at the same time endows the convolution with attention, making feature acquisition more flexible;
the application creatively designs a globally guided multi-core convolution module (GGMC module), which solves the problem of scale alignment existing in feature fusion from three angles of global, local and channel;
according to the application, a teacher model based on multi-view geometric projection data augmentation is adopted to enhance a training strategy, so that the identification capability of the network to scars at different positions and angles is improved under the condition of few data samples, and the generalization capability of the network is enhanced.
Drawings
In order that the application may be more readily understood, a more particular description of the application will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings.
FIG. 1 is an overall flow chart of an embodiment of the present application;
FIG. 2 is a schematic diagram of a U-type codec network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a SADC module according to an embodiment of the present application;
FIG. 4 is a schematic view of a GGMC module according to an embodiment of the application;
FIG. 5 is a flow chart of a teacher-student model enhanced training strategy based on multi-view geometric projection data augmentation in an embodiment of the present application;
fig. 6 is a graph comparing the segmentation results of striped skin scar images of the network of the present application with other segmentation networks.
Detailed Description
The present application will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the application and practice it.
Example 1
The application relates to a strip-shaped skin scar image segmentation method based on a convolutional neural network, which comprises the following steps:
acquiring an image with strip-shaped skin scars;
inputting the image into a U-shaped encoding and decoding network, and realizing image segmentation of the strip-shaped skin scar through the U-shaped encoding and decoding network;
the U-shaped codec network in this embodiment includes a dynamic convolution based on stripe attention (Spatial Attention Based Dynamic Convolution, SADC) module for focusing the U-shaped codec network on stripe features of the scar and a globally directed multi-core convolution module (Global Guided Multi-kernel Convolution, GGMC) module for increasing the perceptibility of the U-shaped codec network to scars of different sizes and different lengths.
The SADC module in this embodiment overcomes the drawbacks of the existing method that the characteristic acquisition capability is insufficient due to the insensitivity to the stripe target and the limitation of the attention mechanism, generates the stripe attention map by performing stripe pooling fusion in the horizontal and vertical directions on the characteristic map, and then dynamically distributes weights of a plurality of convolution kernels based on the stripe attention map, so that the network is more focused on the stripe characteristic of the scar. The GGMC module in the embodiment overcomes the defect that global and multi-scale semantic information cannot be captured during feature fusion in the existing method, so that the network has stronger perceptibility for targets with different scales and different lengths.
The present embodiment is described in detail below:
the embodiment is mainly used for accurately dividing the skin scar in the image. As shown in fig. 1, first, supervised training is performed using labeled data to obtain a pre-trained model. And then carrying out three-dimensional reconstruction on each scar sample in the labeled image to obtain a three-dimensional model of each scar sample. And interpolating and projecting two-dimensional pseudo views with more angles by using the three-dimensional model as unlabeled data. And for the unlabeled data, loading a pre-training model by adopting a teacher-student model strategy to carry out enhancement training, and finally using the trained student model for model test. The same network structure is adopted for the teacher model and the student model in the pre-training model, the teacher model and the student model.
(1) Network structure
The network structure of this embodiment is shown in fig. 2. The overall network is a U-shaped codec network comprising an encoder, a decoder, and a header structure (i.e., a segmentation head) for the segmentation task. Specifically, the encoder adopts ResNet34 as the feature extractor and comprises a convolution layer and four residual coding modules (the first, second, third and fourth residual coding modules) connected in sequence; each residual coding module comprises a 3×3 convolution, ReLU activation and a second 3×3 convolution connected in sequence, and the input features are added to the output of the second 3×3 convolution (the residual operation) to obtain the output result. The decoder comprises four decoding modules (the third, second, first and zeroth decoding modules) connected in sequence. The header structure comprises up-sampling, a 3×3 convolution and a 1×1 convolution connected in sequence. Each level of features output by the encoder is input to the decoder after passing through a CBR module consisting of a 3×3 convolution, batch normalization and a ReLU activation function. In the third, second and first decoding modules, the features from the previous level pass through the SADC module, undergo a residual operation and up-sampling, and are then added to and fused with the corresponding-level encoder features, realizing feature reconstruction. In the zeroth decoding module, the features from the previous level pass through the SADC module and undergo a residual operation and up-sampling to obtain the output result. The outputs of the first and second decoding modules are each passed through a GGMC module and added to the output of the zeroth decoding module, and the sum is input into the header structure. In short, the GGMC module is used for multi-level feature fusion in the decoder: the outputs of the two middle decoder levels pass through GGMC modules and are added to the output of the last decoder level.
In the U-shaped codec network of this embodiment, the first residual coding module corresponds to the first decoding module (128×128), the second residual coding module corresponds to the second decoding module (64×64), the third residual coding module corresponds to the third decoding module (32×32), and the correspondence here means that the length and the width of the output feature map of the corresponding module are identical.
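For illustration only, one decoder stage described above (SADC on the deeper feature, a residual operation, up-sampling, then additive fusion with the CBR-processed encoder skip feature) could be wired roughly as in the following PyTorch sketch; the internal layout of the residual branch, the interface of the SADC module and all names are assumptions, since the text does not fix them:

```python
import torch.nn as nn
import torch.nn.functional as F

class DecodeBlock(nn.Module):
    """One decoder stage: SADC on the deeper feature, a residual operation,
    2x up-sampling, then additive fusion with the encoder skip feature."""
    def __init__(self, channels, sadc_module):
        super().__init__()
        self.sadc = sadc_module               # assumed interface: (B, C, H, W) -> (B, C, H, W)
        self.res_branch = nn.Sequential(      # assumed layout of the "residual operation"
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, deep_feat, skip_feat):
        x = self.sadc(deep_feat)                        # stripe-attention dynamic convolution
        x = F.relu(x + self.res_branch(x))              # residual operation
        x = F.interpolate(x, scale_factor=2,
                          mode="bilinear", align_corners=False)   # up-sample to the skip resolution
        return x + skip_feat                            # additive fusion with the CBR-processed skip
```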
(1.1) Dynamic convolution module based on stripe attention (SADC module): as shown in fig. 3, for each level of feature map the SADC module mainly comprises two steps: (a) determining the spatial position of the strip-shaped target based on a stripe attention mechanism and generating a global descriptor; (b) using the global descriptor to regress a kernel allocation matrix through a multi-layer perceptron, and using the kernel allocation matrix to weight and fuse a plurality of independent convolution kernels, so that weights are dynamically allocated to the independent kernels according to different input features. Specifically:
given a feature map X E R C×H×W As an input, where C represents the number of channels, and H and W represent the height and width of the feature map, respectively. Firstly, a feature map X is subjected to convolution, batch normalization and activation layer processing (namely a CBR module in fig. 3), and is subjected to horizontal pooling and vertical pooling to respectively obtain two feature vectors X v ∈R C×H×1 and Xh ∈R C ×1×W Then X is taken up v and Xh By adding after the convolution, batch normalization and up-sampling operations of 3X 3 (i.e., CBU in fig. 3), and then rolling and batch normalization (i.e., CB in fig. 3) of the added result to obtain X m ∈R C×H×W . X is to be m And through a Sigmoid function and multiplying the Sigmoid function with the feature map X, obtaining a global descriptor Z.
The global descriptor Z is input into a multi-layer perceptron formed by two fully connected layers (the MLP in fig. 3), which outputs a kernel allocation matrix K = [k_1, k_2, ..., k_N], where N denotes the number of independent 3×3 convolution kernels and k_i denotes the weight of the i-th convolution kernel. This embodiment uses Softmax for parameter constraint so that Σ_i k_i = 1. Dynamic convolution applies an attention mechanism on the convolution kernels by assigning weights to the different kernels. Let the N independent convolution kernel parameters be θ = [θ_1, θ_2, ..., θ_N]; the kernel allocation matrix K and the convolution kernel parameters θ are fused and then applied to the feature map X, yielding the final output O:
O = DConv(X) = Conv(X, Σ_i k_i·θ_i)   (1)
where DConv denotes the dynamic convolution operation and Conv denotes the convolution operation.
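A hedged PyTorch sketch of this SADC computation is given below. The stripe pooling, stripe attention map, global descriptor and Softmax-weighted fusion of N independent 3×3 kernels follow the description; the global average pooling in front of the MLP, the hidden width of the MLP, the number of kernels N and the per-sample convolution loop are assumptions made only to obtain runnable code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SADC(nn.Module):
    """Stripe-attention based dynamic convolution (sketch, under stated assumptions)."""
    def __init__(self, channels, num_kernels=4):
        super().__init__()
        self.cbr = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.conv_v = nn.Sequential(            # branch for the C x H x 1 vertical stripe
            nn.Conv2d(channels, channels, 3, padding=1, bias=False), nn.BatchNorm2d(channels))
        self.conv_h = nn.Sequential(            # branch for the C x 1 x W horizontal stripe
            nn.Conv2d(channels, channels, 3, padding=1, bias=False), nn.BatchNorm2d(channels))
        self.conv_m = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False), nn.BatchNorm2d(channels))
        # MLP regressing the kernel-allocation vector K; a global pooling step is assumed
        # here to reduce Z to a vector, which the patent text does not spell out.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(inplace=True),
            nn.Linear(channels // 4, num_kernels))
        # N independent 3x3 convolution kernels theta_1 .. theta_N
        self.kernels = nn.Parameter(torch.randn(num_kernels, channels, channels, 3, 3) * 0.01)

    def forward(self, x):
        b, c, h, w = x.shape
        f = self.cbr(x)
        xv = F.adaptive_avg_pool2d(f, (h, 1))   # vertical stripe pooling -> C x H x 1
        xh = F.adaptive_avg_pool2d(f, (1, w))   # horizontal stripe pooling -> C x 1 x W
        xv = F.interpolate(self.conv_v(xv), size=(h, w), mode="bilinear", align_corners=False)
        xh = F.interpolate(self.conv_h(xh), size=(h, w), mode="bilinear", align_corners=False)
        xm = self.conv_m(xv + xh)               # X_m in R^(C x H x W)
        z = torch.sigmoid(xm) * x               # stripe-attention-weighted features Z
        k = torch.softmax(self.mlp(z.mean(dim=(2, 3))), dim=1)   # kernel allocation, sum_i k_i = 1
        out = []
        for i in range(b):                      # per-sample dynamic kernel fusion
            fused = (k[i].view(-1, 1, 1, 1, 1) * self.kernels).sum(dim=0)
            out.append(F.conv2d(x[i:i + 1], fused, padding=1))
        return torch.cat(out, dim=0)            # O = Conv(X, sum_i k_i * theta_i)
```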
(1.2) Globally guided multi-kernel convolution module (GGMC module): as shown in fig. 4, for the input features the GGMC module mainly comprises two steps: (a) the input features are scaled and flattened to obtain a global descriptor, which is used to refine the input features into globally aware features; (b) the globally aware features are passed through a multi-branch convolution structure to capture channel characterization and multi-scale local characterization, and finally all features are fused to obtain the output. Specifically:
for the input feature diagram X ε R C×H×W Feature scaling is performed in the spatial dimension by adopting average pooling, and an input feature map X is converted into a feature X with height and width of h and w respectively s ∈R C×h×w Then feature X s Leveling according to the space dimension to obtainWill->Obtaining a global descriptor G E R through a multi-layer perceptron C×1×1 . Global descriptors G and X s Channel-by-channel fusion, i.e. X s Each pixel is added with a value to obtain a feature map X with global perceptibility G ∈R C×h×w
For the local perception features, a multi-branch convolution structure is used: the feature map X_G is passed through depthwise convolution kernels of sizes 3×3, 5×5 and 7×7, respectively, to capture multi-scale local perception features. For the channel perception features, a 1×1 point-wise convolution is applied to the feature map X_G to capture the relationship of each pixel along the channel dimension. Batch normalization (BN) is applied to each branch result for feature alignment. Finally, the local perception features and the channel perception features are added to obtain the final output.
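The GGMC computation above can be sketched in PyTorch as follows; the pooled spatial size (h, w), the MLP hidden width and the handling of the output resolution are assumptions not fixed by the text:

```python
import torch.nn as nn
import torch.nn.functional as F

class GGMC(nn.Module):
    """Globally guided multi-kernel convolution (sketch, under stated assumptions)."""
    def __init__(self, channels, pooled_size=8):        # pooled (h, w); the exact value is assumed
        super().__init__()
        self.pooled_size = pooled_size
        n = pooled_size * pooled_size
        self.mlp = nn.Sequential(nn.Linear(n, n // 2), nn.ReLU(inplace=True), nn.Linear(n // 2, 1))
        self.dw3 = nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False)
        self.dw5 = nn.Conv2d(channels, channels, 5, padding=2, groups=channels, bias=False)
        self.dw7 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels, bias=False)
        self.pw1 = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.ModuleList([nn.BatchNorm2d(channels) for _ in range(4)])

    def forward(self, x):
        xs = F.adaptive_avg_pool2d(x, self.pooled_size)  # X_s in R^(C x h x w)
        g = self.mlp(xs.flatten(2))                      # per-channel global descriptor G, shape (B, C, 1)
        xg = xs + g.unsqueeze(-1)                        # channel-wise fusion -> X_G
        branches = [self.dw3(xg), self.dw5(xg), self.dw7(xg), self.pw1(xg)]
        return sum(bn(br) for bn, br in zip(self.bn, branches))   # local + channel perception, summed
```

The 3×3/5×5/7×7 branches are depthwise (groups = channels), matching the depthwise convolution kernels in the description, while the 1×1 branch is an ordinary point-wise convolution. Note that this sketch returns features at the pooled resolution; how they are aligned with the decoder output before the final addition is not detailed in the text.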
(1.3) loss function
The present embodiment uses a joint loss function L_seg combining the binary Focal loss and the Dice loss:
L_seg = L_focal + L_dice   (2)
The Focal loss function is defined as:
L_focal = −Σ_i [ y_i (1 − ŷ_i)^γ log(ŷ_i) + (1 − y_i) ŷ_i^γ log(1 − ŷ_i) ]   (3)
where γ is an adjusting factor with γ > 0; in this embodiment γ is set to 0.5.
The Dice loss function L_dice measures the similarity between the network prediction and the manual annotation and is computed as:
L_dice = 1 − (2 Σ_i ŷ_i·y_i + ε) / (Σ_i ŷ_i + Σ_i y_i + ε)   (4)
where ŷ_i is the predicted value of the i-th pixel, y_i is the ground-truth label of the i-th pixel, and ε is a smoothing factor that prevents the numerator or denominator from being zero, set to 10^(−6).
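A compact sketch of the joint loss is shown below; the Dice term follows formula (4), while the focal term uses the common binary focal loss form, which may differ in detail from the patent's formula (3):

```python
import torch

def focal_loss(pred, target, gamma=0.5, eps=1e-6):
    """Binary focal loss in its common form; gamma = 0.5 follows the text."""
    pred = pred.clamp(eps, 1.0 - eps)
    loss = -(target * (1 - pred) ** gamma * torch.log(pred)
             + (1 - target) * pred ** gamma * torch.log(1 - pred))
    return loss.mean()

def dice_loss(pred, target, eps=1e-6):
    """Dice loss with smoothing factor eps = 1e-6, as in formula (4)."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def seg_loss(pred, target, gamma=0.5):
    """Joint segmentation loss L_seg = L_focal + L_dice (formula (2))."""
    return focal_loss(pred, target, gamma) + dice_loss(pred, target)
```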
(2) Data augmentation based on multi-view geometric projection
The data augmentation scheme of this embodiment is shown in fig. 5. First, three-dimensional reconstruction is performed on multi-view images of the same sample (a strip-shaped skin scar) using a multi-view stereo geometry algorithm (the SFM algorithm), yielding a three-dimensional model and the camera parameters of each view. Then, the rotation transformation matrices formed from the camera parameters are interpolated, and the interpolation results are used to project the three-dimensional model into two-dimensional camera views (pseudo views). Finally, all the generated views form an unlabeled data set that is fed into the teacher model for training. The data augmentation scheme is specifically as follows:
for the same strip-shaped skin scar sample, images are respectively shot from multiple visual angles, and multiple views are obtained. And adopting a motion restoration Structure (SFM) algorithm in multi-view solid geometry to carry out three-dimensional reconstruction on the input multi-view. The three-dimensional model of the strip-shaped skin scar sample and the camera parameters corresponding to each view can be obtained through three-dimensional reconstruction, wherein the camera parameters comprise internal parameters and external parameters of the camera. The rotation transformation matrix of each view relative to the three-dimensional model is obtained through camera parameters, and the rotation transformation matrix of the view is represented by using a quaternary value in the embodiment, wherein the formula is as follows:
wherein ,representing a rotation transformation of the ith view, [ u ] ix u iy u iz ]Representing a three-dimensional vector of unit length, u i Representing the rotation axis, θ, of the ith view in unit vector definition i The angle of rotation of the ith view is represented, and N represents the total view number.
Interpolation is performed between adjacent rotation transformation matrices, and the interpolation results are used to project the three-dimensional model in three dimensions, thereby generating two-dimensional views from additional angles. Given the rotation transformations q_i and q_(i+1) of two adjacent views, the interpolated rotation q_t is obtained by linear interpolation:
q_t = (1 − t)·q_i + t·q_(i+1)   (6)
where t ∈ (0, 1) is the interpolation ratio parameter and q_t denotes the interpolation result; q_t is applied to the three-dimensional model to obtain a synthesized pseudo view.
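The axis-angle-to-quaternion conversion of formula (5) and the linear interpolation of formula (6) can be sketched as follows; the renormalization of the interpolated quaternion is a practical assumption not stated in the text:

```python
import numpy as np

def axis_angle_to_quaternion(u, theta):
    """Unit quaternion q = [cos(theta/2), u*sin(theta/2)] for rotation axis u and angle theta."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    return np.concatenate(([np.cos(theta / 2.0)], u * np.sin(theta / 2.0)))

def interpolate_rotation(q_i, q_j, t):
    """Linear interpolation q_t = (1 - t)*q_i + t*q_j, t in (0, 1),
    renormalized so the result stays a valid unit quaternion (assumed detail)."""
    q_t = (1.0 - t) * np.asarray(q_i) + t * np.asarray(q_j)
    return q_t / np.linalg.norm(q_t)

# Example: interpolate halfway (t = 0.5) between two neighbouring views
q1 = axis_angle_to_quaternion([0.0, 1.0, 0.0], np.deg2rad(10.0))
q2 = axis_angle_to_quaternion([0.0, 1.0, 0.0], np.deg2rad(30.0))
q_mid = interpolate_rotation(q1, q2, 0.5)
```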
(3) Enhanced training based on teacher-student model
Taking t = 0.5 in formula (6), twice the number of views is generated for each scar sample, i.e., the amount of data is doubled; strong data augmentation such as data mixing, rotation, flipping and translation is then applied to obtain the training data of the teacher and student models in the second stage. Using the generated data, this embodiment performs enhanced training of the pre-trained model with a teacher-student training strategy. As shown in fig. 5, the teacher model and the student model adopt the same network structure and are preloaded with the model parameters obtained by supervised training on the original data set. The input pseudo view is passed through the teacher model to obtain a pseudo label, and at the same time the pseudo view is input to the student model to obtain a segmentation result; the loss is then computed from the pseudo label and the segmentation result output by the student model, so that the teacher model guides the learning of the student model and the generalization of the student model keeps improving. The trained student model is finally used as the U-shaped encoding and decoding network of this embodiment.
The application adopts an exponential moving average (Exponential Moving Average, EMA) strategy to gradually transfer the parameters of the student model into the teacher model, with the formula:
θ′_t = α_t·θ′_(t−1) + (1 − α_t)·θ_t   (7)
where θ′_t denotes the weights of the teacher model at the current epoch, θ′_(t−1) denotes the teacher weights at the previous epoch, θ_t denotes the weights of the current student model, and total_epoch denotes the total number of training epochs. α_t denotes the weight ratio between the teacher model and the student model. As training proceeds, the performance of the student model keeps improving, so the weight share of the student model can be appropriately increased; this embodiment adopts a simple linear change strategy, expressed by formula (8), where α_0 denotes the initial teacher weight and is set to 0.99.
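A sketch of the EMA update of formula (7) is given below; the linear schedule for α_t is only an assumption consistent with the text (formula (8) itself is not reproduced here), starting from α_0 = 0.99 and gradually increasing the student's share:

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, epoch, total_epoch, alpha0=0.99):
    """Formula (7): theta'_t = alpha_t * theta'_{t-1} + (1 - alpha_t) * theta_t.
    The linear schedule for alpha_t below is an illustrative assumption."""
    alpha_t = alpha0 * (1.0 - epoch / float(total_epoch))   # teacher share shrinks over training
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha_t).add_(s_param, alpha=1.0 - alpha_t)
```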
The experimental analysis is as follows:
To verify the effectiveness of the method, the application was validated on collected real strip-shaped skin scar images.
1) Data set
The data used in this experiment are all clinical data from a forensic identification science institution in Shanghai. A total of 744 strip-shaped skin scar images were acquired by mobile phone photography, covering 130 scar samples with 4-10 different viewing angles per sample. Experiments were performed using five-fold cross-validation. The input images were uniformly resized to a resolution of 512×512. The Dice coefficient, the intersection-over-union (IoU) and the sensitivity (Sen) were used as segmentation evaluation indices, defined as
Dice = 2TP / (2TP + FP + FN), IoU = TP / (TP + FP + FN), Sen = TP / (TP + FN)
where TP, FP and FN denote true positives, false positives and false negatives, respectively, with the scar area as the positive class and the background area as the negative class. TP means the true value is positive and the predicted value is positive; FP means the true value is negative while the predicted value is positive; FN means the true value is positive but the predicted value is negative.
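For reference, the three evaluation indices can be computed from binary masks as in the following sketch (function names are illustrative):

```python
import numpy as np

def segmentation_metrics(pred, label):
    """Dice, IoU and sensitivity from binary prediction and label masks,
    with the scar region as the positive class (TP/FP/FN as defined above)."""
    pred = pred.astype(bool)
    label = label.astype(bool)
    tp = np.logical_and(pred, label).sum()
    fp = np.logical_and(pred, ~label).sum()
    fn = np.logical_and(~pred, label).sum()
    dice = 2.0 * tp / (2.0 * tp + fp + fn + 1e-6)
    iou = tp / (tp + fp + fn + 1e-6)
    sen = tp / (tp + fn + 1e-6)
    return dice, iou, sen
```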
To comprehensively evaluate model performance, the experiments also compare the parameter counts (Param) and computation amounts (FLOPs) of the different segmentation networks. The parameter count measures the space occupied by the stored network parameters, i.e., the space complexity of the model. The computation amount measures the number of floating-point operations required by the network, i.e., the time complexity of the model.
2) Results
The present application employs a base network (Baseline) that uses a ResNet34 pre-trained model as the encoder; each decoder stage consists of up-sampling and convolution blocks, and the encoder features are reduced in dimension and fused into the decoder through skip connections. Corresponding ablation experiments were performed for the SADC module and the GGMC module. In the comparison experiments, the effectiveness of the proposed method was verified by comparing it with other excellent convolutional-neural-network-based segmentation networks, including UNet, CSNet, SegNet, FCN, CENet, DeepLabv3+, CPFNet and PSPNet. All experimental results are supplemented with results obtained using the teacher-student training, verifying the effectiveness of the data augmentation scheme and training strategy proposed by the application.
Table 1 lists the ablation results. Compared with the baseline network, the performance of the proposed method is improved: the intersection-over-union, the Dice coefficient and the sensitivity increase from 79.14%, 87.94% and 88.81% to 80.85%, 89.16% and 92.11%, respectively. Adding the GGMC module improves the network on the IoU and Dice indices. Adding the SADC module improves the network on the IoU, Dice and Sen indices. In terms of computation amount and parameter count, the GGMC and SADC modules proposed by the application do not introduce excessive parameters or computation.
Table 1. Comparison of ablation experiment results for strip-shaped skin scar image segmentation

| Method | Teacher-student | mIoU | mDice | mSen | Param (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | × | 0.7914±0.0242 | 0.8794±0.0170 | 0.8881±0.0114 | 21.739 | 31.87 |
| Baseline+GGMC | × | 0.7971±0.0270 | 0.8833±0.0182 | 0.8830±0.0213 | 21.807 | 31.88 |
| Baseline+SADC | × | 0.7977±0.0238 | 0.8834±0.0165 | 0.8912±0.0128 | 21.896 | 32.27 |
| The present application | × | 0.8007±0.0275 | 0.8861±0.0181 | 0.9079±0.0149 | 21.964 | 32.27 |
| Baseline | √ | 0.7995±0.0232 | 0.8848±0.0159 | 0.8967±0.0141 | 21.739 | 31.87 |
| Baseline+GGMC | √ | 0.8013±0.0277 | 0.8859±0.0187 | 0.8951±0.0133 | 21.807 | 31.88 |
| Baseline+SADC | √ | 0.8013±0.0249 | 0.8863±0.0169 | 0.8977±0.0139 | 21.896 | 32.27 |
| The present application | √ | 0.8085±0.0253 | 0.8916±0.0164 | 0.9111±0.0142 | 21.964 | 32.27 |
Table 2 lists the results of the comparison experiments. Since the present application uses a ResNet34 pre-trained model as the encoder, for fairness the encoder parts of the other networks, except UNet and CSNet, were replaced with the ResNet34 pre-trained model during comparison. For UNet and CSNet, the encoder cannot be replaced with the ResNet34 pre-trained model because of their particular network structures. As can be seen from Table 2, the indices of UNet and CSNet, which lack pre-trained models, are relatively low. Because the skin scar images are photographed with a smartphone, they belong to natural images; the ResNet34 model is pre-trained on the natural image data set ImageNet, so for natural images the pre-trained model provides better initialization parameters for the network encoder, making training converge more easily to an optimum. Regarding the teacher-student training strategy and the data augmentation scheme based on multi-view geometric projection, the table shows that after enhanced training with the teacher-student model, all methods improve to some extent on the indices. In addition, the application has clear advantages over the other segmentation networks in computation amount and parameter count.
Table 2. Comparison of comparative experiment results for strip-shaped skin scar image segmentation

| Method | Backbone | Teacher-student | mIoU | mDice | mSen | Param (M) | FLOPs (G) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UNet | / | × | 0.7365±0.0284 | 0.8368±0.0228 | 0.8354±0.0322 | 8.637 | 65.7 |
| CSNet | / | × | 0.7498±0.0224 | 0.8481±0.0178 | 0.8549±0.0333 | 8.401 | 55.98 |
| SegNet | Res34 | × | 0.7648±0.0187 | 0.8625±0.0142 | 0.8798±0.0174 | 38.438 | 37.68 |
| FCN | Res34 | × | 0.7800±0.0235 | 0.8710±0.0182 | 0.8761±0.0206 | 25.21 | 40.97 |
| CENet | Res34 | × | 0.7913±0.0269 | 0.8791±0.0181 | 0.8915±0.0195 | 29.003 | 35.57 |
| DeepLabv3+ | Res34 | × | 0.7926±0.0274 | 0.8806±0.0214 | 0.8905±0.0227 | 26.711 | 54.59 |
| CPFNet | Res34 | × | 0.7880±0.0266 | 0.8772±0.0184 | 0.8840±0.0162 | 30.651 | 32.13 |
| PSPNet | Res34 | × | 0.7736±0.0265 | 0.8680±0.0184 | 0.8822±0.0158 | 27.5 | 23.49 |
| The present application | Res34 | × | 0.8007±0.0275 | 0.8861±0.0181 | 0.9079±0.0149 | 21.96 | 32.27 |
| UNet | / | √ | 0.7453±0.0251 | 0.8448±0.0191 | 0.8468±0.0305 | 8.637 | 65.7 |
| CSNet | / | √ | 0.7577±0.0196 | 0.8540±0.0154 | 0.8628±0.0259 | 8.401 | 55.98 |
| SegNet | Res34 | √ | 0.7666±0.0176 | 0.8641±0.0132 | 0.8827±0.0186 | 38.438 | 37.68 |
| FCN | Res34 | √ | 0.7853±0.0235 | 0.8754±0.0173 | 0.8791±0.0170 | 25.21 | 40.97 |
| CENet | Res34 | √ | 0.7960±0.0286 | 0.8821±0.0192 | 0.8966±0.0210 | 29.003 | 35.57 |
| DeepLabv3+ | Res34 | √ | 0.7950±0.0273 | 0.8823±0.0187 | 0.8944±0.0238 | 26.711 | 54.59 |
| CPFNet | Res34 | √ | 0.7917±0.0248 | 0.8799±0.0171 | 0.8883±0.0150 | 30.651 | 32.13 |
| PSPNet | Res34 | √ | 0.7772±0.0236 | 0.8712±0.0163 | 0.8912±0.0162 | 27.5 | 23.49 |
| The present application | Res34 | √ | 0.8085±0.0253 | 0.8916±0.0164 | 0.9111±0.0142 | 21.96 | 32.27 |
Fig. 6 shows the segmentation results of strip-shaped scar images for the different methods; from left to right: the original image and the segmentation results of SegNet, PSPNet, FCN, CENet, DeepLabv3+, CPFNet and the network of the present application. As can be seen from fig. 6, the segmentation accuracy of the proposed network on strip-shaped scar images is clearly improved.
Thus, a novel deep learning network suitable for strip-shaped skin scar images has been implemented and validated. The teacher-student enhanced training strategy based on multi-view geometric projection data augmentation alleviates network under-fitting caused by small data samples. The dynamic convolution module based on stripe attention addresses the insensitivity to strip-shaped targets and the insufficient feature acquisition caused by the limitations of existing attention mechanisms. The globally guided multi-kernel convolution module addresses the inability of encoder-decoder network structures to capture global and multi-scale semantic information during feature fusion. The application was verified by comprehensive experiments on 744 collected strip-shaped skin scar images, and the experimental results show that the method achieves better segmentation performance on strip-shaped scar image data.
Example two
The application relates to a strip-shaped skin scar image segmentation system based on a convolutional neural network, which comprises:
an acquisition module, for obtaining an image with a strip-shaped skin scar;
an image segmentation module, for inputting the image into a U-shaped encoding and decoding network and realizing image segmentation of the strip-shaped skin scar through the U-shaped encoding and decoding network;
the U-shaped encoding and decoding network comprises an SADC module and a GGMC module, wherein the SADC module is used for enabling the U-shaped encoding and decoding network to focus on strip-shaped characteristics of skin scars, and the GGMC module is used for increasing the perception capability of the U-shaped encoding and decoding network on the skin scars with different scales and different lengths.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present application will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present application.

Claims (10)

1. A strip-shaped skin scar image segmentation method based on a convolutional neural network, characterized by comprising the following steps:
acquiring an image with strip-shaped skin scars;
inputting the image into a U-shaped encoding and decoding network, and realizing image segmentation of the strip-shaped skin scar through the U-shaped encoding and decoding network;
the U-shaped encoding and decoding network comprises an SADC module and a GGMC module, wherein the SADC module is used for enabling the U-shaped encoding and decoding network to focus on strip-shaped characteristics of skin scars, and the GGMC module is used for increasing the perception capability of the U-shaped encoding and decoding network on the skin scars with different scales and different lengths.
2. The strip-shaped skin scar image segmentation method based on the convolutional neural network as set forth in claim 1, wherein: the U-shaped coding and decoding network comprises an encoder, a decoder and a header structure, wherein the encoder is used for extracting features, the decoder is used for carrying out feature recovery on the features extracted by the encoder, and the header structure is used for carrying out image segmentation on the features recovered by the decoder;
the header structure comprises up-sampling, 3 x 3 convolution and 1 x 1 convolution connected in sequence;
the encoder adopts ResNet34, which comprises a convolution layer and a first, second, third and fourth residual coding module connected in sequence; each of the first, second, third and fourth residual coding modules comprises a 3×3 convolution, ReLU activation and a second 3×3 convolution connected in sequence, and the input features are added to the output of the second 3×3 convolution to obtain the output result;
the decoder comprises a third decoding module, a second decoding module, a first decoding module and a zeroth decoding module which are sequentially connected;
the characteristics output by the first residual error coding module are input to a first decoding module after passing through a CBR module;
the characteristics output by the second residual error coding module are input to a second decoding module after passing through a CBR module;
the characteristics output by the third residual error coding module are input to a third decoding module after passing through a CBR module;
the characteristics output by the fourth residual error coding module are input to a third decoding module after passing through a CBR module;
in the first decoding module, the second decoding module and the third decoding module, the features of the upper layer pass through the SADC module, and are added and fused with the features of the layer corresponding to the residual coding module after residual operation and up-sampling, so that feature reconstruction is realized;
in the zeroth decoding module, the last-level features pass through the SADC module, and output results are obtained after residual error operation and up-sampling;
the first decoding module and the second decoding module are added with the output of the zeroth decoding module after passing through the GGMC module, and then the added result is input into the head structure.
3. The strip-shaped skin scar image segmentation method based on the convolutional neural network as set forth in claim 2, wherein: the CBR module comprises a 3×3 convolution layer, a batch normalization layer and a ReLU activation function layer connected in sequence.
4. The strip-shaped skin scar image segmentation method based on the convolutional neural network as set forth in claim 1, wherein: the SADC module includes:
for an input feature map X ∈ R^(C×H×W), where C denotes the number of channels and H and W denote the height and width of the feature map, respectively: the feature map X is first processed by convolution, batch normalization and an activation layer; the processed feature map is subjected to horizontal pooling and vertical pooling to obtain two feature vectors X_v ∈ R^(C×H×1) and X_h ∈ R^(C×1×W); the feature vectors X_v and X_h are each passed through a 3×3 convolution, batch normalization and an up-sampling operation and then added, and the sum is passed through a convolution and batch normalization to obtain X_m ∈ R^(C×H×W); X_m is passed through a Sigmoid function and multiplied with the feature map X to obtain a global descriptor Z;
the global descriptor Z is input into a multi-layer perceptron formed by two fully connected layers, which outputs a kernel allocation matrix K = [k_1, k_2, ..., k_N], where N denotes the number of independent 3×3 convolution kernels and k_i denotes the weight of the i-th convolution kernel; the weights are constrained by Softmax so that Σ_i k_i = 1. Let the N independent convolution kernel parameters be θ = [θ_1, θ_2, ..., θ_N]; the kernel allocation matrix K and the convolution kernel parameters θ are fused and then applied to the feature map X to produce the final output O.
5. The strip-shaped skin scar image segmentation method based on a convolutional neural network as set forth in claim 4, wherein: the kernel distribution matrix K and the convolution kernel parameter theta are fused and then act on the feature map X, and finally output O, wherein the formula is as follows:
O = DConv(X) = Conv(X, Σ_i k_i·θ_i)
where DConv denotes a dynamic convolution operation and Conv denotes a convolution operation.
6. The strip-shaped skin scar image segmentation method based on the convolutional neural network as set forth in claim 1, wherein: the GGMC module comprises:
for an input feature map X ∈ R^(C×H×W), where C denotes the number of channels and H and W denote the height and width of the feature map, respectively: average pooling is used to scale the features in the spatial dimension, converting the feature map X into a feature X_s ∈ R^(C×h×w) with height h and width w; the feature X_s is then flattened along the spatial dimension, and the flattened feature is passed through a multi-layer perceptron to obtain a global descriptor G ∈ R^(C×1×1); the global descriptor G and X_s are fused channel by channel to obtain a feature map X_G ∈ R^(C×h×w) with global perception;
the feature map X_G is passed through depthwise convolution kernels of sizes 3×3, 5×5 and 7×7, respectively, to capture multi-scale local perception features; a 1×1 point-wise convolution is applied to the feature map X_G to capture channel perception features;
after the 3×3, 5×5 and 7×7 depthwise convolutions and the 1×1 point-wise convolution, batch normalization is applied to align the features; finally, the multi-scale local perception features and the channel perception features are added to obtain the final output.
7. The strip-shaped skin scar image segmentation method based on the convolutional neural network as set forth in claim 1, wherein: the method further comprises training the U-shaped coding and decoding network by constructing an image data set, and the method for constructing the image data set comprises the following steps:
acquiring multi-view images of the same strip skin scar, performing three-dimensional reconstruction on the multi-view images by using an SFM algorithm to obtain a three-dimensional model of the strip skin scar and camera parameters of each view, and obtaining a rotation transformation matrix of each view image relative to the three-dimensional model according to the camera parameters;
and interpolating between adjacent rotation transformation matrixes, and carrying out three-dimensional projection on the three-dimensional model by interpolation results to generate a plurality of two-dimensional pseudo views.
8. The strip-shaped skin scar image segmentation method based on a convolutional neural network as set forth in claim 7, wherein: the rotation transformation matrix of each view angle image relative to the three-dimensional model is obtained according to camera parameters, and the formula is as follows:
q_i = [cos(θ_i/2), u_ix·sin(θ_i/2), u_iy·sin(θ_i/2), u_iz·sin(θ_i/2)], i = 1, 2, ..., N
where q_i denotes the rotation transformation of the i-th view, [u_ix, u_iy, u_iz] is a three-dimensional vector of unit length, u_i denotes the rotation axis of the i-th view defined by the unit vector, θ_i denotes the rotation angle of the i-th view, and N denotes the total number of views.
9. The strip-shaped skin scar image segmentation method based on a convolutional neural network as set forth in claim 7, wherein: the interpolation is carried out between adjacent rotation transformation matrixes, and the formula is as follows:
q_t = (1 − t)·q_i + t·q_(i+1)
where t ∈ (0, 1) denotes the interpolation ratio parameter, q_i and q_(i+1) denote the rotation transformations of two adjacent views, and q_t denotes the interpolation result.
10. A strip-shaped skin scar image segmentation system based on a convolutional neural network, characterized by comprising:
an acquisition module, for obtaining an image with a strip-shaped skin scar;
an image segmentation module, for inputting the image into a U-shaped encoding and decoding network and realizing image segmentation of the strip-shaped skin scar through the U-shaped encoding and decoding network;
the U-shaped encoding and decoding network comprises an SADC module and a GGMC module, wherein the SADC module is used for enabling the U-shaped encoding and decoding network to focus on strip-shaped characteristics of skin scars, and the GGMC module is used for increasing the perception capability of the U-shaped encoding and decoding network on the skin scars with different scales and different lengths.
CN202310682091.0A 2023-06-09 2023-06-09 Strip-shaped skin scar image segmentation method and system based on convolutional neural network Active CN116823852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310682091.0A CN116823852B (en) 2023-06-09 2023-06-09 Strip-shaped skin scar image segmentation method and system based on convolutional neural network


Publications (2)

Publication Number Publication Date
CN116823852A 2023-09-29
CN116823852B (en) 2024-07-19

Family

ID=88125070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310682091.0A Active CN116823852B (en) 2023-06-09 2023-06-09 Strip-shaped skin scar image segmentation method and system based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN116823852B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220208355A1 (en) * 2020-12-30 2022-06-30 London Health Sciences Centre Research Inc. Contrast-agent-free medical diagnostic imaging
CN112767406A (en) * 2021-02-02 2021-05-07 苏州大学 Deep convolution neural network suitable for corneal ulcer segmentation of fluorescence staining slit lamp image
CN114004811A (en) * 2021-11-01 2022-02-01 西安交通大学医学院第二附属医院 Image segmentation method and system based on multi-scale residual error coding and decoding network
CN114820636A (en) * 2022-05-20 2022-07-29 南京邮电大学 Three-dimensional medical image segmentation model and training method and application thereof
CN115457021A (en) * 2022-09-30 2022-12-09 云南大学 Skin disease image segmentation method and system based on joint attention convolution neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUTONG CAI: "MA-Unet: An improved version of Unet based on multi-scale and attention mechanism for medical image segmentation", arXiv, 20 December 2020 (2020-12-20), pages 1-13 *
韩慧慧: "Semantic segmentation with encoder-decoder structure" (编码—解码结构的语义分割), Journal of Image and Graphics (中国图象图形学报), 16 February 2020 (2020-02-16), pages 45-56 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649633A (en) * 2024-01-30 2024-03-05 武汉纺织大学 Pavement pothole detection method for highway inspection
CN117649633B (en) * 2024-01-30 2024-04-26 武汉纺织大学 Pavement pothole detection method for highway inspection

Also Published As

Publication number Publication date
CN116823852B (en) 2024-07-19


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant