Disclosure of Invention
The invention aims to improve the decoder of a U-shaped network, provides a multi-feature fusion decoder structure, and accomplishes the task of segmenting quartz in magnetite microscopic images, so as to solve the problems in the background art.
In order to achieve the purpose, the invention adopts the following technical scheme:
A real-time mineral segmentation method based on a multi-feature fusion decoder comprises the following steps:
S1: performing semantic segmentation of quartz, which corresponds to the white area of the label, in magnetite microscopic images; producing a plurality of data sets and training and testing on them respectively; and enhancing the magnetite microscopic images with a combination of strategies such as vertical flipping, horizontal flipping, random rotation by n × 90°, affine transformation and random translation;
S2: the enhancement in S1 further adopts a region-cloning data-set enhancement method: during training on the data set, a partial region of another picture in the data set is randomly cloned onto the current (index) image, and the same operation is applied to the label, which improves the diversity of the mineral-phase data and reduces the overfitting phenomenon during training;
S3: constructing the network model based on the data set processed in S2, and encoding, decoding and fusing the data: in the repeated encoding and decoding operations, all feature maps of the same scale are fused; the encoder feature map and the decoder feature map are acquired and, after being fused, are encoded again;
S4: when the data set is encoded and decoded in S3, an MA-net network structure is built from a multi-feature aggregation decoder structure and a lightweight Resnet34;
S5: based on the MA-net network structure established in S4, a channel attention mechanism is introduced for training to obtain a network model, and the segmentation precision is improved;
S6: introducing a residual multi-kernel pooling module at the end of the MA-net network structure built in S4, the residual multi-kernel pooling module relying mainly on multiple effective fields of view to detect objects of different sizes;
S7: in the training on the data set of S1, replacing Batch Normalization (BN) with Filter Response Normalization (FRN), and replacing the rectified linear unit (ReLU) with the corresponding Thresholded Linear Unit (TLU) activation layer; model performance analysis experiments are performed on the EM challenge data set, the LUNA challenge data set and the DRIVE data set, respectively.
Preferably, a detail extraction module is added to the MA-net network structure described in S4 to improve the ability to segment small objects in the DRIVE data set.
Preferably, in the MA-net network structure described in S4, the first convolutional layer uses 16 channels; the number of output channels of the first convolution kernel is reduced to 1/4 of the number of input channels and is used as the number of input channels of the second convolutional layer, which greatly reduces the parameter amount without changing the input and output channels of each decoder block.
Preferably, the channel attention mechanism introduced in S5 includes the following steps:
a1: global average pooling is performed first to maintain a maximum receptive field;
A2: then assigning a learnable weight to the channel of each feature map, so that the network model pays more attention to the main objects to be classified; the channel attention mechanism adopts an ARM module.
Preferably, the step of introducing the residual multi-kernel pooling module in S6 includes the following steps:
b1: collecting context information by adopting four pooling kernels with different sizes to enrich high-level semantic information;
B2: obtaining feature maps with the same size as the original feature map by bilinear interpolation, and reducing the channel dimension to 1 by a 1 × 1 convolution;
B3: merging the original feature map and the up-sampled feature maps along the channel dimension; the residual multi-kernel pooling module can cope with large variations in object size in the image.
Preferably, the FRN expression described in S7 is as follows:
ν² = (Σᵢxᵢ²)/N  (1)
y = x/√(ν² + ε)  (2)
where x is an N-dimensional vector with N = H × W; unlike the BN normalization, which subtracts the mean and then divides by the standard deviation, FRN divides by the root of the mean squared norm ν² and does not subtract the mean; ε in the formula is a small positive constant that prevents division by 0.
Preferably, the TLU expression described in S7 is as follows:
zᵢ = max(yᵢ, τ) = ReLU(yᵢ - τ) + τ  (3)
where τ is a learnable parameter.
Compared with the prior art, the invention provides a real-time mineral segmentation method based on a multi-feature fusion decoder, which has the following beneficial effects:
(1) The invention provides a multi-feature fusion decoder structure and builds the MA-Net network structure in combination with a lightweight Resnet34; residual multi-kernel pooling is added at the end of the encoder to enhance the segmentation of targets of various sizes, a channel attention mechanism is introduced to improve the segmentation precision, and FRN is adopted to eliminate the network's dependence on the batch size during training. Compared with a single-path decoder structure, the network adds a down-sampling process and aggregates multi-stage feature information during encoding and decoding, so that, compared with other U-shaped networks, the MA-Net network structure can greatly reduce the number of network channels and the parameter amount while the segmentation precision is still guaranteed.
(2) In order to obtain a high segmentation effect with a small amount of training sample data, data enhancement is performed by randomly cloning a partial region of another picture in the data set onto the current (index) image. Experimental verification and analysis show that the method is applicable to segmentation tasks in which the spatial positions of the segmentation targets are random, such as mineral microscopic images, and that it can effectively reduce overfitting and improve the segmentation precision.
(3) Tests and analyses on the EM, LUNA and DRIVE data sets show that MA-Net performs outstandingly when segmenting larger targets but is not good at segmenting tiny targets, and still needs optimization and improvement for small-target segmentation; when MA-Net is used for the task of segmenting quartz in the magnetite phase, the Dice coefficient reaches 0.9637.
(4) In order to avoid the influence of the batch size on the training result, filter response normalization is adopted to replace batch normalization, and the corresponding thresholded linear unit activation layer is adopted to replace the rectified linear unit. The average Dice coefficients on the EM challenge data set and the LUNA challenge data set are 0.9657 and 0.9852 respectively, a considerable improvement over the 0.9584 and 0.9758 of U-net, while the floating-point operations are 5.72 G, only 1/43 of those of U-net. It can thus be seen that using MA-Net for the task of segmenting quartz in magnetite gives a good segmentation effect.
(5) The influence of each MA-Net module on the segmentation effect and the real-time performance of the model is verified on a mineral microscopic image data set. The multi-feature fusion decoding strategy adopted by MA-Net can fully extract the information of deep features and shallow features and learn their correlation to handle isolated pixels in the segmentation result, greatly alleviating the problems of information loss and poor fusion quality during up-sampling.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings; it is obvious that the described embodiments are only some, and not all, of the embodiments of the present invention.
Example 1:
Referring to FIGS. 1-6, the real-time mineral segmentation method based on the multi-feature fusion decoder includes the following steps:
S1: performing semantic segmentation of quartz, which corresponds to the white area of the label (shown in FIG. 1), in magnetite microscopic images; producing a plurality of data sets and training and testing on them respectively; and enhancing the magnetite microscopic images with a combination of strategies such as vertical flipping, horizontal flipping, random rotation by n × 90°, affine transformation and random translation;
S2: the enhancement in S1 further adopts a region-cloning data-set enhancement method: during training on the data set, a partial region of another picture in the data set is randomly cloned onto the current (index) image, and the same operation is applied to the label, which improves the diversity of the mineral-phase data and reduces the overfitting phenomenon during training;
The method is applied to the mineral facies data set, where the probability that it increases the richness of the data set is greater than the probability that it destroys information; this is verified in the experimental part below. The region-clone data enhancement method is shown in FIG. 2 and sketched below.
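For illustration only, the following is a minimal sketch of the region-cloning operation, assuming that each image and its label mask are NumPy arrays of the same spatial size; the function name, rectangle-sampling strategy and size bounds are assumptions for illustration and are not specified by the invention.

```python
import numpy as np

def region_clone(image, label, donor_image, donor_label, rng=None,
                 min_frac=0.1, max_frac=0.4):
    """Clone a random rectangular region from a donor sample onto the
    current (index) sample, applying the same cut to image and label.

    All arrays are assumed to share the same spatial size (H, W[, C]).
    The fraction bounds are illustrative choices, not values from the patent.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    # Sample the size of the cloned rectangle as a fraction of the image.
    rh = int(h * rng.uniform(min_frac, max_frac))
    rw = int(w * rng.uniform(min_frac, max_frac))
    # Sample its top-left corner so the rectangle stays inside the image.
    y = rng.integers(0, h - rh + 1)
    x = rng.integers(0, w - rw + 1)

    out_img, out_lbl = image.copy(), label.copy()
    out_img[y:y + rh, x:x + rw] = donor_image[y:y + rh, x:x + rw]
    out_lbl[y:y + rh, x:x + rw] = donor_label[y:y + rh, x:x + rw]
    return out_img, out_lbl
```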
S3: constructing the network model based on the data set processed in S2, and encoding, decoding and fusing the data: in the repeated encoding and decoding operations, all feature maps of the same scale are fused; the encoder feature map and the decoder feature map are acquired and, after being fused, are encoded again;
The overall structure of the network is an encoding-decoding structure. A traditional decoder structure uses a single path along which the up-sampled feature map is progressively enriched; the up-sampling stages are not closely connected, and the deep feature maps have difficulty recovering detailed information during decoding.
The invention proposes a decoder structure that aggregates features from multiple stages, combining the feature-multiplexing structure and the encoding-decoding structure proposed in the public literature. This strategy fuses all feature maps of the same scale during the repeated encoding and decoding operations; after the encoder feature map and the decoder feature map are fused, they are encoded again so that their correlation can be learned further and the fusion becomes more appropriate. Because the multi-stage feature maps of this structure supplement detailed information and spatial information at every depth of the decoding process, the number of channels of the feature maps can be compressed greatly compared with the traditional structure, thereby reducing the parameter amount. The U-shaped network and the stage-feature-multiplexing structure are shown in FIG. 3; a sketch of the same-scale fusion step is given below.
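For illustration only, the same-scale fusion step may be sketched in PyTorch as follows, under the assumption that the encoder and decoder feature maps of a given scale are concatenated along the channel dimension and then re-encoded by convolutions; the layer ordering and channel arguments are assumptions and may differ from the actual MA-net implementation.

```python
import torch
import torch.nn as nn

class SameScaleFusion(nn.Module):
    """Fuse an encoder feature map and a decoder feature map of the same
    spatial scale, then encode the fused result again so the network can
    learn their correlation (illustrative sketch; channel sizes assumed)."""

    def __init__(self, enc_channels, dec_channels, out_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            # 1x1 convolution compresses the concatenated channels.
            nn.Conv2d(enc_channels + dec_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            # 3x3 convolution re-encodes the fused features.
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, enc_feat, dec_feat):
        return self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
```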
S4: when the data set is encoded and decoded in S3, a multi-feature aggregation decoder structure and a lightweight Resnet34 are used to build the MA-net network structure shown in FIG. 4, where 'c' denotes channel concatenation followed by a 1 × 1 convolution;
In the MA-net network structure described in S4, the first convolutional layer uses 16 channels; the number of output channels of the first convolution kernel is reduced to 1/4 of the number of input channels and is used as the number of input channels of the second convolutional layer, which greatly reduces the parameter amount without changing the input and output channels of each decoder block. A sketch of such a decoder block is given below.
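For illustration only, a decoder block following the above channel-reduction rule may be sketched as follows; the exact layer composition of the MA-net decoder block is not restated in this section, so the two-convolution structure is an assumption for illustration.

```python
import torch.nn as nn

class ReducedDecoderBlock(nn.Module):
    """Decoder block whose first convolution outputs in_channels // 4
    channels, cutting the parameter count without changing the block's
    external input/output channels (illustrative sketch)."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        mid_channels = in_channels // 4  # 1/4 of the input channels
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```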
In a deep convolutional neural network, shallow feature maps are larger than deep ones, so their calculation amount is more sensitive to the number of channels. The encoder parameters and output channel numbers are shown in Table 1, where /2 denotes 2-fold down-sampling; the decoder configuration parameters are shown in Table 2, where ×2 denotes 2-fold up-sampling;
Table 1: Encoder module parameters
Table 2: Decoder module parameters
S5: based on the MA-net network structure established in S4, a channel attention mechanism is introduced for training to obtain a network model, and the segmentation precision is improved;
the channel attention mechanism introduced in the step S5 includes the following steps:
a1: global average pooling is performed first to maintain a maximum receptive field;
A2: then assigning a learnable weight to the channel of each feature map, so that the network model pays more attention to the main objects to be classified; the channel attention mechanism adopts an ARM module, as shown in FIG. 5 and sketched below.
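For illustration only, the channel attention of steps A1 and A2 may be sketched as follows, following the common form of an attention refinement module (global average pooling, 1 × 1 convolution, sigmoid gating); the exact layer configuration used by the invention may differ.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention in the ARM style: global average pooling keeps a
    maximal receptive field, then a learnable per-channel weight rescales
    each feature map (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # A1: global average pooling
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),                          # A2: learnable channel weights
        )

    def forward(self, x):
        return x * self.attend(x)
```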
S6: introducing a residual multi-kernel pooling module at the end of the MA-net network structure built in S4, the residual multi-kernel pooling module relying mainly on multiple effective fields of view to detect objects of different sizes;
the step of introducing the residual multi-kernel pooling module in the step S6 includes the following steps:
b1: collecting context information by adopting four pooling kernels with different sizes to enrich high-level semantic information;
B2: obtaining feature maps with the same size as the original feature map by bilinear interpolation, and reducing the channel dimension to 1 by a 1 × 1 convolution;
B3: merging the original feature map and the up-sampled feature maps along the channel dimension; the residual multi-kernel pooling module can cope with large changes in the size of objects in the image;
The module introduces few parameters, only 388, which causes a slight increase in calculation cost, but the accuracy improvement obtained is more important; the RMP module is shown in FIG. 6 and sketched below.
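For illustration only, steps B1 to B3 may be sketched as follows; the four pooling kernel sizes (2, 3, 5, 6) follow the residual multi-kernel pooling described in the public CE-Net literature and are an assumption here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualMultiKernelPooling(nn.Module):
    """Residual multi-kernel pooling: pool with several kernel sizes,
    reduce each branch to one channel with a 1x1 convolution, upsample
    back to the input size and concatenate with the original feature map
    (illustrative sketch; kernel sizes assumed from the CE-Net literature)."""

    def __init__(self, in_channels, kernel_sizes=(2, 3, 5, 6)):
        super().__init__()
        self.kernel_sizes = kernel_sizes
        # Each 1x1 conv adds in_channels + 1 parameters, so the module adds
        # 4 * (in_channels + 1) parameters in total.
        self.reducers = nn.ModuleList(
            nn.Conv2d(in_channels, 1, kernel_size=1) for _ in kernel_sizes
        )

    def forward(self, x):
        size = x.shape[2:]
        branches = [x]
        for k, reduce in zip(self.kernel_sizes, self.reducers):
            pooled = F.max_pool2d(x, kernel_size=k, stride=k)          # B1
            pooled = reduce(pooled)                                    # B2: 1x1 conv
            pooled = F.interpolate(pooled, size=size,
                                   mode="bilinear", align_corners=False)  # B2: upsample
            branches.append(pooled)                                    # B3: merge channels
        return torch.cat(branches, dim=1)
```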
S7: in the training on the data set of S1, replacing Batch Normalization (BN) with Filter Response Normalization (FRN), and replacing the rectified linear unit (ReLU) with the corresponding Thresholded Linear Unit (TLU) activation layer; model performance analysis experiments are performed on the EM challenge data set, the LUNA challenge data set and the DRIVE data set, respectively;
adding a detail extraction module into the MA-net network structure in S4 to improve the capability of segmenting tiny targets in the DRIVE data set;
In order to improve the capability of segmenting tiny targets in the DRIVE data set, a detail extraction module is added to the network: the small-stride, high-channel spatial information extraction path proposed in the public literature is fused with the original decoder path, which effectively improves the segmentation of tiny targets, although it increases the calculation amount and does not improve the precision of segmentation tasks with larger targets. A sketch of such a detail path is given below.
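For illustration only, such a detail extraction path may be sketched as follows; the number of layers, strides and channel counts are assumptions, and the fusion with the decoder path is shown as a simple channel concatenation.

```python
import torch
import torch.nn as nn

class DetailExtractionPath(nn.Module):
    """Spatial-detail path with a small overall stride and a relatively
    large channel count, fused with the decoder output to help segment
    tiny targets (illustrative sketch; strides and channels are assumptions)."""

    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()

        def conv_bn_relu(cin, cout, stride):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        # Overall stride of 8: keeps far more spatial detail than the encoder.
        self.path = nn.Sequential(
            conv_bn_relu(in_channels, out_channels, stride=2),
            conv_bn_relu(out_channels, out_channels, stride=2),
            conv_bn_relu(out_channels, out_channels, stride=2),
        )

    def forward(self, image, decoder_feat):
        detail = self.path(image)
        # Assumes decoder_feat has already been brought to the same spatial size.
        return torch.cat([detail, decoder_feat], dim=1)
```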
The FRN expression described in S7 is as follows:
ν² = (Σᵢxᵢ²)/N  (1)
y = x/√(ν² + ε)  (2)
where x is an N-dimensional vector with N = H × W; unlike the BN normalization, which subtracts the mean and then divides by the standard deviation, FRN divides by the root of the mean squared norm ν² and does not subtract the mean; ε in the formula is a small positive constant that prevents division by 0;
In order to solve the above-mentioned problem of the ReLU activation producing zero values, the public literature proposes a thresholded ReLU to be adopted after FRN, i.e. the TLU, which is important for improving training performance; the TLU expression described in S7 is as follows:
zᵢ = max(yᵢ, τ) = ReLU(yᵢ - τ) + τ  (3)
where τ is a learnable parameter.
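For illustration only, the FRN and TLU layers of expressions (1) to (3) may be implemented as in the following sketch; the learnable scale and offset (γ, β) follow the published filter response normalization formulation and are included as an assumption.

```python
import torch
import torch.nn as nn

class FRNTLU(nn.Module):
    """Filter Response Normalization followed by a Thresholded Linear Unit,
    per expressions (1)-(3); the learnable scale/offset (gamma, beta) follow
    the published FRN formulation (illustrative sketch)."""

    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, channels, 1, 1))  # TLU threshold

    def forward(self, x):
        # nu^2: mean of the squared activations over the H*W positions (eq. 1)
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        # FRN: divide by the root of nu^2 + eps; no mean subtraction (eq. 2)
        x = x * torch.rsqrt(nu2 + self.eps)
        y = self.gamma * x + self.beta
        # TLU: thresholded ReLU with learnable tau (eq. 3)
        return torch.max(y, self.tau)
```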
The model performance analysis experiment is as follows:
(I) Experimental setup
The evaluation index adopted in the experiments is the Dice coefficient, and the test set is not subjected to any enhancement, such as multi-scale or multi-angle testing, that would raise the quality of the predicted results. The Dice coefficient is a set-similarity metric commonly used to calculate the similarity of two samples; its value ranges from 0 to 1, with 1 being the best segmentation and 0 the worst. The Dice expression is as follows, where TP, FP and FN represent the numbers of true positives, false positives and false negatives, respectively:
Dice = 2TP/(2TP + FP + FN)
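For illustration only, the Dice coefficient may be computed from the confusion-matrix counts as in the following sketch, assuming binary prediction and label masks.

```python
import numpy as np

def dice_coefficient(pred, label, eps=1e-7):
    """Dice coefficient from binary masks: 2*TP / (2*TP + FP + FN).
    pred and label are boolean/0-1 NumPy arrays of the same shape."""
    pred = pred.astype(bool)
    label = label.astype(bool)
    tp = np.logical_and(pred, label).sum()
    fp = np.logical_and(pred, ~label).sum()
    fn = np.logical_and(~pred, label).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn + eps)
```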
the experimental operating system is Arch, a pytorch deep learning framework, the batch size (batch size) is 8, the Adam optimizer adopts a Dice coefficient loss function, and the input image size is 512 multiplied by 512.
Channel settings
The number of channels of the encoder output layers is one of the main factors limiting network speed. The experiment adopts Resnet18 as the reference network of the encoder and compares three channel-number combinations on the magnetite microscopic image data set, as shown in Table 3. It can be seen that the calculation amount increases markedly as the number of channels output by each encoder layer increases; the segmentation precision of channel-number strategy 2 is greatly improved over strategy 1, while that of strategy 3 is almost unchanged compared with strategy 2. This is considered to be because the segmentation task is not complex: excessive parameters only produce redundancy, and the network structure limits the ability to extract semantic information.
Table 3: MA-net channel-number comparison experiment
Encoder network selection
In order to further explore the influence of the depth of the MA-net encoder reference network on performance and to select a suitable encoder network, channel strategy 2 is adopted in this experiment, and the segmentation performance of Resnet-18 and Resnet-34 is compared on the magnetite microscopic image data set. In order to verify the effect of increasing the network depth and the number of channels at the same time, a group of comparison experiments using the original-parameter Resnet34, denoted Resnet34-B, is added. Table 4 shows the segmentation performance and calculation amount of the three encoders. It can be seen that deepening the network improves the segmentation performance to a certain extent, while an excessively deep network with too many channels brings little benefit and sharply increases the calculation amount; the lightweight encoder network has little impact on model performance. Later experiments all use the lightweight Resnet34 with channel-number strategy 2.
Table 4: MA-net reference-network comparison experiment
(II) Model analysis
An MA-Net ablation experiment is carried out on the magnetite mineral microscopic image data set to analyze the performance of each module; the Dice coefficients, parameter amounts and calculation amounts are shown in Table 5. It can be seen that the attention mechanism ARM has a certain effect on the model segmentation precision; the residual multi-kernel pooling RMP greatly improves the precision of the model while adding the smallest amount of calculation; and introducing the FRN normalization method slightly improves the segmentation precision while reducing the calculation amount.
Table 5: MA-net ablation experiments on the mineral segmentation data set
(III) Model comparison experiment
The proposed MA-Net is compared with advanced algorithms on the magnetite microscopic image data set; it can be seen from Table 6 that the MA-Net segmentation accuracy exceeds that of the other two networks, while its parameter count and calculation amount are the smallest.
Table 6: Model comparison experiment on the magnetite microscopic image data set
FIG. 7 is a comparison of segmentation effects. It can be seen that CE-net is far inferior to MA-net: the CE-net segmentation is easily disturbed by some highlighted portions, and although its overall contour segmentation is good, a large number of holes exist in its output, whereas this rarely occurs with MA-net. CE-net adds dilated convolution and multi-kernel pooling at the end of the encoder to enlarge the receptive field, but its encoder feature maps and decoder feature maps are fused by simple addition, so information loss and improper fusion inevitably occur during up-sampling. The multi-feature fusion decoding strategy adopted by MA-net can fully extract the information of deep and shallow features and learn their correlation to handle large targets in the segmentation result, greatly overcoming the problems of information loss and poor fusion during up-sampling. Meanwhile, down-sampling is carried out after each fusion of the shallow feature maps, and the enlarged receptive field benefits the spatial-information supplementation of large targets.
The comparative experiments on the region-cloning data enhancement method are as follows:
In order to analyze the effect of the region-cloning data enhancement method, an experiment was carried out on the mineral microscopic image data set; as shown in FIG. 8, the segmentation effect on the test set with and without the region-cloning enhancement is compared. It can be seen from the curves that the fluctuation of the Dice value is small when the enhancement is used, and a higher segmentation effect is finally obtained. The reason is that the region-cloning enhancement combines the information of two pictures, which effectively reduces the difference between pictures and thereby reduces the variance of the data;
Using the MA-Net network and the region-cloning data enhancement method provided by the invention for the task of segmenting quartz in the magnetite phase, the Dice coefficient reaches 0.9637, extremely close to the manually annotated label, as shown in FIG. 9; a single picture can be predicted in only 0.16 s on an AMD Ryzen R7-3700X CPU.
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical solution and the inventive concept of the present invention shall fall within the scope of protection of the present invention.