Disclosure of Invention
The invention aims to improve the decoder of a U-shaped network, provides a multi-feature fusion decoder structure, and accomplishes the task of segmenting quartz in magnetite microscopic images, so as to solve the problems in the background art.
In order to achieve the purpose, the invention adopts the following technical scheme:
A real-time mineral segmentation method based on a multi-feature fusion decoder comprises the following steps:
S1: performing semantic segmentation of quartz, which corresponds to the white area of the label, in magnetite microscopic images; producing a plurality of data sets and training and testing on them respectively; and enhancing the magnetite microscopic images with a combination of strategies such as vertical flipping, horizontal flipping, random rotation by n × 90°, affine transformation and random translation;
S2: the enhancement in S1 further adopts a region-cloning data-set enhancement method: during training on the data set, a partial region of another picture in the data set is randomly cloned onto the current (index) image, and the same operation is applied to the label, which improves the diversity of the mineral-phase data and reduces the overfitting phenomenon during training;
S3: constructing the network model based on the data set processed in S2, and encoding, decoding and fusing the data: in the repeated encoding and decoding operations, all feature maps of the same scale are fused; the encoder feature map and the decoder feature map are acquired and, after being fused, are encoded again;
S4: when the data set is encoded and decoded in S3, an MA-net network structure is built from a multi-feature aggregation decoder structure and a lightweight Resnet34;
S5: based on the MA-net network structure established in S4, a channel attention mechanism is introduced for training to obtain a network model, and the segmentation precision is improved;
S6: introducing a residual multi-kernel pooling module at the end of the MA-net network structure built in S4, the residual multi-kernel pooling module relying mainly on multiple effective fields of view to detect objects of different sizes;
S7: in the training on the data set of S1, replacing Batch Normalization (BN) with Filter Response Normalization (FRN), and replacing the rectified linear unit (ReLU) with the corresponding Thresholded Linear Unit (TLU) activation layer; model performance analysis experiments are performed on the EM challenge data set, the LUNA challenge data set and the DRIVE data set, respectively.
Preferably, a detail extraction module is added to the MA-net network structure described in S4 to improve the ability to segment small objects in the DRIVE data set.
Preferably, in the MA-net network structure described in S4, the first convolutional layer uses 16 channels; the number of output channels of the first convolution kernel is reduced to 1/4 of the number of input channels and is used as the number of input channels of the second convolutional layer, which greatly reduces the parameter amount without changing the input and output channels of each decoder block.
Preferably, the channel attention mechanism introduced in S5 includes the following steps:
a1: global average pooling is performed first to maintain a maximum receptive field;
A2: then assigning a learnable weight to the channel of each feature map, so that the network model pays more attention to the main objects to be classified; the channel attention mechanism adopts an ARM module.
Preferably, the step of introducing the residual multi-kernel pooling module in S6 includes the following steps:
b1: collecting context information by adopting four pooling kernels with different sizes to enrich high-level semantic information;
B2: obtaining feature maps with the same size as the original feature map by bilinear interpolation, and reducing the channel dimension to 1 by a 1 × 1 convolution;
B3: merging the original feature map and the up-sampled feature maps along the channel dimension; the residual multi-kernel pooling module can cope with large variations in object size in the image.
Preferably, the FRN expression described in S7 is as follows:
ν² = (Σᵢxᵢ²)/N  (1)
y = x/√(ν² + ε)  (2)
where x is an N-dimensional vector with N = H × W; unlike the BN normalization, which subtracts the mean and then divides by the standard deviation, FRN divides by the root of the mean squared norm ν² and does not subtract the mean; ε in the formula is a small positive constant that prevents division by 0.
Preferably, the TLU expression described in S7 is as follows:
zᵢ = max(yᵢ, τ) = ReLU(yᵢ - τ) + τ  (3)
where τ is a learnable parameter.
Compared with the prior art, the invention provides a real-time mineral segmentation method based on a multi-feature fusion decoder, which has the following beneficial effects:
(1) The invention provides a multi-feature fusion decoder structure and builds the MA-Net network structure in combination with a lightweight Resnet34; residual multi-kernel pooling is added at the end of the encoder to enhance the segmentation of targets of various sizes, a channel attention mechanism is introduced to improve the segmentation precision, and FRN is adopted to eliminate the network's dependence on the batch size during training. Compared with a single-path decoder structure, the network adds a down-sampling process and aggregates multi-stage feature information during encoding and decoding, so that, compared with other U-shaped networks, the MA-Net network structure can greatly reduce the number of network channels and the parameter amount while the segmentation precision is still guaranteed.
(2) In order to obtain a high segmentation effect with a small amount of training sample data, data enhancement is performed by randomly cloning a partial region of another picture in the data set onto the current (index) image. Experimental verification and analysis show that the method is applicable to segmentation tasks in which the spatial positions of the segmentation targets are random, such as mineral microscopic images, and that it can effectively reduce overfitting and improve the segmentation precision.
(3) Tests and analyses on the EM, LUNA and DRIVE data sets show that MA-Net performs outstandingly when segmenting larger targets but is not good at segmenting tiny targets, and still needs optimization and improvement for small-target segmentation; when MA-Net is used for the task of segmenting quartz in the magnetite phase, the Dice coefficient reaches 0.9637.
(4) In order to avoid the influence of the batch size on the training result, filter response normalization is adopted to replace batch normalization, and the corresponding thresholded linear unit activation layer is adopted to replace the rectified linear unit. The average Dice coefficients on the EM challenge data set and the LUNA challenge data set are 0.9657 and 0.9852 respectively, a considerable improvement over the 0.9584 and 0.9758 of U-net, while the floating-point operations are 5.72 G, only 1/43 of those of U-net. It can thus be seen that using MA-Net for the task of segmenting quartz in magnetite gives a good segmentation effect.
(5) The influence of each MA-Net module on the segmentation effect and the real-time performance of the model is verified on a mineral microscopic image data set. The multi-feature fusion decoding strategy adopted by MA-Net can fully extract the information of deep features and shallow features and learn their correlation to handle isolated pixels in the segmentation result, greatly alleviating the problems of information loss and poor fusion quality during up-sampling.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings; it is obvious that the described embodiments are only some, and not all, of the embodiments of the present invention.
Example 1:
Referring to FIGS. 1-6, the real-time mineral segmentation method based on the multi-feature fusion decoder includes the following steps:
S1: performing semantic segmentation of quartz, which corresponds to the white area of the label (shown in FIG. 1), in magnetite microscopic images; producing a plurality of data sets and training and testing on them respectively; and enhancing the magnetite microscopic images with a combination of strategies such as vertical flipping, horizontal flipping, random rotation by n × 90°, affine transformation and random translation;
S2: the enhancement in S1 further adopts a region-cloning data-set enhancement method: during training on the data set, a partial region of another picture in the data set is randomly cloned onto the current (index) image, and the same operation is applied to the label, which improves the diversity of the mineral-phase data and reduces the overfitting phenomenon during training;
The method is applied to the mineral facies data set, where the probability that it increases the richness of the data set is greater than the probability that it destroys information; this is verified in the experimental part below. The region-clone data enhancement method is shown in FIG. 2 and sketched below.
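For illustration only, the following is a minimal sketch of the region-cloning operation, assuming that each image and its label mask are NumPy arrays of the same spatial size; the function name, rectangle-sampling strategy and size bounds are assumptions for illustration and are not specified by the invention.

```python
import numpy as np

def region_clone(image, label, donor_image, donor_label, rng=None,
                 min_frac=0.1, max_frac=0.4):
    """Clone a random rectangular region from a donor sample onto the
    current (index) sample, applying the same cut to image and label.

    All arrays are assumed to share the same spatial size (H, W[, C]).
    The fraction bounds are illustrative choices, not values from the patent.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    # Sample the size of the cloned rectangle as a fraction of the image.
    rh = int(h * rng.uniform(min_frac, max_frac))
    rw = int(w * rng.uniform(min_frac, max_frac))
    # Sample its top-left corner so the rectangle stays inside the image.
    y = rng.integers(0, h - rh + 1)
    x = rng.integers(0, w - rw + 1)

    out_img, out_lbl = image.copy(), label.copy()
    out_img[y:y + rh, x:x + rw] = donor_image[y:y + rh, x:x + rw]
    out_lbl[y:y + rh, x:x + rw] = donor_label[y:y + rh, x:x + rw]
    return out_img, out_lbl
```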
S3: constructing the network model based on the data set processed in S2, and encoding, decoding and fusing the data: in the repeated encoding and decoding operations, all feature maps of the same scale are fused; the encoder feature map and the decoder feature map are acquired and, after being fused, are encoded again;
The overall structure of the network is an encoding-decoding structure. A traditional decoder structure uses a single path along which the up-sampled feature map is progressively enriched; the up-sampling stages are not closely connected, and the deep feature maps have difficulty recovering detailed information during decoding.
The invention proposes a decoder structure that aggregates features from multiple stages, combining the feature-multiplexing structure and the encoding-decoding structure proposed in the public literature. This strategy fuses all feature maps of the same scale during the repeated encoding and decoding operations; after the encoder feature map and the decoder feature map are fused, they are encoded again so that their correlation can be learned further and the fusion becomes more appropriate. Because the multi-stage feature maps of this structure supplement detailed information and spatial information at every depth of the decoding process, the number of channels of the feature maps can be compressed greatly compared with the traditional structure, thereby reducing the parameter amount. The U-shaped network and the stage-feature-multiplexing structure are shown in FIG. 3; a sketch of the same-scale fusion step is given below.
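For illustration only, the same-scale fusion step may be sketched in PyTorch as follows, under the assumption that the encoder and decoder feature maps of a given scale are concatenated along the channel dimension and then re-encoded by convolutions; the layer ordering and channel arguments are assumptions and may differ from the actual MA-net implementation.

```python
import torch
import torch.nn as nn

class SameScaleFusion(nn.Module):
    """Fuse an encoder feature map and a decoder feature map of the same
    spatial scale, then encode the fused result again so the network can
    learn their correlation (illustrative sketch; channel sizes assumed)."""

    def __init__(self, enc_channels, dec_channels, out_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            # 1x1 convolution compresses the concatenated channels.
            nn.Conv2d(enc_channels + dec_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            # 3x3 convolution re-encodes the fused features.
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, enc_feat, dec_feat):
        return self.fuse(torch.cat([enc_feat, dec_feat], dim=1))
```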
S4: when the data set is encoded and decoded in S3, a multi-feature aggregation decoder structure and a lightweight Resnet34 are used to build the MA-net network structure shown in FIG. 4, where 'c' denotes channel concatenation followed by a 1 × 1 convolution;
In the MA-net network structure described in S4, the first convolutional layer uses 16 channels; the number of output channels of the first convolution kernel is reduced to 1/4 of the number of input channels and is used as the number of input channels of the second convolutional layer, which greatly reduces the parameter amount without changing the input and output channels of each decoder block. A sketch of such a decoder block is given below.
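For illustration only, a decoder block following the above channel-reduction rule may be sketched as follows; the exact layer composition of the MA-net decoder block is not restated in this section, so the two-convolution structure is an assumption for illustration.

```python
import torch.nn as nn

class ReducedDecoderBlock(nn.Module):
    """Decoder block whose first convolution outputs in_channels // 4
    channels, cutting the parameter count without changing the block's
    external input/output channels (illustrative sketch)."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        mid_channels = in_channels // 4  # 1/4 of the input channels
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```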
In a deep convolutional neural network, shallow feature maps are larger than deep ones, so their calculation amount is more sensitive to the number of channels. The encoder parameters and output channel numbers are shown in Table 1, where /2 denotes 2-fold down-sampling; the decoder configuration parameters are shown in Table 2, where ×2 denotes 2-fold up-sampling;
Table 1: Encoder module parameters
Table 2: Decoder module parameters
S5: based on the MA-net network structure established in S4, a channel attention mechanism is introduced for training to obtain a network model, and the segmentation precision is improved;
the channel attention mechanism introduced in the step S5 includes the following steps:
a1: global average pooling is performed first to maintain a maximum receptive field;
A2: then assigning a learnable weight to the channel of each feature map, so that the network model pays more attention to the main objects to be classified; the channel attention mechanism adopts an ARM module, as shown in FIG. 5 and sketched below.
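For illustration only, the channel attention of steps A1 and A2 may be sketched as follows, following the common form of an attention refinement module (global average pooling, 1 × 1 convolution, sigmoid gating); the exact layer configuration used by the invention may differ.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention in the ARM style: global average pooling keeps a
    maximal receptive field, then a learnable per-channel weight rescales
    each feature map (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # A1: global average pooling
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),                          # A2: learnable channel weights
        )

    def forward(self, x):
        return x * self.attend(x)
```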
S6: introducing a residual multi-kernel pooling module at the end of the MA-net network structure built in S4, the residual multi-kernel pooling module relying mainly on multiple effective fields of view to detect objects of different sizes;
the step of introducing the residual multi-kernel pooling module in the step S6 includes the following steps:
b1: collecting context information by adopting four pooling kernels with different sizes to enrich high-level semantic information;
B2: obtaining feature maps with the same size as the original feature map by bilinear interpolation, and reducing the channel dimension to 1 by a 1 × 1 convolution;
B3: merging the original feature map and the up-sampled feature maps along the channel dimension; the residual multi-kernel pooling module can cope with large changes in the size of objects in the image;
The module introduces few parameters, only 388, which causes a slight increase in calculation cost, but the accuracy improvement obtained is more important; the RMP module is shown in FIG. 6 and sketched below.
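For illustration only, steps B1 to B3 may be sketched as follows; the four pooling kernel sizes (2, 3, 5, 6) follow the residual multi-kernel pooling described in the public CE-Net literature and are an assumption here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualMultiKernelPooling(nn.Module):
    """Residual multi-kernel pooling: pool with several kernel sizes,
    reduce each branch to one channel with a 1x1 convolution, upsample
    back to the input size and concatenate with the original feature map
    (illustrative sketch; kernel sizes assumed from the CE-Net literature)."""

    def __init__(self, in_channels, kernel_sizes=(2, 3, 5, 6)):
        super().__init__()
        self.kernel_sizes = kernel_sizes
        # Each 1x1 conv adds in_channels + 1 parameters, so the module adds
        # 4 * (in_channels + 1) parameters in total.
        self.reducers = nn.ModuleList(
            nn.Conv2d(in_channels, 1, kernel_size=1) for _ in kernel_sizes
        )

    def forward(self, x):
        size = x.shape[2:]
        branches = [x]
        for k, reduce in zip(self.kernel_sizes, self.reducers):
            pooled = F.max_pool2d(x, kernel_size=k, stride=k)          # B1
            pooled = reduce(pooled)                                    # B2: 1x1 conv
            pooled = F.interpolate(pooled, size=size,
                                   mode="bilinear", align_corners=False)  # B2: upsample
            branches.append(pooled)                                    # B3: merge channels
        return torch.cat(branches, dim=1)
```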
S7: in the training on the data set of S1, replacing Batch Normalization (BN) with Filter Response Normalization (FRN), and replacing the rectified linear unit (ReLU) with the corresponding Thresholded Linear Unit (TLU) activation layer; model performance analysis experiments are performed on the EM challenge data set, the LUNA challenge data set and the DRIVE data set, respectively;
adding a detail extraction module into the MA-net network structure in S4 to improve the capability of segmenting tiny targets in the DRIVE data set;
In order to improve the capability of segmenting tiny targets in the DRIVE data set, a detail extraction module is added to the network: the small-stride, high-channel spatial information extraction path proposed in the public literature is fused with the original decoder path, which effectively improves the segmentation of tiny targets, although it increases the calculation amount and does not improve the precision of segmentation tasks with larger targets. A sketch of such a detail path is given below.
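For illustration only, such a detail extraction path may be sketched as follows; the number of layers, strides and channel counts are assumptions, and the fusion with the decoder path is shown as a simple channel concatenation.

```python
import torch
import torch.nn as nn

class DetailExtractionPath(nn.Module):
    """Spatial-detail path with a small overall stride and a relatively
    large channel count, fused with the decoder output to help segment
    tiny targets (illustrative sketch; strides and channels are assumptions)."""

    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()

        def conv_bn_relu(cin, cout, stride):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        # Overall stride of 8: keeps far more spatial detail than the encoder.
        self.path = nn.Sequential(
            conv_bn_relu(in_channels, out_channels, stride=2),
            conv_bn_relu(out_channels, out_channels, stride=2),
            conv_bn_relu(out_channels, out_channels, stride=2),
        )

    def forward(self, image, decoder_feat):
        detail = self.path(image)
        # Assumes decoder_feat has already been brought to the same spatial size.
        return torch.cat([detail, decoder_feat], dim=1)
```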
The FRN expression described in S7 is as follows:
ν² = (Σᵢxᵢ²)/N  (1)
y = x/√(ν² + ε)  (2)
where x is an N-dimensional vector with N = H × W; unlike the BN normalization, which subtracts the mean and then divides by the standard deviation, FRN divides by the root of the mean squared norm ν² and does not subtract the mean; ε in the formula is a small positive constant that prevents division by 0;
In order to solve the above-mentioned problem of the ReLU activation producing zero values, the public literature proposes a thresholded ReLU to be adopted after FRN, i.e. the TLU, which is important for improving training performance; the TLU expression described in S7 is as follows:
zᵢ = max(yᵢ, τ) = ReLU(yᵢ - τ) + τ  (3)
where τ is a learnable parameter.
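For illustration only, the FRN and TLU layers of expressions (1) to (3) may be implemented as in the following sketch; the learnable scale and offset (γ, β) follow the published filter response normalization formulation and are included as an assumption.

```python
import torch
import torch.nn as nn

class FRNTLU(nn.Module):
    """Filter Response Normalization followed by a Thresholded Linear Unit,
    per expressions (1)-(3); the learnable scale/offset (gamma, beta) follow
    the published FRN formulation (illustrative sketch)."""

    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, channels, 1, 1))  # TLU threshold

    def forward(self, x):
        # nu^2: mean of the squared activations over the H*W positions (eq. 1)
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        # FRN: divide by the root of nu^2 + eps; no mean subtraction (eq. 2)
        x = x * torch.rsqrt(nu2 + self.eps)
        y = self.gamma * x + self.beta
        # TLU: thresholded ReLU with learnable tau (eq. 3)
        return torch.max(y, self.tau)
```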
The model performance analysis experiment is as follows:
(I) Experimental setup
The evaluation index adopted in the experiments is the Dice coefficient, and the test set is not subjected to any enhancement, such as multi-scale or multi-angle testing, that would raise the quality of the predicted results. The Dice coefficient is a set-similarity metric commonly used to calculate the similarity of two samples; its value ranges from 0 to 1, with 1 being the best segmentation and 0 the worst. The Dice expression is as follows, where TP, FP and FN represent the numbers of true positives, false positives and false negatives, respectively:
Dice = 2TP/(2TP + FP + FN)
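For illustration only, the Dice coefficient may be computed from the confusion-matrix counts as in the following sketch, assuming binary prediction and label masks.

```python
import numpy as np

def dice_coefficient(pred, label, eps=1e-7):
    """Dice coefficient from binary masks: 2*TP / (2*TP + FP + FN).
    pred and label are boolean/0-1 NumPy arrays of the same shape."""
    pred = pred.astype(bool)
    label = label.astype(bool)
    tp = np.logical_and(pred, label).sum()
    fp = np.logical_and(pred, ~label).sum()
    fn = np.logical_and(~pred, label).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn + eps)
```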
the experimental operating system is Arch, a pytorch deep learning framework, the batch size (batch size) is 8, the Adam optimizer adopts a Dice coefficient loss function, and the input image size is 512 multiplied by 512.
Channel settings
The number of channels of the encoder output layers is one of the main factors limiting network speed. The experiment adopts Resnet18 as the reference network of the encoder and compares three channel-number combinations on the magnetite microscopic image data set, as shown in Table 3. It can be seen that the calculation amount increases markedly as the number of channels output by each encoder layer increases; the segmentation precision of channel-number strategy 2 is greatly improved over strategy 1, while that of strategy 3 is almost unchanged compared with strategy 2. This is considered to be because the segmentation task is not complex: excessive parameters only produce redundancy, and the network structure limits the ability to extract semantic information.
Table 3: MA-net channel-number comparison experiment
Encoder network selection
In order to further explore the influence of the depth of the MA-net encoder reference network on performance and to select a suitable encoder network, channel strategy 2 is adopted in this experiment, and the segmentation performance of Resnet-18 and Resnet-34 is compared on the magnetite microscopic image data set. In order to verify the effect of increasing the network depth and the number of channels at the same time, a group of comparison experiments using the original-parameter Resnet34, denoted Resnet34-B, is added. Table 4 shows the segmentation performance and calculation amount of the three encoders. It can be seen that deepening the network improves the segmentation performance to a certain extent, while an excessively deep network with too many channels brings little benefit and sharply increases the calculation amount; the lightweight encoder network has little impact on model performance. Later experiments all use the lightweight Resnet34 with channel-number strategy 2.
Table 4: MA-net reference-network comparison experiment
(II) Model analysis
An MA-Net ablation experiment is carried out on the magnetite mineral microscopic image data set to analyze the performance of each module; the Dice coefficients, parameter amounts and calculation amounts are shown in Table 5. It can be seen that the attention mechanism ARM has a certain effect on the model segmentation precision; the residual multi-kernel pooling RMP greatly improves the precision of the model while adding the smallest amount of calculation; and introducing the FRN normalization method slightly improves the segmentation precision while reducing the calculation amount.
Table 5: MA-net ablation experiments on the mineral segmentation data set
(III) Model comparison experiment
The proposed MA-Net is compared with advanced algorithms on the magnetite microscopic image data set; it can be seen from Table 6 that the MA-Net segmentation accuracy exceeds that of the other two networks, while its parameter count and calculation amount are the smallest.
Table 6: Model comparison experiment on the magnetite microscopic image data set
FIG. 7 is a comparison of segmentation effects. It can be seen that CE-net is far inferior to MA-net: the CE-net segmentation is easily disturbed by some highlighted portions, and although its overall contour segmentation is good, a large number of holes exist in its output, whereas this rarely occurs with MA-net. CE-net adds dilated convolution and multi-kernel pooling at the end of the encoder to enlarge the receptive field, but its encoder feature maps and decoder feature maps are fused by simple addition, so information loss and improper fusion inevitably occur during up-sampling. The multi-feature fusion decoding strategy adopted by MA-net can fully extract the information of deep and shallow features and learn their correlation to handle large targets in the segmentation result, greatly overcoming the problems of information loss and poor fusion during up-sampling. Meanwhile, down-sampling is carried out after each fusion of the shallow feature maps, and the enlarged receptive field benefits the spatial-information supplementation of large targets.
The comparative experiments on the region-cloning data enhancement method are as follows:
In order to analyze the effect of the region-cloning data enhancement method, an experiment was carried out on the mineral microscopic image data set; as shown in FIG. 8, the segmentation effect on the test set with and without the region-cloning enhancement is compared. It can be seen from the curves that the fluctuation of the Dice value is small when the enhancement is used, and a higher segmentation effect is finally obtained. The reason is that the region-cloning enhancement combines the information of two pictures, which effectively reduces the difference between pictures and thereby reduces the variance of the data;
Using the MA-Net network and the region-cloning data enhancement method provided by the invention for the task of segmenting quartz in the magnetite phase, the Dice coefficient reaches 0.9637, extremely close to the manually annotated label, as shown in FIG. 9; a single picture can be predicted in only 0.16 s on an AMD Ryzen R7-3700X CPU.
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical solution and the inventive concept of the present invention shall fall within the scope of protection of the present invention.