CN115294412A - Real-time coal rock segmentation network generation method based on deep learning


Info

Publication number
CN115294412A
Authority
CN
China
Prior art keywords
resolution
module
low
convolution
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211230817.9A
Other languages
Chinese (zh)
Inventor
陈吉
王星
刘亚
高峰
李金岩
李晓乐
张鑫
蒋文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linyi University
Original Assignee
Linyi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Linyi University filed Critical Linyi University
Priority to CN202211230817.9A
Publication of CN115294412A
Legal status: Pending

Classifications

    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/26: Image preprocessing; segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/40: Extraction of image or video features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a real-time coal rock segmentation network generation method based on deep learning, and belongs to the field of image processing. The method specifically comprises the following steps: S1: constructing a CB module; the CB module uses convolution and downsampling operations to rapidly downsample the large input image to a low-resolution feature map. S2: constructing a UB module; the UB module is a convolution module composed of small UNets, and the high-resolution path is constructed from a 1/16-resolution feature map. S3: constructing a deep aggregation pyramid pooling module, which enlarges the receptive field of feature extraction and extracts context information more fully. S4: fusing the dual-resolution paths; low-to-high fusion is realized by bilinear interpolation and summation, the outputs of the high-resolution path and the low-resolution path are upsampled by factors of 2 and 8 respectively, and the upsampled feature maps are added. S5: constructing a segmentation head module; two convolutional layers are used so that the feature map is better classified into coal and rock.

Description

Real-time coal rock segmentation network generation method based on deep learning
Technical Field
The invention relates to a deep learning-based network generation method, in particular to a real-time coal rock segmentation network generation method based on deep learning, and belongs to the technical field of image processing.
Background
Coal is the most economical fossil energy in the world and the main energy source in China, and plays a significant role in China's energy security and economic and social development. Intelligent and unmanned coal mining is the main way to reduce coal mine accidents, especially working face accidents, while increasing coal output. Automatic coal-rock interface identification is a core technology for realizing intelligent and unmanned mining; it is the key to automatically adjusting the cutting height of the drums of shearers and roadheader-bolter machines and the roof support of hydraulic supports, and it remains a recognized worldwide problem. Therefore, in-depth research on automatic coal-rock interface identification methods has important theoretical significance and practical value. Unlike the classification of coal blocks and rock blocks, coal-rock identification at the mining face must acquire the edge information of the coal and rock while identifying their classes. Coal and rock in China are diverse, similar in appearance and complex, and existing semantic segmentation models suffer from low accuracy, low speed and poor performance in coal-rock segmentation. Therefore, improving the number and quality of coal-rock samples and constructing a more efficient coal-rock identification model are problems that urgently need to be solved.
To address these problems, the invention designs an efficient real-time coal-rock segmentation network, UDNet (UNet-block Dual-resolution Network). The network adopts a classical dual-resolution-path design: the high-resolution path retains the structural information of the image, the low-resolution path extracts the semantic information of the image, and pixels are classified according to this semantic information. To reduce the number of parameters and the amount of computation, the high-resolution path adopts a lower resolution of 1/16 instead of 1/8, the residual convolution module of DDRNet is replaced by a more efficient UNet module, and max pooling is used for downsampling to further reduce parameters. Finally, since coal-rock segmentation is a binary classification task, the number of channels of the whole model is reduced. Experiments were performed on a labeled coal-rock image segmentation dataset of 3000 images, split 7:3 into training and validation sets. The experimental results show that UDNet achieves a good balance between speed and accuracy: compared with DDRNet on the coal-rock segmentation task, the computation is reduced by about 15 GFLOPs, the parameters are reduced by about 2.8 M, and the FPS is improved from 150 to 207.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a real-time coal rock segmentation network generation method based on deep learning.
In order to achieve the purpose, the invention is realized by the following technical scheme:
The deep learning-based real-time coal rock segmentation network generation method specifically comprises the following steps:
S1: constructing a CB module; the CB module uses convolution and downsampling operations to rapidly downsample the large input image to a low-resolution feature map;
S2: constructing a UB module; the UB module is a convolution module composed of small UNets, and the high-resolution path is constructed from a 1/16-resolution feature map;
S3: constructing a deep aggregation pyramid pooling module to enlarge the receptive field of feature extraction and extract context information more fully;
S4: fusing the dual-resolution paths; low-to-high fusion is realized by bilinear interpolation and summation, the outputs of the high-resolution path and the low-resolution path are upsampled by factors of 2 and 8 respectively, and the upsampled feature maps are added;
S5: constructing a segmentation head module; two convolutional layers are used so that the feature map is better classified into coal and rock.
Preferably, step S1 specifically includes the following steps:
S1-1: convolution operation; the convolutional neural network automatically learns image features at multiple resolutions: when the network is shallow, the receptive field constrains each unit to perceive only a small region, so local features of the image are learned; when the network is deeper, the receptive field of a single convolution is unchanged but the feature map is smaller, so the network effectively perceives a larger region of the original image and learns more abstract features of the target object;
S1-2: downsampling operation; redundant information in the image is filtered out and the features the network is interested in are extracted, so as to better obtain the semantic information of the image.
Preferably, step S2 specifically includes the following steps:
S2-1: extracting context information at different scales; the module uses the UNet structure to integrate features from different receptive fields, so more context information at different scales can be extracted during forward propagation. The input and output feature maps of the module have the same resolution while the number of channels increases, so a max pooling layer can be attached after the UB module for downsampling;
S2-2: pooling operation; compared with completing downsampling with a convolutional layer, max pooling keeps the relative positions of features unchanged while downsampling and requires no trainable parameters, so the number of parameters is reduced while the problem of overfitting is avoided. With the skip connections, the data in the shallow part of the network is the result of shallow operations, the deep part is the result of deep operations, and the U-shaped structure contains the results of operations at different depths.
Preferably, step S3 specifically includes the following steps: the low-resolution path extracts context information; a feature map at 1/64 of the image resolution is taken as input, max pooling with exponentially increasing strides generates feature maps at 1/128, 1/256 and 1/512 of the image resolution, and global average pooling is applied to the input feature map to generate image-level information. The pooled feature maps are first upsampled, context information at different scales is then fused in a hierarchical residual manner using multiple 3×3 convolutions, and finally all feature maps are concatenated and compressed using a 1×1 convolution. For an input x, each scale y_i can be written as:

y_i = C_{1×1}(x),                              i = 1
y_i = C_{3×3}(U(P_{j,k}(x)) + y_{i-1}),        1 < i < n
y_i = C_{3×3}(U(P_{global}(x)) + y_{i-1}),     i = n        (1)

where C_{1×1} is a 1×1 convolution, C_{3×3} is a 3×3 convolution, U denotes an upsampling operation, P_{j,k} denotes a pooling layer with kernel size j and stride k, and P_{global} denotes global average pooling.
Preferably, step S4 specifically includes the following steps: bilateral fusion includes fusing the high-resolution branch into the low-resolution branch (high-to-low fusion) and fusing the low-resolution branch into the high-resolution branch (low-to-high fusion). For high-to-low fusion, the high-resolution feature map is downsampled by a sequence of 3×3 convolutions with stride 2 before the point-wise summation. For low-to-high fusion, the low-resolution feature map is first compressed with a 1×1 convolution and then upsampled using bilinear interpolation. The i-th high-resolution feature map X_{Hi} and low-resolution feature map X_{Li} can be written as:

X_{Hi} = R(F_H(X_{H(i-1)}) + T_{L→H}(F_L(X_{L(i-1)})))
X_{Li} = R(F_L(X_{L(i-1)}) + T_{H→L}(F_H(X_{H(i-1)})))        (2)

where F_H and F_L denote the residual-block transformations on the high-resolution and low-resolution paths respectively, T_{L→H} and T_{H→L} are the low-to-high and high-to-low transformers, X_{H(i-1)} and X_{L(i-1)} are the (i-1)-th high-resolution and low-resolution feature maps, and R is the ReLU activation function; the feature maps are downsampled or upsampled as required and then added, and the ReLU activation is applied after the addition.
Preferably, step S5 specifically includes the following steps: the feature map extracted by the feature extraction network is first normalized, then non-linearly transformed by an activation function, and then processed by convolution; the normalization, activation and convolution operations are performed twice in total to achieve a better classification effect.
The invention has the beneficial effects that:
the UDNet model of the invention is composed of two paths with different resolutions. The CB module is a module comprising convolution and down-sampling operations, consists of a convolution layer and a normalization layer, and is used for rapidly down-sampling a large image to a feature image with low resolution, which is an effective method for reducing the calculation amount of a model. The UB module is a convolution module composed of small UNet, and because the UNet has symmetrical structure, the length and width of the input and output feature maps are completely the same except that the number of channels is different, the feature map is downsampled by adopting maxpool operation. UDNet is different from DDRNet, and uses 1/8 characteristic diagram to carry out forward propagation, but uses 1/16 characteristic diagram to construct high-resolution path, and UB module on the path reduces more than half of channel number compared with RB module, which is a part for reducing parameter and calculation amount. As for the construction of the low resolution path, a structure similar to resnet is composed except for using the UB module. And at the tail ends of the high-resolution path and the low-resolution path, performing 2-time upsampling and 8-time upsampling on the output of the two paths, adding the sampled feature maps, and then sending the feature maps to a segmentation head module for segmentation.
The foregoing is only an overview of the technical solution of the invention. In order that the technical means of the invention may be understood more clearly and implemented in accordance with the content of the description, and in order that the above and other objects, features and advantages of the invention may become more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a UDNet network architecture according to the present invention;
FIG. 2 is a UB module structure of the present invention;
fig. 3 shows the UNet network structure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
A deep learning-based real-time coal rock segmentation network according to an embodiment of the present invention is specifically described below with reference to fig. 1 to 3.
As shown in fig. 1, the invention provides a deep learning-based real-time coal rock segmentation network, which is generated by the following steps:
S1: constructing a CB module; the CB module uses convolution and downsampling operations to rapidly downsample the large input image to a low-resolution feature map. This specifically includes the following steps:
S1-1: convolution operation; the convolutional neural network automatically learns image features at multiple resolutions: when the network is shallow, the receptive field constrains each unit to perceive only a small region, so local features of the image are learned; when the network is deeper, the receptive field of a single convolution is unchanged but the feature map is smaller, so the network effectively perceives a larger region of the original image and learns more abstract features of the target object;
S1-2: downsampling operation; redundant information in the image is filtered out and the features the network is interested in are extracted, so as to better obtain the semantic information of the image.
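As a minimal sketch of such a convolution-plus-downsampling block (assuming a PyTorch implementation; the class name ConvBlock and its parameters are illustrative and not taken from the invention), a strided 3×3 convolution followed by batch normalization and ReLU halves the spatial resolution in one step:

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Illustrative CB-style module: strided 3x3 convolution + batch normalization + ReLU,
    used to quickly downsample a large input image to a lower-resolution feature map."""
    def __init__(self, in_channels, out_channels, stride=2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                              stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))
```

For example, four such blocks with stride 2 would reduce an input image to a 1/16-resolution feature map before the two resolution paths split.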
S2: constructing a UB module; the UB module is a convolution module composed of small UNets, and the high-resolution path is constructed from a 1/16-resolution feature map; the UB module is shown in fig. 2 and the UNet network in fig. 3. This specifically includes the following steps:
S2-1: extracting context information at different scales; the module uses the UNet structure to integrate features from different receptive fields, so more context information at different scales can be extracted during forward propagation. The input and output feature maps of the module have the same resolution while the number of channels increases, so a max pooling layer can be attached after the UB module for downsampling.
S2-2: pooling operation; compared with completing downsampling with a convolutional layer, max pooling keeps the relative positions of features unchanged while downsampling and requires no trainable parameters, so the number of parameters is reduced while the problem of overfitting is avoided. With the skip connections, the data in the shallow part of the network is the result of shallow operations, the deep part is the result of deep operations, and the U-shaped structure contains the results of operations at different depths.
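A minimal sketch of a UB-style block under the same assumptions (the names UBlock and conv_bn_relu are hypothetical): a one-level "small UNet" whose output keeps the input resolution, so that an external max pooling layer can follow it for downsampling, and whose skip connection concatenates shallow and deep features:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(in_ch, out_ch):
    """3x3 convolution + batch normalization + ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True))

class UBlock(nn.Module):
    """Illustrative UB-style module: a small U-shaped block with one downsampling /
    upsampling level; the output resolution equals the input resolution."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.enc = conv_bn_relu(in_channels, out_channels)    # shallow branch (skip connection)
        self.pool = nn.MaxPool2d(2)                           # parameter-free downsampling
        self.mid = conv_bn_relu(out_channels, out_channels)   # deeper branch, larger receptive field
        self.dec = conv_bn_relu(2 * out_channels, out_channels)

    def forward(self, x):
        shallow = self.enc(x)
        deep = self.mid(self.pool(shallow))
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode='bilinear', align_corners=False)
        # U-shaped fusion: concatenate results of operations at different depths
        return self.dec(torch.cat([shallow, deep], dim=1))
```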
S3: constructing a deep aggregation pyramid pooling module to enlarge the receptive field of feature extraction and extract context information more fully. Specifically: a feature map at 1/64 of the image resolution is taken as input, and max pooling with exponentially increasing strides generates feature maps at 1/128, 1/256 and 1/512 of the image resolution. Global average pooling is also applied to the input feature map to generate image-level information. The pooled feature maps are first upsampled, context information at different scales is then fused in a hierarchical residual manner using multiple 3×3 convolutions, and finally all feature maps are concatenated and compressed using a 1×1 convolution. For an input x, each scale y_i can be written as:

y_i = C_{1×1}(x),                              i = 1
y_i = C_{3×3}(U(P_{j,k}(x)) + y_{i-1}),        1 < i < n
y_i = C_{3×3}(U(P_{global}(x)) + y_{i-1}),     i = n

where C_{1×1} is a 1×1 convolution, C_{3×3} is a 3×3 convolution, U denotes an upsampling operation, P_{j,k} denotes a pooling layer with kernel size j and stride k, and P_{global} denotes global average pooling.
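The formula above can be illustrated with the following sketch (assumed PyTorch; the class name PyramidPooling is a placeholder, and the pooling kernel sizes and strides 5/2, 9/4 and 17/8 are typical values chosen for illustration, not values specified by the invention): each pooled branch is upsampled, added to the previous scale, refined by a 3×3 convolution, and all scales are finally concatenated and compressed by a 1×1 convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Illustrative deep aggregation pyramid pooling: max-pooled branches with
    exponentially increasing strides plus a global-average-pooling branch,
    fused in a hierarchical residual manner."""
    def __init__(self, in_channels, branch_channels, out_channels):
        super().__init__()
        self.scale0 = nn.Conv2d(in_channels, branch_channels, 1, bias=False)  # y_1 = C_1x1(x)
        self.pools = nn.ModuleList([
            nn.MaxPool2d(kernel_size=5, stride=2, padding=2),    # 1/128 of the image resolution
            nn.MaxPool2d(kernel_size=9, stride=4, padding=4),    # 1/256
            nn.MaxPool2d(kernel_size=17, stride=8, padding=8),   # 1/512
            nn.AdaptiveAvgPool2d(1),                             # global average pooling
        ])
        self.squeezes = nn.ModuleList(
            [nn.Conv2d(in_channels, branch_channels, 1, bias=False) for _ in self.pools])
        self.refines = nn.ModuleList(
            [nn.Conv2d(branch_channels, branch_channels, 3, padding=1, bias=False)
             for _ in self.pools])
        self.compress = nn.Conv2d(branch_channels * (len(self.pools) + 1),
                                  out_channels, 1, bias=False)

    def forward(self, x):
        size = x.shape[2:]
        ys = [self.scale0(x)]
        for pool, squeeze, refine in zip(self.pools, self.squeezes, self.refines):
            u = F.interpolate(squeeze(pool(x)), size=size,        # channel squeeze, then U(P_{j,k}(x))
                              mode='bilinear', align_corners=False)
            ys.append(refine(u + ys[-1]))                         # y_i = C_3x3(U(P(x)) + y_{i-1})
        return self.compress(torch.cat(ys, dim=1))                # 1x1 compression of all scales
```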
S4: fusing the dual-resolution paths; low-to-high fusion is realized by bilinear interpolation and summation, the outputs of the high-resolution path and the low-resolution path are upsampled by factors of 2 and 8 respectively, and the upsampled feature maps are added. Specifically: bilateral fusion includes fusing the high-resolution branch into the low-resolution branch (high-to-low fusion) and fusing the low-resolution branch into the high-resolution branch (low-to-high fusion). For high-to-low fusion, the high-resolution feature map is downsampled by a sequence of 3×3 convolutions with stride 2 before the point-wise summation. For low-to-high fusion, the low-resolution feature map is first compressed with a 1×1 convolution and then upsampled using bilinear interpolation:

X_{Hi} = R(F_H(X_{H(i-1)}) + T_{L→H}(F_L(X_{L(i-1)})))
X_{Li} = R(F_L(X_{L(i-1)}) + T_{H→L}(F_H(X_{H(i-1)})))        (2)

where F_H and F_L denote the residual-block transformations on the high-resolution and low-resolution paths respectively, T_{L→H} and T_{H→L} are the low-to-high and high-to-low transformers, X_{H(i-1)} and X_{L(i-1)} are the (i-1)-th high-resolution and low-resolution feature maps, and R is the ReLU activation function; the feature maps are downsampled or upsampled as required and then added, and the ReLU activation is applied after the addition.
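One fusion stage of equation (2) can be sketched as follows (assumed PyTorch; the class name BilateralFusion and the parameter down_steps are hypothetical, and the residual blocks F_H and F_L are assumed to have already been applied to the inputs):

```python
import torch.nn as nn
import torch.nn.functional as F

class BilateralFusion(nn.Module):
    """Illustrative bilateral fusion: high-to-low uses stride-2 3x3 convolutions,
    low-to-high uses a 1x1 compression followed by bilinear upsampling; the
    transformed maps are added point-wise and passed through ReLU."""
    def __init__(self, high_channels, low_channels, down_steps=2):
        super().__init__()
        downs, ch = [], high_channels
        for _ in range(down_steps):          # T_{H->L}: repeated stride-2 3x3 convolutions
            downs += [nn.Conv2d(ch, low_channels, 3, stride=2, padding=1, bias=False),
                      nn.BatchNorm2d(low_channels)]
            ch = low_channels
        self.high_to_low = nn.Sequential(*downs)
        self.low_to_high = nn.Sequential(    # T_{L->H}: 1x1 compression (upsampled in forward)
            nn.Conv2d(low_channels, high_channels, 1, bias=False),
            nn.BatchNorm2d(high_channels))

    def forward(self, x_high, x_low):
        up = F.interpolate(self.low_to_high(x_low), size=x_high.shape[2:],
                           mode='bilinear', align_corners=False)
        new_high = F.relu(x_high + up)                          # X_Hi = R(F_H(.) + T_{L->H}(F_L(.)))
        new_low = F.relu(x_low + self.high_to_low(x_high))      # X_Li = R(F_L(.) + T_{H->L}(F_H(.)))
        return new_high, new_low
```

With a 1/16-resolution high path and a 1/64-resolution low path, down_steps=2 makes the spatial sizes match before the point-wise summation.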
S5: constructing a segmentation head module; two convolutional layers are used so that the feature map is better classified into coal and rock. Specifically: the feature map extracted by the feature extraction network is first normalized, then non-linearly transformed by an activation function, and then processed by convolution; the normalization, activation and convolution operations are performed twice in total to achieve a better classification effect.
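A minimal sketch of the two-layer segmentation head under the same assumptions (names are illustrative): normalization, activation and convolution applied twice, ending with a two-channel (coal / rock) prediction:

```python
import torch.nn as nn

class SegmentationHead(nn.Module):
    """Illustrative segmentation head: two rounds of BatchNorm -> ReLU -> convolution,
    producing per-pixel logits for the two classes (coal and rock)."""
    def __init__(self, in_channels, mid_channels, num_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, num_classes, kernel_size=1),
        )

    def forward(self, x):
        return self.head(x)   # logits; upsample to the input image size before computing the loss
```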
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A real-time coal rock segmentation network generation method based on deep learning, characterized by comprising the following steps:
S1: constructing a CB module; the CB module uses convolution and downsampling operations to rapidly downsample the large input image to a low-resolution feature map;
S2: constructing a UB module; the UB module is a convolution module composed of small UNets, and the high-resolution path is constructed from a 1/16-resolution feature map;
S3: constructing a deep aggregation pyramid pooling module to enlarge the receptive field of feature extraction and extract context information more fully;
S4: fusing the dual-resolution paths; low-to-high fusion is realized by bilinear interpolation and summation, the outputs of the high-resolution path and the low-resolution path are upsampled by factors of 2 and 8 respectively, and the upsampled feature maps are added;
S5: constructing a segmentation head module; two convolutional layers are used so that the feature map is better classified into coal and rock.
2. The deep learning-based real-time coal rock segmentation network generation method according to claim 1, characterized in that step S1 specifically includes the following steps:
S1-1: convolution operation; the convolutional neural network automatically learns image features at multiple resolutions: when the network is shallow, the receptive field constrains each unit to perceive only a small region, so local features of the image are learned; when the network is deeper, the receptive field of a single convolution is unchanged but the feature map is smaller, so the network effectively perceives a larger region of the original image and learns more abstract features of the target object;
S1-2: downsampling operation; redundant information in the image is filtered out and the features the network is interested in are extracted, so as to better obtain the semantic information of the image.
3. The deep learning-based real-time coal rock segmentation network generation method according to claim 1, characterized in that step S2 specifically includes the following steps:
S2-1: extracting context information at different scales; the module uses the UNet structure to integrate features from different receptive fields, so more context information at different scales can be extracted during forward propagation; the input and output feature maps of the module have the same resolution while the number of channels increases, so a max pooling layer can be attached after the UB module for downsampling;
S2-2: pooling operation; compared with completing downsampling with a convolutional layer, max pooling keeps the relative positions of features unchanged while downsampling and requires no trainable parameters, so the number of parameters is reduced while the problem of overfitting is avoided; with the skip connections, the data in the shallow part of the network is the result of shallow operations, the deep part is the result of deep operations, and the U-shaped structure contains the results of operations at different depths.
4. The deep learning-based real-time coal rock segmentation network generation method according to claim 1, characterized in that step S3 specifically includes the following steps:
S3-1: extracting context information on the low-resolution path:
taking a feature map at 1/64 of the image resolution as input, and using max pooling with exponentially increasing strides to generate feature maps at 1/128, 1/256 and 1/512 of the image resolution;
S3-2: generating image-level information from the input feature map using global average pooling:
the pooled feature maps are first upsampled, context information at different scales is then fused in a hierarchical residual manner using multiple 3×3 convolutions, and finally all feature maps are concatenated and compressed using a 1×1 convolution; for an input x, each scale y_i can be written as:

y_i = C_{1×1}(x),                              i = 1
y_i = C_{3×3}(U(P_{j,k}(x)) + y_{i-1}),        1 < i < n
y_i = C_{3×3}(U(P_{global}(x)) + y_{i-1}),     i = n        (1)

where C_{1×1} is a 1×1 convolution, C_{3×3} is a 3×3 convolution, U denotes an upsampling operation, P_{j,k} denotes a pooling layer with kernel size j and stride k, and P_{global} denotes global average pooling.
5. The deep learning-based real-time coal rock segmentation network generation method according to claim 1, characterized in that step S4 specifically includes the following steps: bilateral fusion includes fusing the high-resolution branch into the low-resolution branch, i.e. high-to-low fusion, and fusing the low-resolution branch into the high-resolution branch, i.e. low-to-high fusion; for high-to-low fusion, the high-resolution feature map is downsampled by a sequence of 3×3 convolutions with stride 2 before the point-wise summation; for low-to-high fusion, the low-resolution feature map is first compressed with a 1×1 convolution and then upsampled by bilinear interpolation; the i-th high-resolution feature map X_{Hi} and low-resolution feature map X_{Li} can be written as:

X_{Hi} = R(F_H(X_{H(i-1)}) + T_{L→H}(F_L(X_{L(i-1)})))
X_{Li} = R(F_L(X_{L(i-1)}) + T_{H→L}(F_H(X_{H(i-1)})))        (2)

where F_H and F_L denote the residual-block transformations on the high-resolution and low-resolution paths respectively, T_{L→H} and T_{H→L} are the low-to-high and high-to-low transformers, X_{H(i-1)} and X_{L(i-1)} are the (i-1)-th high-resolution and low-resolution feature maps, and R is the ReLU activation function; the feature maps are downsampled or upsampled as required and then added, and the ReLU activation is applied after the addition.
6. The deep learning-based real-time coal rock segmentation network generation method according to claim 1, characterized in that step S5 specifically includes the following steps: the feature map extracted by the feature extraction network is first normalized, then non-linearly transformed by an activation function, and then processed by convolution.
CN202211230817.9A 2022-10-10 2022-10-10 Real-time coal rock segmentation network generation method based on deep learning Pending CN115294412A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211230817.9A CN115294412A (en) 2022-10-10 2022-10-10 Real-time coal rock segmentation network generation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211230817.9A CN115294412A (en) 2022-10-10 2022-10-10 Real-time coal rock segmentation network generation method based on deep learning

Publications (1)

Publication Number Publication Date
CN115294412A true CN115294412A (en) 2022-11-04

Family

ID=83819322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211230817.9A Pending CN115294412A (en) 2022-10-10 2022-10-10 Real-time coal rock segmentation network generation method based on deep learning

Country Status (1)

Country Link
CN (1) CN115294412A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881768A (en) * 2020-07-03 2020-11-03 苏州开心盒子软件有限公司 Document layout analysis method
CN111985409A (en) * 2020-08-21 2020-11-24 四川省人工智能研究院(宜宾) Method for real-time street scene segmentation
CN112950477A (en) * 2021-03-15 2021-06-11 河南大学 High-resolution saliency target detection method based on dual-path processing
CN112950480A (en) * 2021-04-15 2021-06-11 辽宁工程技术大学 Super-resolution reconstruction method integrating multiple receptive fields and dense residual attention
CN113486897A (en) * 2021-07-29 2021-10-08 辽宁工程技术大学 Semantic segmentation method for convolution attention mechanism up-sampling decoding
CN114048822A (en) * 2021-11-19 2022-02-15 辽宁工程技术大学 Attention mechanism feature fusion segmentation method for image
CN113989649A (en) * 2021-11-25 2022-01-28 江苏科技大学 Remote sensing land parcel identification method based on deep learning
CN114120284A (en) * 2021-11-30 2022-03-01 安徽百诚慧通科技有限公司 Deep learning highway lane structuring method, storage medium and device
CN114387395A (en) * 2022-01-11 2022-04-22 中国矿业大学(北京) Phase-double resolution ratio network-based quick hologram generation method
CN115082675A (en) * 2022-06-07 2022-09-20 中南大学 Transparent object image segmentation method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XUEBIN QIN et al.: "U2-Net: Going deeper with nested U-structure for salient object detection", Pattern Recognition *
YUANDUO HONG et al.: "Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes", arXiv:2101.06085v2 *
张力波 et al.: "Research on airport pavement crack detection based on a real-time semantic segmentation model" (基于实时语义分割模型的机场道面裂缝检测方法研究), 机电信息 (Mechanical and Electrical Information) *

Similar Documents

Publication Publication Date Title
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN110232394A (en) A kind of multi-scale image semantic segmentation method
CN111209921A (en) License plate detection model based on improved YOLOv3 network and construction method
CN111028235B (en) Image segmentation method for enhancing edge and detail information by utilizing feature fusion
CN105678292A (en) Complex optical text sequence identification system based on convolution and recurrent neural network
CN112927253B (en) Rock core FIB-SEM image segmentation method based on convolutional neural network
CN114821357A (en) Optical remote sensing target detection method based on transformer
CN114359130A (en) Road crack detection method based on unmanned aerial vehicle image
CN110599502B (en) Skin lesion segmentation method based on deep learning
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN112950477A (en) High-resolution saliency target detection method based on dual-path processing
CN114022408A (en) Remote sensing image cloud detection method based on multi-scale convolution neural network
CN112712500A (en) Remote sensing image target extraction method based on deep neural network
CN115496919A (en) Hybrid convolution-transformer framework based on window mask strategy and self-supervision method
CN112767423A (en) Remote sensing image building segmentation method based on improved SegNet
CN114612306A (en) Deep learning super-resolution method for crack detection
CN115830575A (en) Transformer and cross-dimension attention-based traffic sign detection method
CN113971735A (en) Depth image clustering method, system, device, medium and terminal
CN112766056A (en) Method and device for detecting lane line in low-light environment based on deep neural network
CN115578722A (en) License plate detection method based on cooperative learning mechanism between license plates
CN115984747A (en) Video saliency target detection method based on dynamic filter
CN113888505A (en) Natural scene text detection method based on semantic segmentation
CN115294412A (en) Real-time coal rock segmentation network generation method based on deep learning
CN111402140A (en) Single image super-resolution reconstruction system and method
CN116778238A (en) Light-weight structure-based sensing transducer network and VHR remote sensing image change detection method

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20221104)