CN114170422A - A Semantic Segmentation Method of Underground Image in Coal Mine - Google Patents

A Semantic Segmentation Method of Underground Image in Coal Mine

Info

Publication number
CN114170422A
CN114170422A (application CN202111248280.4A)
Authority
CN
China
Prior art keywords
feature map
stage
input
image
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111248280.4A
Other languages
Chinese (zh)
Inventor
程健
肖洪飞
闫鹏鹏
李昊
李和平
王广福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Coal Research Institute CCRI
Original Assignee
China Coal Research Institute CCRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Coal Research Institute CCRI filed Critical China Coal Research Institute CCRI
Priority to CN202111248280.4A priority Critical patent/CN114170422A/en
Publication of CN114170422A publication Critical patent/CN114170422A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic segmentation method for underground coal mine images, belonging to the field of computer vision. First, acquired underground scene images are preprocessed to generate a data set. Then a feature extraction network with ResNet-101 as its backbone is constructed, and images at different scales are fed into each stage of the network to enhance the extracted features. A fused attention module is then constructed to fuse the features of each stage, and a global attention module enhances global information to obtain long-range dependencies. Finally, the obtained features are input into a classifier to generate a semantic map, completing the semantic segmentation of the image. The method greatly reduces computation and complexity, adopts an attention mechanism suited to the complexity of the scene, highlights the semantic information of the target region, and improves the image segmentation effect.

Description

Coal mine underground image semantic segmentation method
Technical Field
The invention relates to an image semantic segmentation method, in particular to a coal mine underground image semantic segmentation method suitable for use underground, and belongs to the field of computer vision.
Background
Studying the structural characteristics of underground tunnel visual scenes and methods for recovering them is of great significance for the structural analysis of complex underground scenes. Underground feature analysis must cope with the complex environment of coal mine scenes, caused by direct glare, dim light, dust, water mist, and smoke. Traditional image analysis methods mainly include the Lucas-Kanade algorithm, matching methods, energy-based methods, and phase-based methods. In a complex underground scene, however, brightness can change at any moment owing to abrupt illumination changes, and in a narrow scene even a small motion can produce a large positional shift, so traditional methods easily produce erroneous results. A method suited to analyzing complex underground scenes therefore needs to be studied.
Problems such as abrupt illumination change, shadow, and positional offset in complex underground scene analysis can be handled well by semantic analysis methods based on deep-learning image segmentation, since deep-learning segmentation models can approximate highly nonlinear mappings with great precision. However, research on structured scenes has mostly focused on indoor building scenes, and for the moment no related work has been carried out in coal mine underground tunnel scenes. The invention therefore proposes a semantic analysis method based on a multi-level feature fusion image segmentation theory, exploiting the large length-to-width ratio characteristic of coal mine tunnels while ensuring both segmentation accuracy and speed. For image segmentation in other scenes, a great deal of prior work exists.
One approach processes an image through combined frequency-spatial analysis to obtain its frequency characteristics, deconvolves the processed image, extracts high-dimensional features as feature points, and trains a convolutional neural network to segment the feature points in the image under test to obtain a detection result. The patent CN112598686B (Guangdong, 2021-06-04) encodes the image with a prior-knowledge vector to obtain a target feature map, decodes it into a first segmentation map, reconstructs the first segmentation map according to the prior-knowledge vector into several labeled segmentation images, and produces a second segmentation map from the target feature map; fusing the second segmentation map with the labeled results improves segmentation accuracy. Another approach counts the internal spectral histogram of a remote sensing image as a first classification feature, performs supervised classification with a curve matching algorithm, extracts the spatial correlation between the image and adjacent image targets as a second feature from the preliminary result, and segments the image by applying the curve matching algorithm again with both the spectral histogram and the spatial correlation. The patent CN112381101B (Jiangsu, 2021-05-28), an infrared road scene segmentation method based on class-prototype regression, regresses class-prototype features on the data set and clusters the network's deep features, making global class features more compact and amplifying inter-class differences; a relationship matrix and an attention module are constructed accordingly, further compacting the overall features and improving final segmentation accuracy. The patent CN112101369B (Beijing, 2021-02-05) uses the logical relationship between two at least partially overlapping target regions in an image together with the position information of their respective vertices, determines from the vertex positions the positions of the intersections between the two regions' boundaries, and segments the two different objects from the image to be processed according to the intersection positions and the logical relationship. In the literature (Chen C, Deng J, Lv N. Structures Detection in Remote Sensing Images Based on Multi-scale Semantic Segmentation [C]// 2020 IEEE International Conference on Smart Internet of Things (SmartIoT). IEEE, 2020), a multi-scale parallel structure replaces the traditional stack of convolutional layers; on this basis a new encoder-decoder semantic segmentation network is proposed, and a conditional random field constrains the segmentation result, yielding higher segmentation precision. The literature (Zhang F, Chen Y, Li Z, et al. ACFNet: Attentional Class Feature Network for Semantic Segmentation [C]// International Conference on Computer Vision (ICCV). IEEE, 2019) proposes the concept of class centers, extracting global context from a classification perspective; this class-level context describes the overall representation of each class in the image. An attentional class feature module then computes and adaptively combines different class centers for each pixel, giving a coarse-to-fine attentional class feature segmentation network and improving segmentation accuracy.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a coal mine underground image semantic segmentation method with simple steps, a good segmentation effect, and a robust description of scene features.
To overcome the defects of the prior art, the coal mine underground image semantic segmentation method of the invention comprises the following steps:
Step 1: collect underground pictures, perform annotation preprocessing on the picture data, and divide the preprocessed image data into a training sample data set and a test sample data set.
Step 2: input the training sample data set into a feature extraction network to extract features of the input pictures; the feature extraction network comprises an improved ResNet-101 network, in which the downsampling operations of the fourth and fifth stages of the conventional ResNet-101 are deleted while the other contents of those stages are retained;
Step 3: in the fourth stage of the improved ResNet-101 network, input simultaneously, through multi-scale input, the feature map output by the third stage and an additional input feature map, and output a low-level feature map; in the fifth stage, input simultaneously, through multi-scale input, the feature map output by the fourth stage and an additional input feature map, and output a high-level feature map; the additional input feature map is obtained by compressing the original input picture to the same size as the output feature map of the preceding stage and processing it with a residual unit;
Step 4: construct a fused attention module after the fifth stage of the improved ResNet-101 network, fuse the low-level feature map and the high-level feature map with the fused attention module, and output a new feature map containing global context semantic information;
Step 5: construct a global context enhancement module after the fused attention module to enhance the global representation of the new feature map, thereby obtaining the long-range dependencies between the pixels in the feature map and producing the final fused feature map;
Step 6: input the final fused feature map into a pre-trained classifier to generate a semantic map, then use the test sample data set to evaluate the generated semantic map and check the performance of the feature extraction network; if the performance meets the standard, the network can be used for semantic segmentation of coal mine underground images; otherwise, retrain;
Step 7: use the trained feature extraction network to perform semantic segmentation on input coal mine underground images.
The specific process of step 1) is as follows:
Step 11) Acquire clear images with an underground explosion-proof camera.
Step 12) Manually annotate the obtained images for semantic segmentation, i.e., classify each pixel in the image; different regions in the image are segmented from each other, each region being defined by its semantic information.
Step 13) Randomly construct a training sample set and a test sample set from the annotated images at a ratio of 4:1, as illustrated in the sketch below.
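
As an illustration of the 4:1 split in step 13), a minimal sketch; the file layout and names are assumptions, not taken from the patent:

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, train_ratio: float = 0.8, seed: int = 0):
    """Randomly split annotated images into training and test sets (4:1)."""
    images = sorted(Path(image_dir).glob("*.png"))  # hypothetical annotated frames
    random.Random(seed).shuffle(images)
    cut = int(len(images) * train_ratio)
    return images[:cut], images[cut:]               # (training set, test set)
```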
The specific process of step 2) is as follows:
Step 21) Improve on the original ResNet-101 network; the improved ResNet-101 network is divided into five stages in total, used to extract the features of the input image and obtain output feature maps at different levels;
Step 22) Each of the five stages of the improved ResNet-101 network contains multiple channels, and the information in each channel differs in importance for semantic segmentation; a channel attention mechanism is therefore added at each stage, assigning each channel a weight between 0 and 1 to represent the importance of the different channels (see the sketch after this list);
Step 23) Delete the downsampling operations of the fourth and fifth stages to enrich detail information, preventing the receptive fields of the fourth- and fifth-stage feature maps of the conventional ResNet-101 from growing with successive convolution and downsampling while the detail information of small targets in the feature maps is gradually lost;
Step 24) Use dilated convolution to preserve the output feature maps of the fourth and fifth stages, so that the feature maps of the third, fourth, and fifth stages have the same size, 1/8 that of the input image.
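
A minimal PyTorch sketch of steps 21) to 24), assuming torchvision's ResNet-101 as the skeleton (the patent's stages one to five correspond to the stem and layer1 to layer4): `replace_stride_with_dilation` removes the stride-2 downsampling of the fourth and fifth stages and substitutes dilated convolutions, holding stages three to five at 1/8 of the input resolution. The SE-style block is one plausible form of the 0-1 channel weighting of step 22), which the patent does not specify.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

class ChannelAttention(nn.Module):
    """SE-style channel attention: assigns each channel a weight in (0, 1)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                             # per-channel importance in (0, 1)
        )

    def forward(self, x):
        return x * self.gate(x)

# Stages four and five keep their content but lose downsampling: dilated
# convolutions hold the output at 1/8 of the input resolution (step 24).
backbone = resnet101(weights=None, replace_stride_with_dilation=[False, True, True])

x = torch.randn(1, 3, 512, 512)
stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
f3 = backbone.layer2(backbone.layer1(stem(x)))    # stage 3 output: 512 ch, 1/8 size
f4 = backbone.layer3(f3)                          # stage 4: dilated, still 1/8
f5 = backbone.layer4(f4)                          # stage 5: dilated, still 1/8
f5 = ChannelAttention(2048)(f5)                   # step 22: 0-1 channel weights
```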
The specific process of step 3) is as follows:
Step 31) Because the receptive field grows with successive convolution and downsampling, the detail information of small targets is gradually lost. To obtain more detail information, multi-scale input is adopted: basic residual units are added at the inputs of the fourth and fifth stages of the improved ResNet-101 network, and an additional 1/8-sized input image is fed directly into the basic residual unit to obtain the additional input feature maps of the fourth and fifth stages. These additional input feature maps have undergone one round of feature extraction and are low-level feature maps. In the improved ResNet-101 network, the input of every stage except the first is the output feature map of the previous stage; the inputs of the fourth and fifth stages are high-level feature maps, which contain less detail information than the low-level feature maps;
Step 32) Fuse the additional input feature maps processed by the basic residual units with the normal fourth- and fifth-stage input feature maps respectively, making full use of the shallow feature maps to enrich the small-target information in the deep feature maps;
Step 33) The feature maps at 1/8 of the input image size are enhanced with multi-scale input as follows. Assume the ResNet-101 network contains L_i convolution layers at stage i; the j-th layer convolution can then be defined as y_j = M_j(x_j), where y_j is the output tensor of the j-th layer and M_j comprises convolution, a ReLU activation function, and a regularization operation. The input picture x_i of stage i has size (N, H_i, W_i, C_i), where N denotes the batch size, H_i and W_i the height and width of the input feature map, and C_i the number of channels. The output feature map F_i of stage i can be expressed as:

F_i = M_{L_i}(M_{L_i−1}(⋯ M_1(x_i)))  (1)

Step 34) I_i denotes the additional input of stage i; its resolution is the same as that of the output tensor of stage i−1. Its feature map after feature extraction by the basic residual unit is:

Î_i = RCU(I_i)  (2)

Step 35) The fused input of stage i is then expressed as:

x̃_i = F_{i−1} ⊕ Î_i  (3)

where F_{i−1} denotes the output tensor of stage i−1 and ⊕ denotes the channel concatenation operation;
Step 36) The fifth stage outputs the high-level feature map χ_h and the fourth stage outputs the low-level feature map χ_l.
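
Formulas (1) to (3) might be realized as in the following sketch, under the assumption that the basic residual unit is a standard two-convolution residual block (the patent shows its structure only in FIG. 1): the raw image is compressed to the previous stage's resolution, passed once through the RCU (formula (2)), and channel-concatenated with the previous stage's output (formula (3)).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RCU(nn.Module):
    """Basic residual unit: one round of feature extraction on the extra input."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return F.relu(self.body(x) + self.skip(x))    # residual connection

def fuse_stage_input(prev_out: torch.Tensor, image: torch.Tensor, rcu: RCU):
    """Formulas (2)-(3): compress the raw image to the previous stage's
    resolution, extract features once with the RCU, and channel-concatenate
    the result with the previous stage's output F_{i-1}."""
    extra = F.interpolate(image, size=prev_out.shape[-2:], mode="bilinear",
                          align_corners=False)        # 1/8-sized extra input I_i
    return torch.cat([prev_out, rcu(extra)], dim=1)   # ⊕: channel splicing
```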
The specific process of step 4) is as follows:
Step 41) Construct the fused attention module. The fused attention module has two inputs: the high-level feature map χ_h ∈ R^{C_h×H_h×W_h} output by the fifth stage in step 36) and the low-level feature map χ_l ∈ R^{C_l×H_l×W_l} output by the fourth stage. H_h×W_h is the number of spatial positions of the high-level feature map χ_h and H_l×W_l is the number of spatial positions of the low-level feature map χ_l; C_h and C_l are the channel counts of χ_h and χ_l, respectively. A 1×1 convolution W_θ transforms the features of the low-level feature map χ_l into ε_l ∈ R^{Ĉ×H_l×W_l}, where Ĉ is the number of channels of the transformed features and R denotes the real numbers; ε_l is the result of the feature transformation of χ_l, as shown in formula (4):

ε_l = W_θ(χ_l)  (4)

Step 42) Regularize the feature transformation result ε_l with the softmax function to obtain f(ε_l);
Step 43) Process f(ε_l) with a bottleneck feature transformation to obtain the channel dependencies; the 1×1 convolutions W_{γ1} and W_{γ2} perform the feature transformation applied to χ_h, giving the attention output O_F ∈ R^{C_h×H_h×W_h}, as in formula (5):

O_F = W_{γ2} ReLU(LN(W_{γ1}(f(ε_l))))  (5)

The output O_F reflects the compensation of χ_l to χ_h; this compensation is selected from all positions of χ_l.
Step 44) The finally output fused feature map Y_F is:

Y_F = cat(O_F, χ_h)  (6)
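
A sketch of formulas (4) to (6). The patent fixes the operator chain (softmax over ε_l, then W_{γ2}·ReLU·LN·W_{γ1}, then concatenation with χ_h) but not how f(ε_l) gathers compensation from χ_l's positions; below, a single attention map (Ĉ = 1) performs attention pooling over all positions of χ_l, and the bottleneck width is likewise an assumption.

```python
import torch
import torch.nn as nn

class FusedAttention(nn.Module):
    """Formulas (4)-(6): compensate the high-level map χ_h with context pooled
    from all positions of the low-level map χ_l (Ĉ = 1 assumed for clarity)."""
    def __init__(self, c_low: int, c_high: int, bottleneck: int = 256):
        super().__init__()
        self.w_theta = nn.Conv2d(c_low, 1, kernel_size=1)            # ε_l = W_θ(χ_l)
        self.w_gamma1 = nn.Conv2d(c_low, bottleneck, kernel_size=1)
        self.ln = nn.LayerNorm([bottleneck, 1, 1])                   # LN in formula (5)
        self.w_gamma2 = nn.Conv2d(bottleneck, c_high, kernel_size=1)

    def forward(self, x_low: torch.Tensor, x_high: torch.Tensor) -> torch.Tensor:
        b, c_low, _, _ = x_low.shape
        attn = self.w_theta(x_low).flatten(2).softmax(dim=-1)        # f(ε_l) over H_l·W_l
        # attention pooling: gather compensation from all positions of χ_l
        ctx = torch.einsum("bcn,bkn->bck", x_low.flatten(2), attn).view(b, c_low, 1, 1)
        o_f = self.w_gamma2(torch.relu(self.ln(self.w_gamma1(ctx))))  # formula (5)
        o_f = o_f.expand(-1, -1, x_high.size(2), x_high.size(3))      # broadcast to χ_h
        return torch.cat([o_f, x_high], dim=1)                        # formula (6)
```

Concatenation rather than addition follows formula (6), so the module doubles the channel count of χ_h.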
The specific process of step 5) is as follows:
Step 51) Construct a global attention module after the fifth stage of the ResNet-101 network to capture the long-range dependencies crucial to semantic segmentation. Let the input feature be X ∈ R^{C×H×W}, where C, H, and W are the number of channels, the spatial height, and the width, respectively. A 1×1 convolution W_θ transforms the feature X into θ ∈ R^{Ĉ×H×W}:

θ = W_θ(X)  (7)

where Ĉ is the number of channels of the transformed features;
Step 52) Regularization with the softmax function yields the similarity matrix f(θ);
Step 53) The output of the attention module is computed by the 1×1 convolutions W_{γ1} and W_{γ2} with intermediate normalization and a ReLU function, as in formula (8):

O_G = W_{γ2} ReLU(LN(W_{γ1}(f(θ))))  (8)

Step 54) The final output feature map Y_G ∈ R^{C×H×W} is expressed as:

Y_G = cat(O_G, X)  (9)
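
The global attention module of formulas (7) to (9) uses the same operator chain, applied to a single input X; a sketch under the same single-attention-map assumption:

```python
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """Formulas (7)-(9): enhance the global representation of X with context
    pooled from every spatial position (one attention map assumed for clarity)."""
    def __init__(self, channels: int, bottleneck: int = 256):
        super().__init__()
        self.w_theta = nn.Conv2d(channels, 1, kernel_size=1)          # θ = W_θ(X)
        self.w_gamma1 = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.ln = nn.LayerNorm([bottleneck, 1, 1])
        self.w_gamma2 = nn.Conv2d(bottleneck, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        attn = self.w_theta(x).flatten(2).softmax(dim=-1)             # similarity f(θ)
        ctx = torch.einsum("bcn,bkn->bck", x.flatten(2), attn).view(b, c, 1, 1)
        o_g = self.w_gamma2(torch.relu(self.ln(self.w_gamma1(ctx))))  # formula (8)
        o_g = o_g.expand(-1, -1, h, w)                                # broadcast to X
        return torch.cat([o_g, x], dim=1)                             # formula (9)
```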
the specific process of the step 6) is as follows:
step 61) fusing the final output feature map Y obtained in the step 5) with the feature map YGInput classifierGenerating a channel semantic segmentation feature map;
step 62) comparing the generated feature map with the real label image labeled in the step 1) for supervising the training of the feature extraction network parameters, thereby obtaining a trained network model; inputting the test sample data set obtained in the step 1) as an input image into a trained network model, and checking the performance of the network model;
and 63) loading the trained model parameters, and performing scene semantic analysis on the next batch of pictures shot underground.
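
Step 62)'s supervision might look like the following sketch, assuming per-pixel cross-entropy against the annotated label maps and pixel accuracy as the acceptance check; the patent names neither a specific loss nor a performance threshold.

```python
import torch
import torch.nn as nn

def train_and_validate(model, train_loader, test_loader, epochs=80, lr=1e-3):
    """Supervise the network with annotated labels, then measure test performance."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4)
    ce = nn.CrossEntropyLoss(ignore_index=255)    # 255 = unlabeled pixels (assumption)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            opt.zero_grad()
            logits = model(images)                # B × classes × H × W semantic map
            ce(logits, labels).backward()         # compare with ground-truth labels
            opt.step()
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            pred = model(images).argmax(dim=1)
            valid = labels != 255
            correct += (pred[valid] == labels[valid]).sum().item()
            total += valid.sum().item()
    return correct / max(total, 1)                # retrain if below the target standard
```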
Advantageous effects:
For the complex scenes in coal mine underground images, the method adopts an attention mechanism that highlights the semantic information of the target region and improves the image segmentation effect; compared with other segmentation methods it balances segmentation accuracy and speed, and is therefore more robust.
The method enhances the extracted features by constructing a multi-scale input network; constructs a fused attention module to fuse the features extracted at each stage; constructs a global attention module to enhance global information and obtain long-range dependencies; and finally uses a classifier to generate the semantic map and complete the semantic segmentation of the image, ensuring segmentation accuracy while improving the robustness of the algorithm.
Description of the drawings:
FIG. 1 is a schematic diagram of the basic residual network unit of the coal mine underground image semantic segmentation method.
FIG. 2 is a schematic diagram of the fused attention module of the coal mine underground image semantic segmentation method.
FIG. 3 is a schematic diagram of the global attention module of the coal mine underground image semantic segmentation method.
FIG. 4 is a network framework diagram of the multi-feature fusion image segmentation method of the invention.
Detailed description of the embodiments:
The invention is further described below with reference to the accompanying drawings.
The invention relates to a coal mine underground image semantic segmentation method. Underground scene pictures are collected with an underground explosion-proof camera and preprocessed to generate a data set. The data set is input, a feature extraction network is selected to extract the picture features, a multi-scale input module is constructed, and the extracted feature maps are enhanced. A fused attention module is then constructed to fuse the features extracted at each stage, and a global attention module is constructed to enhance global information and obtain long-range dependencies. Finally, a classifier generates the semantic map, completing the semantic segmentation of the image. Compared with other semantic segmentation methods, the method greatly reduces the computation and complexity of the algorithm, adopts an attention mechanism for scene complexity, highlights the semantic information of the target region, improves the image segmentation effect, and greatly enhances the robustness of the algorithm.
As shown in FIG. 4, the coal mine underground image semantic segmentation method comprises the following steps:
Step 1) Acquire underground images, perform annotation preprocessing on the image data, and divide the preprocessed image data into training and test sample data sets.
The specific process is as follows:
Step 11) Acquire clear images with an underground explosion-proof camera.
Step 12) Manually annotate the obtained images for semantic segmentation, i.e., classify each pixel in the image; different regions in the image are segmented from each other, each region being defined by its semantic information.
Step 13) Randomly construct a training sample set and a test sample set from the annotated images at a ratio of 4:1.
Step 2) Input the training sample data set obtained in step 1) into a feature extraction network with ResNet-101 as its skeleton to extract the input image features; the downsampling operations of the fourth and fifth of the five feature extraction stages of ResNet-101 are deleted, and the rest of those stages is retained, so that the feature maps are 1/8 the size of the input image.
The specific process is as follows:
Step 21) Use ResNet-101 as the skeleton network for feature extraction. ResNet-101 is divided into five stages, each composed of basic residual convolution units (RCUs), used to extract the features of the input image and obtain output feature maps at different levels.
Step 22) In the five stages of the feature extraction network ResNet-101, each stage contains multiple channels, and the information in each channel differs in importance for semantic segmentation; a channel attention mechanism is therefore added at each stage, assigning each channel a weight between 0 and 1 to represent the importance of the different channels.
Step 23) In the existing ResNet-101, the receptive field of the fourth- and fifth-stage feature maps grows with successive convolution and downsampling, and the detail information of small targets is gradually lost; the downsampling operations of the fourth and fifth stages are therefore deleted to enrich detail information.
Step 24) Use dilated convolution to preserve the output feature maps of the fourth and fifth stages, so that the feature maps of the third, fourth, and fifth stages have the same size, 1/8 that of the input image.
Step 3) In the fourth and fifth stages, where the downsampling operations have been deleted, enhance the 1/8-sized feature maps extracted in step 2) by adopting multi-scale input, and output the feature maps.
The specific process is as follows:
Step 31) The receptive field grows with successive convolution and downsampling, and the detail information of small targets is gradually lost. To obtain more detail information, multi-scale input is adopted: an additional input image is fed into a basic residual convolution unit (RCU), whose structure is shown in FIG. 1, to obtain the additional input feature maps of the fourth and fifth stages. These additional input feature maps have undergone one round of feature extraction and are low-level feature maps. In the ResNet-101 network, the input of every stage except the first is the output feature map of the previous stage; the inputs of the fourth and fifth stages are high-level feature maps, which contain less detail information than the low-level feature maps.
Step 32) Fuse the fourth- and fifth-stage additional input feature maps obtained in step 31) with the fourth- and fifth-stage input feature maps of ResNet-101, respectively, making full use of the shallow feature maps to enrich the small-target information in the deep feature maps.
Step 33) The process of multi-scale input is as follows. Assume the ResNet-101 network contains L_i convolution layers at stage i; the j-th layer convolution can then be defined as y_j = M_j(x_j), where y_j is the output tensor of the j-th layer and M_j comprises convolution, a ReLU activation function, and a regularization operation. The input picture x_i of stage i has size (N, H_i, W_i, C_i), where N denotes the batch size, H_i and W_i the height and width of the input feature map, and C_i the number of channels. The output feature map F_i of stage i can be expressed as:

F_i = M_{L_i}(M_{L_i−1}(⋯ M_1(x_i)))  (1)

Step 34) I_i denotes the additional input of stage i, at the same resolution as the output tensor of stage i−1. Its feature map after feature extraction is:

Î_i = RCU(I_i)  (2)

Step 35) The fused input of stage i can be expressed as:

x̃_i = F_{i−1} ⊕ Î_i  (3)

where F_{i−1} denotes the output tensor of stage i−1 and ⊕ denotes the channel concatenation operation.
Step 36) The fifth stage outputs the high-level feature map χ_h and the fourth stage outputs the low-level feature map χ_l.
Step 4) constructing a fusion attention module, fusing the feature maps of the input image 1/8 obtained in the fourth and fifth stages in the step 3), and outputting a new feature map containing global context semantic information, which is specifically shown in fig. 2;
the specific process is as follows:
step 41) constructing a fusion attention module: the fusion attention Module contains two inputs, from step 36) a high level feature map of the fifth stage output
Figure BDA0003321856200000101
And low-level feature maps of the fourth stage output
Figure BDA0003321856200000102
Hh×WhIs a high-grade characteristic diagram chihNumber of spatial positions of (H)l×WlIs a low-level characteristic diagram chilThe number of spatial locations of (a); chAnd ClRespectively a high-level characteristic diagram xhAnd low level characteristic diagram chil1 × 1 convolution WθFor matching the low-level characteristic diagram chilFeature transformation of
Figure BDA0003321856200000103
Wherein
Figure BDA0003321856200000104
Is the number of channels of the converted features, R being the real number, εlIs a low-level characteristic diagram χlThe result after feature transformation is shown as formula (4):
εl=Wθl) (4)
step 42) converting the feature into a result epsilonlF (epsilon) is obtained after regularization of softmax functionl)。
Step 43) processing f (epsilon) using bottleneck feature transformationl) Obtaining channel dependence, 1 × 1 convolution Wγ1And Wγ2Will be used forhThe feature conversion of the system obtains an attention output result
Figure BDA0003321856200000105
The results are as in formula (5):
OF=Wγ2ReLU(LN(Wγ1(f(εl)))) (5)
output OFReflect xlPair chihFrom χlIs selected from all the positions.
Step 44) finally outputting the fusion characteristic diagram YFComprises the following steps:
YF=cat(OF,χh) (6)
Step 5) Construct a global attention module after the fifth stage of the ResNet-101 network, as shown in FIG. 3, to enhance the global representation of the new feature map obtained in step 4) and to obtain the long-range dependencies among features of different levels, producing the final fused feature map.
The specific process is as follows:
Step 51) Construct the global attention enhancement block to capture the long-range dependencies crucial to semantic segmentation. Let the input feature be X ∈ R^{C×H×W}, where C, H, and W are the number of channels, the spatial height, and the width, respectively. A 1×1 convolution W_θ transforms the feature X into θ ∈ R^{Ĉ×H×W}:

θ = W_θ(X)  (7)

where Ĉ is the number of channels of the transformed features.
Step 52) Regularization with the softmax function yields the similarity matrix f(θ).
Step 53) The output of the attention module is computed by the 1×1 convolutions W_{γ1} and W_{γ2} with intermediate normalization and a ReLU function, as in formula (8):

O_G = W_{γ2} ReLU(LN(W_{γ1}(f(θ))))  (8)

Step 54) The final output feature map Y_G ∈ R^{C×H×W} can be expressed as:

Y_G = cat(O_G, X)  (9)
Step 6) Input the fused output feature map obtained in step 5) into a pre-trained classifier to generate a semantic map. Then input the test sample data set obtained in step 1) into the trained network and check the network's performance.
Step 61) Input the final output fused feature map Y_G obtained in step 5) into the classifier to generate the channel semantic segmentation feature map.
Step 62) Compare the generated feature map with the ground-truth label images annotated in step 1) to supervise the training of the network model parameters, thereby obtaining a trained network model; input the test sample data set from step 1) into the trained network model as input images and check the model's performance.
Step 63) Load the model parameters trained in step 62) and perform scene semantic analysis on the next batch of photos shot underground.
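
Reusing the FusedAttention and GlobalAttention sketches above, a hypothetical end-to-end shape check: channel widths follow ResNet-101's stage outputs, while the class count of 8 is an assumption not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 512, 512)                 # an underground image
chi_l = torch.randn(1, 1024, 64, 64)            # stage-4 low-level map, 1/8 size
chi_h = torch.randn(1, 2048, 64, 64)            # stage-5 high-level map, 1/8 size

fuse = FusedAttention(c_low=1024, c_high=2048)
y_f = fuse(chi_l, chi_h)                        # formula (6): 1 × 4096 × 64 × 64
glob = GlobalAttention(channels=4096)
y_g = glob(y_f)                                 # formula (9): 1 × 8192 × 64 × 64

classifier = nn.Conv2d(8192, 8, kernel_size=1)  # 1×1 classifier, 8 classes assumed
logits = F.interpolate(classifier(y_g), size=x.shape[-2:],
                       mode="bilinear", align_corners=False)  # full-size semantic map
```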

Claims (7)

1. A coal mine underground image semantic segmentation method, characterized by the following steps:
Step 1: collect underground pictures, perform annotation preprocessing on the picture data, and divide the preprocessed picture data into a training sample data set and a test sample data set;
Step 2: input the training sample data set into a feature extraction network to extract the input picture features, the feature extraction network comprising an improved ResNet-101 network in which the original downsampling operations of the fourth and fifth stages are deleted while the other contents of the fourth and fifth stages are retained;
Step 3: in the fourth stage of the improved ResNet-101 network, input simultaneously, through multi-scale input, the feature map output by the third stage and an additional input feature map, and output a low-level feature map; in the fifth stage, input simultaneously, through multi-scale input, the feature map output by the fourth stage and an additional input feature map, and output a high-level feature map; the additional input feature map is obtained by processing the input picture with a residual unit, the original input picture first being compressed to the same size as the output feature map of the preceding stage;
Step 4: construct a fused attention module after the fifth stage of the improved ResNet-101 network, fuse the low-level feature map and the high-level feature map with the fused attention module, and output a new feature map containing global context semantic information;
Step 5: construct a global context enhancement module after the fused attention module to enhance the global representation of the new feature map, thereby obtaining the long-range dependencies between the pixels in the feature map and producing the final fused feature map;
Step 6: input the final fused feature map into a pre-trained classifier to generate a semantic map, then use the test sample data set to evaluate the generated semantic map and verify the performance of the feature extraction network; if the performance meets the standard, the network can be used for semantic segmentation of coal mine underground photo images; otherwise, retrain;
Step 7: use the trained feature extraction network to perform coal mine underground image semantic segmentation on the input coal mine underground pictures.
2. The coal mine underground image semantic segmentation method according to claim 1, characterized in that the specific process of step 1 is:
Step 11: acquire clear images with an underground explosion-proof camera;
Step 12: manually annotate the obtained images for semantic segmentation, i.e., classify each pixel in the image; different regions in the image are segmented from each other, each region being defined by semantic information;
Step 13: randomly construct a training sample set and a test sample set from the annotated images at a ratio of 4:1.
3. The coal mine underground image semantic segmentation method according to claim 1, characterized in that the specific process of step 2 is:
Step 21: improve on the original ResNet-101 network; the improved ResNet-101 network is divided into five stages in total, used to extract the features of the input image and obtain output feature maps at different levels;
Step 22: each of the five stages of the improved ResNet-101 network contains multiple channels, and the information contained in each channel differs in importance for semantic segmentation, so a channel attention mechanism is added at each stage, assigning each channel a weight between 0 and 1 to represent the importance of the different channels;
Step 23: delete the downsampling operations of the fourth and fifth stages to enrich detail information, thereby preventing the receptive fields of the fourth- and fifth-stage feature maps of the conventional ResNet-101 from growing with successive convolution and downsampling while the detail information of small targets is gradually lost;
Step 24: use dilated convolution to preserve the output feature maps of the fourth and fifth stages, so that the feature maps of the third, fourth, and fifth stages have the same size, 1/8 that of the input image.
4. The coal mine underground image semantic segmentation method according to claim 3, characterized in that the specific process of step 3 is:
Step 31: since the receptive field grows with successive convolution and downsampling, the detail information of small targets is gradually lost; to obtain more detail information, multi-scale input is adopted: basic residual units are added at the inputs of the fourth and fifth stages of the improved ResNet-101 network, and an additional 1/8-sized input image is fed directly into the basic residual unit to obtain the additional input feature maps of the fourth and fifth stages; these additional input feature maps have undergone one round of feature extraction and are low-level feature maps; in the improved ResNet-101 network, the input of every stage except the first is the output feature map of the previous stage, and the inputs of the fourth and fifth stages are high-level feature maps containing less detail information than the low-level feature maps;
Step 32: fuse the additional input feature maps of the fourth and fifth stages, processed by the basic residual units, with the normal fourth- and fifth-stage input feature maps respectively, making full use of the shallow feature maps to enrich the small-target information in the deep feature maps;
Step 33: use multi-scale input to enhance the 1/8-sized feature maps, the process being as follows: assume the ResNet-101 network contains L_i convolution layers at stage i; the j-th layer convolution can then be defined as y_j = M_j(x_j), where y_j is the output tensor of the j-th layer and M_j comprises convolution, a ReLU activation function, and a regularization operation; the input picture x_i of stage i has size (N, H_i, W_i, C_i), where N denotes the batch size, H_i and W_i the height and width of the input feature map, and C_i the number of channels; the output feature map F_i of stage i can be expressed as:
F_i = M_{L_i}(M_{L_i−1}(⋯ M_1(x_i)))  (1)
Step 34: I_i denotes the additional input of stage i, whose resolution is the same as that of the output tensor of stage i−1; its feature map after feature extraction is:
Î_i = RCU(I_i)  (2)
Step 35: the fused input of stage i is expressed as:
x̃_i = F_{i−1} ⊕ Î_i  (3)
where F_{i−1} denotes the output tensor of stage i−1 and ⊕ denotes the channel concatenation operation;
Step 36: the fifth stage outputs the high-level feature map χ_h and the fourth stage outputs the low-level feature map χ_l.
5. The coal mine underground image semantic segmentation method according to claim 4, characterized in that the specific process of step 4 is:
Step 41: construct the fused attention module: the fused attention module has two inputs, the high-level feature map χ_h ∈ R^{C_h×H_h×W_h} output by the fifth stage in step 36 and the low-level feature map χ_l ∈ R^{C_l×H_l×W_l} output by the fourth stage; H_h×W_h is the number of spatial positions of the high-level feature map χ_h and H_l×W_l is the number of spatial positions of the low-level feature map χ_l; C_h and C_l are the channel counts of χ_h and χ_l, respectively; a 1×1 convolution W_θ transforms the features of the low-level feature map χ_l into ε_l ∈ R^{Ĉ×H_l×W_l}, where Ĉ is the number of channels of the transformed features and R denotes the real numbers; ε_l is the result of the feature transformation of χ_l, as shown in formula (4):
ε_l = W_θ(χ_l)  (4)
Step 42: regularize the feature transformation result ε_l with the softmax function to obtain f(ε_l);
Step 43: process f(ε_l) with a bottleneck feature transformation to obtain the channel dependencies; the 1×1 convolutions W_{γ1} and W_{γ2} perform the feature transformation applied to χ_h, giving the attention output O_F ∈ R^{C_h×H_h×W_h}, as in formula (5):
O_F = W_{γ2} ReLU(LN(W_{γ1}(f(ε_l))))  (5)
the output O_F reflects the compensation of χ_l to χ_h, this compensation being selected from all positions of χ_l;
Step 44: the finally output fused feature map Y_F is:
Y_F = cat(O_F, χ_h)  (6)
6. The coal mine underground image semantic segmentation method according to claim 1, characterized in that the specific process of step 5 is:
Step 51: construct a global attention module after the fifth stage of the ResNet-101 network to capture the long-range dependencies crucial to semantic segmentation; let the input feature be X ∈ R^{C×H×W}, where C, H, and W are the number of channels, the spatial height, and the width, respectively; a 1×1 convolution W_θ transforms the feature X into θ ∈ R^{Ĉ×H×W}:
θ = W_θ(X)  (7)
where Ĉ is the number of channels of the transformed features;
Step 52: regularization with the softmax function yields the similarity matrix f(θ);
Step 53: the output of the attention module is computed by the 1×1 convolutions W_{γ1} and W_{γ2} with intermediate normalization and a ReLU function, as in formula (8):
O_G = W_{γ2} ReLU(LN(W_{γ1}(f(θ))))  (8)
Step 54: the final output feature map Y_G ∈ R^{C×H×W} is expressed as:
Y_G = cat(O_G, X)  (9)
7. The coal mine underground image semantic segmentation method according to claim 1, characterized in that the specific process of step 6 is:
Step 61: input the final output fused feature map Y_G obtained in step 5 into the classifier to generate a channel semantic segmentation feature map;
Step 62: compare the generated feature map with the ground-truth label images annotated in step 1 to supervise the training of the feature extraction network parameters, thereby obtaining a trained network model; input the test sample data set obtained in step 1 into the trained network model as input images and verify the performance of the network model;
Step 63: load the trained model parameters and perform scene semantic analysis on the next batch of photos taken underground.
CN202111248280.4A 2021-10-26 2021-10-26 A Semantic Segmentation Method of Underground Image in Coal Mine Pending CN114170422A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111248280.4A CN114170422A (en) 2021-10-26 2021-10-26 A Semantic Segmentation Method of Underground Image in Coal Mine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111248280.4A CN114170422A (en) 2021-10-26 2021-10-26 A Semantic Segmentation Method of Underground Image in Coal Mine

Publications (1)

Publication Number Publication Date
CN114170422A true CN114170422A (en) 2022-03-11

Family

ID=80477308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111248280.4A Pending CN114170422A (en) 2021-10-26 2021-10-26 A Semantic Segmentation Method of Underground Image in Coal Mine

Country Status (1)

Country Link
CN (1) CN114170422A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751636A (en) * 2019-10-12 2020-02-04 天津工业大学 A method for detecting retinal arteriosclerosis in fundus images based on an improved codec network
CN111462126A (en) * 2020-04-08 2020-07-28 武汉大学 Semantic image segmentation method and system based on edge enhancement
CN111797779A (en) * 2020-07-08 2020-10-20 兰州交通大学 Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ren Tianci; Huang Xiangsheng; Ding Weili; An Chongyang; Zhai Pengbo: "Semantic Segmentation Algorithm Based on a Global Bilateral Network", Computer Science (计算机科学), no. 1, 15 June 2020 (2020-06-15) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943724A (en) * 2022-06-09 2022-08-26 合肥合安智为科技有限公司 A dust detection and positioning method
CN115700781A (en) * 2022-11-08 2023-02-07 广东技术师范大学 Visual positioning method and system based on image inpainting in dynamic scene
CN116363134A (en) * 2023-06-01 2023-06-30 深圳海清智元科技股份有限公司 Method and device for identifying and dividing coal and gangue and electronic equipment
CN116363134B (en) * 2023-06-01 2023-09-05 深圳海清智元科技股份有限公司 Method and device for identifying and dividing coal and gangue and electronic equipment

Similar Documents

Publication Publication Date Title
CN114202672B (en) A small object detection method based on attention mechanism
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN111931684B (en) A weak and small target detection method based on discriminative features of video satellite data
Zhang et al. Infrared image segmentation for photovoltaic panels based on Res-UNet
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
Wang et al. Deep Learning for Object Detection: A Survey.
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN114170422A (en) A Semantic Segmentation Method of Underground Image in Coal Mine
CN118212532B (en) A method for extracting building change areas in dual-temporal remote sensing images based on twin hybrid attention mechanism and multi-scale feature fusion
CN111460936A (en) Remote sensing image building extraction method, system and electronic equipment based on U-Net network
CN110555420B (en) Fusion model network and method based on pedestrian regional feature extraction and re-identification
CN105243154B (en) Remote sensing image retrieval method based on notable point feature and sparse own coding and system
CN114155474A (en) Damage identification technology based on video semantic segmentation algorithm
CN108764018A (en) A kind of multitask vehicle based on convolutional neural networks recognition methods and device again
CN117407557B (en) Zero sample instance segmentation method, system, readable storage medium and computer
Xi et al. Detection-driven exposure-correction network for nighttime drone-view object detection
CN118941526A (en) A road crack detection method, medium and product
Ye et al. An improved transformer-based concrete crack classification method
Yosry et al. Various frameworks for integrating image and video streams for spatiotemporal information learning employing 2D–3D residual networks for human action recognition
CN114926718A (en) Low-small slow target detection method with fusion of adjacent scale weight distribution characteristics
Kajabad et al. YOLOv4 for urban object detection: Case of electronic inventory in St. Petersburg
Balachandran et al. Moving scene-based video segmentation using fast convolutional neural network integration of VGG-16 net deep learning architecture
Balasundaram et al. Zero-DCE++ Inspired Object Detection in Less Illuminated Environment Using Improved YOLOv5.
Liu et al. Target detection of hyperspectral image based on faster R-CNN with data set adjustment and parameter turning
Zhang et al. Research on rainy day traffic sign recognition algorithm based on PMRNet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination