CN114170422A - Coal mine underground image semantic segmentation method - Google Patents
- Publication number: CN114170422A
- Application number: CN202111248280.4A
- Authority: CN (China)
- Prior art keywords: feature, stage, input, image, network
- Prior art date: 2021-10-26
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate (G—Physics; G06F—Electric digital data processing; G06F18/00—Pattern recognition)
- G06N3/045 — Combinations of networks (G06N—Computing arrangements based on specific computational models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/08 — Learning methods (G06N3/02—Neural networks)
Abstract
The invention discloses a semantic segmentation method for underground coal mine images, belonging to the field of computer vision. First, acquired underground scene images are preprocessed to generate a data set. A feature extraction network with ResNet-101 as its backbone is then constructed, and images of different scales are fed into each stage of the network to enhance the extracted features. A fusion attention module is constructed to fuse the features of each stage, and a global attention module enhances global information to capture long-range dependencies. Finally, the resulting features are input into a classifier to generate a semantic map, completing semantic segmentation of the image. The method greatly reduces computation and complexity; to cope with scene complexity it adopts an attention mechanism that highlights the semantic information of the target region and improves the image segmentation result.
Description
Technical Field
The invention relates to an image semantic segmentation method, in particular to a semantic segmentation method for underground coal mine images suitable for use underground, and belongs to the field of computer vision.
Background
Studying the structural characteristics of the visual scene in underground tunnels and methods for recovering that structure is important for the structural analysis of complex underground scenes. Underground feature analysis must cope with the complex environments caused by direct glare, dim light, dust, water mist and smoke in underground coal mine scenes. Traditional image analysis methods mainly comprise the Lucas-Kanade algorithm, matching methods, energy methods and phase methods. In a complex underground scene, sudden changes in illumination alter the brightness at any moment, and in a narrow scene even small motions cause large positional changes, so traditional methods readily produce erroneous results. Methods for analyzing complex underground scenes therefore need to be studied.
The problems of sudden illumination change, shadow and positional offset in the analysis of complex underground scenes can be handled well by semantic analysis based on the deep-learning image segmentation theory, since a deep segmentation model can approximate a highly nonlinear mapping with very high precision. However, research on structured scenes has mostly focused on indoor building scenes, and for now there is no related work on coal mine tunnel scenes. The invention therefore proposes a semantic analysis method based on multilevel feature fusion for image segmentation, exploiting the large length-to-width ratio characteristic of coal mine tunnels while ensuring both segmentation accuracy and speed. In the image segmentation tasks of other scenes, much prior work exists.
One prior method processes an image with combined frequency-spatial analysis to obtain its frequency characteristics, applies deconvolution to the processed image, extracts high-dimensional features as feature points, and trains a convolutional neural network to segment the feature points in the image under test. The patent CN112598686B (image segmentation method, apparatus, computer device and storage medium, Guangdong, 2021-06-04) encodes an image with a prior-knowledge vector to obtain a target feature map, decodes the target feature map into a first segmentation map, reconstructs the first segmentation map according to the prior-knowledge vector to obtain several labeled segmentation images, and also derives a second segmentation map from the target feature map; fusing the second segmentation map with the labeled results improves segmentation accuracy. Another method computes the internal spectral histogram of a remote sensing image as a first classification feature, performs supervised classification of the image with a curve-matching algorithm, extracts the spatial correlation between the image and adjacent image targets as a second feature from the preliminary result, and segments the image by applying the curve-matching algorithm again to the combined spectral histogram and spatial correlation. The patent CN112381101B (an infrared road scene segmentation method based on category prototype regression, Jiangsu, 2021-05-28) applies category prototype regression on a data set to obtain category prototype features and clusters the network's deep features, making the global class features more compact and amplifying the differences between classes; it correspondingly constructs a relationship matrix and an attention module to tighten the overall features and improve the final segmentation accuracy. The patent CN112101369B (image segmentation method and apparatus, Beijing, 2021-02-05) uses the logical relationship between two at least partially overlapping target regions in an image together with the position information of their respective vertices, determines the positions of the intersections between the regions' boundaries from the vertex positions, and segments the two different objects from the image to be processed according to the intersection positions and the logical relationship. The literature (Chen C, Deng J, Lv N. Structures detection in remote sensing imagery based on multi-scale semantic segmentation [C]// 2020 IEEE International Conference on Smart Internet of Things (SmartIoT). IEEE, 2020.) replaces the traditional stack of convolutional layers with a multi-scale parallel structure, proposes on this basis a new encoder-decoder semantic segmentation network, and uses a conditional random field to constrain the segmentation result, achieving higher segmentation precision. The literature (Zhang F, Chen Y, Li Z, et al. ACFNet: Attentional Class Feature Network for Semantic Segmentation [C]// International Conference on Computer Vision (ICCV). IEEE, 2019.) proposes the concept of class centers, extracting the global context from a classification perspective: this class-level context describes the overall representation of each class in the image. An attentional class feature module then computes the different class centers and combines them adaptively for each pixel, yielding a coarse-to-fine attentional class feature segmentation network and improving segmentation accuracy.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention provides a semantic segmentation method for underground coal mine images with simple steps, good segmentation quality and robust scene feature description.

To overcome these deficiencies, the method for semantic segmentation of underground coal mine images comprises the following steps:

Step 1, acquiring underground pictures, annotating and preprocessing the picture data, and dividing the annotated image data into a training sample data set and a test sample data set;

Step 2, inputting the training sample data set into a feature extraction network to extract features of the input pictures, wherein the feature extraction network is an improved ResNet-101 in which the down-sampling operations of the fourth and fifth stages of the conventional ResNet-101 are removed while the other content of the fourth and fifth stages is retained;

Step 3, in the fourth stage of the improved ResNet-101, inputting the feature map output by the third stage together with an additional input feature map through multi-scale input, and outputting a low-level feature map; in the fifth stage, inputting the feature map output by the fourth stage together with an additional input feature map through multi-scale input, and outputting a high-level feature map; each additional input feature map is obtained by compressing the original input picture to the size of the previous stage's output feature map and processing it with a residual unit;

Step 4, constructing a fusion attention module after the fifth stage of the improved ResNet-101, fusing the low-level feature map and the high-level feature map with the fusion attention module, and outputting a new feature map containing global context semantic information;

Step 5, constructing a global context enhancement module after the fusion attention module and enhancing the global representation of the new feature map, so as to obtain the long-range dependencies between all pixels in the feature map and produce the final fused feature map;

Step 6, inputting the final fused feature map into a pre-trained classifier to generate a semantic map, evaluating the generated semantic map on the test sample data set to check the performance of the feature extraction network; if the performance meets the standard, performing semantic segmentation of underground coal mine images, and retraining otherwise;

Step 7, performing semantic segmentation on input underground coal mine images with the trained feature extraction network.
The specific process of step 1) is as follows:

Step 11) acquiring clear images with an underground explosion-proof camera;

Step 12) performing semantic segmentation annotation on the acquired images, i.e. assigning a class to each pixel in the image, so that different regions of the image are separated from one another, each region defined by its semantic information;

Step 13) randomly dividing the annotated images into a training sample set and a test sample set at a ratio of 4:1.
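As a concrete illustration of step 13), the following sketch performs the random 4:1 split; the directory layout, file extension and random seed are assumptions, since the text only specifies a random split of the labeled images.

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, train_ratio: float = 0.8, seed: int = 42):
    """Randomly split annotated images into training and test sets (4:1)."""
    images = sorted(Path(image_dir).glob("*.png"))  # hypothetical layout
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_ratio)
    return images[:n_train], images[n_train:]

train_set, test_set = split_dataset("mine_dataset/images")
print(len(train_set), "training images,", len(test_set), "test images")
```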
The specific process of step 2) is as follows:

Step 21) the network is an improvement on the original ResNet-101; the improved ResNet-101 is divided into five stages in total and extracts features of the input image to obtain output feature maps of different levels;

Step 22) each of the five stages of the improved ResNet-101 comprises multiple channels, and the information contained in each channel matters differently for semantic segmentation; a channel attention mechanism is therefore added to each stage, assigning each channel a weight between 0 and 1 to represent its importance;

Step 23) the down-sampling operations of the fourth and fifth stages are removed to enrich detail information: in the conventional ResNet-101 the receptive field of the fourth- and fifth-stage feature maps grows steadily with successive convolution and down-sampling, and the detail information of small targets in the feature maps is gradually lost;

Step 24) dilated convolution preserves the output feature maps of the fourth and fifth stages, so that the feature maps of the third, fourth and fifth stages all have the same size, 1/8 that of the input image.
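A minimal sketch of the backbone modification in steps 23) and 24), using torchvision's stock ResNet-101: the `replace_stride_with_dilation` flag is one standard way to remove the stage-4/5 down-sampling in favor of dilated convolution; the per-stage channel attention of step 22) is not reproduced here.

```python
import torch
from torchvision.models import resnet101

# Stages 4 and 5 (torchvision's layer3/layer4) keep stride 1 and use dilated
# convolutions instead, so stages 3-5 all stay at 1/8 of the input size.
backbone = resnet101(replace_stride_with_dilation=[False, True, True])

x = torch.randn(1, 3, 512, 512)
f = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
c2 = backbone.layer1(f)   # stage 2: 1/4 resolution
c3 = backbone.layer2(c2)  # stage 3: 1/8
c4 = backbone.layer3(c3)  # stage 4: 1/8 (down-sampling removed)
c5 = backbone.layer4(c4)  # stage 5: 1/8 (down-sampling removed)
print(c3.shape, c4.shape, c5.shape)  # all 64x64 spatially for a 512 input
```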
The specific process of step 3) is as follows:

Step 31) since the receptive field grows with successive convolution and down-sampling, the detail information of small targets is gradually lost. To obtain more detail information, multi-scale input is adopted: a basic residual unit is added at the input of each of the fourth and fifth stages of the improved ResNet-101, and an additional 1/8-size input image is fed directly into each basic residual unit to obtain the additional input feature maps of the fourth and fifth stages. These additional input feature maps have undergone feature extraction only once and are therefore low-level feature maps. In the improved ResNet-101, the input of every stage except the first is the output feature map of the previous stage, so the inputs of the fourth and fifth stages are high-level feature maps containing less detail information than the low-level ones;

Step 32) the additional input feature maps processed by the basic residual units are fused with the normal input feature maps of the fourth and fifth stages respectively, making full use of the shallow feature maps to enrich the small-target information in the deep feature maps;

Step 33) the 1/8-size feature maps are enhanced with multi-scale input as follows: assume that stage i of the ResNet-101 network contains L_i convolution layers; the j-th layer is defined as y_j = M_j(x_j), where y_j is the output tensor of layer j and M_j comprises a convolution, a ReLU activation function and a regularization operation. The input x_i of stage i has size (N, H_i, W_i, C_i), where N denotes the batch size, H_i and W_i the height and width of the input feature map, and C_i the number of channels. The output feature map F_i of stage i can be expressed as

F_i = M_{L_i}(M_{L_i-1}(... M_1(x_i) ...))   (1)

Step 34) I_i denotes the additional input of stage i, whose resolution is the same as the output tensor of stage i-1; the feature map after feature extraction by the basic residual unit RCU is

F_i^I = RCU(I_i)   (2)

Step 35) the fused input of stage i is then

x_i = cat(F_{i-1}, F_i^I)   (3)

where F_{i-1} is the output tensor of stage i-1 and cat(·) denotes the channel concatenation operation;

Step 36) the fifth stage outputs the high-level feature map χ_h and the fourth stage the low-level feature map χ_l.
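The multi-scale input of steps 31)-35) can be sketched as below. The exact layer layout of the patent's basic residual unit (Fig. 1) is not spelled out in the text, so a standard conv-BN-ReLU residual block stands in for it, and all channel sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RCU(nn.Module):
    """Stand-in basic residual unit applied to the extra 1/8-scale input."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return F.relu(self.body(x) + self.skip(x))

def fused_stage_input(prev_out: torch.Tensor, image: torch.Tensor, rcu: RCU):
    """Eq. (3): concatenate F_{i-1} with RCU(I_i) along the channel axis.
    The original picture is resized to the previous stage's 1/8 feature size."""
    extra = F.interpolate(image, size=prev_out.shape[-2:],
                          mode="bilinear", align_corners=False)
    return torch.cat([prev_out, rcu(extra)], dim=1)
```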
The specific process of step 4) is as follows:

Step 41) constructing the fusion attention module: the module has two inputs, the high-level feature map χ_h ∈ R^(C_h×H_h×W_h) output by the fifth stage in step 36) and the low-level feature map χ_l ∈ R^(C_l×H_l×W_l) output by the fourth stage. H_h×W_h is the number of spatial positions of the high-level feature map χ_h, and H_l×W_l is the number of spatial positions of the low-level feature map χ_l; C_h and C_l are the channel counts of χ_h and χ_l respectively. A 1×1 convolution W_θ performs the feature transformation of the low-level feature map χ_l into ε_l ∈ R^(Ĉ×H_l×W_l), where Ĉ is the number of channels of the converted features and R denotes the real numbers. The result of the feature transformation is given by formula (4):

ε_l = W_θ(χ_l)   (4)

Step 42) the transformed result ε_l is regularized with the softmax function to obtain f(ε_l);

Step 43) a bottleneck feature transformation processes f(ε_l) to obtain the channel dependencies: the 1×1 convolutions W_γ1 and W_γ2 convert the features toward χ_h and produce the attention output O_F, as in formula (5):

O_F = W_γ2 ReLU(LN(W_γ1(f(ε_l))))   (5)

The output O_F reflects the compensation of χ_l to χ_h; this compensation is selected from all positions of χ_l;

Step 44) the finally output fused feature map Y_F is:

Y_F = cat(O_F, χ_h)   (6).
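A hedged sketch of eqs. (4)-(6): the text leaves the exact interaction between f(ε_l) and χ_h partly implicit, so this follows a GC-block-style reading in which W_θ produces a single-channel attention map (Ĉ taken as 1), softmax over the low-level positions pools χ_l into a global descriptor, the W_γ1/LayerNorm/ReLU/W_γ2 bottleneck maps it to C_h channels, and the result is broadcast and concatenated with χ_h. Names and channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class FusionAttention(nn.Module):
    def __init__(self, c_low: int, c_high: int, c_mid: int):
        super().__init__()
        self.w_theta = nn.Conv2d(c_low, 1, kernel_size=1)  # eq. (4), with C-hat = 1
        self.softmax = nn.Softmax(dim=-1)                  # f(eps_l) over positions
        self.w_gamma1 = nn.Conv2d(c_low, c_mid, kernel_size=1)
        self.ln = nn.LayerNorm([c_mid, 1, 1])
        self.w_gamma2 = nn.Conv2d(c_mid, c_high, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, chi_l: torch.Tensor, chi_h: torch.Tensor) -> torch.Tensor:
        n, c_l, h, w = chi_l.shape
        # Attention weights over all H_l*W_l positions of the low-level map.
        attn = self.softmax(self.w_theta(chi_l).view(n, 1, h * w))
        # Pool chi_l into a global context descriptor using those weights.
        context = torch.bmm(chi_l.view(n, c_l, h * w),
                            attn.transpose(1, 2)).view(n, c_l, 1, 1)
        # Bottleneck transform, eq. (5): O_F = W_g2 ReLU(LN(W_g1(f(eps_l)))).
        o_f = self.w_gamma2(self.relu(self.ln(self.w_gamma1(context))))
        o_f = o_f.expand(-1, -1, *chi_h.shape[-2:])        # broadcast to chi_h
        return torch.cat([o_f, chi_h], dim=1)              # eq. (6)

fuse = FusionAttention(c_low=1024, c_high=2048, c_mid=256)
y_f = fuse(torch.randn(1, 1024, 64, 64), torch.randn(1, 2048, 64, 64))
print(y_f.shape)  # torch.Size([1, 4096, 64, 64])
```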
The specific process of step 5) is as follows:

Step 51) a global attention module is constructed after the fifth stage of the ResNet-101 network to capture the long-range dependencies that are crucial for semantic segmentation. Let the input feature be X ∈ R^(C×H×W), where C, H and W are the number of channels, the spatial height and the spatial width; a 1×1 convolution W_θ transforms the feature X:

θ = W_θ(X)   (7)

Step 52) the transformed result θ is regularized with the softmax function to obtain f(θ);

Step 53) the output of the attention module is obtained through the 1×1 convolutions W_γ1 and W_γ2 with the intermediate normalization and ReLU function, as in formula (8):

O_G = W_γ2 ReLU(LN(W_γ1(f(θ))))   (8)

Step 54) the final output feature map Y_G ∈ R^(C×H×W) is expressed as:

Y_G = cat(O_G, X)   (9).
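The global attention module of eqs. (7)-(9) mirrors the fusion module above but operates on a single feature map, so the same GC-block-style reading applies; this is a sketch under that assumption, not a definitive implementation.

```python
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """Global context enhancement over one feature map X, eqs. (7)-(9)."""
    def __init__(self, c: int, c_mid: int):
        super().__init__()
        self.w_theta = nn.Conv2d(c, 1, kernel_size=1)  # eq. (7)
        self.softmax = nn.Softmax(dim=-1)              # f(theta), assumed softmax
        self.w_gamma1 = nn.Conv2d(c, c_mid, kernel_size=1)
        self.ln = nn.LayerNorm([c_mid, 1, 1])
        self.w_gamma2 = nn.Conv2d(c_mid, c, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        attn = self.softmax(self.w_theta(x).view(n, 1, h * w))
        context = torch.bmm(x.view(n, c, h * w),
                            attn.transpose(1, 2)).view(n, c, 1, 1)
        o_g = self.w_gamma2(self.relu(self.ln(self.w_gamma1(context))))  # eq. (8)
        return torch.cat([o_g.expand(-1, -1, h, w), x], dim=1)           # eq. (9)
```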
The specific process of step 6) is as follows:

Step 61) the final output feature map Y_G obtained in step 5) is input into the classifier to generate the channel semantic segmentation feature map;

Step 62) the generated feature map is compared with the ground-truth label images annotated in step 1) to supervise the training of the feature extraction network parameters, yielding a trained network model; the test sample data set obtained in step 1) is then fed as input images into the trained network model to check its performance;

Step 63) the trained model parameters are loaded, and scene semantic analysis is performed on the next batch of pictures taken underground.
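A minimal sketch of the supervised training and performance check of steps 61)-63); the loss function, optimizer, epoch count and the use of 255 as an ignore label are assumptions, as the text does not specify them.

```python
import torch
import torch.nn as nn

def train_and_validate(model, train_loader, test_loader,
                       epochs: int = 80, lr: float = 1e-3) -> float:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255 = unlabeled pixels
    optim = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            optim.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optim.step()
    # Step 62): check performance on the held-out 1/5 test split (pixel accuracy).
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            pred = model(images.to(device)).argmax(1).cpu()
            mask = labels != 255
            correct += (pred[mask] == labels[mask]).sum().item()
            total += mask.sum().item()
    return correct / max(total, 1)
```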
Beneficial effects:

For the complex scenes in underground coal mine images, the method adopts an attention mechanism that highlights the semantic information of the target region and improves the image segmentation result; compared with other segmentation methods it balances segmentation accuracy and speed, and is therefore more robust.

The method enhances the extracted features by constructing a multi-scale input network; a fusion attention module fuses the features extracted at each stage; a global attention module enhances global information and captures long-range dependencies; finally, a classifier generates the semantic map to complete semantic segmentation of the image, ensuring segmentation accuracy while improving the robustness of the algorithm.
Description of the drawings:
FIG. 1 is a schematic diagram of the basic residual unit of the coal mine underground image semantic segmentation method.
FIG. 2 is a schematic diagram of the fusion attention module of the coal mine underground image semantic segmentation method.
FIG. 3 is a schematic diagram of the global attention module of the coal mine underground image semantic segmentation method.
FIG. 4 is a network framework diagram of the multi-feature fusion image segmentation method of the present invention.
Detailed description of the embodiments:
the invention is further described below with reference to the accompanying drawings.
The invention provides a semantic segmentation method for underground coal mine images: underground scene pictures are collected with an underground explosion-proof camera and preprocessed to generate a data set; the data set is input into a feature extraction network that extracts picture features, and a multi-scale input module is constructed to enhance the extracted feature maps; a fusion attention module then fuses the features extracted at each stage; a global attention module enhances global information and captures long-range dependencies; finally, a classifier generates the semantic map, completing semantic segmentation of the image. Compared with other semantic segmentation methods, the advantages are: the computation and complexity of the algorithm are greatly reduced; an attention mechanism addresses scene complexity and highlights the semantic information of the target region; the image segmentation result is improved; and the robustness of the algorithm is greatly enhanced.
As shown in FIG. 4, the method for semantically segmenting the coal mine underground image comprises the following steps:
Step 1) acquiring underground images, performing annotation preprocessing on the image data, and dividing the annotated image data into a training sample data set and a test sample data set.
The specific process is as follows:
Step 11) acquiring clear images with an underground explosion-proof camera.

Step 12) performing semantic segmentation annotation on the acquired images, i.e. assigning a class to each pixel in the image, so that different regions of the image are separated from one another, each region defined by its semantic information.

Step 13) randomly dividing the annotated images into a training sample set and a test sample set at a ratio of 4:1.
Step 2) the training sample data set obtained in step 1) is input into a feature extraction network with ResNet-101 as its backbone to extract input image features; the down-sampling operations of the fourth and fifth of the five feature extraction stages in ResNet-101 are removed, and the rest of the fourth and fifth stages is retained so that the feature maps are 1/8 the size of the input image.
The specific process is as follows:
Step 21) ResNet-101 serves as the backbone network for feature extraction; ResNet-101 is divided into five stages, each composed of basic residual units (RCU), which extract features of the input image to obtain output feature maps of different levels.

Step 22) in the five stages of the feature extraction network ResNet-101, each stage comprises multiple channels, and the information contained in each channel matters differently for semantic segmentation; a channel attention mechanism is therefore added to each stage, assigning each channel a weight between 0 and 1 to represent its importance.

Step 23) the down-sampling operations of the fourth and fifth stages are removed: in the existing ResNet-101 the receptive field of the fourth- and fifth-stage feature maps grows steadily with convolution and down-sampling, and the detail information of small targets is gradually lost; removing the down-sampling of the fourth and fifth stages in step 23) enriches the detail information.

Step 24) dilated convolution preserves the output feature maps of the fourth and fifth stages, so that the feature maps of the third, fourth and fifth stages all have the same size, 1/8 that of the input image.
Step 3) in the fourth and fifth stages, where the down-sampling has been removed, the 1/8-size feature maps extracted in step 2) are enhanced by multi-scale input, and the enhanced feature maps are output.
The specific process is as follows:
Step 31) the receptive field grows with successive convolution and down-sampling, and the detail information of small targets is gradually lost; to obtain more detail information, multi-scale input is adopted: an additional input image is fed into a basic residual unit (RCU), whose structure is shown in Fig. 1, to obtain the additional input feature maps of the fourth and fifth stages. These additional input feature maps have undergone feature extraction only once and are low-level feature maps. In the ResNet-101 network, the input of every stage except the first is the output feature map of the previous stage, so the inputs of the fourth and fifth stages are high-level feature maps containing less detail information than the low-level ones.

Step 32) the additional input feature maps of the fourth and fifth stages obtained in step 31) are fused with the input feature maps of the fourth and fifth stages of ResNet-101 respectively, making full use of the shallow feature maps to enrich the small-target information in the deep feature maps.

Step 33) the multi-scale input process is: assume that stage i of the ResNet-101 network contains L_i convolution layers; the j-th layer is defined as y_j = M_j(x_j), where y_j is the output tensor of layer j and M_j comprises a convolution, a ReLU activation function and a regularization operation. The input x_i of stage i has size (N, H_i, W_i, C_i), where N denotes the batch size, H_i and W_i the height and width of the input feature map, and C_i the number of channels. The output feature map F_i of stage i can be expressed as

F_i = M_{L_i}(M_{L_i-1}(... M_1(x_i) ...))   (1)

Step 34) I_i denotes the additional input of stage i, at the same resolution as the output tensor of stage i-1. The feature map after feature extraction is

F_i^I = RCU(I_i)   (2)

Step 35) the fused input of stage i can be expressed as

x_i = cat(F_{i-1}, F_i^I)   (3)

where F_{i-1} is the output tensor of stage i-1 and cat(·) denotes the channel concatenation operation.

Step 36) the fifth stage outputs the high-level feature map χ_h and the fourth stage the low-level feature map χ_l.
Step 4) a fusion attention module is constructed to fuse the 1/8-size feature maps obtained in the fourth and fifth stages in step 3) and output a new feature map containing global context semantic information, as shown in Fig. 2;
the specific process is as follows:
Step 41) constructing the fusion attention module: the module has two inputs, the high-level feature map χ_h ∈ R^(C_h×H_h×W_h) output by the fifth stage in step 36) and the low-level feature map χ_l ∈ R^(C_l×H_l×W_l) output by the fourth stage. H_h×W_h is the number of spatial positions of the high-level feature map χ_h, and H_l×W_l is the number of spatial positions of the low-level feature map χ_l; C_h and C_l are the channel counts of χ_h and χ_l respectively. A 1×1 convolution W_θ performs the feature transformation of the low-level feature map χ_l into ε_l ∈ R^(Ĉ×H_l×W_l), where Ĉ is the number of channels of the converted features and R denotes the real numbers. The result of the feature transformation is given by formula (4):

ε_l = W_θ(χ_l)   (4)

Step 42) the transformed result ε_l is regularized with the softmax function to obtain f(ε_l).

Step 43) a bottleneck feature transformation processes f(ε_l) to obtain the channel dependencies: the 1×1 convolutions W_γ1 and W_γ2 convert the features toward χ_h and produce the attention output O_F, as in formula (5):

O_F = W_γ2 ReLU(LN(W_γ1(f(ε_l))))   (5)

The output O_F reflects the compensation of χ_l to χ_h; this compensation is selected from all positions of χ_l.

Step 44) the finally output fused feature map Y_F is:

Y_F = cat(O_F, χ_h)   (6)
Step 5) a global attention module is constructed after the fifth stage of the ResNet-101 network, as shown in Fig. 3; it enhances the global representation of the new feature map obtained in step 4) and obtains the long-range dependencies among features of different levels, producing the final fused feature map.
The specific process is as follows:
Step 51) a global attention enhancement block is constructed to capture the long-range dependencies that are crucial for semantic segmentation. Let the input feature be X ∈ R^(C×H×W), where C, H and W are the number of channels, the spatial height and the spatial width; a 1×1 convolution W_θ transforms the feature X:

θ = W_θ(X)   (7)

Step 52) the transformed result θ is regularized with the softmax function to obtain f(θ).

Step 53) the output of the attention module is obtained through the 1×1 convolutions W_γ1 and W_γ2 with the intermediate normalization and ReLU function, as in formula (8):

O_G = W_γ2 ReLU(LN(W_γ1(f(θ))))   (8)

Step 54) the final output feature map Y_G ∈ R^(C×H×W) can be expressed as:

Y_G = cat(O_G, X)   (9)
Step 6) the fused output feature map obtained in step 5) is input into a pre-trained classifier to generate the semantic map; the test sample data set obtained in step 1) is then input into the trained network to check its performance.

Step 61) the final output feature map Y_G obtained in step 5) is input into the classifier to generate the channel semantic segmentation feature map.

Step 62) the generated feature map is compared with the ground-truth label images annotated in step 1) to supervise the training of the network model parameters, yielding a trained network model; the test sample data set obtained in step 1) is fed as input images into the trained network model to check its performance.

Step 63) the model parameters trained in step 62) are loaded, and scene semantic analysis is performed on the next batch of photos taken underground.
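To tie the embodiment together, the following hypothetical assembly combines the sketches above (the dilated backbone, RCU, FusionAttention and GlobalAttention) into the Fig. 4 pipeline. The 1×1 projections that restore the channel counts expected by the stock torchvision stages after concatenation are an assumption, as the text does not describe how channel growth is handled; channel sizes follow torchvision's ResNet-101 (stage 4 output: 1024, stage 5 output: 2048).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet101

class MineSegNet(nn.Module):
    """Hypothetical end-to-end assembly of the modules sketched above."""
    def __init__(self, num_classes: int = 8):
        super().__init__()
        b = resnet101(replace_stride_with_dilation=[False, True, True])
        self.stem = nn.Sequential(b.conv1, b.bn1, b.relu, b.maxpool)
        self.stage2, self.stage3 = b.layer1, b.layer2
        self.stage4, self.stage5 = b.layer3, b.layer4
        self.rcu4, self.rcu5 = RCU(3, 64), RCU(3, 64)
        # 1x1 projections restore the channel counts the stock stages expect
        # after the extra RCU branch is concatenated (an assumption).
        self.proj4 = nn.Conv2d(512 + 64, 512, 1)
        self.proj5 = nn.Conv2d(1024 + 64, 1024, 1)
        self.fuse = FusionAttention(c_low=1024, c_high=2048, c_mid=256)
        self.glob = GlobalAttention(c=4096, c_mid=256)
        self.classifier = nn.Conv2d(8192, num_classes, 1)  # semantic map head

    def forward(self, img):
        x = self.stage3(self.stage2(self.stem(img)))        # stage 3, 1/8 size
        extra = F.interpolate(img, size=x.shape[-2:],
                              mode="bilinear", align_corners=False)
        c4 = self.stage4(self.proj4(torch.cat([x, self.rcu4(extra)], 1)))
        c5 = self.stage5(self.proj5(torch.cat([c4, self.rcu5(extra)], 1)))
        y = self.glob(self.fuse(c4, c5))                    # Y_F then Y_G
        return F.interpolate(self.classifier(y), size=img.shape[-2:],
                             mode="bilinear", align_corners=False)
```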
Claims (7)
1. A semantic segmentation method for underground coal mine images is characterized by comprising the following steps:
Step 1, acquiring underground pictures, annotating and preprocessing the picture data, and dividing the annotated image data into a training sample data set and a test sample data set;

Step 2, inputting the training sample data set into a feature extraction network to extract features of the input pictures, wherein the feature extraction network is an improved ResNet-101 in which the original down-sampling operations of the fourth and fifth stages are removed while the other content of the fourth and fifth stages is retained;

Step 3, in the fourth stage of the improved ResNet-101, inputting the feature map output by the third stage together with an additional input feature map through multi-scale input, and outputting a low-level feature map; in the fifth stage, inputting the feature map output by the fourth stage together with an additional input feature map through multi-scale input, and outputting a high-level feature map; each additional input feature map is obtained by compressing the original input picture to the size of the previous stage's output feature map and processing it with a residual unit;

Step 4, constructing a fusion attention module after the fifth stage of the improved ResNet-101, fusing the low-level feature map and the high-level feature map with the fusion attention module, and outputting a new feature map containing global context semantic information;

Step 5, constructing a global context enhancement module after the fusion attention module and enhancing the global representation of the new feature map, so as to obtain the long-range dependencies between all pixels in the feature map and produce the final fused feature map;

Step 6, inputting the final fused feature map into a pre-trained classifier to generate a semantic map, evaluating the generated semantic map on the test sample data set to check the performance of the feature extraction network; if the performance meets the standard, performing semantic segmentation of underground coal mine images, and retraining otherwise;

Step 7, performing semantic segmentation on input underground coal mine images with the trained feature extraction network.
2. The method for semantic segmentation of underground coal mine images according to claim 1, wherein the specific process of step 1) is as follows:

Step 11) acquiring clear images with an underground explosion-proof camera;

Step 12) performing semantic segmentation annotation on the acquired images, i.e. assigning a class to each pixel in the image, so that different regions of the image are separated from one another, each region defined by its semantic information;

Step 13) randomly dividing the annotated images into a training sample set and a test sample set at a ratio of 4:1.
3. The method for semantic segmentation of underground coal mine images according to claim 1, wherein the specific process of step 2) is as follows:

Step 21) the network is an improvement on the original ResNet-101; the improved ResNet-101 is divided into five stages in total and extracts features of the input image to obtain output feature maps of different levels;

Step 22) each of the five stages of the improved ResNet-101 comprises multiple channels, and the information contained in each channel matters differently for semantic segmentation; a channel attention mechanism is therefore added to each stage, assigning each channel a weight between 0 and 1 to represent its importance;

Step 23) the down-sampling operations of the fourth and fifth stages are removed to enrich detail information: in the conventional ResNet-101 the receptive field of the fourth- and fifth-stage feature maps grows steadily with convolution and down-sampling, and the detail information of small targets in the feature maps is gradually lost;

Step 24) dilated convolution preserves the output feature maps of the fourth and fifth stages, so that the feature maps of the third, fourth and fifth stages all have the same size, 1/8 that of the input image.
4. The method for semantic segmentation of underground coal mine images according to claim 3, wherein the specific process of step 3) is as follows:

Step 31) since the receptive field grows with successive convolution and down-sampling, the detail information of small targets is gradually lost. To obtain more detail information, multi-scale input is adopted: a basic residual unit is added at the input of each of the fourth and fifth stages of the improved ResNet-101, and an additional 1/8-size input image is fed directly into each basic residual unit to obtain the additional input feature maps of the fourth and fifth stages. These additional input feature maps have undergone feature extraction only once and are low-level feature maps. In the improved ResNet-101, the input of every stage except the first is the output feature map of the previous stage, so the inputs of the fourth and fifth stages are high-level feature maps containing less detail information than the low-level ones;

Step 32) the additional input feature maps processed by the basic residual units are fused with the normal input feature maps of the fourth and fifth stages respectively, making full use of the shallow feature maps to enrich the small-target information in the deep feature maps;

Step 33) the 1/8-size feature maps are enhanced with multi-scale input as follows: assume that stage i of the ResNet-101 network contains L_i convolution layers; the j-th layer is defined as y_j = M_j(x_j), where y_j is the output tensor of layer j and M_j comprises a convolution, a ReLU activation function and a regularization operation. The input x_i of stage i has size (N, H_i, W_i, C_i), where N denotes the batch size, H_i and W_i the height and width of the input feature map, and C_i the number of channels. The output feature map F_i of stage i can be expressed as

F_i = M_{L_i}(M_{L_i-1}(... M_1(x_i) ...))   (1)

Step 34) I_i denotes the additional input of stage i, whose resolution is the same as the output tensor of stage i-1; the feature map after feature extraction is

F_i^I = RCU(I_i)   (2)

Step 35) the fused input of stage i is expressed as

x_i = cat(F_{i-1}, F_i^I)   (3)

where F_{i-1} is the output tensor of stage i-1 and cat(·) denotes the channel concatenation operation;

Step 36) the fifth stage outputs the high-level feature map χ_h and the fourth stage the low-level feature map χ_l.
5. The method for semantic segmentation of underground coal mine images according to claim 4, wherein the specific process of step 4) is as follows:

Step 41) constructing the fusion attention module: the module has two inputs, the high-level feature map χ_h ∈ R^(C_h×H_h×W_h) output by the fifth stage in step 36) and the low-level feature map χ_l ∈ R^(C_l×H_l×W_l) output by the fourth stage. H_h×W_h is the number of spatial positions of the high-level feature map χ_h, and H_l×W_l is the number of spatial positions of the low-level feature map χ_l; C_h and C_l are the channel counts of χ_h and χ_l respectively. A 1×1 convolution W_θ performs the feature transformation of the low-level feature map χ_l into ε_l ∈ R^(Ĉ×H_l×W_l), where Ĉ is the number of channels of the converted features and R denotes the real numbers. The result of the feature transformation is given by formula (4):

ε_l = W_θ(χ_l)   (4)

Step 42) the transformed result ε_l is regularized with the softmax function to obtain f(ε_l);

Step 43) a bottleneck feature transformation processes f(ε_l) to obtain the channel dependencies: the 1×1 convolutions W_γ1 and W_γ2 convert the features toward χ_h and produce the attention output O_F, as in formula (5):

O_F = W_γ2 ReLU(LN(W_γ1(f(ε_l))))   (5)

The output O_F reflects the compensation of χ_l to χ_h; this compensation is selected from all positions of χ_l;

Step 44) the finally output fused feature map Y_F is:

Y_F = cat(O_F, χ_h)   (6).
6. The method for semantic segmentation of underground coal mine images according to claim 1, wherein the specific process of step 5) is as follows:

Step 51) a global attention module is constructed after the fifth stage of the ResNet-101 network to capture the long-range dependencies that are crucial for semantic segmentation. Let the input feature be X ∈ R^(C×H×W), where C, H and W are the number of channels, the spatial height and the spatial width; a 1×1 convolution W_θ transforms the feature X:

θ = W_θ(X)   (7)

Step 52) the transformed result θ is regularized with the softmax function to obtain f(θ);

Step 53) the output of the attention module is obtained through the 1×1 convolutions W_γ1 and W_γ2 with the intermediate normalization and ReLU function, as in formula (8):

O_G = W_γ2 ReLU(LN(W_γ1(f(θ))))   (8)

Step 54) the final output feature map Y_G ∈ R^(C×H×W) is expressed as:

Y_G = cat(O_G, X)   (9).
7. The method for semantic segmentation of underground coal mine images according to claim 1, wherein the specific process of step 6) is as follows:

Step 61) the final output feature map Y_G obtained in step 5) is input into the classifier to generate the channel semantic segmentation feature map;

Step 62) the generated feature map is compared with the ground-truth label images annotated in step 1) to supervise the training of the feature extraction network parameters, yielding a trained network model; the test sample data set obtained in step 1) is fed as input images into the trained network model to check its performance;

Step 63) the trained model parameters are loaded, and scene semantic analysis is performed on the next batch of pictures taken underground.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111248280.4A | 2021-10-26 | 2021-10-26 | Coal mine underground image semantic segmentation method |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111248280.4A | 2021-10-26 | 2021-10-26 | Coal mine underground image semantic segmentation method |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114170422A | 2022-03-11 |
Family

ID=80477308

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111248280.4A | Coal mine underground image semantic segmentation method (Pending) | 2021-10-26 | 2021-10-26 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114170422A (en) |
Cited By (3)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN115700781A | 2022-11-08 | 2023-02-07 | Visual positioning method and system based on image inpainting in dynamic scene |
| CN116363134A | 2023-06-01 | 2023-06-30 | Method and device for identifying and dividing coal and gangue and electronic equipment |
| CN116363134B | 2023-06-01 | 2023-09-05 | Method and device for identifying and dividing coal and gangue and electronic equipment |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |