CN109740608B - Image segmentation method based on deep learning

Info

Publication number: CN109740608B
Application number: CN201811627300.7A
Authority: CN
Inventors: 刘博�; 刘银星
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2021-02-02
Anticipated expiration: 2038-12-28
Also published as: CN109740608A

Abstract

The invention discloses an image segmentation method based on deep learning. First, image data enhancement and extraction techniques are used to expand the data set. Next, the data are divided for 5-fold training, taking into account the underground depth at which each image was acquired. Finally, an improved classification network is used as the encoder and an improved FPN network structure as the decoder, and the model is trained on open-source data from TGS.

Description

Image segmentation method based on deep learning

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to image processing, image semantic segmentation, and deep-learning-based image segmentation methods.

Background

With the development of artificial intelligence technology, applications of computer vision have become increasingly common. Image segmentation is an essential step in computer vision applications, and image semantic segmentation is a foundational technology for image understanding, with great significance in medical image research, geological image research, automatic driving systems, modern industry and other fields. For example, in regions of the earth where large amounts of oil and gas are concentrated, large salt deposits tend to form below the surface, and these salt deposits exist underground in the form of high-temperature liquids. Before drilling, the geology is rigorously surveyed in order to determine their location. Subsurface salt layers and rock formations can be captured in images by seismic imaging techniques, and their specific locations are then identified by studying these images. Unfortunately, it is very difficult to mark out the specific locations of different geological structures in these specialized geological images. In addition, these geological images, which are derived from acoustic feedback, typically have to be labeled by skilled specialists. The results are therefore highly subjective, which creates great difficulty for 3D rendering of seismic images. More seriously, if drilling proceeds blindly without an accurate assessment, the salt layer can flow out and erupt, which endangers the drilling personnel and equipment of oil and gas companies and can cause huge economic losses.

Image semantic segmentation is a very important research direction in computer vision. With the development of deep learning, both classification and semantic segmentation tasks have improved greatly, and semantic segmentation can be regarded as a pixel-level classification task on an image.

In 1985, Hinton et al. proposed the back propagation algorithm, which made training neural networks simple and feasible. In classification, LeNet5 marked the true debut of CNNs in 1998. AlexNet was proposed in 2012 and won that year's ImageNet competition. In 2014, GoogLeNet used the Inception structure, which not only deepened but also "widened" the network structure. In 2015, He et al. proposed the deep residual network ResNet, which uses residual units and allows networks with as many as 1000 layers. In 2016, DenseNet, proposed by Huang et al., used dense connections and needed only half the parameters and computation of ResNet while achieving equivalent classification accuracy. In 2017, Squeeze-and-Excitation Networks (SENet), proposed by the Chinese autonomous driving company Momenta, adopted a "feature recalibration" strategy and won the ImageNet image classification task that year. In segmentation, the fully convolutional network (FCN) proposed by Long et al. in 2014 enabled dense pixel-level classification in convolutional neural networks without fully connected layers. The U-Net network proposed in 2015 uses an encoder-decoder structure and introduces skip connections between the encoder and the decoder, so that the detailed information of an object can be better recovered; it performs especially well on medical image segmentation. In 2017, the Feature Pyramid Network (FPN) proposed by Tsung-Yi Lin et al. exploited both the high resolution of low-level features and the rich semantic information of high-level features, and obtained better predictions by fusing the feature information of different layers. These deep-learning-based classification and segmentation networks have achieved good results on natural images as well as on medical images.
However, when these networks are applied to segmentation of geological salt layer images, good results are difficult to obtain. The main reason is that geological images are affected by the depth of the stratum, so imaging results at different depths are inconsistent. A second reason is that image data are scarce, which makes deep network learning difficult and degrades segmentation performance.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a geological salt layer image segmentation method based on deep learning. A classification network is used as the encoder, an FPN network structure is used as the decoder, and open-source data from TGS (a world-leading geoscience data company) is used for model training.

The invention provides a salt layer segmentation method based on the deep learning networks SENet154 and FPN for cases where only a small amount of geological salt layer image data is available. In the data preprocessing stage, image enhancement techniques such as horizontal flipping and random brightness changes are applied to the original images to expand the original data set. Data are then extracted according to the underground depth distribution of the images: the data are divided into 5 intervals by geological depth, and each interval is subdivided by area, for 5-fold training. In this way, the data obtained each time are approximately normally distributed over underground depth, which further improves the generalization ability of the model. For the network structure, an encoder-decoder structure similar to U-Net is adopted, with a deep network of stronger classification capability in the encoding stage. Examples of deep classification networks that may be used in the encoding stage are SENet154, SE-ResNet101, ResNet152, etc. Deeper classification networks are adopted in order to learn as much information as possible from geological salt layer images. In addition, pre-trained models are used to initialize the network parameters, which speeds up the learning of low-level semantic information in the images and thus the convergence of the model. In the decoding stage, an FPN network is adopted; it makes predictions by fusing features of different layers, using both the high resolution of low-level features and the rich semantic information of high-level features.
Prediction is carried out separately on each fused feature layer, and the results are then fused. In summary, the underground depth of the geological salt layer images is taken into account so that the data in each training run tend toward a normal distribution, and a deeper network model and a network structure with repeated feature reuse are adopted. Therefore, although the geological image data set is small, as much of the information in the images as possible can be learned, and the semantic segmentation of geological salt layer images can be carried out efficiently and accurately.

To achieve the above purpose, the invention adopts the following technical scheme. The method is preferably implemented in Python. For image enhancement, OpenCV (an open-source computer vision library) is used to apply horizontal flipping, random brightness and contrast changes, random scaling followed by cropping, translation, random Gaussian noise and the like to the original image data, and an enhanced data set is output. On this basis, the data are divided into 5 intervals according to the underground depth at image acquisition (seismic imaging captures geology at different underground depths) for 5-fold training. The SENet154 classification network of the encoding stage is then implemented with the open-source deep learning framework PyTorch and initialized with SENet154 parameters previously trained on ImageNet by others. The FPN network of the decoding stage is likewise implemented in PyTorch, which completes the model. Finally, 5-fold training is carried out on the data set to select the optimal model parameters.

A geological salt layer image segmentation method based on deep learning comprises the following steps:

Step 1, acquiring geological salt layer image data sets in related fields, and cleaning the data.

Step 2, enhancing the original data by using an image enhancement technology, increasing the number of samples and enriching the data content.

Step 3, dividing the data into 5 intervals according to the underground depth of the data set during collection, and dividing each interval into 5 equal parts according to the area.

Step 4, building a model, wherein a SENet154 network is preferred in the encoding stage, and an FPN network is used in the decoding stage.

Step 5, performing 5-fold training according to the 5 data sets obtained in step 3, and voting to select an optimal result.

Preferably, step 2 specifically comprises the following steps:

Step 2.1, data form enhancement: the original image and its mask label are scaled in length and width by a certain proportion and mirrored (flipped horizontally). A translation operation is applied to the original image and its mask label, and the image area exposed by the translation is filled with edge pixels;

Step 2.2, data spatial-domain enhancement: brightness and contrast are randomly changed to a small degree so that the physical meaning of the image is preserved; experiments show this data enhancement to be effective. Median filtering, Gaussian filtering, etc. may also be applied;
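The geometric part of step 2.1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it uses NumPy instead of OpenCV to stay dependency-light, assumes 2D grayscale arrays, and the function names `mirror` and `translate_with_edge_fill` are hypothetical. The key point it shows is that the image and its mask must receive the same transform.

```python
import numpy as np

def mirror(image, mask):
    """Flip image and mask left-to-right together (2D arrays assumed)."""
    return image[:, ::-1].copy(), mask[:, ::-1].copy()

def translate_with_edge_fill(image, mask, dx, dy):
    """Shift both arrays by (dx, dy) pixels; exposed area is filled with edge pixels.

    Positive dx moves content right, positive dy moves content down.
    """
    h, w = image.shape
    pad = max(abs(dx), abs(dy))

    def shift(a):
        # Replicate border pixels outward, then cut the shifted window back out.
        padded = np.pad(a, pad, mode="edge")
        y0, x0 = pad - dy, pad - dx
        return padded[y0:y0 + h, x0:x0 + w]

    return shift(image), shift(mask)
```

With OpenCV, the same effect is usually obtained with `cv2.flip` and a replicated border, but the exact calls used by the inventors are not stated in the text.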

Preferably, step 3 specifically comprises the following steps:

Step 3.1, according to the principle of geological imaging, images acquired at different depths have a certain correlation in the spatial domain, so the data set is divided into five parts according to geological depth;

Step 3.2, in order to improve the generalization capability of the deep model, each of the 5 parts obtained in step 3.1 is further divided into 5 parts according to the area of the labeled salt layer region;

Step 3.3, the data of the two steps are integrated so that each fold contains data from all five depth intervals and all 5 salt layer area intervals;
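One way to realize steps 3.1 to 3.3 is sketched below in plain Python. The text does not say how interval boundaries are chosen or how samples are dealt into folds, so two assumptions are made here: intervals are rank-based (near-equal sample counts per interval), and samples within each (depth interval, area interval) cell are dealt round-robin into folds. The names `quantile_bins` and `assign_folds` are hypothetical.

```python
from collections import defaultdict

def quantile_bins(values, n_bins):
    """Rank-based bin index (0..n_bins-1) per value, bins of near-equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def assign_folds(depths, areas, n_folds=5, n_bins=5):
    """Return (fold, depth_bin, area_bin) lists, one entry per sample."""
    # Step 3.1: split by acquisition depth.
    depth_bin = quantile_bins(depths, n_bins)

    # Step 3.2: inside each depth interval, split by labeled salt area.
    by_depth = defaultdict(list)
    for i, d in enumerate(depth_bin):
        by_depth[d].append(i)
    area_bin = [0] * len(depths)
    for members in by_depth.values():
        local = quantile_bins([areas[i] for i in members], n_bins)
        for i, b in zip(members, local):
            area_bin[i] = b

    # Step 3.3: deal each (depth, area) cell round-robin into the folds,
    # so every fold sees every depth interval and every area interval.
    cells = defaultdict(list)
    for i in range(len(depths)):
        cells[(depth_bin[i], area_bin[i])].append(i)
    folds = [0] * len(depths)
    for members in cells.values():
        for k, i in enumerate(members):
            folds[i] = k % n_folds
    return folds, depth_bin, area_bin
```

Each fold then serves once as the validation set while the other four are used for training.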

Preferably, step 4 specifically comprises the following steps:

Step 4.1, the encoding stage of the deep model uses SENet154 as the base structure. SENet154 uses the SE-block structure, which highlights effective features, and its deeper network can provide better features;

Step 4.2, decoding stage design: the conventional FPN is modified for the semantic segmentation of geological images. The FPN fuses features of multiple resolutions, which improves the segmentation of small-area salt layers to a certain extent;

Step 4.3, a hypercolumn module is introduced on top of the FPN to further fuse features of multiple resolutions;

Step 4.4, a global average pooling layer and a classification head are added at the end of the encoder, introducing a classification auxiliary loss into the segmentation network;

Step 4.5, a segmentation auxiliary loss is introduced at each resolution level of the decoder to further adjust the training of the parameters at each level;
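The hypercolumn idea in step 4.3 is commonly understood as resizing the feature maps of all resolution levels to a common size and concatenating them along the channel axis. The sketch below illustrates that reading with NumPy; it assumes nearest-neighbor upsampling, square maps whose sizes divide the finest one, and the hypothetical names `upsample_nearest` and `hypercolumn`. The patent does not specify the interpolation method or where in the network the module sits.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling: (C, H, W) -> (C, H*factor, W*factor)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def hypercolumn(features):
    """Stack multi-resolution features into one tensor.

    features: list of (C_i, H_i, W_i) arrays, finest resolution first.
    Returns an array of shape (sum(C_i), H_0, W_0).
    """
    target_h = features[0].shape[1]
    resized = [upsample_nearest(f, target_h // f.shape[1]) for f in features]
    return np.concatenate(resized, axis=0)
```

A final prediction layer applied to this stacked tensor then sees, for every pixel, features from every resolution level at once.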

Compared with the prior art, the invention has the following obvious advantages:

First, a deep learning model generally needs a large data set for training in order to find an optimal model, whereas geological salt layer image data sets are smaller than natural image or medical image data sets. Secondly, in the encoding stage of the model, a deeper classification network, such as the 154-layer SENet network, is adopted. It applies a "feature recalibration" strategy: the importance of each feature channel is acquired through learning, useful features are then enhanced and features that are not useful for the current task are suppressed according to this importance, and the interdependence between feature channels is modeled explicitly. When the parameters are initialized, loading a pre-trained model allows low-level semantic information to be learned quickly and accelerates the convergence of the model. In the decoding stage, multi-resolution features are further fused by means of an FPN network, which clearly improves the segmentation of small-area salt layers; the FPN structure also effectively reduces the number of parameters in the decoding stage, reduces GPU memory usage and improves training speed. In summary, the method can segment geological salt layer images effectively and accurately.

Description of the drawings:

FIG. 1 is a flow chart of a method according to the present invention;

FIG. 2 is a 5-fold statistical analysis diagram in data processing according to the present invention;

FIG. 3 is a diagram of a semantic segmentation network architecture designed by the present invention;

FIG. 4 is a schematic diagram of the SENet network according to the present invention;

FIG. 5 is a schematic diagram of an FPN network according to the present invention;

Detailed Description

The present invention will be described in further detail below with reference to specific embodiments and with reference to the attached drawings.

The hardware used by the invention comprises one PC and one 1080 graphics card.

As shown in FIG. 1, the invention provides a geological salt layer image segmentation method based on deep learning, which specifically comprises the following steps:

Step 1, acquiring geological salt layer image data sets of related fields, and performing a first cleaning of the data (for example, removing dirty data).

Step 2, enhancing the original data by using an image enhancement technology so as to increase the number of samples and enrich the content of the data set.

Step 2.1, form enhancement: the original image and its mask label are scaled by a certain proportion and then cropped to the size required by the semantic segmentation network, which distorts the original sample to a certain extent; multiple scales are fused during prediction;

Step 2.2, form enhancement: the original image and its mask label are mirrored. The geological image reflects the actual underground geological conditions, has a certain depth and has practical significance. In addition, a translation operation is performed, and the image area exposed by the translation can be filled with edge pixels;

Step 2.3, image spatial-domain enhancement: experiments show that randomly changing the brightness and contrast to a small extent preserves the physical meaning of the image and thus provides effective data enhancement;
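The small random brightness and contrast change of step 2.3 might look like the sketch below. The jitter ranges (10 percent here), the use of the image mean as the contrast pivot, and the function name `random_brightness_contrast` are assumptions for illustration; the text only says the change is small. Only the image is altered, since the mask label does not depend on pixel intensity.

```python
import numpy as np

def random_brightness_contrast(image, rng, max_brightness=0.1, max_contrast=0.1):
    """Jitter a float image in [0, 1]; returns a new array clipped to [0, 1].

    rng is a numpy.random.Generator so that results are reproducible.
    """
    alpha = 1.0 + rng.uniform(-max_contrast, max_contrast)  # contrast gain
    beta = rng.uniform(-max_brightness, max_brightness)     # brightness offset
    mean = image.mean()
    # Scale deviations from the mean (contrast), then shift everything (brightness).
    out = (image - mean) * alpha + mean + beta
    return np.clip(out, 0.0, 1.0)
```

A typical call is `random_brightness_contrast(img, np.random.default_rng(seed))`, applied independently to each training sample.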

Step 3, the data set is statistically divided into 5 intervals according to the underground depth at collection, and each interval is randomly divided into 5 equal parts.

Step 3.1, according to the principle of geological imaging, images acquired at different depths have a certain correlation in the spatial domain, so the data set is divided into five equal parts according to underground depth;

Step 3.2, in order to improve the generalization capability of the deep model, each of the 5 parts obtained in step 3.1 is further divided into 5 parts according to the area of the labeled salt layer region.

Step 3.3, the two steps are integrated so that each fold contains data from all five depth intervals and all 5 salt layer area intervals.

FIG. 2 is the 5-fold statistical analysis graph for data processing according to the invention. The abscissa represents the underground depth at which the training-set and test-set images were collected, and the ordinate represents the frequency with which each depth occurs. The data are split into 5 folds according to this distribution, so that the overall distribution is close to a normal distribution.

As shown in FIG. 3, the overall semantic segmentation network for geological salt layer images consists of two major modules, an encoding module and a decoding module.

Step 4.1, the encoding stage of the deep model uses SENet154 as the base structure. SENet uses the SE-block structure, which highlights effective features, and a deeper network can provide better features.

FIG. 4 shows the SE-block schematic of SENet. The SENet network uses two operations, "Squeeze" and "Excitation", to realize a "feature recalibration" strategy: the importance of each feature channel is acquired automatically through learning, and according to this importance useful features are enhanced and features that are not useful for the current task are suppressed.
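The Squeeze and Excitation operations can be illustrated with a forward pass in NumPy. This sketch follows the standard SE-block layout (global average pooling, a bottleneck of two fully connected layers with ReLU and sigmoid, then channel-wise rescaling). The weights are passed in as plain arrays, the reduction ratio is whatever the weight shapes imply, and the function name `se_block` is hypothetical; a real model would implement this as a trainable PyTorch module.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Forward pass of one SE block.

    x:  (C, H, W) feature map
    w1: (C // r, C), b1: (C // r,)   reduction layer
    w2: (C, C // r), b2: (C,)        expansion layer
    Returns the recalibrated feature map and the per-channel gates.
    """
    # Squeeze: one descriptor per channel via global average pooling.
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid, giving gates in (0, 1).
    h = np.maximum(w1 @ z + b1, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))
    # Recalibration: scale each channel by its learned importance.
    return x * s[:, None, None], s
```

Channels whose gate is near 1 pass through almost unchanged, while channels with a gate near 0 are suppressed.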

Step 4.2, decoding stage design: an FPN network structure capable of fusing multi-layer features is adopted for the semantic segmentation task. The FPN network fuses multi-resolution features, which helps in segmenting small salt layer regions; in addition, the FPN structure effectively reduces the number of parameters and the GPU memory usage of the decoding stage.

Step 4.3, in addition, a hypercolumn module is introduced into the FPN network to further fuse the multi-resolution features.

Step 4.4, finally, a global average pooling layer and a classifier are added at the end of the encoder, introducing a classification auxiliary loss into the segmentation network. At each resolution level of the decoder, a segmentation auxiliary loss is introduced to further adjust the training of the parameters at each level.
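How the main loss and the two kinds of auxiliary loss might be combined is sketched below. Several choices here are assumptions not stated in the text: binary cross-entropy is used for every term, the classification target is taken to be "does the image contain any salt", the mask is downsampled by striding to match each decoder level, and the weights `w_cls` and `w_aux` are arbitrary illustrative values.

```python
import numpy as np

def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))

def total_loss(final_prob, level_probs, cls_prob, mask, w_cls=0.05, w_aux=0.1):
    """Main segmentation loss plus classification and per-level auxiliary losses.

    final_prob:  (H, W) predicted salt probability at full resolution
    level_probs: list of (H/f, W/f) predictions, one per decoder level
    cls_prob:    scalar probability from the encoder's classification head
    mask:        (H, W) ground-truth mask with values 0.0 or 1.0
    """
    main = bce(final_prob, mask)
    # Assumed classification target: 1 if the image contains any salt pixel.
    has_salt = float(mask.any())
    cls = bce(np.array([cls_prob]), np.array([has_salt]))
    aux = 0.0
    for p in level_probs:
        factor = mask.shape[0] // p.shape[0]
        aux += bce(p, mask[::factor, ::factor])  # strided downsampling of the mask
    return main + w_cls * cls + w_aux * aux
```

The auxiliary terms give each decoder level and the encoder a direct training signal, rather than relying only on gradients flowing back from the final output.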

FIG. 5 is a schematic diagram of the FPN network according to the present invention. As can be seen from the figure, the FPN network adopts a bottom-up pathway followed by a top-down pathway, joined by lateral connections, so that it both fuses the multi-resolution features of the lower layers and learns the rich semantic information of the deeper layers. In addition, the FPN structure effectively reduces the number of parameters and the GPU memory usage of the decoding stage, and improves training speed.
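The top-down pathway with lateral connections can be sketched in NumPy as follows. This is a simplified illustration of the standard FPN merge, not the modified FPN of the invention: it assumes each encoder level has half the resolution of the previous one, uses nearest-neighbor upsampling, omits the 3x3 smoothing convolutions of the original FPN design, and the names `conv1x1`, `upsample2x` and `fpn_top_down` are hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution as channel mixing. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    """Nearest-neighbor upsampling by a factor of 2 in both spatial axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(encoder_feats, lateral_weights):
    """Merge encoder features from coarse to fine.

    encoder_feats:   list of (C_i, H_i, W_i) arrays, finest first
    lateral_weights: list of (C_out, C_i) arrays, one per level
    Returns merged maps, finest first, all with C_out channels.
    """
    # Lateral connections bring every level to the same channel count.
    laterals = [conv1x1(f, w) for f, w in zip(encoder_feats, lateral_weights)]
    merged = [laterals[-1]]  # start from the coarsest, most semantic level
    for lat in reversed(laterals[:-1]):
        # Add upsampled coarse information to the finer lateral map.
        merged.append(lat + upsample2x(merged[-1]))
    return merged[::-1]
```

Because every merged level shares one small channel count, the decoder stays light compared with a U-Net style decoder that keeps the encoder's full channel widths.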

Step 5, 5-fold training is performed, and the optimal result is output by voting.
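The text does not define the voting rule. One plausible reading is a per-pixel majority vote over the masks predicted by the five fold models, sketched below; the threshold of 0.5 and the function name `majority_vote` are assumptions.

```python
import numpy as np

def majority_vote(prob_maps, threshold=0.5):
    """Per-pixel majority vote over the predictions of several fold models.

    prob_maps: list of (H, W) probability maps, one per fold model
    Returns a uint8 mask that is 1 where more than half the models predict salt.
    """
    votes = np.stack([p > threshold for p in prob_maps]).sum(axis=0)
    return (votes * 2 > len(prob_maps)).astype(np.uint8)
```

An alternative with similar intent is to average the five probability maps before thresholding, which keeps the confidence information of each model.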

The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.

Claims

1. An image segmentation method based on deep learning, directed at the problem of geological image segmentation, comprising the following steps:

step 1, acquiring a geological salt layer image data set, and cleaning the data;

step 2, enhancing the original data by using an image enhancement technology;

the step 2 specifically comprises the following steps: enhancing data form, labeling the original image and the mask thereof, zooming the length and width of the original image and mirroring the original image and the mask according to a preset proportion; carrying out translation operation on the original image and the mask mark thereof, and using edge pixels to fill an image area generated by translation;

step 3, dividing the data set into 5 intervals according to statistics of the underground depth at collection, and dividing each interval into 5 equal parts according to the area;

the step 3 specifically comprises the following steps:

step 3.2, dividing each of the 5 parts obtained in the step 3.1 into 5 parts according to the area of the labeled salt layer region;

step 4, model building, wherein a SENet154 network is preferred in an encoding stage, and an FPN network is used in a decoding stage;

the step 4 specifically comprises the following steps:

step 4.1, in the encoding stage of the deep model, SENet154 is used as the base structure; the SENet154 uses an SE-block structure, which highlights effective features, and its deeper network can provide better features;

step 4.2, designing the decoding stage, and modifying a conventional FPN network for use in the semantic segmentation task of the geological image;