CN117173193A - Attention mechanism-based unsupervised color geological image segmentation method and system

Info

Publication number: CN117173193A
Application number: CN202310870642.6A
Authority: CN
Inventors: 邱芹军; 马凯; 刘俊杰; 杨盈; 段雨希; 谢忠; 陶留锋; 郑帅; 田苗
Original assignee: Shenzhen Planning And Natural Resources Data Management Center Shenzhen Spatial Geographic Information Center; China University of Geosciences; China Three Gorges University CTGU
Current assignee: Shenzhen Planning And Natural Resources Data Management Center Shenzhen Spatial Geographic Information Center; China University of Geosciences; China Three Gorges University CTGU
Priority date: 2023-07-14
Filing date: 2023-07-14
Publication date: 2023-12-05

Abstract

The invention discloses an attention mechanism-based unsupervised color geological image segmentation method and system. First, mathematical morphological preprocessing is applied to a color geological image to obtain a preprocessed image. Next, the preprocessed image is passed to an SE-UNet neural network and to the Felz clustering algorithm to obtain a coarse segmentation image and a superpixel segmentation map, respectively. Finally, the superpixel segmentation map is used as a pseudo label to optimize the coarse segmentation image, and the predicted segmentation result is output by forward propagation, achieving label-free segmentation of color geological images. The invention segments color geological images without any manual labeling and has strong automatic segmentation capability.

Description

Attention mechanism-based unsupervised color geological image segmentation method and system

Technical Field

The invention relates to the technical field of computer application and the technical field of image segmentation, in particular to an unsupervised color image segmentation method and an unsupervised color geological image segmentation system based on a graph clustering algorithm and an attention mechanism.

Background

With the continuous development of technology, the importance of image segmentation in the field of computer vision is increasingly highlighted. Image segmentation refers to the process of dividing an image into a plurality of sub-regions or pixel sets, so that pixels in each sub-region have similar characteristics or semantic information, and the main purpose of the image segmentation is to extract semantic information in the image and divide the image into regions with independent semantics, thereby realizing understanding and interpretation of the image. Image segmentation plays an important role in many application fields, including automatic driving, medical image analysis and the like, and through the image segmentation, accurate extraction and positioning of objects, boundaries and structures in an image can be realized, so that a foundation is provided for subsequent image processing and analysis tasks.

The current image segmentation methods can be divided into three types of methods based on rules, machine learning and deep learning. Rule-based methods use predefined rules and pattern matching for image segmentation, typically relying on manually designed rules and thresholds to divide pixels into different regions. The machine learning method can automatically learn the image segmentation mode and characteristics, learn by means of training data, and segment a new image according to the learned model. The deep learning method learns a higher-level feature representation from the original pixel level of the image by using a deep neural network, thereby realizing a more accurate segmentation effect.

In the field of image segmentation, a great deal of research has been devoted to extracting information from unstructured images. Although this work has achieved some success, existing image segmentation methods still have limitations when applied to geological images. Rule-based segmentation methods often require manual intervention: they segment with manually specified feature rules and rely on visual surface features of the image, such as texture and color, together with external structural features, so their performance and robustness are limited. Because geological images are complex and diverse, supervised machine learning and deep learning algorithms are not well suited to geological image segmentation scenarios. Semi-supervised and unsupervised deep learning segmentation methods also still have limitations when extracting information from highly heterogeneous and complex geological images.

Disclosure of Invention

In order to solve the technical problems, the invention provides an unsupervised color geologic image segmentation method and system based on a graph clustering algorithm and an attention mechanism.

The technical scheme adopted by the method is as follows: an unsupervised color geologic image segmentation method based on an attention mechanism comprises the following steps:

Step 1: performing mathematical morphological preprocessing on the color geological image to obtain a preprocessed image;

Step 2: feeding the preprocessed image to an SE-UNet neural network and to the Felz clustering algorithm to obtain a coarse segmentation image and a superpixel segmentation map, respectively;

the SE-UNet neural network consists of an encoder, a decoder, and skip connections;

the encoder consists of a convolution layer, a ReLU (rectified linear unit) and a pooling layer connected in sequence; the convolution layer slides a convolution kernel over different regions of the input source image to extract a local feature matrix of the image; the ReLU applies a nonlinear mapping to the features extracted by the convolution layer so that they are easier to distinguish; the pooling layer partitions the output features of the ReLU into blocks of the specified pooling size;

the skip connection concatenates the feature map in the encoder with the feature map in the corresponding decoder along the channel dimension, and the concatenated feature map serves as the input of the decoder; an SE attention mechanism is added to the skip connection;

the decoder consists of a convolution layer, a ReLU and a transposed convolution connected in sequence; the transposed convolution restores the position information of the image by upsampling the feature map and combines it with the corresponding low-level features of the encoder to output a segmentation result;

Step 3: using the superpixel segmentation map as a pseudo label, optimizing the coarse segmentation image based on the superpixel segmentation map, and outputting the predicted segmentation result by forward propagation, thereby achieving label-free color geological image segmentation.

The system of the invention adopts the technical proposal that: an attention-based, unsupervised color geologic image segmentation system, comprising:

one or more processors;

and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the attention-based method of unsupervised color geologic image segmentation.

The invention has the following beneficial effects:

1. The invention applies mathematical morphological preprocessing to the image to be segmented, which reduces the influence of irrelevant factors such as noise on segmentation, better preserves the structural and edge information of the image, and improves the quality and segmentability of the geological image.

2. The invention adopts the Felz clustering algorithm and, when judging the similarity of pixels in the image, obtains the edge weight as a weighted sum of color similarity and texture similarity, which enhances the algorithm's ability to discriminate clustering feature information in color geological images.

3. The invention provides an SE-UNet multi-modal information extraction module that embeds the SE attention mechanism in the skip connections between the encoder and decoder of UNet, so that the feature weights in a geological image are learned and adjusted adaptively, important geological features are effectively extracted and strengthened, and the perception and representation capability of image segmentation is improved.

4. The invention provides a segmentation model based on a graph-based clustering algorithm and an attention mechanism for segmenting color geological images; it requires no manual labeling and has strong automatic segmentation capability.

Drawings

The technical solutions herein are further illustrated below with examples and specific embodiments, with reference to the accompanying drawings. A person skilled in the art can derive other drawings and the intent of the present invention from these drawings without inventive effort.

FIG. 1 is a flow chart of a method of an embodiment of the present invention.

FIG. 2 is a schematic representation of a color geological image according to an embodiment of the invention.

FIG. 3 is a schematic representation of color geological image preprocessing according to an embodiment of the invention.

FIG. 4 is a diagram of the SE-UNet neural network according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating the result of superpixel segmentation according to an embodiment of the present invention.

FIG. 6 is a schematic diagram of a segmentation result obtained according to an embodiment of the present invention.

Detailed Description

To help those of ordinary skill in the art understand and practice the invention, it is described in further detail below with reference to the drawings and examples. It should be understood that the examples described herein serve only to illustrate and explain the invention and are not intended to limit it.

Referring to FIG. 1, the attention mechanism-based unsupervised color geological image segmentation method provided by the invention comprises the following steps:

In one embodiment, the mathematical morphological preprocessing is a closing operation, that is, a dilation of the image followed by an erosion.

In one embodiment, for the dilation operation, the input source image is denoted A and the structuring element is denoted B. The dilation of the input source image A by the structuring element B is expressed as:

$$A \oplus B = \{\sigma \mid (\hat{B})_\sigma \cap A \neq \varnothing\}$$

where $\oplus$ denotes the dilation operation, $\hat{B}$ denotes the reflection of the structuring element B about the origin, and $(\hat{B})_\sigma$ denotes $\hat{B}$ translated by the displacement σ. The formula states that the dilation of the source image A by the structuring element B is the set of origin positions of B for which the reflection of B about the origin, translated by σ, intersects A in at least one non-zero element.

In one embodiment, the erosion of the input source image A by the structuring element B is expressed as:

$$A \ominus B = \{\sigma \mid (B)_\sigma \subseteq A\}$$

where $\ominus$ denotes the erosion operation and $(B)_\sigma$ denotes the structuring element B translated by the displacement σ. The formula states that the erosion of the source image A by the structuring element B is the set of all displacements σ for which B, after translation, is still entirely contained in A.

In summary, processing the input source image with the mathematical morphological closing operation can be expressed as:

$$x = A \bullet B = (A \oplus B) \ominus B$$

that is, the input source image is first dilated and then eroded to obtain the preprocessed source image x.
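The closing operation described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions that are not in the patent: the image is binary rather than color, the structuring element is a 3 x 3 square whose origin is its centre, and pixels outside the image count as background. The function names `dilate`, `erode` and `closing` are hypothetical.

```python
def dilate(image, element):
    """Binary dilation: set a pixel if the reflected, shifted element hits the image."""
    h, w = len(image), len(image[0])
    kh, kw = len(element), len(element[0])
    ch, cw = kh // 2, kw // 2  # origin of the structuring element (its centre)
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for m in range(kh):
                for n in range(kw):
                    y, x = i - (m - ch), j - (n - cw)  # reflection of B about the origin
                    if element[m][n] and 0 <= y < h and 0 <= x < w and image[y][x]:
                        out[i][j] = 1
    return out


def erode(image, element):
    """Binary erosion: keep a pixel only if the shifted element fits inside the image."""
    h, w = len(image), len(image[0])
    kh, kw = len(element), len(element[0])
    ch, cw = kh // 2, kw // 2
    out = [[1] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for m in range(kh):
                for n in range(kw):
                    if not element[m][n]:
                        continue
                    y, x = i + (m - ch), j + (n - cw)
                    inside = 0 <= y < h and 0 <= x < w
                    if not inside or not image[y][x]:  # outside counts as background
                        out[i][j] = 0
    return out


def closing(image, element):
    """Closing = dilation followed by erosion with the same structuring element."""
    return erode(dilate(image, element), element)


# A ring with a one-pixel hole in the middle; closing should fill the hole.
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
square = [[1] * 3 for _ in range(3)]
closed = closing(ring, square)
```

On this toy input the closing fills the central hole while leaving the outer shape unchanged, which is the noise-suppressing, structure-preserving behaviour the preprocessing step relies on.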

Please refer to FIG. 2 for the color geological image used in this embodiment, and to FIG. 3 for the effect of preprocessing this image.

Referring to FIG. 4, the SE-UNet neural network of this embodiment consists of a multi-layer encoder module, a multi-layer decoder module, and an attention mechanism-skip connection module;

in one embodiment, the multi-layer encoder module comprises four encoder blocks arranged in series, each encoder block is composed of a convolution layer, a batch normalization layer and a ReLU layer, wherein the convolution kernel in the convolution layer is 3, the step size is 1, and the padding is 1; each convolution layer of the encoder blocks is sequentially added with a batch normalization layer and an activation layer, and the encoder blocks are connected by using a downsampling layer, wherein the downsampling layer refers to a maximum pooling operation, the pooling core size in the maximum pooling operation is 2, and the step length is 2. The input image firstly enters a first encoder block, nonlinear transformation is carried out through a convolution layer with the convolution kernel size of 3 and then a batch normalization layer and a ReLU layer, the processing operation is repeated once, namely, two times of convolution, batch normalization and ReLU operation are carried out, the output characteristic F1 of the encoder block is obtained, the pooling kernel size of 2 is carried out on the characteristic F1, and the maximum pooling operation with the step length of 2 is carried out to obtain a downsampled output characteristic G1; the input of the second encoder block of the encoder module is the characteristic G1 obtained by the downsampling layer of the output characteristic F1 of the first encoder block of the encoder module, and the corresponding output characteristic of the encoder block is F2; the input of the subsequent third encoder and the input of the subsequent fourth encoder are all obtained by downsampling operation on the output characteristics of the last adjacent encoder block, so that the input characteristics of the third encoder and the input characteristics of the fourth encoder are G2 and G3 respectively, and the output of the corresponding encoder blocks are F3 and F4.

In one embodiment, the attention mechanism-skip connection module operates on the output features F1, F2 and F3 of the first three encoder blocks of the multi-layer encoder module. Global average pooling over the spatial dimensions of F1, F2 and F3 yields the channel-dimension features C1, C2 and C3. These are fed into two fully connected layers to obtain the channel attention weights W1, W2 and W3, where a ReLU nonlinear activation function follows the first fully connected layer and a Sigmoid nonlinear activation function follows the second. The channel attention weights W1, W2 and W3 are then multiplied channel by channel with the encoder block output features F1, F2 and F3, respectively, to obtain the attention fusion features S1, S2 and S3.

In one embodiment, the multi-layer decoder module consists of three decoder blocks. Each decoder block consists of a convolution layer, a batch normalization layer and a ReLU layer, similar in structure to an encoder block, but with the number of output channels changing in the opposite direction to that of the encoder blocks. The blocks are connected by upsampling layers, where an upsampling layer is a transposed convolution with a kernel size of 2 and a stride of 2; it upsamples the output feature of the bottom encoder block of the encoder module and the output features of the decoder blocks. The input of the first decoder block is the multi-scale feature T1, obtained by concatenating the transposed convolution of the output feature F4 of the fourth encoder block with the attention fusion feature S3 along the channel dimension; T1 has twice as many channels as S3 and the same spatial size as S3. The input of the second decoder block is the multi-scale feature T2: the first decoder block processes T1, and the transposed convolution of its output is concatenated along the channel dimension with the attention fusion feature S2 of the attention mechanism-skip connection module; T2 has twice as many channels as S2 and the same spatial size as S2. The input of the third decoder block is the multi-scale feature T3: the second decoder block processes T2, and the transposed convolution of its output is concatenated along the channel dimension with the attention fusion feature S1; T3 has twice as many channels as S1 and the same spatial size as S1. The output feature M of the third decoder block passes through a convolution layer that adjusts the number of channels to obtain the coarse segmentation feature map.
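To make the size bookkeeping of the encoder and decoder concrete, the following sketch traces the spatial size of F1 to F4, G1 to G3 and T1 to T3 using the kernel sizes, strides and padding stated above. The input size of 256 and the helper names are assumptions for illustration only, and a single size-preserving convolution stands in for each decoder block.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution (3 x 3, stride 1, padding 1 keeps the size)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of max pooling (2 x 2, stride 2 halves the size)."""
    return (size - kernel) // stride + 1


def tconv_out(size, kernel=2, stride=2):
    """Output size of a transposed convolution (2 x 2, stride 2 doubles the size)."""
    return (size - 1) * stride + kernel


def trace_sizes(input_size):
    """Return the spatial size of each named feature for a square input."""
    sizes = {}
    s = input_size
    for k in range(1, 5):
        s = conv_out(conv_out(s))        # two conv + BN + ReLU passes per encoder block
        sizes[f"F{k}"] = s
        if k < 4:
            s = pool_out(s)              # downsampling between encoder blocks
            sizes[f"G{k}"] = s
    for t, skip in zip((1, 2, 3), ("F3", "F2", "F1")):
        s = tconv_out(s)                 # upsample before concatenating with S3, S2, S1
        if s != sizes[skip]:
            raise ValueError(f"T{t} size {s} does not match skip size {sizes[skip]}")
        sizes[f"T{t}"] = s
        s = conv_out(s)                  # decoder block keeps the spatial size
    return sizes


sizes = trace_sizes(256)  # 256 is a hypothetical input size
```

The check inside the decoder loop reflects the requirement that each T feature has the same spatial size as the attention fusion feature it is concatenated with; input sizes that are not divisible by 8 would fail it in this simplified model.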

The encoder extracts and abstracts the feature information of the input image for fine segmentation in the decoder. Each module of the encoder consists of a convolution layer, a ReLU (rectified linear unit) and a pooling layer. The convolution layer slides a convolution kernel over different regions of the input source image to extract a local feature matrix of the image; it is expressed as:

$$z_{i,j} = \sum_{m=1}^{M} \sum_{n=1}^{N} \omega_{m,n}\, x_{(i+m),(j+n)} + b$$

where x denotes the preprocessed source image, ω denotes the convolution kernel, b denotes the bias term, M and N denote the numbers of rows and columns of the convolution kernel, $\omega_{m,n}$ denotes the kernel value at row m and column n, and $x_{(i+m),(j+n)}$ denotes the feature value at offset (m, n) from position (i, j) of the source image. In essence, the kernel value at position (m, n) is multiplied by the input image value at the position offset by (m, n) from (i, j), and the products are summed.
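As a minimal sketch of the convolution expression, the function below computes the weighted sum over the kernel window for every output position of a single-channel image. It uses 0-based indices, zero padding and a stride of 1; the function name `conv2d` and the sample values are illustrative assumptions, not part of the patent.

```python
def conv2d(x, kernel, bias=0.0, padding=1):
    """Single-channel 2-D convolution (no kernel flip), stride 1, zero padding."""
    h, w = len(x), len(x[0])
    M, N = len(kernel), len(kernel[0])
    # Zero-pad the input so that a 3 x 3 kernel with padding 1 keeps the size.
    xp = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            xp[i + padding][j + padding] = x[i][j]
    out_h = h + 2 * padding - M + 1
    out_w = w + 2 * padding - N + 1
    return [
        [
            bias + sum(kernel[m][n] * xp[i + m][j + n] for m in range(M) for n in range(N))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1, 2], [3, 4]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # passes the image through unchanged
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]       # sums each 3 x 3 neighbourhood
same = conv2d(image, identity)
summed = conv2d(image, box)
```

With the identity kernel the output equals the input, and with the box kernel every output value is the sum of all four pixels, because each padded 3 x 3 window covers the whole 2 x 2 image.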

The ReLU applies a nonlinear mapping to the features extracted by the convolution layer so that they are easier to distinguish; it is expressed as:

$$y = \max(0, z)$$

where y denotes the feature map after ReLU processing and max() denotes the maximum function.

The pooling layer partitions the output feature map of the ReLU into blocks of the specified pooling size and keeps only the strongest feature of each local region, which effectively reduces the dimension and size of the feature map, reduces the amount of computation, and helps prevent overfitting. The max pooling layer is expressed as:

$$l_{i,j} = \max_{(p,q) \in R_{i,j}} y_{p,q}$$

where y denotes the feature map processed by the ReLU, l denotes the output feature map, (i, j) denotes the position index in the output feature map, and $R_{i,j}$ denotes the pooling window that corresponds to output position (i, j).
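The ReLU and max pooling steps can be illustrated with a short sketch. The pooling size of 2 and stride of 2 follow the embodiment; the function names and the sample feature map are hypothetical.

```python
def relu(z):
    """Element-wise ReLU: negative values become 0."""
    return [[max(0, v) for v in row] for row in z]


def max_pool(y, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (len(y) - size) // stride + 1
    out_w = (len(y[0]) - size) // stride + 1
    return [
        [
            max(y[i * stride + p][j * stride + q] for p in range(size) for q in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


z = [
    [-1, 2, 0, -3],
    [4, -5, 6, 1],
    [0, 0, -2, -2],
    [-1, 3, -4, -9],
]
y = relu(z)
pooled = max_pool(y)
```

Each 2 x 2 block of the rectified map collapses to its maximum, so a 4 x 4 input becomes a 2 x 2 output.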

Geological images generally have complex spatial structures and texture distributions. Adding an SE attention mechanism to the skip connections lets the network adaptively learn and select key features in the image, suppress interference from irrelevant feature information, and improve its sensitivity to the target. The SE attention mechanism consists of a Squeeze operation, an Excitation operation and a Reweight operation.

In one embodiment, the Squeeze operation compresses and encodes the spatial information of each channel into a global feature by global average pooling:

$$z_c = F_{sq}(l_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} l_c(i, j)$$

where $F_{sq}(\cdot)$ denotes the Squeeze operation, C, H and W denote the number of channels, the height and the width of the feature map, respectively, $l_c$ denotes the feature map of the c-th channel, and i, j denote the pixel position.

In one embodiment, the Excitation operation passes the global feature obtained by the Squeeze operation through two fully connected layers, each of which is assigned weights, so as to model the interdependence between different channels:

$$s = F_{ex}(z, W) = \varphi[g(z, W)] = \varphi[W_2\, \rho(W_1 z)]$$

where $F_{ex}(\cdot)$ denotes the Excitation operation, z is the output of the Squeeze operation, $W_1$ and $W_2$ are the channel weights of the two fully connected layers, $\varphi(\cdot)$ is the Sigmoid normalization function, and $\rho(\cdot)$ is the ReLU activation function.

In one embodiment, the Reweight operation applies the weights s obtained by the Excitation operation to the original input feature, and the result is used as the input of the next stage:

$$\tilde{l}_c = F_{scale}(l_c, s_c) = s_c \cdot l_c$$

where $F_{scale}(\cdot)$ denotes the Reweight operation, $s_c$ is the weight of the c-th channel output by the Excitation operation, and $\tilde{l}_c$ denotes the reweighted feature map of the c-th channel.
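The three SE operations can be sketched for a tiny feature map as follows. This is an illustration only: the weights `w1` and `w2` are hypothetical hand-picked values rather than learned parameters, biases are omitted as in the formula, and the function name `se_block` is an assumption.

```python
import math


def se_block(features, w1, w2):
    """features: C channels, each an H x W nested list; w1: (C/r) x C; w2: C x (C/r)."""
    # Squeeze: global average pooling gives one scalar z_c per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: s = sigmoid(W2 * relu(W1 * z)).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden)))) for row in w2]
    # Reweight: scale every value of channel c by its weight s_c.
    scaled = [[[s[c] * v for v in row] for row in ch] for c, ch in enumerate(features)]
    return scaled, s


features = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0
    [[3.0, 3.0], [3.0, 3.0]],  # channel 1
]
w1 = [[0.5, 0.5]]              # hypothetical: reduce 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]           # hypothetical: favour channel 0, suppress channel 1
scaled, s = se_block(features, w1, w2)
```

With these weights channel 0 receives a weight above 0.5 and channel 1 a weight below 0.5, so the block amplifies one channel relative to the other without changing the spatial layout of either.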

The decoder consists of a convolution layer, a ReLU and a transposed convolution connected in sequence. The transposed convolution restores the position information of the image by upsampling the feature map and combines it with the corresponding low-level features of the encoder to output a segmentation result. In the corresponding expression, the transposed convolution kernel acts on l, the output feature map of the encoder, concatenated with the output feature map of the skip connection combined with the SE attention mechanism; p denotes the offset term, k and l denote the sizes of the transposed convolution kernel in the horizontal and vertical directions, and x and y denote the position indices of the output feature map.
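The upsampling and concatenation performed in the decoder can be illustrated with a single-channel sketch. It assumes the kernel size of 2 and stride of 2 given in the embodiment, so that windows do not overlap; the kernel values, the function names and the placeholder skip feature are hypothetical.

```python
def transposed_conv2x2(x, kernel, bias=0.0):
    """Transposed convolution with a 2 x 2 kernel and stride 2: doubles height and width."""
    h, w = len(x), len(x[0])
    out = [[bias] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for p in range(2):
                for q in range(2):
                    # Each input value is spread over its own 2 x 2 output block.
                    out[2 * i + p][2 * j + q] += x[i][j] * kernel[p][q]
    return out


def concat_channels(a, b):
    """Concatenate two lists of channels along the channel dimension."""
    return a + b


low_res = [[1, 2], [3, 4]]
kernel = [[1, 0], [0, 1]]                      # hypothetical kernel values
upsampled = transposed_conv2x2(low_res, kernel)
skip = [[[0.0] * 4 for _ in range(4)]]         # stand-in for one attention fusion channel
merged = concat_channels([upsampled], skip)    # two channels, each 4 x 4
```

After upsampling, the feature has the same spatial size as the skip feature, and concatenating one upsampled channel with one skip channel doubles the channel count, matching the description of T1, T2 and T3.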

The clustering algorithm is the Felz clustering algorithm, which clusters the geological objects to obtain the superpixel segmentation map. The Felz clustering algorithm merges regions according to a criterion based on the inter-region spacing and the intra-region spacing (dissimilarity); it adjusts the threshold adaptively according to local features of the image data and segments the image by greedy selection.

In one embodiment, the Felz clustering algorithm specifically includes the following sub-steps:

Step 2.1: constructing an undirected graph;

An undirected graph G = (V, E), consisting of a vertex set V and an edge set E, is used to abstract the geological image, where $v_i \in V$, $(v_i, v_j) \in E$ denotes the edge connecting the adjacent vertices $v_i$ and $v_j$, and $\omega_{ij} = \omega(v_i, v_j)$ denotes the weight of the edge $(v_i, v_j)$. $R = (C_1, \ldots, C_r)$ denotes the mutually disjoint regions after segmentation; in the initial state, every region in R is a single vertex. The criterion that defines the boundary between image regions consists of the inter-region spacing and the intra-region spacing;

Step 2.2: calculating the edge weights from the relationship between the nodes of the undirected graph. For each edge $e_n \in E$, let its connected vertices be $(v_i, v_j)$, where n is the index of the edge. Taking the characteristics of color geological images into account, the edge weight is obtained as a weighted sum of color similarity and texture similarity:

$$\omega_{ij} = \omega_c \cdot \mathrm{euc\_dis}(v_i, v_j) + \omega_t \cdot \mathrm{ssim\_dis}(v_i, v_j)$$

where $\omega_c$ and $\omega_t$ are the weights of the color similarity and the texture similarity, used to adjust their contributions to the edge weight.

euc_dis(v_i, v_j) is the color similarity measure, which measures the distance in color space:

$$\mathrm{euc\_dis}(v_i, v_j) = \sqrt{(R_i - R_j)^2 + (G_i - G_j)^2 + (B_i - B_j)^2}$$

where $(R_i, G_i, B_i)$ and $(R_j, G_j, B_j)$ denote the RGB values of $v_i$ and $v_j$, respectively.

ssim_dis(v_i, v_j) is the texture similarity measure. In its expression, $c_{v_i}$ and $c_{v_j}$ denote the gray values of the corresponding pixels, $\mathrm{lum}(c_{v_i}, c_{v_j})$ is the luminance similarity, which is combined with the contrast similarity and the structural similarity, and α is a weighting coefficient that balances the importance of luminance, contrast and structural similarity.

The luminance similarity is expressed as:

$$\mathrm{lum}(c_{v_i}, c_{v_j}) = \frac{2\mu_{v_i}\mu_{v_j} + a}{\mu_{v_i}^2 + \mu_{v_j}^2 + a}$$

where $\mu_{v_i}$ and $\mu_{v_j}$ are the mean luminance of the corresponding pixels, obtained from the mean gray level, and a is a constant that stabilizes the calculation and prevents the denominator from being zero.

The contrast similarity is expressed as:

$$\frac{2\sigma_{v_i}\sigma_{v_j} + b}{\sigma_{v_i}^2 + \sigma_{v_j}^2 + b}$$

where $\sigma_{v_i}$ and $\sigma_{v_j}$ are the luminance standard deviations of the corresponding pixels, obtained from the gray-level standard deviation, and b is a constant that stabilizes the calculation and prevents the denominator from being zero.

The structural similarity is expressed as:

$$\frac{\sigma_{v_i v_j} + c}{\sigma_{v_i}\sigma_{v_j} + c}$$

where $\sigma_{v_i v_j}$ is the luminance covariance of $c_{v_i}$ and $c_{v_j}$, obtained from the gray-level covariance, and c is a constant that stabilizes the calculation and prevents the denominator from being zero;

Step 2.3: sorting the edge set E in non-descending order of dissimilarity, that is, of the edge weight $\omega_{ij}$;

Dissimilarity refers to the difference between two pixels in the geological image. Because the invention abstracts the geological image into an undirected graph, dissimilarity is in practice the edge weight $\omega_{ij}$. Since the weight of an edge is computed from the two vertices it connects, a larger edge weight means a larger difference between the two vertices, that is, a higher dissimilarity, and a smaller edge weight means a smaller difference, that is, a lower dissimilarity.

Step 2.4: performing the merging judgment for the currently selected edge $e_n$, whose connected vertices are $(v_i, v_j)$. When the vertices of edge $e_n$ do not belong to the same region, that is, $\mathrm{area\_id}(v_i) \neq \mathrm{area\_id}(v_j)$, with $v_i$ belonging to region $C_i$ and $v_j$ belonging to region $C_j$, and the inter-region spacing is not greater than the minimum intra-region spacing of the two regions, the regions are merged and step 2.5 is executed;

the region merging criterion is:

$$\mathrm{Dif}(C_i, C_j) \le \mathrm{Min\_Int\_dis}(C_i, C_j)$$

where Dif(C_i, C_j) denotes the inter-region spacing and Min_Int_dis(C_i, C_j) denotes the minimum intra-region spacing of the two regions;

the intra-region spacing is the maximum weight in the minimum spanning tree (MST) of a segmented region $C_i$ (1 ≤ i ≤ r), that is, the weight of the edge with the greatest dissimilarity in that region. It is denoted Int_dis(C_i) and expressed as:

$$\mathrm{Int\_dis}(C_i) = \max_{e \in \mathrm{MST}(C_i, E)} \omega(e)$$

where max() is the maximum function, ω(e) is the weight of an edge in the region, and $e \in \mathrm{MST}(C_i, E)$ means that edge e belongs to the edge set of the minimum spanning tree of region $C_i$;

the inter-region spacing is the smallest weight among the vertex pairs that belong to the two regions and are connected by an edge, that is, the weight of the edge with the least dissimilarity between the two regions. It is denoted Dif(C_i, C_j) with i ≠ j and expressed as:

$$\mathrm{Dif}(C_i, C_j) = \min_{v_i \in C_i,\ v_j \in C_j,\ (v_i, v_j) \in E} \omega(v_i, v_j)$$

where min() is the minimum function;

the minimum intra-region spacing of the two regions is the smaller of their internal differences. It is denoted Min_Int_dis(C_i, C_j) and expressed as:

$$\mathrm{Min\_Int\_dis}(C_i, C_j) = \min\big(\mathrm{Int\_dis}(C_i) + \lambda(C_i),\ \mathrm{Int\_dis}(C_j) + \lambda(C_j)\big)$$

where λ is a threshold function based on region size, introduced to prevent over-segmentation at the start of segmentation, when every region is a single pixel. It is defined as $\lambda(C) = \alpha / \mathrm{num}_C$, where $\mathrm{num}_C$ is the number of pixels in region C and α is a fixed parameter that controls the size of the regions formed: when α = 0, every pixel in the image is an independent region, and as α → +∞, the whole image becomes a single region;

Step 2.5: unifying the region label area_id(v_i) of $v_i$ and the region label area_id(v_j) of $v_j$ as area_id(v_i), and updating the dissimilarity threshold of the merged region to $\mathrm{Int\_dis}(C_i + C_j) + \lambda(C_i + C_j)$;

Step 2.6: if n ≤ m, selecting the next edge in the order established in step 2.3 and returning to step 2.4, until the final superpixel segmentation map is output. Please refer to FIG. 5, which shows the superpixel segmentation result of an embodiment of the present invention.
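Steps 2.1 to 2.6 can be sketched with a union-find structure, as below. Several points are assumptions made for illustration and are not taken from the patent: the texture term is taken as 1 minus the product of the luminance, contrast and structure terms computed on small gray-level patches, the function names are hypothetical, and the toy graph passes edge weights in directly instead of building them from an image.

```python
import math


def euc_dis(rgb_i, rgb_j):
    """Colour term: Euclidean distance between two RGB triples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(rgb_i, rgb_j)))


def ssim_dis(patch_i, patch_j, a=1e-4, b=1e-4, c=1e-4):
    """ASSUMED texture term: 1 - luminance * contrast * structure on gray patches."""
    n = len(patch_i)
    mu_i, mu_j = sum(patch_i) / n, sum(patch_j) / n
    var_i = sum((v - mu_i) ** 2 for v in patch_i) / n
    var_j = sum((v - mu_j) ** 2 for v in patch_j) / n
    cov = sum((p - mu_i) * (q - mu_j) for p, q in zip(patch_i, patch_j)) / n
    sd_i, sd_j = math.sqrt(var_i), math.sqrt(var_j)
    lum = (2 * mu_i * mu_j + a) / (mu_i ** 2 + mu_j ** 2 + a)
    con = (2 * sd_i * sd_j + b) / (var_i + var_j + b)
    struct = (cov + c) / (sd_i * sd_j + c)
    return 1.0 - lum * con * struct


def edge_weight(rgb_i, rgb_j, patch_i, patch_j, w_c=0.5, w_t=0.5):
    """Weighted sum of the colour term and the texture term."""
    return w_c * euc_dis(rgb_i, rgb_j) + w_t * ssim_dis(patch_i, patch_j)


def felz_segment(num_vertices, edges, alpha):
    """edges: list of (weight, i, j). Returns one region label per vertex."""
    parent = list(range(num_vertices))   # every vertex starts as its own region
    size = [1] * num_vertices
    internal = [0.0] * num_vertices      # Int_dis of each region

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    for w, i, j in sorted(edges):        # non-descending order of weight
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        min_int = min(internal[ri] + alpha / size[ri], internal[rj] + alpha / size[rj])
        if w <= min_int:                 # merging criterion
            parent[rj] = ri
            size[ri] += size[rj]
            internal[ri] = w             # edges arrive sorted, so w is the largest so far
    return [find(v) for v in range(num_vertices)]


# Toy graph: two tight pairs (0-1 and 2-3) joined by one expensive edge.
edges = [(1.0, 0, 1), (1.0, 2, 3), (10.0, 1, 2)]
labels = felz_segment(4, edges, alpha=2.0)
```

With α = 2 the two cheap edges are merged and the expensive edge is rejected, giving two regions. Setting α = 0 leaves every vertex in its own region and a very large α merges everything, which matches the behaviour described for the threshold function λ.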

In one embodiment, the coarse segmentation image is optimized using a cross-entropy loss function (CrossEntropy Loss), expressed as:

$$L = -\sum_{i=1}^{n} y_i \log(p_i)$$

where y denotes the pseudo label obtained from the superpixel segmentation map, p denotes the coarse segmentation label obtained from the segmentation model, and n is the maximum number of clusters, set according to experiments.
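A minimal sketch of the cross-entropy computation against pseudo labels is given below. It assumes that the network outputs one raw score per class for each pixel, that the pseudo label is a class index (a one-hot y), and that the loss is averaged over pixels; these details and the function name are assumptions, not statements from the patent.

```python
import math


def cross_entropy(scores, pseudo_labels):
    """Mean cross-entropy between per-pixel class scores and pseudo-label indices."""
    total = 0.0
    for pixel_scores, label in zip(scores, pseudo_labels):
        top = max(pixel_scores)
        # log of the softmax denominator, computed stably
        log_sum = top + math.log(sum(math.exp(v - top) for v in pixel_scores))
        # -log p_label, since y is 1 for the pseudo-label class and 0 elsewhere
        total += log_sum - pixel_scores[label]
    return total / len(pseudo_labels)


uncertain = cross_entropy([[0.0, 0.0]], [0])           # equals log(2)
confident = cross_entropy([[100.0, 0.0]], [0])         # close to 0
wrong = cross_entropy([[0.0, 100.0]], [0])             # close to 100
```

The loss is small when the coarse segmentation agrees with the superpixel pseudo label and large when it disagrees, which is what drives the optimization of the coarse segmentation toward the superpixel map.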

The SE-UNet neural network of the embodiment is a trained network; the training process comprises the following steps:

(1) Acquiring a color geologic image by searching published geologic field literature and a geologic data website, and constructing a color geologic image training data set;

(2) Carrying out mathematical morphological preprocessing on an input source image to obtain a preprocessed source image;

(3) Taking the preprocessed source image as input, and inputting the input into the SE-UNet neural network, thereby obtaining a rough segmentation image;

taking the preprocessed source image as input, and inputting the input into a Felz clustering algorithm to obtain a super-pixel segmentation map;

(4) Taking the superpixel segmentation map as a pseudo label, optimizing the coarse segmentation result based on the superpixel segmentation map, and outputting the predicted segmentation result by forward propagation, thereby achieving label-free color geological image segmentation;

(5) Evaluating the segmentation result by using a universal evaluation index in the image segmentation field;

if the evaluation is qualified, finishing the training;

if it is not acceptable, the following operations are taken:

(1) Hyperparameter tuning: hyperparameters of the network, such as the learning rate, batch size and weight decay, can be adjusted; the best combination is found by cross-validation or by an automatic tuning method driven by performance metrics.

(2) Model structure optimization: the network structure of SE-UNet can be changed, for example by increasing or decreasing the number of encoder and decoder layers, adjusting the convolution kernel size, or strengthening the feature fusion capability of the attention mechanism-skip connection module, to suit the characteristics of geological images.

(3) Clustering algorithm optimization: the number of clusters and related parameters of the clustering algorithm can be adjusted, and factors that affect segmentation performance, such as the minimum merged region size in the superpixel segmentation map, can be controlled.

In one embodiment, to better reflect the performance of the algorithm and to compare more comprehensively and intuitively the geological image segmentation performance of different models and clustering algorithms, the segmentation performance is evaluated with pixel accuracy (PA), mean intersection over union (MIoU), frequency-weighted intersection over union (FWIoU) and the Dice coefficient. PA is the proportion of correctly segmented pixels among all pixels. MIoU computes, for each class, the ratio of the intersection to the union of the predicted result and the ground truth, and averages this ratio over the total number of classes to evaluate the overlap of the segmentation. FWIoU is an extension of MIoU that weights each class by its frequency of occurrence. The Dice coefficient measures the similarity between the segmentation result and the ground truth.
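The four metrics can be computed from a confusion matrix, as in the sketch below. The flattened label lists, the choice to average Dice over classes, and the function name are illustrative assumptions; classes that appear in neither the prediction nor the ground truth are skipped.

```python
def segmentation_metrics(pred, truth, num_classes):
    """Compute PA, MIoU, FWIoU and mean Dice from flat lists of class indices."""
    cm = [[0] * num_classes for _ in range(num_classes)]  # rows: truth, columns: prediction
    for p, t in zip(pred, truth):
        cm[t][p] += 1
    total = len(truth)
    pa = sum(cm[k][k] for k in range(num_classes)) / total
    ious, dices, fwiou = [], [], 0.0
    for k in range(num_classes):
        tp = cm[k][k]
        gt = sum(cm[k])                                   # pixels whose true class is k
        pr = sum(cm[r][k] for r in range(num_classes))    # pixels predicted as class k
        union = gt + pr - tp
        if union == 0:
            continue                                      # class absent from both maps
        iou = tp / union
        ious.append(iou)
        dices.append(2 * tp / (gt + pr))
        fwiou += gt / total * iou                         # weight IoU by class frequency
    return {
        "PA": pa,
        "MIoU": sum(ious) / len(ious),
        "FWIoU": fwiou,
        "Dice": sum(dices) / len(dices),
    }


pred = [0, 0, 1, 1]
truth = [0, 1, 1, 1]
metrics = segmentation_metrics(pred, truth, num_classes=2)
```

In this toy case three of four pixels are correct, so PA is 0.75, while MIoU is lower because the rare class 0 is over-predicted; FWIoU sits between the two because it gives more weight to the frequent class.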

Fig. 6 is a schematic diagram of the segmentation result obtained according to the embodiment of the present invention, from which it can be seen that:

(1) Region segmentation: the method is capable of segmenting a color geologic image into different regions, each region representing a portion of a geologic entity. The segmentation results may provide a region division of different geological entities in the geological image, helping to further analyze and understand the geological structure.

(2) Boundary extraction: the segmentation results may effectively extract boundaries between geologic entities, such as boundaries between rock formations. Such boundary information may be used to identify boundaries of different geological units, to perform geologic structure analysis and modeling, and so forth.

(3) Region attribute: the segmentation result may also provide attribute information for each segmented region, such as color features, texture features, shape features, etc. Such attribute information may be used to further analyze and study the characteristics of the geological entity and its relationships with other geological elements.

It should be understood that the foregoing description of the preferred embodiments is not intended to limit the scope of the invention, which is defined by the appended claims, and that those skilled in the art can make substitutions or modifications without departing from the scope of the invention as set forth in the appended claims.

Claims

1. An unsupervised color geologic image segmentation method based on an attention mechanism is characterized by comprising the following steps:

2. The attention-based method for unsupervised color geologic image segmentation as defined in claim 1, wherein: the mathematical morphology preprocessing in step 1 first applies a dilation operation to the color geological image and then an erosion operation, obtaining the preprocessed image x;

for the dilation operation, the input source image is denoted A and the structuring element is denoted B; the dilation of the input source image A by the structuring element B is expressed as:

$$A \oplus B = \{\sigma \mid (\hat{B})_\sigma \cap A \neq \emptyset\}$$

wherein $\oplus$ denotes the dilation operation, $\hat{B}$ denotes the reflection of the structuring element B about the origin, and $(\hat{B})_\sigma$ denotes $\hat{B}$ translated by the displacement σ;

the erosion of the input source image A by the structuring element B is expressed as:

$$A \ominus B = \{\sigma \mid (B)_\sigma \subseteq A\}$$

wherein $\ominus$ denotes the erosion operation and $(B)_\sigma$ denotes the structuring element B translated by the displacement σ.
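As an illustration only, dilation followed by erosion can be sketched as below. The sketch assumes a flat square structuring element of side k (symmetric, so its reflection about the origin is itself), clamps coordinates at the image border, and processes each color channel independently; the function names and the choice k = 3 are hypothetical and not taken from the claim.

```python
def _window_filter(channel, k, reduce_fn):
    """Apply reduce_fn (max or min) over a k x k window; border coordinates are clamped."""
    h, w, r = len(channel), len(channel[0]), k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                channel[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def dilate(channel, k=3):
    """Grayscale dilation with a flat k x k structuring element (local maximum)."""
    return _window_filter(channel, k, max)


def erode(channel, k=3):
    """Grayscale erosion with a flat k x k structuring element (local minimum)."""
    return _window_filter(channel, k, min)


def preprocess(image, k=3):
    """image: list of channels (e.g. R, G, B), each a 2-D list; dilation, then erosion."""
    return [erode(dilate(channel, k), k) for channel in image]


# A bright 5 x 5 channel with one dark speck in the middle.
channel = [[200] * 5 for _ in range(5)]
channel[2][2] = 0
cleaned = preprocess([channel, channel, channel])
```

Dilation followed by erosion is a morphological closing: dark details smaller than the structuring element, such as the single dark pixel in the example, are filled in, while larger regions keep their extent.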

3. The attention-based method for unsupervised color geologic image segmentation as defined in claim 1, wherein: in step 2, the SE-UNet neural network is composed of a multi-layer encoder module, a multi-layer decoder module and an attention skip-connection module;

the multi-layer encoder module comprises four encoder blocks arranged in series; each encoder block consists of a convolution layer, a batch normalization layer and a ReLU layer, and adjacent encoder blocks are connected by a downsampling layer, which is a max pooling operation; the input image first enters the first encoder block, where two rounds of convolution, batch normalization and ReLU produce the output feature F1, and max pooling of F1 produces the downsampled feature G1; the input of the second encoder block is G1, and its output feature is F2; likewise, the inputs of the third and fourth encoder blocks are obtained by downsampling the output feature of the preceding encoder block, so that their inputs are G2 and G3 and their outputs are F3 and F4, respectively;

the attention skip-connection module operates on the output features F1, F2 and F3 of the first three encoder blocks of the multi-layer encoder module: global average pooling over the spatial dimensions of F1, F2 and F3 yields the channel-dimension features C1, C2 and C3; these are fed into two fully connected layers, with a ReLU nonlinear activation function after the first and a Sigmoid nonlinear activation function after the second, to obtain the channel attention weights W1, W2 and W3; the weights W1, W2 and W3 are then multiplied channel by channel with F1, F2 and F3, respectively, to obtain the attention fusion features S1, S2 and S3;

the multi-layer decoder module consists of three decoder blocks; each decoder block consists of a convolution layer, a batch normalization layer and a ReLU layer, and the decoder blocks are connected by an upsampling layer, which is a transposed convolution used to upsample the output feature of the bottom encoder block of the encoder module and the output features of the decoder blocks; the input of the first decoder block is the multi-scale feature T1, obtained by concatenating, in the channel dimension, the transposed convolution of the output feature F4 of the fourth encoder block with the attention fusion feature S3, so that T1 has twice as many channels as S3 and the same spatial size as S3; the output of the first decoder block is upsampled by transposed convolution and concatenated in the channel dimension with the attention fusion feature S2 of the attention skip-connection module to obtain the multi-scale feature T2, which is the input of the second decoder block and has twice as many channels as S2 and the same spatial size as S2; the output of the second decoder block is upsampled by transposed convolution and concatenated in the channel dimension with the attention fusion feature S1 of the attention skip-connection module to obtain the multi-scale feature T3, which is the input of the third decoder block and has twice as many channels as S1 and the same spatial size as S1; and the output feature M of the third decoder block is passed through a channel-adjusting convolution layer to obtain the coarse segmentation feature map.
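The relationships between the named features can be checked with a small shape walk-through. This is a sketch under stated assumptions, not the network itself: the channel widths (64, 128, 256, 512), the size-preserving convolutions, the stride-2 pooling and transposed convolutions, and the function name are all hypothetical choices that merely satisfy the constraints stated above (for example, that T1 has twice the channels of S3).

```python
def se_unet_shapes(height, width, num_clusters, widths=(64, 128, 256, 512)):
    """Trace (channels, height, width) of each named feature; no convolution is run."""
    shapes = {}
    h, w = height, width
    # Encoder: four blocks, with stride-2 max pooling between consecutive blocks.
    for i, c in enumerate(widths, start=1):
        shapes[f"F{i}"] = (c, h, w)
        if i < len(widths):
            h, w = h // 2, w // 2
            shapes[f"G{i}"] = (c, h, w)
    # Attention skip connections rescale channels only, so S_i keeps the shape of F_i.
    for i in (1, 2, 3):
        shapes[f"S{i}"] = shapes[f"F{i}"]
    # Decoder: upsample, concatenate with the skip feature, then apply a decoder block.
    h, w = shapes["F4"][1:]
    for step, skip in enumerate(("S3", "S2", "S1"), start=1):
        skip_c, skip_h, skip_w = shapes[skip]
        h, w = h * 2, w * 2  # assumed: transposed convolution doubles the spatial size
        if (h, w) != (skip_h, skip_w):
            raise ValueError("height and width must be divisible by 8")
        # Upsampled feature (skip_c channels, assumed) concatenated with the skip feature.
        shapes[f"T{step}"] = (2 * skip_c, h, w)
    shapes["M"] = (widths[0], h, w)  # assumed output width of the third decoder block
    shapes["coarse"] = (num_clusters, h, w)  # channel-adjusting convolution
    return shapes


shapes = se_unet_shapes(256, 256, num_clusters=32)
```

With a 256 x 256 input, F4 is reduced to 32 x 32, and each decoder stage restores one level of resolution until the coarse segmentation map has the size of the input and one channel per candidate cluster.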

4. The attention-based method for unsupervised color geologic image segmentation as defined in claim 1, wherein: in step 2, the SE attention mechanism is composed of a Squeeze operation, an Excitation operation and a Reweight operation;

the Squeeze operation compresses and encodes the spatial information of each channel into a global feature $z_c$ through global average pooling:

$$z_c = F_{sq}(l_c) = \frac{1}{G \times W} \sum_{i=1}^{G} \sum_{j=1}^{W} l_c(i, j)$$

wherein $F_{sq}(\cdot)$ denotes the Squeeze operation, C, G and W denote the number of channels, the height and the width of the feature map respectively, $l_c$ denotes the feature map of the c-th channel, and i, j denote pixel positions;

the Excitation operation passes the global feature obtained by the Squeeze operation through two fully connected layers, which assign a weight to each channel and thereby model the interdependence s between channels:

$$s = F_{ex}(z, W) = \phi[g(z, W)] = \phi[W_2\, \rho(W_1 z)]$$

wherein $F_{ex}(\cdot)$ denotes the Excitation operation, z is the output of the Squeeze operation, $W_1$ and $W_2$ are the channel weights, $\phi(\cdot)$ is the Sigmoid normalization function and $\rho(\cdot)$ is the ReLU activation function;

the Reweight operation applies the weights s obtained by the Excitation operation to the original input features channel by channel; the result is the output and serves as the input of the next stage.
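The three operations can be sketched in a few lines. This is an illustrative sketch only: the feature map is a nested list indexed as features[channel][row][column], the fully connected layers are plain matrices without bias terms, and the function names are hypothetical.

```python
import math


def squeeze(features):
    """Global average pooling: one scalar z_c per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]


def excitation(z, w1, w2):
    """s = sigmoid(W2 * relu(W1 * z)); w1 and w2 are matrices given as lists of rows."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    return [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden)))) for row in w2]


def reweight(features, s):
    """Multiply every value of channel c by its attention weight s_c."""
    return [[[s_c * v for v in row] for row in ch] for ch, s_c in zip(features, s)]


def se_block(features, w1, w2):
    """Squeeze, Excitation and Reweight applied in sequence."""
    return reweight(features, excitation(squeeze(features), w1, w2))


# Two 2 x 2 channels; zero weights make every attention weight sigmoid(0) = 0.5.
features = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
output = se_block(features, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]])
```

The hidden layer between the two matrices is usually narrower than the number of channels, which keeps the attention branch cheap relative to the convolutions it modulates.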

5. The attention-based method for unsupervised color geologic image segmentation as defined in claim 1, wherein: the Felz clustering algorithm in the step 2 specifically comprises the following sub-steps:

Step 2.1: constructing an undirected graph, the weight of the edge between vertices $v_i$ and $v_j$ being:

$$\omega_{ij} = \omega_c \cdot \mathrm{euc\_dis}(v_i, v_j) + \omega_t \cdot \mathrm{ssim\_dis}(v_i, v_j)$$

wherein $\omega_c$ and $\omega_t$ are the weights of the color similarity and the texture similarity, used to adjust their respective contributions to the edge weight;

$\mathrm{euc\_dis}(v_i, v_j)$ is the color similarity measure, which measures the distance in color space and is expressed as:

$$\mathrm{euc\_dis}(v_i, v_j) = \sqrt{(R_i - R_j)^2 + (G_i - G_j)^2 + (B_i - B_j)^2}$$

wherein $(R_i, G_i, B_i)$ and $(R_j, G_j, B_j)$ are the RGB values of $v_i$ and $v_j$ respectively;

$\mathrm{ssim\_dis}(v_i, v_j)$ is the texture similarity measure, computed on the gray values of the corresponding pixels from the luminance similarity, the contrast similarity and the structural similarity, where α is a weighting coefficient for balancing the importance of luminance, contrast and structural similarity;

the luminance similarity is expressed as:

$$\frac{2 \mu_i \mu_j + a}{\mu_i^2 + \mu_j^2 + a}$$

wherein $\mu_i$ and $\mu_j$ are the mean luminance values of the corresponding pixels, obtained from the mean gray value, and a is a constant that stabilizes the calculation and prevents the denominator from being zero;

the contrast similarity is expressed as:

$$\frac{2 \sigma_i \sigma_j + b}{\sigma_i^2 + \sigma_j^2 + b}$$

wherein $\sigma_i$ and $\sigma_j$ are the luminance standard deviations of the corresponding pixels, obtained from the gray standard deviation, and b is a constant that stabilizes the calculation and prevents the denominator from being zero;

the structural similarity is expressed as:

$$\frac{\sigma_{ij} + c}{\sigma_i \sigma_j + c}$$

wherein $\sigma_{ij}$ is the luminance covariance of the corresponding pixels, obtained from the gray covariance, and c is a constant that stabilizes the calculation and prevents the denominator from being zero;

Step 2.4: performing the merging judgment for the currently selected edge $e_n$, whose connected vertices are $(v_i, v_j)$: when the vertices of edge $e_n$ do not belong to the same region, i.e. $\mathrm{area\_id}(v_i) \neq \mathrm{area\_id}(v_j)$, and the inter-region distance between the region $C_i$ containing $v_i$ and the region $C_j$ containing $v_j$ is not larger than the minimum of the intra-region distances of the two regions, the regions are merged and step 2.5 is executed;

the intra-region distance $\mathrm{Int\_dis}(C_i)$ is the maximum weight in the minimum spanning tree of the segmented region $C_i$, i.e. the weight of the edge with the greatest dissimilarity within the region, and is expressed as:

$$\mathrm{Int\_dis}(C_i) = \max_{e \in \mathrm{MST}(C_i, E_{C_i})} \omega(e)$$

wherein max() is the maximum function, $\mathrm{MST}(C_i, E_{C_i})$ is the minimum spanning tree of region $C_i$, $E_{C_i}$ is the set of all edges contained in region $C_i$, ω(e) is the weight of an edge in the region, and e is an edge belonging to the edge set of the minimum spanning tree of region $C_i$;

Step 2.6: if n is less than or equal to m, selecting the next edge in the order established in step 2.3 and returning to step 2.4, until the final super-pixel segmentation map is output.

6. The attention-based method for unsupervised color geologic image segmentation as defined in claim 1, wherein: in step 3, the super-pixel segmentation map is used as a pseudo label, and the coarse segmentation image is optimized against the super-pixel segmentation map using a cross-entropy loss function, expressed as follows:

wherein y represents the pseudo label obtained from the super-pixel segmentation map, p represents the coarse segmentation label obtained from the segmentation model, and the maximum number of clusters is set according to experiments.
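A per-pixel cross-entropy against pseudo labels can be sketched as follows. The sketch makes several assumptions that the text does not confirm: the network output is a vector of raw scores per pixel that is turned into probabilities p by a softmax, the pseudo label y of each pixel is already an integer class index below the maximum number of clusters (how super-pixel ids are mapped to such indices is not specified here), and the loss is averaged over pixels. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_cross_entropy(logits_per_pixel, pseudo_labels):
    """Mean of -log p[y] over all pixels, where y is the pixel's pseudo label."""
    losses = []
    for logits, y in zip(logits_per_pixel, pseudo_labels):
        p = softmax(logits)
        losses.append(-math.log(p[y]))
    return sum(losses) / len(losses)


# Two pixels, two candidate clusters, undecided scores: the loss equals log(2).
loss = pseudo_label_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1])
```

Minimizing this loss pushes the coarse segmentation toward the region structure of the super-pixel map, so no manually annotated labels are needed.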

7. The attention-based color geologic image segmentation method without supervision as defined in any one of claims 1-6, wherein: the SE-UNet neural network is a trained network; the training process comprises the following steps:

(5) Evaluating the segmentation result by using a universal evaluation index in the image segmentation field; if the evaluation is qualified, finishing the training;

if it is not acceptable, the following operations are taken:

1) Hyperparameter tuning: adjusting the hyperparameters of the network, and finding the optimal hyperparameter combination through cross-validation or an automatic tuning method based on performance metrics;

2) Model structure optimization: changing the network structure of SE-UNet, increasing or decreasing the number of encoder and decoder layers, adjusting the convolution kernel size, and enhancing the feature fusion capability of the attention skip-connection module, so as to suit the characteristics of geological images;

3) Clustering algorithm optimization: adjusting the number of clusters and related parameters of the clustering algorithm, and controlling the factors in the super-pixel segmentation map that influence segmentation performance.

8. The attention-based color geologic image segmentation method without supervision as recited in claim 7, wherein: the color segmentation performance of the algorithm is evaluated using the pixel accuracy PA, the mean intersection over union MIoU, the frequency-weighted intersection over union FWIoU and the Dice coefficient.

9. An attention-based, unsupervised color geologic image segmentation system, comprising:

one or more processors;

storage means for storing one or more programs which when executed by the one or more processors cause the one or more processors to implement the attention-based color geologic image segmentation method as defined in any of claims 1 to 6.