CN115393293A - Electron microscope red blood cell segmentation and positioning method based on UNet network and watershed algorithm - Google Patents


Info

Publication number
CN115393293A
CN115393293A
Authority
CN
China
Prior art keywords: image, segmentation, electron microscope, feature, cell
Prior art date
Legal status (assumed, not a legal conclusion)
Pending
Application number
CN202210964897.4A
Other languages
Chinese (zh)
Inventor
彭爽
陶丽
Current Assignee
Southwest University
Original Assignee
Southwest University
Priority date
Filing date
Publication date
Application filed by Southwest University
Priority to CN202210964897.4A
Publication of CN115393293A
Legal status: Pending

Classifications

    • G06T7/0012 Biomedical image inspection
    • G06N3/04 Neural networks; architecture, e.g. interconnection topology
    • G06N3/08 Neural networks; learning methods
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G06T5/90
    • G06T7/11 Region-based segmentation
    • G06T7/187 Segmentation involving region growing, region merging or connected component labelling
    • G06T7/194 Segmentation involving foreground-background segmentation
    • G06T2207/10056 Microscopic image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20152 Watershed segmentation

Abstract

The invention relates to the technical field of electron microscope cell diagnosis, and discloses an electron microscope red blood cell segmentation and positioning method based on a UNet network and a watershed algorithm, comprising the steps of: S1, constructing a UNet-based multi-attention segmentation model to segment the cell foreground and background of electron microscope red blood cells; and S2, positioning and segmenting cells in the segmented map with a watershed algorithm. The invention first expands the richness of the data set with data augmentation, then segments the cell foreground and background with the designed multi-attention segmentation model Aca-UNet. Finally, after distance transformation and watershed correction of the segmented map, individual cell regions are located, completing the positioning and segmentation of the cells. The method applies a computer-aided algorithm to an electron microscope data set for the first time; in experiments, the Dice coefficient reaches 88.94% for dense cell segmentation and 93.10% for high-magnification cell segmentation.

Description

Electron microscope red blood cell segmentation and positioning method based on UNet network and watershed algorithm
Technical Field
The invention relates to the technical field of electron microscope cell diagnosis, in particular to an electron microscope red blood cell segmentation and positioning method based on a UNet network and a watershed algorithm.
Background
Electron microscopy images have become increasingly important in recent years as a means of detecting human diseases. Traditional pathological diagnosis from electron microscope images relies on pathologists judging and counting the shape and size of cells, which greatly reduces physicians' working efficiency and makes the diagnosis strongly subjective. Computer techniques that assist doctors in analyzing electron microscope images are therefore of great significance.
Meanwhile, with the development of deep learning and computer hardware, more and more deep learning methods are used in medical image processing. With Weighted Res-UNet, researchers addressed fundus blood vessel segmentation by weighting the foreground and background differently in the loss and building a UNet model on a residual network. Other researchers used a UNet model to detect T-cell spots; compared with traditional region-growing segmentation, the accuracy improved greatly, verifying the feasibility of deep learning models in medical imaging. Applications of deep learning to medical imaging have since become common: a multi-modal feature-fusion segmentation network for brain tumor images showed that inter-stage fusion and inter-layer skip connections improve multi-modal segmentation; an attention and pyramid-fusion medical image segmentation network effectively alleviated edge blurring in segmented images; and an edge-enhanced attention module captured position-aware cross-channel information and clearer edge contour information, improving segmentation accuracy. All of this work provides practical reference for solving problems in medical data with deep learning models.
In contrast, studies of abnormal red blood cell segmentation and counting mostly take blood smear data sets as their object. That work relies on the high contrast and abundant data volume of blood smear images; for electron microscope images, with their low contrast and scarce data, accurately and quickly segmenting and locating blood cells remains a key difficulty.
Disclosure of Invention
The invention provides an electron microscope red blood cell segmentation and positioning method based on a UNet network and a watershed algorithm, which solves the technical problem of how to accurately and quickly segment and locate blood cells.
In order to solve this problem, the invention provides an electron microscope red blood cell segmentation and positioning method based on a UNet network and a watershed algorithm, comprising the steps of:
s1, constructing a multi-attention segmentation model based on a UNet network to segment a cell foreground and a cell background of the electron microscope red blood cells;
and S2, positioning and segmenting cells of the segmented graph based on a watershed algorithm.
Further, in step S1, the multi-attention segmentation model comprises an encoder network and a decoder network designed on the UNet network: the encoder network comprises, from the first to the fifth layer, a first residual block, a second residual block, a third residual block, a first self-attention mechanism block and a second self-attention mechanism block; the decoder network comprises, from the fourth layer back to the first, a first depthwise separable convolution block, a second depthwise separable convolution block, a first decoder block and a second decoder block; and the model further comprises 4 feature selection modules placed at the skip-connection stage of the corresponding layers;
the first and second self-attention mechanism blocks extract low-dimensional image features with convolution operations and perform long-range modeling of high-dimensional image features in the manner of a Transformer;
the first and second depthwise separable convolution blocks, while keeping the same size and resolution throughout the network, separate the mixed spatial and channel dimensions by means of depthwise convolution and pointwise convolution;
and the feature selection modules perform feature selection during feature fusion.
Further, the feature selection module comprises a channel attention mechanism module and a spatial attention mechanism module;
in the channel attention mechanism module, after the CNN features of an image are extracted, the feature layer first passes through an adaptive average pooling layer to obtain a feature map of size C × 1 × 1, where C is the number of feature channels; a 1-D convolution with kernel size k then performs convolution weighting to realize information interaction between channels; the weight of each feature channel is obtained through a sigmoid function and used to weight the whole feature map, giving a channel-weighted map, which is finally input into the spatial attention mechanism module;
in the spatial attention mechanism module, max pooling and average pooling are applied along the channel dimension of the input feature map, converting it to 1 × H × W, where H and W are the height and width of the original image; the two maps are concatenated and fed into a convolution of adjustable kernel size for information interaction and feature reconstruction; the resulting spatial feature weights are finally multiplied with the channel-attended feature map to obtain the final feature map.
Further, in the first or second self-attention mechanism block, the feature map is first feature-encoded: each encoded word vector has size P × P and there are H × W/(P × P) of them; position information is then encoded with a depthwise separable convolution operation.
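As a rough illustration of the word-vector encoding just claimed, the NumPy sketch below flattens each P × P block of a C × H × W feature map into one vector, giving H × W/(P × P) word vectors; the shapes and the name `patchify` are illustrative, not from the patent:

```python
import numpy as np

def patchify(feat, p):
    # feat: (C, H, W) feature map with H and W divisible by p.
    c, h, w = feat.shape
    n = (h // p) * (w // p)                   # H*W/(P*P) word vectors
    patches = (feat.reshape(c, h // p, p, w // p, p)
                   .transpose(1, 3, 0, 2, 4)  # gather each P x P block
                   .reshape(n, c * p * p))    # one flat vector per block
    return patches

feat = np.random.rand(4, 8, 8)
tokens = patchify(feat, 2)                    # (16, 16): 16 blocks, 4*2*2 each
```

In the model, a learned convolution with stride P would perform this embedding jointly with a projection; the reshape above only shows the block bookkeeping.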
Further, in step S1, the loss function adopted to train the multi-attention segmentation model is:

$$\mathcal{L} = \mathcal{L}_{BCE} + \mathcal{L}_{DICE}$$

where $\mathcal{L}_{BCE}$ denotes the BCE loss, $\mathcal{L}_{DICE}$ denotes the DICE loss, and:

$$\mathcal{L}_{BCE} = -\frac{1}{n}\sum_{k=1}^{n}\left[w_1\, q_k \log p_k + w_2\,(1-q_k)\log(1-p_k)\right]$$

$$\mathcal{L}_{DICE} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where p and q represent the predicted pixel value and the actual pixel value respectively, $w_1$ and $w_2$ respectively represent the weights of the foreground and background terms in the loss, k indexes the k-th pixel, n is the total number of pixels, X and Y respectively denote the prediction and the ground truth, $|X \cap Y|$ their intersection, and $|X|$ and $|Y|$ their respective numbers of elements.
Preferably, $w_1 : w_2 = 2:1$.
Preferably, the data set used for training and testing the multi-attention segmentation model comprises 48 dense cell maps and 42 high-magnification images, together with the images obtained from them by data augmentation; the corresponding label maps are binarized images obtained by filling the target areas of the original images with an image tool and then applying binarization and denoising.
Further, step S2 specifically comprises the steps of:
S21, roughly segment the foreground and background of the electron microscope image obtained in step S1 with the deep learning model; the segmentation result is a binary map of 0s and 1s which, after graying, displays as a black-and-white image in which black is the background region unrelated to electron microscope cells or non-cellular objects and white is the target cell region;
S22, take the background region obtained by dilating the original rough segmentation map as the image background, and the foreground region obtained by distance transformation with a screened suitable distance value as the image foreground;
S23, mark the area between the image foreground and background as an unknown region, and divide the foreground and background of the unknown region with the watershed algorithm;
S24, determine the connected regions of the segmented image, divide out individual cell regions and obtain position information.
The invention provides an electron microscope red blood cell segmentation and positioning method based on a UNet network and a watershed algorithm. The richness of the data set is first expanded with data augmentation, and the designed multi-attention segmentation model Aca-UNet then segments the cell foreground and background. Finally, after distance transformation and watershed correction of the segmented map, individual cell regions are located, completing the positioning and segmentation of the cells. The method applies a computer-aided algorithm to an electron microscope data set for the first time; in experiments, the Dice coefficient reaches 88.94% for dense cell segmentation and 93.10% for high-magnification cell segmentation.
Drawings
FIG. 1 is an exemplary diagram of two types of electron microscope image data sets provided by an embodiment of the invention;
FIG. 2 is a diagram illustrating the effect of image preprocessing provided by an embodiment of the present invention;
fig. 3 is a diagram of an Aca-UNet network structure according to an embodiment of the present invention;
fig. 4 is a network diagram of UNet framework provided by an embodiment of the present invention;
FIG. 5 is a block diagram of a feature selection module provided by an embodiment of the present invention;
FIG. 6 is a block diagram of the self-attention mechanism block provided by an embodiment of the present invention;
FIG. 7 is a block diagram of a depth separable convolution module provided by an embodiment of the present invention;
FIG. 8 is a diagram illustrating the steps for counting cells by segmentation based on the watershed algorithm according to an embodiment of the present invention;
FIG. 9 is an exemplary graph of a dense cell division count provided by an embodiment of the present invention;
FIG. 10 is an exemplary high-magnification cell counting map provided by an embodiment of the present invention;
FIG. 11 is a comparison of the effects of different model test sets provided by the embodiments of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings. They are given solely for the purpose of illustration and are not to be construed as limiting the scope of the invention, since many variations are possible without departing from its spirit and scope.
Aiming at the characteristics of low contrast and small data volume of an electron microscope erythrocyte image, the embodiment of the invention provides an electron microscope erythrocyte segmentation and positioning method based on a UNet network and a watershed algorithm, which comprises the following steps:
s1, constructing a multi-attention segmentation model (called an Aca-UNet network in the example) based on the UNet network, and segmenting the cell foreground and the cell background of the electron microscope red blood cells;
and S2, positioning and segmenting cells of the segmented graph based on a watershed algorithm.
The data set used in the method comes from the First Affiliated Hospital of Chongqing Medical University. Unlike traditional blood smear images, electron microscope images have low contrast, high resolution, gray tones, and heavily intermixed foreground and background. Traditional segmentation methods such as Otsu thresholding (OTSU) are therefore unsuitable, and a suitable deep learning model is designed to locate, segment and count the cells. The electron microscope image data set used herein divides into a dense cell data set and a high-magnification data set; part of the image data is shown in fig. 1, where (a) is a high-magnification image and (b) is a dense cell image.
For the original image illustrated in fig. 2 (a), a corresponding label map must be generated. To preserve as much boundary information as possible, the traditional anchor-point region-delineation approach is not used; instead, the target areas are filled with an image tool such as Photoshop (the region of interest, i.e. the red blood cell area, is filled white). The filled image, however, is not a pure binary image, as shown in fig. 2 (b): noise points of inconspicuous darkness remain, so the manually filled image is binarized and denoised, giving the label image shown in fig. 2 (c).
Because of the particularity of electron microscope images, the data provided by the cooperating institution is limited, comprising only 48 dense cell images (Density images) and 42 high-magnification images (High images). The semantic richness of the data set is therefore expanded with spatial transforms such as flipping and rotation; to preserve the information of the original electron microscope images, color transforms and similar operations are avoided. After this data augmentation, 20% of the data is selected as the test set and the rest as the training set for feature learning and model validation on the electron microscope images.
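A minimal sketch of the spatial-only augmentation described above: flips and 90° rotations applied jointly to image and label, with no colour transforms. The function name and the six-fold expansion factor are illustrative assumptions, not values from the text:

```python
import numpy as np

def augment(image, label):
    # Spatial transforms only (flips, 90-degree rotations); colour transforms
    # are avoided to preserve the grey-level information of the EM images.
    pairs = [(image, label)]
    for k in (1, 2, 3):                       # 90, 180, 270 degree rotations
        pairs.append((np.rot90(image, k), np.rot90(label, k)))
    pairs.append((np.fliplr(image), np.fliplr(label)))  # horizontal flip
    pairs.append((np.flipud(image), np.flipud(label)))  # vertical flip
    return pairs

img = np.arange(16).reshape(4, 4)
lbl = (img > 7).astype(np.uint8)              # toy binary label map
aug = augment(img, lbl)                       # original + 5 transformed pairs
```

Each label is transformed with the same operation as its image so that pixel-level correspondence is preserved.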
The Aca-UNet network proposed herein is based on the encoder-decoder structure of the UNet network and mainly includes 4 parts: (1) the UNet basic skeleton; (2) the feature selection module; (3) the self-attention mechanism module for high-level semantic features (comprising self-attention block 1 and self-attention block 2); and (4) the depthwise separable convolution module (comprising depthwise separable convolution block 1 and depthwise separable convolution block 2). The overall structure of the model is shown in fig. 3. Specifically, the multi-attention segmentation model includes an encoder network and a decoder network designed on the UNet network: the encoder network comprises, from the first to the fifth layer, a first residual block (residual block 1), a second residual block (residual block 2), a third residual block (residual block 3), a first self-attention block (self-attention block 1) and a second self-attention block (self-attention block 2); the decoder network comprises, from the fourth layer back to the first, a first depthwise separable convolution block (depthwise separable convolution block 1), a second depthwise separable convolution block (depthwise separable convolution block 2), a first decoder block (decoder block 1) and a second decoder block (decoder block 2); and the model further includes 4 feature selection modules (denoted FS) placed at the skip-connection stage of the corresponding layers.
The basic skeleton of the UNet is a backbone structure of an Aca-UNet network, and the core of the basic skeleton of the UNet is a down-sampling module for extracting image features and an up-sampling module for recovering image resolution. The feature selection module is a module designed for fully fusing image channel and spatial feature information. The self-attention mechanism module performs global modeling for the image in high-level semantic information. The depth separable convolution module can properly improve the performance of the model under the condition of reducing the parameter quantity of the model.
Specifically, UNet's basic skeleton, a classical network in the field of medical image segmentation, has evolved many variants in recent years, but its core idea has not changed: its structure mainly consists of a downsampling path (encoder network) for capturing global information and an upsampling path (decoder network) for gradually restoring pixel scale, and it has performed well in many recent segmentation tasks. Here, some adjustments are made on this basis while following the basic UNet architecture; because of the specificity of the cell image segmentation task, the network's feature channel counts are adjusted to reduce model overhead. The UNet framework structure employed herein is shown in fig. 4. As can be seen from figs. 3 and 4, the input image of the Aca-UNet network is 3 × 512 × 512; after residual block 1 the number of channels is 32; after residual block 2 the number of channels is 64 and the height and width are 1/2 of the original; after residual block 3 the number of channels is 128 and the height and width are 1/4; after self-attention block 1 the number of channels is 256 and the height and width are 1/8; and after self-attention block 2 the number of channels is 512 and the height and width are 1/16. The specific structure of the residual block is shown in the right half of fig. 3.
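The encoder shape progression just described can be checked with a few lines of bookkeeping arithmetic (input 3 × 512 × 512, channel counts 32 through 512, each block after the first halving height and width); this is shape accounting only, not model code:

```python
# Encoder bookkeeping for a 3 x 512 x 512 input, following the channel
# counts given in the text (32, 64, 128, 256, 512).
channels = [32, 64, 128, 256, 512]
size = 512
shapes = []
for i, c in enumerate(channels):
    if i > 0:          # every block after the first halves height and width
        size //= 2
    shapes.append((c, size, size))
# shapes runs (32, 512, 512) -> (64, 256, 256) -> ... -> (512, 32, 32)
```

The final 512 × 32 × 32 map is 1/16 of the original resolution, matching the text.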
In order to solve the problem of information loss during UNet downsampling, a feature selection module is designed at the UNet skip-connection stage to perform feature selection during feature fusion. The feature selection module, shown in fig. 5, includes a channel attention module and a spatial attention module. First, after the CNN features of an image are extracted, the feature layer passes through an adaptive average pooling layer to obtain a feature map of size C × 1 × 1, where C is the number of feature channels; a 1-D convolution with kernel size k then performs convolution weighting, realizing information interaction between channels. The weight of each feature channel is obtained through a sigmoid function and used to weight the whole feature map, giving a channel-weighted map. The channel-weighted feature map is then input into the spatial attention module, where max pooling and average pooling are applied along its channel dimension, converting it to 1 × H × W (H and W being the height and width of the original image); the two maps are concatenated and fed into a convolution of adjustable kernel size for information interaction and feature reconstruction, finally yielding spatial feature weights. These are multiplied with the channel-attended feature map to obtain the final feature map.
In order to model long-distance semantic dependence of the feature map and better extract the semantic features of the image, convolution operations extract the low-dimensional features, and a self-attention mechanism module built in the manner of a Transformer models the high-dimensional features over long range. The module as a whole is shown in fig. 6. The feature map is first feature-encoded: each encoded word vector has size P × P and there are H × W/(P × P) of them. For position encoding, a depthwise separable convolution operation (DW) is adopted, which encodes position effectively with few parameters. Concretely, the image first passes through the corresponding convolution blocks of the UNet network to form a feature map, which is sent to a convolution module with kernel size k and stride s for convolution and word embedding. The embedded feature map is then dimension-transformed: a feature map of size C × H × W becomes word blocks of size P × C, where P is the product of H and W. After normalizing the vectorized word blocks, a multi-head attention mechanism performs long-distance modeling to fully explore the interrelation between word blocks. Its core idea is to form q, k and v matrices for each word block: the q of each word block computes similarities with the k of the other word blocks, and these similarities are multiplied with the corresponding v values to give the final result.
Then, residual connection and normalization are applied in turn; a fully connected layer learns latent features, the depthwise separable convolution performs convolutional position encoding, another fully connected layer learns hidden features, and the word blocks are reshaped back to an image feature block of C × H × W to obtain the final result.
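The q/k/v similarity computation described above is, in essence, scaled dot-product attention. The single-head NumPy sketch below uses random stand-in projection matrices (the real model learns them and splits them across multiple heads), so it illustrates the mechanism only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    # tokens: (P, C) word blocks. The q of each block scores the k of every
    # block, and the normalized scores weight the corresponding v values.
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (P, P) similarities
    return scores @ v                                  # attended output

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))                  # 16 word blocks, dim 8
out = self_attention(tokens,
                     rng.standard_normal((8, 8)),      # stand-in W_q
                     rng.standard_normal((8, 8)),      # stand-in W_k
                     rng.standard_normal((8, 8)))      # stand-in W_v
```

A multi-head version would run several such projections in parallel and concatenate the outputs before the residual connection and normalization described in the text.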
In order to alleviate the problem that the segmentation model has too many parameters to deploy easily, a depthwise separable convolution module is adopted to model the feature map. While keeping the same size and resolution throughout the network, this module separates the mixed spatial and channel dimensions by means of depthwise convolution (DW convolution) and pointwise convolution; however, since it reduces the model's inference and training speed while reducing its parameters, DW convolution is adopted only in the upsampling stage. The specific structure of the depthwise separable convolution module is shown in fig. 7. Concretely, a convolution module with kernel size k and stride s convolves the input image features, whose size becomes 1/s of the original. The result is then convolved by a depthwise convolution: if the number of input feature channels is C, there are C depthwise kernels, each convolving one channel of the input features to learn the relevant features. After a GELU activation function and batch normalization, the image features are convolved by a pointwise convolution, a kernel of size 1 whose kernel count determines the number of output feature channels. Finally, after another GELU activation function and batch normalization, the final image features are obtained.
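The parameter saving of the depthwise-plus-pointwise factorization can be verified with simple counting; the example channel count of 128 and kernel size 3 below are arbitrary illustrations, not values from the patent:

```python
def conv_params(c_in, c_out, k):
    # Standard convolution: one k x k x c_in kernel per output channel.
    return c_out * c_in * k * k

def separable_params(c_in, c_out, k):
    # Depthwise: c_in kernels of k x k (one per input channel);
    # pointwise: c_out kernels of 1 x 1 x c_in.
    return c_in * k * k + c_out * c_in

std = conv_params(128, 128, 3)        # 128*128*9  = 147456
sep = separable_params(128, 128, 3)   # 128*9 + 128*128 = 17536
```

Here the separable form needs roughly an eighth of the weights, which is the parameter reduction the text appeals to (bias terms are omitted for clarity).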
A mixed loss function is adopted: the difference between the predicted feature map and the real feature map is measured by the sum of DICE Loss and BCE Loss, and the overall loss function is given in formula (3). The sum of BCE Loss and DICE Loss is referred to herein as the total loss. BCE Loss is shown in formula (1), where p and q represent the predicted pixel value and the actual pixel value respectively, and w_1 and w_2 respectively represent the weights of the foreground and background terms in the loss. This is a binary foreground/background classification problem; to better distinguish single cells, the weight of the segmentation background is increased so that the model focuses more on the image background and the boundaries between cells. DICE Loss is shown in formula (2), where |X ∩ Y| represents the intersection of the predicted and real elements, and |X| and |Y| their respective numbers of elements.
$$\mathcal{L}_{BCE} = -\frac{1}{n}\sum_{k=1}^{n}\left[w_1\, q_k \log p_k + w_2\,(1-q_k)\log(1-p_k)\right] \quad (1)$$

$$\mathcal{L}_{DICE} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|} \quad (2)$$

$$\mathcal{L} = \mathcal{L}_{BCE} + \mathcal{L}_{DICE} \quad (3)$$
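A NumPy sketch of this mixed loss follows. The 2:1 weighting follows the preferred ratio stated earlier; the soft DICE form with a small epsilon is a common implementation choice rather than something specified in the text:

```python
import numpy as np

def bce_loss(p, q, w1=2.0, w2=1.0, eps=1e-7):
    # Weighted binary cross-entropy over n pixels:
    # p = predicted pixel values, q = actual (label) pixel values.
    p = np.clip(p, eps, 1 - eps)          # guard log(0)
    return -np.mean(w1 * q * np.log(p) + w2 * (1 - q) * np.log(1 - p))

def dice_loss(p, q, eps=1e-7):
    # 1 - 2|X ∩ Y| / (|X| + |Y|), computed softly on probability maps.
    inter = np.sum(p * q)
    return 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(q) + eps)

def total_loss(p, q):
    # Formula (3): sum of BCE and DICE terms.
    return bce_loss(p, q) + dice_loss(p, q)

q = np.array([1.0, 1.0, 0.0, 0.0])                      # toy label map
good = total_loss(np.array([0.9, 0.9, 0.1, 0.1]), q)    # near-correct prediction
bad = total_loss(np.array([0.1, 0.1, 0.9, 0.9]), q)     # inverted prediction
```

A near-correct prediction yields a much smaller total loss than an inverted one, which is the behavior the training objective relies on.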
The watershed algorithm is a region-based image segmentation algorithm founded on mathematical morphology and is widely used in image segmentation post-processing. After the electron microscope cell picture is initially segmented, the segmented image is positioned and counted with the watershed algorithm. The flow, shown in fig. 8, comprises rough foreground/background segmentation of the electron microscope image by the deep learning model, opening, distance transformation, determination of the foreground and background, marking of unknown regions, watershed transformation, and contour search.
1) After the deep learning model has been properly trained, the foreground and background of the electron microscope image are preliminarily segmented; the segmentation result is a binary image of 0s and 1s which, after graying, is displayed as a black-and-white image, where black is the background region irrelevant to the electron microscope cells, or non-cell objects, and white is the target cell region;
2) The opening operation, a common method in image processing, removes isolated points and separates two targets that are finely connected together, by first performing an erosion operation and then a dilation operation;
3) The distance transformation, first proposed in 1966 and now widely used in the image field, converts the binary foreground/background image into a grayscale image in which the gray value of each pixel is its distance to the nearest background pixel; the cell foreground region of the image is obtained by screening a suitable distance value;
4) The background region obtained by dilating the original rough segmentation image is taken as the background of the image, and the foreground image obtained by distance transformation with a screened suitable distance value is taken as the foreground of the image;
5) In order to finely segment the cell foreground and background, the region between the selected foreground and background is treated as an unknown region, and the watershed algorithm is adopted to assign the unknown region to foreground or background;
6) Connected regions of the segmented image are identified, individual cell regions are separated, and their position information is obtained.
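In practice steps 1) to 6) are commonly implemented with OpenCV routines (opening via morphological operations, distance transformation, and the watershed transform); those library calls are not reproduced here. As a dependency-free sketch of the final step, judging connected regions and extracting per-cell position information, the following is a simple 4-connected flood-fill labeling over a binary mask. The function name and return format are illustrative assumptions.

```python
import numpy as np
from collections import deque

def label_and_locate(mask):
    """Label 4-connected foreground regions in a binary mask and return
    (label_image, list of bounding boxes (y0, x0, y1, x1)), one box per cell."""
    H, W = mask.shape
    labels = np.zeros((H, W), dtype=int)
    boxes = []
    current = 0
    for sy in range(H):
        for sx in range(W):
            if mask[sy, sx] and labels[sy, sx] == 0:
                current += 1                      # new cell region found
                y0 = y1 = sy; x0 = x1 = sx
                queue = deque([(sy, sx)])
                labels[sy, sx] = current
                while queue:                      # BFS flood fill of one region
                    y, x = queue.popleft()
                    y0, y1 = min(y0, y), max(y1, y)
                    x0, x1 = min(x0, x), max(x1, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
                boxes.append((y0, x0, y1, x1))    # position information of this cell
    return labels, boxes

mask = np.zeros((8, 8), dtype=bool)
mask[1:3, 1:3] = True                             # toy "cell" 1
mask[5:7, 4:8] = True                             # toy "cell" 2
labels, boxes = label_and_locate(mask)
print(len(boxes))                                 # 2
```

The number of labels is the cell count, and each bounding box gives the cell's position, which is exactly the output step 6) requires.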
Correspondingly, step S2 substantially includes the steps of:
s21, roughly dividing the foreground and background of the electron microscope image obtained in the step S1 through a deep learning model, graying the binary image with the division result of 0 or 1, and displaying the binary image as a black-and-white image, wherein black is a background region irrelevant to electron microscope cells or a non-cellular object, and white is a target cell region;
s22, taking a background area obtained by expanding the original rough segmentation image as the background of the image, and taking a foreground area obtained by distance conversion of a screened proper distance value as the foreground of the image;
s23, marking the area between the image foreground and the background as an unknown area, and dividing the foreground and the background of the unknown area by adopting a watershed algorithm;
and S24, judging a connected region of the divided images, dividing an individual cell region and obtaining position information.
Exemplary segmentation counts are shown in fig. 9 and 10, where fig. 9 corresponds to a dense cell map and fig. 10 corresponds to a high-magnification image.
The following are experiments and analyses of the results thereof.
(1) Preliminary segmentation of images
After denoising and data enhancement of the image dataset, the initial learning rate is set to 0.001, the batch size to 4, and the number of training epochs to 40; gradients are updated with the Adam optimizer, and the final trained model is obtained after all epochs are completed. The performance of the proposed model is compared with recently popular segmentation models; the results on the dense cell dataset and the high-magnification cell dataset are shown in Table 1 and Table 2 respectively. The proposed model performs worse on the high-magnification training dataset, presumably because the single-cell area in that dataset is too large and the distance between single cells too small, so the self-attention mechanism cannot build global context information well from the connections between the encoded word blocks. Therefore DW convolution is used instead to extract the image features; the comparison results are shown in Table 2.
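For reference, the training hyperparameters stated above can be collected into a configuration sketch. The dictionary layout and key names are illustrative assumptions, not taken from the patent; only the values (learning rate 0.001, batch size 4, 40 epochs, Adam) come from the text.

```python
# Training configuration as stated in the experiments section (illustrative layout).
train_config = {
    "initial_lr": 1e-3,   # initial learning rate 0.001
    "batch_size": 4,
    "epochs": 40,         # number of training rounds
    "optimizer": "adam",  # Adam optimizer for gradient updates
}
print(train_config["initial_lr"])  # 0.001
```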
TABLE 1 comparison of dense cell training results
[Table 1 is rendered as an image in the original; its values are not reproduced here.]
TABLE 2 magnified comparative plot of cell training effect
[Table 2 is rendered as an image in the original; its values are not reproduced here.]
In fig. 11, the results of the different models on the test set are compared; it can be seen that the proposed model handles over-segmentation, under-segmentation and fine details better, and its output is closer to the ground-truth labels.
(2) Watershed segmentation post-processing and counting
To measure segmentation and counting precision, TP, TN, FP and FN are used to evaluate the algorithm. Here TP denotes the number of correctly segmented cells (i.e. single cells segmented as single cells), TN denotes the number of incorrectly segmented cells (i.e. single cells split into multiple cells or not segmented), FP denotes the number of falsely segmented cell regions, and FN denotes the number of segmented objects that are not cells. Since non-cell objects are largely filtered out during the preliminary segmentation, the FN and FP values are determined by the performance of the model presented herein. The overall segmentation quality is measured by the proportion of correctly segmented cells to total cells (ACC). Several images were randomly selected for testing; some of the tested images are shown in fig. 11, and the test indices are shown in table 3 (numbers 1 to 3 correspond to dense cell maps 1 to 3 in fig. 11, and numbers 4 and 5 correspond to high-magnification images 1 and 2, respectively, in fig. 11).
TABLE 3
[Table 3 is rendered as an image in the original; its values are not reproduced here.]
As can be seen from table 3 and fig. 11, the segmentation of dense cells is close to ideal; the high-magnification cell images segment somewhat worse because of their high degree of cell overlap, but the overall segmentation quality is good and meets the segmentation requirements of this work.
(3) Ablation experiment of attention mechanism
Ablation experiments were performed on the proposed model to verify its effectiveness. The overall model is shown in fig. 3; the self-attention mechanism module (ACa-UNet_mini1) and the feature selection module (ACa-UNet_mini2) are removed in turn and the change in model performance is observed. The overall results are shown in table 4.
Table 4 ablation experiment of attention mechanism
[Table 4 is rendered as an image in the original; its values are not reproduced here.]
It can be seen that the ACa-UNet proposed by the present invention is optimal in all respects, and that the self-attention mechanism module and the feature selection module contribute substantially to its effectiveness.
(4) Ablation experiment with loss function weight
To explore the influence of the loss function on model performance, ablation experiments were carried out on the loss-function weights. The foreground-to-background weight ratios were set to 1:1, 2:1 and 1:2 (denoted Loss 1, Loss 2 and Loss 3 respectively), and the proposed model was trained on the dataset with each setting. The training results are shown in Table 5.
TABLE 5 loss function ablation experiment
[Table 5 is rendered as an image in the original; its values are not reproduced here.]
As shown in table 5, when the foreground-to-background weight ratio is 2:1 (w1 : w2 = 2 : 1), the training effect is best, verifying the rationality of paying extra attention to the background region in order to separate individual cells.
In conclusion, this work offers its own solution to the problem of automatic segmentation and counting of electron microscope cells, filling a gap in this field. For dense electron microscope cell images, feature-channel selection fusion and long-range modeling with a self-attention mechanism improve the effectiveness of semantic feature extraction. For high-magnification images, which are limited by their own semantic characteristics (overly large single-cell areas and small gaps between cells), channel selection fusion and depthwise separable convolution are used to extract semantic features. Cell segmentation and counting are then completed in combination with the watershed algorithm. The dice coefficient reaches 88.94% for dense cell segmentation and 93.10% for high-magnification cells, and the counting and watershed-processing indices all meet expectations.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such modifications are intended to be included in the scope of the present invention.

Claims (8)

1. An electron microscope red blood cell segmentation and positioning method based on a UNet network and a watershed algorithm is characterized by comprising the following steps:
s1, constructing a multi-attention segmentation model based on a UNet network to segment a cell foreground and a cell background of electron microscope erythrocytes;
and S2, positioning and segmenting cells of the segmented graph based on a watershed algorithm.
2. The UNet network and watershed algorithm-based electron microscope red blood cell segmentation and localization method according to claim 1, wherein in the step S1, the multi-attention segmentation model comprises an encoder network and a decoder network designed according to the UNet network, the encoder network comprises a first residual block, a second residual block, a third residual block, a first self-attention mechanism block and a second self-attention mechanism block from the first layer to the fifth layer, the decoder network comprises a first depth separable convolution block, a second depth separable convolution block, a first decoder block and a second decoder block from the fourth layer to the first layer, and the multi-attention segmentation model further comprises 4 feature selection modules arranged at the skip connection stage of the corresponding layers;
the first self-attention machine block and the second self-attention machine block extract low-dimensional features of the image by convolution operation, and perform remote modeling on high-dimensional features of the image by imitating a Transformer;
separating a hybrid spatial dimension and a channel dimension by means of depth-separable convolution and Point wise convolution, with the first depth-separable volume block and the second depth-separable volume block remaining the same size and resolution throughout the network;
and the characteristic selection module is used for selecting characteristics during characteristic fusion.
3. The UNet network and watershed algorithm-based electron microscope erythrocyte segmentation and positioning method according to claim 2, wherein:
the characteristic selection module comprises a channel attention mechanism module and a space attention mechanism module;
in the channel attention mechanism module, after the CNN features of the image are extracted, the feature layer first passes through an adaptive average pooling layer to obtain a feature map of size C × 1 × 1, where C is the number of feature channels; a 1-dimensional convolution with kernel size k then performs convolution weighting to realize information interaction among different channels; the weight of each feature channel is obtained through a sigmoid function and used to weight the overall feature map, yielding a channel-weighted map, which is finally input into the spatial attention mechanism module;
in the spatial attention mechanism module, maximum pooling and average pooling are performed over the channel dimension of the input feature map, converting it into two maps of size 1 × H × W, where H and W are respectively the height and width of the original image; the two feature maps are concatenated and fed into a convolution with adjustable kernel size for information interaction and feature reconstruction; finally the spatial feature weights are obtained and multiplied with the channel-attention-weighted feature map to obtain the final feature map.
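The channel-then-spatial attention pipeline of claim 3 can be sketched in NumPy. This is an illustrative toy, not the claimed implementation: the fixed averaging kernel stands in for the learnable 1-D convolution of size k, and a simple mean stands in for the learnable spatial convolution; all function names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, k=3):
    """Adaptive average pool to C x 1 x 1, 1-D conv of size k across channels,
    sigmoid to get per-channel weights, then reweight the feature map."""
    pooled = x.mean(axis=(1, 2))                      # (C,) global average pool
    kernel = np.ones(k) / k                           # stand-in for learnable 1-D conv
    mixed = np.convolve(pooled, kernel, mode='same')  # cross-channel interaction
    w = sigmoid(mixed)                                # per-channel weights in (0, 1)
    return x * w[:, None, None]

def spatial_attention(x):
    """Max pool and average pool over channels to two 1 x H x W maps, fuse them
    (mean here stands in for the learnable conv), sigmoid, then reweight."""
    mx = x.max(axis=0)                                # (H, W) channel-wise max
    av = x.mean(axis=0)                               # (H, W) channel-wise mean
    w = sigmoid((mx + av) / 2.0)                      # spatial weight map
    return x * w[None, :, :]

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 16, 16))
y = spatial_attention(channel_attention(x))
print(y.shape)  # (8, 16, 16): same shape, features reweighted channel- then space-wise
```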
4. The UNet network and watershed algorithm-based electron microscope erythrocyte segmentation and positioning method according to claim 3, wherein: in the first or second self-attention mechanism block, the feature map is encoded into word vectors, the size of each encoded word block being P × P and the number of encoded word vectors being H × W/(P × P), and the position information is then encoded using a depth separable convolution operation.
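The word-block encoding of claim 4 (P × P patches, H × W/(P × P) tokens) amounts to a reshape of the feature map; the following NumPy sketch illustrates the token count and token size. The function name and the (C, H, W) layout are illustrative assumptions.

```python
import numpy as np

def to_word_blocks(feat, P):
    """Split a (C, H, W) feature map into H*W/(P*P) word vectors of size C*P*P."""
    C, H, W = feat.shape
    assert H % P == 0 and W % P == 0
    # (C, H/P, P, W/P, P) -> (H/P, W/P, C, P, P) -> (num_tokens, token_dim)
    blocks = feat.reshape(C, H // P, P, W // P, P).transpose(1, 3, 0, 2, 4)
    return blocks.reshape((H // P) * (W // P), C * P * P)

feat = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
tokens = to_word_blocks(feat, P=4)
print(tokens.shape)  # (4, 32): H*W/(P*P) = 64/16 = 4 tokens, each of size C*P*P = 32
```

In the claimed block, a depth separable convolution would then add position information to these tokens before self-attention is applied.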
5. The UNet network and watershed algorithm-based electron microscope red blood cell segmentation and localization method according to claim 4, wherein in the step S1, the loss function adopted in the process of training the multi-attention segmentation model is as follows:
Loss = Loss_BCE + Loss_DICE

wherein Loss_BCE represents the BCE Loss, Loss_DICE represents the DICE Loss, and:

Loss_BCE = -(1/n) Σ_{k=1}^{n} [ w1 · q_k · log(p_k) + w2 · (1 - q_k) · log(1 - p_k) ]

Loss_DICE = 1 - 2|X ∩ Y| / (|X| + |Y|)
where p and q represent the predicted pixel value and the actual pixel value respectively, w1 and w2 represent the weights of the foreground and the background in the loss, k denotes the k-th pixel, n is the total number of pixels, X and Y represent the sets of predicted and real elements respectively, |X ∩ Y| represents their intersection, and |X| and |Y| represent their respective numbers of elements.
6. The UNet network and watershed algorithm-based electron microscope erythrocyte segmentation and positioning method according to claim 5, wherein w1 : w2 = 2 : 1.
7. The method for segmenting and positioning erythrocytes of an electron microscope based on the UNet network and the watershed algorithm as claimed in any one of claims 1 to 6, wherein the dataset used for training and testing the multi-attention segmentation model comprises 48 dense cell maps, 42 high-magnification images, and images obtained from them by data augmentation, and the corresponding label images are binarized images obtained by filling the target areas of the original images with an image tool and then performing binarization and denoising.
8. The method for segmenting and positioning the erythrocytes of the electron microscope based on the UNet network and the watershed algorithm according to claim 1, wherein: the step S2 specifically includes the steps of:
s21, roughly dividing the foreground and the background of the electron microscope image obtained in the step S1 through a deep learning model, graying the binary image with the division result of 0 or 1, and displaying the binary image as a black-white image, wherein black is a background area irrelevant to electron microscope cells or a non-cellular object, and white is a target cell area;
s22, taking a background area obtained by expanding the original rough segmentation image as the background of the image, and taking a foreground area obtained by distance conversion of a screened proper distance value as the foreground of the image;
s23, marking the region between the image foreground and the background as an unknown region, and dividing the foreground and the background of the unknown region by adopting a watershed algorithm;
and S24, judging a connected region of the divided images, dividing an independent cell region and obtaining position information.
CN202210964897.4A 2022-08-12 2022-08-12 Electron microscope red blood cell segmentation and positioning method based on UNet network and watershed algorithm Pending CN115393293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210964897.4A CN115393293A (en) 2022-08-12 2022-08-12 Electron microscope red blood cell segmentation and positioning method based on UNet network and watershed algorithm


Publications (1)

Publication Number Publication Date
CN115393293A true CN115393293A (en) 2022-11-25

Family

ID=84119353


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116342600A (en) * 2023-05-29 2023-06-27 中日友好医院(中日友好临床医学研究所) Segmentation method of cell nuclei in thymoma histopathological image
CN116342600B (en) * 2023-05-29 2023-08-18 中日友好医院(中日友好临床医学研究所) Segmentation method of cell nuclei in thymoma histopathological image
CN117078705A (en) * 2023-10-18 2023-11-17 齐鲁工业大学(山东省科学院) CT image segmentation method based on Pasteur coefficient active contour attention
CN117078705B (en) * 2023-10-18 2024-02-13 齐鲁工业大学(山东省科学院) CT image segmentation method based on Pasteur coefficient active contour attention
CN117437249A (en) * 2023-12-21 2024-01-23 深圳大学 Segmentation method, terminal equipment and storage medium for fundus blood vessel image
CN117437249B (en) * 2023-12-21 2024-03-22 深圳大学 Segmentation method, terminal equipment and storage medium for fundus blood vessel image


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination