CN112446892A - Cell nucleus segmentation method based on attention learning - Google Patents
- Publication number: CN112446892A (application CN202011272095.4A)
- Authority: CN (China)
- Prior art keywords: attention; segmentation; learning; cell nucleus; loss
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/11: Region-based segmentation
- G06T7/13: Edge detection
- G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T2207/20221: Image fusion; image merging
- G06T2207/30024: Cell structures in vitro; tissue sections in vitro
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045: Combinations of networks
- G06N3/084: Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a cell nucleus segmentation method based on attention learning, addressing the nucleus segmentation problem in intelligent pathological diagnosis. Intelligent pathological diagnosis uses deep learning to segment and identify abnormal cells in cell images, but few models target the segmentation of nuclei in cell images, and the existing ones have two problems: (1) segmentation accuracy is low because overlapping nuclei and indistinct boundaries are not handled; (2) contextual information at the nucleus edge is ignored, causing under-segmentation or over-segmentation and degrading the subsequent classification results. A cell nucleus segmentation method based on attention learning is therefore proposed. Experiments show that the model effectively handles the segmentation of overlapping cells and the under-segmentation or over-segmentation caused by unclear boundaries. The invention is applied to nucleus segmentation in intelligent pathological diagnosis.
Description
Technical Field
The invention relates to intelligent pathological diagnosis, and in particular to a cell nucleus segmentation method.
Background
Cervical cancer is the second most deadly cancer threatening women's health; worldwide, one woman dies of cervical cancer every two minutes. Early cervical cancer is completely curable, so early diagnosis and treatment are an effective response. Liquid-based thin-layer cytology is currently the most common cytological screening technique for cervical cancer internationally and can detect some precancerous lesions and microbial infections. Traditional pathological diagnosis, however, relies entirely on manual operation and visual slide reading by doctors: the workload is heavy, diagnostic accuracy is low, and large-scale screening cannot be popularized. With the development of computer image processing and artificial intelligence, automatic pathological diagnosis technology has emerged, and cell segmentation is the basis of analysis in this technology. The purpose of cell segmentation is to accurately locate and outline cells, providing input for subsequent processing. However, human cell nuclei are diverse, differ greatly in color, and are unevenly stained or pathologically changed, so some nuclei resemble the cytoplasm or background in color and have indistinct boundary regions. In addition, large numbers of adherent or overlapping cells make segmentation difficult and imprecise, and prone to under-segmentation or over-segmentation.
Current segmentation methods fall into two categories. The first is traditional segmentation, which includes thresholding, region-based segmentation, undirected-graph-based segmentation, active contour models, fuzzy clustering, and so on. Thresholding divides the pixels of an image into several classes by setting different feature thresholds; it is simple to implement, computationally cheap, and stable, but it ignores the spatial information of pixels, cannot segment targets with large internal variance, and is easily disturbed by noise. Region-based segmentation, typically the watershed method, uses the spatial information of the image to group pixels into regions; to improve watershed performance, researchers have proposed user-defined marker points, improved algorithm implementations, and re-segmenting or merging the segmentation results. This category also includes superpixel methods such as simple linear iterative clustering (SLIC) and mean shift. Undirected-graph-based segmentation, represented by GraphCut, builds a graph with image pixels or superpixels as vertices and seeks a cut that disconnects the subgraphs; iterative versions of GraphCut perform better, but such methods suffer from high time and space complexity. The active contour model expresses the target contour with a continuous curve and converts segmentation into minimizing an energy functional; typical examples are parametric active contour models, represented by the snake model, and geometric active contour models, represented by the level-set method.
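As a concrete illustration of the thresholding approach described above (background art, not part of the invention), a minimal NumPy sketch of Otsu's global threshold, which picks the threshold maximizing between-class variance over the gray-level histogram:

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the gray level maximizing between-class variance (Otsu)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                   # gray-level probabilities
    omega = np.cumsum(p)                    # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))      # cumulative mean
    mu_t = mu[-1]                           # global mean
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan              # avoid empty-class division
    sigma_b2 = (mu_t * omega - mu) ** 2 / denom
    return int(np.nanargmax(sigma_b2))

def threshold_segment(gray, t):
    """Binary segmentation: foreground (255) where gray > t."""
    return (gray > t).astype(np.uint8) * 255
```

As the text notes, such a threshold ignores pixel spatial information, which is exactly the limitation the later deep models address.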
Most medical images are ambiguous. Fuzzy clustering introduces the concept of membership from fuzzy theory and assigns each pixel to the region with the highest membership, improving segmentation accuracy.
The second category is deep learning. UNet first applied deep learning to medical image segmentation and advanced the segmentation of cell images and liver CT images, but its skip connections combine shallow and deep features directly, which easily creates a semantic gap. UNet++ borrows the dense-connection idea of DenseNet, adding dense convolution blocks in the skip connections to reduce the semantic gap. Brugger et al. proposed a UNet model with partially reversible sequences: a reversible sequence can back-propagate without storing the intermediate results of the forward pass, reducing memory consumption during training. CE-Net replaces UNet's encoder with a pre-trained model and proposes a context encoding network that captures higher-level information while preserving spatial information for 2D medical image segmentation; it consists of three modules, a feature encoder, a context extractor, and a feature decoder, and performs better than UNet on several 2D medical segmentation tasks. 3D U2-Net changes how convolution is performed in UNet: its main contribution is converting the standard convolutions in the UNet structure into domain adapters based on depthwise separable convolution, reducing the number of network parameters. Oktay et al. proposed the Attention Gate (AG) model for medical images, which automatically focuses on target structures of different shapes and sizes; a model trained with AGs implicitly learns to suppress irrelevant regions of the input image while highlighting salient features useful for a specific task, and AGs integrate easily into standard convolutional neural network architectures while reducing prediction time.
Most of the above models target medical image segmentation generally; few address the segmentation of cell nuclei in TCT images. The main problems are: (1) segmentation accuracy is low because overlapping nuclei and indistinct boundaries are not handled; (2) contextual information at the nucleus boundary is ignored, causing under-segmentation or over-segmentation. A cell nucleus segmentation method based on attention learning is therefore proposed. Its context encoding layer contains four parallel groups of dilated (hole) convolutions; dilated convolutions with different dilation rates extract features over receptive fields of different sizes, and fusing these receptive fields captures the contextual information of the image, effectively addressing the under-segmentation and over-segmentation caused by unclear boundaries. The attention learning mechanism consists of a scaling module, an encoder-decoder module, and an attention loss. The scaling module obtains more image detail; the encoder-decoder module learns the attention weight matrix, freeing the mechanism from being tied to a specific task; and the attention loss defines the salient region (here, the edges of the cell nuclei), so the segmentation of overlapping nuclei is solved.
Disclosure of Invention
The invention aims to solve the problem of nucleus segmentation in intelligent pathological diagnosis and provides a nucleus segmentation method based on an attention learning mechanism.
The above object of the invention is mainly achieved by the following technical solution; the model structure and training method are shown in FIG. 1:
S1, preparing the training data set comprises four steps:
S11, selecting images: the nucleus images were obtained from 100 cervical smears; the exfoliated cells were collected from people of different ages and disease states, covering all lesion levels of the TBS diagnostic criteria.
S12, marking cell nuclei: manually outline each cell nucleus and store the position information of the nucleus contours in each image in an annotation file.
S13, generating the label map: first create a completely black single-channel image (gray value 0), then read the position information from the annotation file and set the gray value of each nucleus region to 255.
S14, cropping the pictures: crop the cell nucleus image and the label map simultaneously with a crop size of 512 x 512, ensuring that the nucleus image and the marker map remain aligned.
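The paired cropping of S14 can be sketched as follows; the non-overlapping tiling strategy and array layout are assumptions, since the text only requires that image and label map be cropped identically:

```python
import numpy as np

def paired_crops(image, mask, size=512):
    """Crop the nucleus image and its label map with identical windows,
    so every 512x512 image tile stays aligned with its mask tile."""
    assert image.shape[:2] == mask.shape[:2], "image and mask must match"
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            tiles.append((image[y:y + size, x:x + size],
                          mask[y:y + size, x:x + size]))
    return tiles
```

Using one loop over shared coordinates, rather than cropping image and mask separately, is what guarantees the alignment the step demands.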
S2, preparing an attention learning loss label chart, comprising the following steps:
S21, obtaining the loss marker map by calculation or image processing: detect the edge images of the marker map in the X and Y directions with the Sobel operator, then fuse the two images to obtain the loss marker map.
S22, loading into memory after alignment: put the cell nucleus images, marker maps, and loss marker maps into one-to-one correspondence, shuffle the whole sequence, and then load it into memory in batches.
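Steps S21 and S22 might look like the NumPy sketch below; fusing the two Sobel edge images by gradient magnitude, and the zero binarization threshold, are assumptions, since the text only states that the X- and Y-direction edge images are fused:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def _filter3x3(img, kernel):
    # 3x3 correlation with edge-replicated padding (no SciPy/OpenCV needed)
    pad = np.pad(img.astype(np.float64), 1, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * pad[dy:dy + img.shape[0],
                                        dx:dx + img.shape[1]]
    return out

def loss_label_map(marker):
    # S21: Sobel edges in X and Y, fused here by gradient magnitude
    gx = _filter3x3(marker, SOBEL_X)
    gy = _filter3x3(marker, SOBEL_Y)
    return (np.hypot(gx, gy) > 0).astype(np.uint8) * 255

def shuffled_batches(images, markers, loss_maps, batch_size, seed=0):
    # S22: shuffle aligned triples with one permutation, then yield batches
    n = len(images)
    assert len(markers) == n == len(loss_maps)
    order = np.random.default_rng(seed).permutation(n)
    for s in range(0, n, batch_size):
        idx = order[s:s + batch_size]
        yield ([images[i] for i in idx],
               [markers[i] for i in idx],
               [loss_maps[i] for i in idx])
```

Shuffling a single index permutation, instead of each list separately, preserves the one-to-one correspondence the step requires.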
S3, the attention mechanism is widely used in semantic segmentation to emphasize or select the important information of the target and suppress irrelevant details. In a conventional attention mechanism, the attention weights are obtained through a fixed series of calculations, so the mechanism can only accomplish a specific task and the attention is not controllable. The invention therefore proposes an attention learning mechanism, shown in FIG. 2, consisting of a scaling module, an encoder-decoder module, and an attention loss. The input feature map is first enlarged to twice the original size, and the encoder-decoder module produces a single-channel feature matrix. This matrix is passed through a Sigmoid function and restored to the original resolution, yielding the attention matrix. The output O(x) of the attention learning structure is computed as follows:
in the formula, f (x) represents an input feature map, and a (x) represents an attention weight matrix.
The scaling module in the attention learning mechanism zooms in on the edges of overlapping nuclei so that the network obtains more detailed features. The encoder then encodes the global features of the image and the decoder learns the attention weights. The attention loss frees the attention mechanism from being tied to a specific task, further highlighting its advantages.
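The equation for O(x) appears only as a figure in the source. The sketch below assumes the usual element-wise form O(x) = A(x) * F(x); the nearest-neighbour 2x scaling and the channel-mean placeholder standing in for the learned encoder-decoder module are likewise illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_learning(feature, encode_decode=None):
    """Sketch of the attention learning structure on a C x H x W map:
    1) enlarge the feature map to 2x (nearest neighbour),
    2) encode-decode it into a single-channel matrix (placeholder here),
    3) Sigmoid, then restore the original resolution -> A(x),
    4) output O(x) = A(x) * F(x), element-wise (assumed form)."""
    up = feature.repeat(2, axis=1).repeat(2, axis=2)       # 2x upsample
    if encode_decode is None:
        encode_decode = lambda x: x.mean(axis=0)           # stand-in encoder-decoder
    att = sigmoid(encode_decode(up))                       # single channel, 2H x 2W
    att = att[::2, ::2]                                    # back to H x W
    return att[None, :, :] * feature, att
```

A real implementation would learn `encode_decode` jointly with the network; the placeholder only fixes the shapes of the pipeline.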
S4, in image segmentation, feature extraction involves convolution and pooling. Pooling enlarges the receptive field of the convolution kernel and improves computational efficiency; but since image segmentation is pixel-level classification, an upsampling operation is needed to restore the image resolution, and the downsampling and upsampling process always loses some information. Dilated (hole) convolution mitigates this problem: spaces (zeros) are inserted between the elements of a convolution kernel to enlarge it, and its key parameter is the dilation rate. As shown in FIG. 3, when the rate is 1 there is no space between kernel elements; kernels with different rates have different receptive fields, and fusing dilated convolutions with different rates extracts the contextual information of the image. Using this property, and drawing on CE-Net, we designed a context encoding structure: dilated convolutions with different rates are placed in series to form four parallel branches, the dilation rate of each branch increasing so that the receptive fields are 3, 7, 9, and 19, and the features extracted by the branches are fused additively. Because the features extracted by different branches differ in importance, an SE channel attention mechanism is added on top. The structure extracts context at different resolutions and avoids information loss.
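The stated receptive fields 3, 7, 9, and 19 are consistent with branches of stacked 3 x 3 dilated convolutions whose per-branch dilation stacks are (1), (3), (1, 3), and (1, 3, 5), as in CE-Net's dense atrous convolution block; those stacks are an assumption inferred from the stated sizes. A stack's receptive field grows by (kernel - 1) * d per layer:

```python
def receptive_field(dilations, kernel=3):
    """Receptive field of a stack of kernel x kernel dilated convolutions:
    each layer with dilation d enlarges the RF by (kernel - 1) * d."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# Dilation stacks assumed from the stated receptive fields 3 / 7 / 9 / 19
BRANCHES = [(1,), (3,), (1, 3), (1, 3, 5)]
```

This arithmetic is why increasing the dilation rate per branch gives the four branches progressively larger context without extra parameters.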
Building the network structure: the network adopts a U-Net structure with an attention mechanism, as shown in FIG. 1. The encoder uses a ResNet34 model pre-trained on the ImageNet data set, with a context encoding layer inserted before its four-layer encoding structure. The decoder uses upsampling and convolution. Each encoder layer is connected to an attention learning structure, and the output feature map is fused with the upsampled feature map of the same resolution for convolution. The network outputs the upsampled result and the attention learning output r, computed as follows:
where C(x) denotes a 1 × 1 convolution, ups(x, i) denotes upsampling x i times, and O_i(x) denotes the output of the i-th attention learning structure.
S5, training the model and segmenting cell nuclei. The model designed in S4 is first trained on the data set prepared in S1; a fixed-step schedule is used to decay the learning rate during training. The loss is then calculated as follows:
L = L_bce(r, g) + L_bce(c, m)    (3)
where L is the model loss, L_bce is the binary cross-entropy loss function, r is given in equation (2), g is the loss marker map from S2, c is the output of the model, and m is the marker map from S2. After 200 training epochs the loss is approximately stable; training is stopped and the parameters of the cell nucleus segmentation model are obtained. Finally, the model is used to segment cell nuclei.
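Equation (3) can be sketched directly in NumPy; averaging L_bce over pixels is an assumption (the text does not state the reduction):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross entropy, averaged over pixels (L_bce in eq. (3))."""
    p = np.clip(pred, eps, 1.0 - eps)   # guard log(0)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def total_loss(r, g, c, m):
    """Eq. (3): L = L_bce(r, g) + L_bce(c, m), where r/g are the attention
    branch output and the edge loss marker map, and c/m the segmentation
    output and the marker map."""
    return bce(r, g) + bce(c, m)
```

The two terms supervise the edge-attention branch and the segmentation branch with the same loss function but different targets.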
Effects of the invention
The invention provides an attention-learning cervical cell nucleus segmentation method. Existing segmentation methods are not designed for cervical cell nuclei and do not exploit prior knowledge about them, so their segmentation results are poor. The attention learning mechanism focuses on the edge positions of nuclei and effectively segments overlapping nuclei, while the context encoding layer further extracts nucleus context features, avoiding under-segmentation and over-segmentation. We verified the advantages of the proposed model experimentally on our data set, as shown in FIG. 5 and Table 1; whether the invention has the same advantages on other public data sets remains to be further verified.
TABLE 1 test results of different models on the inventive data set
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a model structure and loss calculation method according to an embodiment of the present invention;
FIG. 2 is a diagram of an attention learning mechanism according to an embodiment of the present invention;
FIG. 3 shows hole (dilated) convolutions with different dilation rates according to an embodiment of the present invention;
FIG. 4 is a multi-scale context coding layer according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the effect of model segmentation according to an embodiment of the present invention.
Detailed description of the invention
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
the invention provides a cell nucleus segmentation method based on an attention learning mechanism, which comprises the following steps:
S1, preparing a training data set;
S2, preparing the attention learning loss marker map and loading the data;
S3, building the attention learning structure;
S4, building the network structure;
S5, training the model and segmenting the cell nuclei.
The following embodiment illustrates the invention in detail:
S1, preparing a training data set comprises four steps:
(1) Selecting images: the nucleus images were obtained from 100 cervical smears; the exfoliated cells were collected from people of different ages and disease states, covering all lesion levels of the TBS diagnostic criteria.
(2) Marking cell nuclei: manually outline each cell nucleus and store the position information of the nucleus contours in each image in an annotation file.
(3) Generating the label map: first create a completely black single-channel image (gray value 0), then read the position information from the annotation file and set the gray value of each nucleus region to 255.
(4) Cropping the pictures: crop the cell nucleus image and the label map simultaneously with a crop size of 512 x 512, ensuring that the nucleus image and the marker map remain aligned.
S2, preparing the attention learning loss marker map: first detect the edge images of the marker map in the X and Y directions with the Sobel operator and fuse the two images to obtain the loss marker map; then put the cell nucleus images, marker maps, and loss marker maps into one-to-one correspondence, shuffle the whole sequence, and load it into memory in batches.
S3, building the attention learning structure shown in FIG. 2, which consists of a scaling module, an encoder-decoder module, and an attention loss. The input feature map is first enlarged to twice the original size and fed into the encoder-decoder module to obtain a single-channel feature matrix. A Sigmoid operation is then applied to the feature matrix, and the result is restored to the original input resolution to obtain the attention weight matrix. The output O(x) of the attention learning structure is computed as follows:
in the formula, f (x) represents an input feature map, and a (x) represents an attention weight matrix.
S4, building the network structure shown in FIG. 1; the network adopts a U-Net structure with an attention mechanism. The encoder uses a ResNet34 model pre-trained on the ImageNet data set, with a context encoding layer inserted before its four-layer encoding structure, as shown in FIG. 4. The decoder uses upsampling and convolution. Each encoder layer is connected to an attention learning structure, and the output feature map is fused with the upsampled feature map of the same resolution for convolution. The network outputs the upsampled result and the attention learning output r, computed as follows:
where C(x) denotes a 1 × 1 convolution, ups(x, i) denotes upsampling x i times, and O_i(x) denotes the output of the i-th attention learning structure. In the context encoding layer, dilated convolutions with different rates form four branches; the dilation rate of each branch increases so that the receptive fields are 3, 7, 9, and 19, and the features extracted by the branches are fused additively to form the context encoding. A channel attention mechanism is then introduced to strengthen the encoding capacity of the structure.
S5, training the model and segmenting cell nuclei. The model designed in S4 is first trained on the data set prepared in S1; a fixed-step schedule is used to decay the learning rate during training. The loss is then calculated as follows:
L = L_bce(r, g) + L_bce(c, m)    (3)
where L is the model loss, L_bce is the binary cross-entropy loss function, r is given in equation (2), g is the loss marker map from S2, c is the output of the model, and m is the marker map from S2.
Training setup: the main encoding structure of the model is based on ResNet pre-trained on ImageNet, and the source code is implemented on the PyTorch platform. Training and testing run on an Ubuntu Server 18.04.2 LTS system with 4 RTX 2080Ti graphics cards and dual CPUs; the invention uses 3 of the cards, a batch size of 4, a learning rate of 0.00005, and 200 training epochs. Training is then stopped to obtain the parameters of the cell nucleus segmentation model, which is finally used to segment cell nuclei. The final effect is shown in FIG. 5: overlapping nuclei are well segmented, and nuclei with unclear boundaries show no over-segmentation.
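The fixed-step learning-rate decay mentioned above might be sketched as follows; the step size and decay factor are illustrative assumptions, since the text only names the schedule type and the initial rate 0.00005:

```python
def step_lr(base_lr, epoch, step_size=50, gamma=0.1):
    """Fixed-step decay: multiply the learning rate by `gamma`
    every `step_size` epochs (both values are illustrative)."""
    return base_lr * (gamma ** (epoch // step_size))
```

For example, with base_lr = 0.00005 the rate stays constant within each 50-epoch window and drops by a factor of 10 at each boundary over the 200-epoch run.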
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.
Claims (6)
1. A cell nucleus segmentation method based on attention learning, characterized by comprising the following steps:
S1, preparing a training data set;
S2, preparing the attention learning loss marker map and loading the data;
S3, building the attention learning structure;
S4, building the network structure;
S5, training the model and segmenting the cell nuclei.
2. The method as claimed in claim 1, wherein the step S1 of preparing the training data set comprises four steps:
(1) selecting images: the cell images come from 100 cell smears, and the exfoliated cells are collected from people of different ages and disease states, covering all lesion levels of the TBS diagnostic criteria;
(2) marking cell nucleus: manually circling the outline of the cell nucleus, and storing the position information of the outline of the cell nucleus in each image into a labeling file;
(3) generating a labeled graph: firstly, creating a full-black single-channel image (the gray value is 0), then reading the position information in the labeling file, and setting the gray value of each cell nucleus area to be 255;
(4) cutting the picture: cropping the cell nucleus image and the label map at the same time, with a crop size of 515 x 512; ensure that the nuclear image and the marker map crop consistently.
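Steps (3) and (4) can be sketched as follows; for simplicity the annotation is represented as boolean region masks rather than the contour coordinates an annotation file would store, and the function names are illustrative:

```python
import numpy as np

def make_marker_map(shape, nucleus_masks):
    # Step (3): an all-black single-channel image (gray value 0), then the
    # gray value of each cell nucleus region is set to 255.
    marker = np.zeros(shape, dtype=np.uint8)
    for mask in nucleus_masks:
        marker[mask] = 255
    return marker

def crop_pair(image, marker, y, x, size=512):
    # Step (4): crop the nucleus image and the marker map with the same
    # window so the two crops stay aligned.
    return image[y:y + size, x:x + size], marker[y:y + size, x:x + size]
```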
3. The method as claimed in claim 1, wherein the step S2 of preparing the attention learning loss label map comprises the following steps: first, edge images of the marker map in the X direction and the Y direction are detected with an edge detection algorithm, and the two images are fused to obtain the loss marker map; then, after the cell nucleus images, marker maps and loss marker maps are put in one-to-one correspondence, the whole sequence is shuffled, and the data are loaded into memory in batches.
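An illustrative sketch of the loss marker map construction in claim 3; a simple finite-difference detector stands in for the edge detection algorithm, which the claim does not name:

```python
import numpy as np

def loss_marker_map(marker):
    # Detect edges of the marker map in the X and Y directions, then fuse
    # the two edge images into one loss marker map (claim 3). The finite
    # difference used here is an assumed stand-in for the real detector.
    lab = marker.astype(np.int16)
    gx = np.zeros_like(lab)
    gy = np.zeros_like(lab)
    gx[:, 1:] = np.abs(lab[:, 1:] - lab[:, :-1])      # X-direction edges
    gy[1:, :] = np.abs(lab[1:, :] - lab[:-1, :])      # Y-direction edges
    return np.clip(gx + gy, 0, 255).astype(np.uint8)  # fused edge map
```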
4. The attention-learning-based cell nucleus segmentation method as claimed in claim 1, wherein the attention learning structure built in step S3 comprises a scaling module, an encoding/decoding module and an attention loss; first, the input feature map is upsampled to 2 times its original resolution, and the result is fed into the encoding/decoding module to obtain a single-channel feature matrix; a Sigmoid operation is then applied to the feature matrix, and the result is restored to the original input resolution to obtain the attention weight matrix; the output o(x) of the attention learning structure is calculated as follows:
where f(x) represents the input feature map, and a(x) represents the attention weight matrix.
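The claim-4 pipeline can be sketched on a 2-D feature map as follows; nearest-neighbour scaling stands in for the scaling module, `encode_decode` is a placeholder for the encoding/decoding module, and combining a(x) with f(x) by elementwise product is an assumption, since equation (1) is not reproduced in this text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_learning(feat, encode_decode):
    # Upsample the input feature map to 2x, run the encoding/decoding module
    # to get a single-channel feature matrix, apply Sigmoid, and downsample
    # back to the input resolution to obtain the attention weight matrix a(x).
    up = np.repeat(np.repeat(feat, 2, axis=0), 2, axis=1)  # scale to 2x
    weights = sigmoid(encode_decode(up))                   # Sigmoid on the matrix
    a = weights[::2, ::2]                                  # restore input resolution
    return feat * a                                        # o(x) = f(x) * a(x) (assumed form)
```

A toy `encode_decode` such as `lambda x: x - 0.5` is enough to exercise the flow.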
5. The attention-learning-based cell nucleus segmentation method as claimed in claim 1, wherein the network structure built in step S4 adopts a U-Net structure with an attention mechanism; the encoder is a model pre-trained on the ImageNet data set, and a context coding layer is inserted before each layer of the encoder's coding structure; the decoder uses upsampling and convolution operations; each layer of the encoder is connected to an attention learning structure, and the output feature map is fused with the upsampled feature map of the same resolution before convolution; the output of the network comprises the upsampling output and the output r of the attention learning structures; r is given by the following formula:
where C(x) represents a 1 × 1 convolution, ups(x, i) represents upsampling x by a factor of i, and O_i(x) represents the output of the i-th attention learning structure; in the context coding layer, dilated convolutions with different sampling rates form four branches, the dilation rate of each branch increasing in turn, with receptive fields of 3, 7, 9 and 19 respectively, and the features extracted by the branches are fused additively to form the context code; a channel attention mechanism is then introduced to strengthen the structure's encoding capacity.
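A 1-D sketch of the context coding layer in claim 5; 3-tap kernels at dilation rates 1, 3, 4 and 9 give receptive fields 3, 7, 9 and 19 as stated, while the averaging kernel and the scalar sigmoid gate are illustrative stand-ins for the real convolution weights and channel attention mechanism:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    # 'Same'-padded 1-D convolution with a 3-tap kernel w at a dilation rate.
    xp = np.pad(x, dilation)
    n = len(x)
    return sum(w[k] * xp[k * dilation : k * dilation + n] for k in range(3))

def context_encode(x):
    # Four dilated branches fused additively, then a channel-attention gate.
    w = np.ones(3) / 3.0                                        # toy averaging kernel
    fused = sum(dilated_conv1d(x, w, d) for d in (1, 3, 4, 9))  # additive fusion
    gate = 1.0 / (1.0 + np.exp(-fused.mean()))                  # toy channel attention
    return gate * fused
```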
6. The attention-learning-based cell nucleus segmentation method as claimed in claim 1, wherein in step S5 the model is trained and cell nuclei are segmented; first, the model designed in S4 is trained with the data set prepared in S1, the learning rate being decayed with a fixed-step method during training; the loss is then calculated as follows:
L = L_bce(r, g) + L_bce(c, m) (3)
where L represents the loss of the model, L_bce represents the binary cross entropy loss function, r is given by equation (2), g is the loss marker map in S2, c represents the output of the model, and m represents the marker map in S2; after 200 training epochs the loss is approximately stable, and training is stopped to obtain the parameters of the cell nucleus segmentation model; finally, cell nuclei are segmented with the model.
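Equation (3) can be sketched directly; the pixel-averaged form of the binary cross entropy is an assumption, as the claim does not state the reduction:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    # Pixel-averaged binary cross entropy (averaging is an assumed reduction).
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def total_loss(r, g, c, m):
    # Equation (3): L = L_bce(r, g) + L_bce(c, m) -- the attention branch
    # output r against the loss marker map g, plus the segmentation output c
    # against the marker map m.
    return bce(r, g) + bce(c, m)
```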
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011272095.4A CN112446892A (en) | 2020-11-18 | 2020-11-18 | Cell nucleus segmentation method based on attention learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112446892A true CN112446892A (en) | 2021-03-05 |
Family
ID=74737315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011272095.4A Pending CN112446892A (en) | 2020-11-18 | 2020-11-18 | Cell nucleus segmentation method based on attention learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112446892A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113409321A (en) * | 2021-06-09 | 2021-09-17 | 西安电子科技大学 | Cell nucleus image segmentation method based on pixel classification and distance regression |
CN113409321B (en) * | 2021-06-09 | 2023-10-27 | 西安电子科技大学 | Cell nucleus image segmentation method based on pixel classification and distance regression |
CN113393443B (en) * | 2021-06-17 | 2023-02-14 | 华南理工大学 | HE pathological image cell nucleus segmentation method and system |
CN113393443A (en) * | 2021-06-17 | 2021-09-14 | 华南理工大学 | HE pathological image cell nucleus segmentation method and system |
CN113658117A (en) * | 2021-08-02 | 2021-11-16 | 浙江大学 | Method for identifying and dividing aggregate boundaries in asphalt mixture based on deep learning |
CN113658117B (en) * | 2021-08-02 | 2023-09-15 | 安徽省交通控股集团有限公司 | Method for identifying and dividing aggregate boundary in asphalt mixture based on deep learning |
WO2023062764A1 (en) * | 2021-10-13 | 2023-04-20 | 国立大学法人東北大学 | Biological image processing program, biological image processing device, and biological image processing method |
CN114022487A (en) * | 2021-11-10 | 2022-02-08 | 哈尔滨理工大学 | Cervical cell nucleus segmentation method and device, electronic equipment and storage medium |
CN114170483B (en) * | 2022-02-11 | 2022-05-20 | 南京甄视智能科技有限公司 | Training and using method, device, medium and equipment of floater identification model |
CN114170483A (en) * | 2022-02-11 | 2022-03-11 | 南京甄视智能科技有限公司 | Training and using method, device, medium and equipment of floater identification model |
CN114821046A (en) * | 2022-03-28 | 2022-07-29 | 深思考人工智能科技(上海)有限公司 | Method and system for cell detection and cell nucleus segmentation based on cell image |
CN114943723A (en) * | 2022-06-08 | 2022-08-26 | 北京大学口腔医学院 | Method for segmenting and counting irregular cells and related equipment |
CN114943723B (en) * | 2022-06-08 | 2024-05-28 | 北京大学口腔医学院 | Method for dividing and counting irregular cells and related equipment |
CN116152574A (en) * | 2023-04-17 | 2023-05-23 | 厦门大学 | Pathological image classification method based on multi-stage information extraction and memory |
CN116152574B (en) * | 2023-04-17 | 2023-06-30 | 厦门大学 | Pathological image classification method based on multi-stage information extraction and memory |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112446892A (en) | Cell nucleus segmentation method based on attention learning | |
CN109886986B (en) | Dermatoscope image segmentation method based on multi-branch convolutional neural network | |
CN111311592B (en) | Three-dimensional medical image automatic segmentation method based on deep learning | |
CN109410219B (en) | Image segmentation method and device based on pyramid fusion learning and computer readable storage medium | |
Mirikharaji et al. | Star shape prior in fully convolutional networks for skin lesion segmentation | |
CN111798462B (en) | Automatic delineation method of nasopharyngeal carcinoma radiotherapy target area based on CT image | |
CN111369565B (en) | Digital pathological image segmentation and classification method based on graph convolution network | |
CN111563897B (en) | Breast nuclear magnetic image tumor segmentation method and device based on weak supervision learning | |
Wan et al. | Robust nuclei segmentation in histopathology using ASPPU-Net and boundary refinement | |
CN111145181B (en) | Skeleton CT image three-dimensional segmentation method based on multi-view separation convolutional neural network | |
US11562491B2 (en) | Automatic pancreas CT segmentation method based on a saliency-aware densely connected dilated convolutional neural network | |
CN111091573B (en) | CT image pulmonary vessel segmentation method and system based on deep learning | |
CN112150428A (en) | Medical image segmentation method based on deep learning | |
Liu et al. | Cx22: A new publicly available dataset for deep learning-based segmentation of cervical cytology images | |
CN113344951A (en) | Liver segment segmentation method based on boundary perception and dual attention guidance | |
CN114494296A (en) | Brain glioma segmentation method and system based on fusion of Unet and Transformer | |
CN110689525A (en) | Method and device for recognizing lymph nodes based on neural network | |
CN114170244A (en) | Brain glioma segmentation method based on cascade neural network structure | |
CN110782427A (en) | Magnetic resonance brain tumor automatic segmentation method based on separable cavity convolution | |
CN113065551A (en) | Method for performing image segmentation using a deep neural network model | |
Shan et al. | SCA-Net: A spatial and channel attention network for medical image segmentation | |
CN113421240A (en) | Mammary gland classification method and device based on ultrasonic automatic mammary gland full-volume imaging | |
CN114399510A (en) | Skin lesion segmentation and classification method and system combining image and clinical metadata | |
CN112488996A (en) | Inhomogeneous three-dimensional esophageal cancer energy spectrum CT (computed tomography) weak supervision automatic labeling method and system | |
CN116883341A (en) | Liver tumor CT image automatic segmentation method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||