CN116342600A - Segmentation method of cell nuclei in thymoma histopathological image - Google Patents

Segmentation method of cell nuclei in thymoma histopathological image

Info

Publication number
CN116342600A
CN116342600A (application CN202310612940.5A)
Authority
CN
China
Prior art keywords
attention
map
weight
pixel
thymoma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310612940.5A
Other languages
Chinese (zh)
Other versions
CN116342600B (en)
Inventor
陈皇
秦晋
张�杰
钟定荣
刘维凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Japan Friendship Hospital
Beijing Jiaotong University
Original Assignee
China Japan Friendship Hospital
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Japan Friendship Hospital, Beijing Jiaotong University filed Critical China Japan Friendship Hospital
Priority to CN202310612940.5A priority Critical patent/CN116342600B/en
Publication of CN116342600A publication Critical patent/CN116342600A/en
Application granted granted Critical
Publication of CN116342600B publication Critical patent/CN116342600B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/0012: Image analysis; Biomedical image inspection
    • G06N 3/0455: Neural networks; Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464: Neural networks; Convolutional networks [CNN, ConvNet]
    • G06T 7/11: Segmentation; Edge detection; Region-based segmentation
    • G06T 7/136: Segmentation; Edge detection involving thresholding
    • G06T 2207/20084: Special algorithmic details; Artificial neural networks [ANN]
    • G06T 2207/30024: Biomedical image processing; Cell structures in vitro; Tissue sections in vitro
    • G06T 2207/30096: Biomedical image processing; Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a method for segmenting cell nuclei in thymoma histopathological images, comprising: weighted position-sensitive axial multi-head self-attention; multi-head cross attention in the skip connections, which helps the decoder fuse information; a weighted loss function and a fully automatic post-processing method, which segment error-prone regions and separate touching or overlapping nuclei, enhancing the visual quality of the nucleus segmentation; and the construction of a thymoma histopathological image dataset. The weighted axial multi-head self-attention of the invention controls the proportion of the relative position encodings by introducing weights on them, thereby improving segmentation accuracy while reducing computational complexity and keeping the computational cost controllable. The two original shallow convolution layers of UNet are combined with two deep Transformer layers built around self-attention, which solves the problem that CNNs cannot model long-range dependencies and compensates for the inductive biases, inherent to CNNs, that the Transformer lacks.

Description

Segmentation method of cell nuclei in thymoma histopathological image
Technical Field
The invention relates to the technical field of medical image processing, in particular to a method for segmenting nuclei in a thymoma histopathological image.
Background
In thymoma pathological images, accurate and automatic nucleus segmentation is a key precondition for high-quality nucleus feature extraction and for efficient thymoma diagnosis using computer-aided diagnosis technology. Existing techniques for nucleus segmentation in pathological images mainly fall into three categories: conventional methods, convolutional neural network (CNN) based methods, and Transformer-based methods.
However, because nuclei vary in shape, have blurred boundaries and inconsistent staining, and frequently touch, overlap or aggregate in groups of two or more, existing nucleus segmentation techniques for pathological images still suffer from a number of drawbacks and deficiencies.
Conventional methods include threshold-based methods represented by Otsu thresholding, region-based methods represented by marker-controlled watershed, and edge-based methods represented by active contour models and level sets. Conventional methods are simple and fast to implement, but they cannot cope with complex cases such as the high inter- and intra-nucleus heterogeneity of thymoma nuclei and large amounts of nucleus aggregation, so their segmentation results are generally poor.
Convolutional neural network (CNN) based methods: with the development of deep learning, CNNs have become the basic architecture of current nucleus segmentation models and have achieved better results than conventional methods in complex nucleus segmentation cases. Among them, the UNet architecture is the most popular nucleus segmentation model and has been widely refined and extended, for example as ResUNet, UNet++, AS-UNet and Attention UNet. However, convolution can only capture local information within a limited receptive field, so CNNs lack the ability to capture long-range dependencies, which are nevertheless very important for predicting segmentation results. Although continuous downsampling or dilated convolution can expand the receptive field of the network to some extent by stacking multiple convolutions, their effectiveness is still limited.
Visual Transformer-based methods: the Transformer and its self-attention mechanism have proven effective for modeling long-range dependencies, but applying a Transformer to nucleus segmentation still faces many difficulties. The first is inaccurate position information caused by small datasets. The Transformer is built on large amounts of training data, but because histopathological image acquisition and nucleus annotation are laborious, nucleus segmentation datasets are usually small. Insufficient training data leads to inaccurately learned position encodings in the self-attention mechanism and degrades model performance; yet without position encodings the Transformer contains no position information at all and cannot capture features such as the spatial structure of nuclei. The second is that high-resolution two-dimensional images inevitably incur higher computational cost. Existing solutions to these problems typically scale the image down to low resolution, divide the image into many small patches, or add local constraints to the self-attention mechanism. However, scaling the image down to low resolution loses high-resolution features; dividing the image into many small patches may split one nucleus across different patches, destroying the association between adjacent pixels and hindering the capture of local information; and adding local constraints to the self-attention mechanism sacrifices global connections. Moreover, although Transformer-based methods model global information well, the Transformer lacks the inductive biases inherent to CNNs, including translation equivariance and locality, and these must be learned from large amounts of data. CNN-based methods can typically be trained on a small dataset, whereas Transformer-based methods usually require pre-training on a larger dataset, so CNNs usually perform better than Transformers on small datasets.
Furthermore, without any special strategy, neither CNN-based nor Transformer-based methods can satisfactorily separate touching or overlapping nuclei, and the development of nucleus segmentation methods for thymoma pathological images is further hampered by factors such as the scarcity of thymoma histopathological image data and the high cost of annotation.
In summary, nuclear segmentation in histopathological images of thymoma is critical for nuclear feature extraction and thymoma diagnosis. However, the diversity of cell nucleus shapes, blurring of boundaries, overlapping of locations, and lack of available data sets present serious challenges to the task of cell nucleus segmentation.
Disclosure of Invention
In view of the above, the invention aims to provide an automatic and accurate deep-learning-based nucleus segmentation method, namely a multi-attention Transformer network, for segmenting thymoma nuclei, laying a foundation for nucleus feature extraction and computer-aided diagnosis of thymoma.
In order to achieve the above object of the present invention, the following technical solutions are specifically adopted:
the invention provides a segmentation method of cell nuclei in a thymoma histopathological image, which comprises the following steps:
introducing manually tuned hyperparameters as weights of the relative position encodings, building on the Transformer's position-sensitive axial multi-head self-attention and relative position encoding;
the weights of the relative position encodings control the proportion of the relative position encodings, so that the model obtains position information while reducing the negative impact of position information learned inaccurately from insufficient data; this improves the model's performance on small datasets while preserving global connections and efficient computation;
embedding Transformer layers based on the weighted position-sensitive axial multi-head self-attention mechanism into the deep levels of a UNet-style encoder and decoder, replacing the original deep convolution layers of UNet while the shallow levels remain convolution layers, and thereby constructing a multi-attention Transformer network model that exploits the complementary advantages of UNet and the Transformer, fully capturing global information while preserving local details;
introducing cross attention (similar to self-attention) in the skip connections of the multi-attention Transformer network model, filtering the encoder features with the generated attention maps and reducing the differences between encoder and decoder features, thereby helping the decoder fuse information;
training the multi-attention Transformer network model with a weighted loss function, introducing a predefined weight map into the loss function to increase the weight of pixels in regions prone to segmentation errors and to encourage the model to learn in those regions, and obtaining a nucleus probability map;
post-processing the nucleus probability map with a fully automatic post-processing method that includes an improved marker-controlled watershed, thresholding and removal of isolated pixel blocks, thereby further separating touching or overlapping nuclei.
Further, the multi-attention Transformer network model includes an encoder (contracting path) and a decoder (expanding path).
The encoder consists of convolution layers, Transformer layers and downsampling; the decoder consists of convolution layers, Transformer layers and upsampling. In the four-level network, the two shallower levels are convolution layers and the two deeper levels are Transformer layers. Each convolution layer contains two identical residual convolution modules, and each Transformer layer contains two identical residual Transformer modules.
The core of the Transformer layer is weighted position-sensitive axial multi-head self-attention, along both the height axis and the width axis. The skip-connection multi-head cross attention is located in the two deepest skip connections.
Further, the weighted position-sensitive axial multi-head self-attention mechanism includes weighted position-sensitive axial multi-head self-attention along the width axis and along the height axis.
The width-axis weighted position-sensitive axial multi-head self-attention method is as follows: an image is input, the image height H is treated as H samples and multiplied by the batch size B; the dimension of the query and key is reduced by a 1 x 1 convolution to half of the channel number C, to further reduce the computational cost; if the width W of the key and value is greater than the axial constraint m = 56, it is downsampled to w = m; the query, key and value are split into multiple heads, each head having dimension d_q (for the query and key) or d_v (for the value), so that n·d_q equals the reduced query/key dimension and n·d_v equals the value dimension. Preferably, the queries, keys and values are split into n = 8 heads.
The height-axis weighted position-sensitive axial multi-head self-attention method is the same as the width-axis method.
Further, the position encoding method of the width-axis weighted position-sensitive axial multi-head self-attention is as follows:
relative position encoding is used, which is more flexible and generalizes better than absolute position encoding; the position encoding is shared among all heads, but the other parameters are not; the relative position encoding is along the width axis, with a size determined by the axial width; the relative position encodings corresponding to the query, key and value are r^q, r^k and r^v; r^q, r^k and r^v are multiplied by the weights w_q, w_k and w_v, respectively.
Preferably, the weights w_q, w_k and w_v are all set to 0.1.
Further, the normalization processing of the weighted position-sensitive axial multi-head self-attention mechanism is as follows: a scalar 1/sqrt(d_q) is multiplied in before the Softmax function to eliminate the influence of the dimension d_q, and the processing results of the multiple heads are concatenated to generate an output of the same size as the input.
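For reference, a plausible closed form of the weighted width-axis attention, written in the style of position-sensitive axial attention, is given below; the symbols r^q, r^k, r^v denote the relative position encodings, w_q, w_k, w_v the introduced weights, and N_{1×w}(o) the axial neighborhood of output position o. The exact formula is not reproduced in the published text, so this formulation is an assumption consistent with the description above:

$$
y_{o}=\sum_{p\in\mathcal{N}_{1\times w}(o)}\operatorname{softmax}_{p}\!\left(\frac{q_{o}^{\top}k_{p}+w_{q}\,q_{o}^{\top}r_{p-o}^{q}+w_{k}\,k_{p}^{\top}r_{p-o}^{k}}{\sqrt{d_{q}}}\right)\left(v_{p}+w_{v}\,r_{p-o}^{v}\right)
$$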
Further, the method of introducing cross attention (similar to self-attention) in the skip connections of the multi-attention Transformer network model includes:
inputting a high-resolution low-level feature map from the encoder or skip connection and a low-resolution high-level feature map from the decoder; in the cross attention, the high-level feature map F_h is projected as the query q and the key k by 1 x 1 convolutions, and the low-level feature map F_l is projected as the value v; the multi-head cross attention in the skip connection uses two-dimensional attention, and the sizes h and w of the downsampled feature map are generally much smaller than H and W; the key k and the value v are downsampled to avoid redundant attention pairs among all pixels and to reduce the computational complexity to O(HWhw); the query, key and value are split into multiple heads, each head having dimension d, so that n·d equals the channel dimension; the attention scores are calculated and normalized to between 0 and 1 by a Sigmoid function, and the attention map Att is generated by upsampling to enlarge the size.
Preferably, h = w = 8, and the queries, keys and values are split into n = 8 heads.
The attention map Att is multiplied element by element with the original low-level feature map F_l passed through the skip connection to obtain a new low-level feature map F_l' as the output; F_l' is added element by element to the upsampled high-level feature map F_h to generate a new feature map, which is then concatenated with F_l' along the channel dimension.
Further, the method of training the multi-attention Transformer network model with the weighted loss function includes:
when computing the binary cross-entropy loss of nucleus and background pixels, each pixel is multiplied by its corresponding weight in a weight map; the weight map is composed of five sub-weight maps (sub-weight maps one to five), whose main construction is as follows:
(1) in sub-weight map one, the initial weights of nucleus pixels and background pixels are 10/7 and 1, respectively;
(2) a hematoxylin component map is obtained by color deconvolution and normalized to between 0 and 1, with smaller gray values indicating higher hematoxylin intensity; in the ground-truth label map, nucleus pixels have value 1 and background pixels have value 0; the hematoxylin component map is multiplied by the ground-truth label map to obtain sub-weight map two;
(3) to give high weight to background pixels with high hematoxylin intensity, in the complement of the ground-truth label map nucleus pixels are set to 0 and background pixels to 1; the result of subtracting the hematoxylin component map from an all-ones map of the same size is multiplied by the complement of the ground-truth label map to obtain sub-weight map three;
(4) the shortest chessboard distance from each nucleus pixel to its boundary is divided by a constant σ1 and its negative is used as the exponent of e; the result is multiplied by a constant α1 and by the ground-truth label map to obtain sub-weight map four;
(5) background pixels closer to nuclei are given higher weight: the sum of the shortest chessboard distances from each background pixel to the nearest and second-nearest nuclei is divided by a constant σ2 and its negative is used as the exponent of e; the result is multiplied by a constant α2 and by the complement of the ground-truth label map to obtain sub-weight map five.
Preferably, α1 = 1, σ1 = σ2, and α2 = 4. The five sub-weight maps are raised to the powers 1, 3, 4 and 1, respectively, and all sub-weight maps are summed to obtain the final weight map.
Further, the fully automatic post-processing method is as follows:
first, an improved marker-controlled watershed is applied: the pixel value of a nucleus is assumed to be a local maximum of the image, and pixel values drop markedly between nuclei; if, on every path P from a local-maximum pixel M to another, larger local-maximum pixel N, the drop in pixel value is greater than a certain threshold, M is regarded as a nucleus core and is used as an initial marker for the watershed; in the present invention, this threshold is 96. At the same time, a threshold operation is applied: only pixels greater than a certain threshold are nucleus pixels, otherwise they are background pixels; in the present invention, this threshold is 0.5. Finally, all connected components in the nucleus mask are detected with an eight-connectivity algorithm, and connected components with an area smaller than a certain threshold are removed as isolated pixel blocks; in the present invention, this threshold is 10.
Post-processing the nucleus probability map in this way generates the final nucleus segmentation result.
The invention combines the two original shallow convolution layers of UNet with two deep Transformer layers built around self-attention, exploiting their complementary advantages to fully capture global information while preserving local details; this solves the problem that CNNs cannot model long-range dependencies and also compensates for the inductive biases, inherent to CNNs, that the Transformer lacks.
The skip-connection multi-head cross attention of the invention filters and enhances the encoder features with the attention maps generated by the cross attention, while reducing the differences between encoder and decoder features, thereby helping the decoder fuse information and further improving segmentation accuracy.
The weighted loss function and the fully automatic post-processing method of the invention encourage the model to segment error-prone regions, including touching or overlapping nuclei, which improves segmentation accuracy, addresses the difficulty of separating touching or overlapping nuclei, and enhances the visual quality of the nucleus segmentation.
On the constructed thymoma histopathological image dataset, the invention can segment nuclei in thymoma histopathological images fully automatically and relatively accurately, which benefits subsequent nucleus feature extraction and computer-aided pathological diagnosis of thymoma.
The invention also provides an information data processing terminal comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the above method for segmenting nuclei in thymoma histopathological images and the corresponding dataset construction method.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the above method for segmenting nuclei in thymoma histopathological images and the corresponding dataset construction method.
Compared with the prior art, the invention has the beneficial effects that:
the weighted axial multi-head self-attention of the invention controls the proportion of the relative position codes by introducing the weight of the relative position codes, avoids the negative influence caused by inaccurate learning position information due to a small data set as much as possible while utilizing the position information provided by the position codes, and ensures that the invention is suitable for cell nucleus segmentation on the small data set and improves the segmentation accuracy. Meanwhile, the high-efficiency calculation of the axial multi-head self-attention is inherited, the calculation complexity is reduced, and the calculation cost is controllable.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below; obviously, the drawings described below are only some embodiments of the present application, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is an overall architecture diagram of a multi-attention Transformer network according to an embodiment of the present invention;
FIG. 2 is a diagram of a weighted position-sensitive axial multi-headed self-attention module architecture along the width axis provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a jump joint multi-head cross-attention module architecture provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer device according to an embodiment of the present invention;
fig. 5 is a flowchart of a method for segmenting nuclei in a thymoma histopathological image provided by the invention.
Description of the embodiments
Embodiments of the present invention will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples are only for illustrating the present invention and should not be construed as limiting its scope. Where specific conditions are not indicated in the examples, conventional conditions or conditions recommended by the manufacturer were used. Reagents or apparatus whose manufacturer is not indicated are conventional commercially available products.
In the present examples, hematoxylin-eosin stained slides prepared from various thymoma pathological tissues were collected and scanned with a high-throughput digital scanner to produce digitized thymoma histopathological whole-slide images. Non-overlapping patches of size 224 x 224 were cropped from each whole-slide image at 40-fold magnification.
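As an illustration of this tiling step, a minimal sketch is given below; the slide-reading toolchain, the file naming, and the assumption that the exported 40x region fits in memory are all hypothetical, since the text does not specify them:

```python
import numpy as np
from PIL import Image

def crop_nonoverlapping_patches(image_path, patch_size=224):
    """Crop non-overlapping patch_size x patch_size tiles from an image region
    already exported at 40x magnification (assumed small enough to load whole)."""
    img = np.array(Image.open(image_path).convert("RGB"))
    h, w, _ = img.shape
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(img[y:y + patch_size, x:x + patch_size])
    return patches
```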
Trained researchers were invited to annotate the nuclei, and a pathologist examined and revised the annotations. Finally, the present examples constructed a thymoma histopathological image dataset containing 100 images and 6,874 exhaustively annotated nuclei. This is the first histopathological image dataset that focuses on thymoma and provides detailed nucleus annotations.
Overall structure: the embodiment of the invention provides a multi-attention Transformer network whose overall architecture is shown in Fig. 1. The multi-attention Transformer network adopts the basic architecture of UNet and consists of two parts: the encoder (contracting path) on the left of Fig. 1(a) and the symmetric decoder (expanding path) on the right. The encoder consists of convolution layers, Transformer layers and downsampling; the decoder consists of convolution layers, Transformer layers and upsampling. In the four-level network, the two shallower levels are convolution layers and the two deeper levels are Transformer layers. Each convolution layer contains two identical residual convolution modules, as shown in Fig. 1(b). Each Transformer layer contains two identical residual Transformer modules, as shown in Fig. 1(c). The core of the Transformer layer is weighted position-sensitive axial multi-head self-attention, along both the height axis and the width axis. The skip-connection multi-head cross attention is located in the two deepest skip connections. In addition, embodiments of the present invention use a weighted loss function for training, and the final nucleus segmentation result is generated by post-processing the nucleus probability map.
Weighted position-sensitive axial multi-head self-attention: Fig. 2 shows the weighted position-sensitive axial multi-head self-attention along the width axis; self-attention along the height axis is similar. In width-axis self-attention, the image height H is treated as H samples and multiplied by the batch size B. The dimension of the query and key is reduced by a 1 x 1 convolution to half of the channel number C, to further reduce the computational cost. If the width W of the key and value is greater than the axial constraint m = 56, it is downsampled to w = m. The query, key and value are split into n = 8 heads, each head having the corresponding query/key or value dimension divided by the number of heads. In this attention mechanism, relative position encoding is used, which is more flexible and generalizes better than absolute position encoding. The position encoding is shared among all heads, but the other parameters are not. The relative position encoding is also along the width axis. The relative position encodings corresponding to the query, key and value, r^q, r^k and r^v, are multiplied by the weights w_q, w_k and w_v, respectively; in the embodiment of the present invention, these weights are all set to 0.1. Before the Softmax function, the scores are multiplied by a scalar 1/sqrt(d_q) to eliminate the effect of the dimension. Finally, the results of the multiple heads are concatenated to generate an output of the same size as the input.
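A minimal PyTorch sketch of the width-axis module described above is given below. The tensor layout, the single fused projection, and the omission of the key/value downsampling step (the axial constraint m = 56) are implementation assumptions, not the patented implementation; only the per-row axial attention and the weighting of the relative position terms by w_q, w_k, w_v follow the text:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedAxialAttentionWidth(nn.Module):
    """Sketch of width-axis weighted position-sensitive axial self-attention.
    The relative position terms of query/key/value are scaled by w_q, w_k, w_v
    (0.1 in the described embodiment). Shapes of the learned encodings are
    assumptions; the input width is assumed to equal `width`."""

    def __init__(self, channels, heads=8, width=56, pos_weight=0.1):
        super().__init__()
        self.heads = heads
        self.dq = channels // 2 // heads      # per-head query/key dim (C/2 split over heads)
        self.dv = channels // heads           # per-head value dim
        self.w_q = self.w_k = self.w_v = pos_weight
        self.to_qkv = nn.Conv1d(channels, channels // 2 * 2 + channels, 1, bias=False)
        # learned relative position encodings, shared across heads, one per offset
        self.rel = nn.Parameter(torch.randn(2 * self.dq + self.dv, 2 * width - 1) * 0.02)
        idx = torch.arange(width)
        self.register_buffer("rel_idx", idx[None, :] - idx[:, None] + width - 1)  # (W, W)

    def forward(self, x):                     # x: (B, C, H, W); attention runs along W
        b, c, h, w = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b * h, c, w)       # each image row is one sample
        q, k, v = torch.split(self.to_qkv(x), [c // 2, c // 2, c], dim=1)
        q = q.reshape(b * h, self.heads, self.dq, w)
        k = k.reshape(b * h, self.heads, self.dq, w)
        v = v.reshape(b * h, self.heads, self.dv, w)
        r = self.rel[:, self.rel_idx]                          # (2*dq+dv, W, W)
        r_q, r_k, r_v = torch.split(r, [self.dq, self.dq, self.dv], dim=0)
        # content term plus weighted relative-position terms for query and key
        logits = torch.einsum("nhdi,nhdj->nhij", q, k)
        logits = logits + self.w_q * torch.einsum("nhdi,dij->nhij", q, r_q)
        logits = logits + self.w_k * torch.einsum("nhdj,dij->nhij", k, r_k)
        attn = F.softmax(logits / math.sqrt(self.dq), dim=-1)  # scale to remove dim effect
        out = torch.einsum("nhij,nhdj->nhdi", attn, v)
        out = out + self.w_v * torch.einsum("nhij,dij->nhdi", attn, r_v)
        out = out.reshape(b, h, c, w).permute(0, 2, 1, 3)      # back to (B, C, H, W)
        return out
```

The height-axis counterpart would be obtained analogously by transposing the height and width dimensions before and after calling the module.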
Skip-connection multi-head cross attention: Fig. 3 shows the skip-connection multi-head cross attention. The inputs of this module are a high-resolution low-level feature map from the encoder (or skip connection) and a low-resolution high-level feature map from the decoder. In the cross attention, the high-level feature map F_h is projected as the query q and the key k by 1 x 1 convolutions, while the value v is projected from the low-level feature map F_l. The skip-connection multi-head cross attention uses two-dimensional attention; h and w are the sizes of the downsampled feature map and are generally much smaller than H and W. The key k and the value v are downsampled to avoid redundant attention pairs among all pixels and to reduce the computational complexity to O(HWhw). In the embodiment of the present invention, h = w = 8. The queries, keys and values are also split into n = 8 heads. The computed attention scores are normalized to between 0 and 1 by a Sigmoid function and upsampled to enlarge the size, finally generating the attention map Att. The attention map Att is multiplied element by element with the original low-level feature map F_l passed through the skip connection to obtain a new low-level feature map F_l' as the output of the module. Finally, F_l' is added element by element to the upsampled high-level feature map to generate a new feature map, which is then concatenated with F_l' along the channel dimension.
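A sketch of the skip-connection multi-head cross attention, under the reading of Fig. 3 given above, might look as follows; the pooling used for downsampling, the channel-matching 1 x 1 convolution on the decoder map, and the exact way the attention output is turned into the attention map Att are assumptions:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipCrossAttention(nn.Module):
    """Sketch of skip-connection multi-head cross attention: the decoder
    (high-level) map provides queries/keys, the encoder (low-level) map
    provides values; keys and values are pooled to h = w = 8."""

    def __init__(self, low_ch, high_ch, heads=8, pooled=8):
        super().__init__()
        self.heads, self.pooled = heads, pooled
        self.d = low_ch // heads
        self.to_q = nn.Conv2d(high_ch, low_ch, 1, bias=False)
        self.to_k = nn.Conv2d(high_ch, low_ch, 1, bias=False)
        self.to_v = nn.Conv2d(low_ch, low_ch, 1, bias=False)
        self.up = nn.Conv2d(high_ch, low_ch, 1, bias=False)   # channel-match decoder map

    def forward(self, f_low, f_high):
        # f_low:  (B, low_ch, H, W)   encoder / skip feature map
        # f_high: (B, high_ch, Hh, Wh) decoder feature map (lower resolution)
        B, C, H, W = f_low.shape
        q = self.to_q(f_high)
        k = F.adaptive_avg_pool2d(self.to_k(f_high), self.pooled)
        v = F.adaptive_avg_pool2d(self.to_v(f_low), self.pooled)
        q = q.reshape(B, self.heads, self.d, -1)              # (B, n, d, Hh*Wh)
        k = k.reshape(B, self.heads, self.d, -1)               # (B, n, d, h*w)
        v = v.reshape(B, self.heads, self.d, -1)
        scores = torch.einsum("bndi,bndj->bnij", q, k) / math.sqrt(self.d)
        att = torch.sigmoid(scores)                            # normalised to (0, 1)
        out = torch.einsum("bnij,bndj->bndi", att, v)          # aggregate pooled values
        # Att is taken here as the aggregated response reshaped and upsampled
        att_map = out.reshape(B, C, *f_high.shape[-2:])
        att_map = F.interpolate(att_map, size=(H, W), mode="bilinear", align_corners=False)
        f_low_new = f_low * att_map                            # filter the skip features
        fused = f_low_new + F.interpolate(self.up(f_high), size=(H, W),
                                          mode="bilinear", align_corners=False)
        return torch.cat([fused, f_low_new], dim=1)            # concat along channels
```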
Weighted loss function: in calculating the binary cross entropy loss function of the nucleus and background pixels, each pixel is multiplied by its corresponding pixel weight in the weight map. The weight map consists of a first sub weight map, a second sub weight map, a third sub weight map, a fourth sub weight map and five sub weight maps of the fourth sub weight map, and the main setting and the function of each sub weight map are as follows:
(1) The weights of the nucleus pixels are increased and the weights of the background pixels are decreased to cope with the imbalance of the pixel classes. In the first sub-weight map, the initial weights of the kernel pixel and the background pixel are 10/7 and 1, respectively.
(2) The nuclei pixels with low hematoxylin intensities are given high weights to better segment nuclei with uneven staining or coarse chromatin. A hematoxylin component map was obtained using color deconvolution and normalized to between 0 and 1, with smaller image gray values indicating higher hematoxylin intensities. The cell nucleus pixel value of the ground real label graph is 1, and the background pixel value is 0. And multiplying the hematoxylin component map with the ground real label map to obtain a secondary weight map II.
(3) Background pixels of high hematoxylin intensity are given high weight to reduce false positives. The cell nucleus pixel value of the complementary graph of the ground real label graph is 0, and the background image value is 1. And multiplying the result obtained by subtracting the hematoxylin component diagram from the diagram with the pixel value of 1 in the same size by the complementary diagram of the ground real label diagram to obtain a sub-weight diagram III.
(4) The more nuclear pixels that are closer to the nuclear boundary are given higher weight to reduce segmentation errors due to boundary blurring, as well as false negatives of small nuclei. Dividing the shortest checkerboard distance map of nuclear pixels to their boundaries by a constant
Figure SMS_60
As the power of e and multiplying by a constant alpha 1 And obtaining a sub-weight map IV by the ground real tag map.
(5) The background pixels closer to the nuclei are given a higher weight to separate the nuclei that are touching, overlapping or aggregated. Dividing the sum of the shortest checkerboard distance maps of background pixels to the nearest and next nearest nuclei by a constant
Figure SMS_61
As the power of e and multiplying by a constant alpha 2 And obtaining a sub-weight map five from the complementary map of the ground real tag map. In the present invention, alpha 1 =1 and->
Figure SMS_62
=/>
Figure SMS_63
2 =4. The five sub-weight maps are respectively set to powers of 1, 3, 4 and 1, and all the sub-weight maps are summed to obtain a final weight map.
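A sketch of how the five sub-weight maps could be assembled is shown below; α1 = 1 and α2 = 4 follow the text, while the σ1 = σ2 value, the stain-normalization convention, and the exponent of the fifth sub-map (the published text lists only four exponents) are placeholders:

```python
import numpy as np
from scipy import ndimage
from skimage.color import rgb2hed
from skimage.util import img_as_float

def build_weight_map(rgb_image, gt_mask, alpha1=1.0, alpha2=4.0, sigma1=5.0, sigma2=5.0):
    """Sketch of the five-part pixel weight map; sigma values are placeholders."""
    gt = (gt_mask > 0).astype(float)                   # nuclei = 1, background = 0
    bg = 1.0 - gt

    # (1) class-balance weights: 10/7 for nucleus pixels, 1 for background
    w1 = 10.0 / 7.0 * gt + 1.0 * bg

    # hematoxylin component via colour deconvolution, converted to a gray-value-like
    # map where smaller values mean stronger staining (rgb2hed uses the opposite sign)
    hema = rgb2hed(img_as_float(rgb_image))[..., 0]
    hema = (hema - hema.min()) / (hema.max() - hema.min() + 1e-8)
    hema = 1.0 - hema

    # (2) weakly stained nucleus pixels get larger weights
    w2 = hema * gt
    # (3) strongly stained background pixels get larger weights
    w3 = (1.0 - hema) * bg

    # (4) nucleus pixels near their boundary get larger weights
    dist_in = ndimage.distance_transform_cdt(gt, metric="chessboard")
    w4 = alpha1 * np.exp(-dist_in / sigma1) * gt

    # (5) background pixels between close nuclei get larger weights
    labels, n = ndimage.label(gt)
    dists = [ndimage.distance_transform_cdt(labels != i, metric="chessboard")
             for i in range(1, n + 1)]
    if n >= 2:
        stack = np.sort(np.stack(dists), axis=0)
        d12 = stack[0] + stack[1]                      # nearest + second-nearest nucleus
    elif n == 1:
        d12 = 2 * dists[0]
    else:
        d12 = np.zeros_like(gt)
    w5 = alpha2 * np.exp(-d12 / sigma2) * bg

    # per-map exponents follow the text; the fifth exponent is ambiguous there
    return w1 ** 1 + w2 ** 3 + w3 ** 4 + w4 ** 1 + w5
```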
The post-processing method is as follows. First is the improved marker-controlled watershed. The pixel value of a nucleus is assumed to be a local maximum of the image, and pixel values drop markedly between nuclei. If, on every path P from a local-maximum pixel M to another, larger local-maximum pixel N, the drop in pixel value is greater than a certain threshold, M is regarded as a nucleus core and is used as an initial marker for the watershed; in the present invention, this threshold is 96. At the same time, a threshold operation is applied: only pixels greater than a certain threshold are nucleus pixels, otherwise they are background pixels; in the present invention, this threshold is 0.5. Finally, all connected components in the nucleus mask are detected with an eight-connectivity algorithm, and connected components with an area smaller than a certain threshold are removed as isolated pixel blocks; in the present invention, this threshold is 10.
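The post-processing chain could be approximated with standard scikit-image operations as below; interpreting the drop criterion as h-maxima on a probability map rescaled to 0-255, as well as the specific function choices, are assumptions:

```python
import numpy as np
from skimage.segmentation import watershed
from skimage.morphology import h_maxima
from skimage.measure import label

def postprocess(prob_map, drop_thresh=96, fg_thresh=0.5, min_area=10):
    """Sketch of the fully automatic post-processing: improved marker-controlled
    watershed + thresholding + removal of isolated pixel blocks."""
    img = (prob_map * 255.0).astype(np.float32)

    # markers: maxima that stand at least `drop_thresh` above the surrounding
    # valleys on every path to a larger maximum (h-maxima criterion)
    markers = label(h_maxima(img, drop_thresh))

    # foreground mask from simple thresholding of the probability map
    mask = prob_map > fg_thresh

    # watershed on the inverted map, flooded from the markers, limited to the mask
    seg = watershed(-img, markers=markers, mask=mask)

    # remove isolated pixel blocks: 8-connected components smaller than min_area
    out = np.zeros_like(seg)
    comp = label(seg > 0, connectivity=2)             # connectivity=2 -> 8-connected in 2-D
    for region in np.unique(comp)[1:]:
        sel = comp == region
        if sel.sum() >= min_area:
            out[sel] = seg[sel]
    return out
```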
Referring to FIG. 5, a flow chart of a method for segmenting nuclei in a histopathological image of thymoma according to an embodiment of the present invention is shown.
In the implementation of the present embodiments, all multi-attention Transformer network models were trained and tested on a platform equipped with an Intel Core i7-12800HX CPU and an NVIDIA GeForce RTX 3080Ti GPU (16 GB). The initial learning rate is 1e-3, the batch size is 8, and the maximum number of training epochs is 100. The Adam optimizer is used with its momentum parameters β1 and β2. Learning-rate decay and early-stopping strategies are also applied: if the validation loss does not decrease for 5 epochs, the learning rate is halved to avoid gradient oscillation; if the validation loss does not reach a new low within 20 epochs, training is stopped to prevent overfitting. The input to the network is a 224 x 224 RGB image normalized to between 0 and 1. To improve the robustness of the model, a combination of horizontal flipping, vertical flipping and random rotation is used for data augmentation.
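A sketch of this training configuration in PyTorch is given below; the Adam momentum parameters are left at the library defaults because their values are not reproduced in the text:

```python
import torch

def make_optim_and_scheduler(model):
    """Adam with lr 1e-3, halving the learning rate after 5 epochs without
    improvement in validation loss (as described above)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=5)
    return optimizer, scheduler

class EarlyStopping:
    """Stop training after `patience` epochs without a new best validation loss."""
    def __init__(self, patience=20):
        self.patience, self.best, self.counter = patience, float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.counter = val_loss, 0
        else:
            self.counter += 1
        return self.counter >= self.patience      # True -> stop training
```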
The present example constructs a thymoma histopathological image dataset containing 100 hematoxylin-eosin stained histopathological images and 6,874 extensively annotated nuclei.
The embodiment of the invention provides weighted position-sensitive axial multi-head self-attention, which improves the segmentation performance of the model on a small dataset at an acceptable computational cost.
The embodiment of the invention provides skip-connection multi-head cross attention, which helps the decoder fuse information and further improves segmentation performance.
The embodiment of the invention provides a weighted loss function and a fully automatic post-processing method, which segment error-prone regions and separate touching or overlapping nuclei, enhancing the visual quality of the nucleus segmentation.
The embodiment of the invention also provides a computer device, and fig. 4 is a schematic structural diagram of the computer device provided by the embodiment of the invention; referring to fig. 4 of the drawings, the computer apparatus includes: input means 43, output means 44, memory 42 and processor 41; the memory 42 is configured to store one or more programs; when the one or more programs are executed by the one or more processors 41, the one or more processors 41 are caused to implement the method for segmenting nuclei in thymoma histopathological images as provided in the above embodiments; wherein the input device 43, the output device 44, the memory 42 and the processor 41 may be connected by a bus or otherwise, for example in fig. 4 by a bus connection.
The memory 42 is used as a readable storage medium of a computing device, and can be used for storing a software program and a computer executable program, and the program instructions corresponding to the segmentation method of the cell nucleus in the thymoma histopathological image according to the embodiment of the invention; the memory 42 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for functions; the storage data area may store data created according to the use of the device, etc.; in addition, memory 42 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device; in some examples, memory 42 may further comprise memory located remotely from processor 41, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input means 43 is operable to receive input numeric or character information and to generate key signal inputs relating to user settings and function control of the device; the output device 44 may include a display device such as a display screen.
The processor 41 executes various functional applications of the device and data processing by running software programs, instructions and modules stored in the memory 42, i.e. the above-described method for segmenting nuclei in histopathological images of thymoma is implemented.
The computer device provided by the above embodiment can be used for executing the segmentation method of the cell nucleus in the thymoma histopathological image, and has corresponding functions and beneficial effects.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method for segmenting nuclei in a thymoma histopathological image provided by the above embodiments. The storage medium may be any of various types of memory devices or storage devices, including: installation media such as CD-ROM, floppy disk or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory or magnetic media (e.g., a hard disk or optical storage); registers or other similar types of memory components, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the first computer system in which the program is executed, or in a different, second computer system connected to the first computer system through a network (such as the Internet); the second computer system may provide program instructions to the first computer for execution. The storage medium may also include two or more storage media residing in different locations (e.g., in different computer systems connected by a network). The storage medium may store program instructions (e.g., embodied as a computer program) executable by one or more processors.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the method for segmenting nuclei in the histopathological image of thymoma described in the above embodiments, and may also perform the related operations in the method for segmenting nuclei in the histopathological image of thymoma provided in any embodiment of the present invention.
While particular embodiments of the present invention have been illustrated and described, it will be appreciated that various other changes and modifications can be made without departing from the spirit and scope of the invention. It is therefore intended to cover in the appended claims all such changes and modifications that are within the scope of this invention.

Claims (10)

1. A method for segmenting nuclei in a thymoma histopathological image, comprising:
introducing manually tuned hyperparameters as weights of the relative position encodings, building on the Transformer's position-sensitive axial multi-head self-attention and relative position encoding;
embedding Transformer layers based on the weighted position-sensitive axial multi-head self-attention mechanism into the deep levels of a UNet-style encoder and decoder, replacing the original deep convolution layers of UNet while the shallow levels remain convolution layers, thereby constructing a multi-attention Transformer network model;
introducing cross attention in the skip connections of the multi-attention Transformer network model, filtering the encoder features with the generated attention maps and reducing the differences between encoder and decoder features;
training the multi-attention Transformer network model with a weighted loss function, introducing a predefined weight map into the loss function to increase the weight of pixels in regions prone to segmentation errors and to encourage the model to learn in those regions, and obtaining a nucleus probability map;
post-processing the nucleus probability map with a fully automatic post-processing method that includes an improved marker-controlled watershed, thresholding and removal of isolated pixel blocks, thereby further separating touching or overlapping nuclei.
2. The method for segmenting nuclei in a thymoma histopathological image according to claim 1, wherein the multi-attention Transformer network model comprises an encoder and a decoder;
the encoder consists of convolution layers, Transformer layers and downsampling, and the decoder consists of convolution layers, Transformer layers and upsampling; each convolution layer contains two identical residual convolution modules, and each Transformer layer contains two identical residual Transformer modules.
3. The method for segmenting nuclei in a thymoma histopathological image according to claim 1, wherein the weighted position-sensitive axial multi-head self-attention mechanism comprises weighted position-sensitive axial multi-head self-attention along the width axis and along the height axis;
the width-axis weighted position-sensitive axial multi-head self-attention method is as follows: an image is input, the image height H is treated as H samples and multiplied by the batch size B; the dimension of the query and key is reduced by a 1 x 1 convolution to half of the channel number C; if the width W of the key and value is greater than the axial constraint m = 56, it is downsampled to w = m; the query, key and value are split into multiple heads, each head having dimension d_q (for the query and key) or d_v (for the value), so that n·d_q equals the reduced query/key dimension and n·d_v equals the value dimension; the height-axis weighted position-sensitive axial multi-head self-attention method is the same as the width-axis method.
4. The method for segmenting nuclei in a thymoma histopathological image according to claim 3, wherein the position encoding method of the width-axis weighted position-sensitive axial multi-head self-attention comprises:
using relative position encoding along the width axis; the relative position encodings corresponding to the query, key and value are r^q, r^k and r^v; r^q, r^k and r^v are multiplied by the weights w_q, w_k and w_v, respectively.
5. The method for segmenting nuclei in a thymoma histopathological image according to claim 4, wherein the normalization processing of the weighted position-sensitive axial multi-head self-attention mechanism comprises: multiplying by a scalar 1/sqrt(d_q) before the Softmax function to eliminate the influence of the dimension d_q, and concatenating the processing results of the multiple heads to generate an output of the same size as the input.
6. The method for segmenting nuclei in a thymoma histopathological image according to claim 1, wherein the method of introducing cross attention in the skip connections of the multi-attention Transformer network model comprises:
inputting a high-resolution low-level feature map from the encoder or skip connection and a low-resolution high-level feature map from the decoder; in the cross attention, the high-level feature map F_h is projected as the query q and the key k by 1 x 1 convolutions, and the low-level feature map F_l is projected as the value v; the multi-head cross attention in the skip connection uses two-dimensional attention, the sizes of the downsampled feature map are h and w, the key k and the value v are downsampled, and the computational complexity is reduced to O(HWhw); the query, key and value are split into multiple heads, each head having dimension d, so that n·d equals the channel dimension; the attention scores are calculated and normalized to between 0 and 1 by a Sigmoid function, and the attention map Att is generated by upsampling to enlarge the size;
the attention map Att is multiplied element by element with the original low-level feature map F_l passed through the skip connection to obtain a new low-level feature map F_l' as the output; F_l' is added element by element to the upsampled high-level feature map F_h to generate a new feature map, which is then concatenated with F_l' along the channel dimension.
7. The method for segmenting nuclei in a thymoma histopathological image according to claim 1, wherein training the multi-attention Transformer network model with the weighted loss function comprises:
when computing the binary cross-entropy loss of nucleus and background pixels, multiplying each pixel by its corresponding weight in a weight map; the weight map is composed of five sub-weight maps (sub-weight maps one to five), whose main construction is as follows:
(1) in sub-weight map one, the initial weights of nucleus pixels and background pixels are 10/7 and 1, respectively;
(2) a hematoxylin component map is obtained by color deconvolution and normalized to between 0 and 1, with smaller gray values indicating higher hematoxylin intensity; in the ground-truth label map, nucleus pixels have value 1 and background pixels have value 0; the hematoxylin component map is multiplied by the ground-truth label map to obtain sub-weight map two;
(3) background pixels with high hematoxylin intensity are given high weight: in the complement of the ground-truth label map, nucleus pixels are set to 0 and background pixels to 1; the result of subtracting the hematoxylin component map from an all-ones map of the same size is multiplied by the complement of the ground-truth label map to obtain sub-weight map three;
(4) the shortest chessboard distance from each nucleus pixel to its boundary is divided by a constant σ1 and its negative is used as the exponent of e; the result is multiplied by a constant α1 and by the ground-truth label map to obtain sub-weight map four;
(5) background pixels closer to nuclei are given higher weight: the sum of the shortest chessboard distances from each background pixel to the nearest and second-nearest nuclei is divided by a constant σ2 and its negative is used as the exponent of e; the result is multiplied by a constant α2 and by the complement of the ground-truth label map to obtain sub-weight map five.
8. The method for segmenting nuclei in a thymoma histopathological image according to claim 1, wherein the fully automatic post-processing method comprises:
assuming the pixel value of a nucleus to be a local maximum of the image, with pixel values dropping markedly between nuclei; if, on every path P from a local-maximum pixel M to another, larger local-maximum pixel N, the drop in pixel value is greater than a certain threshold, regarding M as a nucleus core and using M as an initial marker for the watershed; at the same time, applying a threshold operation in which only pixels greater than a certain threshold are nucleus pixels and the rest are background pixels; and detecting all connected components in the nucleus mask with an eight-connectivity algorithm and removing connected components whose area is smaller than a certain threshold as isolated pixel blocks.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, carries out the steps of the method for segmentation of nuclei in thymoma histopathological images according to any one of claims 1-8.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, carries out the steps of the method for segmentation of nuclei in thymoma histopathological images according to any one of claims 1-8.
CN202310612940.5A 2023-05-29 2023-05-29 Segmentation method of cell nuclei in thymoma histopathological image Active CN116342600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310612940.5A CN116342600B (en) 2023-05-29 2023-05-29 Segmentation method of cell nuclei in thymoma histopathological image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310612940.5A CN116342600B (en) 2023-05-29 2023-05-29 Segmentation method of cell nuclei in thymoma histopathological image

Publications (2)

Publication Number Publication Date
CN116342600A true CN116342600A (en) 2023-06-27
CN116342600B CN116342600B (en) 2023-08-18

Family

ID=86889803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310612940.5A Active CN116342600B (en) 2023-05-29 2023-05-29 Segmentation method of cell nuclei in thymoma histopathological image

Country Status (1)

Country Link
CN (1) CN116342600B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116486090A (en) * 2023-06-25 2023-07-25 华南理工大学 Lung cancer spine metastasis image processing method, device, equipment and storage medium
CN116664590A (en) * 2023-08-02 2023-08-29 中日友好医院(中日友好临床医学研究所) Automatic segmentation method and device based on dynamic contrast enhancement magnetic resonance image
CN117710969A (en) * 2024-02-05 2024-03-15 安徽大学 Cell nucleus segmentation and classification method based on deep neural network
CN117710969B (en) * 2024-02-05 2024-06-04 安徽大学 Cell nucleus segmentation and classification method based on deep neural network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272864A1 (en) * 2017-11-06 2020-08-27 University Health Network Platform, device and process for annotation and classification of tissue specimens using convolutional neural network
CN114612664A (en) * 2022-03-14 2022-06-10 哈尔滨理工大学 Cell nucleus segmentation method based on bilateral segmentation network
CN115393293A (en) * 2022-08-12 2022-11-25 西南大学 Electron microscope red blood cell segmentation and positioning method based on UNet network and watershed algorithm
CN115457057A (en) * 2022-09-26 2022-12-09 杭州师范大学 Multi-scale feature fusion gland segmentation method adopting deep supervision strategy
CN115578384A (en) * 2022-11-30 2023-01-06 长春工业大学 UNet brain tumor image segmentation algorithm based on global and local feature fusion
CN115809998A (en) * 2022-12-07 2023-03-17 浙江中医药大学 Glioma MRI data segmentation method based on E2C-Transformer network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272864A1 (en) * 2017-11-06 2020-08-27 University Health Network Platform, device and process for annotation and classification of tissue specimens using convolutional neural network
CN114612664A (en) * 2022-03-14 2022-06-10 哈尔滨理工大学 Cell nucleus segmentation method based on bilateral segmentation network
CN115393293A (en) * 2022-08-12 2022-11-25 西南大学 Electron microscope red blood cell segmentation and positioning method based on UNet network and watershed algorithm
CN115457057A (en) * 2022-09-26 2022-12-09 杭州师范大学 Multi-scale feature fusion gland segmentation method adopting deep supervision strategy
CN115578384A (en) * 2022-11-30 2023-01-06 长春工业大学 UNet brain tumor image segmentation algorithm based on global and local feature fusion
CN115809998A (en) * 2022-12-07 2023-03-17 浙江中医药大学 Glioma MRI data segmentation method based on E2C-Transformer network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张威 (Zhang Wei): "Research on histological classification of thymoma based on cell morphological features" (基于细胞形态学特征的胸腺瘤组织学分类研究), 《中国优秀硕士论文辑》 (China Excellent Master's Theses Collection) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116486090A (en) * 2023-06-25 2023-07-25 华南理工大学 Lung cancer spine metastasis image processing method, device, equipment and storage medium
CN116486090B (en) * 2023-06-25 2023-09-05 华南理工大学 Lung cancer spine metastasis image processing method, device, equipment and storage medium
CN116664590A (en) * 2023-08-02 2023-08-29 中日友好医院(中日友好临床医学研究所) Automatic segmentation method and device based on dynamic contrast enhancement magnetic resonance image
CN116664590B (en) * 2023-08-02 2023-10-13 中日友好医院(中日友好临床医学研究所) Automatic segmentation method and device based on dynamic contrast enhancement magnetic resonance image
CN117710969A (en) * 2024-02-05 2024-03-15 安徽大学 Cell nucleus segmentation and classification method based on deep neural network
CN117710969B (en) * 2024-02-05 2024-06-04 安徽大学 Cell nucleus segmentation and classification method based on deep neural network

Also Published As

Publication number Publication date
CN116342600B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
Cortinhal et al. Salsanext: Fast, uncertainty-aware semantic segmentation of lidar point clouds
Kim et al. Deformable kernel networks for joint image filtering
CN116342600B (en) Segmentation method of cell nuclei in thymoma histopathological image
Baghersalimi et al. DermoNet: densely linked convolutional neural network for efficient skin lesion segmentation
Tang et al. Joint implicit image function for guided depth super-resolution
CN111612008A (en) Image segmentation method based on convolution network
Jeon et al. Ring difference filter for fast and noise robust depth from focus
CN106651774A (en) License plate super-resolution model reconstruction method and device
Spizhevoi et al. OpenCV 3 Computer Vision with Python Cookbook: Leverage the power of OpenCV 3 and Python to build computer vision applications
Wang et al. Residual feature pyramid networks for salient object detection
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN114494701A (en) Semantic segmentation method and device based on graph structure neural network
Pham et al. Toward Deep-Learning-Based Methods in Image Forgery Detection: A Survey
Nasrollahi et al. Deep artifact-free residual network for single-image super-resolution
Liu et al. Dunhuang murals contour generation network based on convolution and self-attention fusion
CN117437423A (en) Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement
CN116563303A (en) Scene generalizable interactive radiation field segmentation method
Shi et al. Focus for free in density-based counting
JP7246104B2 (en) License plate identification method based on text line identification
CN115222750A (en) Remote sensing image segmentation method and system based on multi-scale fusion attention
Liu et al. Boosting semantic segmentation via feature enhancement
Wang et al. RT-Deblur: Real-time image deblurring for object detection
CN112651926A (en) Method and device for detecting cracks based on recursive attention mechanism
Zhang et al. Multi-scale progressive blind face deblurring
Zhang et al. Attentive feature integration network for detecting salient objects in images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant