CN112580748B

CN112580748B - Method for counting classified cells of stain image

Info

Publication number: CN112580748B
Application number: CN202011608423.3A
Authority: CN
Inventors: 仲佳慧; 曹永盛; 张于凤
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2022-10-14
Anticipated expiration: 2040-12-30
Also published as: CN112580748A

Abstract

The invention relates to a method for counting classified cells of a stain image, which comprises the steps of firstly manufacturing a precisely classified cell data set, then training a model by adopting a target detection method in deep learning so as to identify a first cell and a second cell in the current stain image, and simultaneously analyzing a detection result and optimizing the model so as to obtain more accurate number of the first cell. Comparing the detection result with the manual annotation of a pathologist, the F1 score of the first A-type cells in the current staining image of the first-type cells can reach about 95%, and the F1 score of the first B-type cells is about 91%.

Description

A Method for Differential Cell Counting of Stained Images

技术领域technical field

本发明涉及计算机视觉中的深度学习技术以及目标检测技术。The present invention relates to deep learning technology and target detection technology in computer vision.

背景技术Background technique

目标检测作为图像理解中的重要一环，其任务是找出图像中所有感兴趣的目标(物体)，确定它们的位置和大小，是机器视觉领域的核心问题之一。卷积神经网络CNN是深度学习的基本工具之一,通常用于图像分析。VGG，GoogleLenet，resnet等卷积神经网络，在目标检测和语义分割方面都表现出卓越的性能。与图像分类不同，目标检测需要在图像内定位对象。基于对象检测的深度学习模型可以分为两个阶段。一个阶段是生成区域建议，另一阶段是对区域进行分类并为每个对象提供置信度。一些相关的方法包含Fast R-CNN，以及改进后的Faster R-CNN，SPP，R-FCN和Mask R-CNN。深度学习在各种任务上取得了空前的表现，尤其是在生物医学领域。此外，基于深度学习的端到端检测方法，例如SSD，YOLO和RON等。可以直接预测对象的大小、位置和标签,没有任何中间步骤，相比于两个阶段的Faster-RCNN提高了检测的速度。尽管CNN具有吸引人的品质，但还是有必要将训练集做的足够大。As an important part of image understanding, object detection is one of the core problems in the field of machine vision. Convolutional Neural Network CNN is one of the basic tools of deep learning and is usually used for image analysis. Convolutional neural networks such as VGG, GoogleLenet, resnet, etc., have shown excellent performance in object detection and semantic segmentation. Unlike image classification, object detection requires locating objects within an image. Object detection based deep learning models can be divided into two stages. One stage is to generate region proposals, and the other stage is to classify regions and provide confidence for each object. Some related methods include Fast R-CNN, and the improved Faster R-CNN, SPP, R-FCN and Mask R-CNN. Deep learning has achieved unprecedented performance on a variety of tasks, especially in the biomedical field. Furthermore, end-to-end detection methods based on deep learning, such as SSD, YOLO and RON, etc. The size, location and label of objects can be directly predicted without any intermediate steps, which improves the speed of detection compared to the two-stage Faster-RCNN. Despite the attractive qualities of CNNs, it is necessary to make the training set large enough.

Ki-67增殖指数是癌细胞增殖的重要生物标志物，与肿瘤的分化，侵袭，转移和预后密切相关，快速获取准确的Ki-67指数对于临床研究具有很大的意义。Ki-67指数是阳性癌细胞个数占所有癌细胞个数的比例，但由于Ki-67染色图像中各类细胞的细胞核的形态、颜色极其相似，使得一些传统方法会将非肿瘤细胞视为肿瘤细胞，从而导致大量计数错误。南京航空航天大学Ruihan Zhang等人利用GAN网络(生成对抗网络)，一种成功的生成模型训练新方法，通过生成更多的人工样本进行数据增强，结合CNN和SSD来提高Ki67准确度。Dayong Wang等人将整个细分乳腺癌图像分为patch，并根据patch进行分类。Saha等人建立了自动评分系统Ki-67使用带有期望最大化的Gamma混合模型GMM进行种子点检测、patch的选择和深度学习，使得最终的精度达到了93％,召回率达到了88％。Ki-67的实际分析表明，有限的标记数据集可能导致CNN不足，进而导致训练集过拟合并影响准确度。Ki-67 proliferation index is an important biomarker of cancer cell proliferation, which is closely related to tumor differentiation, invasion, metastasis and prognosis. Quickly obtaining an accurate Ki-67 index is of great significance for clinical research. The Ki-67 index is the ratio of the number of positive cancer cells to the total number of cancer cells. However, due to the extremely similar morphology and color of the nuclei of various types of cells in the Ki-67 staining images, some traditional methods will regard non-tumor cells as non-tumor cells. tumor cells, resulting in a large number of count errors. Ruihan Zhang et al. of Nanjing University of Aeronautics and Astronautics used GAN network (Generation Adversarial Network), a successful new method for generative model training, by generating more artificial samples for data enhancement, combined with CNN and SSD to improve Ki67 accuracy. Dayong Wang et al. divided the entire segmented breast cancer images into patches and classified them according to the patches. Saha et al. established an automatic scoring system Ki-67 using a Gamma mixture model GMM with expectation maximization for seed point detection, patch selection and deep learning, resulting in a final precision of 93% and a recall rate of 88%. Practical analysis of Ki-67 shows that limited labeled datasets may lead to insufficient CNN, which in turn leads to overfitting of the training set and affects accuracy.

许多计算机化方法都依赖颜色特征来检测和分类细胞来进行Ki-67评分。Al-Lahham等人首先将K-means聚类应用于变换后的色彩空间，随后使用数学形态学和连接成分分析对Ki-67染色的组织学图像上的细胞进行分段和计数。在中使用图像分析系统来量化肿瘤细胞，其中需要适当选择颜色强度阈值。Markiewicz使用分水岭算法来分离接触细胞，并使用支持向量机(SVM)分类器将免疫阳性细胞与免疫阴性细胞区分开。然而，这些方法不能同时精确区分肿瘤和非肿瘤细胞并且将接触细胞分开。Ki-67图像属于免疫组化染色图像(IHC)，近年来IHC染色图像自动细胞核分割的研究近来引起了人们的关注。大多数相关研究集中在基于阈值，边缘检测或基于机器学习的像素分类的图像分割方法上。其中像素强度阈值化方法将利用红色，绿色和蓝色(RGB)颜色空间中的像素强度，并根据棕色和蓝色之间的差异应用强度转换和全局阈值化。在监督和无监督的学习方法中，将单个像素作为研究对象，而同一类别中的像素共同构成了组织的每个组成部分。研究人员需要在进行监督分类之前，选择包括所有细胞类型在内的每个组织成分的代表性区域作为训练样本，其性能很大程度上取决于预先定义的训练样本的质量和全面性。由于深度学习在医学的应用还不是很广泛，对于Ki-67染色图像来说并没有可以直接使用的数据集。如何有效的提高Ki-67染色图像的检测指标，仍是现在研究的重点方向。Many computerized methods rely on color features to detect and classify cells for Ki-67 scoring. Al-Lahham et al. first applied K-means clustering to the transformed color space and subsequently used mathematical morphology and connected component analysis to segment and count cells on Ki-67-stained histological images. Tumor cells are quantified using an image analysis system in which appropriate selection of color intensity thresholds is required. Markiewicz used a watershed algorithm to separate contact cells and a support vector machine (SVM) classifier to distinguish immunopositive cells from immunonegative cells. However, these methods cannot simultaneously accurately distinguish tumor and non-tumor cells and separate contact cells. Ki-67 images belong to immunohistochemical staining images (IHC). In recent years, the research on automatic nucleus segmentation of IHC staining images has recently attracted people's attention. Most of the related research focuses on image segmentation methods based on thresholding, edge detection or pixel classification based on machine learning. The pixel intensity thresholding method will utilize pixel intensities in the red, green and blue (RGB) color space and apply intensity transformation and global thresholding based on the difference between brown and blue. In both supervised and unsupervised learning methods, individual pixels are studied, while pixels in the same class collectively make up each component of the organization. Researchers need to select representative regions of each tissue component, including all cell types, as training samples before performing supervised classification, the performance of which largely depends on the quality and comprehensiveness of the pre-defined training samples. Since the application of deep learning in medicine is not very extensive, there is no dataset that can be directly used for Ki-67 stained images. How to effectively improve the detection index of Ki-67 stained images is still the focus of current research.

发明内容SUMMARY OF THE INVENTION

本发明要解决的技术问题是，基于深度学习和全监督的方法，提出一种检测色图像的分类细胞个数的方法。The technical problem to be solved by the present invention is to propose a method for detecting the number of classified cells of a color image based on the deep learning and full supervision method.

本发明为解决上述技术问题所采用的技术方案是，一种对染色图像的分类细胞计数的方法，包括以下步骤：The technical solution adopted by the present invention to solve the above-mentioned technical problems is a method for classifying cells of stained images, comprising the following steps:

1)创建训练集：1) Create a training set:

1-1)采集标本图像，人工标记部分标签图像，对已经进行标记的扫描全片的区域上切分出小图像patch；标记使用的标签分为四类：第一A类细胞、第一B类细胞、第二A类细胞和第二B类细胞；其中，第一A类细胞和第一B类属于第一类细胞，第二A类细胞和第二B类细胞属于第二类细胞；1-1) Collect the specimen image, manually mark part of the label image, and cut out small image patches on the area of the scanned whole film that has been marked; the labels used for labeling are divided into four categories: the first type A cells, the first type B cells Class cells, second class A cells and second class B cells; wherein the first class A cells and the first class B belong to the first class of cells, and the second class A cells and the second class B cells belong to the second class of cells;

1-2)筛除背景占比高或细胞模糊的patch，将完成筛选的patch以及其对应标签组成预训练集；1-2) Screen out patches with a high background ratio or blurred cells, and form a pre-training set of the filtered patches and their corresponding labels;

1-3)将预训练集输入Libra-RCNN网络模型完成预训练；1-3) Input the pre-training set into the Libra-RCNN network model to complete the pre-training;

1-4)将一部分未标记的标本图像输入预训练得到的预训练网络模型中进行测试，得到预训练网络模型输出的测试结果；1-4) input a part of unlabeled specimen images into the pre-training network model obtained by pre-training for testing, and obtain the test results output by the pre-training network model;

1-5)对将预训练网络模型输出的测试结果进行人工修正，再筛除背景占比高或细胞模糊的patch，将经过人工修正和筛选的测试结果增加至作预训练集中，判断是否满足预训练结束条件，如是，将当前的预训练集作为训练集，之后进入步骤2)，否则，返回步骤1-3)；1-5) Manually correct the test results output by the pre-training network model, and then screen out patches with a high proportion of background or blurred cells, and add the manually corrected and screened test results to the pre-training set to determine whether it meets the requirements. Pre-training end condition, if so, take the current pre-training set as the training set, and then enter step 2), otherwise, return to step 1-3);

2)将最终得到的完善的训练集输入Libra-RCNN网络模型中进行训练，得到训练模型；2) Input the final perfect training set into the Libra-RCNN network model for training to obtain a training model;

3)检测步骤：3) Detection steps:

3-1)在输入的待检测的扫描全片上选取感兴趣区域ROI；可以依据机器学习判断出的第一类细胞的数量，再选取第一类细胞较多的区域作为ROI或者由医生自由选取ROI；3-1) Select the region of interest ROI on the input scan full slice to be detected; the number of cells of the first type determined by machine learning can be used, and then the region with more cells of the first type can be selected as the ROI or freely selected by the doctor ROI;

3-2)在选取好的ROI内切取切片patch；3-2) Cut the slice patch in the selected ROI;

3-3)对所有patch进行检测分类，得到所有patch的最终检测结果；每一个patch的检测分类过程为：3-3) Detect and classify all patches to obtain the final detection results of all patches; the detection and classification process of each patch is as follows:

3-3-1)利用训练好的模型对每一个patch进行检测，并对patch在预设的重叠区域overlap内的重复检测框进行去重；3-3-1) Use the trained model to detect each patch, and deduplicate the duplicate detection frame of the patch in the preset overlapping area overlap;

3-3-2)遍历当前patch上所有检测框，并进行不同类间检测框去重，遍历完毕得到该patch的最终检测结果；其中，不同类间检测框去重的具体方式为：在对两两不同类别检测框计算交并比IOU的大小，当出现IOU大于预设阈值的情况，则删除两两不同类别检测框中置信度较低的那一个，保留信度较高的检测框；3-3-2) Traverse all detection frames on the current patch, and deduplicate detection frames between different classes, and obtain the final detection result of the patch after the traversal; wherein, the specific method of deduplication between detection frames between different classes is: The size of the intersection ratio of the IOU is calculated for the detection boxes of two different categories. When the IOU is larger than the preset threshold, the one with the lower confidence in the detection boxes of the different categories is deleted, and the detection frame with the higher reliability is retained;

4)将所有patch检测结果坐标映射到整个扫描全片上，根据统计得到的第一A类细胞和第一B类细胞个数计算出第一类细胞个数。本发明通过进行采集标本、整理筛选、切分patch、再次筛选等步骤组成最终的数据集。并且数据集的细胞种类一共分为四类，其中第一类细胞有两类，第二类细胞有两类，第二类细胞的检测不仅仅可以使得第一类细胞的判别更加精准，而且可用于拓展其他医学指标分析。Libra-RCNN网络是Faster-RCNN网络的改进，大大提高了检测性能。未解决检测结果出现了同一细胞有不同类别的检测框的问题特别提出了不同类间检测框去重的方法。4) The coordinates of all patch detection results are mapped to the entire scanning film, and the number of the first type of cells is calculated according to the number of the first type A cells and the first type B cells obtained by statistics. In the present invention, the final data set is formed by the steps of collecting specimens, sorting and screening, dividing patches, and re-screening. And the cell types of the dataset are divided into four categories, of which there are two types of cells of the first type and two types of cells of the second type. The detection of the second type of cells can not only make the discrimination of the first type of cells more accurate, but also can be used To expand the analysis of other medical indicators. The Libra-RCNN network is an improvement of the Faster-RCNN network, which greatly improves the detection performance. The problem that the same cell has different categories of detection boxes has not been solved in the detection results. In particular, a method of deduplication between detection boxes between different categories is proposed.

本发明的有益效果是，检测与分类准确，为染色图像分类细胞的准确计数提供了基础，准确地提供分类细胞个数，能为临床医学提供更好的辅助。The beneficial effect of the invention is that detection and classification are accurate, which provides a basis for the accurate counting of the classified cells in the stained image, and the number of classified cells is accurately provided, which can provide better assistance for clinical medicine.

附图说明Description of drawings

图1为四类细胞的典型形态；Figure 1 shows the typical morphology of four types of cells;

图2检测结果在全片上的可视化；Figure 2. Visualization of the detection results on the whole film;

图3对比试验数据。Figure 3 compares the experimental data.

具体实施方式Detailed ways

下面根据附图和实例对本发明进行进一步详细说明。The present invention will be described in further detail below according to the accompanying drawings and examples.

实施例对本发明分类细胞的计数进行应用。四类标签中第一A类细胞对应阳性癌细胞，第一B类细胞对应阴性癌细胞，第二A类细胞对应淋巴细胞，第二B类细胞对应间质细胞。The examples apply the counting of sorted cells of the present invention. Among the four types of labels, the first type A cells correspond to positive cancer cells, the first type B cells correspond to negative cancer cells, the second type A cells correspond to lymphocytes, and the second type B cells correspond to stromal cells.

本发明以研究乳腺癌中的Ki-67染色图像为主，Ki-67染色图像的细胞大致分为褐色和蓝色两种颜色，形态不一。其中阳性癌细胞一般为棕褐色，形态呈圆形，体积较大；阴性癌细胞一般为蓝色，呈圆形分布，体积与阳性癌细胞；而正常细胞(间质细胞，淋巴细胞等)，多数为蓝色，少部分为褐色，形态各异。我们将细胞一共分为四类，阳性癌细胞(positivetumor cell)，阴性癌细胞(negative tumor cell)，淋巴细胞(lymphocyte cell)和间质细胞(stromal cell)。四类细胞的典型形态如图1所示，典型的淋巴细胞和间质细胞大多为蓝色，形态与癌细胞较易区分开，但一些特殊的淋巴和间质细胞形态与癌细胞很相似，易与癌细胞(尤其是阴性癌细胞)发生混淆。The invention mainly studies the Ki-67 staining images in breast cancer, and the cells in the Ki-67 staining images are roughly divided into two colors, brown and blue, with different shapes. Among them, positive cancer cells are generally brown, round in shape, and larger in size; negative cancer cells are generally blue, with a circular distribution, and the volume is the same as that of positive cancer cells; while normal cells (stromal cells, lymphocytes, etc.), Most are blue, a few are brown, and the shapes are different. We divided the cells into four categories, positive tumor cells, negative tumor cells, lymphocytes and stromal cells. The typical morphology of the four types of cells is shown in Figure 1. The typical lymphocytes and interstitial cells are mostly blue, and the morphology is easy to distinguish from cancer cells, but some special lymphoid and interstitial cells are very similar in shape to cancer cells. It is easy to be confused with cancer cells (especially negative cancer cells).

全监督目标检测的方法，例如R-CNN，Fast R-CNN和Faster-RCNN，是基于two-stage的目标检测方法：首先提取感兴趣区域ROI，然后对其进行分类。随后，one-stage目标检测方法也应运而生，例如SSD，YOLOv2和RetinaNet。相比于two-stage方法，这些方法的速度会更快，但精度比不上two-stage方法。本发明更注重精度的大小，所以选取two-stage网络Libra-rcnn。Libra-RCNN整体网络架构由两个主要模块组成：(1)区域提议网络(RPN)，它返回图像中ROI；(2)检测网络，它在执行边界框回归的同时对区域内的目标进行分类。RPN的锚点具有三个比例和三个长宽比。由于检测目标是形状不规则的细胞，因此有必要设置多个纵横比。根据细胞的体型本发明使用三个长宽比，即1：1、1：2和2：1。同时ResNeXt模型作为基本的卷积神经网络，以便从输入图像中提取更完整有力的特征图。Fully supervised object detection methods, such as R-CNN, Fast R-CNN and Faster-RCNN, are two-stage based object detection methods: first extract the region of interest ROI, and then classify it. Subsequently, one-stage object detection methods also emerged, such as SSD, YOLOv2 and RetinaNet. Compared to the two-stage method, these methods will be faster, but not as accurate as the two-stage method. The present invention pays more attention to the size of the precision, so the two-stage network Libra-rcnn is selected. The overall network architecture of Libra-RCNN consists of two main modules: (1) Region Proposal Network (RPN), which returns ROIs in images; (2) Detection Network, which classifies objects within regions while performing bounding box regression . The anchors of RPN have three scales and three aspect ratios. Since the detection target is irregularly shaped cells, it is necessary to set multiple aspect ratios. The present invention uses three aspect ratios, namely 1:1, 1:2 and 2:1, depending on the size of the cells. At the same time, the ResNeXt model acts as a basic convolutional neural network to extract more complete and powerful feature maps from the input image.

全监督网络的数据集格外重要，所以对于数据的处理也要更加精准。首先我们挑选出扫描清晰，阳性癌细胞占比不同的Ki-67全片。在病理医生标记的区域中切分patch(小图像),大小为1024*1024，并根据自己的需求设置一定的overlap(交叠部分)。为了减少病理学家的标注工作量，我们首先让医生标注三四张wsi(全片)，在标注区域中切分patch,将背景占比高以及细胞较模糊的patch筛除掉，整理好所有patch进行初步的训练得到训练模型，随后将训练模型去测试更多的wsi，将测试结果整理好返回给医生进行修改，医生将错判，漏判的细胞进行标注，修正好加入训练集对模型进行fine tune,如此反复迭代。The data set of the fully supervised network is particularly important, so the processing of the data must be more accurate. First, we selected the whole Ki-67 films with clear scans and different proportions of positive cancer cells. Segment the patch (small image) in the area marked by the pathologist, the size is 1024*1024, and set a certain overlap (overlapping part) according to your own needs. In order to reduce the labeling workload of pathologists, we first asked doctors to label three or four wsi (full films), segmented patches in the labeling area, screened out the patches with high background proportion and blurred cells, and sorted out all the patches. The patch performs preliminary training to obtain the training model, and then the training model is used to test more wsi, and the test results are sorted and returned to the doctor for modification. Perform fine tune, and so on.

有限的训练集数据会让模型过度拟合，造成准确率的降低。为了增加数据的多样性，我们利用一些其他染色方式的图片，通过风格转换的方式生成Ki-67染色的图像，从而实现增强数据的目的。为了实现这一目标，我们采用CycleGAN，该网络可以实现不成对的图像到图像转换，通过不成对的示例学习两个图像域X和Y之间的映射功能。而映射G：X→Y和一个逆映射F：Y→X是使用CNN共同学习的。我们训练CycleGAN学习源域Xs和目标域Xt之间的映射功能，将其他类型的染色图像转换为Ki-67图像。通过CycleGAN来生成更多的数据样本，从而达到数据增强的目的。The limited training set data will make the model overfit and reduce the accuracy. In order to increase the diversity of the data, we use some other dyed pictures to generate Ki-67 dyed images through style transfer, so as to achieve the purpose of enhancing the data. To achieve this, we employ CycleGAN, a network that enables unpaired image-to-image translation, learning a mapping function between two image domains X and Y from unpaired examples. Whereas the mapping G:X→Y and an inverse mapping F:Y→X are jointly learned using CNN. We train CycleGAN to learn a mapping function between source domain Xs and target domain Xt, converting other types of stained images to Ki-67 images. More data samples are generated through CycleGAN to achieve the purpose of data enhancement.

之后开始训练模型，随着数据多样性的增加进行了一系列的对比实验，选出最优模型进行检测，对检测结果进行可视化分析，分析错检、漏检细胞的详细情况，其中主要存在的问题有：1.重叠细胞漏检；2.同一个细胞存在两个检测框；3.检测细胞类别错误。实施例将nms改为soft-nms缓解了重叠细胞的漏检情况，而针对第二个问题利用交并比IOU对检测结果进行处理，将同一个细胞的两个检测框的置信度进行比较，保留置信度较大的检测框。经过优化后的检测结果提升了三四个百分点。最后一个问题包含了不同类别之间的错检情况，大致分为两种情况：1)阴性癌细胞与正常细胞之间的错检；2)阳性癌细胞与淋巴细胞的错检；其中第一种情况为主要错误情况，第二种情况少数存在。这两种错检情况随着数据集的完善都会有所改善，但缓和幅度较小，也是在本发明的基础上可以继续提高性能的改善点。After that, the training model was started. With the increase of data diversity, a series of comparative experiments were carried out. The optimal model was selected for detection, and the detection results were visually analyzed to analyze the details of misdetected and missed cells. The problems are: 1. Missing detection of overlapping cells; 2. There are two detection boxes for the same cell; 3. The wrong cell type is detected. In the embodiment, changing nms to soft-nms alleviates the missed detection of overlapping cells, and for the second problem, the detection result is processed by the intersection ratio IOU, and the confidence levels of the two detection frames of the same cell are compared, Retain detection boxes with high confidence. The optimized detection results have improved by three or four percentage points. The last question covers the misdetection between different categories, which can be roughly divided into two cases: 1) Misdetection between negative cancer cells and normal cells; 2) Misdetection between positive cancer cells and lymphocytes; the first The first case is the main error case, and the second case is rare. These two false detection situations will be improved with the improvement of the data set, but the mitigation range is small, which is also an improvement point that can continue to improve the performance on the basis of the present invention.

具体步骤如下：Specific steps are as follows:

1)创建训练集：1) Create a training set:

1-1)采集标本图像，人工标记部分标签图像，对已经进行标记的Ki-67扫描全片的区域上切分出小图像patch；1-1) Collect specimen images, manually mark part of the label images, and cut out small image patches on the area of the entire Ki-67 scan that has been marked;

1-3)通过CycleGAN网络将除Ki67之外的染色方式的图像转换为Ki67图像，从而达到增强预训练集的目的；1-3) Convert images of dyeing methods other than Ki67 to Ki67 images through the CycleGAN network, so as to achieve the purpose of enhancing the pre-training set;

1-4)将预训练集输入Libra-RCNN网络模型完成预训练；1-4) Input the pre-training set into the Libra-RCNN network model to complete the pre-training;

1-5)将一部分未标记的标本图像输入预训练得到的预训练网络模型中进行测试，得到预训练网络模型输出的测试结果；1-5) input a part of unlabeled specimen images into the pre-training network model obtained by pre-training for testing, and obtain the test results output by the pre-training network model;

1-6)对将预训练网络模型输出的测试结果进行人工修正，再筛除背景占比高或细胞模糊的patch，将经过人工修正和筛选的测试结果增加至作预训练集中，判断是否满足预训练结束条件，如是，将当前的预训练集作为训练集，之后进入步骤2)，否则，返回步骤1-4)；1-6) Manually correct the test results output by the pre-training network model, and then screen out patches with a high background ratio or blurred cells, and add the manually corrected and screened test results to the pre-training set to determine whether the results are satisfied. The pre-training end condition, if so, take the current pre-training set as the training set, and then enter step 2), otherwise, return to step 1-4);

3)检测步骤：3) Detection steps:

3-1)在输入的待检测的Ki-67扫描全片上选取感兴趣区域ROI；可以依据机器学习判断出的癌细胞的数量，再选取癌细胞较多的区域作为ROI或者由医生自由选取ROI；3-1) Select the ROI of the region of interest on the input Ki-67 scan to be detected; the number of cancer cells determined by machine learning can be used, and then the region with more cancer cells can be selected as the ROI or the ROI can be freely selected by the doctor ;

3-3-1)利用训练好的模型对每一个patch进行检测，并对切除patch时的overlap内的细胞进行重复检测框的去重；3-3-1) Use the trained model to detect each patch, and repeat the deduplication of the detection frame for the cells in the overlap when the patch is removed;

3-3-2)遍历当前patch上所有检测框，并对同一细胞的两个不同类别的检测框进行去重，遍历完毕得到该patch的最终检测结果；其中，不同类间检测框去重的具体方式为：在对两两不同类别检测框计算交并比IOU的大小，当出现IOU大于预设阈值的情况，则删除两两不同类别检测框中置信度较低的那一个，保留信度较高的检测框；3-3-2) Traverse all the detection frames on the current patch, and deduplicate the detection frames of two different categories of the same cell, and obtain the final detection result of the patch after the traversal; among them, the detection frames between different classes are deduplicated The specific method is as follows: when calculating the size of the intersection and union ratio IOU for the detection boxes of different categories, when the IOU is larger than the preset threshold, delete the one with lower confidence in the detection boxes of different categories, and keep the reliability. higher detection frame;

4)将所有patch检测结果坐标映射到整个Ki-67扫描全片上，根据统计得到的阳性癌细胞和阴性癌细胞个数计算出Ki67指标。4) Map the coordinates of all patch detection results to the whole Ki-67 scan, and calculate the Ki67 index according to the number of positive and negative cancer cells obtained by statistics.

图2所示为在Ki-67全片上的检测结果。因为Ki-67指标只与癌细胞有关，所以我们最终只将癌细胞进行标注，为了更加清晰，我们将检测框转换为点点在细胞上，其中深黑色为阳性癌细胞，浅灰色为阴性癌细胞，图中随意框出去的区域为一个矩形(同样支持不规则区域)。这样我们最终提供给医生的是Ki-67指标以及我们检测出的癌细胞。Figure 2 shows the detection results on the whole Ki-67 film. Because the Ki-67 indicator is only related to cancer cells, we only label the cancer cells in the end. In order to be more clear, we convert the detection frame to dots on the cells, in which the dark black is positive cancer cells, and the light gray is negative cancer cells , the area arbitrarily framed in the figure is a rectangle (irregular areas are also supported). So what we end up giving the doctor is the Ki-67 indicator and the cancer cells we detect.

实施例的检测评估指标为F1分数。F1分数为recall(召回率)和preciso精度)的调和平均，其中recall＝TP/(TP+FN)，precison＝TP/(TP+FP)。(TP：真阳的数量，FP＝假阳的数量，FN＝假阴的数量)。The detection evaluation index of the embodiment is F1 score. The F1 score is the harmonic mean of recall (recall rate) and precise precision), where recall=TP/(TP+FN) and precision=TP/(TP+FP). (TP: number of true positives, FP=number of false positives, FN=number of false negatives).

将检测结果与病理学家的手动注释相比，在乳腺癌的Ki-67图像中阳性癌细胞的F1分数的为95％左右，阴性癌细胞的F1分数为91％左右，F1分数为recall和precison的调和平均。具体指标如图3所示，并且检测框的位置较准确，其中阴性癌细胞的指标略低于阳性癌细胞，其原因为阴性癌细胞更容易与正常细胞发生混淆，导致检测指标低于阳性癌细胞。试验结果证明可以初步用于辅助医生进行临床判断。Comparing the detection results with the pathologist's manual annotation, in the Ki-67 images of breast cancer, the F1 score of positive cancer cells was around 95%, the F1 score of negative cancer cells was around 91%, and the F1 score of recall and The harmonic mean of precision. The specific indicators are shown in Figure 3, and the position of the detection frame is more accurate. The indicators of negative cancer cells are slightly lower than that of positive cancer cells. The reason is that negative cancer cells are more likely to be confused with normal cells, resulting in detection indicators lower than positive cancer cells. cell. The test results prove that it can be used initially to assist doctors in making clinical judgments.

Claims

1. A method of classifying a cell count for a stain image, comprising the steps of:

1) Creating a training set:

1-1) collecting a specimen image, manually marking a part of label image, and cutting a small image patch on a marked scanning whole area; labels used for labeling fall into four categories: a first class a cell, a first class B cell, a second class a cell, and a second class B cell; wherein the first A-type cell and the first B-type cell belong to a first type of cell, and the second A-type cell and the second B-type cell belong to a second type of cell;

1-2) screening out the patch with high background occupation ratio or fuzzy cells, and forming a pre-training set by the screened patch and the corresponding label;

1-3) inputting the pre-training set into a Libra-RCNN network model to complete pre-training;

1-4) inputting a part of unlabelled specimen images into a pre-training network model obtained by pre-training for testing to obtain a test result output by the pre-training network model;

1-5) manually correcting the test result output by the pre-training network model, screening out the patch with high background occupation ratio or fuzzy cells, adding the manually corrected and screened test result into a pre-training set, and judging whether a pre-training finishing condition is met, wherein if yes, the current pre-training set is used as a training set, and then the step 2) is carried out, and if not, the step 1-3) is carried out;

2) Inputting the finally obtained perfect training set into a Libra-RCNN network model for training to obtain a training model;

3) A detection step:

3-1) selecting a region of interest ROI on the input scanning full sheet to be detected; according to the number of the first type cells judged by machine learning, selecting a region with more first type cells as an ROI or freely selecting the ROI by a doctor;

3-2) cutting slices patch in the selected ROI;

3-3) carrying out detection classification on all the patches to obtain the final detection results of all the patches; the detection classification process of each patch is as follows:

3-3-1) detecting each patch by using the trained model, and removing the duplicate of the repeated detection frame of the patch in the preset overlap region overlap;

3-3-2) traversing all the detection frames on the current patch, and performing duplication removal on the detection frames among different classes to obtain a final detection result of the patch after traversing is finished; the specific way of removing the duplicate of the detection frame among different classes is as follows: calculating the size of the intersection comparison IOU of every two different types of detection frames, and when the IOU is larger than a preset threshold value, deleting the detection frame with lower confidence level from every two different types of detection frames, and keeping the detection frame with higher confidence level;

4) And mapping all patch detection result coordinates to the whole scanning whole slice, and calculating the number of the first type cells according to the counted number of the first type A cells and the first type B cells.

2. The method of claim 1, wherein the specimen image includes a current stain image and a current stain image into which other types of stain images than the current stain image are converted, and the conversion of the other types of stain images into the current stain image is accomplished using a CycleGAN network.