CN114494197A - Cerebrospinal fluid cell identification and classification method for small-complexity sample - Google Patents
Cerebrospinal fluid cell identification and classification method for small-complexity sample
- Publication number
- CN114494197A CN114494197A CN202210094305.8A CN202210094305A CN114494197A CN 114494197 A CN114494197 A CN 114494197A CN 202210094305 A CN202210094305 A CN 202210094305A CN 114494197 A CN114494197 A CN 114494197A
- Authority
- CN
- China
- Prior art keywords
- cerebrospinal fluid
- training
- model
- image
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 210000001175 cerebrospinal fluid Anatomy 0.000 title claims abstract description 57
- 238000000034 method Methods 0.000 title claims abstract description 44
- 238000012549 training Methods 0.000 claims abstract description 67
- 210000004027 cell Anatomy 0.000 claims abstract description 49
- 238000012360 testing method Methods 0.000 claims abstract description 15
- 239000012535 impurity Substances 0.000 claims abstract description 14
- 210000001616 monocyte Anatomy 0.000 claims abstract description 7
- 210000004698 lymphocyte Anatomy 0.000 claims abstract description 6
- 210000000440 neutrophil Anatomy 0.000 claims abstract description 6
- 238000012546 transfer Methods 0.000 claims abstract description 6
- 239000000853 adhesive Substances 0.000 claims abstract description 3
- 230000001070 adhesive effect Effects 0.000 claims abstract description 3
- 239000011521 glass Substances 0.000 claims abstract 2
- 238000001514 detection method Methods 0.000 claims description 16
- 238000013135 deep learning Methods 0.000 claims description 13
- 238000013526 transfer learning Methods 0.000 claims description 13
- 238000013507 mapping Methods 0.000 claims description 10
- 230000006870 function Effects 0.000 claims description 7
- 230000008569 process Effects 0.000 claims description 6
- 230000011218 segmentation Effects 0.000 claims description 6
- 230000008859 change Effects 0.000 claims description 5
- 238000013145 classification model Methods 0.000 claims description 5
- 238000013508 migration Methods 0.000 claims description 5
- 230000005012 migration Effects 0.000 claims description 5
- 230000000877 morphologic effect Effects 0.000 claims description 4
- 238000007781 pre-processing Methods 0.000 claims description 4
- 238000013519 translation Methods 0.000 claims description 4
- 238000011109 contamination Methods 0.000 abstract 1
- 238000003384 imaging method Methods 0.000 abstract 1
- 238000005457 optimization Methods 0.000 abstract 1
- 238000003745 diagnosis Methods 0.000 description 12
- 238000003748 differential diagnosis Methods 0.000 description 8
- 210000003169 central nervous system Anatomy 0.000 description 6
- 238000005516 engineering process Methods 0.000 description 6
- 208000035473 Communicable disease Diseases 0.000 description 5
- 230000002411 adverse Effects 0.000 description 4
- 230000021164 cell adhesion Effects 0.000 description 4
- 238000010276 construction Methods 0.000 description 4
- 230000000694 effects Effects 0.000 description 4
- 230000000191 radiation effect Effects 0.000 description 4
- 230000003044 adaptive effect Effects 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 230000005484 gravity Effects 0.000 description 3
- 230000001580 bacterial effect Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 210000004369 blood Anatomy 0.000 description 2
- 239000008280 blood Substances 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 210000002540 macrophage Anatomy 0.000 description 2
- 244000052769 pathogen Species 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 238000012163 sequencing technique Methods 0.000 description 2
- 210000002700 urine Anatomy 0.000 description 2
- 206010003445 Ascites Diseases 0.000 description 1
- 102000001554 Hemoglobins Human genes 0.000 description 1
- 108010054147 Hemoglobins Proteins 0.000 description 1
- 108010017480 Hemosiderin Proteins 0.000 description 1
- 206010027202 Meningitis bacterial Diseases 0.000 description 1
- 206010027259 Meningitis tuberculous Diseases 0.000 description 1
- 206010028980 Neoplasm Diseases 0.000 description 1
- 206010057249 Phagocytosis Diseases 0.000 description 1
- 208000032851 Subarachnoid Hemorrhage Diseases 0.000 description 1
- 210000001744 T-lymphocyte Anatomy 0.000 description 1
- 208000022971 Tuberculous meningitis Diseases 0.000 description 1
- 208000036142 Viral infection Diseases 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 210000003719 b-lymphocyte Anatomy 0.000 description 1
- 201000009904 bacterial meningitis Diseases 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 210000000601 blood cell Anatomy 0.000 description 1
- 210000001124 body fluid Anatomy 0.000 description 1
- 239000010839 body fluid Substances 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 230000036755 cellular response Effects 0.000 description 1
- 208000037976 chronic inflammation Diseases 0.000 description 1
- 230000006020 chronic inflammation Effects 0.000 description 1
- 230000002380 cytological effect Effects 0.000 description 1
- 239000007857 degradation product Substances 0.000 description 1
- 238000002405 diagnostic procedure Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 210000003743 erythrocyte Anatomy 0.000 description 1
- 239000012634 fragment Substances 0.000 description 1
- 238000012760 immunocytochemical staining Methods 0.000 description 1
- 208000001223 meningeal tuberculosis Diseases 0.000 description 1
- 230000001717 pathogenic effect Effects 0.000 description 1
- 230000001575 pathological effect Effects 0.000 description 1
- 230000008782 phagocytosis Effects 0.000 description 1
- 210000004910 pleural fluid Anatomy 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 239000013049 sediment Substances 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 230000009385 viral infection Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10056—Microscopic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30024—Cell structures in vitro; Tissue sections in vitro
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a method for identifying and classifying cerebrospinal fluid cells from complex small samples. The specific steps are as follows. S1: acquire cell images from a glass-slide sample on a microscope stitching-imaging platform, including images of monocytes, lymphocytes and neutrophils. S2: preprocess the resulting image set; because the collected cell samples contain background impurities caused by lens contamination or improper handling, and some individual cells overlap and adhere to one another, the images are filtered and denoised, irrelevant elements are removed, and adherent cells are separated. S3: perform transfer training of the model on the small sample set. S4: use the back-propagation (BP) algorithm to fine-tune the weights and thresholds of the resulting model. S5: use the trained model to identify cerebrospinal fluid cell images in the test set and further optimize the algorithm. The invention can effectively identify the different types of human cerebrospinal fluid cells.
Description
Technical Field
The invention relates to the technical field of medical cell identification and classification, and in particular to a method for identifying and classifying cerebrospinal fluid cells from complex small samples.
Background
Cerebrospinal fluid (CSF) cytology is one of the most important tools available to neurologists. It comprises a total cell count and a cytological classification, and provides important first-hand information about the central nervous system and the range of pathological conditions affecting it. CSF samples must be processed immediately, ideally within one hour of collection. Normal CSF contains mainly T lymphocytes, a small number of monocyte-derived macrophages and the occasional B lymphocyte. A markedly increased cell count dominated by neutrophils under the microscope is typical of bacterial meningitis and calls for a further search for intracellular bacteria; a background dominated by lymphocytes and monocytes is more common in viral infection and chronic inflammation; a mixed cellular reaction can occur in tuberculous meningitis; macrophages that have phagocytosed erythrocytes or contain fragments of hemoglobin degradation products (the latter are called hemosiderin-laden cells) indicate old subarachnoid hemorrhage; and when atypical cells suggestive of a tumor are found under the microscope, the judgment must combine clinical findings with immunocytochemical staining.
Metagenomic sequencing has attracted wide attention in recent years and has a certain value in detecting the pathogens of central nervous system infections, but it still has shortcomings: specimens are easily contaminated, which affects the results and limits the overall sensitivity of pathogen detection; false-negative and false-positive results are common and can even make the results uninterpretable; and the cost is high, which limits widespread use. Metagenomic sequencing therefore cannot yet replace traditional diagnostic methods.
To date, most clinical laboratories still count and classify CSF cells manually: mononuclear cells (lymphocytes and monocytes) and polymorphonuclear cells are counted separately, directly under the microscope, according to nuclear morphology, up to a total of 100 cells. The procedure is cumbersome, time-consuming and labor-intensive; because operators differ in proficiency and adherence to protocol, it is highly subjective, poorly reproducible and error-prone; neither intra-laboratory nor inter-laboratory quality control is possible; and the turnaround time is long, so it cannot meet clinical needs well and is unsuited to the large-scale clinical workload of a modern hospital. Compared with blood and urine, CSF sample volumes are small, and the small sampling volume used in manual counting cannot guarantee counting accuracy. Fully or partially automating cell detection in CSF specimens would solve these problems to some extent, yet no dedicated automatic analyzer for counting and classifying CSF cells currently exists.
With the development of automated cell-detection technology, many researchers have in recent years tried to count and analyze CSF cells with various cell analyzers, such as automatic urine-sediment analyzers and hematology analyzers. Some newer hematology analyzers now include a body-fluid analysis mode, making it possible for laboratories to count and classify cells in pleural fluid, ascites and similar fluids automatically. However, because of the particular nature of CSF, sample volumes are small, and the measurement principles and internal design of these instruments limit their application to CSF specimens. In addition, CSF slides often contain bacterial impurities and adherent cells, which strongly affects the identification and classification of CSF cells.
Automatic CSF cell identification based on deep learning can help neurologists quickly build more scientific differential-diagnosis models, reduce the adverse effect of subjective factors on microscopy results, assist doctors with cell counting and classification, and greatly improve the diagnosis rate. It can also be integrated with the resources of high-level medical institutions, making the overall diagnostic model more standardized and unified, greatly extending the reach of high-quality medical resources to primary medical institutions and raising the differential-diagnosis capability of primary hospitals. Building a deep-learning-based automatic CSF cell identification system is therefore of great significance for improving the diagnosis rate of central nervous system infections and for addressing regional disparities in medical care and misdiagnosis by junior and primary-care physicians, ultimately benefiting patients.
Summary of the Invention
1. Technical Problem to Be Solved
The purpose of the invention is to address the problems of the prior art: because of the particular nature of cerebrospinal fluid, sample volumes are small, and the measurement principles and internal design of existing instruments limit their application to the examination of cerebrospinal fluid specimens; in addition, cerebrospinal fluid slides contain a certain amount of bacterial impurities and adherent cells, which strongly affects the identification and classification of cerebrospinal fluid cells. To this end, a method for identifying and classifying cerebrospinal fluid cells from complex small samples is proposed.
2. Technical Solution
To achieve the above object, the invention adopts the following technical solution:
A method for identifying and classifying cerebrospinal fluid cells from complex small samples, comprising the following steps:
S1: acquire images of the sample slide with an automatic microscope scanning platform to obtain a complete image set of a cerebrospinal fluid cell slide containing many cells;
S2: preprocess the resulting image set: filter and denoise the images, remove irrelevant elements from the pictures, separate cells that adhere to one another, and split the resulting sample images into batches to form a training set and a test set;
S3: perform transfer training of the model on the small sample set, using a deep-learning network trained in a related field to carry out transfer learning on the small-sample data set;
S4: fine-tune the weights and thresholds of the trained model with the back-propagation (BP) algorithm to further optimize the model;
S5: feed the test set into the model; the output is the cerebrospinal fluid cell identification result.
Preferably, in S1 the cerebrospinal fluid cell slide is placed on the motorized translation stage of the microscope; the software system locates the diagonal coordinate points of the slide's scanning range, determines the scanning range of the image and records the size of the scanned image area; the software platform then stitches the acquired images into one complete cell-sample picture, and this step is repeated for subsequent slide image acquisitions.
Preferably, the specific steps for preprocessing the image set in S2 are:
Step 1: to deal with irrelevant impurities in the sample background, first separate the background of the image, obtain a binary image with the maximum between-class variance (Otsu) method, and smooth the contours of the targets in the binary image with a morphological opening operation; the opening operation removes background impurities that are not targets, and the Canny edge detection algorithm then extracts the contour edge information of the targets;
Step 2: use concave-point detection to segment adherent cells; a concave point of adherent cells is the point of maximum local curvature in the concave region formed when two or more roughly circular targets overlap and stick together; for a nearly circular object there is no abrupt change of curvature unless two or more cells are present;
Step 3: ellipse fitting; to recover the contour boundary that an adherent target loses through adhesion, the algorithm uses the prior knowledge that targets are generally roughly circular and applies least-squares ellipse fitting to complete the splitting of adherent cells.
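A minimal sketch of Steps 1 to 3 is given below for illustration. OpenCV is assumed as the implementation library (the patent does not name one), and the kernel size, Canny thresholds and the helper name `preprocess_and_fit` are illustrative assumptions, not part of the disclosure.

```python
import cv2

def preprocess_and_fit(gray):
    """Otsu binarization, morphological opening, Canny edges and
    least-squares ellipse fitting for roughly circular CSF cells."""
    # Step 1: maximum between-class variance (Otsu) -> binary image
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Morphological opening smooths contours and removes background impurities
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    # Canny edge detection extracts the contour edge information
    edges = cv2.Canny(opened, 50, 150)

    # Step 3: least-squares ellipse fit on each sufficiently large contour
    contours, _ = cv2.findContours(opened, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    ellipses = [cv2.fitEllipse(c) for c in contours if len(c) >= 5]
    return opened, edges, ellipses
```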
Preferably, in S3 a multi-layer ResNet model already trained in another field is used; the part of the model before the fully connected layer is retained, and the output part is given three output nodes according to the required classes. With this pre-trained transfer, the parameters of the current multi-layer ResNet are taken as the initial parameters of the invention, and the network is then trained with the image data processed in S2. The specific procedure is:
1) determine the number of nodes of the first network layer, i.e. the number of input-layer nodes, from the dimensionality of the input data;
2) feed the data into the residual network units; following the property of the ResNet network, namely the identity mapping of the residual network, the output of each module is its current input plus the residual, and the network is trained layer by layer with the training data;
3) use the already-trained ResNet network for transfer learning, taking its well-trained parameters as the initial training parameters of this model; this saves part of the training time and training samples and is well suited to learning and training on small samples.
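One way to realize this transfer step is sketched below in PyTorch; the use of torchvision, the choice of ResNet-18 with ImageNet weights, and the decision to freeze the transferred layers at first are assumptions made for illustration, since the patent only specifies a multi-layer ResNet pretrained elsewhere with a three-node output.

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet pretrained in another domain (ImageNet here, as an assumption)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Keep everything before the fully connected layer and replace the head
# with three output nodes: monocyte, lymphocyte, neutrophil.
model.fc = nn.Linear(model.fc.in_features, 3)

# Optionally freeze the transferred layers so only the new head is trained first
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False
```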
Preferably, the specific steps for optimizing the model in S4 are:
1) after training is complete, the model is trained in a supervised manner by adding labeled data at the topmost layer of the ResNet, i.e. the relevant parameters of the network are fine-tuned with the back-propagation (BP) algorithm;
2) the labeled data of each class are fed into the topmost layer of the ResNet, and the weights and thresholds of the ResNet are fine-tuned with the BP algorithm; this supervised training further reduces the training error and improves the accuracy of the transfer-learning recognition model.
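A minimal supervised fine-tuning loop for this step might look as follows; the cross-entropy loss, SGD optimizer, learning rate, epoch count and the `train_loader` of labeled cell patches are illustrative assumptions rather than values stated in the patent.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

model.train()
for epoch in range(20):                      # epoch count is an assumption
    for images, labels in train_loader:      # labeled CSF cell images
        optimizer.zero_grad()
        outputs = model(images)              # forward pass through the ResNet
        loss = criterion(outputs, labels)    # error at the output layer
        loss.backward()                      # back-propagate the error (BP)
        optimizer.step()                     # adjust weights and biases
```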
Preferably, in S5 the test-set data are fed into the trained classification model; after the multi-layer ResNet mapping, the number of output-layer nodes equals the number of recognition states, and the input vector activates the corresponding class node at the output layer.
Preferably, among the class nodes in S5, node 0 corresponds to monocytes, node 1 to lymphocytes and node 2 to neutrophils.
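Inference over the test set with this node-to-class mapping could then be sketched as follows; `test_loader` and the dictionary name are assumptions, and `model` is the fine-tuned network from the previous sketches.

```python
import torch

CLASS_NAMES = {0: "monocyte", 1: "lymphocyte", 2: "neutrophil"}

model.eval()
predictions = []
with torch.no_grad():
    for images, _ in test_loader:
        logits = model(images)
        nodes = logits.argmax(dim=1)          # index of the activated class node
        predictions.extend(CLASS_NAMES[int(n)] for n in nodes)
```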
3. Beneficial Effects
Compared with the prior art, the advantages of the invention are:
(1) The invention effectively solves the problem that background impurities in the acquired images make feature extraction difficult, and adapts well to cerebrospinal fluid cell recognition against complex backgrounds. Because already-trained model parameters are used as the initial training parameters of the invention, part of the training time is saved, and trained model parameters are generally more reliable than randomly initialized ones, so the method is suitable for learning and training on small samples. For samples with adherent cells, the invention also exploits the near-circularity of the cells and predicts their center points for segmentation, which broadens the range of situations to which the invention applies.
(2) In the invention, deep-learning-based automatic cerebrospinal fluid cell identification can help neurologists quickly build more scientific differential-diagnosis models, reduce the adverse effect of subjective factors on microscopy results, assist doctors with cell counting and classification, and greatly improve the diagnosis rate. It can also be integrated with the resources of high-level medical institutions, making the overall diagnostic model more standardized and unified, greatly extending the reach of high-quality medical resources to primary medical institutions and raising the differential-diagnosis capability of primary hospitals. Building a deep-learning-based automatic cerebrospinal fluid cell identification system is therefore of great significance for improving the diagnosis rate of central nervous system infections and for addressing regional disparities in medical care and misdiagnosis by junior and primary-care physicians, ultimately benefiting patients.
Brief Description of the Drawings
Fig. 1 is a technical flowchart of the method for identifying and classifying cerebrospinal fluid cells from complex small samples proposed by the invention;
Fig. 2 is a schematic diagram of concave points in the method proposed by the invention;
Fig. 3 shows the ResNet model used for transfer learning in the invention;
Fig. 4 shows an example of transfer learning in the invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them.
Example 1:
Referring to Fig. 1, a method for identifying and classifying cerebrospinal fluid cells from complex small samples comprises the following steps:
S1: acquire images of the sample slide with an automatic microscope scanning platform to obtain a complete image set of a cerebrospinal fluid cell slide containing many cells;
Place the cerebrospinal fluid cell slide on the motorized translation stage of the microscope; use the software system to locate the diagonal coordinate points of the slide's scanning range, determine the scanning range of the image and record the size of the scanned image area; use the software platform to stitch the acquired images into one complete cell-sample picture, and repeat this step for subsequent slide image acquisitions;
S2: preprocess the resulting image set: filter and denoise the images, remove irrelevant elements from the pictures, separate cells that adhere to one another, and split the resulting sample images into batches to form a training set and a test set;
The specific steps for preprocessing the image set are:
Step 1: to deal with irrelevant impurities in the sample background, first separate the background of the image, obtain a binary image with the maximum between-class variance (Otsu) method, and smooth the contours of the targets in the binary image with a morphological opening operation; the opening operation removes background impurities that are not targets, and the Canny edge detection algorithm then extracts the contour edge information of the targets;
Step 2: use concave-point detection to segment adherent cells; a concave point of adherent cells is the point of maximum local curvature in the concave region formed when two or more roughly circular targets overlap and stick together; for a nearly circular object there is no abrupt change of curvature unless two or more cells are present;
Step 3: ellipse fitting; to recover the contour boundary that an adherent target loses through adhesion, the algorithm uses the prior knowledge that targets are generally roughly circular and applies least-squares ellipse fitting to complete the splitting of adherent cells;
S3: perform transfer training of the model on the small sample set, using a deep-learning network trained in a related field to carry out transfer learning on the small-sample data set;
A multi-layer ResNet model already trained in another field is used; the part of the model before the fully connected layer is retained, and the output part is given three output nodes according to the required classes. With this pre-trained transfer, the parameters of the current multi-layer ResNet are taken as the initial parameters of the invention, and the network is then trained with the image data processed in S2. The specific procedure is:
1) determine the number of nodes of the first network layer, i.e. the number of input-layer nodes, from the dimensionality of the input data;
2) feed the data into the residual network units; following the property of the ResNet network, namely the identity mapping of the residual network, the output of each module is its current input plus the residual, and the network is trained layer by layer with the training data;
3) use the already-trained ResNet network for transfer learning, taking its well-trained parameters as the initial training parameters of this model; this saves part of the training time and training samples and is well suited to learning and training on small samples;
S4: fine-tune the weights and thresholds of the trained model with the BP algorithm to further optimize the model;
The specific steps for optimizing the model are:
1) after training is complete, the model is trained in a supervised manner by adding labeled data at the topmost layer of the ResNet, i.e. the relevant parameters of the network are fine-tuned with the back-propagation (BP) algorithm;
2) the labeled data of each class are fed into the topmost layer of the ResNet, and the weights and thresholds of the ResNet are fine-tuned with the BP algorithm; this supervised training further reduces the training error and improves the accuracy of the transfer-learning recognition model.
S5: feed the test set into the model; the output is the cerebrospinal fluid cell identification result.
The test-set data are fed into the trained classification model; after the multi-layer ResNet mapping, the number of output-layer nodes equals the number of recognition states, and the input vector activates the corresponding class node at the output layer; among the class nodes, node 0 corresponds to monocytes, node 1 to lymphocytes and node 2 to neutrophils.
In the invention, the problem that background impurities in the acquired images make feature extraction difficult is effectively solved, and the method adapts well to cerebrospinal fluid cell recognition against complex backgrounds. Because already-trained model parameters are used as the initial training parameters of the invention, part of the training time is saved, and trained model parameters are generally more reliable than randomly initialized ones, so the method is suitable for learning and training on small samples. For samples with adherent cells, the invention also exploits the near-circularity of the cells and predicts their center points for segmentation, which broadens the range of situations to which the invention applies.
In the invention, deep-learning-based automatic cerebrospinal fluid cell identification can help neurologists quickly build more scientific differential-diagnosis models, reduce the adverse effect of subjective factors on microscopy results, assist doctors with cell counting and classification, and greatly improve the diagnosis rate. It can also be integrated with the resources of high-level medical institutions, making the overall diagnostic model more standardized and unified, greatly extending the reach of high-quality medical resources to primary medical institutions and raising the differential-diagnosis capability of primary hospitals. Building a deep-learning-based automatic cerebrospinal fluid cell identification system is therefore of great significance for improving the diagnosis rate of central nervous system infections and for addressing regional disparities in medical care and misdiagnosis by junior and primary-care physicians, ultimately benefiting patients.
Example 2:
Referring to Figs. 1 to 4, a method for identifying and classifying cerebrospinal fluid cells from complex small samples comprises the following steps:
S1: acquire images of the sample slide with an automatic microscope scanning platform to obtain a complete image set of a cerebrospinal fluid cell slide containing many cells;
Place the cerebrospinal fluid cell slide on the motorized translation stage of the microscope; use the software system to locate the diagonal coordinate points of the slide's scanning range, determine the scanning range of the image and record the size of the scanned image area; use the software platform to stitch the acquired images into one complete cell-sample picture, and repeat this step for subsequent slide image acquisitions;
S2: preprocess the resulting image set: filter and denoise the images, remove irrelevant elements from the pictures, separate cells that adhere to one another, and split the resulting sample images into batches to form a training set and a test set;
The specific steps for preprocessing the image set are:
Step 1: to deal with irrelevant impurities in the sample background, first separate the background of the image, obtain a binary image with the maximum between-class variance (Otsu) method, and smooth the contours of the targets in the binary image with a morphological opening operation; the opening operation removes background impurities that are not targets, and the Canny edge detection algorithm then extracts the contour edge information of the targets;
Step 2: use concave-point detection to segment adherent cells; a concave point of adherent cells is the point of maximum local curvature in the concave region formed when two or more roughly circular targets overlap and stick together; for a nearly circular object there is no abrupt change of curvature unless two or more cells are present;
1) Concave-point detection:
First, corner points on the target contour are detected with an improved curvature scale space (CSS) algorithm. This improved CSS algorithm retains all true corner points at a relatively low scale and then compares the curvature of every candidate corner point with an adaptive local threshold to remove redundant corners. In general, the adaptive local threshold of a candidate corner is determined from the curvature of its neighborhood region, and candidate corners whose absolute curvature is below their local threshold are eliminated. Among the corner candidates, some points are detected as local maxima of curvature even though the differences between neighboring points within their region of support (ROS) are very small, so a suitable region must also be chosen when selecting the region of support.
The adaptive local threshold is set as follows:
where the first term is the mean curvature of the neighborhood region, p denotes the position of the candidate corner point, R1 and R2 are the sizes of the region of support, and C is a coefficient;
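The threshold equation itself appears only as an image in the original document and is not reproduced in the text. The standard adaptive-threshold form from the CSS corner-detection literature, which uses exactly the quantities named here, is (as an assumption, not a verbatim reproduction of the patent's equation):

$$T(p) \;=\; C \cdot \bar{\kappa}(p) \;=\; \frac{C}{R_1 + R_2 + 1} \sum_{i = p - R_1}^{p + R_2} \bigl|\kappa(i)\bigr|$$

where $\kappa(i)$ is the curvature at contour point i.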
2) Contour-segment grouping:
The concave points obtained in 1) are used to split the contour of the adherent region into several contour segments. Because not every contour segment corresponds to a separate target, several contour segments may belong to the same target, so the segments belonging to the same target must be grouped together. For a contour segment s_i and another contour segment s_j within a certain neighborhood of it, if s_i and s_j satisfy the grouping conditions they are placed in the same group; the grouping method involves the following three constraints:
Condition 1: if the average distance deviation (ADD) produced by the ellipse fitted to the grouped segments is smaller than the average distance deviation produced by the ellipse fitted to any single contour segment before grouping, these contour segments are placed in the same group.
Condition 2: if the center of gravity of the ellipse fitted to the grouped segments is close to the centers of gravity of the ellipses fitted to each contour segment individually, the segments can be placed in one group.
Condition 3: if the centers of gravity of the ellipses fitted separately to any two contour segments s_i and s_j are very close to each other, the segments can be placed in one group;
Step 3: ellipse fitting; to recover the contour boundary that an adherent target loses through adhesion, the algorithm uses the prior knowledge that targets are generally roughly circular and applies least-squares ellipse fitting to complete the splitting of adherent cells;
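A compact sketch of the adhesion-splitting idea is given below. As a simplification it substitutes OpenCV convexity defects for the improved CSS curvature-maximum detector described above, omits the contour-segment grouping step, and fits a least-squares ellipse to each resulting contour segment; the function name `split_adherent_cells` and the defect-depth threshold are illustrative assumptions.

```python
import cv2
import numpy as np

def split_adherent_cells(contour, min_depth=5.0):
    """Split an adherent-cell contour at concave points and fit ellipses.

    Convexity defects stand in for the CSS curvature-maximum concave points
    of the patent; contour-segment grouping (conditions 1-3) is omitted."""
    hull = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull)
    if defects is None:
        return [cv2.fitEllipse(contour)] if len(contour) >= 5 else []

    # Indices of sufficiently deep concave points along the contour
    concave_idx = sorted(int(d[0][2]) for d in defects
                         if d[0][3] / 256.0 > min_depth)
    if len(concave_idx) < 2:
        return [cv2.fitEllipse(contour)] if len(contour) >= 5 else []

    # Cut the closed contour into segments between consecutive concave points
    segments = []
    for a, b in zip(concave_idx, concave_idx[1:] + [concave_idx[0]]):
        seg = contour[a:b] if a < b else np.vstack((contour[a:], contour[:b]))
        if len(seg) >= 5:                    # cv2.fitEllipse needs >= 5 points
            segments.append(seg)

    return [cv2.fitEllipse(seg) for seg in segments]
```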
S3: perform transfer training of the model on the small sample set, using a deep-learning network trained in a related field to carry out transfer learning on the small-sample data set;
A multi-layer ResNet model already trained in another field is used; the part of the model before the fully connected layer is retained, and the output part is given three output nodes according to the required classes. With this pre-trained transfer, the parameters of the current multi-layer ResNet model are taken as the initial parameters of the invention, and the network is then trained with the image data processed in Step 2 of claim 3; an example of transfer learning is shown in Fig. 4. The specific implementation is as follows:
ResNet is composed of multiple convolution modules connected in series; each convolution module includes a convolution layer and a pooling layer, and Fig. 3 shows one ResNet module. During training, the target mapping of the unit (i.e. the optimal solution to be approached) is taken as F(x)+x while the output is y+x, so the training objective becomes making y approach F(x); in other words, the main part x that is identical before and after the mapping is removed, which highlights the small change, i.e. the residual.
Expressed mathematically:
$y = F(x, \{W_i\}) + W_s x$  (2)
Here x is the input of the residual unit, y is its output, F(x) is the target mapping, and $\{W_i\}$ are the convolution layers in the residual unit. $W_s$ is a convolution with a 1×1 kernel whose role is to reduce or increase the dimensionality of x so that it matches the size of the output y (because the two must be summed).
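As a sketch of equation (2), a basic residual unit with the optional 1×1 projection $W_s$ could be written in PyTorch as follows; the two-convolution form of F(x), the batch normalization and the class name are illustrative assumptions.

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """y = F(x, {W_i}) + W_s * x, with W_s a 1x1 convolution that is used
    only when the input and output shapes differ."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.F = nn.Sequential(                  # the residual mapping F(x)
            nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.Ws = nn.Identity()                  # identity shortcut by default
        if stride != 1 or in_ch != out_ch:       # 1x1 conv to match dimensions
            self.Ws = nn.Conv2d(in_ch, out_ch, 1, stride, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.F(x) + self.Ws(x))   # residual plus shortcut
```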
The specific procedure is:
1) determine the number of nodes of the first network layer, i.e. the number of input-layer nodes, from the dimensionality of the input data;
2) feed the data into the residual network units; following the property of the ResNet network, namely the identity mapping of the residual network, the output of each module is its current input plus the residual, and the network is trained layer by layer with the training data;
3) the training data collected in the invention do not reach the hundreds of thousands of training samples that deep learning normally requires, but because the already-trained ResNet network is transferred and its well-trained parameters serve as the initial training parameters of this model, part of the training time and training samples is saved, which is well suited to learning and training on small samples;
S4: fine-tune the weights and thresholds of the trained model with the BP algorithm to further optimize the model;
The specific implementation measures are as follows:
(1) Model pre-training treats the transferred weights as the initial weights of the new network; during training, their values are changed by the gradient-descent algorithm.
Gradient-descent algorithm:
1) For each training sample, from 0 up to the number of samples in the training set:
① compute the gradients of the weight w and the bias b of the i-th training sample with respect to the loss function; in the end we obtain the gradient values of the weights and biases for every training sample;
② compute the sum of the gradients of the weights w over all training samples;
③ compute the sum of the gradients of the biases b over all training samples.
2) After the above computation, perform the following:
① using the results of steps ② and ③ above, compute the average of the gradients of the weights and biases over all samples;
② use the update formula to update the weight and bias values (a standard form of this update is sketched below).
Repeat the above process until the loss function converges.
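The update formula referred to in step ② appears in the original only as an image. The standard full-batch gradient-descent update consistent with the averaging described above is, as an assumption:

$$w \;\leftarrow\; w - \frac{\eta}{m}\sum_{i=1}^{m}\frac{\partial C_i}{\partial w}, \qquad b \;\leftarrow\; b - \frac{\eta}{m}\sum_{i=1}^{m}\frac{\partial C_i}{\partial b}$$

where $\eta$ is the learning rate, $m$ the number of training samples and $C_i$ the loss on the i-th sample.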
(2) Reverse fine-tuning means supervised training of the ResNet network to reduce the training error and improve the accuracy of the classification model. Steps of the BP algorithm:
1) input the training set;
2) for each sample x in the training set, set the activation $a^1$ corresponding to the input layer;
Forward propagation:
3) because the output differs from the actual result, compute the error produced by the output layer:
$\delta^L = \nabla_a C \odot \sigma'(z^L)$  (6)
4) back-propagate the error obtained in the previous step from the output layer to the hidden layers:
$\delta^l = \bigl((w^{l+1})^{T} \delta^{l+1}\bigr) \odot \sigma'(z^l)$  (7)
5) use gradient descent to train the parameters, iterating until convergence:
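The update rule that step 5) points to is likewise only an image in the original; the standard back-propagation parameter updates consistent with equations (6) and (7) are, as an assumption:

$$w^{l} \;\leftarrow\; w^{l} - \frac{\eta}{m}\sum_{x}\delta^{x,l}\,(a^{x,l-1})^{T}, \qquad b^{l} \;\leftarrow\; b^{l} - \frac{\eta}{m}\sum_{x}\delta^{x,l}$$

where the sum runs over the samples x of a batch, $\eta$ is the learning rate and $m$ the batch size.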
S5: feed the test set into the model; the output is the cerebrospinal fluid cell identification result.
The test-set data are fed into the trained classification model; after the multi-layer ResNet mapping, the number of output-layer nodes equals the number of recognition states, and the input vector activates the corresponding class node at the output layer; among the class nodes, node 0 corresponds to monocytes, node 1 to lymphocytes and node 2 to neutrophils.
In the invention, the problem that background impurities in the acquired images make feature extraction difficult is effectively solved, and the method adapts well to cerebrospinal fluid cell recognition against complex backgrounds. Because already-trained model parameters are used as the initial training parameters of the invention, part of the training time is saved, and trained model parameters are generally more reliable than randomly initialized ones, so the method is suitable for learning and training on small samples. For samples with adherent cells, the invention also exploits the near-circularity of the cells and predicts their center points for segmentation, which broadens the range of situations to which the invention applies.
In the invention, deep-learning-based automatic cerebrospinal fluid cell identification can help neurologists quickly build more scientific differential-diagnosis models, reduce the adverse effect of subjective factors on microscopy results, assist doctors with cell counting and classification, and greatly improve the diagnosis rate. It can also be integrated with the resources of high-level medical institutions, making the overall diagnostic model more standardized and unified, greatly extending the reach of high-quality medical resources to primary medical institutions and raising the differential-diagnosis capability of primary hospitals. Building a deep-learning-based automatic cerebrospinal fluid cell identification system is therefore of great significance for improving the diagnosis rate of central nervous system infections and for addressing regional disparities in medical care and misdiagnosis by junior and primary-care physicians, ultimately benefiting patients.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited thereto. Any equivalent replacement or modification made, within the technical scope disclosed by the invention, by a person skilled in the art according to the technical solution of the invention and its inventive concept shall fall within the scope of protection of the invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210094305.8A CN114494197A (en) | 2022-01-26 | 2022-01-26 | Cerebrospinal fluid cell identification and classification method for small-complexity sample |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210094305.8A CN114494197A (en) | 2022-01-26 | 2022-01-26 | Cerebrospinal fluid cell identification and classification method for small-complexity sample |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114494197A true CN114494197A (en) | 2022-05-13 |
Family
ID=81477483
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210094305.8A Pending CN114494197A (en) | 2022-01-26 | 2022-01-26 | Cerebrospinal fluid cell identification and classification method for small-complexity sample |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114494197A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115100646A (en) * | 2022-06-27 | 2022-09-23 | 武汉兰丁智能医学股份有限公司 | Cell image high-definition rapid splicing identification marking method |
CN116823823A (en) * | 2023-08-29 | 2023-09-29 | 天津市肿瘤医院(天津医科大学肿瘤医院) | An artificial intelligence method for automatic analysis of cerebrospinal fluid cells |
WO2024000288A1 (en) * | 2022-06-29 | 2024-01-04 | 深圳华大生命科学研究院 | Image stitching method, and gene sequencing system and corresponding gene sequencer |
CN117576098A (en) * | 2024-01-16 | 2024-02-20 | 武汉互创联合科技有限公司 | Cell division balance evaluation method and device based on segmentation |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111476266A (en) * | 2020-02-27 | 2020-07-31 | 武汉大学 | Non-equilibrium type leukocyte classification method based on transfer learning |
CN113723199A (en) * | 2021-08-03 | 2021-11-30 | 南京邮电大学 | Airport low visibility detection method, device and system |
WO2021247868A1 (en) * | 2020-06-03 | 2021-12-09 | Case Western Reserve University | Classification of blood cells |
-
2022
- 2022-01-26 CN CN202210094305.8A patent/CN114494197A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111476266A (en) * | 2020-02-27 | 2020-07-31 | 武汉大学 | Non-equilibrium type leukocyte classification method based on transfer learning |
WO2021247868A1 (en) * | 2020-06-03 | 2021-12-09 | Case Western Reserve University | Classification of blood cells |
CN113723199A (en) * | 2021-08-03 | 2021-11-30 | 南京邮电大学 | Airport low visibility detection method, device and system |
Non-Patent Citations (4)
Title |
---|
HUANHUAN YIN 等: "Research on Recognition and Classification System of Cerebrospinal Fluid Cells Based on Small Samples", 《2021 IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE, ELECTRONIC INFORMATION ENGINEERING AND INTELLIGENT CONTROL TECHNOLOGY (CEI)》, 29 October 2021 (2021-10-29), pages 149 - 152 * |
SAHAR ZAFARI 等: "Segmentation of Partially Overlapping Nanoparticles Using Concave Points", 《ADVANCES IN VISUAL COMPUTING》, 18 December 2015 (2015-12-18), pages 187 - 197, XP047332049, DOI: 10.1007/978-3-319-27857-5_17 * |
刘宰豪: "基于凹点和重心检测的粘连类圆形目标图像分割", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 02, 15 February 2020 (2020-02-15), pages 138 - 1382 * |
尹欢欢: "脑脊液细胞显微图像识别与分类系统设计与实现", 《万方数据知识服务平台》, 1 November 2023 (2023-11-01), pages 1 - 86 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115100646A (en) * | 2022-06-27 | 2022-09-23 | 武汉兰丁智能医学股份有限公司 | Cell image high-definition rapid splicing identification marking method |
WO2024000288A1 (en) * | 2022-06-29 | 2024-01-04 | 深圳华大生命科学研究院 | Image stitching method, and gene sequencing system and corresponding gene sequencer |
CN116823823A (en) * | 2023-08-29 | 2023-09-29 | 天津市肿瘤医院(天津医科大学肿瘤医院) | An artificial intelligence method for automatic analysis of cerebrospinal fluid cells |
CN116823823B (en) * | 2023-08-29 | 2023-11-14 | 天津市肿瘤医院(天津医科大学肿瘤医院) | Artificial intelligence cerebrospinal fluid cell automatic analysis method |
CN117576098A (en) * | 2024-01-16 | 2024-02-20 | 武汉互创联合科技有限公司 | Cell division balance evaluation method and device based on segmentation |
CN117576098B (en) * | 2024-01-16 | 2024-04-19 | 武汉互创联合科技有限公司 | Cell division balance evaluation method and device based on segmentation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114494197A (en) | Cerebrospinal fluid cell identification and classification method for small-complexity sample | |
CN106248559B (en) | A kind of five sorting technique of leucocyte based on deep learning | |
CN101809589B (en) | Methods and systems for processing biological specimens utilizing multiple wavelengths | |
CN110473167B (en) | Urine sediment image recognition system and method based on deep learning | |
CN112784767A (en) | Cell example segmentation algorithm based on leukocyte microscopic image | |
CN108257124A (en) | A kind of white blood cell count(WBC) method and system based on image | |
CN101799926B (en) | Ki-67 immunohistochemical pathological image automatic quantitative analysis system | |
CN113628199B (en) | Pathological picture stained tissue area detection method, pathological picture stained tissue area detection system and prognosis state analysis system | |
CN113902669A (en) | Method and system for reading urine exfoliative cell fluid-based smear | |
CN112036334A (en) | Method, system and terminal for classifying visible components in sample to be detected | |
CN114332855A (en) | Unmarked leukocyte three-classification method based on bright field microscopic imaging | |
Rani et al. | Automatic Evaluations of Human Blood Using Deep Learning Concepts | |
KR20010017092A (en) | Method for counting and analyzing morphology of blood cell automatically | |
KR20200136004A (en) | Method for detecting cells with at least one malformation in a cell sample | |
CN112432902A (en) | Automatic detection system and method for judging cell number through peripheral blood cell morphology | |
CN110414317B (en) | Capsule network-based automatic white blood cell classification and counting method | |
CN112001315A (en) | Bone marrow cell classification and identification method based on transfer learning and image texture features | |
Sinha et al. | Detection of leukemia disease using convolutional neural network | |
CN113222944B (en) | Cell nucleus segmentation method and system and device for auxiliary analysis of cancer based on pathological images | |
CN114387596A (en) | Cytopathological smear automatic interpretation system | |
Priyankara et al. | An extensible computer vision application for blood cell recognition and analysis | |
CN112819057A (en) | Automatic identification method of urinary sediment image | |
CN114742803B (en) | Platelet aggregation detection method combining deep learning and digital image processing algorithm | |
Cheng et al. | Application of image recognition technology in pathological diagnosis of blood smears | |
CN113222928B (en) | Urine cytology artificial intelligence urothelial cancer identification system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |