CN110031624A

CN110031624A - Tumor marker detection system, method, terminal, and medium based on multiple neural network classifiers

Info

Publication number: CN110031624A
Application number: CN201910149298.5A
Authority: CN
Inventors: 王晋; 陈晓东
Original assignee: Shanghai Advanced Research Institute of CAS; University of Chinese Academy of Sciences
Current assignee: Shanghai Advanced Research Institute of CAS; University of Chinese Academy of Sciences
Priority date: 2019-02-28
Filing date: 2019-02-28
Publication date: 2019-07-19

Abstract

This application provides a tumor marker detection system, method, terminal, and medium based on multiple neural network classifiers. The scheme preprocesses the data with principal component analysis and feature extraction, then trains a combined classifier model built from random forest, support vector machine, BP neural network, and extreme learning machine classifier models. The result is a liver cancer tumor marker classifier whose accuracy, specificity, and sensitivity meet the requirements of clinical diagnosis, which helps clinicians reduce the misdiagnosis rate in the initial diagnosis of liver cancer and improves the accuracy of liver cancer tumor marker detection.

Description

Tumor marker detection system, method, terminal, medium

Technical field

The present application relates to the technical field of neural network classifiers, and in particular to a tumor marker detection system, method, terminal, and medium based on multiple neural network classifiers.

Background

Primary liver cancer is currently the fourth most common malignant tumor and the third leading cause of cancer death in China, and it seriously threatens the life and health of the Chinese people. At present, detection methods for the early diagnosis of liver cancer fall into two main categories:

The first category uses serum tumor markers of liver cancer as reference indicators. China's latest "Standard for the Diagnosis and Treatment of Primary Liver Cancer (2017 Edition)" states that the liver cancer tumor marker alpha-fetoprotein (AFP) is a commonly used and important means of diagnosing liver cancer. Clinically, an AFP level above 400 ug/L suggests that liver cancer may be present. Although AFP has relatively high sensitivity and specificity for the diagnosis of liver cancer, false positives can still occur in 40% of patients with early-stage liver cancer and in 15%-20% of patients with advanced liver cancer. The second category comprises imaging and pathological methods, represented by ultrasound, computed X-ray imaging, magnetic resonance imaging, digital subtraction angiography, and liver puncture biopsy.

However, the former category relies mainly on the doctor's practical experience. Reaching an excellent level of decision-making takes long-term practice, and the result is strongly affected by the doctor's subjective experience and by external interference. Most techniques in the latter category have limited ability to diagnose liver cancer at an early stage; they also have some impact on the patient's body, are expensive, and are not universally applicable.

There are many kinds of serum tumor markers for liver cancer, but clinically no single tumor marker can confirm a diagnosis of liver cancer. Each marker has some reference value for diagnosis, and each has its own limitations. In clinical practice, doctors generally consult the values of several tumor markers for combined detection and diagnosis, and reliance on human experience easily introduces diagnostic errors. Machine learning can effectively improve the accuracy of combined diagnosis with multiple tumor markers. Computer-aided diagnosis algorithms based on machine learning have been continuously optimized and refined in recent years, which makes it possible to build combined diagnosis models over multiple tumor markers. On the algorithm side, researchers abroad have compared the actual performance of 179 different classifiers on 121 different datasets. The results show that different classifiers perform differently in different classification scenarios. For example, random forest (RF) is the strongest on average, yet it ranks first on only 9.9% of the datasets; the support vector machine (SVM) follows closely on average and ranks first on 10.7% of the datasets.

Therefore, the art urgently needs a technical solution that can effectively improve the accuracy of combined diagnosis with multiple tumor markers.

Summary of the application

In view of the above shortcomings of the prior art, the purpose of the present application is to provide a tumor marker detection system, method, terminal, and medium based on multiple neural network classifiers, so as to solve the problems in the prior art.

To achieve the above and other related purposes, a first aspect of the present application provides a tumor marker detection system based on multiple neural network classifiers, comprising: a sample collection module, configured to collect sample data of tumor marker detection samples, wherein the sample data includes test set data and training set data; a data preprocessing module, configured to screen out, from the sample data, a plurality of abnormal indicators that are highly correlated with tumors; a sample statistical analysis module, configured to evaluate and analyze, based on the sample data, the detection results of a single abnormal indicator and/or of multiple combined abnormal indicators; a classifier model training module, configured to train a plurality of classifier models on the training set data; a classifier model evaluation and testing module, configured to test the trained classifier models on the test set data and to compare the test results with the evaluation and analysis results of the sample statistical analysis module, so as to judge the validity of the tested classifier model; and a classifier diagnosis method application module, configured to apply a classifier model judged to be valid to tumor marker data and to output diagnosis result information accordingly.

In some embodiments of the first aspect of the present application, the sample collection module collects real clinical sample data and classifies the collected sample data.

In some embodiments of the first aspect of the present application, the tumor markers include liver cancer tumor markers, and the sample data is classified by dividing it into liver cancer group sample data and non-liver-cancer group sample data, and/or into liver cancer group sample data, liver disease group sample data, and healthy group sample data.

In some embodiments of the first aspect of the present application, the data preprocessing module performs feature selection based on a principal component analysis algorithm, so as to reduce the dimensionality of the plurality of abnormal indicators initially screened from the sample data and to obtain the abnormal indicators that are highly correlated with tumors.

In some embodiments of the first aspect of the present application, the abnormal indicators highly correlated with tumors include abnormal indicators highly correlated with liver cancer, which include alpha-fetoprotein and further include: abnormal prothrombin, carcinoembryonic antigen, carbohydrate antigen 199, carbohydrate antigen 242, carbohydrate antigen 211, carbohydrate antigen 125, sialic acid, or ferritin.

In some embodiments of the first aspect of the present application, the evaluation indicators for the evaluation and analysis of detection results include sensitivity, specificity, and accuracy, where sensitivity is the proportion of actually positive samples that are judged positive, specificity is the proportion of actually negative samples that are judged negative, and accuracy is the proportion of true positives and true negatives among all subjects tested.

In some embodiments of the first aspect of the present application, the classifier models include a random forest model, a support vector machine model, a BP neural network model, and an extreme learning machine model.

In some embodiments of the first aspect of the present application, the sample statistical analysis module selects, according to the ROC curves, some or all of the single abnormal indicators to serve as the combined abnormal indicators.

To achieve the above and other related purposes, a second aspect of the present application provides a tumor marker detection method based on multiple neural network classifiers, comprising: collecting sample data of tumor marker detection samples, wherein the sample data includes test set data and training set data; screening out, from the sample data, a plurality of abnormal indicators that are highly correlated with tumors; evaluating and analyzing, based on the sample data, the detection results of a single abnormal indicator and/or of multiple combined abnormal indicators, so as to generate corresponding evaluation and analysis results; training a plurality of classifier models on the training set data; testing the trained classifier models on the test set data and comparing the test results with the evaluation and analysis results, so as to judge the validity of the tested classifier model; and applying a classifier model judged to be valid to tumor marker data and outputting diagnosis result information accordingly.

To achieve the above and other related purposes, a third aspect of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the tumor marker detection method based on multiple neural network classifiers.

To achieve the above and other related purposes, a fourth aspect of the present application provides a detection terminal comprising a processor and a memory; the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so that the detection terminal performs the tumor marker detection method based on multiple neural network classifiers.

As described above, the tumor marker detection system, method, terminal, and medium based on multiple neural network classifiers of the present application have the following beneficial effects. The application provides a liver cancer tumor marker detection scheme based on multiple neural network classifiers. It preprocesses the data with principal component analysis and feature extraction, and trains a combined classifier model built from random forest, support vector machine, BP neural network, and extreme learning machine classifier models. The result is a liver cancer tumor marker classifier whose accuracy, specificity, and sensitivity meet the requirements of clinical diagnosis, which helps clinicians reduce the misdiagnosis rate in the initial diagnosis of liver cancer and improves the accuracy of liver cancer tumor marker detection.

Description of the drawings

FIG. 1 is a schematic diagram of a tumor marker detection system based on multiple neural network classifiers according to an embodiment of the present application.

FIG. 2 is a schematic diagram of the ROC curves drawn for six liver cancer tumor markers in an embodiment of the present application.

FIG. 3 is a schematic diagram of a tumor marker detection method based on multiple neural network classifiers according to an embodiment of the present application.

FIG. 4 is a schematic structural diagram of a detection terminal in an embodiment of the present application.

Detailed description

The embodiments of the present application are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the present application from the contents disclosed in this specification. The present application can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the present application. Note that, where no conflict arises, the following embodiments and the features in them may be combined with one another.

Note that the following description refers to the accompanying drawings, which illustrate several embodiments of the present application. It should be understood that other embodiments may be used, and that changes in mechanical composition, structure, electrical arrangement, and operation may be made without departing from the spirit and scope of the present application. The following detailed description should not be considered limiting, and the scope of the embodiments of the present application is limited only by the claims of the issued patent. The terminology used herein is for describing particular embodiments only and is not intended to limit the application. Spatially related terms, such as "upper", "lower", "left", "right", "below", "beneath", "lower part", "above", and "upper part", may be used in the text to describe the relationship between one element or feature shown in the figures and another element or feature.

In this application, unless otherwise expressly specified and limited, terms such as "mounted", "coupled", "connected", "fixed", and "held" should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediate medium, or an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in this application according to the specific situation.

Furthermore, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprising" and "including" indicate the presence of the stated features, operations, elements, components, items, kinds, and/or groups, but do not exclude the presence, occurrence, or addition of one or more other features, operations, elements, components, items, kinds, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C". An exception to this definition arises only when a combination of elements, functions, or operations is inherently mutually exclusive in some way.

In view of the technical problems above, the present application provides a tumor marker detection system, method, terminal, and medium based on multiple neural network classifiers to solve them effectively. The main idea of the application is a liver cancer tumor marker detection scheme based on multiple neural network classifiers. It preprocesses the data with principal component analysis and feature extraction, and trains a combined classifier model built from random forest, support vector machine, BP neural network, and extreme learning machine classifier models. The result is a liver cancer tumor marker classifier whose accuracy, specificity, and sensitivity meet the requirements of clinical diagnosis, which helps clinicians reduce the misdiagnosis rate in the initial diagnosis of liver cancer and improves the accuracy of liver cancer tumor marker detection.

FIG. 1 shows a schematic diagram of a tumor marker detection system based on multiple neural network classifiers in an embodiment of the present application. The system includes a sample collection module 11, a data preprocessing module 12, a sample statistical analysis module 13, a classifier model training module 14, a classifier model evaluation and testing module 15, and a classifier diagnosis method application module 16.

The sample collection module 11 is configured to collect sample data of tumor marker detection samples.

In one embodiment, to ensure that the system trains well, the liver cancer tumor marker detection samples preferably meet two requirements: the data source is real data, and the sample data is classified. Specifically, the samples come from real cases, which, compared with data augmented by machine learning techniques, helps ensure that the detection results of the system are authentic. In addition, the samples can be divided into two classes, a liver cancer group and a non-liver-cancer group, or into three classes: a liver cancer group, a liver disease group, and a healthy group.

Note that the number of samples in each group must not be too low, and the sample counts of the groups should be kept within the same order of magnitude as far as possible. The samples are also divided into a training set and a test set, preferably at a ratio of 7:3. Both the training set and the test set must contain samples of every class.
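The 7:3 split with every class present on both sides can be sketched as a per-class (stratified) split. The following is a minimal illustration in Python rather than the implementation used in the application; the function name `stratified_split`, the group labels, and the toy data are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, train_fraction=0.7, seed=0):
    """Split samples into train/test sets, keeping every class in both sets."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for label, group in by_class.items():
        if len(group) < 2:
            raise ValueError(f"class {label!r} needs at least 2 samples")
        rng.shuffle(group)
        # Keep at least one sample of each class on each side of the split.
        cut = min(max(1, round(len(group) * train_fraction)), len(group) - 1)
        train += [(s, label) for s in group[:cut]]
        test += [(s, label) for s in group[cut:]]
    return train, test

# Hypothetical data: 10 samples per group, identified by an integer ID.
samples = list(range(30))
labels = ["liver_cancer"] * 10 + ["liver_disease"] * 10 + ["healthy"] * 10
train_set, test_set = stratified_split(samples, labels)
```

Splitting each group separately keeps the class proportions of the two sets close to those of the full data set, which matters when one group is much smaller than the others.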

The data preprocessing module 12 is configured to screen out, from the sample data, a plurality of abnormal indicators that are highly correlated with liver cancer.

In one embodiment, before screening out the abnormal indicators, the data preprocessing module first removes attributes that are irrelevant to the subjects, which improves the accuracy of the screening results. In this embodiment, the data preprocessing module performs several rounds of screening and finally determines six abnormal indicators that are highly correlated with liver cancer.

Specifically, after the irrelevant attribute data is removed, more than 20 abnormal indicators are screened out first. Missing values in the data are filled by multiple imputation. An association rule mining algorithm then selects, from those 20 or more indicators, roughly 10 abnormal indicators with the strongest correlation with primary liver cancer. Finally, a principal component analysis algorithm performs feature selection to reduce the dimensionality of the data.

In this embodiment, the liver cancer tumor markers finally selected are alpha-fetoprotein (AFP), together with other auxiliary reference markers: abnormal prothrombin (PIVKA-II), carcinoembryonic antigen (CEA), carbohydrate antigen 199 (CA199), carbohydrate antigen 242 (CA242), carbohydrate antigen 211 (CA211), carbohydrate antigen 125, sialic acid, ferritin, and so on.

Note that the multiple imputation method in this embodiment replaces each missing value with a vector of several imputed values; that is, each missing value is replaced by a set of plausible values. The association rule mining algorithm in this embodiment is, for example, the Apriori algorithm, whose core idea is to mine frequent itemsets in two stages: candidate set generation and a downward-closure check.
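The two Apriori stages named above can be illustrated with a short sketch. This is a generic textbook version of frequent-itemset mining, not the code of the application; the item names (such as `AFP_high` and `HCC`) and the five toy records are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Stage 1: generate size-k candidates by joining frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Stage 2: downward closure, every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical records: abnormal indicators observed together with a diagnosis.
records = [
    {"AFP_high", "PIVKA_high", "HCC"},
    {"AFP_high", "HCC"},
    {"AFP_high", "PIVKA_high", "HCC"},
    {"CEA_high"},
    {"PIVKA_high", "HCC"},
]
frequent_itemsets = apriori(records, min_support=0.6)
```

Itemsets that pair an indicator with the diagnosis item and survive the support threshold point to indicators worth keeping for the next screening round.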

The sample statistical analysis module 13 is configured to evaluate and analyze, based on the sample data, the detection results of a single abnormal indicator and/or of multiple combined abnormal indicators.

In one embodiment, the statistical software SPSS 12.0 is used to perform a descriptive analysis of the sample data for each selected tumor marker. Descriptive analysis examines the collected data to derive quantitative characteristics that reflect objective phenomena; it includes analysis of the central tendency of the data, of its dispersion, of its frequency distribution, and so on.

Specifically, single-marker detection and combined detection are analyzed separately. A nonparametric test is used to judge whether the differences between groups are statistically significant, with α = 0.05 as the significance level. Diagnostic sensitivity, specificity, accuracy, and the ROC curve serve as the evaluation indicators. The calculation formulas are given in formulas 1) to 3) below:

Sensitivity (TPR) = TP/(TP+FN) × 100%; Formula 1)

Specificity (TNR) = TN/(TN+FP) × 100%; Formula 2)

Accuracy = (TP+TN)/(TP+TN+FP+FN) × 100%; Formula 3)

Here, sensitivity is the proportion of actually positive samples that are judged positive, specificity is the proportion of actually negative samples that are judged negative, and accuracy is the proportion of true positives and true negatives among all subjects tested. In the formulas, TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.
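Formulas 1) to 3) translate directly into code. The sketch below assumes binary labels with 1 for positive and 0 for negative; the function name `diagnostic_metrics` and the ten toy samples are illustrative only.

```python
def diagnostic_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy as percentages (1 = positive).

    Assumes at least one positive and one negative sample in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    return {
        "sensitivity": 100 * tp / (tp + fn),             # Formula 1)
        "specificity": 100 * tn / (tn + fp),             # Formula 2)
        "accuracy": 100 * (tp + tn) / (tp + tn + fp + fn),  # Formula 3)
    }

# Hypothetical example: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
```

In this toy case TP = 3, FN = 1, TN = 5, and FP = 1, so the sensitivity is 75%, the specificity is about 83.3%, and the accuracy is 80%.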

In one embodiment, the sample statistical analysis module selects, according to the ROC curves, some or all of the single abnormal indicators to serve as the combined abnormal indicators. The ROC curve is the receiver operating characteristic curve, a comprehensive index that reflects sensitivity and specificity for a continuous variable and reveals their relationship graphically. Several different cutoff values are set on the continuous variable, a sensitivity and a specificity are calculated for each, and a curve is drawn with sensitivity on the vertical axis and (1 - specificity) on the horizontal axis. The larger the area under the curve, the higher the diagnostic accuracy. On the ROC curve, the point closest to the upper left corner of the plot is the cutoff at which sensitivity and specificity are both relatively high.
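The construction of the ROC curve, the area under it, and the cutoff closest to the upper left corner can be sketched as follows. This is an illustrative stand-alone version; the helper names and the eight marker values are assumptions, and a higher marker value is assumed to indicate disease.

```python
import math

def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity, cutoff) triples, one per cutoff."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0, math.inf)]  # cutoff above every score: nothing positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
        points.append((fp / neg, tp / pos, cutoff))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1, _), (x2, y2, _) in zip(points, points[1:])
    )

def closest_to_top_left(points):
    """Pick the point with the smallest distance to (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[0], 1 - p[1]))

# Hypothetical marker values; label 1 = liver cancer, 0 = not liver cancer.
scores = [500, 420, 350, 90, 60, 30, 20, 10]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
best = closest_to_top_left(points)
```

For these toy values the area under the curve is 14/15 and the selected cutoff is 90, where the sensitivity is 100% and the specificity is 80%.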

Note that although each tumor marker has some reference value for diagnosis, each also has its own limitations, and at present no single tumor marker can confirm a diagnosis of liver cancer in clinical practice. The present application proposes the combined detection of multiple tumor markers such as AFP, abnormal prothrombin, carcinoembryonic antigen, carbohydrate antigen 199, carbohydrate antigen 242, and carbohydrate antigen 211, which improves the accuracy of liver cancer diagnosis. Computer-aided diagnosis can help clinicians reduce missed diagnoses and misdiagnoses, and has very high clinical value for the detection of early liver cancer.

The classifier model training module 14 is configured to train a plurality of classifier models on the training set data.

In one embodiment, classifier models such as the random forest model, the support vector machine model, the BP neural network model, and the extreme learning machine model are trained on the data preprocessed by the data preprocessing module. The random forest model resamples with the Bootstrap method. The support vector machine is implemented with the C-SVC model. The BP neural network is improved with a genetic algorithm; its input layer has 5-10 neurons, its hidden layer has 15-20 neurons, and its output layer has 1 neuron.
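As an illustration of the Bootstrap resampling mentioned for the random forest model, the sketch below draws one resample per tree by sampling row indices with replacement. It shows only the resampling step, under the assumption of 100 training rows and 5 trees; the function name `bootstrap_sample` is hypothetical, and the tree-fitting step is omitted.

```python
import random

def bootstrap_sample(n_samples, rng):
    """Draw n_samples row indices with replacement; also return the unused rows."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n_rows, n_trees = 100, 5  # assumed sizes, for illustration only
bags = [bootstrap_sample(n_rows, rng) for _ in range(n_trees)]

for tree_id, (in_bag, out_of_bag) in enumerate(bags):
    # Each tree would be fitted on its in-bag rows; the out-of-bag rows
    # can serve as a built-in validation set for that tree.
    print(tree_id, len(set(in_bag)), len(out_of_bag))
```

Because every tree sees a different resample, the trees differ from one another, and averaging their votes reduces the variance of the combined prediction.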

In one embodiment, a different expected output value is set for each class of sample data. Taking the liver cancer group, liver disease group, and healthy group sample data as an example, the expected output value is 0.1 for the healthy group, 0.5 for the liver disease group, and 0.9 for the liver cancer group. The trained model is obtained after multiple iterations.
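With a single output neuron, the three expected values act as regression targets. The sketch below shows the target encoding and one plausible way to map a network output back to a group, namely picking the nearest target. The nearest-target rule and all names are assumptions, since the text does not state how outputs are decoded.

```python
# Expected output values per group, as given in the embodiment.
TARGETS = {"healthy": 0.1, "liver_disease": 0.5, "liver_cancer": 0.9}

def encode_label(group):
    """Training target for a sample of the given group."""
    return TARGETS[group]

def decode_output(value):
    """Assumed decoding rule: choose the group whose target is nearest."""
    return min(TARGETS, key=lambda group: abs(TARGETS[group] - value))

# Hypothetical network outputs for three test samples.
outputs = [0.82, 0.33, 0.05]
predicted_groups = [decode_output(v) for v in outputs]
```

Under this rule the decision boundaries fall at 0.3 and 0.7, midway between adjacent targets.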

Note that different classifiers perform differently in different classification scenarios. For example, random forest (RF) is the strongest on average, yet it ranks first on only 9.9% of the datasets; the support vector machine (SVM) follows closely on average and ranks first on 10.7% of the datasets. For this reason, the assisted diagnosis method for liver cancer tumor markers based on multiple neural network classifiers provided by the present application is more accurate than assisted diagnosis based on a single neural network model.

The classifier model evaluation and testing module 15 is configured to test the trained classifier models on the test set data and to compare the test results with the evaluation and analysis results of the sample statistical analysis module, so as to judge the validity of the tested classifier model.

Specifically, each trained classifier model must be verified on the test set data. If its accuracy exceeds the evaluation and analysis result output by the sample statistical analysis module, the classifier model is considered valid; otherwise the parameters are adjusted and a new classifier model is trained again through the classifier model evaluation and testing module 15.
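The accept-or-retune rule can be sketched as a loop over candidate parameter settings. The sketch assumes that the baseline is the accuracy reported by the statistical analysis; the function name, the hidden-layer sizes used as parameters, and the pretend accuracies are all hypothetical placeholders for real training runs.

```python
def first_valid_model(param_grid, train_and_score, baseline_accuracy):
    """Return the first (params, accuracy) whose test accuracy beats the baseline."""
    for params in param_grid:
        # train_and_score trains on the training set and scores on the test set.
        accuracy = train_and_score(params)
        if accuracy > baseline_accuracy:
            return params, accuracy  # model judged valid
    return None  # no candidate was judged valid; widen the search

# Hypothetical stand-in for real training: a lookup of pretend test accuracies
# keyed by the number of hidden neurons.
pretend_scores = {15: 0.81, 18: 0.86, 20: 0.90}
result = first_valid_model(
    param_grid=[15, 18, 20],
    train_and_score=lambda hidden_units: pretend_scores[hidden_units],
    baseline_accuracy=0.85,
)
```

Here the first setting falls below the assumed baseline of 0.85 and is rejected, and the second setting is accepted, so the loop stops without trying the third.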

The classifier diagnosis method application module 16 is configured to apply a classifier model judged to be valid to tumor marker data and to output diagnosis result information accordingly. For example, a patient or a doctor can feed clinically measured liver cancer tumor marker data into the classifier model provided by the present application, and the classifier model automatically produces diagnosis result information for reference.

It should be understood that the division of the above system into modules is merely a division of logical functions; in an actual implementation the modules may be wholly or partly integrated into one physical entity, or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all as hardware; alternatively, some modules may be implemented as software invoked by a processing element and others as hardware. For example, the classifier model training module may be a separately provided processing element, or may be integrated into a chip of the above system; it may also be stored in the memory of the above system as program code, which a processing element of the system calls to execute the functions of the classifier model training module. The other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processor element or by instructions in software form.

For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), one or more digital signal processors (Digital Signal Processor, DSP for short), or one or more field-programmable gate arrays (Field Programmable Gate Array, FPGA for short). As another example, when one of the above modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU for short) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented as a system-on-a-chip (SoC).

To facilitate understanding by those skilled in the art, the following example takes multiple cases from a hepatobiliary surgery hospital, recorded between March 2017 and October 2018, as the research objects.

The sample collection module collects 139 primary liver cancer cases and 100 non-primary liver cancer cases (the latter including 20 benign tumor cases and 80 cases of other liver diseases). Of these, 120 cases are selected as the training set (70 primary liver cancer cases and 50 non-primary liver cancer cases), and the remaining 119 cases form the test set (69 primary liver cancer cases and 50 non-primary liver cancer cases).
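The split described above can be reproduced with a per-class partition. The following is a sketch under stated assumptions: the source gives only the resulting counts, so the random shuffling, the `seed` parameter, and the placeholder sample records are illustrative choices, not the method used in the embodiment.

```python
import random


def split_by_class(samples, train_counts, seed=0):
    """Split (features, label) pairs into train/test with fixed per-class train sizes."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append((features, label))

    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)              # assumption: cases are assigned at random
        k = train_counts[label]
        train.extend(group[:k])         # first k cases of this class go to training
        test.extend(group[k:])          # the remainder goes to testing
    return train, test


# Placeholder records standing in for the 139 + 100 collected cases.
samples = [((i,), "HCC") for i in range(139)] + [((i,), "non-HCC") for i in range(100)]
train, test = split_by_class(samples, {"HCC": 70, "non-HCC": 50})
```

Fixing the per-class training counts, rather than a single overall ratio, is what yields the 70/50 and 69/50 compositions reported in the text.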

The data preprocessing module first removes attributes unrelated to the patient and screens out more than 20 abnormal indicators. Multiple imputation is used to fill in missing values for a small portion of the data, and the Apriori association algorithm is used to obtain approximately 10 abnormal indicators most strongly associated with primary liver cancer. Finally, feature selection and principal component analysis are applied to reduce the dimensionality of the data. The liver cancer tumor markers finally selected are alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), abnormal prothrombin (PIVKA-II), carbohydrate antigen 199 (CA199), carbohydrate antigen 242 (CA242), and carbohydrate antigen 211 (CA211).

The sample statistical analysis module uses the statistical software SPSS 12.0 to perform descriptive analysis of the selected tumor marker sample data. Single-marker detection and combined detection are analyzed separately. A nonparametric test is used to judge whether differences between groups are statistically significant, with α = 0.05 as the test level. Diagnostic sensitivity, specificity, accuracy, and the ROC curve are used as evaluation indicators.

Taking the BP neural network as an example, FIG. 2 shows the ROC curves drawn in this embodiment for the six liver cancer tumor markers under study. The diagonal line 21 is the reference line; curve 22 represents carbohydrate antigen 211, curve 23 carcinoembryonic antigen, curve 24 carbohydrate antigen 242, curve 25 carbohydrate antigen 199, curve 26 alpha-fetoprotein, and curve 27 abnormal prothrombin.

Comparing the areas under the ROC curves shows that the curves for abnormal prothrombin and alpha-fetoprotein (curves 27 and 26) enclose the largest areas, so these two markers have the greatest diagnostic value for liver cancer. The curves for carbohydrate antigen 242 and carbohydrate antigen 211 (curves 24 and 22) lie close to the diagonal line 21 and contribute little diagnostic value to combined detection of liver cancer; combined detection can therefore omit these two indicators from the training input. The differences between the combined detection group and each single-indicator detection result are all statistically significant (P < 0.05). Table 1 below is the evaluation index table of the samples.
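The area comparison above can be illustrated with a small area-under-curve calculation. This sketch uses the rank-based (Mann-Whitney) formulation of AUC and assumes that higher marker values indicate disease; the marker names and values below are made-up toy data, not measurements from the embodiment.

```python
def roc_auc(positive_values, negative_values):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case has a
    higher marker value than a randomly chosen negative case (ties count 0.5).
    """
    wins = 0.0
    for p in positive_values:
        for n in negative_values:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_values) * len(negative_values))


def rank_markers(marker_values):
    """Sort markers by AUC, most informative first."""
    aucs = {name: roc_auc(pos, neg) for name, (pos, neg) in marker_values.items()}
    return sorted(aucs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: (values in cancer cases, values in non-cancer cases).
toy = {
    "marker_A": ([8.0, 9.5, 7.2], [1.1, 2.0, 7.5]),
    "marker_B": ([3.0, 1.0, 2.0], [2.5, 1.5, 2.0]),
}
ranking = rank_markers(toy)
```

A marker whose AUC is near 0.5 sits on the diagonal reference line, which is the situation the text describes for the two discarded indicators.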

Table 1. Sample statistical analysis table

Based on the data preprocessed by the data preprocessing module, the classifier model training module trains classifier models including a random forest model, a support vector machine model, a BP neural network model, and an extreme learning machine model. The random forest model performs resampling based on the Bootstrap method; the support vector machine is implemented with the C-SVC formulation; the BP neural network is improved using a genetic algorithm, with an input layer of 5-10 neurons, a hidden layer of 15-20 neurons, and an output layer of 1 neuron.
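As an illustration of the Bootstrap resampling mentioned for the random forest model, the sketch below draws one bootstrap sample and records the out-of-bag rows. It is a generic example of the technique, not the embodiment's code; the function name and the integer placeholders for training rows are assumptions.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; also return the rows never drawn."""
    n = len(rows)
    picked = [rng.randrange(n) for _ in range(n)]   # indices sampled with replacement
    in_bag = [rows[i] for i in picked]
    chosen = set(picked)
    out_of_bag = [rows[i] for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


rng = random.Random(42)
training_rows = list(range(120))    # placeholders for the 120 training cases
in_bag, out_of_bag = bootstrap_sample(training_rows, rng)
```

In a random forest, each tree would be fitted on its own `in_bag` sample, and the `out_of_bag` rows can serve as an internal check for that tree.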

Different expected output values are set for different categories of sample data. Taking liver cancer group, liver disease group, and healthy group sample data as an example, the expected output value is 0.1 for the healthy group, 0.5 for the liver disease group, and 0.9 for the liver cancer group; the trained model is obtained after multiple iterations. For the trained model, the tumor markers with the highest diagnostic value are selected according to the ROC curves as the combined detection indicators. In this embodiment, alpha-fetoprotein and abnormal prothrombin are selected as the input indicators for combined detection.
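The target encoding above can be written down directly. The decoding rule in this sketch (assign the group whose expected value is nearest to the network output) is an assumption added for illustration; the source states only the three expected output values.

```python
# Expected output values per group, as given in the embodiment.
TARGETS = {"healthy": 0.1, "liver disease": 0.5, "liver cancer": 0.9}


def encode(group):
    """Return the expected network output for a sample's group."""
    return TARGETS[group]


def decode(output):
    """Map a network output to the group with the nearest expected value.

    Nearest-target decoding is an illustrative assumption, not a rule
    stated in the source.
    """
    return min(TARGETS, key=lambda group: abs(TARGETS[group] - output))
```

For instance, `decode(0.82)` yields `"liver cancer"` because 0.82 is closer to 0.9 than to 0.5 or 0.1.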

The classifier model evaluation and testing module verifies each trained classifier model on the test set data. When the accuracy exceeds the evaluation and analysis result output by the sample statistical analysis module, the classifier model is considered valid; otherwise, the parameters are adjusted and a new classifier model is trained again through the classifier model evaluation and testing module. The final classifier model is built from the four classifiers output by the classifier model training module, following the idea of a decision tree.

In this embodiment, the performance of the four classifiers on the test set data is shown in Table 2. Comparing Table 2 with Table 1 above, the accuracy in Table 2 is higher than that in Table 1, so the classifier model is considered valid.

Table 2. Performance of the four classifiers on the test set

Classifier model | Specificity | Sensitivity | Accuracy
BP neural network | 98.60% | 76.314% | 89.92%
Support vector machine (SVM) | 98% | 95.65% | 96.64%
Random forest (RF) | 100% | 97.10% | 98.32%
Extreme learning machine (ELM) | 90% | 91.30% | 90.76%
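The three evaluation indicators in the table follow from confusion-matrix counts. As a worked check, the sketch below uses counts that are inferred here, not stated in the source: they are the integer counts on a test set of 69 liver cancer and 50 non-liver cancer cases that reproduce the reported SVM, RF, and ELM percentages. The BP neural network row is left out because its reported percentages do not correspond to integer counts on that test set.

```python
def evaluate(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as percentages rounded to 2 places."""
    sensitivity = tp / (tp + fn)                  # positives correctly detected
    specificity = tn / (tn + fp)                  # negatives correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)    # all correct decisions
    return tuple(round(100 * value, 2) for value in (sensitivity, specificity, accuracy))


# Inferred counts (assumption): tp + fn = 69 cancer cases, tn + fp = 50 non-cancer cases.
inferred_counts = {
    "SVM": dict(tp=66, fn=3, tn=49, fp=1),
    "RF": dict(tp=67, fn=2, tn=50, fp=0),
    "ELM": dict(tp=63, fn=6, tn=45, fp=5),
}
results = {name: evaluate(**counts) for name, counts in inferred_counts.items()}
```

With these counts, `results["RF"]` evaluates to `(97.1, 100.0, 98.32)`, matching the random forest row.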

The classifier diagnosis method application module detects tumor marker data using a classifier model that has been judged valid, and outputs diagnosis result information accordingly. For example, a patient or doctor can supply clinically measured liver cancer tumor marker data as input to the classifier model provided by the present application, and the classifier model automatically returns diagnosis result information for reference.

FIG. 3 is a schematic flowchart of a tumor marker detection method based on a multi-neural network classifier according to an embodiment of the present application.

In some embodiments, the method may be applied to a controller, such as an ARM controller, an FPGA controller, an SoC controller, a DSP controller, or an MCU controller. In some embodiments, the method may also be applied to a computer that includes components such as a memory, a memory controller, one or more processing units (CPUs), a peripheral interface, an RF circuit, an audio circuit, a speaker, a microphone, an input/output (I/O) subsystem, a display screen, other output or control devices, and external ports; such computers include, but are not limited to, personal computers such as desktop computers, notebook computers, tablet computers, smartphones, smart TVs, and personal digital assistants (Personal Digital Assistant, PDA for short). In other embodiments, the method may also be applied to a server, which may be deployed on one or more physical servers according to factors such as function and load, or may consist of a distributed or centralized server cluster.

In this embodiment, the method includes steps S31, S32, S33, S34, S35, and S36.

In step S31, sample data of tumor marker detection samples is collected, the sample data including test set data and training set data.

In step S32, a plurality of abnormal indicators highly correlated with the tumor are screened out according to the sample data.

In step S33, based on the sample data, detection result evaluation and analysis is performed on single abnormal indicators and/or on a plurality of combined abnormal indicators, to generate corresponding evaluation and analysis results.

In step S34, model training is performed on a plurality of classifier models based on the training set data.

In step S35, the trained classifier models are tested on the test set data, and the test results are compared against the evaluation and analysis results, thereby judging the validity of each tested classifier model.

In step S36, tumor marker data is detected using a classifier model that has been judged valid, and diagnosis result information is output accordingly.

It should be noted that the implementation of the tumor marker detection method based on a multi-neural network classifier in this embodiment is similar to that of the tumor marker detection system based on a multi-neural network classifier described above, and is therefore not repeated here.

In addition, those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, and optical discs.

FIG. 4 is a schematic structural diagram of a detection terminal according to another embodiment of the present application. The detection terminal provided in this example includes a processor 41, a memory 42, a transceiver 43, a communication interface 44, and a system bus 45. The memory 42 and the communication interface 44 are connected to the processor 41 and the transceiver 43 through the system bus 45 and communicate with one another over it. The memory 42 is used to store a computer program, the communication interface 44 and the transceiver 43 are used to communicate with other devices, and the processor 41 is used to run the computer program so that the detection terminal executes each step of the above detection method.

The system bus mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used to implement communication between the database access apparatus and other devices (for example, clients, read-write databases, and read-only databases). The memory may include random access memory (Random Access Memory, RAM for short), and may also include non-volatile memory, such as at least one disk memory.

The above processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In summary, the present application provides a tumor marker detection system, method, terminal, and medium based on a multi-neural network classifier. The liver cancer tumor marker detection scheme provided by the present application uses principal component analysis and feature extraction for data preprocessing, and trains on the data with a combined classifier model built from random forest, support vector machine, BP neural network, and extreme learning machine classifier models. The result is a liver cancer tumor marker classifier whose accuracy, specificity, and sensitivity meet the needs of clinical diagnosis, helping clinicians reduce the misdiagnosis rate in the initial diagnosis of liver cancer and improving the accuracy of liver cancer tumor marker detection. The present application therefore effectively overcomes various shortcomings of the prior art and has high industrial utilization value.

The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present application. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present application shall still be covered by the claims of the present application.

Claims

1. A tumor marker detection system based on a multi-neural network classifier, characterized by comprising:

a sample collection module, configured to collect sample data of tumor marker detection samples, wherein the sample data includes test set data and training set data;

a data preprocessing module, configured to screen out a plurality of abnormal indicators that are highly correlated with the tumor according to the sample data;

a sample statistical analysis module, configured to perform detection result evaluation and analysis on single abnormal indicators based on the sample data, and/or perform detection result evaluation and analysis on a plurality of combined abnormal indicators;

a classifier model training module, configured to perform model training on a plurality of classifier models based on the training set data;

a classifier model evaluation and testing module, configured to test the trained classifier models based on the test set data, and to compare the test results with the evaluation and analysis results of the sample statistical analysis module, so as to judge the validity of each tested classifier model; and

a classifier diagnosis method application module, configured to detect tumor marker data using a classifier model judged to be valid, so as to output diagnosis result information.

2. The system according to claim 1, wherein the sample collection module collects real clinical sample data and classifies the collected sample data.

3. The system according to claim 2, wherein the tumor markers comprise liver cancer tumor markers; and wherein the classification of the sample data comprises: dividing the sample data into liver cancer group sample data and non-liver cancer group sample data, and/or dividing the sample data into liver cancer group sample data, liver disease group sample data, and healthy group sample data.

4. The system according to claim 1, wherein the data preprocessing module is configured to perform feature selection based on a principal component analysis algorithm, so as to perform dimensionality reduction on a plurality of abnormal indicators initially screened out according to the sample data, and to obtain the abnormal indicators highly correlated with the tumor.

5. The system according to claim 4, wherein the abnormal indicators highly correlated with the tumor comprise abnormal indicators highly correlated with liver cancer tumors, which include alpha-fetoprotein and further include: abnormal prothrombin, carcinoembryonic antigen, carbohydrate antigen 199, carbohydrate antigen 242, carbohydrate antigen 211, carbohydrate antigen 125, sialic acid, or ferritin.

6. The system according to claim 1, wherein the evaluation indicators of the detection result evaluation and analysis comprise sensitivity, specificity, and accuracy; wherein sensitivity represents the proportion judged positive among samples that are actually positive, specificity represents the proportion judged negative among samples that are actually negative, and accuracy represents the ratio of true positives and true negatives to the total number of subjects tested.

7. The system according to claim 1, wherein the classifier model comprises: a random forest model, a support vector machine model, a BP neural network model, and an extreme learning machine model.

8. The system according to claim 1, wherein the sample statistical analysis module selects some or all of the single abnormal indicators as the combined abnormal indicators according to the ROC curve.

9. A method for detecting tumor markers based on a multi-neural network classifier, comprising:

collecting sample data of tumor marker detection samples, wherein the sample data includes test set data and training set data;

screening out a plurality of abnormal indicators that are highly correlated with the tumor according to the sample data;

performing, based on the sample data, detection result evaluation and analysis on single abnormal indicators, and/or detection result evaluation and analysis on a plurality of combined abnormal indicators, so as to generate corresponding evaluation and analysis results;

performing model training on a plurality of classifier models based on the training set data;

testing the trained classifier models based on the test set data, and comparing the test results with the evaluation and analysis results, so as to judge the validity of each tested classifier model; and

detecting tumor marker data using a classifier model judged to be valid, so as to output diagnosis result information.

10. The method according to claim 9, wherein the method comprises: collecting real clinical sample data and classifying the collected sample data.

11. The method according to claim 10, wherein the tumor markers comprise liver cancer tumor markers; and wherein the classification of the sample data comprises: dividing the sample data into liver cancer group sample data and non-liver cancer group sample data, and/or dividing the sample data into liver cancer group sample data, liver disease group sample data, and healthy group sample data.

12. The method according to claim 9, wherein the method comprises: performing feature selection based on a principal component analysis algorithm, so as to perform dimensionality reduction on a plurality of abnormal indicators initially screened out according to the sample data, and obtaining the abnormal indicators highly correlated with the tumor.

13. The method according to claim 12, wherein the abnormal indicators highly correlated with the tumor comprise abnormal indicators highly correlated with liver cancer tumors, which include alpha-fetoprotein and further include: abnormal prothrombin, carcinoembryonic antigen, carbohydrate antigen 199, carbohydrate antigen 242, carbohydrate antigen 211, carbohydrate antigen 125, sialic acid, or ferritin.

14. The method according to claim 9, wherein the evaluation indicators of the detection result evaluation and analysis comprise sensitivity, specificity, and accuracy; wherein sensitivity represents the proportion judged positive among samples that are actually positive, specificity represents the proportion judged negative among samples that are actually negative, and accuracy represents the ratio of true positives and true negatives to the total number of subjects tested.

15. The method according to claim 9, wherein the classifier model comprises: a random forest model, a support vector machine model, a BP neural network model, and an extreme learning machine model.

16. The method according to claim 9, wherein the method comprises: selecting some or all of the single abnormal indicators as the combined abnormal indicators according to the ROC curve.

17. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the method for detecting tumor markers based on a multi-neural network classifier according to any one of claims 9 to 16 is implemented.

18. A detection terminal, comprising: a processor and a memory;

the memory is used to store computer programs;

the processor is configured to execute the computer program stored in the memory, so that the terminal executes the method for detecting tumor markers based on a multi-neural network classifier according to any one of claims 9 to 16.