TWI630501B - Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set - Google Patents

Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set

Info

Publication number
TWI630501B
Authority
TW
Taiwan
Prior art keywords
cancer
machine learning
tumor marker
learning model
antigen
Prior art date
Application number
TW105124089A
Other languages
Chinese (zh)
Other versions
TW201804348A (en
Inventor
盧章智
陳春賢
王信堯
溫瀅皓
Original Assignee
長庚醫療財團法人林口長庚紀念醫院
長庚大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 長庚醫療財團法人林口長庚紀念醫院, 長庚大學
Priority to TW105124089A
Publication of TW201804348A
Application granted
Publication of TWI630501B

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

A method for establishing a cancer prediction model and for analyzing cancer detection results by combining that model with a tumor marker set: (A) inputting the tumor marker set test results of multiple subjects, where the set measures proteins of each subject, together with the corresponding cancer disease states, into a machine learning module; (B) performing variable selection in the module and selecting the several variables with the best classification performance; (C) establishing a cancer prediction model by supervised machine learning from the selected variables, their values, and the cancer disease states. The established cancer prediction model can then be combined with tumor marker set test results to analyze cancer detection results.

Description

Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set

A method for predicting cancer, in particular a method for establishing a cancer prediction machine learning model and for analyzing cancer detection results by combining that model with a tumor marker set. Once the model has been established, predictions are made from the test results of the multiple tumor markers in the set.

In the prior art, cancer screening for the general healthy population relies mainly on the four cancer screenings currently promoted in Taiwan: oral mucosa examination, mammography, fecal occult blood testing, and the Pap smear. These screen for oral cancer, breast cancer, colorectal cancer, and cervical cancer respectively, and can detect potential cancers in patients without obvious symptoms. The four screenings are effective, but each targets only one of four cancers common in the local population, and none of them can detect any other type of cancer.

Besides the four nationally promoted screenings, many other methods exist for screening potential cancers, including low-dose chest computed tomography for lung cancer, colonoscopy for colorectal cancer, and abdominal ultrasound for liver cancer. Each of these, however, also screens for only a single specific cancer. To be screened for several types of cancer, a subject must undergo multiple different tests or examinations, which is inconvenient and costly and may expose the subject to iatrogenic injury and radiation.

Given these shortcomings, the prior art leaves room for improvement.

The present invention provides a method for establishing a cancer prediction model, comprising at least the following steps: (A) inputting the tumor marker set test results of multiple subjects, where the set measures proteins of each subject, together with the corresponding cancer disease states, into a machine learning module; (B) performing variable selection in the module and selecting the several variables with the best classification performance; and (C) establishing a cancer prediction model by supervised machine learning from the selected variables, their values, and the cancer disease states.

The supervised machine learning method is logistic regression, k-nearest neighbors, a support vector machine, artificial neural network learning, a decision tree, a Bayesian decision method, or any combination thereof.

The cancer disease state is a classification of cancer versus no cancer, a classification of early-stage versus late-stage cancer (for example, tumor status under the TNM staging system), or a classification by cancer type (for example, liver cancer, lung cancer, colorectal cancer, and so on).

The date on which the cancer disease state is determined and the date of the tumor marker set test are separated by 1 day to 3 years.
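
The date rule above can be sketched as a simple filter applied when pairing a test record with a disease-state label. This is only an illustration: the function name `within_window` is hypothetical, three years is approximated as 3 × 365 days, and because the text does not say which date comes first, the sketch treats the gap symmetrically.

```python
from datetime import date, timedelta

# Window bounds from the text: 1 day to 3 years.
# Approximating 3 years as 3 * 365 days is an assumption of this sketch.
MIN_GAP = timedelta(days=1)
MAX_GAP = timedelta(days=3 * 365)


def within_window(test_date: date, determination_date: date) -> bool:
    """Return True if the two dates are 1 day to 3 years apart (either order)."""
    gap = abs(determination_date - test_date)
    return MIN_GAP <= gap <= MAX_GAP
```

A record whose two dates fall outside this window would simply be left out of the training data in such a sketch.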

The established machine learning model can compute, for the cancer markers of different modules, the following statistical indicators: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, area under the receiver operating characteristic curve (AUC), and the Youden index.
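
All of these indicators except AUC follow directly from the four cells of a confusion matrix. The sketch below uses the standard textbook definitions; the function name `screening_metrics` is hypothetical, and it assumes every denominator is non-zero (at least one case in each row and column).

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard screening indicators from confusion-matrix counts.

    tp/fn: cancer cases predicted positive/negative.
    tn/fp: non-cancer cases predicted negative/positive.
    Assumes no denominator is zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Youden index J = sensitivity + specificity - 1
        "youden": sensitivity + specificity - 1,
    }
```

Because PPV and NPV depend on how common cancer is in the tested group, values computed on one cohort do not transfer directly to a population with a different prevalence.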

The present invention further provides a method for analyzing cancer detection results using the cancer prediction model in combination with a tumor marker set, comprising at least the following steps: (A) providing a specimen from a new subject; (B) testing the specimen for multiple tumor markers at once, using the plurality of protein-detecting tumor markers contained in the tumor marker set; (C) inputting the tumor marker set test results into the cancer prediction model described above, which then performs the comparison, computation, and analysis of the cancer detection results; (D) producing a prediction of cancer risk, which can prompt the subject or the medical staff to take follow-up action.

The specimen may be human blood, urine, saliva, sweat, feces, pleural effusion, ascites, or cerebrospinal fluid.

The plurality of tumor markers are alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), cytokeratin fragment 21-1 (CYFRA21-1), squamous cell carcinoma antigen (SCC), prostate-specific antigen (PSA), carbohydrate antigen 15-3 (CA15-3), carbohydrate antigen 125 (CA125), Epstein-Barr virus IgA antibody (EBV IgA), carbohydrate antigen 27-29 (CA27-29), beta-2-microglobulin, beta-human chorionic gonadotropin (Beta-hCG), cluster of differentiation 177 (CD 177), cluster of differentiation 20 (CD 20), chromogranin A (CgA), human epididymis secretory protein 4 (HE 4), lactate dehydrogenase (LDH), thyroglobulin, neuron-specific enolase (NSE), nuclear matrix protein 22, and programmed death-ligand 1 (PD-L1).

The advantage of the machine learning model is as follows. The invention uses supervised machine learning, which builds a cancer prediction model from large volumes of past clinical data and supports the interpretation and analysis of data from multiple tumor markers at once. This makes the fullest use of existing data to analyze how tumor marker distributions differ between cases in different disease states, and derives the basis for classifying disease states from the overall data distribution, improving the accuracy, timeliness, cost-effectiveness, and reproducibility of predictions.

A tumor marker set that uses multiple tumor markers has the following advantages. First, specimens are easy to collect, with no radiation exposure, little discomfort, and no anesthesia risk. Specimens can also be sampled during the minimally invasive procedures now being promoted. This increases willingness to be screened, makes large-scale screening feasible, and makes it convenient to screen an asymptomatic general population for all types of cancer.

Second, tumor markers can be measured on automated systems. Automation and strict quality control keep the accuracy and precision of tumor marker testing at a consistent level.

Third, the interpretation of tumor marker test results is objective, which prevents inconsistencies between readers.

Finally, multiple tumor markers can be obtained and quantified from a single specimen collection, and different combinations of markers can be used to screen for different categories of cancer. In other words, screening for cancer with tumor markers is safe, objective, and cost-effective, and it can cover many categories of cancer.

Figure 1 is a flow chart of the steps of the method of the present invention for establishing a cancer prediction machine learning model.

Figure 2 is a schematic flow chart of the method of the present invention for analyzing cancer detection results by combining the cancer prediction machine learning model with a tumor marker set.

Figure 3A shows the receiver operating characteristic (ROC) curve of each single tumor marker (male).

Figure 3B shows the ROC curves of the different supervised machine learning methods of the present invention combined with multiple tumor markers (male).

Figure 3C shows the ROC curve of each single tumor marker (female).

Figure 3D shows the ROC curves of the different supervised machine learning methods of the present invention combined with multiple tumor markers (female).

Referring to Figure 1, the present invention provides a method for establishing a cancer prediction machine learning model, comprising at least the following steps: (A) inputting the tumor marker set test results of multiple subjects, where the set measures proteins of each subject, together with the corresponding cancer disease states, into a machine learning module; (B) performing variable selection in the module and selecting the several variables with the best classification performance; and (C) establishing a cancer prediction model by supervised machine learning from the selected variables, their values, and the cancer disease states.

The supervised machine learning method is logistic regression, k-nearest neighbors, a support vector machine, artificial neural network learning, a decision tree, a Bayesian decision method, or any combination thereof.
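
To make one of the listed methods concrete, the following is a minimal k-nearest-neighbors classifier in plain Python. It is a generic textbook sketch, not the patent's implementation: the function name `knn_predict`, the Euclidean distance, and the default `k=3` are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train_x: list of equal-length numeric tuples (marker values per subject).
    train_y: list of labels (for example 1 = cancer, 0 = no cancer).
    """
    # Pair each training point's Euclidean distance to the query with its label.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

In practice tumor markers are measured on very different scales, so a distance-based method like this would normally be applied after standardizing each marker; that step is omitted here for brevity.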

The cancer disease state is a classification of cancer versus no cancer, a classification of early-stage versus late-stage cancer (for example, tumor status under the TNM staging system), or a classification by cancer type (for example, liver cancer, lung cancer, colorectal cancer, and so on).

The date on which the cancer disease state is determined and the date of the tumor marker set test are separated by 1 day to 3 years.

The established machine learning model can compute, for the cancer markers of different modules, the following statistical indicators: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, area under the receiver operating characteristic curve (AUC), and the Youden index.

Referring to Figure 2, the present invention further provides a method for analyzing cancer detection results using the cancer prediction machine learning model in combination with a tumor marker set, comprising at least the following steps: (A) providing a specimen from a new subject; (B) testing the specimen for multiple tumor markers at once, using the plurality of protein-detecting tumor markers contained in the tumor marker set; (C) inputting the tumor marker set test results into the cancer prediction machine learning model described above, which then performs the comparison, computation, and analysis of the cancer detection results; (D) producing a prediction of cancer risk, which can prompt the subject or the medical staff to take follow-up action.
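
Steps (C) and (D) can be pictured as feeding a new subject's marker values into an already trained model and turning the output into a follow-up flag. The sketch below assumes a logistic regression model. The coefficients, the three-marker subset, and the 0.5 threshold are hypothetical placeholders chosen for illustration; none of them comes from the patent.

```python
import math

# Hypothetical model parameters, for illustration only (not from the patent).
INTERCEPT = -4.0
WEIGHTS = {"AFP": 0.02, "CEA": 0.30, "CA19-9": 0.01}
RISK_THRESHOLD = 0.5  # assumed cut-off for recommending follow-up


def cancer_risk(panel: dict) -> float:
    """Logistic model: map marker values to a risk score between 0 and 1."""
    z = INTERCEPT + sum(w * panel[marker] for marker, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def flag_for_follow_up(panel: dict) -> bool:
    """Step (D): flag the subject when the predicted risk reaches the threshold."""
    return cancer_risk(panel) >= RISK_THRESHOLD
```

For example, `flag_for_follow_up({"AFP": 100, "CEA": 10, "CA19-9": 100})` evaluates the linear score -4 + 2 + 3 + 1 = 2 and flags the subject, while a panel of all zeros does not.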

The specimen may be human blood, urine, saliva, sweat, feces, pleural effusion, ascites, or cerebrospinal fluid.

The plurality of tumor markers are alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), cytokeratin fragment 21-1 (CYFRA21-1), squamous cell carcinoma antigen (SCC), prostate-specific antigen (PSA), carbohydrate antigen 15-3 (CA15-3), carbohydrate antigen 125 (CA125), Epstein-Barr virus IgA antibody (EBV IgA), carbohydrate antigen 27-29 (CA27-29), beta-2-microglobulin, beta-human chorionic gonadotropin (Beta-hCG), cluster of differentiation 177 (CD 177), cluster of differentiation 20 (CD 20), chromogranin A (CgA), human epididymis secretory protein 4 (HE 4), lactate dehydrogenase (LDH), thyroglobulin, neuron-specific enolase (NSE), nuclear matrix protein 22, and programmed death-ligand 1 (PD-L1).

Referring to Figures 3A to 3D, in this embodiment the specimens are blood samples, and eight tumor markers are analyzed by data mining methods: AFP, CEA, CA19-9, CYFRA21-1, SCC, PSA, CA15-3, and CA125. The procedure is as follows:

1. Subject criteria, including inclusion and exclusion conditions and numbers: the subjects in this embodiment are adults over 20 years of age who chose the cancer marker screening set as part of a self-paid health examination.

2. Design and method: the main measurements are the test values of the eight cancer markers. Subjects are followed up to determine whether a new cancer arose within one year after sampling and, if so, of what type. After data preparation, this embodiment builds several supervised learning models on that basis, including logistic regression, k-nearest neighbors, and a support vector machine.

3. Retrospective data period of this embodiment: January 1, 1999 to December 31, 2013.

4. Evaluation of results and statistical methods: this embodiment computes the distribution of the data for each tumor marker and performs variable selection before model building to select the several variables with the best classification performance. In this embodiment, the performance of a variable is assessed by the area under its ROC curve (AUC), and the combination of variables with the best classification ability is selected. In addition, an internal validation group is used to verify the predictive ability of each model, from which the sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and other statistical indicators of the classifier model are calculated.
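
The AUC-based variable screening in step 4 can be sketched as follows. The sketch computes AUC as the probability that a randomly chosen cancer case has a higher marker value than a randomly chosen non-cancer case (ties counted as one half), then ranks markers by that value. The function names are hypothetical, and the sketch assumes that higher marker values indicate higher suspicion; how the embodiment actually combines variables is not specified in the text.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positive (1) and negative (0) cases."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A positive scoring above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_markers(data, labels):
    """Return (marker, AUC) pairs sorted from best to worst single-marker AUC.

    data: dict mapping marker name -> list of values, aligned with `labels`.
    """
    ranked = [(marker, auc(values, labels)) for marker, values in data.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

The pairwise form is quadratic in the number of subjects, which is acceptable for a sketch; a rank-based computation would be preferable on a large cohort.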

Figures 3A and 3B show, respectively, the cancer screening performance (male) of the individual tumor markers and of the different machine learning algorithms, where LR denotes logistic regression, KNN denotes k-nearest neighbors, and SVM denotes support vector machine. Table 1 shows that using the tumor marker set together with machine learning algorithms significantly improves cancer screening performance compared with using a single tumor marker. The tumor markers used in the cancer detection of Table 1 are AFP, CEA, CA19-9, CYFRA21-1, SCC, and PSA.

Figures 3C and 3D show, respectively, the cancer screening performance (female) of the individual tumor markers and of the different machine learning algorithms, where LR denotes logistic regression, KNN denotes k-nearest neighbors, and SVM denotes support vector machine. Table 2 likewise shows that using the tumor marker set together with machine learning algorithms significantly improves cancer screening performance compared with using a single tumor marker. The tumor markers used in the cancer detection of Table 2 are AFP, CEA, CA19-9, CYFRA21-1, SCC, CA15-3, and CA125.
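
The male and female analyses use different marker subsets, which can be captured as a small lookup that builds the model input in a fixed order. The marker lists below are the ones named for Tables 1 and 2; the names `PANELS` and `feature_vector` and the error handling are assumptions of this sketch.

```python
# Marker subsets as listed for Table 1 (male) and Table 2 (female).
PANELS = {
    "male": ("AFP", "CEA", "CA19-9", "CYFRA21-1", "SCC", "PSA"),
    "female": ("AFP", "CEA", "CA19-9", "CYFRA21-1", "SCC", "CA15-3", "CA125"),
}


def feature_vector(sex: str, results: dict) -> list:
    """Order a subject's marker results to match the sex-specific panel."""
    panel = PANELS[sex]
    missing = [marker for marker in panel if marker not in results]
    if missing:
        raise ValueError(f"missing marker results: {missing}")
    return [float(results[marker]) for marker in panel]
```

Keeping the marker order in one place means a model trained on one panel always receives its inputs in the same column order at prediction time.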

Further, Tables 3 and 4 compare the performance of the machine learning algorithms with that of the traditional interpretation method.

Table 3 presents the results for machine learning algorithms and the traditional interpretation method (male): machine learning algorithms such as the support vector machine and k-nearest neighbors perform significantly better than the traditional interpretation method.

Table 4 presents the results for machine learning algorithms and the traditional interpretation method (female): the support vector machine, k-nearest neighbors, and logistic regression all perform better than the traditional interpretation method.

The results in Tables 3 and 4 show that, for both men and women, using machine learning algorithms to analyze and learn from tumor marker set data generally improves cancer screening performance.

In summary, the embodiments of the present invention improve the convenience, economy, and accuracy of cancer screening in the general population at the same time. With a single test covering multiple tumor markers, medical staff can assess the subject's condition and possible latent cancers from more angles, and the subject no longer needs to undergo multiple tests of different kinds. The tumor marker set improves the timeliness of testing and reduces potential iatrogenic injury and radiation exposure. Because the set yields a large amount of information, combining it with machine learning algorithms to form a cancer prediction model allows results from many data points to be interpreted together, with considerable improvement in accuracy, timeliness, and reproducibility of interpretation.

It should be noted that the above embodiments merely illustrate the principles and effects of the invention and are not intended to limit its scope. Anyone skilled in the art may modify and vary the embodiments without departing from the technical principles and spirit of the invention. The scope of protection of the invention is therefore as set out in the claims below.

Claims (8)

1. A method for establishing a cancer prediction machine learning model, comprising at least the following steps: (A) inputting the tumor marker set test results of multiple subjects, where the set measures proteins of each subject, together with the corresponding cancer disease states, into a machine learning module; (B) performing variable selection in the module and selecting the several variables with the best classification performance; and (C) establishing a cancer prediction model by supervised machine learning from the selected variables, their values, and the cancer disease states.

2. The method for establishing a cancer prediction machine learning model according to claim 1, wherein the supervised machine learning method is logistic regression, k-nearest neighbors, a support vector machine, artificial neural network learning, a decision tree, a Bayesian decision method, or any combination thereof.

3. The method for establishing a cancer prediction machine learning model according to claim 1, wherein the cancer disease state is a classification of cancer versus no cancer, a classification of early-stage versus late-stage cancer, or a classification by cancer type.

4. The method for establishing a cancer prediction machine learning model according to claim 1, wherein the date on which the cancer disease state is determined and the date of the tumor marker set test are separated by 1 day to 3 years.

5. The method for establishing a cancer prediction machine learning model according to claim 1, wherein the established machine learning model can compute, for the cancer markers of different modules, the following statistical indicators: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, area under the receiver operating characteristic curve (AUC), and the Youden index.

6. A method for cancer screening using the cancer prediction machine learning model according to claim 1 in combination with a tumor marker set, comprising at least the following steps: (A) providing a specimen from a new subject; (B) testing the specimen for multiple tumor markers at once, using the plurality of protein-detecting tumor markers contained in the tumor marker set; (C) inputting the tumor marker set test results into the cancer prediction machine learning model, which then performs the comparison, computation, and analysis of the cancer detection results; (D) producing a prediction of cancer risk.

7. The method for analyzing cancer detection results using the cancer prediction machine learning model in combination with a tumor marker set according to claim 6, wherein the specimen may be human blood, urine, saliva, sweat, feces, pleural effusion, ascites, or cerebrospinal fluid.

8. The method for analyzing cancer detection results using the cancer prediction machine learning model in combination with a tumor marker set according to claim 6, wherein the plurality of tumor markers are alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), cytokeratin fragment 21-1 (CYFRA21-1), squamous cell carcinoma antigen (SCC), prostate-specific antigen (PSA), carbohydrate antigen 15-3 (CA15-3), carbohydrate antigen 125 (CA125), Epstein-Barr virus IgA antibody (EBV IgA), carbohydrate antigen 27-29 (CA27-29), beta-2-microglobulin, beta-human chorionic gonadotropin (Beta-hCG), cluster of differentiation 177 (CD 177), cluster of differentiation 20 (CD 20), chromogranin A (CgA), human epididymis secretory protein 4 (HE 4), lactate dehydrogenase (LDH), thyroglobulin, neuron-specific enolase (NSE), nuclear matrix protein 22, and programmed death-ligand 1 (PD-L1).
TW105124089A 2016-07-29 2016-07-29 Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set TWI630501B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW105124089A TWI630501B (en) 2016-07-29 2016-07-29 Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW105124089A TWI630501B (en) 2016-07-29 2016-07-29 Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set

Publications (2)

Publication Number Publication Date
TW201804348A TW201804348A (en) 2018-02-01
TWI630501B true TWI630501B (en) 2018-07-21

Family

ID=62014333

Family Applications (1)

Application Number Title Priority Date Filing Date
TW105124089A TWI630501B (en) 2016-07-29 2016-07-29 Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set

Country Status (1)

Country Link
TW (1) TWI630501B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11494698B2 (en) 2020-01-06 2022-11-08 Acer Incorporated Method and electronic device for selecting influence indicators by using automatic mechanism

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200005901A1 (en) * 2018-06-30 2020-01-02 20/20 Genesystems, Inc Cancer classifier models, machine learning systems and methods of use
CN110957043A (en) * 2018-09-26 2020-04-03 金敏 Disease prediction system
CN110031624A (en) * 2019-02-28 2019-07-19 中国科学院上海高等研究院 Tumor markers detection system based on multiple neural networks classifier, method, terminal, medium
CN111583993A (en) * 2020-05-29 2020-08-25 杭州广科安德生物科技有限公司 Method for constructing mathematical model for in vitro cancer detection and application thereof
CN112185549B (en) * 2020-09-29 2022-08-02 郑州轻工业大学 Esophageal squamous carcinoma risk prediction system based on clinical phenotype and logistic regression analysis
TWI768713B (en) * 2021-02-17 2022-06-21 長庚大學 Cancer status prediction method
CN115798596B (en) * 2023-01-18 2023-10-13 安徽省立医院(中国科学技术大学附属第一医院) Tumor marker identification method based on machine learning

Citations (4)

Publication number Priority date Publication date Assignee Title
US20070099219A1 (en) * 2003-07-21 2007-05-03 Aureon Laboratories, Inc. Systems and methods for treating, diagnosing and predicting the occurence of a medical condition
CN103268431A (en) * 2013-05-21 2013-08-28 中山大学 Cancer hypotype biomarker detecting system based on student t distribution
TW201339310A (en) * 2012-03-23 2013-10-01 Phalanx Biotech Group Inc Gene set for predicting post-surgery recurrence or metastasis risk in cancer patients and method thereof
CN103793600A (en) * 2014-01-16 2014-05-14 西安电子科技大学 Isolated component analysis and linear discriminant analysis combined cancer forecasting method

Also Published As

Publication number Publication date
TW201804348A (en) 2018-02-01
