CN115128285A - Kit and system for identifying and evaluating thyroid follicular tumor by protein combination - Google Patents

Kit and system for identifying and evaluating thyroid follicular tumor by protein combination

Info

Publication number
CN115128285A
CN115128285A CN202211046085.8A CN202211046085A CN115128285A CN 115128285 A CN115128285 A CN 115128285A CN 202211046085 A CN202211046085 A CN 202211046085A CN 115128285 A CN115128285 A CN 115128285A
Authority
CN
China
Prior art keywords
proteins
combination
model
protein
follicular
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211046085.8A
Other languages
Chinese (zh)
Other versions
CN115128285B (en)
Inventor
郭天南
孙耀庭
王赫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Westlake University
Original Assignee
Westlake University
Application filed by Westlake University
Priority to CN202211046085.8A
Publication of CN115128285A
Application granted
Publication of CN115128285B
Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Biomedical Technology (AREA)
  • Hematology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Urology & Nephrology (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Cell Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Food Science & Technology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Medicinal Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Hospice & Palliative Care (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Oncology (AREA)
  • Theoretical Computer Science (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present invention relates to a kit comprising a combination of proteins. The invention also relates to the use of the protein combination in preparing a kit for the differential evaluation of thyroid follicular tumors. The invention further relates to a system for the differential evaluation of thyroid follicular tumors, comprising a substance for detecting the relative expression levels of the protein combination, a data processing device and an output device. Based on TMT-labeled proteome data from adult thyroid follicular adenoma and follicular carcinoma samples, the invention identifies a candidate pool of 123 high-confidence proteins and, using an extreme gradient boosting (XGBoost) model, screens out a combination of 25 proteins. Using the quantitative values of this protein combination together with the extreme gradient boosting model, benign and malignant thyroid follicular tumors can be distinguished with an AUC above 0.9 and an accuracy above 85%, thereby assisting clinicians in making clinical decisions.

Description

Kit and system for identifying and evaluating thyroid follicular tumor by protein combination
Technical Field
The invention relates to the field of medical diagnosis, and in particular to an auxiliary means for the differential evaluation of thyroid follicular tumors based on proteins and machine learning.
Background
The incidence of thyroid nodules and thyroid cancer has continued to rise over the past two decades. Although ultrasound examination and ultrasound-guided fine-needle aspiration help to distinguish benign from malignant nodules, about 10-30% of thyroid nodules still cannot be classified by cytopathology and require diagnostic surgery. The surgical specimen is then carefully examined by a pathologist, who provides a definitive diagnosis based on histopathological changes. Such patients often undergo unnecessary surgery because many benign nodules cannot be recognized as benign before the operation. The most ambiguous diagnoses occur in follicular tumors, which account for approximately 30-50% of preoperatively indeterminate nodules.
Follicular tumors arise from differentiated follicular cells and consist of microfollicular structures. Follicular adenoma, a common benign tumor of the thyroid gland, is a firm or rubbery, uniform, round or oval tumor enclosed by a thin fibrous capsule. The incidence of thyroid follicular tumors in autopsy series is 3-4.3%. Follicular carcinoma, by contrast, is more cellular, has a thick, irregular capsule, often shows necrotic areas and has more frequent mitoses. Follicular carcinoma differs from follicular adenoma by invasion through the full thickness of the capsule, vascular invasion, extrathyroidal extension, lymph node metastasis or systemic metastasis. Vascular invasion is the most reliable sign of malignancy. Among all follicular carcinoma patients, distant metastasis occurs in 10-15% and recurrence in 11-39%. Patients with invasive follicular carcinoma have a 10-year disease-specific mortality of 15-28%. The ratio of follicular adenoma to follicular carcinoma in surgical specimens is approximately 5:1.
Follicular adenoma, the benign follicular tumor, cannot be distinguished from follicular carcinoma, the malignant follicular tumor, before surgery, because capsular invasion cannot be assessed from cytological, ultrasonographic or clinical features. The only way to distinguish the two is to perform diagnostic surgery and then determine whether the tumor is benign or malignant. Even so, follicular carcinoma and follicular adenoma are often difficult to distinguish on postoperative paraffin sections: their microscopic features are very similar, and a judgment can only be reached by examining serial sections under the microscope for tumor invasion. Sometimes even serial sections do not allow a judgment, and only an ambiguous diagnosis can be given. Two further problems cannot be ignored. On the one hand, the capsule may not be visible to the pathologist. On the other hand, some follicular adenomas may progress to follicular carcinoma, but at the time of surgery the disease is at an early stage and the tumor has not yet broken through the capsule. A definition based solely on capsular invasion is therefore neither reliable nor accurate, and other means are clearly needed to assist the differential diagnosis.
Several nucleic acid assays based on next-generation sequencing have been developed and have achieved some success in diagnosing indeterminate thyroid nodules. However, differentiation between follicular adenoma and follicular carcinoma based on genomic and transcriptomic features has not been reported. RAS mutations and PAX8/PPARγ rearrangements are the most common alterations in follicular tumors, but these alterations are detectable in both benign and malignant follicular tumors, so the two cannot be distinguished by genetic testing.
Proteins lie at the downstream end of the central dogma of molecular biology and are the direct executors or direct embodiment of life activities. In the clinical diagnosis of disease they play important roles, for example as biomarkers and drug targets. Proteomics is the discipline of quantitatively analyzing the proteins detected in biological samples. Multi-omics integration that includes the proteome provides verification and interpretation of the genome that is closer to the phenotype, yields more accurate and reliable information for early cancer detection, benign/malignant diagnosis, subtyping, personalized medication, treatment monitoring, prognosis and the like, and makes precision medicine more precise.
In general, proteomes can be obtained by two methods: the conventional label-free quantitative method and the labeled quantitative method. Labeled quantitative proteomics can analyze 6-16 samples simultaneously in a single run, giving higher throughput than label-free quantification. It can also quantify protein expression in a sample in depth, typically detecting close to or more than ten thousand proteins. The most common labeled quantification method is the tandem mass tag (TMT) method. For the analysis of samples with very similar biological behavior, deeper protein coverage may be effective in finding potential biomarkers.
Disclosure of Invention
In the present application, the inventors measure protein expression in samples by TMT (tandem mass tag) isotope-labeled quantitative tandem mass spectrometry and combine this with a machine learning method to accurately evaluate and distinguish follicular carcinoma from follicular adenoma at the level of protein expression. By analyzing proteomics data of thyroid follicular tumors, the invention screens out a new combination of 25 proteins. Based on these 25 proteins, combined with an extreme gradient boosting model, thyroid follicular tumors can be differentiated at the protein level, which can assist clinicians in decision-making and, to some extent, alleviate the clinical problems of overdiagnosis and inaccurate evaluation of thyroid follicular tumors.
The invention is obtained by the following steps:
1. Data generation method
First, tissue samples of thyroid follicular adenoma and follicular carcinoma are obtained, proteins are extracted from the tissue by pressure cycling technology, and the proteins are enzymatically digested into peptide samples. The peptides in the different samples are then labeled with TMT reagent, and the labeled peptides are fractionated by high-pH liquid chromatography; mass spectral data are acquired for each fraction in a 60-minute data-dependent acquisition mode. Finally, the raw mass spectrometry files are searched against a database and quantified using Proteome Discoverer software.
2. Data preprocessing method
For the protein matrix generated by the database search software, proteins with a missing rate exceeding 60% are first removed, missing values are then imputed using the robust sequential imputation method in the R package NAguideR, and finally the whole data set is batch-corrected with the ComBat algorithm.
3. Protein feature combination preselection
First, the preprocessed protein matrix is analyzed to determine the proteins differentially expressed between follicular carcinoma and follicular adenoma, thereby filtering the candidate protein features. Further filtering is then performed by three methods (analysis of variance, the Kruskal-Wallis test and the information gain method) to determine the preliminary protein combination.
4. Classification model construction and final feature combination determination
First, the hyper-parameters of the eXtreme Gradient Boosting (XGBoost) algorithm are optimized by random search with five-fold cross-validation. Then, based on the preliminary protein combination, finer feature selection is carried out: the proteins are ranked by importance over multiple training runs of the model, and the optimal number of proteins and the protein combination are determined from the cross-validation performance of the model. Once the proteins are determined, hyper-parameter tuning and model training are performed again. The final extreme gradient boosting model evaluates how benign or malignant a thyroid follicular tumor is by giving a score between 0 and 1, with a higher score indicating a higher degree of malignancy, and can then be applied to new data sets.
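The tuning step described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the search space, the `toy_cv_score` function (a stand-in for the mean five-fold cross-validated AUC of a trained model) and all function names are hypothetical and do not come from the patent, and no actual XGBoost model is trained here.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Split sample indices 0..n_samples-1 into k shuffled, near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def random_search(space, cv_score, n_iter=100, seed=0):
    """Sample n_iter hyper-parameter settings and keep the best-scoring one.

    space:    dict mapping parameter name -> list of candidate values
    cv_score: callable(params) -> mean cross-validated score (higher is better)
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space (the tuned values of the patent are not reproduced here).
space = {"max_depth": [2, 3, 4, 5, 6], "learning_rate": [0.01, 0.05, 0.1, 0.3]}

def toy_cv_score(params):
    """Toy stand-in for the mean AUC over five folds."""
    return 1.0 - abs(params["max_depth"] - 4) * 0.05 - abs(params["learning_rate"] - 0.1)

folds = k_fold_indices(20, k=5)
best_params, best_score = random_search(space, toy_cv_score, n_iter=100)
```

In a real pipeline, `cv_score` would train the classifier on four folds and evaluate it on the fifth, averaging the score over all five held-out folds.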
Accordingly, in one aspect, the present invention provides the use of a combination of proteins in the manufacture of a kit for the differential assessment of thyroid follicular tumors, said combination of proteins consisting of: Q8TF72_SHROOM3, Q86UX2_ITIH5, Q8NBF6_AVL9, Q8N6Y0_USHBP1, Q96RR4_CAMKK2, Q92828_CORO2A, Q96K21_ZFYVE19, Q96FN5_KIF12, Q9H223_EHD4, Q9HCD6_TANC2, Q8IYT2_CMTR2, P14649_MYL6B, Q9UNA1_ARHGAP26, P02765_AHSG, Q86YB7_ECHDC2, Q9UBM7_DHCR7, Q04941_PLP2, P07202_TPO, Q687X5_STEAP4, O60706_ABCC9, O95429_BAG4, Q9Y487_ATP6V0A2, O75382_TRIM3, Q92959_SLCO2A1, and Q8N3R9_MPP5, wherein the kit contains reagents to detect the relative expression levels of the combination of proteins.
In one embodiment, the relative expression of the combination of proteins is detected by mass spectrometry.
In another embodiment, the relative expression of the combination of proteins is detected by the tandem mass tag (TMT) labeling technique.
In yet another embodiment, the evaluating comprises inputting data obtained by detecting the relative expression levels of the combination of proteins with the tandem mass tag labeling technique into an extreme gradient boosting model and outputting a score between 0 and 1, wherein a higher score indicates a higher degree of malignancy and the cutoff value is 0.5.
In another aspect, the present invention provides a kit containing, but not limited to, heavy-isotope-labeled peptides corresponding to the following proteins to be detected, the combination of proteins consisting of: Q8TF72_SHROOM3, Q86UX2_ITIH5, Q8NBF6_AVL9, Q8N6Y0_USHBP1, Q96RR4_CAMKK2, Q92828_CORO2A, Q96K21_ZFYVE19, Q96FN5_KIF12, Q9H223_EHD4, Q9HCD6_TANC2, Q8IYT2_CMTR2, P14649_MYL6B, Q9UNA1_ARHGAP26, P02765_AHSG, Q86YB7_ECHDC2, Q9UBM7_DHCR7, Q04941_PLP2, P07202_TPO, Q687X5_STEAP4, O60706_ABCC9, O95429_BAG4, Q9Y487_ATP6V0A2, O75382_TRIM3, Q92959_SLCO2A1, and Q8N3R9_MPP5.
In yet another aspect, the present invention provides a method for constructing a model for the differential evaluation of thyroid follicular tumors, comprising: training a machine learning model using the relative expression levels of a protein combination in thyroid follicular adenoma and follicular carcinoma as training samples to obtain the model, wherein the protein combination consists of: Q8TF72_SHROOM3, Q86UX2_ITIH5, Q8NBF6_AVL9, Q8N6Y0_USHBP1, Q96RR4_CAMKK2, Q92828_CORO2A, Q96K21_ZFYVE19, Q96FN5_KIF12, Q9H223_EHD4, Q9HCD6_TANC2, Q8IYT2_CMTR2, P14649_MYL6B, Q9UNA1_ARHGAP26, P02765_AHSG, Q86YB7_ECHDC2, Q9UBM7_DHCR7, Q04941_PLP2, P07202_TPO, Q687X5_STEAP4, O60706_ABCC9, O95429_BAG4, Q9Y487_ATP6V0A2, O75382_TRIM3, Q92959_SLCO2A1, and Q8N3R9_MPP5.
In one embodiment, the model is constructed using a gradient boosting model algorithm.
In another aspect, the present invention provides a system for the differential assessment of thyroid follicular tumors, comprising a substance for detecting the relative expression levels of a combination of proteins, a data processing device and an output device, wherein the combination of proteins consists of: Q8TF72_SHROOM3, Q86UX2_ITIH5, Q8NBF6_AVL9, Q8N6Y0_USHBP1, Q96RR4_CAMKK2, Q92828_CORO2A, Q96K21_ZFYVE19, Q96FN5_KIF12, Q9H223_EHD4, Q9HCD6_TANC2, Q8IYT2_CMTR2, P14649_MYL6B, Q9UNA1_ARHGAP26, P02765_AHSG, Q86YB7_ECHDC2, Q9UBM7_DHCR7, Q04941_PLP2, P07202_TPO, Q687X5_STEAP4, O60706_ABCC9, O95429_BAG4, Q9Y487_ATP6V0A2, O75382_TRIM3, Q92959_SLCO2A1, and Q8N3R9_MPP5.
In one embodiment, the data processing device comprises a differential evaluation module that includes an extreme gradient boosting model.
In yet another embodiment, the relative expression data of the combination of proteins are input into the extreme gradient boosting model for processing, and the output device outputs a score between 0 and 1, wherein a higher score indicates a higher degree of malignancy of the thyroid follicular tumor and the cutoff value is 0.5.
The present invention proposes a new combination of 25 proteins (Q8TF72_SHROOM3, Q86UX2_ITIH5, Q8NBF6_AVL9, Q8N6Y0_USHBP1, Q96RR4_CAMKK2, Q92828_CORO2A, Q96K21_ZFYVE19, Q96FN5_KIF12, Q9H223_EHD4, Q9HCD6_TANC2, Q8IYT2_CMTR2, P14649_MYL6B, Q9UNA1_ARHGAP26, P02765_AHSG, Q86YB7_ECHDC2, Q9UBM7_DHCR7, Q04941_PLP2, P07202_TPO, Q687X5_STEAP4, O60706_ABCC9, O95429_BAG4, Q9Y487_ATP6V0A2, O75382_TRIM3, Q92959_SLCO2A1, and Q8N3R9_MPP5) [...] treatment and surgery.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The following embodiments are merely illustrative of the technical solutions of the present invention, and should not be used to limit the scope of the present invention.
Unless otherwise specifically indicated or limited, the technical means used in the embodiments of the present application are all conventional technical means well known to those skilled in the art, and the materials and/or devices, apparatuses, instruments, reagents, consumables and the like used in the embodiments of the present application are all commercially available.
1. Data generation method
First, tissue samples of thyroid follicular adenoma and follicular carcinoma are obtained, proteins are extracted from the tissue by pressure cycling technology, and the proteins are enzymatically digested into peptide samples. The peptides in the different samples are then labeled with TMT reagent, and the labeled peptides are fractionated by high-pH liquid chromatography; mass spectral data are acquired for each fraction in a 60-minute data-dependent acquisition mode. Finally, the raw files acquired by mass spectrometry are searched against a database and protein expression is quantified using Proteome Discoverer software.
2. Data preprocessing method
For the protein matrix generated by the database search, missing values are first evaluated: the missing rate of each protein is calculated, a missing-rate threshold is determined, and proteins with high missing rates are removed so that the overall missing rate of the matrix is below 10%. Missing values are then imputed using the robust sequential imputation method in the R package NAguideR. Finally, batch correction is performed with the ComBat method. Non-positive values that arise in the protein matrix through imputation or correction are replaced with 0.5 times the minimum positive expression value of the corresponding protein.
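Two of the preprocessing rules above (missing-rate filtering and replacement of non-positive values) can be sketched as follows. This is a minimal illustration, assuming a toy matrix stored as a dict of lists with `None` for missing values; the function names and data are hypothetical, the 60% threshold is the one reported in Example 2, and the imputation and ComBat steps that sit between these two rules in the described workflow are not implemented.

```python
def missing_rate(values):
    """Fraction of missing (None) entries among one protein's values."""
    return sum(v is None for v in values) / len(values)

def filter_by_missing_rate(matrix, max_rate=0.6):
    """Keep proteins whose missing rate does not exceed max_rate."""
    return {p: v for p, v in matrix.items() if missing_rate(v) <= max_rate}

def replace_non_positive(values):
    """Replace values <= 0 with 0.5 x the protein's smallest positive value."""
    positives = [v for v in values if v is not None and v > 0]
    if not positives:
        return list(values)  # nothing to anchor the replacement on
    floor = 0.5 * min(positives)
    return [floor if (v is not None and v <= 0) else v for v in values]

# Toy protein matrix: protein ID -> values across four samples (hypothetical data).
matrix = {
    "P1": [1.0, None, 2.0, 4.0],    # missing rate 0.25 -> kept
    "P2": [None, None, None, 3.0],  # missing rate 0.75 -> removed
    "P3": [2.0, -0.4, 0.0, 8.0],    # complete, but contains non-positive values
}
kept = filter_by_missing_rate(matrix)
cleaned_p3 = replace_non_positive(kept["P3"])
```

For `P3`, the smallest positive value is 2.0, so both non-positive entries become 1.0.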
3. Protein feature preselection
First, the imputed and corrected protein matrix is analyzed to determine the proteins differentially expressed between follicular carcinoma and follicular adenoma, thereby filtering the candidate protein features. The fold change is obtained by dividing the mean protein expression in thyroid follicular carcinoma by the mean protein expression in follicular adenoma. The criteria for a differential protein are a fold change greater than 1.2 and an adjusted P value from Student's t-test of less than 0.05. Further filtering is then applied according to three criteria: 1) analysis-of-variance P value ≥ 0.001; 2) Kruskal-Wallis test P value ≥ 0.001; 3) no information gain. Feature proteins meeting any one of these criteria are removed, and the remaining feature proteins constitute the preliminary protein combination. After further screening, this combination can be used to classify follicular carcinoma versus follicular adenoma.
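The filtering logic above can be expressed as two small predicates. The sketch below assumes that the P values and the information gain have already been computed elsewhere (no statistical test is run here) and that a fold change below 1 is handled through its reciprocal, as stated in Example 3; the function names and example numbers are hypothetical.

```python
from statistics import mean

def fold_change(carcinoma, adenoma):
    """Mean expression in follicular carcinoma divided by mean in follicular adenoma."""
    return mean(carcinoma) / mean(adenoma)

def is_differential(fc, adj_p, fc_cutoff=1.2, p_cutoff=0.05):
    """Fold change (or its reciprocal) > 1.2 and adjusted t-test P value < 0.05."""
    if fc <= 0:
        raise ValueError("fold change must be positive")
    return adj_p < p_cutoff and (fc > fc_cutoff or 1 / fc > fc_cutoff)

def passes_secondary_filters(anova_p, kw_p, info_gain):
    """A protein is removed if ANOVA P >= 0.001, Kruskal-Wallis P >= 0.001, or gain is absent."""
    return anova_p < 0.001 and kw_p < 0.001 and info_gain > 0

# Hypothetical expression values for one protein in two groups.
fc = fold_change([3.0, 3.0], [2.0, 2.0])  # 1.5
keep = is_differential(fc, adj_p=0.01) and passes_secondary_filters(1e-5, 2e-4, 0.12)
```

Because a protein must survive both stages, the two predicates are combined with a logical AND.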
4. Classification model construction and final feature combination determination
First, the hyper-parameters of the extreme gradient boosting model are optimized by random search (100 searches in the parameter space) with five-fold cross-validation. Then, to further refine the model, finer feature selection is carried out on the preliminary protein combination in two steps. Step 1: the model is trained 100 times; in each training run, feature importance is calculated according to the Gini coefficient and the top 50 proteins are selected. The 100 results are then combined, and proteins selected at least 30 times are retained. The model is trained a further 100 times with these proteins, feature importance is ranked, and the 100 results are averaged to obtain the ranking of the remaining proteins. Step 2: cross-validation is performed 100 times using the top 5, 10, 15, ... protein features at intervals of 5 variables, and the optimal number of variables is determined by the mean AUC; based on this preliminary result, the procedure is repeated at intervals of 1 to determine the final number of proteins and the protein combination. Once the proteins are determined, hyper-parameter tuning and model training are performed again. The final extreme gradient boosting model evaluates how benign or malignant a thyroid follicular tumor is by giving a score between 0 and 1, with a higher score indicating a higher degree of malignancy, and can then be applied to new data sets.
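The two-step feature selection can be sketched as below, taking per-run feature importances as given rather than training models. All names and toy numbers are hypothetical. In particular, the width of the fine scan around the coarse optimum is an assumption: the text only states that the scan is repeated at intervals of 1 based on the preliminary result.

```python
from collections import Counter

def stable_features(importance_runs, top_k=50, min_count=30):
    """Keep features that rank in the top_k of at least min_count runs."""
    counts = Counter()
    for run in importance_runs:
        ranked = sorted(run, key=run.get, reverse=True)
        counts.update(ranked[:top_k])
    return {f for f, c in counts.items() if c >= min_count}

def rank_by_mean_importance(importance_runs, features):
    """Order features by importance averaged over all runs, highest first."""
    n = len(importance_runs)
    avg = {f: sum(run.get(f, 0.0) for run in importance_runs) / n for f in features}
    return sorted(features, key=lambda f: avg[f], reverse=True)

def best_feature_count(ranked, mean_auc, step=5):
    """Coarse scan every `step` features, then a fine scan (interval 1) around the best."""
    coarse = list(range(step, len(ranked) + 1, step)) or [len(ranked)]
    n0 = max(coarse, key=lambda n: mean_auc(ranked[:n]))
    fine = range(max(1, n0 - step + 1), min(len(ranked), n0 + step - 1) + 1)
    return max(fine, key=lambda n: mean_auc(ranked[:n]))

# Toy example: 4 runs, keep features in the top 2 of at least 3 runs.
runs = [
    {"A": 0.50, "B": 0.30, "C": 0.10, "D": 0.10},
    {"A": 0.40, "B": 0.10, "C": 0.45, "D": 0.05},
    {"A": 0.60, "B": 0.20, "C": 0.15, "D": 0.05},
    {"A": 0.30, "B": 0.40, "C": 0.20, "D": 0.10},
]
selected = stable_features(runs, top_k=2, min_count=3)
ranking = rank_by_mean_importance(runs, selected)

# Toy stand-in for "mean AUC over 100 cross-validations", peaking at 12 features.
toy_auc = lambda feats: 1.0 - abs(len(feats) - 12) / 100
n_best = best_feature_count([f"p{i}" for i in range(20)], toy_auc)
```

With the settings described in the text, `top_k=50` and `min_count=30` would be applied to 100 runs, and `mean_auc` would wrap repeated cross-validation of the classifier.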
Examples
Example 1 - Sample inclusion.
The thyroid tissues used in this example were obtained from thirteen clinical hospitals in China and Singapore between 2010 and 2020, with ethical approval from the hospitals and the research institution. A total of 645 samples were initially included in this experiment: 341 follicular adenomas and 303 follicular carcinomas, plus 1 sample that was excluded from the study because it could not be matched. After the H&E section corresponding to each paraffin block had been reviewed by a pathology expert to confirm that tumor tissue occupied more than 70% of the area, one section with a thickness of 5-10 μm was taken for proteomic detection and analysis.
Example 2 - Proteomics data acquisition and pre-processing.
Paraffin sections were dewaxed and hydrated by washing sequentially with 100% heptane, 100% ethanol, 90% ethanol and 75% ethanol, 5 minutes each. Tris base solution at pH 10 was added to the dewaxed sample and reacted at 95 °C for 30 minutes. Urea, thiourea, a reducing agent and an alkylating agent were then added, and the mixture was cycled between high and low pressure in a pressure cycling system: 50 seconds at 45,000 p.s.i. followed by 10 seconds at atmospheric pressure, for 90 cycles. After lysis, the proteins were digested with trypsin and LysC, and the resulting peptides were desalted on C18. The cleaned peptides were then labeled with 16-plex TMTpro reagent. The labeled samples were fractionated by high-pH liquid chromatography with a 60 min gradient, yielding 30 fractions, each of which was analyzed by high-resolution mass spectrometry in data-dependent acquisition mode. Raw data were searched and quantified using FragPipe and Proteome Discoverer (PD) software, which identified 11,533 proteins (FragPipe) and 10,336 proteins (PD), respectively, at a false discovery rate below 0.01. To ensure data reliability, only the 10,032 proteins identified by both programs were retained, and the protein matrix output by PD was used in subsequent analyses. Next, 2,236 proteins (22.3%) with a missing rate above 60% were filtered out, bringing the overall missing rate of the protein matrix below 10%. Missing values were then imputed with the robust sequential imputation method in the R package NAguideR, and batch effects were corrected with the ComBat method. Non-positive values appearing in the protein matrix were replaced with 0.5 times the minimum positive expression value of the corresponding protein.
Example 3 - Protein feature preselection.
To find molecular-level differences between follicular adenoma and follicular carcinoma, the proteomic expression profiles of the two were compared. Using an adjusted Student's t-test P value of less than 0.05 and a fold change (or its reciprocal) greater than 1.2 as screening criteria, a total of 178 differential proteins were obtained. The candidate proteins were further filtered by three criteria (analysis-of-variance P value ≥ 0.001, Kruskal-Wallis test P value ≥ 0.001, and no information gain), and proteins meeting any one of these criteria were removed from the pool of 178 differential proteins. A total of 55 proteins were filtered out, and the remaining 123 proteins form the preliminary protein combination, as shown in Table 1. This protein candidate pool is closely related to thyroid follicular tumors, was discovered for the first time by the present method, and has not been reported previously.
Table 1: 123 proteins (UniProt accession IDs) obtained by preselection
O00339 P05546 P29373 Q15742 Q8N6Y0 Q9H223
O14524 P06727 P29762 Q1HG43 Q8NBF6 Q9H788
O14727 P07202 P29966 Q53RD9 Q8NFP9 Q9HCD6
O15037 P07858 P30291 Q5S007 Q8TDX6 Q9NXH8
O15460 P08697 P36551 Q5TF21 Q8TF72 Q9P219
O43148 P11388 P42574 Q5VSL9 Q8WXA9 Q9P258
O60303 P12429 P46013 Q687X5 Q92828 Q9P2K5
O60502 P14649 P47736 Q6NV74 Q92959 Q9UBM7
O60706 P15090 P52926 Q6UX53 Q93099 Q9UIJ5
O75096 P16104 P57729 Q6ZS11 Q96FN5 Q9UKS7
O75382 P16401 P61077 Q6ZS30 Q96GM8 Q9ULC0
O95210 P16402 P61916 Q7Z7B0 Q96K21 Q9ULH0
O95372 P16403 P61925 Q86SF2 Q96RR4 Q9UNA1
O95429 P16671 P81877 Q86U70 Q99470 Q9Y487
P00740 P16949 P85037 Q86UX2 Q9BQB6 Q9Y4H2
P01019 P17096 Q00613 Q86XX4 Q9BRL6 Q9Y4P1
P01266 P17535 Q04941 Q86YB7 Q9BWG4 Q9Y646
P02765 P20962 Q07352 Q8IWS0 Q9BX97 Q9Y6M1
P02766 P22223 Q13454 Q8IYT2 Q9BY12
P02774 P22748 Q14195 Q8N3R9 Q9C0H9
P04275 P25311 Q14376 Q8N6N7 Q9H1E3
Example 4 - Final protein combination determination.
After hyper-parameter tuning, the number of feature proteins was selected more precisely. First, the model was trained 100 times; in each run, feature importance was calculated and the top 50 proteins were selected. The results of the 100 runs were then combined and proteins selected at least 30 times were retained, leaving 58 proteins. The model was then trained a further 100 times with these 58 proteins, feature importance was ranked, and the results of the 100 runs were averaged to obtain the ranking of the 58 proteins. Model performance was compared for different numbers of protein features, and 25 proteins (Q8TF72_SHROOM3, Q86UX2_ITIH5, Q8NBF6_AVL9, Q8N6Y0_USHBP1, Q96RR4_CAMKK2, Q92828_CORO2A, Q96K21_ZFYVE19, Q96FN5_KIF12, Q9H223_EHD4, Q9HCD6_TANC2, Q8IYT2_CMTR2, P14649_MYL6B, Q9UNA1_ARHGAP26, P02765_AHSG, Q86YB7_ECHDC2, Q9UBM7_DHCR7, Q04941_PLP2, P07202_TPO, Q687X5_STEAP4, O60706_ABCC9, O95429_BAG4, Q9Y487_ATP6V0A2, O75382_TRIM3, Q92959_SLCO2A1, and Q8N3R9_MPP5) were finally selected, at which point cross-validation performance was best. Of these 25 proteins, 7 have been reported to be associated with thyroid cancer or thyroid function, of which only ITIH5 has been reported to be associated with thyroid follicular tumors; the remaining 24 proteins are reported here for the first time as associated with thyroid follicular tumors. The proteins are shown in Table 2, ranked by their potency in classifying follicular carcinoma versus follicular adenoma.
Table 2: further summary of 25 proteins
[Table 2 is provided as an image in the original publication.]
Example 5 - Evaluation model construction and testing.
Based on the above results, an extreme gradient boosting model was selected and combined with the 25 final feature proteins for the assessment of follicular adenoma and follicular carcinoma.
To construct the final model, the hyper-parameters were re-optimized by five-fold cross-validation, as detailed in Table 3. The model was then trained and tested on the corresponding data sets, with the results shown in Table 4. On the internal validation set, the model achieved an AUC of 0.951 (0.944-0.959), an accuracy of 0.872 (0.859-0.892), a sensitivity of 0.875 (0.836-0.889), a specificity of 0.871 (0.866-0.910), a positive predictive value (PPV) of 0.856 (0.849-0.894) and a negative predictive value (NPV) of 0.887 (0.860-0.899). On the independent test set, it achieved an AUC of 0.904 (0.852-0.956), an accuracy of 0.859 (0.789-0.908), a sensitivity of 0.877 (0.772-0.938), a specificity of 0.843 (0.738-0.911), a PPV of 0.838 (0.731-0.908) and an NPV of 0.881 (0.778-0.940). Sensitivity was higher than specificity, suggesting that the model is better at screening for follicular carcinoma. NPV was higher than PPV, suggesting that the model has strong rule-out ability. The reliability of the predictions on the independent test set fluctuated within about ±5%.
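For readers less familiar with these metrics, the sketch below shows how they follow from a confusion matrix and from ranked scores. The counts used (57 true positives, 8 false negatives, 59 true negatives, 11 false positives) are hypothetical: they are not reported in the source and were chosen only because they sum to 135 and reproduce the independent-test point estimates to three decimals. The function names are likewise illustrative.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics, with carcinoma as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical counts consistent with the reported independent-test metrics.
m = confusion_metrics(tp=57, fp=11, tn=59, fn=8)
rounded = {name: round(value, 3) for name, value in m.items()}

# Toy scores (not from the source) to illustrate the AUC computation.
toy_auc = auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])
```

The pairwise definition of AUC used here is quadratic in the number of samples, which is acceptable for cohorts of this size.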
Table 3: Parameter settings of the extreme gradient boosting model after tuning
[Table 3 is provided as an image in the original publication.]
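The five-fold cross-validation used for hyper-parameter tuning relies on splitting the training samples into five disjoint validation folds. A minimal sketch of such a split is given below; the function name, the random seed, and the unstratified shuffling are assumptions, since the description does not say how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train, validation) index pairs that partition range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(validation)))
    return splits


# Hypothetical example: 20 samples split into five folds of four.
splits = k_fold_indices(n_samples=20, k=5)
```

Each candidate hyper-parameter setting would be scored on the five validation folds in turn and the scores averaged.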
Table 4: Model prediction performance
[Table 4 is provided as an image in the original publication.]
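The performance measures reported for the model (accuracy, sensitivity, specificity, PPV, NPV) all follow from the four confusion-matrix counts, with follicular carcinoma treated as the positive class. The sketch below shows the arithmetic. The counts are hypothetical: the description does not report a confusion matrix, and these values were chosen only because they sum to 135 samples and round to the point estimates quoted for the independent test set.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard performance measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts (carcinoma = positive class); not taken from the source.
metrics = binary_metrics(tp=57, fp=11, tn=59, fn=8)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

With these assumed counts, `rounded` gives an accuracy of 0.859, a sensitivity of 0.877, a specificity of 0.843, a PPV of 0.838, and an NPV of 0.881.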
Example 6 evaluation of thyroid follicular tumor malignancy in thyroid tissue to be tested in a subject.
A thyroid tissue sample from the subject is prepared using a pressure cycling system, the proteins are quantified with tandem mass tag (TMT) labeling, and protein quantification data are acquired by high-performance liquid chromatography coupled with mass spectrometry. The mass spectrometry data are input into the final extreme gradient boosting model of the present application, which outputs a score between 0 and 1; the higher the score, the higher the degree of malignancy. After the model was constructed, it was tested on an independent test cohort of 135 samples. For follicular carcinoma samples, the median score was 0.82, the mean score 0.75, the first quartile 0.63, and the third quartile 0.95; for follicular adenoma samples, the median score was 0.27, the mean score 0.30, the first quartile 0.11, and the third quartile 0.43. These data reflect the accuracy of the model score. In practical application, the expression of the 25 proteins in a sample is measured, the protein quantification results are used as the model input, the model with the above parameters outputs a malignancy score, and a sample with a score greater than 0.5 is judged to be follicular carcinoma.
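The decision rule at the end of Example 6 can be sketched as a small function. This is an illustration only: the function name and label strings are assumptions, and labeling every score at or below the cutoff as follicular adenoma is an assumption drawn from the binary setting, since the description states only that scores above 0.5 are judged to be follicular carcinoma.

```python
def classify(score, cutoff=0.5):
    """Map a model score in [0, 1] to a label using the stated cutoff."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    # Only scores strictly above the cutoff are judged to be carcinoma.
    return "follicular carcinoma" if score > cutoff else "follicular adenoma"


# Median scores reported for the independent test cohort.
carcinoma_median_label = classify(0.82)
adenoma_median_label = classify(0.27)
```

Applied to the reported medians, the rule labels 0.82 as follicular carcinoma and 0.27 as follicular adenoma.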
While the present invention has been described in detail hereinabove with respect to specific embodiments thereof, it will be apparent to those skilled in the art that modifications and improvements can be made based on the disclosure. Therefore, it is intended that all such modifications and improvements be included within the scope of the invention without departing from the spirit thereof.

Claims (10)

1. Use of a combination of proteins in the manufacture of a kit for the differential assessment of a thyroid follicular tumor in a subject, the combination of proteins consisting of: Q8TF_SHROOM, Q86UX_ITIH, Q8NBF_AVL, Q8N6Y_USHBP, Q96RR_CAMKK, Q92828_CORO2, Q96K_ZFYVE, Q96FN_KIF, Q9H223_EHD, Q9HCD_TANC, Q8_CMTR, P14649_MYL6, Q9UNA_ARHGAP, P02765_AHSG, Q86YB_ECHDC, Q9UBM_DHCR, Q04941_PLP, P07202_TPO, Q687X_STEAP, O60706_ABCC, O95429_BAG, Q9Y487_ATP6V0A, O75382_TRIM, Q92959_SLCO2A, and Q8N3R_MPP, wherein the kit contains reagents for detecting the relative expression levels of the combination of proteins.
2. The use of claim 1, wherein the relative expression of the combination of proteins is detected by mass spectrometry.
3. The use of claim 1, wherein the relative expression of the combination of proteins is detected by tandem mass tag labeling.
4. The use according to claim 3, wherein the assessment comprises inputting the data obtained by measuring the relative expression of the combination of proteins by tandem mass tag labeling into an extreme gradient boosting model, which outputs a score between 0 and 1, wherein a higher score indicates a higher degree of malignancy and the cutoff value is 0.5.
5. A kit comprising a combination of proteins consisting of: Q8TF_SHROOM, Q86UX_ITIH, Q8NBF_AVL, Q8N6Y_USHBP, Q96RR_CAMKK, Q92828_CORO2, Q96K_ZFYVE, Q96FN_KIF, Q9H223_EHD, Q9HCD_TANC, Q8_CMTR, P14649_MYL6, Q9UNA_ARHGAP, P02765_AHSG, Q86YB_ECHDC, Q9UBM_DHCR, Q04941_PLP, P07202_TPO, Q687X_STEAP, O60706_ABCC, O95429_BAG, Q9Y487_ATP6V0A, O75382_TRIM, Q92959_SLCO2A, and Q8N3R_MPP.
6. A method for constructing a model for the differential assessment of thyroid follicular tumors, comprising: training a machine learning model using the relative expression levels of a combination of proteins in thyroid follicular adenoma and follicular carcinoma as training samples to obtain the model, wherein the combination of proteins consists of: Q8TF_SHROOM, Q86UX_ITIH, Q8NBF_AVL, Q8N6Y_USHBP, Q96RR_CAMKK, Q92828_CORO2, Q96K_ZFYVE, Q96FN_KIF, Q9H223_EHD, Q9HCD_TANC, Q8_CMTR, P14649_MYL6, Q9UNA_ARHGAP, P02765_AHSG, Q86YB_ECHDC, Q9UBM_DHCR, Q04941_PLP, P07202_TPO, Q687X_STEAP, O60706_ABCC, O95429_BAG, Q9Y487_ATP6V0A, O75382_TRIM, Q92959_SLCO2A, and Q8N3R_MPP.
7. The method for constructing a model for the differential assessment of thyroid follicular tumors according to claim 6, wherein the model is constructed with an extreme gradient boosting algorithm.
8. A system for differential assessment of thyroid follicular tumors comprising a substance for detecting the relative expression of a combination of proteins, and data processing means and output means, wherein the combination of proteins consists of: Q8TF_SHROOM, Q86UX_ITIH, Q8NBF_AVL, Q8N6Y_USHBP, Q96RR_CAMKK, Q92828_CORO2, Q96K_ZFYVE, Q96FN_KIF, Q9H223_EHD, Q9HCD_TANC, Q8_CMTR, P14649_MYL6, Q9UNA_ARHGAP, P02765_AHSG, Q86YB_ECHDC, Q9UBM_DHCR, Q04941_PLP, P07202_TPO, Q687X_STEAP, O60706_ABCC, O95429_BAG, Q9Y487_ATP6V0A, O75382_TRIM, Q92959_SLCO2A, and Q8N3R_MPP.
9. The system of claim 8, wherein the data processing device comprises a differential assessment module comprising an extreme gradient boosting model.
10. The system of claim 9, wherein the relative expression data of the combination of proteins is input into the extreme gradient boosting model for processing, and the output device outputs a score between 0 and 1, wherein a higher score indicates a higher malignancy of the thyroid follicular tumor, and the cutoff value is 0.5.
CN202211046085.8A 2022-08-30 2022-08-30 Kit and system for identifying and evaluating thyroid follicular tumor by protein combination Active CN115128285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211046085.8A CN115128285B (en) 2022-08-30 2022-08-30 Kit and system for identifying and evaluating thyroid follicular tumor by protein combination

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211046085.8A CN115128285B (en) 2022-08-30 2022-08-30 Kit and system for identifying and evaluating thyroid follicular tumor by protein combination

Publications (2)

Publication Number Publication Date
CN115128285A true CN115128285A (en) 2022-09-30
CN115128285B CN115128285B (en) 2023-01-06

Family

ID=83387441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211046085.8A Active CN115128285B (en) 2022-08-30 2022-08-30 Kit and system for identifying and evaluating thyroid follicular tumor by protein combination

Country Status (1)

Country Link
CN (1) CN115128285B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115436640A (en) * 2022-11-07 2022-12-06 西湖欧米(杭州)生物科技有限公司 Surrogate matrix for polypeptides that can assess the malignancy or probability of thyroid nodules

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100285979A1 (en) * 2007-08-27 2010-11-11 Martha Allen Zeiger Diagnostic tool for diagnosing benign versus malignant thyroid lesions
US20110312520A1 (en) * 2010-05-11 2011-12-22 Veracyte, Inc. Methods and compositions for diagnosing conditions
US20120142030A1 (en) * 2007-04-14 2012-06-07 The Regents of the University of Colorado, Body Corporate Biomarkers for Follicular Thyroid Carcinoma and Methods of Use
CN111292801A (en) * 2020-01-21 2020-06-16 西湖大学 Method for evaluating thyroid nodule by combining protein mass spectrum with deep learning
CN111424091A (en) * 2020-04-20 2020-07-17 中国医学科学院北京协和医院 Marker for differential diagnosis of benign and malignant thyroid follicular tumor and application thereof
CN112862756A (en) * 2021-01-11 2021-05-28 中国医学科学院北京协和医院 Method for identifying pathological change type and gene mutation in thyroid tumor pathological image
US20210174899A1 (en) * 2019-12-05 2021-06-10 Bostongene Corporation Machine learning techniques for gene expression analysis
US20210254056A1 (en) * 2017-05-05 2021-08-19 Camp4 Therapeutics Corporation Identification and targeted modulation of gene signaling networks
CN114414704A (en) * 2022-03-22 2022-04-29 西湖欧米(杭州)生物科技有限公司 System, model and kit for evaluating malignancy degree or probability of thyroid nodule
CN114705794A (en) * 2022-04-15 2022-07-05 西湖大学 Proteomics analysis method for biological sample

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
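唐颖 et al.: "Expression of S100A4 and hMLH1 in thyroid adenoma and follicular carcinoma and its significance for differential diagnosis", 《诊断病理学杂志》 *
张福彬 et al.: "TFF3 and C1orf24 assist in distinguishing benign from malignant thyroid follicular tumors", 《医学信息》 *
张颖 et al.: "Analysis of the clinical value of VEGF and P53 proteins in the differential diagnosis of benign and malignant thyroid tumors", 《现代诊断与治疗》 *
熊金华 et al.: "Research progress on molecular markers of thyroid follicular tumors", 《国际内分泌代谢杂志》 *
胡智祥: "Complete Book of Hospital Clinical Laboratory Technical Operating Standards and Laboratory Management", 31 August 2004 *
范若愚 et al.: "Business Modeling in the Era of Big Data", 31 July 2013 *
詹阳 et al.: "Expression of MMP2, MMP9, Galectin-3, Ckl9, and Ret in thyroid follicular tumors", 《中华医学会病理学分会2005年学术年会》 *
金东岭 et al.: "Study of survivin gene expression, cell proliferation, and apoptosis in thyroid cancer", 《2010年河北省科学技术研究成果公报(第一号)》 *
陈云 et al.: "Expression of RET, Mucin1, and Galectin-3 in benign and malignant thyroid tumors", 《第二军医大学学报》 *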

Also Published As

Publication number Publication date
CN115128285B (en) 2023-01-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant