CN116539892B - Renal clear cell carcinoma protein marker and auxiliary diagnosis model construction method - Google Patents

Info

Publication number
CN116539892B
Authority
CN
China
Prior art keywords
protein
cell carcinoma
clear cell
sample
renal clear
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310433563.9A
Other languages
Chinese (zh)
Other versions
CN116539892A (en
Inventor
崔心刚
金鸽
潘秀武
张梦欢
董克勤
曹建军
周旺
陈文进
李文彦
徐小红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Rendong Bioengineering Co ltd
XinHua Hospital Affiliated To Shanghai JiaoTong University School of Medicine
Original Assignee
Suzhou Rendong Bioengineering Co ltd
XinHua Hospital Affiliated To Shanghai JiaoTong University School of Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Rendong Bioengineering Co ltd, XinHua Hospital Affiliated To Shanghai JiaoTong University School of Medicine
Priority to CN202310433563.9A
Publication of CN116539892A
Application granted
Publication of CN116539892B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6842 Proteomic analysis of subsets of protein mixtures with reduced complexity, e.g. membrane proteins, phosphoproteins, organelle proteins
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2570/00 Omics, e.g. proteomics, glycomics or lipidomics; Methods of analysis focusing on the entire complement of classes of biological molecules or subsets thereof, i.e. focusing on proteomes, glycomes or lipidomes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/70 Mechanisms involved in disease identification
    • G01N2800/7023 (Hyper)proliferation
    • G01N2800/7028 Cancer

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Hematology (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Urology & Nephrology (AREA)
  • Immunology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Food Science & Technology (AREA)
  • Bioethics (AREA)
  • Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Databases & Information Systems (AREA)
  • Cell Biology (AREA)
  • Analytical Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Microbiology (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Physiology (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a group of renal clear cell carcinoma protein markers comprising eight proteins: PLOD3, DCN, SOD3, COL4A2, SPON1, GGH, VWF and SPARC. The invention also discloses a screening method for the markers and a method for constructing an auxiliary diagnosis model of renal clear cell carcinoma based on the protein levels of the markers. By screening renal clear cell carcinoma-specific protein markers and constructing an auxiliary diagnosis model based on the screened markers, the invention provides a practical and effective technical means for early diagnosis and large-scale screening of renal clear cell carcinoma.

Description

Renal clear cell carcinoma protein marker and auxiliary diagnosis model construction method
Technical Field
The invention relates to the field of biomedicine, in particular to the field of renal cancer, and more particularly to a biomarker for renal clear cell carcinoma, the screening of the biomarker, an auxiliary diagnosis model for renal clear cell carcinoma based on the marker, and a method for constructing the model.
Background
Renal cancer is one of the most common malignancies of the urinary system. It ranks 13th among cancers worldwide and accounts for 2.4% of all cancers, with more than 330,000 newly diagnosed cases each year. In 2019, 73,820 new cases and 14,770 deaths were recorded; in China, renal cancer ranks 14th among all cancers. Renal cell carcinoma is a histologically heterogeneous tumor comprising a number of different tumor cell types, each with a distinct molecular phenotype, histological characteristics and clinical characteristics. According to the 2016 WHO classification of renal tumors, renal clear cell carcinoma (ccRCC) accounts for about 70-80% of renal cell carcinoma subtypes, papillary carcinoma (PRCC) for about 10%, chromophobe cell carcinoma (chRCC) for about 5%, clear cell papillary RCC (ccpRCC) for about 4%, and the remaining rare subtypes for about 1%.
ccRCC is the most common malignancy of the adult kidney. Despite improvements in early detection in recent years, more than one-third of patients already have locally advanced tumors or metastasis at their first visit, because of renal compensatory mechanisms and the lack of typical symptoms in the initial stage of renal clear cell carcinoma. About 20% to 40% of patients with localized ccRCC relapse after surgery. Patients with metastasis or recurrence are often insensitive to radiotherapy and chemotherapy, and their five-year survival rate is only 10%. In recent years, with the rapid development and application of high-resolution mass spectrometry, identifying accurate molecular expression profiles at the proteomics level, revealing the intrinsic molecular mechanisms and protein networks of tumors, and screening new targets for early diagnosis, prognosis evaluation and treatment have become a new hotspot of tumor research. In practice, however, clinical tumor proteomics research faces many technical and cost challenges. On the one hand, it requires hardware, software, skilled technical operators and data analysts; on the other hand, appropriate strategies must be chosen for accurate sampling of complex tumor tissue, sample preparation, and mass spectrum data acquisition, analysis and classification. Most importantly, the reproducibility and stability of the screened protein biomarkers must be validated in large-sample, multi-cohort studies. Because of these factors, the clinical proteomics studies reported so far are limited to small-sample (about 100 samples), single-cohort studies.
Disclosure of Invention
The first technical problem to be solved by the present invention is to provide a group of renal clear cell carcinoma protein markers that can be used to distinguish renal clear cell carcinoma from normal tissue.
To solve this technical problem, the renal clear cell carcinoma protein marker combination of the present invention comprises the following 8 secreted proteins: PLOD3, VWF, SPARC, GGH, SOD3, DCN, COL4A2 and SPON1.
The second technical problem to be solved by the invention is to provide a T1a-stage renal clear cell carcinoma protein marker combination comprising the following 6 proteins: PLOD3, VWF, SPARC, GGH, SOD3 and DCN, which can be used to distinguish stage T1a renal clear cell carcinoma from normal tissue.
The third technical problem to be solved by the present invention is to provide another renal clear cell carcinoma protein marker combination comprising the following 6 secreted proteins: PLOD3, VWF, SPARC, LGALS1, ANXA2 and TGM2. The expression of this group of protein markers is up-regulated in renal clear cell carcinoma.
The fourth technical problem to be solved by the invention is to provide a kit comprising detection reagents for any one of the above renal clear cell carcinoma protein marker combinations.
The fifth technical problem to be solved by the present invention is to provide a screening method of the above-mentioned kidney clear cell carcinoma protein marker combination, which comprises the following steps:
1) Acquiring renal clear cell carcinoma proteomics data;
2) Collecting secreted proteins;
3) Preliminary dimension reduction of protein variables;
4) Selecting proteins that differ significantly between cancer samples and normal control samples as renal clear cell carcinoma protein markers.
The proteomic data in step 1) can be obtained by the following method:
11) Collecting paired tumor tissue samples and paracancerous normal tissue samples from patients with renal clear cell carcinoma;
12) Precisely sampling tissue cores;
13) Pretreating the samples obtained in step 12), the pretreatment comprising weighing, dewaxing, hydration, acidic hydrolysis and alkaline hydrolysis;
14) Extracting the sample polypeptides: tissue lysis (preferably PCT-assisted FFPE tissue lysis) and protease cascade digestion (including protein reduction and alkylation, lysyl endonuclease (Lys-C) digestion and trypsin digestion) can be employed;
15) Desalting and solid-phase extraction of the sample polypeptides;
16) Constructing a ccRCC protein mass spectrum library;
17) Acquiring mass spectra of the samples (preferably by the DIA MS method), identifying the sample proteins, and performing proteomic data analysis to obtain the proteomic data. The proteomic data analysis may employ the following method:
Sample screening: (a) for replicate samples, deleting the samples with more missing values; (b) deleting samples with few identified proteins; (c) deleting samples whose cancer tissue and paracancerous normal tissue are unpaired;
The proteins in the screened samples are then treated as follows: (a) retaining proteins expressed in at least 20% of the samples; (b) filling missing values with the minimum protein expression value; (c) log2-transforming the data; (d) removing batch effects using the limma package in the R language.
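The protein-level treatments (a) to (c) above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `preprocess`, the toy intensities and the placeholder protein `RARE` are hypothetical, the fill value is assumed to be the global minimum of the matrix, and batch-effect removal (done with limma in R in the source) is not reproduced here.

```python
import math

def preprocess(matrix, min_fraction=0.2):
    """matrix maps protein -> list of intensities, one per sample (None = missing)."""
    # (a) keep proteins quantified in at least `min_fraction` of the samples
    kept = {
        protein: values
        for protein, values in matrix.items()
        if sum(v is not None for v in values) / len(values) >= min_fraction
    }
    # (b) assumption: missing values are filled with the global minimum intensity
    fill = min(v for values in kept.values() for v in values if v is not None)
    # (c) log2-transform every value
    return {
        protein: [math.log2(fill if v is None else v) for v in values]
        for protein, values in kept.items()
    }

# Toy data only; the numbers are illustrative, not measurements.
toy = {
    "PLOD3": [1024.0, None, 2048.0, 512.0, 4096.0],
    "DCN": [256.0, 256.0, None, 128.0, 512.0],
    "RARE": [None, None, None, None, None],  # hypothetical protein, never detected
}
processed = preprocess(toy)
print(processed)
```

In this toy run the never-detected protein is dropped and each missing value becomes log2 of the smallest observed intensity.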
The secreted proteins in step 2) are preferably collected as follows: secreted proteins are screened from the UniProt database, plasma and extracellular vesicle proteins are collected from the neXtProt and Human Protein Atlas databases, and the proteins in the intersection of the two are taken as secreted proteins expressed in plasma and extracellular vesicles.
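One way to read this collection step is as plain set arithmetic. The sketch below assumes that the plasma and extracellular vesicle proteins from the two databases are pooled first and then intersected with the UniProt secreted list; the text could also be read differently, so treat this as one interpretation. The function name and the placeholder identifiers are hypothetical.

```python
def collect_secreted(uniprot_secreted, nextprot_plasma_ev, atlas_plasma_ev):
    """Return proteins annotated as secreted and also seen in plasma or vesicles."""
    # Assumption: pool the plasma / extracellular vesicle lists of both databases
    plasma_ev = set(nextprot_plasma_ev) | set(atlas_plasma_ev)
    # Keep only proteins that are also annotated as secreted
    return set(uniprot_secreted) & plasma_ev

# Placeholder identifiers, not real database content
secreted = collect_secreted(
    uniprot_secreted={"PROT_A", "PROT_B", "PROT_C"},
    nextprot_plasma_ev={"PROT_A", "PROT_D"},
    atlas_plasma_ev={"PROT_B", "PROT_E"},
)
print(sorted(secreted))
```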
The preliminary dimension reduction in step 3) is preferably performed as follows: the protein variables are intersected with the secreted proteins collected in step 2), and protein variables outside the intersection are deleted; independent variables with extremely small variance, variables strongly correlated with other independent variables, and variables with multicollinearity problems are then deleted.
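The filtering described above can be illustrated with a small, dependency-free sketch. The thresholds `min_var` and `max_abs_r`, the greedy order in which correlated variables are dropped, and the placeholder protein names are all assumptions; the source does not state them, and a dedicated multicollinearity check is left out.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def reduce_variables(data, secreted, min_var=1e-3, max_abs_r=0.9):
    """data maps protein -> list of log2 levels; returns the retained proteins."""
    # 1) keep only proteins that are in the secreted-protein set
    candidates = {p: v for p, v in data.items() if p in secreted}
    # 2) drop variables with (near-)zero variance
    candidates = {p: v for p, v in candidates.items() if pvariance(v) > min_var}
    # 3) greedily drop variables strongly correlated with an already kept one
    kept = []
    for p in sorted(candidates):
        if all(abs(pearson(candidates[p], candidates[q])) < max_abs_r for q in kept):
            kept.append(p)
    return kept

# Toy values for illustration only
data = {
    "PROT_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "PROT_B": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly proportional to PROT_A
    "PROT_C": [5.0, 1.0, 4.0, 2.0, 3.0],
    "PROT_D": [7.0, 7.0, 7.0, 7.0, 7.0],   # constant, zero variance
    "PROT_E": [1.0, 3.0, 2.0, 5.0, 4.0],   # not in the secreted set
}
selected = reduce_variables(data, secreted={"PROT_A", "PROT_B", "PROT_C", "PROT_D"})
print(selected)
```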
The sixth technical problem to be solved by the invention is to provide a method for constructing an auxiliary diagnosis model of renal clear cell carcinoma, in which the model is built with the GBM algorithm on the basis of any one of the above renal clear cell carcinoma protein marker combinations.
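To make the idea of a gradient boosting machine (GBM) classifier concrete, here is a deliberately small from-scratch sketch: depth-1 trees (stumps) fitted to the gradient of the logistic loss. It is not the software or the hyperparameters used in the patent, which are not given in this passage; the function names, the learning rate, the number of trees and the two-feature toy data are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, left_mean, right_mean)
    return best[1:]

def fit_gbm(X, y, n_trees=50, learning_rate=0.3):
    """Boost stumps on the pseudo-residuals y - p of the logistic loss."""
    prior = sum(y) / len(y)
    base = math.log(prior / (1 - prior))  # initial log-odds
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        scores = [s + learning_rate * (lm if row[j] <= t else rm)
                  for row, s in zip(X, scores)]
    return base, learning_rate, stumps

def predict_proba(model, row):
    base, lr, stumps = model
    score = base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)
    return sigmoid(score)

# Hypothetical log2 levels of two markers; 1 = tumor, 0 = normal adjacent tissue
X = [[12.1, 8.0], [11.8, 7.5], [12.5, 8.2], [11.9, 7.9],
     [9.0, 8.1], [9.4, 7.7], [8.8, 8.3], [9.1, 7.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = fit_gbm(X, y)
print(predict_proba(model, [12.0, 8.0]), predict_proba(model, [9.2, 8.0]))
```

A production model would normally rely on an established GBM library and tune tree depth, shrinkage and the number of trees by cross-validation, as the parameter optimization figures suggest.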
The seventh technical problem to be solved by the invention is to provide an auxiliary diagnosis model of the renal clear cell carcinoma constructed by the model construction method.
The invention uses pressure cycling technology (PCT)-assisted FFPE sample preparation together with DIA label-free quantitative proteomics to quantitatively analyze the differential proteomes of large-sample, multi-cohort paired ccRCC clinical samples (cancer and paracancerous normal tissues). By comparing the proteomic characteristics of ccRCC tumors and paracancerous normal tissues, key proteins involved in ccRCC lesions are screened out as markers, and an auxiliary diagnosis model of renal clear cell carcinoma is constructed from the expression levels of these protein markers. The model's predictions are highly reproducible, fast and inexpensive, and effectively improve the accuracy of early auxiliary diagnosis and prognosis evaluation of ccRCC.
Drawings
FIG. 1 is a schematic of accurate sampling in FFPE tissue. Panels a and e are FFPE sections of cancer and paracancerous normal tissue. Panels b, c and d show, under the microscope, the accurately positioned punch coring position (circular blank) and surrounding tissue after hematoxylin-eosin staining of cancerous tissue. Panels f, g and h show, under the microscope, the accurately positioned punch coring position (circular blank) after hematoxylin-eosin staining of paracancerous normal tissue.
FIG. 2 is a graph of the intersection of proteins identified in the label-free quantitative proteomics data of the 6 pooled samples.
Fig. 3 is a distribution of the Spearman correlation coefficients of the replicate samples.
Fig. 4 is a training parameter optimization diagram of the renal clear cell carcinoma auxiliary diagnostic model.
Fig. 5 is a graph of AUC predicted by the renal clear cell carcinoma auxiliary diagnostic model.
FIG. 6 is a training parameter optimization diagram of the T1a stage renal clear cell carcinoma auxiliary diagnostic model.
FIG. 7 is a graph of the predicted AUC of the T1a stage renal clear cell carcinoma auxiliary diagnostic model.
FIG. 8 is a training parameter optimization diagram of the renal clear cell carcinoma auxiliary diagnostic model built from proteins up-regulated in cancer.
Fig. 9 is a graph of the AUC predicted by the renal clear cell carcinoma auxiliary diagnostic model built from proteins up-regulated in cancer.
Detailed Description
For a more specific understanding of the technical content, features and effects of the present invention, the technical solution of the present invention will be described in further detail with reference to the accompanying drawings and specific examples.
EXAMPLE 1 acquisition of renal clear cell carcinoma proteomics data
1. Sample collection
The formalin-fixed paraffin-embedded (FFPE) samples of renal clear cell carcinoma (clear cell Renal Cell Carcinoma, ccRCC) used in this example came from the following 5 domestic hospitals: Shanghai Changhai Hospital (CH), Yijishan Hospital (YJS), Eastern Hepatobiliary Surgery Hospital (DFGD), Gongli Hospital (GL) and Taizhou Hospital (TZ). All patients from whom the FFPE samples were taken were diagnosed with ccRCC and had not received any antitumor treatment before surgery. All FFPE samples screened were paired Tumor (T) and Normal Adjacent Tissue (NAT) samples, totaling 1862 samples (including biological replicates) from 755 patients, comprising 948 tumor samples and 914 normal controls. Each cohort is named by combining the acronym "ccRCC" with the initials of the name of the hospital that supplied the samples. The Changhai Hospital samples were collected in two batches and, to account for batch effects, were divided into two cohorts. The final number of samples in each cohort is: ccRCC-CH-1, 542 samples from 229 patients; ccRCC-CH-2, 707 samples from 274 patients; ccRCC-YJS, 295 samples from 124 patients; ccRCC-DF, 190 samples from 79 patients; ccRCC-GL, 81 samples from 30 patients; and ccRCC-TZ, 47 samples from 19 patients. Clinical information for the samples of each cohort is shown in table 1.
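As a quick consistency check, the per-cohort figures can be summed and compared with the stated totals. The snippet reads each pair of numbers as (samples, patients), which is an interpretation of the text above; the variable names are illustrative.

```python
# (samples, patients) per cohort, as read from the paragraph above
cohorts = {
    "ccRCC-CH-1": (542, 229),
    "ccRCC-CH-2": (707, 274),
    "ccRCC-YJS": (295, 124),
    "ccRCC-DF": (190, 79),
    "ccRCC-GL": (81, 30),
    "ccRCC-TZ": (47, 19),
}

total_samples = sum(samples for samples, _ in cohorts.values())
total_patients = sum(patients for _, patients in cohorts.values())
tumor_plus_normal = 948 + 914  # tumor samples + normal controls

print(total_samples, total_patients, tumor_plus_normal)
```

Both sums agree with the totals given in the text (1862 samples from 755 patients), and the tumor and normal counts also add up to 1862.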
Table 1 all patients in the cohort were diagnosed with renal clear cell carcinoma
Note: TX indicates that the clinical stage cannot be determined from the collected clinical data; GX indicates that the pathological grade cannot be determined from the collected clinical data.
2. FFPE tissue accurate sampling
Sections of the collected formalin-fixed paraffin-embedded tissue samples of each patient were stained with hematoxylin-eosin and examined under a microscope to confirm tumor tissue and normal adjacent tissue, and the coring position of the punch was accurately located under the microscope. A punch sampler was then used to punch tissue cores out of the FFPE tissue samples; each core was required to have a diameter of 1 mm and a weight of about 1-1.5 mg. Three tissue cores were taken per case as biological replicate samples (as shown in fig. 1).
3. Pretreatment of FFPE samples
1. The following reagents and buffers were prepared: (1) reagents: heptane, 100% ethanol, 90% ethanol, 75% ethanol, 0.1% formic acid; (2) buffer: 0.1 M Tris-HCl (pH = 10). The 0.1 M Tris-HCl (pH = 10) must be prepared fresh for each experiment, and its pH must be checked carefully before use.
2. Sample weighing: each FFPE tissue core was weighed on a balance, with the weight controlled between 0.6 and 1 mg, and the weight was recorded for calculating the protease dose and the polypeptide yield of the sample.
3. Dewaxing: the weighed FFPE tissue cores were placed in a 2 ml round-bottom EP tube, 1 ml of heptane was added, and the tube was placed in a thermostatic sample mixer for vortex dewaxing with the following instrument parameters: temperature 25 °C, rotational speed 800 rpm, 10 minutes. Afterwards, the heptane in the EP tube was removed with a micropipette, 1 ml of heptane was added, and the dewaxing step was repeated once.
4. Hydration: the heptane in the EP tube was removed with a micropipette, 1 ml of 100% ethanol was added, and the tube was placed in the thermostatic sample mixer with the following instrument parameters: temperature 25 °C, rotational speed 800 rpm, 5 minutes. Afterwards, the supernatant in the EP tube was removed with a micropipette, 200 μl of 90% ethanol was added, and the tube was mixed under the same parameters (25 °C, 800 rpm, 5 minutes). The supernatant was then removed again, 200 μl of 75% ethanol was added, and the tube was mixed once more under the same parameters (25 °C, 800 rpm, 5 minutes).
5. Acid hydrolysis: the supernatant in the EP tube was removed with a micropipette, and the dewaxed, hydrated sample tissue was transferred with forceps to new PCT microtubes. 150 μl of 0.1% formic acid was added to each microtube, which was capped and placed in the thermostatic sample mixer with the following instrument parameters: temperature 30 °C, rotational speed 600 rpm, 30 minutes.
6. Alkaline hydrolysis: the supernatant in the PCT microtubes was removed with a micropipette, and the microtubes were washed once with 150 μl of 0.1 M Tris-HCl (pH = 10) to remove residual formic acid (this ensures a pH of 10 in the microenvironment of each tissue). After washing, the supernatant was removed with a micropipette (handle carefully to ensure that no small tissue fragments are removed), and 15 μl of 0.1 M Tris-HCl (pH = 10) was added to each PCT microtube. The capped microtube was placed in a new 2 ml EP tube and then in the thermostatic sample mixer with the following instrument parameters: temperature 95 °C, rotational speed 600 rpm, 30 minutes. Immediately afterwards, the PCT microtubes were placed on ice to cool the samples rapidly to 4 °C.
4. PCT-assisted FFPE tissue lysis and protease cascade digestion for extracting polypeptides
1. Preparing a lysate and a buffer solution:
The following reagents were prepared: 6 M urea (Sigma); 2 M thiourea (amerco); 5 mM Na2EDTA in 100 mM ammonium bicarbonate (Shanghai Titan Scientific, pH 8.5); 0.1 M tris(2-carboxyethyl)phosphine (TCEP) in 100 mM ammonium bicarbonate (ABB); 800 mM iodoacetamide (IAA) in 100 mM ABB; 10% trifluoroacetic acid (TFA).
Preparing the lysis buffer: 5 mM Na2EDTA was dissolved in 100 mM ammonium bicarbonate (pH = 8), 0.1 M TCEP was dissolved in 100 mM ABB, and 800 mM iodoacetamide (IAA) was dissolved in 100 mM ABB (wrapped in aluminum foil to protect it from light). The lysis buffer can be prepared in one batch according to the total number of samples, then aliquoted and stored in a refrigerator at -20 °C until use. TCEP, IAA and ABB should be freshly prepared.
2. PCT-assisted FFPE tissue lysis: 25 μl of lysis buffer (containing 6 M urea, 2 M thiourea and 100 mM ammonium bicarbonate) was added to each PCT microtube containing tissue (already holding 15 μl of FFPE tissue and supernatant) to a final volume of 40 μl, and each PCT microtube was then capped with a micropestle. PCT-assisted tissue lysis was then performed using a Barocycler NEP2320 45K. The PCT protocol for tissue lysis was set as follows: each cycle comprised 30 s at a high pressure of 45 kpsi and 10 s at atmospheric pressure, at 30 °C, for a total of 90 cycles (ensure that the air circulation pump is working properly).
3. Protein reduction and alkylation: the micropestle cap of the PCT microtube was removed, 10 mM tris(2-carboxyethyl)phosphine (TCEP) and 20 mM iodoacetamide (IAA) were added to the tissue lysate, and the mixture was placed in the thermostatic sample mixer and incubated for 30 minutes at room temperature under slow vortexing (600 rpm), protected from light.
4. Lysyl endonuclease (Lys-C) digestion: mass spectrometry grade lysyl endonuclease (Lys-C) was added to the PCT microtube to digest the protein at an enzyme to substrate ratio of 1:40. The solution volume in the PCT microtube was then made up to 100 μl with 0.1 M ammonium bicarbonate, and after capping the PCT microtube, PCT-assisted protein digestion was performed using the Barocycler NEP2320 45K. The PCT protocol used was: each cycle comprised 50 s at a high pressure of 20 kpsi and 10 s at atmospheric pressure, at 30 °C, for 45 cycles (ensure that the air circulation pump is working properly).
5. Trypsin digestion: mass spectrometry grade trypsin was added to the PCT microtube to digest the protein at an enzyme to substrate ratio of 1:20. The solution volume in the PCT microtube was then made up to 150 μl with 0.1 M ammonium bicarbonate, which further dilutes the urea concentration to 1 M for trypsin digestion. After capping the PCT microtube, PCT-assisted protein digestion was performed using the Barocycler NEP2320 45K. The PCT protocol used was: each cycle comprised 50 s at a high pressure of 20 kpsi and 10 s at atmospheric pressure, at 30 °C, for 90 cycles (ensure that the air circulation pump is working properly).
6. Terminating digestion: the polypeptide digest in the PCT microtube was transferred with a micropipette to a new 2 ml EP tube (avoiding the transfer of tissue fragments), and digestion was stopped by adding 15 μl of 10% trifluoroacetic acid (TFA) to acidify the digest to pH 2-3. The sample was stored at 4 °C or -20 °C for further experiments.
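The pressure-cycling times and the stepwise urea dilution in the steps above are simple arithmetic, sketched here. The calculation uses nominal volumes only (25 μl of 6 M urea lysis buffer brought to 40, 100 and 150 μl) and ignores pressure ramp times and the small volumes of added reagents and enzymes; the function names are illustrative.

```python
def pct_minutes(cycles, high_pressure_s, ambient_s):
    """Total programmed time of a pressure-cycling protocol, in minutes."""
    return cycles * (high_pressure_s + ambient_s) / 60

def diluted_molar(stock_molar, stock_ul, final_ul):
    """Concentration after bringing `stock_ul` of stock to `final_ul` total."""
    return stock_molar * stock_ul / final_ul

lysis_min = pct_minutes(90, 30, 10)    # tissue lysis, 45 kpsi
lysc_min = pct_minutes(45, 50, 10)     # Lys-C digestion, 20 kpsi
trypsin_min = pct_minutes(90, 50, 10)  # trypsin digestion, 20 kpsi

urea_lysis = diluted_molar(6.0, 25, 40)     # during lysis
urea_lysc = diluted_molar(6.0, 25, 100)     # during Lys-C digestion
urea_trypsin = diluted_molar(6.0, 25, 150)  # during trypsin digestion

print(lysis_min, lysc_min, trypsin_min)
print(urea_lysis, urea_lysc, urea_trypsin)
```

The last value reproduces the 1 M urea concentration stated for the trypsin step.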
5. Desalting and solid-phase extraction of sample polypeptides
1. Reagents and materials:
Buffers and reagents include: methanol; 80% acetonitrile (ACN) with 0.1% trifluoroacetic acid (TFA); 2% ACN with 0.1% TFA; 0.1% TFA (for sample redissolution). Check that the sample pH is acidic (pH 2-3); if the sample pH is above 3, add an appropriate amount of 0.1% TFA to adjust it.
The capacity of the C18 solid phase extraction column is 70-160 μg (Nest Group Inc.). The capacity of the C18 solid phase extraction column is dependent on the amount of polypeptide, and can retain non-polar solutes, including polypeptides, proteins and detergents, while salts and polar materials (e.g., DNA) will not be retained, allowing removal of salts and SDS from the MS sample. The use of 1% TFA (trifluoroacetic acid) will increase the binding of the polypeptide to the protein.
2. Activation of the C18 solid phase extraction column: the C18 solid phase extraction column was placed in a new EP tube and the EP tube was placed in a centrifuge. 200 μl of methanol was added to the column, which was then centrifuged at 800 rpm for 30 s; this was repeated 3 times.
3. Equilibration of the C18 solid phase extraction column: the waste liquid in the EP tube was removed, 200 μl of buffer containing 80% acetonitrile (ACN, HPLC grade) and 0.2% TFA was added to the column, and the column was centrifuged at 800 rpm for 30 s; this was repeated 3 times.
4. Cleaning of the C18 solid phase extraction column: the waste liquid in the EP tube was removed, 200 μl of buffer containing 2% ACN and 0.2% TFA was added to the column, and the column was centrifuged at 800 rpm for 30 s; this was repeated 3 times.
5. Loading the sample polypeptide: the C18 solid phase extraction column was placed in a fresh EP tube, the EP tube was placed in the centrifuge, and 160 μl of sample polypeptide solution was added to the column, which was then centrifuged at 1000 rpm for 1 min.
6. Desalting: the waste liquid in the EP tube was removed, 200 μl of buffer containing 2% ACN and 0.2% TFA was added to the column, and the column was centrifuged at 1000 rpm for 30 s; this was repeated 10 times (discarding the waste liquid in the EP tube after each wash).
7. Sample polypeptide elution: the C18 solid phase extraction column was placed in a new 2 ml EP tube labeled with the sample number, 150 μl of buffer containing 80% ACN and 0.1% TFA was added to the column, and the column was centrifuged at 1400 rpm for 30 s; the elution was performed twice.
8. Concentrating and drying the sample polypeptide: the C18 solid phase extraction column was removed, and the EP tube containing the eluate was placed in an Eppendorf Concentrator plus vacuum centrifugal concentrator to dry the sample. The dried sample polypeptides were stored at -80 °C until MS analysis.
9. Redissolving the sample polypeptide and measuring its concentration: the dried sample polypeptide was dissolved in 15 μl of HPLC-grade H2O containing 0.1% formic acid and 2% acetonitrile. To facilitate solubilization, the polypeptide solution was sonicated in an ice-water bath for 5 minutes. The concentration of each sample polypeptide was measured on a NanoDrop 1000 spectrophotometer using the Protein A280 setting (1 Abs = 1 mg/ml). According to the measured result, the concentration of each sample polypeptide was adjusted to 0.5 μg/μl with HPLC-grade H2O containing 0.1% formic acid and 2% acetonitrile for mass spectrometry detection.
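The final concentration adjustment follows from C1·V1 = C2·V2. The helper below is a minimal sketch; the function name and the example reading of 1.5 μg/μl are hypothetical, while the 15 μl volume and the 0.5 μg/μl target come from the step above.

```python
def diluent_volume_ul(measured_ug_per_ul, sample_ul, target_ug_per_ul=0.5):
    """Volume of solvent to add so that the sample reaches the target concentration."""
    if measured_ug_per_ul < target_ug_per_ul:
        raise ValueError("sample is already below the target concentration")
    final_ul = measured_ug_per_ul * sample_ul / target_ug_per_ul  # C1*V1 = C2*V2
    return final_ul - sample_ul

# Hypothetical reading: 1.5 ug/ul measured in the 15 ul redissolved sample
to_add = diluent_volume_ul(1.5, 15.0)
print(to_add)
```

Samples that read below the target cannot be corrected by dilution, which is why the helper raises an error in that case.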
6. Construction of ccRCC protein Mass Spectrometry library (IDA database)
1. IDA library sample preparation
To establish a spectral library for analyzing DIA (data-independent acquisition) files of renal clear cell carcinoma and normal renal tissue samples, 20 tumor tissue samples and 20 paracancerous normal tissue samples were selected from the 6 cohorts, covering male and female patients of different ages and all Fuhrman grades and TNM stages. Library sample polypeptides were extracted by PCT-assisted lysis and protease cascade digestion as described above. The 20 tumor sample polypeptides were pooled together, and the 20 paracancerous normal tissue sample polypeptides were pooled together. After C18 solid-phase extraction desalting, concentration and drying, the polypeptides were fractionated on an UltiMate 3000 Nano LC with an XBridge Peptide BEH C18 column (5 μm, 4.6 mm × 250 mm; Waters, Milford, MA). Peptides were separated with a 5-35% ACN liquid chromatography gradient in 10 mM ammonia (pH = 10.0) at a flow rate of 1 mL/min for 60 min. The polypeptides were collected as 60 fractions, concentrated, redissolved in HPLC-grade water containing 0.1% formic acid and 2% acetonitrile, and combined into 15 fractions (1, 16, 31, 46; 2, 17, 32, 47; …). The concentration of each of the 15 tumor tissue fractions and 15 paracancerous normal tissue (NAT) fractions was adjusted to 0.5 μg/μl, 13.5 μl of each was pipetted into a mass spectrometry vial, and 1.5 μl of iRT (final concentration 10%, v/v) was added to the polypeptide solution for RT calibration.
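The pooling pattern quoted above (1, 16, 31, 46; 2, 17, 32, 47; …) combines every 15th fraction. The sketch below generates the full scheme and checks the stated iRT proportion; the function name is illustrative, and the continuation of the pattern beyond the two listed pools is inferred from them.

```python
def concatenate_fractions(n_fractions=60, n_pools=15):
    """Pool k receives fractions k, k + n_pools, k + 2 * n_pools, ..."""
    return [list(range(start, n_fractions + 1, n_pools))
            for start in range(1, n_pools + 1)]

pools = concatenate_fractions()
print(pools[0], pools[1], pools[-1])

# iRT share of the vial: 1.5 ul iRT added to 13.5 ul of polypeptide solution
irt_fraction = 1.5 / (13.5 + 1.5)
print(irt_fraction)
```

Each pool therefore holds four fractions taken from evenly spaced points of the 60-minute gradient.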
IDA library sample MS acquisition
The 30 library sample polypeptide fractions were separated using the UltiMate Nano 3000 liquid chromatography system (DIONEX UltiMate 3000 RSLCnano system). An aliquot of the enzymatically digested sample polypeptide (0.5 μg/μl) was injected onto a C18 chromatographic column (1.9 μm, C18, 15 cm × 75 μm ID) and eluted with a 3%-28% linear liquid chromatography gradient (buffer A: MS (mass spectrometry)-grade water containing 2% ACN and 0.1% FA; buffer B: MS-grade water containing 98% ACN and 0.1% FA) at a flow rate of 300 nL/min for 90 min (148 min inject-to-inject). The polypeptides eluted from the column were ionized at +1.9 kV and introduced into a Q Exactive™ HF mass spectrometer (Q Exactive™ HF hybrid quadrupole-Orbitrap). The AGC target was set to 3E6 charges and the ion funnel RF to 40, and full MS scans were measured in the Orbitrap at a resolution of 60,000 (m/z 200). MS/MS spectra were acquired in the Orbitrap detector at a resolution of 30,000 (m/z 200), with an AGC target of 1E5 charges, a maximum injection time of 80 ms and a dynamic exclusion time of 30 s.
3. Construction of ccRCC protein Mass Spectrometry library
All acquired raw DDA (data-dependent acquisition) files were converted to mzXML format using MSConvert from ProteoWizard, with the parameters set to '--mzXML --filter "peakPicking true [1,2]"'. The DDA files were then searched against a SwissProt FASTA file using pFind version 3.1.3, finally yielding 93 IDA (information-dependent acquisition) data files. A ccRCC protein mass spectrum library was constructed with MaxQuant 1.6.7 from the search results generated from the 93 IDA data files; it contains 59731 unique polypeptide sequences, 6285 protein groups and 6383 proteins.
7. Mass spectrum acquisition of samples
To further eliminate the differences in proteomics methods, we used the Orbitrap-based DIA MS strategy to acquire the samples from Shanghai Changhai Hospital (ccRCC-CH-1 and ccRCC-CH-2), and the remaining 4 cohorts (ccRCC-YJS, ccRCC-DF, ccRCC-GL, ccRCC-TZ) were acquired using TOF-based PulseSWATH.
DIA MS acquisition
Polypeptides were separated using the UltiMate Nano 3000 liquid chromatography system (DIONEX UltiMate 3000 RSLCnano system). An aliquot of the enzymatically digested sample polypeptide (0.5 μg/μl) was injected onto a C18 chromatographic column (1.9 μm, C18, 15 cm × 75 μm ID) and separated with a 3%-28% linear liquid chromatography gradient (buffer A: MS-grade water containing 2% ACN and 0.1% FA; buffer B: MS-grade water containing 98% ACN and 0.1% FA) at a flow rate of 300 nL/min for 45 min (68 min inject-to-inject). The polypeptides eluted from the column were ionized at 2.0 kV and introduced into a Q Exactive™ HF mass spectrometer (Q Exactive™ HF hybrid quadrupole-Orbitrap). The AGC target was set to 3E6 charges and the maximum injection time to 100 ms, and full MS scans over the 400-1200 m/z range were acquired in the Orbitrap at a resolution of 60,000 (m/z 200). MS/MS scans were acquired in a total of 24 overlapping windows, each at a resolution of 30,000 (m/z 200), with an AGC target of 1E6 charges, a normalized collision energy of 27%, the default charge state set to 2 and the maximum injection time set to auto. The isolation window centers of the 24 MS/MS scans are (m/z): 410, 430, 450, 470, 490, 510, 530, 550, 570, 590, 610, 630, 650, 670, 690, 710, 730, 770, 790, 820, 860, 910, 970. One full MS and MS/MS acquisition cycle took approximately 3 seconds and was repeated throughout the LC/MS analysis. The raw DIA files were analyzed using OpenSWATH v2.0.
SWATH MS acquisition
The SWATH (sequential window acquisition of all theoretical spectra) MS acquisition method adopted in this embodiment is a novel three-part Pulse SWATH MS acquisition method. It optimizes the conventional SWATH-MS analysis method, in which a single MS injection is scanned across the whole mass range (400-1200 m/z, 24 windows): the polypeptide precursors are separated in the gas phase into a plurality of windows over three m/z ranges, and the windows are then distributed in sequence, in a pulsed manner, across 3 MS injections (20min_100windows_Pulse SWATH_3ul_3part), finally giving 100 MS/MS scanning windows. Mass spectrometry was performed on an Ekspert™ NanoLC system (Eksigent, Dublin, CA, USA) coupled to a TripleTOF 6600 system (SCIEX, CA, USA). First, 3 μl of the digested sample polypeptide (0.5 μg/μl) was injected onto a trap column (5 μm, ChromXP C18CL, 10 × 0.3 mm) and enriched and washed at a flow rate of 10 μl/min for 3 min. The polypeptides were then separated on an analytical column (3 μm, ChromXP C18CL, 150 × 0.3 mm) with a 3%-28% linear liquid chromatography gradient (buffer A: MS-grade water containing 2% ACN and 0.1% FA; buffer B: MS-grade water containing 98% ACN and 0.1% FA) at a flow rate of 5 μl/min for 20 min. Acquisition was in positive ion mode with full scanning, at a nano-electrospray voltage of 1.3 kV. The SWATH method consists of a 75 ms TOF-MS scan over the range 400-1200 Da (m/z), followed in a cyclic manner by MS/MS scans of all precursors over the range 100-1500 Da (m/z). The accumulation time was set to 25 ms per isolation window, with a total cycle time of 2.7 s. A β-galactosidase digest (β-gal) (SCIEX, 4465867) for mass and RT calibration was injected once after every four samples, and the instrument monitored the target polypeptide precursor ion in the β-gal digestion mixture (m/z = 729.4) in high-sensitivity mode.
LC Quality Control (QC), MS sensitivity and mass calibration were performed using the m/z values of the target precursor and fragment ions.
8. Raw data analysis
Sample data processing for DIA MS acquisition
The raw DIA files were converted to mzXML format using MSConvert and analyzed using OpenSWATH (2.0). Sample proteins were identified against the DIA pan-human protein mass spectrometry library (DPHL; 396245 polypeptides and 14786 protein groups in total). The retention time extraction window was set to 600 seconds, the RT extraction window to 150 s and the m/z extraction tolerance to 0.03 Da, and the RT was then calibrated with the iRT polypeptides. The polypeptide precursor matrix extracted from the same sample was identified using the R program (https://github.com/Allen188/DIATools) in combination with the OpenSWATH (version 1.3.0) and PyProphet (version 2.1.3) software tools (FDR < 0.01, CV < 10%). Finally, the mrkvw file containing the sample protein information was loaded into MarkerView software (Version 1.1.1, AB Sciex) for relative quantitative analysis of the proteins, and the protein matrix of each cohort was exported.
Sample data processing for SWATH MS acquisition
Since the SWATH MS acquisition of this embodiment uses the three-part Pulse SWATH MS acquisition method, 3 SWATH wiff raw files (part1, part2, part3) are generated for each sample. The SWATH wiff raw files were first converted to mzXML files using ProteoWizard MSConvert. Analysis was then performed using OpenSWATH (2.0); the input files comprised the mzXML files, the TraML file of the ccRCC spectral library and the TraML file of the iRT polypeptides. The ccRCC protein mass spectrum library constructed in this example was used for identification of sample proteins. To analyze Pulse SWATH mass spectral data generated from discrete m/z windows, this example developed a script that identifies conserved high-abundance polypeptides (CiRT) for retention time calibration. In the three-part Pulse SWATH MS acquisition method, the three parts are each analyzed with OpenSWATH using a CiRT file in TraML format, and the three matrices obtained from OpenSWATH are then combined for final polypeptide and protein identification. Briefly, the polypeptide length was set to the range 7-30, the precursor m/z to the range 400-1200, and the product ion m/z to the range 100-1500. The retention time extraction window was set to 120 s and the m/z extraction tolerance to 0.05 Da. The retention time was then calibrated using the conserved iRT (CiRT) polypeptides as described above, with all other settings left at their defaults. Identification rate and false discovery rate (FDR) analysis were selected, and the FDR was set to < 1%. High-confidence polypeptide features were aligned across samples using the TRIC (transfer of identification confidence) algorithm with the parameters max_rt_diff=30, method=global_best_overlap, nr_high_conf_exp=2, target_fdr=0.001, use_score_filter=1. Label-free protein-level quantification was performed using the protein inference function of the R package aLFQ.
The parameters were set to peptide_method="top", peptide_topx=3, transition_topx=5, consensus_proteins=FALSE, consensus_peptides=FALSE and consensus_transitions=FALSE. The protein matrices of the four cohorts ccRCC-YJS, ccRCC-DF, ccRCC-GL and ccRCC-TZ were derived using the ccRCC protein mass spectrum library (named ccRCC20190917 IDA93_decoy) constructed in this example. As in the DIA MS data processing method described above, the R program was used in combination with OpenSWATH (version 1.3.0) and PyProphet (version 2.1.3) to identify the polypeptide precursor matrices extracted from the same sample (FDR < 0.01, CV < 10%). Finally, the mrkvw file containing the sample protein information was loaded into MarkerView software (Version 1.1.1, AB Sciex) for relative quantitative analysis of the proteins, and the protein matrix of each cohort was exported.
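The "top" quantification settings above (the 3 most intense peptides per protein, the 5 most intense transitions per peptide) can be illustrated with a simplified Python sketch. This is not the aLFQ implementation: it works on a single sample, and it assumes transitions are summed per peptide and peptides are averaged per protein. The function name and the intensity values are hypothetical.

```python
from statistics import mean


def top_n_protein_intensity(peptides, peptide_topx=3, transition_topx=5):
    """Estimate one protein's intensity in one sample.

    peptides: dict mapping a peptide sequence to its list of transition intensities.
    Assumptions: the top transitions are summed into a peptide intensity, and the
    top peptide intensities are averaged into the protein intensity.
    """
    peptide_intensities = []
    for transitions in peptides.values():
        top_transitions = sorted(transitions, reverse=True)[:transition_topx]
        peptide_intensities.append(sum(top_transitions))
    top_peptides = sorted(peptide_intensities, reverse=True)[:peptide_topx]
    return mean(top_peptides)


# Invented intensities for four peptides of one protein
example = {
    "PEPTIDEA": [100, 90, 80, 70, 60, 50],  # top 5 transitions sum to 400
    "PEPTIDEB": [40, 30, 20],               # sums to 90
    "PEPTIDEC": [200, 100],                 # sums to 300
    "PEPTIDED": [10, 5],                    # sums to 15, not among the top 3
}
protein_intensity = top_n_protein_intensity(example)
print(round(protein_intensity, 2))
```

Restricting the estimate to the most intense peptides and transitions makes it less sensitive to weak, noisy signals, which is the usual motivation for top-N approaches in label-free quantification.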
3. Proteomic data quality control
In the identification results of the label-free quantitative proteomics data from the samples of the 6 cohorts, 8611 proteins were identified in the ccRCC-CH-1 cohort, 8762 in ccRCC-CH-2, 4996 in ccRCC-YJS, 5007 in ccRCC-DF, 4992 in ccRCC-GL and 4829 in ccRCC-TZ. The two cohorts analyzed by the DIA MS method (ccRCC-CH-1 and ccRCC-CH-2) shared 8371 proteins, an overlap of up to 96.4% (see panel A in FIG. 2). The four cohorts analyzed by the SWATH MS method shared 4739 proteins, an overlap of up to 95.6% (see panel B in FIG. 2). The six cohorts together shared 4633 proteins (see panel C in FIG. 2). These results indicate good overlap among the data sets.
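The text does not state how the overlap percentages were calculated. As a sketch under one assumed definition (the number of shared proteins divided by each cohort's protein count, averaged over the cohorts), the following Python snippet reproduces the reported 96.4% and 95.6%; the function name is hypothetical.

```python
def mean_overlap(intersection_size, cohort_sizes):
    """Average fraction of each cohort's proteins that lie in the shared set.

    Assumption: overlap = mean over cohorts of (shared proteins / cohort proteins).
    """
    return sum(intersection_size / size for size in cohort_sizes) / len(cohort_sizes)


dia_overlap = mean_overlap(8371, [8611, 8762])                # ccRCC-CH-1, ccRCC-CH-2
swath_overlap = mean_overlap(4739, [4996, 5007, 4992, 4829])  # YJS, DF, GL, TZ

print(f"DIA cohorts:   {dia_overlap:.1%}")
print(f"SWATH cohorts: {swath_overlap:.1%}")
```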
Data quality was assessed using technical replicates. Of the 1862 samples from 755 patients, 222 had two technical replicates, 71 had three, and 7 had four. The Spearman correlation coefficient of the replicate samples was calculated in R (see FIG. 3); the average correlation coefficient was 0.866, and 82.1% of the correlation coefficients were greater than 0.8, indicating that the mass spectrometry data obtained in this project are reproducible.
4. Proteomic data analysis
The 1862 samples were processed as follows: (a) for replicate samples, the replicate with the fewest missing values was retained, leaving 1477 non-replicate samples; (b) samples with few identified proteins were deleted so that at least 1763 proteins were identified in every sample, leaving 1428 samples; (c) unpaired samples were deleted, leaving 1352 samples in which cancer tissue and paracancerous normal tissue were fully paired.
The proteins in the 1352 samples were processed as follows: (a) proteins expressed in at least 20% of the samples were retained; (b) the minimum protein expression value was used to fill missing values; (c) the data were log2-transformed; (d) the limma package in the R language was used to remove the batch effect among the 6 cohorts. Table 2 shows the number of samples remaining after data processing.
Table 2. Number of samples after data processing
Hospital | Cohort code | Number of cancer samples | Number of normal samples | Mass spectrometry method
Shanghai Changhai Hospital | ccRCC-CH-1 | 203 | 203 | DIA
Shanghai Changhai Hospital | ccRCC-CH-2 | 270 | 270 | DIA
Yijishan Hospital | ccRCC-YJS | 86 | 86 | PulseSWATH
Eastern Hepatobiliary Hospital | ccRCC-DF | 68 | 68 | PulseSWATH
Gongli Hospital | ccRCC-GL | 30 | 30 | PulseSWATH
Taizhou Hospital | ccRCC-TZ | 19 | 19 | PulseSWATH
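The protein-level steps (a) to (c) described before Table 2 can be sketched in Python as follows. The sketch assumes that the fill value is the global minimum across all retained proteins (the text does not say whether the minimum is taken globally or per protein), and it does not attempt the limma batch-effect removal of step (d). The function name and the toy matrix are hypothetical.

```python
import math


def preprocess(matrix, min_fraction=0.2):
    """matrix: dict mapping protein -> list of intensities, None marking a missing value.

    (a) keep proteins quantified in at least min_fraction of the samples,
    (b) fill missing values with the minimum observed intensity (assumed global),
    (c) log2-transform.
    """
    kept = {
        protein: values
        for protein, values in matrix.items()
        if sum(v is not None for v in values) / len(values) >= min_fraction
    }
    fill = min(v for values in kept.values() for v in values if v is not None)
    return {
        protein: [math.log2(v if v is not None else fill) for v in values]
        for protein, values in kept.items()
    }


# Invented intensities for six samples
raw = {
    "P1": [8.0, None, 16.0, 4.0, 2.0, 32.0],      # 5 of 6 quantified: kept
    "P2": [None, None, None, None, None, 64.0],   # 1 of 6 (< 20%): removed
}
processed = preprocess(raw)
print(processed)
```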
EXAMPLE 2 construction of a model for aiding diagnosis of renal clear cell carcinoma
1. Feature screening
1. Collection of secreted proteins
Open the UniProt database (https://www.uniprot.org) page and select "Advanced" in the search box. On the page that opens, select Subcellular location term and enter "Secreted, Extracellular space"; select Keyword [KW] and enter "Secreted [KW-0964]"; select Organism [OS] and enter "Homo sapiens (Human/Man) [9606]". The search results contained 729 secreted proteins. The plasma and extracellular vesicle proteins provided by the neXtProt and Human Protein Atlas (HPA) databases were also collected, 5092 in total. The 221 proteins in the intersection of the two sets were used as secreted proteins expressed in plasma and extracellular vesicles.
2. Preliminary dimension reduction of variable
The data of the two cohorts from Shanghai Changhai Hospital, ccRCC-CH-1 and ccRCC-CH-2, were selected, comprising 473 cancer samples and 473 normal control samples. The protein expression data involved 4813 protein variables, which required preliminary dimension reduction as follows: (a) the 4813 variables were intersected with the 221 secreted proteins and variables outside the intersection were deleted, leaving 104 proteins; (b) based on the caret package of the R language, independent variables with near-zero variance were deleted using the nearZeroVar function, variables strongly correlated with other independent variables were deleted using the findCorrelation function, and variables with multicollinearity problems were deleted using the findLinearCombos function, leaving 102 proteins.
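The logic of this preliminary dimension reduction can be sketched in plain Python. The sketch uses simplified stand-ins for the caret functions: a variance threshold instead of the frequency-based rule of nearZeroVar, and a greedy pairwise Pearson filter instead of findCorrelation; the linear-combination check of findLinearCombos is omitted. Function names, cutoffs and expression values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def reduce_variables(data, secreted, var_cutoff=1e-8, cor_cutoff=0.9):
    """data: dict protein -> expression values across samples."""
    # (a) keep only measured proteins that are in the secreted-protein set
    names = [p for p in data if p in secreted]
    # (b) drop variables with (near-)zero variance
    names = [p for p in names if pvariance(data[p]) > var_cutoff]
    # (b) greedily drop variables highly correlated with one already kept
    kept = []
    for p in names:
        if all(abs(pearson(data[p], data[q])) < cor_cutoff for q in kept):
            kept.append(p)
    return kept


# Invented expression values; "CONST" is a hypothetical constant variable
data = {
    "VWF": [1.0, 2.0, 3.0, 4.0],
    "SPARC": [2.0, 4.0, 6.0, 8.1],   # almost perfectly correlated with VWF here
    "DCN": [4.0, 1.0, 3.0, 2.0],
    "CONST": [5.0, 5.0, 5.0, 5.0],
    "ACTB": [1.0, 3.0, 2.0, 4.0],    # not in the secreted set
}
secreted = {"VWF", "SPARC", "DCN", "CONST"}
selected = reduce_variables(data, secreted)
print(selected)
```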
3. Feature selection
Based on the caret package of the R language, the random forest functions (rfFuncs) were selected to rank the independent variables, with 10-fold cross-validation as the resampling method, and feature selection was finally performed with the rfe command. The results showed that prediction accuracy was highest, 0.9524, when the top 8 independent variables were selected. These 8 independent variables differed significantly between cancer and normal controls (see Table 3).
Table 3. Differential expression of the 8 independent variables
2. Construction of the renal clear cell carcinoma auxiliary diagnosis model
GBM (Gradient Boosting Machine) is an ensemble learning algorithm. It computes pseudo-residuals from the initial model and then fits a base learner to these pseudo-residuals, which reduces the residual along the gradient direction. The base learner is multiplied by a weight coefficient and linearly combined with the existing model to form a new model. Repeating this iteration yields a model that minimizes the expected value of the loss function.
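The iteration described above can be sketched in plain Python. This is a toy illustration, not the gbm package used in this example: it assumes a squared-error loss (for which the pseudo-residual is simply the observed value minus the current prediction), a single feature, and one-split regression stumps as base learners. All names and data are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump to the pseudo-residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_trees=50, shrinkage=0.1):
    """Additive model: initial constant plus shrinkage-weighted stumps."""
    initial = sum(y) / len(y)
    predictions = [initial] * len(y)
    learners = []
    for _ in range(n_trees):
        # pseudo-residuals = negative gradient of 1/2 * (y - F)^2 with respect to F
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        learners.append(stump)
        predictions = [pi + shrinkage * stump(xi) for xi, pi in zip(x, predictions)]
    return lambda xi: initial + shrinkage * sum(stump(xi) for stump in learners)


# Invented data: low values labeled 0 (normal), high values labeled 1 (cancer)
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]
model = gradient_boost(x, y)
print(round(model(1), 3), round(model(6), 3))
```

For a two-class problem such as cancer versus normal, gbm would typically use a Bernoulli (logistic) loss instead of squared error; the loop structure stays the same and only the pseudo-residual formula changes.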
Model training used the trainControl function for parameter setting: method="repeatedcv", number=10, repeats=3. Here method specifies repeated cross-validation as the resampling method, number is the number of folds, and repeats is the number of repetitions. Training was carried out with the train function, using the gbm algorithm as the modeling method. The training results showed that accuracy was highest when interaction.depth was 3 and n.trees was 50 (see FIG. 4). The saveRDS function was used to save the model, which is the renal clear cell carcinoma auxiliary diagnostic model.
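The resampling scheme set by these parameters (10 folds, repeated 3 times, giving 30 train/hold-out splits) can be illustrated with a plain Python sketch. It is an unstratified simplification and not caret's own fold construction; the function name and the seed are assumptions, and 946 is the number of training samples (473 cancer plus 473 normal).

```python
import random


def repeated_kfold(n_samples, number=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_indices, held_out_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        folds = [indices[i::number] for i in range(number)]
        for k, held_out in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, k, train, held_out


splits = list(repeated_kfold(946))
print(len(splits), "train/hold-out splits")
```

Each candidate combination of tuning parameters, such as interaction.depth and n.trees, would be scored by its mean accuracy over these 30 hold-out sets, and the best-scoring combination kept.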
Example 3 evaluation of an auxiliary diagnostic model for renal clear cell carcinoma
The performance of the renal clear cell carcinoma auxiliary diagnostic model constructed in Example 2 was verified using 6 sets of data. Four of them are data generated in Example 1: Yijishan Hospital ccRCC-YJS, Eastern Hepatobiliary Hospital ccRCC-DF, Gongli Hospital ccRCC-GL and Taizhou Hospital ccRCC-TZ. The other 2 sets are data from the published literature: ccRCC proteome data published in Cell in 2019 (Integrated Proteogenomic Characterization of Clear Cell Renal Cell Carcinoma, Clark DJ, Dhanasekaran SM, Petralia F, et al., Cell, 2019, 179 (4): 964-983) and in Nature Communications in 2022 (A proteogenomic analysis of clear cell renal cell carcinoma in a Chinese population, Qu Y, Feng J, Wu X, et al., Nature Communications, 2022, 13 (1): 2052). The protein expression values of the 8 independent variables were extracted from the 6 sets of data as inputs to the model. The prediction results are shown in Table 4: the average accuracy is 0.8906, the average sensitivity is 0.8748, and the average specificity is 0.9064. FIG. 5 shows the AUC values of the model for predicting renal clear cell carcinoma, with a minimum AUC of 0.8774, a maximum AUC of 0.9929 and a mean AUC of 0.9292. These results indicate that the model has strong predictive performance.
Table 4. Evaluation of the renal clear cell carcinoma auxiliary diagnostic model
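The evaluation metrics reported for this model (accuracy, sensitivity, specificity and AUC) can be computed as in the following Python sketch. The labels, scores and the 0.5 decision threshold are invented for illustration and are not data from the cohorts; the function names are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity); 1 = cancer, 0 = normal."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # fraction of cancer samples called cancer
    specificity = tn / (tn + fp)   # fraction of normal samples called normal
    return accuracy, sensitivity, specificity


def auc(y_true, scores):
    """AUC as the probability that a cancer sample scores above a normal one."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Invented predicted probabilities for 4 cancer and 4 normal samples
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
print(metrics, area)
```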
EXAMPLE 4 construction of a model for aiding diagnosis of T1a stage renal clear cell carcinoma
T denotes the status of the primary tumor and is staged according to tumor size and the extent of involvement of adjacent tissues. A T1 stage tumor is a localized tumor with a diameter of 7 cm or less, and a T1a stage tumor is one with a maximum diameter of 4 cm or less; T1a stage renal cancer is generally an early-stage cancer. Clinically, T1a patients usually have no symptoms, and most are diagnosed by B-mode ultrasound or CT examination. These patients can undergo laparoscopic tumor resection, which protects renal function as far as possible, preserves more nephrons and improves quality of life; their prognosis is good, with a 5-year survival rate above 95%. Detecting the disease and starting treatment as early as possible is therefore particularly important. In this embodiment, a T1a stage renal clear cell carcinoma auxiliary diagnostic model is established to distinguish T1a stage tumor samples from normal control samples.
First, T1a stage samples were selected according to patient staging (see Table 5). The training dataset comprised the two cohorts ccRCC-CH-1 and ccRCC-CH-2 from Shanghai Changhai Hospital, with 242 cancer samples and 242 normal control samples. The collection of secreted proteins and the preliminary dimension reduction of variables were the same as in Example 2.
Table 5. Number of T1a stage patient samples
Hospital | Cohort code | Number of T1a cancer samples | Number of paired normal samples
Shanghai Changhai Hospital | ccRCC-CH-1 | 105 | 105
Shanghai Changhai Hospital | ccRCC-CH-2 | 137 | 137
Yijishan Hospital | ccRCC-YJS | 59 | 59
Eastern Hepatobiliary Hospital | ccRCC-DF | 39 | 39
Gongli Hospital | ccRCC-GL | 16 | 16
Based on the caret package of the R language, the random forest functions (rfFuncs) were selected to rank the independent variables, with 10-fold cross-validation as the resampling method, and feature selection was finally performed with the rfe command. The results showed a prediction accuracy of 0.9339 when 6 independent variables were selected. These 6 independent variables differed significantly between cancer and normal controls (see Table 6).
Table 6. Differential expression of the 6 independent variables
Model training used the trainControl function for parameter setting: method="repeatedcv", number=10, repeats=3. Training was carried out with the train function, using the gbm algorithm as the modeling method. The training results showed that accuracy was highest when interaction.depth was 1 and n.trees was 50 (see FIG. 6). The saveRDS function was used to save the model, which is the T1a stage renal clear cell carcinoma auxiliary diagnostic model.
Example 5 Evaluation of the T1a stage renal clear cell carcinoma auxiliary diagnostic model
The performance of the T1a stage renal clear cell carcinoma auxiliary diagnostic model of Example 4 was verified using 3 sets of data, all generated in Example 1: Yijishan Hospital ccRCC-YJS, Eastern Hepatobiliary Hospital ccRCC-DF and Gongli Hospital ccRCC-GL. The protein expression values of the 6 independent variables were extracted from the 3 sets of data as inputs to the model. The prediction results are shown in Table 7: the average accuracy is 0.8365, the average sensitivity is 0.8346, and the average specificity is 0.8384. FIG. 7 shows the AUC values of the model for predicting T1a stage renal clear cell carcinoma, with a mean AUC of 0.9178. These results indicate that the model has strong predictive performance.
TABLE 7 evaluation of auxiliary diagnostic model for renal clear cell carcinoma at T1a stage
Example 6 construction of a renal clear cell carcinoma Assistant diagnostic model Using up-regulated proteins in cancer
The collection of secreted proteins and the preliminary dimension reduction of variables were the same as in Example 2. From the 102 proteins obtained after preliminary dimension reduction, the proteins whose expression was up-regulated in cancer were selected, giving 62 proteins. Based on the caret package of the R language, the random forest functions (rfFuncs) were selected to rank the independent variables, with 10-fold cross-validation as the resampling method, and feature selection was finally performed with the rfe command. The results showed a prediction accuracy of 0.9334 when 6 independent variables were selected. These 6 independent variables differed significantly between cancer and normal controls (see Table 8).
Table 8. Differential expression of the 6 independent variables
Model training used the trainControl function for parameter setting: method="repeatedcv", number=10, repeats=3. Training was carried out with the train function, using the gbm algorithm as the modeling method. The training results showed that accuracy was highest when interaction.depth was 1 and n.trees was 50 (see FIG. 8). The saveRDS function was used to save the model, which is the renal clear cell carcinoma auxiliary diagnostic model based on up-regulated proteins.
Example 7 evaluation of an auxiliary diagnostic model of renal clear cell carcinoma constructed using an up-regulating protein in cancer
The performance of the renal clear cell carcinoma auxiliary diagnostic model constructed in Example 6 was verified using the 6 sets of data from Example 3. The protein expression values of the 6 independent variables were extracted from the 6 sets of data as inputs to the model. The prediction results are shown in Table 9: the average accuracy is 0.8237, the average sensitivity is 0.8765, and the average specificity is 0.7738. FIG. 9 shows the AUC values of the model for predicting renal clear cell carcinoma, with a minimum AUC of 0.8428, a maximum AUC of 0.9876 and a mean AUC of 0.9152. These results indicate that the model has strong predictive performance.
TABLE 9 evaluation of renal clear cell carcinoma auxiliary diagnostic models constructed using up-regulated proteins in cancer
The foregoing embodiments are merely examples of possible or preferred embodiments of the present invention, which are not intended to limit the scope of the present invention, and therefore, all equivalent changes and modifications that are consistent with the scope of the present invention shall fall within the scope of the present invention.

Claims (16)

1. A renal clear cell carcinoma protein marker combination comprising the following 8 secreted proteins: PLOD3, VWF, SPARC, GGH, SOD3, DCN, COL4A2 and SPON1.
2. A T1a stage renal clear cell carcinoma protein marker combination comprising the following 6 secreted proteins: PLOD3, VWF, SPARC, GGH, SOD3, DCN.
3. A renal clear cell carcinoma protein marker combination comprising the following 6 secreted proteins: PLOD3, VWF, SPARC, LGALS1, ANXA2, TGM2.
4. A kit for diagnosing renal clear cell carcinoma, comprising a detection reagent comprising a marker combination according to claim 1 or 3.
5. A method for screening a combination of renal clear cell carcinoma protein markers, comprising the steps of:
1) Acquiring renal clear cell carcinoma proteomics data;
2) Collecting secreted proteins;
3) Preliminary dimension reduction of protein variables;
4) Selecting a protein having a significant difference between a cancer sample and a normal control sample as a renal clear cell carcinoma protein marker;
the protein markers determined in step 4) constitute any one of the following protein marker combinations:
(i) a protein marker combination comprising the following 8 secreted proteins: PLOD3, VWF, SPARC, GGH, SOD3, DCN, COL4A2 and SPON1;
(ii) a protein marker combination comprising the following 6 secreted proteins: PLOD3, VWF, SPARC, GGH, SOD3, DCN;
(iii) a protein marker combination comprising the following 6 secreted proteins: PLOD3, VWF, SPARC, LGALS1, ANXA2, TGM2.
6. The method according to claim 5, wherein the step 1) of obtaining proteomic data comprises the steps of:
11) Collecting paired tumor tissue samples and paracancerous normal tissue samples of patients with renal clear cell carcinoma;
12) Precisely acquiring a tissue sample core;
13) Pre-treating the sample obtained in step 12);
14) Extracting the sample polypeptide;
15) Desalting and solid-phase extracting the sample polypeptide;
16) Constructing a ccRCC protein mass spectrum library;
17) Collecting mass spectra of the sample, identifying the proteins of the sample, and analyzing the proteomic data to obtain the proteomic data.
7. The screening method of claim 6, wherein the pretreatment comprises weighing, dewaxing, hydration, acid hydrolysis, alkaline hydrolysis, and the like in step 13).
8. The method of claim 6, wherein step 14) extracts the sample polypeptides by tissue lysis and proteolytic digestion.
9. The screening method of claim 8, wherein the tissue lysis is performed using PCT-assisted FFPE tissue lysis; the protease cascade digestion includes protein reduction and alkylation, lysyl endopeptidase (Lys-C) digestion, and trypsin digestion.
10. The method of screening according to claim 6, wherein in step 17), the mass spectrum of the sample is acquired using the DIA MS method.
11. The screening method of claim 6, wherein step 17) the proteomic data analysis comprises the steps of:
171) Sample screening: (a) for replicate samples, deleting the samples with more missing values; (b) deleting samples with few identified proteins; (c) deleting samples in which cancer tissue and paracancerous normal tissue are unpaired;
172) The proteins in the samples obtained by the screening in step 171) are subjected to the following treatment: (a) retaining proteins expressed in at least 20% of the samples; (b) using the minimum protein expression value to fill missing values; (c) log2-transforming the data; (d) removing the batch effect using the limma package in the R language.
12. The method according to claim 5, wherein the method for collecting secreted proteins in step 2) comprises the steps of: screening secreted proteins in the UniProt database, collecting the plasma and extracellular vesicle proteins provided by the two databases neXtProt and Human Protein Atlas, and taking the proteins in the intersection of the two as secreted proteins expressed in plasma and extracellular vesicles.
13. The screening method according to claim 5, wherein the preliminary dimension reduction method of step 3) comprises the steps of: screening intersection of protein variables with secreted proteins collected in step 2), deleting protein variables not in the intersection; the independent variable with extremely small variance, the variable with strong correlation with other independent variables and the variable with multiple collinearity problems are deleted.
14. A method for constructing a model for assisting diagnosis of renal clear cell carcinoma, characterized in that the model for assisting diagnosis of renal clear cell carcinoma is constructed by GBM algorithm based on the combination of renal clear cell carcinoma protein markers as defined in any one of claims 1 to 3.
15. A renal clear cell carcinoma-assisted diagnostic model constructed using the method of claim 14.
16. A kit for diagnosing stage T1a renal clear cell carcinoma, comprising a detection reagent comprising a marker combination according to claim 2.
CN202310433563.9A 2023-04-21 2023-04-21 Renal clear cell carcinoma protein marker and auxiliary diagnosis model construction method Active CN116539892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310433563.9A CN116539892B (en) 2023-04-21 2023-04-21 Renal clear cell carcinoma protein marker and auxiliary diagnosis model construction method


Publications (2)

Publication Number Publication Date
CN116539892A CN116539892A (en) 2023-08-04
CN116539892B true CN116539892B (en) 2024-03-26

Family

ID=87446200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310433563.9A Active CN116539892B (en) 2023-04-21 2023-04-21 Renal clear cell carcinoma protein marker and auxiliary diagnosis model construction method

Country Status (1)

Country Link
CN (1) CN116539892B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110499364A (en) * 2019-07-30 2019-11-26 北京凯昂医学诊断技术有限公司 A kind of probe groups and its kit and application for detecting the full exon of extended pattern hereditary disease
WO2020072857A1 (en) * 2018-10-04 2020-04-09 The Regents Of The Univefisity Of California Methods and compositions related to plod3
CN114164269A (en) * 2021-11-25 2022-03-11 四川大学华西医院 Potential antigen significantly related to renal clear cell carcinoma prognosis, immunophenotyping, construction method and application thereof
CN114965800A (en) * 2022-05-12 2022-08-30 山西医科大学 Renal clear cell carcinoma biomarker and application thereof in early screening

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020072857A1 (en) * 2018-10-04 2020-04-09 The Regents Of The Univefisity Of California Methods and compositions related to plod3
CN110499364A (en) * 2019-07-30 2019-11-26 北京凯昂医学诊断技术有限公司 A kind of probe groups and its kit and application for detecting the full exon of extended pattern hereditary disease
CN114164269A (en) * 2021-11-25 2022-03-11 四川大学华西医院 Potential antigen significantly related to renal clear cell carcinoma prognosis, immunophenotyping, construction method and application thereof
CN114965800A (en) * 2022-05-12 2022-08-30 山西医科大学 Renal clear cell carcinoma biomarker and application thereof in early screening

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Bioinformatic identification of key genes and analysis of prognostic values in clear cell renal cell carcinoma; Ting Luo et al.; Oncology Letters; 20180530; Vol. 16 (No. 2); pp. 1747-1757 *
Comprehensive Evaluation of One-Carbon Metabolism Pathway Gene Variants and Renal Cell Cancer Risk; Todd M. Gibson et al.; PLoS One; 20111019; Vol. 6 (No. 10); e26165 *
Identification of a Metastasis-Associated Gene Signature of Clear Cell Renal Cell Carcinoma; Suhua Gao et al.; Front. Genet.; 20210204; Vol. 11; pp. 1-15 *
Quantitative Proteomics Identifies Secreted Diagnostic Biomarkers as well as Tumor-Dependent Prognostic Targets for Clear Cell Renal Cell Carcinoma; Aydanur Senturk et al.; Mol. Cancer Res.; 20220801; Vol. 19 (No. 8); pp. 1322-1337 *
Screening and identification of key biomarkers in clear cell renal cell carcinoma based on bioinformatics analysis; Basavaraj Vastrad et al.; bioRxiv; 20201223; pp. 1-185 *
The use of an oxidative stress scoring system in prognostic prediction for kidney renal clear cell carcinoma; Wang X. et al.; Cancer Commun; 20210303; Vol. 41 (No. 4); pp. 354-357 *
Exploring the expression and clinical significance of vWF in renal clear cell carcinoma based on the TCGA database; Guo Yinhua; China Master's Theses Full-text Database; 20210115; Vol. 2021 (No. 01); E067-46 *

Also Published As

Publication number Publication date
CN116539892A (en) 2023-08-04

Similar Documents

Publication Publication Date Title
Nalbantoglu Metabolomics: basic principles and strategies
Zeki et al. Integration of GC–MS and LC–MS for untargeted metabolomics profiling
EP2845011B1 (en) Apparatus and methods for microbiological analysis
CN110057955B (en) Method for screening specific serum marker of hepatitis B
Peng et al. Peptidomic analyses: The progress in enrichment and identification of endogenous peptides
CN108680745B (en) Application method of serum lipid biomarker in early diagnosis of NSCLC
US10197576B2 (en) Mass spectrometry imaging with substance identification
CN111562338B (en) Application of transparent renal cell carcinoma metabolic marker in renal cell carcinoma early screening and diagnosis product
CN110057954B (en) Application of plasma metabolism marker in diagnosis or monitoring of HBV
CN113267586A (en) Application of purine metabolic marker in preparation of lung cancer molecular targeted drug acquired resistance screening and diagnostic reagent
CN113138249A (en) Micro-sample metabolome, proteome and phosphoproteome multi-group chemical analysis method based on micropore array chip
CN112014509A (en) Method for synchronously determining angiotensin I and aldosterone in sample
CN114002342B (en) Combined metabolic marker and detection kit for judging gene modification effect of pathogenic bacteria of visceral ichthyophthiriasis of Epinephelus coioides
CN116539892B (en) Renal clear cell carcinoma protein marker and auxiliary diagnosis model construction method
CN116879558B (en) Ovarian cancer diagnosis marker, detection reagent and detection kit
CN111766325B (en) Sample pretreatment method for multiomic analysis and application thereof
US20140162903A1 (en) Metabolite Biomarkers For Forecasting The Outcome of Preoperative Chemotherapy For Breast Cancer Treatment
CN113138275A (en) Serum lipid metabolite composition, kit and application
CN113834889B (en) Pituitary stem blocking syndrome biomarker, and determination method and application thereof
CN114791459B (en) Serum metabolic marker for detecting pulmonary tuberculosis and kit thereof
Qiu et al. Searching for potential ovarian cancer biomarkers with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry
CN116183922B (en) Construction method of oral squamous cell carcinoma diagnosis model, marker and application thereof
CN112067807B (en) Method for screening prognosis related protein from serum of liver cancer patient and application thereof
CN117079710B (en) Biomarkers and their use in predicting and/or diagnosing UTUC muscle infiltrates
CN113884583B (en) Novel method for obtaining high-credibility phosphorylation site occupancy rate in paired samples on large scale

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant