CN116735889A - Protein marker for early colorectal cancer screening, kit and application - Google Patents

Publication number
CN116735889A
Authority
CN
China
Prior art keywords
colorectal cancer
protein
protein marker
marker combination
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310049892.3A
Other languages
Chinese (zh)
Other versions
CN116735889B (en)
Inventor
廖鲁剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Durbrain Medical Inspection Laboratory Co ltd
Original Assignee
Hangzhou Durbrain Medical Inspection Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Durbrain Medical Inspection Laboratory Co ltd
Priority to CN202310049892.3A
Publication of CN116735889A
Application granted
Publication of CN116735889B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G01N27/626 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/531 Production of immunochemical test materials
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574 Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407 Specifically defined cancers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/06 Gastro-intestinal diseases
    • G01N2800/065 Bowel diseases, e.g. Crohn, ulcerative colitis, IBS

Abstract

The application discloses a protein marker combination for colorectal cancer prediction, diagnosis or prognosis, and belongs to the technical field of cancer proteomics detection. The protein marker combination includes at least one selected from LRG1, SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, and CNDP1. The application also provides uses of, and a system based on, the protein marker combination. The protein marker combination provides a non-invasive, plasma-based screening means for predicting early colorectal cancer and even precancerous lesions. The method and system of the application, when used to predict, diagnose or prognose colorectal cancer, are non-invasive for patients, use easily obtained material, require only a small plasma sample, and offer high sensitivity and specificity; most importantly, they fill the gap left by the absence of an effective protein marker for early colorectal cancer.

Description

Protein marker for early colorectal cancer screening, kit and application
Technical Field
The application belongs to the technical field of cancer proteomics detection, and particularly relates to a protein marker for early colorectal cancer screening, a kit and application.
Background
Colorectal cancer is one of the five leading causes of cancer death worldwide. In the United States, colorectal cancer ranks third in incidence and second in mortality. Similarly, in China colorectal cancer is a highly malignant tumor that seriously affects public health, and its morbidity and mortality both rank among the top three of all malignant tumors. The main reason for the low survival rate of colorectal cancer patients is the lack of an effective means of diagnosing intestinal cancer at an early stage. Extensive clinical practice has shown that patients who undergo surgery in the early stages of tumorigenesis (stage I or IIa) have a five-year survival rate of 90%, whereas patients who undergo surgery in the late stages (stage III and IV) have a five-year survival rate of less than 10%. Colorectal cancer often takes 10-15 years to evolve from a precancerous lesion into a disseminated metastatic malignancy, so diagnosing the cancer early, before the cancer cells spread and metastasize, is of great importance for improving patient survival.
The main means of colorectal cancer screening currently used in the clinic include colonoscopy, imaging examination, fecal occult blood testing, DNA detection, and detection of protein markers such as CEA. These conventional techniques are either invasive or cause radiation damage and, more importantly, have low sensitivity, which makes them difficult to use for early screening of large at-risk populations; in addition, the general population has low tolerance and acceptance of colonoscopy. The only non-invasive detection means applied in the clinic is chemical and immunological fecal occult blood testing, but its sensitivity for colorectal cancer is only 61-79% at a specificity of 86-95%; although it is widely used in the clinic, its detection rate for early colorectal cancer does not meet clinical requirements.
In recent years, liquid biopsy technology has developed rapidly and has, to a certain extent, addressed the low sensitivity of traditional detection techniques. Examples include the plasma Septin9 gene methylation test (Epi proColon) and the early colorectal cancer screening product Cologuard, which combines detection of BMP3/NDRG4 methylation and KRAS gene mutation in feces with FIT. These novel non-invasive screening techniques have opened a new era in the early diagnosis of colorectal cancer. However, there is still considerable room for improvement in their sensitivity and specificity. For example, the Epi proColon assay has a specificity of 97.5% but a sensitivity of only 79%, which can lead to a large proportion of missed diagnoses. Cologuard can reach a sensitivity of 95.55%, but its specificity drops to 87.1%. Improving sensitivity and specificity at the same time would improve detection accuracy and reduce the probability of missed diagnosis and misdiagnosis as far as possible. In addition, detection of protein markers such as CEA has even more limited sensitivity and specificity.
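To make the trade-off between sensitivity and specificity concrete, the following minimal sketch converts the two pairs of figures quoted above into expected counts of missed cancers and false positives. The cohort sizes (100 patients, 10,000 healthy individuals) and the function name are assumptions chosen only for illustration; they do not come from the application.

```python
def screening_outcomes(sensitivity, specificity, n_cancer, n_healthy):
    """Expected false negatives and false positives for a screening test."""
    missed = (1.0 - sensitivity) * n_cancer          # cancers the test fails to flag
    false_alarms = (1.0 - specificity) * n_healthy   # healthy people flagged positive
    return missed, false_alarms


# Sensitivity/specificity pairs as quoted in the text; cohort sizes are hypothetical.
assays = [
    ("plasma Septin9 methylation assay", 0.79, 0.975),
    ("stool DNA + FIT assay", 0.9555, 0.871),
]
for name, sens, spec in assays:
    missed, false_alarms = screening_outcomes(sens, spec, n_cancer=100, n_healthy=10_000)
    print(f"{name}: about {missed:.0f} missed cancers, about {false_alarms:.0f} false positives")
```

Under these assumed cohort sizes, the first assay misses more cancers while the second produces more false positives, which is the tension the paragraph describes.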
In recent years, proteomics based on high-resolution mass spectrometers has greatly improved detection accuracy and speed, and is becoming suitable for analyzing protein expression levels in large-scale clinical samples. After years of practice, it is widely recognized in the industry that early cancer screening strategies with high sensitivity and high specificity require a shift from single protein markers to combined markers. At present, no protein-marker-based early screening and diagnostic kit for colorectal cancer is available in the clinic.
Disclosure of Invention
In order to solve at least one of the technical problems, the application adopts the following technical scheme:
The first aspect of the present application provides a protein marker combination for colorectal cancer prediction, diagnosis or prognosis, comprising at least one selected from LRG1, SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, and CNDP1.
ITIH3: inter-alpha-trypsin inhibitor heavy chain H3; the inter-alpha-trypsin inhibitor complex can stabilize the extracellular matrix through its ability to bind hyaluronic acid. Polymorphisms of this gene may be associated with an increased risk of schizophrenia and major depression.
LRG1: belongs to the leucine-rich repeat (LRR) family, and plays an important role in protein-protein interactions, signal transduction, intercellular adhesion and development.
C9: this protein is the terminal component of the complement system and is involved in the formation of the membrane attack complex (MAC). The membrane attack complex plays a key role in innate and adaptive immune responses.
IGFBP2: this protein binds insulin-like growth factors I and II (IGF-I and IGF-II); once secreted into the blood it binds IGF-I and IGF-II with high affinity, and it can also interact with different ligands inside cells. High expression of IGFBP2 may promote the growth of a variety of tumors and may be used to assess patient prognosis.
CNDP1: this protein is a member of the M20 metalloprotease family and is specifically expressed in the brain; the coding region of the gene contains a trinucleotide (CTG) repeat sequence.
SERPINA1: this protein is a serine protease inhibitor belonging to the serpin superfamily, and its targets include elastase, plasmin, thrombin, trypsin, chymotrypsin and plasminogen activator. The protein is produced in the liver and bone marrow, by lymphocytes and monocytes in lymphoid tissue, and by the Paneth cells of the gut. Defects in this gene are known to be associated with chronic obstructive pulmonary disease, emphysema and chronic liver disease.
CP: this protein is a metalloprotein that binds most of the copper in plasma and is involved in the peroxidation of Fe(II) transferrin to Fe(III) transferrin. Mutations in this gene cause aceruloplasminemia, which leads to iron accumulation and tissue damage and is associated with diabetes and neurological abnormalities.
ORM1: this protein is an acute-phase plasma protein whose expression level increases during the acute inflammatory response. Its specific function is unknown; it may be involved in immunosuppression.
In some embodiments of the application, the protein marker combination comprises LRG1, further comprising at least one of SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, and CNDP1.
In other embodiments of the application, the protein marker combination comprises C9 and further comprises at least one of LRG1, SERPINA1, ITIH3, CP, ORM1, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises ITIH3, LRG1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises CP, LRG1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises ITIH3, CP, LRG1, C9 and CNDP1.
In some embodiments of the application, the protein marker combination comprises SERPINA1, LRG1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises SERPINA1, CP, LRG1, C9, and CNDP1.
In some embodiments of the application, the protein marker combination comprises LRG1, ORM1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises LRG1, SERPINA1, CP, ORM1, C9, and CNDP1.
In some embodiments of the application, the protein marker combination comprises LRG1, SERPINA1, ITIH3, CP, C9, and CNDP1.
In some embodiments of the application, the protein marker combination comprises LRG1, SERPINA1, ITIH3, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises SERPINA1, ITIH3, LRG1, C9, IGFBP2, and CNDP1.
In some embodiments of the application, the protein marker combination comprises SERPINA1, ITIH3, LRG1, ORM1, C9, and CNDP1.
In the present application, detecting the expression level of each protein in the protein marker combination makes it possible to predict whether a subject is at risk of colorectal cancer, i.e., the combination can be used for early colorectal cancer screening; to diagnose whether the subject has colorectal cancer, which may be an auxiliary diagnosis made by the clinician in combination with other clinical indicators; and to assess the prognosis of a subject with colorectal cancer after treatment.
In a second aspect the application provides a polypeptide combination for use in the prediction, diagnosis or prognosis of colorectal cancer, said polypeptide combination comprising at least one polypeptide from each protein in any of the protein marker combinations according to the first aspect of the application.
Optionally, the polypeptide from C9 comprises the amino acid sequence shown as SEQ ID No.1 or SEQ ID No. 2.
Optionally, the polypeptide from SERPINA1 comprises the amino acid sequence shown in SEQ ID No. 3.
Optionally, the polypeptide from ITIH3 comprises the amino acid sequence shown in SEQ ID No. 4.
Optionally, the polypeptide from CP comprises the amino acid sequence shown in SEQ ID No. 5.
Optionally, the polypeptide from LRG1 comprises the amino acid sequence shown as SEQ ID No.6 or SEQ ID No. 7.
Optionally, the polypeptide from IGFBP2 comprises the amino acid sequence set forth in SEQ ID No. 8.
Optionally, the polypeptide from KNG1 comprises the amino acid sequence shown in SEQ ID No. 9.
Optionally, the polypeptide from ORM1 comprises the amino acid sequence shown in SEQ ID No. 10.
Optionally, the polypeptide from PRDX2 comprises the amino acid sequence shown in SEQ ID No. 11.
Optionally, the polypeptide from CNDP1 comprises the amino acid sequence shown in SEQ ID No. 12.
In a third aspect, the application provides the use of a reagent for detecting the expression level of a combination of protein markers according to any one of the first aspects of the application for the preparation of a kit for the prediction, diagnosis or prognosis of colorectal cancer.
In some embodiments of the application, the detection reagent detects the expression level of each protein in the protein marker combination based on mass spectrometry.
In some embodiments of the application, the level of expression of each protein in the protein marker combination is detected by detecting the level of one or more polypeptides of each protein in the protein marker combination.
Optionally, the polypeptide from C9 comprises the amino acid sequence shown as SEQ ID No.1 or SEQ ID No. 2.
Optionally, the polypeptide from SERPINA1 comprises the amino acid sequence shown in SEQ ID No. 3.
Optionally, the polypeptide from ITIH3 comprises the amino acid sequence shown in SEQ ID No. 4.
Optionally, the polypeptide from CP comprises the amino acid sequence shown in SEQ ID No. 5.
Optionally, the polypeptide from LRG1 comprises the amino acid sequence shown as SEQ ID No.6 or SEQ ID No. 7.
Optionally, the polypeptide from IGFBP2 comprises the amino acid sequence set forth in SEQ ID No. 8.
Optionally, the polypeptide from KNG1 comprises the amino acid sequence shown in SEQ ID No. 9.
Optionally, the polypeptide from ORM1 comprises the amino acid sequence shown in SEQ ID No. 10.
Optionally, the polypeptide from PRDX2 comprises the amino acid sequence shown in SEQ ID No. 11.
Optionally, the polypeptide from CNDP1 comprises the amino acid sequence shown in SEQ ID No. 12.
In a fourth aspect the application provides a kit for the prediction, diagnosis or prognosis of colorectal cancer comprising an expression level detection reagent for any one of the protein marker combinations of the first aspect of the application.
In a fifth aspect the present application provides a method for the prediction, diagnosis or prognosis of colorectal cancer comprising the steps of:
S1, obtaining expression level data of each protein in the protein marker combination according to any one of the first aspect of the application;
S2, constructing a machine learning model using the expression level data of each protein in the protein marker combination in population samples together with information on whether each sample is derived from a colorectal cancer patient, and judging, based on the machine learning model, whether a subject has colorectal cancer, is at risk of colorectal cancer, or has a good or poor colorectal cancer prognosis.
In some embodiments of the application, the machine learning model is trained using any one of the following algorithms:
random forest algorithms, support vector machine algorithms, linear regression algorithms, logistic regression algorithms, bayesian classifiers, and neural network algorithms.
In some preferred embodiments of the application, the machine learning model is trained using a logistic regression algorithm.
Further, a preset threshold is obtained from the machine learning model using the population samples. If the model output for a subject's sample is higher than the preset threshold, the subject is judged to have colorectal cancer, to be at risk of colorectal cancer, or to have a poor colorectal cancer prognosis. If it is not higher than the preset threshold, the subject is judged not to have colorectal cancer, not to be at risk of colorectal cancer, or to have a good colorectal cancer prognosis.
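The model-plus-threshold decision described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the two-marker training data are invented standardized values, the gradient-descent settings are arbitrary, and the threshold of 0.5 is an assumption, since the text does not state how the preset threshold is derived from the population samples.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(sample, w, b):
    """Model output in (0, 1) for one subject's marker levels."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, sample)) + b)


def classify(sample, w, b, threshold):
    """Positive only if the model output is strictly higher than the preset threshold."""
    return "positive" if score(sample, w, b) > threshold else "negative"


# Hypothetical standardized levels of two markers; 1 = colorectal cancer, 0 = healthy.
X_train = [[1.2, 0.8], [0.9, 1.1], [1.5, 0.7], [-1.0, -0.9], [-0.8, -1.2], [-1.3, -0.6]]
y_train = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X_train, y_train)

THRESHOLD = 0.5  # assumed value for illustration only
print(classify([1.0, 1.0], w, b, THRESHOLD))
print(classify([-1.0, -1.0], w, b, THRESHOLD))
```

In practice the threshold would be chosen on the population samples to reach a target sensitivity or specificity; the fixed value here only demonstrates the "higher than threshold" rule.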
In some embodiments of the application, in step S1, a blood sample of the subject is anticoagulated with EDTA to obtain plasma; the plasma proteins are denatured, reduced, alkylated and digested with trypsin to obtain polypeptide fragments, which are desalted, evaporated to dryness, and then subjected to liquid chromatography separation and mass spectrometry detection; the level of each protein in the protein marker combination is determined from the polypeptide levels.
In some embodiments of the application, the mass spectrometry detection is performed using a triple quadrupole mass spectrometry method.
In a sixth aspect the application provides a system for colorectal cancer prediction, diagnosis or prognosis comprising the following modules:
a data input module for inputting expression level data of each protein in any of the protein marker combinations of the first aspect of the present application for a subject;
a data storage module for storing expression level data of each protein in the protein marker combination in population samples and information on whether each sample is derived from a colorectal cancer patient;
a colorectal cancer analysis module, connected to the data input module and to the data storage module, which constructs a machine learning model using the population-sample expression level data of each protein in the protein marker combination and the information on whether each sample is derived from a colorectal cancer patient, both stored in the data storage module, and which judges, based on the machine learning model, whether the subject has colorectal cancer, is at risk of colorectal cancer, or has a good or poor colorectal cancer prognosis.
In some embodiments of the application, the machine learning model is trained using any one of the following algorithms:
random forest algorithms, support vector machine algorithms, linear regression algorithms, logistic regression algorithms, bayesian classifiers, and neural network algorithms.
In some embodiments of the application, the colorectal cancer analysis module further stores the subject's expression level data for each protein in the protein marker combination, together with the judgment result, in the data storage module.
In some preferred embodiments of the application, the machine learning model is trained using a logistic regression algorithm.
Beneficial effects of the application
Compared with the prior art, the application has the following beneficial effects:
Multiple protein markers in plasma are detected simultaneously by targeted mass spectrometry with absolute quantification, so the results are accurate and detection time is saved.
The protein marker combination of the application provides a non-invasive, plasma-based screening means for early colorectal cancer.
The method and system of the application, when used to predict, diagnose or prognose colorectal cancer, are non-invasive for patients, use easily obtained material, require only a small plasma sample, and offer high sensitivity and specificity; most importantly, they fill the gap left by the absence of an effective protein marker for early colorectal cancer.
The protein marker combination predicts early colorectal cancer with high accuracy, and a positive result can prompt patients to seek further diagnosis, so colorectal cancer mortality in the population can be effectively reduced in the long term.
Applying machine learning to plasma marker protein measurements makes it possible to dynamically monitor a patient's disease state.
Drawings
FIG. 1 shows the receiver operating characteristic (ROC) curve of the single protein marker LRG1. The areas under the curve (AUC) for the training set, test set and independent validation set are 0.904, 0.85 and 0.8, respectively. In the figure, train denotes the training set, test the test set, and valid the independent validation set; the axes are True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 2 shows the ROC curve of the single protein marker SERPINA1. The AUCs for the training set, test set and independent validation set are 0.837, 0.779 and 0.771, respectively. In the figure, train denotes the training set, test the test set, and valid the independent validation set; the axes are True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 3 shows the ROC curve of the single protein marker ITIH3. The AUCs for the training set, test set and independent validation set are 0.835, 0.921 and 0.79, respectively. In the figure, train denotes the training set, test the test set, and valid the independent validation set; the axes are True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 4 shows the ROC curve of the single protein marker CP. The AUCs for the training set, test set and independent validation set are 0.823, 0.842 and 0.624, respectively. In the figure, train denotes the training set, test the test set, and valid the independent validation set; the axes are True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 5 shows the ROC curve of the single protein marker ORM1. The AUCs for the training set, test set and independent validation set are 0.818, 0.783 and 0.697, respectively. In the figure, train denotes the training set, test the test set, and valid the independent validation set; the axes are True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 6 shows the ROC curve of the single protein marker C9. The AUCs for the training set, test set and independent validation set are 0.875, 0.91 and 0.81, respectively. In the figure, train denotes the training set, test the test set, and valid the independent validation set; the axes are True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 7 shows the ROC curve of the single protein marker IGFBP2. The AUCs for the training set, test set and independent validation set are 0.728, 0.738 and 0.737, respectively. In the figure, train denotes the training set, test the test set, and valid the independent validation set; the axes are True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 8 shows the ROC curve of a combination of 5 protein markers. The AUCs for the training set, test set and independent validation set are 0.956, 0.954 and 0.893, respectively. In the figure, train denotes the training set, test the test set, and valid the independent validation set; the axes are True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 9 shows the confusion matrix of the combination of 5 protein markers, for 121 colorectal cancer patients and 186 healthy individuals. 1 indicates positive and 0 indicates negative. In the figure, train denotes the training set, test the test set, and valid the independent validation set; Truth denotes the actual class and Prediction denotes the predicted class.
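For readers unfamiliar with the quantities reported in the figures, the sketch below shows one common way to compute an AUC and a confusion matrix from model scores. The labels, scores and threshold are invented toy values, not data from the application, and the pairwise AUC formula is a standard textbook definition rather than the specific software used by the inventors.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    sample scores higher; tied pairs count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_matrix(labels, scores, threshold):
    """Return (TP, FP, TN, FN), predicting 1 when score > threshold."""
    tp = fp = tn = fn = 0
    for l, s in zip(labels, scores):
        pred = 1 if s > threshold else 0
        if pred == 1 and l == 1:
            tp += 1
        elif pred == 1 and l == 0:
            fp += 1
        elif pred == 0 and l == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Toy data: 1 = colorectal cancer, 0 = healthy (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(labels, scores)
tp, fp, tn, fn = confusion_matrix(labels, scores, threshold=0.45)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # 1 - false positive rate
print(auc, (tp, fp, tn, fn), sensitivity, specificity)
```

Sweeping the threshold from high to low and plotting sensitivity against 1-specificity traces out the ROC curve whose area is the AUC.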
Detailed Description
Unless otherwise indicated, implied by the context, or customary in the art, all parts and percentages in the present application are based on weight, and the test and characterization methods used are current as of the filing date of the present application. Where applicable, the disclosure of any patent, patent application, or publication referred to in this application is incorporated by reference in its entirety, and the equivalent patents of those cited are likewise incorporated by reference, particularly with respect to the definitions of relevant terms in the art disclosed in these documents. If the definition of a particular term disclosed in the prior art is inconsistent with any definition provided in the present application, the definition provided in the present application controls.
The numerical ranges in the present application are approximations and may therefore include values outside the range unless otherwise indicated. A numerical range includes all values from the lower value to the upper value in increments of 1 unit, provided that there is a separation of at least 2 units between any lower value and any higher value. For ranges containing values less than 1, or containing fractions greater than 1 (e.g., 1.1, 1.5, etc.), 1 unit is considered to be 0.0001, 0.001, 0.01, or 0.1, as appropriate. For ranges containing single-digit numbers less than 10 (e.g., 1 to 5), 1 unit is generally considered to be 0.1. These are merely specific examples of what is intended, and all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered expressly stated in this disclosure.
The terms "comprises," "comprising," "including," and their derivatives do not exclude the presence of any other component, step or process, whether or not that component, step or process is disclosed in the present application. For the avoidance of doubt, all uses of the terms "comprising", "including" or "having" herein may, unless expressly stated otherwise, include any additional additive, adjuvant or compound. In contrast, the term "consisting essentially of" excludes any other component, step or process from the scope of any subsequent recitation, except those that are not essential to operability. The term "consisting of" excludes any component, step or process not specifically described or listed. The term "or" refers to the listed members individually or in any combination, unless explicitly stated otherwise.
In order to make the technical problems, technical schemes and beneficial effects solved by the application more clear, the application is further described in detail below with reference to the embodiments.
Examples
The following examples are presented herein to demonstrate preferred embodiments of the present application. It will be appreciated by those skilled in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function in the practice of the application, and thus can be considered to constitute preferred modes for its practice. Those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit or scope of the application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the application described herein. Such equivalents are intended to be encompassed by the claims.
The experimental methods in the following examples are conventional methods unless otherwise specified. The instruments used in the following examples are laboratory conventional instruments unless otherwise specified; the test materials used in the examples described below, unless otherwise specified, were purchased from conventional biochemical reagent stores.
Example 1 Discovery of protein markers
The inventors collected fresh blood samples from 101 colorectal cancer patients and 89 sex- and age-matched healthy controls for the discovery of protein markers.
1. Blood sample processing
After anticoagulation treatment, the fresh blood samples were centrifuged at 1000 × g for 5 min to obtain plasma samples, which were stored long-term in a freezer at -70 ℃.
Plasma samples were diluted 50-fold and the protein concentration was determined by BCA assay: BSA standards were serially diluted to 2, 1, 0.5, 0.25, 0.125 and 0.0625 mg/mL and used to construct the standard curve against which the plasma concentrations were calibrated. The diluted samples and the standards were added to a 96-well plate, pre-prepared BCA working solution was added, the plate was incubated at 37 ℃ for 30 min, and the plasma protein concentration was determined from the absorbance at 562 nm.
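The calibration step above can be sketched as follows. This is only an illustrative sketch: it assumes a linear standard curve fitted by ordinary least squares, and the absorbance readings are placeholders, not measurements from the application. The function names `fit_line` and `plasma_concentration` are hypothetical.

```python
# Hypothetical sketch: linear BCA standard curve fitted by least squares.
# Absorbance values are placeholders, not data from the application.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def plasma_concentration(a562, slope, intercept, dilution_factor=50):
    """Convert the A562 reading of a diluted sample to the undiluted concentration (mg/mL)."""
    return (a562 - intercept) / slope * dilution_factor


# BSA standard concentrations from the text (mg/mL)
bsa_mg_per_ml = [2, 1, 0.5, 0.25, 0.125, 0.0625]
# Placeholder absorbances, constructed to be exactly linear for illustration
a562_standards = [0.1 + 0.5 * c for c in bsa_mg_per_ml]

slope, intercept = fit_line(bsa_mg_per_ml, a562_standards)
conc = plasma_concentration(0.75, slope, intercept)  # undiluted plasma, mg/mL
```

In practice a BCA curve may be slightly non-linear at the upper end, in which case a polynomial fit could replace `fit_line`.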
50 μg of protein was taken and ammonium bicarbonate solution was added to a final concentration of 50 mM. DTT was added to a final concentration of 10 mM and the sample was heated at 95 ℃ for 10 min. After the sample returned to room temperature, IAA was added to a final concentration of 15 mM and the reaction proceeded in the dark for 30 min. 1 μg of trypsin was added to each sample, and digestion was carried out overnight (12-14 h) in a metal bath at 37 ℃. The next day, formic acid was added to a final concentration of 1% to acidify the sample and terminate the digestion.
2. Differential proteins and polypeptides
Target selection was based first on finding differentially expressed proteins. The inventors acquired mass spectra in data-independent acquisition (DIA) mode for 190 sex- and age-matched plasma samples (89 healthy people and 101 colorectal cancer patients), analyzed the data with DIA-NN software to obtain protein and polypeptide expression data, and normalized the data by total protein intensity, giving a total of 714 proteins and 7988 polypeptides. For proteins and polypeptides whose expression followed a normal distribution, differential expression was identified with the t-test; for those whose expression did not follow a normal distribution, the Wilcoxon non-parametric test was used. In total, the inventors obtained 96 differentially expressed proteins and 832 differentially expressed polypeptides. Integrating these results gave the set of differentially expressed polypeptides.
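The text does not spell out how the total-intensity normalization was carried out. One common reading, shown here purely as an assumption, is to rescale every sample so that its summed intensity equals the median of the per-sample totals. The function name and the toy intensities are hypothetical.

```python
from statistics import median


def normalize_by_total_intensity(samples):
    """Scale each sample so its summed intensity equals the median total across samples.

    `samples` maps a sample ID to a {protein: intensity} dictionary.
    """
    totals = {sid: sum(vals.values()) for sid, vals in samples.items()}
    target = median(totals.values())
    return {
        sid: {prot: inten * target / totals[sid] for prot, inten in vals.items()}
        for sid, vals in samples.items()
    }


# Toy intensities (hypothetical, for illustration only)
raw = {
    "S1": {"LRG1": 10.0, "CP": 30.0},
    "S2": {"LRG1": 40.0, "CP": 40.0},
    "S3": {"LRG1": 30.0, "CP": 90.0},
}
normalized = normalize_by_total_intensity(raw)
```

After this scaling, differences between samples reflect relative abundance rather than differences in the total amount of protein loaded.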
3. Marker protein screening
Potential polypeptides capable of distinguishing colorectal cancer patients from healthy people were selected by the random forest method: the average Gini coefficient of each target was calculated by the random forest and the targets were ranked by importance. Taking the biological functions of the proteins into account as well, 10 top-ranked proteins were finally obtained, namely LRG1, SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, CNDP1, KNG1 and PRDX2. The corresponding polypeptide sequences are shown in Table 1:
Table 1 Polypeptide sequences of candidate proteins
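To illustrate the quantity behind the Gini-based ranking in the marker screening step, the sketch below computes the Gini impurity decrease for a single threshold split on one feature. A random forest averages such decreases over all splits and trees, so this shows only the building block, not the inventors' pipeline. The feature values and function names are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity of binary labels (1 = colorectal cancer, 0 = healthy)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting the samples at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
    return gini_impurity(labels) - weighted


# Hypothetical toy data: one feature separates the classes, the other does not
labels = [0, 0, 0, 1, 1, 1]
informative = [1.0, 1.2, 1.1, 3.0, 3.2, 2.9]
uninformative = [1.0, 3.0, 1.1, 1.2, 3.2, 2.9]

dec_informative = gini_decrease(informative, labels, threshold=2.0)
dec_uninformative = gini_decrease(uninformative, labels, threshold=2.0)
```

A feature with a larger decrease ranks higher, which is the sense in which targets are ordered by importance.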
Example 2 Establishment of the machine learning model
For each polypeptide, a ¹³C- and ¹⁵N-labeled heavy-isotope polypeptide at an appropriate concentration was added to the digested plasma sample and mixed thoroughly, and the sample was then desalted on a 96-well SOLA solid-phase extraction device and evaporated to dryness.
For each polypeptide, a standard curve covering an appropriate concentration range (9 standard curve points) was prepared, and an equal amount of internal standard was added to each standard curve point. Mass spectrometry was performed on an AB Sciex 5500 QTRAP mass spectrometer; the polypeptides were separated on a C18 column (Phenomenex) at a column temperature of 45 ℃, with 15 μL of each standard injected. 150 μL of 0.1% formic acid was added to each evaporated sample, the sample was mixed thoroughly, and 15 μL was injected for mass spectrometric detection. The liquid chromatography separation conditions are shown in Table 2:
Table 2 Liquid chromatography separation conditions
Time (min)    Event           Parameter    Flow rate (mL/min)
0.01          Pump B Conc.    6            0.25
2.0           Pump B Conc.    6            0.25
18.0          Pump B Conc.    28           0.25
18.5          Pump B Conc.    28           0.25
21.5          Pump B Conc.    98           0.25
22            Pump B Conc.    98           0.25
25            Pump B Conc.    6            0.25
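The time program in Table 2 can be read as a gradient profile. The sketch below assumes that the parameter column is the percentage of mobile phase B and that the pump ramps linearly between consecutive time points. Both are assumptions, since the text does not state them, and `percent_b` is a hypothetical helper.

```python
# (time in min, pump B concentration) pairs taken from Table 2
GRADIENT = [(0.01, 6), (2.0, 6), (18.0, 28), (18.5, 28), (21.5, 98), (22.0, 98), (25.0, 6)]


def percent_b(t_min):
    """Pump B concentration at time t_min, assuming linear ramps between program points."""
    if t_min <= GRADIENT[0][0]:
        return float(GRADIENT[0][1])
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last program point the final value is held
    return float(GRADIENT[-1][1])


mid_gradient = percent_b(10.0)  # on the 6 -> 28 ramp
wash_ramp = percent_b(20.0)     # on the 28 -> 98 ramp
```

Under these assumptions the separation runs on the shallow ramp between 2 and 18 min, followed by a high-B wash and a return to the starting conditions.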
Triple quadrupole targeted mass spectrometry was then performed; the ion pair information for multiple reaction monitoring (MRM) is shown in Table 3.
Table 3 MRM monitoring information
After mass spectrometry, the polypeptide concentrations corresponding to the respective protein markers were quantified and used for model establishment. Of the 190 samples, 80% (152) were randomly selected as the training set and the remaining 20% (38) served as the test set, and logistic regression models were built for the 10 potential protein markers. The inventors found that seven single protein markers, namely LRG1, SERPINA1, ITIH3, CP, ORM1, C9 and IGFBP2, had very good predictive power in both the training and test sets; their ROC curves are shown in FIGS. 1 to 7, respectively.
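The 80/20 random split can be sketched as below. The sample IDs, the seed and the function name are hypothetical, and the text does not say whether the split was stratified by disease status, so this sketch uses a plain shuffle.

```python
import random


def split_train_test(sample_ids, train_fraction=0.8, seed=0):
    """Randomly split sample IDs into a training list and a test list."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 190 hypothetical sample IDs standing in for the plasma samples
sample_ids = [f"S{i:03d}" for i in range(1, 191)]
train_ids, test_ids = split_train_test(sample_ids)
```

With 190 samples this yields 152 training and 38 test samples, matching the counts given in the text.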
Example 3 Model validation
The inventors selected 121 colorectal cancer patients and 186 matched healthy people as a validation set to validate the model. To quantify the polypeptides more accurately and reduce errors introduced by complicated sample handling, the inventors omitted the step of removing high-abundance proteins, which also greatly reduces the cost of sample pretreatment. After protein extraction and concentration determination, liquid chromatography separation and mass spectrometric detection were performed.
Example 4 Modeling and validation of multiple marker combinations
The inventors further used the concentrations of the optimal combination of the aforementioned proteins, 5 protein markers (ITIH3, LRG1, SERPINA1, IGFBP2 and CNDP1), to build a logistic regression model that better discriminates colorectal cancer patients from healthy people. Specifically, the logistic regression model was trained on 77 colorectal cancer patients and 79 healthy people to learn the discriminating power of the 5 protein markers. The threshold of the logistic regression model was set to 0.34, and the model was independently validated using 44 colorectal cancer patients and 107 healthy people. The threshold was set based on the model results for all 307 plasma samples: a sample whose model output is above the threshold is judged positive, and a sample whose model output is below the threshold is judged negative.
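The decision rule can be sketched as follows. Only the threshold of 0.34 and the five marker names come from the text. The intercept and coefficients are hypothetical placeholders (the application does not disclose the fitted values), and the inputs are assumed to be standardized concentrations.

```python
import math

THRESHOLD = 0.34  # decision threshold stated in the text

# Hypothetical placeholder parameters; NOT the fitted values from the application
INTERCEPT = -1.0
COEFFICIENTS = {"ITIH3": 0.5, "LRG1": 1.0, "SERPINA1": 0.5, "IGFBP2": 0.5, "CNDP1": -1.0}


def predicted_probability(concentrations):
    """Logistic-regression output for one sample given its marker concentrations."""
    z = INTERCEPT + sum(COEFFICIENTS[m] * concentrations[m] for m in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def classify(concentrations, threshold=THRESHOLD):
    """Return 'positive' if the model output is above the threshold, else 'negative'."""
    return "positive" if predicted_probability(concentrations) > threshold else "negative"


# Two hypothetical samples with standardized concentrations
baseline = {m: 0.0 for m in COEFFICIENTS}
elevated_lrg1 = dict(baseline, LRG1=2.0)
calls = (classify(baseline), classify(elevated_lrg1))
```

A threshold below 0.5 trades some specificity for higher sensitivity, which suits a screening setting. The text does not specify how an output exactly equal to the threshold is handled; this sketch treats it as negative.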
The ROC curves are shown in FIG. 8; the areas under the curve (AUC) for the training set, the test set and the independent validation set were 0.956, 0.954 and 0.893, respectively. The final model achieved a sensitivity of 92%, a specificity of 81%, a negative predictive value of 94% and a positive predictive value of 76%, as shown in FIG. 9.
In addition, the inventors also present ten other protein marker combinations that performed well during machine learning; the results are shown in Table 4.
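The four reported metrics follow from a 2 × 2 confusion matrix. The counts below are illustrative only. They were chosen to be consistent with the reported percentages under the assumption that the metrics were computed on all 307 samples (121 patients, 186 healthy people); they are not taken from the application, whose actual confusion matrix is in FIG. 9.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all patients
        "specificity": tn / (tn + fp),  # true negatives among all healthy people
        "ppv": tp / (tp + fp),          # patients among all positive calls
        "npv": tn / (tn + fn),          # healthy people among all negative calls
    }


# Illustrative counts (hypothetical): 121 patients and 186 healthy people
metrics = diagnostic_metrics(tp=111, fn=10, tn=151, fp=35)
rounded = {name: round(value * 100) for name, value in metrics.items()}
```

Note that the predictive values depend on the case/control ratio of the cohort and would differ in a screening population with lower disease prevalence.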
Table 4 protein marker combinations
Model Training set AUC Test set AUC Independent validation set AUC
CP+LRG1+C9+IGFBP2+CNDP1 0.955 0.945 0.870
ITIH3+CP+LRG1+C9+CNDP1 0.953 0.945 0.872
SERPINA1+LRG1+C9+IGFBP2+CNDP1 0.952 0.939 0.884
SERPINA1+CP+LRG1+C9+CNDP1 0.952 0.942 0.870
LRG1+ORM1+C9+IGFBP2+CNDP1 0.947 0.935 0.891
LRG1+SERPINA1+CP+ORM1+C9+CNDP1 0.950 0.939 0.861
LRG1+SERPINA1+ITIH3+CP+C9+CNDP1 0.951 0.941 0.866
LRG1+SERPINA1+ITIH3+C9+IGFBP2+CNDP1 0.949 0.936 0.892
SERPINA1+ITIH3+LRG1+C9+IGFBP2+CNDP1 0.952 0.941 0.887
SERPINA1+ITIH3+LRG1+ORM1+C9+CNDP1 0.951 0.941 0.890
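For reference, an AUC such as those in Table 4 equals the probability that a randomly chosen patient receives a higher model score than a randomly chosen healthy person. The sketch below computes it by direct pairwise comparison. The scores and labels are hypothetical toy values, and this is not necessarily how the inventors computed their figures.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (patient, healthy) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs and true labels (1 = colorectal cancer, 0 = healthy)
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
```

The pairwise form is quadratic in the number of samples, which is fine for cohorts of a few hundred samples as used here.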
All documents mentioned in this disclosure are incorporated by reference in this disclosure as if each were individually incorporated by reference. Further, it will be appreciated that various changes and modifications may be made by those skilled in the art after reading the above teachings, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.

Claims (10)

1. A protein marker combination for colorectal cancer prediction, diagnosis or prognosis, characterized in that the protein marker combination comprises at least one selected from LRG1, SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2 and CNDP1.
2. The protein marker combination according to claim 1, wherein the protein marker combination comprises LRG1 and further comprises at least one of SERPINA1, ITIH3, CP, ORM1, C9, IGFBP2, and CNDP1.
3. The protein marker combination of claim 1, wherein the protein marker combination comprises C9 and further comprises at least one of LRG1, SERPINA1, ITIH3, CP, ORM1, IGFBP2, and CNDP1.
4. A combination of polypeptides for use in the prediction, diagnosis or prognosis of colorectal cancer, characterized in that the combination of polypeptides comprises at least one polypeptide from each protein in the combination of protein markers according to any one of claims 1-3.
5. Use of a reagent for detecting the expression level of a combination of protein markers according to any one of claims 1 to 3 for the preparation of a kit for the prediction, diagnosis or prognosis of colorectal cancer.
6. The use according to claim 5, wherein the detection reagent detects the expression level of each protein in the protein marker combination based on mass spectrometry.
7. A kit for colorectal cancer prediction, diagnosis or prognosis, comprising a reagent for detecting the expression level of a protein marker combination, the protein marker combination comprising ITIH3, LRG1 and C9.
8. A system for colorectal cancer prediction, diagnosis or prognosis comprising the following modules:
a data input module for inputting the expression level data of each protein in the protein marker combination of a subject, the protein marker combination comprising ITIH3, LRG1 and C9;
a data storage module for storing the expression level data of each protein in the protein marker combination in population samples and the information on whether each sample is derived from a colorectal cancer patient;
a colorectal cancer analysis module, connected to the data input module and to the data storage module respectively, which constructs a machine learning model using the expression level data of each protein in the protein marker combination in the population samples stored in the data storage module and the information on whether each sample is derived from a colorectal cancer patient, and which judges, based on the machine learning model, whether the subject has colorectal cancer, is at risk of colorectal cancer, or has a good colorectal cancer prognosis.
9. The system of claim 8, wherein the machine learning model is trained using any one of the following algorithms:
random forest algorithms, support vector machine algorithms, linear regression algorithms, logistic regression algorithms, bayesian classifiers, and neural network algorithms.
10. The system of claim 8 or 9, wherein the colorectal cancer analysis module further inputs into the data storage module the expression level data of each protein in the protein marker combination of the subject and the determination result.
CN202310049892.3A 2023-02-01 Protein marker for early colorectal cancer screening, kit and application Active CN116735889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310049892.3A CN116735889B (en) 2023-02-01 Protein marker for early colorectal cancer screening, kit and application


Publications (2)

Publication Number Publication Date
CN116735889A (en) 2023-09-12
CN116735889B (en) 2024-05-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Liao Lujian; Wang Tingting; Gao Fei; Pan Liangxuan; Du Xiaoyao
Inventor before: Liao Lujian
GR01 Patent grant