CN116008551B - Protein marker combination for judging benign and malignant pulmonary nodule and application thereof - Google Patents

Publication number
CN116008551B
Authority
CN
China
Prior art keywords
protein
benign
malignant
protein marker
marker combination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310100161.7A
Other languages
Chinese (zh)
Other versions
CN116008551A (en)
Inventor
高飞
王婷婷
廖鲁剑
杜逍遥
潘良选
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Durbrain Medical Inspection Laboratory Co ltd
Original Assignee
Hangzhou Durbrain Medical Inspection Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Durbrain Medical Inspection Laboratory Co ltd filed Critical Hangzhou Durbrain Medical Inspection Laboratory Co ltd
Priority to CN202310100161.7A priority Critical patent/CN116008551B/en
Publication of CN116008551A publication Critical patent/CN116008551A/en
Application granted granted Critical
Publication of CN116008551B publication Critical patent/CN116008551B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a protein marker combination for predicting or diagnosing whether pulmonary nodules are benign or malignant, and belongs to the technical field of cancer proteomics detection. The protein marker combination includes APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2. It provides a non-invasive, plasma-based screening means for predicting whether pulmonary nodules are benign or malignant. The method and system for predicting or diagnosing benign and malignant pulmonary nodules are non-invasive for the subject, sampling is convenient, only a small plasma volume is required, and sensitivity is high. They fill the gap left by the current lack of effective protein markers for judging whether pulmonary nodules are benign or malignant, and have great clinical significance.

Description

Protein marker combination for judging benign and malignant pulmonary nodule and application thereof
Technical Field
The invention belongs to the technical field of cancer proteomics detection, and particularly relates to a protein marker combination for judging benign and malignant pulmonary nodules and application thereof.
Background
Lung cancer is among the malignant tumors with the highest mortality. Non-small cell lung cancer (NSCLC) accounts for more than 85% of all lung cancer cases, and its five-year survival rate is below 15%; however, when lung cancer is detected and treated at an early stage, the survival rate can exceed 80%. Early diagnosis and early treatment are therefore key to improving patient prognosis. Lung cancer arises from pulmonary nodules. Studies have found that the incidence of nodules is between 3% and 13%; up to 80% of nodules larger than 10 mm in diameter are malignant, whereas approximately 99% of nodules smaller than 6 mm in diameter are benign. Nodules 5 mm to 30 mm in diameter pose a great diagnostic challenge for clinicians: surgery may amount to over-treatment and cause a series of complications, while leaving a nodule untreated carries a risk of malignancy or tumor progression. At present, clinical assessment of whether pulmonary nodules smaller than 30 mm in diameter are benign or malignant relies mainly on chest CT and needle biopsy, and no non-invasive and accurate method is yet available. It is therefore important to develop a sensitive diagnostic method for distinguishing benign from malignant nodules at an early stage.
Non-invasive diagnostic methods such as liquid biopsy detect tumor-derived nucleic acids in blood or other body fluids. Circulating tumor cells provide tumor-derived nucleic acids for detection, including circulating tumor DNA (ctDNA), cell-free DNA (cfDNA), circulating messenger RNA (mRNA), and long non-coding RNA (lncRNA), among others. Detection of nucleic acids from circulating tumor cells is highly specific for cancer and can markedly reduce the false positive rate. In addition, methylation of tumor suppressor genes or oncogenes shows good sensitivity and specificity in distinguishing malignant from benign lung nodules and has potential for clinical use.
Compared with the extensive biomarker studies targeting nucleic acids, protein biomarkers have been studied relatively little. Notably, the Pulmonary Nodule Plasma Proteomic Classifier (PANOPTIC) study in the United States provided a test that integrates several key clinical features of the patient with two protein markers (LG3BP and C163A) to distinguish early benign from malignant nodules; it is the only such test to date that has been FDA approved and used clinically to diagnose patients with early nodules. A follow-up study over two years showed that the biomarker test had a sensitivity of 97% and a specificity of 44%, outperforming physicians' estimates of the probability of cancer. However, this test must integrate clinical characteristics such as nodule size, nodule morphology and nodule location, so patients still need CT and assessment by a clinician, which makes early lung cancer screening of a wider population difficult. Further development of non-invasive, plasma-based protein biomarkers is therefore necessary.
Disclosure of Invention
Using high performance liquid chromatography coupled with high-resolution mass spectrometry, the inventors discovered a protein marker combination with high sensitivity and specificity which, combined with a machine learning model, can reliably judge whether early pulmonary nodules are benign or malignant, thereby completing the invention. The technical scheme adopted by the invention is as follows:
In a first aspect, the invention provides a protein marker combination for the prediction or diagnosis of benign and malignant pulmonary nodules, comprising APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2.
APOA4: the protein is a member of the lipoprotein family. Its specific function is unknown, but in vitro experiments show that APOA4 is an activator of lecithin-cholesterol acyltransferase. Related biological pathways include the assembly, remodeling and clearance of plasma lipoproteins.
CD14: the protein is a surface antigen expressed mainly on the surface of monocytes; it cooperates with other proteins to mediate the innate immune response to bacterial lipopolysaccharide and to viruses. The protein has been identified as a candidate target for treating patients infected with SARS-CoV-2, with the aim of reducing or inhibiting severe inflammatory responses.
PFN1: the protein is a member of the family of small actin-binding proteins, plays an important role in actin dynamics, and responds to extracellular signals by regulating actin polymerization.
APOB: the protein is the main apolipoprotein of low density lipoprotein and is also the ligand of the low density lipoprotein receptor. It exists in plasma as two main isoforms, APOB-48 and APOB-100; the former is synthesized only in the intestine and the latter in the liver. Mutations in APOB or its regulatory regions can lead to hypobetalipoproteinemia, normotriglyceridemic hypobetalipoproteinemia and hypercholesterolemia.
PLA2G7: the protein is a secreted enzyme that catalyzes the degradation of platelet-activating factor (PAF) into biologically inactive products; defects in this gene are one cause of platelet-activating factor acetylhydrolase deficiency. It is a lipoprotein-associated, calcium-independent phospholipase A2 involved in phospholipid catabolism during inflammatory and oxidative stress responses. Its substrate PAF is a potent pro-inflammatory signaling lipid that acts through PTAFR on various innate immune cells.
IGFBP2: the protein binds insulin-like growth factors I and II (IGF-I and IGF-II). After being secreted into the blood it binds IGF-I and IGF-II more effectively, and within cells it can also interact with different ligands. High expression of IGFBP2 may promote the growth of a variety of tumors and may be indicative of patient prognosis.
In the present invention, by detecting the expression level of each protein in the protein marker combination, it is possible to predict whether a subject is at risk of developing a malignant pulmonary nodule, or whether a subject's pulmonary nodule is at risk of progressing to a malignant nodule, and thus to screen the subject early for malignant pulmonary nodules; if the subject already has a pulmonary nodule, it is possible to determine whether that nodule is malignant or benign. In some embodiments of the invention, the prediction for a subject who already has a pulmonary nodule is made primarily for pulmonary nodules 5 mm to 30 mm in diameter.
Further, by detecting the expression level of each protein in the protein marker combination, it is also possible to diagnose whether a malignant pulmonary nodule is present. Here, diagnosis means determining whether the pulmonary nodule of a subject who already has a pulmonary nodule is malignant, and the final diagnosis is made by a clinician in combination with other clinical indicators. Diagnosis of malignant nodules (i.e., judging whether a pulmonary nodule is benign or malignant) may also be regarded as early screening for lung cancer, because malignant nodules carry a risk of tumor progression. In some embodiments of the invention, diagnosis is made primarily for existing pulmonary nodules 5 mm to 30 mm in diameter.
In a second aspect the invention provides a polypeptide combination for use in the prediction or diagnosis of benign and malignant pulmonary nodules, comprising one specific polypeptide from each protein in the protein marker combination of the first aspect of the invention.
In some embodiments of the present invention,
the polypeptide from APOA4 comprises the amino acid sequence shown in SEQ ID No. 1;
the polypeptide from CD14 comprises the amino acid sequence shown in SEQ ID No. 2;
the polypeptide from PFN1 comprises the amino acid sequence shown in SEQ ID No. 3;
the polypeptide from APOB comprises the amino acid sequence shown in SEQ ID No. 4;
the polypeptide from PLA2G7 comprises the amino acid sequence shown in SEQ ID No. 5;
the polypeptide from IGFBP2 comprises the amino acid sequence shown in SEQ ID No. 6.
In the invention, the expression level of each protein in the protein marker combination can be obtained by quantitatively detecting the level of the polypeptide derived from that protein. The polypeptide may be one produced by natural breakdown of the protein in the subject, or one obtained by trypsin digestion of a protein sample.
In a third aspect, the invention provides the use of a protein marker combination according to the first aspect of the invention or a detection reagent for a polypeptide combination according to the second aspect of the invention in the preparation of a kit for the prediction or diagnosis of benign and malignant pulmonary nodules.
In some embodiments of the invention, the detection reagent detects the expression level of each protein in the protein marker combination based on mass spectrometry.
In some embodiments of the invention, the expression level of each protein in the protein marker combination is obtained by obtaining the level of each polypeptide in the polypeptide combination.
In a fourth aspect the invention provides a method for the prediction or diagnosis of benign and malignant pulmonary nodules comprising the steps of:
S1, obtaining expression level data of each protein in the protein marker combination according to any one of the first aspect of the invention;
S2, constructing a machine learning model by using expression level data of each protein in the protein marker combination in the population sample and benign and malignant lung nodule information, and diagnosing benign and malignant lung nodule of the subject based on the machine learning model, or predicting whether the subject has risk of suffering from malignant lung nodule or predicting whether the pulmonary nodule of the subject has risk of progressing to malignant nodule.
In some embodiments of the invention, the level of expression of each protein in the protein marker combination is obtained by obtaining the level of each polypeptide in the polypeptide combination.
In some embodiments of the invention, the level of each polypeptide in the polypeptide combination is detected by a mass spectrometry-based method.
In some embodiments of the invention, the mass spectrometry detection utilizes high resolution mass spectrometry for proteomic assays.
In some embodiments of the invention, the machine learning model is trained using any one of the following algorithms:
random forest algorithms, support vector machine algorithms, linear regression algorithms, logistic regression algorithms, bayesian classifiers, and neural network algorithms.
In some preferred embodiments of the invention, the machine learning model is trained using a logistic regression algorithm.
Further, a preset threshold is obtained from the machine learning model using the population samples, and each subject sample is evaluated by the model. If the model output is above the preset threshold, the subject's pulmonary nodule is diagnosed as malignant, or the subject is predicted to be at risk of developing a malignant pulmonary nodule, or the subject's pulmonary nodule is predicted to be at risk of progressing to a malignant nodule. If the model output is not above the preset threshold, the subject's pulmonary nodule is diagnosed as benign, or the subject is predicted not to be at risk of developing a malignant pulmonary nodule, or the subject's benign pulmonary nodule is predicted not to be at risk of progressing to a malignant nodule.
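The threshold rule above can be sketched in a few lines of Python. This is a minimal illustration, not the model of the invention: the coefficient values, the intercept, the threshold of 0.5 and the function names are all hypothetical, since the text does not publish fitted parameters. The negative signs for APOA4 and IGFBP2 only mirror the direction of expression reported in Example 1.

```python
import math

# Hypothetical coefficients for illustration only; the source gives no fitted values.
EXAMPLE_WEIGHTS = {
    "APOA4": -0.8, "CD14": 0.6, "PFN1": 0.5,
    "APOB": 0.4, "PLA2G7": 0.7, "IGFBP2": -0.3,
}
EXAMPLE_INTERCEPT = 0.0   # hypothetical
EXAMPLE_THRESHOLD = 0.5   # hypothetical preset threshold


def malignancy_score(levels, weights=EXAMPLE_WEIGHTS, intercept=EXAMPLE_INTERCEPT):
    """Logistic-regression score in (0, 1) from standardized marker levels."""
    z = intercept + sum(weights[name] * levels[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def classify(levels, threshold=EXAMPLE_THRESHOLD):
    """Call 'malignant' only when the score is above the preset threshold."""
    return "malignant" if malignancy_score(levels) > threshold else "benign"
```

A score exactly equal to the threshold is called benign here, matching the "not above the preset threshold" wording.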
In a fifth aspect, the invention provides a system for pulmonary nodule benign and malignant prediction or diagnosis comprising the following modules:
a data input module for inputting expression level data for each protein in a subject protein marker combination comprising APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2;
a data storage module for storing the level data of each protein in the protein marker combination in the population samples, together with information on whether each sample is derived from a patient with a benign or a malignant pulmonary nodule;
the pulmonary nodule analysis module is respectively connected with the data input module and the data storage module, constructs a machine learning model by using the expression level data of each protein in the protein marker combination in the group sample stored in the data storage module and the benign and malignant pulmonary nodule information, and diagnoses the benign and malignant pulmonary nodule of the subject based on the machine learning model, or predicts whether the subject has the risk of suffering from the malignant pulmonary nodule or predicts whether the pulmonary nodule of the subject has the risk of progressing to the malignant nodule.
In some embodiments of the invention, the machine learning model is trained using any one of the following algorithms:
random forest algorithms, support vector machine algorithms, linear regression algorithms, logistic regression algorithms, bayesian classifiers, and neural network algorithms.
In some embodiments of the invention, the pulmonary nodule analysis module further writes the expression level data of each polypeptide in the subject's protein marker combination, together with the determination result, to the data storage module.
In some preferred embodiments of the invention, the machine learning model is trained using a logistic regression algorithm.
Further, the pulmonary nodule analysis module obtains a preset threshold from the machine learning model using the population samples and evaluates each subject sample with the model. If the model output is above the preset threshold, the subject's pulmonary nodule is diagnosed as malignant, or the subject is predicted to be at risk of developing a malignant pulmonary nodule, or the subject's pulmonary nodule is predicted to be at risk of progressing to a malignant nodule. If the model output is not above the preset threshold, the subject's pulmonary nodule is diagnosed as benign, or the subject is predicted not to be at risk of developing a malignant pulmonary nodule, or the subject's benign pulmonary nodule is predicted not to be at risk of progressing to a malignant nodule.
Beneficial Effects of the Invention
Compared with the prior art, the invention has the following beneficial effects:
The protein marker combination provides a non-invasive, plasma-based screening means for judging whether pulmonary nodules are benign or malignant. Analyzing the expression levels of the protein markers by machine learning can improve the accuracy with which benign and malignant pulmonary nodules are detected.
The method and system of the invention judge whether pulmonary nodules are benign or malignant, i.e., predict or diagnose nodules up to 30 mm in size. They are non-invasive for the patient, sampling is convenient, only a small plasma volume is required, and sensitivity and specificity are high, filling the gap left by the lack of effective protein markers (or combinations) for distinguishing benign from malignant pulmonary nodules.
The sensitivity of the protein marker combination of the invention for predicting malignant pulmonary nodules reaches up to 96%, which largely avoids missed diagnoses. A positive result prompts the patient to undergo further diagnostic work-up, which over the long term can effectively reduce mortality from early non-small cell lung cancer in the population.
Because the plasma protein markers are measured and evaluated by machine learning, the benign or malignant status of a subject's nodules can be monitored dynamically.
Drawings
FIG. 1 shows protein expression data for the 6 markers in the discovery phase (Cohort 1).
FIG. 2 shows the receiver operating characteristic (ROC) curves of the 6-protein marker combination in the discovery phase (Cohort 1); the areas under the curve (AUC) for the training set and the test set are 0.87 and 0.91, respectively. In the figure, train denotes the training set, test the test set, and valid the independent validation set; the axes are labeled True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 3 shows the confusion matrix of the 6-protein marker combination in the discovery phase (Cohort 1), which comprises 40 malignant nodule patients (Malignant) and 40 benign nodule controls (Benign). In the figure, Positive and Negative denote positive and negative results; True Label denotes the true class and Prediction the predicted class.
FIG. 4 shows expression data for the specific polypeptides corresponding to the 6 marker proteins in the discovery phase (Cohort 1) and in the two independent validation sets (Cohort 2 and Cohort 3).
FIG. 5 shows the ROC curves of the 6-protein marker combination when validated on the two independent validation sets (Cohort 2 and Cohort 3), with areas under the curve (AUC) of 0.82 and 0.81, respectively. The axes are labeled True positive rate (sensitivity) and False positive rate (1-specificity).
FIG. 6 shows the confusion matrix of the 6-protein marker combination when validated on independent validation set 1 (Cohort 2), which comprises 26 malignant nodule patients (Malignant) and 20 benign nodule controls (Benign). In the figure, Positive and Negative denote positive and negative results; True Label denotes the true class and Prediction the predicted class.
FIG. 7 shows the confusion matrix of the 6-protein marker combination when validated on independent validation set 2 (Cohort 3), which comprises 35 malignant nodule patients (Malignant) and 24 benign nodule controls (Benign). In the figure, Positive and Negative denote positive and negative results; True Label denotes the true class and Prediction the predicted class.
Detailed Description
Unless otherwise indicated, implicit from the context, or customary in the art, all parts and percentages in the present application are based on weight, and the test and characterization methods used are current as of the filing date of the present application. Where applicable, the contents of any patent, patent application, or publication referred to in this application are incorporated by reference in their entirety, and their equivalent family patents are likewise incorporated by reference, particularly with respect to the definitions of terms in the art disclosed in those documents. If the definition of a particular term disclosed in the prior art is inconsistent with any definition provided in the present application, the definition provided in the present application controls.
The numerical ranges in the present application are approximations and may therefore include values outside the stated range unless otherwise indicated. A numerical range includes all values from the lower value to the upper value in increments of 1 unit, provided that there is a separation of at least 2 units between any lower value and any higher value. For ranges containing values less than 1, or containing fractional numbers greater than 1 (e.g., 1.1, 1.5, etc.), 1 unit is considered to be 0.0001, 0.001, 0.01, or 0.1, as appropriate. For ranges containing single-digit numbers less than 10 (e.g., 1 to 5), 1 unit is typically considered to be 0.1. These are only examples of what is specifically intended, and all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered expressly stated in this disclosure.
The terms "comprises," "comprising," "including," and their derivatives do not exclude the presence of any other component, step or process, whether or not that component, step or process is disclosed in the present application. For the avoidance of doubt, all uses of the terms "comprising", "including" or "having" herein may include any additional additive, adjuvant or compound unless expressly stated otherwise. In contrast, the term "consisting essentially of" excludes from the scope of any subsequent recitation any other component, step, or process, except those that are not essential to operability. The term "consisting of" excludes any component, step or process not specifically described or listed. The term "or" refers to the listed members individually as well as in any combination, unless explicitly stated otherwise.
In order to make the technical problems, technical schemes and beneficial effects solved by the invention more clear, the invention is further described in detail below with reference to the embodiments.
Examples
The following examples are presented herein to demonstrate preferred embodiments of the present invention. It will be appreciated by those skilled in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. Those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit or scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the claims.
The experimental methods in the following examples are conventional methods unless otherwise specified. The instruments used in the following examples are laboratory conventional instruments unless otherwise specified; the test materials used in the examples described below, unless otherwise specified, were purchased from conventional biochemical reagent stores.
Example 1 discovery of protein markers
The inventors collected fresh blood samples of 40 malignant nodule patients and 40 benign nodule controls matched in gender, age, and nodule size for the discovery of protein markers.
1. Blood sample processing
After anticoagulation treatment, the fresh blood sample is centrifuged at 1000 × g for 5 min to obtain a plasma sample, which is stored long term in a freezer at -70 °C.
The plasma sample is thawed at room temperature and mixed, and 8 μL is added to a room-temperature spin column (Thermo No. A36370) that removes 14 high-abundance proteins (albumin, IgG, IgG (light chains), IgA, IgD, IgE, transferrin, haptoglobin, alpha1-antitrypsin, fibrinogen, alpha2-macroglobulin, alpha1-acid glycoprotein, IgM and apolipoprotein AI). The column is incubated on a shaker at room temperature for 15 min to remove the high-abundance proteins and then centrifuged to obtain a low-abundance protein sample. Protein concentration is determined by BCA assay: BSA standards are serially diluted to 2, 1, 0.5, 0.25, 0.125 and 0.0625 mg/mL to generate a standard curve against which the plasma protein concentration is calibrated. The diluted samples and standards are added to a 96-well plate, pre-prepared BCA working solution is added, the plate is incubated at 37 °C for 30 min, and the plasma protein concentration is determined from the absorbance at 562 nm.
30 μg of the depleted plasma sample is taken, and ammonium bicarbonate solution is added to a final concentration of 50 mM. DTT is added to a final concentration of 10 mM, and the sample is heated at 95 °C for 10 min. After the sample returns to room temperature, IAA is added to a final concentration of 15 mM and the reaction proceeds in the dark for 30 min. 1 μg of trypsin is added to each sample, and digestion proceeds overnight (12-14 h) in a metal bath at 37 °C. The next day, formic acid is added to a final concentration of 1% to acidify the sample and stop the digestion.
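The standard-curve step of the BCA assay can be illustrated with a short Python sketch. It assumes a straight-line fit of absorbance at 562 nm against BSA concentration, which is a simplification (BCA curves are often fitted with a curved model at the upper end), and the function names and the dilution-factor parameter are hypothetical. No absorbance readings are given in the text, so none are supplied here.

```python
# BSA standard concentrations from the protocol, in mg/mL.
BSA_STANDARDS_MG_PER_ML = [2, 1, 0.5, 0.25, 0.125, 0.0625]


def fit_line(xs, ys):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(a562, slope, intercept, dilution_factor=1.0):
    """Invert the standard curve and correct for the sample dilution (assumed known)."""
    return (a562 - intercept) / slope * dilution_factor
```

In use, `fit_line` would be called with `BSA_STANDARDS_MG_PER_ML` and the measured standard absorbances, and each sample reading would then be passed to `concentration_from_absorbance`.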
2. Differential protein
Target selection begins with finding differentially expressed proteins. The inventors acquired mass spectra by data-independent acquisition (DIA) for 80 sex- and age-matched plasma samples (40 malignant nodule patients and 40 benign nodule controls), analyzed the data with DIA-NN software to obtain protein and polypeptide expression data, and normalized the data by total protein intensity, yielding 451 proteins in total. For proteins whose expression followed a normal distribution, differentially expressed proteins were identified with a t-test; for proteins whose expression did not follow a normal distribution, the Wilcoxon non-parametric test was used. In total, 19 differentially expressed proteins were obtained, 15 up-regulated and 4 down-regulated; they are listed in Table 1.
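Normalization by total protein intensity can be sketched as follows. The text does not specify the exact procedure, so this example makes an assumption: each sample is rescaled so that its summed intensity equals the median of all sample totals. The function name and the nested-dictionary data layout are hypothetical.

```python
from statistics import median


def normalize_by_total_intensity(samples):
    """Rescale each sample so its summed protein intensity equals the median total.

    `samples` maps sample id -> {protein name -> raw intensity}.
    Scaling to the median total is an assumption, not a detail from the source.
    """
    totals = {sid: sum(proteins.values()) for sid, proteins in samples.items()}
    target = median(totals.values())
    return {
        sid: {name: value * target / totals[sid] for name, value in proteins.items()}
        for sid, proteins in samples.items()
    }
```

After this step, two samples that differ only in overall loading have identical normalized profiles, so the subsequent t-test or Wilcoxon test compares relative abundance and not injection amount.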
TABLE 1 list of differential proteins
3. Marker protein screening
Potential proteins, and their specific polypeptides, capable of distinguishing early benign from malignant nodules were selected by the random forest method. The mean Gini coefficient of each target was calculated by random forest, and the targets were ranked by importance. Taking the biological functions of the proteins into account as well, 6 top-ranked proteins were finally obtained: APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2. Their expression data in the 80 samples are shown in FIG. 1. APOA4 and IGFBP2 are expressed at lower levels in malignant nodule patients (MA) than in benign nodule controls (BE), whereas CD14, PFN1, APOB and PLA2G7 are expressed at higher levels in malignant nodule patients. The polypeptide sequences corresponding to APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2 are shown in Table 2:
TABLE 2 Polypeptide sequences of candidate proteins
Protein   Polypeptide sequence      SEQ ID No.
APOA4     SELTQQLNALFQDK            1
CD14      AFPALTSLDLSDNPGLGER       2
PFN1      STGGAPTFNVTVTK            3
APOB      TSSFALNLPTLPEVK           4
PLA2G7    IAVIGHSFGGATVIQTLSEDQR    5
IGFBP2    LEGEACGVYTPR              6
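Because the samples are digested with trypsin, each marker polypeptide in Table 2 would be expected to end in lysine (K) or arginine (R) and to contain no internal K or R. The sketch below checks this with a deliberately simplified rule that ignores the proline exception of trypsin; the function name is hypothetical.

```python
# Sequences copied from Table 2.
MARKER_PEPTIDES = {
    "APOA4": "SELTQQLNALFQDK",
    "CD14": "AFPALTSLDLSDNPGLGER",
    "PFN1": "STGGAPTFNVTVTK",
    "APOB": "TSSFALNLPTLPEVK",
    "PLA2G7": "IAVIGHSFGGATVIQTLSEDQR",
    "IGFBP2": "LEGEACGVYTPR",
}


def is_fully_tryptic(sequence):
    """True if the peptide ends in K or R and has no internal K or R.

    Simplified rule: it does not model the fact that trypsin
    generally does not cleave when the next residue is proline.
    """
    return sequence[-1] in "KR" and not any(aa in "KR" for aa in sequence[:-1])


for protein, peptide in MARKER_PEPTIDES.items():
    print(protein, len(peptide), is_fully_tryptic(peptide))
```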
Example 2 machine learning model establishment (Cohort 1)
After mass spectrum acquisition, the model was built from the mass spectrometry quantitative data, using the polypeptide expression data corresponding to each protein marker. Of the 80 samples, 80% (64) were randomly selected as the training set and the remaining 20% (16) served as the test set; a logistic regression model was then built on the 6 protein markers and its parameters were determined. The inventors found that the combination of the 6 protein markers APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2 has good predictive power in both the training set and the test set; the ROC curves are shown in FIG. 2.
The final result was a sensitivity of 100%, a specificity of 40%, a negative predictive value of 100%, and a positive predictive value of 63%, as shown in fig. 3.
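The 80/20 random split and logistic regression fit described in this example can be sketched in plain Python. This is an illustrative stand-in, not the inventors' code: the text does not name the software, optimizer, random seed or any regularization, so batch gradient descent, the learning rate, the epoch count and all function names below are assumptions.

```python
import math
import random


def split_80_20(samples, seed=0):
    """Shuffle a copy of the samples and return (training 80%, test 20%)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seed is an arbitrary choice
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, learning_rate=0.1, epochs=500):
    """Fit weights and intercept by batch gradient descent on the log-loss.

    `features` is a list of equal-length numeric rows (one per sample);
    `labels` holds 1 for malignant and 0 for benign.
    """
    n_features = len(features[0])
    weights = [0.0] * n_features
    intercept = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(features, labels):
            z = intercept + sum(w * x for w, x in zip(weights, row))
            error = sigmoid(z) - label
            for j in range(n_features):
                grad_w[j] += error * row[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        intercept -= learning_rate * grad_b / n
    return weights, intercept
```

With 80 samples, `split_80_20` returns 64 training and 16 test samples, matching the sizes stated above. Each feature row would hold the six marker polypeptide levels, ideally standardized first so that one learning rate suits all features.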
Example 3 independent model verification 1 (Cohort 2)
The inventors additionally selected 46 patients (Cohort 2: 26 malignant nodule patients and 20 benign nodule controls) as a validation set for model validation. After plasma protein extraction and concentration measurement, liquid phase separation and mass spectrum acquisition were carried out in data-independent acquisition (DIA) mode, and DIA-NN software analysis yielded expression data for the 6 marker polypeptides (shown in FIG. 4). The expression data of the 6 protein markers were substituted into the established logistic regression model to judge whether the nodules of the 46 samples were benign or malignant. The inventors found that the protein marker combination formed by the 6 protein markers APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2 also has very good predictive power in Cohort 2; the ROC curve is shown in FIG. 5.
The final result was a sensitivity of 96%, a specificity of 35%, a negative predictive value of 88%, and a positive predictive value of 66%, as shown in fig. 6.
Example 4 independent model verification 2 (Cohort 3)
The inventors further performed model validation using a third population (Cohort 3, comprising 59 patients: 35 malignant nodule patients and 24 benign nodule controls) as an independent validation set. Similarly, after plasma protein extraction and concentration determination, liquid phase separation and mass spectrum acquisition were carried out in data-independent acquisition (DIA) mode, and DIA-NN software analysis yielded expression data for the 6 marker polypeptides (shown in FIG. 4). The expression data of the 6 protein markers were substituted into the established logistic regression model to judge whether the nodules of the 59 samples were benign or malignant. The inventors found that the protein marker combination formed by the 6 protein markers APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2 also had very good predictive power in Cohort 3; the ROC curve is shown in FIG. 5.
The final result was a sensitivity of 91%, a specificity of 54%, a negative predictive value of 81%, and a positive predictive value of 74%, as shown in FIG. 7.
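The four figures reported for each validation cohort (sensitivity, specificity, negative predictive value and positive predictive value) are all derived from the counts of a 2x2 confusion matrix. The following Python sketch shows the arithmetic under the usual definitions; the counts in the usage example are hypothetical and are not taken from the cohorts described here.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp: malignant nodules called malignant (true positives)
    fn: malignant nodules called benign (false negatives)
    tn: benign nodules called benign (true negatives)
    fp: benign nodules called malignant (false positives)
    """
    if min(tp, fn, tn, fp) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign sample")
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative call")
    return {
        "sensitivity": tp / (tp + fn),  # share of malignant nodules detected
        "specificity": tn / (tn + fp),  # share of benign nodules cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


if __name__ == "__main__":
    # HYPOTHETICAL counts for illustration only.
    for name, value in diagnostic_metrics(tp=45, fn=5, tn=30, fp=20).items():
        print(f"{name}: {value:.1%}")
```

A high sensitivity paired with a high negative predictive value, as reported for both cohorts, means that a negative call is rarely wrong, which is the property that matters most when the test is used to rule out malignancy.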
The results show that a protein marker combination with high sensitivity and specificity can be detected using high performance liquid chromatography coupled with high-resolution mass spectrometry, which improves the ability to discriminate benign from malignant pulmonary nodules. In addition, analyzing the expression levels of the protein marker combination by machine learning can further improve the accuracy of benign/malignant pulmonary nodule detection. Finally, through large-scale clinical trials, the product can be translated into practice as the first protein marker product on the domestic market for early discrimination of benign and malignant pulmonary nodules.
All documents mentioned in this disclosure are incorporated by reference in this disclosure as if each were individually incorporated by reference. Further, it will be appreciated that various changes and modifications may be made by those skilled in the art after reading the above teachings, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.

Claims (8)

1. Use of a detection reagent of a protein marker combination consisting of APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2 for the preparation of a kit for the prediction or diagnosis of benign and malignant pulmonary nodules between 5 and 30 mm in diameter.
2. The use according to claim 1, wherein the detection reagent is a detection reagent for at least one polypeptide from each protein in the protein marker combination.
3. The use according to claim 2, wherein the polypeptide from APOA4 comprises the amino acid sequence shown in SEQ ID No. 1; the polypeptide from CD14 comprises the amino acid sequence shown in SEQ ID No. 2; the polypeptide from PFN1 comprises the amino acid sequence shown in SEQ ID No. 3; the polypeptide from APOB comprises the amino acid sequence shown in SEQ ID No. 4; the polypeptide from PLA2G7 comprises the amino acid sequence shown in SEQ ID No. 5; the polypeptide from IGFBP2 comprises the amino acid sequence shown in SEQ ID No. 6.
4. The use according to claim 1, wherein the detection reagent detects the level of each protein in the protein marker combination based on mass spectrometry.
5. A system for predicting or diagnosing whether a pulmonary nodule between 5 mm and 30 mm in diameter is benign or malignant, comprising the following modules:
a data input module for inputting expression level data for each protein in a protein marker combination of a subject, the protein marker combination consisting of APOA4, CD14, PFN1, APOB, PLA2G7 and IGFBP2;
a data storage module for storing expression level data for each protein in the protein marker combination in a population sample, together with benign/malignant pulmonary nodule information;
a pulmonary nodule analysis module, connected to the data input module and to the data storage module, which constructs a machine learning model using the expression level data of each protein in the protein marker combination in the population sample stored in the data storage module and the benign/malignant pulmonary nodule information, and which, based on the machine learning model, diagnoses whether the pulmonary nodule of the subject is benign or malignant, or predicts whether the subject is at risk of having a malignant pulmonary nodule, or predicts whether the pulmonary nodule of the subject is at risk of progressing to a malignant nodule.
6. The system of claim 5, wherein the expression level data for each protein in the protein marker combination is obtained based on detecting the expression level of a specific polypeptide from each protein in the protein marker combination.
7. The system of claim 6, wherein the machine learning model is trained using any one of the following algorithms:
random forest algorithms, support vector machine algorithms, linear regression algorithms, logistic regression algorithms, Bayesian classifiers, and neural network algorithms.
8. The system of any one of claims 5-7, wherein the pulmonary nodule analysis module further inputs into the data storage module the expression level data of each protein in the protein marker combination of the subject and the corresponding determination.
CN202310100161.7A 2023-02-03 2023-02-03 Protein marker combination for judging benign and malignant pulmonary nodule and application thereof Active CN116008551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310100161.7A CN116008551B (en) 2023-02-03 2023-02-03 Protein marker combination for judging benign and malignant pulmonary nodule and application thereof

Publications (2)

Publication Number Publication Date
CN116008551A (en) 2023-04-25
CN116008551B (en) 2024-05-17

Family

ID=86028196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310100161.7A Active CN116008551B (en) 2023-02-03 2023-02-03 Protein marker combination for judging benign and malignant pulmonary nodule and application thereof

Country Status (1)

Country Link
CN (1) CN116008551B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012051822A (en) * 2010-08-31 2012-03-15 Institute Of Physical & Chemical Research Lung cancer diagnostic polypeptide, method for detecting lung cancer, and method for evaluating therapeutic effect
WO2013081721A1 (en) * 2011-11-30 2013-06-06 Battelle Memorial Institute Biomarkers for lymphoma
GB2590185B (en) * 2018-04-23 2022-09-28 Seer Inc Systems and methods for complex biomolecule sampling and biomarker discovery

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Association of the serological status of rheumatoid arthritis patients with two circulating protein biomarkers: A useful tool for precision medicine strategies; Cristina Ruiz-Romero et al.; Frontiers in Medicine; 20221028; pp. 1-15 *
Identification of the Level of Exosomal Protein by Parallel Reaction Monitoring Technology; Hui Huang et al.; Int J Gen Med; 20221014; pp. 1-11 *
Proteomic identification of biomarkers in maternal plasma that predict the outcome of rescue cerclage for cervical insufficiency; Kisoon Dan et al.; PLOS ONE; 20210415; pp. 1-17 *
Shotgun proteomics coupled to nanoparticle-based biomarker enrichment reveals a novel panel of extracellular matrix proteins as candidate serum protein biomarkers for early-stage breast cancer detection; Claudia Fredolini et al.; Breast Cancer Research; 20201231; pp. 1-16 *

Also Published As

Publication number Publication date
CN116008551A (en) 2023-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant