CN116908450A - Serum metabolite combination biomarker for diagnosing prostate cancer - Google Patents
- Publication number
- CN116908450A (application number CN202310723363.7A)
- Authority
- CN
- China
- Prior art keywords
- metabolite
- prostate cancer
- serum
- sample
- samples
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N27/00—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
- G01N27/62—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
- G01N33/57407—Specifically defined cancers
- G01N33/57434—Specifically defined cancers of prostate
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
- G01N33/57484—Immunoassay; Biospecific binding assay; Materials therefor for cancer involving compounds serving as markers for tumor, cancer, neoplasia, e.g. cellular determinants, receptors, heat shock/stress proteins, A-protein, oligosaccharides, metabolites
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- G01N33/6848—Methods of protein analysis involving mass spectrometry
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Abstract
The invention provides a serum metabolite combination biomarker for diagnosing prostate cancer, relating to the technical field of biomedicine and clinical examination. The biomarker consists of hypoxanthine, tryptophan, lactic acid, taurocholate, diacylglycerol DG (16:0/18:2) and phosphatidylcholine PC aa C34:2. Targeted quantitative metabolomics analysis of clinical serum samples is used to measure the concentrations of 630 serum metabolites; a logistic regression model is built on the metabolite intensities, regression coefficients and a cut-off value are determined, the marker combination is screened, and a risk score is established to judge whether a subject is a prostate cancer patient. The metabolite combination developed by the invention has marked advantages over the markers currently used in clinical practice and can be used for clinical diagnosis and screening of prostate cancer.
Description
Technical Field
The invention relates to the technical field of biomedicine and clinical examination, in particular to a serum metabolite combination biomarker for diagnosing prostate cancer.
Background
Prostate cancer (PCa) is the most common cancer in older men and the second most common cause of cancer death in men. Biomarkers are critical not only for the initial diagnosis of prostate cancer; they also provide an effective means to screen appropriate populations, guide initial treatment strategies, assess the efficacy of treatment and track the progression of the cancer over time. At present, serum prostate-specific antigen (PSA) is the main tumor marker used clinically to diagnose prostate cancer, but its specificity is low and its false-positive rate is high, which leads to a large number of unnecessary needle biopsies and places a heavy personal and economic burden on patients. There is therefore a great need for novel biomarkers to improve clinical decision-making and the management of PCa (Kdadra et al., 2019).
The metabolome is the complete set of metabolites that are the end products of cellular processes within a cell, tissue, organ or organism. It can be regarded as a readout of activity at the genomic, epigenomic, transcriptomic and proteomic levels and of the end result of their interactions with the environment. Research has shown that the metabolomic features of cancer can be used to assess disease risk, or for cancer screening, diagnosis and monitoring in specific disease sub-populations (Schmidt et al., 2021). For example, a panel of four metabolites (L-octanoylcarnitine, pyroglutamic acid, hypoxanthine and docosahexaenoic acid) was identified as a set of potential serum biomarkers for breast cancer (Park et al., 2019), and a panel of five serum metabolites (including glutamic acid, choline, 1,5-anhydroglucitol, betaine and guanidine) was able to distinguish pancreatic cancer patients from controls (Xie et al., 2015).
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a serum metabolite combination biomarker for diagnosing prostate cancer.
A serum metabolite combination biomarker for diagnosing prostate cancer comprises hypoxanthine, tryptophan, lactic acid, taurocholate, diacylglycerol DG (16:0/18:2) and phosphatidylcholine PC aa C34:2, and serves as a prostate cancer detection marker;
the serum metabolite combination biomarker is obtained by a biomarker screening method based on targeted quantitative metabolomics, which specifically comprises the following steps:
step 1: performing metabolomics analysis on serum of prostate cancer patients and serum of a non-prostate cancer population using a targeted quantitative metabolomics analysis technique;
the serum of prostate cancer patients and of the non-prostate cancer population is processed with a kit, separated by liquid chromatography and detected with a mass spectrometer, and the metabolome is quantified using software for analyzing metabolome mass spectrometry data;
step 1.1: collecting serum samples of prostate cancer patients and serum samples of non-prostate cancer people;
step 1.2: treating the serum sample and extracting metabolites;
step 1.3: mass spectrum detection;
mass spectrometry is performed on the serum samples in sequence according to a sample table exported from the software for analyzing metabolome mass spectrometry data; each sample is detected in two parts: for the first part the mass spectrometer acquires signals in FIA mode in positive ion mode, and for the second part it acquires signals in LCMS mode, with each sample acquired once in positive ion mode and once in negative ion mode, to obtain the raw mass spectrometry data;
step 1.4: preprocessing data;
the raw mass spectrometry data are imported into the software for analyzing metabolome mass spectrometry data; the software generates a standard curve from the configured standards (standard curves are generated only for the LCMS acquisition mode) and calculates the concentration of each metabolite in each sample; a metabolite is rejected if its value is missing in more than 50% of the detected samples; an upper and a lower detection limit are set, and values below the lower limit or above the upper limit are replaced with the corresponding limit; the data quality of each metabolite is evaluated from its precision in the standard, i.e. the consistency of repeated measurements of the QC samples, and the metabolite data are screened and filtered accordingly; when the coefficient of variation (CV) of a metabolite in the standard exceeds 30%, the metabolite is filtered out and excluded from subsequent data analysis;
step 2: screening the metabolite indices with a feature selection algorithm using the serum samples of the cancer group and the non-cancer group, and performing LASSO regression modeling on the screened indices with a regression analysis tool, wherein LASSO regression, a linear model that estimates sparse coefficients, is used to screen the metabolites, and the optimal parameters are selected according to model accuracy by cross-validation on the samples;
step 3: performing logistic regression modeling on the metabolite indices retained by the LASSO regression, and evaluating model performance for different numbers of indices, i.e. different numbers of metabolites, using the McFadden pseudo R-squared value and the maximum likelihood estimation method, to obtain the optimal number of indices and hence the optimal metabolite molecules;
the logistic regression model is built on the metabolite intensities, and the prostate cancer risk score is calculated from the regression coefficients and intercept obtained by logistic regression, according to the formula: Risk score = coef(metabolite 1) × intensity(metabolite 1) + coef(metabolite 2) × intensity(metabolite 2) + ··· + coef(metabolite N) × intensity(metabolite N) + intercept,
wherein Risk score is the cancer risk score, coef() is the regression coefficient from the logistic regression, and intensity() is the metabolite concentration;
a cancer risk score is calculated for each sample with the logistic regression model, and the model with the optimal number of indices is given by the formula: PCaScore = A × 0.623 + B × 0.100 + C × 0.001 + D × 22.045 + E × (-1.060) + F × (-0.011) - 10.973,
wherein PCaScore is the prostate cancer risk score, A represents the concentration of hypoxanthine in the sample, B represents the concentration of tryptophan in the sample, C represents the concentration of lactic acid in the sample, D represents the concentration of taurocholate in the sample, E represents the concentration of diacylglycerol DG (16:0/18:2) in the sample, and F represents the concentration of phosphatidylcholine PC aa C34:2 in the sample, in ng/ml.
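The risk score formulas above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the coefficients and intercept are copied from the PCaScore formula, while the function names, the dictionary keys, the example concentrations and the logistic transform to a probability are assumptions added for illustration. The text does not state the cut-off value, so none is applied here.

```python
import math

# Coefficients and intercept copied from the PCaScore formula in the text.
PCA_COEFFICIENTS = {
    "hypoxanthine": 0.623,
    "tryptophan": 0.100,
    "lactic acid": 0.001,
    "taurocholate": 22.045,
    "DG (16:0/18:2)": -1.060,
    "PC aa C34:2": -0.011,
}
PCA_INTERCEPT = -10.973


def risk_score(coefficients, intercept, concentrations):
    """Linear predictor: sum(coef_i * concentration_i) + intercept."""
    missing = set(coefficients) - set(concentrations)
    if missing:
        raise KeyError(f"missing concentrations for: {sorted(missing)}")
    return sum(coefficients[m] * concentrations[m] for m in coefficients) + intercept


def score_to_probability(score):
    """Standard logistic transform (an assumption; the text reports only the score)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical concentrations, invented only to show the arithmetic.
example = {
    "hypoxanthine": 10.0,
    "tryptophan": 50.0,
    "lactic acid": 1000.0,
    "taurocholate": 0.1,
    "DG (16:0/18:2)": 2.0,
    "PC aa C34:2": 100.0,
}
score = risk_score(PCA_COEFFICIENTS, PCA_INTERCEPT, example)
probability = score_to_probability(score)
```

Keeping the coefficients in a dictionary keyed by metabolite name means the same `risk_score` function covers the general N-metabolite formula as well as the six-metabolite model.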
The beneficial effects of the above technical solution are as follows:
the invention provides a serum metabolite combination biomarker whose component metabolites can be used in combination to diagnose prostate cancer. The combined marker of the invention has high sensitivity and high specificity and is expected to be used for the auxiliary diagnosis of prostate cancer.
Drawings
FIG. 1 shows the differential analysis of all serum metabolites between prostate cancer patients and the non-prostate cancer population in an embodiment of the present invention;
wherein graph (a) is a volcano plot of the differential analysis, graph (b) is a heat map of the differential metabolites, and graph (c) is a correlation heat map of the differential metabolites.
FIG. 2 shows the screening of all metabolite indices with the Boruta algorithm in an embodiment of the present invention;
FIG. 3 shows the screening of metabolite indices with LASSO regression in an embodiment of the invention;
wherein graph (a) shows the regression coefficients at different fractions of deviance explained, graph (b) shows the regression coefficients at different Lambda values, and graph (c) shows the importance of the different metabolite indices in the model.
FIG. 4 shows the determination of the number of indices in an embodiment of the present invention;
wherein graph (a) shows the McFadden pseudo R-squared value of the model for different numbers of indices, and graph (b) shows the maximum likelihood estimation P value of the model at each number of indices compared with the model at the preceding number of indices.
FIG. 5 shows the six finally selected metabolite indices in an embodiment of the invention;
wherein graph (a) shows the odds ratio per standard deviation for each of the six indices when modeled separately, and graph (b) compares the intensities of the six indices in serum from prostate cancer patients and the non-prostate cancer population.
FIG. 6 is a graph showing detection performance at different score thresholds according to an embodiment of the present invention;
wherein graph (a) is sensitivity at different score thresholds and graph (b) is specificity at different score thresholds.
FIG. 7 is a graph comparing risk scores for prostate cancer patients and non-prostate cancer populations in an embodiment of the present invention.
FIG. 8 is a graph showing the detection capability of risk scores and clinical indicators according to an embodiment of the present invention;
wherein graph (a) compares the detection performance of the risk score and the PSA index, and graph (b) compares the detection performance of the risk score and the PSAD index.
FIG. 9 is a graph comparing AUC curves for the risk score and the tPSA index in an embodiment of the invention;
FIG. 10 is a graph comparing AUC curves for the risk score and the PSAD index in an embodiment of the invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
A serum metabolite combination biomarker for diagnosing prostate cancer comprises hypoxanthine, tryptophan, lactic acid, taurocholate, diacylglycerol DG (16:0/18:2) and phosphatidylcholine PC aa C34:2, and serves as a prostate cancer detection marker;
the serum metabolite combination biomarker is obtained by a biomarker screening method based on targeted quantitative metabolomics, which specifically comprises the following steps:
step 1: performing metabolomics analysis on serum of prostate cancer patients and serum of a non-prostate cancer population using a targeted quantitative metabolomics analysis technique;
in this embodiment, the serum of prostate cancer patients and of the non-prostate cancer population is processed with the Biocrates P500 kit, separated on an ACQUITY UPLC liquid chromatography system and detected with a QTRAP 6500 mass spectrometer, and the metabolome is quantified using the MetLIMS system as the software for analyzing metabolome mass spectrometry data;
step 1.1: collecting serum samples of prostate cancer patients and serum samples of non-prostate cancer people;
in this example, 88 serum samples were collected: 48 from prostate cancer patients, 20 from patients with benign prostatic hyperplasia and 20 from a population with healthy prostates, all confirmed by clinical histopathology. Sample collection, storage and use were approved by the Medical Science Research Ethics Committee of the First Hospital of China Medical University ([2022] 2021-539-2).
Step 1.2: treating the serum sample and extracting metabolites;
Sample grouping information and variables were registered in the MetLIMS software. According to the designed sample table, 10 μL of sample was added to the corresponding position of the 96-well plate supplied with the Biocrates P500 kit and dried under nitrogen for 40 minutes. Derivatization was performed with 5% PITC; after incubation in the dark for 1 hour, the plate was dried under nitrogen for 2 hours. 300 μL of extraction solvent was added and mixed for 30 minutes at 450 rpm. The plate was centrifuged through the filter at 600 rpm for 10 minutes and the filtered extract was collected. 100 μL of the extract was transferred to a new 96-well plate and 100 μL of water was added; 5 μL of this 2-fold diluted solution was used for LCMS mode detection. Another 10 μL of the extract was transferred to a new 96-well plate and 240 μL of mobile phase was added; 20 μL of this 25-fold diluted solution was used for FIA mode detection.
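The two dilution factors quoted in step 1.2 follow directly from the pipetted volumes. A minimal Python check is given below; the function name is hypothetical and the volumes are the ones stated in the text.

```python
def dilution_factor(aliquot_ul, diluent_ul):
    """Final volume divided by the volume of extract that was transferred."""
    if aliquot_ul <= 0:
        raise ValueError("aliquot volume must be positive")
    return (aliquot_ul + diluent_ul) / aliquot_ul


# LCMS mode: 100 uL extract + 100 uL water.
lcms_factor = dilution_factor(100, 100)
# FIA mode: 10 uL extract + 240 uL mobile phase.
fia_factor = dilution_factor(10, 240)
```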
Step 1.3: mass spectrum detection;
mass spectrometry is performed on the serum samples in sequence according to a sample table exported from the MetLIMS system; each sample is detected in two parts: for the first part the mass spectrometer acquires signals in FIA mode in positive ion mode, and for the second part it acquires signals in LCMS mode, with each sample acquired once in positive ion mode and once in negative ion mode, to obtain the raw mass spectrometry data;
step 1.4: preprocessing data;
the raw mass spectrometry data are imported into the MetLIMS system; the software generates a standard curve from the configured standards (standard curves are generated only for the LCMS acquisition mode) and calculates the concentration of each metabolite in each sample; a metabolite is rejected if its value is missing in more than 50% of the detected samples; an upper and a lower detection limit are set, and values below the lower limit or above the upper limit are replaced with the corresponding limit; the data quality of each metabolite is evaluated from its precision in the standard, i.e. the consistency of repeated measurements of the QC samples, and the metabolite data are screened and filtered accordingly; when the coefficient of variation (CV) of a metabolite in the standard exceeds 30%, the metabolite is filtered out and excluded from subsequent data analysis;
in this example, 249 metabolite indices were ultimately retained. With a p-value below 0.05 and a fold change above 1.2 as thresholds, 7 metabolites were significantly up-regulated and 16 were significantly down-regulated in serum from prostate cancer patients, as shown in FIG. 1 (a). The intensity heat map and the correlation heat map of these significantly changed metabolites across samples are shown in FIG. 1 (b) and FIG. 1 (c).
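The filtering rules of step 1.4 (missing values in more than 50% of samples, replacement of out-of-range values with the detection limits, and a CV above 30%) can be sketched in Python as follows. This is an illustrative sketch only: in the embodiment these steps are carried out in the MetLIMS software, and the function names, the use of `None` for a missing value and the use of the sample standard deviation for the CV are assumptions.

```python
import statistics


def clip_to_limits(value, lower, upper):
    """Replace a value outside the detection limits with the nearest limit."""
    return min(max(value, lower), upper)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    return statistics.stdev(values) / statistics.fmean(values)


def keep_metabolite(sample_values, qc_values, max_missing=0.5, max_cv=0.30):
    """Apply the missing-value and precision filters to one metabolite.

    sample_values: concentration per study sample, None meaning missing.
    qc_values: repeated measurements of the same QC sample.
    """
    missing_fraction = sum(v is None for v in sample_values) / len(sample_values)
    if missing_fraction > max_missing:
        return False
    return coefficient_of_variation(qc_values) <= max_cv


# Hypothetical values, invented only to exercise each rule.
kept = keep_metabolite([1.0, 2.0, None, 1.5], [10.0, 10.5, 9.8])
too_sparse = keep_metabolite([None, None, None, 1.5], [10.0, 10.5, 9.8])
too_noisy = keep_metabolite([1.0, 2.0, 1.5, 1.2], [5.0, 10.0, 15.0])
```

The sketch assumes at least two QC measurements with a non-zero mean; a production pipeline would need to guard those cases.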
Step 2: screening the metabolite indices with the feature selection algorithm Boruta using the serum samples of the cancer group and the non-cancer group, and performing LASSO regression modeling on the screened indices with the regression analysis tools caret and glmnet, wherein LASSO regression, a linear model that estimates sparse coefficients, is used to screen the metabolites, and the optimal parameters are selected according to model accuracy by cross-validation on the samples;
in this example, serum samples from the healthy group and the benign prostatic hyperplasia group were used as the non-cancer group and serum samples from prostate cancer patients as the cancer group, and 15 metabolites were finally retained for subsequent analysis, as shown in FIG. 2. Cross-validation was carried out with the caret tool: the trainControl function was used to set the cross-validation parameters, and 10-fold cross-validation was performed. The accuracy of each fitted model was calculated, and the model with the highest accuracy was selected; its parameters are the optimal parameters of the final LASSO regression model. The importance of the metabolite indices is shown in FIG. 3.
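The 10-fold cross-validation used to choose the LASSO parameters can be illustrated with a short Python sketch. It shows only the fold splitting and the accuracy averaging; it is not the caret or glmnet code used in the embodiment. The `fit` and `predict` callables are placeholders for the classifier being tuned, the majority-class classifier is a toy stand-in, and the folds are shuffled without stratification, which is a simplification.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    fit(X_train, y_train) returns a model; predict(model, X_test) returns labels.
    """
    accuracies = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predicted = predict(model, [X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(predicted, fold))
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Toy stand-in classifier: always predict the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    return [model] * len(X_test)


# 48 cancer and 40 non-cancer labels, matching the cohort sizes in the example.
labels = [1] * 48 + [0] * 40
features = [[0.0] for _ in labels]  # placeholder features
folds = k_fold_indices(len(labels))
baseline = cross_validated_accuracy(fit_majority, predict_majority, features, labels)
```

To pick a penalty, the same `cross_validated_accuracy` call would be repeated for each candidate Lambda value and the value with the highest mean accuracy retained.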
Step 3: performing logistic regression modeling on the metabolite indexes retained by LASSO regression, and evaluating model performance at different index numbers (that is, different numbers of metabolites) using the McFadden pseudo R-squared value and a maximum likelihood estimation method, to obtain the optimal index number and hence the optimal metabolite molecules;
The six metabolites finally selected in this example were hypoxanthine, tryptophan, lactic acid, taurocholate, diacylglycerol DG (16:0/18:2) and phosphatidylcholine PC aa C34:2; their intensities in serum from prostate cancer patients and from individuals without prostate cancer are shown in FIG. 5 (b).
In this embodiment, the indexes are ranked by importance, and logistic regression models are built on the top one through the top thirteen indexes. The performance of the model at each index number is evaluated using the McFadden pseudo R-squared value and the maximum likelihood estimation method, and is shown in fig. 4. If, when the index number is increased beyond the current number, the McFadden pseudo R-squared value grows only slowly and the P value of the maximum likelihood estimation method is greater than 0.1, the current index number is considered optimal. At an index number of six, the McFadden pseudo R-squared value grows slowly, as shown in fig. 4 (a), and the P value of the maximum likelihood estimation method is greater than 0.1, as shown in fig. 4 (b); further increasing the index number is therefore considered to bring no significant gain in modeling, so six variables were finally selected to build the final logistic regression model.
Logistic regression modeling was performed on these six indexes, and the odds ratio per standard deviation change of each index is shown in fig. 5 (a).
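The two quantities used to choose the index number can be sketched as below. McFadden's pseudo R-squared is 1 minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. The source does not specify how the P value is obtained; this sketch assumes a likelihood ratio test between two nested models that differ by one index (1 degree of freedom). Function names are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of labels y (0/1) under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def null_log_likelihood(y):
    """Log-likelihood of the intercept-only model (predicts the base rate)."""
    rate = sum(y) / len(y)
    return log_likelihood(y, [rate] * len(y))

def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo R-squared: 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null

def lr_test_p_value(ll_smaller, ll_larger):
    """P value for adding ONE index to a nested model (assumed test).

    The statistic 2 * (LL_larger - LL_smaller) is compared with a
    chi-squared distribution with 1 degree of freedom, whose upper tail
    equals erfc(sqrt(stat / 2)).
    """
    stat = max(0.0, 2.0 * (ll_larger - ll_smaller))
    return math.erfc(math.sqrt(stat / 2.0))
```

Under the rule described above, the index number would stop growing once adding the next index yields a P value greater than 0.1 and only a small increase in the pseudo R-squared.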
The logistic regression model is built on metabolite intensity, and the prostate cancer risk score is calculated from the regression coefficients and intercept obtained by logistic regression: Risk score = coef(metabolite 1) × intensity(metabolite 1) + coef(metabolite 2) × intensity(metabolite 2) + ... + coef(metabolite N) × intensity(metabolite N) + intercept.
Wherein Risk score is the cancer risk score, coef() is the regression coefficient of the logistic regression, and intensity() is the metabolite concentration;
calculating the cancer risk score of each sample according to the logistic regression model; the model obtained at the optimal index number has the formula: PCaScore = A × 0.623 + B × 0.100 + C × 0.001 + D × 22.045 + E × (-1.060) + F × (-0.011) - 10.973
Wherein PCaScore is the prostate cancer risk score, A represents the concentration of hypoxanthine in the sample, B represents the concentration of tryptophan in the sample, C represents the concentration of lactic acid in the sample, D represents the concentration of taurocholate in the sample, E represents the concentration of diacylglycerol DG (16:0/18:2) in the sample, F represents the concentration of phosphatidylcholine PC aa C34:2 in ng/ml in the sample.
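The PCaScore calculation can be sketched as follows, assuming each of the six listed coefficients multiplies the concentration A to F in the order given and that -10.973 is the intercept. The dictionary keys and function names are chosen here for illustration. Because the score is the linear predictor of a logistic regression, passing it through the logistic function gives a probability, and a score of 0 corresponds to a probability of 0.5.

```python
import math

# Coefficients from the PCaScore formula, keyed by metabolite (A to F).
COEFFICIENTS = {
    "hypoxanthine": 0.623,       # A
    "tryptophan": 0.100,         # B
    "lactic_acid": 0.001,        # C
    "taurocholate": 22.045,      # D
    "DG(16:0/18:2)": -1.060,     # E
    "PC aa C34:2": -0.011,       # F
}
INTERCEPT = -10.973

def pca_score(concentrations):
    """Weighted sum of the six concentrations plus the intercept."""
    return sum(coef * concentrations[name]
               for name, coef in COEFFICIENTS.items()) + INTERCEPT

def probability(score):
    """Logistic transform of the score (standard for logistic regression)."""
    return 1.0 / (1.0 + math.exp(-score))
```

The concentrations must be supplied in the same units that were used to fit the model.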
In this example, a cancer risk score greater than 0 was used as the threshold for a positive prediction. The predictions are shown in fig. 6: across all 78 samples the sensitivity was 0.921, the specificity 0.875, and the accuracy 0.885. The risk scores of serum samples from prostate cancer patients were significantly higher than those of serum samples from the non-prostate-cancer population, as shown in fig. 7.
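Sensitivity, specificity, and accuracy at a score threshold of 0 can be computed as sketched below. The function name and the 0/1 label encoding (1 for prostate cancer) are assumptions for the example; the sketch also assumes both classes are present in the data.

```python
def diagnostic_metrics(scores, labels, threshold=0.0):
    """Compute sensitivity, specificity, and accuracy.

    A sample is predicted positive when its score is greater than the
    threshold; labels are 1 for cancer and 0 for non-cancer.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score > threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),   # fraction of cancers detected
        "specificity": tn / (tn + fp),   # fraction of non-cancers cleared
        "accuracy": (tp + tn) / len(labels),
    }
```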
The cancer risk scores calculated from the six metabolite indices were compared with the clinically common indexes prostate-specific antigen (PSA) and prostate-specific antigen density (PSAD). For PSA-based prediction, a sample is judged as prostate cancer if tPSA is greater than 10 ng/ml; if tPSA is less than 10 ng/ml and greater than 4 ng/ml, the judgment is based on fPSA/tPSA, and the sample is judged as prostate cancer if fPSA/tPSA is less than 0.16. For PSAD-based prediction, a sample is judged as prostate cancer if PSAD is greater than 0.15.
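The PSA and PSAD decision rules can be written as a short sketch. The text does not say how a tPSA of 4 ng/ml or below, or of exactly 10 ng/ml, is handled; treating those cases as negative is an assumption made here, as are the function names.

```python
def psa_positive(tpsa, fpsa):
    """PSA-based call; tpsa and fpsa in ng/ml."""
    if tpsa > 10:
        return True                    # tPSA above 10 ng/ml: positive
    if 4 < tpsa < 10:
        return fpsa / tpsa < 0.16      # grey zone: use the fPSA/tPSA ratio
    # Assumption: tPSA <= 4 ng/ml (or exactly 10) is treated as negative,
    # since the source gives no rule for these values.
    return False

def psad_positive(psad):
    """PSAD-based call: positive when PSAD exceeds 0.15."""
    return psad > 0.15
```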
Among the 72 samples with recorded PSA values there were 32 prostate cancer samples, of which 23 were positive and 9 negative by the PSA index; by the risk score, 29 of these had scores greater than 0 and 3 had scores less than 0, as shown in fig. 8 (a). Among the 45 samples with recorded PSAD there were 25 prostate cancer samples, of which 19 were positive and 6 negative by the PSAD index; by the risk score, 24 had scores greater than 0 and 1 had a score less than 0, as shown in fig. 8 (b).
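The counts above translate directly into detection rates among the prostate cancer samples. The short calculation below uses only the numbers stated in this paragraph; the dictionary layout is illustrative.

```python
def detection_rate(detected, total_cancer):
    """Fraction of known prostate cancer samples called positive."""
    return detected / total_cancer

rates = {
    "PSA (32 cancers)": detection_rate(23, 32),
    "risk score (same 32)": detection_rate(29, 32),
    "PSAD (25 cancers)": detection_rate(19, 25),
    "risk score (same 25)": detection_rate(24, 25),
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

This gives about 71.9% for PSA against 90.6% for the risk score in the first subset, and 76.0% for PSAD against 96.0% for the risk score in the second.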
Logistic regression modeling was performed on tPSA and on PSAD separately. In the 72 samples, the AUC was 0.823 for tPSA and 0.934 for the risk score, as shown in fig. 9. In the 45 samples, the AUC was 0.808 for PSAD and 0.892 for the risk score, as shown in fig. 10.
In summary, the results show that the prostate cancer risk score calculated from the six metabolites has excellent detection capability for prostate cancer and outperforms the PSA and PSAD indexes currently used in clinical practice.
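For reference, an AUC value such as those quoted can be computed from scores and labels as the probability that a randomly chosen cancer sample scores higher than a randomly chosen non-cancer sample, with ties counted as one half. The sketch below is a generic illustration; the source does not state which software computed its AUC values, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for cancer, 0 for non-cancer. Every cancer/non-cancer
    pair contributes 1 if the cancer sample scores higher, 0.5 on a tie,
    and 0 otherwise.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is fine for cohorts of this size.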
The foregoing description covers only the preferred embodiments of the present disclosure and the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example, solutions in which the above features are substituted with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.
Claims (4)
1. A serum metabolite combination biomarker for diagnosing prostate cancer, which is characterized by comprising hypoxanthine, tryptophan, lactic acid, taurocholate, diacylglycerol DG (16:0/18:2) and phosphatidylcholine PC aa C34:2, and is a prostate cancer detection marker.
2. The serum metabolite combination biomarker for diagnosing prostate cancer according to claim 1, obtained using a biomarker screening method based on targeted quantitative metabolomics, characterized by comprising the steps of:
step 1: performing metabonomics analysis on serum of a prostate cancer patient and serum of a non-prostate cancer population by adopting a targeted quantitative metabonomics analysis technology;
sample treatment is carried out on the serum of prostate cancer patients and of the non-prostate-cancer population using a kit, separation is carried out with a liquid chromatograph, a mass spectrometer is used as the detector, and metabolome quantification is carried out with software for analyzing metabolome mass spectrometry data;
step 2: screening the metabolite indexes from the serum samples of the cancer group and the non-cancer group with a feature selection algorithm, and performing LASSO regression modeling on the screened indexes with a regression analysis tool, wherein the LASSO regression model is a linear model that estimates sparse coefficients and is used to screen the metabolites, and the optimal parameters are selected according to model accuracy through cross-validation on the samples;
step 3: and carrying out logistic regression modeling according to the metabolite indexes reserved by LASSO regression, and evaluating different index numbers, namely the performances of the models under different metabolite numbers, by using the McFadden pseudo R square value and a maximum likelihood estimation method to obtain the optimal index number, thereby obtaining the optimal metabolite molecules.
3. The serum metabolite combination biomarker for diagnosing prostate cancer according to claim 2, characterized in that step 1 specifically comprises the steps of:
step 1.1: collecting serum samples of prostate cancer patients and serum samples of non-prostate cancer people;
step 1.2: treating the serum sample and extracting metabolites;
step 1.3: mass spectrum detection;
carrying out mass spectrometry on the serum samples in the order of the sample table exported from the software for analyzing metabolome mass spectrometry data, each sample being measured in two parts, wherein for the first part the mass spectrometer performs FIA mode signal acquisition and LCMS mode acquisition, and for the second part the mass spectrometer performs LCMS mode acquisition, each sample being acquired once in positive ion mode and once in negative ion mode to obtain the raw mass spectrometry data;
step 1.4: preprocessing data;
the raw mass spectrometry data are imported into the software for analyzing metabolome mass spectrometry data; the software generates a standard curve from the configured standards, standard-curve generation being limited to the LCMS acquisition mode, and calculates the concentration of each metabolite in each sample; a metabolite is rejected if it has missing values in more than 50% of the measured samples; upper and lower detection limits are set, and values below the lower limit or above the upper limit are replaced with the corresponding limit; the quality of each metabolite's data is evaluated by its precision in the standard, that is, the consistency of the metabolite across repeated measurements of the QC samples, and the data are filtered accordingly; that is, when the coefficient of variation (CV) of a metabolite across these repeated measurements exceeds 30%, the metabolite is filtered out and excluded from subsequent data analysis.
4. The serum metabolite combination biomarker for diagnosing prostate cancer according to claim 2, wherein the logistic regression model in step 3 is built on metabolite intensity, and the prostate cancer risk score is calculated from the regression coefficients and intercept obtained by logistic regression, wherein the formula is Risk score = coef(metabolite 1) × intensity(metabolite 1) + coef(metabolite 2) × intensity(metabolite 2) + ... + coef(metabolite N) × intensity(metabolite N) + intercept;
wherein Risk score is the cancer risk score, coef() is the regression coefficient of the logistic regression, and intensity() is the metabolite concentration;
calculating the cancer risk score of each sample according to the logistic regression model, the model obtained at the optimal index number having the formula: PCaScore = A × 0.623 + B × 0.100 + C × 0.001 + D × 22.045 + E × (-1.060) + F × (-0.011) - 10.973
Wherein PCaScore is the prostate cancer risk score, A represents the concentration of hypoxanthine in the sample, B represents the concentration of tryptophan in the sample, C represents the concentration of lactic acid in the sample, D represents the concentration of taurocholate in the sample, E represents the concentration of diacylglycerol DG (16:0/18:2) in the sample, F represents the concentration of phosphatidylcholine PC aa C34:2 in ng/ml in the sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310723363.7A CN116908450A (en) | 2023-06-19 | 2023-06-19 | Serum metabolite combination biomarker for diagnosing prostate cancer |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310723363.7A CN116908450A (en) | 2023-06-19 | 2023-06-19 | Serum metabolite combination biomarker for diagnosing prostate cancer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116908450A (en) | 2023-10-20 |
Family
ID=88365837
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310723363.7A Pending CN116908450A (en) | 2023-06-19 | 2023-06-19 | Serum metabolite combination biomarker for diagnosing prostate cancer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116908450A (en) |
- 2023-06-19: CN application CN202310723363.7A, patent CN116908450A (en), status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||