CN116908450A

CN116908450A - Serum metabolite combination biomarker for diagnosing prostate cancer

Info

Publication number: CN116908450A
Application number: CN202310723363.7A
Authority: CN
Inventors: 黄万旭; 李秉武; 李花; 穆润清
Original assignee: Shenyang Pharmaceutical University
Current assignee: Shenyang Pharmaceutical University
Priority date: 2023-06-19
Filing date: 2023-06-19
Publication date: 2023-10-20

Abstract

The invention provides a serum metabolite combination biomarker for diagnosing prostate cancer and relates to the technical field of biomedicine and clinical examination. The biomarker consists of hypoxanthine, tryptophan, lactic acid, taurocholate, diacylglycerol DG (16:0/18:2) and phosphatidylcholine PC aa C34:2. Targeted quantitative metabolomics analysis is carried out on clinical serum samples to measure the molecular concentrations of 630 serum metabolites; a logistic regression model is built on metabolite intensity, regression coefficients and cut-off values are determined, the marker combination is screened, a risk score is established, and the score is used to judge whether a subject is a prostate cancer patient. The metabolite combination developed by the invention has remarkable advantages over the markers currently used in clinical practice and can be used for clinical diagnosis and screening of prostate cancer.

Description

Serum metabolite combination biomarker for diagnosing prostate cancer

Technical Field

The invention relates to the technical field of biomedicine and clinical examination, in particular to a serum metabolite combination biomarker for diagnosing prostate cancer.

Background

Prostate cancer (PCa) is the most common cancer in older men and the second most common cause of cancer death in men. Biomarkers are not only critical for the initial diagnosis of prostate cancer, but also provide an effective way to screen appropriate populations, guide initial treatment strategies, assess the efficacy of treatment and track the progression of the cancer over time. At present, serum prostate-specific antigen (PSA) is the main tumor marker used clinically to diagnose prostate cancer, but its specificity is low and its false positive rate is high, which leads to a large number of unnecessary needle biopsies and places a heavy personal and economic burden on patients. There is thus a great need for novel biomarkers to improve clinical decisions and management of PCa (Kdadra et al, 2019).

The metabolome is the complete set of metabolites that are the end products of cellular processes within a cell, tissue, organ or organism. It can be regarded as a readout of activity at the genomic, epigenomic, transcriptomic and proteomic levels and of the end result of their interactions with the environment. Research has shown that the metabolomic features of cancer can be used to assess disease risk, or for cancer screening, diagnosis and monitoring of specific disease sub-populations (Schmidt et al, 2021). For example, a group of 4 metabolites (L-octanoyl carnitine, pyroglutamic acid, hypoxanthine, and docosahexaenoic acid) was identified as potential breast cancer serum biomarkers (Park et al, 2019). A group of 5 serum metabolites (glutamic acid, choline, 1,5-anhydroglucitol, betaine, and guanidine) was able to distinguish pancreatic cancer patients from control groups (Xie et al, 2015).

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a serum metabolite combination biomarker for diagnosing prostate cancer.

A serum metabolite combination biomarker for diagnosing prostate cancer, comprising hypoxanthine, tryptophan, lactic acid, taurocholate, diacylglycerol DG (16:0/18:2), phosphatidylcholine PC aa C34:2, which is a prostate cancer detection marker;

the serum metabolite combination biomarker is obtained by adopting a biomarker screening method based on targeted quantitative metabonomics, and specifically comprises the following steps of:

step 1: performing metabonomics analysis on serum of a prostate cancer patient and serum of a non-prostate cancer population by adopting a targeted quantitative metabonomics analysis technology;

serum from prostate cancer patients and serum from the non-prostate cancer population is processed with a kit, separated on a liquid chromatograph and detected with a mass spectrometer, and the metabolome is quantified using metabolome mass spectrometry data analysis software;

step 1.1: collecting serum samples of prostate cancer patients and serum samples of non-prostate cancer people;

step 1.2: treating the serum sample and extracting metabolites;

step 1.3: mass spectrum detection;

mass spectrometry is carried out on the serum samples in the order given by a sample table exported from the metabolome mass spectrometry data analysis software; each sample is analyzed in two portions, the first portion being acquired on the mass spectrometer in FIA mode with positive ion mode acquisition and the second portion being acquired in LCMS mode, with each sample acquired once in positive ion mode and once in negative ion mode, to obtain the raw mass spectrometry data;

step 1.4: preprocessing data;

the raw mass spectrometry data are imported into the metabolome mass spectrometry data analysis software, which generates standard curves from the configured standards (standard curves are generated only for the LCMS acquisition mode) and calculates the concentration of each metabolite in each sample; a metabolite with missing values in more than 50% of the measured samples is rejected; detection limits, namely an upper detection limit and a lower detection limit, are set, and values below the lower limit or above the upper limit are replaced with the corresponding limit; the data quality of each metabolite is evaluated from its precision in the standards, i.e. the consistency of the metabolite across repeated measurements of the QC samples, and the metabolite data are filtered accordingly. When the coefficient of variation (CV) of a metabolite in the standards is greater than 30%, the metabolite is filtered out and does not participate in subsequent data analysis;

step 2: screening the metabolite indices with a feature selection algorithm using the serum samples of the cancer group and the non-cancer group, and performing LASSO regression modeling on the screened indices with a regression analysis tool, wherein LASSO regression, a linear model that estimates sparse coefficients, is used to screen the metabolites, and the optimal parameters are selected according to model accuracy by cross-validation on the samples;

step 3: performing logistic regression modeling on the metabolite indices retained by LASSO regression, and evaluating the performance of the models with different numbers of indices, i.e. different numbers of metabolites, using the McFadden pseudo-R-squared value and a maximum likelihood estimation method, to obtain the optimal number of indices and thereby the optimal metabolite molecules;

the logistic regression model is built on metabolite intensity, and the prostate cancer risk score is calculated from the regression coefficients and the intercept obtained by logistic regression, according to the formula:

Risk score = coef(metabolite 1) × intensity(metabolite 1) + coef(metabolite 2) × intensity(metabolite 2) + ... + coef(metabolite N) × intensity(metabolite N) + intercept

where Risk score is the cancer risk score, coef() is the regression coefficient obtained by logistic regression, and intensity() is the metabolite concentration;

the cancer risk score of each sample is calculated according to the logistic regression model, and the model at the optimal number of indices is given by the formula:

PCaScore = A × 0.623 + B × 0.100 + C × 0.001 + D × 22.045 + E × (-1.060) + F × (-0.011) - 10.973

where PCaScore is the prostate cancer risk score, A represents the concentration of hypoxanthine in the sample, B represents the concentration of tryptophan in the sample, C represents the concentration of lactic acid in the sample, D represents the concentration of taurocholate in the sample, E represents the concentration of diacylglycerol DG (16:0/18:2) in the sample, and F represents the concentration of phosphatidylcholine PC aa C34:2 in the sample, in ng/ml.
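As an illustration only, the score formula above can be evaluated as in the following Python sketch. The coefficients and intercept are those stated in the formula; the dictionary keys, the function names and the logistic transform to a probability are assumptions introduced here, and the concentrations are assumed to be supplied in the same units used when the model was fitted. The default threshold of 0 is the one used in the embodiment.

```python
import math

# Coefficients and intercept as stated in the PCaScore formula.
# The key names are hypothetical labels chosen for this sketch.
COEFFICIENTS = {
    "hypoxanthine": 0.623,      # A
    "tryptophan": 0.100,        # B
    "lactic_acid": 0.001,       # C
    "taurocholate": 22.045,     # D
    "dg_16_0_18_2": -1.060,     # E
    "pc_aa_c34_2": -0.011,      # F
}
INTERCEPT = -10.973


def pca_score(concentrations):
    """Return sum(coefficient * concentration) + intercept for one sample."""
    linear_part = sum(
        COEFFICIENTS[name] * concentrations[name] for name in COEFFICIENTS
    )
    return linear_part + INTERCEPT


def score_to_probability(score):
    """Standard logistic transform (an assumption, not stated in the text).

    A score of 0 corresponds to a probability of 0.5.
    """
    return 1.0 / (1.0 + math.exp(-score))


def is_positive(score, threshold=0.0):
    """Classify a sample as positive when its score exceeds the threshold."""
    return score > threshold
```

Because the score is the linear predictor of a logistic regression, a threshold of 0 on the score is equivalent to a threshold of 0.5 on the predicted probability.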

The beneficial effects of the above technical scheme are as follows:

the invention provides a serum metabolite combination biomarker for diagnosing prostate cancer, which can be used for diagnosing prostate cancer in a combined way. The combined marker related by the invention has the advantages of high sensitivity and high specificity, and is expected to be used for auxiliary diagnosis of the prostate cancer.

Drawings

FIG. 1 is a graph showing a differential analysis of serum total metabolites for prostate cancer patients and non-prostate cancer populations in an example of the present invention;

wherein plot (a) is a volcano plot of the differential analysis, plot (b) is a heat map of the differential metabolites, and plot (c) is a correlation heat map of the differential metabolites.

FIG. 2 is a diagram showing the selection of all metabolite indices using the Boruta algorithm in an embodiment of the present invention;

FIG. 3 is a diagram of screening for metabolite indicators using LASSO regression in an embodiment of the invention;

wherein graph (a) shows the regression coefficients at different fractions of deviance explained, graph (b) shows the regression coefficients at different Lambda values, and graph (c) shows the importance of the different metabolite indices in the model.

FIG. 4 is a graph showing the determination of index numbers in an embodiment of the present invention;

wherein graph (a) shows the McFadden pseudo-R-squared value of the model at different numbers of indices, and graph (b) shows the maximum likelihood estimation P value of the model at each number of indices compared with the model at the preceding number of indices.

FIG. 5 is a graph of six metabolite indices of the final selection in an embodiment of the invention;

wherein graph (a) shows the odds ratio per standard deviation change for each of the six indices in the model, and graph (b) compares the intensities of the six indices in serum from prostate cancer patients and the non-prostate cancer population.

FIG. 6 is a graph showing detection performance at different score thresholds according to an embodiment of the present invention;

wherein graph (a) is sensitivity at different score thresholds and graph (b) is specificity at different score thresholds.

FIG. 7 is a graph comparing risk scores for prostate cancer patients and non-prostate cancer populations in an embodiment of the present invention.

FIG. 8 is a graph showing the detection capability of risk scores and clinical indicators according to an embodiment of the present invention;

wherein figure (a) is a comparison of the detection performance of the risk score and the PSA indicator and figure (b) is a comparison of the detection performance of the risk score and the PSAD indicator.

FIG. 9 is a graph comparing AUC curves for risk score and tPSA metrics in an embodiment of the invention;

FIG. 10 is a graph comparing AUC curves for risk score and PSAD index in an embodiment of the invention.

Detailed Description

The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.

In this embodiment, a Biocrates P500 kit is used to process the serum of prostate cancer patients and of the non-prostate cancer population, an ACQUITY UPLC liquid chromatograph is used for separation, a QTRAP 6500 mass spectrometer is used as the detector, and the MetLIMS system is used as the metabolome mass spectrometry data analysis software for metabolome quantification.

In this example, 88 serum samples were collected: 48 from prostate cancer patients, 20 from benign prostatic hyperplasia patients, and 20 from people with a healthy prostate, all confirmed by clinical histopathology. Sample collection, storage and use were approved by the Medical Science Research Ethics Committee of the First Affiliated Hospital of China Medical University ([2022] 2021-539-2).

Step 1.2: treating the serum sample and extracting metabolites;

Sample grouping information and variable conditions are registered in the MetLIMS software. According to the designed sample table, 10 μL of sample was added to the corresponding position in the 96-well plate provided with the Biocrates P500 kit and dried under nitrogen for 40 minutes. Derivatization was performed using 5% PITC; after incubation in the dark for 1 hour, the plate was dried under nitrogen for 2 hours. 300 μL of extraction solvent was added and mixed for 30 minutes at 450 rpm. Centrifugal filtration was carried out at 600 rpm for 10 minutes, and the filtered extract was collected. 100 μL of the extract was transferred to a new 96-well plate, 100 μL of water was added, and 5 μL of the resulting 2-fold diluted solution was used for LCMS mode detection. Another 10 μL of the extract was transferred to a new 96-well plate, 240 μL of the mobile phase was added, and 20 μL of the resulting 25-fold diluted solution was used for FIA mode detection.

Step 1.3: mass spectrum detection;

Mass spectrometry is carried out on the serum samples in the order given by a sample table exported from the MetLIMS system. Each sample is analyzed in two portions: the first portion is acquired on the mass spectrometer in FIA mode with positive ion mode acquisition, and the second portion is acquired in LCMS mode, with each sample acquired once in positive ion mode and once in negative ion mode, to obtain the raw mass spectrometry data;

step 1.4: preprocessing data;

The raw mass spectrometry data are imported into the MetLIMS system, which generates standard curves from the configured standards (standard curves are generated only for the LCMS acquisition mode) and calculates the concentration of each metabolite in each sample. A metabolite with missing values in more than 50% of the measured samples is rejected. Detection limits, namely an upper detection limit and a lower detection limit, are set, and values below the lower limit or above the upper limit are replaced with the corresponding limit. The data quality of each metabolite is evaluated from its precision in the standards, i.e. the consistency of the metabolite across repeated measurements of the QC samples, and the metabolite data are filtered accordingly: when the coefficient of variation (CV) of a metabolite in the standards is greater than 30%, the metabolite is filtered out and does not participate in subsequent data analysis.
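The three preprocessing filters described in step 1.4 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MetLIMS implementation: the table layout (a dictionary mapping each metabolite to a list of per-sample concentrations, with `None` marking a missing value), the function names, and the use of the sample standard deviation for the CV are all choices made here for illustration.

```python
import statistics


def drop_sparse_metabolites(table, max_missing_fraction=0.5):
    """Keep only metabolites missing in at most 50% of the samples.

    table: {metabolite name: [concentration or None, one per sample]}
    """
    kept = {}
    for name, values in table.items():
        missing = sum(value is None for value in values)
        if missing / len(values) <= max_missing_fraction:
            kept[name] = values
    return kept


def clamp_to_detection_limits(values, lower, upper):
    """Replace values outside [lower, upper] with the nearer limit.

    Missing values (None) are left untouched.
    """
    return [
        value if value is None else min(max(value, lower), upper)
        for value in values
    ]


def coefficient_of_variation(qc_values):
    """CV of repeated QC measurements: standard deviation / mean."""
    return statistics.stdev(qc_values) / statistics.mean(qc_values)


def passes_precision_filter(qc_values, max_cv=0.30):
    """A metabolite is kept only if its QC CV does not exceed 30%."""
    return coefficient_of_variation(qc_values) <= max_cv
```

For example, a metabolite measured as 1, 2 and 3 in three QC replicates has a CV of 0.5 and would be filtered out, whereas identical replicate values give a CV of 0 and pass.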

In this example, 249 metabolite indices were ultimately retained. Using a p-value less than 0.05 and a fold change greater than 1.2 as thresholds, 7 metabolites were significantly up-regulated and 16 were significantly down-regulated in serum from prostate cancer patients, as shown in FIG. 1 (a). The intensity and correlation heat maps of these significantly changed metabolites in the different samples are shown in FIG. 1 (b) and FIG. 1 (c).
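The thresholding behind a volcano plot of this kind can be sketched in Python as below. The p-value and fold-change thresholds are those stated above; the definition of fold change as the ratio of the cancer-group mean to the non-cancer-group mean, the symmetric cut-off of 1/1.2 for down-regulation, and the function name are assumptions made for illustration. The statistical test that produces the p-value is not specified in the text, so the p-value is taken as an input.

```python
def classify_metabolite(mean_cancer, mean_non_cancer, p_value,
                        fc_threshold=1.2, p_threshold=0.05):
    """Label a metabolite as 'up', 'down' or 'not significant'.

    Assumes fold change = mean in cancer group / mean in non-cancer group,
    and that down-regulation means a fold change below 1 / fc_threshold.
    """
    fold_change = mean_cancer / mean_non_cancer
    if p_value >= p_threshold:
        return "not significant"
    if fold_change > fc_threshold:
        return "up"
    if fold_change < 1.0 / fc_threshold:
        return "down"
    return "not significant"
```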

Step 2: screening the metabolite indices with the feature selection algorithm Boruta using the serum samples of the cancer group and the non-cancer group, and performing LASSO regression modeling on the screened indices with the regression analysis tools caret and glmnet, wherein LASSO regression, a linear model that estimates sparse coefficients, is used to screen the metabolites, and the optimal parameters are selected according to model accuracy by cross-validation on the samples;

In this example, serum samples from the healthy group and the benign prostatic hyperplasia group were used as the non-cancer group and serum samples from prostate cancer patients were used as the cancer group; 15 metabolites were finally retained for subsequent analysis, as shown in FIG. 2. Cross-validation was performed with the caret tool, using the trainControl function to set the cross-validation parameters for 10-fold cross-validation. The accuracy of each fitted model was calculated and the model with the highest accuracy was selected; its parameters are the optimal parameters and define the final LASSO regression model. The importance of the metabolite indices is shown in FIG. 3.
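The idea of choosing a tuning parameter by 10-fold cross-validated accuracy can be sketched in plain Python as follows. This is a generic stand-in, not the caret or glmnet implementation: the fold assignment scheme, the function names, and the caller-supplied `fit_and_score` callback (which would fit a model on the training indices and return its accuracy on the held-out indices) are all hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def select_best_parameter(candidates, n_samples, fit_and_score, k=10, seed=0):
    """Return (parameter, mean accuracy) with the highest k-fold accuracy.

    fit_and_score(parameter, train_indices, test_indices) is supplied by
    the caller and must return the accuracy on the held-out fold.
    """
    folds = k_fold_indices(n_samples, k, seed)
    best_parameter, best_accuracy = None, -1.0
    for parameter in candidates:
        accuracies = []
        for i, test_indices in enumerate(folds):
            # Training set: every fold except the held-out one.
            train_indices = [
                j for f, fold in enumerate(folds) if f != i for j in fold
            ]
            accuracies.append(
                fit_and_score(parameter, train_indices, test_indices)
            )
        mean_accuracy = sum(accuracies) / len(accuracies)
        if mean_accuracy > best_accuracy:
            best_parameter, best_accuracy = parameter, mean_accuracy
    return best_parameter, best_accuracy
```

With 88 samples and 10 folds, each fold holds 8 or 9 samples, and every sample is held out exactly once.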

The six metabolites finally selected in this example were hypoxanthine, tryptophan, lactic acid, taurocholate, diacylglycerol DG (16:0/18:2) and phosphatidylcholine PC aa C34:2; their intensities in serum from prostate cancer patients and from the non-prostate cancer population are shown in FIG. 5 (b).

In this embodiment, the indices are ranked by importance and logistic regression models are built successively on the top one through the top thirteen indices; the performance of the models with different numbers of metabolite indices is evaluated using the McFadden pseudo-R-squared value and the maximum likelihood estimation method, as shown in FIG. 4. If increasing the number of indices beyond the current number yields only a slow increase in the McFadden pseudo-R-squared value and a maximum likelihood estimation P value greater than 0.1, the current number of indices is considered optimal. At six indices, the McFadden pseudo-R-squared value grows slowly, as shown in FIG. 4 (a), and the P value of the maximum likelihood estimation method is greater than 0.1, as shown in FIG. 4 (b); further increasing the number of indices is therefore considered to bring no significant gain to the model, and six variables are finally selected to build the final logistic regression model.
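The two quantities used to choose the number of indices can be sketched in Python as below. The McFadden pseudo-R-squared is computed in its standard form, 1 minus the ratio of the model log-likelihood to the null (intercept-only) log-likelihood. The text refers only to a "maximum likelihood estimation method" P value; this sketch assumes it is a likelihood ratio test between nested models that differ by one index (one degree of freedom), which is an interpretation and not something the text confirms. Function names are hypothetical, and fitted probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def log_likelihood(labels, probabilities):
    """Bernoulli log-likelihood of 0/1 labels given fitted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probabilities)
    )


def null_log_likelihood(labels):
    """Log-likelihood of the intercept-only model (both classes required)."""
    rate = sum(labels) / len(labels)
    return log_likelihood(labels, [rate] * len(labels))


def mcfadden_pseudo_r2(labels, probabilities):
    """McFadden pseudo-R-squared: 1 - LL(model) / LL(null)."""
    return 1.0 - log_likelihood(labels, probabilities) / null_log_likelihood(labels)


def likelihood_ratio_p_value(ll_smaller, ll_larger):
    """P value for adding one index to a nested model (assumed test).

    The statistic 2 * (LL_larger - LL_smaller) is compared with a
    chi-square distribution with 1 degree of freedom, whose survival
    function is erfc(sqrt(x / 2)).
    """
    statistic = max(0.0, 2.0 * (ll_larger - ll_smaller))
    return math.erfc(math.sqrt(statistic / 2.0))
```

Under this reading, an added index whose P value exceeds 0.1 does not improve the fit enough to justify a larger model.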

Logistic regression modeling was performed on these six indices, and the odds ratio per standard deviation change of each index is shown in FIG. 5 (a).

In this example, a cancer risk score greater than 0 was used as the threshold. The predictive performance of the cancer risk score is shown in FIG. 6, with a sensitivity of 0.921, a specificity of 0.875, and an accuracy of 0.885 in all 78 samples. The risk scores of the serum samples from prostate cancer patients were significantly higher than those of the serum samples from the non-prostate cancer population, as shown in FIG. 7.

The cancer risk scores calculated from the six metabolite indices were compared with the clinically used prostate-specific antigen (PSA) and prostate-specific antigen density (PSAD). For prediction based on PSA, a sample is judged to be prostate cancer if total PSA (tPSA) is greater than 10 ng/ml; if tPSA is less than 10 ng/ml and greater than 4 ng/ml, a further judgment is made from the ratio of free PSA to total PSA (fPSA/tPSA), and the sample is judged to be prostate cancer if fPSA/tPSA is less than 0.16. For prediction based on PSAD, a sample is judged to be prostate cancer if PSAD is greater than 0.15.
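The PSA and PSAD decision rules described above can be written as a short Python sketch. The thresholds are those given in the text; the function names are hypothetical, and two boundary cases the text leaves open are handled by assumption: a tPSA of exactly 10 ng/ml is treated as part of the 4 to 10 ng/ml range, and a tPSA of 4 ng/ml or less is treated as negative.

```python
def psa_positive(tpsa, fpsa=None):
    """Apply the PSA rule; tpsa and fpsa are in ng/ml.

    Assumptions: tPSA == 10 falls in the intermediate range, and
    tPSA <= 4 is judged negative (neither case is stated in the text).
    """
    if tpsa > 10:
        return True
    if 4 < tpsa <= 10:
        if fpsa is None:
            raise ValueError("fPSA is required when tPSA is between 4 and 10")
        return fpsa / tpsa < 0.16
    return False


def psad_positive(psad):
    """Apply the PSAD rule: positive when PSAD is greater than 0.15."""
    return psad > 0.15
```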

Of the 72 samples with recorded PSA values, 32 were prostate cancer samples, of which 23 were positive and 9 were negative by the PSA index; of the same 32 samples, 29 had risk scores greater than 0 and 3 had risk scores less than 0, as shown in FIG. 8 (a). Of the 45 samples with recorded PSAD values, 25 were prostate cancer samples, of which 19 were positive and 6 were negative by the PSAD index; of the same 25 samples, 24 had risk scores greater than 0 and 1 had a risk score less than 0, as shown in FIG. 8 (b).
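The counts above translate directly into detection rates among the confirmed prostate cancer samples. The short Python sketch below only redoes that arithmetic from the numbers in the text; the function and variable names are hypothetical.

```python
def detection_rate(detected, total):
    """Fraction of confirmed cancer samples flagged as positive."""
    return detected / total


# Counts taken from the comparison above.
psa_rate = detection_rate(23, 32)            # PSA index, 32 cancer samples
score_rate_vs_psa = detection_rate(29, 32)   # risk score, same 32 samples
psad_rate = detection_rate(19, 25)           # PSAD index, 25 cancer samples
score_rate_vs_psad = detection_rate(24, 25)  # risk score, same 25 samples

for label, rate in [
    ("PSA", psa_rate),
    ("risk score (PSA subset)", score_rate_vs_psa),
    ("PSAD", psad_rate),
    ("risk score (PSAD subset)", score_rate_vs_psad),
]:
    print(f"{label}: {rate:.1%}")
```

This gives about 71.9% for PSA against 90.6% for the risk score on the 32 samples, and 76.0% for PSAD against 96.0% for the risk score on the 25 samples.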

Logistic regression modeling was also performed on tPSA and PSAD separately. In the 72 samples, the AUC was 0.823 for tPSA and 0.934 for the risk score, as shown in FIG. 9. In the 45 samples, the AUC was 0.808 for PSAD and 0.892 for the risk score, as shown in FIG. 10.
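For readers unfamiliar with the AUC, it can be computed as the probability that a randomly chosen cancer sample scores higher than a randomly chosen non-cancer sample, with ties counted as one half. The Python sketch below implements that pairwise definition; the function name is hypothetical, and the text does not state which software was used to compute the reported AUC values.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as 0.5.
    """
    wins = 0.0
    for positive in scores_positive:
        for negative in scores_negative:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))
```

An AUC of 1.0 means every cancer sample outscores every non-cancer sample, and 0.5 corresponds to a score with no discriminating ability.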

In summary, the results show that the prostate cancer risk score calculated from the six metabolites has excellent detection capability for prostate cancer, and that its performance is superior to that of the existing clinical PSA and PSAD indices.

The foregoing describes only the preferred embodiments of the present disclosure and the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims

1. A serum metabolite combination biomarker for diagnosing prostate cancer, which is characterized by comprising hypoxanthine, tryptophan, lactic acid, taurocholate, diacylglycerol DG (16:0/18:2) and phosphatidylcholine PC aa C34:2, and is a prostate cancer detection marker.

2. The serum metabolite combination biomarker for diagnosing prostate cancer according to claim 1, obtained using a biomarker screening method based on targeted quantitative metabolomics, characterized by comprising the steps of:

step 3: performing logistic regression modeling on the metabolite indices retained by LASSO regression, and evaluating the performance of the models with different numbers of indices, i.e. different numbers of metabolites, using the McFadden pseudo-R-squared value and a maximum likelihood estimation method, to obtain the optimal number of indices and thereby the optimal metabolite molecules.

3. The serum metabolite combination biomarker for diagnosing prostate cancer according to claim 2, characterized in that step 1 specifically comprises the steps of:

step 1.2: treating the serum sample and extracting metabolites;

step 1.3: mass spectrum detection;

step 1.4: preprocessing data;

the raw mass spectrometry data are imported into the metabolome mass spectrometry data analysis software, which generates standard curves from the configured standards (standard curves are generated only for the LCMS acquisition mode) and calculates the concentration of each metabolite in each sample; a metabolite with missing values in more than 50% of the measured samples is rejected; detection limits, namely an upper detection limit and a lower detection limit, are set, and values below the lower limit or above the upper limit are replaced with the corresponding limit; the data quality of each metabolite is evaluated from its precision in the standards, i.e. the consistency of the metabolite across repeated measurements of the QC samples, and the metabolite data are filtered accordingly; that is, when the coefficient of variation (CV) of a metabolite in the standards is greater than 30%, the metabolite is filtered out and no longer participates in subsequent data analysis.

4. The serum metabolite combination biomarker for diagnosing prostate cancer according to claim 2, wherein the logistic regression model in step 3 is built on metabolite intensity, and a prostate cancer risk score is calculated from the regression coefficients and intercept obtained by logistic regression, according to the formula: Risk score = coef(metabolite 1) × intensity(metabolite 1) + coef(metabolite 2) × intensity(metabolite 2) + ... + coef(metabolite N) × intensity(metabolite N) + intercept;