CN116381072A

CN116381072A - Biomarker for identifying sporadic gout and frequent gout and application thereof

Info

Publication number: CN116381072A
Application number: CN202310066159.2A
Authority: CN
Inventors: 李长贵; 尹慧勇
Original assignee: Affiliated Hospital of University of Qingdao
Current assignee: Suzhou Aimai Strontium Life Technology Co.,Ltd.
Priority date: 2023-01-16
Filing date: 2023-01-16
Publication date: 2023-07-04

Abstract

The invention relates to biomarkers for distinguishing sporadic gout (infrequent gout flares, InGF) from frequent gout (frequent gout flares, FrGF), and to applications thereof. Because gout flares can lead to systemic metabolic changes, the invention uses metabonomics and machine learning algorithms to discover differential metabolites and potential metabolic pathways, and to build a predictive model that distinguishes InGF from FrGF. A machine-learning-based multivariate selection method is used to identify potential metabolic biomarkers, which are then verified by targeted metabolomics. Based on six biomarkers, the area under the receiver operating characteristic (ROC) curve for distinguishing InGF from FrGF was 0.88 in the discovery cohort and 0.67 in the validation cohort. The invention provides unprecedented insight into the metabolic basis of gout flare frequency, shows that the frequent and sporadic gout groups have distinct metabonomic profiles, and shows that metabonomic features can distinguish different clinical manifestations of gout.

Description

Biomarker for identifying sporadic gout and frequent gout and application thereof

Technical Field

The invention relates to a biomarker, in particular to a biomarker for identifying sporadic gout (InGF) and frequent gout (FrGF) and application thereof.

Background

Patients with gout often experience a sudden onset of joint pain at night, with severe pain, swelling, redness and inflammation at the affected joint. The first metatarsophalangeal joint is affected most often, but flares also commonly occur in the joints of the hand, the knee, the elbow, and others. The affected joint becomes inflamed and swollen, the surrounding tissue softens, and mobility is limited.

Gout is a highly prevalent disease caused by the deposition of monosodium urate (MSU) crystals in joints and/or surrounding tissues. The typical clinical manifestation of gout is the gout flare: an acute inflammatory arthritis that is accompanied by severe pain and presents as a self-limiting episode. A flare generally lasts 7 to 14 days, resolves on its own, and is followed by an asymptomatic intercritical phase until the next episode. The duration of the asymptomatic interval varies from patient to patient: in some patients it lasts 1 to 2 years, in others only a few months.

Frequent gout (FrGF) is defined as ≥ 2 flares per year, while sporadic gout (InGF) is defined as ≤ 1 flare per year. Clinical guidelines suggest different treatments for InGF and FrGF patients. For example, the 2020 American College of Rheumatology (ACR) guideline "strongly recommended" urate-lowering therapy (ULT) for FrGF patients and "conditionally recommended" ULT for InGF patients. The gout management recommendations updated in 2016 by the European League Against Rheumatism (EULAR) also suggest different treatments for InGF and FrGF patients. Although variables such as serum uric acid and imaging evidence of MSU crystal deposition are associated with future gout flares, these indicators do not fully predict flare frequency, and little is known about the biological basis of frequent gout flares.

Gout is closely associated with other metabolic comorbidities, including hypertension, diabetes, metabolic syndrome, obesity, cardiovascular and cerebrovascular diseases, nonalcoholic steatohepatitis, and chronic kidney disease. Furthermore, gout and some gout medications are associated with changes in the intestinal microbiome that may regulate the systemic inflammatory state, which may predispose patients not only to gout but also to one or more of these comorbidities. Metabolism is thus closely related to the inflammation of gout.

Metabonomics is a technique for systematically analyzing all metabolites in a biological system. It has evolved into a powerful omics technique for studying metabolic diseases and, at the systems biology level, provides unprecedented insight into dysregulated metabolic pathways. Although most metabonomics studies have been conducted in animals, the technique is increasingly applied to rheumatic diseases, including hyperuricemia and gout. Furthermore, metabolomics based on high-resolution mass spectrometry, combined with machine learning algorithms, has been widely used to identify potential metabolic biomarkers relevant to the diagnosis and clinical manifestations of disease.

Recent studies by the inventors systematically describe metabolic pathway disorders in asymptomatic hyperuricemia and gout patients. The inventors have further developed a diagnostic model that uses machine learning algorithms to predict the progression of hyperuricemia to gout.

Currently, when a gout treatment regimen is selected, gout management guidelines usually suggest treating patients with sporadic gout (InGF) and frequent gout (FrGF) differently, but the predictors of gout flare frequency remain unclear. Sporadic and frequent gout differ significantly in their metabolic profiles. Alterations in energy metabolism, amino acid metabolism, purine metabolism and bile acid metabolism, and potential interactions among these pathways, may exacerbate gout flares. Because flare frequency varies from person to person and directly affects the choice of treatment regimen, better biomarkers are needed to distinguish InGF from FrGF. Given the basic principle of treating according to symptoms, finding biomarkers that can rapidly identify sporadic gout and frequent gout is therefore of great clinical significance.

Disclosure of Invention

In order to solve the technical problems, the invention provides a corresponding solution:

Because gout flares can cause systemic metabolic changes, the invention uses metabonomics and machine learning algorithms to discover differential metabolites and potential metabolic pathways, provides biomarkers for identifying sporadic gout and frequent gout, and establishes from these biomarkers a prediction model that effectively distinguishes InGF from FrGF.

The invention provides novel biomarkers for identifying sporadic gout and frequent gout: 4-trimethylammoniobutanoic acid, 5′-methylthioadenosine, arachidic acid, taurine, uridine and xanthine.

The prediction model for identifying the sporadic gout and the frequent gout, which is established according to the biomarker, is characterized in that the identification and judgment formula of the model is as follows:

Prediction score = e^{logit(P)} / (1 + e^{logit(P)})

logit(P) = 2.00 - 0.21 × [Arachidic acid] - 0.12 × [Xanthine] + 0.41 × [4-Trimethylammoniobutanoic acid] + 0.08 × [Taurine] + 0.07 × [5′-Methylthioadenosine] - 1.93 × [Uridine]

wherein [4-Trimethylammoniobutanoic acid], [5′-Methylthioadenosine], [Arachidic acid], [Taurine], [Uridine] and [Xanthine] denote the measured levels of 4-trimethylammoniobutanoic acid (4-trimethylaminobutyric acid), 5′-methylthioadenosine, arachidic acid, taurine, uridine and xanthine, respectively;

if the prediction score is > 0.5, the patient is more likely to have frequent gout; conversely, if the score is < 0.5, the patient is classified as sporadic gout.
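The formula and decision threshold above can be sketched in code. The following is a minimal Python illustration only, not the claimed program: the names `prediction_score` and `classify` are hypothetical, the inputs are assumed to be the bracketed metabolite levels on the scale used to fit the model, and the handling of a score of exactly 0.5 (which the text does not address) is an assumption.

```python
import math

# Intercept and coefficients copied from the logit(P) formula in the text.
INTERCEPT = 2.00
COEFFICIENTS = {
    "Arachidic acid": -0.21,
    "Xanthine": -0.12,
    "4-Trimethylammoniobutanoic acid": 0.41,
    "Taurine": 0.08,
    "5'-Methylthioadenosine": 0.07,
    "Uridine": -1.93,
}


def prediction_score(levels):
    """Return e^logit(P) / (1 + e^logit(P)) for a dict of metabolite levels."""
    logit_p = INTERCEPT + sum(
        coef * levels[name] for name, coef in COEFFICIENTS.items()
    )
    # 1 / (1 + e^-x) is algebraically identical to e^x / (1 + e^x).
    return 1.0 / (1.0 + math.exp(-logit_p))


def classify(levels, threshold=0.5):
    """Apply the 0.5 cut-off described in the text."""
    score = prediction_score(levels)
    if score > threshold:
        return "FrGF"
    if score < threshold:
        return "InGF"
    # Assumption: the text does not say what happens at exactly 0.5.
    return "undetermined"
```

For example, with all six levels set to 0 (a made-up input), logit(P) is 2.00 and the score is about 0.88, so the sketch returns "FrGF".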

A kit for identifying sporadic and frequent gout, the kit comprising the biomarker described above.

Use of the above biomarkers in the preparation of a kit for identifying sporadic gout and frequent gout.

Use of the above model in the preparation of a kit for identifying sporadic gout and frequent gout.

A computer-readable storage medium having stored thereon a computer program for execution by a processor to calculate an identification judgment formula of:

Prediction score = e^{logit(P)} / (1 + e^{logit(P)})

logit(P) = 2.00 - 0.21 × [Arachidic acid] - 0.12 × [Xanthine] + 0.41 × [4-Trimethylammoniobutanoic acid] + 0.08 × [Taurine] + 0.07 × [5′-Methylthioadenosine] - 1.93 × [Uridine]

An apparatus comprising an input device and a computing device, wherein the input device is used to input the levels of 4-trimethylammoniobutanoic acid, 5′-methylthioadenosine, arachidic acid, taurine, uridine and xanthine;

the computing device is configured to calculate from the input biomarkers using the following formula:

Prediction score = e^{logit(P)} / (1 + e^{logit(P)})

logit(P) = 2.00 - 0.21 × [Arachidic acid] - 0.12 × [Xanthine] + 0.41 × [4-Trimethylammoniobutanoic acid] + 0.08 × [Taurine] + 0.07 × [5′-Methylthioadenosine] - 1.93 × [Uridine]

A computer device comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor performs the following formula calculations when executing the program:

Prediction score = e^{logit(P)} / (1 + e^{logit(P)})

logit(P) = 2.00 - 0.21 × [Arachidic acid] - 0.12 × [Xanthine] + 0.41 × [4-Trimethylammoniobutanoic acid] + 0.08 × [Taurine] + 0.07 × [5′-Methylthioadenosine] - 1.93 × [Uridine]

Benefits of the present application include, but are not limited to:

1. The invention uses a machine-learning-based multivariate selection method to identify potential metabolic biomarkers and, after further verification by targeted metabolomics, obtains six biomarkers: 4-trimethylammoniobutanoic acid, 5′-methylthioadenosine, arachidic acid, taurine, uridine and xanthine.

2. Based on the six biomarkers, the area under the receiver operating characteristic (ROC) curve for distinguishing InGF from FrGF is 0.88 in the discovery cohort and 0.67 in the validation cohort.

3. The prediction model established from the biomarkers provides, through its model formula, a practical quantitative method for identifying sporadic gout and frequent gout.

4. The invention also covers a corresponding diagnostic kit, detection system, computing system and related technical schemes based on the six biomarkers, which make the identification of sporadic gout and frequent gout practical and feasible.

5. The invention provides unprecedented insight for the metabolic basis of gout attack frequency, proves the unique metabonomics profile of frequent gout and sporadic gout groups, and proves that the metabonomics characteristics can distinguish different clinical manifestations of gout.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

Fig. 1: overall flow chart of the study of the invention.

Fig. 2: serum metabolic profiles by gout flare frequency in the discovery cohort. Fig. 2A: principal component analysis based on the overall metabolic profile and gout flare frequency; Fig. 2B: principal component analysis of InGF and FrGF; Fig. 2C: orthogonal partial least squares discriminant analysis (OPLS-DA) of InGF and FrGF; Fig. 2D: flare frequency predicted by LASSO regression from metabolic features versus actual flare frequency in InGF and FrGF; Fig. 2E: cluster analysis of differential metabolites (ANOVA, FDR < 0.05) between different flare frequencies.

Fig. 3: changes in metabolites and metabolic pathways in the discovery cohort. Fig. 3A: volcano plot of differential metabolites between InGF and FrGF (compared with InGF, metabolites up-regulated in FrGF are shown in red, down-regulated in blue, and those without significant difference in gray); Fig. 3B: changes by metabolite class (each metabolite is represented by its log2FC and grouped by class); Fig. 3C: KEGG enrichment analysis (global test; 64 metabolic pathways were obtained); Fig. 3D: metabolic landscape (the outer circle shows 7 metabolic pathways, the middle circle shows relative metabolite intensity at different flare frequencies, the inner circle shows metabolite class, and the lines connecting metabolites in the center represent correlations: red for correlation coefficient r > 0.5, blue for r < -0.5).

Fig. 4: a sub-network of deregulated metabolic pathways.

Fig. 5: model establishment and optimization of the model by targeted metabonomics. Fig. 5A: selection of biomarkers (AUC values calculated with different numbers of metabolites); Fig. 5B: ROC curves obtained with the top 5 predictive metabolites; Fig. 5C: high-resolution spectrum and MS2 spectrum of xanthine (used to match precursor ions in the MS1 spectrum with product ions in the MS2 spectrum); Fig. 5D: distribution of correlations between targeted and non-targeted metabonomics data (a correlation coefficient r > 0.5 is taken as higher confidence); Fig. 5E: ROC curves for the discovery and validation cohorts with 6 metabolites included; Fig. 5F: relative concentrations of the 6 metabolites in the final model.

Fig. 6: cohorts and non-targeted metabonomics profile. Fig. 6A: distribution of flare frequency among patients in the discovery and validation cohorts; Fig. 6B: principal component analysis of the samples and quality control (QC) samples; Fig. 6C: 200 permutation tests for the OPLS-DA analysis.

Fig. 7: cluster analysis heatmap of important metabolites and their metabolic pathways in the discovery cohort.

Fig. 8: sub-network of purine metabolism and caffeine metabolism (arrows indicate conversion of metabolites; solid lines indicate direct conversion and dotted lines indirect conversion; the relative concentration of each metabolite is also compared between InGF and FrGF, with differences at P < 0.05 marked).

Fig. 9: effect of medication on the differential metabolites in the discovery cohort. Fig. 9A: UpSet plot of the number and percentage of overlapping differential metabolites after each drug was removed in turn; Fig. 9B: change in the significance of several important metabolites after each drug was removed in turn (dot color indicates the log2FC of the metabolite, dot size indicates the FDR value, and metabolites with FDR < 0.05 are marked).

Fig. 10: potential biomarkers in non-targeted metabolomics. Fig. 10A: variable importance ranking of the metabolites in the diagnostic model (the boxes show the median, 25th percentile and 75th percentile); Fig. 10B: AUC value of each metabolite in Fig. 10A.

Fig. 11: correlation between non-targeted and targeted results for 25 metabolites.

Fig. 12: optimization of the model by targeted metabolomics. Fig. 12A: number of metabolites included in the diagnostic model optimized by targeted metabonomics and the corresponding AUC values; Fig. 12B: ROC curves of diagnostic models established by several machine learning algorithms; Fig. 12C: ROC curve of the model obtained with medication status as a predictor variable.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following description of the preferred embodiments is merely exemplary and is in no way intended to limit the invention, its application, or uses. In the following examples, where specific conditions are not stated, the procedures follow conventional conditions or the conditions recommended by the manufacturer; unless otherwise specified, the raw materials used are commercially available from ordinary sources.


In the following embodiments, unless specified otherwise, the reagents or apparatus used are conventional products available commercially without reference to the manufacturer.

Study and sample collection

We enrolled 638 patients who visited the dedicated gout clinic of the Affiliated Hospital of Qingdao University between January 2019 and November 2021. The discovery cohort comprised 163 patients with sporadic gout and 239 with frequent gout; the validation cohort comprised 97 patients with sporadic gout and 139 with frequent gout. Gout was diagnosed according to the 2015 ACR/EULAR gout classification criteria and the 2019 Chinese guideline for the diagnosis and treatment of hyperuricemia and gout. Sporadic gout (InGF) was defined as ≤ 1 gout flare in the past year (patient self-report), and frequent gout (FrGF) as ≥ 2 gout flares in the past year. All participants were male and aged 18 to 70 years. To minimize confounding by other metabolic diseases, the exclusion criteria were: (a) current malignancy or a history of malignancy; (b) chronic renal failure (eGFR < 15 mL/min/1.73 m²); (c) abnormal liver function (alanine aminotransferase or total bilirubin ≥ 2 times the normal value). The study was approved by the ethics committee of the Affiliated Hospital of Qingdao University, and all subjects signed informed consent.

Demographic data and sample collection

Demographic and clinical data (age, height, weight, diastolic blood pressure (DBP), number of gout flares, and other biochemical indicators) were obtained from all participants. After an overnight fast, peripheral venous blood was collected from each participant into vacuum blood collection tubes and allowed to clot for 30 minutes at room temperature. The samples were then centrifuged at 3500 rpm for 10 min, and the supernatant was separated and stored at −80 °C for subsequent biochemical measurements and LC-MS analysis.

Clinical biochemical index measurement

Alanine aminotransferase (ALT), aspartate aminotransferase (AST), glucose, triglycerides (TG), cholesterol (CH), urea nitrogen (BUN), creatinine (Cr) and uric acid (UA) were measured using an automated biochemical analyzer (Roche, Germany).

Sample processing

Serum from the discovery and validation cohorts was collected and pre-treated independently, with the two cohorts separated by 6 months. All samples underwent the same procedure. Briefly, 50 μL of serum was mixed with 200 μL of methanol and an internal standard (fluorouracil) at 4 °C and incubated for 15 min to precipitate proteins. After vortexing at 4 °C and centrifugation at 14000 rpm, the supernatant was collected and stored at −80 °C until analysis. Quality control (QC) samples were prepared by randomly taking 200 aliquots of serum samples and pooling them.

Non-targeted metabonomics analysis

Non-targeted metabonomics analysis was performed by ultra-high-performance liquid chromatography (UHPLC) (Nexera UHPLC-30A, Shimadzu, Japan) coupled to a mass spectrometry system (TripleTOF 6600, AB SCIEX). The mass spectrometer was operated in positive and negative ion modes with data-dependent acquisition (DDA). The acquisition mass range was 60-1200 Da, the scanning frequency was 4 Hz, and the collision energy was 30 eV. UHPLC separation was performed on an ACQUITY UPLC BEH Amide column (100 mm × 2.1 mm, 1.7 μm; Waters). Mobile phase A was ultrapure water containing 25 mmol/L ammonium hydroxide (NH4OH) and 25 mmol/L ammonium acetate (NH4Ac), and mobile phase B was pure acetonitrile (ACN). The flow rate was maintained at 0.5 mL/min and the oven temperature at 25 °C. The separation followed this gradient: 0-0.5 min, 95% B; 0.5-7 min, 95% B to 65% B; 7-8 min, 65% B to 40% B; 8-9 min, 40% B; 9-9.1 min, 40% B to 95% B; 9.1-12 min, 95% B. The injection volume was 2 μL. QC samples and blank samples were inserted after every 8 samples throughout the run for data correction and instrument cleaning.
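The gradient program above can be written as a small lookup. The sketch below is illustrative Python, not instrument method code: the breakpoints are transcribed from the text, while the function name `percent_b` and the assumption that each ramp is linear between breakpoints are not stated in the source.

```python
# (time in minutes, % mobile phase B) breakpoints transcribed from the text.
GRADIENT = [
    (0.0, 95.0),
    (0.5, 95.0),
    (7.0, 65.0),
    (8.0, 40.0),
    (9.0, 40.0),
    (9.1, 95.0),
    (12.0, 95.0),
]


def percent_b(t_min):
    """Return the %B at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-12 min gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            # Linear interpolation within the current segment (assumption).
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole range")
```

Under the linearity assumption, the midpoint of the 0.5-7 min ramp (3.75 min) corresponds to 80% B.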

The raw data (.wiff files) were converted to mzML format with ProteoWizard (https://proteowizard.sourceforge.io) and then analyzed using the XCMS extension package of R software. The analysis procedure written in R was as follows: peaks were detected with the centWave algorithm, with an expected error of 20 ppm, a peak width of 5-30 s and a signal-to-noise (S/N) threshold of 10; retention time (RT) was adjusted by a subset-based algorithm that corrects the RT of each sample using the QC samples run before and after it; peak correspondence was performed by the peak density method, and a peak appearing in > 40% of samples was taken as a feature; missing values were retrieved from the raw files and, where no value could be retrieved, filled with half of the minimum value; and a normalization method based on support vector regression was used to remove batch effects. Features with a coefficient of variation exceeding 30% were excluded from subsequent analysis.
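Two of the steps above, half-minimum filling and the 30% coefficient-of-variation filter, can be sketched as follows. This is a hedged Python illustration rather than the R/XCMS code used in the study. The function names are hypothetical, and two details are assumptions because the text does not specify them: the minimum is taken per feature over its observed values, and the coefficient of variation is computed over repeated QC injections.

```python
import statistics


def fill_half_min(values):
    """Replace missing entries (None) with half of the smallest observed value.

    Assumption: the minimum is taken per feature over its observed values.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("feature has no observed values")
    fill = min(observed) / 2.0
    return [fill if v is None else v for v in values]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def keep_feature(qc_values, max_cv_percent=30.0):
    """Keep a feature unless its CV exceeds the 30% limit given in the text.

    Assumption: the CV is computed over repeated QC injections.
    """
    return coefficient_of_variation(qc_values) <= max_cv_percent
```

With made-up intensities, `fill_half_min([4.0, None, 2.0])` fills the gap with 1.0, and a feature measured as 100, 110 and 90 in three QC injections has a CV of 10% and is kept.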

Based on the precursor m/z and retention time, MS/MS spectra were extracted and matched to each MS feature. The spectra were first searched against our in-house metabolite database, which contains the exact m/z, retention time and MS/MS spectra of metabolites. Such feature annotations can be considered level 1 according to the Metabolomics Standards Initiative (MSI).

Next, the spectra were searched against several public databases (including HMDB, MoNA, MassBank and GNPS). Annotations with a vector dot product > 0.3 were considered level 2; otherwise (≤ 0.3) they were considered level 3.

Furthermore, we analyzed the peak profile and MS/MS spectra using MetDNA, which identifies metabolites with a metabolic reaction network-based recursive algorithm, giving MSI level 4 annotations. If a feature received multiple annotations, the MSI level and the dot product were considered to select the most reliable annotation.
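The annotation levels described above can be summarized in a short sketch. This Python illustration is not the study's code: the source labels (`"in_house"`, `"public"`, `"metdna"`), the function names, and the tie-breaking order (lowest MSI level first, then highest dot product) are assumptions made for the example. Only the level definitions and the 0.3 cut-off come from the text.

```python
def msi_level(source, dot_product=None):
    """Map an annotation source to the MSI level described in the text."""
    if source == "in_house":
        return 1
    if source == "public":
        if dot_product is None:
            raise ValueError("public-database matches need a dot product")
        return 2 if dot_product > 0.3 else 3
    if source == "metdna":
        return 4
    raise ValueError(f"unknown annotation source: {source}")


def best_annotation(candidates):
    """Pick one annotation per feature.

    Assumption: prefer the lowest MSI level, then the highest dot product.
    """
    def sort_key(candidate):
        dot = candidate.get("dot_product")
        level = msi_level(candidate["source"], dot)
        return (level, -(dot or 0.0))

    return min(candidates, key=sort_key)
```

Under these assumptions, an in-house match is always chosen over a public-database match, and between two public-database matches the one with the higher dot product is chosen.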

Targeted metabonomics analysis

Targeted metabolomics used the same chromatographic conditions as non-targeted metabolomics. The difference was that the liquid chromatograph was coupled to a triple quadrupole mass spectrometer (LCMS-8050, Shimadzu, Kyoto, Japan) to quantify metabolites in multiple reaction monitoring (MRM) mode. For each metabolite, the ion polarity and the precursor-to-product ion transitions were extracted from the non-targeted metabonomics data, and the ion monitoring parameters were optimized individually. Data were processed with LabSolutions software (Shimadzu, Kyoto, Japan).

Statistical analysis

All statistical analyses were performed in R (4.0.5). Unsupervised principal component analysis (PCA) was performed with the mixOmics extension package of R software. Least absolute shrinkage and selection operator (LASSO) regression was performed with the gate extension package of R software to predict gout flare frequency as a continuous variable, using 10-fold cross-validation in model training to avoid overfitting. Supervised orthogonal partial least squares discriminant analysis (OPLS-DA) was performed with the ropls extension package of R software to distinguish InGF from FrGF, with 200 permutation tests to check for overfitting. For the Kruskal-Wallis test, gout flare frequency was treated as a discrete variable, and metabolites that differed significantly between flare frequencies were used for cluster analysis. To identify differential metabolites, we used the Mann-Whitney U test. Metabolite and pathway classes follow KEGG BRITE (br08001: compounds with biological roles; br08901: KEGG pathway maps). Enrichment analysis of metabolic pathways was performed by global test according to MSEA, and only for pathways belonging to the Metabolism category of br08901. For metabolic network analysis, we constructed a KEGG-based whole metabolic network of 5226 biological entities and 32020 connections using the FELLA extension package of R software. Network analysis with the differential metabolites showed the top 250 perturbed nodes in the sub-network. Biomarkers were selected with the MUVR extension package of R software (MUVR: multivariate method of unbiased variable selection in R).
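The 10-fold cross-validation mentioned above splits the samples into ten parts and holds each part out once. The following Python sketch only illustrates that partitioning; it does not reproduce the fold assignment used by the R packages in the study, and the function name `k_fold_indices` and the fixed random seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Applied to the 402 patients of the discovery cohort, this yields ten held-out folds of 40 or 41 patients each, and every patient is held out exactly once.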

Briefly, all samples were randomly divided into 20 outer replicates, and within each outer replicate the samples were randomly divided into 200 inner replicates for model training and parameter tuning. The optimal model was evaluated, 70% of the variables were retained according to their importance in that model, and the retained variables were fed into the next iteration. For each repetition, a random forest model was built and its parameters were tuned automatically. The area under the receiver operating characteristic curve (AUC) was calculated to assess model performance, and variable importance was ranked for comparison between repetitions. The predictive models were built with the caret extension package of R software, and each model parameter, where available, was tuned to optimize performance. Within each cohort, the intensity (non-targeted metabolomics) or the ratio to the internal standard (targeted metabolomics) was normalized to the range 0 to 1 before model building.
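The AUC used above to assess model performance equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal Python sketch of that definition follows. The function name `roc_auc` is hypothetical, and treating FrGF as the positive class is an assumption based on the convention that a score above 0.5 indicates frequent gout.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Assumption: FrGF patients are the positive class, InGF the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))
```

With made-up scores, `roc_auc([0.9, 0.8, 0.4], [0.7, 0.3])` ranks 5 of the 6 pairs correctly, giving an AUC of about 0.83.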

Analysis of results

Clinical characteristics of gout patients

To describe the serum metabolic characteristics of InGF and FrGF patients, we performed a serum metabonomic study on the 163 InGF patients and 239 FrGF patients of the discovery cohort. The overall flow of the study is shown in Fig. 1.

Next, we used a machine learning algorithm to select potential metabolites that distinguish InGF from FrGF. The predictive model was further optimized and validated using targeted metabonomics in an independent cohort of 236 subjects (97 InGF patients and 139 FrGF patients).

Table 1 summarizes the clinical characteristics of these two independent cohorts. The distribution of gout flares was similar in the two cohorts (Table 1 and Fig. 6A). As expected, the number of tophi and the serum uric acid level were significantly higher in FrGF patients than in InGF patients, and a significantly greater proportion of patients in the FrGF group took ULT and anti-inflammatory drugs. In both cohorts, other biochemical parameters were similar.

Table 1. Clinical information of the study cohorts

For categorical variables, data are presented as n (%). For continuous variables, values are presented as mean (±SD) when they conform to a normal distribution; otherwise, they are presented as median (quartile).

InGF: infrequent gout flares; FrGF: frequent gout flares; ULT: urate-lowering therapy; DBP: diastolic blood pressure; BMI: body mass index; ALT: alanine aminotransferase; AST: aspartate aminotransferase; GLU: glucose; TG: triglycerides; CH: cholesterol; BUN: blood urea nitrogen; CREA: creatinine; UA: uric acid; CCR: creatinine clearance rate; eGFR: estimated glomerular filtration rate.

a: At least 20 cigarette packs in a lifetime or at least one cigarette a day for at least 1 year.

b: Alcohol intake at least once a week for 6 months.

* indicates p < 0.05 (FrGF versus InGF); # indicates p < 0.05 (Discovery versus Validation cohort).

Serum metabolic profile that varies with frequency of onset

First, we analyzed the serum samples of the discovery cohort by non-targeted metabonomics and detected a total of 14141 metabolic features in positive and negative ion modes (Fig. 1, Fig. 6B). Interestingly, in principal component analysis (PCA) we observed an overall trend in the metabolic profile that depended on gout flare frequency (Fig. 2A). Furthermore, the InGF and FrGF groups could be distinguished to some extent in PCA (Fig. 2B), and in orthogonal partial least squares discriminant analysis (OPLS-DA) their global metabolic characteristics were completely separated without overfitting (Fig. 2C and Fig. 6C). We then built a least absolute shrinkage and selection operator (LASSO) regression model to predict gout flare frequency from metabolic features; the predicted flare frequency matched the actual flare frequency very well (Fig. 2D). More importantly, patients in the InGF group were clearly predicted to have ≤ 1 flare, while patients in the FrGF group were predicted to have ≥ 2 (Fig. 2D). Furthermore, we performed a cluster analysis of the 3560 annotated compounds and observed clear differences between gout at different flare frequencies (Fig. 2E). Consistent with the unsupervised and supervised models, the samples of the InGF group showed a highly similar pattern that differed from that of the FrGF group. Together, these results strongly support the use of metabolic profiles to distinguish InGF from FrGF.

Differential metabolites and deregulated metabolic pathways between InGF and FrGF

To explore the changes in metabolites and metabolic pathways between InGF and FrGF, we used the Mann-Whitney U test to identify differential metabolites between InGF and FrGF in the discovery cohort. Among the annotated metabolites, 116 were up-regulated (FDR < 0.05 and fold change > 4/3) and 323 were down-regulated (FDR < 0.05 and fold change < 3/4) in FrGF patients compared with InGF patients (Fig. 3A, Supplementary Table 1 and Supplementary Table 2).
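The up- and down-regulation criteria above combine an FDR cut-off with a fold-change cut-off. The Python sketch below illustrates one way to apply them. It is not the study's code: the text does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the reading of fold change as the FrGF level divided by the InGF level.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_metabolite(fold_change, fdr, alpha=0.05):
    """Apply the thresholds from the text.

    Assumption: fold_change is the FrGF level divided by the InGF level.
    """
    if fdr < alpha and fold_change > 4 / 3:
        return "up"
    if fdr < alpha and fold_change < 3 / 4:
        return "down"
    return "unchanged"
```

For instance, a metabolite with a fold change of 1.5 and an FDR of 0.01 would be labeled "up", while the same fold change with an FDR of 0.2 would be labeled "unchanged".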

Supplementary Table 1. Up-regulated metabolites in FrGF patients

Supplementary Table 2. Down-regulated metabolites in FrGF patients

These differential metabolites play important roles in different biological functions. According to the biological function classification in the KEGG (Kyoto Encyclopedia of Genes and Genomes) knowledge base, the metabolites down-regulated in FrGF patients were mostly organic acids, lipids, steroids, hormones and transmitters, while the up-regulated metabolites were mostly carbohydrates (Fig. 3B). We then performed a quantitative enrichment analysis on 64 KEGG metabolic pathways. Of these, 57 changed significantly between InGF and FrGF (FDR < 0.05), and the significantly deregulated pathways were mostly involved in carbohydrate metabolism, amino acid metabolism and nucleotide metabolism (Fig. 3C). In carbohydrate metabolism, the major differences were in the citric acid cycle (TCA cycle), amino sugar and nucleotide sugar metabolism, glyoxylate and dicarboxylate metabolism, and glycolysis/gluconeogenesis; several metabolites, such as pyruvic acid and oxaloacetic acid, were enriched in multiple carbohydrate metabolic pathways (Fig. 3C, Fig. 7). Amino acid metabolism, particularly alanine, aspartate and glutamate metabolism, changed significantly, and a large number of amino acids and their derivatives were enriched in these pathways (Fig. 3C, Fig. 7). Nucleotide metabolism, mainly purine metabolism, was also enriched because of changes in several purine metabolites: xanthine, hypoxanthine and uric acid (Fig. 3C, Fig. 7).

Next, we constructed a metabolic landscape of the discovery cohort (Fig. 3D). Across the seven most significantly altered metabolic pathways (inner circles), the metabolic profiles of InGF and FrGF were clearly separated overall.

Next, we applied a network propagation-based algorithm, FELLA, to study the crosstalk among the individual metabolic pathways that were significantly disturbed between InGF and FrGF. This R package takes the statistically different metabolites as input and evaluates each node (metabolite, enzyme and reaction) and each edge (hierarchical connection) in the overall KEGG metabolic network to determine the sub-network that is most disturbed between InGF and FrGF. Interestingly, the crosstalk between purine metabolism and caffeine metabolism is the most clearly disturbed sub-network (FIG. 4). These two pathways converge on uric acid metabolism, and uric acid is one of the most distinctive clinical parameters that distinguish InGF from FrGF. Xanthine dehydrogenase (XDH) is the rate-limiting enzyme for uric acid formation and is therefore the therapeutic target of uric acid-lowering drugs such as febuxostat and allopurinol; it appears to play a key role in regulating both caffeine and purine metabolism. Furthermore, the upregulation of XDH was responsible for the elevated uric acid levels in FrGF compared with InGF (FIG. 8). Upregulation of XDH also resulted in decreased levels of caffeine, 1,7-dimethylxanthine, theophylline and 1-methylxanthine in caffeine metabolism (FIG. 8). These findings indicate that xanthine is a key metabolite linking these two pathways. In addition to being an endogenous metabolite in purine metabolism, xanthine can be synthesized by intestinal bacteria from ingested caffeine or xanthosine.

The changes in the alanine, aspartate and glutamate metabolic sub-network are related to the taurine and hypotaurine metabolic sub-network and to primary bile acid biosynthesis (FIG. 4). We observed four key enzymes linking these three metabolic pathways: serine-glyoxylate transaminase, aspartate 1-decarboxylase, palmitoyl-CoA hydrolase, and bile acid-CoA:amino acid N-acyltransferase (FIG. 4, enzymes 15-18). Interestingly, bile acid synthesis is also affected by the intestinal microbiome, and the interactions among these sub-networks strongly suggest that interactions between intestinal bacteria and the host are involved in the inflammation of InGF and FrGF.

To further investigate the effect of drugs (Table 1) on the metabolome, we repeated the analysis after excluding the patients treated with each drug, and found that the various drug treatments had only a limited effect on the number of differential metabolites between InGF and FrGF. Interestingly, the effect of allopurinol was much smaller than that of febuxostat: the differential metabolites overlapped by 98% and 69%, respectively, with those found in patients not treated with any drug (FIG. 9A). Importantly, most of the key metabolites involved in purine metabolism, arachidonic acid metabolism, bile acid metabolism and aspartic acid metabolism remained statistically significant (FIG. 9B). Thus, the metabolic changes observed in the study of the present invention are mainly caused by endogenous metabolic pathways.
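
As an illustration of how such an overlap rate could be computed, the sketch below assumes it is the fraction of differential metabolites in a reference analysis that are also differential in a comparison analysis. The text does not define the overlap rate precisely, so this definition, the function name `overlap_rate` and the metabolite sets are all hypothetical.

```python
def overlap_rate(reference, comparison):
    """Fraction of reference differential metabolites also found in comparison."""
    reference, comparison = set(reference), set(comparison)
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & comparison) / len(reference)


# Hypothetical differential-metabolite sets from two analyses.
reference_set = {"xanthine", "taurine", "uridine", "arachidonic acid"}
comparison_set = {"xanthine", "taurine", "uridine", "hypoxanthine"}
rate = overlap_rate(reference_set, comparison_set)
```

A rate close to 1 under this definition would mean that excluding the treated patients leaves the list of differential metabolites essentially unchanged.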

Selection of metabolic biomarkers using targeted metabolomics to build predictive models and validation in a separate cohort

To screen metabolites and build predictive models that distinguish patients in the InGF and FrGF groups, we applied the multivariate selection algorithm MUVR to all identified metabolites annotated at MSI confidence level 1 or 2, and tested the selected variables in machine learning (ML) models, including support vector machine (SVM), random forest (RF) and LASSO. To select the most predictive and robust metabolites, we performed 20 outer repetitions with MUVR, each comprising 200 inner repetitions, for iterative variable selection according to importance ranking. The top 6 predictor variables were sufficient to construct a model with an AUC of 0.985, although including more variables increased the AUC further (FIG. 5A and 5B). Across the iterations, 21 metabolites held stable positions in the ranking and thus contributed most to the predictive model, while another 35 metabolites with unstable positions also showed predictive ability to varying degrees (FIG. 10A-B). All of these metabolites are candidates for use in predictive models.
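
For readers unfamiliar with the AUC used to compare models here, the following minimal sketch computes it from its rank-based definition: the probability that a randomly chosen positive sample (for example FrGF) receives a higher score than a randomly chosen negative sample (for example InGF), with ties counted as one half. It is not the MUVR procedure itself, and the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores; label 1 = FrGF, label 0 = InGF.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; the pairwise loop is quadratic in sample size, which is acceptable for a sketch but not for large cohorts.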

It is well known that non-targeted metabolomics has limited quantitative accuracy, whereas triple quadrupole mass spectrometry based on multiple reaction monitoring (MRM) is the gold standard for quantification. We therefore built predictive models for InGF and FrGF using an MRM-based approach. For each selected biomarker, we constructed precursor-to-product ion transitions from the non-targeted MS/MS spectra (FIG. 5C) and manually optimized the other MS parameters (Supplementary Table 3).

Supplementary Table 3. Precursor and product mass-to-charge ratios and retention times of 25 metabolites

m/z: mass-to-charge ratio; RT: retention time.

In the discovery cohort, 25 biomarkers were measured in total, 14 of which showed a high correlation between the non-targeted and targeted methods (FIG. 5D and FIG. 11). Next, we applied the same multivariate selection method to determine the most practical number of metabolites. We built the model using the discovery cohort and validated it using the validation cohort. The AUCs of both the discovery and validation cohorts tended to rise and then fall. When 6 metabolites (4-trimethylammoniobutanoic acid, 5'-methylthioadenosine, arachidonic acid, taurine, uridine and xanthine; FIG. 5F) were selected, the model reached the best AUC values in both the discovery and validation cohorts (FIG. 12A). After optimization across multiple machine learning algorithms (FIG. 12B), the AUC was 0.88 in the discovery cohort and 0.67 in the validation cohort (FIG. 5E). Notably, we also tried to incorporate the various drugs into the model, but found no significant improvement (FIG. 12C). Thus, the final 6 selected biomarkers (FIG. 5F and FIG. 12) were included in a logistic regression model, and the following formula was derived:

Prediction score = e^{logit(P)} / (1 + e^{logit(P)})

Each metabolite in the above formula was normalized. If the score is > 0.5, the patient is more likely to belong to the FrGF group; conversely, if the score is < 0.5, the patient is classified as InGF.
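
The scoring rule above can be sketched as follows. The logistic transform and the 0.5 cut-off come from the text; the intercept, coefficients and normalized metabolite levels are hypothetical placeholders, because the fitted logit(P) expression is not reproduced here. The handling of a score of exactly 0.5 is also an assumption, since the text covers only > 0.5 and < 0.5.

```python
import math


def prediction_score(logit_p):
    """Logistic transform e^logit(P) / (1 + e^logit(P)), computed stably."""
    if logit_p >= 0:
        return 1.0 / (1.0 + math.exp(-logit_p))
    z = math.exp(logit_p)
    return z / (1.0 + z)


def classify(logit_p, threshold=0.5):
    """Return 'FrGF' above the threshold and 'InGF' below it."""
    score = prediction_score(logit_p)
    if score > threshold:
        return "FrGF"
    if score < threshold:
        return "InGF"
    # Assumption: the text does not say how a score of exactly 0.5 is handled.
    return "undetermined"


# Hypothetical linear predictor; these are NOT the fitted model coefficients.
intercept = 0.0
coefficients = [1.0, -0.5]
normalized_levels = [2.0, 1.0]
logit_p = intercept + sum(c * x for c, x in zip(coefficients, normalized_levels))
score = prediction_score(logit_p)
group = classify(logit_p)
```

Branching on the sign of logit(P) avoids overflow in `math.exp` for large positive values while giving the same result as the formula.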

Comprehensive analysis

The present invention discloses a systematic metabolic profile and related metabolic pathways that are able to distinguish between InGF and FrGF. Gout flares are positively correlated with systemic changes in the serum metabolome, and the study of the present invention has determined the metabolic profiles associated with InGF and FrGF. We then used three machine learning algorithms to select and validate a set of 6 metabolites that differentiated InGF from FrGF in an independent validation cohort.

Systematic analysis of circulating metabolites using metabolomics revealed a variety of metabolic pathways associated with InGF and FrGF (FIG. 2). Among them, carbohydrate metabolism ranked highest among the significantly altered pathways, mainly reflected in increases in oxalosuccinic acid, oxalic acid and 2,3-diphosphoglyceric acid and a decrease in citric acid in the TCA cycle and glycolysis (Supplementary Table 1, Supplementary Table 2).

The TCA cycle and glycolysis are central to the metabolic activity of organisms and are involved in many metabolic diseases, arthritis and inflammation. For example, metabolic rewiring toward glycolysis accounts for macrophage activation and inflammatory factor release induced by MSU crystals. Glutamate and aspartate metabolism is one of the most altered amino acid metabolic pathways, consistent with previous studies. Interestingly, genome-wide association studies (GWAS) of gout have implicated similar metabolic pathways through gout-related gene loci. Common missense variants of the CPS1 and GLS2 genes, which are involved in glutamine metabolism, have been found to be associated with lower plasma glutamine levels and have been identified as gout susceptibility loci. Glutamine is a substrate of the first and rate-limiting step of de novo purine biosynthesis, in which it serves as an amino donor to produce 5-phosphoribosylamine (5-PRA) and glutamic acid; the further synthesis of purines and uric acid uses glutamic acid and aspartic acid. Furthermore, aspartic acid and glutamic acid are substrates for the epigenomic reprogramming that occurs when soluble uric acid "trains" the innate immune system, making it more reactive toward MSU crystals.

On the other hand, some lipids and fatty acids, such as arachidonic acid and eicosapentaenoic acid, are significantly reduced in FrGF compared with InGF. Eicosanoids such as prostaglandin E2 and prostaglandin D2, or oxylipins (oxidized lipids), which are downstream metabolites of arachidonic acid, are involved in the inflammatory and pain responses associated with gout and various rheumatic diseases. A recent study found that some serum oxylipins are biomarkers for early-onset gout in adolescents. Bile acids (e.g., glycocholic acid and chenodeoxycholic acid) were significantly reduced in patients of the FrGF group, consistent with previous studies in rheumatoid arthritis and gout. Interestingly, the bile acid synthesis pathway is also affected by the intestinal microbiome. Taken together, these data strongly suggest that the interaction of the intestinal flora with the host and the epigenetic modification of certain key metabolic enzymes may be related to the inflammation of InGF and FrGF.

Further network analysis using a network propagation-based algorithm revealed crosstalk between different metabolic pathways, which may play a role in mediating the metabolic changes in InGF and FrGF and provides a systematic basis for better understanding the underlying metabolic pathophysiology. Consistent with the pathway enrichment analysis, the overall disturbance is concentrated in a sub-network consisting of purine metabolism and caffeine metabolism. Previous studies have linked coffee intake to serum uric acid concentrations and to a reduced risk of gout, associations that involve multiple alleles of several SNPs; however, the pathways that might link caffeine metabolism to uric acid formation and gout remain to be established.

Furthermore, we determined a key role for XDH in linking these two pathways (FIG. 4). XDH is shared among multiple degradation steps of xanthine and caffeine derivatives. In agreement with this, the reduction of xanthosine and xanthine and the increase of uric acid indicate higher XDH activity in FrGF, which may be responsible for the reduction of 1,7-dimethylxanthine, theophylline and 1-methylxanthine and the increase of serum uric acid in FrGF patients (FIG. 8).

In addition, there is an interaction among taurine and hypotaurine metabolism, primary bile acid biosynthesis, and alanine, aspartic acid and glutamic acid metabolism. Previous studies have shown reduced bile acid biosynthesis in gout patients and in rat models. Bile acids inhibit XDH via peroxisome proliferator-activated receptor-α (PPAR-α).

In summary, the study of the present invention reveals disturbances in multiple metabolic networks that distinguish InGF from FrGF. The TCA cycle and glycolysis provide energy and substrates for the synthesis of several amino acids and for other metabolic activities. Aspartate, glycine and threonine metabolism is involved in bile acid biosynthesis, which is an important regulator of XDH in purine metabolism. XDH, in turn, appears to be associated with changes in glycine, glutamine and aspartate during uric acid production and caffeine degradation. Taken together, these data again support the involvement of the intestinal microbiome and of the epigenetic modification underlying trained immunity in the processes of InGF and FrGF.

Metabolomics, together with machine learning algorithms, has become a powerful tool for identifying metabolic biomarkers for the diagnosis of diseases. A recent study used metabolomics and machine learning to predict the clinical outcomes of eight common diseases. Using similar methods, we have recently revealed metabolic differences in the serum of hyperuricemia and gout patients. In addition to systematically analyzing the changes in the metabolic profiles of InGF and FrGF, the present invention also creates a predictive model to distinguish FrGF from InGF. Such a model may have a significant impact on the precise gout management advocated by several clinical guidelines, for which diagnostic tools are currently lacking. We identified multiple metabolites as biomarkers through strict variable selection and cross-validation in the non-targeted metabolomic analysis, and then, based on targeted metabolomics, built a diagnostic model using machine learning algorithms to distinguish FrGF from InGF. The model included six metabolites and achieved effective prediction in the discovery cohort (AUC = 0.88). More importantly, the model was validated in an independent validation cohort (AUC = 0.67).

The invention provides unprecedented insight for the metabolic basis of gout attack frequency, proves the unique metabonomics profile of frequent gout and sporadic gout groups, and proves that the metabonomics characteristics can distinguish different clinical manifestations of gout.

The foregoing is merely an implementation of the present application, and the scope of protection of the present application is not limited by these specific examples, but is determined by the claims of the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the technical ideas and principles of the present application should be included in the protection scope of the present application.

Claims

1. A biomarker for identifying sporadic gout and frequent gout, characterized in that the biomarker is 4-trimethylammoniobutanoic acid, 5'-methylthioadenosine, arachidonic acid, taurine, uridine and xanthine.

2. A predictive model for identifying sporadic gout and frequent gout established according to the biomarker of claim 1, wherein the identification and judgment formula of the model is as follows:

Prediction score = e^{logit(P)} / (1 + e^{logit(P)})

3. A kit for identifying sporadic and frequent gout, comprising reagents for targeted detection of the biomarker of claim 1.

4. The use of the biomarker according to claim 1 in the preparation of a kit for identifying sporadic and frequent gout, characterized in that it is used for identifying sporadic and frequent gout.

5. Use of the model according to claim 2 for the preparation of a kit for identifying sporadic and frequent gout, characterized in that it is applied for identifying sporadic and frequent gout.

6. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, calculates the identification and judgment formula of claim 2, wherein if the score is > 0.5, the patient has a higher likelihood of belonging to frequent gout; conversely, if the score is < 0.5, the patient is classified as sporadic gout.

7. An apparatus comprising an input device and a computing device, wherein the input device is configured to input content data of 4-trimethylammoniobutanoic acid, 5'-methylthioadenosine, arachidonic acid, taurine, uridine and xanthine;

Prediction score = e^{logit(P)} / (1 + e^{logit(P)})

8. A computer device comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor performs the following formula calculations when executing the program:

Prediction score = e^{logit(P)} / (1 + e^{logit(P)})