CN117007810B

CN117007810B - Biomarker for predicting progression of intravenous leiomyomatosis and application thereof

Info

Publication number: CN117007810B
Application number: CN202310690937.5A
Authority: CN
Inventors: 冯鹏辉; 葛志通; 张紫娟; 毕谦; 尚萌萌; 闵婕; 周平; 肖滢; 陈蓉; 李建初
Original assignee: Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Current assignee: Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Priority date: 2023-06-12
Filing date: 2023-06-12
Publication date: 2024-04-05
Anticipated expiration: 2043-06-12
Also published as: CN117007810A

Abstract

The invention relates to the technical field of biological detection and discloses a biomarker for predicting the progression of intravenous leiomyomatosis (IVL) and an application thereof. Specifically, the biomarker is obtained by serum proteomics combined with statistical data analysis, and is any one or a combination of more than one of [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)], [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)], [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)]. When used to predict IVL progression, the method is easy to operate and accurate; diagnosis and prediction require only a blood draw, so the method is rapid, convenient and non-invasive. It facilitates early diagnosis and treatment of IVL and monitoring of tumor recurrence, and has good clinical value.

Description

Biomarker for predicting progression of intravenous leiomyomatosis and application thereof

Technical Field

The invention relates to the technical field of biological detection, in particular to a biomarker for predicting the progression of intravenous leiomyomatosis and an application thereof.

Background

Intravenous leiomyomatosis (IVL) is a rare, endocrine-related tumor with intravascular invasive properties. Studies have shown that IVL tends to be estrogen dependent and is common in women of childbearing age, particularly in patients with a history of uterine fibroid surgery (incidence of about 25%). Its pathogenesis remains controversial, but most scholars believe that IVL originates from the uterine vein wall or from pelvic tissue outside the uterus, can extend into the uterine or pelvic venous channels, and can spread through the iliac or ovarian veins to the inferior vena cava and even the heart. Although IVL is histopathologically benign, its continued growth may lead to serious circulatory events such as syncope, sudden death, pulmonary embolism or cerebral infarction, so IVL deserves attention in both clinical practice and research. Surgery is currently the preferred treatment for IVL, but reliable means of assessing whether resection was complete and of predicting postoperative recurrence are still lacking. In addition, when an IVL lesion is small, whether it is a primary lesion, a recurrence or a postoperative residue, clinical symptoms and imaging findings (e.g., echocardiography, computed tomography angiography) are ambiguous and the rate of missed diagnosis is high; by the time IVL is detected, it has often extended considerably or even invaded the heart. A more sensitive screening index is therefore urgently needed for the clinical diagnosis and prognosis prediction of IVL.

Disclosure of Invention

The present invention provides a biomarker for predicting the progression of intravenous leiomyomatosis and an application thereof. It aims to screen and identify, from the serum of IVL patients, key proteins affecting IVL progression (completely healthy, healthy with uterine fibroids, no recurrence after IVL surgery, and recurrence after IVL surgery), to explain potential regulatory mechanisms of IVL recurrence, to provide a promising target for IVL diagnosis and/or prognosis prediction, and to provide a theoretical basis for subsequent research.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

In a first aspect, the present invention provides a biomarker for predicting the progression of intravenous leiomyomatosis, which is any one or a combination of more than one of [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)] (A0A5C2FUE5), [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)] (A0A5C2GPQ1), [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] (A0A5C2GNC7) and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)].

Further, the expression level of the biomarker [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)] shows an overall increasing trend as intravenous leiomyomatosis progresses, and the expression levels of [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)], [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)] show an overall decreasing trend as intravenous leiomyomatosis progresses.

Further, the progression of intravenous leiomyomatosis sequentially comprises 4 stages, namely completely healthy, healthy with uterine fibroids, no recurrence after IVL surgery, and recurrence after IVL surgery.

In a second aspect, the invention provides the use of a biomarker in the manufacture of a kit for predicting the progression of intravenous leiomyomatosis, the biomarker being any one or a combination of more than one of [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)], [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)], [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)].

Further, the kit includes reagents for detecting the expression levels of [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)], [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)], [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)].

Further, the sample used to detect the expression levels of [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)], [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)], [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)] is serum.

In a third aspect, the invention provides a method of screening for biomarkers for predicting the progression of intravenous leiomyomatosis, comprising the steps of:

step 1, separately collecting serum from patients without recurrence after IVL surgery, patients with recurrence after IVL surgery, completely healthy controls, and otherwise healthy controls with uterine fibroids;

step 2, detecting proteins by proteomics based on data-independent acquisition (DIA) mass spectrometry;

step 3, screening core proteins involved in IVL progression by weighted gene co-expression network analysis (WGCNA), Lasso-penalized Cox regression analysis (Lasso), trend clustering and a generalized linear regression model (GLM).

Further, the specific process of protein detection in step 2 by proteomics based on data-independent acquisition mass spectrometry is as follows:

step 2.1, depleting the most abundant proteins in the serum pool with a multiple affinity removal system chromatographic column so that high-abundance and low-abundance proteins are separated, then desalting and concentrating the high-abundance and low-abundance fractions with a 5 kDa ultrafiltration tube; adding SDT buffer and boiling for 15 minutes; then centrifuging at 14000 g for 20 minutes and quantifying the protein by BCA assay;

step 2.2, digesting the high-abundance and low-abundance proteins, specifically: repeatedly ultrafiltering 200 μg of protein with UA buffer to remove detergents, DTT and other low-molecular-weight components; then adding 100 μL of iodoacetamide to block reduced cysteine residues; then rinsing the filter sequentially with 100 μL of UA buffer and with NH₄HCO₃ buffer; finally, digesting the protein suspension, followed by desalting and concentration;

step 2.3, fractionating the digested pooled peptides of the low-abundance fraction with a high-pH reversed-phase peptide fractionation kit; adding an iRT kit, at an iRT standard peptide to sample peptide volume ratio of 1:3, to correct relative retention time differences between runs;

step 2.4, analyzing all fractions with a mass spectrometer to generate the data-dependent acquisition (DDA) library;

step 2.5, further detecting the peptides by liquid chromatography-tandem mass spectrometry in DIA mode.

Further, in step 2.4 the MS detection is performed in positive ion mode with a scan range of 300-1800 m/z; the MS1 scan resolution is 60000 at 200 m/z, the automatic gain control (AGC) target is 3e6, the maximum injection time (IT) is 25 ms, and the dynamic exclusion is 30.0 s; the MS2 scan resolution is 15000, the AGC target is 5e4, the maximum IT is 25 ms, and the normalized collision energy is 30 eV;

further, each DIA cycle in step 2.5 includes one full MS-SIM scan and 30 DIA scans; the SIM full-scan resolution is 120000, the AGC target is 3e6, the maximum IT is 50 ms, and spectra are acquired in profile mode; the DIA scan resolution is 15000, the AGC target is 3e6, the maximum IT is set to auto, and the normalized collision energy is 30 eV.

Further, quality control samples are injected in DIA mode at the beginning of the MS study and after every six injections throughout the protein detection process, thereby ensuring the stability of the detection system and the reliability of the experimental data.

Compared with the prior art, the invention has the following advantages:

The screening method provided by the invention reliably yields biomarkers for predicting IVL progression. When used for IVL progression prediction, it is easy to operate and accurate; diagnosis and prediction require only a blood draw, so the method is rapid, convenient and non-invasive. It facilitates early diagnosis and treatment of IVL and monitoring of tumor recurrence, and has good clinical value.

Drawings

FIG. 1 shows the identification of differentially expressed proteins. A. Enumeration of all identified peptides and protein groups in each sample. The dotted line on the abscissa indicates that the amount of protein or peptide is 50% of the total maximum identified amount. B. Volcano plot of the differential changes of all proteins, with up-regulated proteins on the right and down-regulated proteins on the left (fold change (FC) > 1.5 or < 0.67, p-value < 0.05). C. The top 10 differentially expressed proteins, shown on the right (increased expression) or left (decreased expression). D. OPLS-DA score plot of the IVL group versus the control group, showing the degree of separation between the two groups. t1 represents principal component one, to1 represents principal component two, and the distribution of dots represents the degree of difference between groups and within a group.

FIG. 2 shows the structural property and functional enrichment analysis of all differentially expressed proteins. A. Subcellular localization and distribution of differentially expressed proteins between the IVL group and the control group. B. The major protein domains of all differentially expressed proteins. C. GO annotation analysis of all differentially expressed proteins, with highlighted items marked by a dashed box. D. KEGG pathway enrichment; the size of each bubble reflects the number of proteins involved in the KEGG pathway.

FIG. 3 shows WGCNA and the determination of key modules. A. Soft threshold (power=7) analysis. B. Cluster dendrogram based on module eigengenes. C. Correlation of module eigengenes with the clinical trait of IVL tumorigenesis; the color blocks on the left ordinate denote the different modules, the large central color blocks represent each protein module, and the p-value and correlation coefficient are marked on each block. D. Expression patterns of the proteins contained in the four core hub modules.

FIG. 4 shows the analysis of protein co-expression networks within modules and the trend clustering evaluation. A. PPI analysis of the genes involved in each hub module; node size represents the degree of connection, and the higher the degree, the larger the node. B. Clusters identified according to the expression trend of each protein during disease progression; the abscissa indicates the different groups and the ordinate indicates the change in expression after standardization.

FIG. 5 shows the construction of the Lasso regression model and related analysis. A. Cross-validation of the partial likelihood deviance; the solid vertical lines represent the partial likelihood deviance and its corresponding 95% CI. B. Lasso coefficient profiles of the 20 progression-related proteins, each curve representing one protein. C. Correlation analysis of the proteins screened by Lasso analysis. D. Relative expression levels of the 12 proteins.

FIG. 6 shows the identification of hub proteins based on the generalized linear regression model. The forest plot shows the odds ratios of the four proteins. Proteins with p-values < 0.05 are considered independent risk factors or protective factors for IVL progression.

FIG. 7 shows the verification of the prediction accuracy of the four-core-protein model. The reliability of the model is verified by ROC analysis using micro-averaging and macro-averaging over the multi-class variable. 0-3 represent the CO-no, CO-um, IVL-no and IVL-re subgroups, respectively. The abscissa represents the false positive rate and the ordinate represents the true positive rate. AUC values are used to characterize the performance of the model.

Detailed Description

The following examples are illustrative of the invention and are not intended to limit the scope of the invention. The technical means used in the examples are conventional means well known to those skilled in the art unless otherwise indicated.

The experimental methods used in the examples below are conventional methods unless otherwise specified.

All materials, reagents, etc. in the examples described below are commercially available unless otherwise specified.

Examples

1. Sample collection

Patients: divided into a recurrent IVL subgroup (IVL-re) and a non-recurrent IVL subgroup (IVL-no), 15 cases each.

Healthy control group: half had no uterine fibroids (CO-no) and the other half had uterine fibroids (CO-um).

The patient inclusion criteria were:

1) Space-occupying lesions of the abdominal or pelvic veins (inferior vena cava, iliac veins, parauterine veins) or of the right atrium found on preoperative imaging or during surgery;

2) Patients undergoing surgical treatment in our hospital;

3) Postoperative pathological diagnosis of IVL with vascular invasion;

4) Age 18 years old or older.

The exclusion criteria were:

1) Lack of clinical data;

2) Lactating or pregnant women;

3) Mental disorder or inability to care for oneself;

4) Unwillingness to participate in the study;

5) Concurrent other malignant tumors.

The standards for the control group were:

1) Age above 18 years old;

2) No history of other malignancy;

3) Not pregnant or lactating;

4) No history of myomectomy, hysterectomy or related gynecological surgery.

All subjects underwent ultrasound examination (including the abdominal and pelvic blood vessels; before the examination, subjects were instructed to hold their urine so that the bladder was adequately filled, allowing better assessment of the uterus and bilateral adnexa) to check for space-occupying lesions of the abdominal or pelvic vessels and for uterine fibroids. In the IVL group, the large abdominal vessels were examined first, including the patency of the inferior vena cava and bilateral iliac veins and the presence or absence of space-occupying lesions, and the pelvic cavity was then examined for space-occupying lesions. In the control group, ultrasound of the inferior vena cava and iliac veins together with gynecological ultrasound was used to confirm that the abdominal and pelvic vessels had no space-occupying lesions and to determine whether the uterus had fibroids. If uterine fibroids were present, the location and size of the largest leiomyoma were recorded. After the examination, two other senior physicians independently evaluated the report; any disagreement was resolved through discussion. The findings were as follows: the CO-no subgroup showed a homogeneous myometrial echo without uterine fibroids or pelvic space-occupying lesions. In contrast, the myometrium of the CO-um subgroup showed hypoechoic masses (maximum diameter ≥ 2 cm) with a typical whorled structure on ultrasound. In the IVL-re subgroup, at least two consecutive ultrasound examinations found either a space-occupying lesion (≥ 1 cm) in a pelvic or abdominal vein or a residual lesion larger than its previous size. Patients in whom no space-occupying lesions were found in the pelvis or blood vessels were assigned to the IVL-no subgroup.

After the ultrasound examination, venous blood was collected from the cubital vein of every subject; plasma, serum and blood cells were separated, aliquoted into 0.5 mL cryotubes, and labeled with the relevant information for later sample retrieval. All aliquots were stored in -80 °C freezers equipped with alarm systems and emergency backup power to prevent accidental thawing.

2. Baseline characteristics of the subjects

As described under sample collection, the study comprised two groups, 30 IVL patients and 30 healthy controls, as shown in Table 1. Both groups were aged around 49.0 years, and the difference was not statistically significant. The median age at menarche in the IVL and CO groups was 14.0 years and 15.0 years, respectively (p-value = 0.07). Before surgical resection, patients had varied and atypical symptoms, mainly shortness of breath (the most common), lower-extremity edema, and lumbago or backache; a few patients complained of menorrhagia, abdominal pain, syncope or an abdominal mass, and, as expected, some cases were asymptomatic (n=7). All patients enrolled in the study had uterine fibroids, and more than two-thirds had a history of uterine surgery. The iliac veins were the most common extension path of IVL (93.3%). Notably, the sites reached by IVL extension were mainly the right atrium (n=14), right ventricle (n=4) and infrarenal IVC (n=4), with lesions in a cast-like or luminal shape. As shown in Table 1, 23 IVL patients underwent primary surgery and the other patients underwent secondary surgery; all lesions were completely resected except in 5 cases with intravascular or pelvic residual lesions. More seriously, half of the IVL patients had postoperative recurrence, with lesions appearing in the blood vessels.

TABLE 1 baseline characteristics of all subjects

3. Detection of samples

After the ultrasound examination, the serum samples collected from each participant underwent proteomic testing. The specific process was as follows:

The most abundant proteins in the serum pool were depleted with a multiple affinity removal system chromatographic column (Agilent Technologies) so that high-abundance and low-abundance proteins were separated, and the high-abundance and low-abundance fractions were desalted and concentrated with a 5 kDa ultrafiltration tube (Sartorius); SDT buffer was added and the samples were boiled for 15 minutes, then centrifuged at 14000 g for 20 minutes, and the protein was quantified by BCA assay (Bio-Rad, USA).

Both high-abundance and low-abundance proteins were digested with a procedure modified from the filter-aided sample preparation (FASP) protocol. Briefly, 200 μg of protein was repeatedly ultrafiltered with UA buffer (8 M urea, 150 mM Tris-HCl, pH 8.0) to remove detergents, DTT and other low-molecular-weight components; 100 μL of iodoacetamide was then added to block reduced cysteine residues; the filter was then rinsed sequentially with 100 μL of UA buffer and with NH₄HCO₃ buffer; finally, the protein suspension was digested, followed by desalting and concentration.

A high-pH reversed-phase peptide fractionation kit (Thermo Scientific™ Pierce™) was used to fractionate the digested pooled peptides of the low-abundance fraction; an iRT kit (Biognosys) was added, at an iRT standard peptide to sample peptide volume ratio of 1:3, to correct relative retention time differences between runs.

All fractions used for data-dependent acquisition (DDA) library generation were analyzed on a Thermo Scientific Q Exactive HF-X mass spectrometer. MS detection was performed in positive ion mode with a scan range of 300-1800 m/z; the MS1 scan resolution was 60000 at 200 m/z, the automatic gain control (AGC) target was 3e6, the maximum injection time (IT) was 25 ms, and the dynamic exclusion was 30.0 s. The MS2 scan resolution was 15000, the AGC target was 5e4, the maximum IT was 25 ms, and the normalized collision energy was 30 eV.

The peptides were further detected by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in DIA mode. Each DIA cycle included one full MS-SIM scan and 30 DIA scans (SIM full-scan resolution 120000; AGC target 3e6; maximum IT 50 ms; profile mode; DIA scan resolution 15000; AGC target 3e6; maximum IT auto; normalized collision energy 30 eV).

4. Quality control and analysis

QC samples (a pool of equal amounts of every sample in the experiment) were used to monitor MS performance; they were injected in DIA mode at the beginning of the MS study and after every six injections throughout the experiment.
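The injection order described above can be sketched as follows. This is only an illustration, not the instrument method: the function name `build_injection_sequence` and the sample labels are hypothetical, and the sketch assumes that "every six injections" counts study-sample injections only.

```python
def build_injection_sequence(sample_ids, qc_label="QC", interval=6):
    """Return a run order with a QC injection first and after every `interval` samples.

    Assumption: the interval counts study samples, not QC injections.
    """
    sequence = [qc_label]  # QC at the beginning of the MS study
    for index, sample_id in enumerate(sample_ids, start=1):
        sequence.append(sample_id)
        if index % interval == 0:
            sequence.append(qc_label)  # QC after every six sample injections
    return sequence


# Hypothetical labels for the 60 serum samples (30 IVL patients, 30 controls).
samples = [f"S{n:02d}" for n in range(1, 61)]
run_order = build_injection_sequence(samples)
print(run_order[:9])
```

Keeping the schedule in one function makes it easy to check how many QC runs a batch will contain before the acquisition starts.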

The DDA library data were searched against a FASTA sequence database using Spectronaut™ software. The database was the uniprot_human database. The parameters were set as follows: the enzyme was trypsin, with a maximum of two missed cleavages; the fixed modification was carbamidomethyl (C); the dynamic modifications were oxidation (M) and acetyl (protein N-term). All data were used for protein identification at 99% confidence, with a false discovery rate (FDR) of 1%. The original raw files and DDA search results were imported into Spectronaut Pulsar X™ (Biognosys) to construct a spectral library. The DIA data were then analyzed in Spectronaut™ by searching this spectral library. All results were filtered at a Q-value cutoff of 0.01.

5. Data processing

Statistical analysis was performed using SPSS software (version 22, Chicago, IL, USA). To describe the baseline characteristics of all participants, count variables were expressed as numbers (percentages) and measured data as medians (upper quartile, lower quartile), according to whether the data followed a normal distribution as assessed by the Shapiro-Wilk test combined with a normal probability plot. Comparisons between groups used non-parametric tests. Results were considered statistically significant only at p values <0.05.
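The median-and-quartile summary used for measured data can be illustrated with the Python standard library. This is a sketch only: the study used SPSS, the quartile interpolation rule below (the "inclusive" method) is an assumption and may differ from the SPSS default, and the ages are made-up values.

```python
import statistics


def median_with_quartiles(values):
    """Return (median, lower quartile, upper quartile) of a list of numbers."""
    lower, median, upper = statistics.quantiles(values, n=4, method="inclusive")
    return median, lower, upper


# Hypothetical ages at menarche, for illustration only.
ages = [13, 14, 14, 15, 16]
median, lower, upper = median_with_quartiles(ages)
print(f"{median} ({upper}, {lower})")  # median (upper quartile, lower quartile)
```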

6. Analysis of results

(1) Identification of differentially expressed proteins (FIG. 1)

To identify the differentially expressed proteins (DEPs) between the IVL and CO groups, this study used the DIA technique. After proteolytic preparation of the samples, the raw files from the DIA analysis were imported into Spectronaut for qualitative and quantitative evaluation. As shown in FIG. 1A, we enumerated the identifiable peptides and protein groups in each sample and ultimately defined 2582 proteins for subsequent analysis. On this basis, 54 proteins were up-regulated and 39 proteins were down-regulated in the IVL group under the filtering conditions (fold change (FC) > 1.5 or < 0.67, p-value < 0.05), as shown in FIG. 1B. Among these DEPs, the top ten proteins in each direction were recorded: A0A5C2GK72, Q0ZCH9, A0A5C2G577, A0A5C2FV72, A0A5C2FVH2, A0A5C2GFF7, A0A5C2GQR5, A0A5C2GQ71, A0A5C2GTT5 and A0A5C2FWG8 were the most prominently expressed proteins in IVL patients, whereas A0A5C2GIL4, A0A5C2GEY, Q562R1, A0A5C2GNC7, A0A5C2FWY9, A0A5C2GBR3, A0A5C2G2C3, A0A5C2G6T6, A0A5C2GHL8 and A8K061 were expressed primarily in healthy women (FIG. 1C). Orthogonal partial least squares discriminant analysis (OPLS-DA) was performed on the protein profiles of the two groups, building a discriminant model to evaluate the variability of the protein expression patterns. Samples from the same group had similar protein expression trends, and the model established with this supervised discriminant method showed reliable stability and predictability (R2Y = 0.912, Q2Y = 0.448), as shown in FIG. 1D.
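The filtering rule for DEPs (FC > 1.5 or < 0.67 together with p-value < 0.05) can be written as a small sketch. The protein labels and values below are hypothetical and serve only to show how the thresholds act; the volcano-plot coordinates follow the usual convention of log2(FC) against -log10(p), which the source does not spell out.

```python
import math


def classify_protein(fold_change, p_value, fc_up=1.5, fc_down=0.67, alpha=0.05):
    """Label a protein as 'up', 'down' or 'unchanged' (IVL versus control)."""
    if p_value >= alpha:
        return "unchanged"
    if fold_change > fc_up:
        return "up"
    if fold_change < fc_down:
        return "down"
    return "unchanged"


# Hypothetical records: (label, fold change IVL/CO, p-value).
records = [
    ("protein_a", 2.10, 0.003),
    ("protein_b", 0.50, 0.020),
    ("protein_c", 1.80, 0.200),  # large change but not significant
    ("protein_d", 1.10, 0.001),  # significant but small change
]

calls = {name: classify_protein(fc, p) for name, fc, p in records}
# Volcano-plot coordinates: x = log2(FC), y = -log10(p).
volcano = {name: (math.log2(fc), -math.log10(p)) for name, fc, p in records}
print(calls)
```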

(2) Structural property and functional enrichment analysis of differentially expressed proteins

To elucidate the potential features of the identified DEPs, their subcellular localization was predicted using the multi-class SVM classification system CELLO (http://cello.life.nctu.edu.tw/). Proteins behave differently in different organelles (such as mitochondria and the endoplasmic reticulum), so analysis of subcellular localization helps to further explore protein function. As shown in FIG. 2A, these DEPs consist mostly of extracellular or secreted proteins (n=82), with more than one third located in the nucleus. In addition, some of them are mitochondrial or cytoplasmic proteins. Subsequently, the protein sequences were searched and mapped using InterProScan software to identify protein domain features. A protein domain is defined as one of two or more spatially distinct local regions of a large protein molecule, formed by tightly linked adjacent supersecondary structures on the polypeptide chain. Each region has its own spatial structure and performs a different biological function. Here, the main enriched domains were the immunoglobulin V-set and immunoglobulin C1-set domains, followed by globulins, actin and intermediate filament proteins (FIG. 2B). To determine the functional enrichment of these DEPs, we mapped Gene Ontology (GO) terms and annotated the sequences with Blast2GO software. After the annotation step, the studied proteins were aligned against the online Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://geneontology.org/), their KEGG orthology identifiers were retrieved, and they were then mapped to KEGG pathways. As shown in FIG. 2C, these DEPs are mainly enriched in biological processes such as antigen receptor-mediated signaling pathways, regulation of B cell activation, positive regulation of B cell activation, B cell receptor signaling pathways and phagocytic recognition. In addition, these DEPs are primarily involved in molecular functions such as small molecule binding, oxygen carrier activity, oxygen binding, and molecular carrier activity. For cellular components, these DEPs are mainly associated with immunoglobulin complexes, including circulating immunoglobulin complexes. KEGG annotation showed that nitrogen metabolism, the IL-17 signaling pathway and the Rap1 signaling pathway were most strongly associated with these DEPs (FIG. 2D).

(3) Determination of WGCNA and critical modules

Weighted gene co-expression network analysis (WGCNA) is an algorithm that constructs a co-expression network from the pattern of interactions between targets. Modules are identified by computing the adjacency matrix and the topological overlap matrix (TOM), followed by hierarchical clustering and dynamic tree cutting. Hub proteins are ultimately selected from the target modules associated with IVL tumorigenesis. This study combined WGCNA with the DEPs to determine the hub proteins associated with the IVL phenotype. As shown in FIG. 3A, power = 7 was chosen as the optimal soft threshold, and the co-expressed protein modules were identified according to the scale-free topology model fit (scale-free R² = 0.85) and mean connectivity. Then, 8 modules (black, brown, blue, yellow, green, red, cyan, and gray) were identified in the cluster dendrogram (FIG. 3B). As shown in FIG. 3C, when the relationship between modules and sample traits was considered, 4 of the modules were significantly correlated with IVL tumorigenesis (brown module: correlation coefficient = 0.28, p-value = 0.03; green module: correlation coefficient = -0.48, p-value = 1e-04; red module: correlation coefficient = 0.52, p-value = 2e-5; yellow module: correlation coefficient = -0.47, p-value = 1e-04). We then analyzed the expression patterns of the proteins contained in these modules; the expression levels of the proteins in the brown and red modules were relatively increased in IVL cases, whereas the proteins in the green and yellow modules were down-regulated (FIG. 3D). Based on the protein-protein interaction (PPI) information of the four hub modules, 347 proteins were finally identified and an interaction network was established. The results were imported into Gephi software (https://gephi.org/) to visualize and further analyze the PPI network. The brown and yellow modules contained 188 and 114 hub proteins, respectively. Of the remaining proteins, 26 hub proteins belonged to the green module and 19 to the red module.
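The two matrices named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the WGCNA R implementation used in the study: it assumes an unsigned network (adjacency = |correlation| raised to the soft-threshold power, here 7) and the standard unsigned topological overlap formula; the 3x3 correlation matrix is hypothetical.

```python
def soft_threshold_adjacency(correlation, power=7):
    """Unsigned adjacency: a_ij = |cor_ij| ** power, with a zero diagonal."""
    n = len(correlation)
    return [
        [0.0 if i == j else abs(correlation[i][j]) ** power for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adjacency):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adjacency)
    connectivity = [sum(row) for row in adjacency]  # k_i (diagonal is zero)
    tom = [[1.0] * n for _ in range(n)]  # diagonal set to 1 by convention
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(
                adjacency[i][u] * adjacency[u][j]
                for u in range(n)
                if u != i and u != j
            )
            a_ij = adjacency[i][j]
            tom[i][j] = (shared + a_ij) / (
                min(connectivity[i], connectivity[j]) + 1.0 - a_ij
            )
    return tom


# Hypothetical pairwise correlations between three proteins.
correlation = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
adjacency = soft_threshold_adjacency(correlation, power=7)
tom = topological_overlap(adjacency)
print(round(adjacency[0][1], 4), round(tom[0][1], 4))
```

Raising correlations to a power suppresses weak links far more than strong ones, which is why hierarchical clustering on 1 - TOM tends to give well-separated modules.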

(4) Analysis of protein co-expression networks within modules and trend clustering evaluation

We further combined the results of WGCNA and the DEP analysis; their overlap ultimately confirmed 31 proteins, as shown in FIG. 4A. Studying the dynamic expression patterns of these proteins as the disease progresses is essential for revealing their potential modes of action. The CO and IVL groups were therefore divided into four subgroups, CO-no, CO-um, IVL-no and IVL-re, according to the presence or absence of uterine fibroids and of postoperative recurrence of IVL. The fuzzy c-means (FCM) algorithm of the Mfuzz software was applied to assign all proteins to clusters according to the trend of their expression changes. Cluster 2 and cluster 3 were of particular interest because their overall changes showed a relatively consistent upward or downward trend; cluster 2 contained 7 proteins and cluster 3 contained 13 proteins (FIG. 4B).
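The idea behind trend clustering can be illustrated with one step of fuzzy c-means. The sketch below is not Mfuzz and does not run the full iterative algorithm: it standardizes one expression profile across the four subgroups and computes its membership in two fixed, hypothetical cluster centres, assuming the common fuzzifier m = 2.

```python
import math
import statistics


def standardise(profile):
    """Scale a profile to mean 0 and standard deviation 1 (profile must not be constant)."""
    mean = statistics.fmean(profile)
    sd = statistics.stdev(profile)
    return [(value - mean) / sd for value in profile]


def memberships(profile, centres, fuzzifier=2.0):
    """Fuzzy c-means membership of one profile in each cluster centre."""
    distances = [math.dist(profile, centre) for centre in centres]
    zeros = [d == 0.0 for d in distances]
    if any(zeros):  # profile coincides with one or more centres
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    exponent = 2.0 / (fuzzifier - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Hypothetical mean expression in the order CO-no, CO-um, IVL-no, IVL-re.
profile = standardise([1.0, 2.0, 3.0, 4.0])
centres = [
    [-1.2, -0.4, 0.4, 1.2],  # hypothetical "rising" trend
    [1.2, 0.4, -0.4, -1.2],  # hypothetical "falling" trend
]
u = memberships(profile, centres)
print([round(value, 4) for value in u])
```

In a full run the centres would be re-estimated from the memberships and the two steps repeated until convergence; proteins are then assigned to the cluster with the highest membership.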

(5) Lasso regression analysis

In the following analysis, confounding among the proteins screened above was removed using Lasso-penalized Cox regression (Lasso), and 12 proteins were finally determined and verified by the lambda value (λ) and cross-validation of the partial likelihood deviance (G3GAU4, A0A5C2G2C3, A0A5C2GPQ1, P07737, A0A5C2FZ98, A0A5C2GNC7, A0A5C2GBR3, P05109, A0A5C2FVH4, A0A5C2FUE5, A0A5C2GVV3, A0A5C2FVK9) (FIG. 5A, B). To elucidate the relationships among these proteins, correlation analysis was performed. Proteins with correlated expression may be jointly involved in biological processes, that is, functionally related. As shown in FIG. 5C, P07737 is positively correlated with A0A5C2G2C3 (correlation coefficient = 0.42). A0A5C2FZ98 is also correlated with G3GAU4. A0A5C2GVV3 has a significant positive correlation with P05109 (correlation coefficient = 0.45). A0A5C2FVK9 is also closely correlated with A0A5C2FUE5 and A0A5C2GVV3 (correlation coefficients 0.48 and 0.49, respectively). P05109 is inversely related to the abundance of A0A5C2FZ98, and the abundances of A0A5C2FVH4 and A0A5C2GBR3 are inversely related. FIG. 5D shows the relative expression levels of the 12 proteins; they are clearly distinguishable between the IVL group and the control group. Five of the proteins were significantly elevated in the IVL-no and IVL-re subgroups compared with the CO-no and CO-um subgroups, while the remaining 7 proteins showed the opposite pattern.
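How an L1 (Lasso) penalty removes weak predictors can be shown with a toy example. Note the substitution: the sketch below uses a plain linear-regression lasso solved by coordinate descent, not the penalized Cox model named in the text, and the data, the penalty value and the function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; return 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xb||^2 + penalty * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            scale = 0.0
            for i in range(n):
                # Residual with the contribution of predictor j left out.
                partial = y[i] - sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                scale += X[i][j] ** 2
            coef[j] = soft_threshold(rho / n, penalty) / (scale / n)
    return coef


# Two hypothetical standardized protein abundances (columns) and an outcome
# that depends strongly on the first and only weakly on the second.
X = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [2.0 * x1 + 0.05 * x2 for x1, x2 in X]
coef = lasso_coordinate_descent(X, y, penalty=0.1)
print(coef)
```

The weak predictor's coefficient is set exactly to zero, which is the selection behaviour that reduces a candidate list to a smaller set; in practice the penalty is chosen by cross-validation, as the text describes.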

(6) Hub protein identification based on generalized linear regression model

Considering the interactions among these 12 proteins, we finally used a generalized linear regression model (GLM) to identify the characteristic key proteins. Our data demonstrate that four proteins, [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)] (A0A5C2FUE5), [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)] (A0A5C2GPQ1), [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] (A0A5C2GNC7) and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)] (A0A5C2GBR3), are closely related to the progression of IVL, as shown in Fig. 6. [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)] (A0A5C2FUE5) may be an independent risk factor (OR = 2.64) that promotes disease progression, whereas [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)] (A0A5C2GPQ1), [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] (A0A5C2GNC7) and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)] (A0A5C2GBR3), especially the first, may be protective indicators (OR = 0.32, 0.60 and 0.53, respectively) that slow the progression of IVL. These results suggest that these four proteins are promising factors for predicting IVL prognosis and progression.
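The link between GLM coefficients and the odds ratios (OR) quoted above can be sketched as follows, assuming a binomial GLM with a logit link (the text does not state the family or link that was used). The data are simulated; the two reported ORs, 2.64 and 0.32, are borrowed only as the "true" effects of the simulation, and fit_logistic_glm is a hypothetical helper, not a function from the study.

```python
import numpy as np

def fit_logistic_glm(X, y, n_iter=25):
    """Fit a binomial GLM (logit link) by Newton's method (IRLS)."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))  # fitted probabilities
        w = p * (1.0 - p)                        # IRLS weights
        hessian = X1.T @ (X1 * w[:, None])
        gradient = X1.T @ (y - p)
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

# Simulated data (assumption): two standardized protein abundances, one
# acting as a risk factor and one as a protective factor.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
true_beta = np.array([0.0, np.log(2.64), np.log(0.32)])
prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = (rng.random(n) < prob).astype(float)

beta_hat = fit_logistic_glm(X, y)
# The odds ratio per unit increase of a predictor is exp(coefficient).
odds_ratios = np.exp(beta_hat[1:])
```

An OR above 1 marks a predictor whose increase raises the odds of the outcome (a risk factor), and an OR below 1 marks one whose increase lowers them (a protective factor).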

(7) Verification of core protein prediction accuracy

To verify the predictive value of these four proteins, receiver operating characteristic (ROC) curve analysis was performed using the Python module scikit-learn (https://scikit-learn.org/). Two averaging schemes were used for the multi-class setting: micro-average ROC (computing the metric globally by treating each element of the label indicator matrix as a label) and macro-average ROC (computing the metric for each label and taking the unweighted mean). Our findings indicate that these representative key proteins help distinguish cases with different pathological conditions, with an area under the curve (AUC) of 0.69 and 0.68 in the micro-average and macro-average ROC analyses, respectively (Fig. 7). Furthermore, these four key proteins show reliable discriminatory ability in identifying patients with postoperative recurrent IVL (AUC = 0.92). These results indicate that the model built from the finally identified key proteins has predictive value for the progression of IVL.
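The micro-average and macro-average AUC described above can be computed with scikit-learn roughly as follows. This is a sketch on simulated data: the classifier (multinomial logistic regression), the four simulated classes standing in for the four subgroups, the sample size and the train/test split are all assumptions, since the text does not specify the model behind the ROC curves.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Simulated data (assumption): 4 classes, 4 features, with each class
# shifted along a different feature axis.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 4, size=n)
centers = 1.5 * np.eye(4)
X = rng.normal(size=(n, 4)) + centers[y]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_test, 4)

# One-hot label indicator matrix for the test labels.
Y_test = label_binarize(y_test, classes=[0, 1, 2, 3])

# Micro-average: pool every (sample, class) element and compute one AUC.
micro_auc = roc_auc_score(Y_test, proba, average="micro")
# Macro-average: compute one AUC per class, then take the unweighted mean.
macro_auc = roc_auc_score(Y_test, proba, average="macro")
```

The two averages differ when classes are unbalanced: the micro-average is dominated by the larger classes, while the macro-average weights every class equally.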

Taken together, the above results show that IVL patients can be effectively distinguished from healthy persons by DIA proteomic analysis of their serum proteins. A panel of 4 core proteins was developed to indicate the likelihood of tumor development or recurrence of IVL, and its performance was further confirmed by multi-class ROC analysis. Among these characteristic proteins, the expression of the immunoglobulin light chain protein [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)] (A0A5C2FUE5), a potential risk factor, increases with the progression of the disease. In contrast, the expression of three immunoglobulin heavy chain proteins, [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)] (A0A5C2GPQ1), [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] (A0A5C2GNC7) and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)] (A0A5C2GBR3), which are potential protective indicators, shows a decreasing trend. Overall, these 4 proteins are promising biomarkers identified in this study for predicting IVL progression, and can be used for the diagnosis and prognosis prediction of IVL.

While the invention has been described in detail in the foregoing general description and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims

1. Use of a combination of biomarkers, wherein the biomarkers are [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)], [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)], [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)], for the preparation of a reagent or kit for predicting the progression of intravenous leiomyoma.

2. The use according to claim 1, characterized in that: the expression level of the biomarker [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)] shows an overall increasing trend with the progression of the intravenous leiomyoma, and the expression levels of [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)], [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)] show an overall decreasing trend with the progression of the intravenous leiomyoma.

3. The use according to claim 1 or 2, characterized in that: the progression of the intravenous leiomyoma comprises 4 sequential stages, namely completely healthy, healthy with uterine fibroids, no recurrence after intravenous leiomyoma surgery, and recurrence after intravenous leiomyoma surgery.

4. The use according to claim 1, wherein the reagent or kit comprises a reagent for detecting the expression levels of [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)], [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)], [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)].

5. The use according to claim 4, characterized in that: the sample in which the expression levels of [IGLc601_light_IGLV2-14_IGLJ3 (Fragment)], [IGc843_heavy_IGHV3-49_IGHD3-3_IGHJ6 (Fragment)], [IGc678_heavy_IGHV3-23_IGHD1-14_IGHJ4 (Fragment)] and [IGH+IGLc492_heavy_IGHV3-20_IGHD3-22_IGHJ4 (Fragment)] are detected is serum.

6. The use according to claim 1, characterized in that the method of screening for biomarkers comprises the steps of:

step 1, separately collecting serum from patients without recurrence after intravenous leiomyoma surgery, patients with recurrence after intravenous leiomyoma surgery, completely healthy controls, and otherwise healthy controls with uterine fibroids;

step 3, screening the core proteins involved in IVL progression using weighted gene co-expression network analysis, Lasso-penalized Cox regression analysis, trend clustering and a generalized linear regression model.

7. The use according to claim 6, wherein the specific process of protein detection in step 2 by proteomics based on data-independent acquisition mass spectrometry is:

step 2.2, digesting high-abundance proteins and low-abundance proteins, specifically: repeatedly ultrafiltering 200 μg of protein with UA buffer to remove detergents, DTT and other low-molecular-weight components; then adding 100 μL of iodoacetamide to block the reduced cysteine residues; then rinsing the filter with 100 μL of UA buffer and NH4HCO3 buffer; finally, digesting the protein suspension for subsequent desalting and concentration;

8. The use according to claim 7, wherein the MS detection in step 2.4 is performed in positive ion mode, the scanning range is 300-1800 m/z, the MS1 scanning resolution is 60000 at 200 m/z, the automatic gain control target is 3e6, the maximum IT is 25 ms, and the dynamic exclusion is 30.0 s; the MS2 scanning resolution is 15000, the automatic gain control target is 5e4, the maximum IT is 25 ms, and the normalized collision energy is 30 eV; each DIA cycle in step 2.5 includes one full MS-SIM scan and 30 DIA scans; the SIM full scan resolution is 120000; the automatic gain control target is 3e6; the maximum IT is 50 ms; profile mode; the resolution of the DIA scan is set to 15000; the automatic gain control target is 3e6; the maximum IT is auto; the normalized collision energy is 30 eV.

9. The use according to claim 8, wherein a quality control sample is injected in DIA mode at the beginning of the MS study and after every six injections throughout the protein assay, thereby ensuring the stability of the assay system and the reliability of the experimental data.