CN117147869A - Four biomarker combinations for identifying premature labor and application thereof - Google Patents

Publication number
CN117147869A
Authority
CN
China
Prior art keywords
premature
specific protein
abundance
model
protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210568267.5A
Other languages
Chinese (zh)
Inventor
陈宇凌
朴永俊
陈叙
王强
邓海腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tianyuan Tengda Biotechnology Co ltd
Tianjin Central Obstetrical &
Tsinghua University
Original Assignee
Beijing Tianyuan Tengda Biotechnology Co ltd
Tianjin Central Obstetrical &
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tianyuan Tengda Biotechnology Co ltd, Tianjin Central Obstetrical &, Tsinghua University
Priority to CN202210568267.5A
Publication of CN117147869A
Legal status: Pending

Classifications

    • G01N33/6893 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids related to diseases not provided for elsewhere
    • G01N33/573 Immunoassay; Biospecific binding assay; Materials therefor for enzymes or isoenzymes
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G01N2333/47 Assays involving proteins of known structure or function as defined in the subgroups
    • G01N2333/71 Assays involving receptors, cell surface antigens or cell surface determinants for growth factors; for growth regulators
    • G01N2333/902 Oxidoreductases (1.)
    • G01N2333/96494 Matrix metalloproteases, e.g. 3.4.24.7
    • G01N2800/368 Pregnancy complicated by disease or abnormalities of pregnancy, e.g. preeclampsia, preterm labour
    • G01N2800/50 Determining the risk of developing a disease
    • G01N2800/60 Complex ways of combining multiple protein biomarkers for diagnosis

Abstract

The application discloses a combination of four biomarkers for identifying preterm birth and its use. It provides the use of a substance for detecting the abundance of each protein in a specific protein combination in the preparation of a kit whose function is to predict a pregnant woman's risk of preterm birth. The specific protein combination consists of four proteins: SHBG, VEGFR1, MMP8, and Ceruloplasmin. The application addresses the difficult sampling, low accuracy, low sensitivity, narrow applicability, and high cost of existing methods for predicting preterm birth. It offers convenient sampling, simple analysis, high accuracy, and low cost, and provides a new approach to predicting preterm birth.

Description

Four biomarker combinations for identifying premature labor and application thereof
Technical Field
The application belongs to the technical field of biomedicine, and in particular relates to a combination of four biomarkers (SHBG, VEGFR1, MMP8, and Ceruloplasmin) for identifying preterm birth and to its use in the clinical assessment of the risk of preterm birth.
Background
According to World Health Organization surveys, about 15 million infants are born preterm each year, and nearly 1 million die from complications of preterm birth. The number of preterm births has risen continuously over the past 25 years. Preterm birth, i.e., delivery before 37 weeks of gestation, accounts for about 10% of all deliveries. It is one of the leading causes of neonatal death worldwide and of neurological and developmental problems in newborns, and it is closely linked to conditions such as cerebral palsy, bronchopulmonary dysplasia, and retinopathy of prematurity. The pathogenesis of preterm birth remains unclear and may involve many factors, including multiple gestation, infection, and chronic diseases such as diabetes and hypertension. There is growing evidence that preterm birth is a complex multifactorial syndrome that places a heavy economic and health burden on individuals, so predicting it effectively enough to allow timely intervention remains a major challenge.
Prediction of preterm birth currently relies mainly on maternal characteristics, transvaginal ultrasound, and detection of fetal fibronectin. Maternal characteristics include obstetric history, body mass index (BMI), weight gain during pregnancy, pathogen infection, periodontal disease, and vitamin D deficiency; this approach requires comprehensive information and has a very low effective prediction rate. Transvaginal ultrasound mainly measures cervical length, and a cervical length of less than 20 mm is considered to indicate a high risk of preterm birth. However, transvaginal ultrasound has several limitations: first, its positive predictive rate is low, only 21%; second, its applicable range is narrow, and it cannot be used before 15 weeks or after 28 weeks of gestation; third, it depends on the operator's skill and equipment, and its use is complicated by conditions such as vaginal bleeding and central placenta previa. Fetal fibronectin (fFN) is a glycoprotein produced by amniotic cells and trophoblasts that binds the chorion to the maternal decidua. It appears in cervicovaginal secretions in early pregnancy and is normally present at very low levels in cervicovaginal fluid during weeks 22 to 35 of gestation, so a high fFN level in this period is regarded as a sign of preterm birth. However, fFN detection predicts preterm birth with low accuracy and sensitivity: the positive predictive rate is only 17%, and in China fFN detection combined with transvaginal ultrasound predicts preterm birth with an accuracy of only 57.1%. Moreover, fFN cannot be used to predict preterm birth before 24 weeks or after 34 weeks of gestation.
In addition, fFN measurement requires collecting cervical fluid from the pregnant woman through a sterile speculum, a procedure that carries risks of vaginal bleeding, injury, and lesions; fFN can also be detected with a blood swab, but the false-positive rate is high. Besides the three methods above, two kits are used clinically to detect preterm birth: PartoSure and Premaquick. The PartoSure kit assesses whether a pregnant woman shows signs of preterm birth by measuring PAMG-1 (placental alpha microglobulin-1) in cervical fluid. The Premaquick kit does so by measuring IGFBP-1 (insulin-like growth factor binding protein 1) and IL-6 (interleukin 6) in cervical fluid. However, all of these biomarker-based methods suffer from low accuracy and sensitivity, must be combined with transvaginal ultrasound, and rely on sampling that is difficult and can easily cause wounds and inflammation. New blood biomarkers for accurate prediction of preterm birth are therefore needed.
Disclosure of Invention
An object of the present application is to provide a combination of four biomarkers for identifying preterm birth and uses thereof.
The application provides the use of a substance for detecting the abundance of each protein in a specific protein combination in the preparation of a kit; the function of the kit is to predict a pregnant woman's risk of preterm birth.
The substance for detecting the abundance of each protein in the specific protein combination is a substance for detecting the abundance of each of those proteins in the pregnant woman's serum.
The application also provides the use of a specific protein combination in serum as a detection target in the development of a reagent or kit for predicting a pregnant woman's risk of preterm birth.
The application also provides a kit for predicting a pregnant woman's risk of preterm birth, comprising a substance for detecting the abundance of each protein in a specific protein combination.
The substance for detecting the abundance of each protein in the specific protein combination is a substance for detecting the abundance of each of those proteins in the pregnant woman's serum.
The abundance of any of the above proteins may be the protein concentration.
The abundance of any of the above proteins may be the molar concentration of the protein, e.g., in mol/L.
Any of the above substances for detecting the abundance of each protein in a specific protein combination may be a reagent and/or a device.
Any of the above substances for detecting the abundance of each protein in a specific protein combination may be a reagent and/or a device for detecting protein abundance based on TMT-labeled quantitative proteomics.
Any of the above substances for detecting the abundance of each protein in a specific protein combination may be an LC-MS instrument.
Any of the above substances for detecting the abundance of each protein in a specific protein combination may be an LC-MS instrument and TMT™ labeling reagents.
The kit further comprises a carrier on which a model is recorded, or a device on which the model is loaded;
the model is built from input information (1) and input information (2) and describes the relationship between them; input information (1) is the abundance of each protein of the specific protein combination in the pregnancy serum samples of all subjects in the modeling group; input information (2) is the phenotype of each subject in the modeling group, namely preterm delivery or term delivery; the modeling group consists of n pregnant women who delivered preterm and m pregnant women who delivered at term; n is a natural number, and m is a natural number.
The application also provides an apparatus for predicting a pregnant woman's risk of preterm birth, comprising a detection device and a result output device;
the detection device is used to measure the abundance of each protein of the specific protein combination in a pregnancy serum sample of the subject to be tested;
the result output device receives the abundance information output by the detection device for each protein of the specific protein combination in the subject's serum sample, inputs it into the model, and outputs the model's result, predicting whether the subject will deliver preterm or at term;
the model is built from input information (1) and input information (2) and describes the relationship between them; input information (1) is the abundance of each protein of the specific protein combination in the pregnancy serum samples of all subjects in the modeling group; input information (2) is the phenotype of each subject in the modeling group, namely preterm delivery or term delivery; the modeling group consists of n pregnant women who delivered preterm and m pregnant women who delivered at term; n is a natural number, and m is a natural number.
The application also provides an apparatus for predicting a pregnant woman's risk of preterm birth, comprising a model loading device, a detection device and a result output device;
the model loading device is a device on which a model is loaded; the model is built from input information (1) and input information (2) and describes the relationship between them; input information (1) is the abundance of each protein of the specific protein combination in the pregnancy serum samples of all subjects in the modeling group; input information (2) is the phenotype of each subject in the modeling group, namely preterm delivery or term delivery; the modeling group consists of n pregnant women who delivered preterm and m pregnant women who delivered at term; n is a natural number, and m is a natural number;
the detection device is used to measure the abundance of each protein of the specific protein combination in a pregnancy serum sample of the subject to be tested;
the result output device receives the abundance information output by the detection device for each protein of the specific protein combination in the subject's serum sample and inputs it into the model loading device to predict whether the subject will deliver preterm or at term.
n and m are each natural numbers large enough to be statistically meaningful.
The abundance of any of the above proteins may be the protein concentration.
The abundance of any of the above proteins may be the molar concentration of the protein, e.g., mol/L.
Any of the above-described devices for detecting the abundance of each protein in a specific protein combination in a pregnancy serum sample of a subject may be a device for quantitatively proteomic detection of protein abundance based on TMT markers.
Any of the above devices for detecting the abundance of each protein in a specific protein combination in a pregnancy serum sample of a subject may be an LC-MS instrument.
The input information (1) is known data or data obtained by self-detection.
The model is built using the Weka and Python machine learning platform tools.
The model is a model built using a Python machine learning platform tool.
Specifically, the algorithm for model construction is a logistic regression algorithm, a support vector machine algorithm, a decision tree algorithm or a random forest algorithm.
The parameters of the logistic regression algorithm may specifically be: Logistic -R 1.0E-8 -M 1 -num-decimal-places 4.
The parameters of the support vector machine algorithm may specifically be: SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "PolyKernel -E 1.0 -C 250007" -calibrator "Logistic -R 1.0E-8 -M -1 -num-decimal-places 4".
The parameters of the decision tree algorithm may specifically be: J48 -C 0.25 -M 2.
The parameters of the random forest algorithm may specifically be: RandomForest -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1.
The model constructed by the logistic regression algorithm is a logistic regression model, specifically:
from sklearn.linear_model import LogisticRegression
# X: abundance matrix (samples x 4 proteins); y: labels (1 = preterm, 0 = term)
model_lr = LogisticRegression(max_iter=5000)
model_lr.fit(X, y)
The model constructed by the random forest algorithm is a random forest model, specifically:
from sklearn.ensemble import RandomForestClassifier
model_rf = RandomForestClassifier(n_estimators=20, random_state=100, max_depth=5)
model_rf.fit(X, y)
The model constructed by the decision tree algorithm is a decision tree model, specifically:
from sklearn.tree import DecisionTreeClassifier
model_dt = DecisionTreeClassifier(criterion="entropy")
model_dt.fit(X, y)
The model constructed by the support vector machine algorithm is a support vector machine model (SVM model), specifically:
from sklearn.svm import SVC
model_svm = SVC(kernel='linear', C=100)
model_svm.fit(X, y)
Any of the above specific protein combinations consists of the following four proteins: SHBG, VEGFR1, MMP8, and Ceruloplasmin.
SHBG is human SHBG, VEGFR1 is human VEGFR1, MMP8 is human MMP8, and Ceruloplasmin is human Ceruloplasmin.
SHBG is sex hormone-binding globulin. VEGFR1 is vascular endothelial growth factor receptor 1. MMP8 is neutrophil collagenase. Ceruloplasmin is abbreviated CER.
The UniProt ID of SHBG is P04278. The UniProt ID of VEGFR1 is P17948. The UniProt ID of MMP8 is P22894. The UniProt ID of Ceruloplasmin is P00450.
The pregnant woman's serum may specifically be serum collected at 11 to 24 weeks of gestation.
The pregnant woman's serum may specifically be serum collected at 11 to 13 weeks of gestation.
The pregnant woman's serum may specifically be serum collected at 20 to 24 weeks of gestation.
The pregnancy period may specifically be 11 to 24 weeks of gestation.
The pregnancy period may specifically be 11 to 13 weeks of gestation.
The pregnancy period may specifically be 20 to 24 weeks of gestation.
Current biomarker-based methods for predicting preterm birth share the following problems: (1) accuracy and sensitivity are low, below 60%; (2) they are generally combined with transvaginal ultrasound to improve prediction accuracy, and so inherit the problems of transvaginal ultrasound; (3) they generally test cervical fluid, which is difficult to sample and can easily harm the mother and fetus; (4) biomarker detection is often limited by poor antibody specificity, and the use of ELISA makes it costly.
The beneficial effects of the application are: (1) because the test sample is serum, sampling does not harm the mother or fetus, and testing is easier to carry out; (2) the 4 protein markers can be measured in only 1 μl of serum, so little serum is needed and sampling is simple; (3) joint detection of the 4 protein markers raises the accuracy of preterm birth prediction to more than 90%; (4) protein abundance is measured by mass spectrometry (absolute quantification) instead of the ELISA used in the prior art, which effectively avoids nonspecific antibody binding and allows high-throughput simultaneous analysis of multiple samples; (5) the cost is low: measuring the four proteins together in one pregnant woman's sample costs less than one hundred yuan in reagents, about one-tenth the cost of ELISA.
The application addresses the difficult sampling, low accuracy, low sensitivity, narrow applicability, and high cost of existing methods for predicting preterm birth. It offers convenient sampling, simple analysis, high accuracy, and low cost, and provides a new approach to predicting preterm birth.
Drawings
FIG. 1 shows the 10-fold cross-validation results: ROC curves for predicting preterm birth from the levels of SHBG, VEGFR1, MMP8, and Ceruloplasmin, with the AUC calculated for each of the four algorithms.
FIG. 2 shows the 5-fold cross-validation results: ROC curves for predicting preterm birth from the levels of SHBG, VEGFR1, MMP8, and Ceruloplasmin, with the AUC calculated for each of the four algorithms.
FIG. 3 shows the performance of the model built with the SVM algorithm.
SVM: support vector machine algorithm.
Detailed Description
The application is described in detail below with reference to the accompanying drawings, which are given to illustrate the application and not to limit its scope. The examples below serve as a guide for further improvement by a person of ordinary skill in the art and do not limit the application in any way.
Unless otherwise specified, the experimental methods in the following examples are conventional methods, carried out according to the techniques or conditions described in the literature of the field or according to the product specifications. Unless otherwise specified, the materials, reagents and the like used in the examples are commercially available. Unless otherwise indicated, the quantitative tests in the examples below were performed in triplicate and the results averaged. In Tables 1 and 2, "E-10" means "×10^-10", and so on.
In the examples, protein concentrations in serum were determined with a TMT label-based targeted protein detection method (TMT technology). TMT technology, in full TMT (tandem mass tag) labeling quantitative proteomics, is an in vitro peptide labeling technology developed by the US company Thermo Scientific. It uses 2, 6, or 10 isotopic tags that specifically label the primary amino groups of peptides, so that the relative protein content of different samples can be compared simultaneously by tandem mass spectrometry (LC-MS/MS). Principle: a TMT tag consists of a reporter group, a balance group, and a reactive group. The reporter group comes in 2/6/10 different masses, as does the balance group; pairing them yields 2, 6, or 10 isobaric tags of equal total mass, which ensures that the same peptide labeled from different sources has the same mass-to-charge ratio in the MS1 scan. The reactive group reacts covalently with the peptide N-terminus and the amino group of lysine side chains, thereby attaching the TMT reagent to the peptide. Procedure: the TMT reagent efficiently labels the peptides produced by enzymatic digestion through its reactive group; in the MS1 scan, the same peptide labeled with different TMT reagents has the same mass-to-charge ratio; after HCD (higher-energy collisional dissociation) fragmentation, the reporter group on each peptide is released, and the MS2 scan gives the signal intensity of each reporter ion, which represents the abundance of that peptide in each sample; finally, software processing yields the protein quantification result. The amount of each target protein is obtained by reference to the amount of an internal standard peptide, which gives the molar concentration of the target protein in serum.
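The conversion from reporter-ion intensities to a serum molar concentration can be sketched as a single-point calculation against the internal standard peptide. The sketch below is only an illustration under stated assumptions: the function name, the proportionality between intensity and peptide amount, the one-peptide-per-protein stoichiometry, and all numbers are hypothetical and are not taken from the application.

```python
def molar_concentration(target_intensity, standard_intensity,
                        standard_amount_mol, serum_volume_l):
    """Single-point estimate of a protein's molar concentration in serum.

    Assumes (not stated in the source) that reporter-ion intensity is
    proportional to peptide amount and that one target peptide corresponds
    to one protein molecule.
    """
    if standard_intensity <= 0 or serum_volume_l <= 0:
        raise ValueError("standard intensity and serum volume must be positive")
    ratio = target_intensity / standard_intensity    # target relative to standard
    target_amount_mol = ratio * standard_amount_mol  # amount in the sample
    return target_amount_mol / serum_volume_l        # mol/L


# Hypothetical numbers for illustration only: 1 pmol of standard peptide,
# 1 microlitre of serum, and a target signal twice that of the standard.
concentration = molar_concentration(2.0e5, 1.0e5, 1.0e-12, 1.0e-6)
```

With these illustrative inputs the estimate is about 2 × 10^-6 mol/L. A real workflow would typically rely on the quantification software and on more than one calibration point.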
Example 1 screening of protein markers and construction of models
1. Obtaining serum samples
A total of 110 pregnancy serum samples were obtained from 110 pregnant women: 26 from women who delivered preterm and 84 from women who delivered at term. All 110 women had already delivered, and all pregnancies were singletons; 26 delivered preterm (before 37 weeks of gestation) and 84 delivered at term (at or after 37 weeks of gestation). The women were 20 to 40 years old. Serum was collected at 11 to 13 weeks of gestation. Serum collection: venous whole blood was drawn and left at room temperature until it clotted (about 10 min), then centrifuged at 1000 g for 15 min at 4 °C; the supernatant was collected as the serum sample and stored at -80 °C. Informed consent was obtained from every participant.
2. Preliminary screening of protein markers
Extensive pilot experiments and preliminary analyses yielded a large number of candidate protein markers associated with preterm birth.
3. Screening candidate protein markers for effective protein markers
1. Measuring the candidate protein markers
Serum samples were taken and each candidate protein marker was measured using the TMT label-based targeted protein detection method. The raw mass spectrometry data were analyzed with Proteome Discoverer 2.3 to obtain the quantity of each peptide relative to the reference peptide, and from this the molar concentration of each candidate protein marker in serum. Candidate protein markers with missing values in 20% or more of the samples were filtered out; for markers with missing values in fewer than 20% of the samples, the missing values were filled with the mean.
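The missing-value rule described above (drop a marker when 20% or more of its values are missing, otherwise fill the gaps with the mean) can be sketched in plain Python. This is a minimal illustration, not the applicants' code: the function name, the dictionary layout, and the toy values are assumptions.

```python
from statistics import mean


def filter_and_impute(table, max_missing_percent=20):
    """Drop markers missing in >= max_missing_percent of samples; mean-fill the rest.

    `table` maps a marker name to its per-sample values, with None marking a
    missing measurement. Returns a new dict; the input is not modified.
    """
    cleaned = {}
    for marker, values in table.items():
        observed = [v for v in values if v is not None]
        n_missing = len(values) - len(observed)
        # Integer comparison avoids floating-point error at exactly 20%.
        if n_missing * 100 >= max_missing_percent * len(values):
            continue  # too sparse to keep
        fill = mean(observed)
        cleaned[marker] = [fill if v is None else v for v in values]
    return cleaned


# Hypothetical toy values, not data from the application.
toy = {
    "marker_a": [1.0, 2.0, None, 3.0, 2.0, 2.0],   # 1 of 6 missing: kept
    "marker_b": [None, None, 4.0, 5.0, 6.0, 7.0],  # 2 of 6 missing: dropped
}
result = filter_and_impute(toy)
```

Here `marker_a` is kept and its gap is filled with the mean of its observed values, while `marker_b` is removed because a third of its values are missing.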
2. Screening for effective protein markers
Feature selection was performed on the filtered data using sequential forward selection (SFS), a machine learning method. SFS searches for the feature subset that maximizes the accuracy of the predictive model: starting from the empty set, cross-validation is used to find the single marker among all candidates that best optimizes the preterm birth prediction objective function; then, by the same strategy, one marker at a time is added to the current best subset. The process repeats, and marker screening stops when the objective function reaches its best performance. The predictive performance of each round's feature subset was evaluated with a support vector machine (SVM).
Finally, 4 effective protein markers associated with preterm birth were identified.
The 4 effective protein markers associated with preterm birth are: SHBG, VEGFR1, MMP8, and Ceruloplasmin.
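The greedy search behind sequential forward selection can be sketched as follows. This is an illustration only: the text evaluates each subset with a cross-validated support vector machine, whereas this sketch substitutes a leave-one-out nearest-centroid classifier so that it runs with the standard library alone. The function names, marker names, and toy values are hypothetical.

```python
def sequential_forward_selection(candidates, score_fn):
    """Greedy SFS: grow the subset one feature at a time while the score improves."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(score_fn(selected + [name]), name) for name in remaining]
        top_score, top_name = max(trials, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no extension improves the objective, so stop
        selected.append(top_name)
        remaining.remove(top_name)
        best_score = top_score
    return selected, best_score


def loo_accuracy(columns, labels, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in scorer)."""
    n = len(labels)
    hits = 0
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        best_class, best_dist = None, float("inf")
        for cls in sorted(set(labels)):
            members = [i for i in train if labels[i] == cls]
            dist = 0.0
            for name in subset:
                centroid = sum(columns[name][i] for i in members) / len(members)
                dist += (columns[name][held_out] - centroid) ** 2
            if dist < best_dist:
                best_class, best_dist = cls, dist
        hits += best_class == labels[held_out]
    return hits / n


# Hypothetical toy data (1 = preterm, 0 = term); not values from the application.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "marker_a": [5.0, 6.0, 5.5, 1.0, 2.0, 1.5],  # separates the two classes
    "marker_b": [3.0, 1.0, 3.5, 2.0, 2.5, 0.5],
    "marker_c": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
}
selected, score = sequential_forward_selection(
    list(columns), lambda subset: loo_accuracy(columns, labels, subset)
)
```

Because `score_fn` is passed in, the stand-in scorer could be replaced by any cross-validated classifier score without changing the search loop. In this toy case the search stops after one marker, since `marker_a` alone already classifies every held-out sample correctly.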
SHBG is sex hormone-binding globulin, an androgen transport protein that binds testosterone, 5α-dihydrotestosterone and 17β-estradiol; it regulates the plasma metabolic clearance of steroid hormones by controlling their plasma concentration, and it also takes part in receptor-mediated biological processes. VEGFR1 is vascular endothelial growth factor receptor 1, a tyrosine protein kinase that acts as a cell surface receptor for VEGFA, VEGFB and PGF and plays an important role in embryonic vascular development, regulation of angiogenesis, cell survival and migration, macrophage function, and cancer cell chemotaxis and invasion. MMP8 is neutrophil collagenase, which degrades fibrillar type I, II and III collagens. Ceruloplasmin is synthesized by the liver, regulates the distribution of copper ions throughout the body, and has antioxidant activity. The UniProt ID of SHBG is P04278. The UniProt ID of VEGFR1 is P17948. The UniProt ID of MMP8 is P22894. The UniProt ID of Ceruloplasmin is P00450.
4. Modeling through training set
1. Abundance data (molar concentration in serum) of 4 effective protein markers were obtained, see table 1. In Table 1, samples 1 to 26 are the pregnant serum of 26 premature pregnant women, and samples 27 to 110 are the pregnant serum of 84 pregnant women who are producing the premature pregnant women.
TABLE 1 protein concentration in serum (mol/L)
2. Model construction was performed to obtain prediction models for premature labor.
Models were constructed using the Weka and Python machine learning tools. The algorithms used for model construction were: a logistic regression algorithm (parameters: Logistic -R 1.0E-8 -M -1 -num-decimal-places 4), a support vector machine algorithm (parameters: SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "PolyKernel -E 1.0 -C 250007" -calibrator "Logistic -R 1.0E-8 -M -1 -num-decimal-places 4"), a decision tree algorithm (parameters: J48 -C 0.25 -M 2), and a random forest algorithm (parameters: RandomForest -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1).
The Python code used for marker screening and model construction is as follows:
[Code listing not reproduced here; it appears as an image in the original publication.]
The logic of the Python code for marker screening and model construction described above is as follows:
read the premature labor proteomics training data and convert it to the DataFrame data format (lines 1-2);
designate the data read as the training set (line 3);
set the data type of the training set to float (line 4);
check whether any row is entirely missing, and keep only the rows that are not entirely missing (lines 5-6);
fill the remaining missing values using the SimpleImputer mean strategy (lines 7-8);
create a new variable, target, and designate it as the class label (line 10);
label the first 26 samples as premature labor (1) and the last 84 samples as normal labor (0) (lines 11-12);
perform feature selection with the SFS method to reduce the dimensionality of the original data (line 13);
pre-processing for feature selection (lines 14-15);
screen the protein markers related to premature labor by feature selection (lines 15-30);
build the logistic regression model (lines 31-32);
evaluate the performance of the logistic regression model (lines 33-34);
build the random forest model (lines 35-36);
evaluate the performance of the random forest model (lines 37-38);
build the decision tree model (lines 39-40);
evaluate the performance of the decision tree model (lines 41-42);
optimize the SVM model parameters (lines 43-47);
build the SVM model (lines 48-49);
evaluate the performance of the SVM model (lines 50-51);
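The data preparation steps in the list above (dropping rows that are entirely missing, mean imputation, and positional class labels) can be sketched as follows. This is a hedged, standard-library stand-in for the DataFrame and imputation calls the listing describes; all function names and the toy rows are assumptions made for illustration and are not taken from the application.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_fully_missing_rows(rows: List[Row]) -> List[Row]:
    """Keep only rows that contain at least one observed value."""
    return [row for row in rows if not all(is_missing(v) for v in row)]


def impute_column_means(rows: List[Row]) -> List[List[float]]:
    """Replace each missing value with the mean of its column's observed values."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if is_missing(v) else float(v) for j, v in enumerate(row)]
        for row in rows
    ]


def make_labels(n_premature: int, n_normal: int) -> List[int]:
    """Positional labels: premature samples first (1), then normal samples (0)."""
    return [1] * n_premature + [0] * n_normal


# Invented toy rows (not from the application) with one entirely missing row.
raw = [
    [1.0, None, 3.0],
    [None, None, None],
    [3.0, 4.0, None],
    [5.0, 8.0, 9.0],
]
cleaned = impute_column_means(drop_fully_missing_rows(raw))
labels = make_labels(26, 84)  # 26 premature and 84 normal samples, as in the text
print(cleaned)
```

Because the labels are assigned by position, this sketch assumes that no sample row is dropped in the first step; otherwise the labels would have to be filtered together with the rows.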
3. The performance of each prediction model on the training set was evaluated using 10-fold and 5-fold cross-validation; the results are shown in Figures 1 and 2.
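For orientation, k-fold cross-validation on a set of 26 premature and 84 normal samples can be sketched as follows. The text does not state whether the folds were stratified or how they were shuffled, so stratification, the seed, the function names and the majority-class baseline classifier are all assumptions of this sketch rather than the procedure of the application.

```python
import random
from typing import Callable, List, Sequence


def stratified_kfold_indices(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return [sorted(fold) for fold in folds]


def cross_val_accuracy(X, y, k: int, fit: Callable, predict: Callable, seed: int = 0) -> float:
    """Mean held-out accuracy over k folds."""
    scores = []
    for test_idx in stratified_kfold_indices(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical baseline: always predict the most frequent training label.
def fit_majority(train_X, train_y):
    return max(sorted(set(train_y)), key=train_y.count)


def predict_majority(model, row):
    return model


labels = [1] * 26 + [0] * 84   # class sizes taken from the text
X_dummy = [[0.0]] * len(labels)  # placeholder features; the baseline ignores them
acc_10 = cross_val_accuracy(X_dummy, labels, 10, fit_majority, predict_majority)
acc_5 = cross_val_accuracy(X_dummy, labels, 5, fit_majority, predict_majority)
print(round(acc_10, 3), round(acc_5, 3))
```

The majority-class baseline is useful as a reference point: with 84 of 110 samples in the normal class, a model has to exceed roughly three quarters accuracy before its accuracy says anything about the markers, which is one reason the AUC is reported alongside accuracy.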
Example 2. Validation of the models on a validation set
1. Obtaining serum samples
Eighteen pregnancy serum samples were obtained from 18 pregnant women: 9 from women who delivered prematurely and 9 from women who delivered normally. All 18 women had already delivered and all carried singleton pregnancies; 9 had premature labor (delivery before 37 weeks of gestation) and 9 delivered normally (delivery after 37 weeks of gestation). The women were 20 to 40 years old. Serum was collected at 20 to 24 weeks of gestation. Serum collection method: whole venous blood was collected and left at room temperature until it clotted (about 10 min), then centrifuged at 1000 g for 15 min at 4 °C; the supernatant was taken as the serum sample and stored at -80 °C. Informed consent was obtained from each pregnant woman.
2. Detecting the abundance of four protein markers in each serum sample
4 protein markers: SHBG, VEGFR1, MMP8, and Ceruloplasmin.
Serum samples were taken, and the molar concentration of each protein marker in serum was measured by TMT labeling combined with parallel reaction monitoring (PRM); the results are shown in Table 2. In Table 2, samples 1 to 9 are pregnancy serum from the 9 pregnant women who delivered prematurely, and samples 10 to 18 are pregnancy serum from the 9 pregnant women who delivered normally.
TABLE 2 protein concentration in serum (mol/L)
Sample SHBG MMP-8 VEGFR1 Ceruloplasmin
Sample 1 1.35E-09 8.53E-10 1.23E-09 1.28E-10
Sample 2 1.63E-09 8.08E-10 1.24E-09 1.40E-10
Sample 3 1.55E-09 8.33E-10 1.52E-09 1.20E-10
Sample 4 1.53E-09 7.03E-10 1.01E-09 1.36E-10
Sample 5 1.73E-09 7.43E-10 1.19E-09 1.60E-10
Sample 6 2.05E-09 8.84E-10 1.23E-09 1.80E-10
Sample 7 1.52E-09 8.13E-10 1.16E-09 7.98E-11
Sample 8 7.10E-10 4.62E-10 8.40E-10 1.60E-11
Sample 9 8.58E-09 2.61E-09 1.39E-09 1.96E-10
Sample 10 2.19E-09 7.63E-10 1.45E-09 6.78E-11
Sample 11 3.22E-09 9.44E-10 1.66E-09 1.04E-10
Sample 12 2.18E-09 7.68E-10 1.55E-09 9.18E-11
Sample 13 2.28E-09 7.48E-10 1.12E-09 8.38E-11
Sample 14 1.91E-09 7.53E-10 1.00E-09 4.79E-11
Sample 15 2.19E-09 8.18E-10 1.29E-09 4.79E-11
Sample 16 2.40E-09 5.87E-10 9.43E-10 7.98E-11
Sample 17 2.86E-09 6.12E-10 8.35E-10 4.39E-11
Sample 18 2.73E-09 6.68E-10 1.19E-09 7.58E-11
3. Model validation
The protein concentration data obtained in step 2 were substituted into each of the prediction models (the logistic regression model, random forest model, decision tree model and SVM model) established in step 4 of Example 1, and the models were evaluated.
The logistic regression model corresponds to lines 31-32 of the aforementioned Python code.
The random forest model corresponds to lines 35-36 of the aforementioned Python code.
The decision tree model corresponds to lines 39-40 of the aforementioned Python code.
The SVM model corresponds to lines 48-49 of the aforementioned Python code.
Each model outputs a prediction for each sample (predicted premature delivery or predicted normal delivery). The predictions are evaluated against the actual outcomes (actual premature delivery or actual normal delivery).
The evaluation metrics include accuracy, true positive rate (TP Rate), false positive rate (FP Rate), precision, and area under the ROC curve (AUC).
The results are shown in FIG. 3. The model constructed with the SVM algorithm performed best, with an AUC of 0.722.
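The evaluation indices named above have standard definitions, which the following sketch computes for a binary premature (1) versus normal (0) outcome. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions; the AUC is computed as the fraction of (premature, normal) pairs ranked correctly, which may differ in detail from the way the tools used in the application report it.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, TP rate, FP rate and precision with class 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "tp_rate": tp / (tp + fn) if tp + fn else 0.0,    # sensitivity / recall
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,    # 1 - specificity
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example (not results from the application).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]         # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, TP rate, FP rate and precision depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, so the two kinds of index can disagree on which model is best.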
The present application is described in detail above. It will be apparent to those skilled in the art that the present application can be practiced in a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the application and without undue experimentation. While the application has been described with respect to specific embodiments, it will be appreciated that the application may be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. The application of some of the basic features may be done in accordance with the scope of the claims that follow.

Claims (8)

1. Use of a substance for detecting the abundance of each protein in a specific protein combination in the preparation of a kit; the kit has the function of predicting the risk of premature birth of a pregnant woman; the specific protein combination consists of the following four proteins: SHBG, VEGFR1, MMP8, and Ceruloplasmin.
2. The use according to claim 1, wherein: the substance for detecting the abundance of each protein in the specific protein combination is a substance for detecting the abundance of each protein in the specific protein combination in the serum of the pregnant woman.
3. Use of a specific protein combination in serum as a detection target in the development of a reagent or kit for predicting the risk of developing premature birth in a pregnant woman; the specific protein combination consists of the following four proteins: SHBG, VEGFR1, MMP8, and Ceruloplasmin.
4. A kit for predicting the risk of a pregnant woman for developing premature labor, comprising the following components: a substance for detecting the abundance of each protein in the specific protein combination; the specific protein combination consists of the following four proteins: SHBG, VEGFR1, MMP8, and Ceruloplasmin.
5. The kit of claim 4, wherein: the substance for detecting the abundance of each protein in the specific protein combination is a substance for detecting the abundance of each protein in the specific protein combination in the serum of the pregnant woman.
6. The kit of claim 5, wherein: the kit also comprises a carrier recorded with a model or a device loaded with the model;
the model is built from input information (1) and input information (2) and is used to represent the relationship between input information (1) and input information (2); input information (1) is: information on the abundance of each protein in the specific protein combination in the pregnancy serum samples of all subjects in the modeling group; input information (2) is: the phenotype of all subjects in the modeling group, the phenotype being premature delivery or normal delivery; the modeling group consists of n pregnant women with premature delivery and m pregnant women with normal delivery, wherein n is a natural number and m is a natural number.
7. An apparatus for predicting the risk of premature birth of a pregnant woman, comprising detection means and outcome output means;
the detection device is used for detecting the abundance of each protein in the specific protein combination in the pregnancy serum sample of the subject to be tested;
the result output device is used for receiving the information, output by the detection device, on the abundance of each protein in the specific protein combination in the serum sample of the subject to be tested, inputting the information into the model, and outputting the result of the model, which predicts whether the subject to be tested will deliver prematurely or normally;
the model is built from input information (1) and input information (2) and is used to represent the relationship between input information (1) and input information (2); input information (1) is: information on the abundance of each protein in the specific protein combination in the pregnancy serum samples of all subjects in the modeling group; input information (2) is: the phenotype of all subjects in the modeling group, the phenotype being premature delivery or normal delivery; the modeling group consists of n pregnant women with premature delivery and m pregnant women with normal delivery, wherein n is a natural number and m is a natural number;
the specific protein combination consists of the following four proteins: SHBG, VEGFR1, MMP8, and Ceruloplasmin.
8. An apparatus for predicting the risk of premature birth of a pregnant woman, comprising model loading means, detection means and outcome output means;
the model loading means is a means for loading a model; the model is built from input information (1) and input information (2) and is used to represent the relationship between input information (1) and input information (2); input information (1) is: information on the abundance of each protein in the specific protein combination in the pregnancy serum samples of all subjects in the modeling group; input information (2) is: the phenotype of all subjects in the modeling group, the phenotype being premature delivery or normal delivery; the modeling group consists of n pregnant women with premature delivery and m pregnant women with normal delivery, wherein n is a natural number and m is a natural number;
the detection device is used for detecting the abundance of each protein in the specific protein combination in the pregnancy serum sample of the subject to be tested;
the result output device is used for receiving the information, output by the detection device, on the abundance of each protein in the specific protein combination in the serum sample of the subject to be tested, and inputting the information into the model loading means to predict whether the subject to be tested will deliver prematurely or normally;
the specific protein combination consists of the following four proteins: SHBG, VEGFR1, MMP8, and Ceruloplasmin.
CN202210568267.5A 2022-05-24 2022-05-24 Four biomarker combinations for identifying premature labor and application thereof Pending CN117147869A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210568267.5A CN117147869A (en) 2022-05-24 2022-05-24 Four biomarker combinations for identifying premature labor and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210568267.5A CN117147869A (en) 2022-05-24 2022-05-24 Four biomarker combinations for identifying premature labor and application thereof

Publications (1)

Publication Number Publication Date
CN117147869A true CN117147869A (en) 2023-12-01

Family

ID=88910632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210568267.5A Pending CN117147869A (en) 2022-05-24 2022-05-24 Four biomarker combinations for identifying premature labor and application thereof

Country Status (1)

Country Link
CN (1) CN117147869A (en)

Similar Documents

Publication Publication Date Title
US10215760B2 (en) Detection of intraamniotic and/or infection
Romero et al. The maternal plasma proteome changes as a function of gestational age in normal pregnancy: a longitudinal study
EP2118664B1 (en) Peptide markers for diagnosis of preeclampsia
CN104704364B (en) For the prediction of pre-eclampsia and/or HELLP syndrome or the biomarker test of early detection
WO2009097579A1 (en) Gestational age dependent proteomic changes of human maternal serum for monitoring maternal and fetal health
CN110305954B (en) Prediction model for early and accurate detection of preeclampsia
AU2009279809A1 (en) Multiplexed diagnostic test for preterm labor
JP2020533595A5 (en)
CN113564242A (en) Application of apoptosis-related gene in repeated planting failure
CN117147869A (en) Four biomarker combinations for identifying premature labor and application thereof
US20170122959A1 (en) Early placenta insulin-like peptide (pro-epil)
Kobayashi et al. Search for amniotic fluid-specific markers: Novel biomarker candidates for amniotic fluid embolism
US11255861B2 (en) Method for determining the risk of preterm birth
CN113791224B (en) Early warning method for recurrent abortion caused by unknown reasons based on follicular fluid protein expression
CN110577988B (en) Fetal growth restriction prediction model
Kanagasabai Biochemical markers in the prediction of pre-eclampsia, are we there yet
AU2016330398A1 (en) A method of treatment and prognosis
WO2023152203A1 (en) Methods for prediction and monitoring of spontaneous preterm birth
Parveen Maternal Serum Soluble Fms-Like Tyrosine Kinase-1 and Placental Growth Factor Ratio as A Short Term Predictor of Pre-Eclampsia

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination