WO2023102840A1 - Use of gene marker in predicting risk of preeclampsia in pregnant woman - Google Patents

Use of gene marker in predicting risk of preeclampsia in pregnant woman

Info

Publication number
WO2023102840A1
WO2023102840A1 PCT/CN2021/136842 CN2021136842W WO2023102840A1 WO 2023102840 A1 WO2023102840 A1 WO 2023102840A1 CN 2021136842 W CN2021136842 W CN 2021136842W WO 2023102840 A1 WO2023102840 A1 WO 2023102840A1
Authority
WO
WIPO (PCT)
Prior art keywords
preeclampsia
related diseases
risk
pregnant women
pregnant woman
Prior art date
Application number
PCT/CN2021/136842
Other languages
French (fr)
Chinese (zh)
Inventor
徐晨明
王琳
陈松长
王文婧
黄荷凤
徐讯
孙井花
Original Assignee
复旦大学附属妇产科医院
深圳华大基因股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 复旦大学附属妇产科医院, 深圳华大基因股份有限公司 filed Critical 复旦大学附属妇产科医院
Priority to CN202180102282.4A priority Critical patent/CN117940583A/en
Priority to PCT/CN2021/136842 priority patent/WO2023102840A1/en
Publication of WO2023102840A1 publication Critical patent/WO2023102840A1/en

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis

Definitions

  • the present invention relates to the field of preeclampsia diseases, in particular to the application of gene markers in predicting the risk of preeclampsia or related diseases in pregnant women.
  • Preeclampsia is a pregnancy disorder associated with new-onset hypertension in pregnancy, affecting 3–5% of pregnancies [1] .
  • Preeclampsia is defined as systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg in pregnant women after 20 weeks of gestation, accompanied by any one of the following: urine protein ≥0.3 g/24 h, or urine protein/creatinine ratio ≥0.3, or random urine protein ≥(+) (the test used when quantitative protein measurement is unavailable); or, without proteinuria, involvement of any of the following organs or systems: heart, lung, liver, kidney and other vital organs, or abnormal changes in the blood system, digestive system or nervous system, or placental-fetal involvement.
  • Preeclampsia is divided into four types according to the time of diagnosis and delivery: early-onset, late-onset, preterm, and term preeclampsia.
  • Some risk factors for preeclampsia are known, such as advanced age, family history or previous history of preeclampsia, pregnancy interval, and assisted reproductive technology. However, the absence of known risk factors does not mean that preeclampsia will not occur, and maternal risk factors alone do not accurately identify the population at high risk of preeclampsia.
  • Researchers have conducted transcriptomic studies on the mechanism of preeclampsia in placenta [8], plasma [9], decidua [10], exosomes [11] and amniotic fluid [12], and have reported potential biomarkers associated with early prediction of preeclampsia.
  • Placental cell dysfunction can lead to serious pregnancy complications, but invasive placental tissue sampling poses safety risks for pregnant women and fetuses, whereas analysis of free circulating RNA in maternal blood can be performed non-invasively.
  • the discovery of abnormal function of extravillous trophoblast cells in the preeclamptic placenta may help in the early prediction of preeclampsia in pregnant women before the onset of preeclampsia symptoms.
  • The patent "Circulating mRNA as a diagnostic marker for pregnancy-related diseases" (authorization number: 100379882) of Lo Yuk-ming of the Chinese University of Hong Kong, granted in 2008, proposes a method and a kit for diagnosing, monitoring or predicting preeclampsia, fetal chromosomal aneuploidy and premature birth in pregnant women, and for detecting pregnancy in women, based on mRNA species encoding human chorionic gonadotropin beta subunit (hCG-beta), human corticotropin-releasing hormone (hCRH), human placental lactogen (hPL), KiSS-1 metastasis suppressor (KISS1), tissue factor pathway inhibitor 2 (TFPI2), placenta-specific 1 (PLAC1), or glyceraldehyde-3-phosphate dehydrogenase (GAPDH).
  • the first step of the method is to quantify the level of one or more specific mRNA species in the blood of said pregnant woman.
  • the mRNA may encode hCG-β, hCRH, hPL, KISS1, TFPI2, PLAC1 or GAPDH.
  • the second step is to compare the mRNA level obtained in the first step with a standard control representing the level of mRNA encoding the same protein in the blood of normal non-preeclampsia women. An increase or decrease in said mRNA level is indicative of the presence of preeclampsia or an increased risk of developing said disease state.
  • peripheral blood samples of pregnant women were collected, RNA was extracted, real-time quantitative RT-PCR was performed, and statistical analysis was performed using Sigma Stat 2.03 software (SPSS).
  • This technology predicts preeclampsia based on the ratio of sFlt-1/PlGF or endoglin/PlGF. It can only predict whether there is a risk of preeclampsia within a short period of time, and therefore has certain limitations.
  • the patent for circulating RNA markers specific to preeclampsia submitted by ILLUMINA INC in 2019 (application number: 201980002993.7) relates to a method for detecting preeclampsia and/or determining an increased risk of preeclampsia in pregnant women.
  • the method includes identifying a plurality of circulating RNA (C-RNA) molecules in a biological sample obtained from said pregnant woman.
  • the patent is based on the analysis of the confirmed population, and these C-RNAs are used to build models for classification, not prediction.
  • the markers selected by this technology come from the research on cases diagnosed with preeclampsia, which cannot accurately predict preeclampsia before the onset of symptoms.
  • the main purpose of the present invention is to provide the application of gene markers in predicting the risk of preeclampsia or related diseases in pregnant women, so as to provide a high specificity and high sensitivity prediction scheme for the risk of preeclampsia or related diseases in pregnant women.
  • A gene marker or a combination thereof for detecting whether a pregnant woman suffers from preeclampsia or related diseases, or for predicting the risk or prognostic effect of preeclampsia or related diseases in pregnant women, including one or more of the following genes: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, ZNF224.
  • gene markers or their combination also include one or more of the following genes: ATF6, ATP6AP2, FOS, RASA2.
  • the reagents include biomolecules that specifically hybridize with the gene markers of the first aspect, their combinations, or their expression products;
  • the biomolecules include one or more selected from primers, probes and antibodies;
  • the reagents also include related reagents for preparing high-throughput sequencing libraries from the gene markers or combinations thereof in the first aspect above.
  • a method for detecting whether a pregnant woman suffers from preeclampsia or a related disease or predicting the risk or prognosis of a pregnant woman suffering from preeclampsia or a related disease comprising:
  • Step S1: providing a biological sample from a pregnant woman;
  • Step S2: determining the expression profile of the above-mentioned gene markers or combinations thereof in the first aspect in the biological sample;
  • Step S3: based on the expression profile of the gene markers or combinations thereof, identifying whether the pregnant woman suffers from preeclampsia or related diseases, her risk of suffering from preeclampsia or related diseases, or the prognostic effect of preeclampsia or related diseases in the pregnant woman.
  • in step S3, the identification is implemented using a risk prediction model for preeclampsia or related diseases in pregnant women, and the model is generated by training a computer with the expression profiles of the gene markers or combinations thereof in biological samples from pregnant women diagnosed with preeclampsia or related diseases and from healthy control pregnant women.
  • the training of the computer is implemented through machine learning methods;
  • the machine learning method comprises one or more of the following: generalized linear model, gradient boosting machine, random forest, support vector machine;
  • the machine learning method automatically calculates the risk score
  • a risk score greater than a threshold indicates that the pregnant woman suffers from preeclampsia or a related disease or has a risk of suffering from preeclampsia or a related disease or has a poor prognosis;
  • the threshold is 0.5;
  • the biological sample is one or more of the following: plasma, serum, whole blood, urine, amniotic fluid; preferably, the biological sample is collected at the 11th to 25th week of pregnancy of the pregnant woman.
  • step S2 the expression profile of the gene markers or their combination is determined by quantitatively analyzing the extracellular free RNA in the biological sample;
  • the extracellular free RNA in the biological sample is quantitatively analyzed by high-throughput sequencing or RT-PCR;
  • a high-throughput sequencing method is used to quantitatively analyze the extracellular free RNA in the biological sample.
  • a kit comprising the gene markers of the above-mentioned first aspect of the present invention or a combination thereof, and/or the reagents of the above-mentioned second aspect of the present invention.
  • Use of the gene marker of the above-mentioned first aspect of the present invention or its combination, and/or the reagent of the second aspect, in the preparation of a kit for detecting whether a pregnant woman suffers from preeclampsia or related diseases, or for predicting the risk or prognosis of preeclampsia or related diseases in pregnant women.
  • a device for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognostic effect of pregnant women suffering from preeclampsia or related diseases is provided.
  • The device has a built-in risk prediction model for preeclampsia or related diseases in pregnant women; the prediction model is generated by training a computer with the expression profiles of the gene markers or combinations thereof in the above-mentioned first aspect of the present invention in biological samples derived from pregnant women diagnosed with preeclampsia or related diseases.
  • A method for constructing a model for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognostic effect of preeclampsia or related diseases in pregnant women, which includes the step of detecting differentially expressed substances between biological samples derived from a group of pregnant women with preeclampsia or related diseases and a group of pregnant women without preeclampsia or related diseases, wherein the differentially expressed substances include the gene markers of the above-mentioned first aspect of the present invention or combinations thereof.
  • the biological sample is one or more of the following: plasma, serum, whole blood, urine, amniotic fluid; preferably, the biological sample is collected during the 11th to 25th week of gestation of a pregnant woman.
  • A computer-readable storage medium including a stored program, wherein, when the program runs, the device where the storage medium is located is controlled to execute the method of the fourth aspect of the present invention for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognostic effect of preeclampsia or related diseases in pregnant women, or the method of the seventh aspect of the present invention for constructing a model for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognostic effect of preeclampsia or related diseases in pregnant women.
  • A processor used to run a program, wherein, when the program runs, it executes the method of the fourth aspect of the present invention for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognosis of preeclampsia or related diseases in pregnant women, or the method of the seventh aspect of the present invention for constructing a model for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognostic effect of preeclampsia or related diseases in pregnant women.
  • Use of gene markers as targets for screening drugs for the treatment or prevention of preeclampsia or related diseases in pregnant women, wherein the gene markers include the gene markers of the first aspect of the present invention or combinations thereof.
  • According to the eleventh aspect of the present invention, there is provided a use of a gene marker in detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognostic effect of preeclampsia or related diseases in pregnant women, wherein the gene markers include the gene markers of the first aspect of the present invention or a combination thereof.
  • A drug for treating or preventing preeclampsia or related diseases in pregnant women, characterized in that the drug can increase the expression of PAAF1 in the pregnant woman, or the drug can reduce the expression of one or more genes among ATF6, ATP6AP2, EVI2B, FOS, MARCH7, MED21, NEMF, RASA2, SNX14, SRSF7, TMEM245, TRIB2 and ZNF224 in the pregnant woman.
  • The present invention addresses the problem of low prediction accuracy for the risk of preeclampsia or related diseases in pregnant women in the prior art, and proposes to use the gene markers of the application as detection targets. Based on the association between the expression profiles of the gene markers and preeclampsia or related diseases in pregnant women, it achieves risk prediction with high specificity and high sensitivity for preeclampsia or related diseases in pregnant women.
  • Fig. 1 shows the AUC curve graph predicted by the model in a preferred embodiment of the present invention.
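As an illustration of the metric behind an AUC curve, the following minimal sketch computes the area under the ROC curve from risk scores and true labels by pairwise comparison. The function name `roc_auc` and the example values are hypothetical and are not taken from the application.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a preeclampsia case, 0 for a healthy control.
    scores: model risk scores; a higher score means higher predicted risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs that are ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: two cases and two controls.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

An AUC of 1.0 means every case scored above every control, while 0.5 corresponds to chance-level ranking.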
  • This application compares the gene expression differences between the preeclampsia or related disease group and the non-preeclampsia or related disease group in the first and second trimesters (before the diagnosis of the disease), combines machine learning algorithms to screen gene markers for predicting the risk of preeclampsia or related diseases, and achieves high-accuracy prediction of the risk of preeclampsia or related diseases during pregnancy by constructing a model.
  • The gene markers and prediction model of the present invention have high specificity and sensitivity for predicting the risk of preeclampsia or related diseases, and can detect the risk of preeclampsia or related diseases in pregnant women with high accuracy in the first trimester, enabling early intervention.
  • Preeclampsia-related diseases include systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg in pregnant women after 20 weeks of gestation, accompanied by any one of the following: urine protein ≥0.3 g/24 h, or urine protein/creatinine ratio ≥0.3, or random urine protein ≥(+) (the test used when quantitative protein measurement is unavailable); or, without proteinuria, involvement of any of the following organs or systems: heart, lung, liver, kidney and other vital organs, or abnormal changes in the blood system, digestive system or nervous system, or placental-fetal involvement.
  • The FIGO guidelines divide preeclampsia into four types according to the time of diagnosis and delivery: early-onset, late-onset, preterm, and term preeclampsia.
  • A gene marker or a combination thereof for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognostic effect of preeclampsia or related diseases in pregnant women is provided, which includes one or more of the following genes: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, ZNF224.
  • This application is the first to discover that gene markers in biological samples of pregnant women (including one or more of EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, ZNF224) are significantly correlated with preeclampsia or related diseases in pregnant women, and can therefore be used as markers for detecting whether a pregnant woman suffers from preeclampsia or related diseases or for predicting preeclampsia or related diseases in pregnant women.
  • medical means can be used to intervene the pregnant woman with the disease, and the prognostic effect of preeclampsia or related diseases can be monitored through the expression profile of these gene markers or their combination in the present invention.
  • the above-mentioned gene markers or combinations thereof further include one or more of ATF6, ATP6AP2, FOS, and RASA2.
  • each of the genes listed above can be used alone or in combination.
  • each genetic marker can be used alone, or any two or more of them can be used in combination.
  • the combination of all the following genes can also be used as gene markers: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, ZNF224, ATF6, ATP6AP2, FOS, RASA2, so as to achieve risk prediction for preeclampsia or related diseases.
  • the present invention provides a reagent for detecting the above-mentioned gene markers or their combinations. The reagents include biomolecules that specifically hybridize with the above-mentioned gene markers, their combinations, or their expression products; preferably, the biomolecules include one or more selected from primers, probes and antibodies; more preferably, the reagents also include related reagents for preparing the RNA of the above-mentioned gene markers or a combination thereof into a high-throughput sequencing library.
  • the detection reagents for gene markers or combinations thereof preferably include probes and/or primers for detecting gene markers or combinations thereof, specifically one or more probes and/or primers that specifically bind (hybridize) to gene markers, or one or more primers that specifically amplify gene markers.
  • the detection reagent of the present invention preferably includes a reagent for converting RNA in a biological sample into a library of cDNA fragments.
  • a method for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognosis of pregnant women with preeclampsia or related diseases comprising:
  • Step S1: providing a biological sample from a pregnant woman;
  • Step S2: determining the expression profile of the above-mentioned gene markers or combinations thereof of the present invention in the biological sample;
  • Step S3: based on the expression profile of the gene markers or combinations thereof, identifying whether the pregnant woman suffers from preeclampsia or related diseases, her risk of suffering from preeclampsia or related diseases, or the prognostic effect of preeclampsia or related diseases in the pregnant woman.
  • the identification in step S3 can be carried out in a conventional way of comparing with healthy controls, or can be carried out by using a computer prediction model.
  • the identification process of the above step S3 may include the step of comparing the expression profile with a reference data set or reference value; preferably, the reference data set or reference value includes biological samples derived from healthy control pregnant women.
  • the reference data set or reference value in the present invention refers to the expression profile of each gene marker obtained by operating the samples of healthy control individuals, which is used as a reference or control for the expression profile of the above gene marker.
  • the reference data set or reference value in the present invention refers to the reference value or normal value of healthy controls. Those skilled in the art know that when the sample volume is large enough, detection and calculation methods known in the art can be used to obtain the normal value (absolute value) range of each gene marker in the sample.
  • the absolute value of the gene marker level in the sample can be directly compared with the reference value, so as to assess the risk of disease and diagnose or early diagnose preeclampsia or related diseases.
  • a reduction in PAAF1 when compared to a reference data set or reference value indicates that said pregnant woman has preeclampsia or a related disorder or is at risk of having preeclampsia or a related disorder or has a poor prognosis
  • the increase of ATF6, ATP6AP2, EVI2B, FOS, MARCH7, MED21, NEMF, RASA2, SNX14, SRSF7, TMEM245, TRIB2, ZNF224 indicates that the pregnant woman suffers from preeclampsia or related diseases or has preeclampsia or related diseases risk or poor prognosis.
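The direction-of-change rule described above (PAAF1 reduced, the other thirteen genes increased relative to a reference) can be sketched as a simple comparison. This is only an illustration: the function name `flag_markers`, the single-number reference per gene, and the example levels are assumptions, and the text does not state how large a change must be to count.

```python
# Direction of change associated with preeclampsia in the text above.
DOWN_IN_PE = {"PAAF1"}
UP_IN_PE = {
    "ATF6", "ATP6AP2", "EVI2B", "FOS", "MARCH7", "MED21", "NEMF",
    "RASA2", "SNX14", "SRSF7", "TMEM245", "TRIB2", "ZNF224",
}


def flag_markers(sample, reference):
    """Return markers deviating from the reference in the risk direction.

    sample / reference: dicts mapping gene symbol -> expression level.
    Any deviation counts here (assumption: no minimum fold change).
    """
    flagged = []
    for gene, level in sample.items():
        ref = reference.get(gene)
        if ref is None:
            continue  # no reference value available for this gene
        if gene in DOWN_IN_PE and level < ref:
            flagged.append(gene)
        elif gene in UP_IN_PE and level > ref:
            flagged.append(gene)
    return sorted(flagged)


# Hypothetical expression levels in arbitrary units.
sample = {"PAAF1": 2.0, "FOS": 9.0, "TRIB2": 1.0}
reference = {"PAAF1": 5.0, "FOS": 4.0, "TRIB2": 3.0}
flagged = flag_markers(sample, reference)
```

In this toy input PAAF1 is below and FOS is above the reference, so both are flagged; TRIB2 is lower than the reference, which is not the risk direction for that gene.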
  • in step S3, identifying whether the pregnant woman suffers from preeclampsia or related diseases, is at risk of suffering from preeclampsia or related diseases, or the prognostic effect of preeclampsia or related diseases in the pregnant woman is implemented using a risk prediction model for preeclampsia or related diseases in pregnant women, which is generated by training a computer with the expression profiles of the gene markers or combinations thereof in biological samples from pregnant women diagnosed with preeclampsia or related diseases and from healthy control pregnant women.
  • a training set and a verification set need to be used.
  • the training set and validation set have meanings known in the art.
  • the training set refers to a data set comprising a certain number of samples of gene marker expression profiles in diagnosed patients with preeclampsia or related diseases and healthy control samples.
  • the verification set is an independent data set used to test the performance of the model built on the training set.
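A minimal sketch of dividing labelled samples into a training set and a verification set is shown below. The function name `stratified_split`, the 70/30 default and the fixed random seed are assumptions made for illustration; the text does not specify split proportions.

```python
import random


def stratified_split(cases, controls, train_fraction=0.7, seed=0):
    """Split cases and controls separately so both sets contain each group.

    Returns two lists of (sample_id, label) pairs, with label 1 for
    diagnosed cases and 0 for healthy controls.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for label, group in ((1, cases), (0, controls)):
        ids = list(group)
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train += [(sample_id, label) for sample_id in ids[:cut]]
        validation += [(sample_id, label) for sample_id in ids[cut:]]
    return train, validation


# Hypothetical sample identifiers.
cases = ["pe1", "pe2", "pe3", "pe4"]
controls = ["c1", "c2", "c3", "c4", "c5", "c6"]
train, validation = stratified_split(cases, controls, train_fraction=0.5)
```

Splitting each group separately keeps the case/control ratio similar in both sets, which matters when cases are much rarer than controls.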
  • Machine learning generally refers to algorithms that give computers the ability to learn without being explicitly programmed, including algorithms that learn from data and make predictions about that data.
  • the machine learning methods used in the present invention may include random forest, least absolute shrinkage and selection operator (LASSO) logistic regression, regularized logistic regression, XGBoost, decision tree learning, artificial neural network, deep neural network, support vector machine, rule-based machine learning, generalized linear models, gradient boosting machines, etc.
  • Preferred machine learning methods include one or more of the following: generalized linear models, gradient boosting machines, random forests, and support vector machines.
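To make the idea of training a model on expression profiles concrete, here is a minimal sketch of logistic regression, one kind of generalized linear model, fitted by plain gradient descent. It is not the application's implementation: the function names, learning rate, epoch count and the one-feature toy data are all assumptions, and a real analysis would normally use an established library.

```python
import math


def predict_risk(x, weights, bias):
    """Risk score in (0, 1) from a logistic model."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    z = max(-50.0, min(50.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss.

    X: list of feature vectors (e.g. expression levels of the markers).
    y: list of labels, 1 for cases and 0 for healthy controls.
    """
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_risk(xi, weights, bias) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Hypothetical one-marker training data: higher level in cases.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = train_logistic(X, y)
```

The fitted model returns a score between 0 and 1 for a new expression profile, which is the kind of risk score that the text compares with a threshold.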
  • the risk score automatically calculated by the model can be used to evaluate and predict the risk or prognosis of preeclampsia or related diseases.
  • a risk score greater than a threshold indicates that the pregnant woman has or is at risk of having preeclampsia or a related disorder or has a poor prognosis.
  • the threshold is 0.5.
  • If the risk score is greater than 0.5, the pregnant woman is considered to suffer from preeclampsia or related diseases, to be at risk of (or at high risk of) preeclampsia or related diseases, or to have a poor prognosis; if the risk score is less than 0.5, the pregnant woman is considered not to have preeclampsia or related diseases, not to be at risk of (or to be at low risk of) preeclampsia or related diseases, or to have a good prognosis.
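The threshold rule can be written as a short function. This is a sketch: the name `interpret_risk_score` and the returned strings are hypothetical, and because the text does not say how a score exactly equal to the threshold is handled, that case is reported separately here as an assumption.

```python
RISK_THRESHOLD = 0.5  # threshold stated in the text


def interpret_risk_score(score, threshold=RISK_THRESHOLD):
    """Map a model risk score in [0, 1] to a risk category."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("risk score must lie between 0 and 1")
    if score > threshold:
        return "high risk"
    if score < threshold:
        return "low risk"
    # The text only covers scores above or below the threshold.
    return "indeterminate"
```

For example, `interpret_risk_score(0.8)` returns `"high risk"` and `interpret_risk_score(0.2)` returns `"low risk"`.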
  • the biological sample derived from a pregnant woman can be one or more of the following: plasma, serum, whole blood, urine, amniotic fluid. Plasma, serum or whole blood derived from pregnant women are preferably used for the detection and identification steps of the present invention.
  • the biological sample is most preferably plasma, for example, peripheral blood can be obtained from a pregnant woman and subjected to plasma separation to obtain a plasma biological sample to be used.
  • in addition to plasma, serum or whole blood, other bodily fluid samples such as urine, amniotic fluid, etc. can also be used.
  • Biological samples can be obtained by conventional methods in the art.
  • the collection of biological samples can be carried out at the 11th to 25th gestational week (preferably the 13th to 25th gestational week) of the pregnant woman.
  • the application population of the present invention does not need to distinguish whether pregnant women are high-risk or not, and can be applied to asymptomatic general pregnant populations.
  • the present invention can realize the prediction of preeclampsia or related diseases in the first trimester.
  • the present invention can realize prediction before symptoms appear (up to 18 weeks in advance). Therefore, the method of the present invention is applicable to a wider population and has more clinical applicability.
  • the expression profile of the gene markers or their combination is determined by quantitatively analyzing the free extracellular RNA (cfRNA) in the biological sample; preferably, the free extracellular RNA in the biological sample is quantitatively analyzed by high-throughput sequencing or RT-PCR; more preferably, the free extracellular RNA in the biological sample is quantitatively analyzed by high-throughput sequencing (such as next-generation sequencing).
  • the free extracellular RNA in the biological sample can be extracted by a method or a kit commonly used in the art or a combination of the two.
  • cell-free extracellular RNA can be isolated from plasma biological samples using TRIzol LS standard RNA extraction procedures.
  • the quantitative analysis of extracellular free RNA includes building a library of extracellular free RNA that can simultaneously capture long and short RNA fragments in plasma, providing more informative features for prediction. Sequencing of cell-free extracellular RNA can be performed using whole-transcriptome sequencing [13], using next-generation sequencing to sequence cell-free extracellular RNA in biological samples (preferably plasma) from pregnant women. RT-PCR can also be used for analysis. The expression profile of extracellular free RNA can also be quantitatively analyzed by other methods known in the art, such as qPCR.
  • the quantitative analysis of extracellular free RNA also includes a quality control step for the raw extracellular free RNA sequencing data, preferably including trimming adapters, removing low-quality reads, removing reads <17 bp, and removing rRNA, vault RNA and Y RNA sequences. The remaining reads are first aligned to the human transcriptome (in the order miRNA, tRNA and piRNA, mRNA and lncRNA, and finally other RNAs), and reads that remain unaligned are then aligned to the human genome.
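The ordered assignment of reads described above can be sketched as follows. This is a toy illustration under stated assumptions: the 17 bp cutoff is treated as discarding reads shorter than 17 bp, exact membership in a set of sequences stands in for a real aligner, and the names `ALIGNMENT_ORDER` and `assign_reads` are hypothetical.

```python
# Priority order of reference categories taken from the text.
ALIGNMENT_ORDER = ["miRNA", "tRNA_piRNA", "mRNA_lncRNA", "other_RNA", "genome"]


def assign_reads(reads, references, min_length=17):
    """Assign each read to the first category, in priority order, that matches.

    reads: iterable of read sequences (strings).
    references: dict mapping category name -> set of known sequences.
    Exact set membership is a placeholder for real sequence alignment.
    """
    assigned = {name: [] for name in ALIGNMENT_ORDER}
    unassigned = []
    for read in reads:
        if len(read) < min_length:
            continue  # too short; dropped during quality control
        for name in ALIGNMENT_ORDER:
            if read in references.get(name, set()):
                assigned[name].append(read)
                break  # stop at the highest-priority match
        else:
            unassigned.append(read)
    return assigned, unassigned


# Hypothetical reads and reference sets.
reads = ["A" * 10, "ACGT" * 5, "TTTT" * 5, "GGGG" * 5]
references = {
    "miRNA": {"ACGT" * 5},
    "mRNA_lncRNA": {"ACGT" * 5, "TTTT" * 5},
}
assigned, unassigned = assign_reads(reads, references)
```

Because categories are tried in order, a read matching both a miRNA and an mRNA reference is counted only as miRNA, which mirrors the sequential alignment described in the text.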
  • a prediction kit for the gene markers of the present invention or a combination thereof can be prepared according to the existing kit preparation principles. Detection probes, chips, etc. for predicting the risk of preeclampsia or related diseases in pregnant women can also be prepared for these gene markers.
  • By using specific gene markers as detection targets, and based on the correlation between the expression profile of the gene markers and preeclampsia or related diseases in pregnant women, the present invention achieves risk prediction with high specificity and high sensitivity for preeclampsia or related diseases in pregnant women.
  • the present invention provides a kit, which may include the above-mentioned gene markers of the present invention or a combination thereof.
  • the present invention provides a kit for predicting the risk or prognosis of preeclampsia or related diseases in pregnant women.
  • the kit includes detection reagents for genetic markers, and the genetic markers include one or more of the following genes: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, ZNF224.
  • the kit is used for prediction, which makes the prediction more convenient, simple and fast.
  • the above gene markers also include one or more of the following genes: ATF6, ATP6AP2, FOS, RASA2.
  • the detection reagents for gene markers may include probes and/or primers for detecting gene markers, specifically one or more probes and/or primers that specifically bind (hybridize) to gene markers, or one or more primers that specifically amplify a genetic marker.
  • Since RNA sequencing generally includes a reverse transcription step to generate cDNA molecules for sequencing, when RNA sequencing is used, the kit of the present invention may also include reagents for converting RNA in a biological sample into a library of cDNA fragments.
  • Use of the above-mentioned gene markers or combinations thereof and/or the above-mentioned detection reagents in the preparation of a kit for detecting whether a pregnant woman suffers from preeclampsia or related diseases, or for predicting the risk or prognostic effect of preeclampsia or related diseases in pregnant women.
  • the present invention provides a device for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognosis of preeclampsia or related diseases in pregnant women. The device has a built-in risk prediction model for preeclampsia or related diseases in pregnant women, which is generated by using the expression profiles of the above gene markers or their combinations in biological samples from pregnant women who have been diagnosed with preeclampsia or related diseases.
  • the prediction model is a generalized linear model, a gradient boosting machine, a random forest or a support vector machine model.
  • a method for constructing a model for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognosis of a pregnant woman suffering from preeclampsia or related diseases includes the step of detecting substances that are differentially expressed between biological samples derived from a population of pregnant women with preeclampsia or related diseases and a population of pregnant women without preeclampsia or related diseases, wherein the differentially expressed substances include the above-mentioned gene markers of the present invention or combinations thereof.
  • the construction method includes: detecting the differential expression of gene markers in biological samples derived from a group of pregnant women with preeclampsia or related diseases and a group of pregnant women without preeclampsia or related diseases; using part of the group with preeclampsia or related diseases and part of the group without preeclampsia or related diseases as the training set, and screening out the best gene markers using the training set; training a computer with the best gene markers in the training set to obtain the risk prediction model for preeclampsia or related diseases in pregnant women; and using the remaining part of each group as a validation set to validate the risk prediction model.
  • the best genetic markers include one or more of the following genes: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, ZNF224.
  • the above optimal gene markers also include one or more of the following genes: ATF6, ATP6AP2, FOS, RASA2.
  • the biological sample used in the model construction method of the present invention is preferably one or more of the following: plasma, serum, whole blood, urine, amniotic fluid; particularly preferably plasma, serum, whole blood; most preferably plasma. Also, biological samples can be collected from the 11th to 25th week of pregnancy.
  • a machine learning method can be used, preferably the machine learning method includes one or more of the following: generalized linear model, gradient boosting machine, random forest and support vector machine.
  • the training set and the validation set can be split in any suitable ratio as needed; preferably, all pregnant women with preeclampsia or related diseases are randomly split into a training set and a validation set at a ratio of 2:1, and all pregnant women without preeclampsia or related diseases are likewise randomly split into a training set and a validation set at a ratio of 2:1.
  • the screening of the best gene markers is done in the training set, and the validation set is used to test the prediction effect of the best gene markers and models.
  • the candidate gene markers are preliminarily screened by comparing the difference in gene expression profile between a group of pregnant women with preeclampsia or related diseases and a group of pregnant women without preeclampsia or related diseases.
  • This step can use the DESeq2 package (R package) implementation.
  • for each gene, both the size and the stability of the difference in average expression level between the two populations are considered in this step (preferably, the difference in average expression level is greater than 1 and the p value is less than 0.001), and the genes that pass this screening become candidate gene markers.
  • the two models can be used to screen according to the importance of features, and the gene markers with higher frequency can be selected as the best gene markers.
  • two models, a generalized linear model and a random forest, are run with 7-fold cross-validation, and a number (for example, 30) of the most important molecules are screened out. This step can be iterated 100 times; molecules selected with a frequency higher than 50% are then retained, and the gene markers ranked highest in importance among them are taken as the best gene markers.
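The frequency filter in this step can be sketched as follows. This is a minimal illustration, not the applicants' implementation: it assumes each iteration has already produced a list of its top-ranked molecules (for example, the 30 most important features of one model run), and the function name `frequent_features` and the placeholder feature names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def frequent_features(top_lists: Sequence[Iterable[str]],
                      min_fraction: float = 0.5) -> List[str]:
    """Return features that appear in more than `min_fraction` of the runs.

    `top_lists` holds one collection of top-ranked feature names per
    iteration; how those rankings are produced is outside this sketch.
    """
    n_runs = len(top_lists)
    if n_runs == 0:
        return []
    counts = Counter()
    for top in top_lists:
        # Count each feature at most once per iteration.
        counts.update(set(top))
    return sorted(name for name, c in counts.items() if c / n_runs > min_fraction)


# Placeholder names, four illustrative iterations.
runs = [["A", "B"], ["A", "C"], ["A", "B"], ["B", "D"]]
selected = frequent_features(runs)  # "A" and "B" each appear in 3 of 4 runs
```

Ranking the surviving features by importance, as the text describes, would be a separate step applied to `selected`.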
  • each method uses 7-fold cross-validation to select the optimal parameters for prediction model construction.
  • the resulting model can be validated against the validation set.
  • the model with the best effect can be selected and the feature importance can be calculated through the effect verification of the verification set.
  • the prediction model constructed by the method of the present invention can be used as early as the first trimester and up to 18 weeks in advance; only peripheral blood from the pregnant woman needs to be collected, so the risk of preeclampsia or related diseases is predicted non-invasively, and the predictive performance (sensitivity, specificity) is higher than the state of the art.
  • the present application can be realized by means of software plus necessary detection instruments and other hardware devices.
  • the data processing part of the technical solution of the present application can be embodied in the form of a software product; the computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the various embodiments, or of some parts of the embodiments, of the present application.
  • the application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
  • modules or steps of the above-mentioned application can be implemented on general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network of multiple computing devices. Alternatively, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module.
  • the present application is not limited to any specific combination of hardware and software.
  • a computer-readable storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium is located is controlled to execute the above-mentioned method for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognosis of a pregnant woman suffering from preeclampsia or related diseases, or to execute the above-mentioned method for constructing a model for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognosis of a pregnant woman suffering from preeclampsia or related diseases.
  • a processor is provided, and the processor is used to run a program, wherein, when the program is running, the above-mentioned method for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting that a pregnant woman has preeclampsia is performed.
  • the present invention also provides the use of genetic markers as targets for screening drugs for the treatment or prevention of preeclampsia or related diseases in pregnant women, wherein the genetic markers include the gene markers of the present invention described above or combinations thereof.
  • the present invention also provides the use of genetic markers in detecting whether pregnant women suffer from preeclampsia or related diseases or predicting the risk or prognosis of pregnant women suffering from preeclampsia or related diseases, wherein the genetic markers include the gene markers described above or a combination thereof.
  • the present invention also provides a drug for treating or preventing preeclampsia or related diseases in pregnant women, characterized in that the drug can increase the expression of PAAF1 in the pregnant woman; or the drug can increase the expression of PAAF1 in the pregnant woman.
  • the peripheral blood of 64 cases of singleton pregnant women was obtained from the hospital, and the blood was collected from 13 to 25 weeks of gestation. There were 31 and 33 pregnant women with preeclampsia and non-preeclampsia respectively (see Table 1 for relevant data of pregnant women). In the preeclampsia group, the gestational age difference from blood collection to preeclampsia diagnosis was 18 weeks. All blood samples were immediately stored at 4°C and plasma separation was performed within 8 hours. Plasma was separated by a two-step centrifugation method, centrifuged at 1,600g for 10 minutes at 4°C, and then centrifuged at 12,000g for 10 minutes. Immediately after separation, the plasma was stored at -80°C pending further processing.
  • Add Trizol LS to the plasma and vortex immediately to mix.
  • the subsequent cfRNA extraction steps are performed using the standard RNA extraction method of TRIzol LS.
  • Sequencing of cfRNA utilized whole-transcriptome sequencing of preeclampsia and non-preeclampsia plasma samples using next-generation sequencing.
  • Quality control was performed on the raw cfRNA sequencing data, including trimming adapters, removing low-quality reads, removing reads shorter than 17 bp, and removing rRNA, vault RNA and Y RNA sequences. The remaining reads were aligned to the human transcriptome (in the order miRNA, tRNA and piRNA, mRNA and lncRNA, and finally other RNAs), and the reads still unaligned were then aligned to the human genome.
  • the expression level of long RNA is normalized to TPM (transcripts per million) using the following formula:
  • $\mathrm{TPM}_i = \dfrac{N_i / L_i}{\sum_{j=1}^{n} N_j / L_j} \times 10^{6}$
  • $N_i$ is the number of reads aligned to the i-th gene; $L_i$ is the length of the i-th gene; $\sum_{j=1}^{n} N_j / L_j = N_1/L_1 + N_2/L_2 + \dots + N_n/L_n$ is the sum of the length-normalized read counts over all n genes.
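As a worked illustration of the TPM formula above, the following minimal sketch computes TPM values from per-gene read counts and gene lengths. The function name `tpm` and the example numbers are assumptions for illustration only; they are not taken from the source.

```python
from typing import List, Sequence


def tpm(read_counts: Sequence[float], gene_lengths: Sequence[float]) -> List[float]:
    """Compute TPM_i = (N_i / L_i) * 1e6 / sum_j (N_j / L_j)."""
    if len(read_counts) != len(gene_lengths):
        raise ValueError("read_counts and gene_lengths must have the same length")
    if any(length <= 0 for length in gene_lengths):
        raise ValueError("gene lengths must be positive")
    # Length-normalized read counts, N_i / L_i.
    rates = [n / length for n, length in zip(read_counts, gene_lengths)]
    total = sum(rates)
    if total == 0:
        # No reads at all: every gene has zero expression.
        return [0.0] * len(rates)
    return [rate * 1_000_000 / total for rate in rates]


# Illustrative numbers: three genes with identical reads-per-base rates
# therefore receive identical TPM values.
values = tpm([10, 20, 30], [1000, 2000, 3000])
```

By construction the TPM values of one sample sum to one million, which is what makes them comparable across samples with different sequencing depths.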
  • the pregnant women with preeclampsia and the pregnant women without preeclampsia were randomly split into a training set and a validation set at a ratio of 2:1.
  • the training set contained 21 samples of preeclampsia and 23 samples of non-preeclampsia.
  • the validation set contained 10 preeclampsia samples and 10 non-preeclampsia samples.
  • the screening of gene markers is completed in the training set, and the verification set is used to test the prediction effect of gene markers and models. Please refer to Table 1 for the relevant data of the pregnant women group.
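The per-group 2:1 random split described above can be sketched as below. This is only an illustration under stated assumptions: the function name `split_two_to_one`, the fixed seed and the rounding rule are hypothetical. The source reports 21/10 for the preeclampsia group and 23/10 for the non-preeclampsia group, so its exact rounding is not specified and this sketch does not necessarily reproduce those group sizes.

```python
import random
from typing import List, Sequence, Tuple


def split_two_to_one(sample_ids: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split one group into training and validation parts at about 2:1."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * 2 / 3)  # rounding rule is an assumption
    return shuffled[:n_train], shuffled[n_train:]


# Split cases and controls separately so both sets contain both classes.
cases = [f"case_{i}" for i in range(31)]
controls = [f"control_{i}" for i in range(33)]
train_cases, valid_cases = split_two_to_one(cases)
train_controls, valid_controls = split_two_to_one(controls)
```

Splitting each group on its own, rather than shuffling all samples together, keeps the case-to-control balance similar in the training and validation sets.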
  • Table 1 Relevant data of the preeclampsia pregnant women group (case) and non-preeclampsia pregnant women group (control) in embodiment 1
  • Candidate gene markers were initially screened by comparing the expression profile differences between the preeclampsia and non-preeclampsia groups, and this step was implemented using the DESeq2 package (R package). For each gene, the difference and stability of the average expression level in the two groups are considered in this step (the average expression level difference is greater than 1, and the p value is less than 0.001), and finally the genes that pass the screening become candidate gene markers.
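The thresholding part of this screening step can be sketched as a simple filter over differential-expression results. The sketch assumes the per-gene statistics have already been computed elsewhere (the source uses DESeq2 in R); the record type `DiffResult`, its field names, the placeholder gene names and the example values are all hypothetical. Treating the difference as an absolute value, so that both up- and down-regulated genes can pass, is also an assumption.

```python
from typing import Iterable, List, NamedTuple


class DiffResult(NamedTuple):
    gene: str
    expression_difference: float  # difference in average expression level between groups
    p_value: float


def select_candidates(results: Iterable[DiffResult],
                      min_difference: float = 1.0,
                      max_p: float = 0.001) -> List[str]:
    """Keep genes whose difference exceeds the cutoff and whose p value is below it."""
    return [
        r.gene
        for r in results
        if abs(r.expression_difference) > min_difference and r.p_value < max_p
    ]


# Illustrative values only, with placeholder gene names.
example = [
    DiffResult("GENE_A", 1.8, 0.0002),   # passes both cutoffs
    DiffResult("GENE_B", -1.4, 0.0005),  # passes (down-regulated)
    DiffResult("GENE_C", 0.6, 0.0001),   # difference too small
    DiffResult("GENE_D", 2.1, 0.02),     # p value too large
]
candidates = select_candidates(example)
```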
  • the generalized linear model and random forest were used to screen according to the importance of features, and the gene markers with higher frequency were selected as the best gene markers.
  • the specific sequence information of the above gene markers can be obtained according to the sequence numbers in Genbank.
  • “Up” indicates that the expression level of the corresponding gene in pregnant women with preeclampsia or related diseases is increased compared with healthy controls.
  • “Down” indicates that the expression level of the corresponding gene in pregnant women with preeclampsia or related diseases is decreased compared with healthy controls.
  • Example 2: Based on the 14 best gene markers finally screened out in Example 1, random combinations of these markers, as well as the single gene markers EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2 and ZNF224, were evaluated for predictive performance in the validation set described above. See Table 3 for the gene markers or their combinations and the corresponding predictive performance.
  • Table 3 Model results of genetic markers or their combinations in the prediction of preeclampsia
  • the above-mentioned embodiment of the present invention has achieved the following technical effects: using the combination of multiple mRNA gene markers of the present invention in plasma, combined with a machine learning model, can predict preeclampsia up to 18 weeks earlier.
  • the present invention can predict the risk of preeclampsia in a non-invasive way only by taking peripheral blood from pregnant women.
  • the gene markers of the present invention can be used alone or in combination. When used alone, the predictive sensitivity and specificity of the gene markers of the present invention can reach at least 40% and up to 80%, respectively, which is higher than the predictive effect of preeclampsia using the gene markers alone in the prior art.
  • the gene markers of the present invention can achieve more than 70% prediction sensitivity and prediction specificity; the area under the receiver operating characteristic curve (AUC) exceeds 0.92 in the training set and 0.82 in the validation set, all of which are higher than the state of the art.
  • the prediction sensitivity can reach 80% and the specificity can reach 100%; the area under the receiver operating characteristic curve (AUC) exceeds 0.98 in the training set and 0.93 in the validation set, which is much higher than the state of the art.
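For reference, sensitivity and specificity as quoted above are computed from the counts of correctly and incorrectly classified cases and controls. The sketch below shows the arithmetic; the function name and the labels are illustrative and are not data from the source.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = preeclampsia."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative labels: 10 cases (8 detected) and 10 controls (all cleared).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 10
sens, spec = sensitivity_specificity(y_true, y_pred)
```

With a validation set of this size, each misclassified sample moves sensitivity or specificity by 10 percentage points, which is worth keeping in mind when reading the reported figures.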
  • the method of the present invention can be applied to the general population of asymptomatic pregnant women, whether or not they are at high risk, and can make a prediction before symptoms appear; the applicable population is therefore wider and the method has greater clinical applicability.
  • the prediction model of the present invention has relatively high accuracy, and is suitable for early prediction of preeclampsia in pregnant women, so as to achieve early intervention.

Abstract

The present invention provides a use of a gene marker in predicting the risk of preeclampsia in a pregnant woman. The present invention provides a gene marker or a combination thereof for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk that a pregnant woman suffers from preeclampsia or related diseases. The present invention further provides a reagent for detecting a gene marker or a combination thereof, and a method, kit and device for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk that a pregnant woman suffers from preeclampsia or related diseases, and further provides a construction method for a model used for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk that a pregnant woman suffers from preeclampsia or related diseases. According to the present invention, by means of the correlation between an expression profile of the gene marker and the preeclampsia or related diseases of the pregnant woman, high-specificity and high-sensitivity risk prediction for the preeclampsia or related diseases of the pregnant woman is achieved.

Description

Use of gene markers in predicting the risk of preeclampsia in pregnant women
Technical field
The present invention relates to the field of preeclampsia, and in particular to the use of gene markers in predicting the risk of preeclampsia or related diseases in pregnant women.
Background art
Preeclampsia (PE) is a pregnancy disorder associated with new-onset hypertension during pregnancy and affects 3–5% of pregnancies [1]. In the latest guidelines, preeclampsia is defined as systolic blood pressure ≥ 140 mmHg and/or diastolic blood pressure ≥ 90 mmHg arising after 20 weeks of gestation, accompanied by any one of the following: urinary protein ≥ 0.3 g/24 h, or urinary protein/creatinine ratio ≥ 0.3, or random urine protein ≥ (+) (the test used when protein quantification is not available); or, in the absence of proteinuria, involvement of any one of the following organs or systems: vital organs such as the heart, lungs, liver and kidneys, abnormal changes in the hematological, digestive or nervous system, or placental-fetal involvement. The FIGO guidelines divide preeclampsia into four types according to the time of diagnosis and the time of delivery: early-onset and late-onset preeclampsia, and preterm and term preeclampsia [2].
To date, the exact pathogenesis of preeclampsia remains unclear and there is no effective treatment; termination of pregnancy and delivery of the placenta is the only effective treatment option. Among preventive measures, giving aspirin to pregnant women at high risk of preeclampsia is the recognized approach [2-5]. Studies have shown that taking aspirin at ≤ 16 weeks of gestation significantly reduces the risk of preeclampsia [6,7]. How to make an early prediction, before onset, for pregnant women at high risk of preeclampsia or related diseases is therefore a problem in urgent need of a solution.
Studies are known that screen for populations at high risk of preeclampsia using risk factors such as advanced maternal age, family history and previous history of preeclampsia, interpregnancy interval, assisted reproductive technology, and obesity. However, because of the heterogeneity and complexity of preeclampsia, the absence of known risk factors does not mean that preeclampsia will not occur, and predicting the high-risk population from maternal risk factors is not accurate. In recent years, researchers have carried out transcriptomic studies on the mechanisms of preeclampsia in placenta [8], plasma [9], decidua [10], exosomes [11] and amniotic fluid [12], and have reported some potential biomarkers related to the early prediction of preeclampsia. Placental cell dysfunction can lead to serious pregnancy complications, but invasive sampling of placental tissue poses risks to the pregnant woman and the fetus. Analysis of cell-free circulating RNA in maternal blood, by contrast, can non-invasively reveal functional abnormalities of extravillous trophoblast cells in the preeclamptic placenta, which helps predict preeclampsia early, before symptoms appear.
The patent "Circulating mRNA as a diagnostic marker for pregnancy-related disorders" (grant number: 100379882), granted in 2008 to Lo Yuk-ming of the Chinese University of Hong Kong, proposes methods and kits for diagnosing, monitoring or predicting preeclampsia, fetal chromosomal aneuploidy and preterm labor in pregnant women, and for detecting pregnancy, by quantifying one or more mRNA species in maternal blood. The mRNA species include mRNA encoding human chorionic gonadotropin β subunit (hCG-β), human corticotropin-releasing hormone (hCRH), human placental lactogen (hPL), KiSS-1 metastasis suppressor (KISS1), tissue factor pathway inhibitor 2 (TFPI2), placenta-specific 1 (PLAC1) or glyceraldehyde-3-phosphate dehydrogenase (GAPDH). The first step of the method is to quantify one or more specific mRNA species in the blood of the pregnant woman; the mRNA may encode hCG-β, hCRH, hPL, KISS1, TFPI2, PLAC1 or GAPDH. The second step is to compare the mRNA level obtained in the first step with a standard control representing the level of mRNA encoding the same protein in the blood of normal women without preeclampsia. An increase or decrease in the mRNA level indicates the presence of preeclampsia or an increased risk of developing it. In this method, peripheral blood samples are collected from pregnant women, RNA is extracted, real-time quantitative RT-PCR is performed, and statistical analysis is carried out with Sigma Stat 2.03 software (SPSS). This technique uses RT-PCR and predicts the risk of preeclampsia from the increase or decrease in marker mRNA levels; its specificity and accuracy for predicting preeclampsia are somewhat insufficient.
The patent "Means and methods for ruling out the onset of preeclampsia within a certain time period using the sFlt-1/PlGF or endoglin/PlGF ratio" (grant number: 104412107), granted in 2018, concerns a method for diagnosing whether a pregnant subject is not at risk of preeclampsia within a short time window, comprising: a) determining the amount of at least one angiogenesis biomarker selected from sFlt-1, endoglin and PlGF in a sample from the subject; and b) comparing the amount with a reference, whereby the subject is diagnosed as not being at risk of developing preeclampsia within a short period if, in the case of sFlt-1 and endoglin, the amount is the same as or lower than the reference and, in the case of PlGF, the amount is the same as or higher than the reference, wherein the reference allows a diagnosis with a negative predictive value of at least about 98%. This technique predicts preeclampsia from the sFlt-1/PlGF or endoglin/PlGF ratio; it can only predict whether there is a risk of preeclampsia within a short period, which is a limitation.
The patent application on circulating RNA signatures specific to preeclampsia filed by ILLUMINA INC in 2019 (application number: 201980002993.7) relates to a method for detecting preeclampsia and/or determining an increased risk of preeclampsia in a pregnant woman, comprising identifying a plurality of circulating RNA (C-RNA) molecules in a biological sample obtained from the pregnant woman. That application is based on analysis of a diagnosed population and uses these C-RNAs to build models for classification rather than prediction. Because the selected markers come from studies of cases already diagnosed with preeclampsia, the technique cannot accurately predict preeclampsia before symptoms appear.
So far, there is no gene marker that can predict preeclampsia or related diseases with high specificity and sensitivity. There is therefore an urgent need to develop a non-invasive gene marker that can predict preeclampsia or related diseases in pregnant women with high specificity and high sensitivity.
Summary of the invention
The main object of the present invention is to provide the use of gene markers in predicting the risk of preeclampsia or related diseases in pregnant women, so as to provide a highly specific and highly sensitive scheme for predicting that risk.
To achieve the above object, according to a first aspect of the present invention, there is provided a gene marker or a combination thereof for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognosis of a pregnant woman suffering from preeclampsia or related diseases, including one or more of the following genes: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, ZNF224.
Further, the above gene marker or combination thereof also includes one or more of the following genes: ATF6, ATP6AP2, FOS, RASA2.
According to a second aspect of the present invention, there is provided a reagent for detecting the gene marker or combination thereof of the first aspect, the reagent including a biomolecule that specifically hybridizes with the gene marker or combination thereof of the first aspect or with their expression products;
Preferably, the biomolecule includes one or more selected from primers, probes and antibodies;
More preferably, the reagent further includes reagents for preparing a high-throughput sequencing library from RNA of the gene marker or combination thereof of the first aspect.
According to a third aspect of the present invention, there is provided a method for detecting whether a pregnant woman suffers from preeclampsia or related diseases or predicting the risk or prognosis of a pregnant woman suffering from preeclampsia or related diseases, the method comprising:
Step S1: providing a biological sample derived from the pregnant woman;
Step S2: determining the expression profile of the gene marker or combination thereof of the first aspect in the biological sample;
Step S3: based on the expression profile of the gene marker or combination thereof, identifying whether the pregnant woman suffers from preeclampsia or related diseases, her risk of suffering from preeclampsia or related diseases, or the prognosis of her preeclampsia or related diseases.
Preferably, in step S3, the identification is carried out using a risk prediction model for preeclampsia or related diseases in pregnant women, the model being generated by training a computer with the expression profiles of the gene marker or combination thereof in biological samples derived from pregnant women diagnosed with preeclampsia or related diseases and from healthy control pregnant women.
Further, the computer is trained by a machine learning method;
Preferably, the machine learning method includes one or more of the following: generalized linear model, gradient boosting machine, random forest, support vector machine;
Preferably, the machine learning method automatically calculates a risk score;
Preferably, a risk score greater than a threshold indicates that the pregnant woman suffers from preeclampsia or related diseases, is at risk of suffering from preeclampsia or related diseases, or has a poor prognosis;
Preferably, the threshold is 0.5;
Further, the biological sample is one or more of the following: plasma, serum, whole blood, urine, amniotic fluid; preferably, the biological sample is collected at the 11th to 25th week of gestation.
Further, in step S2, the expression profile of the gene marker or combination thereof is determined by quantitative analysis of the extracellular free RNA in the biological sample;
Preferably, the extracellular free RNA in the biological sample is quantitatively analyzed by high-throughput sequencing or RT-PCR;
More preferably, the extracellular free RNA in the biological sample is quantitatively analyzed by high-throughput sequencing.
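The risk-score rule of the third aspect (a score above a threshold, preferably 0.5, indicates disease, risk or poor prognosis) can be sketched as follows. This is a minimal illustration: the function name `classify_risk` and the returned labels are hypothetical, and the score is assumed to have been produced by an already trained model that is not shown here.

```python
def classify_risk(risk_score: float, threshold: float = 0.5) -> str:
    """Map a model risk score to a label; only scores strictly above the threshold are flagged."""
    return "elevated risk" if risk_score > threshold else "not elevated"


# Illustrative scores, not results from the source.
label_high = classify_risk(0.73)
label_edge = classify_risk(0.5)  # equal to the threshold, so not flagged
```

A score exactly equal to the threshold is not flagged here because the text says "greater than"; a different convention would need a different comparison.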
According to a fourth aspect of the present invention, there is provided a kit comprising the gene marker or combination thereof of the first aspect of the present invention, and/or the reagent of the second aspect of the present invention.
According to a fifth aspect of the present invention, there is provided use of the gene marker or combination thereof of the first aspect and/or the reagent of the second aspect in the preparation of a kit, wherein the kit is used for detecting whether a pregnant woman has preeclampsia or a related disease, or for predicting her risk or prognosis of preeclampsia or a related disease.
According to a sixth aspect of the present invention, there is provided a device for detecting whether a pregnant woman has preeclampsia or a related disease, or for predicting her risk or prognosis of preeclampsia or a related disease. The device has a built-in risk prediction model for preeclampsia or related diseases in pregnant women, and the prediction model is generated by training a computer on the expression profiles of the gene marker or combination thereof of the first aspect in biological samples from pregnant women diagnosed with preeclampsia or a related disease.
According to a seventh aspect of the present invention, there is provided a method for constructing a model for detecting whether a pregnant woman has preeclampsia or a related disease, or for predicting her risk or prognosis of preeclampsia or a related disease. The construction method comprises a step of detecting substances that are differentially expressed between biological samples from a population of pregnant women with preeclampsia or a related disease and a population of pregnant women without preeclampsia or a related disease, wherein the differentially expressed substances include the gene marker or combination thereof of the first aspect of the present invention.
Further, the biological sample is one or more of the following: plasma, serum, whole blood, urine, and amniotic fluid; preferably, the biological sample is collected from the pregnant woman between the 11th and 25th weeks of gestation.
According to an eighth aspect of the present invention, there is provided a computer-readable storage medium comprising a stored program, wherein, when the program runs, the device in which the storage medium is located is controlled to execute the method of the fourth aspect of the present invention for detecting whether a pregnant woman has preeclampsia or a related disease or for predicting her risk or prognosis of preeclampsia or a related disease, or the model construction method of the seventh aspect of the present invention.
According to a ninth aspect of the present invention, there is provided a processor for running a program, wherein the program, when run, executes the method of the fourth aspect of the present invention for detecting whether a pregnant woman has preeclampsia or a related disease or for predicting her risk or prognosis of preeclampsia or a related disease, or the model construction method of the seventh aspect of the present invention.
According to a tenth aspect of the present invention, there is provided use of a gene marker as a target for screening drugs for treating or preventing preeclampsia or related diseases in pregnant women, wherein the gene marker includes the gene marker or combination thereof of the first aspect of the present invention.
According to an eleventh aspect of the present invention, there is provided use of a gene marker in detecting whether a pregnant woman has preeclampsia or a related disease, or in predicting her risk or prognosis of preeclampsia or a related disease, wherein the gene marker includes the gene marker or combination thereof of the first aspect of the present invention.
According to a twelfth aspect of the present invention, there is provided a drug for treating or preventing preeclampsia or related diseases in pregnant women, characterized in that the drug increases the expression of PAAF1 in the pregnant woman; or the drug decreases the expression in the pregnant woman of one or more of the genes ATF6, ATP6AP2, EVI2B, FOS, MARCH7, MED21, NEMF, RASA2, SNX14, SRSF7, TMEM245, TRIB2, and ZNF224.
To address the low accuracy with which the prior art predicts the risk of preeclampsia or related diseases in pregnant women, the present invention uses the gene markers of this application as detection targets. Based on the association between the expression profiles of these gene markers and preeclampsia or related diseases in pregnant women, it achieves risk prediction with high specificity and high sensitivity.
Brief Description of the Drawings
The accompanying drawings, which form a part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows the AUC curve of the model's predictions in a preferred embodiment of the present invention.
Detailed Description
It should be noted that, provided there is no conflict, the embodiments in this application and the features of those embodiments may be combined with one another. The present invention is described in detail below with reference to the embodiments.
As mentioned in the Background section, there is currently a need for early clinical prediction of preeclampsia or related diseases in pregnant women. Working from biological samples from pregnant women, this application compares gene expression between a preeclampsia or related disease group and a non-preeclampsia or related disease group in the first and second trimesters (before the disease is diagnosed) and, in combination with machine learning algorithms, screens out gene markers that predict the risk of preeclampsia or related diseases. By constructing a model, it achieves high-accuracy prediction of that risk during pregnancy. The gene markers and prediction model of the present invention have high specificity and sensitivity for predicting the risk of preeclampsia or related diseases, and can identify that risk in a pregnant woman with high accuracy early in pregnancy, so that intervention can begin as early as possible.
In the context of the present invention, preeclampsia-related diseases include a systolic blood pressure ≥140 mmHg and/or a diastolic blood pressure ≥90 mmHg arising in a pregnant woman after 20 weeks of gestation, accompanied by any one of the following: quantitative urine protein ≥0.3 g/24 h, a urine protein/creatinine ratio ≥0.3, or random urine protein ≥(+) (the test used when protein quantification is not available); or, in the absence of proteinuria, involvement of any one of the following organs or systems: vital organs such as the heart, lungs, liver, and kidneys; abnormal changes in the hematologic, digestive, or nervous system; or placental-fetal involvement. The FIGO guidelines classify preeclampsia into four types according to the time of diagnosis and the time of delivery: early-onset, late-onset, preterm, and term preeclampsia.
On the basis of these findings, the applicant proposes the technical solution of this application. In a typical embodiment, there is provided a gene marker or combination thereof for detecting whether a pregnant woman has preeclampsia or a related disease, or for predicting her risk or prognosis of preeclampsia or a related disease, which includes one or more of the following genes: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, and ZNF224.
This application is the first to find that gene markers in biological samples from pregnant women (including one or more of EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, and ZNF224) are significantly correlated with preeclampsia or related diseases in pregnant women, and can therefore serve as markers for detecting whether a pregnant woman has preeclampsia or a related disease or for predicting preeclampsia or a related disease. Where disease is predicted, medical intervention can be given to the affected pregnant woman, and the prognosis of preeclampsia or the related disease can be monitored through the expression profiles of these gene markers or their combinations.
In the present invention, the above gene marker or combination thereof further includes one or more of ATF6, ATP6AP2, FOS, and RASA2.
The genes listed above can be used alone or in combination. For example, each gene marker can be used on its own, or any two or more of them can be combined. In particular, the combination of all of the following genes can be used as the gene marker: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, ZNF224, ATF6, ATP6AP2, FOS, and RASA2, thereby achieving risk prediction for preeclampsia or related diseases.
In a second typical embodiment, the present invention provides a reagent for detecting the gene marker or combination thereof described above. The reagent includes a biomolecule that specifically hybridizes to the gene marker or combination thereof, or to its expression products; preferably, the biomolecule includes one or more selected from primers, probes, and antibodies; more preferably, the reagent further includes reagents for preparing the RNA of the gene marker or combination thereof into a high-throughput sequencing library.
The detection reagent for the gene marker or combination thereof preferably includes probes and/or primers for detecting the gene marker or combination thereof, specifically one or more probes that specifically bind (hybridize) to the gene marker and/or one or more primers that specifically amplify the gene marker.
Because RNA sequencing usually includes a reverse transcription step that generates cDNA molecules for sequencing, when RNA sequencing is used the detection reagent of the present invention preferably includes reagents for converting the RNA in the biological sample into a library of cDNA fragments.
In a third typical embodiment, there is provided a method for detecting whether a pregnant woman has preeclampsia or a related disease, or for predicting her risk or prognosis of preeclampsia or a related disease, the method comprising:
Step S1: providing a biological sample from the pregnant woman;
Step S2: determining the expression profile, in the biological sample, of the gene marker or combination thereof of the present invention;
Step S3: based on the expression profile of the gene marker or combination thereof, identifying whether the pregnant woman has preeclampsia or a related disease, her risk of developing preeclampsia or a related disease, or her prognosis for preeclampsia or a related disease.
In the above method, the identification in step S3 can be carried out in the conventional way, by comparison with healthy controls, or by using a computer prediction model.
In the conventional approach, the identification in step S3 may include a step of comparing the expression profile with a reference data set or reference value; preferably, the reference data set or reference value includes the expression profile of the gene marker or combination thereof in biological samples from healthy control pregnant women. In the present invention, the reference data set or reference value is the expression profile of each gene marker obtained by processing samples from healthy control individuals, that is, the reference or normal value of healthy controls, and it serves as the reference or control for the expression profile of the gene markers. Those skilled in the art know that, when the sample size is large enough, the range of normal (absolute) values of each gene marker in a sample can be obtained with detection and calculation methods well known in the art. When an assay is used to measure the level of a biomarker, the absolute level of the gene marker in the sample can be compared directly with the reference value to assess disease risk and to diagnose, or diagnose early, preeclampsia or related diseases. In a specific embodiment, when compared with the reference data set or reference value, a decrease in PAAF1 indicates that the pregnant woman has preeclampsia or a related disease, is at risk of developing preeclampsia or a related disease, or has a poor prognosis; an increase in ATF6, ATP6AP2, EVI2B, FOS, MARCH7, MED21, NEMF, RASA2, SNX14, SRSF7, TMEM245, TRIB2, or ZNF224 indicates that the pregnant woman has preeclampsia or a related disease, is at risk of developing preeclampsia or a related disease, or has a poor prognosis.
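The reference-based comparison described above can be illustrated with a minimal sketch. It assumes that each marker has a healthy-control reference range expressed as a (low, high) pair; the function name `flag_markers`, the range format, and the example values are hypothetical and are not part of the specification. Only the direction of change per gene is taken from the text.

```python
# Direction of change associated with preeclampsia, as stated in the text.
DECREASED_IN_DISEASE = {"PAAF1"}
INCREASED_IN_DISEASE = {
    "ATF6", "ATP6AP2", "EVI2B", "FOS", "MARCH7", "MED21", "NEMF",
    "RASA2", "SNX14", "SRSF7", "TMEM245", "TRIB2", "ZNF224",
}


def flag_markers(sample_levels, reference_ranges):
    """Return markers whose level lies outside the healthy reference
    range in the direction the text associates with disease.

    sample_levels: dict gene -> measured level (hypothetical units)
    reference_ranges: dict gene -> (low, high) from healthy controls
                      (assumed format, for illustration only)
    """
    flagged = []
    for gene, level in sample_levels.items():
        if gene not in reference_ranges:
            continue  # no reference available for this gene
        low, high = reference_ranges[gene]
        if gene in DECREASED_IN_DISEASE and level < low:
            flagged.append(gene)
        elif gene in INCREASED_IN_DISEASE and level > high:
            flagged.append(gene)
    return sorted(flagged)


# Made-up example values, for illustration only.
reference = {"PAAF1": (2.0, 5.0), "FOS": (1.0, 4.0), "TRIB2": (1.0, 4.0)}
sample = {"PAAF1": 1.0, "FOS": 9.0, "TRIB2": 3.0}
flagged = flag_markers(sample, reference)
```

Here PAAF1 falls below its range and FOS rises above its range, so both are flagged, while TRIB2 stays within range and is not.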
As another mode of operation, in a typical embodiment of the present invention, the identification in step S3 of whether the pregnant woman has preeclampsia or a related disease, her risk of developing it, or her prognosis is carried out with a risk prediction model for preeclampsia or related diseases in pregnant women. The risk prediction model is generated by training a computer on the expression profiles of the gene marker or combination thereof in biological samples from pregnant women diagnosed with preeclampsia or a related disease and from healthy control pregnant women.
In one embodiment of the present invention, a training set and a validation set are used to construct and validate the risk prediction model for preeclampsia or related diseases in pregnant women. The training set and validation set have the meanings well known in the art. In one embodiment of the present invention, the training set is a data set of gene marker expression profiles from a given number of samples from diagnosed patients with preeclampsia or related diseases and from healthy controls. The validation set is an independent data set used to test the performance achieved on the training set and the effectiveness of the model.
The computer is trained by a machine learning method. The machine learning method is selected from regression methods, classification methods, or a combination thereof. "Machine learning" generally denotes algorithms that give a computer the ability to learn without being explicitly programmed, including algorithms that learn from data and make predictions about data. The machine learning methods used in the present invention may include random forest, least absolute shrinkage and selection operator (LASSO) logistic regression, regularized logistic regression, XGBoost, decision tree learning, artificial neural networks, deep neural networks, support vector machines, rule-based machine learning, generalized linear models, gradient boosting machines, and the like. Preferably, the machine learning method includes one or more of the following: generalized linear model, gradient boosting machine, random forest, and support vector machine.
In the prediction model, the risk or prognosis of preeclampsia or related diseases can be evaluated and predicted from the risk score that the model calculates automatically. A risk score greater than a threshold indicates that the pregnant woman has preeclampsia or a related disease, is at risk of developing it, or has a poor prognosis. For example, the threshold is 0.5. Specifically, if the risk score is greater than 0.5, the pregnant woman is considered to have preeclampsia or a related disease, to be at risk of developing it (or to be at high risk), or to have a poor prognosis; if the risk score is less than 0.5, the pregnant woman is considered not to have preeclampsia or a related disease, not to be at risk of developing it (or to be at low risk), or to have a good prognosis.
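The threshold rule can be written as a short sketch. The function name `classify_risk` and the label strings are hypothetical. The text does not say how a score exactly equal to the threshold is handled, so treating it as low risk here is an assumption.

```python
def classify_risk(risk_score, threshold=0.5):
    """Map a model risk score to a label using the threshold rule.

    A score strictly greater than the threshold is labeled high risk.
    Assumption: a score equal to the threshold is labeled low risk,
    because the text only covers "greater than" and "less than".
    """
    return "high risk" if risk_score > threshold else "low risk"


labels = [classify_risk(score) for score in (0.12, 0.50, 0.87)]
```

The default of 0.5 follows the example threshold in the text; a different threshold can be passed if the model is calibrated differently.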
The biological sample from the pregnant woman can be one or more of the following: plasma, serum, whole blood, urine, and amniotic fluid. Plasma, serum, or whole blood from the pregnant woman is preferably used for the detection and identification steps of the present invention. The biological sample is most preferably plasma; for example, peripheral blood can be drawn from the pregnant woman and the plasma separated to obtain the plasma sample to be used. Besides plasma, serum, or whole blood, other body fluid samples such as urine and amniotic fluid can also be used. The biological sample can be obtained by methods conventional in the art.
In the present invention, the biological sample can be collected between the 11th and 25th weeks of gestation (preferably between the 13th and 25th weeks). Because the specific gene markers above are used as predictors, the population to which the present invention applies need not be divided into high-risk and other pregnant women, and the invention can be applied to the general population of asymptomatic pregnant women. With these gene markers, the present invention can predict preeclampsia or related diseases early in pregnancy, before symptoms appear (up to 18 weeks in advance). The method of the present invention therefore applies to a broader population and has greater clinical applicability.
In step S2 of the above method, the expression profile of the gene marker or combination thereof is determined by quantitatively analyzing cell-free RNA (cfRNA) in the biological sample; preferably, the cfRNA in the biological sample is quantitatively analyzed by high-throughput sequencing or RT-PCR; more preferably, it is quantitatively analyzed by high-throughput sequencing (such as next-generation sequencing).
Specifically, the cfRNA in the biological sample can be extracted with methods or kits commonly used in the art, or a combination of the two. For example, cfRNA can be extracted from a plasma sample following the standard TRIzol LS RNA extraction procedure.
In a specific embodiment, the quantitative analysis of cfRNA uses a cfRNA library construction method that captures both long and short RNA fragments in plasma, providing more features for prediction. The cfRNA can be sequenced by whole-transcriptome sequencing [13], using next-generation sequencing to sequence the cfRNA in the biological sample (preferably plasma) from the pregnant woman. RT-PCR can also be used for the analysis. Other methods known in the art, such as qPCR, can also be used to quantify the cfRNA expression profile.
Preferably, the quantitative analysis of cfRNA further includes a quality-control step applied to the raw cfRNA sequencing data, preferably comprising: trimming adapters; removing low-quality reads; removing reads shorter than 17 bp; removing rRNA sequences and vault RNA and Y RNA sequences; aligning the remaining reads first to the human transcriptome (in the order miRNA, tRNA and piRNA, then mRNA and lncRNA, and finally other RNA); and then aligning the reads that remain to the human genome.
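The first three quality-control steps (adapter trimming, low-quality filtering, and the 17 bp length cutoff) can be sketched in plain Python. Only the 17 bp cutoff comes from the text. The adapter sequence, the mean-quality cutoff of 20, and all function names are illustrative assumptions, and a real pipeline would normally use a dedicated trimming tool rather than this toy code.

```python
MIN_LENGTH = 17              # from the text: reads shorter than 17 bp are removed
ADAPTER = "AGATCGGAAGAGC"    # assumption: placeholder adapter sequence
MIN_MEAN_QUALITY = 20        # assumption: illustrative Phred cutoff


def trim_adapter(seq, quals, adapter=ADAPTER):
    """Cut the read at the first exact adapter occurrence, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, quals
    return seq[:idx], quals[:idx]


def passes_qc(seq, quals):
    """Keep reads that are long enough and have adequate mean quality."""
    if len(seq) < MIN_LENGTH:
        return False
    return sum(quals) / len(quals) >= MIN_MEAN_QUALITY


def filter_reads(reads):
    """reads: iterable of (sequence, list of Phred scores) pairs."""
    kept = []
    for seq, quals in reads:
        seq, quals = trim_adapter(seq, quals)
        if passes_qc(seq, quals):
            kept.append((seq, quals))
    return kept


reads = [
    ("C" * 20 + ADAPTER + "GGG", [30] * 36),  # 20 bp after trimming: kept
    ("ACGT" * 3 + ADAPTER, [30] * 25),        # 12 bp after trimming: dropped
    ("C" * 25, [10] * 25),                    # low mean quality: dropped
]
kept = filter_reads(reads)
```

The later steps in the text (removing rRNA, vault RNA, and Y RNA, then ordered alignment to the transcriptome and genome) depend on an aligner and reference indexes and are not sketched here.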
With the gene markers of the present invention serving as markers for predicting the risk of preeclampsia or related diseases in pregnant women, a prediction kit targeting the gene markers or their combinations can be prepared according to established principles of kit preparation. Detection probes, chips, and the like for predicting the risk of preeclampsia or related diseases in pregnant women can also be prepared for these gene markers.
By using specific gene markers as detection targets, and based on the association between the expression profiles of these gene markers and preeclampsia or related diseases in pregnant women, the present invention achieves risk prediction for preeclampsia or related diseases in pregnant women with high specificity and high sensitivity.
In a fourth typical embodiment, the present invention provides a kit, which may include the gene marker or combination thereof of the present invention described above.
In a preferred embodiment, the present invention provides a kit for predicting the risk or prognosis of preeclampsia or related diseases in pregnant women. The kit includes detection reagents for gene markers, and the gene markers include one or more of the following genes: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, and ZNF224. Using a kit makes prediction more convenient, simpler, and faster. Preferably, the gene markers further include one or more of the following genes: ATF6, ATP6AP2, FOS, and RASA2. In the kit, the detection reagents for the gene markers may include probes and/or primers for detecting the gene markers, specifically one or more probes that specifically bind (hybridize) to the gene markers and/or one or more primers that specifically amplify the gene markers. Because RNA sequencing usually includes a reverse transcription step that generates cDNA molecules for sequencing, when RNA sequencing is used the kit of the present invention may also include reagents for converting the RNA in the biological sample into a library of cDNA fragments.
In a fifth typical embodiment, there is provided use of the above gene marker or combination thereof and/or the above detection reagent in the preparation of a kit, wherein the kit is used for detecting whether a pregnant woman has preeclampsia or a related disease, or for predicting her risk or prognosis of preeclampsia or a related disease.
In a sixth typical embodiment, the present invention provides a device for detecting whether a pregnant woman has preeclampsia or a related disease, or for predicting her risk or prognosis of preeclampsia or a related disease. The device has a built-in risk prediction model for preeclampsia or related diseases in pregnant women, and the prediction model is generated by training a computer on the expression profiles of the above gene marker or combination thereof in biological samples from pregnant women diagnosed with preeclampsia or a related disease. In a preferred embodiment, the prediction model is a generalized linear model, gradient boosting machine, random forest, or support vector machine model.
In a seventh typical embodiment, there is provided a method for constructing a model for detecting whether a pregnant woman has preeclampsia or a related disease, or for predicting her risk or prognosis of preeclampsia or a related disease. The construction method comprises a step of detecting substances that are differentially expressed between biological samples from a population of pregnant women with preeclampsia or a related disease and a population of pregnant women without preeclampsia or a related disease, wherein the differentially expressed substances include the gene marker or combination thereof of the present invention described above.
In a specific embodiment, the construction method comprises: detecting differential expression of gene markers in biological samples from a population of pregnant women with preeclampsia or a related disease and a population of pregnant women without preeclampsia or a related disease; taking part of the population with preeclampsia or a related disease and part of the population without it as a training set, and using the training set to screen out the best gene markers; training a computer on the training set with the best gene markers to obtain the risk prediction model for preeclampsia or related diseases in pregnant women; and taking the remainder of the population with preeclampsia or a related disease and the remainder of the population without it as a validation set, and using the validation set to validate the risk prediction model. The best gene markers include one or more of the following genes: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, and ZNF224. Preferably, the best gene markers further include one or more of the following genes: ATF6, ATP6AP2, FOS, and RASA2.
The biological sample used in the model construction method of the present invention is preferably one or more of the following: plasma, serum, whole blood, urine, and amniotic fluid; particularly preferably plasma, serum, or whole blood; most preferably plasma. The biological sample can be collected from the pregnant woman between the 11th and 25th weeks of gestation.
A machine learning method can be used when training the computer in the present invention; preferably, the machine learning method includes one or more of the following: generalized linear model, gradient boosting machine, random forest, and support vector machine.
In the model construction method of the present invention, the training set and the validation set may be split in any suitable ratio as needed. Preferably, all pregnant women with preeclampsia or related diseases are randomly split into a training set and a validation set at a 2:1 ratio by headcount, and all pregnant women without preeclampsia or related diseases are likewise randomly split into a training set and a validation set at a 2:1 ratio. The best gene markers are screened in the training set, and the validation set is used to test the predictive performance of the best gene markers and the model.
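The per-group 2:1 split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function name `stratified_split`, the label strings, the seed, and the sample counts are all hypothetical.

```python
import random


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample indices into training and validation sets, group by group.

    labels: one group label per sample (hypothetical strings such as
    "PE" and "control"). Each group is shuffled and split separately,
    so both sets keep roughly the same case/control mix.
    """
    rng = random.Random(seed)
    by_group = {}
    for index, label in enumerate(labels):
        by_group.setdefault(label, []).append(index)

    train, validation = [], []
    for label in sorted(by_group):
        indices = by_group[label][:]
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train.extend(indices[:n_train])
        validation.extend(indices[n_train:])
    return sorted(train), sorted(validation)


# Hypothetical cohort: 30 cases and 33 controls.
labels = ["PE"] * 30 + ["control"] * 33
train_idx, val_idx = stratified_split(labels)
```

Splitting each group separately, rather than shuffling the pooled cohort, keeps a small validation set from ending up with too few cases by chance.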
In a preferred embodiment, candidate gene markers are preliminarily screened by comparing the gene expression profiles of the population of pregnant women with preeclampsia or related diseases and the population without preeclampsia or related diseases. This step can be carried out, for example, with the DESeq2 package (an R package). For each gene, the difference in mean expression between the two populations and its stability are considered in this step (preferably a fold difference in mean expression greater than 1 and a p value less than 0.001), and the genes that pass this screen become candidate gene markers. Two models can then be used to screen the candidates by feature importance, and the gene markers that are selected most frequently are chosen as the best gene markers. Using the two models together helps ensure the stability of the selected features. Preferably, a generalized linear model and a random forest are run with 7-fold cross-validation to pick out several (for example, 30) of the most important molecules; this step can be iterated 100 times, and among the molecules selected with a frequency above 50%, those ranked highest in importance are chosen as the best gene markers.
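The frequency-based step (repeat the ranking many times, keep markers that recur in more than 50% of runs) can be sketched as below. This is an illustrative sketch only: it assumes each iteration has already produced an importance-ordered list of markers, the function name `select_stable_markers` is hypothetical, `GENE_X` and `GENE_Y` are placeholder names, and the final ordering here is by frequency rather than by model importance.

```python
from collections import Counter


def select_stable_markers(ranked_lists, top_n=30, min_frequency=0.5):
    """Keep markers found in the top_n of more than min_frequency of runs.

    ranked_lists: one list per iteration, most important marker first.
    Returns (marker, frequency) pairs, most frequent first.
    """
    counts = Counter()
    for ranking in ranked_lists:
        counts.update(set(ranking[:top_n]))
    n_runs = len(ranked_lists)
    stable = [
        (marker, count / n_runs)
        for marker, count in counts.items()
        if count / n_runs > min_frequency
    ]
    return sorted(stable, key=lambda item: (-item[1], item[0]))


# Toy input: 4 iterations, keeping the top 2 markers of each.
runs = [
    ["PAAF1", "FOS", "GENE_X"],
    ["FOS", "PAAF1"],
    ["PAAF1", "GENE_Y"],
    ["GENE_X", "PAAF1"],
]
selected = select_stable_markers(runs, top_n=2)
```

In this toy input only PAAF1 appears in more than half of the runs; FOS appears in exactly half and is therefore excluded by the strict "above 50%" rule.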
In a preferred embodiment, in the training set, four machine learning methods (generalized linear model, gradient boosting machine, random forest, and support vector machine) are used to predict the risk of preeclampsia or related diseases based on the finally selected best gene markers. Preferably, each method uses 7-fold cross-validation to select the optimal parameters for building the prediction model. The resulting models can then be evaluated in the validation set.
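Parameter selection by 7-fold cross-validation can be sketched as below. To stay dependency-free, the sketch swaps in a one-dimensional k-nearest-neighbours classifier as a stand-in for the four methods named in the text; the data, the parameter grid, and all function names are hypothetical.

```python
import random


def kfold_indices(n_samples, k=7, seed=0):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training samples."""
    nearest = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    votes = sum(train_y[j] for j in nearest[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0


def cv_accuracy(x, y, n_neighbors, k=7):
    """Mean held-out accuracy over k folds for one parameter value."""
    fold_scores = []
    for train, test in kfold_indices(len(x), k):
        train_x = [x[j] for j in train]
        train_y = [y[j] for j in train]
        hits = sum(
            knn_predict(train_x, train_y, x[j], n_neighbors) == y[j]
            for j in test
        )
        fold_scores.append(hits / len(test))
    return sum(fold_scores) / len(fold_scores)


# Hypothetical, well-separated expression values: 14 controls, 14 cases.
x = [0.1 * i for i in range(14)] + [5.0 + 0.1 * i for i in range(14)]
y = [0] * 14 + [1] * 14
grid = [1, 3, 5]
cv_scores = {n: cv_accuracy(x, y, n) for n in grid}
best_n = max(grid, key=lambda n: cv_scores[n])
```

Each candidate parameter is scored only on samples held out from fitting, so the chosen value is not tuned to the same samples it is judged on; the separate validation set then remains untouched for the final check.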
Preferably, the best-performing model can be selected on the basis of its performance in the validation set, and feature importance can then be calculated.
In a preferred embodiment, the prediction model constructed by the method of the present invention can predict the risk of preeclampsia or related diseases in early pregnancy, up to 18 weeks in advance, and non-invasively, requiring only a sample of the pregnant woman's peripheral blood. Its predictive performance (sensitivity and specificity) is higher than that of the prior art.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in a different order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software together with the necessary hardware, such as detection instruments. On this understanding, the data-processing part of the technical solution of the present application can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments of the present application or of certain parts of the embodiments.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
Those skilled in the art should understand that some of the modules or steps of the present application described above can be implemented on a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present application is therefore not limited to any specific combination of hardware and software.
In a preferred embodiment, a computer-readable storage medium is provided. The storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the above method for detecting whether a pregnant woman suffers from preeclampsia or related diseases, or for predicting the risk or prognosis of preeclampsia or related diseases in a pregnant woman, or to execute the above method for constructing a model for such detection or prediction.
In a preferred embodiment, a processor is provided. The processor is configured to run a program, wherein, when the program runs, it executes the above method for detecting whether a pregnant woman suffers from preeclampsia or related diseases, or for predicting the risk or prognosis of preeclampsia or related diseases in a pregnant woman, or executes the above method for constructing a model for such detection or prediction.
In other typical embodiments, the present invention also provides the use of gene markers as targets for screening drugs for treating or preventing preeclampsia or related diseases in pregnant women, wherein the gene markers include the gene markers described above or combinations thereof.
In addition, the present invention provides the use of gene markers in detecting whether a pregnant woman suffers from preeclampsia or related diseases, or in predicting the risk or prognosis of preeclampsia or related diseases in a pregnant woman, wherein the gene markers include the gene markers described above or combinations thereof.
The present invention also provides a drug for treating or preventing preeclampsia or related diseases in pregnant women, characterized in that the drug increases the expression of PAAF1 in the pregnant woman, or the drug reduces the expression in the pregnant woman of one or more of the following genes: ATF6, ATP6AP2, EVI2B, FOS, MARCH7, MED21, NEMF, RASA2, SNX14, SRSF7, TMEM245, TRIB2, ZNF224.
The beneficial effects of the present application are further illustrated below with reference to specific examples.
Example 1
(1) Collection of plasma samples from pregnant women
Peripheral blood from 64 women with singleton pregnancies was obtained from the hospital; blood was collected at 13 to 25 weeks of gestation. The cohort comprised 31 women with preeclampsia and 33 women without preeclampsia (see Table 1 for data on the cohort). For samples in the preeclampsia group, the interval between blood collection and the diagnosis of preeclampsia was 18 gestational weeks. All blood samples were stored at 4°C immediately after collection, and plasma was separated within 8 hours. Plasma was separated by two-step centrifugation at 4°C: 10 minutes at 1,600 × g, followed by 10 minutes at 12,000 × g. After separation, the plasma was immediately stored at -80°C until further processing.
(2) Extraction of cell-free RNA (cfRNA)
TRIzol LS was added to the plasma and mixed immediately by vortexing. The subsequent cfRNA extraction steps followed the standard TRIzol LS RNA extraction protocol.
(3) Sequencing of cfRNA
Libraries were constructed using a library preparation method for cell-free RNA. The cfRNA was sequenced by whole-transcriptome sequencing: plasma samples from the preeclampsia and non-preeclampsia groups were sequenced using next-generation sequencing.
(4) Quantification of cfRNA expression profiles
Quality control was performed on the raw cfRNA sequencing data, including trimming adapters, removing low-quality reads, removing reads shorter than 17 bp, and removing rRNA, vault RNA, and Y RNA sequences. The remaining reads were aligned to the human transcriptome (in the order miRNA, tRNA and piRNA; then mRNA and lncRNA; and finally other RNAs), and reads still unaligned were then aligned to the human genome. The expression of long RNAs (including mRNA and lncRNA) was normalized to TPM using the following formula:
TPM = (Ni/Li) × 1,000,000 / (N1/L1 + N2/L2 + N3/L3 + … + Nn/Ln)
where Ni is the number of reads aligned to the i-th gene; Li is the length of the i-th gene; and the denominator N1/L1 + N2/L2 + … + Nn/Ln is the sum, over all n genes, of the read counts normalized by gene length.
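The TPM formula above translates directly into code. The sketch below is illustrative: the function name `tpm`, the gene names `GENE_A` and `GENE_B`, and the counts and lengths are hypothetical.

```python
def tpm(read_counts, gene_lengths):
    """Transcripts per million from read counts and gene lengths.

    read_counts: gene -> number of aligned reads (Ni).
    gene_lengths: gene -> gene length (Li), in the same unit for all genes.
    """
    rates = {gene: read_counts[gene] / gene_lengths[gene] for gene in read_counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads aligned to any gene")
    return {gene: rate * 1_000_000 / total for gene, rate in rates.items()}


# Hypothetical two-gene example.
counts = {"GENE_A": 100, "GENE_B": 300}
lengths = {"GENE_A": 1000, "GENE_B": 1500}
values = tpm(counts, lengths)
```

Here GENE_B has three times the reads of GENE_A but only twice the length-normalized rate, so its TPM is twice that of GENE_A, and the values of all genes in a sample always sum to one million.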
(5) Screening of the best gene markers
The preeclampsia group and the non-preeclampsia group were each randomly split into a training set and a validation set at a ratio of 2:1. The training set contained 21 preeclampsia samples and 23 non-preeclampsia samples, and the validation set contained 10 preeclampsia samples and 10 non-preeclampsia samples. Gene markers were screened in the training set, and the validation set was used to test the predictive performance of the gene markers and the model. See Table 1 for data on the cohort.
Table 1: Data on the preeclampsia group (cases) and the non-preeclampsia group (controls) in Example 1
Figure PCTCN2021136842-appb-000001
Candidate gene markers were preliminarily screened by comparing the expression profiles of the preeclampsia and non-preeclampsia groups, using the DESeq2 package (an R package). For each gene, the difference in mean expression between the two groups and its stability were considered in this step (fold difference in mean expression greater than 1, p value less than 0.001), and the genes that passed this screen became candidate gene markers. A generalized linear model and a random forest were then used to screen the candidates by feature importance, and the gene markers selected most frequently were chosen as the best gene markers.
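The candidate filter on the differential expression results can be sketched as below. The text does not say whether the fold threshold applies to the raw or the log2 fold change, nor whether the p value is raw or adjusted; this sketch assumes an absolute log2 fold change above 1 and the raw p value, using the `log2FoldChange` and `pvalue` column names of a DESeq2 results table exported to Python. The function name, the gene names, and the numbers are hypothetical.

```python
def filter_candidates(results, min_abs_log2fc=1.0, max_pvalue=0.001):
    """Return genes passing the fold-change and p-value thresholds.

    results: list of dicts with keys "gene", "log2FoldChange", "pvalue".
    Rows with a missing p value (None) are skipped.
    """
    kept = []
    for row in results:
        if row["pvalue"] is None:
            continue
        passes_fold = abs(row["log2FoldChange"]) > min_abs_log2fc
        passes_p = row["pvalue"] < max_pvalue
        if passes_fold and passes_p:
            kept.append(row["gene"])
    return kept


# Hypothetical rows; not values from the study.
table = [
    {"gene": "GENE_A", "log2FoldChange": 1.8, "pvalue": 0.0002},
    {"gene": "GENE_B", "log2FoldChange": -1.4, "pvalue": 0.0004},
    {"gene": "GENE_C", "log2FoldChange": 0.6, "pvalue": 0.00001},
    {"gene": "GENE_D", "log2FoldChange": 2.5, "pvalue": 0.02},
    {"gene": "GENE_E", "log2FoldChange": 3.0, "pvalue": None},
]
candidates = filter_candidates(table)
```

Taking the absolute value keeps both up-regulated and down-regulated genes, which matches the mix of "Up" and "Down" markers reported for the final panel.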
The best gene markers obtained by screening in this example are shown in Table 2 below.
Table 2: Gene and transcript information for the best gene markers screened in Example 1
Figure PCTCN2021136842-appb-000002
The specific sequence information for the above gene markers can be retrieved from GenBank using the sequence accession numbers. In the table above, "Up" indicates that expression of the corresponding gene is increased in pregnant women with preeclampsia or related diseases compared with healthy controls, and "Down" indicates that expression of the corresponding gene is decreased in pregnant women with preeclampsia or related diseases compared with healthy controls.
(6) Model construction and validation based on the best gene markers
In the training set, four machine learning algorithms (generalized linear model, gradient boosting machine, random forest, and support vector machine) were used to predict the risk of preeclampsia based on the finally selected best gene markers. Each algorithm used 7-fold cross-validation to select the optimal parameters for building the prediction model. The resulting models were evaluated in the validation set, the best-performing one (the random forest model) was selected as the optimal model, and feature importance was calculated.
(7) Evaluation of the performance of the gene markers in predicting preeclampsia risk
By calculating feature importance, 14 best gene markers were finally identified. The combination of these 14 gene markers achieved the best predictive performance in the validation set, with a sensitivity of 80%, a specificity of 100%, and an AUC (area under the receiver operating characteristic curve) of 93% (as shown in Figure 1).
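The three reported metrics can be computed from per-sample risk scores as sketched below. The scores are made up for illustration (they are not the study's data), the function names are hypothetical, and the 0.5 cut-off follows the preferred threshold stated in the claims. The AUC is computed as the fraction of case/control pairs in which the case scores higher, counting ties as half.

```python
def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when score > threshold is called positive."""
    true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return true_pos / n_pos, true_neg / n_neg


def auc(scores, labels):
    """Area under the ROC curve via pairwise case/control comparisons."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up risk scores for 10 cases (label 1) and 10 controls (label 0).
case_scores = [0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.45, 0.2]
control_scores = [0.4, 0.35, 0.3, 0.3, 0.25, 0.2, 0.15, 0.1, 0.1, 0.05]
scores = case_scores + control_scores
labels = [1] * 10 + [0] * 10

sens, spec = sensitivity_specificity(scores, labels)
area = auc(scores, labels)
```

With a validation set of 10 cases and 10 controls, a sensitivity of 80% corresponds to 8 of 10 cases flagged and a specificity of 100% to no control flagged, which is the pattern these made-up scores reproduce at the 0.5 threshold.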
Example 2
Based on the 14 best gene markers finally selected in Example 1, random combinations of these markers, as well as the individual gene markers EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, and ZNF224, were evaluated for predictive performance in the validation set described above. The gene markers or combinations thereof and their corresponding predictive performance are shown in Table 3.
Table 3: Model results for gene markers or combinations thereof in the prediction of preeclampsia
Figure PCTCN2021136842-appb-000003
The above results show that the foregoing examples of the present invention achieve the following technical effects. Using a combination of multiple mRNA gene markers of the present invention in plasma, together with a machine learning model, preeclampsia can be predicted up to 18 weeks in advance. The invention requires only peripheral blood from the pregnant woman, so the risk of preeclampsia can be predicted non-invasively. The gene markers of the present invention can be used alone or in combination. When used alone, the predictive sensitivity and specificity of the gene markers each reach at least 40% and up to 80%, which is higher than the preeclampsia prediction performance of single gene markers in the prior art. When randomly combined, the gene markers achieve a predictive sensitivity and specificity of more than 70%, with an area under the receiver operating characteristic curve (AUC) above 0.92 in the training set and above 0.82 in the validation set, all higher than the state of the art. In the optimal embodiment (the results shown in Example 1), the sensitivity reaches 80% and the specificity reaches 100%, with an AUC above 0.98 in the training set and above 0.93 in the validation set, far higher than the state of the art. The method of the present invention is applicable to the general population of asymptomatic pregnant women, regardless of whether they are at high risk, and can predict the disease before symptoms appear; it therefore covers a wider population and has greater clinical utility. Data validation shows that the prediction model of the present invention has relatively high accuracy and is suitable for the early prediction of preeclampsia in pregnant women, enabling intervention as early as possible.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may be modified and varied in various ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (18)

  1. A gene marker or combination thereof for detecting whether a pregnant woman suffers from preeclampsia or a related disease, or for predicting the risk or prognosis of preeclampsia or a related disease in a pregnant woman, characterized in that it comprises one or more of the following genes: EVI2B, MARCH7, MED21, NEMF, PAAF1, SNX14, SRSF7, TMEM245, TRIB2, ZNF224.
  2. The gene marker or combination thereof according to claim 1, characterized in that it further comprises one or more of the following genes: ATF6, ATP6AP2, FOS, RASA2.
  3. A reagent for detecting the gene marker or combination thereof according to claim 1 or 2, characterized in that the reagent comprises a biomolecule that specifically hybridizes with the gene marker or combination thereof according to claim 1 or 2, or with an expression product thereof;
    preferably, the biomolecule comprises one or more selected from primers, probes, and antibodies;
    more preferably, the reagent further comprises reagents for preparing a high-throughput sequencing library from the RNA of the gene marker or combination thereof according to claim 1 or 2.
  4. A method for detecting whether a pregnant woman suffers from preeclampsia or a related disease, or for predicting the risk or prognosis of preeclampsia or a related disease in a pregnant woman, characterized in that the method comprises:
    Step S1: providing a biological sample derived from the pregnant woman;
    Step S2: determining the expression profile, in the biological sample, of the gene marker or combination thereof according to claim 1 or 2;
    Step S3: based on the expression profile of the gene marker or combination thereof, identifying whether the pregnant woman suffers from preeclampsia or a related disease, her risk of suffering from preeclampsia or a related disease, or her prognosis for preeclampsia or a related disease.
  5. The method according to claim 4, characterized in that, in step S3, identifying whether the pregnant woman suffers from preeclampsia or a related disease, her risk of suffering from preeclampsia or a related disease, or her prognosis for preeclampsia or a related disease is carried out using a risk prediction model for preeclampsia or related diseases in pregnant women, the risk prediction model being generated by training a computer with the expression profiles of the gene marker or combination thereof in biological samples derived from pregnant women diagnosed with preeclampsia or related diseases and from healthy control pregnant women.
  6. The method according to claim 5, characterized in that the training of the computer is carried out by a machine learning method;
    preferably, the machine learning method comprises one or more of the following: generalized linear model, gradient boosting machine, random forest, support vector machine;
    preferably, the machine learning method automatically calculates a risk score;
    preferably, a risk score greater than a threshold indicates that the pregnant woman suffers from preeclampsia or a related disease, is at risk of suffering from preeclampsia or a related disease, or has a poor prognosis;
    preferably, the threshold is 0.5.
  7. The method according to claim 4, characterized in that the biological sample is one or more of the following: plasma, serum, whole blood, urine, amniotic fluid; preferably, the biological sample is collected from the pregnant woman at 11 to 25 weeks of gestation.
  8. The method according to claim 4, characterized in that, in step S2, the expression profile of the gene marker or combination thereof is determined by quantitative analysis of the cell-free RNA in the biological sample;
    preferably, the cell-free RNA in the biological sample is quantitatively analyzed by high-throughput sequencing or RT-PCR;
    more preferably, the cell-free RNA in the biological sample is quantitatively analyzed by high-throughput sequencing.
  9. A kit, characterized in that it comprises the gene marker or combination thereof according to claim 1 or 2, and/or the reagent according to claim 3.
  10. Use of the gene marker or combination thereof according to claim 1 or 2 and/or the reagent according to claim 3 in the preparation of a kit, the kit being for detecting whether a pregnant woman suffers from preeclampsia or a related disease, or for predicting the risk or prognosis of preeclampsia or a related disease in a pregnant woman.
  11. A device for detecting whether a pregnant woman suffers from preeclampsia or a related disease, or for predicting the risk or prognosis of preeclampsia or a related disease in a pregnant woman, characterized in that the device has a built-in risk prediction model for preeclampsia or related diseases in pregnant women, the prediction model being generated by training a computer with the expression profiles of the gene marker or combination thereof according to claim 1 or 2 in biological samples derived from pregnant women diagnosed with preeclampsia or related diseases.
  12. A method for constructing a model for detecting whether a pregnant woman suffers from preeclampsia or a related disease, or for predicting the risk or prognosis of preeclampsia or a related disease in a pregnant woman, characterized in that the construction method comprises a step of detecting substances differentially expressed between biological samples derived from a population of pregnant women with preeclampsia or related diseases and a population of pregnant women without preeclampsia or related diseases, wherein the differentially expressed substances include the gene marker or combination thereof according to claim 1 or 2.
  13. The construction method according to claim 12, characterized in that the biological sample is one or more of the following: plasma, serum, whole blood, urine, and amniotic fluid; preferably, the biological sample is collected from the pregnant woman at gestational weeks 11 to 25.
  14. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, the device in which the storage medium is located is controlled to execute the method according to any one of claims 4 to 8 for detecting whether a pregnant woman has preeclampsia or a related disease or for predicting the risk or prognostic outcome of preeclampsia or a related disease in a pregnant woman, or the model construction method according to any one of claims 12 to 13.
  15. A processor, characterized in that the processor is configured to run a program, wherein, when the program runs, it executes the method according to any one of claims 4 to 8 for detecting whether a pregnant woman has preeclampsia or a related disease or for predicting the risk or prognostic outcome of preeclampsia or a related disease in a pregnant woman, or the model construction method according to any one of claims 12 to 13.
  16. Use of a gene marker as a target for screening drugs for treating or preventing preeclampsia or a related disease in pregnant women, wherein the gene marker includes the gene marker or combination thereof according to claim 1 or 2.
  17. Use of a gene marker in detecting whether a pregnant woman has preeclampsia or a related disease, or in predicting the risk or prognostic outcome of preeclampsia or a related disease in a pregnant woman, wherein the gene marker includes the gene marker or combination thereof according to claim 1 or 2.
  18. A drug for treating or preventing preeclampsia or a related disease in a pregnant woman, characterized in that the drug is capable of increasing the expression of PAAF1 in the pregnant woman; or the drug is capable of reducing the expression, in the pregnant woman, of one or more of the following genes: ATF6, ATP6AP2, EVI2B, FOS, MARCH7, MED21, NEMF, RASA2, SNX14, SRSF7, TMEM245, TRIB2, and ZNF224.
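
The differential-expression step recited in claim 12 can be illustrated with a minimal sketch. This is not the applicants' method: the log2 fold-change criterion, the pseudocount, the cutoff of 1.0, the function names, and the placeholder gene names and values below are all assumptions chosen for illustration only.

```python
import math
from statistics import mean


def log2_fold_change(case_values, control_values, pseudocount=1.0):
    """log2 ratio of mean expression, case vs. control.

    The pseudocount (an assumed value) avoids taking the log of zero.
    """
    return math.log2(
        (mean(case_values) + pseudocount) / (mean(control_values) + pseudocount)
    )


def differential_markers(case, control, min_abs_lfc=1.0):
    """Return {gene: lfc} for genes whose |log2 fold change| meets the cutoff.

    `case` and `control` map gene names to lists of expression values,
    one value per sample. Only genes present in both groups are compared.
    """
    result = {}
    for gene in case.keys() & control.keys():
        lfc = log2_fold_change(case[gene], control[gene])
        if abs(lfc) >= min_abs_lfc:
            result[gene] = lfc
    return result


# Made-up values for illustration only; not data from the application.
case = {"GENE_A": [7.0, 9.0, 8.0], "GENE_B": [3.0, 3.0, 3.0], "GENE_C": [1.0, 0.0, 2.0]}
control = {"GENE_A": [2.0, 2.0, 2.0], "GENE_B": [3.0, 3.0, 3.0], "GENE_C": [7.0, 7.0, 7.0]}
markers = differential_markers(case, control)
```

With these placeholder values, GENE_A and GENE_C pass the cutoff and GENE_B does not. A real analysis would normally add a statistical test and multiple-testing correction before any marker is used to train a prediction model.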
PCT/CN2021/136842 2021-12-09 2021-12-09 Use of gene marker in predicting risk of preeclampsia in pregnant woman WO2023102840A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180102282.4A CN117940583A (en) 2021-12-09 2021-12-09 Application of gene marker in prediction of preeclampsia risk of pregnant women
PCT/CN2021/136842 WO2023102840A1 (en) 2021-12-09 2021-12-09 Use of gene marker in predicting risk of preeclampsia in pregnant woman

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/136842 WO2023102840A1 (en) 2021-12-09 2021-12-09 Use of gene marker in predicting risk of preeclampsia in pregnant woman

Publications (1)

Publication Number Publication Date
WO2023102840A1 (en) 2023-06-15

Family

ID=86729306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/136842 WO2023102840A1 (en) 2021-12-09 2021-12-09 Use of gene marker in predicting risk of preeclampsia in pregnant woman

Country Status (2)

Country Link
CN (1) CN117940583A (en)
WO (1) WO2023102840A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120283125A1 (en) * 2009-11-12 2012-11-08 UNIVERSITé LAVAL Ovarian Markers of Oocyte Competency and Uses Thereof
US20140107991A1 (en) * 2012-10-17 2014-04-17 Michael Elashoff Systems and methods for determining the probability of a pregnancy at a selected point in time
US20150031616A1 (en) * 2013-07-25 2015-01-29 University Of Florida Research Foundation, Inc. Use of relaxin to treat placental syndromes
US20150368714A1 (en) * 2013-03-07 2015-12-24 Juneau Biosciences, Llc Method of Testing for Endometriosis and Treatment Therefor
US20160251718A1 (en) * 2013-10-01 2016-09-01 The Regents Of The University Of California Endometriosis Classifier

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI, RANRAN: "Analysis of Pregnancy Outcomes and Pathogenesis of Preeclampsia", MASTER'S THESIS, no. 1, 1 March 2020 (2020-03-01), CN, pages 1 - 71, XP009546486, DOI: 10.27374/d.cnki.gwnyy.2020.000240 *
ZHAO MEI, LI LIN, YANG XIUMEI, CUI JIANYING, LI HONG: "FN1, FOS, and ITGA5 induce preeclampsia: Abnormal expression and methylation.", HYPERTENSION IN PREGNANCY, vol. 36, no. 4, 19 October 2017 (2017-10-19), pages 302 - 309, XP009546321, ISSN: 1064-1955, DOI: 10.1080/10641955.2017.1385795 *

Also Published As

Publication number Publication date
CN117940583A (en) 2024-04-26

Similar Documents

Publication Publication Date Title
Martin et al. Can the quantity of cell‐free fetal DNA predict preeclampsia: a systematic review
Tarca et al. Maternal whole blood mRNA signatures identify women at risk of early preeclampsia: a longitudinal study
Gerson et al. Low fetal fraction of cell-free DNA predicts placental dysfunction and hypertensive disease in pregnancy
Asgharnia et al. Maternal serum uric acid level and maternal and neonatal complications in preeclamptic women: A cross-sectional study
Morano et al. Cell-free DNA (cfDNA) fetal fraction in early-and late-onset fetal growth restriction
JP2019518225A (en) Method and composition for predicting preterm birth
CN112513633A (en) Circulating biomarkers for placental or fetal health
CN111094988A (en) Pre-eclampsia biomarkers and related systems and methods
Camunas-Soler et al. Predictive RNA profiles for early and very early spontaneous preterm birth
Guo et al. Association between fetal fraction at the second trimester and subsequent spontaneous preterm birth
Xu et al. Non‐invasive prediction of fetal growth restriction by whole‐genome promoter profiling of maternal plasma DNA: a nested case–control study
Han et al. Potential biomarkers for late-onset and term preeclampsia: A scoping review
Naumovic et al. Application of artificial neural networks in estimating predictive factors and therapeutic efficacy in idiopathic membranous nephropathy
Zanello et al. Circulating mRNA for the PLAC1 gene as a second trimester marker (14-18 weeks' gestation) in the screening for late preeclampsia
WO2023102840A1 (en) Use of gene marker in predicting risk of preeclampsia in pregnant woman
Eiben et al. Clinical experience with noninvasive prenatal testing in Germany: Analysis of over 500 high-risk cases for trisomy 21, 18, 13 and monosomy X
US20140127703A1 (en) Method for Diagnosing Preeclampsia
US11255861B2 (en) Method for determining the risk of preterm birth
WO2023102786A1 (en) Application of gene marker in prediction of premature birth risk of pregnant woman
CN118028446A (en) Detection marker and application thereof
WO2023142311A1 (en) Model for predicting tumor tissue source during pregnancy by utilizing plasma free dna and construction method of model
Wilstrup et al. Symbolic regression analysis of interactions between first trimester maternal serum adipokines in pregnancies which develop pre-eclampsia
Han Olivia J. Holland, o. holland@ griffith. edu. au SPECIALTY SECTION This article was submitted to Developmental Physiology
US10802030B2 (en) Systems and methods to predict risk for preterm labor and/or preterm birth
CN116287175A (en) Application of marker in preparation of related products for predicting intrahepatic cholestasis in gestation period

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21966762

Country of ref document: EP

Kind code of ref document: A1