WO2022082436A1 - Method for determining the pregnancy state of a pregnant woman - Google Patents
Method for determining the pregnancy state of a pregnant woman
- Publication number
- WO2022082436A1 (application PCT/CN2020/122214)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pregnant woman
- preterm birth
- gene
- genes
- samples
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6881—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for tissue or cell typing, e.g. human leukocyte antigen [HLA] probes
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
Definitions
- The present invention relates to the field of biotechnology, and in particular to a method and device for determining the pregnancy state of a pregnant woman and a corresponding method and device for constructing a machine learning prediction model.
- Non-invasive prenatal diagnosis based on maternal plasma cfDNA data has gradually become one of the important screening methods for fetal trisomy 21 syndrome.
- Most applications based on maternal plasma cfDNA data focus on detecting fetal polyploidy and copy number variation. So far, there is no effective method for detecting pregnancy syndromes based on maternal cfDNA.
- The fetal fibronectin assay used clinically to assist in diagnosing premature birth suffers from an excessively high false-positive rate.
- Statistics show that among pregnant women who test positive for fetal fibronectin, fewer than 3% are ultimately diagnosed with preterm birth; this high false-positive rate makes the diagnostic method questionable.
- The present invention proposes a method for constructing a prediction model for predicting the pregnancy state of a pregnant woman.
- The method includes: (1) constructing a training set and an optional test set, each composed of a plurality of pregnant women samples with known pregnancy states; (2) for each pregnant woman sample in the training set, determining the predetermined parameters of the sample, the predetermined parameters including differentially expressed gene information of the cell-free nucleic acids in the sample's peripheral blood, which is obtained by calculation from the sequencing information of those cell-free nucleic acids; and (3) building the prediction model based on the known pregnancy states and the predetermined parameters.
- The method according to the embodiment of the present invention uses the differentially expressed gene information of cell-free nucleic acids obtained from a single blood collection from each of multiple pregnant women samples, together with each woman's pregnancy state (e.g., preterm birth, gestational age at delivery), to construct a prediction model for the pregnancy state of a pregnant woman.
- The method according to the embodiment of the present invention uses differentially expressed genes of cell-free nucleic acid in the peripheral blood of pregnant women to predict the pregnancy state. For different pregnancy states, such as preterm birth and preeclampsia, different differentially expressed genes can be detected.
- The corresponding differential genes are selected in a targeted manner, thereby improving the accuracy of the model's predictions. The prediction model can be constructed from only one blood sampling and sequencing run per pregnant woman, and the method is convenient, fast, and highly accurate.
- The method is suitable for humans and other animals, such as mice, rats, and rabbits, and facilitates scientific research on the treatment mechanisms of pregnancy disorders, the treatment mechanisms of hereditary diseases, and drug screening.
- The present invention proposes a system for constructing a prediction model for determining the pregnancy state of a pregnant woman.
- The system includes: a training set construction module, which builds a training set and an optional test set composed of a plurality of pregnant women samples with known pregnancy states; a predetermined parameter determination module, connected to the training set construction module, which determines, for each pregnant woman sample in the training set, the predetermined parameters of the sample, the predetermined parameters including differentially expressed gene information of the cell-free nucleic acids in the sample, obtained by calculation from the sequencing information of the cell-free nucleic acids in the sample's peripheral blood; and a prediction model construction module, connected to the predetermined parameter determination module, which constructs the prediction model based on the known pregnancy states and the predetermined parameters.
- The system according to the embodiment of the present invention is suitable for carrying out the aforementioned method for constructing a prediction model. It uses differentially expressed genes of cell-free nucleic acid in the peripheral blood of pregnant women to predict the pregnancy state. For different pregnancy states, such as preterm birth and preeclampsia, different differentially expressed genes can be detected and the corresponding differential genes can be selected in a targeted manner, thereby improving the accuracy of the model's predictions. The prediction model can be constructed from only one blood sampling and sequencing run per pregnant woman.
- The present invention provides a method for determining the pregnancy state of a pregnant woman.
- The method includes: (1) determining the predetermined parameters of the pregnant woman, the predetermined parameters including expression prediction information for the woman's preterm birth-related genes, obtained by calculation from the sequencing information of the cell-free nucleic acids in her peripheral blood; and (2) determining the pregnancy state of the pregnant woman based on the predetermined parameters and a prediction model, the prediction model being constructed by the method proposed in the first aspect of the present invention or by the system proposed in the second aspect.
- The method according to the embodiment of the present invention can predict the pregnancy state from a blood sample of the pregnant woman to be tested. The pregnancy state includes the probability of premature birth, intrauterine growth retardation, preeclampsia, and other pregnancy complications associated with cell-free nucleic acid in the pregnant woman's plasma.
- The method is simple to implement, does not disrupt the pregnant woman's life, and is accurate and easy to operate.
- The method is suitable for humans and other animals, such as mice, rats, and rabbits, and makes it convenient to use the present invention for scientific research on the treatment mechanisms of pregnancy disorders, the treatment mechanisms of hereditary diseases, and drug screening.
- The present invention provides a device for determining the pregnancy state of a pregnant woman.
- The device includes: a parameter determination module configured to determine the predetermined parameters of the pregnant woman, the predetermined parameters including expression prediction information for the woman's preterm birth-related genes, obtained by calculation from the sequencing information of the cell-free nucleic acids in her peripheral blood; and a pregnancy state determination module, connected to the parameter determination module, which determines the pregnancy state of the pregnant woman based on the predetermined parameters and a prediction model, the prediction model being constructed by the method proposed in the first aspect of the present invention or by the system proposed in the second aspect.
- The device according to the embodiment of the present invention is suitable for executing the aforementioned method for determining the pregnancy state of a pregnant woman, and can predict the pregnancy state from a single blood draw from the pregnant woman to be tested. The pregnancy state includes the probability of premature birth, intrauterine growth retardation, preeclampsia, and other pregnancy complications associated with cell-free nucleic acid in maternal plasma.
- The present invention provides a computer-readable storage medium on which a computer program is stored.
- When executed by a processor, the program implements the aforementioned steps for constructing a prediction model.
- In this way, the aforementioned method for constructing a prediction model can be effectively implemented, so that a prediction model can be constructed and then used to predict unknown samples and determine the pregnancy state of the pregnant woman to be tested.
- The present invention provides an electronic device comprising the aforementioned computer-readable storage medium and one or more processors for executing the program in the computer-readable storage medium.
- FIG. 1 is a schematic flowchart of a method for constructing a prediction model according to an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of obtaining differentially expressed gene information according to an embodiment of the present invention.
- FIG. 3 is a schematic diagram of a method for converting the ends of sequencing reads in the original alignment result into the ends of the original cfDNA fragments according to an embodiment of the present invention.
- FIG. 4 is a schematic diagram of a system for constructing a prediction model according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram of a predetermined parameter determination module according to an embodiment of the present invention.
- FIG. 6 is a schematic flowchart of a method for determining the pregnancy state of a pregnant woman according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of a device for determining the pregnancy state of a pregnant woman according to an embodiment of the present invention.
- FIG. 8 shows the sample screening process for the training set and test set of a preterm birth prediction model according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram showing how the PCA training model classifies the preterm and term samples in the test data set according to an embodiment of the present invention.
- The terms "first", "second", "third" and similar terms used in this document serve only to distinguish items and to make the description convenient. They neither imply nor state any difference in order or importance, and they do not mean that the content so defined consists of only one component.
- The present invention proposes a method for constructing a prediction model that is used to predict the pregnancy state of a pregnant woman.
- The method includes: S100, constructing a training set and an optional test set, each composed of a plurality of pregnant women samples with known pregnancy states; S200, for each pregnant woman sample in the training set, determining the predetermined parameters of the sample, the predetermined parameters including differentially expressed gene information of the cell-free nucleic acids in the sample's peripheral blood, which is obtained by calculation from the sequencing information of those cell-free nucleic acids; and S300, constructing the prediction model based on the known pregnancy states and the predetermined parameters.
- The method uses the differentially expressed gene information of cell-free nucleic acids obtained from a single blood collection from each of multiple pregnant women samples, together with each woman's pregnancy state (e.g., preterm birth, gestational age at delivery), to construct a prediction model for the pregnancy state of a pregnant woman.
- Pregnant women with known pregnancy states (such as preterm or full-term birth) are selected as the training set or the verification set; the training set is used for model construction and coefficient adjustment, and the verification set is used to verify model accuracy.
- The predetermined parameters of the pregnant women samples in the test set are input into the model to obtain prediction results, which are compared with the known pregnancy states of the corresponding samples in the test set to verify the accuracy of the model.
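The split-and-verify procedure described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 30% test fraction, the single toy feature per sample, and the midpoint classifier standing in for the prediction model are all assumptions.

```python
import random

def split_samples(samples, test_fraction=0.3, seed=0):
    """Shuffle labelled samples and split them into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def midpoint_classifier(train):
    """Hypothetical stand-in model: cut halfway between the two class means."""
    preterm = [f[0] for f, state in train if state == "preterm"]
    term = [f[0] for f, state in train if state == "term"]
    cut = (sum(preterm) / len(preterm) + sum(term) / len(term)) / 2
    return lambda features: "preterm" if features[0] > cut else "term"

def accuracy(model, test_set):
    """Fraction of test samples whose prediction matches the known state."""
    hits = sum(1 for features, state in test_set if model(features) == state)
    return hits / len(test_set)

# Toy cohort: (features, known pregnancy state); values are invented.
samples = [([0.9], "preterm"), ([0.8], "preterm"), ([0.7], "preterm"),
           ([0.85], "preterm"), ([0.75], "preterm"),
           ([0.2], "term"), ([0.1], "term"), ([0.3], "term"),
           ([0.15], "term"), ([0.25], "term")]
train, test = split_samples(samples)
model = midpoint_classifier(train)
test_accuracy = accuracy(model, test)
```

The test set is never used to fit the model, so `test_accuracy` measures how well the model generalizes to samples it has not seen.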
- Peripheral blood is drawn from the pregnant women samples to obtain the cell-free nucleic acids in it. The cell-free nucleic acids are sequenced to obtain their sequence information, which is then processed and compared to obtain information on the genes differentially expressed between the plasma cell-free nucleic acids of full-term and preterm mothers, and a model is constructed from this information.
- The cell-free nucleic acid in the pregnant woman's plasma is obtained by drawing peripheral blood, which causes little trauma to the pregnant woman; other methods can also be used to obtain it.
- The cell-free nucleic acid in maternal plasma includes the pregnant woman's own cell-free nucleic acid as well as fetal cell-free nucleic acid. It should be noted that the fetal cell-free nucleic acid concentration can also be used as a predetermined parameter to construct a prediction model.
- The pregnancy state includes the delivery interval of the pregnant woman.
- A delivery interval shorter than the normal pregnancy period indicates premature birth.
- The method can also be applied to other pregnancy complications related to nucleic acid expression, such as tumors during pregnancy and preeclampsia.
- The pregnant women samples include samples from preterm pregnant women and samples from full-term pregnant women.
- A plurality of preterm and full-term pregnant women samples are selected for the training set and the test set, so that a preterm birth prediction model can be constructed from the genes differentially expressed between the plasma cell-free nucleic acids of the preterm and full-term samples.
- For different pregnancy states, the differentially expressed genes in cell-free nucleic acids differ; for different sample sizes, they also differ.
- For example, pregnant women with preeclampsia or with tumors during pregnancy can be selected as samples together with normal pregnant women, and their cell-free nucleic acid differentially expressed genes obtained to construct the corresponding prediction models.
- Expanding the sample size helps to identify differentially expressed genes more accurately and thus to build more accurate prediction models.
- The sampling gestational age is 15-22 weeks.
- The inventors have found that the differentially expressed genes of cell-free nucleic acid in the plasma of pregnant women are strongly correlated with preterm birth when blood is collected between 15 and 22 weeks of gestation. Constructing the sample set within this window avoids the risk and cost of repeated blood collection from the pregnant women samples during sample collection.
- For different sampling gestational ages, the cell-free nucleic acid differentially expressed genes obtained are different, and for different types of prediction the optimal gestational week for blood sampling also differs.
- According to the method of the embodiment of the present invention, the prediction model is at least one of principal component analysis and random forest.
- The prediction model is not limited to the principal component analysis model and the random forest model; any statistical model that can capture the differences between distributions can be applied.
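As an illustration of how a principal component analysis model could separate two groups of samples, the sketch below finds the first principal axis by power iteration and assigns a new sample to the class whose mean score along that axis is nearest. The nearest-centroid rule, the function names, and the toy data are assumptions made for this example; the source does not specify how the PCA output is turned into a classification.

```python
import math

def center(rows):
    """Subtract per-column means; return the centered rows and the means."""
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means

def first_component(centered, iters=200):
    """Leading principal axis via power iteration on X^T X (not formed explicitly)."""
    d = len(centered[0])
    v = [float(j + 1) for j in range(d)]  # asymmetric start vector
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iters):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]  # X v
        nxt = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in nxt))
        if norm == 0.0:  # start vector orthogonal to all the variance
            break
        v = [c / norm for c in nxt]
    return v

def fit(rows, states):
    """Return a predictor that projects onto PC1 and picks the nearest class centroid."""
    centered, means = center(rows)
    axis = first_component(centered)
    scores = [sum(x * w for x, w in zip(row, axis)) for row in centered]
    centroids = {}
    for state in set(states):
        vals = [s for s, st in zip(scores, states) if st == state]
        centroids[state] = sum(vals) / len(vals)

    def predict(row):
        score = sum((x - m) * w for x, m, w in zip(row, means, axis))
        return min(centroids, key=lambda st: abs(centroids[st] - score))
    return predict

# Invented feature rows (e.g., coverage values at three sites) with known states.
rows = [[5, 1, 2], [6, 0, 2], [5, 0, 3], [1, 5, 2], [0, 6, 3], [1, 6, 2]]
states = ["preterm", "preterm", "preterm", "term", "term", "term"]
predict = fit(rows, states)
```

The sign of a principal axis is arbitrary, which is why the prediction compares distances to class centroids rather than testing whether the score is positive.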
- The differentially expressed gene information is obtained by the following steps: S210, using the coverage depth of the sequencing reads of the cell-free nucleic acid in the peripheral blood of the pregnant woman sample at gene transcription start sites to predict genome-wide gene expression; S220, for the region near each gene's transcription start site, performing a significance test on the coverage depth of cell-free nucleic acid at each base site between the preterm and full-term pregnant women samples; and S230, using the significance test to select significantly differentially expressed genes as preterm birth-related genes, that is, selecting genes with p < 0.05/(total number of genes) as preterm birth-related genes for constructing the model.
- The differentially expressed genes depend on the number of pregnant women samples, the sequencing depth, and the type of prediction (i.e., the purpose of prediction). Their selection is based mainly on the sequencing results and the detection purpose at the time the model is constructed, and the selection can be a single gene or a combination of multiple genes.
- At each base site, the numbers of ends of reads aligned to the positive strand and to the negative strand are converted to the number of ends of the original cfDNA fragments; see FIG. 3.
- The sum of the cfDNA fragment ends covering each base site is the read start count (RSC) of that site.
- Genes with significant differences (p < 0.05/total number of genes) are used as preterm birth-related genes for constructing the subsequent prediction model.
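The threshold p < 0.05/(total number of genes) is a Bonferroni-style correction: the nominal 0.05 level is divided by the number of genes tested. A minimal sketch, with hypothetical gene names and invented p-values:

```python
def select_genes(p_values, alpha=0.05):
    """Keep genes whose p-value is below alpha / (total number of genes)."""
    threshold = alpha / len(p_values)
    return sorted(gene for gene, p in p_values.items() if p < threshold)

# Hypothetical per-gene p-values; with 4 genes the threshold is 0.05 / 4 = 0.0125.
p_values = {"GENE_A": 1e-6, "GENE_B": 0.02, "GENE_C": 0.004, "GENE_D": 0.03}
selected = select_genes(p_values)
```

`GENE_B` and `GENE_D` would pass an uncorrected 0.05 cut but fail the corrected one, which is the point of dividing by the number of genes.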
- Other calculation methods can also be used to predict the nucleosome distribution, such as the window protection score, relative coverage, and peak calls (peak positions); the nucleosome distribution information corresponding to cell-free nucleic acids in maternal plasma is then used to determine differentially expressed genes.
- Relative coverage: for paired-end cfDNA sequencing data, the gap between the two reads of each pair can be filled in directly. The coverage depth of the original cfDNA fragments at each site on the genome, i.e., the relative coverage, is then calculated in a uniform way and mapped to gene expression, so as to analyze phenotypes related to gene expression.
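The fragment-level coverage described for paired-end data can be sketched as follows. Each read pair is treated as one fragment spanning from the leftmost to the rightmost aligned base, so the unsequenced middle is counted as covered. The half-open coordinate convention and the tuple layout are assumptions for this example.

```python
def fragment_coverage(pairs, region_start, region_end):
    """Depth of inferred fragments at each site of [region_start, region_end).

    pairs: (read1_start, read1_end, read2_start, read2_end), half-open coordinates.
    """
    diff = [0] * (region_end - region_start + 1)  # difference array
    for r1s, r1e, r2s, r2e in pairs:
        frag_start = max(min(r1s, r2s), region_start)
        frag_end = min(max(r1e, r2e), region_end)
        if frag_start < frag_end:  # fragment overlaps the region
            diff[frag_start - region_start] += 1
            diff[frag_end - region_start] -= 1
    depth, running = [], 0
    for delta in diff[:-1]:
        running += delta
        depth.append(running)
    return depth

# Two toy pairs: fragments span [0, 10) and [2, 12).
depth = fragment_coverage([(0, 3, 7, 10), (2, 5, 9, 12)], 0, 12)
```

The difference array keeps the cost linear in the number of fragments plus the region length, rather than touching every covered base of every fragment.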
- The nearby region comprises the transcription start site together with 100-1000 bases upstream and 100-1000 bases downstream of it. The flanking length on each side may take any value from 100 to 1000 bases in steps of 10 (100, 110, 120, ..., 990, 1000).
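A small helper makes the window definition concrete. The half-open interval `[tss - flank, tss + flank)` is an assumption chosen so that a 1000-base flank yields 2000 sites per gene, matching the matrix dimension given later in the text; the function name is hypothetical.

```python
ALLOWED_FLANKS = range(100, 1001, 10)  # 100, 110, ..., 1000 bases per side

def tss_window(tss, flank):
    """Return (start, end) of the half-open window around a transcription start site."""
    if flank not in ALLOWED_FLANKS:
        raise ValueError("flank must be 100-1000 in steps of 10")
    return max(0, tss - flank), tss + flank

window = tss_window(5000, 1000)
```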
- S300 further includes: S310, using the number of pregnant women samples, the pregnancy states of the samples, the number of preterm birth-related genes, the length of the nearby region, and the coverage depth of cell-free nucleic acid at each base site of the region near the transcription start site of each preterm birth-related gene as inputs to construct the prediction model.
- A preterm birth prediction model is constructed based on the sequencing data of cell-free nucleic acid (cfDNA) in the plasma of pregnant women.
- the off-machine data fq format
- use the alignment software such as the samse mode in BWA
- use the sequencing data quality control software such as Picard
- uses the variant detection algorithm such as the base quality value correction BQSR function in GATK
- each base site will be aligned to the positive strand
- the number of ends of reads of the minus strand was converted to the number of ends of the original fragment of cfDNA.
- the sum of the ends of the corresponding cfDNA fragments covered at each base site is the read start count (RSC) of the site;
- RSC read start count
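The strand conversion and RSC counting described above can be sketched as follows. This is only an illustration under stated assumptions: reads are single-end (the text mentions the samse mode in BWA), `pos` is the 1-based leftmost aligned base on the positive strand as in SAM records, and soft clipping and indels are ignored. The names `fragment_end` and `read_start_counts` are hypothetical.

```python
from collections import Counter


def fragment_end(pos, length, strand):
    """Return the genomic site of the fragment end marked by one read.

    A plus-strand read starts at the fragment end itself; a minus-strand read
    is reported by its leftmost base, so its fragment end is the rightmost base.
    """
    return pos if strand == "+" else pos + length - 1


def read_start_counts(reads):
    """Sum fragment ends per base site to obtain the read start count (RSC)."""
    counts = Counter()
    for pos, length, strand in reads:
        counts[fragment_end(pos, length, strand)] += 1
    return counts


# Hypothetical 35 bp reads given as (pos, length, strand).
reads = [(1000, 35, "+"), (1000, 35, "+"), (1132, 35, "-"), (966, 35, "-")]
rsc = read_start_counts(reads)
```

Here site 1000 collects two plus-strand starts and one converted minus-strand end, so its RSC is 3.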
- the RSC values at each base site of the preterm and full-term samples were tested for significance (general statistical testing methods such as the rank sum test or the t test are acceptable), and genes with significant differences (p < 0.05/total number of genes) were selected as preterm birth-related genes for the construction of a subsequent prediction model;
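The significance screen above can be sketched with a rank sum test and the p < 0.05/(total number of genes) cutoff. This is a simplified stand-in: it uses the normal approximation without tie or continuity correction, and the total of 20000 genes in the usage lines is a hypothetical figure chosen only for illustration. In practice a statistics library would usually be used instead.

```python
import math


def rank_sum_p(x, y):
    """Two-sided rank sum p-value (normal approximation, no tie correction)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1           # tied values share their average rank
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(p, total_genes, alpha=0.05):
    """Apply the p < alpha / total_genes cutoff described in the text."""
    return p < alpha / total_genes


# Hypothetical RSC values: clearly separated groups versus identical groups.
p_separated = rank_sum_p(list(range(30)), list(range(100, 130)))
p_identical = rank_sum_p([1, 2, 3, 4], [1, 2, 3, 4])
```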
- the RSC result matrix at the different base sites in the TSS regions is used as input to establish a prediction model; that is, with n samples, the RSC is calculated for each base site in the 1 kb regions upstream and downstream of the TSS of each of the m preterm birth-related genes, then n × (m × 2000
- the preterm birth-related gene includes at least one selected from the genes shown in Table 1.
- the preterm birth-related genes are related to the number of samples of pregnant women and the sequencing depth.
- the types and quantities of preterm birth-related genes will change.
- the present invention proposes a system for constructing a predictive model for determining the pregnancy state of a pregnant woman.
- the system includes: a training set building module 100, in which the training set and the optional test set are composed of a plurality of pregnant woman samples with known pregnancy states; a predetermined parameter determination module 200, connected to the training set building module 100, which determines, for each pregnant woman sample in the training set, the predetermined parameters of the sample, the predetermined parameters including the differentially expressed gene information of the free nucleic acid in the plasma of the sample, obtained by calculation from the sequencing information of the free nucleic acid in the peripheral blood of the sample; and a prediction model building module 300, connected to the predetermined parameter determination module 200, which constructs the prediction model based on the known pregnancy states and the predetermined parameters.
- the predetermined parameter determination module further includes: a gene expression status determination unit 210, which predicts gene expression across the whole genome using the coverage depth of the sequencing reads of the free nucleic acid in the peripheral blood of the pregnant woman sample at gene transcription start sites; a gene expression difference significance detection unit 220, connected to the gene expression status determination unit 210, which, for the nearby region of each gene transcription start site, performs a significance test on the coverage depth of the cell-free nucleic acid at each base site between the preterm pregnant woman samples and the full-term pregnant woman samples; and a preterm birth-related gene selection unit 230, connected to the gene expression difference significance detection unit 220, which uses the significance test to select genes with significant differences as preterm birth-related genes, that is, genes with p < 0.05/(total number of genes) are selected as preterm birth-related genes, in order to build a sequencing
- differentially expressed genes are related to the number of pregnant woman samples, the sequencing depth, and the type of prediction (i.e., the purpose of prediction). The selection of differentially expressed genes depends mainly on the sequencing results and the detection purpose at the time the model is constructed, and may be a single gene or a combination of multiple genes.
- the nearby region covers the transcription initiation site and 100-1000 bases each upstream and downstream of the transcription initiation site.
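The nearby region can be expressed as a coordinate window around a transcription start site. The sketch below is an assumption-laden illustration: `tss_window` is a hypothetical helper, coordinates are 0-based, and the interval is half-open so that a flank of 1000 bases yields 2000 positions, matching the m × 2000 count used elsewhere in the text. Whether the TSS base itself is counted separately is not settled by the source.

```python
def tss_window(tss, flank):
    """Return the half-open window [tss - flank, tss + flank) around a TSS.

    The flank is restricted to the 100-1000 base range given in the text.
    """
    if not 100 <= flank <= 1000:
        raise ValueError("flank must be between 100 and 1000 bases")
    return max(0, tss - flank), tss + flank


# Hypothetical TSS coordinate with a 400-base flank on each side.
start, end = tss_window(5000, 400)
```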
- the prediction model building module 300 further includes: a data input unit 310, which takes, for all samples in the training set and the optional verification set, the number of pregnant woman samples, the pregnancy status of the pregnant woman samples, the number of preterm birth-related genes, the fragment length of the nearby region, and the coverage depth of the free nucleic acid at each base site in the region near the transcription start site of the preterm birth-related genes as inputs to construct a prediction model.
- the preterm birth-related gene includes at least one selected from the genes shown in Table 1.
- the preterm birth-related genes are related to the number of samples of pregnant women and the sequencing depth.
- the types and quantities of preterm birth-related genes will change.
- the present invention provides a method for determining the pregnancy status of a pregnant woman.
- the method includes: S1000, determining a predetermined parameter of the pregnant woman, where the predetermined parameter includes the expression prediction information of the preterm birth-related genes of the pregnant woman, the expression prediction information being obtained by calculation from the sequencing information of the free nucleic acid in the peripheral blood of the pregnant woman; and S2000, determining the pregnancy state of the pregnant woman based on the predetermined parameter and the prediction model, the prediction model being constructed by the method proposed in the first aspect of the present invention or the system proposed in the second aspect of the present invention.
- the free nucleic acid in the plasma of the pregnant woman to be tested is extracted and sequenced, and the resulting sequences are analyzed against the preterm birth-related genes obtained when the prediction model was constructed, so as to obtain the expression prediction information of the preterm birth-related genes; this expression information is then input into the prediction model to predict whether the pregnant woman will have a preterm birth.
- the corresponding prediction models and prediction model-related genes are used for prediction.
- the gestational state includes the delivery interval of the pregnant woman.
- a delivery interval that is not greater than the normal pregnancy period indicates preterm birth.
- the method can also be applied to other pregnancy complications related to nucleic acid expression, such as pregnancy tumor and preeclampsia.
- the sampling gestational age is 15-22 weeks.
- the inventors found that the differentially expressed genes of free nucleic acid in the plasma of pregnant women are strongly correlated with preterm birth when blood is collected between 15 and 22 weeks of gestation, which avoids the risk and cost of repeated blood sampling for pregnant women during sample collection.
- the obtained cell-free nucleic acid differentially expressed genes are different, and for different types of predictions, the optimal blood sampling gestational weeks are also different.
- the prediction model is at least one of principal component analysis and random forest.
- the prediction model is not limited to the principal component analysis model and the random forest prediction model, and any statistical model that can generalize different distributions of differences can be applied.
- the step S2000 further includes: S2100, for each pregnant woman sample to be tested, inputting the number of preterm birth-related genes, the fragment length of the nearby region, and the coverage depth of the free nucleic acid at each base site in the region near the transcription start site of the preterm birth-related genes into the prediction model in order to obtain a prediction result.
- the cell-free nucleic acid sequencing data of the pregnant woman to be tested is obtained; for each nucleic acid sample, the RSC values are calculated in the TSS regions of the preterm birth-related genes, and the (m × 2000) RSC values of each nucleic acid sample are used as input to the prediction model. The position coordinates obtained for each nucleic acid sample (i.e., from its RSC matrix) are matched to the preterm and full-term regions, and whether preterm birth occurs in the pregnant woman sample to be tested is predicted.
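The (m × 2000) input described above amounts to concatenating the per-gene RSC windows of each sample into one row. A minimal sketch follows, assuming a hypothetical nested-dictionary layout (`sample -> gene -> list of RSC values`); the gene names come from the example matrix in the document, while the tiny window length of 4 and the values are purely illustrative.

```python
def build_feature_matrix(samples, genes, window_len=2000):
    """Flatten per-gene RSC windows into one row of len(genes) * window_len per sample."""
    matrix = []
    for sample_id in sorted(samples):
        row = []
        for gene in genes:
            values = samples[sample_id][gene]
            if len(values) != window_len:
                raise ValueError(f"{sample_id}/{gene}: expected {window_len} values")
            row.extend(values)
        matrix.append(row)
    return matrix


# Toy input: 3 samples, 2 genes, windows shortened to 4 positions for readability.
genes = ["ESPN", "H6PD"]
samples = {
    "Case1": {"ESPN": [1, 0, 2, 1], "H6PD": [0, 0, 1, 3]},
    "Case2": {"ESPN": [2, 1, 0, 0], "H6PD": [1, 1, 1, 0]},
    "Control1": {"ESPN": [0, 0, 0, 1], "H6PD": [2, 2, 0, 1]},
}
X = build_feature_matrix(samples, genes, window_len=4)
```

With the default `window_len=2000` and m genes, each row has m × 2000 entries, so n samples give an n × (m × 2000) matrix.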
- preterm birth-related genes include at least one selected from the genes shown in Table 1.
- the preterm birth-related genes are related to the number of samples of pregnant women and the sequencing depth.
- the types and quantities of preterm birth-related genes will change.
- predictions are made based on the preterm birth-related genes obtained when the prediction model is constructed.
- the present invention provides a device for determining the pregnancy state of a pregnant woman.
- the apparatus includes: a predetermined parameter determination module 1000 for determining a predetermined parameter of the pregnant woman, the predetermined parameter including the expression prediction information of the preterm birth-related genes of the pregnant woman, the expression prediction information being obtained by calculation from the sequencing information of free nucleic acid in the peripheral blood of the pregnant woman; and a pregnancy state determination module 2000, connected to the predetermined parameter determination module 1000, which determines the pregnancy state of the pregnant woman based on the predetermined parameter and a prediction model, the prediction model being constructed by the method provided in the first aspect of the present invention or the system provided in the second aspect of the present invention.
- the device according to the embodiment of the present invention is suitable for executing the above-mentioned method for determining the pregnancy state of a pregnant woman, and some of its additional technical features and technical effects are the same as
- the parameter determination module further includes: a preterm birth-related gene expression information determination unit 1100, which predicts the expression of preterm birth-related genes using the coverage depth of the sequencing reads of the free nucleic acid in the peripheral blood of the pregnant woman at the gene transcription start sites, the preterm birth-related genes being determined by the method provided in the first aspect of the present invention or the system provided in the second aspect of the present invention.
- the pregnancy state determination module further includes: a data input unit 2100, for each sample of pregnant women to be tested, the number of the preterm birth-related genes, the fragment length of the nearby region and the preterm birth-related genes The coverage depth of the free nucleic acid at each base position in the region near the transcription initiation site is input into the prediction model to obtain a prediction result.
- the cell-free nucleic acid sequencing data of the pregnant woman to be tested is obtained; for each nucleic acid sample, the RSC values are calculated in the TSS regions of the preterm birth-related genes, and the (m × 2000) RSC values of each nucleic acid sample are used as input to the prediction model. The position coordinates obtained for each nucleic acid sample (i.e., from its RSC matrix) are matched to the preterm and full-term regions, and whether preterm birth occurs in the pregnant woman sample to be tested is predicted.
- the preterm birth-related gene includes at least one selected from the genes shown in Table 1.
- the preterm birth-related genes are related to the number of samples of pregnant women and the sequencing depth.
- the types and quantities of preterm birth-related genes will change.
- predictions are made based on the preterm birth-related genes obtained when the prediction model is constructed.
- NIPT non-invasive prenatal diagnosis
- Table 2 Summary of preterm birth prediction training set and test set samples
- the P-values obtained for each gene from the rank sum test between the preterm group and the full-term group over the 400 bp upstream and downstream of the TSS in each of 20 independent rounds of analysis; if the P-value of a gene is below the threshold of 1.31 × 10⁻⁶ in more than 60% (12/20) of the rounds, the gene is considered to be a preterm birth-related gene; the chromosome number, start position and end position of each such gene are shown in Table 1.
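The round-based selection rule above can be sketched as a simple vote over the 20 per-round P-values. Several points are assumptions: the threshold is read as 1.31e-6, "more than 60% (12/20)" is interpreted as at least 12 of 20 rounds, and the P-values in the usage lines are made up for illustration (only the gene names appear in the document).

```python
def select_genes(p_values_by_gene, threshold=1.31e-6, min_hits=12):
    """Keep genes whose P-value falls below the threshold in at least min_hits rounds."""
    selected = []
    for gene, p_values in p_values_by_gene.items():
        hits = sum(1 for p in p_values if p < threshold)
        if hits >= min_hits:
            selected.append(gene)
    return selected


# Illustrative P-values over 20 rounds (not data from the patent).
p_values_by_gene = {
    "ESPN": [1e-8] * 12 + [0.2] * 8,   # 12 of 20 rounds pass
    "H6PD": [1e-8] * 11 + [0.2] * 9,   # 11 of 20 rounds pass
    "ALPL": [1e-9] * 20,               # all rounds pass
}
related = select_genes(p_values_by_gene)
```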
- Table 3 Rank sum test values for each round of 207 genes
- step (3): the training set samples obtained by aggregating the 20 rounds in (2), that is, the RSC value matrices of the 1959 preterm samples and the corresponding 1959 full-term samples, are used for PCA and random forest model training, respectively.
- the PCA and random forest test packages in R statistical software are used to complete the model training.
- the resulting trained model was saved for final preterm birth prediction.
- the RSC value of the TSS region of the significantly different genes selected in the step (3) of the test set obtained in (2) is used as input, and the matrix as shown in Table 4 is formed.
- the first row gives the names of the 207 genes, and the second through Nth rows give the RSC values of the corresponding gene TSS regions for each sample in the preterm group (case group) and the full-term group (control group).
- Table 5 Summary of Accuracy in Predicting Preterm birth in the Test Group
- mtry is a parameter in the random forest software package in R that specifies the number of variables randomly sampled as candidates at each node split of a tree.
- ntree is a parameter in the random forest software package in R, specifying the number of decision trees included in the random forest.
- The random forest function parameters (mtry and ntree) were adjusted, and the accuracy of predicting preterm birth in the test group was summarized. For example, in the first round of testing, the consistency between predicted and true preterm birth is 92% when mtry is 140 and ntree is 700, 94% when mtry is 200 and ntree is 700, and 92% when mtry is 140 and ntree is 500, and so on for each round. Across the 20 rounds, the prediction effect is best when mtry is 200 and ntree is 700, with an average accuracy of 91%.
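The parameter comparison above boils down to averaging accuracy per (mtry, ntree) setting over the rounds and picking the highest mean. The sketch below uses only the three first-round figures quoted in the text; the function name `best_setting` and the dictionary layout are assumptions, and further rounds would be appended to each list.

```python
from statistics import mean


def best_setting(accuracy_by_setting):
    """Return the (mtry, ntree) pair with the highest mean accuracy and that mean."""
    averaged = {setting: mean(values) for setting, values in accuracy_by_setting.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged[best]


# First-round accuracies quoted in the text, keyed by (mtry, ntree).
round_one = {
    (140, 700): [0.92],
    (200, 700): [0.94],
    (140, 500): [0.92],
}
setting, accuracy = best_setting(round_one)
```

On the first-round figures alone this selects mtry = 200 and ntree = 700, which agrees with the setting the text reports as best over all 20 rounds.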
Abstract
Description
Phe | ESPN | H6PD | ALPL | … | … | MIR514A1 | FAM50A | LOC100507404 |
Case1 | ESPN_RSC | H6PD_RSC | ALPL_RSC | … | … | MIR514A1_RSC | FAM50A_RSC | LOC100507404_RSC |
… | ESPN_RSC | H6PD_RSC | ALPL_RSC | … | … | MIR514A1_RSC | FAM50A_RSC | LOC100507404_RSC |
CaseN | ESPN_RSC | H6PD_RSC | ALPL_RSC | … | … | MIR514A1_RSC | FAM50A_RSC | LOC100507404_RSC |
Control1 | ESPN_RSC | H6PD_RSC | ALPL_RSC | … | … | MIR514A1_RSC | FAM50A_RSC | LOC100507404_RSC |
… | ESPN_RSC | H6PD_RSC | ALPL_RSC | … | … | MIR514A1_RSC | FAM50A_RSC | LOC100507404_RSC |
ControlN | ESPN_RSC | H6PD_RSC | ALPL_RSC | … | … | MIR514A1_RSC | FAM50A_RSC | LOC100507404_RSC |
Claims (34)
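- A method for constructing a prediction model, the prediction model being used to predict the pregnancy status of a pregnant woman, characterized by comprising: (1) constructing a training set and an optional test set, the training set and the optional test set consisting of a plurality of pregnant woman samples, the pregnant woman samples having a known pregnancy status; (2) for each pregnant woman sample of the training set, determining predetermined parameters of the pregnant woman sample, the predetermined parameters including differentially expressed gene information of free nucleic acid in the peripheral blood of the pregnant woman sample, the differentially expressed gene information being obtained by calculation from the sequencing information of the free nucleic acid in the peripheral blood of the pregnant woman sample; and (3) constructing the prediction model based on the known pregnancy status and the predetermined parameters.
- The method according to claim 1, characterized in that the pregnancy status includes the delivery interval of the pregnant woman.
- The method according to claim 1, characterized in that the pregnant woman samples include preterm pregnant woman samples and full-term pregnant woman samples.
- The method according to claim 1, characterized in that the sampling gestational age is 15 to 22 weeks.
- The method according to claim 1, characterized in that the prediction model includes at least one selected from principal component analysis and random forest.
- The method according to claim 1, characterized in that the differentially expressed gene information is obtained by the following steps: (a) predicting gene expression across the whole genome using the sequencing read coverage depth of the free nucleic acid in the peripheral blood of the pregnant woman sample at gene transcription start sites; (b) for the nearby region of each gene transcription start site, performing a significance test on the coverage depth of the free nucleic acid at each base site between the preterm pregnant woman samples and the full-term pregnant woman samples; and (c) using the significance test, selecting significantly differentially expressed genes as preterm birth-related genes in order to construct a sequencing model; optionally, the significantly differentially expressed genes are genes with p < 0.05/(total number of genes).
- The method according to claim 6, characterized in that the nearby region is within the range of the transcription start site and 100 to 1000 bases each upstream and downstream of the transcription start site; optionally, the nearby region is the transcription start site and 100 bases each upstream and downstream of the transcription start site; optionally, the nearby region is the transcription start site and 400 bases each upstream and downstream of the transcription start site; optionally, the nearby region is the transcription start site and 600 bases each upstream and downstream of the transcription start site; optionally, the nearby region is the transcription start site and 1000 bases each upstream and downstream of the transcription start site.
- The method according to claim 1, characterized in that step (3) further comprises: taking the number of pregnant woman samples in the training set and the optional validation set, the pregnancy status of the pregnant woman samples, the number of preterm birth-related genes, the fragment length of the nearby region, and the coverage depth of the free nucleic acid at each base site in the region near the transcription start site of the preterm birth-related genes as inputs to construct the prediction model.
- The method according to claim 1, characterized in that the preterm birth-related genes include at least one selected from the genes shown in Table 1.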
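- A system for constructing a prediction model, characterized in that the prediction model is used to determine the pregnancy status of a pregnant woman, the system comprising: a training set construction module, the training set and the optional test set consisting of a plurality of pregnant woman samples, the pregnant woman samples having a known pregnancy status; a predetermined parameter determination module, connected to the training set construction module, which determines, for each pregnant woman sample of the training set, predetermined parameters of the pregnant woman sample, the predetermined parameters including differentially expressed gene information of free nucleic acid in the plasma of the pregnant woman sample, the differentially expressed gene information being obtained by calculation from the sequencing information of the free nucleic acid in the peripheral blood of the pregnant woman sample; and a prediction model construction module, connected to the predetermined parameter determination module, which constructs the prediction model based on the known pregnancy status and the predetermined parameters.
- The system according to claim 10, characterized in that the pregnancy status includes the delivery interval of the pregnant woman.
- The system according to claim 10, characterized in that the pregnant woman samples include preterm pregnant woman samples and full-term pregnant woman samples.
- The system according to claim 10, characterized in that the sampling gestational age is 15 to 22 weeks.
- The system according to claim 10, characterized in that the prediction model is at least one of principal component analysis and random forest.
- The system according to claim 10, characterized in that the predetermined parameter determination module further comprises: a gene expression status determination unit, which predicts gene expression across the whole genome using the sequencing read coverage depth of the free nucleic acid in the peripheral blood of the pregnant woman sample at gene transcription start sites; a gene expression difference significance detection unit, connected to the gene expression status determination unit, which, for the nearby region of each gene transcription start site, performs a significance test on the coverage depth of the free nucleic acid at each base site between the preterm pregnant woman samples and the full-term pregnant woman samples; and a preterm birth-related gene selection unit, connected to the gene expression difference significance detection unit, which uses the significance test to select genes with significant differences as preterm birth-related genes in order to construct a sequencing model; optionally, the significantly differentially expressed genes are genes with p < 0.05/(total number of genes).
- The system according to claim 15, characterized in that the nearby region is within the range of the transcription start site and 100 to 1000 bases each upstream and downstream of the transcription start site; optionally, the nearby region is the transcription start site and 100 bases each upstream and downstream of the transcription start site; optionally, the nearby region is the transcription start site and 400 bases each upstream and downstream of the transcription start site; optionally, the nearby region is the transcription start site and 600 bases each upstream and downstream of the transcription start site; optionally, the nearby region is the transcription start site and 1000 bases each upstream and downstream of the transcription start site.
- The system according to claim 10, characterized in that the prediction model construction module further comprises: a data input unit, which takes the number of pregnant woman samples in the training set and the optional validation set, the pregnancy status of the pregnant woman samples, the number of preterm birth-related genes, the fragment length of the nearby region, and the coverage depth of the free nucleic acid at each base site in the region near the transcription start site of the preterm birth-related genes as inputs to construct the prediction model.
- The system according to claim 15, characterized in that the preterm birth-related genes include at least one selected from the genes shown in Table 1.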
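- A method for determining the pregnancy status of a pregnant woman, characterized by comprising: (A) determining predetermined parameters of the pregnant woman, the predetermined parameters including expression prediction information of the preterm birth-related genes of the pregnant woman, the expression prediction information of the preterm birth-related genes being obtained by calculation from the sequencing information of free nucleic acid in the peripheral blood of the pregnant woman; and (B) determining the pregnancy status of the pregnant woman based on the predetermined parameters and a prediction model, the prediction model being constructed by the method according to any one of claims 1 to 9 or the system according to any one of claims 10 to 18.
- The method according to claim 19, characterized in that the pregnancy status includes the delivery interval of the pregnant woman.
- The method according to claim 19, characterized in that the sampling gestational age is 15 to 22 weeks.
- The method according to claim 19, characterized in that the prediction model is at least one of principal component analysis and random forest.
- The method according to claim 19, characterized in that the expression prediction information of the preterm birth-related genes is obtained by: predicting the expression of the preterm birth-related genes using the sequencing read coverage depth of the free nucleic acid in the peripheral blood of the pregnant woman at gene transcription start sites, the preterm birth-related genes being determined by the method according to any one of claims 1 to 9 or the system according to any one of claims 10 to 18.
- The method according to claim 19, characterized in that step (B) further comprises: for each pregnant woman sample to be tested, inputting the number of preterm birth-related genes, the fragment length of the nearby region, and the coverage depth of the free nucleic acid at each base site in the region near the transcription start site of the preterm birth-related genes into the prediction model in order to obtain a prediction result.
- The method according to claim 23, characterized in that the preterm birth-related genes include at least one selected from the genes shown in Table 1.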
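- A device for determining the pregnancy status of a pregnant woman, characterized by comprising: a parameter determination module for determining predetermined parameters of the pregnant woman, the predetermined parameters including expression prediction information of the preterm birth-related genes of the pregnant woman, the expression prediction information of the preterm birth-related genes being obtained by calculation from the sequencing information of free nucleic acid in the peripheral blood of the pregnant woman; and a pregnancy status determination module, connected to the parameter determination module, which determines the pregnancy status of the pregnant woman based on the predetermined parameters and a prediction model, the prediction model being constructed by the method according to any one of claims 1 to 9 or the system according to any one of claims 10 to 18.
- The device according to claim 26, characterized in that the pregnancy status includes the delivery interval of the pregnant woman.
- The device according to claim 26, characterized in that the sampling gestational age is 15 to 22 weeks.
- The device according to claim 26, characterized in that the prediction model is at least one of principal component analysis and random forest.
- The device according to claim 26, characterized in that the parameter determination module further comprises: a preterm birth-related gene expression information determination unit, which predicts the expression of the preterm birth-related genes using the sequencing read coverage depth of the free nucleic acid in the peripheral blood of the pregnant woman at gene transcription start sites, the preterm birth-related genes being determined by the method according to any one of claims 1 to 9 or the system according to any one of claims 10 to 18.
- The device according to claim 26, characterized in that the pregnancy status determination module further comprises: a data input unit, which, for each pregnant woman sample to be tested, inputs the number of preterm birth-related genes, the fragment length of the nearby region, and the coverage depth of the free nucleic acid at each base site in the region near the transcription start site of the preterm birth-related genes into the prediction model in order to obtain a prediction result.
- The device according to claim 30, characterized in that the preterm birth-related genes include at least one selected from the genes shown in Table 1.
- A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the steps of the method according to any one of claims 1 to 9 or claims 19 to 25.
- An electronic device, characterized by comprising: the computer-readable storage medium according to claim 33; and one or more processors for executing the program in the computer-readable storage medium.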
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/032,661 US20230386607A1 (en) | 2020-10-20 | 2020-10-20 | Method for determining pregnancy status of pregnant woman |
CN202080106438.1A CN116323978A (zh) | 2020-10-20 | 2020-10-20 | Method for determining pregnancy status of pregnant woman |
PCT/CN2020/122214 WO2022082436A1 (zh) | 2020-10-20 | 2020-10-20 | Method for determining pregnancy status of pregnant woman |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/122214 WO2022082436A1 (zh) | 2020-10-20 | 2020-10-20 | Method for determining pregnancy status of pregnant woman |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022082436A1 true WO2022082436A1 (zh) | 2022-04-28 |
Family
ID=81291283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/122214 WO2022082436A1 (zh) | Method for determining pregnancy status of pregnant woman | 2020-10-20 | 2020-10-20 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230386607A1 (zh) |
CN (1) | CN116323978A (zh) |
WO (1) | WO2022082436A1 (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160017412A1 (en) * | 2014-07-18 | 2016-01-21 | Illumina, Inc. | Non-invasive prenatal diagnosis of fetal genetic condition using cellular dna and cell free dna |
WO2019191319A1 (en) * | 2018-03-30 | 2019-10-03 | Juno Diagnostics, Inc. | Deep learning-based methods, devices, and systems for prenatal testing |
CN110785499A (zh) * | 2018-05-25 | 2020-02-11 | Illumina, Inc. | Circulating RNA signatures specific to preeclampsia |
WO2020154402A1 (en) * | 2019-01-24 | 2020-07-30 | Illumina, Inc. | Methods and systems for monitoring organ health and disease |
CN111566228A (zh) * | 2017-10-23 | 2020-08-21 | Chan Zuckerberg Biohub, Inc. | Non-invasive molecular clock for predicting gestational age and preterm birth during fetal development |
-
2020
- 2020-10-20 US US18/032,661 patent/US20230386607A1/en active Pending
- 2020-10-20 CN CN202080106438.1A patent/CN116323978A/zh active Pending
- 2020-10-20 WO PCT/CN2020/122214 patent/WO2022082436A1/zh active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160017412A1 (en) * | 2014-07-18 | 2016-01-21 | Illumina, Inc. | Non-invasive prenatal diagnosis of fetal genetic condition using cellular dna and cell free dna |
US20190062832A1 (en) * | 2014-07-18 | 2019-02-28 | Illumina, Inc. | Non-invasive prenatal diagnosis of fetal genetic condition using cellular dna and cell free dna |
CN111566228A (zh) * | 2017-10-23 | 2020-08-21 | Chan Zuckerberg Biohub, Inc. | Non-invasive molecular clock for predicting gestational age and preterm birth during fetal development |
WO2019191319A1 (en) * | 2018-03-30 | 2019-10-03 | Juno Diagnostics, Inc. | Deep learning-based methods, devices, and systems for prenatal testing |
CN110785499A (zh) * | 2018-05-25 | 2020-02-11 | Illumina, Inc. | Circulating RNA signatures specific to preeclampsia |
WO2020154402A1 (en) * | 2019-01-24 | 2020-07-30 | Illumina, Inc. | Methods and systems for monitoring organ health and disease |
Also Published As
Publication number | Publication date |
---|---|
US20230386607A1 (en) | 2023-11-30 |
CN116323978A (zh) | 2023-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
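JP7297015B2 (ja) | Epigenetic chromosome interactions | |
Tarca et al. | Analysis of microarray experiments of gene expression profiling | |
AU2020221278A1 (en) | Methods and systems for determining a pregnancy-related state of a subject | |
WO2022170909A1 (zh) | Drug sensitivity prediction method, electronic device, and computer-readable storage medium | |
CN109767810A (zh) | High-throughput sequencing data analysis method and device | |
KR101721480B1 (ko) | Method and system for testing chromosomal abnormalities | |
JPWO2020168118A5 (zh) | ||
JP2021500061A5 (zh) | ||
CN109979529A (zh) | CNV detection device | |
KR20230110615A (ko) | Method and system for detecting fetal chromosomal abnormalities | |
KR101678962B1 (ko) | Device and method for non-invasive prenatal testing using massively parallel genome sequencing | |
CN110387414B (zh) | A model for predicting gestational diabetes using peripheral blood cell-free DNA | |
WO2018137496A1 (zh) | Method and device for determining the proportion of cell-free nucleic acid from a predetermined source in a biological sample | |
CN117275585A (zh) | Method for constructing an early lung cancer screening model based on LP-WGS and DNA methylation, and electronic device | |
WO2021243650A1 (zh) | Method for determining pregnancy status of pregnant woman | |
WO2022082436A1 (zh) | Method for determining pregnancy status of pregnant woman | |
CN110580934B (zh) | A method for predicting pregnancy-related diseases based on high-throughput sequencing of peripheral blood cell-free DNA | |
CN117551760A (zh) | Biomarkers for predicting progressive and non-progressive tuberculosis and their use | |
Fang et al. | Psychosocial correlates of intention to undergo prophylactic oophorectomy among women with a family history of ovarian cancer | |
CN107109324B (zh) | Method and device for determining fetal nucleic acid content | |
CN114822682B (zh) | Gene combination associated with the occurrence of early-onset severe preeclampsia and its use | |
CN108229099A (zh) | Data processing method, device, storage medium, and processor | |
CN114267409A (zh) | Method, device, and storage medium for analyzing non-invasive prenatal genetic testing sequencing data | |
Gerashchenko et al. | Development of gene expression panels to determine prostate cancer | |
CN110305970A (zh) | A macrosomia prediction model based on peripheral blood cell-free DNA detection |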
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20958019; Country of ref document: EP; Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 18032661; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13/09/2023) |
122 | Ep: pct application non-entry in european phase | Ref document number: 20958019; Country of ref document: EP; Kind code of ref document: A1 |