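WO2021134513A1 - Method and device for determining chromosome aneuploidy and constructing a classification model - Google Patents
Method and device for determining chromosome aneuploidy and constructing a classification model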
- Publication number
- WO2021134513A1 (international application PCT/CN2019/130625)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- chromosome
- sample
- feature
- concentration
- aneuploidy
Classifications
- G16H10/40: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
- G06N20/10: Machine learning using kernel methods, e.g. support vector machines [SVM]
- G06N3/042: Knowledge-based neural networks; Logical representations of neural networks
- G06N3/08: Learning methods (neural networks)
- G16B20/10: Ploidy or copy number detection (functional genomics or proteomics)
- G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B35/20: Screening of libraries (in silico combinatorial libraries of nucleic acids, proteins or peptides)
- G16B40/20: Supervised data analysis (biostatistics; bioinformatics-related machine learning or data mining)
- G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
- G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- The present invention relates to the field of biotechnology, in particular non-invasive prenatal genetic testing, and specifically to a method and device for determining chromosome aneuploidy and to a corresponding method and device for constructing a machine learning classification model.
- Prenatal screening methods are usually divided into two categories: invasive methods (also called prenatal diagnosis) and non-invasive methods.
- The former mainly include amniocentesis, chorionic villus sampling and cord blood sampling; the latter include ultrasound, maternal peripheral serum marker testing and fetal cell detection.
- Invasive methods such as chorionic villus sampling (CVS) or amniocentesis are used to obtain fetal cells, which can be used for routine prenatal diagnosis.
- Non-invasive prenatal screening mainly uses high-throughput sequencing to analyze cell-free fetal DNA in the peripheral blood of pregnant women and to assess the risk of common fetal chromosomal aneuploidies.
- The common screening targets are chromosome 21 trisomy (T21), chromosome 18 trisomy (T18), chromosome 13 trisomy (T13) and sex chromosome aneuploidies.
- NIPT based on read-count quantification: the main principle is to use alignment software to map sequencing reads (sometimes called "sequencing sequences") to predefined windows and then apply an appropriate method to detect aneuploidy of the chromosome under test.
- NIPT based on single nucleotide polymorphisms (SNPs): the main principle is to capture and sequence the genomic DNA of both parents and the cell-free fetal DNA at predetermined SNP regions, and then use the genotypes of the parents and the fetus in a Bayesian model to detect aneuploidy of the chromosome under test.
- NIPT based on DNA fragment size: this approach uses paired-end (PE) sequencing, and a Z test against reference samples is used to detect aneuploidy of the chromosome under test.
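A Z test of this kind is usually applied to the fraction of reads that map to the chromosome under test. The Python sketch below is a minimal illustration, assuming the read fractions are already computed and the reference samples are euploid; the one-sided cut-offs (3.0 for high risk, 2.5 for the start of a gray zone) and the function names are hypothetical values chosen for illustration, not taken from this document. It shows where a detection gray zone comes from when calls rest on fixed thresholds.

```python
from statistics import mean, stdev


def z_score(sample_fraction, reference_fractions):
    """Z score of a sample's chromosome read fraction against euploid references."""
    if len(reference_fractions) < 2:
        raise ValueError("need at least two reference samples")
    mu = mean(reference_fractions)
    sd = stdev(reference_fractions)  # sample standard deviation
    return (sample_fraction - mu) / sd


def threshold_call(z, high=3.0, gray_low=2.5):
    """Hypothetical one-sided cut-offs; z in [gray_low, high) falls in a gray zone."""
    if z >= high:
        return "high risk"
    if z >= gray_low:
        return "gray zone"
    return "low risk"


# Fraction of autosomal reads on the test chromosome in euploid reference samples.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]
z = z_score(0.0137, reference)
print(round(z, 2), threshold_call(z))
```

Samples landing between the two cut-offs typically need resampling or deeper sequencing, which is the cost the method described here aims to avoid.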
- An object of the present invention is to provide a method that can effectively determine chromosome aneuploidy.
- The present invention provides a method for determining whether a fetus has a chromosomal aneuploidy.
- The method includes: (1) obtaining nucleic acid sequencing data from a pregnant woman sample, where the sample contains cell-free fetal nucleic acid and the sequencing data consist of a plurality of sequencing reads; (2) determining, based on the sequencing data, the fetal concentration of the sample and the back-estimated concentration of predetermined chromosomes, where the back-estimated concentration is determined based on the difference between the number of sequencing reads of a predetermined chromosome and that of a first comparison chromosome, the predetermined chromosomes include the chromosome to be tested and second comparison chromosomes, and the first comparison chromosome includes at least one autosome different from the predetermined chromosomes; (3) determining the first feature based on the difference between the back-estimated concentration of the test
- This method can effectively determine whether the fetus has an aneuploidy of the chromosome to be tested.
- The method replaces the current threshold-setting strategy based on sequencing read counts, eliminates the detection gray zone, shortens sample turnaround time, improves the customer experience, and can significantly reduce sequencing and testing costs.
- The above method may also have the following additional technical features:
- The pregnant woman sample includes maternal peripheral blood.
- The nucleic acid sequencing data are obtained by paired-end sequencing, single-end sequencing or single-molecule sequencing.
- The fetal concentration is determined by: (a) aligning the nucleic acid sequencing data from the pregnant woman sample to a reference sequence to determine the number of sequencing reads falling within predetermined windows; and (b) determining the fetal concentration of the sample based on the number of sequencing reads falling within the predetermined windows.
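The text does not say how the window counts are turned into a fetal concentration. The sketch below illustrates steps (a) and (b) under explicit assumptions: reads are already aligned and given as (chromosome, position) pairs, the 1 Mb window size is a hypothetical default, and step (b) swaps in a simple chrY-based estimator that only applies to male fetuses. The parameter `male_chry_fraction` (the chrY read fraction measured in adult male DNA) and all function names are hypothetical; this is not the calculation used in this document.

```python
from collections import Counter


def count_reads_per_window(aligned_reads, window_size=1_000_000):
    """Step (a): count aligned reads falling into fixed-size windows.

    aligned_reads: iterable of (chromosome, 0-based position) pairs.
    Returns a Counter keyed by (chromosome, window index).
    """
    counts = Counter()
    for chrom, pos in aligned_reads:
        counts[(chrom, pos // window_size)] += 1
    return counts


def fetal_fraction_from_chry(window_counts, male_chry_fraction):
    """Step (b), swapped-in Y-based estimator (male fetus only).

    In a male pregnancy, chrY reads come only from fetal DNA, so the sample's
    chrY read fraction is roughly fetal_fraction * male_chry_fraction.
    """
    total = sum(window_counts.values())
    chry = sum(n for (chrom, _), n in window_counts.items() if chrom == "chrY")
    return (chry / total) / male_chry_fraction
```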
- The number of sequencing reads of the first comparison chromosome is the average number of sequencing reads of a plurality of autosomes, and the plurality of autosomes includes at least one autosome known not to be aneuploid.
- The number of sequencing reads of the first comparison chromosome is the average number of sequencing reads of at least 15 autosomes, or optionally of at least 20 autosomes.
- The number of sequencing reads of the first comparison chromosome is the average number of sequencing reads of all autosomes.
- The back-estimated concentration is determined according to a formula in which:
- j is the number of the chromosome whose back-estimated concentration is to be determined;
- Fj is the back-estimated concentration of chromosome j;
- Rr is the average number of sequencing reads of the plurality of autosomes;
- Rj is the number of sequencing reads on chromosome j.
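The formula itself is not reproduced in this text. One plausible reading, assumed here rather than taken from the source, is Fj = 2(Rj - Rr)/Rr with read counts normalized for chromosome size. The reasoning: if chromosome j is trisomic in a fetus at fetal concentration f, the maternal share (1 - f) contributes two copies and the fetal share f contributes three, so the normalized count is about (1 + f/2)·Rr and Fj recovers f, while a disomic chromosome gives Fj close to 0. The function name and the `reference_chroms` parameter are hypothetical.

```python
from statistics import mean


def back_estimated_concentrations(normalized_reads, reference_chroms=None):
    """Assumed formula: Fj = 2 * (Rj - Rr) / Rr for every chromosome j.

    normalized_reads: {chromosome: read count normalized for chromosome size}
    reference_chroms: autosomes averaged into Rr (the first comparison
    chromosome); defaults to all supplied chromosomes.
    """
    refs = reference_chroms if reference_chroms is not None else list(normalized_reads)
    rr = mean(normalized_reads[c] for c in refs)  # Rr
    return {chrom: 2.0 * (rj - rr) / rr for chrom, rj in normalized_reads.items()}


# Synthetic example: chr21 carries 5% extra normalized reads.
reads = {f"chr{i}": 100.0 for i in range(1, 23)}
reads["chr21"] = 105.0
f = back_estimated_concentrations(reads, [c for c in reads if c != "chr21"])
```

Averaging Rr over all autosomes, as one option above allows, slightly dilutes the signal when the test chromosome is included, which is why the sketch lets the caller choose the reference set.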
- The first feature is determined based on the difference between the back-estimated concentration of the chromosome to be tested and the average back-estimated concentration of the second comparison chromosomes.
- The second comparison chromosomes include at least 10 autosomes.
- The second comparison chromosomes include 15 autosomes.
- The method further includes: determining the back-estimated concentrations of a plurality of autosomes, sorting them in ascending order, and selecting the top-ranked autosomes as the second comparison chromosomes.
- The first feature is determined by a formula in which:
- X1 is the first feature;
- i is the number of the chromosome to be tested;
- Fi is the back-estimated concentration of the chromosome to be tested;
- Fr is the average back-estimated concentration of the second comparison chromosomes.
- The second feature is determined by a formula in which:
- X2 is the second feature;
- i is the number of the chromosome to be tested;
- Fi is the back-estimated concentration of the chromosome to be tested;
- Fa is the fetal concentration.
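The formulas for X1 and X2 are not reproduced either. The sketch below assumes the simplest forms consistent with the definitions, X1 = Fi - Fr and X2 = Fi - Fa, and selects the second comparison chromosomes by sorting back-estimated concentrations in ascending order as described (excluding the test chromosome; n = 15 follows the 15-autosome option). For intuition: if the back-estimated concentration of a trisomic chromosome approximates the fetal concentration, a positive sample has X1 close to Fa and X2 close to 0, while a negative sample has X1 close to 0 and X2 close to -Fa, which is why the two features separate the classes. Function names are hypothetical.

```python
from statistics import mean


def second_comparison_chromosomes(f, test_chrom, n=15):
    """Sort autosomes (except the test chromosome) by Fj, ascending; keep the first n."""
    candidates = sorted((c for c in f if c != test_chrom), key=lambda c: f[c])
    return candidates[:n]


def features(f, test_chrom, fetal_concentration, n=15):
    """Return the assumed (X1, X2) = (Fi - Fr, Fi - Fa).

    f: {chromosome: back-estimated concentration Fj}
    """
    fi = f[test_chrom]
    fr = mean(f[c] for c in second_comparison_chromosomes(f, test_chrom, n))
    return fi - fr, fi - fetal_concentration
```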
- The first feature and the second feature are standardized so that their absolute values each lie between 0 and 1.
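The text only requires the standardized absolute values to fall between 0 and 1. Max-absolute scaling is one way to satisfy this while keeping each feature's sign; it is an assumption here, as are computing the scales on the control samples and clipping new samples that exceed them. Function names are hypothetical.

```python
def fit_max_abs(rows):
    """Per-feature maximum absolute value over the control samples (1.0 if all zero)."""
    return tuple(max(abs(row[k]) for row in rows) or 1.0 for k in range(len(rows[0])))


def standardize(row, scales):
    """Divide each feature by its scale and clip to [-1, 1], keeping the sign."""
    return tuple(max(-1.0, min(1.0, v / s)) for v, s in zip(row, scales))


controls = [(0.13, 0.0), (-0.02, -0.10), (0.01, -0.08)]  # (X1, X2) of control samples
scales = fit_max_abs(controls)
print(standardize((0.065, -0.05), scales))
```

Fitting the scales once on the controls and reusing them for every test sample keeps the test sample and the controls in the same coordinate system, which matters for the distance-based classification that follows.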
- In step (4), the ratio of the number of positive samples to the number of negative samples is not less than 1:4.
- In step (4), the ratio of the number of positive samples to the number of negative samples does not exceed 4:1.
- In step (4), the ratio of the number of positive samples to the number of negative samples is 1:0.1 to 1:5.
- In step (4), the ratio of the number of positive samples to the number of negative samples is 1:0.25 to 1:4.
- Neither the positive samples nor the negative samples have an aneuploidy of any chromosome other than the chromosome to be tested.
- The first and second features are used to build two-dimensional feature vectors for the pregnant woman sample and the control samples; the distances between samples are computed from these vectors, and the pregnant woman sample is classified relative to the positive and negative control samples to determine whether the fetus has an aneuploidy of the chromosome to be tested.
- The distance is the Euclidean, Manhattan or Chebyshev distance.
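The three distance options, applied to two-dimensional (X1, X2) feature vectors, differ only in how per-feature differences are combined: Euclidean takes the root of the summed squares, Manhattan sums absolute differences, and Chebyshev keeps only the largest one. A short Python sketch (`math.dist` requires Python 3.8 or later; function names are illustrative):

```python
import math


def euclidean(p, q):
    return math.dist(p, q)  # sqrt of the sum of squared coordinate differences


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


p, q = (0.0, 0.0), (3.0, -4.0)
print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))
```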
- Step (4) further includes: (4-1) calculating the distance between the pregnant woman sample and each control sample; (4-2) sorting the distances in ascending order; (4-3) selecting a predetermined number of control samples with the smallest distances; (4-4) counting the positive and negative samples among the selected control samples; and (4-5) classifying the pregnant woman sample by majority vote.
- The predetermined number is no more than 20.
- The predetermined number is 3 to 10.
- In step (4-2), before sorting, the distances between the sample to be tested and the control samples are weighted in advance.
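Steps (4-1) to (4-5) amount to a k-nearest neighbor vote. The sketch below labels positives 1 and negatives 0, uses k = 5 by default (inside the 3 to 10 range), and first checks that the control set respects the 1:0.25 to 1:4 positive-to-negative ratio, which is the same constraint as "not less than 1:4 and not more than 4:1". The text does not define how distances are weighted; a per-feature weighted Euclidean distance is assumed here, and all function names are hypothetical.

```python
import math
from collections import Counter


def check_control_ratio(labels, low=0.25, high=4.0):
    """Require positive:negative = 1:r with low <= r <= high."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative control samples")
    r = neg / pos
    if not low <= r <= high:
        raise ValueError(f"positive:negative ratio 1:{r:.2f} is outside 1:{low}-{high}")


def weighted_distance(p, q, weights):
    """Feature-weighted Euclidean distance (one possible form of weighting)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


def knn_classify(sample, controls, labels, k=5, weights=(1.0, 1.0)):
    """Return 1 (aneuploid) or 0 (euploid) by majority vote of the k nearest controls."""
    check_control_ratio(labels)
    if not 1 <= k <= len(controls):
        raise ValueError("k must be between 1 and the number of control samples")
    # (4-1) distance from the sample to every control, weighted before sorting
    scored = [(weighted_distance(sample, c, weights), y) for c, y in zip(controls, labels)]
    # (4-2) sort ascending by distance; (4-3) keep the k nearest
    nearest = sorted(scored, key=lambda t: t[0])[:k]
    # (4-4) count positives and negatives among them
    votes = Counter(y for _, y in nearest)
    # (4-5) majority vote; a tie (possible only with even k) falls back to negative
    return 1 if votes[1] > votes[0] else 0
```

An odd k avoids ties in this two-class vote, so the fallback rule only matters if an even k is chosen.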
- The present invention provides a device for determining whether a fetus has a chromosomal aneuploidy, comprising: a data acquisition module for acquiring nucleic acid sequencing data from a pregnant woman sample, where the sample contains cell-free fetal nucleic acid and the sequencing data consist of a plurality of sequencing reads; and a fetal concentration/back-estimated concentration determination module for determining, based on the sequencing data, the fetal concentration of the sample and the back-estimated concentration of predetermined chromosomes, the back-estimated concentration being determined based on the difference between the number of sequencing reads of a predetermined chromosome and that of a first comparison chromosome.
- The predetermined chromosomes include the chromosome to be tested and the second comparison chromosomes.
- The first comparison chromosome includes at least one autosome different from the predetermined chromosomes. A feature determination module determines the first feature based on the difference between the back-estimated concentration of the chromosome to be tested and that of the second comparison chromosomes, and the second feature based on the difference between the back-estimated concentration of the chromosome to be tested and the fetal concentration. An aneuploidy determination module determines, based on the first and second features and using the corresponding data of control samples, whether the fetus of the pregnant woman has an aneuploidy of the chromosome to be tested, where the control samples include positive samples, which have an aneuploidy of the chromosome to be tested, and negative samples, which do not have an aneuploidy of the chromosome to be
- The device for determining whether a fetus has a chromosomal aneuploidy can effectively implement the above method, and can thus effectively determine whether the fetus has an aneuploidy of the chromosome to be tested.
- The method replaces the current threshold-setting strategy based on sequencing read counts, eliminates the detection gray zone, shortens sample turnaround time, improves the customer experience, and can significantly reduce sequencing and testing costs.
- The above device may also have the following additional technical features:
- The fetal concentration/back-estimated concentration determination module includes: an alignment unit configured to align the nucleic acid sequencing data from the pregnant woman sample to a reference sequence to determine the number of sequencing reads falling within predetermined windows; and a fetal concentration calculation unit configured to determine the fetal concentration of the sample based on the number of sequencing reads falling within the predetermined windows.
- The fetal concentration/back-estimated concentration determination module includes a back-estimated concentration calculation unit configured to determine the back-estimated concentration according to a formula in which:
- j is the number of the chromosome whose back-estimated concentration is to be determined;
- Fj is the back-estimated concentration of chromosome j;
- Rr is the average number of sequencing reads of the plurality of autosomes;
- Rj is the number of sequencing reads on chromosome j.
- The fetal concentration/back-estimated concentration determination module includes a second comparison chromosome determining unit configured to sort the back-estimated concentrations of a plurality of autosomes in ascending order and to select the top-ranked autosomes as the second comparison chromosomes.
- The feature determination module includes:
- a first feature determining unit configured to determine the first feature using a formula in which:
- X1 is the first feature;
- i is the number of the chromosome to be tested;
- Fi is the back-estimated concentration of the chromosome to be tested;
- Fr is the average back-estimated concentration of the second comparison chromosomes.
- The feature determination module includes a second feature determining unit configured to determine the second feature using a formula in which:
- X2 is the second feature;
- i is the number of the chromosome to be tested;
- Fi is the back-estimated concentration of the chromosome to be tested;
- Fa is the fetal concentration.
- The feature determination module includes a standardization unit configured to standardize the first and second features so that their absolute values each lie between 0 and 1.
- The aneuploidy determination module is configured to use the first and second features to build two-dimensional feature vectors for the pregnant woman sample and the control samples, and to classify the pregnant woman sample relative to the positive and negative control samples based on the inter-sample distances computed from these vectors, so as to determine whether the fetus has an aneuploidy of the chromosome to be tested.
- The distance is the Euclidean, Manhattan or Chebyshev distance.
- The aneuploidy determination module is configured to use a k-nearest neighbor (KNN) model to determine the classification result of the pregnant woman sample.
- The k value used by the KNN model does not exceed 20.
- The k value used by the KNN model is 3 to 10.
- The distances between samples are weighted.
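The device maps each method step onto a module. A minimal Python sketch of that decomposition follows, under stated assumptions: the class names are hypothetical, the formulas for Fj, X1 and X2 are assumed (the text does not reproduce them), data acquisition and alignment are replaced by already-normalized read counts plus a precomputed fetal concentration, and the standardization unit and distance weighting are omitted for brevity.

```python
import math
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class ConcentrationModule:
    """Back-estimated concentration calculation unit (assumed Fj = 2(Rj - Rr)/Rr)."""

    def back_estimate(self, normalized_reads):
        rr = mean(normalized_reads.values())  # Rr: average over all supplied autosomes
        return {c: 2.0 * (r - rr) / rr for c, r in normalized_reads.items()}


@dataclass
class FeatureModule:
    """First/second feature units with assumed X1 = Fi - Fr and X2 = Fi - Fa."""
    n_comparison: int = 15

    def features(self, f, test_chrom, fetal_concentration):
        # Second comparison chromosomes: ascending Fj, test chromosome excluded.
        others = sorted((c for c in f if c != test_chrom), key=lambda c: f[c])
        fr = mean(f[c] for c in others[: self.n_comparison])
        fi = f[test_chrom]
        return fi - fr, fi - fetal_concentration


@dataclass
class AneuploidyModule:
    """KNN decision on (X1, X2) with Euclidean distance; 1 = aneuploid."""
    controls: list
    labels: list
    k: int = 3

    def classify(self, x):
        pairs = sorted(zip(self.controls, self.labels), key=lambda cl: math.dist(x, cl[0]))
        votes = Counter(label for _, label in pairs[: self.k])
        return 1 if votes[1] > votes[0] else 0


@dataclass
class AneuploidyDevice:
    """Wires the modules together in the order described for the device."""
    concentration: ConcentrationModule
    feature: FeatureModule
    decision: AneuploidyModule

    def run(self, normalized_reads, fetal_concentration, test_chrom="chr21"):
        f = self.concentration.back_estimate(normalized_reads)
        x = self.feature.features(f, test_chrom, fetal_concentration)
        return self.decision.classify(x)
```

Keeping each unit behind a small method means, for example, the distance metric or the fetal concentration estimator can be swapped without touching the other modules.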
- The present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for determining whether a fetus has a chromosomal aneuploidy.
- The above method can therefore be effectively implemented, so that it can be effectively determined whether the fetus has an aneuploidy of the chromosome to be tested.
- The present invention provides an electronic device comprising: the above computer-readable storage medium; and one or more processors configured to execute the program stored on it. The above method for determining whether a fetus has a chromosomal aneuploidy can therefore be effectively implemented, so that it can be effectively determined whether the fetus has an aneuploidy of the chromosome to be tested.
- The present invention provides a method for constructing a machine learning classification model.
- The method includes: (a) for each of a plurality of pregnant woman samples: obtaining nucleic acid sequencing data from the sample, where the sample contains cell-free fetal nucleic acid, the sequencing data consist of a plurality of sequencing reads, and the samples include at least one positive sample, which has an aneuploidy of the chromosome to be tested, and at least one negative sample, which does not; determining, based on the sequencing data, the fetal concentration of the sample and the back-estimated concentration of predetermined chromosomes, where the back-estimated concentration is determined based on the difference between the number of sequencing reads of a predetermined chromosome and that of a first comparison chromosome, the predetermined chromosomes include the chromosome to be tested and second comparison chromosomes, and the first comparison chromosome includes at least one autosome different from the predetermined chromosomes; and determining the first feature based on the difference between the back-estimated concentration of the chromosome to be tested and that of the second comparison chromosomes, and the second feature based on the difference between the back-estimated concentration of the chromosome to be tested and the fetal concentration; and (b) performing machine learning training on the plurality of pregnant woman samples using their first and second features, so as to construct a machine learning classification model for determining whether there is an aneuploidy of the chromosome to be tested.
- In this way, a machine learning classification model can be effectively constructed and then used to identify and classify unknown samples, determining whether there is an aneuploidy of a specific chromosome.
- The machine learning classification model is a KNN model.
- The KNN model uses the Euclidean distance.
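For a KNN model, "training" in step (b) amounts to storing the labeled (X1, X2) vectors together with k and the distance metric. The sketch below uses Euclidean distance as stated; the class name, the default k = 5 and the leave-one-out accuracy helper (a common way to compare candidate k values, not described in this document) are assumptions.

```python
import math
from collections import Counter


class KNNModel:
    """Stores labeled (X1, X2) training vectors; 1 = positive, 0 = negative."""

    def __init__(self, k=5):
        self.k = k
        self.X, self.y = [], []

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        if 1 not in y or 0 not in y:
            raise ValueError("need at least one positive and one negative sample")
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x, exclude=None):
        """Majority vote of the k nearest training samples (Euclidean distance).

        exclude: index of a training sample to leave out (used for leave-one-out).
        """
        scored = [(math.dist(x, xi), yi)
                  for i, (xi, yi) in enumerate(zip(self.X, self.y)) if i != exclude]
        nearest = sorted(scored, key=lambda t: t[0])[: self.k]
        votes = Counter(yi for _, yi in nearest)
        return 1 if votes[1] > votes[0] else 0


def leave_one_out_accuracy(model):
    """Fraction of training samples classified correctly when each is held out in turn."""
    hits = sum(model.predict(x, exclude=i) == y
               for i, (x, y) in enumerate(zip(model.X, model.y)))
    return hits / len(model.X)
```

Running `leave_one_out_accuracy` for each k from 3 to 10 is one simple way to pick a value inside the range given above.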
- The present invention provides a device for constructing a machine learning classification model, comprising: a feature acquisition module configured to perform the following for each of a plurality of pregnant woman samples: acquiring nucleic acid sequencing data from the sample, where the sample contains cell-free fetal nucleic acid, the sequencing data consist of a plurality of sequencing reads, and the samples include at least one positive sample, which has an aneuploidy of the chromosome to be tested, and at least one negative sample, which does not; determining, based on the sequencing data, the fetal concentration of the sample and the back-estimated concentration of predetermined chromosomes, the back-estimated concentration being determined based on the difference between the number of sequencing reads of a predetermined chromosome and that of a first comparison chromosome, where the predetermined chromosomes include the chromosome to be tested and second comparison chromosomes, and the first comparison chromosome includes at least one autosome different from the predetermined chromosomes; and determining the first feature based on the difference between the back-estimated concentration of the chromosome to be tested and that of the second comparison chromosomes, and the second feature based on the difference between the back-estimated concentration of the chromosome to be tested and the fetal concentration; and a training module configured to perform machine learning training using the
- the device can effectively implement the aforementioned method of constructing a machine learning classification model, thereby effectively constructing a machine learning classification model, so that the classification model can be further used to identify and classify unknown samples to determine whether a specific chromosome has aneuploidy.
- the machine learning classification model is a KNN model.
- the present invention proposes a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the aforementioned method for constructing a machine learning classification model are implemented.
- the aforementioned method of constructing a machine learning classification model can thus be effectively implemented, so that a machine learning classification model can be effectively constructed and further used to identify and classify unknown samples to determine whether a specific chromosome has aneuploidy.
- Figure 1 shows a schematic flow chart of a method for determining whether a fetus has chromosomal aneuploidy according to an embodiment of the present invention
- Figure 2 shows a schematic flow chart of a method for determining fetal concentration according to an embodiment of the present invention
- Fig. 3 shows a schematic flow chart of a method for classifying pregnant women samples according to an embodiment of the present invention
- Figure 4 shows a block diagram of a device for determining whether a fetus has chromosomal aneuploidy according to an embodiment of the present invention
- Figure 5 shows a block diagram of a fetal concentration-inverse concentration determination module according to an embodiment of the present invention
- Figure 6 shows a block diagram of a feature determining module according to an embodiment of the present invention
- Fig. 7 shows a block diagram of a device for constructing a machine learning classification model according to an embodiment of the present invention
- Figures 12 and 13 show the ROC curve corresponding to the parameter k when the KNN model is used to detect T13 according to an embodiment of the present invention.
- the present invention provides a method for determining whether a fetus has chromosomal aneuploidy.
- the method for determining whether a fetus has chromosomal aneuploidy according to an embodiment of the present invention will be described in detail below by referring to FIGS. 1 to 3.
- the method for determining whether a fetus has chromosomal aneuploidy includes:
- the pregnant woman sample that can be used includes, but is not limited to, the peripheral blood of the pregnant woman.
- NIPT: non-invasive prenatal testing
- after samples of pregnant women, such as peripheral blood, are obtained, nucleic acid sequencing can be performed on them to obtain nucleic acid sequencing data of the pregnant women samples.
- the nucleic acid sequencing data consist of multiple (typically a large number of) sequencing reads.
- the method for sequencing the nucleic acid molecules of the pregnant woman sample is not particularly limited. Any sequencing method known to those skilled in the art can be used, including but not limited to paired-end sequencing, single-end sequencing, or single-molecule sequencing.
- the obtained sequencing data, consisting of a large number of sequencing reads, can be filtered according to quality control standards to remove sequencing reads with sequencing quality problems, which improves the accuracy of subsequent data analysis.
- the fetal concentration of the pregnant woman sample and the inverse estimated concentration of a specific chromosome can be determined.
- the fetal concentration refers to the ratio of the amount of free nucleic acid derived from the fetus to the total amount of free nucleic acid in a sample of a pregnant woman, such as peripheral blood.
- the fetal concentration increases with gestational age. For example, around the 12th gestational week, the ratio of fetal free nucleic acid (sometimes directly referred to as "fetal free DNA") to total free nucleic acid (i.e., the "fetal concentration") can reach 10-14%, and after the 20th gestational week this ratio can exceed 20%.
- the fetal concentration will be abnormal. Therefore, the fetal concentration can be used as an important indicator to characterize the samples of pregnant women.
- existing methods for determining fetal concentration include the Y chromosome estimation method, the SNP-based fetal-specific SNP site method, and the nucleosome-based imprinting method; the inventors of the present invention found that these methods have limitations.
- the Y chromosome estimation method is not suitable for female fetuses; the SNP-based fetal-specific SNP site method requires DNA samples from the father, which are sometimes difficult to obtain; and the nucleosome-based imprinting method has poor accuracy and requires deep sequencing when constructing the model.
- the fetal concentration in a nucleic acid sample can be determined through the following steps, which specifically include:
- S210 Align the nucleic acid sequencing data from the pregnant woman sample to the reference sequence, so as to determine the number of sequencing reads that fall into each predetermined window;
- S220 Determine the fetal concentration of the pregnant woman sample based on the number of sequencing reads that fall into the predetermined window.
- the method for determining the fetal concentration is based on the number of sequencing reads in specific windows (i.e., nucleic acid sequences of a certain length), which is positively correlated with the fetal concentration. Therefore, by determining the number of sequencing reads in at least one predetermined window, the fetal concentration of the pregnant woman sample can be inferred, for example as a weighted average.
- the predetermined window can be determined by means of statistics or machine learning.
- the predetermined windows are obtained by dividing specific chromosomes of the reference genome sequence into consecutive windows, and the weight of each predetermined window is further used to determine the fetal concentration.
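- Counting reads per predetermined window can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: it assumes reads have already been aligned and reduced to (chromosome, 0-based position) pairs, and the helper name `count_reads_per_window` and the 1000 bp window size are hypothetical, since the document does not state a window length.

```python
from collections import Counter


def count_reads_per_window(read_positions, window_size):
    """Count aligned reads in consecutive, non-overlapping windows.

    read_positions: iterable of (chromosome, 0-based position) pairs for
    aligned reads (assumed input format).
    Returns a Counter keyed by (chromosome, window_index).
    """
    counts = Counter()
    for chrom, pos in read_positions:
        # Window i covers positions [i * window_size, (i + 1) * window_size).
        counts[(chrom, pos // window_size)] += 1
    return counts


reads = [("chr1", 10), ("chr1", 999), ("chr1", 1000), ("chr2", 5)]
counts = count_reads_per_window(reads, window_size=1000)
```

Here the two chr1 reads at positions 10 and 999 share window 0, while the read at 1000 starts window 1.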
- the weight of each predetermined window is predetermined by using training samples. As a result, the results are accurate, reliable, and repeatable.
- the weight is determined using at least one of a ridge regression statistical model and a neural network model.
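- Learning the window weights with ridge regression can be sketched with NumPy's closed-form solution. This is an assumed illustration of the ridge option only: the function names, the centring used to fit an unpenalized intercept, the `alpha` value, and the synthetic training data are all hypothetical and do not come from the document.

```python
import numpy as np


def fit_ridge_weights(window_counts, fetal_fractions, alpha=1.0):
    """Fit per-window weights by closed-form ridge regression.

    window_counts: (n_samples, n_windows) read counts of training samples
    fetal_fractions: (n_samples,) known fetal concentrations
    X and y are centred so the intercept is not penalised.
    """
    X = np.asarray(window_counts, dtype=float)
    y = np.asarray(fetal_fractions, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # w = (Xc^T Xc + alpha * I)^-1 Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict_fetal_fraction(window_counts, w, b):
    """Fetal concentration as a weighted sum of window counts plus intercept."""
    return np.asarray(window_counts, dtype=float) @ w + b


# Synthetic illustration only: 200 training samples, 5 windows, no noise.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
true_w = np.array([0.02, -0.01, 0.0, 0.03, 0.0])
y_train = 0.1 + X_train @ true_w
w, b = fit_ridge_weights(X_train, y_train, alpha=1e-6)
```

On real data `alpha` would need tuning; larger values shrink the weights of uninformative windows toward zero.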
- the neural network model adopts the TensorFlow learning system.
- the parameters of the TensorFlow learning system include: the number of sequencing reads in each autosomal window as the input layer; the fetal concentration as the output layer; ReLU as the neuron type; and at least one optimization algorithm selected from Adam, SGD and Ftrl, preferably Ftrl.
- the parameters of the TensorFlow learning system further include: a learning rate of 0.002; 1 hidden layer; and 200 neurons in the hidden layer.
- the results are accurate and reliable.
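- The network shape described above (per-window read counts in, one hidden layer of 200 ReLU units, fetal concentration out) can be illustrated with a NumPy forward pass. This sketch shows only the model's structure: it does not use TensorFlow, does not reproduce Ftrl training at a learning rate of 0.002, and the linear output unit and He initialization are assumptions not stated in the document.

```python
import numpy as np


def init_params(n_windows, n_hidden=200, seed=0):
    """Random initial weights for a 1-hidden-layer network (untrained)."""
    rng = np.random.default_rng(seed)
    # He initialization is a common choice for ReLU hidden units.
    W1 = rng.normal(0.0, np.sqrt(2.0 / n_windows), size=(n_windows, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, 1))
    b2 = np.zeros(1)
    return W1, b1, W2, b2


def forward(window_counts, params):
    """Input: per-window read counts; output: one fetal concentration per sample."""
    W1, b1, W2, b2 = params
    hidden = np.maximum(0.0, window_counts @ W1 + b1)  # ReLU hidden layer
    return (hidden @ W2 + b2).ravel()                  # linear output unit


params = init_params(n_windows=50)
predictions = forward(np.ones((4, 50)), params)
```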
- the term "weight" as used herein is a relative concept and always refers to a particular indicator.
- the weight of an indicator refers to the relative importance of the indicator in the overall evaluation.
- the "weight of a predetermined window" refers to the relative importance of that predetermined window among all predetermined windows.
- a "connection weight" refers to the relative importance of a connection between two different layers among all connections between those two layers.
- for details, see PCT/CN2018/07204, titled "Method and device for determining the proportion of free nucleic acid from a predetermined source in a biological sample", the full text of which is incorporated herein by reference.
- the method can obtain fetal concentration data simply, quickly and accurately.
- the obtained fetal concentration data can be more effectively applied to the method of the present invention to determine whether the fetus has chromosomal aneuploidy.
- not only can the fetal concentration be determined, but the inverse estimated concentration of the predetermined chromosomes can also be determined.
- the term "inverse estimated concentration" as used herein refers to a measure that characterizes the difference between the DNA content of a specific chromosome and that of normal chromosomes; specifically, it can be expressed by the difference between the number of sequencing reads of the specific chromosome and that of normal chromosomes. For example, in the ideal case, the inverse estimated concentration of a trisomic chromosome is the amount representing the DNA content of one extra chromosome, whereas for a normal chromosome, which has no extra copy, it is 0.
- normal chromosome refers to a chromosome without chromosome aneuploidy, and does not mean that the chromosome does not have other abnormalities.
- the expression "number of sequencing reads of ..." is used many times herein, for example "number of sequencing reads of a normal chromosome", "number of sequencing reads of a specific chromosome", and "number of sequencing reads that fall into a predetermined window"; in each case it refers to the number of sequencing reads that can be matched to the corresponding region.
- the nucleic acid sequencing result can be aligned to a reference sequence such as hg19, for example using conventional alignment software such as SOAP; sequencing reads that align to a specific region are regarded as the sequencing reads of that region.
- the step of determining the number of corrected sequencing reads includes:
- inverse estimated concentration refers to a measure that characterizes the difference between the DNA content of a specific chromosome and the DNA content of normal chromosomes. Therefore, the inverse estimated concentration can be used as an important indicator for characterizing pregnant women's samples. According to an embodiment of the present invention, the inverse estimation concentration is determined based on the difference between the number of sequencing reads of the predetermined chromosome and the number of sequencing reads of the first comparison chromosome.
- predetermined chromosome includes the chromosome to be tested, that is, the chromosome for which aneuploidy needs to be determined.
- the predetermined chromosome also includes a second comparison chromosome.
- the second comparison chromosome includes at least one autosome. It should be noted that the inverse estimated concentration is calculated separately for each predetermined chromosome, so a separate inverse estimated concentration is obtained for the chromosome to be tested and for each second comparison chromosome.
- the first comparison chromosome and the second comparison chromosome are derived from the same sample as the chromosome to be tested, instead of using data from other samples for analysis.
- the second comparison chromosome includes at least 10 autosomes. According to an embodiment of the present invention, the second comparison chromosome includes 15 autosomes.
- the inverse estimated concentration can be used as an indicator of whether a chromosome is abnormal, and it can therefore be used to select the second comparison chromosome. According to an embodiment of the present invention, the method further includes: determining the inverse estimated concentrations of a plurality of autosomes; and selecting the top-ranked autosomes, in ascending order of inverse estimated concentration, as the second comparison chromosome. As described above, the smaller the inverse estimated concentration, the more likely the chromosome is a normal chromosome.
- a suitable autosome can be selected as the second comparison chromosome.
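- The ranking-based selection of the second comparison chromosome can be sketched as below. The function name and dict input are hypothetical; `n=15` follows the embodiment that uses 15 autosomes, ranking is by raw value in ascending order as the text describes, and excluding the chromosome to be tested from the candidates is an assumption.

```python
def select_second_comparison(inverse_conc, n=15, exclude=()):
    """Pick the n autosomes with the smallest inverse estimated concentration.

    inverse_conc: dict mapping autosome number -> inverse estimated concentration
    exclude: chromosomes that may not be chosen (e.g. the chromosome to be tested)
    """
    candidates = [(value, chrom) for chrom, value in inverse_conc.items()
                  if chrom not in exclude]
    candidates.sort()  # ascending: a smaller value suggests a normal chromosome
    return [chrom for _, chrom in candidates[:n]]


selected = select_second_comparison(
    {1: 0.01, 2: -0.002, 3: 0.005, 21: 0.12}, n=2, exclude={21})
```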
- whether a chromosome is abnormal in number can also be judged empirically. For example, statistical analysis shows that some chromosomes almost never exhibit aneuploidy, so these chromosomes can be used as the second comparison chromosome.
- the inverse estimated concentration characterizes the difference between a specific chromosome and normal chromosomes. Therefore, according to an embodiment of the present invention, the first comparison chromosome includes at least one autosome different from the predetermined chromosome. It should be noted that the first comparison chromosome and the second comparison chromosome mentioned here may overlap. Specifically, the inverse estimated concentration formula is applied to one specific predetermined chromosome at a time, so the remaining chromosomes, although they may fall under the term "second comparison chromosome", still count as "autosomes different from the predetermined chromosome".
- for example, if chromosome 23 is selected as the chromosome to be tested and chromosomes 2 to 5 are used as the second comparison chromosome, chromosomes 2 to 5 can still be used as the first comparison chromosome when calculating the inverse estimated concentration of chromosome 23.
- the first comparison chromosome may include multiple autosomes, in which case the average number of sequencing reads of these autosomes may be used when calculating the inverse estimated concentration. In this way, the efficiency and accuracy of sequencing data analysis can be further improved.
- the number of sequencing reads of the first comparison chromosome is an average number of sequencing reads of a plurality of autosomes, the plurality of autosomes including at least one autosome that is known to have no aneuploidy.
- the number of sequencing reads of the first comparison chromosome is an average number of sequencing reads of at least 15 autosomes.
- the number of sequencing reads of the first comparison chromosome is an average number of sequencing reads of at least 20 autosomes.
- the number of sequencing reads of the first comparison chromosome is the average number of sequencing reads of all autosomes. Averaging over multiple chromosomes in this way reduces the influence of differences among individual chromosomes.
- the inverse estimated concentration is determined according to the following formula:
- j represents the number of the chromosome for which the inverse estimated concentration needs to be determined
- Fj represents the inverse estimated concentration of chromosome j
- Rr represents the average number of sequencing reads of the multiple autosomes
- Rj represents the number of reads sequenced on chromosome j.
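- The formula itself did not survive extraction. The sketch below assumes one form consistent with the definitions above and with the statement that a trisomic chromosome yields the DNA content of one extra chromosome while a normal chromosome yields 0, namely $F_j = 2(R_j - R_r)/R_r$. It also assumes read counts have already been normalized (for example for chromosome length) so that a normal chromosome has $R_j \approx R_r$. Neither assumption is confirmed by the document, and the function names are hypothetical.

```python
def mean_reads(reads_by_chrom, autosomes):
    """R_r: average normalized read count over the given autosomes."""
    return sum(reads_by_chrom[c] for c in autosomes) / len(autosomes)


def inverse_estimated_concentration(r_j, r_r):
    """Assumed form F_j = 2 * (R_j - R_r) / R_r.

    With fetal concentration f, a trisomic chromosome has relative dosage
    (2 * (1 - f) + 3 * f) / 2 = 1 + f / 2, so F_j = f; a normal chromosome
    has R_j ~= R_r and F_j ~= 0.
    """
    return 2.0 * (r_j - r_r) / r_r


# Illustration: normalized counts of 1.0 per autosome, chromosome 21 trisomic
# at a fetal concentration of 10% (1 + 0.10 / 2 = 1.05).
reads = {c: 1.0 for c in range(1, 23)}
reads[21] = 1.05
r_r = mean_reads(reads, [c for c in range(1, 23) if c != 21])
f_21 = inverse_estimated_concentration(reads[21], r_r)
f_1 = inverse_estimated_concentration(reads[1], r_r)
```

Under this assumed form, the inverse estimated concentration of a trisomic chromosome should track the fetal concentration, which is what makes the second feature (their difference) informative.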
- the fetal concentration and the inverse estimated concentration determined in this step are both affected by the chromosome aneuploidy to varying degrees, so these two parameters can be used in subsequent aneuploidy detection.
- these parameters can be further used as the characteristic values of the sample, so that machine learning can be further used for analysis.
- the first feature is determined based on the difference between the inverse estimated concentration of the chromosome to be tested and the inverse estimated concentration of the second comparison chromosome, and the second feature is determined based on the difference between the previously determined inverse estimated concentration of the chromosome to be tested and the fetal concentration.
- both the first feature and the second feature are therefore affected by aneuploidy, and can be effectively applied in the subsequent analysis.
- those skilled in the art can use a variety of algorithms to characterize the aforementioned differences, for example, by calculating the difference of the values, the ratio of the values, and so on.
- the inverse estimated concentration of the second comparison chromosome is preferably the average inverse estimated concentration of multiple autosomes, which further improves the efficiency and accuracy of the analysis.
- the first feature is determined by the following formula:
- X1 represents the first feature
- i represents the number of the chromosome to be tested
- Fi represents the inverse estimated concentration of the chromosome to be tested
- Fr represents the average value of the inverse estimated concentration of the second comparison chromosome.
- the second characteristic is determined by the following formula:
- X2 represents the second feature
- i represents the number of the chromosome to be tested
- Fi represents the inverse estimated concentration of the chromosome to be tested
- Fa represents the fetal concentration
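- The formulas for X1 and X2 are also missing from this extract, and the text notes that a difference may be expressed either as a subtraction or as a ratio. The sketch below assumes plain subtraction, X1 = Fi - Fr and X2 = Fi - Fa; the function names and example values are hypothetical.

```python
def first_feature(f_test, f_second_comparison):
    """X1 = F_i - F_r, where F_r is the mean over the second comparison chromosomes."""
    f_r = sum(f_second_comparison) / len(f_second_comparison)
    return f_test - f_r


def second_feature(f_test, fetal_concentration):
    """X2 = F_i - F_a."""
    return f_test - fetal_concentration


comparison = [0.004, -0.002, 0.001]
# Trisomic case: F_i tracks the fetal concentration, so X1 is large and X2 ~ 0.
x1_pos = first_feature(0.10, comparison)
x2_pos = second_feature(0.10, 0.10)
# Normal case: F_i ~ 0, so X1 ~ 0 and X2 ~ -F_a.
x1_neg = first_feature(0.001, comparison)
x2_neg = second_feature(0.001, 0.10)
```

With these assumed definitions, positive and negative samples fall in different corners of the (X1, X2) plane, and both features stay on the same order of magnitude.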
- on the one hand, the first feature and the second feature thus obtained each reflect their respective differences; on the other hand, their values are of the same order of magnitude, which prevents any single parameter from having an excessive influence on the analysis result. If the features are not selected appropriately, subsequent analysis results may be biased.
- the distance between samples is calculated from the features of the samples (for example, if the features of sample $x_1$ are $(x_1^{(1)}, x_1^{(2)})$ and those of sample $x_2$ are $(x_2^{(1)}, x_2^{(2)})$, the Euclidean distance between $x_1$ and $x_2$ is $\sqrt{(x_1^{(1)} - x_2^{(1)})^2 + (x_1^{(2)} - x_2^{(2)})^2}$); if the values of one feature differ between samples far more than those of the other feature, that feature dominates the distance.
- the obtained first feature and second feature can therefore be standardized so that the absolute value of each lies between 0 and 1.
- the means for standardizing the first feature and the second feature is not particularly limited. Specifically, a batch of values of the same dimension (i.e., all values of the first feature, or all values of the second feature) can be processed according to the following formula: $\mathrm{newvalue} = \frac{\mathrm{oldvalue} - \min}{\max - \min}$
- min and max are the minimum and maximum values of this batch of values
- oldvalue represents the value before processing
- newvalue represents the value after normalization processing
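- The min-max formula above can be sketched in a few lines of Python, applied separately to each feature. The function name is hypothetical, and mapping a batch whose values are all identical to 0.0 is an assumption, since the formula is undefined when max equals min.

```python
def min_max_normalize(values):
    """newvalue = (oldvalue - min) / (max - min), mapping a batch into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate batch (all values identical): assumed to map to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_normalize([2.0, 4.0, 6.0])
```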
- S400 Determine aneuploidy based on the first feature and the second feature
- the values of the first feature and the second feature are both affected by aneuploidy. Therefore, after the first feature and the second feature are obtained, the corresponding data of control samples can be used to determine whether the fetus has aneuploidy for the chromosome to be tested.
- the control sample includes a positive sample and a negative sample, the positive sample has aneuploidy for the chromosome to be tested, and the negative sample does not have aneuploidy for the chromosome to be tested.
- the determination of whether the test chromosome has aneuploidy can be realized.
- the inventor found in the research process that the number of positive samples and negative samples satisfying a certain ratio can further improve the accuracy of the analysis.
- the ratio of the number of positive samples and negative samples is not less than 1:4.
- the ratio of the number of positive samples to the number of negative samples does not exceed 4:1.
- the ratio of the numbers of the positive samples and the negative samples is 1:0.1-5.
- the ratio of the numbers of the positive samples and the negative samples is 1:0.25-4.
- neither the positive sample nor the negative sample has aneuploidy for chromosomes other than the chromosome to be tested.
- the classification reference ability of the control sample can be further improved.
- the method of using the first feature and the second feature to classify is not particularly limited, and a variety of machine learning methods, such as neural networks, SVM methods, etc., can be used.
- the first feature and the second feature may be used to determine two-dimensional feature vectors of the pregnant woman sample and of the control samples; based on the distances between samples determined from these two-dimensional feature vectors, the pregnant woman sample is classified into either the positive control samples or the negative control samples, so as to determine whether the fetus has aneuploidy for the chromosome to be tested.
- the distance that can be used includes, but is not limited to, Euclidean distance, Manhattan distance, or Chebyshev distance.
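- The three distances named above can be written for the two-dimensional feature vectors as follows. These are the standard definitions; the function names are illustrative.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


p, q = (0.0, 0.0), (3.0, 4.0)
```

For the points (0, 0) and (3, 4) these give 5, 7 and 4 respectively.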
- KNN: K-nearest neighbor
- the classification process includes the following steps:
- S450 Determine the classification result of the pregnant woman sample by majority voting.
- the predetermined number is not more than 20. According to an embodiment of the present invention, the predetermined number is 3-10.
- the K value can be an odd number to avoid situations where a decision cannot be made.
- the final K value may differ for different chromosomes to be tested. For example, according to an embodiment of the present invention, k = 7 is finally selected for T13 and T18 detection, and k = 9 for T21 detection.
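- A plain KNN majority vote over control samples, using Euclidean distance, can be sketched as follows. The function name, the label strings and the example control points are hypothetical; the default `k=7` follows the T13/T18 embodiment, and rejecting even k reflects the suggestion to use an odd K. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(query, controls, k=7):
    """Classify a 2-D feature vector by majority vote of its k nearest controls.

    query: normalized (X1, X2) of the pregnant woman sample
    controls: list of ((X1, X2), label), label "positive" or "negative"
    k: odd, so that a two-class vote cannot tie
    """
    if k % 2 == 0:
        raise ValueError("k should be odd for a two-class majority vote")
    by_distance = sorted(controls, key=lambda c: math.dist(query, c[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


controls = [((0.9, 0.9), "positive"), ((0.85, 0.95), "positive"),
            ((0.95, 0.8), "positive"), ((0.1, 0.1), "negative"),
            ((0.15, 0.05), "negative"), ((0.05, 0.2), "negative"),
            ((0.2, 0.15), "negative")]
label = knn_classify((0.8, 0.85), controls, k=3)
```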
- the distance between the sample to be tested and a predetermined control sample may be weighted in advance. As a result, the accuracy of the inspection can be further improved.
- weighting coefficients of these weighting processes or the K value of the KNN model can be obtained through machine learning and using known samples as the training set for training.
- the output of the model is the category y to which the sample x belongs.
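- Distance weighting and learning K from known samples can be sketched as below. Two substitutions should be stated plainly: weighting each neighbour's vote by 1 / distance is only one possible reading of "weighted distance", and K is chosen here by leave-one-out accuracy rather than by the ROC analysis the document mentions. The function names, candidate K values and example samples are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn(query, controls, k, eps=1e-9):
    """k-NN vote where each neighbour counts 1 / (distance + eps) (assumed weighting)."""
    nearest = sorted(controls, key=lambda c: math.dist(query, c[0]))[:k]
    score = defaultdict(float)
    for features, label in nearest:
        score[label] += 1.0 / (math.dist(query, features) + eps)
    return max(score, key=score.get)


def choose_k(samples, candidate_ks=(3, 5, 7, 9)):
    """Pick k by leave-one-out accuracy on labelled training samples."""
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        correct = 0
        for i, (features, label) in enumerate(samples):
            rest = samples[:i] + samples[i + 1:]  # hold out sample i
            correct += weighted_knn(features, rest, k) == label
        acc = correct / len(samples)
        if acc > best_acc:  # strict ">" keeps the smallest k among ties
            best_k, best_acc = k, acc
    return best_k, best_acc


pos = [((0.9 + 0.01 * i, 0.9 - 0.01 * i), "positive") for i in range(6)]
neg = [((0.1 + 0.01 * i, 0.1 + 0.01 * i), "negative") for i in range(6)]
samples = pos + neg
best = choose_k(samples)
```

The `eps` term keeps the weight finite when a query coincides exactly with a control sample.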
- the method can effectively determine whether the fetus has aneuploidy for the chromosome to be tested.
- the method replaces the current threshold-setting strategy based on the number of sequencing reads, eliminates the detection gray area, shortens the sample detection cycle, improves the customer experience, and can significantly reduce sequencing and detection costs.
- an embodiment of the present application also provides a corresponding device for implementing the foregoing method.
- the present invention provides a device for determining whether a fetus has chromosomal aneuploidy.
- the device for determining whether a fetus has chromosomal aneuploidy includes:
- the data acquisition module 100 is used to acquire nucleic acid sequencing data from a sample of pregnant women.
- the pregnant woman sample contains free fetal nucleic acid, and the nucleic acid sequencing data is composed of multiple sequencing reads;
- the fetal concentration-inverse concentration determination module 200 is used to determine the fetal concentration of the pregnant woman sample and the inverse estimated concentration of the predetermined chromosome based on the nucleic acid sequencing data.
- the inverse estimated concentration is determined based on the difference between the number of sequencing reads of the predetermined chromosome and the number of sequencing reads of the first comparison chromosome; the predetermined chromosomes include the chromosome to be tested and the second comparison chromosome, and the first comparison chromosome includes at least one autosome different from the predetermined chromosomes;
- the feature determination module 300 is configured to determine the first feature based on the difference between the back-estimated concentration of the chromosome to be tested and the back-estimated concentration of the second comparison chromosome, and determine the second feature based on the difference between the back-estimated concentration of the chromosome to be tested and the fetal concentration;
- the aneuploidy determination module 400 is used to determine whether the pregnant woman’s fetus has aneuploidy for the chromosome to be tested based on the first feature and the second feature and using the corresponding data of the control sample.
- the control sample includes a positive sample and a negative sample; the positive sample has aneuploidy for the chromosome to be tested, and the negative sample does not have aneuploidy for the chromosome to be tested.
- the device for determining whether a fetus has chromosome aneuploidy can effectively implement the method for determining whether a fetus has chromosome aneuploidy, so as to effectively determine whether the fetus has aneuploidy for the chromosome to be tested.
- the method replaces the current threshold-setting strategy based on the number of sequencing reads, eliminates the detection gray area, shortens the sample detection cycle, improves the customer experience, and can significantly reduce sequencing and detection costs.
- the fetal concentration-inverse concentration determination module 200 includes:
- the comparison unit 210 is configured to align the nucleic acid sequencing data from a pregnant woman sample to a reference sequence, so as to determine the number of sequencing reads that fall into each predetermined window;
- the fetal concentration calculation unit 220 is configured to determine the fetal concentration of the pregnant woman sample based on the number of sequencing reads that fall into the predetermined window.
- the fetal concentration-inverse estimated concentration determination module 200 further includes:
- the inverse estimated concentration calculation unit 230 is configured to determine the inverse estimated concentration according to the following formula:
- j represents the number of the chromosome for which the inverse estimated concentration needs to be determined
- Fj represents the inverse estimated concentration of chromosome j
- Rr represents the average number of sequencing reads of the multiple autosomes
- Rj represents the number of reads sequenced on chromosome j.
- the fetal concentration-inverse concentration determination module 200 includes:
- the second comparison chromosome determination unit 240 is configured to rank a plurality of autosomes by inverse estimated concentration in ascending order and select the top-ranked autosomes as the second comparison chromosome.
- the feature determination module 300 includes:
- the first feature determining unit 310 is configured to determine the first feature through the following formula:
- X1 represents the first feature
- i represents the number of the chromosome to be tested
- Fi represents the inverse estimated concentration of the chromosome to be tested
- Fr represents the average value of the inverse estimated concentration of the second comparison chromosome.
- the feature determining module 300 further includes:
- the second feature determining unit 320 is configured to determine the second feature using the following formula:
- X2 represents the second feature
- i represents the number of the chromosome to be tested
- Fi represents the inverse estimated concentration of the chromosome to be tested
- Fa represents the fetal concentration
- the feature determining module 300 further includes:
- the standardization processing unit 330 is configured to perform standardization processing on the first feature and the second feature, so that the absolute values of the first feature and the second feature are independently between 0 and 1 respectively.
- the aneuploidy determination module 400 is configured to use the first feature and the second feature to determine the two-dimensional feature vector of the pregnant woman sample and the control sample, based on the two-dimensional The distance between samples determined by the feature vector classifies the pregnant woman sample between the positive control sample and the negative control sample, so as to determine whether the fetus has aneuploidy with respect to the chromosome to be tested.
- the distance is Euclidean distance, Manhattan distance or Chebyshev distance.
- the aneuploidy determination module is configured to use a k-nearest neighbor model to determine the classification result of the pregnant woman sample.
- the K value adopted by the k-nearest neighbor model does not exceed 20.
- the K value adopted by the k-nearest neighbor model is 3-10.
- the distance between the samples is weighted.
- the present invention provides a computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the aforementioned method for determining whether a fetus has chromosomal aneuploidy are implemented.
- therefore, the method for determining whether the fetus has chromosome aneuploidy described above can be effectively implemented, so that it can be effectively determined whether the fetus has aneuploidy for the chromosome to be tested.
- the method replaces the current threshold-setting strategy based on the number of sequencing reads, eliminates the detection gray area, shortens the sample detection cycle, improves the customer experience, and can significantly reduce sequencing and detection costs.
- the present invention provides an electronic device, which includes: the aforementioned computer-readable storage medium; and one or more processors configured to execute the program stored in it. Therefore, the method for determining whether the fetus has chromosome aneuploidy described above can be effectively implemented, so that it can be effectively determined whether the fetus has aneuploidy for the chromosome to be tested.
- the method replaces the current threshold-setting strategy based on the number of sequencing reads, eliminates the detection gray area, shortens the sample detection cycle, improves the customer experience, and can significantly reduce sequencing and detection costs.
- the present invention proposes a method for constructing a machine learning classification model.
- the method includes:
- the pregnant women samples contain fetal free nucleic acids.
- the nucleic acid sequencing data consists of multiple sequencing reads.
- the pregnant women samples include at least one positive sample and at least one negative sample.
- the positive sample has aneuploidy for the chromosome to be tested, and the negative sample does not have aneuploidy for the chromosome to be tested;
- the predetermined chromosome includes the chromosome to be tested and the second comparison chromosome.
- the first comparison chromosome includes at least one autosome different from the predetermined chromosome; the first feature is determined based on the difference between the inverse estimated concentration of the chromosome to be tested and the inverse estimated concentration of the second comparison chromosome, and the second feature is determined based on the difference between the inverse estimated concentration of the chromosome to be tested and the fetal concentration.
- a machine learning classification model can be effectively constructed, so that the classification model can be further used to identify and classify unknown samples to determine whether a specific chromosome has aneuploidy.
- the machine learning classification model is a KNN model.
- the KNN model adopts Euclidean distance.
- the present invention provides a device for constructing a machine learning classification model.
- the device includes:
- the feature acquisition module 800 is configured to perform the following for each of a plurality of pregnant women samples: acquire nucleic acid sequencing data from the pregnant woman sample, where the pregnant women samples contain free fetal nucleic acid, the nucleic acid sequencing data consist of multiple sequencing reads, and the pregnant women samples include at least one positive sample, which has aneuploidy for the chromosome to be tested, and at least one negative sample, which does not; and determine, based on the nucleic acid sequencing data, the fetal concentration of the pregnant woman sample and the inverse estimated concentration of the predetermined chromosomes, the inverse estimated concentration being determined based on the difference between the number of sequencing reads of the predetermined chromosome and the number of sequencing reads of the first comparison chromosome.
- the predetermined chromosome includes the test chromosome and the second comparison chromosome
- the first comparison chromosome includes at least one different from the predetermined chromosome. And determine the first feature based on the difference between the back estimated concentration of the test chromosome and the back estimated concentration of the second comparison chromosome, and determine the second feature based on the difference between the back estimated concentration of the test chromosome and the fetal concentration;
- the training module 900 is configured to use multiple pregnant women samples as samples to perform machine learning training, so as to construct a machine learning classification model for determining whether the fetus has aneuploidy.
- the device can effectively implement the method of constructing a machine learning classification model described above, so that the resulting model can be used to identify and classify unknown samples and determine whether aneuploidy is present for a specific chromosome.
- the machine learning classification model is a KNN model.
- the KNN model adopts Euclidean distance.
- the present invention proposes a computer-readable storage medium on which a computer program is stored.
- when the program is executed by a processor, it implements the steps of the method for constructing a machine learning classification model described above.
- the storage medium can thus effectively implement the method of constructing a machine learning classification model described above, so that the resulting model can be used to identify and classify unknown samples and determine whether aneuploidy is present for a specific chromosome.
- the features and advantages described above for the method for determining whether a fetus has chromosomal aneuploidy also apply to this computer-readable storage medium for constructing the model and are not repeated here.
- This example uses 3075 samples with follow-up results, sequenced on the BGISEQ-500 platform in 2017 to 2018 (male fetuses: 1716; female fetuses: 1359; negative samples: 2215; trisomy 21 (T21): 637; trisomy 18 (T18): 165; trisomy 13 (T13): 58), for model training and model prediction.
- The reference genome (GRCh37) is divided into contiguous, adjacent windows of fixed length (60 kb in this method); windows lying in N regions are filtered out, and the GC content of each remaining window is counted to obtain the reference window file hg19.gc;
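The windowing step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that a window counts as lying in an N region if it contains any N base, keeps a shorter final window, and writes a hypothetical tab-separated layout, since the actual format of hg19.gc is not given.

```python
# Minimal sketch (not the authors' pipeline): split a reference sequence into
# fixed-length windows, drop windows in N regions, record each window's GC fraction.
WINDOW_SIZE = 60_000  # 60 kb, the window length stated in the text


def gc_windows(chrom, seq, window=WINDOW_SIZE):
    """Return (chrom, start, end, gc_fraction) tuples for windows free of N bases.

    Assumption: a window is filtered out if it contains any N base.
    The last window may be shorter than `window`.
    """
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        if "N" in chunk:
            continue  # window overlaps an N (gap) region
        gc = (chunk.count("G") + chunk.count("C")) / len(chunk)
        rows.append((chrom, start, start + len(chunk), gc))
    return rows


def to_tsv(rows):
    """Format rows as tab-separated lines (hypothetical layout for a file like hg19.gc)."""
    return "\n".join(f"{c}\t{s}\t{e}\t{gc:.4f}" for c, s, e, gc in rows)


if __name__ == "__main__":
    toy = "GGCCAATT" + "NNNN" + "GCGA"  # toy sequence, window length 4
    print(to_tsv(gc_windows("chrT", toy, window=4)))
```

On a real genome the same function would be called once per chromosome with the default 60 kb window.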
- Filtering and preliminary statistics: based on the alignment results, only uniquely and perfectly aligned reads are selected, and duplicate reads and reads with base mismatches are removed to obtain effective reads; the number of effective reads in each window, together with the window's GC content, is then counted according to the windows in the hg19.gc file;
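A sketch of the filtering and per-window counting is given below. The alignment records are hypothetical dicts with keys `chrom`, `pos`, `strand`, `unique` and `mismatches`; treating reads that share chromosome, start position and strand as duplicates is an assumption of this sketch, and the GC correction that follows is not shown because the text does not specify it.

```python
from collections import Counter


def count_effective_reads(alignments, window=60_000):
    """Count effective reads per (chromosome, window index).

    `alignments` is a hypothetical list of dicts with keys
    chrom, pos, strand, unique (bool) and mismatches (int).
    """
    counts = Counter()
    seen = set()
    for aln in alignments:
        if not aln["unique"] or aln["mismatches"] > 0:
            continue  # keep only uniquely and perfectly aligned reads
        key = (aln["chrom"], aln["pos"], aln["strand"])
        if key in seen:
            continue  # assumption: same chrom/pos/strand means a duplicate read
        seen.add(key)
        counts[(aln["chrom"], aln["pos"] // window)] += 1
    return counts
```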
- j denotes the chromosome number, Rj denotes the number of GC-corrected sequencing reads that map to the reference sequence of chromosome j, and Rr denotes the average number of GC-corrected sequencing reads that map to the reference sequences of all autosomes.
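The back-estimated concentration and the two features can be sketched from the formulas in the claims, Fj=2*|Rj-Rr|/(Rr) and X1=Fi-Fr. The factor 2 has a simple rationale: for a trisomic chromosome the relative read dosage is about 1 + ff/2, where ff is the fetal fraction, so Fi comes out roughly equal to the fetal concentration. That is why comparing Fi with the fetal concentration is informative. The exact form of the second feature is not stated, so X2 = Fi minus the fetal concentration is an assumption here, as is the default of 15 second comparison chromosomes taken in ascending order of F.

```python
def back_estimated_concentration(r_j, r_r):
    """F_j = 2 * |R_j - R_r| / R_r, the formula given in the claims."""
    return 2.0 * abs(r_j - r_r) / r_r


def two_features(reads, test_chrom, fetal_conc, n_comparison=15):
    """Return (X1, X2) for `test_chrom`.

    reads: GC-corrected read counts keyed by autosome number (1..22).
    X1 = F_i - mean F of the second comparison chromosomes (as in the claims).
    X2 = F_i - fetal concentration (ASSUMED form of "the difference").
    """
    r_r = sum(reads.values()) / len(reads)  # average over all autosomes
    f = {c: back_estimated_concentration(n, r_r) for c, n in reads.items()}
    # Second comparison chromosomes: the other autosomes with the smallest F,
    # taken in ascending order; the count is a tunable assumption.
    others = sorted((c for c in f if c != test_chrom), key=f.get)[:n_comparison]
    f_r = sum(f[c] for c in others) / len(others)
    return f[test_chrom] - f_r, f[test_chrom] - fetal_conc
```

With a 5% read excess on chromosome 21 and a fetal concentration of 0.10, X1 is clearly positive while X2 stays close to zero, which is the pattern a trisomy should produce.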
- Sample set division and data preprocessing: the sample set is randomly divided into a training set, a validation set and a test set at a ratio of 6:2:2; the samples in each set are preprocessed so that every sample yields a two-dimensional feature vector and a corresponding label (-1 for negative, +1 for positive).
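The split and the normalization (which the claims require to bring the absolute value of each feature into 0 to 1) can be sketched as below. Max-absolute scaling fitted on the training set, with clipping for validation and test vectors, is an assumption of this sketch; the source does not name the scaling method.

```python
import random


def split_6_2_2(samples, seed=0):
    """Randomly split samples into training/validation/test sets at 6:2:2."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def fit_max_abs(train_vectors):
    """Per-feature maximum absolute value, taken from the training set only."""
    return [max(abs(v[d]) for v in train_vectors) or 1.0 for d in range(2)]


def scale(vector, max_abs):
    """Divide by the training max-abs and clip, so each |feature| lies in [0, 1]."""
    return tuple(max(-1.0, min(1.0, x / m)) for x, m in zip(vector, max_abs))
```

Each sample would then be stored as a pair of a scaled (X1, X2) vector and its -1/+1 label.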
- Model training consists of two parts: KNN model training and selection of the k value; Euclidean distance and the majority voting rule are used.
- KNN model training: the classification decision function is as follows:
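A minimal sketch of majority-vote KNN with Euclidean distance, assuming the standard decision rule: y = argmax over classes c of the number of the k nearest neighbours N_k(x) whose label is c. With labels in {-1, +1} this reduces to the sign of the label sum. Resolving a tie (possible for even k) as negative is an assumption of the sketch.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote of its k nearest training samples.

    train_x: 2-D feature vectors; train_y: labels -1 (negative) / +1 (positive).
    Distance is Euclidean. Ties are resolved as negative (assumption).
    """
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    vote = sum(train_y[i] for i in nearest)
    return 1 if vote > 0 else -1
```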
- Figures 10 and 11 show the ROC curves of the KNN model for detecting T18 with k set to 6, 7, 8 and 9.
- Figures 12 and 13 show the ROC curves of the KNN model for detecting T13 with k set to 6, 7, 8 and 9. Based on the results in Figures 8 to 13, k = 7 was finally selected for T13 and T18, and k = 9 for T21.
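The k selection can be approximated in code by scoring each validation sample with the fraction of positive neighbours, computing the area under the ROC curve, and keeping the best candidate. Choosing k by maximum validation AUC is an assumption standing in for the visual comparison of ROC curves described above. The candidate range 6 to 9 is taken from the figures.

```python
import math


def positive_fraction(train_x, train_y, query, k):
    """Score = share of the k nearest (Euclidean) neighbours that are positive."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(1 for i in nearest if train_y[i] == 1) / k


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_k(train_x, train_y, val_x, val_y, candidates=(6, 7, 8, 9)):
    """Pick the candidate k with the highest validation AUC (first one wins ties)."""
    def auc_for(k):
        scores = [positive_fraction(train_x, train_y, q, k) for q in val_x]
        return roc_auc(scores, val_y)
    return max(candidates, key=auc_for)
```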
- Model prediction: the test set is predicted with the model trained in the steps above; the prediction results are shown in the tables below.
- SVM: support vector machine
- In the T21 test, the SVM model has 14 false positive samples, while the KNN model has only 3; in the T18 test, the SVM model has 8 false positives, while the KNN model has only 5; in the T13 test, the SVM model has 8 false positives, while the KNN model has 6. For T21, T18 and T13 alike, the KNN model has a lower false positive rate than the SVM model.
- The inventors attribute the lower false positive rate of the KNN model mainly to the models themselves: KNN works in a clustering-like manner and effectively forms many fine-grained local clusters, whereas SVM separates the samples into only two broad classes, so it captures less detail than KNN.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Public Health (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Epidemiology (AREA)
- Biomedical Technology (AREA)
- Software Systems (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Databases & Information Systems (AREA)
- Primary Health Care (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Analytical Chemistry (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Genetics & Genomics (AREA)
- Pathology (AREA)
- Bioethics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Library & Information Science (AREA)
- Computational Linguistics (AREA)
- Biochemistry (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Analysing Materials By The Use Of Radiation (AREA)
Abstract
Description
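Test-set results of the KNN model:
Test | Sensitivity | Specificity | PPV | ACC |
---|---|---|---|---|
T21 | 100% | 99.38% | 97.60% | 99.51% |
T18 | 100% | 99.13% | 86.84% | 99.18% |
T13 | 100% | 99.00% | 62.50% | 99.01% |
Test-set results of the SVM model:
Test | Sensitivity | Specificity | PPV | ACC |
---|---|---|---|---|
T21 | 100% | 97.13% | 89.71% | 97.71% |
T18 | 100% | 98.61% | 80.49% | 98.69% |
T13 | 100% | 98.67% | 55.56% | 98.69% |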
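The four columns follow the usual confusion-matrix definitions, sketched below. Because positives are rare in the test set, a handful of false positives moves PPV far more than specificity, which is consistent with T13, the rarest trisomy in the data set, having the lowest PPV in both tables. The counts in the usage example are hypothetical, not taken from the source.

```python
def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and accuracy (ACC) from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "acc": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in screening_metrics(tp=50, fp=2, tn=448, fn=0).items():
        print(f"{name}: {value:.2%}")
```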
Claims (46)
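- A method for determining whether a fetus has chromosomal aneuploidy, comprising: (1) acquiring nucleic acid sequencing data from a pregnant woman sample, the pregnant woman sample containing cell-free fetal nucleic acids and the nucleic acid sequencing data consisting of multiple sequencing reads; (2) determining, based on the nucleic acid sequencing data, the fetal concentration of the pregnant woman sample and the back-estimated concentrations of predetermined chromosomes, wherein the back-estimated concentration is determined based on the difference between the number of sequencing reads of the predetermined chromosome and the number of sequencing reads of a first comparison chromosome, the predetermined chromosomes include a chromosome to be tested and a second comparison chromosome, and the first comparison chromosome includes at least one autosome different from the predetermined chromosomes; (3) determining a first feature based on the difference between the back-estimated concentration of the chromosome to be tested and the back-estimated concentration of the second comparison chromosome, and determining a second feature based on the difference between the back-estimated concentration of the chromosome to be tested and the fetal concentration; and (4) determining, based on the first feature and the second feature and using corresponding data of control samples, whether the fetus has aneuploidy for the chromosome to be tested, wherein the control samples include positive samples and negative samples, the positive samples having aneuploidy for the chromosome to be tested and the negative samples having no aneuploidy for the chromosome to be tested.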
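- The method according to claim 1, wherein the pregnant woman sample comprises maternal peripheral blood.
- The method according to claim 1, wherein the nucleic acid sequencing data are obtained by paired-end sequencing, single-end sequencing or single-molecule sequencing.
- The method according to claim 1, wherein the fetal concentration is determined by the following steps: (a) aligning the nucleic acid sequencing data from the pregnant woman sample with a reference sequence so as to determine the number of sequencing reads falling into predetermined windows; and (b) determining the fetal concentration of the pregnant woman sample based on the number of sequencing reads falling into the predetermined windows.
- The method according to claim 1, wherein in step (2), the number of sequencing reads of the first comparison chromosome is the average number of sequencing reads of multiple autosomes, the multiple autosomes including at least one autosome known to have no aneuploidy.
- The method according to claim 5, wherein in step (2), the number of sequencing reads of the first comparison chromosome is the average number of sequencing reads of at least 15 autosomes; optionally, of at least 20 autosomes; optionally, of all autosomes.
- The method according to claim 5, wherein the back-estimated concentration is determined according to the following formula: Fj=2*|Rj-Rr|/(Rr), where j denotes the number of the chromosome whose back-estimated concentration is to be determined, Fj denotes the back-estimated concentration of chromosome j, Rr denotes the average number of sequencing reads of the multiple autosomes, and Rj denotes the number of sequencing reads of chromosome j.
- The method according to claim 1, wherein in step (2), the second comparison chromosome comprises multiple autosomes without aneuploidy, and in step (3), the first feature is determined based on the difference between the back-estimated concentration of the chromosome to be tested and the average back-estimated concentration of the second comparison chromosomes.
- The method according to claim 8, wherein the second comparison chromosome comprises at least 10 autosomes.
- The method according to claim 8, wherein the second comparison chromosome comprises 15 autosomes.
- The method according to claim 8, further comprising: determining the back-estimated concentrations of multiple autosomes; and selecting, in ascending order, the autosomes at the target ranks as the second comparison chromosomes.
- The method according to claim 1, wherein the first feature is determined by the following formula: X1=Fi-Fr, where X1 denotes the first feature, i denotes the number of the chromosome to be tested, Fi denotes the back-estimated concentration of the chromosome to be tested, and Fr denotes the average back-estimated concentration of the second comparison chromosomes.
- The method according to any one of claims 1 to 13, wherein before step (4) is performed, the first feature and the second feature are normalized so that their absolute values each independently lie between 0 and 1.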
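- The method according to claim 1, wherein in step (4), the ratio of the number of positive samples to the number of negative samples is not lower than 1:4.
- The method according to claim 1, wherein in step (4), the ratio of the number of positive samples to the number of negative samples is not higher than 4:1.
- The method according to claim 1, wherein in step (4), the ratio of the number of positive samples to the number of negative samples is 1:(0.1 to 5).
- The method according to claim 1, wherein in step (4), the ratio of the number of positive samples to the number of negative samples is 1:(0.25 to 4).
- The method according to claim 1, wherein neither the positive samples nor the negative samples have aneuploidy for any chromosome other than the chromosome to be tested.
- The method according to claim 1, wherein in step (4), the first feature and the second feature are used to determine two-dimensional feature vectors of the pregnant woman sample and of the control samples, and, based on the inter-sample distances determined from the two-dimensional feature vectors, the pregnant woman sample is classified between the positive control samples and the negative control samples, so as to determine whether the fetus has aneuploidy for the chromosome to be tested.
- The method according to claim 20, wherein the distance is the Euclidean distance, the Manhattan distance or the Chebyshev distance.
- The method according to claim 20, wherein step (4) further comprises: (4-1) calculating the distances between the pregnant woman sample and each of the control samples; (4-2) sorting the obtained distances in ascending order; (4-3) selecting, based on the sorting, a predetermined number of control samples starting from the smallest distance; (4-4) determining the numbers of positive samples and negative samples among the predetermined number of control samples; and (4-5) determining the classification result of the pregnant woman sample by majority decision.
- The method according to claim 22, wherein the predetermined number is not more than 20.
- The method according to claim 22, wherein the predetermined number is 3 to 10.
- The method according to claim 22, wherein in step (4-2), before the sorting, the distances between the sample to be tested and the predetermined control samples are weighted.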
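- A device for determining whether a fetus has chromosomal aneuploidy, comprising: a data acquisition module for acquiring nucleic acid sequencing data from a pregnant woman sample, the pregnant woman sample containing cell-free fetal nucleic acids and the nucleic acid sequencing data consisting of multiple sequencing reads; a fetal concentration/back-estimated concentration determination module for determining, based on the nucleic acid sequencing data, the fetal concentration of the pregnant woman sample and the back-estimated concentrations of predetermined chromosomes, wherein the back-estimated concentration is determined based on the difference between the number of sequencing reads of the predetermined chromosome and the number of sequencing reads of a first comparison chromosome, the predetermined chromosomes include a chromosome to be tested and a second comparison chromosome, and the first comparison chromosome includes at least one autosome different from the predetermined chromosomes; a feature determination module for determining a first feature based on the difference between the back-estimated concentration of the chromosome to be tested and the back-estimated concentration of the second comparison chromosome, and a second feature based on the difference between the back-estimated concentration of the chromosome to be tested and the fetal concentration; and an aneuploidy determination module for determining, based on the first feature and the second feature and using corresponding data of control samples, whether the fetus of the pregnant woman has aneuploidy for the chromosome to be tested, wherein the control samples include positive samples and negative samples, the positive samples having aneuploidy for the chromosome to be tested and the negative samples having no aneuploidy for the chromosome to be tested.
- The device according to claim 26, wherein the fetal concentration/back-estimated concentration determination module comprises: an alignment unit for aligning the nucleic acid sequencing data from the pregnant woman sample with a reference sequence so as to determine the number of sequencing reads falling into predetermined windows; and a fetal concentration calculation unit for determining the fetal concentration of the pregnant woman sample based on the number of sequencing reads falling into the predetermined windows.
- The device according to claim 26, wherein the fetal concentration/back-estimated concentration determination module comprises: a back-estimated concentration calculation unit for determining the back-estimated concentration according to the following formula: Fj=2*|Rj-Rr|/(Rr), where j denotes the number of the chromosome whose back-estimated concentration is to be determined, Fj denotes the back-estimated concentration of chromosome j, Rr denotes the average number of sequencing reads of multiple autosomes, and Rj denotes the number of sequencing reads of chromosome j.
- The device according to claim 26, wherein the fetal concentration/back-estimated concentration determination module comprises: a second comparison chromosome determination unit for ranking the back-estimated concentrations of multiple autosomes in ascending order and selecting the autosomes at the target ranks as the second comparison chromosomes.
- The device according to claim 26, wherein the feature determination module comprises: a first feature determination unit for determining the first feature by the following formula: X1=Fi-Fr, where X1 denotes the first feature, i denotes the number of the chromosome to be tested, Fi denotes the back-estimated concentration of the chromosome to be tested, and Fr denotes the average back-estimated concentration of the second comparison chromosomes.
- The device according to claim 26, wherein the feature determination module comprises: a normalization unit for normalizing the first feature and the second feature so that their absolute values each independently lie between 0 and 1.
- The device according to claim 26, wherein the aneuploidy determination module is configured to use the first feature and the second feature to determine two-dimensional feature vectors of the pregnant woman sample and the control samples and, based on the inter-sample distances determined from the two-dimensional feature vectors, to classify the pregnant woman sample between the positive control samples and the negative control samples, so as to determine whether the fetus has aneuploidy for the chromosome to be tested.
- The device according to claim 33, wherein the distance is the Euclidean distance, the Manhattan distance or the Chebyshev distance.
- The device according to claim 26, wherein the aneuploidy determination module is configured to determine the classification result of the pregnant woman sample using a k-nearest neighbor model.
- The device according to claim 35, wherein the K value used by the k-nearest neighbor model is not more than 20.
- The device according to claim 35, wherein the K value used by the k-nearest neighbor model is 3 to 10.
- The device according to claim 35, wherein in the k-nearest neighbor model, the inter-sample distances are weighted.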
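- A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 25.
- An electronic device, comprising: the computer-readable storage medium according to claim 39; and one or more processors for executing the program in the computer-readable storage medium.
- A method for constructing a machine learning classification model, comprising: (a) performing the following for each of a plurality of pregnant woman samples: acquiring nucleic acid sequencing data from the pregnant woman sample, the pregnant woman sample containing cell-free fetal nucleic acids, the nucleic acid sequencing data consisting of multiple sequencing reads, and the pregnant woman samples including at least one positive sample and at least one negative sample, the positive sample having aneuploidy for a chromosome to be tested and the negative sample having no aneuploidy for the chromosome to be tested; determining, based on the nucleic acid sequencing data, the fetal concentration of the pregnant woman sample and the back-estimated concentrations of predetermined chromosomes, wherein the back-estimated concentration is determined based on the difference between the number of sequencing reads of the predetermined chromosome and the number of sequencing reads of a first comparison chromosome, the predetermined chromosomes include the chromosome to be tested and a second comparison chromosome, and the first comparison chromosome includes at least one autosome different from the predetermined chromosomes; and determining a first feature based on the difference between the back-estimated concentration of the chromosome to be tested and the back-estimated concentration of the second comparison chromosome, and a second feature based on the difference between the back-estimated concentration of the chromosome to be tested and the fetal concentration; and (b) performing machine learning training with the plurality of pregnant woman samples as samples, using the first feature and the second feature of the samples, so as to construct a machine learning classification model for determining whether a fetus has aneuploidy.
- The method according to claim 41, wherein the machine learning classification model is a KNN model.
- The method according to claim 42, wherein the KNN model uses the Euclidean distance.
- A device for constructing a machine learning classification model, comprising: a feature acquisition module configured to perform the following for each of a plurality of pregnant woman samples: acquiring nucleic acid sequencing data from the pregnant woman sample, the pregnant woman sample containing cell-free fetal nucleic acids, the nucleic acid sequencing data consisting of multiple sequencing reads, and the pregnant woman samples including at least one positive sample and at least one negative sample, the positive sample having aneuploidy for a chromosome to be tested and the negative sample having no aneuploidy for the chromosome to be tested; determining, based on the nucleic acid sequencing data, the fetal concentration of the pregnant woman sample and the back-estimated concentrations of predetermined chromosomes, wherein the back-estimated concentration is determined based on the difference between the number of sequencing reads of the predetermined chromosome and the number of sequencing reads of a first comparison chromosome, the predetermined chromosomes include the chromosome to be tested and a second comparison chromosome, and the first comparison chromosome includes at least one autosome different from the predetermined chromosomes; and determining a second feature based on the difference between the back-estimated concentration of the chromosome to be tested and the fetal concentration, and a first feature based on the difference between the back-estimated concentration of the chromosome to be tested and the back-estimated concentration of the second comparison chromosome; and a training module configured to perform machine learning training with the plurality of pregnant woman samples as samples, so as to construct a machine learning classification model for determining whether a fetus has aneuploidy.
- The device according to claim 44, wherein the machine learning classification model is a KNN model.
- A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 41 to 43.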
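The three distances named in the claims and the weighted variant of the KNN vote can be sketched together. The claims only state that the distances are weighted; inverse-distance weighting, 1/(d + eps), is an assumed scheme chosen for illustration, and a non-positive weighted sum is classed as negative.

```python
import math


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def weighted_knn_predict(train_x, train_y, query, k, distance=euclidean, eps=1e-9):
    """k-NN vote in which each neighbour's label (-1/+1) is weighted by 1/(d + eps).

    Inverse-distance weighting is an ASSUMED scheme; eps avoids division by zero.
    """
    nearest = sorted(zip(train_x, train_y), key=lambda p: distance(p[0], query))[:k]
    score = sum(y / (distance(x, query) + eps) for x, y in nearest)
    return 1 if score > 0 else -1
```

With two distant negatives and one very close positive among the k = 3 neighbours, a plain majority vote returns -1, while the weighted vote returns +1. This is the kind of case where weighting changes the classification.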
Priority Applications (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19958118.2A EP4086356A4 (en) | 2019-12-31 | 2019-12-31 | METHOD FOR DETERMINING CHROMOSOME ANEUPLOIDY AND CONSTRUCTION CLASSIFICATION MODEL AND APPARATUS |
US17/612,515 US20220336047A1 (en) | 2019-12-31 | 2019-12-31 | Method and device for determining chromosomal aneuploidy and constructing classification model. |
CA3141362A CA3141362A1 (en) | 2019-12-31 | 2019-12-31 | Method and device for determining chromosomal aneuploidy and constructing classification model |
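JP2021569370A JP7467504B2 (ja) | 2019-12-31 | 2019-12-31 | Method and device for determining chromosomal aneuploidy and for constructing a classification model |
CN201980004859.0A CN111226281B (zh) | 2019-12-31 | 2019-12-31 | Method and device for determining chromosomal aneuploidy and constructing a classification model |
PCT/CN2019/130625 WO2021134513A1 (zh) | 2019-12-31 | 2019-12-31 | Method and device for determining chromosomal aneuploidy and constructing a classification model |
KR1020227003512A KR20220122596A (ko) | 2019-12-31 | 2019-12-31 | Method and device for determining chromosomal aneuploidy and constructing a classification model |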
AU2019480813A AU2019480813B2 (en) | 2019-12-31 | 2019-12-31 | Methods for determining chromosome aneuploidy and constructing classification model, and device |
IL277746A IL277746A (en) | 2019-12-31 | 2020-10-01 | Method and device for determining chromosome aneuploidy and building a classification model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/130625 WO2021134513A1 (zh) | 2019-12-31 | 2019-12-31 | Method and device for determining chromosomal aneuploidy and constructing a classification model |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021134513A1 (zh) | 2021-07-08 |
Family
ID=70827394
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/130625 WO2021134513A1 (zh) | 2019-12-31 | 2019-12-31 | Method and device for determining chromosomal aneuploidy and constructing a classification model |
Country Status (9)
Country | Link |
---|---|
US (1) | US20220336047A1 (zh) |
EP (1) | EP4086356A4 (zh) |
JP (1) | JP7467504B2 (zh) |
KR (1) | KR20220122596A (zh) |
CN (1) | CN111226281B (zh) |
AU (1) | AU2019480813B2 (zh) |
CA (1) | CA3141362A1 (zh) |
IL (1) | IL277746A (zh) |
WO (1) | WO2021134513A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
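CN112037846A (zh) * | 2020-07-14 | 2020-12-04 | 广州市达瑞生物技术股份有限公司 | cffDNA aneuploidy detection method, system, storage medium and detection equipment |
CN116648752A (zh) * | 2020-11-27 | 2023-08-25 | 深圳华大生命科学研究院 | Method and system for detecting fetal chromosomal abnormalities |
CN116312813B (zh) * | 2023-05-22 | 2023-08-22 | 上海科技大学 | Method and markers for identifying the passage number of stem cell populations |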
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011130880A1 (zh) * | 2010-04-23 | 2011-10-27 | 深圳华大基因科技有限公司 | Method for detecting fetal chromosomal aneuploidy |
WO2013040773A1 (zh) * | 2011-09-21 | 2013-03-28 | 深圳华大基因科技有限公司 | Method and system for determining single-cell chromosomal aneuploidy |
WO2014153755A1 (zh) * | 2013-03-28 | 2014-10-02 | 深圳华大基因研究院 | Method, system and computer-readable medium for determining fetal chromosomal aneuploidy |
CN104232777A (zh) * | 2014-09-19 | 2014-12-24 | 天津华大基因科技有限公司 | Method and device for simultaneously determining fetal nucleic acid content and chromosomal aneuploidy |
WO2015006932A1 (zh) * | 2013-07-17 | 2015-01-22 | 深圳华大基因科技有限公司 | Chromosomal aneuploidy detection method and device |
WO2015089726A1 (zh) * | 2013-12-17 | 2015-06-25 | 深圳华大基因科技有限公司 | Chromosomal aneuploidy detection method and device |
CN104789686A (zh) * | 2015-05-06 | 2015-07-22 | 安诺优达基因科技(北京)有限公司 | Kit and device for detecting chromosomal aneuploidy |
CN104789466A (zh) * | 2015-05-06 | 2015-07-22 | 安诺优达基因科技(北京)有限公司 | Kit and device for detecting chromosomal aneuploidy |
CN106520940A (zh) * | 2016-11-04 | 2017-03-22 | 深圳华大基因研究院 | Method for detecting chromosomal aneuploidy and copy number variation and application thereof |
WO2018132400A1 (en) * | 2017-01-11 | 2018-07-19 | Quest Diagnostics Investments Llc | Method for non-invasive prenatal screening for aneuploidy |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DK2768978T3 (en) * | 2011-10-18 | 2017-12-18 | Multiplicom Nv | Fetal CHROMOSOMAL ANEUPLOIDID DIAGNOSIS |
EP2834376B1 (en) * | 2012-04-06 | 2017-03-15 | The Chinese University Of Hong Kong | Noninvasive prenatal diagnosis of fetal trisomy by allelic ratio analysis using targeted massively parallel sequencing |
US20160026759A1 (en) * | 2014-07-22 | 2016-01-28 | Yourgene Bioscience | Detecting Chromosomal Aneuploidy |
US20180327844A1 (en) * | 2015-11-16 | 2018-11-15 | Sequenom, Inc. | Methods and processes for non-invasive assessment of genetic variations |
WO2017093561A1 (en) * | 2015-12-04 | 2017-06-08 | Genesupport Sa | Method for non-invasive prenatal testing |
CN105844116B (zh) * | 2016-03-18 | 2018-02-27 | 广州市锐博生物科技有限公司 | Method and device for processing sequencing data |
HUE055063T2 (hu) * | 2017-07-26 | 2021-10-28 | Trisomytest S R O | Method for the non-invasive prenatal identification of fetal chromosomal aneuploidy from maternal blood based on a Bayesian network |
SK862017A3 (sk) * | 2017-08-24 | 2020-05-04 | Grendar Marian Doc Mgr Phd | Method of using fetal fraction and chromosome representation in determining aneuploid status in non-invasive prenatal testing |
CN108363903B (zh) * | 2018-01-23 | 2022-03-04 | 和卓生物科技(上海)有限公司 | Chromosomal aneuploidy detection system applicable to single cells and application thereof |
CN108611408A (zh) * | 2018-02-23 | 2018-10-02 | 深圳市瀚海基因生物科技有限公司 | Method and device for detecting fetal chromosomal aneuploidy |
2019
- 2019-12-31 CN CN201980004859.0A patent/CN111226281B/zh active Active
- 2019-12-31 KR KR1020227003512A patent/KR20220122596A/ko unknown
- 2019-12-31 AU AU2019480813A patent/AU2019480813B2/en active Active
- 2019-12-31 JP JP2021569370A patent/JP7467504B2/ja active Active
- 2019-12-31 CA CA3141362A patent/CA3141362A1/en active Pending
- 2019-12-31 EP EP19958118.2A patent/EP4086356A4/en active Pending
- 2019-12-31 US US17/612,515 patent/US20220336047A1/en active Pending
- 2019-12-31 WO PCT/CN2019/130625 patent/WO2021134513A1/zh unknown
2020
- 2020-10-01 IL IL277746A patent/IL277746A/en unknown
Non-Patent Citations (3)
Title |
---|
CHIU ROSSA W K: "Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma.", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 105, no. 51, 23 December 2008 (2008-12-23), pages 20458 - 20463, XP002620454, ISSN: 0027-8424, DOI: 10.1073/pnas.0810641105 * |
LAU TZE KIN; CHEN FANG; PAN XIAOYU; POOH RITSUKO K; JIANG FUMAN; LI YIHAN; JIANG HUI; LI XUCHAO; CHEN SHENGPEI; ZHANG XIUQING: "Noninvasive prenatal diagnosis of common fetal chromosomal aneuploidies by maternal plasma DNA sequencing.", JOURNAL OF MATERNAL-FETAL AND NEONATAL MEDICINE., vol. 25, no. 8, 31 December 2012 (2012-12-31), pages 1370 - 1374, XP008164835, ISSN: 1057-0802, DOI: 10.3109/14767058.2011.635730 * |
See also references of EP4086356A4 |
Also Published As
Publication number | Publication date |
---|---|
JP2023517155A (ja) | 2023-04-24 |
CA3141362A1 (en) | 2021-07-08 |
AU2019480813A1 (en) | 2021-12-16 |
AU2019480813A8 (en) | 2022-05-12 |
EP4086356A4 (en) | 2023-09-27 |
CN111226281B (zh) | 2023-03-21 |
AU2019480813B2 (en) | 2024-07-18 |
JP7467504B2 (ja) | 2024-04-15 |
CN111226281A (zh) | 2020-06-02 |
EP4086356A1 (en) | 2022-11-09 |
IL277746A (en) | 2021-12-01 |
KR20220122596A (ko) | 2022-09-02 |
US20220336047A1 (en) | 2022-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230222311A1 (en) | Generating machine learning models using genetic data | |
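WO2021062904A1 (zh) | Pathological image-based TMB classification method and system, and TMB analysis device | |
CN112020565A (zh) | Quality control templates for ensuring the validity of sequencing-based assays | |
WO2021134513A1 (zh) | Method and device for determining chromosomal aneuploidy and constructing a classification model | |
CN108778287B (zh) | Methods and systems for early risk assessment of preterm birth outcomes | |
US20210166813A1 (en) | Systems and methods for evaluating longitudinal biological feature data | |
US20210102262A1 (en) | Systems and methods for diagnosing a disease condition using on-target and off-target sequencing data | |
CN110191964B (zh) | Method and device for determining the proportion of cell-free nucleic acids from a predetermined source in a biological sample | |
CN104951671A (zh) | Device for detecting fetal chromosomal aneuploidy based on single-sample peripheral blood | |
CN117106870B (zh) | Method and device for determining fetal concentration | |
CN115223654A (zh) | Method, device and storage medium for detecting fetal chromosomal aneuploidy abnormalities | |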
US11535896B2 (en) | Method for analysing cell-free nucleic acids | |
US20200105374A1 (en) | Mixture model for targeted sequencing | |
US20240203521A1 (en) | Evaluation and improvement of genetic screening tests using receiver operating characteristic curves | |
US20230005569A1 (en) | Chromosomal and Sub-Chromosomal Copy Number Variation Detection | |
US12020779B1 (en) | Noninvasive prenatal screening using dynamic iterative depth optimization with depth-scaled variance determination | |
Lu | An embedded method for gene identification in heterogenous data involving unwanted heterogeneity | |
WO2024107868A1 (en) | Systems and methods for identifying clonal expansion of abnormal lymphocytes | |
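CN114512232A (zh) | Edwards syndrome screening system based on cascaded machine learning models | |
CN117393054A (zh) | Method and device for identifying true and false positives of copy number variations in nucleic acid samples and their cell-division origin | |
CN110428873A (zh) | Method and system for detecting chromosomal ploidy abnormalities |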
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19958118 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2021569370 Country of ref document: JP Kind code of ref document: A Ref document number: 3141362 Country of ref document: CA |
|
ENP | Entry into the national phase |
Ref document number: 2019480813 Country of ref document: AU Date of ref document: 20191231 Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2019958118 Country of ref document: EP Effective date: 20220801 |