WO2024011929A1 - Method, device and storage medium for detecting fetal chromosomal aneuploidy abnormalities - Google Patents

Method, device and storage medium for detecting fetal chromosomal aneuploidy abnormalities

Info

Publication number
WO2024011929A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
fetal
chromosome
sample
new
Prior art date
Application number
PCT/CN2023/080510
Inventor
杨杰淳
彭继光
彭智宇
孙隽
向嘉乐
刘晶娟
李婧柔
Original Assignee
深圳华大基因股份有限公司
Application filed by 深圳华大基因股份有限公司
Publication of WO2024011929A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • The present application relates to the technical field of detecting fetal chromosomal aneuploidy abnormalities, and in particular to a method, device and storage medium for detecting fetal chromosomal aneuploidy abnormalities.
  • A fetal chromosomal aneuploidy abnormality means that the fetal chromosomes are aneuploid.
  • A normal fetus has 23 pairs of chromosomes (46 in total), that is, the chromosomes are euploid. A missing or extra chromosome produces aneuploidy, which indicates an abnormality in the fetal chromosomes, that is, fetal chromosomal aneuploidy.
  • Down syndrome (trisomy 21) is a genetic disease caused by trisomy of chromosome 21. Common symptoms include developmental delay, characteristic facial features, and mild to moderate intellectual disability. There is currently no effective treatment for Down syndrome; the quality of life of patients can only be improved through daily care and education. Besides Down syndrome, clinically common fetal chromosomal aneuploidy abnormalities include Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13), which can lead to severe developmental abnormalities in children.
  • The molecular biological mechanism of Down syndrome is that chromosome 21 fails to separate during the formation of germ cells, resulting in three copies of chromosome 21 in the fertilized egg, which in turn leads to a series of abnormalities in molecular and developmental biological processes. Since there is no effective treatment for chromosomal aneuploidy syndromes such as Down syndrome, and no specific behavioral or environmental factors related to their onset have been found, the main response at present is prenatal screening of pregnant women, that is, testing during pregnancy, in order to avoid the birth of babies with serious genetic diseases such as Down syndrome. If the relevant indicators are positive or high-risk, the pregnancy is terminated to avoid the birth of a trisomic baby.
  • Earlier prenatal screening relied on serological markers such as AFP, free β-hCG, uE3, and Inhibin-A. Because serological markers are indirect indicators and cannot directly reflect the fetal chromosomal aneuploidy status, their sensitivity and specificity are poor.
  • High-throughput sequencing technology has since emerged and become widespread.
  • With it, cell-free DNA (cfDNA) in maternal plasma can be accurately detected and quantified, and the relative content of the target chromosome can then be used to screen for chromosomal abnormalities, including trisomy 21; this is non-invasive prenatal testing (NIPT).
  • The performance of NIPT detection still needs to be improved.
  • First, the positive predictive value (PPV) needs to be improved: the PPV of T21 is 92.2%, that of T18 is 76.6%, and that of T13 is only 32.8%, showing that the traditional NIPT detection method produces many false positives.
  • Second, the retest rate is high.
  • The purpose of this application is to provide an improved method, device and storage medium for detecting fetal chromosomal aneuploidy abnormalities.
  • The first aspect of this application discloses a method for detecting fetal chromosomal aneuploidy abnormalities, which includes calculating a new Z value of the sample to be tested, denoted Z_new, based on the fetal DNA concentration in the cell-free DNA of the pregnant woman's blood, the Z value, and the chimerism of the sample to be tested, and determining from the new Z value whether the fetal chromosomes of the sample to be tested have aneuploidy abnormalities; the chimerism is the ratio of abnormal fetal cells to all fetal cells.
  • The key to this method is to combine three variables, namely the fetal DNA concentration, the traditional Z value, and chimerism (an indicator specific to this application), into a Z value of the kind commonly used and recognized in the field, that is, the new Z value Z_new.
  • The "traditional Z value" refers to the Z value obtained by the conventional method; the "new Z value" of this application refers to the Z value calculated by this application from the three variables.
  • The method of this application uses chimerism as an input variable for calculating the new Z value, which helps to improve the accuracy of NIPT testing, discriminates well between true positive and false positive samples, and reduces false positives. The new Z value follows a normal distribution, so it meets current regulatory and clinical use requirements; it also greatly reduces the volatility of the data distribution, thereby reducing the gray-area rate and the retest rate and improving the stability of the test results.
  • Calculating the new Z value of the sample to be tested based on the fetal DNA concentration in the cell-free DNA of the pregnant woman's blood, the Z value, and the chimerism includes inputting the fetal DNA concentration, Z value and chimerism into a fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample to be tested, and obtaining the new Z value of the sample by mapping the model output value.
  • The fetal chromosomal aneuploidy abnormality detection model is obtained by using several samples with known fetal chromosome conditions as training samples, including positive and negative samples of fetal chromosomal aneuploidy abnormalities, and training a machine learning model with the fetal DNA concentration, Z value and chimerism as inputs, so that the model output value combines the three variables to characterize the fetal chromosome condition.
  • Training a machine learning model to obtain the new Z value is only one implementation of this application. Other calculation methods that obtain the new Z value of this application from the fetal DNA concentration, Z value and chimerism are not excluded.
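The training described above can be illustrated with a two-class Fisher linear discriminant, the model family the application later names (linear discriminant analysis). This is only a minimal sketch: the helper names (`solve_linear`, `fit_lda`, `model_output`) and the eight training samples are hypothetical, the feature order (fetal DNA concentration, traditional Z value, chimerism) is an assumption, and the application does not state how its own model is fitted.

```python
from statistics import mean

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_lda(negatives, positives):
    """Fisher discriminant direction w = Sw^-1 (mu_pos - mu_neg)."""
    dim = len(negatives[0])
    mu_n = [mean(s[d] for s in negatives) for d in range(dim)]
    mu_p = [mean(s[d] for s in positives) for d in range(dim)]
    # Pooled within-class scatter matrix Sw.
    sw = [[0.0] * dim for _ in range(dim)]
    for group, mu in ((negatives, mu_n), (positives, mu_p)):
        for s in group:
            diff = [s[d] - mu[d] for d in range(dim)]
            for i in range(dim):
                for j in range(dim):
                    sw[i][j] += diff[i] * diff[j]
    return solve_linear(sw, [p - q for p, q in zip(mu_p, mu_n)])

def model_output(w, sample):
    """Project one sample onto the discriminant direction."""
    return sum(wi * xi for wi, xi in zip(w, sample))

# Made-up samples: (fetal DNA concentration, traditional Z, chimerism).
negatives = [(0.10, 0.2, 0.05), (0.12, -0.5, 0.00),
             (0.08, 1.0, 0.10), (0.15, 3.2, 0.30)]
positives = [(0.10, 5.0, 0.95), (0.12, 6.5, 1.00),
             (0.07, 3.5, 0.90), (0.15, 8.0, 1.05)]

w = fit_lda(negatives, positives)
neg_scores = [model_output(w, s) for s in negatives]
pos_scores = [model_output(w, s) for s in positives]
```

With this orientation, a larger model output leans toward the positive class; the sketch omits any intercept term because only the ordering of outputs matters for the later thresholding.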
  • Obtaining the new Z value of the sample to be tested by mapping the model output value includes calculating the new Z value from the model output value of the sample to be tested, the positive threshold, the negative threshold, and the median of the model output values of all negative samples; the positive threshold is the threshold of the model output value corresponding to positive samples, and the negative threshold is the threshold of the model output value corresponding to negative samples.
  • The model output value is the value output by the fetal chromosomal aneuploidy abnormality detection model to evaluate fetal chromosomal aneuploidy abnormalities. Unlike the traditional Z value, it cannot be judged against the conventional Z-value thresholds.
  • Its thresholds can only be set from the characteristics of the training data. For example, the negative threshold can be set so that no true positive sample in the training data is judged negative, ensuring that the model produces no false negatives; the positive threshold can be set so that as many true positive samples as possible, and as few of the original false positive samples as possible, are judged positive, thereby reducing false positives and improving the performance of NIPT testing. The interval between the negative threshold and the positive threshold is the gray area. So that the model output value of the sample to be tested can be used directly to determine the fetal chromosomal aneuploidy status, this application further maps the model output value into a new Z value, Z_new. Test results show that the new Z value obtained by this mapping follows a normal distribution centered at 0, so Z > 3 can still be used as the positive criterion and Z < 1.96 as the negative criterion.
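One way to read the threshold rule above is sketched below. The function name `choose_thresholds` and the scoring heuristic (true positives kept minus false positives kept, with ties broken toward the higher threshold) are assumptions made for illustration; the application only states the goals, not a procedure. The sketch also assumes that larger model outputs are more positive.

```python
def choose_thresholds(true_pos_outputs, false_pos_outputs):
    """Pick (cut_n, cut_p) from training-set model outputs.

    cut_n: lowest true-positive output, so no true positive lies below it.
    cut_p: candidate cut that keeps the most true positives at or above it
           while keeping the fewest original false positives.
    """
    cut_n = min(true_pos_outputs)

    def score(cut):
        kept_tp = sum(v >= cut for v in true_pos_outputs)
        kept_fp = sum(v >= cut for v in false_pos_outputs)
        return kept_tp - kept_fp

    candidates = sorted(set(true_pos_outputs) | set(false_pos_outputs))
    # Ties go to the higher cut, which favors fewer false positives.
    cut_p = max(candidates, key=lambda c: (score(c), c))
    return cut_n, max(cut_p, cut_n)

# Made-up model outputs for training samples.
tp_outputs = [2.0, 3.5, 4.0, 5.0]   # confirmed positives
fp_outputs = [1.0, 1.5, 2.5]        # traditional-method false positives
cut_n, cut_p = choose_thresholds(tp_outputs, fp_outputs)
```

For these made-up values the gray area is the interval from 2.0 up to 3.5: one true positive (2.0) and one false positive (2.5) fall inside it and would be sent for retesting rather than called.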
  • The median of the model output values of negative samples is obtained by re-inputting all negative training samples into the fetal chromosomal aneuploidy abnormality detection model and taking the median of the resulting model output values.
  • The new Z value of the sample to be tested is obtained from the model output value with a mapping formula, in which Z_new is the new Z value, LD is the model output value of the sample to be tested, cut_p is the positive threshold, cut_n is the negative threshold, and Med is the median of the model output values of all negative samples.
  • For a model output value in the gray area, the new Z value obtained by the conversion needs to lie between 1.96 and 3, because it is clinically customary to use Z ∈ [1.96, 3) as the gray-area range.
  • This application also ensures that the median of the new Z values of negative samples is 0, because the median of a standard normal distribution equals 0; this is why Med, the median of the model output values of negative samples, appears in the formula. According to the mapping formula, when the model output value equals the median of the model output values of negative samples, the new Z value is 0.
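The mapping formula itself is not reproduced in this text, so the following is not the application's formula. It is a piecewise-linear sketch that merely satisfies the three stated constraints: LD = Med maps to 0, LD = cut_n maps to 1.96, and LD = cut_p maps to 3. The function name `map_to_z_new` is hypothetical, and the ordering Med < cut_n < cut_p is an assumption.

```python
def map_to_z_new(ld, cut_p, cut_n, med):
    """Illustrative piecewise-linear mapping from model output to Z_new.

    Below cut_n the segment runs through (med, 0) and (cut_n, 1.96);
    from cut_n upward it runs through (cut_n, 1.96) and (cut_p, 3).
    """
    if not med < cut_n < cut_p:
        raise ValueError("expected med < cut_n < cut_p")
    if ld < cut_n:
        return 1.96 * (ld - med) / (cut_n - med)
    return 1.96 + (3.0 - 1.96) * (ld - cut_n) / (cut_p - cut_n)

# Made-up thresholds and negative-sample median.
z_at_median = map_to_z_new(0.0, cut_p=4.0, cut_n=2.0, med=0.0)
z_at_cut_n = map_to_z_new(2.0, cut_p=4.0, cut_n=2.0, med=0.0)
z_at_cut_p = map_to_z_new(4.0, cut_p=4.0, cut_n=2.0, med=0.0)
```

Any monotone mapping through the same three anchor points would meet the stated requirements equally well; the linear segments are chosen here only for simplicity.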
  • Determining whether an aneuploidy abnormality occurs in the fetal chromosomes of the sample to be tested based on the new Z value includes judging a new Z value greater than 3 as positive, that is, the fetus has a chromosomal aneuploidy abnormality, and judging a new Z value less than 1.96 as negative, that is, the fetal chromosomes are normal.
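The decision rule can be written as a small function. The name `classify` and the string labels are hypothetical; treating a value of exactly 3 as gray area follows the wording "greater than 3" and is otherwise an assumption.

```python
def classify(z_new):
    """Call a sample from its new Z value using the stated cut-offs."""
    if z_new > 3:
        return "positive"   # fetal chromosomal aneuploidy abnormality
    if z_new < 1.96:
        return "negative"   # fetal chromosomes normal
    return "gray area"      # neither criterion met; candidate for retest

calls = [classify(z) for z in (0.4, 2.5, 6.1)]
```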
  • The machine learning model is a linear discriminant analysis (LDA) model.
  • Abnormal fetal cells are cells containing a fetal chromosomal aneuploidy abnormality.
  • The fetal DNA concentration and Z value are calculated from high-throughput sequencing data of the cell-free DNA in the pregnant woman's blood.
  • The chimerism is calculated according to Formula 1, in which Mosaic_k is the chimerism (mosaicism) of the k-th chromosome, fra_k is the relative fetal concentration of the k-th chromosome, and FF is the fetal DNA concentration.
  • fra_k is calculated according to Formula 2 from the average corrected depth of the k-th chromosome and the average corrected depth of all autosomes.
  • Mosaic_k = 0 indicates that the k-th chromosome of the fetus is normal; Mosaic_k = 1 indicates that the k-th chromosome of the fetus is completely trisomic; Mosaic_k between 0 and 1 indicates that the k-th chromosome of the fetus is mosaic.
  • Mosaicism of fetal chromosome k means that chromosome k is trisomic in some fetal cells and non-trisomic in the others. In principle, at a fixed fetal DNA concentration, the trisomy signal in the peripheral blood of the pregnant woman is strong if the fetus is a complete trisomy and weak if the fetus is a mosaic trisomy. When the chimerism is low, the result is generally a false positive caused by data fluctuation; when the chimerism is high, it is usually a true positive.
  • The average corrected depth of each chromosome and the average corrected depth of all autosomes are calculated from high-throughput sequencing data of the cell-free DNA in the pregnant woman's blood.
  • The method for detecting fetal chromosomal aneuploidy abnormalities includes the following steps:
  • The data acquisition step includes obtaining high-throughput sequencing data of the cell-free DNA in the blood of the pregnant woman to be tested;
  • The data processing step includes calculating the fetal DNA concentration and Z value from that high-throughput sequencing data;
  • The chimerism calculation step includes calculating the chimerism of each chromosome according to Formula 1;
  • The new Z value calculation step includes calculating a new Z value of the sample to be tested based on the fetal DNA concentration in the cell-free DNA of the pregnant woman's blood, the Z value, and the chimerism;
  • The fetal chromosomal aneuploidy abnormality determination step includes determining, based on the new Z value, whether aneuploidy abnormalities occur in the chromosomes of the fetus to be tested.
  • The key to the method of this application is to combine the three variables fetal DNA concentration, chimerism and traditional Z value through the fetal chromosomal aneuploidy abnormality detection model to obtain a model output value, and to convert that value into a Z value of the kind commonly used and recognized in the field, that is, the new Z value (Z_new).
  • The Z value obtained in the data processing step by the conventional method is called the "traditional Z value", and the Z value obtained by this application through mapping the model output value is called the "new Z value".
  • This application incorporates chimerism into the machine learning model, which helps to improve the accuracy of NIPT detection, discriminates well between true positive and false positive samples, and reduces false positives. The new Z value follows a normal distribution, so it meets current regulatory and clinical use requirements; it also greatly reduces the volatility of the data distribution, thereby reducing the gray-area rate and the retest rate and improving the stability of test results.
  • The second aspect of this application discloses a method for constructing a fetal chromosomal aneuploidy abnormality detection model, which includes using several samples with known fetal chromosome conditions as training samples, the training samples including positive and negative samples of fetal chromosomal aneuploidy abnormalities, and training a machine learning model with the fetal DNA concentration, Z value and chimerism as inputs to obtain a model output value that combines the three variables to characterize the fetal chromosome condition; the model so obtained is the fetal chromosomal aneuploidy abnormality detection model.
  • This construction method is the same model construction that is used in the detection method of this application.
  • For the calculation of the fetal DNA concentration, Z value and chimerism, refer to the detection method of this application; it is not repeated here.
  • The third aspect of this application discloses a device for detecting fetal chromosomal aneuploidy abnormalities, which includes a new Z value calculation module and a fetal chromosomal aneuploidy abnormality judgment module. The new Z value calculation module calculates a new Z value of the sample to be tested based on the fetal DNA concentration in the cell-free DNA of the pregnant woman's blood, the Z value, and the chimerism, where the chimerism is the ratio of abnormal fetal cells to all fetal cells. The judgment module determines, based on the new Z value, whether aneuploidy abnormalities occur in the fetal chromosomes of the sample to be tested.
  • The new Z value calculation module also inputs the fetal DNA concentration, Z value and chimerism into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample to be tested, and maps the model output value to obtain the new Z value of the sample. The detection model is obtained by using several samples with known fetal chromosome conditions as training samples, including positive and negative samples of fetal chromosomal aneuploidy abnormalities, and training a machine learning model with the fetal DNA concentration, Z value and chimerism as inputs; the model output value combines the three variables to characterize the fetal chromosome condition.
  • The device of the present application also includes a model training module, which uses several samples with known fetal chromosome conditions as training samples, including positive and negative samples of fetal chromosomal aneuploidy abnormalities, takes the fetal DNA concentration, Z value and chimerism as inputs, and trains a machine learning model to obtain a model output value that combines the three variables to characterize the fetal chromosome condition; the model so obtained is the fetal chromosomal aneuploidy abnormality detection model.
  • The machine learning model is a linear discriminant analysis model.
  • The new Z value calculation module includes a model output value analysis sub-module and a Z value mapping sub-module. The model output value analysis sub-module inputs the fetal DNA concentration, Z value and chimerism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample. The Z value mapping sub-module calculates the new Z value of the sample from the model output value of the sample, the positive threshold, the negative threshold, and the median of the model output values of all negative samples; the positive threshold is the threshold of the model output value corresponding to positive samples, and the negative threshold is the threshold of the model output value corresponding to negative samples.
  • The Z value mapping sub-module obtains the new Z value with the same mapping formula, in which Z_new is the new Z value, LD is the model output value of the sample to be tested, cut_p is the positive threshold, cut_n is the negative threshold, and Med is the median of the model output values of all negative samples.
  • The fetal chromosomal aneuploidy abnormality judgment module determines whether an aneuploidy abnormality occurs in the fetal chromosomes of the sample to be tested based on the new Z value: a new Z value greater than 3 is judged positive, that is, the fetus has a chromosomal aneuploidy abnormality; a new Z value less than 1.96 is judged negative, that is, the fetal chromosomes are normal.
  • The model training module can be run as needed. For example, once the fetal chromosomal aneuploidy abnormality detection model, the positive threshold, the negative threshold and the median of the model output values of all negative samples have been obtained, the other modules can call the model and these data directly, so the model training module need not be run for every detection.
  • If the training samples change, for example when training samples are added, it is recommended to run the model training module again to further improve the model and the associated data.
  • The device of this application implements the detection method of this application through its modules, so for the specific limitations of each module, refer to that method. For example, the calculation of the fetal DNA concentration, Z value and chimerism, the specific Z_new calculation, the linear discriminant analysis model, and how to judge positive and negative based on Z_new can all refer to the method for detecting fetal chromosomal aneuploidy abnormalities of this application.
  • The fourth aspect of the present application discloses a device for detecting fetal chromosomal aneuploidy abnormalities.
  • The device includes a memory and a processor; the memory stores a program, and the processor executes the program stored in the memory to implement the method for detecting fetal chromosomal aneuploidy abnormalities of the present application or the method for constructing the detection model of the present application.
  • When the device executes the stored program to implement the construction method of the fetal chromosomal aneuploidy abnormality detection model, it is in effect a device for model construction.
  • The model constructed by the device can be used to detect fetal chromosomal aneuploidy abnormalities according to the method of the present application.
  • The fifth aspect of the present application discloses a computer-readable storage medium in which a program is stored.
  • The program can be executed by a processor to implement the method for detecting fetal chromosomal aneuploidy abnormalities of the present application or the method for constructing the detection model of the present application.
  • When the stored program implements the construction method, the computer-readable storage medium is in effect a storage medium for model construction.
  • It can be used directly to construct the fetal chromosomal aneuploidy abnormality detection model, and the model so obtained can be used to detect fetal chromosomal aneuploidy abnormalities according to the method of the present application.
  • The method and device of this application are the first to incorporate chimerism into the detection of fetal chromosomal aneuploidy abnormalities. They combine the three variables fetal DNA concentration, chimerism and traditional Z value to calculate a new Z value.
  • The method and device of the present application can improve the accuracy of NIPT detection, discriminate well between true positive and false positive samples, and reduce false positives.
  • The new Z value follows a normal distribution, so it meets current regulatory and clinical use requirements; it also reduces the volatility of the data distribution, thereby reducing the gray-area rate and the retest rate and improving the stability of test results.
  • Figure 1 is a flow chart of a method for detecting fetal chromosomal aneuploidy abnormalities in an embodiment of the present application
  • Figure 2 is a structural block diagram of a device for detecting fetal chromosomal aneuploidy abnormalities in an embodiment of the present application
  • Figure 3 is an analysis chart of T13 chimerism of 10,240 samples in the examples of this application;
  • Figure 4 is a Q-Q plot of the new Z value of chromosome 21 of 10,000 samples in the embodiment of the present application;
  • Figure 5 is a distribution diagram of the traditional Z value and the new Z value of chromosome 13 of 10,000 samples in the embodiment of the present application.
  • This application creatively uses chimerism as a variable in calculating the new Z value to improve the accuracy of NIPT detection. It therefore proposes a method for detecting fetal chromosomal aneuploidy abnormalities, which includes calculating a new Z value of the sample to be tested based on the fetal DNA concentration in the cell-free DNA of the pregnant woman's blood, the Z value, and the chimerism, and determining, based on the new Z value, whether the fetal chromosomes of the sample to be tested have aneuploidy abnormalities; the chimerism is the ratio of abnormal fetal cells to all fetal cells.
  • The method for detecting fetal chromosomal aneuploidy abnormalities is shown in Figure 1 and specifically includes a data acquisition step 11, a data processing step 12, a chimerism calculation step 13, a new Z value calculation step 14, and a fetal chromosomal aneuploidy abnormality determination step 15.
  • The data acquisition step 11 includes obtaining high-throughput sequencing data of the cell-free DNA in the blood of the pregnant woman to be tested.
  • The raw sequencer output is a FASTQ file generated by the sequencer.
  • The data processing step 12 includes calculating, from that high-throughput sequencing data, the fetal DNA concentration, the Z value, the average corrected depth of each chromosome, and the average corrected depth of all autosomes.
  • This step consists of common operations of the conventional NIPT process, specifically the following:
  • A) Sequence alignment and filtering: the sequence information contained in the FASTQ file generated by the sequencer is aligned to the human reference genome (for example GRCh37/hg19) with public software such as BWA (0.7.7-r441), and filtered to remove low-quality reads, multi-mapped reads, duplicate reads and imperfectly aligned reads. Only uniquely aligned reads are kept, and the coordinates and other information of each uniquely aligned read are stored in a BAM file.
  • B) Window division and data correction: the human reference genome is divided into windows of about 60 kb, and the number of uniquely aligned reads in each 60 kb window is counted as the original depth of that window (the window depth). GC correction and inter-sample correction are then applied to the original depth of each window to obtain its corrected depth (UR). Averaging the corrected depths of all windows on a chromosome gives the average corrected depth of the k-th chromosome; this application also calculates the average corrected depth over all autosomes.
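The window-counting part of step B can be sketched as follows. The function names `window_depths` and `mean_depth` are hypothetical, read positions are assumed to be 0-based start coordinates of uniquely aligned reads on one chromosome, and the GC and inter-sample corrections are deliberately left out because the text does not describe them.

```python
from collections import Counter

WINDOW_SIZE = 60_000  # "about 60 kb" per the description

def window_depths(read_starts, chrom_length, window_size=WINDOW_SIZE):
    """Count uniquely aligned reads per fixed-size window on one chromosome."""
    n_windows = -(-chrom_length // window_size)  # ceiling division
    counts = Counter(pos // window_size for pos in read_starts)
    return [counts.get(i, 0) for i in range(n_windows)]

def mean_depth(depths):
    """Average depth over all windows of a chromosome."""
    return sum(depths) / len(depths)

# Made-up read start positions on a 180 kb toy chromosome.
raw = window_depths([10, 59_999, 60_000, 130_000], chrom_length=180_000)
avg = mean_depth(raw)
```

In a real pipeline the raw counts would pass through the GC and inter-sample corrections before averaging, so `mean_depth` would be applied to corrected UR values rather than to raw counts.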
  • The male fetal concentration is calculated as follows:
  • The fetal concentration of a male fetus is determined from the proportion of the Y chromosome.
  • The mean window UR of the Y chromosome is divided by the mean UR of the autosomes and multiplied by 2 to obtain the fetal concentration FF of the male fetus: FF = 2 × (mean UR of Y-chromosome windows) / (mean UR of autosomal windows).
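The male-fetus calculation translates directly into code. The function name `male_fetal_fraction` and the sample values are hypothetical; the factor of 2 reflects that a male fetus contributes one Y chromosome per cell whereas each autosome is present in two copies.

```python
from statistics import mean

def male_fetal_fraction(y_window_ur, autosome_window_ur):
    """FF = 2 * mean(UR of Y windows) / mean(UR of autosomal windows)."""
    return 2 * mean(y_window_ur) / mean(autosome_window_ur)

# Made-up corrected depths: Y windows are shallow because only fetal
# cfDNA carries a Y chromosome.
ff = male_fetal_fraction([0.04, 0.06], [1.0, 1.0, 1.0])
```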
  • The female fetal concentration is calculated as follows:
  • The fetal concentration of a female fetus is estimated with a high-dimensional regression model that exploits the non-uniform distribution of fetal cell-free DNA across the genome.
  • The underlying assumption is that, whether the fetus is male or female, fetal cfDNA and maternal cfDNA are distributed differently across the genome. The fetal concentration estimated for male fetuses by the Y-chromosome method is therefore used to train the model, and the regression model is built with a neural network.
  • In the network formula, l is the index of the network layer: the first layer is the input layer, the last layer is the output layer (a single neuron), and the layers in between are hidden layers.
  • The two neuron terms of the formula are the value of the j-th neuron in layer l and the value of the k-th neuron in layer l-1.
  • w and b are obtained when the model is trained.
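The network formula is not reproduced in this text. The definitions above match the usual feed-forward rule, in which each neuron in layer l is an activation applied to a weighted sum of the neurons in layer l-1 plus a bias, so the sketch below assumes that form. The activation functions (ReLU in the hidden layer, identity at the output), the function names, and all weights are assumptions; in practice w and b would come from training against the Y-chromosome estimates of male fetuses.

```python
def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def layer_forward(prev_values, weights, biases, activation):
    """Value of neuron j in layer l: activation(sum_k w[j][k] * a[k] + b[j])."""
    return [
        activation(sum(w * a for w, a in zip(row, prev_values)) + b)
        for row, b in zip(weights, biases)
    ]

def predict(features, layers):
    """Run the input through every layer; the output layer has one neuron."""
    values = features
    for weights, biases, activation in layers:
        values = layer_forward(values, weights, biases, activation)
    return values[0]

# Toy network: 2 inputs -> 2 hidden neurons -> 1 output, made-up weights.
toy_layers = [
    ([[0.5, 0.5], [1.0, -1.0]], [0.0, 0.0], relu),
    ([[2.0, 1.0]], [0.1], identity),
]
estimate = predict([1.0, 2.0], toy_layers)
```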
  • The Z value of the chromosome to be tested can be calculated from the distribution of the window depths of that chromosome, and it serves as the basis for determining whether the chromosome is trisomic.
  • The traditional Z value is calculated as follows:
  • The corrected depth UR of each autosomal window follows a Poisson distribution, which approaches a normal distribution when the number of windows is large. For normal samples there is no significant difference between the UR distribution of the chromosome to be tested and that of the reference chromosome; for abnormal samples there are slight differences.
  • A Z test can therefore be used to determine fetal chromosomal aneuploidy abnormalities.
  • In the Z-test formula, SD_i is the standard deviation of the UR of chromosome i, SD_j is the standard deviation of the UR of chromosome j, L_i is the number of windows on chromosome i, L_j is the number of windows on chromosome j, and Z_i indicates the significance of aneuploidy of chromosome i, reflecting its difference from euploidy.
  • The formula compares the 22 autosomes within the same sample with one another.
  • The underlying assumption is that most chromosomes in a sample are normal diploid. The target chromosome is therefore compared with each of the remaining 21 autosomes, giving 21 Z-test values. If the target chromosome is a normal diploid, most of the 21 values are close to 0 and their average gives a Z value in the negative (normal) range; if the target chromosome is trisomic, most of the 21 values are much greater than 0 and their average gives a positive Z value.
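The Z-test formula is not reproduced in this text. Given the definitions of SD_i, SD_j, L_i and L_j, the sketch below assumes the standard two-sample Z statistic for each chromosome pair and averages it over the reference chromosomes, as the paragraph above describes. The function names, the use of the sample standard deviation, and the toy depths are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def pairwise_z(ur_i, ur_j):
    """Two-sample Z statistic comparing window depths of chromosomes i and j."""
    se = sqrt(stdev(ur_i) ** 2 / len(ur_i) + stdev(ur_j) ** 2 / len(ur_j))
    return (mean(ur_i) - mean(ur_j)) / se

def chromosome_z(target_ur, reference_urs):
    """Average the pairwise Z values of the target against each reference."""
    return mean([pairwise_z(target_ur, ref) for ref in reference_urs])

# Toy corrected depths; a real sample would use the other 21 autosomes.
reference = [[1.0, 1.1, 0.9, 1.0], [1.0, 1.05, 0.95, 1.0]]
z_normal = chromosome_z([1.0, 1.1, 0.9, 1.0], reference)
z_elevated = chromosome_z([1.05, 1.15, 0.95, 1.05], reference)
```

Because the statistic divides by a standard error that shrinks with the number of windows, a small but consistent depth excess across thousands of real windows produces a far larger Z than these four-window toy chromosomes suggest.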
  • the chimerism calculation step 13 includes calculating the chimerism of each chromosome based on fetal DNA concentration.
  • Mosaic_k is the chimerism of the k-th chromosome
  • fra_k is the relative fetal concentration of the k-th chromosome
  • FF is the fetal DNA concentration
  • fra_k is calculated using Formula 2:
  • in Formula 2, fra_k is obtained from the average corrected depth of the k-th chromosome and the average corrected depth of all autosomes; in Formula 1 and Formula 2, k ranges from 1 to 22.
  • a Mosaic_k of 0 indicates that the fetus's chromosome k is normal; a Mosaic_k of 1 indicates that chromosome k is completely trisomic; a Mosaic_k between 0 and 1 indicates that chromosome k is mosaic.
  • at a fixed fetal DNA concentration, if the fetus is fully trisomic, the trisomy signal in the pregnant woman's peripheral blood is stronger; if the fetus is a chimeric trisomy, the signal is weaker. Since NIPT involves multiple steps such as plasma collection, storage, transportation, cfDNA isolation, library construction, and sequencing, slight fluctuations in any step cause fluctuations in the final test result. In negative samples, such data fluctuations may produce a weak trisomic signal that resembles low chimerism.
  • this application creatively proposes to describe the degree of chimerism quantitatively and to clarify the difference between the chimerism of true positive samples and the apparent chimerism of weak trisomic signals caused by data fluctuations, so that true positive and false positive samples can be better distinguished.
  • the new Z value calculation step 14 includes calculating the new Z value of the sample to be tested based on the fetal DNA concentration, Z value, and chimerism of the cell-free DNA in the pregnant woman's blood; chimerism is the ratio of abnormal fetal cells to all fetal cells.
  • the new Z value calculation step 14 is divided into a model output value analysis sub-step and a Z value mapping sub-step.
  • the model output value analysis sub-step includes inputting the fetal DNA concentration, traditional Z value, and chimerism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample to be tested.
  • the fetal chromosomal aneuploidy abnormality detection model is obtained by machine learning model training that uses several samples with known fetal chromosomal aneuploidy status as training samples, takes fetal DNA concentration, traditional Z value, and chimerism as inputs, and produces the model output value as output.
  • machine learning model training is another innovative improvement of this application.
  • this application found a very good linear relationship among the three variables of fetal DNA concentration, chimerism, and traditional Z value; therefore, the three variables were put into an LDA (linear discriminant analysis) model for training, and the trained model is the fetal chromosomal aneuploidy abnormality detection model.
  • w_k is the coefficient of the k-th variable, obtained by model training
  • a_k is the k-th variable, that is, the sample information input to the model.
  • here the variables are the fetal concentration, the traditional Z value, and chimerism. What model training actually yields is therefore the coefficient of each of these three variables. With the three coefficients and a sample's fetal concentration, traditional Z value, and chimerism, the result of the machine learning model, that is, the model output value (LD value), is obtained through the above formula.
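Applying the trained coefficients to one sample is then a weighted sum, which can be sketched as follows. The formula is not reproduced in this text, so the form LD = Σ w_k · a_k is inferred from the definitions of w_k and a_k; the optional `intercept` term and the function name are assumptions, and any coefficient values passed in are hypothetical.

```python
def model_output(weights, fetal_fraction, z_traditional, mosaic, intercept=0.0):
    """Return the LD value as a weighted sum of the three input variables."""
    # Variable order is an assumption: fetal concentration, traditional Z, chimerism.
    features = (fetal_fraction, z_traditional, mosaic)
    return intercept + sum(w * a for w, a in zip(weights, features))
```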
  • the Z value mapping sub-step includes calculating the new Z value of the sample to be tested, denoted Z_new, based on the model output value of the sample to be tested, the positive threshold, the negative threshold, and the median of the model output values of all negative samples.
  • the threshold cannot be divided based on statistical significance like the traditional Z value.
  • the threshold can only be delineated through the characteristics of the training data.
  • the negative threshold is defined so that all true positive samples in the training data will not be judged as negative, ensuring that the model will not produce false negatives.
  • the positive threshold is defined so that as many true positive samples as possible can be judged as positive, while as few original false positive samples as possible can be judged as positive, thereby reducing false positives and improving the performance of NIPT testing.
  • the gray area is between the positive threshold and the negative threshold.
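One way to turn these threshold principles into a procedure is sketched below. The text gives only the goals, not an algorithm, so everything here is an assumption: the negative threshold is taken as the smallest LD value among true positives, and the positive threshold is picked from the observed LD values by maximizing (true positives above) minus (false positives above). The function name `choose_thresholds` is hypothetical.

```python
def choose_thresholds(ld_true_pos, ld_false_pos):
    """Pick (cut_n, cut_p) from training LD values of true and false positives."""
    # Negative threshold: no true positive lies strictly below it,
    # so no true positive is judged negative.
    cut_n = min(ld_true_pos)

    # Candidate positive thresholds: observed LD values at or above cut_n.
    candidates = sorted(v for v in set(ld_true_pos) | set(ld_false_pos)
                        if v >= cut_n)

    def score(cut):
        # Assumed objective: keep true positives above the cut, push
        # false positives to or below it.
        kept = sum(v > cut for v in ld_true_pos)
        leaked = sum(v > cut for v in ld_false_pos)
        return kept - leaked

    # max() returns the lowest candidate among ties because the list is sorted.
    cut_p = max(candidates, key=score)
    return cut_n, cut_p
```

Samples whose LD value falls between the two returned thresholds would land in the gray area.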
  • the machine learning model used in one implementation of this application is a linear model, so the final result it generates retains the distribution characteristics of the traditional Z value; this application therefore maps the model output value to a new Z value, which not only improves the performance of NIPT detection but also gives the final result distribution characteristics similar to those of the Z value, that is, a normal distribution centered at 0.
  • the specific mapping method is as follows:
  • Z_new is the new Z value
  • LD is the model output value of the sample to be tested
  • cut_p is the positive threshold
  • cut_n is the negative threshold
  • Med is the median of the model output values of the negative samples.
  • Step 15 of determining fetal chromosomal aneuploidy abnormalities includes determining whether aneuploidy abnormalities occur in the chromosomes of the fetus to be tested based on the new Z value.
  • the new Z value obtained through piecewise mapping also follows a normal distribution centered at 0; therefore, Z>3 can still be used as the positive judgment value and Z<1.96 as the negative judgment value.
  • this application proposes a method for constructing a fetal chromosomal aneuploidy abnormality detection model, which includes using several samples with known fetal chromosomal conditions as training samples.
  • the above training samples include positive and negative samples of fetal chromosomal aneuploidy abnormalities.
  • with fetal DNA concentration, Z value, and chimerism as inputs, the machine learning model is trained to obtain a model output value that combines the three variables to characterize the fetal chromosome condition; the model thus obtained is the fetal chromosomal aneuploidy abnormality detection model.
  • for the calculation of fetal DNA concentration, Z value, and chimerism, refer to the method of detecting fetal chromosomal aneuploidy abnormalities in this application; it is not repeated here.
  • the program can be stored in a computer-readable storage medium.
  • the storage medium can include read-only memory, random access memory, magnetic disk, optical disk, hard disk, etc.; the above functions are achieved when the computer executes the program.
  • the program is stored in the memory of the device, and when the program in the memory is executed by the processor, all or part of the above functions can be realized.
  • the program can also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a mobile hard disk, and can be downloaded or copied into the memory of the local device, or used to update the version of the local device's system.
  • this application proposes a device for detecting fetal chromosomal aneuploidy abnormalities, including a new Z value calculation module and a fetal chromosomal aneuploidy abnormality determination module. The new Z value calculation module is configured to calculate the new Z value of the sample to be tested based on the fetal DNA concentration, Z value, and chimerism of the cell-free DNA in the pregnant woman's blood; chimerism is the ratio of abnormal fetal cells to all fetal cells; the determination module is configured to determine whether the fetal chromosomes of the sample to be tested have aneuploidy abnormalities based on the new Z value.
  • a device for detecting fetal chromosomal aneuploidy abnormalities includes a data acquisition module 21, a data processing module 22, a chimerism calculation module 23, a model training module 24, a new Z value calculation module 25, and a fetal chromosome aneuploidy abnormality determination module 26.
  • the data acquisition module 21 is configured to acquire high-throughput sequencing data of cell-free DNA from the blood of the pregnant woman to be tested, for example the FASTQ file generated by the sequencer.
  • the data processing module 22 is configured to calculate the fetal DNA concentration, the traditional Z value, the average corrected depth of each chromosome, and the average corrected depth of all autosomes, for example by following an existing conventional NIPT protocol.
  • the chimerism calculation module 23 is configured to calculate the chimerism of each chromosome based on fetal DNA concentration.
  • Mosaic_k is the chimerism of the k-th chromosome
  • fra_k is the relative fetal concentration of the k-th chromosome
  • FF is the fetal DNA concentration
  • fra_k is obtained from the average corrected depth of the k-th chromosome and the average corrected depth of all autosomes;
  • a Mosaic_k of 0 indicates that the fetus's chromosome k is normal; a Mosaic_k of 1 indicates that chromosome k is completely trisomic; a Mosaic_k between 0 and 1 indicates that chromosome k is mosaic.
  • the model training module 24 uses several samples with known fetal chromosome conditions as training samples.
  • the training samples include positive samples and negative samples of fetal chromosomal aneuploidy abnormalities. With fetal DNA concentration, Z value, and chimerism as inputs, machine learning model training is carried out to obtain a model output value that combines the three variables to characterize the fetal chromosome condition.
  • the model thus obtained is the fetal chromosome aneuploidy abnormality detection model; after model training, the positive samples are used to obtain the corresponding positive threshold, the negative samples are used to obtain the corresponding negative threshold, and the model output values of all negative samples are used to obtain the median.
  • the new Z value calculation module 25 is configured to calculate the new Z value of the sample to be tested based on the fetal DNA concentration, Z value, and chimerism of the cell-free DNA in the pregnant woman's blood; chimerism is the ratio of abnormal fetal cells to all fetal cells.
  • the new Z value calculation module 25 includes a model output value analysis sub-module and a Z value mapping sub-module;
  • the model output value analysis sub-module is configured to input the fetal DNA concentration, Z value, and chimerism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model and obtain the model output value corresponding to the sample to be tested;
  • the Z value mapping sub-module is configured to calculate the new Z value of the sample to be tested from the model output value of the sample to be tested, the positive threshold, the negative threshold, and the median of the model output values of all negative samples;
  • the positive threshold is the threshold of the model output value corresponding to the positive sample
  • the negative threshold is the threshold of the model output value corresponding to the negative sample.
  • the fetal chromosome aneuploidy abnormality determination module 26 is configured to determine whether the chromosome of the fetus to be tested has aneuploidy abnormality based on the new Z value. For example, if the new Z value is greater than 3, it is judged as positive, that is, the fetal chromosome aneuploidy is abnormal; if the new Z value is less than 1.96, it is judged as negative, that is, the fetal chromosome is normal.
  • the device includes a memory and a processor; the memory is configured to store a program; the processor is configured to execute the program stored in the memory.
  • the program implements the following method: calculating the new Z value of the sample to be tested based on the fetal DNA concentration, Z value, and chimerism of the cell-free DNA in the pregnant woman's blood, and determining from the new Z value whether the fetal chromosomes of the sample to be tested have aneuploidy abnormalities; chimerism is the ratio of abnormal fetal cells to all fetal cells.
  • the data acquisition step includes obtaining high-throughput sequencing data of cell-free DNA in the blood of the pregnant woman to be tested;
  • the data processing step includes calculating the fetal DNA concentration and traditional Z value from the acquired high-throughput sequencing data;
  • the chimerism calculation step includes calculating the chimerism of each chromosome based on the fetal DNA concentration;
  • the model value analysis step includes inputting the fetal DNA concentration, the traditional Z value, and the chimerism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample to be tested;
  • the Z value mapping step includes calculating a new Z value from the model output value of the sample to be tested, the positive threshold, the negative threshold, and the median of the model output values of the negative samples;
  • the step of determining fetal chromosomal aneuploidy abnormalities includes determining whether aneuploidy abnormalities occur in the chromosomes of the fetus to be tested based on the new Z value.
  • the device includes a memory and a processor; the memory is configured to store a program; the processor is configured to execute the program stored in the memory to implement the following method: using several samples with known fetal chromosome conditions as training samples, the training samples including positive and negative samples of fetal chromosomal aneuploidy abnormalities.
  • with fetal DNA concentration, Z value, and chimerism as inputs, the machine learning model is trained to obtain a model output value that combines the three variables to characterize the fetal chromosome condition; the model thus obtained is the fetal chromosomal aneuploidy abnormality detection model.
  • the storage medium includes a program that can be executed by a processor to implement the following method: calculating the new Z value of the sample to be tested based on the fetal DNA concentration, Z value, and chimerism of the cell-free DNA in the pregnant woman's blood, and determining from the new Z value whether the fetal chromosome of the sample to be tested has aneuploidy abnormalities; chimerism is the ratio of abnormal fetal cells to all fetal cells.
  • the data acquisition step includes obtaining high-throughput sequencing data of cell-free DNA in the blood of the pregnant woman to be tested;
  • the data processing step includes calculating the fetal DNA concentration and traditional Z value from the acquired high-throughput sequencing data;
  • the chimerism calculation step includes calculating the chimerism of each chromosome based on the fetal DNA concentration;
  • the model value analysis step includes inputting the fetal DNA concentration, the traditional Z value, and the chimerism of the sample to be tested into the fetal chromosomal aneuploidy abnormality detection model to obtain the model output value corresponding to the sample to be tested;
  • the Z value mapping step includes calculating a new Z value from the model output value of the sample to be tested, the positive threshold, the negative threshold, and the median of the model output values of the negative samples;
  • the step of determining fetal chromosomal aneuploidy abnormalities includes determining whether aneuploidy abnormalities occur in the chromosomes of the fetus to be tested based on the new Z value.
  • the storage medium includes a program that can be executed by the processor to implement the following method: using several samples with known fetal chromosomal conditions as training samples, the training samples including positive and negative samples of fetal chromosomal aneuploidy abnormalities, and training a machine learning model with fetal DNA concentration, Z value, and chimerism as inputs to obtain a model output value that combines the three variables to characterize the fetal chromosome condition.
  • the model thus obtained is the fetal chromosomal aneuploidy abnormality detection model.
  • This application combines three variables: fetal concentration, chimerism (an indicator unique to this application), and the traditional Z value. In one implementation, linear discriminant analysis (LDA) is selected as the machine learning model for model training and for judging results.
  • the three variables used in this application are all linearly related.
  • the linear relationship is simple and clear, avoiding the complexity caused by too many variables and by differing dimensions and distribution characteristics between variables.
  • the linear discriminant analysis (LDA) model is used for analysis.
  • the model is simple and does not have the problem of overfitting.
  • This application has developed a new Z value conversion method that converts the values obtained by machine learning, which have no statistical significance of their own, into Z values that are commonly used clinically and comply with regulatory requirements; this is the new Z value of this application.
  • the new Z value obtained through the Z value conversion method of the present application follows a normal distribution and can meet current regulatory and clinical use requirements.
  • This application is the first to create a new detection indicator, chimerism, and further integrates the three variables of fetal DNA concentration, chimerism, and traditional Z value to overcome the inaccuracy of traditional NIPT, which relies on the Z value alone to determine trisomy. The three variables are combined with a linear model, which is simple and does not have the problem of overfitting. Furthermore, this application developed a Z value conversion method, namely Z value mapping, that converts the otherwise uninterpretable values produced by the machine learning model into meaningful, clinically recognized Z values, namely Z_new, while reducing the gray area rate and retest rate of the traditional Z value and improving the stability of the test results.
  • This example uses the established fetal chromosomal aneuploidy abnormality detection model to predict samples with diagnostic or follow-up results. Specifically, a total of 108,293 samples were used for model training. The samples enter training in two classes, negative and positive, but cover three categories: negative samples include true negative and false positive samples, and positive samples are true positive samples. Because fetal concentration is calculated differently for male and female fetuses, its data characteristics differ between them, and because fetal concentration is one of the key variables of the model, separate models were trained for male and female fetuses. The specific sample numbers are shown in Table 1.
  • the fetal DNA concentration, traditional Z value, and chimerism of the training samples are input into the LDA model for training, and the model output values are obtained.
  • taking the median of the values obtained by machine learning for the negative samples gives the "median of the model output value", which is Med in the subsequent mapping formula.
  • the median calculated in this example is shown in Table 3.
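The training and median steps above can be sketched with a two-class Fisher discriminant written in plain Python. This is a minimal illustration under stated assumptions, not the application's implementation: the text says only that an LDA model is trained, so the closed-form weights w = S_w⁻¹(μ_pos − μ_neg), the absence of an intercept, and all function names are assumptions. Following the text, one such model would be fitted for male fetuses and another for female fetuses.

```python
from statistics import median


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_lda(negatives, positives):
    """Return Fisher discriminant weights from two lists of feature tuples."""
    mu_neg, mu_pos = column_means(negatives), column_means(positives)
    dim = len(mu_neg)
    # Pooled within-class scatter matrix.
    scatter = [[0.0] * dim for _ in range(dim)]
    for rows, mu in ((negatives, mu_neg), (positives, mu_pos)):
        for row in rows:
            d = [x - m for x, m in zip(row, mu)]
            for i in range(dim):
                for j in range(dim):
                    scatter[i][j] += d[i] * d[j]
    return solve(scatter, [p - n for p, n in zip(mu_pos, mu_neg)])


def ld_value(weights, sample):
    """Model output value (LD) of one sample."""
    return sum(w * a for w, a in zip(weights, sample))


def negative_median(weights, negatives):
    """Med: median LD value over all negative training samples."""
    return median(ld_value(weights, s) for s in negatives)
```

Each sample is a tuple of (fetal DNA concentration, traditional Z value, chimerism). In practice a maintained library implementation, for example scikit-learn's `LinearDiscriminantAnalysis`, would usually be preferable to hand-written linear algebra.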
  • the threshold of the LD value is defined so that: 1. no true positive sample is judged as negative; 2. as many true positive samples as possible are judged as positive; 3. as few false positive samples as possible are judged as positive.
  • the thresholds of the LD value, that is, the positive threshold (cut_p) and the negative threshold (cut_n) in the mapping formula, are set according to the above principles. The specific values in this example are shown in Table 4.
  • Z_new is the new Z value
  • LD is the model output value
  • cut_p is the positive threshold
  • cut_n is the negative threshold
  • Med is the median of the model output values of the negative samples.
  • a total of 10,240 samples that were tested by BGI in actual clinical practice and underwent prenatal diagnosis or postnatal follow-up were selected. These samples received test results based on traditional Z values in clinical testing, and prenatal diagnosis or postnatal follow-up was then performed based on those results. Each sample can therefore be classified as true positive, false positive, or true negative from its test result and its diagnosis or follow-up result. The specific sample information is shown in Table 5.
  • the traditional Z value calculation method is as follows:
  • SD_i represents the standard deviation of the UR of chromosome i
  • SD_j represents the standard deviation of the UR of chromosome j
  • L_i represents the number of windows into which chromosome i is divided
  • L_j represents the number of windows into which chromosome j is divided
  • Z_i indicates the significance of aneuploidy of chromosome i, reflecting its difference from euploidy.
  • the chimerism of the above 10,240 samples was calculated. Taking T13 as an example, the results are shown in Figure 3: chimerism distinguishes true positive, false positive, and true negative samples fairly well. The three variables of chimerism, fetal concentration, and traditional Z value were then input into the trained machine learning model, and Z value mapping was used to generate new Z values. Some of the sample data used for model testing are shown in Table 6. The 10,240 samples were re-judged based on the new Z value, with Z>3 judged as positive and Z<1.96 judged as negative, to generate new test results. The results are shown in Table 7.
  • This example uses the established model to detect continuous samples from the production line.
  • the samples with karyotype results are not continuous samples from a single center, so their data distribution cannot reflect the true distribution of the population, and the true distribution characteristics of the new Z value cannot be evaluated from them. Therefore, continuous samples received by one BGI medical laboratory over a period of time were used to evaluate the distribution characteristics of the new Z value and compare it with the traditional Z value, demonstrating how the new Z value behaves in actual use.
  • the new Z value fluctuates less than the traditional Z value, which reduces the gray area rate.
  • This example further demonstrates this with a larger sample size. Specifically, it takes the full-year 2020 data of a single BGI medical laboratory: 360,786 clinical samples, on which a total of 383,306 tests were performed. In the 383,306 tests, the new Z value produced 785 T21 gray areas, 345 T18 gray areas, and 288 T13 gray areas. The gray area rates of T21, T18, and T13 were 0.22%, 0.09%, and 0.08% respectively, and the overall gray area rate of trisomy detection was 0.39%. In comparison, the traditional Z value produced 3071 T21 gray areas, 4350 T18 gray areas, and 2335 T13 gray areas. The gray area rates of T21, T18, and T13 were 0.80%, 1.14%, and 0.61% respectively, and the overall gray area rate of trisomy detection was 2.55%, as shown in Table 8.
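The gray area rates can be recomputed from the counts quoted above. The sketch below assumes the 383,306 tests are the denominator, which the text does not state explicitly. Under that assumption most entries reproduce the reported percentages, while a few differ in the second decimal (for example, 785 / 383,306 is about 0.20% against the reported 0.22%), which may reflect rounding or a different denominator in Table 8.

```python
TOTAL_TESTS = 383_306  # assumption: rates are per test, not per sample

NEW_Z = {"T21": 785, "T18": 345, "T13": 288}
TRADITIONAL_Z = {"T21": 3071, "T18": 4350, "T13": 2335}


def gray_zone_rate(count, total=TOTAL_TESTS):
    """Return the gray area rate as a percentage of all tests."""
    return 100.0 * count / total


def summarize(counts):
    """Per-trisomy rates plus the overall rate, rounded to two decimals."""
    rates = {name: round(gray_zone_rate(n), 2) for name, n in counts.items()}
    rates["overall"] = round(gray_zone_rate(sum(counts.values())), 2)
    return rates
```

Calling `summarize(NEW_Z)` and `summarize(TRADITIONAL_Z)` gives the two sets of rates side by side.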

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

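This application discloses a method, device, and storage medium for detecting fetal chromosomal aneuploidy abnormalities. The method includes calculating a new Z value for the sample to be tested from the fetal DNA concentration, Z value, and chimerism of the cell-free DNA in the blood of the pregnant woman to be tested, and determining from the new Z value whether the fetal chromosome has an aneuploidy abnormality; chimerism is the ratio of abnormal fetal cells to all fetal cells. This application is the first to incorporate chimerism into the detection of fetal chromosomal aneuploidy abnormalities. Calculating the new Z value from the three variables of fetal DNA concentration, chimerism, and Z value improves the accuracy of NIPT, discriminates well between true positive and false positive samples, and reduces false positives. The new Z value follows a normal distribution, meets current regulatory and clinical requirements, and reduces the fluctuation of the data distribution, thereby lowering the gray area rate and the retest rate and improving the stability of test results.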

Description

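Method, device, and storage medium for detecting fetal chromosomal aneuploidy abnormalities
Technical Field
This application relates to the technical field of detecting fetal chromosomal aneuploidy abnormalities, and in particular to a method, device, and storage medium for detecting fetal chromosomal aneuploidy abnormalities.
Background
A fetal chromosomal aneuploidy abnormality means that the fetal chromosomes are aneuploid. A normal fetus has 23 pairs (46) of chromosomes, that is, the chromosomes are euploid. If a chromosome is missing or an extra chromosome is present, the result is aneuploidy, which indicates that the fetal chromosomes are abnormal.
At present, the fetal chromosomal aneuploidy abnormalities most commonly seen in clinical practice are Down syndrome, Edwards syndrome, and Patau syndrome.
Down syndrome (trisomy 21) is a genetic disorder caused by trisomy of chromosome 21. Common symptoms include developmental delay, characteristic facial features, and mild to moderate intellectual disability. There is currently no effective treatment for Down syndrome; patients' quality of life can be improved only through care and education. Besides Down syndrome, the clinically common fetal chromosomal aneuploidy abnormalities include Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13), all of which cause severe developmental abnormalities in affected children.
The molecular mechanism of Down syndrome is nondisjunction of chromosome 21 during germ cell formation, which leaves the fertilized egg with three copies of chromosome 21 and in turn causes abnormalities in a series of molecular and developmental processes. Because chromosomal aneuploidy syndromes such as Down syndrome have no effective treatment, and no specific behavioral or environmental factors associated with their onset have been found, the main response at present is prenatal screening of pregnant women to avoid the birth of infants with serious genetic diseases such as Down syndrome. That is, testing is carried out during pregnancy, and if the relevant indicators are positive or high-risk, the pregnancy is terminated to avoid the birth of a trisomic infant.
Traditional screening assesses trisomy risk through serological markers such as AFP, free β-hCG, uE3, and Inhibin-A. Because serological markers are indirect indicators that do not directly reflect the aneuploidy status of the fetal chromosomes, both sensitivity and specificity are poor. Around 2010, high-throughput sequencing gradually emerged and became widespread. It allows cell-free DNA (cfDNA) in maternal plasma to be detected and quantified precisely, so that chromosomal abnormalities including trisomy 21 can be screened by the relative abundance of the target chromosome (NIPT, Non-Invasive Prenatal Testing). In 2015, an article in the New England Journal of Medicine reported a prospective, multicenter clinical trial analyzing 15,841 samples, which showed that NIPT performed significantly better than traditional screening, with sensitivity and specificity both above 99.9%. By comparison, traditional serological screening had a sensitivity of only 78.9% and a specificity of only 94.6%. This demonstrated that NIPT greatly improved screening for chromosomal aneuploidy syndromes such as Down syndrome.
However, the performance of NIPT still needs improvement. In an article published in 2015, Zhang et al. analyzed 112,669 NIPT results with follow-up outcomes and found two main problems with traditional NIPT. First, the positive predictive value (PPV) needs improvement. According to the article, the PPV was 92.2% for T21, 76.6% for T18, and only 32.8% for T13, showing that traditional NIPT produces many false positives. Second, the retest rate is high. According to the article, the 112,669 samples led to 3,213 repeat blood draws, a redraw rate of 2.8%. A redraw means that the first NIPT value fell in the gray area, so neither a negative nor a positive result could be reported, and a new tube of blood had to be drawn and tested again. In that case the pregnant woman not only endures an additional blood draw; more importantly, the turnaround time of the NIPT report is prolonged, which may cause her to miss the optimal window for intervention and poses a major risk to her life and health.
Therefore, how to raise the positive predictive value of NIPT and lower the retest rate is both a focus and a difficulty of research on detecting fetal chromosomal aneuploidy abnormalities.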
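Summary of the Invention
The purpose of this application is to provide an improved method, device, and storage medium for detecting fetal chromosomal aneuploidy abnormalities.
To achieve the above purpose, this application adopts the following technical solutions:
The first aspect of this application discloses a method for detecting fetal chromosomal aneuploidy abnormalities, which includes calculating a new Z value of the sample to be tested, denoted Z_new, from the fetal DNA concentration, Z value, and chimerism of the cell-free DNA in the pregnant woman's blood in the sample to be tested, and determining from the new Z value whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality; chimerism is the ratio of abnormal fetal cells to all fetal cells.
It should be noted that the key to the method of this application is that three variables, namely fetal DNA concentration, chimerism (an indicator unique to this application), and the traditional Z value, are used to calculate a new Z value, Z_new, of the kind commonly used and accepted in the field. Here, the "traditional Z value" is the Z value obtained by the conventional method, and the "new Z value" is the Z value calculated in this application from the three variables. Using chimerism as an input for calculating the new Z value helps improve the accuracy of NIPT, discriminates well between true positive and false positive samples, and reduces false positives. The new Z value follows a normal distribution, which not only meets current regulatory and clinical requirements but also greatly reduces the fluctuation of the data distribution, thereby lowering the gray area rate and the retest rate and improving the stability of test results.
In one implementation of this application, calculating the new Z value of the sample to be tested from the fetal DNA concentration, Z value, and chimerism includes inputting the fetal DNA concentration, Z value, and chimerism into a fetal chromosomal aneuploidy abnormality detection model to obtain the model output value for the sample to be tested, and mapping the model output value to the new Z value of the sample to be tested. The detection model is obtained by taking several samples with known fetal chromosome conditions as training samples, including positive and negative samples of fetal chromosomal aneuploidy abnormalities, and training a machine learning model with fetal DNA concentration, Z value, and chimerism as inputs, so as to obtain a model output value that combines the three variables to characterize the fetal chromosome condition.
It will be understood that obtaining the new Z value through machine learning model training is only one implementation of this application; other ways of calculating the new Z value from the fetal DNA concentration, Z value, and chimerism are not excluded.
In one implementation of this application, mapping the model output value to the new Z value of the sample to be tested includes calculating the new Z value from the model output value of the sample to be tested, the positive threshold, the negative threshold, and the median of the model output values of all negative samples; the positive threshold is the threshold of the model output value for positive samples, and the negative threshold is the threshold of the model output value for negative samples.
It should be noted that the model output value, also called the machine-learning-generated value, is the number output by the detection model for assessing fetal chromosomal aneuploidy abnormalities. Unlike the traditional Z value, its thresholds cannot be set by statistical significance; they can only be set from the characteristics of the training data. For example, the negative threshold is set so that no true positive sample in the training data is judged negative, which ensures that the model produces no false negatives. The positive threshold is set so that as many true positive samples as possible are judged positive while as few of the original false positive samples as possible are judged positive, which reduces false positives and improves NIPT performance. The region between the positive and negative thresholds is the gray area. So that the model output value of a sample can be used directly to judge the aneuploidy status of the fetal chromosomes, this application further maps the model output value to a new Z value, Z_new. Experimental results show that the new Z value obtained by this mapping follows a normal distribution centered at 0, so Z>3 can still be used as the criterion for a positive result and Z<1.96 as the criterion for a negative result.
In one implementation of this application, the median of the model output values of the negative samples is obtained by feeding all negative training samples back into the detection model and taking the median of the resulting model output values;
In one implementation of this application, mapping the model output value to the new Z value of the sample to be tested uses the following mapping: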
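When the model output value of the sample to be tested is greater than the positive threshold, Z_new = LD − cut_p + 3;
When the model output value of the sample to be tested is less than the positive threshold and greater than the negative threshold, [formula not reproduced]
When the model output value of the sample to be tested is less than the negative threshold, [formula not reproduced]
In the above formulas, Z_new is the new Z value, LD is the model output value of the sample to be tested, cut_p is the positive threshold, cut_n is the negative threshold, and Med is the median of the model output values of all negative samples.
It should be noted that, in this application, the rationale for mapping the model output value to the new Z value is as follows:
1) When the model output value is greater than the positive threshold, the resulting new Z value must be greater than 3, because clinically Z>3 is customarily used as the threshold for calling a trisomy positive.
2) When the model output value is in the gray area, the resulting new Z value must lie between 1.96 and 3, because clinically the interval [1.96, 3) is customarily used as the gray area.
3) When the model output value is less than the negative threshold, the resulting new Z value must be less than 1.96, because clinically Z<1.96 is customarily used as the threshold for calling a trisomy negative.
In addition, this application ensures that the median of the new Z values of negative samples is 0, because the median of a standard normal distribution should equal 0; this is why the formulas include Med, the median of the model output values of the negative samples. From the mapping formulas above, when the model output value equals the median of the model output values of the negative samples, the new Z value is 0.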
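The three constraints above, together with the requirement that Med maps to 0, can be met by a piecewise-linear mapping, sketched below. Only the first branch (Z_new = LD − cut_p + 3) is stated in the text. The gray area and negative branches are assumptions chosen to satisfy the stated constraints; the original formulas may differ, in particular if they carry numerical values fitted to the training data. The function name `map_to_z_new` is hypothetical.

```python
def map_to_z_new(ld, cut_p, cut_n, med):
    """Map a model output value (LD) to a new Z value."""
    if not med < cut_n < cut_p:
        raise ValueError("expected med < cut_n < cut_p")
    if ld > cut_p:
        # Stated in the text: positives land above 3.
        return ld - cut_p + 3.0
    if ld >= cut_n:
        # Assumption: linear interpolation of [cut_n, cut_p] onto [1.96, 3].
        return 1.96 + (ld - cut_n) * (3.0 - 1.96) / (cut_p - cut_n)
    # Assumption: linear scaling so that med maps to 0 and cut_n to 1.96.
    return 1.96 * (ld - med) / (cut_n - med)
```

Under these assumptions the mapping is continuous at both thresholds, so a sample does not jump in Z_new when its LD value crosses cut_n or cut_p.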
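It should also be noted that the specific numerical values in the above formulas are those obtained in one implementation of this application. It will be understood that if the training samples change, the values in the mapping formulas change accordingly, but the basic principle of obtaining the new Z value through the mapping formulas remains the same.
In one implementation of this application, determining from the new Z value whether the fetal chromosome of the sample to be tested has an aneuploidy abnormality includes: a new Z value greater than 3 is judged positive, that is, fetal chromosomal aneuploidy abnormality; a new Z value less than 1.96 is judged negative, that is, the fetal chromosome is normal.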
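The decision rule can be sketched as a three-way classification. The cutoffs 3 and 1.96 come from the text; treating a value of exactly 3 as gray area is an assumption, since the text defines positive as greater than 3 and the gray area as [1.96, 3) and leaves that single point unspecified. The function name and labels are hypothetical.

```python
def classify(z_new):
    """Return the call for a new Z value."""
    if z_new > 3.0:
        return "positive"   # fetal chromosomal aneuploidy abnormality
    if z_new < 1.96:
        return "negative"   # fetal chromosome normal
    # Everything else, including exactly 3.0 (assumption), needs retesting.
    return "gray area"
```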
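In one implementation of this application, the machine learning model is a linear discriminant analysis (LDA) model.
In one implementation of this application, the abnormal fetal cells are cells that carry a fetal chromosomal aneuploidy abnormality.
In one implementation of this application, the fetal DNA concentration and Z value of the cell-free DNA in the pregnant woman's blood are calculated from high-throughput sequencing data of that cell-free DNA.
In one implementation of this application, chimerism is calculated by Formula 1; [formula not reproduced]
In Formula 1, Mosaic_k is the chimerism of the k-th chromosome, fra_k is the relative fetal concentration of the k-th chromosome, and FF is the fetal DNA concentration;
fra_k is calculated by Formula 2; [formula not reproduced]
In Formula 2, fra_k is the relative fetal concentration of the k-th chromosome, obtained from the mean corrected depth of the k-th chromosome and the mean corrected depth of all autosomes;
In Formula 1 and Formula 2, k takes values from 1 to 22;
A Mosaic_k of 0 indicates that the fetus's k-th chromosome is normal; a Mosaic_k of 1 indicates that it is completely trisomic; a Mosaic_k between 0 and 1 indicates that it is mosaic. In this application, chromosome k being mosaic means that chromosome k is trisomic in some fetal cells and non-trisomic in others. In principle, at a fixed fetal DNA concentration, the trisomy signal in the maternal peripheral blood is stronger if the fetus is fully trisomic and weaker if the fetus is a mosaic trisomy. Moreover, low chimerism generally reflects a false positive caused by data fluctuation, whereas high chimerism generally reflects a true positive.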
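A minimal sketch of a chimerism calculation that is consistent with the definitions above follows. Formula 1 and Formula 2 are not reproduced in this text, so both expressions are assumptions: fra_k is taken as twice the relative excess depth of chromosome k (a full fetal trisomy raises the relative depth by FF/2), and Mosaic_k as fra_k divided by FF. The function names are hypothetical.

```python
def relative_fetal_fraction(mean_depth_k, mean_depth_autosomes):
    """Assumed Formula 2: fra_k = 2 * (depth_k / depth_all - 1)."""
    return 2.0 * (mean_depth_k / mean_depth_autosomes - 1.0)


def mosaic(mean_depth_k, mean_depth_autosomes, fetal_fraction):
    """Assumed Formula 1: Mosaic_k = fra_k / FF."""
    if fetal_fraction <= 0:
        raise ValueError("fetal_fraction must be positive")
    return relative_fetal_fraction(mean_depth_k, mean_depth_autosomes) / fetal_fraction
```

Under these assumptions, with FF = 10%, a chromosome whose mean corrected depth is 5% above the autosomal mean gives a chimerism of about 1 (fully trisomic), and one that is 2.5% above gives about 0.5.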
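In one implementation of this application, the mean corrected depth of each chromosome and the mean corrected depth of all autosomes are calculated from high-throughput sequencing data of the cell-free DNA in the pregnant woman's blood.
In one implementation of this application, the method for detecting fetal chromosomal aneuploidy abnormalities includes the following steps:
a data acquisition step, including acquiring high-throughput sequencing data of the cell-free DNA in the blood of the pregnant woman to be tested;
a data processing step, including calculating the fetal DNA concentration and Z value from the acquired high-throughput sequencing data;
a chimerism calculation step, including calculating the chimerism of each chromosome according to Formula 1;
a new Z value calculation step, including calculating the new Z value of the sample to be tested from the fetal DNA concentration, Z value, and chimerism of the cell-free DNA in the pregnant woman's blood in the sample to be tested;
a fetal chromosomal aneuploidy abnormality determination step, including determining from the new Z value whether the chromosomes of the fetus to be tested have an aneuploidy abnormality.
It should be noted that the key to the method of this application is that the three variables of fetal DNA concentration, chimerism, and traditional Z value are considered together by the detection model to produce a model output value, which is then converted into a Z value of the kind commonly used and accepted in the field, namely the new Z value (Z_new). Here, the "traditional Z value" is the Z value obtained in the data processing step by the conventional method; to distinguish it from the "new Z value" of this application, the Z value obtained in the data processing step is called the "traditional Z value", and the Z value obtained by mapping the model output value is called the "new Z value". Including chimerism in the machine learning model helps improve the accuracy of NIPT, discriminates well between true positive and false positive samples, and reduces false positives. The new Z value follows a normal distribution, which not only meets current regulatory and clinical requirements but also greatly reduces the fluctuation of the data distribution, thereby lowering the gray area rate and the retest rate and improving the stability of test results.
The second aspect of this application discloses a method for constructing a fetal chromosomal aneuploidy abnormality detection model, which includes taking several samples with known fetal chromosome conditions as training samples, the training samples including positive and negative samples of fetal chromosomal aneuploidy abnormalities, and training a machine learning model with fetal DNA concentration, Z value, and chimerism as inputs to obtain a model output value that combines the three variables to characterize the fetal chromosome condition; the model thus obtained is the fetal chromosomal aneuploidy abnormality detection model.
It should be noted that the construction method of this application is in fact the method by which the detection model is constructed within the detection method of this application; therefore, for the calculation of fetal DNA concentration, Z value, and chimerism, refer to the detection method of this application, which is not repeated here.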
本申请的第三方面公开了一种检测胎儿染色体非整倍体异常的装置,包括新的Z值计算模块和胎儿染色体非整倍体异常判断模块;新的Z值计算模块包括根据待测样本孕妇血液游离DNA中的胎儿DNA浓度、Z值、嵌合度,计算获得待测样本的新的Z值;嵌合度为胎儿异常细胞占所有胎儿细胞的比率;胎儿染色体非整倍体异常模块包括根据所述新的Z值判断待测样本的胎儿染色体是否发生非整倍体异常。
本申请的一种实现方式中,新的Z值计算模块还包括将胎儿DNA浓度、Z值和嵌合度输入胎儿染色体非整倍体异常检测模型,获得待测样本对应的模型输出值,由模型输出值印射获得待测样本的新的Z值;其中,胎儿染色体非整倍体异常检测模型是采用若干个已知胎儿染色体情况的样本作为训练样本,训练样本包括胎儿染色体非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,由此获得的模型;模型输出值用于综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况。
因此,本申请的一种实现方式中,本申请的装置还包括模型训练模块,采用若干个已知胎儿染色体情况的样本作为训练样本,训练样本包括胎儿染色体非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,获得一个综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况的模型输出值,由此获得的模型,即胎儿染色体非整倍体异常检测模型。优选地,机器学习模型为线性判别分析模型。
本申请的一种实现方式中,新的Z值计算模块包括模型输出值分析子模块和Z值印射子模块;模型输出值分析子模块,包括用于将待测样本的胎儿DNA浓度、Z值和嵌合度输入胎儿染色体非整倍体异常检测模型,获得待测样本对应的模型输出值;Z值印射子模块,包括用于根据待测样本的模型输出值,以及阳 性阈值、阴性阈值、所有阴性样本的模型输出值的中位数,计算获得待测样本的新的Z值;阳性阈值为阳性样本对应的模型输出值的阈值,阴性阈值是阴性样本对应的模型输出值的阈值。
本申请的一种实现方式中,Z值印射子模块,根据以下方式获得新的Z值,
当待测样本的模型输出值大于阳性阈值时,Znew=LD-cutp+3;
当待测样本的模型输出值小于阳性阈值、且大于阴性阈值时,
当待测样本的模型输出值小于阴性阈值时,
以上公式中,Znew为新的Z值,LD为待测样本的模型输出值,cutp为阳性阈值,cutn为阴性阈值,Med为所有阴性样本的模型输出值的中位数;
本申请的一种实现方式中,胎儿染色体非整倍体异常模块,根据新的Z值判断待测样本的胎儿染色体是否发生非整倍体异常,包括,新的Z值大于3判断为阳性,即胎儿染色体非整倍体异常;新的Z值小于1.96判断为阴性,即胎儿染色体正常。
需要说明的是,本申请的装置中,模型训练模块可以根据需求使用,例如在已经获得胎儿染色体非整倍体异常检测模型、阳性阈值、阴性阈值和所有阴性样本的模型输出值的中位数的情况下,其他模块可以直接调用模型和数据;因此,不必每次检测都运行模型训练模块。当然,如果训练样本发生改变,例如增加训练样本,则建议运行模型训练模块,以进一步完善模型和各项数据。
还需要说明的是,本申请检测胎儿染色体非整倍体异常的装置,实际上就是通过各模块实现本申请的检测胎儿染色体非整倍体异常的方法;因此,各模块的具体限定可以参考本申请的检测胎儿染色体非整倍体异常的方法。例如,胎儿DNA浓度、Z值和嵌合度的计算,具体的Znew计算方式、线性判别分析模型、如何根据Znew判断阳性和阴性等,都可以参考本申请的检测胎儿染色体非整倍体异常的方法。
本申请的第四方面公开了一种检测胎儿染色体非整倍体异常的装置,该装置包括存储器和处理器;存储器包括用于存储程序;处理器包括用于通过执行存储器存储的程序以实现本申请的检测胎儿染色体非整倍体异常的方法或者本申请的胎儿染色体非整倍体异常检测模型的构建方法。
可以理解,在本申请的装置通过执行存储器存储的程序以实现本申请的胎儿染色体非整倍体异常检测模型的构建方法时,本申请的装置实际上是一个用 于模型构建的装置,由该装置构建获得的模型可以按照本申请的方法用于检测胎儿染色体非整倍体异常。
本申请的第五方面公开了一种计算机可读存储介质,该存储介质中存储有程序,该程序能够被处理器执行以实现本申请的检测胎儿染色体非整倍体异常的方法或者本申请的胎儿染色体非整倍体异常检测模型的构建方法。
可以理解,在本申请的计算机可读存储介质中存储的程序能够被处理器执行以实现本申请的胎儿染色体非整倍体异常检测模型的构建方法时,本申请的计算机可读存储介质实际上是一个用于模型构建的计算机可读存储介质,该计算机可读存储介质可以直接被使用,以实现胎儿染色体非整倍体异常检测模型的构建,由此构建获得的模型可以按照本申请的方法用于检测胎儿染色体非整倍体异常。
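By adopting the above technical solutions, the beneficial effects of this application are as follows:
The method and apparatus of this application for detecting fetal chromosomal aneuploidy are the first to incorporate mosaicism into fetal chromosomal aneuploidy detection, jointly considering the fetal DNA fraction, the mosaicism and the conventional Z value to compute a new Z value. The method and apparatus improve the accuracy of NIPT, discriminate well between true-positive and false-positive samples, and reduce false positives. In addition, the new Z value follows a normal distribution, so it not only satisfies current regulatory and clinical requirements but also reduces the variability of the data distribution, thereby lowering the gray-zone rate and the retest rate and improving the stability of the results.
Brief Description of the Drawings
Fig. 1 is a flow diagram of the method for detecting fetal chromosomal aneuploidy in an embodiment of this application;
Fig. 2 is a structural block diagram of the apparatus for detecting fetal chromosomal aneuploidy in an embodiment of this application;
Fig. 3 is a T13 mosaicism analysis plot of the 10,240 samples in an embodiment of this application;
Fig. 4 is a Q-Q plot of the new Z values of chromosome 21 for the 10,000 samples in an embodiment of this application;
Fig. 5 is a distribution plot of the conventional Z values and new Z values of chromosome 13 for the 10,000 samples in an embodiment of this application.
Detailed Description
This application is described in further detail below through specific embodiments with reference to the drawings. In the following embodiments, many details are described so that the application can be better understood. However, those skilled in the art will readily recognize that some features may be omitted in different circumstances, or may be replaced by other devices, materials or methods. In some cases, certain operations related to this application are not shown or described in the specification, so that the core of the application is not obscured by excessive description. For those skilled in the art a detailed description of these operations is unnecessary, since they can be fully understood from the specification and general technical knowledge in the field.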
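This application inventively uses mosaicism as one of the variables for computing the new Z value, thereby improving the accuracy of NIPT. Accordingly, this application proposes a method for detecting fetal chromosomal aneuploidy, comprising: calculating the new Z value of the test sample from the fetal DNA fraction, the Z value and the mosaicism in the maternal blood cell-free DNA of the test sample, and determining from the new Z value whether the fetal chromosome of the test sample is aneuploid; the mosaicism being the ratio of abnormal fetal cells to all fetal cells.
In one implementation of this application, the method, as shown in Fig. 1, comprises a data acquisition step 11, a data processing step 12, a mosaicism calculation step 13, a new Z value calculation step 14 and a fetal chromosomal aneuploidy determination step 15.
The data acquisition step 11 comprises obtaining high-throughput sequencing data of cell-free DNA from the blood of the pregnant woman under test. For example, in one implementation of this application, the raw output is a fastq file produced by the sequencer.
The data processing step 12 comprises calculating, from the obtained high-throughput sequencing data, the fetal DNA fraction, the Z value, the mean corrected depth of each chromosome and the mean corrected depth of all autosomes. In one implementation of this application, this step comprises the general operations of a routine NIPT pipeline, specifically:
A) Sequence alignment and filtering: the reads in the fastq file produced by the sequencer are aligned to a human reference genome, e.g. GRCh37/hg19, using publicly available software such as BWA (0.7.7-r441). Reads with poor mapping quality, multi-mapped reads, duplicate reads and imperfectly aligned reads are filtered out, leaving uniquely mapped reads, and the coordinates and other information of each uniquely mapped read are stored in a bam file.
B) Window partitioning and data correction: the human reference genome is divided into windows of about 60 kb, and the number of uniquely mapped reads in each 60 kb window is counted as the raw depth of that window (the window depth). The raw depth of each window is then corrected for GC content and for between-sample variation, giving the corrected depth of each window (UR). Averaging the corrected depths of all windows on a chromosome gives the "mean corrected depth of the k-th chromosome"; this application also computes the mean corrected depth of all autosomes.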
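The window counting in step B) can be sketched as follows. This is a minimal illustration under stated assumptions: read start positions are taken as 0-based integers already filtered to uniquely mapped reads, the function names are hypothetical, and the GC and between-sample corrections that turn raw depth into UR are deliberately left out because the text does not specify them.

```python
from collections import Counter

WINDOW_SIZE = 60_000  # "about 60 kb" per the text


def window_depths(read_starts, chrom_length, window_size=WINDOW_SIZE):
    """Raw depth per window: number of uniquely mapped reads starting in each window."""
    n_windows = -(-chrom_length // window_size)  # ceiling division
    counts = Counter(pos // window_size for pos in read_starts)
    return [counts.get(i, 0) for i in range(n_windows)]


def chromosome_mean(depths):
    """Mean depth over all windows of one chromosome (apply to corrected UR values)."""
    return sum(depths) / len(depths)


if __name__ == "__main__":
    depths = window_depths([0, 59_999, 60_000, 130_000], chrom_length=150_000)
    print(depths, chromosome_mean(depths))
```

In a real pipeline the input to `chromosome_mean` would be the corrected values, not the raw counts used in this toy call.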
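C) Fetal DNA fraction calculation: this application uses different calculations for male and female fetuses, as follows.
For a male fetus:
The fetal fraction of a male fetus is determined from the proportion of the Y chromosome: the mean UR of the Y-chromosome windows divided by the mean UR of the autosomes, multiplied by 2, gives the male fetal fraction FF:
$$FF = 2 \times \frac{\overline{UR}_{Y}}{\overline{UR}_{auto}}$$
where $\overline{UR}_{Y}$ is the mean UR of the Y-chromosome windows and $\overline{UR}_{auto}$ is the mean UR of the autosomal windows.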
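The male-fetus rule stated above (twice the ratio of the mean Y-chromosome UR to the mean autosomal UR) translates directly into code. The function and argument names are illustrative only, and the numbers in the usage line are made up.

```python
def male_fetal_fraction(ur_y_windows, ur_autosome_windows):
    """FF for a male fetus: 2 * mean(UR of Y windows) / mean(UR of autosomal windows)."""
    mean_y = sum(ur_y_windows) / len(ur_y_windows)
    mean_auto = sum(ur_autosome_windows) / len(ur_autosome_windows)
    return 2.0 * mean_y / mean_auto


if __name__ == "__main__":
    # Hypothetical values: Y windows at 5% of autosomal depth give FF of about 10%.
    print(male_fetal_fraction([5.0, 5.0], [100.0, 100.0]))
```

The factor 2 reflects that a male fetus contributes a single Y copy, whereas autosomal depth comes from two copies.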
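For a female fetus:
The fetal fraction of a female fetus is estimated with a high-dimensional regression model that exploits the non-uniform distribution of fetal cell-free DNA across the genome. The underlying assumption is that, for male and female fetuses alike, fetal cfDNA and maternal cfDNA are distributed differently across the genome. The fetal fraction estimated by the Y-chromosome method in male fetuses is therefore used to train the model, and a regression model is built with a neural-network machine learning method, as follows:
$$a_j^{l} = f\left(\sum_{k} w_{jk}^{l}\, a_k^{l-1} + b_j^{l}\right)$$
where l is the index of the network layer; the first layer is the input layer, the last layer is the output layer (with a single neuron), and the layers in between are hidden layers. $a_j^{l}$ is the value of the j-th neuron in layer l, $a_k^{l-1}$ is the value of the k-th neuron in layer l-1, $w_{jk}^{l}$ is the connection weight from the k-th neuron in layer l-1 to the j-th neuron in layer l, and $b_j^{l}$ is the input bias of the j-th neuron in layer l. The most common form of the function f is the rectified linear unit, i.e. f(x) = max(0, x). w and b are obtained when the model is trained. When the model is applied, the neuron values are computed layer by layer according to the formula above, and the value of the neuron in the last layer is the fetal fraction predicted by the model.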
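The layer-by-layer computation described above can be sketched as a plain feed-forward pass with the rectified linear unit. The weights and inputs below are arbitrary toy numbers, not trained values, and the data layout (a list of weight-matrix and bias-vector pairs) is an assumption made for the illustration.

```python
def relu(x):
    """Rectified linear unit, f(x) = max(0, x)."""
    return max(0.0, x)


def forward(layers, inputs):
    """Propagate inputs through the network.

    layers: list of (weights, biases); weights[j][k] connects neuron k of the
    previous layer to neuron j of the current layer.
    Returns the value of the single output neuron.
    """
    values = list(inputs)
    for weights, biases in layers:
        values = [
            relu(sum(w * v for w, v in zip(row, values)) + b)
            for row, b in zip(weights, biases)
        ]
    return values[0]


if __name__ == "__main__":
    toy_layers = [
        ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer, 2 neurons
        ([[1.0, 1.0]], [0.1]),                    # output layer, 1 neuron
    ]
    print(forward(toy_layers, [2.0, 1.0]))
```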
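D) Calculation of the conventional Z value: the depths of all intervals on a given chromosome follow a normal distribution. Taking another chromosome as reference, the Z value of the chromosome under test can therefore be calculated from the distribution of its interval depths, and this Z value can serve as the basis for deciding whether the chromosome is trisomic.
Specifically, the conventional Z value is calculated as follows:
The corrected depth UR of each autosomal window follows a Poisson distribution, which is well approximated by a normal distribution when the number of windows is large. For a normal sample there is no significant difference between the UR distribution of the chromosome under test and that of the reference chromosome, whereas for an abnormal sample there is a slight difference, so a Z-test can identify fetal chromosomal aneuploidy, as follows:
$$Z_i = \frac{1}{21}\sum_{j \ne i} \frac{\overline{UR}_i - \overline{UR}_j}{\sqrt{\dfrac{SD_i^{2}}{L_i} + \dfrac{SD_j^{2}}{L_j}}}$$
where:
$\overline{UR}_i$: the mean UR of chromosome i;
$\overline{UR}_j$: the mean UR of chromosome j;
$SD_i$: the standard deviation of UR of chromosome i;
$SD_j$: the standard deviation of UR of chromosome j;
$L_i$: the number of windows into which chromosome i is divided;
$L_j$: the number of windows into which chromosome j is divided;
$Z_i$: the significance of aneuploidy of chromosome i, reflecting its deviation from euploidy.
The formula above compares the 22 autosomes of a single sample against one another. The underlying assumption is that the great majority of chromosomes in a sample should be normal diploid. The target chromosome is therefore compared with the other 21 chromosomes in 21 separate Z-tests. If the target chromosome is normal diploid, most of the 21 Z-test values should be close to 0, and their average gives a negative Z value; conversely, if the target chromosome is trisomic, most of the 21 Z-test values are far greater than 0, and their average gives a positive Z value.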
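The within-sample comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the applicant's implementation: the pairwise statistic is taken to be the usual two-sample Z statistic built from the means, standard deviations and window counts listed in the definitions, the sample standard deviation (n - 1 denominator) is used, and the dictionary layout and function names are hypothetical. The toy data has three chromosomes instead of 22.

```python
import math
from statistics import mean, stdev


def pairwise_z(target_ur, reference_ur):
    """Two-sample Z statistic between the window URs of two chromosomes."""
    se = math.sqrt(
        stdev(target_ur) ** 2 / len(target_ur)
        + stdev(reference_ur) ** 2 / len(reference_ur)
    )
    return (mean(target_ur) - mean(reference_ur)) / se


def conventional_z(ur_by_chrom, target):
    """Average of the pairwise Z statistics of the target against every other chromosome."""
    others = [c for c in ur_by_chrom if c != target]
    return mean(pairwise_z(ur_by_chrom[target], ur_by_chrom[c]) for c in others)


if __name__ == "__main__":
    toy = {
        "chr1": [10.0, 11.0, 9.0, 10.0],
        "chr2": [10.0, 9.0, 11.0, 10.0],
        "chr21": [12.0, 13.0, 11.0, 12.0],  # elevated depth
    }
    print(conventional_z(toy, "chr21"))
```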
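The mosaicism calculation step 13 comprises calculating the mosaicism of each chromosome from the fetal DNA fraction.
For example, the mosaicism of each chromosome is calculated according to Formula 1: [formula not reproduced]
In Formula 1, Mosaic_k is the mosaicism of the k-th chromosome, fra_k is the relative fetal fraction of the k-th chromosome, and FF is the fetal DNA fraction; fra_k is calculated by Formula 2: [formula not reproduced]
In Formula 2, fra_k is the relative fetal fraction of the k-th chromosome, and the other two terms are the mean corrected depth of the k-th chromosome and the mean corrected depth of all autosomes; in Formula 1 and Formula 2, k takes values from 1 to 22.
A Mosaic_k of 0 indicates that chromosome k is normal; a Mosaic_k of 1 indicates that fetal chromosome k is completely trisomic; a Mosaic_k between 0 and 1 indicates that fetal chromosome k is mosaic.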
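Formula 1 and Formula 2 are not legible in this text, so the sketch below is only one form that is consistent with the stated variables and end points (0 for a normal chromosome, 1 for a complete trisomy). It assumes fra_k = 2 × (mean corrected depth of chromosome k − mean corrected depth of all autosomes) / mean corrected depth of all autosomes, and Mosaic_k = fra_k / FF. Both expressions and all names are assumptions, not the published formulas.

```python
def relative_fetal_fraction(mean_ur_k, mean_ur_autosomes):
    """Assumed Formula 2: excess depth of chromosome k expressed as a fetal fraction."""
    return 2.0 * (mean_ur_k - mean_ur_autosomes) / mean_ur_autosomes


def mosaicism(mean_ur_k, mean_ur_autosomes, fetal_fraction):
    """Assumed Formula 1: relative fetal fraction of chromosome k divided by FF."""
    return relative_fetal_fraction(mean_ur_k, mean_ur_autosomes) / fetal_fraction


if __name__ == "__main__":
    ff = 0.10
    # A complete trisomy raises the chromosome's depth by FF / 2, here to 105 vs 100.
    print(mosaicism(105.0, 100.0, ff))
    # Half of that excess corresponds to a mosaicism of about 0.5.
    print(mosaicism(102.5, 100.0, ff))
```

Under this assumed form, a fully trisomic chromosome has a depth ratio of 1 + FF/2, so fra_k equals FF and the mosaicism is 1, matching the interpretation given in the text.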
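It should be noted that calculating the mosaicism and incorporating it into fetal chromosomal aneuploidy detection is one of the innovations of this application. Studies show that fetal trisomy is not always complete trisomy; that is, not every fetal cell is trisomic. The situation in which some fetal cells are trisomic and others are not is called mosaicism. Fetal mosaicism affects the NIPT result: for example, at a fixed fetal DNA fraction, a fully trisomic fetus gives a stronger trisomy signal in maternal peripheral blood and a mosaic trisomic fetus gives a weaker one. Because NIPT involves multiple steps, including plasma collection, storage, transport, cfDNA isolation, library construction and sequencing, a slight fluctuation in any step causes the final result to fluctuate, and for a negative sample such fluctuation may produce a weak trisomy signal resembling low mosaicism. This application therefore proposes to describe mosaicism quantitatively and to establish the difference between the mosaicism of true-positive samples and that of weak trisomy signals caused by data fluctuation, so as to better distinguish true-positive from false-positive samples.
The new Z value calculation step 14 comprises calculating the new Z value of the test sample from the fetal DNA fraction, the Z value and the mosaicism in the maternal blood cell-free DNA of the test sample; the mosaicism being the ratio of abnormal fetal cells to all fetal cells.
For example, the new Z value calculation step 14 is divided into a model output value analysis sub-step and a Z value mapping sub-step.
The model output value analysis sub-step comprises inputting the fetal DNA fraction, the conventional Z value and the mosaicism of the test sample into the fetal chromosomal aneuploidy detection model to obtain the model output value of the test sample. The detection model is obtained by machine learning training, using a number of samples with known fetal chromosomal aneuploidy status as training samples, with the fetal DNA fraction, the conventional Z value and the mosaicism as input and the model output value as output.
It should be noted that the machine learning training is another innovation of this application. Before training the model, the study found a very good linear relationship among the three variables, namely the fetal DNA fraction, the mosaicism and the conventional Z value. The three variables are therefore put into an LDA (linear discriminant analysis) model for training, and the trained model is the fetal chromosomal aneuploidy detection model.
The general form of the LDA model is:
$$LD = w_1 a_1 + w_2 a_2 + \dots + w_k a_k$$
Here the $w_k$ are the coefficients obtained by model training, and the $a_k$ are the variables, i.e. the sample information input to the model, which in this example are the fetal fraction, the conventional Z value and the mosaicism. Model training therefore yields the coefficients of these three variables. With these three coefficients and a sample's fetal fraction, conventional Z value and mosaicism, the result of the machine learning model, i.e. the model output value (LD value), is obtained from the formula above.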
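Once the three coefficients are known, the LD value is a weighted sum of the three inputs. The sketch below shows only that scoring step; the coefficient values are invented for illustration and are not the trained values of the application.

```python
def lda_score(weights, features):
    """LD = w1*a1 + w2*a2 + ... + wk*ak for one sample."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * a for w, a in zip(weights, features))


if __name__ == "__main__":
    # Hypothetical coefficients for (fetal fraction, conventional Z, mosaicism).
    weights = [0.5, 1.0, 2.0]
    sample = [0.10, 4.0, 0.8]
    print(lda_score(weights, sample))
```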
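The Z value mapping sub-step comprises calculating the new Z value of the test sample, denoted Znew, from the model output value of the test sample, the positive threshold, the negative threshold, and the median of the model output values of all negative samples.
It should be noted that the results generated by the machine learning model no longer follow a distribution with statistical meaning, so thresholds cannot be set on statistical grounds as with the conventional Z value; they can only be set from the characteristics of the training data. The negative threshold is set so that no true-positive sample in the training data is called negative, ensuring that the model produces no false negatives. The positive threshold is set so that as many true-positive samples as possible are called positive while as few of the original false-positive samples as possible are called positive, thereby reducing false positives and improving NIPT performance. The range between the negative threshold and the positive threshold is the gray zone.
Although the results generated by the machine learning model no longer follow a distribution with statistical meaning, clinical custom and regulatory requirements dictate that NIPT trisomy results be reported as Z values, with 3 as the positive threshold. How to convert the machine learning result, which lacks statistical meaning, into a statistically meaningful Z value is the third innovation of this application. In one implementation of this application, the machine learning model is a linear model, so the result it generates retains the distributional characteristics of the conventional Z value. This application therefore uses a mapping method to map the model output value to a new Z value, which both improves NIPT performance and gives the final result distributional characteristics similar to those of a Z value, i.e. a normal distribution centered at 0.
In one implementation of this application, the mapping is as follows:
when the model output value of the test sample is greater than the positive threshold, Znew = LD - cutp + 3;
when the model output value of the test sample is less than the positive threshold and greater than the negative threshold, [formula not reproduced];
when the model output value of the test sample is less than the negative threshold, [formula not reproduced].
In the formulas above, Znew is the new Z value, LD is the model output value of the test sample, cutp is the positive threshold, cutn is the negative threshold, and Med is the median of the model output values of the negative samples.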
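Only the positive branch of the mapping (Znew = LD - cutp + 3) is legible in this text. The sketch below fills the other two branches with straight-line interpolation chosen to satisfy the stated constraints: LD = cutp maps to 3, LD = cutn maps to 1.96, and LD = Med maps to 0. That linear form is an assumption, not the published formula, and the threshold values in the usage example are invented.

```python
def map_to_z_new(ld, cutp, cutn, med):
    """Piecewise mapping from the model output value LD to the new Z value.

    Positive branch as given in the text; the gray-zone and negative branches
    are ASSUMED linear interpolations through (cutn, 1.96), (cutp, 3), (med, 0).
    """
    if not med < cutn < cutp:
        raise ValueError("expected med < cutn < cutp")
    if ld > cutp:
        return ld - cutp + 3.0
    if ld >= cutn:
        return 1.96 + (3.0 - 1.96) * (ld - cutn) / (cutp - cutn)
    return 1.96 * (ld - med) / (cutn - med)


if __name__ == "__main__":
    cutp, cutn, med = 4.0, 2.0, 0.0  # hypothetical thresholds and median
    for ld in (-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0):
        print(ld, round(map_to_z_new(ld, cutp, cutn, med), 3))
```

With this form the mapping is continuous and increasing, so the ordering of samples by LD is preserved in Znew.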
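The fetal chromosomal aneuploidy determination step 15 comprises determining from the new Z value whether a chromosome of the fetus under test is aneuploid.
In one implementation of this application, the new Z value obtained by the piecewise mapping also follows a normal distribution centered at 0; therefore Z > 3 can still be used as the positive call and Z < 1.96 as the negative call.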
基于本申请的检测胎儿染色体非整倍体异常的方法,本申请提出了一种胎儿染色体非整倍体异常检测模型的构建方法,包括采用若干个已知胎儿染色体情况的样本作为训练样本,所述训练样本包括胎儿染色体非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,获得一个综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况的模型输出值,由此获得的模型,即胎儿染色体非整倍体异常检测模型。其中,胎儿DNA浓度、Z值和嵌合度的计算方法都可以参考本申请的检测胎儿染色体非整倍体异常的方法,在此不累述。
本领域技术人员可以理解,上述方法的全部或部分功能可以通过硬件的方式实现,也可以通过计算机程序的方式实现。当上述方法中全部或部分功能通过计算机程序的方式实现时,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:只读存储器、随机存储器、磁盘、光盘、硬盘等,通过计算机执行该程序以实现上述功能。例如,将程序存储在设备的存储器中,当通过处理器执行存储器中程序,即可实现上述全部或部分功能。另外,当上述实施方式中全部或部分功能通过计算机程序的方式实现时,该程序也可以存储在服务器、另一计算机、磁盘、光盘、闪存盘或移动硬盘等存储介质中,通过下载或复制保存到本地设备的存储器中,或对本地设备的系统进行版本更新,当通过处理器执行存储器中的程序时,即可实现上述方法中全部或部分功能。
因此,基于本申请检测胎儿染色体非整倍体异常的方法,本申请提出了一种检测胎儿染色体非整倍体异常的装置,包括新的Z值计算模块和胎儿染色体非整倍体异常判断模块,新的Z值计算模块包括根据待测样本孕妇血液游离DNA中的胎儿DNA浓度、Z值、嵌合度,计算获得待测样本的新的Z值;嵌 合度为胎儿异常细胞占所有胎儿细胞的比率;胎儿染色体非整倍体异常模块包括根据新的Z值判断待测样本的胎儿染色体是否发生非整倍体异常。
本申请的一种实现方式中,检测胎儿染色体非整倍体异常的的装置,如图2所示,包括数据获取模块21、数据处理模块22、嵌合度计算模块23、模型训练模块24、新的Z值计算模块25、胎儿染色体非整倍体异常判断模块26。
其中,数据获取模块21,包括用于获取待测孕妇血液游离DNA的高通量测序数据。例如获取测序仪产生的fastq格式的文件。
数据处理模块22,包括用于根据获取的待测孕妇血液游离DNA的高通量测序数据,计算胎儿DNA浓度、传统Z值、每条染色体矫正后的深度的平均值、所有常染色体校正后的深度的平均值。例如参考现有的常规NIPT方案进行胎儿DNA浓度、传统Z值、每条染色体矫正后的深度的平均值、所有常染色体校正后的深度的平均值等的计算。
嵌合度计算模块23,包括根据胎儿DNA浓度计算每条染色体的嵌合度。
例如,根据公式一计算每条染色体的嵌合度;
公式一中,Mosaick为第k条染色体的嵌合度,frak为第k条染色体的相对胎儿浓度,FF为胎儿DNA浓度;
frak采用公式二计算获得;
公式二中,frak为第k条染色体的相对胎儿浓度,为第k条染色体矫正后的深度的平均值,为所有常染色体校正后的深度的平均值;
公式一和公式二中,k的取值为1至22;
Mosaick为0,说明染色体k正常;Mosaick为1,说明胎儿的染色体k完全为三体;Mosaick介于0-1之间,说明胎儿的染色体k存在嵌合。
模型训练模块24,包括采用若干个已知胎儿染色体情况的样本作为训练样本,训练样本包括胎儿染色体非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,获得一个综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况的模型输出值,由此获得的模型,即胎儿染色体非整倍体异常检测模型;并在进行模型训练后,利用阳性样本获得对应的阳性阈值,利用阴性样本获得对应的阴性阈值,利用所有阴性样本模型输出值获得其中位数。
新的Z值计算模块25,包括根据待测样本孕妇血液游离DNA中的胎儿DNA浓度、Z值、嵌合度,计算获得待测样本的新的Z值;其中,嵌合度为胎儿异常细胞占所有胎儿细胞的比率。
例如,新的Z值计算模块25包括模型输出值分析子模块和Z值印射子模块;模型输出值分析子模块,包括用于将待测样本的胎儿DNA浓度、Z值和嵌合度输入胎儿染色体非整倍体异常检测模型,获得待测样本对应的模型输出值;Z值印射子模块,包括用于根据待测样本的模型输出值,以及阳性阈值、阴性阈值、所有阴性样本的模型输出值的中位数,计算获得待测样本的新的Z值;阳性阈值为阳性样本对应的模型输出值的阈值,阴性阈值是阴性样本对应的模型输出值的阈值。
胎儿染色体非整倍体异常判断模块26,包括用于根据新的Z值判断待测胎儿的染色体是否发生非整倍体异常。例如,新的Z值大于3判断为阳性,即胎儿染色体非整倍体异常;新的Z值小于1.96判断为阴性,即胎儿染色体正常。
本申请的另一实现方式中还提供了一种检测胎儿染色体非整倍体异常的装置,该装置包括存储器和处理器;存储器,包括用于存储程序;处理器,包括用于通过执行存储器存储的程序以实现以下方法:根据待测样本孕妇血液游离DNA中的胎儿DNA浓度、Z值、嵌合度,计算获得待测样本的新的Z值,根据新的Z值判断待测样本的胎儿染色体是否发生非整倍体异常;其中,嵌合度为胎儿异常细胞占所有胎儿细胞的比率。或者,具体的用于实现以下方法:数据获取步骤,包括获取待测孕妇血液游离DNA的高通量测序数据;数据处理步骤,包括根据获取的待测孕妇血液游离DNA的高通量测序数据,计算胎儿DNA浓度、传统Z值;嵌合度计算步骤,包括根据胎儿DNA浓度计算每条染色体的嵌合度;模型值分析步骤,包括将待测的胎儿DNA浓度、传统Z值和嵌合度输入胎儿染色体非整倍体异常检测模型,获得待测样本对应的模型输出值;Z值印射步骤,包括根据待测样本的模型输出值、阳性阈值、阴性阈值、阴性样本的模型输出值中位数,计算获得新的Z值;胎儿染色体非整倍体异常判断步骤,包括根据新的Z值判断待测胎儿的染色体是否发生非整倍体异常。
或者,该装置包括存储器和处理器;存储器,包括用于存储程序;处理器,包括用于通过执行存储器存储的程序以实现以下方法:包括采用若干个已知胎儿染色体情况的样本作为训练样本,训练样本包括胎儿染色体非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,获得一个综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况的模型输出值,由此获得的模型,即胎儿染色体非整倍体异常检测模型。
本申请另一种实现方式中还提供一种计算机可读存储介质,该存储介质中包括程序,该程序能够被处理器执行以实现如下方法:根据待测样本孕妇血液 游离DNA中的胎儿DNA浓度、Z值、嵌合度,计算获得待测样本的新的Z值,根据新的Z值判断待测样本的胎儿染色体是否发生非整倍体异常;其中,嵌合度为胎儿异常细胞占所有胎儿细胞的比率。或者,具体的用于实现以下方法:数据获取步骤,包括获取待测孕妇血液游离DNA的高通量测序数据;数据处理步骤,包括根据获取的待测孕妇血液游离DNA的高通量测序数据,计算胎儿DNA浓度、传统Z值;嵌合度计算步骤,包括根据胎儿DNA浓度计算每条染色体的嵌合度;模型值分析步骤,包括将待测的胎儿DNA浓度、传统Z值和嵌合度输入胎儿染色体非整倍体异常检测模型,获得待测样本对应的模型输出值;Z值印射步骤,包括根据待测样本的模型输出值、阳性阈值、阴性阈值、阴性样本的模型输出值中位数,计算获得新的Z值;胎儿染色体非整倍体异常判断步骤,包括根据新的Z值判断待测胎儿的染色体是否发生非整倍体异常。
或者,该存储介质中包括程序,该程序能够被处理器执行以实现如下方法:包括采用若干个已知胎儿染色体情况的样本作为训练样本,训练样本包括胎儿染色体非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,获得一个综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况的模型输出值,由此获得的模型,即胎儿染色体非整倍体异常检测模型。
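The method and apparatus of this application differ from the prior art in the following respects:
(1) This application introduces an original detection index, the mosaicism. The study found that mosaicism discriminates well between the true-positive and false-positive samples reported by the current Z-value-only method (i.e. the conventional Z value).
(2) This application combines three variables, namely the fetal fraction, the mosaicism index unique to this application, and the conventional Z value, and in one implementation specifically uses linear discriminant analysis (LDA) as the machine learning model for training and for calling results. The study found that, compared with the original results, the calls of this model reduce false positives and improve detection performance.
(3) The three variables used in this application are linearly related to one another. The linear relationship is simple and clear and avoids the complexity that arises from too many variables with different dimensions and distributional characteristics. In one implementation a linear discriminant analysis (LDA) model is used, which is simple and does not suffer from overfitting.
(4) This application develops a new Z value conversion method that converts the value obtained by machine learning, which has no statistical meaning, into a Z value that is commonly used clinically and meets regulatory requirements, i.e. the new Z value of this application. The new Z value obtained by this conversion follows a normal distribution and satisfies current regulatory and clinical requirements.
(5) Comparing the new Z value with the conventional Z value shows that the new Z value greatly reduces the variability of the data distribution, reduces gray-zone results and retests, and improves the stability of the results.
(6) The machine-learning-based solution of this application, while likewise considering multiple variables, trains the model on real sample data accumulated by BGI Genomics (华大基因), so that the model learns the distinctive features of BGI's own data that arise from factors such as experimental reagents and sequencing platform, and can therefore be better applied to the data generated in BGI's current production. It will be understood that this application establishes a method for learning from individualized data and is not limited to BGI's own data.
This application first introduces a new detection index, the mosaicism, and then combines the fetal DNA fraction, the mosaicism and the conventional Z value, overcoming the inaccuracy that results when conventional NIPT relies on the Z value alone for trisomy calling. A linear model is used to combine the three variables; the model is simple and does not suffer from overfitting. Further, this application develops a Z value conversion method, i.e. Z value mapping, which converts the value produced by the machine learning model, which has no meaning in itself, into a meaningful and clinically accepted Z value, Znew, while reducing the gray-zone results and retests of the conventional Z value and improving the stability of the results.
It will be understood that, on the basis of this application, more parameters may also be used for model training and fetal chromosomal aneuploidy analysis, for example gestational age and maternal age. When variables are added, the machine learning model needs to be changed accordingly, for example to a non-linear QDA model. In addition, the specific piecewise Z value mapping of this application may be adjusted as needed.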
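Example 1
This example uses the established fetal chromosomal aneuploidy detection model to predict samples that have diagnostic or follow-up results. Specifically, a total of 108,293 samples were used for model training. For training, these samples were divided into two classes, negative and positive, but they comprise three categories: the negative class contains true-negative and false-positive samples, and the positive class contains true-positive samples. Because the fetal fraction is calculated differently for male and female fetuses, its data characteristics differ between the two, and since the fetal fraction is one of the key variables of the model, two models were trained separately, one for male and one for female fetuses. The sample numbers are shown in Table 1.
Table 1. Samples used for model training
Table 2. Example data of samples used for model training
In this example, the fetal DNA fraction, the conventional Z value and the mosaicism of the training samples, as shown in Table 2, were input into the LDA model for training to obtain the model output values. The median of the values obtained by machine learning for the negative samples is the "median model output value", i.e. Med in the subsequent mapping formulas. The medians obtained in this example are shown in Table 3.
Table 3. Medians of the model output values
Before mapping, the thresholds of the LD value were set by manually inspecting the distributions of the true-negative, false-positive and true-positive samples, such that: 1. no true-positive sample is called negative; 2. as many true-positive samples as possible are called positive; 3. as few false-positive samples as possible are called positive. The LD thresholds set on these principles are the positive threshold (cutp) and the negative threshold (cutn) of the mapping formulas; the values for this example are shown in Table 4.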
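The text sets the thresholds by manual inspection. A small helper can at least report how a candidate pair of thresholds performs against the three criteria, and compute Med from the negative samples. This is a sketch with invented LD values; the function names are hypothetical and no automatic threshold search is implied.

```python
from statistics import median


def negative_median(negative_ld):
    """Med: median model output value over all negative training samples."""
    return median(negative_ld)


def evaluate_thresholds(tp_ld, fp_ld, cutp, cutn):
    """Count calls relevant to the three criteria for candidate thresholds."""
    return {
        # Criterion 1: should be 0 (no true positive called negative).
        "tp_called_negative": sum(ld < cutn for ld in tp_ld),
        # Criterion 2: as large as possible.
        "tp_called_positive": sum(ld > cutp for ld in tp_ld),
        # Criterion 3: as small as possible.
        "fp_called_positive": sum(ld > cutp for ld in fp_ld),
    }


if __name__ == "__main__":
    true_pos = [5.0, 6.0, 3.0]    # hypothetical LD values
    false_pos = [1.0, 4.5, 2.5]
    print(evaluate_thresholds(true_pos, false_pos, cutp=4.0, cutn=2.0))
    print(negative_median([0.1, -0.2, 0.0]))
```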
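Table 4. Thresholds of the LD value
After mapping, the clinically customary values 1.96 and 3 serve as the thresholds of the new Z value, which gives the following mapping:
when the model output value is greater than the positive threshold, Znew = LD - cutp + 3;
when the model output value is less than the positive threshold and greater than the negative threshold, [formula not reproduced];
when the model output value is less than the negative threshold, [formula not reproduced].
In the formulas above, Znew is the new Z value, LD is the model output value, cutp is the positive threshold, cutn is the negative threshold, and Med is the median of the model output values of the negative samples.
A total of 10,240 samples were selected that had been tested by BGI Genomics (华大基因) in routine clinical use and had undergone prenatal diagnosis or postnatal follow-up. In clinical testing these samples were reported on the basis of the conventional Z value, and subsequent prenatal diagnosis or postnatal follow-up was carried out according to the test result. From its test result and its diagnosis or follow-up outcome, each sample can therefore be classified as true positive, false positive or true negative. The sample information is shown in Table 5.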
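The conventional Z value is calculated as follows:
$$Z_i = \frac{1}{21}\sum_{j \ne i} \frac{\overline{UR}_i - \overline{UR}_j}{\sqrt{\dfrac{SD_i^{2}}{L_i} + \dfrac{SD_j^{2}}{L_j}}}$$
where:
$\overline{UR}_i$: the mean UR of chromosome i;
$\overline{UR}_j$: the mean UR of chromosome j;
$SD_i$: the standard deviation of UR of chromosome i;
$SD_j$: the standard deviation of UR of chromosome j;
$L_i$: the number of windows into which chromosome i is divided;
$L_j$: the number of windows into which chromosome j is divided;
$Z_i$: the significance of aneuploidy of chromosome i, reflecting its deviation from euploidy.
Table 5. Trisomy detection results given by the conventional Z value
Table 6. Example data of samples used for model testing
As can be seen, when detection is based on the conventional Z value, the positive predictive values for T21, T18 and T13 are 0.86, 0.58 and 0.36 respectively, so the false-positive problem is pronounced.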
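The positive predictive values quoted above follow from the true-positive and false-positive counts this example reports for the conventional Z value (87, 45 and 22 true positives; 14, 33 and 39 false positives for T21, T18 and T13). The short check below reproduces the arithmetic; only the function and variable names are introduced here.

```python
def ppv(true_pos, false_pos):
    """Positive predictive value: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


# (true positives, false positives) under the conventional Z value
conventional_counts = {"T21": (87, 14), "T18": (45, 33), "T13": (22, 39)}

if __name__ == "__main__":
    for trisomy, (tp, fp) in conventional_counts.items():
        print(trisomy, round(ppv(tp, fp), 2))
    # T21 0.86, T18 0.58, T13 0.36
```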
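Using the fetal chromosomal aneuploidy detection model and the detection method of this application, the mosaicism of the 10,240 samples above was calculated. Taking T13 as an example, the results are shown in Fig. 3: mosaicism separates true-positive, false-positive and true-negative samples well. The three variables, mosaicism, fetal fraction and conventional Z value, were then input into the trained machine learning model, and new Z values were generated by Z value mapping. Part of the sample data used for model testing is shown in Table 6. The 10,240 samples were re-called using the new Z value, with Z > 3 called positive and Z < 1.96 called negative, giving the new detection results shown in Table 7.
Table 7. Trisomy detection results given by the improved fetal chromosomal aneuploidy detection method
The results in Table 7 show that, with the new Z value, all 14 T21 false positives, 33 T18 false positives and 39 T13 false positives were correctly called negative, while the 87 T21 true positives, 45 T18 true positives, 22 T13 true positives and 10,000 true-negative samples were still called correctly. The positive predictive values for T21, T18 and T13 therefore all reached 100%, with a sensitivity of 100% and a specificity of 100%. Sensitivity is maintained while false positives are greatly reduced and PPV and specificity are improved.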
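Example 2
This example uses the established model to test consecutive production samples.
Because of the way diagnostic and follow-up results are collected, the samples with known karyotype are not consecutive samples from a single center. Their data distribution therefore does not reflect the true distribution in the population, and cannot be used to assess the true distributional characteristics of the new Z value. Consecutive samples received by one BGI Genomics medical laboratory over a period of time were therefore used to assess the distribution of the resulting new Z values and to compare it with that of the conventional Z values, in order to show the behavior of the new Z value in real use.
A total of 10,000 consecutive samples that underwent clinical testing at a single BGI Genomics medical laboratory over a given period were taken, and the new Z value was calculated for these 10,000 samples using the detection model and detection method of this application. Taking the Z value of chromosome 21 as an example, it was checked whether its distribution is normal; the result is shown in Fig. 4. Fig. 4 shows that the new Z values of the 10,000 single-center, consecutive samples lie essentially on the diagonal of the Q-Q plot, and the few samples that deviate substantially from the diagonal are positive samples with strong signals. Fig. 4 thus shows that the new Z value has very good normality.
The distributions of the new Z value and the conventional Z value were then compared, taking the Z value of chromosome 13 as an example, as shown in Fig. 5. Fig. 5 shows, first, that the center of the new Z value distribution is closer to 0, indicating that the new Z value conforms better than the conventional Z value to a normal distribution centered at 0. Second, the new Z value distribution is more concentrated than that of the conventional Z value, indicating that the new Z value has lower variability and better stability.
Because the new Z value fluctuates less than the conventional Z value, it lowers the gray-zone rate. This example further demonstrates this with a larger sample. Specifically, 360,786 clinical samples tested during the whole of 2020 at a single BGI Genomics medical laboratory were taken; these samples underwent 383,306 tests in total. In the 383,306 tests, the new Z value produced 785 T21 gray-zone results, 345 T18 gray-zone results and 288 T13 gray-zone results; the gray-zone rates for T21, T18 and T13 were 0.22%, 0.09% and 0.08% respectively, and the overall gray-zone rate for trisomy detection was 0.39%. By comparison, the conventional Z value produced 3,071 T21 gray-zone results, 4,350 T18 gray-zone results and 2,335 T13 gray-zone results; the gray-zone rates for T21, T18 and T13 were 0.80%, 1.14% and 0.61% respectively, and the overall gray-zone rate was 2.55%, as shown in Table 8.
Table 8. Comparison of gray-zone sample counts and gray-zone rates for the conventional Z value and the new Z value
The results in Table 8 show that the new Z value generated by the method of this application lowers the gray-zone rate of trisomy detection from 2.55% to 0.39%, i.e. to roughly 15% of its previous level, which greatly reduces retests caused by gray-zone results and improves NIPT performance.
The above is a further detailed description of this application in connection with specific embodiments, and the specific implementation of this application should not be regarded as limited to these descriptions. A person of ordinary skill in the art to which this application belongs may make a number of simple deductions or substitutions without departing from the concept of this application.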

Claims (11)

  1. 一种检测胎儿染色体非整倍体异常的方法,其特征在于:包括根据待测样本孕妇血液游离DNA中的胎儿DNA浓度、Z值、嵌合度,计算获得待测样本的新的Z值,根据所述新的Z值判断待测样本的胎儿染色体是否发生非整倍体异常;
    所述嵌合度为胎儿异常细胞占所有胎儿细胞的比率。
  2. 根据权利要求1所述的方法,其特征在于:根据待测样本孕妇血液游离DNA中的胎儿DNA浓度、Z值、嵌合度,计算获得待测样本的新的Z值,包括将胎儿DNA浓度、Z值和嵌合度输入胎儿染色体非整倍体异常检测模型,获得待测样本对应的模型输出值,由模型输出值印射获得待测样本的新的Z值;
    所述胎儿染色体非整倍体异常检测模型是采用若干个已知胎儿染色体情况的样本作为训练样本,所述训练样本包括胎儿染色体非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,获得一个综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况的模型输出值,由此获得的模型;
    优选地,由模型输出值印射获得待测样本的新的Z值,包括根据待测样本的模型输出值、阳性阈值、阴性阈值、所有阴性样本的模型输出值的中位数,计算获得待测样本的新的Z值;
    所述阳性阈值为阳性样本对应的模型输出值的阈值,所述阴性阈值是阴性样本对应的模型输出值的阈值;
    优选地,所有阴性样本的模型输出值的中位数,是把所有阴性训练样本再次输入胎儿染色体非整倍体异常检测模型中,获得的所有阴性样本的模型输出值的中位数;
    优选地,由模型输出值印射获得待测样本的新的Z值,包括以下印射方式,
    当待测样本的模型输出值大于阳性阈值时,Znew=LD-cutp+3;
    当待测样本的模型输出值小于阳性阈值、且大于阴性阈值时,
    当待测样本的模型输出值小于阴性阈值时,
    以上公式中,Znew为新的Z值,LD为待测样本的模型输出值,cutp为阳性阈值,cutn为阴性阈值,Med为所有阴性样本的模型输出值的中位数;
    优选地,根据新的Z值判断待测样本的胎儿染色体是否发生非整倍体异常,包括,新的Z值大于3判断为阳性,即胎儿染色体非整倍体异常;新的Z值小于1.96判断为阴性,即胎儿染色体正常;
    优选地,所述机器学习模型为线性判别分析模型;
    优选地,所述胎儿异常细胞为含有胎儿染色体非整倍体异常的细胞;
    优选地,孕妇血液游离DNA中的胎儿DNA浓度、Z值,通过孕妇血液游离DNA的高通量测序数据计算获得。
  3. 根据权利要求1或2所述的方法,其特征在于:所述嵌合度由公式一计算获得;
    公式一中,Mosaick为第k条染色体的嵌合度,frak为第k条染色体的相对胎儿浓度,FF为胎儿DNA浓度;
    frak采用公式二计算获得;
    公式二中,frak为第k条染色体的相对胎儿浓度,为第k条染色体矫正后的深度的平均值,为所有常染色体校正后的深度的平均值;
    公式一和公式二中,k的取值为1至22;
    Mosaick为0,说明胎儿的第k条染色体正常;Mosaick为1,说明胎儿的第k条染色体完全为三体;Mosaick介于0-1之间,说明胎儿的第k条染色体存在嵌合;
    优选地,每条染色体矫正后的深度的平均值、所有常染色体校正后的深度的平均值,通过孕妇血液游离DNA的高通量测序数据计算获得。
  4. 一种胎儿染色体非整倍体异常检测模型的构建方法,其特征在于:包括采用若干个已知胎儿染色体情况的样本作为训练样本,所述训练样本包括胎儿染色体非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,获得一个综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况的模型输出值,由此训练获得的模型,即胎儿染色体非整倍体异常检测模型。
  5. 根据权利要求4所述的构建方法,其特征在于:所述胎儿DNA浓度和Z值,根据孕妇血液游离DNA的高通量测序数据计算获得;所述嵌合度为胎儿异常细胞占所有胎儿细胞的比率;
    优选地,所述胎儿异常细胞为含有胎儿染色体非整倍体异常的细胞;
    优选地,所述嵌合度由公式一计算获得;
    公式一中,Mosaick为第k条染色体的嵌合度,frak为第k条染色体的相对胎儿浓度,FF为胎儿DNA浓度;
    frak采用公式二计算获得;
    公式二中,frak为第k条染色体的相对胎儿浓度,为第k条染色体矫正后的深度的平均值,为所有常染色体校正后的深度的平均值;
    公式一和公式二中,k的取值为1至22;
    Mosaick为0,说明胎儿的第k条染色体正常;Mosaick为1,说明胎儿的第k条染色体完全为三体;Mosaick介于0-1之间,说明胎儿的第k条染色体存在嵌合;
    优选地,每条染色体矫正后的深度的平均值、所有常染色体校正后的深度的平均值,根据孕妇血液游离DNA的高通量测序数据计算获得;
    优选地,所述机器学习模型为线性判别分析模型。
  6. 一种检测胎儿染色体非整倍体异常的装置,其特征在于:包括新的Z值计算模块和胎儿染色体非整倍体异常判断模块;
    所述新的Z值计算模块,包括用于根据待测样本孕妇血液游离DNA中的胎儿DNA浓度、Z值、嵌合度,计算获得待测样本的新的Z值;所述嵌合度为胎儿异常细胞占所有胎儿细胞的比率;
    所述胎儿染色体非整倍体异常模块,包括用于根据所述新的Z值判断待测样本的胎儿染色体是否发生非整倍体异常。
  7. 根据权利要求6所述的装置,其特征在于:所述新的Z值计算模块,还包括用于将胎儿DNA浓度、Z值和嵌合度输入胎儿染色体非整倍体异常检测模型,获得待测样本对应的模型输出值,由模型输出值印射获得待测样本的新的Z值;
    所述胎儿染色体非整倍体异常检测模型是采用若干个已知胎儿染色体情况的样本作为训练样本,所述训练样本包括胎儿染色体非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,由此获得的模型;所述模型输出值用于综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况。
  8. 根据权利要求7所述的装置,其特征在于:还包括模型训练模块,采用若干个已知胎儿染色体情况的样本作为训练样本,所述训练样本包括胎儿染色体 非整倍体异常的阳性样本和阴性样本,以胎儿DNA浓度、Z值和嵌合度为输入,进行机器学习模型训练,获得一个综合胎儿DNA浓度、Z值和嵌合度三个变量表征胎儿染色体情况的模型输出值,由此获得的模型,即胎儿染色体非整倍体异常检测模型;
    优选地,所述机器学习模型为线性判别分析模型;
    优选地,所述新的Z值计算模块包括模型输出值分析子模块和Z值印射子模块;所述模型输出值分析子模块,包括用于将待测样本的胎儿DNA浓度、Z值和嵌合度输入胎儿染色体非整倍体异常检测模型,获得待测样本对应的模型输出值;所述Z值印射子模块,包括用于根据待测样本的模型输出值,以及阳性阈值、阴性阈值、所有阴性样本的模型输出值的中位数,计算获得待测样本的新的Z值;所述阳性阈值为阳性样本对应的模型输出值的阈值,所述阴性阈值是阴性样本对应的模型输出值的阈值;
    优选地,所述Z值印射子模块,根据以下方式获得新的Z值,
    当待测样本的模型输出值大于阳性阈值时,Znew=LD-cutp+3;
    当待测样本的模型输出值小于阳性阈值、且大于阴性阈值时,
    当待测样本的模型输出值小于阴性阈值时,
    以上公式中,Znew为新的Z值,LD为待测样本的模型输出值,cutp为阳性阈值,cutn为阴性阈值,Med为所有阴性样本的模型输出值的中位数;
    优选地,所述胎儿染色体非整倍体异常模块中,根据新的Z值判断待测样本的胎儿染色体是否发生非整倍体异常,包括,新的Z值大于3判断为阳性,即胎儿染色体非整倍体异常;新的Z值小于1.96判断为阴性,即胎儿染色体正常。
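  9. The apparatus according to claim 6, characterized in that it further comprises a data acquisition module for obtaining high-throughput sequencing data of maternal blood cell-free DNA of the test sample;
    preferably, it further comprises a data processing module for calculating the fetal DNA fraction and the Z value from the obtained high-throughput sequencing data of the maternal blood cell-free DNA;
    preferably, the data processing module is further used for calculating, from the obtained high-throughput sequencing data of the cell-free DNA in the blood of the pregnant woman under test, the mean corrected depth of each chromosome and the mean corrected depth of all autosomes;
    preferably, it further comprises a mosaicism calculation module for calculating the mosaicism of each chromosome according to Formula 1: [formula not reproduced]
    in Formula 1, Mosaic_k is the mosaicism of the k-th chromosome, fra_k is the relative fetal fraction of the k-th chromosome, and FF is the fetal DNA fraction;
    fra_k is calculated by Formula 2: [formula not reproduced]
    in Formula 2, fra_k is the relative fetal fraction of the k-th chromosome, and the other two terms are the mean corrected depth of the k-th chromosome and the mean corrected depth of all autosomes;
    in Formula 1 and Formula 2, k takes values from 1 to 22;
    a Mosaic_k of 0 indicates that the k-th fetal chromosome is normal; a Mosaic_k of 1 indicates that the k-th fetal chromosome is completely trisomic; a Mosaic_k between 0 and 1 indicates that the k-th fetal chromosome is mosaic.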
  10. 一种检测胎儿染色体非整倍体异常的装置,其特征在于,所述装置包括:
    存储器,用于存储程序;
    处理器,用于通过执行所述存储器存储的程序以实现权利要求1-3任一项所述的检测胎儿染色体非整倍体异常的方法或者权利要求4或5所述的胎儿染色体非整倍体异常检测模型的构建方法。
  11. 一种计算机可读存储介质,其特征在于:包括程序,所述程序能够被处理器执行以实现权利要求1-3任一项所述的检测胎儿染色体非整倍体异常的方法或者权利要求4或5所述的胎儿染色体非整倍体异常检测模型的构建方法。
PCT/CN2023/080510 2022-07-13 2023-03-09 检测胎儿染色体非整倍体异常的方法、装置及存储介质 WO2024011929A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210825534.2A CN115223654A (zh) 2022-07-13 2022-07-13 检测胎儿染色体非整倍体异常的方法、装置及存储介质
CN202210825534.2 2022-07-13

Publications (1)

Publication Number Publication Date
WO2024011929A1 true WO2024011929A1 (zh) 2024-01-18

Family

ID=83611265

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/080510 WO2024011929A1 (zh) 2022-07-13 2023-03-09 检测胎儿染色体非整倍体异常的方法、装置及存储介质

Country Status (2)

Country Link
CN (1) CN115223654A (zh)
WO (1) WO2024011929A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115223654A (zh) * 2022-07-13 2022-10-21 深圳华大基因股份有限公司 检测胎儿染色体非整倍体异常的方法、装置及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017009372A2 (en) * 2015-07-13 2017-01-19 Cartagenia Nv System and methodology for the analysis of genomic data obtained from a subject
CN107133495A (zh) * 2017-05-04 2017-09-05 北京医院 一种非整倍性生物信息的分析方法和分析系统
CN112669901A (zh) * 2020-12-31 2021-04-16 北京优迅医学检验实验室有限公司 基于低深度高通量基因组测序的染色体拷贝数变异检测装置
CN115223654A (zh) * 2022-07-13 2022-10-21 深圳华大基因股份有限公司 检测胎儿染色体非整倍体异常的方法、装置及存储介质

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XU, XUPING: "Methods to Quantify Cell-free Fetal DNA Fraction in Maternal Plasma: Its Application in Non-invasive Prenatal Chromosomal Aneuploidy Detection Using Next Generation Sequencing", MEDICINE & PUBLIC HEALTH, CHINA MASTER’S THESES FULL-TEXT DATABASE, no. 201901, 15 January 2019 (2019-01-15) *
YANG, JIANFENG ET AL.: "Improving the calling of non-invasive prenatal testing on 13-/18-/21-trisomy by support vector machine discrimination", PLOS ONE, DOI:10.1371/JOURNAL.PONE.0207840, vol. 13, no. 12, 5 December 2018 (2018-12-05), XP093068911, DOI: 10.1371/journal.pone.0207840 *

Also Published As

Publication number Publication date
CN115223654A (zh) 2022-10-21


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23838431

Country of ref document: EP

Kind code of ref document: A1