Disclosure of Invention
The present invention aims to overcome at least one of the deficiencies of the prior art and to provide a system for non-invasive prenatal detection of a fetus.
The technical scheme adopted by the invention is as follows:
a system for noninvasive prenatal detection of a fetus comprising a sequencing device, a data analysis device, and a result output device, wherein:
the sequencing device is used to determine the sequence information of a cell-free DNA (cfDNA) sequencing library prepared from the peripheral blood of the pregnant woman, to obtain a sequencing result;
the result output device is used for outputting the analysis result of the data analysis device;
the analysis method of the data analysis device comprises the following steps:
aligning the sequencing reads to a human reference genome, constructing a set of uniquely aligned reads, and recording the alignment position of each read;
dividing the reference genome into a plurality of primary windows of a set unit window length;
counting the uncorrected number of uniquely aligned reads (the uncorrected read count) in each window according to the alignment position of each read;
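A minimal sketch of the window counting step for one chromosome, assuming the 0-based alignment start positions of the uniquely aligned reads are already available (parsing them from an alignment file is not shown). The function name and parameters are illustrative, not part of the original method.

```python
import numpy as np

def count_reads_per_window(read_positions, chrom_length, window_size):
    """Count uniquely aligned reads per fixed-length window on one chromosome.

    read_positions: 0-based alignment start coordinates (hypothetical input).
    Returns one uncorrected read count per window; the last window may be
    shorter than window_size.
    """
    positions = np.asarray(read_positions, dtype=np.int64)
    n_windows = -(-chrom_length // window_size)  # ceiling division
    window_index = positions // window_size      # window each read falls in
    return np.bincount(window_index, minlength=n_windows)
```

For example, reads starting at 0, 5, 150000, 150001 and 250000 on a 300 kbp chromosome with 100 kbp windows give counts of 2, 2 and 1.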
correcting the uncorrected read counts according to the GC content of each window to obtain corrected read counts;
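GC correction can be sketched as follows, assuming a single linear regression of window read counts on window GC fraction, as one embodiment below describes; the batch systematic-error term mentioned in that embodiment is not modeled here, and the function name is hypothetical. The fitted GC trend is removed while the mean count of the usable windows is preserved.

```python
import numpy as np

def gc_correct(counts, gc_fraction):
    """GC-correct window read counts with a simple linear regression.

    Fits counts ~ intercept + slope * gc over windows with non-zero counts,
    subtracts the fitted trend and adds back the mean count.
    """
    counts = np.asarray(counts, dtype=float)
    gc = np.asarray(gc_fraction, dtype=float)
    usable = counts > 0                          # skip empty/unmappable windows
    slope, intercept = np.polyfit(gc[usable], counts[usable], deg=1)
    expected = intercept + slope * gc            # GC-dependent expected count
    corrected = counts - expected + counts[usable].mean()
    corrected[~usable] = 0.0                     # keep empty windows at zero
    return corrected
```

Because a least-squares fit with an intercept reproduces the mean of the fitted points, the corrected usable windows keep the original mean count.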
prediction of the fetal nucleic acid percentage:
estimating the fetal nucleic acid percentage PY from the Y chromosome:
calculating the average depth of the Y chromosome, Depth_Y, as the sum of the corrected read counts of all Y-chromosome windows divided by the total base length of the Y chromosome; the mean Depth_Y of normal male samples is taken as Dmale and that of normal female samples as Dfemale; the Depth_Y of the sample to be tested is Dtest, and the fetal nucleic acid percentage of the sample to be tested is PY = (Dtest - Dfemale)/(Dmale - Dfemale);
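A short sketch of the Y-based estimate, assuming PY is computed as (Dtest - Dfemale)/(Dmale - Dfemale), where Dmale and Dfemale are the mean Y-chromosome depths of normal male and normal female reference samples. Function names are illustrative.

```python
import numpy as np

def mean_y_depth(corrected_y_counts, y_length_bp):
    """Depth_Y: total corrected Y-chromosome read count divided by Y length."""
    return float(np.sum(corrected_y_counts)) / y_length_bp

def fetal_fraction_from_y(d_test, d_female, d_male):
    """PY = (Dtest - Dfemale) / (Dmale - Dfemale).

    d_female: mean Depth_Y of normal female reference samples (background).
    d_male: mean Depth_Y of normal male reference samples (one Y copy).
    """
    return (d_test - d_female) / (d_male - d_female)
```

With Dfemale = 0.002, Dmale = 0.102 and Dtest = 0.012, PY is about 0.10, i.e. a fetal fraction near 10%.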
performing principal component analysis (PCA), an unsupervised learning method, on known normal samples to obtain a set of principal components PCs;
using the PCs as the principal component set, introducing an artificial neural network, and re-weighting the PCs on a training set of samples labeled with their known Y-chromosome-derived fetal nucleic acid percentage PY, to construct a weight-principal component-PY neural network model;
predicting the fetal nucleic acid percentage PF: converting the corrected window read counts of the sample to be tested into principal component scores by PCA, and taking the output neuron value of the weight-principal component-PY neural network model as PF;
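The PCA step can be sketched with NumPy's SVD, assuming the corrected window read counts of the known normal samples are stacked in a samples x windows matrix. The function names and the number of components are illustrative assumptions. The scores returned by project() would be the inputs to the weight-principal component-PY network, which is not shown here.

```python
import numpy as np

def fit_pca(normal_matrix, n_components):
    """Fit PCA on known normal samples (rows = samples, columns = windows)."""
    X = np.asarray(normal_matrix, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(sample_counts, mean, components):
    """Convert corrected window counts (one sample or a batch) into PC scores."""
    return (np.asarray(sample_counts, dtype=float) - mean) @ components.T
```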
determination of the abnormality percentage PA of microdeletion/microduplication fragments:
setting a breakpoint position at random and applying a statistical test to the corrected read counts of the windows on the left and right sides of the breakpoint to obtain a significance level p-value; if the p-value is smaller than a preset significance level p-set, the window is selected as a candidate microdeletion/microduplication window; if the p-value is larger than p-set, further windows are merged and the test is repeated until a candidate microdeletion/microduplication window is obtained;
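The breakpoint test can be sketched as below. The source does not name its statistical test, so this sketch assumes Welch's t statistic with a normal-approximation p-value (scipy.stats.ttest_ind with equal_var=False would give exact t-distribution p-values), and it interprets "merging windows" as growing equal-sized blocks on both sides of the breakpoint until p < p-set. Names and the starting block size are illustrative.

```python
import math
import statistics

def two_sided_p(left, right):
    """Welch t statistic with a normal-approximation two-sided p-value."""
    m1, m2 = statistics.fmean(left), statistics.fmean(right)
    v1, v2 = statistics.variance(left), statistics.variance(right)
    se = math.sqrt(v1 / len(left) + v2 / len(right))
    if se == 0:
        return 0.0 if m1 != m2 else 1.0
    t = (m1 - m2) / se
    return math.erfc(abs(t) / math.sqrt(2))

def scan_breakpoint(counts, b, p_set=0.05, start=2):
    """Compare windows left and right of breakpoint b, merging one more window
    on each side per round until p < p_set or the chromosome ends.

    Returns (p_value, block_size) for the first significant block, else None.
    """
    k = start
    while b - k >= 0 and b + k <= len(counts):
        p = two_sided_p(counts[b - k:b], counts[b:b + k])
        if p < p_set:
            return p, k
        k += 1
    return None
```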
merging adjacent candidate microdeletion/microduplication windows to obtain the selected microdeletion/microduplication regions;
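Merging adjacent candidate windows is a run-length grouping over window indices; a sketch with a hypothetical function name:

```python
def merge_adjacent_windows(candidate_indices):
    """Merge candidate window indices into contiguous regions.

    Returns a list of (first_window, last_window) pairs, both inclusive.
    """
    regions = []
    for idx in sorted(set(candidate_indices)):
        if regions and idx == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], idx)  # extend the current region
        else:
            regions.append((idx, idx))           # start a new region
    return regions
```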
performing a depth (abnormal) calculation for each microdeletion/microduplication region: depth (abnormal) = corrected read count falling within the region / size of the region;
likewise calculating the average depth (normal) for the remaining regions of the chromosome that do not contain the microdeletion/microduplication: depth (normal) = corrected read count falling on the remaining regions of the chromosome / size of those remaining regions;
calculating the abnormality percentage PA = 2 × |depth (normal) - depth (abnormal)| / depth (normal) for a region containing a microdeletion/microduplication or for a whole-chromosome copy number variation; when depth (normal) - depth (abnormal) is positive, the region is preliminarily identified as a microdeletion; when it is negative, as a microduplication; when it is 0, the region is judged normal;
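The depth and PA calculations can be sketched as follows, assuming all windows have the same size so that region size = number of windows × window_size, and that a chromosome's corrected counts are given as a Python list. The factor of 2 presumably puts PA on the same scale as PF: a heterozygous fetal deletion removes one of two copies in the fetal fraction only, lowering depth by about PF/2. Names are illustrative.

```python
def region_depth(corrected_counts, start, end, window_size):
    """Mean corrected reads per bp over windows start..end (inclusive)."""
    n = end - start + 1
    return sum(corrected_counts[start:end + 1]) / (n * window_size)

def abnormality_percentage(corrected_counts, start, end, window_size):
    """Return (PA, preliminary call) for one candidate region on a chromosome."""
    depth_abnormal = region_depth(corrected_counts, start, end, window_size)
    rest = corrected_counts[:start] + corrected_counts[end + 1:]  # other windows
    depth_normal = sum(rest) / (len(rest) * window_size)
    diff = depth_normal - depth_abnormal
    pa = 2 * abs(diff) / depth_normal
    if diff > 0:
        call = "microdeletion"
    elif diff < 0:
        call = "microduplication"
    else:
        call = "normal"
    return pa, call
```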
chromosome copy number variation judgment:
obtaining the copy number variation percentage PC_chri (i = 1, 2, ..., 22, X, Y) of each chromosome, comprising the steps of:
calculating the proportion coverage_chri (i = 1, 2, ..., 22, X, Y) of each chromosome, where coverage_chri = sum of the corrected window read counts falling on that chromosome / number of window read counts falling on that chromosome;
calculating the proportion Coverage-normal_chri (i = 1, 2, ..., 22, X, Y) of each chromosome for known normal samples, where Coverage-normal_chri = sum of the corrected window read counts falling on the chromosome / number of window read counts falling on the chromosome, and averaging this proportion over all normal samples to obtain Average(Coverage-normal_chri) (i = 1, 2, ..., 22, X, Y);
calculating the copy number variation percentage PC_chri of each chromosome of the sample to be tested: PC_chri = 2 × (coverage_chri - Average(Coverage-normal_chri)) / Average(Coverage-normal_chri);
calculating the ratio R of the copy number variation percentage of each chromosome of the sample to be tested to the fetal nucleic acid percentage PF: R = |PC_chri| / PF;
determining chromosome copy number variation according to PC_chri and R;
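A minimal sketch of the PC_chri and R calculations, assuming coverage_chri has already been computed for the test sample and for each known normal sample (the dictionaries and function name are illustrative) and that PF is a fraction. Under these definitions a fetal trisomy at fetal fraction PF raises the chromosome's share by about PF/2, so PC_chri is about PF and R is about 1, consistent with the trisomy-positive range 0.2 < R < 1.8 given in some examples.

```python
import statistics

def copy_number_variation(coverage_test, coverage_normals, pf):
    """Compute (PC_chri, R) for each chromosome of the test sample.

    coverage_test: {chrom: coverage_chri} for the test sample.
    coverage_normals: list of {chrom: Coverage-normal_chri}, one per normal sample.
    pf: predicted fetal nucleic acid percentage as a fraction (e.g. 0.10).
    """
    results = {}
    for chrom, cov in coverage_test.items():
        avg_normal = statistics.fmean(n[chrom] for n in coverage_normals)
        pc = 2 * (cov - avg_normal) / avg_normal
        results[chrom] = (pc, abs(pc) / pf)
    return results
```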
prediction of monogenic genetic diseases:
aligning the reads derived from the same cfDNA molecule base by base, starting from the first base position of the read set: a base supported by 30% or fewer of the reads at that position is considered background noise; a base supported by 70% or more of the reads is confirmed as the base at that position; if the most common base is supported by more than 30% but less than 70% of the reads, the position is designated as an N base;
repeating the same procedure for the second position and each subsequent position through the last position of the reads, to obtain the consensus base sequence of the reads derived from that cfDNA molecule;
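The per-position consensus rule can be sketched as follows, assuming the reads of one cfDNA molecule are already grouped and start at the same position, so that position i of every read refers to the same base. Minority bases at 30% or less are treated as noise and never emitted; a position is called N unless its most common base reaches 70%. Calling N when even the most common base is at or below 30% is an assumption, since the source does not cover that case.

```python
from collections import Counter

def consensus_call(reads, confirm_min=0.70):
    """Build a consensus sequence from the reads of one cfDNA molecule."""
    length = min(len(r) for r in reads)  # only positions covered by every read
    consensus = []
    for i in range(length):
        base, n = Counter(r[i] for r in reads).most_common(1)[0]
        # Confirm the base only if it reaches the 70% threshold; otherwise N.
        consensus.append(base if n / len(reads) >= confirm_min else "N")
    return "".join(consensus)
```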
aligning the resulting consensus sequences of the sequenced cfDNA molecules to the human reference genome using the bwa aln algorithm;
detecting the base at each covered position and counting the depth and depth ratio of A, T, G, C, N, insertions and deletions at each position;
selecting the position of a monogenic disease site and checking its total coverage depth: if the total depth is below 1000X, the site fails quality control; if it is 1000X or higher, the site passes;
identifying the pathogenic mutation sites to be examined: if the depth percentage of the pathogenic mutation is above 3%, the mutation is considered present; if it is 1%-3%, the result is in the gray zone and the sample must be retested; if it is below 1%, the mutation is judged absent.
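The quality-control and calling thresholds can be sketched for a single site, assuming alt_depth is the depth supporting the pathogenic allele and total_depth the total depth at that site (parameter names are hypothetical). A total depth of exactly 1000X is treated as passing, and percentages of exactly 1% or 3% fall in the gray zone; both boundary choices are assumptions.

```python
def call_pathogenic_site(alt_depth, total_depth, min_depth=1000):
    """Apply the depth QC and the 1% / 3% pathogenic-allele thresholds."""
    if total_depth < min_depth:
        return "QC fail"
    pct = 100.0 * alt_depth / total_depth  # pathogenic allele depth percentage
    if pct > 3.0:
        return "mutation present"
    if pct < 1.0:
        return "mutation absent"
    return "gray zone: retest"
```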
In some examples of the system, the determination of chromosomal microdeletion/microduplication variation comprises the steps of:
calculating the ratio R of the microdeletion/microduplication abnormality percentage to the fetal nucleic acid percentage: R = PA/PF;
judging whether the sample is positive or negative for microdeletion/microduplication variation according to R:
if depth (normal) - depth (abnormal) of the region is positive, a microdeletion is preliminarily detected, and negative signals are then filtered out by the R value to confirm a final microdeletion-positive variation:
if R > 5, the positive signal may derive from the mother, and whether the fetus carries the variation cannot be predicted;
if 0.8 ≤ R ≤ 5, the variation is determined positive;
if R ≤ 0.5, the variation is determined negative;
if 0.5 < R < 0.8, the result is in the gray zone and retesting is required;
if depth (normal) - depth (abnormal) of the region is negative, a microduplication is preliminarily detected, and negative signals are then filtered out by the R value to confirm a final microduplication-positive variation:
if R > 5, the positive signal may derive from the mother, and whether the fetus carries the variation cannot be predicted;
if 0.8 ≤ R ≤ 5, the variation is determined positive;
if R ≤ 0.5, the variation is determined negative;
if 0.5 < R < 0.8, the result is in the gray zone and retesting is required.
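The R-based filtering above can be sketched as one function, assuming PA and PF are fractions; the function name and return strings are hypothetical.

```python
def classify_microvariant(depth_normal, depth_abnormal, pa, pf):
    """Preliminary call from the depth difference, then filtering by R = PA/PF."""
    diff = depth_normal - depth_abnormal
    if diff == 0:
        return "normal"
    kind = "microdeletion" if diff > 0 else "microduplication"
    r = pa / pf
    if r > 5:
        return f"{kind}: signal may be maternal; fetal status undetermined"
    if r >= 0.8:
        return f"{kind} positive"          # 0.8 <= R <= 5
    if r > 0.5:
        return f"{kind}: gray zone, retest"  # 0.5 < R < 0.8
    return f"{kind} negative"              # R <= 0.5
```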
In some system examples, the PCs are used as the principal component set and an artificial neural network with 2 hidden layers is introduced; the network is trained by resilient backpropagation with weight backtracking, using the sum of squared residuals as the loss function, and re-weights the PCs on a training set labeled with the known Y-chromosome fetal nucleic acid percentage PY, to construct the weight-principal component-PY neural network model.
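A minimal NumPy sketch of such a network follows. Assumptions not stated in the source: hidden layer sizes (4, 3), logistic hidden units with a linear output neuron, random normal initialization, and the usual Rprop step factors (eta+ = 1.2, eta- = 0.5). The loss is half the sum of squared errors, and weight backtracking undoes the previous update of any weight whose gradient changed sign. The wording of the description resembles the options of the R neuralnet package (algorithm "rprop+", error function "sse"), but the source does not say which software was used.

```python
import numpy as np

def init_network(n_inputs, hidden=(4, 3), seed=0):
    """Weights and biases for a 2-hidden-layer regression network."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden, 1]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        params.append(rng.normal(0.0, 0.5, size=(n_in, n_out)))  # weights
        params.append(np.zeros(n_out))                            # biases
    return params

def forward(params, X):
    """Activations of every layer: logistic hidden units, linear output."""
    acts = [X]
    n_layers = len(params) // 2
    for i in range(n_layers):
        z = acts[-1] @ params[2 * i] + params[2 * i + 1]
        acts.append(z if i == n_layers - 1 else 1.0 / (1.0 + np.exp(-z)))
    return acts

def sse_and_gradients(params, X, y):
    """Loss = 0.5 * sum of squared errors, with gradients by backpropagation."""
    acts = forward(params, X)
    err = acts[-1][:, 0] - y
    delta = err[:, None]                      # dLoss/dz at the linear output
    grads = [None] * len(params)
    for i in reversed(range(len(params) // 2)):
        grads[2 * i] = acts[i].T @ delta
        grads[2 * i + 1] = delta.sum(axis=0)
        if i > 0:                             # propagate through logistic units
            delta = (delta @ params[2 * i].T) * acts[i] * (1.0 - acts[i])
    return 0.5 * float(err @ err), grads

def train_rprop_plus(params, X, y, epochs=300, eta_plus=1.2, eta_minus=0.5,
                     step0=0.1, step_min=1e-6, step_max=50.0):
    """Resilient backpropagation with weight backtracking (Rprop+)."""
    steps = [np.full_like(p, step0) for p in params]
    prev_grad = [np.zeros_like(p) for p in params]
    prev_update = [np.zeros_like(p) for p in params]
    history = []
    for _ in range(epochs):
        loss, grads = sse_and_gradients(params, X, y)
        history.append(loss)
        for k, g in enumerate(grads):
            change = prev_grad[k] * g
            grew, flipped = change > 0, change < 0
            steps[k][grew] = np.minimum(steps[k][grew] * eta_plus, step_max)
            steps[k][flipped] = np.maximum(steps[k][flipped] * eta_minus, step_min)
            update = -np.sign(g) * steps[k]
            update[flipped] = -prev_update[k][flipped]  # backtrack the last step
            params[k] += update
            prev_update[k] = update
            prev_grad[k] = np.where(flipped, 0.0, g)    # skip adaptation next epoch
    return params, history

def predict_pf(params, pc_scores):
    """Output neuron value, interpreted as the predicted fetal fraction PF."""
    return forward(params, np.atleast_2d(pc_scores))[-1][:, 0]
```

In use, X would hold the PC scores of the training samples and y their Y-derived PY labels; predict_pf then applies the trained model to a test sample's PC scores.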
In some examples of the system, the criteria for judging chromosome copy number variation are as follows:
trisomy judgment:
if PC_chri > 0.03 and R > 1.8, trisomy positive is reported, with a note of possible maternal or placental effect;
if PC_chri > 0.03 and 0.2 < R < 1.8, trisomy of chromosome i is determined positive;
if PC_chri > 0.03 and 0.1 < R < 0.2, the result is in the gray zone and retesting is required;
if PC_chri ≤ 0.03 and R < 0.1, the result is reported negative;
XO judgment:
if PC_chrX ≤ -0.03, -0.01 < PC_chrY < 0.01, and R (i.e., |PC_chrX|/PF) ≥ 0.8, the result is positive;
if -0.03 < PC_chrX < 0.03, -0.01 < PC_chrY < 0.01, and R < 0.3, the result is negative;
XXX judgment:
if PC_chrX > 0.03, -0.01 < PC_chrY < 0.01, and R (i.e., |PC_chrX|/PF) ≥ 0.8, the result is positive;
if -0.03 < PC_chrX < 0.03, -0.01 < PC_chrY < 0.01, and R < 0.3, the result is negative;
XXY judgment:
if -0.01 < PC_chrX < 0.01 and PC_chrY ≥ 0.04, the result is positive;
XYY judgment:
if PC_chrX > 3%, PC_chrY > 3%, and the ratio PC_chrY/PC_chrX is greater than 1.7, XYY is determined positive.
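The criteria above can be sketched as two functions, assuming PC values and PF are fractions (0.03 = 3%). trisomy_call returns None where the stated rules leave a gap (for example PC_chri > 0.03 with R ≤ 0.1), and sex_chromosome_calls returns every rule that fires. Function names and return strings are hypothetical.

```python
def trisomy_call(pc, r):
    """Trisomy rules for one autosome; None where no stated rule applies."""
    if pc > 0.03:
        if r > 1.8:
            return "trisomy positive (possible maternal/placental effect)"
        if 0.2 < r < 1.8:
            return "trisomy positive"
        if 0.1 < r < 0.2:
            return "gray zone: retest"
    elif r < 0.1:
        return "negative"
    return None

def sex_chromosome_calls(pc_x, pc_y, pf):
    """Apply the XO, XXX, XXY and XYY rules; returns every call that fires."""
    calls = []
    r = abs(pc_x) / pf
    y_flat = -0.01 < pc_y < 0.01
    x_flat = -0.03 < pc_x < 0.03
    if pc_x <= -0.03 and y_flat and r >= 0.8:
        calls.append("XO positive")
    if pc_x > 0.03 and y_flat and r >= 0.8:
        calls.append("XXX positive")
    if x_flat and y_flat and r < 0.3:
        calls.append("XO/XXX negative")
    if -0.01 < pc_x < 0.01 and pc_y >= 0.04:
        calls.append("XXY positive")
    if pc_x > 0.03 and pc_y > 0.03 and pc_y / pc_x > 1.7:
        calls.append("XYY positive")
    return calls
```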
In some system examples, the unit window length is 100 kbp to 5 Mbp. The unit window length can be adjusted according to the sequencing depth, quality, and similar factors.
In some system examples, when the calculated fetal nucleic acid percentage PF is below 3.5%, the result is judged unreliable and a new peripheral blood sample must be collected.
In some system examples, the uncorrected read counts are corrected by a linear regression model based on GC content and batch systematic error.
In some system examples, p-set ≤ 0.05. This is a commonly used significance threshold; the value may be adjusted as needed.
In some examples of the system, the sequencing library is constructed by a method comprising the steps of:
extracting cell-free DNA from a peripheral blood sample of the pregnant woman;
performing end repair, A-tailing, adapter ligation and PCR amplification on the cell-free DNA, where the PCR amplification is carried out by one of the following 4 methods:
1) A gene-specific primer, a specific adapter and a specific barcode primer are added simultaneously, wherein the Tm of the gene-specific primer on the plasma cell-free DNA template is 2-6 °C higher than the Tm of the specific adapter and the specific barcode primer on the product amplified from the template by the gene-specific primer; the PCR first uses a high annealing temperature to amplify with the gene-specific primer to a set concentration, then uses a low annealing temperature to amplify with the specific adapter and the specific barcode primer, finally forming a complete library; or
2) A fusion primer is synthesized, i.e., a primer containing a gene-specific module and a specific adapter module or a specific barcode primer module; the resulting upstream and downstream primers amplify and enrich the plasma cell-free DNA template, finally forming a complete library; or
3) The upstream gene-specific primer and the specific adapter form an upstream fusion primer, used together with a downstream gene-specific primer and a specific barcode primer; amplification first uses a high annealing temperature, at which the upstream fusion primer and the downstream gene-specific primer amplify the plasma cell-free DNA template, and then a low annealing temperature, at which the upstream fusion primer and the specific barcode primer amplify, finally forming a complete library; or
4) The downstream gene-specific primer and the specific adapter form a downstream fusion primer, used together with an upstream gene-specific primer and a specific barcode primer; amplification first uses a high annealing temperature, at which the downstream fusion primer and the upstream gene-specific primer amplify the plasma cell-free DNA template, and then a low annealing temperature, at which the downstream fusion primer and the specific barcode primer amplify, finally forming a complete library.
In some system examples, the Tm difference between the downstream gene-specific primer and the specific barcode primer is 3-7 °C, and the Tm difference between the upstream gene-specific primer and the specific barcode primer is 3-7 °C. This makes the amplification more efficient.
Fetal cell-free DNA is typically shorter than 200 bp, so in some examples of the system, fragments longer than 200 bp are removed after the cell-free DNA is obtained. This increases the relative concentration of fetal cell-free DNA.
The beneficial effects of the invention are as follows:
the system of the invention can simultaneously detect fetal chromosomal aneuploidy, microdeletion/microduplication syndromes and monogenic dominant genetic diseases. Compared with the prior art, detection of microdeletion/microduplication syndromes is added with almost no increase in testing cost or turnaround time.
The system of the invention uses a one-step multiplex amplification and library construction method to achieve non-invasive detection of fetal monogenic dominant genetic diseases. Multiple monogenic diseases can be detected at once, and in theory every monogenic dominant disease with a defined causal mutation sequence (point mutations, indels, etc.) can be detected.
In the system of the invention, the whole-genome library and the multiplex one-step library of each sample are pooled for sequencing and share a single sample barcode, so the number of samples per sequencing run is not limited by a shortage of sample barcodes, which would otherwise raise sequencing cost. The scheme is simple to use and meets clinical requirements for turnaround time and practicality.
The system of the invention meets the requirements of fragment- and site-specific analysis once the sequencing depth of the monogenic disease target region reaches 1000X, with almost no increase in sequencing cost.
The system of the invention significantly increases the concentration of fetal cell-free DNA, reducing the probability of resampling and thus the consumption of resources and costs on all sides; it can even meet the testing needs of some pregnant women who cannot undergo conventional NIPT.
The system of the invention uses an independently developed analysis algorithm, which markedly improves the detection of microdeletions/microduplications, for which common methods have low detection accuracy. In addition, the fetal DNA enrichment method and the neural network are applied to estimate the total fetal DNA percentage, and the data show that detection resolution can be improved to 2 Mb.
Detailed Description
The technical scheme of the invention is further described below by combining examples.
The following examples are described with reference to the MGI Tech (MGI) high-throughput sequencing platform; other high-throughput sequencing platforms may of course be used.
Enrichment of fetal DNA:
the plasma cell-free DNA was extracted with magnetic beads according to a conventional procedure, and the extracted cell-free DNA solution was treated to remove large fragments. Using prepared magnetic beads and buffer, a two-step bead-based size selection removed fragments longer than 200 bp from the plasma cell-free DNA while retaining most of the short cell-free DNA fragments.
Comparison of different sequencing library construction methods:
after fetal DNA enrichment, part of the sample undergoes end repair, A-tailing, adapter ligation and PCR amplification (this process is designated the first part); the multiplex one-step library (designated the second part) may use either the enriched sample or a sample without fetal DNA enrichment. The first part performs detection of aneuploidy and microdeletions/microduplications; the second part performs detection of monogenic dominant genetic diseases.
Alternatively, the extracted plasma cell-free DNA is split into 2 portions: the first portion undergoes fetal cell-free DNA enrichment followed by end repair, A-tailing, adapter ligation and PCR amplification of the target fragments, while the other portion enters the second part, the multiplex one-step library construction procedure, directly without enrichment; alternatively, the second portion uses the fetal-cfDNA-enriched sample as template.
For end repair, enzyme, dNTPs, buffer and other reagents are added to the plasma cell-free DNA; adapters and ligase are then added to the repaired sample to ligate adapters to the target fragments, and the product is purified with magnetic beads to remove the enzyme mix and unligated adapters. The product is then PCR-amplified with enzyme, dNTPs, buffer and specific primers; the amplified product is purified with magnetic beads to remove the enzyme mix and excess primers, and is quantified.
The second part is multiplex one-step library construction. Optionally, in the invention a unique molecular identifier (UMI) of 6-8 bp is added, giving 4^6 = 4096 to 4^8 = 65536 distinct UMIs. Because the template is fetal cell-free DNA in the peripheral plasma of the pregnant woman, 4096-65536 UMIs are enough to label each target fragment with a distinct UMI. UMIs are introduced at both ends of the target fragment at the start of amplification, and every copy produced by subsequent re-amplification carries the same UMI; that is, a single molecule is replicated into thousands of molecules sharing one UMI label. In subsequent analysis, reads sharing a UMI can be identified and assembled into the original target sequence, so UMIs help to identify errors arising during amplification and sequencing.
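The UMI arithmetic and grouping can be sketched as follows, assuming, purely for illustration, a read layout in which a UMI of umi_length bases sits at each end of the read; real read structures depend on the adapter design. Each family's inner sequences are the reads of one original cfDNA molecule and would go to the per-position consensus step described earlier.

```python
from collections import defaultdict

def umi_capacity(umi_length):
    """Number of distinct UMIs of a given length over the 4-letter DNA alphabet."""
    return 4 ** umi_length

def group_by_umi(reads, umi_length):
    """Group reads into families keyed by the (5' UMI, 3' UMI) pair.

    Hypothetical layout: UMI + insert + UMI within each read.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read[:umi_length], read[-umi_length:])
        families[key].append(read[umi_length:-umi_length])  # insert only
    return dict(families)
```

umi_capacity(6) gives 4096 and umi_capacity(8) gives 65536, matching the range stated above.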
First, the gene sequences associated with the monogenic diseases are downloaded, and primers are designed around each mutation site or deleted/duplicated fragment so that the designed primers amplify the mutant and wild-type fragments simultaneously. In other words, one solution provided by the invention is to amplify and enrich fragments containing the mutation sites (both wild-type and mutant) by the multiplex one-step method, sequence the amplified products at high throughput, and analyze the resulting sequencing data. The second part is designed to detect monogenic dominant genetic diseases: for example, the c.1138 locus of the FGFR3 gene is normally GG (allele ratio 100%, homozygous), and a change of genotype to GT may cause disease. The inventors used the sequencing data to analyze the percentages of alleles G and A, and named the percentage of the other, pathogenic allele the pathogenic variant allele percentage, B. When B is 3% or more, the pathogenic mutation is considered present; if B is 1%-3%, the result is in the gray zone and the sample is retested; if B is 1% or less, the mutation is judged absent.
In the second part, enzyme, dNTPs, buffer and multiplex primers are used to enrich specific fragments of the plasma cell-free DNA, so that the whole process from plasma cell-free DNA to finished library is completed in a single experimental step. This part has 3 implementations:
First, the gene-specific primer, the specific adapter and the specific barcode primer are added simultaneously. The Tm of the gene-specific primer on the plasma cell-free DNA template is set 4 ± 2 °C higher than the Tm of the specific adapter and the specific barcode primer on the product amplified from the template by the gene-specific primer. The PCR first uses a high annealing temperature so that the gene-specific primer amplifies preferentially and, after a few cycles (usually 6-8), switches to a low annealing temperature so that the specific adapter and the specific barcode primer amplify, forming a complete library.
Second, a fusion primer is synthesized, i.e., a primer containing a gene-specific module and a specific adapter module or a specific barcode primer module; the resulting upstream and downstream primers amplify and enrich the plasma cell-free DNA template, finally forming a complete library.
Third, the upstream gene-specific primer and the specific adapter form a fusion primer, while the downstream gene-specific primer and the specific barcode primer are 2 separate primers. Amplification first uses a high annealing temperature, at which the upstream fusion primer and the downstream gene-specific primer preferentially amplify the plasma cell-free DNA template, and then a low annealing temperature, at which the upstream fusion primer and the specific barcode primer amplify. Equally, in this scheme the downstream gene-specific primer and the specific adapter can form the fusion primer, with the upstream gene-specific primer and the specific barcode primer as 2 separate primers; amplification again starts at a high annealing temperature, at which the downstream fusion primer and the upstream gene-specific primer preferentially amplify the plasma cell-free DNA template, followed by a low annealing temperature, at which the downstream fusion primer and the specific barcode primer amplify, finally forming a complete library.
In all of these implementations, a specific recognition tag of 4-8 bp is added at the 5' end of the gene-specific primer, and the tag is screened bioinformatically according to two principles: 1) the tag does not occur in the amplified target genes; 2) it occurs rarely in the human genome, preferably not at all, and the tag with the lowest occurrence is selected. Preferably, the specific recognition tag is 5 bp long. The tag allows the first-part and second-part products to be distinguished and separated at the final sequencing stage, which is very convenient in practice.
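The two screening principles can be sketched as a ranking over all 4^k candidate tags (1024 for the preferred 5 bp length), assuming the target amplicon sequences and a genome sequence are available as plain strings. Screening against the full human genome would need a precomputed k-mer count table, and this sketch ignores reverse complements; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

def screen_tags(target_seqs, genome_seq, length=5):
    """Drop tags found in any target amplicon, then rank the rest by their
    occurrence count in the genome (lowest first; ties keep ACGT order)."""
    genome_counts = Counter(genome_seq[i:i + length]
                            for i in range(len(genome_seq) - length + 1))
    candidates = ("".join(p) for p in product("ACGT", repeat=length))
    usable = [t for t in candidates if not any(t in s for s in target_seqs)]
    return sorted(usable, key=lambda t: genome_counts[t])
```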
The multiplex one-step method thus completes both the enrichment of the target fragments of plasma cell-free DNA and library construction; the amplified products are purified with magnetic beads to remove the enzyme mix and excess primers, and are then quantified.
The inventors also compared the multiplex one-step method (the first implementation described above) with a conventional PCR system.
Using plasma DNA samples 1, 2 and 3, the inventors first performed multiplex PCR by the conventional method: amplification with gene-specific primers, bead purification of the amplified products to remove the enzyme mix and excess primers, and then amplification of the multiplex PCR products with adapter primers and barcode primers to complete library construction, followed by electrophoresis, as shown in Fig. 1. The gel shows the amplified product at about 230 bp together with a primer dimer larger than 100 bp (about 130 bp), which is difficult to remove completely with 1.0 volume of magnetic beads during purification. Primer dimers remaining in the sample affect sequencing in three main ways: 1) they distort quantification, leading to unexpected sequencing data output; 2) they consume sequencing data allocated to other libraries; 3) they reduce sequencing quality.
Electrophoresis of the products amplified from plasma DNA samples 1, 2 and 3 with the initial multiplex one-step system is shown in Fig. 2; a very large amount of non-specific product and primer dimer is visible. The inventors tested and optimized the system over multiple rounds, and the final system was used to amplify plasma DNA samples 1, 2 and 3. As shown in Fig. 3, the final system yields a single target band of about 230 bp and a primer dimer of about 80 bp (below 100 bp), which, as known to those skilled in the art, is easily removed with 1.0 volume of magnetic beads during product purification.
In each of the following examples, the specific procedure from plasma cell-free DNA to adapter ligation is as follows:
1. Extraction of plasma cell-free DNA
The extraction is performed strictly according to the instructions of a magnetic-bead cell-free DNA extraction kit (e.g., IVD5432, Guangdong medical device record No. 20150062), with an elution volume of 65 μL; if the concentration of the extracted DNA is within 0.01-0.2 ng/μL, library construction can proceed. The DNA sample, dissolved in buffer AE, can be stored at 2-8 °C for up to 7 days or frozen at -20 ± 5 °C for up to 3 years. It may be transported at -20 ± 5 °C for no more than 7 days, and the extracted DNA should not be frozen and thawed more than 5 times.
2. Fetal DNA enrichment
Split the obtained plasma cell-free DNA sample into 2 portions: 50 μL enters the following enrichment procedure, and the other portion goes directly to the multiplex one-step procedure;
2.1 Add 1.2 volumes (60 μL) of magnetic beads A to the sample (50 μL), mix thoroughly, stand at room temperature for 5 min, and place on a magnetic rack for 3 min until the solution is clear;
2.2 Transfer the supernatant to a well containing 1.0 volume of the original sample (50 μL) of magnetic beads B, mix thoroughly, stand at room temperature for 5 min, place on a magnetic rack for 3 min until the solution is clear, and discard the supernatant;
2.3 Add 200 μL of 80% ethanol, pipette 6 times along the tube wall away from the bead pellet, stand on the magnetic rack until the solution is clear, and discard the supernatant;
2.4 Repeat step 2.3 once, leave at room temperature for 3 min to air-dry, and remove the tube from the magnetic rack;
2.5 Add 27 μL of elution buffer, mix thoroughly, stand at room temperature for 5 min, and place the tube on the magnetic rack for 3 min until the solution is clear;
2.6 Transfer 25 μL of supernatant to a new 200 μL PCR tube and store at 4 °C until use.
3. First part: end repair
3.1 Transfer an appropriate amount of the plasma cell-free DNA sample to be tested into a 0.2 mL PCR tube, add nuclease-free water to a total volume of 50 μL, mix thoroughly, and centrifuge briefly;
3.2 Add 10 μL of Endprep Mix end-repair reaction mix to the PCR tube from step 3.1, mix thoroughly, centrifuge briefly, and run the reaction on a PCR instrument using the following program:
3.3 After the reaction, remove the PCR tube and centrifuge briefly.
4. Adapter ligation
4.1 Prepare the ligation reaction mix in a centrifuge tube in the amount required for testing, according to the proportions in the table; mix thoroughly and centrifuge briefly.
4.2 Add the prepared ligation reaction mix (35 μL) to the end-repair product of each sample, then add 6 μL of Yeasen Adapter X (10 pmol/μL) to each sample, mix thoroughly, and centrifuge briefly. Run the reaction on a PCR instrument using the following program:
Note: one Adapter X is added to each sample.
4.3 After the reaction, remove the PCR tube and centrifuge briefly.
4.4 Purification of the adapter-ligation product
4.4.1 Add 80 μL of Yeasen XP magnetic beads to each sample, mix thoroughly, stand at room temperature for 5 min, place on a magnetic rack for 3 min until the solution is clear, and discard the supernatant;
4.4.2 Add 200 μL of 75% ethanol, pipette 3 times, place on the magnetic rack for 3 min until the solution is clear, and discard the supernatant;
4.4.3 Repeat step 4.4.2 once, leave at room temperature for 3 min to air-dry, and remove the tube from the magnetic rack;
4.4.4 Add 22 μL of elution buffer, mix thoroughly, stand at room temperature for 5 min, and place the tube on the magnetic rack for 3 min until the solution is clear;
4.4.5 Transfer 20 μL of supernatant to a new 200 μL PCR tube and store at 4 °C until use.
The specific data analysis method comprises the following steps:
aligning the sequencing reads to the human reference genome, constructing a set of uniquely aligned reads, and recording the alignment position of each read;
dividing the reference genome into a plurality of first-level windows of a unit window length of 100 kbp to 5 Mbp;
counting the uncorrected number of uniquely aligned reads (the uncorrected read count) in each window according to the alignment position of each read;
correcting the uncorrected read counts by a linear regression model based on GC content and batch systematic error;
prediction of the fetal nucleic acid percentage:
estimating the fetal nucleic acid percentage PY from the Y chromosome:
calculating the average Y chromosome depth Depth_Y as the sum of the corrected read counts of all Y chromosome windows divided by the length of the Y chromosome in bases; the average Depth_Y of normal male samples is taken as Dmale, the average Depth_Y of normal female samples is taken as Dfemale, and the Depth_Y of the sample to be tested is taken as Dtest; the fetal nucleic acid percentage of the sample to be tested is then PY = (Dtest - Dfemale)/(Dmale - Dfemale);
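A minimal Python sketch of the Y-chromosome estimate, assuming Depth_Y is the sum of corrected Y-window read counts divided by the Y chromosome length in bases, Dmale and Dfemale are precomputed reference depths from normal male and normal female samples, and PY = (Dtest - Dfemale)/(Dmale - Dfemale). Function names and the numbers in the usage note are hypothetical.

```python
def average_depth_y(corrected_y_counts, y_length_bp):
    """Depth_Y: sum of corrected Y-window read counts divided by the Y chromosome length."""
    return sum(corrected_y_counts) / y_length_bp


def fetal_fraction_from_y(d_test, d_female, d_male):
    """PY = (Dtest - Dfemale) / (Dmale - Dfemale)."""
    if d_male == d_female:
        raise ValueError("Dmale and Dfemale must differ")
    return (d_test - d_female) / (d_male - d_female)
```

For example, with hypothetical reference depths Dfemale = 0.001 and Dmale = 0.101, a test depth of 0.011 gives PY = 0.1 (10%). Subtracting Dfemale removes the background of female reads that mis-align to the Y chromosome.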
performing principal component analysis, an unsupervised learning method, on known normal samples to obtain a principal component set PCs;
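The principal component step can be sketched in pure Python with power iteration and deflation on the sample covariance matrix. This is an illustrative assumption, adequate for a handful of windows but not for genome-wide window counts, where an SVD routine from a numerical library would be used instead. Rows are known normal samples and columns are their corrected window read counts; `fit_pca` and `transform` are hypothetical names, and `transform` is the projection later applied to the sample to be tested.

```python
import math
import random


def fit_pca(samples, n_components, iters=500, seed=0):
    """Return (column means, principal component vectors) for the normal samples."""
    n, p = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - mean[j] for j in range(p)] for row in samples]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        for _ in range(iters):  # power iteration converges to the leading eigenvector
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
        components.append(v)
        # deflate so that the next pass finds the next principal component
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)] for a in range(p)]
    return mean, components


def transform(sample, mean, components):
    """Project one sample's corrected window counts onto the fitted components."""
    return [sum((x - m) * c for x, m, c in zip(sample, mean, comp)) for comp in components]
```

Component signs are arbitrary, so downstream weights must be fitted and applied with the same component set.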
using the PCs as the principal component set and introducing an artificial neural network with 2 hidden layers, trained by resilient backpropagation with weight backtracking and a sum-of-squared-errors loss function; the PCs are re-weighted using a sample set labeled with the fetal nucleic acid percentage PY derived from the fetal Y chromosome, to construct a weight-principal component-PY neural network model;
predicting the fetal nucleic acid percentage PF: converting the corrected window read counts of the sample to be tested into a principal component set PCs by principal component analysis, and outputting the neuron value PF from the weight-principal component-PY neural network model; if the calculated PF is below 3.5%, the result is judged unreliable and a new peripheral blood sample must be collected;
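The neural network step can be sketched as a small pure-Python regressor trained with full-batch resilient backpropagation with weight backtracking (Rprop+) on a sum-of-squared-errors loss. Everything beyond the text is an assumption: PC scores are the inputs, hidden units are logistic, the output is linear, weights start uniformly in [-0.5, 0.5], and the Rprop factors 1.2/0.5 and step bounds are common defaults rather than values from the source. Class and function names are hypothetical; the 3.5% PF check mirrors the text.

```python
import math
import random


def _logistic(z):
    # numerically stable logistic activation
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TwoHiddenLayerNet:
    """PC scores -> logistic layer -> logistic layer -> one linear output (PF)."""

    def __init__(self, n_inputs, hidden1, hidden2, seed=0):
        rng = random.Random(seed)
        sizes = [n_inputs, hidden1, hidden2, 1]
        # one row of incoming weights per neuron; the last entry of each row is the bias
        self.W = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]

    def _forward(self, x):
        acts = [list(x)]
        for li, layer in enumerate(self.W):
            inp = acts[-1] + [1.0]
            z = [sum(w * v for w, v in zip(row, inp)) for row in layer]
            is_output = li == len(self.W) - 1
            acts.append(z if is_output else [_logistic(v) for v in z])
        return acts

    def predict(self, x):
        return self._forward(x)[-1][0]

    def sse_and_grad(self, X, y):
        """Loss 0.5 * sum((output - label)^2) and its gradient via backpropagation."""
        grad = [[[0.0] * len(row) for row in layer] for layer in self.W]
        sse = 0.0
        for x, target in zip(X, y):
            acts = self._forward(x)
            err = acts[-1][0] - target
            sse += 0.5 * err * err
            delta = [err]  # dLoss/dz of the linear output neuron
            for li in range(len(self.W) - 1, -1, -1):
                inp = acts[li] + [1.0]
                for j, d in enumerate(delta):
                    for k, v in enumerate(inp):
                        grad[li][j][k] += d * v
                if li > 0:  # back through the logistic units feeding layer li
                    layer = self.W[li]
                    delta = [sum(layer[j][k] * delta[j] for j in range(len(delta))) * a * (1.0 - a)
                             for k, a in enumerate(acts[li])]
        return sse, grad


def train_rprop_plus(net, X, y, epochs=300, eta_plus=1.2, eta_minus=0.5,
                     step0=0.1, step_max=50.0, step_min=1e-6):
    """Full-batch resilient backpropagation with weight backtracking (Rprop+)."""
    keys = [(li, j, k) for li, layer in enumerate(net.W)
            for j, row in enumerate(layer) for k in range(len(row))]
    step = dict.fromkeys(keys, step0)
    g_prev = dict.fromkeys(keys, 0.0)
    dw_prev = dict.fromkeys(keys, 0.0)
    for _ in range(epochs):
        _, grad = net.sse_and_grad(X, y)
        for key in keys:
            li, j, k = key
            g = grad[li][j][k]
            if g_prev[key] * g > 0:  # same gradient sign: enlarge the step
                step[key] = min(step[key] * eta_plus, step_max)
                dw = -math.copysign(step[key], g)
            elif g_prev[key] * g < 0:  # sign flip: shrink the step and undo the last update
                step[key] = max(step[key] * eta_minus, step_min)
                dw = -dw_prev[key]
                g = 0.0  # no step adaptation on the following epoch
            else:
                dw = -math.copysign(step[key], g) if g != 0.0 else 0.0
            net.W[li][j][k] += dw
            dw_prev[key] = dw
            g_prev[key] = g
    return net


def pf_requires_redraw(pf, threshold=0.035):
    """PF below 3.5% is judged unreliable and a new blood sample is required."""
    return pf < threshold
```

Rprop uses only the sign of each partial derivative, so the step sizes do not depend on the scale of the PC scores.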
determination of the abnormality percentage PA of microdeletion/microduplication fragments:
setting breakpoint positions at random and, for the corrected read counts of the windows to the left and right of each breakpoint, calculating a significance level p-value by a statistical test; if the p-value is smaller than the preset significance level p-set = 0.05, the window is selected as a candidate microdeletion/microduplication window; if the p-value is greater than p-set = 0.05, further windows are merged and tested in the next step until a candidate microdeletion/microduplication window is obtained;
merging adjacent candidate microdeletion/microduplication windows to obtain the selected microdeletion/microduplication region;
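The breakpoint test and window merging can be sketched as below. The text names neither the statistical test nor how far windows are merged, so this sketch assumes a two-sided Welch-type z test (normal approximation), starts with 3 windows on each side, adds one window per side while p is not below 0.05 (up to 20), and treats the window immediately right of a significant breakpoint as the candidate. All names and defaults are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def two_sided_p(left, right):
    """Two-sided p-value of a Welch-type z statistic (assumed test)."""
    var = stdev(left) ** 2 / len(left) + stdev(right) ** 2 / len(right)
    diff = mean(left) - mean(right)
    if var == 0.0:
        return 1.0 if diff == 0 else 0.0
    z = diff / math.sqrt(var)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def candidate_windows(counts, start_w=3, max_w=20, p_set=0.05):
    """Test a breakpoint before each window; widen the flanks until p < p_set or max_w."""
    hits = []
    for i in range(start_w, len(counts) - start_w + 1):
        w = start_w
        while True:
            left, right = counts[max(0, i - w):i], counts[i:i + w]
            if two_sided_p(left, right) < p_set:
                hits.append(i)  # window i, right of the breakpoint, becomes a candidate
                break
            if w >= max_w or (i - w <= 0 and i + w >= len(counts)):
                break
            w += 1  # merge one more window on each side and test again
    return hits


def merge_adjacent(indices):
    """Merge runs of consecutive candidate window indices into (start, end) regions."""
    regions = []
    for i in sorted(indices):
        if regions and i == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], i)
        else:
            regions.append((i, i))
    return regions
```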
calculating the depth of each microdeletion/microduplication region: depth(abnormal) = corrected read count falling within the region / size of the region;
likewise calculating the average depth of the remaining regions of each chromosome that do not contain the microdeletion/microduplication: depth(normal) = corrected read count falling in those remaining regions / size of those remaining regions;
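The two depth formulas translate directly into code. This sketch assumes equal-sized windows, so a region's size is its number of windows times the window length; `region_depths` is a hypothetical name.

```python
def region_depths(corrected, window_size, region):
    """Return (depth_abnormal, depth_normal) for one candidate region.

    corrected: corrected read counts of all windows on one chromosome
    region: (start, end) inclusive window indices of the microdeletion/microduplication
    """
    start, end = region
    inside = corrected[start:end + 1]
    outside = corrected[:start] + corrected[end + 1:]
    depth_abnormal = sum(inside) / (len(inside) * window_size)
    depth_normal = sum(outside) / (len(outside) * window_size)
    return depth_abnormal, depth_normal
```

With corrected counts [100, 100, 50, 50, 100], 1000 bp windows, and the region (2, 3), depth(abnormal) is 0.05 and depth(normal) is 0.1, so depth(normal) - depth(abnormal) is positive, as expected for a deletion.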
calculating the ratio R of the microdeletion/microduplication abnormality percentage to the fetal nucleic acid percentage: R = PA/PF;
determining from R whether the sample is positive or negative for a microdeletion or microduplication variant:
if depth(normal) - depth(abnormal) for the region is positive, a microdeletion is provisionally detected, and negative signals are then filtered out using R to confirm a final positive microdeletion:
if R > 5, the positive signal may originate from the mother, and whether the fetus carries the variant cannot be determined;
if 0.8 ≤ R ≤ 5, the variant is determined positive;
if R ≤ 0.5, the variant is determined negative;
if 0.5 < R < 0.8, the result falls in the gray zone and retesting is required;
if depth(normal) - depth(abnormal) for the region is negative, a microduplication is provisionally detected, and negative signals are then filtered out using R to confirm a final positive microduplication:
if R > 5, the positive signal may originate from the mother, and whether the fetus carries the variant cannot be determined;
if 0.8 ≤ R ≤ 5, the variant is determined positive;
if R ≤ 0.5, the variant is determined negative;
if 0.5 < R < 0.8, the result falls in the gray zone and retesting is required;
if depth(normal) - depth(abnormal) is 0, the region is judged normal;
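The decision rules for a single region can be written as one function. The text does not state how PA itself is computed from the two depths, so this sketch takes PA as an input rather than guessing a formula; the function name and return labels are hypothetical.

```python
def classify_region(depth_normal, depth_abnormal, pa, pf):
    """Return (variant_type, call) using the sign of the depth difference and R = PA/PF."""
    diff = depth_normal - depth_abnormal
    if diff == 0:
        return ("none", "normal")
    kind = "microdeletion" if diff > 0 else "microduplication"
    r = pa / pf
    if r > 5:
        return (kind, "possibly maternal; fetal status undetermined")
    if 0.8 <= r <= 5:
        return (kind, "positive")
    if r <= 0.5:
        return (kind, "negative")
    return (kind, "gray zone; retest")  # 0.5 < R < 0.8
```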
chromosome copy number variation determination:
the chromosome copy number variation percentage PC_chri (i = 1, 2, …, 22, X, Y) is obtained by the following steps:
calculating the proportion Coverage_chri (i = 1, 2, …, 22, X, Y) of each chromosome, where Coverage_chri = sum of the corrected window read counts falling on that chromosome / sum of the corrected window read counts falling on all chromosomes;
calculating the proportion Coverage-normal_chri (i = 1, 2, …, 22, X, Y) of each chromosome in the same way for each known normal sample, and averaging over all normal samples to obtain Average(Coverage-normal_chri);
calculating the copy number variation percentage of each chromosome of the sample to be tested: PC_chri = 2 × (Coverage_chri - Average(Coverage-normal_chri)) / Average(Coverage-normal_chri);
calculating the ratio R of the copy number variation percentage PC of each chromosome of the sample to be tested to the fetal nucleic acid percentage PF: R = |PC|/PF;
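A sketch of the chromosome-level calculation follows, assuming Coverage_chri is chromosome i's share of all corrected window read counts. The factor 2 makes PC track the fetal fraction: if a fraction f of the cfDNA is fetal and the fetus has three copies instead of two, the chromosome's share rises by about f/2, so PC is about f and R = |PC|/PF is about 1. Function names are hypothetical.

```python
def chromosome_proportions(corrected_by_chrom):
    """Coverage_chri: share of all corrected window counts that falls on chromosome i."""
    total = sum(sum(counts) for counts in corrected_by_chrom.values())
    return {chrom: sum(counts) / total for chrom, counts in corrected_by_chrom.items()}


def average_reference(normal_proportions):
    """Average(Coverage-normal_chri) over a list of per-sample proportion dicts."""
    n = len(normal_proportions)
    return {c: sum(s[c] for s in normal_proportions) / n for c in normal_proportions[0]}


def copy_number_percentages(sample_prop, reference):
    """PC_chri = 2 * (Coverage_chri - Average) / Average."""
    return {c: 2 * (sample_prop[c] - ref) / ref for c, ref in reference.items()}


def ratio_to_pf(pc, pf):
    """R = |PC| / PF."""
    return abs(pc) / pf
```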
trisomy determination:
if PC_chri > 0.03 and R > 1.8, trisomy positive is reported with a note of a possible maternal or placental effect;
if PC_chri > 0.03 and 0.2 < R < 1.8, chromosome i trisomy positive is determined;
if PC_chri > 0.03 and 0.1 < R < 0.2, the result falls in the gray zone and retesting is required;
if PC_chri ≤ 0.03 and R < 0.1, the result is reported as negative;
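The trisomy rules can be encoded directly. Combinations the text leaves uncovered (for example R exactly 1.8, or PC_chri ≤ 0.03 with R ≥ 0.1) return "undetermined" in this sketch; that label and the function name are assumptions.

```python
def call_trisomy(pc, r):
    """Trisomy call for one chromosome from PC_chri and R = |PC_chri| / PF."""
    if pc > 0.03:
        if r > 1.8:
            return "positive (possible maternal or placental effect)"
        if 0.2 < r < 1.8:
            return "positive"
        if 0.1 < r < 0.2:
            return "gray zone (retest)"
    elif r < 0.1:
        return "negative"
    return "undetermined"
```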
XO determination:
when PC_chrX ≤ -0.03, -0.01 < PC_chrY < 0.01, and R (i.e., |PC_chrX|/PF) ≥ 0.8, the result is judged positive;
when -0.03 < PC_chrX < 0.03, -0.01 < PC_chrY < 0.01, and R < 0.3, the result is judged negative;
XXX determination:
when PC_chrX > 0.03, -0.01 < PC_chrY < 0.01, and R (i.e., |PC_chrX|/PF) ≥ 0.8, the result is judged positive;
when -0.03 < PC_chrX < 0.03, -0.01 < PC_chrY < 0.01, and R < 0.3, the result is judged negative;
XXY determination:
when -0.01 < PC_chrX < 0.01 and PC_chrY ≥ 0.04, the result is judged positive;
XYY determination:
when PC_chrX > 3%, PC_chrY > 3%, and the ratio PC_chrY/PC_chrX > 1.7, the result is judged XYY positive;
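The four sex chromosome rules can be applied together as sketched below; only rules whose conditions are met produce an entry, and the function name and return format are hypothetical.

```python
def call_sex_chromosomes(pc_x, pc_y, pf):
    """Apply the XO, XXX, XXY and XYY rules; returns {condition: call} for rules that fire."""
    calls = {}
    r_x = abs(pc_x) / pf  # R = |PC_chrX| / PF
    y_flat = -0.01 < pc_y < 0.01
    x_flat = -0.03 < pc_x < 0.03
    if pc_x <= -0.03 and y_flat and r_x >= 0.8:
        calls["XO"] = "positive"
    elif x_flat and y_flat and r_x < 0.3:
        calls["XO"] = "negative"
    if pc_x > 0.03 and y_flat and r_x >= 0.8:
        calls["XXX"] = "positive"
    elif x_flat and y_flat and r_x < 0.3:
        calls["XXX"] = "negative"
    if -0.01 < pc_x < 0.01 and pc_y >= 0.04:
        calls["XXY"] = "positive"
    if pc_x > 0.03 and pc_y > 0.03 and pc_y / pc_x > 1.7:
        calls["XYY"] = "positive"
    return calls
```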
prediction of monogenic genetic disease:
aligning reads derived from the same cfDNA molecule base by base; starting from the first base position of the read group, if no more than 30% of the reads carry the same base at that position, the base is regarded as background noise; if at least 70% of the reads carry the same base, the base at that position is confirmed; if only 30%-70% of the reads carry the same base, the position is designated as an N base;
repeating the same statistics at the second base position and each subsequent position up to the last position of the reads, thereby obtaining the base sequence of the reads derived from the same cfDNA molecule;
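The per-molecule consensus can be sketched as follows, assuming reads are grouped by UMI as described in Example 4 and are already aligned base to base (equal offsets, no gaps). The text calls bases supported by 30% of reads or fewer background noise but does not say what is emitted when no base exceeds 30%; emitting N in that case is an assumption. Function names are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """reads: iterable of (umi, sequence); reads sharing a UMI come from one cfDNA molecule."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return dict(groups)


def consensus_sequence(seqs, call_frac=0.70):
    """Build one molecule's sequence position by position from its reads."""
    out = []
    for i in range(max(len(s) for s in seqs)):
        bases = [s[i] for s in seqs if i < len(s)]
        base, n = Counter(bases).most_common(1)[0]
        if n / len(bases) >= call_frac:
            out.append(base)  # at least 70% of reads agree: base confirmed
        else:
            # 30%-70%: designated N (no call); at most 30%: background noise only,
            # for which emitting N is an assumption the text does not spell out
            out.append("N")
    return "".join(out)
```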
aligning the resulting base sequence of each sequenced cfDNA molecule to the human reference genome using the bwa aln algorithm;
detecting the base at each covered position and counting the depth and depth ratio of A, T, G, C, N, insertions, and deletions at each position;
checking the total coverage depth at the monogenic disease position: if the total depth is below 1000X, quality control fails; if it is above 1000X, quality control passes;
locating the pathogenic mutation sites of interest: if the depth percentage of the pathogenic mutation is greater than 3%, the mutation is considered present; if it is 1%-3%, the result falls in the gray zone and retesting is required; if it is below 1%, the mutation is judged absent.
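The depth quality control and mutation call at one site can be sketched as below. The text does not say how exactly 1000X is handled; this sketch lets it pass. The 1%-3% gray zone is treated as inclusive at both ends. The function name and labels are hypothetical.

```python
def call_pathogenic_site(total_depth, mutant_depth, min_depth=1000):
    """QC on total depth, then present / gray zone / absent from the mutant depth percentage."""
    if total_depth < min_depth:
        return "qc_fail"
    pct = mutant_depth / total_depth
    if pct > 0.03:
        return "present"
    if pct >= 0.01:
        return "gray_zone"  # retest required
    return "absent"
```

For example, 100 mutant reads out of 2000 (5%) are called present, while 40 out of 2000 (2%) fall in the gray zone.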
Example 1:
When a pregnant woman is known to have a monogenic dominant disease, the genotype of the fetus needs to be determined.
After plasma cell-free DNA extraction, fetal DNA enrichment, end repair, and adapter ligation according to the steps above, the following operations are performed:
5. Multiplex one-step amplification
5.1 Prepare the reaction mix in a 200 μL PCR tube according to the proportions in the table below.
Mix thoroughly and centrifuge briefly.
Note: the same barcode is used for the same sample.
5.2 Add 2 μL of plasma cell-free DNA to the prepared reaction mix, mix thoroughly, and centrifuge briefly. Run the reaction on a PCR instrument using the following program:
After the reaction, remove the PCR tube and centrifuge briefly.
5.3 Multiplex one-step amplification product purification
5.3.1 Add 25 μL of Yeasen XP magnetic beads to each sample, mix thoroughly, and incubate at room temperature for 5 min. Place the tube on a magnetic rack for 3 min until the solution is clear, then discard the supernatant;
5.3.2 Add 200 μL of 75% ethanol, pipette up and down 3 times, place the tube on the magnetic rack for 3 min until the solution is clear, and discard the supernatant;
5.3.3 Repeat step 2.2.4, let the beads air-dry at room temperature for 3 min, and remove the tube from the magnetic rack;
5.3.4 Add 17 μL of elution buffer, mix thoroughly, incubate at room temperature for 5 min, and place the tube on the magnetic rack for 3 min until the solution is clear;
5.3.5 Transfer 15 μL of the supernatant to a new 200 μL PCR tube and store at 4 ℃ until use.
6. YS-PCR
6.1 Prepare the PCR reaction mix in the amount required for testing in a 200 μL PCR tube according to the proportions in the table below;
Mix thoroughly and centrifuge briefly.
6.2 Add 30 μL of the prepared PCR reaction mix to the adapter-ligated product obtained in 3.4.5, mix thoroughly, and centrifuge briefly. Run the reaction on a PCR instrument using the following program:
After the reaction, remove the PCR tube and centrifuge briefly.
6.3 YS-PCR product purification
6.3.1 Add 50 μL of Yeasen XP magnetic beads to each sample, mix thoroughly, and incubate at room temperature for 5 min. Place the tube on a magnetic rack for 3 min until the solution is clear, then discard the supernatant;
6.3.2 Add 200 μL of 75% ethanol, pipette up and down 3 times, place the tube on the magnetic rack for 3 min until the solution is clear, and discard the supernatant;
6.3.3 Repeat step 2.2.4, let the beads air-dry at room temperature for 3 min, and remove the tube from the magnetic rack;
6.3.4 Add 17 μL of elution buffer, mix thoroughly, incubate at room temperature for 5 min, and place the tube on the magnetic rack for 3 min until the solution is clear;
6.3.5 Transfer 32 μL of the supernatant to a new 200 μL PCR tube and store at 4 ℃ until use.
7. Library quantification
Equilibrate the samples from 5.3.5 and 6.3.5 to room temperature and quantify them with the Qubit kit. Mix the 5.3.5 sample and the 6.3.5 sample at a ratio of 1:12000.
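Mixing two libraries at a very unbalanced ratio such as 1:12000 usually means pipetting a fraction of a microliter, so the minor library has to be diluted first. The sketch below computes the volume of the minor library for a given volume of the major library. It assumes the ratio is by mass (ng from Qubit concentrations in ng/μL), a minimum pipetting volume of 1 μL, and 10-fold serial dilutions; none of these are stated in the text, and the function name is hypothetical.

```python
def mix_volumes(conc_minor, conc_major, ratio_minor, ratio_major, vol_major_ul,
                min_pipette_ul=1.0):
    """Return (volume of diluted minor library in μL, dilution factor applied to it)."""
    mass_major = conc_major * vol_major_ul              # ng of the major library
    mass_minor = mass_major * ratio_minor / ratio_major  # ng of the minor library needed
    vol_minor = mass_minor / conc_minor                  # μL of undiluted minor library
    if vol_minor <= 0:
        raise ValueError("concentrations, ratio and volume must be positive")
    dilution = 1
    while vol_minor * dilution < min_pipette_ul:
        dilution *= 10  # one more 10-fold serial dilution step
    return vol_minor * dilution, dilution
```

For instance, with a hypothetical 2 ng/μL minor library, a 20 ng/μL major library, a 1:12000 ratio, and 30 μL of the major library, the minor library is diluted 100-fold and about 2.5 μL of the dilution is added.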
8. Single-stranded circularized library construction
8.1 Denaturation
8.1.1 Based on the fragment length from the previous step, transfer 1 pmol of DNA to a 0.2 mL PCR tube and bring the volume to 34 μL with ddH2O.
8.1.2 Prepare the denaturation reaction mix according to the table below.
Mix thoroughly and centrifuge briefly.
8.1.3 Place the PCR tube on a PCR instrument and run the reaction under the following conditions:
Immediately after the reaction, transfer the PCR tube to ice and let it stand for 2 min.
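Step 8.1.1 needs the volume of library that contains 1 pmol, which depends on fragment length. A common approximation for double-stranded DNA is about 660 g/mol per base pair, so 1 pmol of an L-bp fragment weighs L × 660 / 1000 ng. The sketch below uses that approximation (an assumption, not a value from the text) with a Qubit concentration in ng/μL; the function names are hypothetical.

```python
def ng_per_pmol(length_bp, g_per_mol_per_bp=660.0):
    """Mass in ng of 1 pmol of dsDNA of the given length (~660 g/mol per bp)."""
    # 1 pmol = 1e-12 mol; 1 g = 1e9 ng, so ng = length * 660 * 1e-3
    return length_bp * g_per_mol_per_bp / 1000.0


def denaturation_input(conc_ng_per_ul, length_bp, pmol=1.0, final_ul=34.0):
    """Return (μL of library, μL of ddH2O) to reach `pmol` in `final_ul`."""
    dna_ul = pmol * ng_per_pmol(length_bp) / conc_ng_per_ul
    if dna_ul > final_ul:
        raise ValueError("library too dilute to fit the required amount in the final volume")
    return dna_ul, final_ul - dna_ul
```

For a 300 bp library at 10 ng/μL, 1 pmol is about 198 ng, i.e. about 19.8 μL of library plus 14.2 μL of ddH2O.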
8.2 Single-strand circularization
8.2.1 Prepare the single-strand circularization reaction mix on ice according to the table below:
Vortex at low speed to mix, then centrifuge briefly to collect the reaction mix at the bottom of the tube.
8.2.2 Place the PCR tube on a PCR instrument and run the reaction under the following conditions:
After the reaction, proceed directly to the next step with the reaction mixture.
8.3 Enzymatic digestion
8.3.1 Prepare the digestion reaction system on ice according to the table below:
Vortex at low speed to mix, then centrifuge briefly to collect the reaction mix at the bottom of the tube.
8.3.2 Place the PCR tube on a PCR instrument and run the reaction under the following conditions:
After the reaction, centrifuge briefly and purify immediately.
8.4 Digestion product purification
8.4.1 Add 120 μL of Hieff NGS DNA Selection Beads to the digestion product from 8.3.2, mix by vortexing or pipetting, and incubate at room temperature for 10 min;
8.4.2 Centrifuge the PCR tube briefly and place it on a magnetic rack to separate the beads from the liquid; once the solution is clear (about 2 min), carefully remove the supernatant;
8.4.3 Keeping the PCR tube on the magnetic rack, add 200 μL of freshly prepared 80% ethanol to rinse the beads, incubate at room temperature for 30 s, and carefully remove the supernatant;
8.4.4 Repeat step 8.4.3 for a total of two rinses;
8.4.5 Keeping the PCR tube on the magnetic rack, open the cap and air-dry the beads until cracks just begin to appear;
8.4.6 Remove the PCR tube from the magnetic rack, add 22 μL of TE Buffer, vortex or pipette gently until fully mixed, and let stand at room temperature for 10 min;
8.4.7 Centrifuge briefly, place the PCR tube on the magnetic rack, and once the solution is clear (about 2 min), carefully transfer the supernatant to a new PCR tube.
Stopping point: the purified circularized product can be stored at -20 ℃ for up to one month.
8.5 Digestion product quality control
Quantify the digestion product with the Qubit ssDNA Assay Kit.
9. Sequencing
Sequence the libraries that pass quality control according to the sequencing protocol.
10. Data analysis
Fetal chromosomal aneuploidies and microdeletions were analyzed, with emphasis on detecting the maternal monogenic dominant disease gene.
The analysis results were as follows:
T21 detection example
T18 detection example
T13 detection example
Example 2: detection of aneuploidy and microdeletion/microduplication
After plasma cell-free DNA extraction, fetal DNA enrichment, end repair, and adapter ligation according to the steps above, the following operations are performed:
5.1 Prepare the PCR reaction mix in the amount required for testing in a 200 μL PCR tube according to the proportions in the table below;
Mix thoroughly and centrifuge briefly.
5.2 Add 30 μL of the prepared PCR reaction mix to the adapter-ligated product obtained in 3.4.5, mix thoroughly, and centrifuge briefly. Run the reaction on a PCR instrument using the following program:
After the reaction, remove the PCR tube and centrifuge briefly.
5.3 YS-PCR product purification
5.3.1 Add 50 μL of Yeasen XP magnetic beads to each sample, mix thoroughly, and incubate at room temperature for 5 min. Place the tube on a magnetic rack for 3 min until the solution is clear, then discard the supernatant;
5.3.2 Add 200 μL of 75% ethanol, pipette up and down 3 times, place the tube on the magnetic rack for 3 min until the solution is clear, and discard the supernatant;
5.3.3 Repeat step 2.2.4, let the beads air-dry at room temperature for 3 min, and remove the tube from the magnetic rack;
5.3.4 Add 17 μL of elution buffer, mix thoroughly, incubate at room temperature for 5 min, and place the tube on the magnetic rack for 3 min until the solution is clear;
5.3.5 Transfer 32 μL of the supernatant to a new 200 μL PCR tube and store at 4 ℃ until use.
6. Library quantification
Equilibrate the sample from 5.3.5 to room temperature and quantify it with the Qubit kit.
7. Single-stranded circularized library construction
7.1 Denaturation
7.1.1 Based on the fragment length from the previous step, transfer 1 pmol of DNA to a 0.2 mL PCR tube and bring the volume to 34 μL with ddH2O.
7.1.2 Prepare the denaturation reaction mix according to the table below.
Mix thoroughly and centrifuge briefly.
7.1.3 Place the PCR tube on a PCR instrument and run the reaction under the following conditions:
Immediately after the reaction, transfer the PCR tube to ice and let it stand for 2 min.
7.2 Single-strand circularization
7.2.1 Prepare the single-strand circularization reaction mix on ice according to the table below:
Vortex at low speed to mix, then centrifuge briefly to collect the reaction mix at the bottom of the tube.
7.2.2 Place the PCR tube on a PCR instrument and run the reaction under the following conditions:
After the reaction, proceed directly to the next step with the reaction mixture.
7.3 Enzymatic digestion
7.3.1 Prepare the digestion reaction system on ice according to the table below:
Vortex at low speed to mix, then centrifuge briefly to collect the reaction mix at the bottom of the tube.
7.3.2 Place the PCR tube on a PCR instrument and run the reaction under the following conditions:
After the reaction, centrifuge briefly and purify immediately.
7.4 Digestion product purification
7.4.1 Add 120 μL of Hieff NGS DNA Selection Beads to the digestion product from 7.3.2, mix by vortexing or pipetting, and incubate at room temperature for 10 min;
7.4.2 Centrifuge the PCR tube briefly and place it on a magnetic rack to separate the beads from the liquid; once the solution is clear (about 2 min), carefully remove the supernatant;
7.4.3 Keeping the PCR tube on the magnetic rack, rinse the beads with 200 μL of freshly prepared 80% ethanol, incubate at room temperature for 30 s, and carefully remove the supernatant;
7.4.4 Repeat step 7.4.3 for a total of two rinses;
7.4.5 Keeping the PCR tube on the magnetic rack, open the cap and air-dry the beads until cracks just begin to appear;
7.4.6 Remove the PCR tube from the magnetic rack, add 22 μL of TE Buffer, vortex or pipette gently until fully mixed, and let stand at room temperature for 10 min;
7.4.7 Centrifuge briefly, place the PCR tube on the magnetic rack, and once the solution is clear (about 2 min), carefully transfer the supernatant to a new PCR tube.
Stopping point: the purified circularized product can be stored at -20 ℃ for up to one month.
7.5 Digestion product quality control
Quantify the digestion product with the Qubit ssDNA Assay Kit.
8. Sequencing
Sequence the libraries that pass quality control according to the sequencing protocol.
9. Data analysis
Microdeletion detection examples (2 cases):
Trisomy detection examples: T13, T21, T18
XO, XXX, XXY, XYY detection examples
Example 3: detection of aneuploidy, microdeletion, and achondroplasia
After plasma cell-free DNA extraction, fetal DNA enrichment, end repair, and adapter ligation according to the steps above, the following operations are performed:
5. Multiplex one-step amplification
5.1 Prepare the reaction mix in a 200 μL PCR tube according to the proportions in the table below.
Mix thoroughly and centrifuge briefly.
Note: the same barcode is used for the same sample.
5.2 Add 2 μL of plasma cell-free DNA to the prepared reaction mix, mix thoroughly, and centrifuge briefly. Run the reaction on a PCR instrument using the following program:
After the reaction, remove the PCR tube and centrifuge briefly.
5.3 Multiplex one-step amplification product purification
5.3.1 Add 25 μL of Yeasen XP magnetic beads to each sample, mix thoroughly, and incubate at room temperature for 5 min. Place the tube on a magnetic rack for 3 min until the solution is clear, then discard the supernatant;
5.3.2 Add 200 μL of 75% ethanol, pipette up and down 3 times, place the tube on the magnetic rack for 3 min until the solution is clear, and discard the supernatant;
5.3.3 Repeat step 2.2.4, let the beads air-dry at room temperature for 3 min, and remove the tube from the magnetic rack;
5.3.4 Add 17 μL of elution buffer, mix thoroughly, incubate at room temperature for 5 min, and place the tube on the magnetic rack for 3 min until the solution is clear;
5.3.5 Transfer 15 μL of the supernatant to a new 200 μL PCR tube and store at 4 ℃ until use.
6. YS-PCR
Refer to the YS-PCR procedure of Example 1.
7. Library mixing
7.1 Equilibrate the sample from 4.3.5 to room temperature and quantify it with the Qubit kit.
7.2 Equilibrate the sample from 5.3.5 to room temperature and quantify it with the Qubit kit.
7.3 Based on the quantification results of 6.1 and 6.2, mix the 5.3.5 sample and the 4.3.5 sample at a ratio of 2000:1, serially diluting the 4.3.5 sample if required.
8. Single-stranded circularized library construction
Refer to the single-stranded circularized library construction procedure of Example 1.
9. Sequencing
Sequence the libraries that pass quality control according to the sequencing protocol.
10. Data analysis
Results: of 20 samples, 1 was T21 positive, and no other abnormalities were detected.
Example 4: detection of aneuploidy, microdeletion/microduplication, and monogenic dominant genetic disease
After plasma cell-free DNA extraction, fetal DNA enrichment, end repair, and adapter ligation according to the steps above, the following operations are performed:
5. Multiplex one-step amplification
Refer to the multiplex one-step amplification procedure of Example 1.
6. YS-PCR
Refer to the YS-PCR procedure of Example 1.
7. Library mixing
7.1 Equilibrate the sample from 5.3.5 to room temperature and quantify it with the Qubit kit.
7.2 Equilibrate the sample from 6.3.5 to room temperature and quantify it with the Qubit kit.
7.3 Based on the quantification results of 5.3.5 and 6.3.5, mix the 5.3.5 sample and the 6.3.5 sample at a ratio of 1:1000, serially diluting the 5.3.5 sample if necessary.
8. Single-stranded circularized library construction
Refer to the single-stranded circularized library construction procedure of Example 1.
9. Sequencing
Sequence the libraries that pass quality control according to the sequencing protocol.
10. Data analysis
Monogenic genetic disease analysis procedure
1. After sequencing, reads are assigned to samples by index number. Within each sample, reads are first grouped by UMI to correct the sequenced molecule sequences.
a) Each cfDNA molecule is uniquely tagged with a specific UMI (unique molecular identifier). All reads are split by UMI sequence, and reads carrying the same UMI are grouped together, meaning that the reads in a group derive from the same cfDNA molecule.
b) Reads from the same cfDNA molecule are aligned base by base. Starting from the first base position of the read group: because the whole group derives from one cfDNA molecule, if no more than 30% of the reads carry the same base at a position, that base is regarded as background noise; if at least 70% of the reads carry the same base, the base at that position is confirmed; if only 30%-70% of the reads carry the same base, the position is designated N (no call).
c) The same statistics are then applied at the second base position and each subsequent position up to the last position of the read group, yielding the base sequence of the reads derived from the same cfDNA molecule.
2. Align the resulting base sequence of each sequenced cfDNA molecule to chr4 of hg19 using the bwa aln algorithm.
3. Call the bases at each covered position on hg19 with samtools mpileup, and count the depth and depth ratio of A, T, G, C, N, insertions, and deletions at each position.
4. Check the total coverage depth at the monogenic disease position. If the total depth is below 1000X, quality control fails; if it is above 1000X, quality control passes.
5. Locate the pathogenic mutation site of interest. If the depth percentage of the pathogenic mutation is greater than 3%, the mutation is considered present; if it is 1%-3%, the result falls in the gray zone and retesting is required; if it is below 1%, the mutation is judged absent.
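The per-position counting with samtools mpileup can be sketched by parsing the read-base column of one mpileup line. In that column, `.` and `,` are reference matches on the forward and reverse strand, letters are mismatching bases, `^` plus one mapping-quality character marks a read start, `$` marks a read end, `+n`/`-n` followed by n bases is an insertion or deletion after the position, `*`/`#` are placeholders for deleted bases, and `>`/`<` are reference skips. Counting an indel once per `+`/`-` event, and using the A/C/G/T/N total as the depth denominator, are assumptions of this sketch; the function names are hypothetical.

```python
def count_pileup_bases(ref_base, read_bases):
    """Count A/C/G/T/N, insertions and deletions in one mpileup read-base column."""
    counts = {b: 0 for b in "ACGTN"}
    counts["ins"] = 0
    counts["del"] = 0
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":  # read start: skip '^' and the mapping-quality character
            i += 2
            continue
        if ch in "+-":  # indel after this position: +/-<length><sequence>
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            length = int(read_bases[i + 1:j])
            counts["ins" if ch == "+" else "del"] += 1
            i = j + length
            continue
        if ch in ".,":  # match to the reference base on either strand
            counts[ref_base.upper()] += 1
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1
        # '$', '*', '#', '>' and '<' are not counted as bases
        i += 1
    return counts


def depth_ratios(counts):
    """Each count divided by the number of read bases (A/C/G/T/N) at the position."""
    depth = sum(counts[b] for b in "ACGTN")
    return {k: (v / depth if depth else 0.0) for k, v in counts.items()}
```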
The analysis results are shown in the following table:
The above is a further detailed description of the present invention and should not be taken as limiting its practice. Simple deductions or substitutions made by those skilled in the art without departing from the concept of the present invention fall within the scope of protection of the present invention.