WO2018157861A1 - Method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos - Google Patents

Method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos

Info

Publication number
WO2018157861A1
Authority
WO
WIPO (PCT)
Prior art keywords
window
copy number
embryo
haplotype
translocation
Prior art date
Application number
PCT/CN2018/077895
Other languages
English (en)
French (fr)
Inventor
薄世平
张振
任军
高玉梅
陆思嘉
Original Assignee
上海亿康医学检验所有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海亿康医学检验所有限公司
Priority to US16/490,488 (US11837325B2)
Publication of WO2018157861A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6874 Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20 Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation

Definitions

  • The present application relates to the field of biotechnology, specifically to genomic sequence analysis, and in particular to a method for identifying balanced translocation breakpoints and the balanced translocation carrier status of embryos.
  • Balanced translocation is a very common structural chromosomal abnormality in newborns, occurring in about 1/500 to 1/625 of births.
  • Balanced translocation carriers usually have a normal phenotype, but some have genetic variations such as microduplications, deletions, and gene disruption, which lead to conditions such as autism, intellectual disability, and congenital malformations.
  • Balanced translocation carriers are more likely to produce unbalanced gametes when they have offspring, leading to recurrent miscarriage and even infertility. To keep the balanced translocation from being passed on to a carrier's offspring, it is therefore necessary to select and identify embryos that do not carry the balanced translocation.
  • The current methods for identifying balanced translocations mainly include comparative genomic hybridization (CGH), fluorescence in situ hybridization (FISH), SNP arrays, and microdissection combined with second-generation sequencing (MicroSeq-PGD).
  • However, comparative genomic hybridization has low resolution (Mb level), low throughput, and high cost. Fluorescence in situ hybridization targets only specific positions, has low resolution, and suffers from unstable probe hybridization efficiency; because balanced translocations can involve almost every band of every chromosome, probes must be designed separately for each carrier, which is time-consuming and costly, so FISH cannot serve as a general detection technique. SNP arrays are designed for the whole genome and their SNPs are not distributed perfectly uniformly, so the number of effective loci available for linkage analysis around a balanced translocation breakpoint is uncertain, and it may be impossible to tell whether an embryo carries the balanced translocation.
  • In addition to these shortcomings, none of the above techniques can determine the position of the balanced translocation breakpoint precisely. If the breakpoint position is not accurate enough and the analysis extends beyond a certain range, recombination can cause embryos to be misclassified as carrying or not carrying the balanced translocation.
  • Microdissection combined with second-generation sequencing (MicroSeq-PGD) can determine the breakpoint position precisely, but it requires cell culture, microdissection, and other stages; the operation is complicated, the turnaround time is very long, the price is high, and the demands on personnel and instruments are high, so it cannot be deployed on a large scale.
  • The present application provides a method for identifying balanced translocation breakpoints and the balanced translocation carrier status of embryos, which can precisely determine the position of the balanced translocation breakpoint and uses a small number of SNP loci to accurately identify and distinguish embryos that do not carry the balanced translocation.
  • The present application provides a method of determining a balanced translocation breakpoint in an embryo, comprising the steps of:
  • each region segment is a window, and calculating the copy number of each window
  • step (6) Define the consecutive windows of step (5) as a first-level region, and go on to calculate the trimean M_nps of each window of the first-level region together with its surrounding windows. The first window is the first breakpoint bp_1, and each window at which a transition between normal and abnormal is encountered is a breakpoint bp_i;
  • Each interval between two adjacent breakpoints is defined as a second-level region. The trimean M_j of the windows of each second-level region is then calculated, and a region whose M_j falls outside the threshold range is a precise copy number variation region; the start and end positions of that region are the start and end breakpoints of the copy number variation;
  • i and j are independently any positive integer from 1 to N.
  • The sample to be tested in step (1) is biopsied cells of an embryo. The biopsied cells are trophectoderm cells removed once the embryo has developed to the blastomere stage or the blastocyst stage, and one or more trophectoderm cells may be used.
  • The parental DNA may come from any human sample from which DNA can be extracted, and is not particularly limited here; those skilled in the art can choose according to experimental needs. In the present application the parental DNA is taken from any one or a combination of at least two of peripheral blood, lymph, tissue cells, hair, or oral mucosal cells, preferably peripheral blood.
  • The amplification in step (2) is single-cell amplification, which amplifies the trace amounts of nucleic acid in the biopsied cells to obtain enough nucleic acid for subsequent analysis.
  • Any method capable of single-cell amplification may be used, without particular limitation; those skilled in the art can choose according to experimental needs.
  • The present application may use any of primer extension preamplification (PEP), degenerate oligonucleotide primed PCR (DOP-PCR), multiple displacement amplification (MDA), or multiple annealing and looping-based amplification cycles (MALBAC).
  • The sequencing in step (2) is performed on a high-throughput sequencing platform after a library has been constructed from the amplified sample. The high-throughput sequencing platform is a second-generation sequencing platform; any second-generation sequencing platform in the field may be used, without particular limitation, and those skilled in the art can choose according to need.
  • The present application can use, for example, Illumina's IL, GAII, GAIIx, HiSeq 1000/2000/2500/3000/4000, or X Ten, or Huainkang Gene's HYK-PSTAR-IIA, and preferably uses Illumina's HiSeq 2500 high-throughput sequencing platform.
  • The sequencing type is single-end sequencing and/or paired-end sequencing, preferably single-end sequencing.
  • The read length is not less than 30 bp and may be, for example, 30 bp, 40 bp, 50 bp, 80 bp, 100 bp, 150 bp, 300 bp, or 500 bp, or any specific value in between, preferably 50 bp; for brevity, the application does not list every value in the range.
  • The sequencing depth is not less than 0.1× the genome and may be, for example, 0.1×, 0.5×, 1×, 2×, 5×, 10×, 30×, 50×, or 100×, or any specific value in between, preferably 0.1×; for brevity, the application does not list every value in the range.
  • The sequencing in the present application uses the MALBAC single-cell amplification method and the Illumina HiSeq 2500 high-throughput sequencing platform, with single-end sequencing, a read length of 50 bp, and a sequencing depth of 0.1× the genome.
  • The reference genome comprises the whole genome or covers more than 50% of it, for example 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 98%, preferably 60% or more, more preferably 70% or more, still more preferably 80% or more, most preferably 95% or more; for brevity, the specific values in between are not listed exhaustively.
  • The reference genome is the whole genome.
  • The reference genome is typically a generally accepted sequence; for the human genome this can be hg18 (NCBI36), hg19 (GRCh37), or hg38 (GRCh38) from NCBI or UCSC.
  • The sequenced reads are aligned to the reference genome. Any alignment software in the field, free or commercial, may be used, without particular limitation.
  • Those skilled in the art can choose according to need, for example any one of BWA (Burrows-Wheeler Alignment tool), SOAPaligner/soap2 (Short Oligonucleotide Analysis Package), or Bowtie/Bowtie 2.
  • The length of the window in step (4) is 1×10^2 to 1×10^6 bp, and may be, for example, 1×10^2, 2×10^2, 5×10^2, 8×10^2, 1×10^3, 5×10^3, 8×10^3, 1×10^4, 5×10^4, 1×10^5, 5×10^5, 8×10^5, or 1×10^6 bp, or any specific value in between; for brevity, the application does not list every value in the range.
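As an illustration of the windowing step, the sketch below counts aligned reads per fixed-length window. It is a minimal example under stated assumptions: reads are represented as hypothetical `(chromosome, 0-based start)` tuples, and `count_reads_per_window` is an illustrative helper name, not something defined in the application.

```python
from collections import Counter


def count_reads_per_window(read_positions, window_size):
    """Count reads per fixed-size window.

    read_positions: iterable of (chromosome, 0-based start) tuples
                    (a hypothetical input format for this sketch).
    Returns a dict mapping (chromosome, window_index) -> read count.
    """
    counts = Counter()
    for chrom, pos in read_positions:
        # Integer division assigns each read to the window containing its start.
        counts[(chrom, pos // window_size)] += 1
    return dict(counts)


# Toy data: three reads on chr7 and one on chr16, with 1,000 bp windows.
reads = [("chr7", 120), ("chr7", 950), ("chr7", 1040), ("chr16", 10)]
window_counts = count_reads_per_window(reads, window_size=1000)
```

Raw counts like these would still need normalisation and GC correction before they can be read as copy numbers.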
  • Step (4) further comprises correcting the copy number of each window and calculating the corrected copy number. Any method used in the field for correcting window copy numbers may be used, without particular limitation, and those skilled in the art can choose according to need; the present application uses Loess correction.
  • The number of sequences falling into each window, their base distribution, and the base distribution of the reference genome are counted; the copy number of each window is then corrected according to the number of sequences in the window and its GC content, giving the corrected copy number of each window.
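The idea behind GC correction can be sketched as follows. Note that this is not Loess: to stay dependency-free, the sketch swaps in a simpler median-per-GC-bin normalisation, which removes the same kind of GC-dependent bias but less smoothly. The function name, the `gc_step` bin width, and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def gc_corrected_copy_number(counts, gc_fractions, ploidy=2, gc_step=0.05):
    """Simplified GC correction (a stand-in for Loess, not Loess itself).

    Windows are grouped into GC bins of width gc_step; each window's read
    count is divided by the median count of its bin and scaled by the ploidy.
    """
    bins = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        bins[round(gc / gc_step)].append(count)
    expected = {key: median(values) for key, values in bins.items()}

    corrected = []
    for count, gc in zip(counts, gc_fractions):
        baseline = expected[round(gc / gc_step)]
        # Windows in bins with no coverage cannot be corrected.
        corrected.append(ploidy * count / baseline if baseline > 0 else float("nan"))
    return corrected


# Toy data: low-GC windows get ~100 reads, high-GC windows ~50 reads.
counts = [100, 104, 96, 150, 50, 52, 48]
gc = [0.40, 0.41, 0.39, 0.40, 0.60, 0.61, 0.59]
corrected = gc_corrected_copy_number(counts, gc)
```

After correction, the high-GC windows sit near 2 despite their lower raw counts, while the window with 150 reads stands out as a gain.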
  • The threshold range in step (5) is N - δ to N + δ, where N is the ploidy of the sample to be tested and δ is a predetermined value for the normal fluctuation range of the copy number.
  • The predetermined value δ is 0.05-0.2, and may be, for example, 0.05, 0.06, 0.08, 0.1, 0.12, 0.15, 0.18, or 0.2, or any specific value in between; for brevity, the application does not list every value in the range.
  • Alternatively, the threshold range in step (5) is N - m×SD to N + m×SD, where N is the ploidy of the sample to be tested, m is any integer from 1 to 3, and SD is the standard deviation of the copy numbers of all windows of the sample to be tested.
  • When the copy number is calculated, the genome is divided into many windows, each with its own copy number. Most window copy numbers are normal, with values fluctuating around 2 and following a normal distribution; m is a multiple of the standard deviation.
  • When m = 1, 68.3% of the values fall in [N - SD, N + SD];
  • when m = 2, 95.5% of the values fall in [N - 2×SD, N + 2×SD];
  • when m = 3, 99.7% of the values fall in [N - 3×SD, N + 3×SD].
  • m is a statistical concept, and those skilled in the art can choose it according to actual conditions.
  • For example, the threshold range of the normal copy number can be (2 - 2×SD, 2 + 2×SD).
  • Both threshold ranges can be used in subsequent experiments.
  • The δ of the range "N - δ to N + δ" is generally obtained from the distribution characteristics of a large number of samples and suits most cases; the SD of the range "N - m×SD to N + m×SD" is calculated from the copy number distribution of the sample being tested, so it has a wider scope of application and is applicable to all cases.
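The two threshold definitions can be sketched as below. This is a minimal illustration: the function names are hypothetical, the symbol `delta` stands for the predetermined fluctuation value, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which estimator is used.

```python
from statistics import pstdev


def fixed_threshold(ploidy=2, delta=0.1):
    """Range N - delta to N + delta, with delta chosen in advance (0.05-0.2)."""
    return (ploidy - delta, ploidy + delta)


def sd_threshold(copy_numbers, ploidy=2, m=2):
    """Range N - m*SD to N + m*SD, with SD taken from the sample's own windows."""
    sd = pstdev(copy_numbers)  # assumption: population standard deviation
    return (ploidy - m * sd, ploidy + m * sd)


# Toy window copy numbers fluctuating around 2.
window_copy_numbers = [1.9, 2.1, 2.0, 2.0, 1.9, 2.1]
low, high = sd_threshold(window_copy_numbers, ploidy=2, m=2)
fixed_low, fixed_high = fixed_threshold(ploidy=2, delta=0.1)
```

The fixed range is the same for every sample, whereas the SD-based range widens automatically for noisier samples.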
  • The number of surrounding windows in step (5) is 10-100, for example 10, 12, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100, or any specific value in between, preferably 10-60, more preferably 10; for brevity, the application does not list every value in the range.
  • The number of surrounding windows in step (6) is 3-10, and may be, for example, 3, 4, 5, 6, 7, 8, 9, or 10, or any specific value in between, preferably 3-8, more preferably 3-5; for brevity, the application does not list every value in the range.
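The coarse pass of step (5) can be sketched as a sliding summary followed by merging of out-of-range windows. Several points here are assumptions rather than statements from the application: the summary statistic is taken to be Tukey's trimean, (Q1 + 2·Q2 + Q3)/4; the "surrounding windows" are taken as `flank` windows on each side; quartiles use the `inclusive` method of `statistics.quantiles`; and all function names are illustrative.

```python
from statistics import quantiles


def trimean(values):
    """Tukey's trimean: (Q1 + 2*Q2 + Q3) / 4. Needs at least two values."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


def smoothed_trimeans(copy_numbers, flank):
    """Trimean of each window together with up to `flank` windows on each side."""
    result = []
    for i in range(len(copy_numbers)):
        lo = max(0, i - flank)
        hi = min(len(copy_numbers), i + flank + 1)
        result.append(trimean(copy_numbers[lo:hi]))
    return result


def abnormal_runs(smoothed, low, high):
    """Merge consecutive out-of-range windows into (start, end) runs, end inclusive."""
    runs, start = [], None
    for i, value in enumerate(smoothed):
        abnormal = value < low or value > high
        if abnormal and start is None:
            start = i
        elif not abnormal and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(smoothed) - 1))
    return runs


# Toy chromosome arm: windows 10-19 carry a gain (copy number 3).
copy_numbers = [2.0] * 10 + [3.0] * 10 + [2.0] * 10
smoothed = smoothed_trimeans(copy_numbers, flank=2)
first_level_regions = abnormal_runs(smoothed, low=1.78, high=2.22)
```

In this toy case the merged region comes out one window wider on each side than the true gain, which illustrates why a second pass with fewer surrounding windows is useful for refining the boundaries.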
  • In step (6), the windows of normal/abnormal transition are identified as follows: the first window is the first breakpoint bp_1, and the windows are then evaluated one by one. When at least 2 consecutive M_nps values fall within the abnormal range, the window is recorded as the second breakpoint bp_2; scanning continues until at least 2 consecutive M_nps values return to the normal range, and that window is recorded as the third breakpoint bp_3. In this way a breakpoint bp_i is recorded each time a window of normal/abnormal transition is encountered, up to the last window of the first-level region, where i is any positive integer from 1 to M.
  • The detected breakpoint is the balanced translocation breakpoint.
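The transition scan described above can be sketched as a small state machine. The sketch makes two assumptions that the text leaves open: the scan starts in the "normal" state, and the window recorded as a breakpoint is the first window of the confirming run. `scan_breakpoints` and `min_run` are illustrative names.

```python
def scan_breakpoints(fine_trimeans, low, high, min_run=2):
    """Return window indices of breakpoints within one first-level region.

    A state change (normal <-> abnormal) is accepted only after `min_run`
    consecutive windows agree; the first window of that run is recorded.
    The first and last windows of the region are always included.
    """
    if not fine_trimeans:
        return []

    def is_abnormal(value):
        return value < low or value > high

    breakpoints = [0]          # bp_1: first window of the region
    state_abnormal = False     # assumption: scanning starts in the normal state
    run_start, run_len = None, 0
    for i, value in enumerate(fine_trimeans):
        if is_abnormal(value) != state_abnormal:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= min_run:
                if run_start != breakpoints[-1]:
                    breakpoints.append(run_start)
                state_abnormal = not state_abnormal
                run_len = 0
        else:
            run_len = 0        # an isolated outlier does not trigger a breakpoint

    last = len(fine_trimeans) - 1
    if breakpoints[-1] != last:
        breakpoints.append(last)   # final window of the region
    return breakpoints


values = [2.0, 2.1, 2.9, 3.0, 3.1, 3.0, 2.0, 2.1, 2.0]
bps = scan_breakpoints(values, low=1.78, high=2.22)
```

Requiring two consecutive windows before accepting a transition keeps a single noisy window from splitting a region.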
  • The number of embryos in step (8) is a positive integer greater than 5.
  • Determining the breakpoints of the embryos' balanced translocation in step (8) specifically means that n embryos are obtained by IVF, and the n' embryos among them with abnormal copy number (including discarded embryos that do not qualify for biopsy) are tested by the same method as above.
  • bp_chrM and bp_chrN are the positions of the precise breakpoints on the two chromosomes of the reciprocal translocation.
  • The present application provides a method for identifying the balanced translocation carrier status of an embryo, comprising the following steps:
  • Embryo haplotype analysis: select effective SNPs, type one parent according to the SNP genotypes, construct the other parent's haplotypes from embryos with normal copy number, type the haplotypes of embryos with abnormal copy number, and then compare them to determine the haplotype typing result;
  • The length of the region around the breakpoint in step (2') is 2×10^5 to 5×10^6 bp, for example 2×10^5, 3×10^5, 4×10^5, 5×10^5, 6×10^5, 7×10^5, 8×10^5, 9×10^5, 1×10^6, 2×10^6, 3×10^6, 4×10^6, or 5×10^6 bp, or any specific value in between, preferably 2×10^5 to 1×10^6 bp; for brevity, the application does not list every value in the range.
  • In step (2'), more than 30 SNPs are generally tested; about 1/3 of them are usable in each embryo, and more than 10 usable sites are enough to determine the haplotype linkage relationship.
  • The number of SNPs in the present application is 10-500, and may be, for example, 10, 20, 30, 40, 50, 60, 80, 100, 120, 130, 150, 200, 250, 300, 350, 400, 450, or 500, or any specific value in between, preferably 30-100; for brevity, the application does not list every value in the range.
  • The SNPs around the breakpoint may be detected by any method well known in the art, without particular limitation; those skilled in the art can choose according to need. The present application uses any one or a combination of at least two of: capture with a designed probe chip followed by sequencing, primer design followed by first-generation sequencing of the amplicons, or primer design followed by second-generation sequencing of the amplicons.
  • The genotype is determined as follows: for first-generation sequencing, from the peak trace of the sequencing result; for second-generation sequencing, after the steps above (sequencing and aligning the reads to the reference genome), with analysis software, which is any one or a combination of at least two of SAMtools, GATK, VarScan, and the like.
  • An effective SNP is one for which one parent is homozygous and the other parent is heterozygous; the number of effective SNPs is 10-500.
  • The specific process of selecting effective SNPs is as follows. The species tested is human, which is diploid; one parent of the embryo is a translocation carrier and the other is normal (non-carrier).
  • In the translocation carrier, chromosomes M and N have undergone a reciprocal translocation and one normal copy of each remains, so the carrier has the normal chromosome chrM, the translocation-derived chromosome der(chrM), the normal chromosome chrN, and the translocation-derived chromosome der(chrN).
  • The normal parent has the normal chromosomes chrM', chrM", chrN', and chrN". According to Mendel's laws of inheritance and the law of linkage and crossing-over, SNPs that can effectively distinguish the haplotypes of the normal chromosome and the translocation-derived chromosome are selected upstream and downstream of the breakpoint.
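The selection of effective SNPs can be sketched as a simple genotype filter. The sketch assumes, following the worked example in this document (homozygous father, carrier mother), that the useful sites are those where the carrier parent is heterozygous and the non-carrier parent is homozygous. Genotypes are written as two-letter strings, and the SNP identifiers and function names are hypothetical.

```python
def is_homozygous(genotype):
    """True if both alleles of a two-letter genotype such as 'AA' are identical."""
    return genotype[0] == genotype[1]


def select_effective_snps(carrier_genotypes, partner_genotypes):
    """Return SNP ids where the carrier is heterozygous and the partner homozygous.

    Both arguments map a SNP id to a two-letter genotype string.
    Sites missing in either parent are skipped.
    """
    effective = []
    for snp_id, carrier_gt in carrier_genotypes.items():
        partner_gt = partner_genotypes.get(snp_id)
        if partner_gt is None:
            continue
        if not is_homozygous(carrier_gt) and is_homozygous(partner_gt):
            effective.append(snp_id)
    return effective


# Hypothetical genotypes near a breakpoint.
carrier = {"snp1": "AG", "snp2": "CC", "snp3": "CT", "snp4": "GT"}
partner = {"snp1": "AA", "snp2": "CT", "snp3": "CT", "snp4": "GG"}
effective_sites = select_effective_snps(carrier, partner)
```

At such sites, whichever allele the embryo did not receive from the homozygous parent must have come from the carrier, which is what makes the site informative for linkage analysis.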
  • The embryo carrier status is determined as follows: each embryo with a normal copy number is examined using the haplotype typing results of the two chromosomes. The chromosome inherited from the translocation carrier parent is assessed: if it matches the translocation chromosome, the embryo is a translocation carrier; if it matches the normal chromosome, the embryo is a non-carrier (normal). This determines whether the embryo carries the balanced translocation.
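The comparison described above can be sketched as a vote over informative SNPs. The decision rule here is an illustrative assumption, not the application's stated rule: an embryo is called only if enough sites are informative and all of them agree, and mixed evidence is reported as undetermined. `classify_embryo`, `min_sites`, and the SNP ids are hypothetical.

```python
def classify_embryo(inherited, normal_hap, translocation_hap, min_sites=10):
    """Classify one chromosome of an embryo as 'normal', 'carrier' or 'undetermined'.

    inherited:          SNP id -> allele received from the carrier parent
                        (None where the site is not available, i.e. N/A).
    normal_hap:         SNP id -> allele on the carrier parent's normal chromosome.
    translocation_hap:  SNP id -> allele on the translocation-derived chromosome.
    """
    votes_normal = votes_translocation = 0
    for snp_id, allele in inherited.items():
        if allele is None:
            continue
        if allele == normal_hap.get(snp_id):
            votes_normal += 1
        elif allele == translocation_hap.get(snp_id):
            votes_translocation += 1

    if votes_normal + votes_translocation < min_sites:
        return "undetermined"      # too few informative sites
    if votes_translocation == 0:
        return "normal"
    if votes_normal == 0:
        return "carrier"
    return "undetermined"          # mixed evidence, e.g. recombination or genotyping error


normal_hap = {"snp1": "A", "snp2": "C", "snp3": "G"}
translocation_hap = {"snp1": "G", "snp2": "T", "snp3": "A"}
embryo_a = {"snp1": "A", "snp2": "C", "snp3": None}
embryo_b = {"snp1": "G", "snp2": "T", "snp3": "A"}
status_a = classify_embryo(embryo_a, normal_hap, translocation_hap, min_sites=2)
status_b = classify_embryo(embryo_b, normal_hap, translocation_hap, min_sites=2)
```

In practice the same call would be made once per translocated chromosome, and the two results combined.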
  • The method for identifying the balanced translocation carrier status of an embryo comprises the following steps:
  • each region segment is a window, and calculating the copy number of each window
  • step (6) Define the consecutive windows of step (5) as a first-level region, and go on to calculate the trimean M_nps of each window of the first-level region together with its surrounding windows. The first window is the first breakpoint bp_1, and each window at which a transition between normal and abnormal is encountered is a breakpoint bp_i;
  • Each interval between two adjacent breakpoints is defined as a second-level region. The trimean M_j of the windows of each second-level region is then calculated, and a region whose M_j falls outside the threshold range is a precise copy number variation region; the start and end positions of that region are the start and end breakpoints of the copy number variation;
  • Embryo haplotype analysis: select effective SNPs, type one parent according to the SNP genotypes, construct the other parent's haplotypes from embryos with normal copy number, type the haplotypes of embryos with abnormal copy number, and then compare them to determine the haplotype typing result;
  • step (10) Determine the embryo carrier status: haplotype the embryos with normal copy number, then compare them with the haplotype typing results of step (9) to determine the embryo's translocation carrier status;
  • i and j are independently any positive integer from 1 to N.
  • The method of the present application precisely determines the position of the balanced translocation breakpoint, greatly improving breakpoint resolution, so that SNPs are detected in a more accurate and effective region and interference from recombination is avoided;
  • The application uses mutual verification between embryos, so that haplotype typing can be performed accurately with a small number of embryos and SNP sites and the embryo's balanced translocation carrier status identified, which greatly reduces cost and improves accuracy;
  • The detection method of the present application is applicable to a wider population, is simpler to perform, and has a shorter turnaround time.
  • FIG. 1 is a flow chart of the method for identifying the balanced translocation carrier status of an embryo in the present application.
  • Embryos and family samples of a balanced translocation carrier were tested, and the identification results were compared with the clinical gold standard, karyotype analysis (verified by amniocentesis).
  • In this family, the father had a normal karyotype, and the mother was a balanced translocation carrier involving chr7 and chr16, with the karyotype 46,XX,t(7;16)(p12.3;q22.1).
  • Single-cell amplification was performed with the single-cell whole-genome amplification kit YK001A from Shanghai Yikang Medical Laboratory Co., Ltd.; the whole genome of the embryo biopsy cells was amplified following the instructions provided by the manufacturer.
  • Sequencing was performed on Illumina's HiSeq 2500 high-throughput sequencing platform, following the instructions provided by Illumina.
  • The sequencing type was single-end sequencing, the read length was 50 bp, and the sequencing depth was 0.1× the genome.
  • Adapters and low-quality data were removed from the sequencing results, and the remaining reads were aligned to the reference genome.
  • Reference genome: hg19 (GRCh37).
  • The alignment software was BWA (Burrows-Wheeler Alignment tool) with default parameters; the sequences were aligned to the reference genome to obtain their positions on the genome, and sequences that aligned uniquely were selected.
  • The genome was divided into windows 5 × 10^7 bp in length. Based on the positions of the sequences on the genome, the number of sequences falling into each window, their base distribution, and the base distribution of the reference genome were counted. The copy number of each window was corrected according to the number of sequences in the window and its GC content, using Loess correction, and the corrected copy number of each window was calculated.
  • The copy number of each window was thus obtained.
  • Setting the range of the normal copy number: the standard deviation (SD) of the copy numbers of all windows of the sample was calculated from the sample's copy number distribution.
  • The threshold range of the normal copy number was the normal value ± 2 standard deviations, giving the range (1.78, 2.22).
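As a consistency check, the stated range lets the standard deviation of this sample be recovered, assuming the normal value is N = 2 and the multiple is m = 2 as described for this example:

```latex
\[
N \pm m \cdot SD = 2 \pm 2\,SD = (1.78,\ 2.22)
\quad\Longrightarrow\quad
SD = \frac{2.22 - 2}{2} = 0.11
\]
```

The range is symmetric about 2, so the lower bound gives the same value: (2 - 1.78)/2 = 0.11.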
  • The trimean M_i of the copy numbers of each window and its 10 surrounding windows was calculated window by window.
  • Windows whose trimean M_i lay outside the normal copy number range were recorded, and consecutive such windows were merged until a normal window was encountered.
  • This defines a run of consecutive windows with abnormal copy number. These consecutive windows are defined as a first-level region, whose first window is the first breakpoint bp_1; the trimean of each window of the first-level region and its three surrounding windows was then calculated.
  • the window is the third breakpoint bp_3; in this way a breakpoint bp_i is recorded each time a window of normal/abnormal transition is encountered, and the last window of the first-level region is recorded as bp_f.
  • Breakpoints bp_1 to bp_f divide the first-level region into (f-1) segments, defined as second-level regions. The trimean M_j of the window copy numbers of each second-level region is calculated and compared with the normal copy number range; a second-level region whose M_j falls in the abnormal range is a precise copy number variation region, M_j is the copy number of that region, and its start and end positions are the start and end points of the copy number variation.
  • The detected breakpoint is the balanced translocation breakpoint.
  • Table 2. Determining the precise breakpoint from the breakpoints of the 4 embryos with abnormal copy number
  • SNP loci were tested in the parents and embryos within 1×10^6 bp of the chr7 breakpoint (61 SNP loci) and within 1×10^6 bp of the chr16 breakpoint (63 SNP loci).
  • The SNP loci were detected by designing primers and performing second-generation sequencing of the amplicons.
  • Genotypes were determined from the second-generation sequencing data; SNP genotypes were called with the analysis software SAMtools.
  • The SNP genotypes of the father on chr7 and chr16 were typed.
  • The maternal haplotypes were constructed from the embryos with normal copy number (E2, E4, E5 and E6) by subtracting the father's homozygous haplotype from each embryo's genotype. The number of SNPs usable for haplotype typing in each embryo was 9-11 on chr7 and 13-16 on chr16.
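The subtraction step can be sketched as follows: at a site where the father is homozygous, removing one copy of the father's allele from the embryo's genotype leaves the allele inherited from the mother. Genotypes are two-letter strings, and the function name, SNP ids, and toy genotypes are hypothetical.

```python
def carrier_parent_alleles(embryo_genotypes, partner_homozygous):
    """Infer the allele each embryo SNP received from the carrier parent.

    embryo_genotypes:   SNP id -> two-letter embryo genotype, e.g. 'AG'.
    partner_homozygous: SNP id -> the single allele carried by the homozygous parent.
    Returns SNP id -> inferred allele, or None where the site is unusable.
    """
    inherited = {}
    for snp_id, partner_allele in partner_homozygous.items():
        genotype = embryo_genotypes.get(snp_id)
        if genotype is None or partner_allele not in genotype:
            # Missing call, or a genotype inconsistent with Mendelian
            # inheritance (for example after allele drop-out).
            inherited[snp_id] = None
            continue
        remaining = list(genotype)
        remaining.remove(partner_allele)   # remove one copy of the partner's allele
        inherited[snp_id] = remaining[0]
    return inherited


father = {"snp1": "A", "snp2": "C", "snp3": "T"}
embryo = {"snp1": "AG", "snp2": "CC", "snp3": "GG"}
maternal_alleles = carrier_parent_alleles(embryo, father)
```

Sites returned as `None` correspond to the N/A entries used for loci that cannot be used in linkage analysis.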
  • The embryos with abnormal CNV (E1, E3, E7 and E8) were haplotyped. Each embryo can show whether a haplotype on chr7/chr16 is normal or translocated, and the results of the 4 embryos were cross-verified to reach an overall judgement. The results are shown in Tables 8-9.
  • N/A indicates that the site is not available for linkage analysis; the subsequent tables use the same notation.
  • The embryo haplotype results were compared with H_Achr7, H_Bchr7, H_Achr16, and H_Bchr16 to determine which haplotype represents "normal" and which represents "translocation carrier".
  • As shown in Table 10, the four abnormal embryos consistently indicated that H_Achr7 is the normal haplotype and H_Bchr7 is the translocation haplotype; similarly, H_Achr16 is the normal haplotype and H_Bchr16 is the translocation haplotype.
  • The carrier status of the embryos with normal copy number (E2, E4, E5 and E6) was determined.
  • The results are shown in Table 11; embryos E2, E4, and E5 were normal (non-carrier) embryos.
  • A normal (non-carrier) embryo was selected by the method of the present application, completing the identification process. After the embryo was implanted, the pregnancy proceeded normally, and amniocentesis confirmed that the fetal karyotype was normal.

Abstract

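Provided is a method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos, comprising the following steps: amplifying and sequencing the samples; aligning the sequencing reads to a reference genome and performing copy number analysis; precisely determining the positions of the translocation breakpoints; detecting SNPs around the breakpoints and genotyping them; performing embryo haplotype analysis to jointly determine the haplotypes of the normal and the translocated chromosomes; and determining the carrier status of the embryos, so that embryos not carrying the balanced translocation are selected according to haplotype.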

Description

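A method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos
Technical Field
The present application relates to the field of biotechnology, in particular to genomic sequence analysis, and specifically to a method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos.
Background
In biomedical and reproductive-genetics research and clinical practice, balanced translocation is a very common chromosomal structural abnormality in newborns, occurring in about 1/500 to 1/625 of births. Carriers of a balanced translocation usually have a normal phenotype, although some carry genetic changes such as microduplications, deletions or gene disruption that lead to conditions such as autism, intellectual disability and congenital malformations. When carriers have children they are more likely to produce unbalanced gametes, which leads to recurrent miscarriage or even infertility. It is therefore highly desirable to stop the balanced translocation from being passed on to the offspring of carriers by selecting and identifying embryos that do not carry it.
The main methods currently used to identify balanced translocations are comparative genomic hybridization (CGH), fluorescence in situ hybridization (FISH), SNP arrays, and microdissection combined with next-generation sequencing (MicroSeq-PGD).
However, comparative genomic hybridization has low resolution (Mb level), low throughput and high cost. Fluorescence in situ hybridization targets only specific positions, has low resolution and unstable probe hybridization efficiency; because balanced translocations can involve almost every band of every chromosome, probes must be designed individually for each carrier, which is time-consuming and costly, so FISH cannot serve as a general-purpose test. SNP arrays are designed for the whole genome and their SNPs are not perfectly evenly distributed, so the number of informative sites available for linkage analysis around a given breakpoint is uncertain, which can make it impossible to tell whether an embryo carries the balanced translocation.
Beyond these technical drawbacks, none of the above techniques can precisely locate the balanced translocation breakpoint. If the breakpoint position is not precise enough and falls outside a certain range, recombination can cause embryos to be misclassified as carriers or non-carriers.
Microdissection combined with next-generation sequencing (MicroSeq-PGD) can precisely locate the breakpoint, but it requires cell culture and microdissection, is complicated to perform, has a very long turnaround time, is expensive, and places high demands on staff and equipment, so it cannot be deployed on a large scale.
There is therefore an urgent need in the art for a method that identifies balanced translocations in embryos more effectively and comprehensively and improves the accuracy of the call.
Summary of the Invention
In view of the shortcomings of the prior art and practical needs, the present application provides a method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos. The method can both precisely locate the balanced translocation breakpoints and, using a small number of SNP sites, accurately call and phase haplotypes to identify embryos that do not carry the balanced translocation.
To this end, the present application adopts the following technical solutions:
In a first aspect, the present application provides a method for determining the balanced translocation breakpoints of embryos, comprising the following steps: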
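(1) obtaining the embryo samples to be tested and parental DNA;
(2) amplifying the samples to be tested, constructing libraries and sequencing;
(3) aligning the genomic sequences obtained in step (2) to a reference genome to obtain their positions on the reference genome;
(4) dividing the reference genome into N regional segments, each segment being one window, and calculating the copy number of each window;
(5) determining the threshold range for normal copy number, calculating one by one the trimean M_i of the copy numbers of each window and its surrounding windows, recording the windows whose M_i falls outside the threshold range, and merging consecutive windows until a normal window is encountered;
(6) defining the consecutive windows of step (5) as a first-level region, and calculating the trimean M_nps of each window of the first-level region and its surrounding windows, the first window being the first breakpoint bp_1 and each window at which a transition between normal and abnormal occurs being a breakpoint bp_i;
(7) defining the interval between every two breakpoints as a second-level region, and calculating the trimean M_j of the windows of each second-level region; the windows whose M_j falls outside the threshold range constitute the precise copy number variation region, and the start and end positions of that region are the start and end breakpoints of the copy number variation; and
(8) selecting a plurality of embryos with abnormal copy number, calculating the balanced translocation breakpoints on the two chromosomes of each embryo by steps (1)-(7), and calculating the trimean of the breakpoints on each of the two chromosomes, which gives the precise balanced translocation breakpoints of the embryos;
wherein i and j are independently any positive integer from 1 to N.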
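Step (5) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the applicant's implementation: the names `trimean`, `smoothed` and `abnormal_runs` are hypothetical, the quartiles use a median-of-halves convention that the text does not specify, and the "surrounding windows" are assumed to be `flank` windows on each side of the current window.

```python
from statistics import median


def trimean(values):
    """Trimean Q1/4 + Md/2 + Q3/4 (quartile convention is an assumption)."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half] or xs               # lower half; the list itself if n == 1
    upper = xs[-half:] if half else xs    # upper half; the list itself if n == 1
    return median(lower) / 4 + median(xs) / 2 + median(upper) / 4


def smoothed(copy_numbers, flank):
    """Trimean of each window together with up to `flank` windows on each side."""
    return [
        trimean(copy_numbers[max(0, i - flank): i + flank + 1])
        for i in range(len(copy_numbers))
    ]


def abnormal_runs(copy_numbers, low, high, flank=5):
    """Return (start, end) window indices of merged consecutive abnormal windows."""
    runs, start = [], None
    for i, m in enumerate(smoothed(copy_numbers, flank)):
        abnormal = not (low <= m <= high)
        if abnormal and start is None:
            start = i                      # a run of abnormal windows begins
        elif not abnormal and start is not None:
            runs.append((start, i - 1))    # a normal window closes the run
            start = None
    if start is not None:                  # run extends to the last window
        runs.append((start, len(copy_numbers) - 1))
    return runs
```

On a synthetic profile of ten windows at 2.0, ten at 3.0 and ten at 2.0, with the range (1.78, 2.22) and `flank=2`, the run found is slightly wider than the true gain, which is why the later steps refine the coarse region with a smaller neighbourhood.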
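According to the present application, the sample to be tested in step (1) consists of biopsied cells of the embryo. The biopsied cells are ectodermal cells removed when the embryo has developed to the cleavage (blastomere) stage or the blastocyst stage, and may be one or more trophectoderm cells.
According to the present application, the parental DNA may come from any human sample from which DNA can be extracted; no particular limitation is imposed here, and those skilled in the art may extract it as the experiment requires. In the present application the parental DNA is taken from any one or a combination of at least two of peripheral blood, lymph, tissue cells, hair and oral mucosal cells, preferably peripheral blood.
In the present application, the amplification in step (2) is single-cell amplification, which amplifies the trace amounts of nucleic acid in the biopsied cells to obtain more nucleic acid for subsequent analysis.
In the present application, the single-cell amplification may be any method capable of single-cell amplification; no particular limitation is imposed here, and those skilled in the art may choose as the experiment requires. The present application uses any one or a combination of at least two of primer extension preamplification PCR (PEP-PCR), degenerate oligonucleotide primer PCR (DOP-PCR), multiple displacement amplification (MDA) and multiple annealing and looping based amplification cycles (MALBAC), preferably MALBAC.
In the present application, for the sequencing in step (2), libraries are constructed from the amplified samples and sequenced on a high-throughput sequencing platform. The high-throughput platform is a second-generation sequencing platform; any second-generation platform in the art is feasible, no particular limitation is imposed here, and those skilled in the art may choose as needed. The present application may use any one of Illumina's GA, GAII, GAIIx, HiSeq1000/2000/2500/3000/4000, X Ten, X Five, NextSeq500/550, MiSeq, MiSeqDx, MiSeq FGx, MiniSeq and NovaSeq 5000/6000; Applied Biosystems' SOLiD; Roche's 454FLX; Thermo Fisher Scientific (Life Technologies)' Ion Torrent, Ion PGM and Ion Proton I/II; BGI's BGISEQ1000, BGISEQ500, BGISEQ100 and BGISEQ50; CapitalBio's BioelectronSeq 4000; the DA8600 of Da An Gene Co., Ltd. of Sun Yat-sen University; Berry Genomics' NextSeq CN500; the BIGIS of 中科紫鑫, a subsidiary of 紫鑫药业; and 华因康基因's HYK-PSTAR-IIA. The present application preferably uses Illumina's HiSeq2500 high-throughput sequencing platform.
Preferably, the sequencing type is single-end and/or paired-end sequencing, preferably single-end sequencing.
According to the present application, the sequencing read length is not less than 30 bp, for example 30 bp, 40 bp, 50 bp, 80 bp, 100 bp, 150 bp, 300 bp or 500 bp, preferably 50 bp, as well as any specific value between the above values; for brevity, the specific values within the range are not listed exhaustively.
According to the present application, the sequencing depth is not less than 0.1× the genome, for example 0.1×, 0.5×, 1×, 2×, 5×, 10×, 30×, 50× or 100×, preferably 0.1×, as well as any specific value between the above values; for brevity, the specific values within the range are not listed exhaustively.
Preferably, sequencing in the present application uses the MALBAC single-cell amplification method and Illumina's HiSeq2500 high-throughput sequencing platform, with single-end sequencing, a read length of 50 bp and a sequencing depth of 0.1× the genome.
In the present application, the reference genome comprises the whole genome, and the coverage of the reference genome reaches 50% or more of the whole genome, for example 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or 98%, preferably 60% or more, more preferably 70% or more, still more preferably 80% or more, most preferably 95% or more, as well as any specific value between the above values; for brevity, the specific values within the range are not listed exhaustively.
The reference genome is the whole genome. A generally accepted, established sequence is usually chosen as the reference genome; for human, this may be hg18 (NCBI36), hg19 (GRCh37) or hg38 (GRCh38) from NCBI or UCSC.
According to the present application, the genomic sequences are aligned to the reference genome with any alignment software available in the art, free or commercial; no particular limitation is imposed here, and those skilled in the art may choose as needed, for example any one of BWA (Burrows-Wheeler Alignment tool), SOAPaligner/soap2 (Short Oligonucleotide Analysis Package) and Bowtie/Bowtie2.
According to the present application, the length of the window in step (4) is 1×10^2 to 1×10^6 bp, for example 1×10^2, 2×10^2, 5×10^2, 8×10^2, 1×10^3, 5×10^3, 8×10^3, 1×10^4, 5×10^4, 1×10^5, 5×10^5, 8×10^5 or 1×10^6, as well as any specific value between the above values; for brevity, the specific values within the range are not listed exhaustively.
According to the present application, step (4) further comprises correcting the copy number of each window and calculating the corrected copy number of each window. Any method in the art that can correct window copy numbers is feasible; no particular limitation is imposed here, and those skilled in the art may choose as needed. The present application uses Loess correction.
According to the present application, based on the positions of the sequenced reads on the genome, the number of reads falling in each window, their base composition and the base composition of the reference genome are counted; the copy number of each window is corrected according to the reads and the GC content of each window, and the corrected copy number of each window is calculated.
In the present application, the threshold range in step (5) is from N-σ to N+σ, where N is the ploidy of the sample to be tested and σ is a preset value for the normal fluctuation of copy number. The preset value σ is 0.05-0.2, for example 0.05, 0.06, 0.08, 0.1, 0.12, 0.15, 0.18 or 0.2, as well as any specific value between the above values; for brevity, the specific values within the range are not listed exhaustively.
Taking human as an example, human is diploid, so N=2; with the preset value for normal fluctuation (σ) set to 0.05, the threshold range for normal copy number is (2-0.05, 2+0.05).
Alternatively, the threshold range in step (5) is from N-m×SD to N+m×SD, where N is the ploidy of the sample to be tested, m is any integer from 1 to 3, and SD is the standard deviation of the copy numbers of all windows of the sample to be tested.
In the present application, when copy number is calculated the genome is divided into many windows, each with its own copy number. Most windows have a normal copy number, whose value fluctuates around 2 and follows a normal distribution. m is the multiple of the standard deviation: in theory, for m=1, 68.3% of the values fall within [N-SD, N+SD]; for m=2, 95.5% fall within [N-2×SD, N+2×SD]; for m=3, 99.7% fall within [N-3×SD, N+3×SD]. m is a statistical parameter that those skilled in the art may choose according to the actual situation.
Taking human as an example, the threshold range for normal copy number may be (2-2×SD, 2+2×SD).
In the present application, either threshold may be used in the subsequent steps. For the range "N-σ to N+σ", σ is generally obtained empirically from the copy number distributions of a large number of samples and is suitable for most cases. For the range "N-m×SD to N+m×SD", SD is calculated from the copy number distribution of the sample being tested, so this range is more widely applicable and suitable for all cases.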
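The two threshold definitions can be written out as a short sketch. The function names are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether a population or a sample estimate is meant.

```python
from statistics import pstdev


def normal_range_fixed(ploidy=2, sigma=0.05):
    """Range N - sigma .. N + sigma, with sigma a preset value (0.05-0.2)."""
    return (ploidy - sigma, ploidy + sigma)


def normal_range_sd(copy_numbers, ploidy=2, m=2):
    """Range N - m*SD .. N + m*SD, with SD taken over all windows of the sample.

    Assumption: population standard deviation; m is an integer from 1 to 3.
    """
    sd = pstdev(copy_numbers)
    return (ploidy - m * sd, ploidy + m * sd)
```

For a diploid sample, `normal_range_fixed(2, 0.05)` gives approximately (1.95, 2.05), and a sample whose window copy numbers have SD 0.11 gives approximately (1.78, 2.22) with `m=2`.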
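According to the present application, the number of surrounding windows in step (5) is 10-100, for example 10, 12, 15, 20, 30, 40, 50, 60, 70, 80, 90 or 100, preferably 10-60, more preferably 10, as well as any specific value between the above values; for brevity, the specific values within the range are not listed exhaustively.
Preferably, the trimean is calculated as M = Q1/4 + M_d/2 + Q3/4, where Q1 is the lower quartile, M_d is the median and Q3 is the upper quartile.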
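As a worked illustration of the trimean formula, the sketch below computes Q1, M_d and Q3 and combines them with weights 1/4, 1/2 and 1/4. The quartile convention (medians of the lower and upper halves, excluding the middle value when the count is odd) is an assumption; the text does not state which convention is used, and other conventions give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Md, Q3) using the median-of-halves convention (an assumption)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    half = len(xs) // 2
    lower = xs[:half] if half else xs    # lower half, middle value excluded
    upper = xs[-half:] if half else xs   # upper half, middle value excluded
    return median(lower), median(xs), median(upper)


def trimean(values):
    """Trimean M = Q1/4 + Md/2 + Q3/4."""
    q1, md, q3 = quartiles(values)
    return q1 / 4 + md / 2 + q3 / 4
```

For the values 1, 2, 3, 4, 5 this convention gives Q1 = 1.5, M_d = 3 and Q3 = 4.5, so M = 0.375 + 1.5 + 1.125 = 3.0.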
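Preferably, the number of surrounding windows in step (6) is 3-10, for example 3, 4, 5, 6, 7, 8, 9 or 10, preferably 3-8, more preferably 3-5, as well as any specific value between the above values; for brevity, the specific values within the range are not listed exhaustively.
The windows at which a transition between normal and abnormal occurs in step (6) are identified as follows: the first window is the first breakpoint bp_1; the windows are then evaluated one by one; when at least 2 consecutive M_nps values fall in the abnormal range, that window is recorded as the second breakpoint bp_2; scanning continues until at least 2 consecutive M_nps values return to the normal range, and that window is recorded as the third breakpoint bp_3; in this way a breakpoint bp_i is recorded at each window where a transition between normal and abnormal occurs, up to the last window of the first-level region, where i is any positive integer from 1 to M.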
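The scan described for step (6) can be sketched as a small state machine. This is one reading of the text, not a confirmed implementation: the function name is hypothetical, the scan is assumed to start in the normal state, the window recorded is the first of the run of at least two windows in the new state, and the last window of the region is appended as the final breakpoint.

```python
def transition_breakpoints(m_values, low, high, min_run=2):
    """Return window indices of breakpoints within one first-level region.

    m_values: fine-scale trimean (M_nps) of every window in the region.
    low, high: bounds of the normal copy number range.
    min_run: consecutive windows required to accept a change of state.
    """
    n = len(m_values)
    if n == 0:
        return []
    is_abnormal = [not (low <= m <= high) for m in m_values]
    breakpoints = [0]      # bp_1: first window of the region
    state = False          # assumption: scanning starts in the normal state
    for i in range(n - min_run + 1):
        run = is_abnormal[i:i + min_run]
        if all(flag != state for flag in run):
            if i != breakpoints[-1]:
                breakpoints.append(i)   # first window of the new state
            state = not state
    if breakpoints[-1] != n - 1:
        breakpoints.append(n - 1)       # last window of the region
    return breakpoints
```

Requiring two consecutive windows means that a single outlying window does not create a breakpoint, which is the apparent purpose of the "at least 2 consecutive" condition.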
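According to the present application, for embryo copy number variation caused by gametes with an unbalanced translocation, the detected breakpoints are the balanced translocation breakpoints.
Preferably, the number of embryos in step (8) is a positive integer greater than 5.
The balanced translocation breakpoints of step (8) are determined as follows: IVF yields n embryos, of which n' have abnormal copy number (including discarded embryos that failed biopsy quality control). These are tested by the same method as above, and n'' embryos with abnormal copy number are selected (n'' is a positive integer less than or equal to n', preferably n''=n') to determine the breakpoints on the two reciprocally translocated chromosomes. Denoting the two chromosomes chrM and chrN, the n'' embryos yield nM'' breakpoint positions on chrM and nN'' breakpoint positions on chrN. The trimean of the breakpoint positions is calculated for each chromosome, and the resulting bp_chrM and bp_chrN are the precise breakpoint positions on the two reciprocally translocated chromosomes.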
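Combining the per-embryo breakpoints into one position per chromosome can be sketched as below. The function names are hypothetical, the quartile convention inside `trimean` is an assumption, and the positions in the usage example are made-up values for illustration, not data from the application.

```python
from statistics import median


def trimean(values):
    """Trimean Q1/4 + Md/2 + Q3/4 (median-of-halves quartiles, an assumption)."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half] or xs
    upper = xs[-half:] if half else xs
    return median(lower) / 4 + median(xs) / 2 + median(upper) / 4


def consensus_breakpoints(per_embryo):
    """per_embryo: dict chromosome -> list of breakpoint positions, one per embryo.

    Returns dict chromosome -> trimean position, rounded to a whole base.
    """
    return {chrom: round(trimean(pos)) for chrom, pos in per_embryo.items()}


# Hypothetical positions from four embryos with abnormal copy number.
example = {"chrM": [45_850_001, 45_900_001, 45_900_001, 45_950_001]}
print(consensus_breakpoints(example))
```

Because the trimean weights the median most heavily, one embryo with a poorly resolved breakpoint shifts the combined position less than it would shift a plain mean.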
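In a second aspect, the present application provides a method for identifying the balanced translocation carrier status of embryos, comprising the following steps:
(1') determining the balanced translocation breakpoints of the embryos by the method of the first aspect;
(2') detecting SNPs around the breakpoints;
(3') embryo haplotype analysis: selecting effective SNPs, phasing according to the SNP genotypes of one parent, and constructing the haplotypes of the other parent from the embryos with normal copy number;
(4') haplotyping the embryos with abnormal copy number and classifying the results into translocation-carrying and non-carrying, thereby determining the haplotyping result; and
(5') determining embryo carrier status: haplotyping the embryos with normal copy number and comparing them with the haplotyping result of step (4') to determine the translocation carrier status of each embryo.
According to the present application, the length of the region around the breakpoint in step (2') is 2×10^5 to 5×10^6 bp, for example 2×10^5, 3×10^5, 4×10^5, 5×10^5, 6×10^5, 7×10^5, 8×10^5, 9×10^5, 1×10^6, 2×10^6, 3×10^6, 4×10^6 or 5×10^6, preferably 2×10^5 to 1×10^6, as well as any specific value between the above values; for brevity, the specific values within the range are not listed exhaustively.
Preferably, 30 or more SNPs are generally tested in step (2'); about 1/3 of the sites are usable in each embryo, and 10 or more usable sites are enough to establish the haplotype linkage relationship. In the present application the number of SNPs is 10-500, for example 10, 20, 30, 40, 50, 60, 80, 100, 120, 130, 150, 200, 250, 300, 350, 400, 450 or 500, preferably 30-100, as well as any specific value between the above values; for brevity, the specific values within the range are not listed exhaustively.
According to the present application, the SNPs around the breakpoints are detected by methods well known in the art; no particular limitation is imposed here, and those skilled in the art may choose as needed. The present application uses any one or a combination of at least two of capture sequencing with designed probe arrays, Sanger (first-generation) sequencing of amplicons from designed primers, and next-generation sequencing of amplicons from designed primers.
In the present application, genotypes are determined as follows: for first-generation sequencing, from the peaks of the sequencing chromatogram; for next-generation sequencing, after the above steps (sequencing and alignment of the genomic sequences to the reference genome), with analysis software, which is any one or a combination of at least two of SAMtools, GATK, Varscan and similar software.
According to the present application, an effective SNP is one at which one of the father and mother is homozygous and the other is heterozygous; the number of effective SNPs is 10-500.
In the present application, effective SNPs are selected as follows. The species tested is human, which is diploid; one parent of the embryos is a translocation carrier and the other is normal (non-carrier). In the carrier, parts of chromosomes M and N have undergone reciprocal translocation while one normal copy of each chromosome is retained, so the carrier has a normal chromosome chrM, a translocation derivative chromosome der(chrM), a normal chromosome chrN and a translocation derivative chromosome der(chrN); the normal parent has the normal chromosomes chrM', chrM'', chrN' and chrN''. According to Mendelian inheritance and the laws of linkage and crossing-over, SNPs that can effectively distinguish the haplotypes of the normal chromosome and the translocation derivative chromosome are selected upstream and downstream of the breakpoint.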
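The selection of effective SNPs can be sketched as a simple filter. The function names are hypothetical. As an assumption consistent with the effective SNPs listed in Table 5 of the worked example, the sketch keeps sites where the carrier parent is heterozygous and the other parent is homozygous, so that the two chromosomes of the carrier can be told apart in the embryos.

```python
def parse_genotype(gt):
    """'G/A' -> ('G', 'A'); a site that was not detected ('-') -> None."""
    if gt is None or "/" not in gt:
        return None
    a, b = gt.split("/")
    return (a, b)


def is_effective(carrier_gt, partner_gt):
    """True if the carrier is heterozygous and the partner is homozygous."""
    carrier, partner = parse_genotype(carrier_gt), parse_genotype(partner_gt)
    if carrier is None or partner is None:
        return False
    return carrier[0] != carrier[1] and partner[0] == partner[1]


def select_effective(snps):
    """snps: dict SNP id -> (partner genotype, carrier genotype)."""
    return [snp for snp, (partner, carrier) in snps.items()
            if is_effective(carrier, partner)]


# A few chr7 rows of the worked example, as (father, mother); the mother is the carrier.
rows = {1: ("C/C", "T/T"), 7: ("A/A", "G/A"), 8: ("G/G", "G/G"),
        9: ("G/A", "G/G"), 10: ("A/A", "C/A")}
print(select_effective(rows))
```

Sites where both parents are homozygous carry no phase information, and sites where only the non-carrier parent is heterozygous say nothing about which chromosome of the carrier was inherited, which is why both kinds are dropped here.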
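In the present application, embryo carrier status is determined as follows: based on the haplotyping results for the two chromosomes, each embryo with normal copy number is examined, and the chromosome inherited from the carrier parent is identified. If this chromosome matches the translocation chromosome, the embryo is a translocation carrier; if it matches the normal chromosome, the embryo is a non-carrier (normal). Embryos carrying the balanced translocation are thereby identified.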
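The comparison described above can be sketched as follows. The function name, the majority-vote rule and the `min_sites` parameter are assumptions introduced for illustration; the text only states that the chromosome inherited from the carrier parent is compared with the normal and the translocation haplotypes. The haplotypes in the usage lines are hypothetical.

```python
def classify_embryo(inherited, normal_hap, translocation_hap, min_sites=1):
    """Classify one embryo from the alleles inherited from the carrier parent.

    inherited, normal_hap, translocation_hap: dict SNP id -> allele.
    Sites whose inherited allele is None (not usable) are skipped.
    """
    normal = carrier = 0
    for snp, allele in inherited.items():
        if allele is None:
            continue
        if allele == normal_hap.get(snp):
            normal += 1
        elif allele == translocation_hap.get(snp):
            carrier += 1
    if normal > carrier and normal >= min_sites:
        return "non-carrier"
    if carrier > normal and carrier >= min_sites:
        return "carrier"
    return "undetermined"


# Hypothetical haplotypes over three effective SNPs.
normal_hap = {1: "A", 2: "C", 3: "G"}
translocation_hap = {1: "G", 2: "A", 3: "A"}
print(classify_embryo({1: "A", 2: "C", 3: None}, normal_hap, translocation_hap))
```

Counting matches over several sites, rather than relying on one site, is meant to tolerate an occasional unusable or dropped-out allele; a stricter rule could require every usable site to agree.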
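According to the present application, the method for identifying the balanced translocation carrier status of embryos comprises the following steps:
(1) obtaining the embryo samples to be tested and parental DNA;
(2) amplifying the samples to be tested, constructing libraries and sequencing;
(3) aligning the genomic sequences obtained in step (2) to a reference genome to obtain their positions on the reference genome;
(4) dividing the reference genome into N regional segments, each segment being one window, and calculating the copy number of each window;
(5) determining the threshold range for normal copy number, calculating one by one the trimean M_i of the copy numbers of each window and its surrounding windows, recording the windows whose M_i falls outside the threshold range, and merging consecutive windows until a normal window is encountered;
(6) defining the consecutive windows of step (5) as a first-level region, and calculating the trimean M_nps of each window of the first-level region and its surrounding windows, the first window being the first breakpoint bp_1 and each window at which a transition between normal and abnormal occurs being a breakpoint bp_i;
(7) defining the interval between every two breakpoints as a second-level region, and calculating the trimean M_j of the windows of each second-level region; the windows whose M_j falls outside the threshold range constitute the precise copy number variation region, and the start and end positions of that region are the start and end breakpoints of the copy number variation;
(8) selecting a plurality of embryos with abnormal copy number, calculating the balanced translocation breakpoints on the two chromosomes of each embryo by steps (1)-(7), and calculating the trimean of the breakpoints on each of the two chromosomes, which gives the precise balanced translocation breakpoints of the embryos;
(9) detecting SNPs around the breakpoints;
(10) embryo haplotype analysis: selecting effective SNPs, phasing according to the SNP genotypes of one parent, constructing the haplotypes of the other parent from the embryos with normal copy number, haplotyping the embryos with abnormal copy number and comparing, to determine the haplotyping result; and
(11) determining embryo carrier status: haplotyping the embryos with normal copy number and comparing them with the haplotyping result of step (10) to determine the translocation carrier status of each embryo;
wherein i and j are independently any positive integer from 1 to N.
Compared with the prior art, the present application has the following beneficial effects:
(1) the method provides a way to precisely locate balanced translocation breakpoints, greatly improving the resolution of breakpoint determination, so that SNPs are tested within a more precise and effective region and the effect of recombination is avoided;
(2) by inferring haplotypes across embryos, the method achieves accurate haplotyping and identification of balanced translocation carrier status with a small number of embryos and SNP sites, which greatly reduces cost and improves accuracy;
(3) the method is applicable to a wider population, is simpler to perform and has a shorter turnaround time.
Brief Description of the Drawings
Figure 1 is a flow chart of the method of the present application for identifying the balanced translocation carrier status of embryos.
Detailed Description
To further explain the technical means adopted by the present application and their effects, the technical solutions of the present application are further described below with reference to the drawing and specific embodiments; the present application is not, however, limited to the scope of the examples.
The present application has been applied to 30 cases; on clinical verification, the concordance between the test results and the actual clinical results was 100%. To make the use and effects of the present application easier to understand, one case is described in detail below. A brief flow chart is shown in Figure 1, and the detailed procedure is as follows:
In this example, the embryos and family samples of a balanced translocation carrier were tested, and the results were compared with those of the clinical gold standard for karyotyping (verification by amniocentesis).
In this family the father's karyotype is normal and the mother is a carrier of a balanced translocation between chr7 and chr16, with karyotype 46,XX,t(7;16)(p12.3;q22.1).
The specific procedure is as follows:
Example 1 Determination of the balanced translocation breakpoints
(1) Obtaining the embryo samples to be tested and parental DNA
IVF yielded 8 embryos for this family (denoted E1, E2, E3, E4, E5, E6, E7 and E8). The embryos were cultured to the blastocyst stage, and 3-5 trophectoderm cells were biopsied from each embryo.
(2) Amplification and sequencing of the samples
Single-cell amplification was performed with the MALBAC single-cell whole-genome amplification kit YK001A from 上海亿康医学检验所有限公司, following the manufacturer's instructions, to amplify the whole genome of the biopsied embryo cells.
Sequencing was performed on Illumina's HiSeq2500 high-throughput sequencing platform, following Illumina's instructions. The sequencing type was single-end, the read length 50 bp and the sequencing depth 0.1× the genome.
(3) Sequence alignment and copy number analysis
Adapters and low-quality data were removed from the sequencing results, and the reads were aligned to the reference genome hg19 (GRCh37). The alignment software was BWA (Burrows-Wheeler Alignment tool) with default parameters; the reads were aligned to the reference genome to obtain their positions on the genome, and reads that aligned uniquely were selected.
The genome was divided into windows of 5×10^4 bp. Based on the positions of the reads on the genome, the number of reads falling in each window, their base composition and the base composition of the reference genome were counted. The copy number of each window was corrected according to the reads and the GC content of each window using Loess, and the corrected copy number of each window was calculated.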
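The windowing step can be sketched as below for 50,000 bp windows such as those listed in Table 1. This is a simplified illustration: the function names are hypothetical, read positions are assumed to be 1-based, and the counts are scaled so that the median window has copy number 2. The GC-content (Loess) correction described above is deliberately left out, so this is not a substitute for it.

```python
from collections import Counter
from statistics import median


def window_counts(positions, window=50_000, n_windows=None):
    """Count uniquely aligned reads per fixed-size window on one chromosome.

    positions: 1-based alignment start positions (must not be empty
    unless n_windows is given).
    """
    counts = Counter((pos - 1) // window for pos in positions)
    size = n_windows if n_windows is not None else max(counts) + 1
    return [counts.get(i, 0) for i in range(size)]


def to_copy_number(counts, ploidy=2):
    """Scale counts so that the median non-empty window equals the ploidy."""
    reference = median(c for c in counts if c > 0)
    return [ploidy * c / reference for c in counts]


reads = [1, 50_000, 50_001, 100_000, 100_001, 100_002, 150_000]
print(to_copy_number(window_counts(reads)))
```

With 1-based positions, window index 893 covers 44,650,001-44,700,000, which matches the first chr7 region shown in Table 1.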
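The copy number results are shown in Table 1. Because the human genome is very large (about 3×10^9 bp), Table 1 shows the copy numbers of only part of chr7:
Table 1 Copy number of part of chr7 in each embryo
Chromosome Region E1 E2 E3 E4 E5 E6 E7 E8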
chr7 44650001-44700000 1.71 2.08 0.86 2.11 2.14 2.19 3.22 3.08
chr7 44700001-44750000 1.93 2.40 0.96 2.17 2.02 2.22 2.92 3.20
chr7 44750001-44800000 2.25 2.09 1.07 2.24 1.93 2.02 2.79 3.22
chr7 44800001-44850000 2.16 2.01 1.12 2.41 2.08 1.98 2.49 3.38
chr7 44850001-44900000 2.27 2.09 1.24 2.31 2.01 2.00 3.03 3.55
chr7 44900001-44950000 1.94 2.16 0.97 2.03 1.83 1.90 2.71 2.94
chr7 44950001-45000000 2.13 2.14 0.83 1.89 1.83 2.11 2.32 2.45
chr7 45000001-45050000 1.83 1.97 0.89 2.00 2.02 2.25 2.28 2.22
chr7 45050001-45100000 1.97 1.97 1.08 2.18 2.30 2.30 2.53 2.70
chr7 45100001-45150000 1.84 2.04 1.09 2.29 2.36 2.18 2.73 2.88
chr7 45150001-45200000 1.78 2.01 1.06 2.21 2.27 2.11 2.88 2.93
chr7 45200001-45250000 1.79 1.95 0.96 2.28 2.32 1.88 3.14 2.98
chr7 45250001-45300000 1.68 2.02 1.18 1.96 2.18 2.00 3.10 2.87
chr7 45300001-45350000 1.77 2.11 1.07 2.19 2.02 1.92 3.13 2.95
chr7 45350001-45400000 2.03 1.96 1.03 2.07 1.77 2.28 3.03 2.78
chr7 45400001-45450000 2.10 1.86 0.92 2.05 2.03 2.14 2.97 2.69
chr7 45450001-45500000 2.10 1.93 0.87 2.01 1.98 2.35 2.77 2.85
chr7 45500001-45550000 2.14 1.90 1.05 1.96 2.20 2.04 2.91 2.92
chr7 45550001-45600000 2.10 1.96 1.09 2.09 2.16 2.06 2.95 3.12
chr7 45600001-45650000 2.09 2.01 1.08 2.19 2.26 2.02 3.12 3.09
chr7 45650001-45700000 2.15 2.07 1.05 2.20 2.15 2.27 3.09 3.29
chr7 45700001-45750000 1.93 2.05 0.95 2.19 2.15 2.30 2.76 3.04
chr7 45750001-45800000 2.10 2.12 0.98 2.07 2.49 2.18 2.69 2.74
chr7 45800001-45850000 2.28 1.85 1.39 2.13 2.21 2.12 2.27 2.30
chr7 45850001-45900000 2.60 1.94 1.71 2.03 2.17 1.85 1.99 1.81
chr7 45900001-45950000 2.49 2.05 2.06 2.09 2.14 1.93 1.79 1.75
chr7 45950001-46000000 2.55 2.19 1.68 2.11 2.29 1.81 1.88 1.77
chr7 46000001-46050000 2.75 2.04 1.73 2.03 2.48 2.08 1.93 1.86
chr7 46050001-46100000 2.94 1.66 1.78 1.97 2.57 2.00 1.96 1.66
chr7 46100001-46150000 3.17 2.08 2.00 2.17 2.48 2.04 2.11 1.68
chr7 46150001-46200000 3.08 1.87 1.87 2.05 2.35 1.95 1.89 1.80
chr7 46200001-46250000 3.35 2.00 1.75 2.16 2.35 1.84 2.14 2.11
chr7 46250001-46300000 3.08 2.14 1.73 2.12 2.27 1.99 2.04 1.77
chr7 46300001-46350000 3.14 2.32 1.79 2.08 2.09 1.94 2.06 1.77
chr7 46350001-46400000 2.98 2.26 2.17 2.05 1.98 1.99 2.07 1.85
chr7 46400001-46450000 3.19 2.53 2.31 2.01 2.06 1.74 1.89 1.78
chr7 46450001-46500000 2.99 2.28 2.33 2.25 2.13 1.78 1.90 1.86
chr7 46500001-46550000 3.18 2.22 2.24 2.23 1.98 1.80 1.87 1.96
chr7 46550001-46600000 3.00 2.10 2.12 2.38 2.01 1.99 1.90 2.04
chr7 46600001-46650000 2.95 1.87 1.96 2.20 1.94 1.93 1.65 2.07
chr7 46650001-46700000 3.00 2.05 1.90 2.18 1.94 2.00 1.82 2.09
chr7 46700001-46750000 2.78 2.02 1.90 2.03 1.82 2.06 1.93 2.09
chr7 46750001-46800000 2.52 2.39 2.01 2.31 1.75 2.03 1.85 2.00
chr7 46800001-46850000 2.81 2.24 2.23 2.41 2.13 1.94 1.86 1.82
chr7 46850001-46900000 3.14 2.18 2.38 2.29 2.07 1.85 1.79 1.89
chr7 46900001-46950000 3.20 2.27 2.24 2.23 2.06 1.77 1.98 1.68
chr7 46950001-47000000 3.13 2.32 2.09 2.45 2.01 2.00 1.82 1.85
chr7 47000001-47050000 3.02 2.35 2.25 2.11 2.11 2.13 1.99 1.82
chr7 47050001-47100000 3.00 2.24 2.18 2.20 2.12 2.09 1.69 1.92
chr7 47100001-47150000 3.05 2.30 2.30 2.19 2.09 2.00 1.64 2.05
chr7 47150001-47200000 2.93 2.24 2.05 2.14 2.08 2.37 1.58 1.94
…… …… …… …… …… …… …… …… …… ……
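(4) Precise determination of the breakpoint positions
The above steps give the copy number of each window. The normal copy number range was set as follows: from the copy number distribution of the sample, the standard deviation (SD) of the copy numbers of all windows was calculated; in this example SD=0.11, and the threshold range for normal copy number was set to the normal value ± 2 standard deviations, that is (1.78, 2.22).
The trimean M_i of the copy numbers of each window and its 10 surrounding windows was calculated one by one. Windows whose trimean M_i fell outside the normal copy number range were recorded, and consecutive windows were merged until a normal window was encountered.
This gave runs of consecutive windows with abnormal copy number, which were defined as first-level regions. The first window of a first-level region was defined as the first breakpoint bp_1, and the trimean M_nps of each window of the region and its 3 surrounding windows was then calculated. The windows were evaluated one by one: when at least 2 consecutive M_nps values fell in the abnormal range, that window was recorded as the second breakpoint bp_2; scanning continued until at least 2 consecutive M_nps values returned to the normal range, and that window was recorded as the third breakpoint bp_3; in this way a breakpoint bp_i was recorded at each window where a transition between normal and abnormal occurred, up to the last window of the first-level region, which was recorded as bp_f.
Breakpoints bp_1 to bp_f divide the first-level region into (f-1) sub-segments, defined as second-level regions. The trimean M_j of the window copy numbers of each second-level region was calculated and compared with the normal copy number range; a second-level region whose M_j fell in the abnormal range is a precise copy number variation region, M_j being the copy number of that region and its start and end positions being the start and end breakpoints of the copy number variation. For embryo copy number variation caused by unbalanced gametes, the detected breakpoints are the balanced translocation breakpoints.
This example yielded 8 embryos, 4 of which had abnormal copy number (E1, E3, E7 and E8). They were tested by the same method as above, and all 4 were used to determine the breakpoints on the two reciprocally translocated chromosomes. The 4 embryos gave 4 breakpoint positions on chr7 and 4 on chr16; the trimean of the breakpoint positions was calculated for each chromosome, giving the precise breakpoint positions on the two reciprocally translocated chromosomes: chr7: 45,900,001±50,000; chr16: 43,100,001±50,000. The results are shown in Table 2.
Table 2 Precise breakpoints determined from the breakpoints of the 4 embryos with abnormal copy number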
Figure PCTCN2018077895-appb-000001
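Example 2 Determination of the balanced translocation carrier status of the embryos
(1') The balanced translocation breakpoints of the embryos were those determined in Example 1;
(2') Detection of SNPs around the breakpoints
Within 1×10^6 bp of the chr7 breakpoint, 61 SNP sites were tested in the parents and the embryos; within 1×10^6 bp of the chr16 breakpoint, 63 SNP sites were tested in the parents and the embryos. The SNP sites were tested by next-generation sequencing of amplicons from designed primers, and the SNP genotypes were called with SAMtools.
The SNP results upstream and downstream of the chr7 and chr16 breakpoints in this family are shown in Tables 3 and 4.
Table 3 SNP genotypes upstream and downstream of the chr7 breakpoint in the family samples
SNP No. Father Mother E1 E2 E3 E4 E5 E6 E7 E8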
1 C/C T/T T/C T/C C/C T/C T/C T/C T/C T/C
2 A/A C/C A/C A/C A/A A/C A/C A/C A/C A/C
3 G/G T/T T/G T/G - - T/G T/G G/G T/G
4 C/C T/T T/C T/C C/C T/C T/C T/C T/C T/C
5 G/G A/A A/G A/G G/G A/G A/G A/G A/G A/G
6 T/T G/G G/T G/T T/T T/T G/T G/T G/T G/T
7 A/A G/A G/A G/A A/A G/A G/A A/A G/A G/A
8 G/G G/G G/G G/G G/G - G/G G/G G/G G/G
9 G/A G/G G/A G/G A/A G/G G/A G/G G/G G/A
10 A/A C/A C/A A/A A/A A/A A/A C/A A/A C/A
11 T/G T/G T/G G/G T/T - T/G T/G G/G T/G
12 G/T G/G G/T G/G T/T G/G G/T G/G G/G G/T
13 G/G G/G G/G G/G G/G G/G G/G G/G G/G G/G
14 C/T T/T C/T T/T C/C T/T C/T T/T T/T C/T
15 G/G G/G G/G G/G G/G G/G G/G G/G G/G G/G
16 A/A G/A G/A G/A A/A G/A G/A A/A G/A G/A
17 A/G A/G A/G A/G A/A A/G A/A G/G A/G A/G
18 T/C T/T T/C T/T C/C - T/C T/T T/T T/C
19 G/T G/G G/T G/G T/T G/G G/T G/G G/G G/T
20 G/T G/G G/T G/G T/T G/G G/T G/G G/G G/T
21 T/G T/T T/G T/T G/G T/T T/G T/T T/T T/G
22 G/A A/A G/A A/A G/G A/A G/A A/A A/A G/A
23 T/T C/T C/T T/T T/T T/T T/T C/T T/T C/T
24 T/T C/T C/T C/T T/T C/T C/T T/T C/T C/T
25 C/T C/T C/T C/T T/T G/A T/T C/C C/T C/T
26 T/C T/C T/C T/C C/C - C/C - T/C T/C
27 G/C G/C G/C - - - C/C G/G G/C G/C
28 C/C T/C T/C C/C C/C C/C C/C T/C C/C T/C
29 T/C T/C T/C T/C - T/C C/C T/T T/C T/C
30 G/A A/A A/A G/G A/A G/A A/A G/A G/A A/A
31 G/C G/G G/C G/G G/C G/G G/C G/G G/G G/C
32 T/C T/C T/C C/C T/T C/C T/C T/C T/C T/C
33 A/G A/G A/G G/G A/A G/G A/G A/G A/G A/G
34 C/T C/T C/T C/C T/T - C/T C/T C/T C/T
35 G/A A/A A/A G/A A/A G/G A/A G/A G/A A/A
36 G/A A/A A/A G/A A/A G/A A/A G/A G/A A/A
37 T/T T/A T/A T/A T/T - T/A T/T T/T T/A
38 C/C T/T T/C T/C T/C T/T T/C T/C T/C T/C
39 A/A G/G A/G A/G A/G A/G A/G A/G A/G A/G
40 G/C G/C G/G C/C C/C G/G G/G C/C G/C G/G
41 G/G G/G G/G G/G G/G G/G G/G G/G G/G G/G
42 G/A G/A G/G G/A G/A A/A G/G A/A G/A G/G
43 A/A G/A G/A G/A A/A G/A G/A A/A G/A G/A
44 T/C T/T T/T T/C T/T T/C T/T T/C T/C T/T
45 T/G T/T T/T T/G T/T - T/T T/G T/G T/T
46 A/G G/G - - G/G - G/G A/A A/G G/G
47 T/T T/T T/T T/T T/T T/T T/T T/T T/T T/T
48 C/C C/C C/C C/C C/C C/C C/C C/C C/C C/C
49 G/G G/G G/G G/G G/G - G/G G/G G/G G/G
50 A/G A/G A/A A/G A/G - A/A G/G A/G A/A
51 A/A A/G A/A A/A A/G A/A A/A A/G A/G A/A
52 G/G A/A G/A G/A G/G - G/A G/A G/A G/A
53 G/G G/G G/G G/G G/G G/G G/G G/G G/G G/G
54 A/A A/G A/G A/G A/A A/G A/G A/A A/G A/G
55 G/A G/G G/A G/G G/A - G/A G/G G/G G/A
56 C/C C/T T/T C/T C/C - C/T C/C C/T C/T
57 G/A G/G G/A G/G G/A G/G G/A G/G G/G G/A
58 G/T G/G G/T G/G G/T G/G G/T G/G G/G G/T
59 G/A A/A G/A A/A G/A - G/A A/A A/A G/A
60 G/A G/A A/A G/A G/A - A/A G/G G/A A/A
61 C/T C/T C/T C/C T/T C/C C/T C/T C/T C/T
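"-": the site was not detected; subsequent tables use the same notation.
Table 4 SNP genotypes upstream and downstream of the chr16 breakpoint in the family samples
SNP No. Father Mother E1 E2 E3 E4 E5 E6 E7 E8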
1 C/C C/T C/C C/T C/T C/T C/T C/C C/C C/C
2 T/T T/C T/C T/T T/T T/T T/T T/C T/T T/C
3 T/T C/T - - T/T - T/T C/T - C/T
4 A/A C/C - C/C - C/A C/A C/A - C/A
5 A/C C/C A/C A/C A/C C/C C/C C/C A/A C/C
6 A/A G/G A/G A/G A/G A/G A/G A/G A/A A/G
7 G/G A/G A/G G/G G/G G/G G/G A/G G/G A/G
8 A/A C/A C/A A/A A/A A/A A/A C/A A/A C/A
9 T/T A/A T/A T/A T/A T/A T/A T/A T/T T/A
10 G/G G/G G/G G/G G/G - G/G G/G G/G G/G
11 C/A C/A C/C C/A C/A A/A A/A C/A C/C C/A
12 T/G T/G T/G T/T T/T T/G T/G G/G T/T G/G
13 A/A G/G G/A G/A G/A - G/A G/A A/A G/A
14 G/T T/T G/T G/T G/T T/T T/T T/T G/G T/T
15 C/C C/T C/T C/C C/C C/C C/C C/T - C/T
16 A/G A/G A/G G/G G/G G/G A/G A/A G/G A/A
17 T/C T/C T/C C/C C/C T/C T/C T/T C/C T/T
18 C/C C/C C/C C/C C/C C/C C/C C/C C/C C/C
19 T/T C/T T/T C/T C/T C/T C/T T/T T/T T/T
20 C/T T/T C/T C/T C/T - T/T T/T C/C T/T
21 T/T T/A T/T T/A T/A - T/A T/T T/T T/T
22 T/T C/T T/T C/T C/T C/T C/T T/T - T/T
23 C/C T/C C/C T/C T/C T/C T/C C/C C/C C/C
24 A/A C/A A/A C/A C/A - C/A A/A A/A A/A
25 C/A A/A C/A C/A C/A A/A A/A A/A C/C A/A
26 C/T T/T C/T C/T C/T T/T T/T T/T C/C T/T
27 T/G G/G T/G T/G T/G G/G G/G G/G T/T G/G
28 A/G G/G A/G A/G A/G G/G G/G G/G A/A G/G
29 C/A C/C C/C C/C C/C C/A C/A C/A C/C C/A
30 G/T G/G G/G G/G G/G - G/T G/T G/G G/T
31 G/A A/A A/A A/A A/A - G/A G/A A/A G/A
32 T/A T/T T/T T/T T/T T/A T/A T/A T/T T/A
33 A/G A/G G/G G/G A/G - A/G - - A/A
34 C/C C/A C/C C/A C/A C/A C/A C/C C/C C/C
35 C/T T/T T/T T/T T/T C/C C/T C/T T/T C/C
36 C/C C/T C/C C/T C/C T/T C/T C/C C/C C/C
37 C/G C/C G/G C/G C/G C/C C/C C/C C/G C/C
38 G/A G/G A/A G/A G/A G/G G/G G/G G/A G/G
39 C/T C/C T/T C/T C/T C/C C/C C/C C/T C/C
40 G/G G/G G/G G/G G/G G/G G/G G/G G/G G/G
41 C/C C/C C/C C/C C/C - C/C C/C C/C C/C
42 A/A G/G A/A A/G G/G G/G G/G G/G A/G -
43 A/A G/G A/A G/A G/A G/A G/A G/A G/A A/A
44 C/C C/C C/C C/C C/C - C/C C/C C/C C/C
45 T/G G/G G/G G/G G/G T/G T/G T/G G/G T/T
46 T/C C/C - T/T C/C - C/C - T/C C/C
47 C/T T/T C/C C/T C/T T/T T/T T/T C/T T/T
48 A/A A/A A/A A/A A/A A/A A/A A/A A/A A/A
49 T/T T/T T/T T/T T/T T/T T/T T/T T/T T/T
50 A/A A/A A/A A/A A/A - A/A A/A A/A A/A
51 A/A A/A A/A A/A A/A A/A A/A A/A A/A A/A
52 T/T T/T T/T T/T T/T T/T T/T T/T T/T T/T
53 A/G A/G G/G G/G A/G A/G A/G A/A A/G A/A
54 G/G G/C G/G G/G G/C G/G G/G G/C G/C G/G
55 A/G A/G - A/G A/G A/A A/A - G/G A/A
56 G/C G/G C/C G/C G/C G/G G/G G/G G/C G/G
57 C/C T/T C/C C/T C/T C/T C/T C/T C/T C/C
58 A/G A/A A/A A/A A/A A/G A/G - A/A G/G
59 G/A G/A - G/G G/A G/A G/A A/A G/A A/A
60 T/T T/T T/T T/T T/T T/T T/T T/T T/T T/T
61 T/T T/C T/T T/C T/C T/C T/C T/T T/T T/T
62 T/G T/T G/G T/G T/G T/T T/T T/T T/G T/T
63 C/C C/T C/C C/T C/T C/T C/T C/C C/C C/C
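(3') Embryo haplotype analysis
According to Mendelian inheritance and the laws of linkage and crossing-over, the effective informative SNPs were selected from the SNPs in Tables 3-4; the results are shown in Table 5.
Table 5 Genotypes of the effective SNPs upstream and downstream of the breakpoints in the family samples
Chromosome SNP No. Father Mother E1 E2 E3 E4 E5 E6 E7 E8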
chr7 7 A/A G/A G/A G/A A/A G/A G/A A/A G/A G/A
chr7 10 A/A C/A C/A A/A A/A A/A A/A C/A A/A C/A
chr7 16 A/A G/A G/A G/A A/A G/A G/A A/A G/A G/A
chr7 23 T/T C/T C/T T/T T/T T/T T/T C/T T/T C/T
chr7 24 T/T C/T C/T C/T T/T C/T C/T T/T C/T C/T
chr7 28 C/C T/C T/C C/C C/C C/C C/C T/C C/C T/C
chr7 37 T/T T/A T/A T/A T/T - T/A T/T T/T T/A
chr7 43 A/A G/A G/A G/A A/A A/G G/A A/A G/A G/A
chr7 51 A/A A/G A/A A/A A/G A/A A/A A/G A/G A/A
chr7 54 A/A A/G A/G A/G A/A A/G A/G A/A A/G A/G
chr7 56 C/C C/T T/T C/T C/C - C/T C/C C/T C/T
chr16 1 C/C C/T C/C C/T C/T C/T C/T C/C C/C C/C
chr16 2 T/T T/C T/C T/T T/T T/T T/T T/C T/T T/C
chr16 3 T/T C/T - - T/T - T/T C/T - C/T
chr16 7 G/G A/G A/G G/G G/G G/G G/G A/G G/G A/G
chr16 8 A/A C/A C/A A/A A/A A/A A/A C/A A/A C/A
chr16 15 C/C C/T C/T C/C C/C C/C C/C C/T - C/T
chr16 19 T/T C/T T/T C/T C/T C/T C/T T/T T/T T/T
chr16 21 T/T T/A T/T T/A T/A - T/A T/T T/T T/T
chr16 22 T/T C/T T/T C/T C/T C/T C/T T/T - T/T
chr16 23 C/C T/C C/C T/C T/C T/C T/C C/C C/C C/C
chr16 24 A/A C/A A/A A/C C/A - C/A A/A A/A A/A
chr16 34 C/C C/A C/C C/A C/A C/A C/A C/C C/C C/C
chr16 36 C/C C/T C/C C/T C/C T/T C/T C/C C/C C/C
chr16 54 G/G G/C G/G G/G G/C G/G G/G G/C G/C G/G
chr16 61 T/T T/C T/T T/C T/C T/C T/C T/T T/T T/T
chr16 63 C/C C/T C/C C/T C/T C/T C/T C/C C/C C/C
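There are 11 effective SNPs upstream and downstream of the chr7 breakpoint and 16 upstream and downstream of the chr16 breakpoint. The father's SNP genotypes on chr7 and chr16 were phased first. The maternal haplotypes were then constructed from the embryos with normal copy number (E2, E4, E5 and E6), by subtracting the father's homozygous haplotype from the genotypes of each of these embryos; the number of SNPs used for haplotyping in each embryo was 9-11 on chr7 and 13-16 on chr16. These maternal haplotypes were merged to construct the maternal chr7 haplotypes H_A(chr7) and H_B(chr7) and the maternal chr16 haplotypes H_A(chr16) and H_B(chr16). Detailed results are shown in Tables 6-7.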
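The "subtraction" used to build the maternal haplotypes can be sketched as follows: at a site where the father is homozygous, removing one paternal allele from the embryo genotype leaves the allele inherited from the mother. The function names are hypothetical; the genotypes in the usage lines are the chr7 SNPs 7, 10, 16 and 23 of Table 5 for the father and embryos E2 and E6. Returning `None` when the paternal allele is absent from the embryo genotype (for example through allele drop-out) is an assumption about how such sites would be handled.

```python
def maternal_allele(father_gt, embryo_gt):
    """Allele inherited from the mother at a site where the father is homozygous."""
    if "/" not in father_gt or "/" not in embryo_gt:
        return None                    # site not detected ('-')
    f1, f2 = father_gt.split("/")
    if f1 != f2:
        return None                    # father must be homozygous
    alleles = embryo_gt.split("/")
    if f1 not in alleles:
        return None                    # paternal allele missing in the embryo
    alleles.remove(f1)                 # take away one paternal allele
    return alleles[0]


def maternal_haplotype(father, embryo):
    """father, embryo: dict SNP id -> genotype string such as 'G/A'."""
    return {snp: maternal_allele(gt, embryo.get(snp, "-"))
            for snp, gt in father.items()}


father = {7: "A/A", 10: "A/A", 16: "A/A", 23: "T/T"}
e2 = {7: "G/A", 10: "A/A", 16: "G/A", 23: "T/T"}
e6 = {7: "A/A", 10: "C/A", 16: "A/A", 23: "C/T"}
print(maternal_haplotype(father, e2))
print(maternal_haplotype(father, e6))
```

At these four sites E2 and E6 receive different maternal alleles, which is the kind of contrast that lets the two maternal haplotypes be separated once several embryos are combined.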
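Table 6 Haplotypes upstream and downstream of the chr7 breakpoint in the embryos with normal copy number
Figure PCTCN2018077895-appb-000002
Table 7 Haplotypes upstream and downstream of the chr16 breakpoint in the embryos with normal copy number
Figure PCTCN2018077895-appb-000003
In the same way, the embryos with abnormal CNV (E1, E3, E7 and E8) were haplotyped. For each embryo it can be determined whether the haplotype on chr7/chr16 is normal or translocation-carrying; the results of the 4 embryos verify one another and give a combined call. The results are shown in Tables 8-9.
Table 8 Haplotype calls upstream and downstream of the chr7 breakpoint in the embryos with abnormal copy number
Figure PCTCN2018077895-appb-000004
"N/A": the site cannot be used for linkage analysis; subsequent tables use the same notation.
Table 9 Haplotype calls upstream and downstream of the chr16 breakpoint in the embryos with abnormal copy number
Figure PCTCN2018077895-appb-000005
The combined calls were compared with the haplotyping results for H_A(chr7), H_B(chr7), H_A(chr16) and H_B(chr16) to determine which haplotype represents "normal" and which represents "translocation-carrying". As shown in Table 10, the 4 abnormal embryos consistently indicate that H_A(chr7) is the normal haplotype and H_B(chr7) the translocation-carrying haplotype; likewise, H_A(chr16) is the normal haplotype and H_B(chr16) the translocation-carrying haplotype.
Table 10 Haplotyping results for the family
Figure PCTCN2018077895-appb-000006
(4') Determination of embryo carrier status
Based on the haplotyping results of the above steps, the carrier status of the embryos with normal copy number (E2, E4, E5 and E6) was determined. The calls are shown in Table 11: embryos E2, E4 and E5 are normal (non-carrier) embryos. One normal (non-carrier) embryo was selected by the method of the present application, which completed the identification. After the embryo was transferred, a normal pregnancy was established, and amniocentesis confirmed that the fetal karyotype was normal.
Table 11 Carrier status of the embryos of the family
Figure PCTCN2018077895-appb-000007
The applicant states that the above examples illustrate the detailed method of the present application, but the present application is not limited to this detailed method; that is, it does not mean that the present application can only be carried out by relying on it. Those skilled in the art should understand that any improvement to the present application, equivalent replacement of the materials of the product of the present application, addition of auxiliary components, choice of specific modes and the like all fall within the scope of protection and disclosure of the present application.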

Claims (15)

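  1. A method for determining the balanced translocation breakpoints of embryos, comprising the following steps:
    (1) obtaining the embryo samples to be tested and parental DNA;
    (2) amplifying the samples to be tested, constructing libraries and sequencing;
    (3) aligning the genomic sequences obtained in step (2) to a reference genome to obtain their positions on the reference genome;
    (4) dividing the reference genome into N regional segments, each segment being one window, and calculating the copy number of each window;
    (5) determining the threshold range for normal copy number, calculating one by one the trimean M_i of the copy numbers of each window and its surrounding windows, recording the windows whose M_i falls outside the threshold range, and merging consecutive windows until a normal window is encountered;
    (6) defining the consecutive windows of step (5) as a first-level region, and calculating the trimean M_nps of each window of the first-level region and its surrounding windows, the first window being the first breakpoint bp_1 and each window at which a transition between normal and abnormal occurs being a breakpoint bp_i;
    (7) defining the interval between every two breakpoints as a second-level region, and calculating the trimean M_j of the windows of each second-level region, wherein the windows whose M_j falls outside the threshold range constitute the precise copy number variation region, and the start and end positions of that region are the start and end breakpoints of the copy number variation; and
    (8) selecting a plurality of embryos with abnormal copy number, calculating the balanced translocation breakpoints on the two chromosomes of each embryo by steps (1)-(7), and calculating the trimean of the breakpoints on each of the two chromosomes, which gives the precise balanced translocation breakpoints of the embryos;
    wherein i and j are independently any positive integer from 1 to N.
  2. The method according to claim 1, wherein the sample to be tested in step (1) consists of biopsied cells of the embryo.
  3. The method according to claim 2, wherein the biopsied cells are ectodermal cells removed when the embryo has developed to the cleavage (blastomere) stage or the blastocyst stage.
  4. The method according to any one of claims 1-3, wherein the amplification in step (2) is single-cell amplification.
  5. The method according to claim 4, wherein the single-cell amplification uses any one or a combination of at least two of primer extension preamplification PCR, degenerate oligonucleotide primer PCR, multiple displacement amplification and multiple annealing and looping based amplification cycles, preferably multiple annealing and looping based amplification cycles.
  6. The method according to claim 4, wherein the sequencing in step (2) is performed on a high-throughput sequencing platform;
    preferably, the sequencing type is single-end and/or paired-end sequencing, preferably single-end sequencing;
    preferably, the sequencing read length is not less than 30 bp, preferably 50 bp;
    preferably, the sequencing depth is not less than 0.1× the genome, preferably 0.1× the genome.
  7. The method according to any one of claims 1-6, wherein the reference genome comprises the whole genome;
    preferably, the coverage of the reference genome reaches 50% or more of the whole genome, preferably 60% or more, more preferably 70% or more, still more preferably 80% or more, most preferably 95% or more;
    preferably, the length of the window in step (4) is 1×10^2 to 1×10^6 bp;
    preferably, step (4) further comprises correcting the copy number of each window and calculating the corrected copy number of each window.
  8. The method according to any one of claims 1-7, wherein the threshold range in step (5) is from N-σ to N+σ, where N is the ploidy of the sample to be tested and σ is 0.05-0.2;
    or, the threshold range in step (5) is from N-m×SD to N+m×SD, where N is the ploidy of the sample to be tested, m is any integer from 1 to 3, and SD is the standard deviation of the copy numbers of all windows of the sample to be tested.
  9. The method according to any one of claims 1-8, wherein the number of surrounding windows in step (5) is 10-100, preferably 10-60, more preferably 10;
    preferably, the trimean is calculated as M = Q1/4 + M_d/2 + Q3/4, where Q1 is the lower quartile, M_d is the median and Q3 is the upper quartile;
    preferably, the number of surrounding windows in step (6) is 3-10, preferably 3-8, more preferably 3-5.
  10. A method for identifying the balanced translocation carrier status of embryos, comprising the following steps:
    (1') determining the balanced translocation breakpoints of the embryos by the method according to any one of claims 1-9;
    (2') detecting SNPs around the breakpoints;
    (3') embryo haplotype analysis: selecting effective SNPs, phasing according to the SNP genotypes of one parent, and constructing the haplotypes of the other parent from the embryos with normal copy number;
    (4') haplotyping the embryos with abnormal copy number and classifying the results into translocation-carrying and non-carrying, thereby determining the haplotyping result; and
    (5') determining embryo carrier status: haplotyping the embryos with normal copy number and comparing them with the haplotyping result of step (4') to determine the translocation carrier status of each embryo.
  11. The method according to claim 10, wherein the length of the region around the breakpoint in step (2') is 2×10^5 to 5×10^6 bp, preferably 2×10^5 to 1×10^6 bp.
  12. The method according to claim 11, wherein the number of SNPs in step (2') is 10-500, preferably 30-100.
  13. The method according to claim 11, wherein the SNPs around the breakpoints are detected by any one or a combination of at least two of capture sequencing with designed probe arrays, first-generation sequencing of amplicons from designed primers, and next-generation sequencing of amplicons from designed primers.
  14. The method according to any one of claims 10-13, wherein an effective SNP is one at which one of the father and mother is homozygous and the other is heterozygous.
  15. The method according to any one of claims 10-14, wherein the method comprises the following steps:
    (1) obtaining the embryo samples to be tested and parental DNA;
    (2) amplifying the samples to be tested, constructing libraries and sequencing;
    (3) aligning the genomic sequences obtained in step (2) to a reference genome to obtain their positions on the reference genome;
    (4) dividing the reference genome into N regional segments, each segment being one window, and calculating the copy number of each window;
    (5) determining the threshold range for normal copy number, calculating one by one the trimean M_i of the copy numbers of each window and its surrounding windows, recording the windows whose M_i falls outside the threshold range, and merging consecutive windows until a normal window is encountered;
    (6) defining the consecutive windows of step (5) as a first-level region, and calculating the trimean M_nps of each window of the first-level region and its surrounding windows, the first window being the first breakpoint bp_1 and each window at which a transition between normal and abnormal occurs being a breakpoint bp_i;
    (7) defining the interval between every two breakpoints as a second-level region, and calculating the trimean M_j of the windows of each second-level region, wherein the windows whose M_j falls outside the threshold range constitute the precise copy number variation region, and the start and end positions of that region are the start and end breakpoints of the copy number variation;
    (8) selecting a plurality of embryos with abnormal copy number, calculating the balanced translocation breakpoints on the two chromosomes of each embryo by steps (1)-(7), and calculating the trimean of the breakpoints on each of the two chromosomes, which gives the precise balanced translocation breakpoints of the embryos;
    (9) detecting SNPs around the breakpoints;
    (10) embryo haplotype analysis: selecting effective SNPs, phasing according to the SNP genotypes of one parent, constructing the haplotypes of the other parent from the embryos with normal copy number, haplotyping the embryos with abnormal copy number and comparing, to determine the haplotyping result; and
    (11) determining embryo carrier status: haplotyping the embryos with normal copy number and comparing them with the haplotyping result of step (10) to determine the translocation carrier status of each embryo;
    wherein i and j are independently any positive integer from 1 to N.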
PCT/CN2018/077895 2017-03-02 2018-03-02 Method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos WO2018157861A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/490,488 US11837325B2 (en) 2017-03-02 2018-03-02 Method for identifying balanced translocation break points and carrying state for balanced translocations in embryos

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710119785.8A CN106834490B (zh) 2017-03-02 2017-03-02 Method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos
CN201710119785.8 2017-03-02

Publications (1)

Publication Number Publication Date
WO2018157861A1 true WO2018157861A1 (zh) 2018-09-07

Family

ID=59137720

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077895 WO2018157861A1 (zh) 2017-03-02 2018-03-02 Method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos

Country Status (3)

Country Link
US (1) US11837325B2 (zh)
CN (1) CN106834490B (zh)
WO (1) WO2018157861A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436680A (zh) * 2020-05-22 2021-09-24 复旦大学附属妇产科医院 Method for simultaneously identifying embryonic chromosomal structural abnormalities and pathogenic gene carrier status

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
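CN107058465B (zh) * 2016-10-14 2021-10-01 南方科技大学 Method for detecting balanced chromosomal translocations using haploid sequencing technology
CN106834490B (zh) 2017-03-02 2021-01-22 上海亿康医学检验所有限公司 Method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos
CN110349631B (zh) 2019-07-30 2021-10-29 苏州亿康医学检验有限公司 Analysis method and device for determining the haplotype of an offspring subject
CN111276189B (zh) * 2020-02-26 2020-12-29 广州市金域转化医学研究院有限公司 NGS-based detection and analysis system for balanced chromosomal translocations and use thereof
CN113270141B (zh) * 2021-06-10 2023-02-21 哈尔滨因极科技有限公司 Integrated algorithm for detecting genomic copy number variation
CN115433777A (zh) * 2022-10-26 2022-12-06 北京中仪康卫医疗器械有限公司 Integrated method for identifying embryonic CNV, SV and SGD abnormalities and their origin
CN115620810B (zh) * 2022-12-19 2023-03-28 北京诺禾致源科技股份有限公司 Method and device for detecting exogenous insertion information based on third-generation sequencing data
CN116030892B (zh) * 2023-03-24 2023-06-09 北京大学第三医院(北京大学第三临床医学院) System and method for identifying the breakpoint positions of reciprocal chromosomal translocations
CN116386718B (zh) * 2023-05-30 2023-08-01 北京华宇亿康生物工程技术有限公司 Method, device and medium for detecting copy number variation
CN117577178B (zh) * 2024-01-16 2024-03-26 山东大学 Method and system for detecting precise breakpoint information of structural variations, and use thereof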

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005007869A2 (en) * 2003-07-10 2005-01-27 Third Wave Technologies, Inc. Assays for the direct measurement of gene dosage
WO2007070482A2 (en) * 2005-12-14 2007-06-21 Xueliang Xia Microarray-based preimplantation genetic diagnosis of chromosomal abnormalities
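CN102171565A (zh) * 2008-08-04 2011-08-31 吉恩安全网络公司 Methods for allele calling and ploidy calling
CN105874082A (zh) * 2013-10-07 2016-08-17 塞昆纳姆股份有限公司 Methods and processes for non-invasive assessment of chromosome alterations
CN106834490A (zh) * 2017-03-02 2017-06-13 上海亿康医学检验所有限公司 Method for identifying balanced translocation breakpoints and balanced translocation carrier status in embryos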

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2640776T3 (es) * 2009-09-30 2017-11-06 Natera, Inc. Methods for non-invasive prenatal ploidy calling
WO2013033169A1 (en) * 2011-08-31 2013-03-07 Sanofi Methods of identifying genomic translocations associated with cancer
CN105574361B (zh) * 2015-11-05 2018-11-02 上海序康医疗科技有限公司 Method for detecting genomic copy number variation
CN105543339B (zh) * 2015-11-18 2021-07-16 上海序康医疗科技有限公司 Method for simultaneously performing gene locus, chromosome and linkage analysis
CN105543372B (zh) * 2016-01-19 2017-04-19 北京中仪康卫医疗器械有限公司 Method for detecting Robertsonian chromosomal translocations

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
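CN113436680A (zh) * 2020-05-22 2021-09-24 复旦大学附属妇产科医院 Method for simultaneously identifying embryonic chromosomal structural abnormalities and pathogenic gene carrier status
CN113436680B (zh) * 2020-05-22 2022-03-25 复旦大学附属妇产科医院 Method for simultaneously identifying embryonic chromosomal structural abnormalities and pathogenic gene carrier status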

Also Published As

Publication number Publication date
CN106834490A (zh) 2017-06-13
US20200010890A1 (en) 2020-01-09
CN106834490B (zh) 2021-01-22
US11837325B2 (en) 2023-12-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18760921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18760921

Country of ref document: EP

Kind code of ref document: A1